Posted

I've been experimenting with various RAID configurations on my PC, but I have a problem. Every few weeks the RAID 'degrades'. It's done this in RAID 5, RAID 0 and RAID 1 (the current configuration), about 4 times in total now I think. The last time it happened I ran Spinrite to check the drives and no problems were found at all. So I ripped out the 'degraded' drive and bought a new one. Tonight it's gone belly up as well.

I don't know why this keeps happening, but I suspect that basically my RAID software is crap. I'm using the built-in RAID functionality offered by my PW5-DH Deluxe motherboard. The drives are Seagate Barracuda 320 gigs.

Anyone got any ideas on how to stop my RAID degrading? Would buying a dedicated hardware RAID controller likely solve my problems? If so, what do they cost, and what models are decent?

Lucky I keep a separate external backup :o

Thanks.

Posted (edited)

Dunno about the RAID controller on your main board. I'm using the built-in software RAID 5 of Windows Server (you can trick regular Windows XP into doing it too, try a Google search) and it's worked just fine for several months using four Seagate 360Gig PATA drives. It rebuilds if you don't shut down the server properly, but otherwise no issues.

Are your drives overheating? That might explain a degraded array that shows no problems once the drives have cooled off.

You could also try FreeNAS, a FreeBSD-based NAS appliance that supports software RAID up to 2 Terabytes per array. Obviously you need a separate machine to run it as a dedicated RAID server, but it needs no specialist (expensive) controllers.

EDIT Have a look here for the XP hack http://www.tomshardware.com/2004/11/19/usi..._raid_5_happen/ I've tried it, it really does work :o

Edited by Crossy
Posted

Are these events happening when you've rebooted or had a system crash?

Many (most?) RAID systems do not actually verify checksums during normal operation. So the only things that would trigger a degradation event are an I/O error (block read or write failure) on a disk, or some logic in the management layer detecting an inconsistency, for example if the array was shut down improperly and the disks are not in a consistent state with each other.

Did you check the SMART data for the drive(s) that were getting ejected from the array? Were there "pending" or "reallocated" blocks?
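For anyone wanting to script the SMART check, here is a minimal Python sketch. It assumes you have saved the attribute table printed by `smartctl -A` (from smartmontools) as text. The `SAMPLE` table is made up for illustration and is not from any drive in this thread. Attribute names and column layout vary between drives, so treat the parsing as an assumption to verify against your own output.

```python
import re

# Attribute names as they commonly appear in `smartctl -A` output for ATA drives.
# Other drives may label these differently (assumption).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the tenth column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            match = re.match(r"\d+", fields[9])
            if match:
                found[fields[1]] = int(match.group())
    return found


def suspicious(attrs):
    """Names of watched attributes whose raw count is above zero."""
    return sorted(name for name, raw in attrs.items() if raw > 0)


# Illustrative sample only, not real data.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       8760
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    attrs = parse_smart_attributes(SAMPLE)
    print("watched attributes:", attrs)
    print("non-zero:", suspicious(attrs))
```

A non-zero count is not proof the drive is dying, but a count that keeps growing between checks is a reasonable cue to replace it.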

Is it possible that your power supply is inadequate for the number of disks drawing power? This could lead to I/O errors. Do you have a proper UPS and does it pass a self test?

At my house, we see frequent brownouts that my UPSs report, even when most appliances and lights do not flicker. I see the power dip as low as 170 volts quite frequently in evenings. I've turned my UPSs up to "high" sensitivity level so that they will cut over to battery power sooner before letting the mains level drop for the computer. In the last 10 days of uptime on one computer, the UPS shows 21 transitions to battery for a total of 56 seconds on battery! I don't think we've observed any blackouts during this time.

Posted

Yes, sometimes my PC hangs during startup or shutdown, leaving no option but to reset, and blackouts are a fact of life in my area (I got stuck in the lift in the dark for 10 minutes last week, bloody scary). I don't think I can avoid either of these. I have no idea about SMART but I'll look into it. I am only running 2 disks so the power should[?] be ok. I have a basic UPS, but as I need to leave the machine running unattended, any longish blackout will knock it over. It has a serial cable and some 'ups management software' that unfortunately has never worked.

So, maybe time for a UPS upgrade. But how to deal with ungraceful shutdowns? Is RAID simply not appropriate for this kind of situation?

Posted

Are you running Windows or Linux on the machine? We had the same problem on our main office server: one disk went bad, and a second followed immediately after. Power hits could have contributed to the problem, but from what we can tell it was a software issue as well. Not sure about all the details, but it also borked the backups for a month, so we were hit pretty hard. This machine runs an embedded Linux (Buffalo TeraStation).

BTW, if you are just using two drives, just use RAID 1 (mirroring) rather than RAID 5 (striping with parity). RAID 5 needs at least three drives anyway, and with so few disks the parity adds overhead without making the array any more reliable than a mirror. RAID 0 (striping) is only useful for improving speed, not reliability: with two drives, losing either one loses the array, so it roughly doubles your chance of data loss. Many people recommend going with software RAID for RAID 5, since losing a hardware RAID controller might leave the drives unreadable unless you can find a compatible replacement.
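To put rough numbers on the RAID level comparison, here is a back-of-envelope Python sketch. It assumes each drive fails independently with the same probability `p` over some period. The 5% figure is purely hypothetical, and rebuild windows are ignored. Independence is a generous assumption when the cause may be shared, such as the power problems discussed in this thread.

```python
from math import comb


def raid0_loss(p, n=2):
    """RAID 0 loses data if ANY of the n drives fails."""
    return 1 - (1 - p) ** n


def raid1_loss(p, n=2):
    """An n-way mirror loses data only if ALL n drives fail."""
    return p ** n


def raid5_loss(p, n=3):
    """RAID 5 survives zero or one failed drive; two or more lose the array."""
    survive = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in (0, 1))
    return 1 - survive


if __name__ == "__main__":
    p = 0.05  # hypothetical per-drive failure probability, not a measured value
    print(f"single drive   : {p:.4f}")
    print(f"RAID 0, 2 disks: {raid0_loss(p):.4f}")
    print(f"RAID 1, 2 disks: {raid1_loss(p):.4f}")
    print(f"RAID 5, 3 disks: {raid5_loss(p):.4f}")
```

Under these assumptions the two-disk stripe is about twice as likely to lose data as a single drive, while the two-disk mirror comes out ahead of a three-disk RAID 5.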

Anybody have any good suggestions for a backup management system? At this point, we are looking at a snapshot backup server for business continuity, plus weekly full/daily incremental tarballs onto external USB drives that are rotated off site every 4 weeks for disaster recovery. It's a whole hell of a lot of effort for ~100GB though, but we avoid cumbersome backup software.
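The weekly full/daily incremental tarball scheme described above could be sketched roughly like this in Python. This is only one way to do it, not the poster's actual setup. The function names, the Sunday full backup and the file naming are all hypothetical. Incrementals here are picked by modification time, which does not record deleted files, and the destination directory is assumed to sit outside the tree being backed up.

```python
import os
import tarfile
from datetime import date


def backup_kind(day, full_weekday=6):
    """'full' on the chosen weekday (Monday=0 ... Sunday=6), else 'incremental'."""
    return "full" if day.weekday() == full_weekday else "incremental"


def changed_files(root, since):
    """Relative paths of files under root modified after the timestamp `since`."""
    picked = []
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.getmtime(path) > since:
                picked.append(os.path.relpath(path, root))
    return sorted(picked)


def write_tarball(root, rel_paths, dest):
    """Write the given files into a gzip-compressed tarball at dest."""
    with tarfile.open(dest, "w:gz") as tar:
        for rel in rel_paths:
            tar.add(os.path.join(root, rel), arcname=rel)
    return dest


def run_backup(root, dest_dir, day, last_backup_time):
    """Full backup on the full weekday, otherwise only files changed since last run."""
    kind = backup_kind(day)
    since = float("-inf") if kind == "full" else last_backup_time
    files = changed_files(root, since)
    dest = os.path.join(dest_dir, f"{day.isoformat()}-{kind}.tar.gz")
    return write_tarball(root, files, dest), files
```

Something like `run_backup("/srv/data", "/mnt/usb", date.today(), last_run_timestamp)` would be the daily entry point, with both paths standing in for your own. Restoring means unpacking the latest full tarball and then each later incremental in date order.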

Posted (edited)

I'm running Windows Vista Ultimate on this machine. The RAID 1 is fairly fast and easy to rebuild, but it's kind of annoying to have to pull the computer apart every few weeks, and sort of defeats the purpose of having it (I abandoned RAID 5 because it took many hours to rebuild with only three drives and made me nervous).

Your backup/offsite system sounds ok to me! I know one Thai computing agency lost 1 TB of data and all their computers/servers and UPS when their building was hit by lightning. Happened to them twice now.

Edited by Crushdepth
Posted

I don't know about Windows, but I use software RAID 5 on Linux and have had a few unexpected background rebuilds after an unclean shutdown. This is nerve wracking but required no intervention on my part, and the SMART diagnostics do not show any problems with the drives. The only real problem I've had was with a disk that was actually giving read errors and reallocated blocks, and it seems to me that the problem was related to the power supply in the computer. I went through two drives before I moved everything to a new case and power supply and have had no problems since. (I went with a new case because I wasn't sure if it was power or heat related, but the drives don't actually report much cooler temperatures in spite of the increased air flow.)

I use APC brand UPSs and with the USB monitoring cable my OS is able to do a clean shutdown if it detects the UPS battery charge getting too low.
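The shutdown decision itself is simple enough to sketch in Python. This is a hypothetical policy, not what APC's software or any particular UPS daemon actually does. The thresholds are made-up defaults, and reading the battery state from the UPS is left out because it depends on the monitoring software in use.

```python
def should_shut_down(on_battery, charge_percent, minutes_left,
                     min_charge=20.0, min_minutes=5.0):
    """Decide whether to start a clean OS shutdown.

    Short dips to battery are ignored: shutdown is only requested once
    the UPS is on battery AND the charge or the estimated runtime has
    dropped to a threshold (both thresholds are hypothetical defaults).
    """
    if not on_battery:
        return False
    return charge_percent <= min_charge or minutes_left <= min_minutes
```

The point of waiting for a threshold rather than shutting down on the first switch to battery is that brief brownouts, like the ones mentioned earlier in the thread, then cost nothing, while a long blackout still ends in a clean shutdown instead of a hard power-off.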

It has been an eye opener to see the frequent switches to battery power being reported in my system logs. I never would have guessed that the mains power was this flaky just by observing lights and appliances. In fact, I resolved a repeated problem with an ethernet switch by powering it via UPS. It turns out that problems I had guessed were overheating were really frequent and extended brownout conditions in the evening.

Posted

Actually there probably is a voltage problem in my apartment now that I think of it. The fluoro lights sometimes won't turn on (unless you turn on some of the adjacent incandescent ones as well, which seems to 'kick start' them).

Sigh...

Posted

Make sure to get a really good UPS. One that actually shuts down Windows when the power is out like it should. That's the first step to resolving your problems.

These things are not built for power outages or insanely fluctuating voltage like in Thailand where it could be from 150V - 220V, along with some nice spikes when the power comes back on...

After that, as long as you are only using Windows, I'd use the built-in software RAID over a hardware RAID. It's been a long time since I've looked into this whole matter, but hardware RAIDs tended to be on the unreliable side, unless you have some professional-grade equipment.

my 2ct

Posted
I don't know why this keeps happening, but I suspect that basically my RAID software is crap. I'm using the built-in RAID functionality offered by my PW5-DH Deluxe motherboard. The drives are Seagate Barracuda 320 gigs.

Anyone got any ideas on how to stop my RAID degrading? Would buying a dedicated hardware RAID controller likely solve my problems? If so, what do they cost, and what models are decent?

Yes, a dedicated RAID controller would help.

That said, the controller has to work well with the disks attached. The motherboard RAID may do fine with whatever drives the manufacturer has tested, but not necessarily with anything else.

Disks differ in their caching and de-staging algorithms, and at the low end no manufacturer has tried to make every combination work.
