May 31, 2009

For years I have had a server in the Netherlands handling my web pages and email. It runs rock solid and I never have to take care of it. I know it has two hard disks in RAID, so I have some safety against data loss from a disk failure. Now I just found out that I mixed up RAID 0 and RAID 1 in my mind (I thought RAID 0 is what RAID 1 actually is): how can I find out via telnet whether the server runs RAID 0 or RAID 1? Is there a simple command? Which brings me to my second concern: if it is RAID 1, as it should be, is there a simple check whether both disks are healthy, or whether it is running on just one?
May 31, 2009

Is it running software or hardware RAID? Not familiar with CentOS, but if it has procfs and is running on software RAID, try:

cat /proc/mdstat

That will at least tell you which kind of RAID you are running.
May 31, 2009

It's been a while since I ran mdraid (which I'm assuming you are running). However, assuming that things haven't changed, telnet in and as root do this:

cat /etc/raidtab

There should be a 'raid-level' entry that will tell you what you're running. For the health, look at hdparm. Without knowing what the disks are, you're going to need to do this:

hdparm -I /dev/sda

and change /dev/sda to whatever you need. It will spit out a lot of information. Alternatively, you can check into the S.M.A.R.T. info. If you're running your drives on a proper controller, you need to tell us the model. A quick paste of lspci will help out.
May 31, 2009  Author

Thanks! I got:

[root@nohavename ~]# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
md1 : active raid1 sdb2[1] sda2[0]
      20972736 blocks [2/2] [UU]
md2 : active raid1 sdb5[1] sda5[0]
      220965888 blocks [2/2] [UU]
md0 : active raid1 sdb3[1] sda3[0]
      2096384 blocks [2/2] [UU]
unused devices: <none>

[root@nohavename ~]# cat /etc/raidtab
cat: /etc/raidtab: No such file or directory

[root@nohavename ~]# hdparm -I /dev/sda
/dev/sda:
HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device

So what I understood: it is RAID 1, right? But I did not understand anything more, especially not whether it is healthy. Thanks for the help.
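For what it's worth, the health can be read straight off that output: "[2/2]" means two of two members are active, and each "U" in the status flags marks a working member, while an underscore (e.g. [U_]) would mark a failed one. A small sketch that automates the check; the check_mdstat function name, and the idea of pointing it at an arbitrary file, are mine for illustration:

```shell
#!/bin/sh
# Scan mdstat-format output for degraded arrays: an underscore in
# the member-status flags (e.g. [U_]) means a failed/missing member.
check_mdstat() {
    if grep -q '\[[U_]*_[U_]*\]' "$1"; then
        echo degraded
    else
        echo healthy
    fi
}

# Only run against the real file if it exists on this machine.
if [ -r /proc/mdstat ]; then
    check_mdstat /proc/mdstat
fi
```

The pattern only matches the bracketed status flags, not "[2/2]" or member indices like "sda1[0]", since those contain characters outside U and underscore.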
May 31, 2009

Yes, those are RAID1, and the system thinks they are in sync with redundancy. However, as you seem to be aware, the disks could be unhealthy and the system just hasn't noticed yet, particularly if they are many years old and hold data that is never accessed.

# smartctl -a /dev/sda
# smartctl -a /dev/sdb

These commands will tell you something about the SMART diagnostic status of the drives. Of interest are Reallocated_Sector_Ct, which tells you about bad blocks that have been remapped; Current_Pending_Sector, which tells you about blocks that are currently having trouble; and Offline_Uncorrectable, which tells you about blocks that are unrecoverable. You might also be interested in the temperature (to see if your server is having cooling problems) and the lifetime counters such as start/stop count, power-on hours, etc. Search the web for SMART attributes for more information on this topic.

Before reading further, you might want to attempt backups from the disks if you are worried about their health. The more activity you cause, the more chance that a failing disk will fail completely, so it is good to prioritize and copy your most important files off the disks before doing anything else!

As for more active checks, I have my systems set to do the following automatically:

1. In /etc/smartd.conf I have the following entry on Fedora (you might need entries per drive on CentOS, but I am not sure):

DEVICESCAN -H -m root -a -o on -S on -s (S/../.././02|L/../../6/03)

This triggers a SMART self-test on a regular basis. I have to be honest, I don't even remember the meaning of the rule here, as I've been copying it from system to system for many years. I suspect it runs a short test daily and a long test weekly.

2. In a small /etc/cron.weekly/md-scan.sh script, I have:

#!/bin/sh
# initiate MD block-check sync action on all MD devices
for f in /sys/block/md*/md/sync_action
do
    if [ -w "$f" ]
    then
        echo check > "$f"
    fi
done

This will actually cause the software RAID system to access and check the redundancy in RAID1 (or the parity codes in RAID5, etc.) and eventually access every block on the RAID volume. This is good for making sure your data is really there on a bulk server with lots of data files that go unaccessed for months or years by applications. It will help detect a failing disk much sooner, so there is less chance of a catastrophic RAID array failure. Of course, this causes a prolonged burst of disk activity on a large server filesystem...
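The smartd schedule above can also be exercised by hand. These are standard smartctl options from smartmontools, though the device name is just an example:

```shell
# Queue a short SMART self-test on one drive, wait for it to finish,
# then read the results. Substitute your own device names.
smartctl -t short /dev/sda     # start a short self-test (takes a couple of minutes)
sleep 180                      # give it time to complete
smartctl -l selftest /dev/sda  # show the results in the self-test log
smartctl -H /dev/sda           # overall SMART health verdict
```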
June 1, 2009  Author

[quoted the reply above]

# smartctl -a /dev/sdb
smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA Maxtor 7Y250M0 Version: YAR5
SATA disks accessed via libata are not currently supported by smartmontools.
When libata is given an ATA pass-thru ioctl() then an additional '-d libata'
device type will be added to smartmontools.

That does not help much, it seems. But anyhow, I want to change to a new server in a few months... I hope this one will do the job for a few more months...
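A possible workaround for that libata limitation, for anyone reading later: newer smartmontools releases than the 5.33 shown here added a SAT (SCSI-to-ATA translation) pass-through, so something along these lines should work on SATA disks behind libata:

```shell
# SAT pass-through for SATA disks behind libata; needs a newer
# smartmontools than 5.33. The device name is an example.
smartctl -a -d sat /dev/sda
# On some setups the plain ATA device type also works:
smartctl -a -d ata /dev/sda
```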
June 20, 2009

As you are using RAID1, you do not need to worry too much about a drive failing. If one drive fails, you just replace it with a new, identical one; the array keeps running in degraded mode on the remaining disk, and after you add the new drive the mirror will be rebuilt. The only thing to remember is to replace the correct drive when one fails! Chris
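The replacement Chris describes can be sketched with mdadm. The array and partition names below are examples matching the mdstat output earlier in the thread; since each md array here is built from a partition, the new disk must be partitioned to match before its members are re-added:

```shell
# Rough sketch of swapping a failed mirror member out of /dev/md1.
# Double-check which disk actually failed before running anything
# like this; repeat the fail/remove/add steps for each md array.
mdadm --manage /dev/md1 --fail /dev/sdb2    # mark it failed (if mdadm hasn't already)
mdadm --manage /dev/md1 --remove /dev/sdb2  # drop it from the array
# ...power down, physically replace the disk, boot back up...
sfdisk -d /dev/sda | sfdisk /dev/sdb        # copy the partition layout from the good disk
mdadm --manage /dev/md1 --add /dev/sdb2     # re-add; the mirror rebuilds automatically
cat /proc/mdstat                            # watch the resync progress
```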