
CentOS RAID




For years I have had a server in the Netherlands that handles my web pages and email.

It runs rock solid and I never have to take care of it.

I know it has two hard disks in RAID, so I have some protection against data loss from a disk failure.

Now I have just realized that I had mixed up RAID 0 and RAID 1 in my mind (I thought RAID 0 was what RAID 1 actually is): how can I find out via telnet whether the server runs RAID 0 or RAID 1?

Is there a simple command?

Which brings me to my second concern... If it is RAID 1 as it should be, is there a simple check to see whether both disks are healthy, or whether it is running on just one?


Is it running software or hardware RAID?

I'm not familiar with CentOS, but if it has procfs and is running software RAID, try:

cat /proc/mdstat

That will at least tell you which kind of RAID you are running.


It's been a while since I ran mdraid (which I'm assuming you are running). However, assuming that things haven't changed, telnet in and, as root, do this:

cat /etc/raidtab

There should be a 'raid-level' entry that will tell you what you're running.

For disk health, look at hdparm. Without knowing what the disks are, you're going to need to do this:

hdparm -I /dev/sda

and change /dev/sda to whatever device you need.

That will spit out a lot of information. Alternatively, you can look into the S.M.A.R.T. info.

If you're running your drives on a proper hardware controller, you need to tell us the model. A quick paste of the lspci output will help.
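For example, something like this narrows the lspci output down to storage-related controllers (just a convenience filter; plain lspci works too):

lspci | grep -i -E 'raid|sata|scsi|ide'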


Thanks!

I got:

[root@nohavename ~]# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
md1 : active raid1 sdb2[1] sda2[0]
      20972736 blocks [2/2] [UU]
md2 : active raid1 sdb5[1] sda5[0]
      220965888 blocks [2/2] [UU]
md0 : active raid1 sdb3[1] sda3[0]
      2096384 blocks [2/2] [UU]
unused devices: <none>

[root@nohavename ~]# cat /etc/raidtab
cat: /etc/raidtab: No such file or directory

[root@nohavename ~]# hdparm -I /dev/sda

/dev/sda:
HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device

So, as far as I understood: it is RAID 1, right?

But I did not understand much beyond that, especially not whether it is healthy.

Thanks for the help


Yes, those are RAID1, and the [2/2] [UU] on each array means both member disks are active, so the system thinks they are in sync with redundancy. However, as you seem to be aware, the disks could be unhealthy and the system just hasn't noticed yet, particularly if they are many years old and hold data that is never accessed.
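If you want a more verbose per-array view than /proc/mdstat, mdadm can report it (assuming mdadm is installed, which it normally is with md software RAID; run it once per array):

mdadm --detail /dev/md0

The "State" line and the device list at the bottom show whether anything is failed or degraded.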

# smartctl -a /dev/sda

# smartctl -a /dev/sdb

These commands will tell you something about the SMART diagnostic status of the drives. Of interest are Reallocated_Sector_Ct, which tells you about bad blocks that have been remapped; Current_Pending_Sector, which tells you about blocks that are currently having trouble; and Offline_Uncorrectable, which tells you about blocks that are unrecoverable. You might also be interested in the temperature (to see if your server is having cooling problems) and the lifetime counters such as start/stop count, power-on hours, etc. Search the web for SMART attributes for more information on this topic.
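As a quick convenience (assuming smartctl can actually talk to your drives), you can filter the attribute table down to just the interesting rows with something like the following; the exact attribute names vary a little between drive vendors:

smartctl -A /dev/sda | egrep 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|Temperature_Celsius|Power_On_Hours'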

Before reading further, you might want to attempt backups from the disks if you are worried about their health. The more activity you cause, the more chance that a failing disk will fail completely, so it is a good idea to prioritize and copy your most important files off the disks before doing anything else!

As for more active checks, I have my systems set to do the following automatically:

1. In /etc/smartd.conf I have the following automatic entry on Fedora (it might need per-drive entries on CentOS, but I am not sure):

DEVICESCAN -H -m root -a -o on -S on -s (S/../.././02|L/../../6/03)

This triggers SMART self-tests on a regular schedule. To be honest, I don't even remember what the rule means here, as I've been copying it from system to system for many years; I suspect it runs a short test daily and a long test weekly.
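For reference, based on the smartd.conf documentation, that entry appears to break down roughly as follows (worth double-checking against the man page on your own system):

# -H                  check the overall SMART health status
# -m root             mail warnings to root
# -a                  monitor all attributes and logs
# -o on               enable automatic offline data collection
# -S on               enable attribute autosave
# -s (S/../.././02|L/../../6/03)
#     S/../.././02    run a short self-test every day at 02:00
#     L/../../6/03    run a long self-test every Saturday at 03:00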

2. In a small /etc/cron.weekly/md-scan.sh script, I have:

#!/bin/sh
# initiate the MD block-check ("scrub") sync action on all MD devices
for f in /sys/block/md*/md/sync_action
do
    if [ -w "$f" ]
    then
        echo check > "$f"
    fi
done

This will actually cause the software RAID layer to read and check the redundancy in RAID1 (or the parity in RAID5, etc.) and eventually touch every block on the RAID volume. This is good for making sure your data is really there on a bulk server with lots of data files that go unaccessed for months or years. It helps detect a failing disk much sooner, so there is less chance of a catastrophic RAID array failure. Of course, it causes a prolonged burst of disk activity on a large server filesystem...
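If you want to see how such a check is going and whether it found anything, something like this should work (the sysfs name comes from the md documentation; adjust md0 to each of your arrays):

cat /proc/mdstat                    # shows the check progress while it runs
cat /sys/block/md0/md/mismatch_cnt  # non-zero after a check can indicate inconsistent blocks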



# smartctl -a /dev/sdb
smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA Maxtor 7Y250M0 Version: YAR5
SATA disks accessed via libata are not currently supported by
smartmontools. When libata is given an ATA pass-thru ioctl() then an
additional '-d libata' device type will be added to smartmontools.

That does not help much, it seems.

But anyhow, I want to move to a new server in a few months... I hope this one will do the job for a few more months.


  • 3 weeks later...

As you are using RAID 1, you do not need to worry too much about a drive failing. If one drive fails, you just replace it with a new identical one; the array keeps running on the remaining disk, and after you add the new drive the mirror will be rebuilt.

The only thing to remember is to replace the correct drive when one fails!
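A rough sketch of the mdadm steps for md software RAID (the device names here are only examples; check /proc/mdstat for your real array and partition names, and partition the new disk to match the old one before adding it):

# mark the failed partition as faulty and remove it from the array
mdadm --manage /dev/md0 --fail /dev/sdb3
mdadm --manage /dev/md0 --remove /dev/sdb3
# after physically swapping the disk and recreating its partitions,
# add the new partition back so the mirror rebuilds
mdadm --manage /dev/md0 --add /dev/sdb3
# watch the rebuild progress
cat /proc/mdstat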

Chris

