May 31, 2009

For years I have had a server in the Netherlands handling my web pages and email. It runs rock solid and I never have to take care of it. I know it has two hard disks in RAID, so I have some safety against data loss from a disk failure. Now I just found out that I mixed up RAID 0 and RAID 1 in my mind (I thought RAID 0 is what RAID 1 actually is): how can I find out via telnet whether the server runs RAID 0 or RAID 1? Is there a simple command? Which brings me to my second concern: if it is RAID 1, as it should be, is there a simple check whether both disks are healthy, or whether it is running on just one?
May 31, 2009

Is it running software or hardware RAID? Not familiar with CentOS, but if it has procfs and is running on software RAID, try:

cat /proc/mdstat

That will at least tell you which kind of RAID you are running.
May 31, 2009

It's been a while since I ran mdraid (which I'm assuming you are running). However, assuming that things haven't changed, telnet in and as root do this:

cat /etc/raidtab

There should be a 'raid-level' entry that will tell you what you're running. For the health, look at hdparm. Without knowing what the disks are, you're going to need to do this:

hdparm -I /dev/sda

and change /dev/sda to whatever you need. It will spit out a lot of information. Alternatively, you can check into the S.M.A.R.T. info. If you're running your drives on a proper controller, you need to tell us the model. A quick paste of lspci will help out.
May 31, 2009  Author

Thanks! I got:

[root@nohavename ~]# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
md1 : active raid1 sdb2[1] sda2[0]
      20972736 blocks [2/2] [UU]
md2 : active raid1 sdb5[1] sda5[0]
      220965888 blocks [2/2] [UU]
md0 : active raid1 sdb3[1] sda3[0]
      2096384 blocks [2/2] [UU]
unused devices: <none>

[root@nohavename ~]# cat /etc/raidtab
cat: /etc/raidtab: No such file or directory

[root@nohavename ~]# hdparm -I /dev/sda
/dev/sda:
HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device

So what I understood: it is RAID 1, right? But I did not understand anything more, especially not whether it is healthy. Thanks for the help.
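For what it's worth, the health can be read straight off that output: "[2/2]" means two of two members are active, and each "U" in the status flags marks a working member, while an underscore (e.g. [U_]) would mark a failed one. A small sketch that automates the check; the check_mdstat function name, and the idea of pointing it at an arbitrary file, are mine for illustration:

```shell
#!/bin/sh
# Scan mdstat-format output for degraded arrays: an underscore in
# the member-status flags (e.g. [U_]) means a failed/missing member.
check_mdstat() {
    if grep -q '\[[U_]*_[U_]*\]' "$1"; then
        echo degraded
    else
        echo healthy
    fi
}

# Only run against the real file if it exists on this machine.
if [ -r /proc/mdstat ]; then
    check_mdstat /proc/mdstat
fi
```

The pattern only matches the bracketed status flags, not "[2/2]" or member indices like "sda1[0]", since those contain characters outside U and underscore.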
May 31, 2009

Yes, those are RAID1, and the system thinks they are in sync with redundancy. However, as you seem to be aware, the disks could be unhealthy and the system just hasn't noticed yet, particularly if they are many years old and hold data that is never accessed.

# smartctl -a /dev/sda
# smartctl -a /dev/sdb

These commands will tell you something about the SMART diagnostic status of the drives. Of interest are Reallocated_Sector_Ct, which tells you about bad blocks that have been remapped; Current_Pending_Sector, which tells you about blocks that are currently having trouble; and Offline_Uncorrectable, which tells you about blocks that are unrecoverable. You might also be interested in the temperature (to see if your server is having cooling problems) and the lifetime counters such as start/stop count, power-on hours, etc. Search the web for SMART attributes for more information on this topic.

Before reading further, you might want to attempt backups from the disks if you are worried about their health. The more activity you cause, the more chance that a failing disk will fail completely, so it is good to prioritize and copy your most important files off the disks before doing anything else!

As for more active checks, I have my systems set to do the following automatically:

1. In /etc/smartd.conf I have the following entry on Fedora (you might need entries per drive on CentOS, but I am not sure):

DEVICESCAN -H -m root -a -o on -S on -s (S/../.././02|L/../../6/03)

This triggers a SMART self-test on a regular basis. I have to be honest, I don't even remember the meaning of the rule here, as I've been copying it from system to system for many years. I suspect it runs a short test daily and a long test weekly.

2. In a small /etc/cron.weekly/md-scan.sh script, I have:

#!/bin/sh
# initiate MD block-check sync action on all MD devices
for f in /sys/block/md*/md/sync_action
do
    if [ -w "$f" ]
    then
        echo check > "$f"
    fi
done

This will actually cause the software RAID system to access and check the redundancy in RAID1 (or the parity codes in RAID5, etc.) and eventually access every block on the RAID volume. This is good for making sure your data is really there on a bulk server with lots of data files that go unaccessed for months or years by applications. It will help detect a failing disk much sooner, so there is less chance of a catastrophic RAID array failure. Of course, this causes a prolonged burst of disk activity on a large server filesystem...
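The smartd schedule above can also be exercised by hand. These are standard smartctl options from smartmontools, though the device name is just an example:

```shell
# Queue a short SMART self-test on one drive, wait for it to finish,
# then read the results. Substitute your own device names.
smartctl -t short /dev/sda     # start a short self-test (takes a couple of minutes)
sleep 180                      # give it time to complete
smartctl -l selftest /dev/sda  # show the results in the self-test log
smartctl -H /dev/sda           # overall SMART health verdict
```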
June 1, 2009  Author

[quoted the reply above]

# smartctl -a /dev/sdb
smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA Maxtor 7Y250M0 Version: YAR5
SATA disks accessed via libata are not currently supported by smartmontools.
When libata is given an ATA pass-thru ioctl() then an additional '-d libata'
device type will be added to smartmontools.

That does not help much, it seems. But anyhow, I want to change to a new server in a few months... I hope this one will do the job for a few more months...
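A possible workaround for that libata limitation, for anyone reading later: newer smartmontools releases than the 5.33 shown here added a SAT (SCSI-to-ATA translation) pass-through, so something along these lines should work on SATA disks behind libata:

```shell
# SAT pass-through for SATA disks behind libata; needs a newer
# smartmontools than 5.33. The device name is an example.
smartctl -a -d sat /dev/sda
# On some setups the plain ATA device type also works:
smartctl -a -d ata /dev/sda
```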
June 20, 2009

As you are using RAID1, you do not need to worry too much about a drive failing. If one drive fails, you just replace it with a new, identical one; the array keeps running in degraded mode on the remaining disk, and after you add the new drive the mirror will be rebuilt. The only thing to remember is to replace the correct drive when one fails! Chris
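The replacement Chris describes can be sketched with mdadm. The array and partition names below are examples matching the mdstat output earlier in the thread; since each md array here is built from a partition, the new disk must be partitioned to match before its members are re-added:

```shell
# Rough sketch of swapping a failed mirror member out of /dev/md1.
# Double-check which disk actually failed before running anything
# like this; repeat the fail/remove/add steps for each md array.
mdadm --manage /dev/md1 --fail /dev/sdb2    # mark it failed (if mdadm hasn't already)
mdadm --manage /dev/md1 --remove /dev/sdb2  # drop it from the array
# ...power down, physically replace the disk, boot back up...
sfdisk -d /dev/sda | sfdisk /dev/sdb        # copy the partition layout from the good disk
mdadm --manage /dev/md1 --add /dev/sdb2     # re-add; the mirror rebuilds automatically
cat /proc/mdstat                            # watch the resync progress
```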