Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 06/24/2011 11:09 PM, William Seligman wrote:
> Oops... I omitted the config file.
>
> On 6/24/11 1:27 PM, William Seligman wrote:
>> I recently had a major unexplained failure of my HA cluster due to DRBD. I could
>> use some help to understand what happened.
>>
>> Setup: Two systems; hypatia is primary, orestes is secondary. OS is Scientific
>> Linux 5.5: kernel 2.6.18-194.26.1.el5xen; DRBD version drbd-8.3.8.1-30.el5.
>>
>> The drives on the two systems are configured identically: /dev/sdc1 and
>> /dev/sdd1 are grouped by software RAID1 into /dev/md2. DRBD resource "admin" is
>> device /dev/drbd1 in a Primary/Secondary configuration, formed from /dev/md2 on
>> both systems; I've attached the config file below, with comments stripped out.
>>
>> In case it's relevant: I use LVM to carve /dev/drbd1 into several partitions.
>> One partition contains the image files for several xen VMs; other partitions are
>> NFS-exported to both the VMs and to other systems in the lab.
>>
>> All of these resources, including DRBD, are managed by the HA software:
>> corosync-1.2.7; pacemaker-1.0.11; openais-1.1.3; heartbeat-3.0.3.
>>
>> Here's the problem: I had a hard-drive failure on the secondary:
>>
>> Jun 8 01:04:04 orestes kernel: ata4.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x6 frozen
>> Jun 8 01:04:04 orestes kernel: ata4: SError: { HostInt }
>> Jun 8 01:04:04 orestes kernel: ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>> Jun 8 01:04:04 orestes kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x44 (timeout)
>> Jun 8 01:04:04 orestes kernel: ata4.00: status: { DRDY }
>> Jun 8 01:04:04 orestes kernel: ata4: hard resetting link
>> Jun 8 01:04:05 orestes kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> Jun 8 01:04:35 orestes kernel: ata4.00: qc timeout (cmd 0xec)
>> Jun 8 01:04:35 orestes kernel: ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> Jun 8 01:04:35 orestes kernel: ata4.00: revalidation failed (errno=-5)
>> Jun 8 01:04:35 orestes kernel: ata4: failed to recover some devices, retrying in 5 secs
>> Jun 8 01:04:40 orestes kernel: ata4: hard resetting link
>> Jun 8 01:04:42 orestes kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>
>> OK, drives fail. That's what the double-redundancy is for (RAID1+DRBD). But on
>> the primary:
>>
>> Jun 8 01:04:39 hypatia kernel: block drbd1: [drbd1_worker/6650] sock_sendmsg time expired, ko = 4294967295
>> Jun 8 01:04:45 hypatia kernel: block drbd1: [drbd1_worker/6650] sock_sendmsg time expired, ko = 4294967294
>> Jun 8 01:04:51 hypatia kernel: block drbd1: [drbd1_worker/6650] sock_sendmsg time expired, ko = 4294967293
>> Jun 8 01:04:57 hypatia kernel: block drbd1: [drbd1_worker/6650] sock_sendmsg time expired, ko = 4294967292
>> Jun 8 01:05:01 hypatia lrmd: [3988]: WARN: Nagios:monitor process (PID 23641) timed out (try 1). Killing with signal SIGTERM (15).
>> Jun 8 01:05:01 hypatia lrmd: [3988]: WARN: operation monitor[75] on ocf::Xen::Nagios for client 3991, its parameters: CRM_meta_interval=[10000] xmfile=[/xen/configs/nagios.cfg] CRM_meta_timeout=[30000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor] : pid [23641] timed out
>>
>> I'll quote more of the log file if asked, but since this is a DRBD mailing list
>> I'll describe the HA errors abstractly: "Nagios" is the name of one of my
>> virtual machines whose disk image is on drbd1. One after the other, all the
>> virtual machines on drbd1 timeout and fail.
>> Corosync tries to transfer the resources from the good primary to the bad
>> secondary; this fails (NFS issues) and the bad secondary STONITHs the good
>> primary!
>>
>> There's nothing in the log files of the virtual machines; the entries stop at
>> the time of the lrmd timeouts.
>>
>> My questions are:
>>
>> - The drive on ata4 failed, but it was part of a software RAID1. Why would that
>>   cause a problem as long as the other drive was OK?
>>
>> - Why would a drive failure on a secondary cause kernel drbd errors on the
>>   primary?
>>
>> My only explanation is that maybe I had the syncer rate too high; it was 50M
>> but I reduced it to 15M. Is that a possible cause for this kind of problem?

> global {
>   usage-count yes;
> }
>
> common {
>   protocol A;
>
>   handlers {
>     pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>     pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>     local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>   }
>   startup {
>   }
>   disk {
>   }
>   net {
>     ping-timeout 11;
>   }
>   syncer {
>     rate 15M;
>   }
> }
>
> resource admin {
>   device /dev/drbd1;
>   disk /dev/md2;
>
>   net {
>     after-sb-0pri discard-zero-changes;
>     after-sb-1pri consensus;
>     after-sb-2pri disconnect;
>   }
>   startup {
>     wfc-timeout 60;
>     degr-wfc-timeout 60;
>     outdated-wfc-timeout 60;
>   }
>   handlers {
>     split-brain "/usr/lib/drbd/notify-split-brain.sh sysadmin@nevis.columbia.edu";
>   }
>
>   meta-disk internal;
>
>   on hypatia.nevis.columbia.edu {
>     address 192.168.100.7:7789;
>   }
>   on orestes.nevis.columbia.edu {
>     address 192.168.100.6:7789;
>   }
> }

Hi William,

> Setup: Two systems; hypatia is primary, orestes is secondary. OS is Scientific
> Linux 5.5: kernel 2.6.18-194.26.1.el5xen; DRBD version drbd-8.3.8.1-30.el5.

I have a similar setup to the one you describe here. OS is CentOS 5.6: kernel
2.6.18-238.12.1.el5xen; DRBD version kmod-drbd83-xen-8.3.10-2.el5.elrepo.x86_64.
Two machines running Xen, each with one RAID1 disk block, LVM on top, and
several partitions used as DRBD resources (simple Primary/Secondary; failover
is done manually).

A week ago a verify run triggered a lockup like the one you described. The
secondary node was hit by a bad disk.

Primary:

Jun 27 02:00:02 xenutrecht kernel: block drbd4: conn( Connected -> VerifyT )
Jun 27 02:00:02 xenutrecht kernel: block drbd4: Online Verify start sector: 0
Jun 27 02:00:45 xenutrecht kernel: block drbd4: [drbd4_worker/6525] sock_sendmsg time expired, ko = 4294967295

...this kept repeating...
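(The verify run mentioned above is DRBD's online verification. It requires a
verify-alg in the syncer section and is usually started from cron; a sketch,
with the resource name r4 and the schedule purely as placeholders, not taken
from this setup:

  # /etc/cron.d/drbd-verify -- weekly online verify of one resource
  0 2 * * 0  root  /sbin/drbdadm verify r4
)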
And the secondary found a disk error:

Jun 27 02:00:02 xenutrecht02 kernel: block drbd4: conn( Connected -> VerifyS )
Jun 27 02:00:02 xenutrecht02 kernel: block drbd4: Starting Online Verify from sector 0
Jun 27 02:00:25 xenutrecht02 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 27 02:00:25 xenutrecht02 kernel: ata1.00: BMDMA stat 0x4
Jun 27 02:00:25 xenutrecht02 kernel: ata1.00: cmd 25/00:00:bf:b1:a5/00:04:41:00:00/e0 tag 0 dma 524288 in
Jun 27 02:00:25 xenutrecht02 kernel: res 51/40:da:e5:b3:a5/40:01:41:00:00/e0 Emask 0x9 (media error)
Jun 27 02:00:25 xenutrecht02 kernel: ata1.00: status: { DRDY ERR }
Jun 27 02:00:25 xenutrecht02 kernel: ata1.00: error: { UNC }
Jun 27 02:00:25 xenutrecht02 kernel: ata1.00: configured for UDMA/133
Jun 27 02:00:25 xenutrecht02 kernel: ata1: EH complete
Jun 27 02:00:26 xenutrecht02 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 27 02:00:26 xenutrecht02 kernel: ata1.00: BMDMA stat 0x4
Jun 27 02:00:26 xenutrecht02 kernel: ata1.00: cmd 25/00:00:bf:b1:a5/00:04:41:00:00/e0 tag 0 dma 524288 in
Jun 27 02:00:26 xenutrecht02 kernel: res 51/40:da:e5:b3:a5/40:01:41:00:00/e0 Emask 0x9 (media error)
Jun 27 02:00:26 xenutrecht02 kernel: ata1.00: status: { DRDY ERR }
Jun 27 02:00:26 xenutrecht02 kernel: ata1.00: error: { UNC }
Jun 27 02:00:26 xenutrecht02 kernel: ata1.00: configured for UDMA/133
Jun 27 02:00:26 xenutrecht02 kernel: ata1: EH complete
Jun 27 02:00:27 xenutrecht02 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 27 02:00:27 xenutrecht02 kernel: ata1.00: BMDMA stat 0x4
Jun 27 02:00:27 xenutrecht02 kernel: ata1.00: cmd 25/00:00:bf:b1:a5/00:04:41:00:00/e0 tag 0 dma 524288 in
Jun 27 02:00:27 xenutrecht02 kernel: res 51/40:da:e5:b3:a5/40:01:41:00:00/e0 Emask 0x9 (media error)
Jun 27 02:00:27 xenutrecht02 kernel: ata1.00: status: { DRDY ERR }
Jun 27 02:00:27 xenutrecht02 kernel: ata1.00: error: { UNC }
Jun 27 02:00:28 xenutrecht02 kernel: ata1.00: configured for UDMA/133
Jun 27 02:00:28 xenutrecht02 kernel: ata1: EH complete
Jun 27 02:00:29 xenutrecht02 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 27 02:00:29 xenutrecht02 kernel: ata1.00: BMDMA stat 0x4
Jun 27 02:00:29 xenutrecht02 kernel: ata1.00: cmd 25/00:00:bf:b1:a5/00:04:41:00:00/e0 tag 0 dma 524288 in
Jun 27 02:00:29 xenutrecht02 kernel: res 51/40:da:e5:b3:a5/40:01:41:00:00/e0 Emask 0x9 (media error)
Jun 27 02:00:29 xenutrecht02 kernel: ata1.00: status: { DRDY ERR }
Jun 27 02:00:29 xenutrecht02 kernel: ata1.00: error: { UNC }
Jun 27 02:00:29 xenutrecht02 kernel: ata1.00: configured for UDMA/133
Jun 27 02:00:29 xenutrecht02 kernel: ata1: EH complete
Jun 27 02:00:30 xenutrecht02 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 27 02:00:30 xenutrecht02 kernel: ata1.00: BMDMA stat 0x4
Jun 27 02:00:30 xenutrecht02 kernel: ata1.00: cmd 25/00:00:bf:b1:a5/00:04:41:00:00/e0 tag 0 dma 524288 in
Jun 27 02:00:30 xenutrecht02 kernel: res 51/40:da:e5:b3:a5/40:01:41:00:00/e0 Emask 0x9 (media error)
Jun 27 02:00:30 xenutrecht02 kernel: ata1.00: status: { DRDY ERR }
Jun 27 02:00:30 xenutrecht02 kernel: ata1.00: error: { UNC }
Jun 27 02:00:30 xenutrecht02 kernel: ata1.00: configured for UDMA/133
Jun 27 02:00:30 xenutrecht02 kernel: ata1: EH complete
Jun 27 02:00:31 xenutrecht02 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 27 02:00:31 xenutrecht02 kernel: ata1.00: BMDMA stat 0x4
Jun 27 02:00:31 xenutrecht02 kernel: ata1.00: cmd 25/00:00:bf:b1:a5/00:04:41:00:00/e0 tag 0 dma 524288 in
Jun 27 02:00:31 xenutrecht02 kernel: res 51/40:da:e5:b3:a5/40:01:41:00:00/e0 Emask 0x9 (media error)
Jun 27 02:00:31 xenutrecht02 kernel: ata1.00: status: { DRDY ERR }
Jun 27 02:00:31 xenutrecht02 kernel: ata1.00: error: { UNC }
Jun 27 02:00:32 xenutrecht02 kernel: ata1.00: configured for UDMA/133
Jun 27 02:00:32 xenutrecht02 kernel: sd 1:0:0:0: SCSI error: return code = 0x08000002
Jun 27 02:00:32 xenutrecht02 kernel: sdb: Current [descriptor]: sense key: Medium Error
Jun 27 02:00:32 xenutrecht02 kernel: Add. Sense: Unrecovered read error - auto reallocate failed
Jun 27 02:00:32 xenutrecht02 kernel:
Jun 27 02:00:32 xenutrecht02 kernel: Descriptor sense data with sense descriptors (in hex):
Jun 27 02:00:32 xenutrecht02 kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Jun 27 02:00:32 xenutrecht02 kernel: 41 a5 b3 e5
Jun 27 02:00:32 xenutrecht02 kernel: end_request: I/O error, dev sdb, sector 1101378533
Jun 27 02:00:32 xenutrecht02 kernel: printk: 1 messages suppressed.
Jun 27 02:00:32 xenutrecht02 kernel: raid1: sdb1: rescheduling sector 1101378464
Jun 27 02:00:32 xenutrecht02 kernel: raid1: sdb1: rescheduling sector 1101378472
Jun 27 02:00:32 xenutrecht02 kernel: raid1: sdb1: rescheduling sector 1101378480
Jun 27 02:00:32 xenutrecht02 kernel: raid1: sdb1: rescheduling sector 1101378488
Jun 27 02:00:32 xenutrecht02 kernel: raid1: sdb1: rescheduling sector 1101378496
Jun 27 02:00:32 xenutrecht02 kernel: raid1: sdb1: rescheduling sector 1101378504
Jun 27 02:00:32 xenutrecht02 kernel: raid1: sdb1: rescheduling sector 1101378512
Jun 27 02:00:32 xenutrecht02 kernel: raid1: sdb1: rescheduling sector 1101378520
Jun 27 02:00:32 xenutrecht02 kernel: raid1: sdb1: rescheduling sector 1101378528
Jun 27 02:00:32 xenutrecht02 kernel: raid1: sdb1: rescheduling sector 1101378536
Jun 27 02:00:32 xenutrecht02 kernel: ata1: EH complete
Jun 27 02:00:32 xenutrecht02 kernel: SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
Jun 27 02:00:33 xenutrecht02 kernel: sdb: Write Protect is off
Jun 27 02:00:33 xenutrecht02 kernel: SCSI device sdb: drive cache: write back
Jun 27 02:00:34 xenutrecht02 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 27 02:00:34 xenutrecht02 kernel: ata1.00: BMDMA stat 0x4
Jun 27 02:00:34 xenutrecht02 kernel: ata1.00: cmd 25/00:00:bf:c1:a5/00:04:41:00:00/e0 tag 0 dma 524288 in
Jun 27 02:00:34 xenutrecht02 kernel: res 51/40:5a:65:c3:a5/40:02:41:00:00/e0 Emask 0x9 (media error)
Jun 27 02:00:34 xenutrecht02 kernel: ata1.00: status: { DRDY ERR }
Jun 27 02:00:34 xenutrecht02 kernel: ata1.00: error: { UNC }
Jun 27 02:00:35 xenutrecht02 kernel: ata1.00: configured for UDMA/133
Jun 27 02:00:35 xenutrecht02 kernel: ata1: EH complete
Jun 27 02:00:36 xenutrecht02 kernel: SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
Jun 27 02:00:36 xenutrecht02 kernel: sdb: Write Protect is off
Jun 27 02:00:36 xenutrecht02 kernel: SCSI device sdb: drive cache: write back
Jun 27 02:00:36 xenutrecht02 kernel: SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
Jun 27 02:00:36 xenutrecht02 kernel: sdb: Write Protect is off
Jun 27 02:00:36 xenutrecht02 kernel: SCSI device sdb: drive cache: write back
Jun 27 02:00:39 xenutrecht02 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 27 02:00:39 xenutrecht02 kernel: ata1.00: BMDMA stat 0x4
Jun 27 02:00:39 xenutrecht02 kernel: ata1.00: cmd 25/00:08:df:b3:a5/00:00:41:00:00/e0 tag 0 dma 4096 in
Jun 27 02:00:39 xenutrecht02 kernel: res 51/40:02:e5:b3:a5/40:00:41:00:00/e0 Emask 0x9 (media error)
Jun 27 02:00:39 xenutrecht02 kernel: ata1.00: status: { DRDY ERR }
Jun 27 02:00:39 xenutrecht02 kernel: ata1.00: error: { UNC }
Jun 27 02:00:39 xenutrecht02 kernel: ata1.00: configured for UDMA/133
Jun 27 02:00:39 xenutrecht02 kernel: ata1: EH complete
Jun 27 02:00:40 xenutrecht02 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 27 02:00:40 xenutrecht02 kernel: ata1.00: BMDMA stat 0x4
Jun 27 02:00:40 xenutrecht02 kernel: ata1.00: cmd 25/00:08:df:b3:a5/00:00:41:00:00/e0 tag 0 dma 4096 in
Jun 27 02:00:40 xenutrecht02 kernel: res 51/40:02:e5:b3:a5/40:00:41:00:00/e0 Emask 0x9 (media error)
Jun 27 02:00:40 xenutrecht02 kernel: ata1.00: status: { DRDY ERR }
Jun 27 02:00:40 xenutrecht02 kernel: ata1.00: error: { UNC }
Jun 27 02:00:40 xenutrecht02 kernel: ata1.00: configured for UDMA/133
Jun 27 02:00:40 xenutrecht02 kernel: ata1: EH complete
Jun 27 02:00:41 xenutrecht02 kernel: printk: 50 messages suppressed.
Jun 27 02:00:41 xenutrecht02 kernel: raid1: sdc1: redirecting sector 1101378464 to another mirror
Jun 27 02:00:41 xenutrecht02 kernel: SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
Jun 27 02:00:41 xenutrecht02 kernel: sdb: Write Protect is off
Jun 27 02:00:41 xenutrecht02 kernel: SCSI device sdb: drive cache: write back
Jun 27 02:00:41 xenutrecht02 kernel: SCSI device sdb: 1953525168 512-byte hdwr sectors (1000205 MB)
Jun 27 02:00:41 xenutrecht02 kernel: sdb: Write Protect is off
Jun 27 02:00:41 xenutrecht02 kernel: SCSI device sdb: drive cache: write back
Jun 27 02:00:42 xenutrecht02 kernel: printk: 9 messages suppressed.
Jun 27 02:00:42 xenutrecht02 kernel: raid1: sdb1: redirecting sector 1101378544 to another mirror
Jun 27 02:00:49 xenutrecht02 kernel: block drbd4: [drbd4_worker/3598] sock_sendmsg time expired, ko = 4294967295
Jun 27 02:00:55 xenutrecht02 kernel: block drbd4: [drbd4_worker/3598] sock_sendmsg time expired, ko = 4294967294

...and this kept repeating...

So RAID1 handled the disk problem, but DRBD locked up. I had to reboot the
machine (rather hard, via the power button, as a normal reboot stalled). As I
was not sure whether this was a real hardware problem or just a cabling issue,
I put the disk back into the RAID. That effectively rewrites the entire disk,
and no bad sectors remained. I changed some cables and tried again.

This weekend the disk crashed again. As a result of the crash DRBD locked up
again, and this time all virtual machines on both hosts stopped working. qemu
used 100% CPU and the load on both machines went up to 10-20. I was still able
to reach the two hosts over ssh, but even a reboot no longer worked (I managed
to enable sysrq and forced a reboot that way). After the reboot all machines
came back up; the other host was not rebooted. This shows that a secondary can
pull down a primary. (Yes, I am now convinced the disk is really bad. :-) )

The disappointment for me is that I use block-device mirroring with DRBD
between two Xen hosts so that I can quickly switch virtual machines over to the
surviving node. Without DRBD, the RAID1 would have worked properly. Perhaps
DRBD and Xen don't get along nicely. I am considering no longer using DRBD, as
it turns out not to be reliable here. Instead I plan to use an iSCSI disk in a
RAID1 mirror with a local disk, with the RAID1 set up to only write to the
iSCSI disk (write-mostly) and read from the local disk. This gives me the same
hardware redundancy as DRBD, and I can use the iSCSI disk on another machine in
case of migration.
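Roughly, such a mirror could be created with mdadm along these lines; the
device names (/dev/sda3 for the local disk, /dev/sdd1 for the iSCSI LUN), the
array name and the internal bitmap are placeholders only, not actual values
from this setup:

  # local member first; members listed after --write-mostly are written to
  # but are not read from unless the other mirror half fails
  mdadm --create /dev/md10 --level=1 --raid-devices=2 --bitmap=internal \
        /dev/sda3 --write-mostly /dev/sdd1

  # /proc/mdstat marks the write-mostly member with (W)
  cat /proc/mdstat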
The DRBD way is nicer, but only if it is reliable in my setup, and I have no
clue how to achieve that.

Cheers,
Theo
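For reference, the "sock_sendmsg time expired, ko = ..." messages in both sets
of logs relate to DRBD 8.3's net-section timeout handling: with the default
ko-count of 0, a node keeps waiting on a peer that no longer completes requests.
A minimal sketch of the relevant options (the values shown are illustrative
only, not taken from either configuration above, and whether they would have
prevented these particular hangs is untested):

  common {
    net {
      timeout   60;   # 6.0 seconds, given in tenths of a second (the DRBD default)
      ko-count  4;    # give up on a peer that misses this many timeouts in a row,
                      # instead of waiting indefinitely (default 0 = never give up)
    }
    disk {
      on-io-error detach;  # on a local I/O error, detach from the backing device
                           # and continue diskless rather than passing the error on
    }
  }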