I asked about this before, but maybe I did it in the wrong way. I'll try again, and be brief.

Setup:

Two systems; hypatia is primary, orestes is secondary.

OS is Scientific Linux 5.5: kernel 2.6.18-194.26.1.el5xen; DRBD version drbd-8.3.8.1-30.el5.

On both systems: /dev/sdc1 and /dev/sdd1 make a software RAID1, /dev/md2.

DRBD resource "admin" is device /dev/drbd1 in a Primary/Secondary configuration, formed from /dev/md2 on both systems.

Here's the problem. There was a hardware failure on one of the RAID1 drives on the secondary:

Jun  8 01:04:04 orestes kernel: ata4.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x6 frozen

and so on. But for some reason, this led to a problem on the primary:

Jun  8 01:04:39 hypatia kernel: block drbd1: [drbd1_worker/6650] sock_sendmsg time expired, ko = 4294967295
Jun  8 01:04:45 hypatia kernel: block drbd1: [drbd1_worker/6650] sock_sendmsg time expired, ko = 4294967294

From googling, I know this means that DRBD couldn't write to drbd1 anymore.

Any ideas of how this could happen, or anything I could test?

Config file:

global {
        usage-count yes;
}

common {
        protocol A;

        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        }

        startup {
        }

        disk {
        }

        net {
                ping-timeout 11;
        }

        syncer {
                rate 15M;
        }
}

resource admin {
        device /dev/drbd1;
        disk /dev/md2;

        net {
                after-sb-0pri discard-zero-changes;
                after-sb-1pri consensus;
                after-sb-2pri disconnect;
        }

        startup {
                wfc-timeout 60;
                degr-wfc-timeout 60;
                outdated-wfc-timeout 60;
        }

        handlers {
                split-brain "/usr/lib/drbd/notify-split-brain.sh sysadmin at nevis.columbia.edu";
        }

        meta-disk internal;

        on hypatia.nevis.columbia.edu {
                address 192.168.100.7:7789;
        }

        on orestes.nevis.columbia.edu {
                address 192.168.100.6:7789;
        }
}

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/