I asked about this before, but maybe I did it in the wrong way. I'll try again,
and be brief.
Setup: Two systems; hypatia is primary, orestes is secondary. OS is Scientific
Linux 5.5: kernel 2.6.18-194.26.1.el5xen; DRBD version drbd-8.3.8.1-30.el5.
On both systems: /dev/sdc1 and /dev/sdd1 make a software RAID1, /dev/md2. DRBD
resource "admin" is device /dev/drbd1 in a Primary/Secondary configuration,
formed from /dev/md2 on both systems.
Here's the problem. There was a hardware failure on one of the RAID1 drives on
the secondary:
Jun 8 01:04:04 orestes kernel: ata4.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x6 frozen
and so on. But for some reason, this led to a problem on the primary:
Jun 8 01:04:39 hypatia kernel: block drbd1: [drbd1_worker/6650] sock_sendmsg time expired, ko = 4294967295
Jun 8 01:04:45 hypatia kernel: block drbd1: [drbd1_worker/6650] sock_sendmsg time expired, ko = 4294967294
From googling, I know this means that DRBD couldn't write to drbd1 anymore.
Any ideas on how this could happen, or anything I could test?
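A side note on the two "ko" values in the hypatia log lines: they sit at the very top of the unsigned 32-bit range. The short sketch below reinterprets them as signed 32-bit integers. It assumes the kernel prints the counter as an unsigned 32-bit number; the helper name as_signed32 is made up for illustration. The result would be consistent with a counter that started at 0 and was decremented once per timeout, but whether that is what DRBD actually does is an assumption, not something established in this post.

```python
# Reinterpret the logged "ko" values as signed 32-bit integers.
# Assumption: the counter is printed as an unsigned 32-bit value.
import struct


def as_signed32(value):
    """Return the signed 32-bit interpretation of an unsigned 32-bit value."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


# Values copied from the two hypatia log lines.
ko_values = [4294967295, 4294967294]
signed = [as_signed32(v) for v in ko_values]
print(signed)  # prints [-1, -2]
```

If that reading is right, the number itself is not an error code; it only shows how many times the send timeout has fired.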
Config file:
global {
    usage-count yes;
}
common {
    protocol A;
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
    }
    startup {
    }
    disk {
    }
    net {
        ping-timeout 11;
    }
    syncer {
        rate 15M;
    }
}
resource admin {
    device /dev/drbd1;
    disk /dev/md2;
    net {
        after-sb-0pri discard-zero-changes;
        after-sb-1pri consensus;
        after-sb-2pri disconnect;
    }
    startup {
        wfc-timeout 60;
        degr-wfc-timeout 60;
        outdated-wfc-timeout 60;
    }
    handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh sysadmin@nevis.columbia.edu";
    }
    meta-disk internal;
    on hypatia.nevis.columbia.edu {
        address 192.168.100.7:7789;
    }
    on orestes.nevis.columbia.edu {
        address 192.168.100.6:7789;
    }
}
--
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman@nevis.columbia.edu
PO Box 137 |
Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/