I've RTFM'ed and googled this problem. Now I ask the experts.

Setup: Two systems; hypatia is primary, orestes is secondary. OS is Scientific Linux 5.5: kernel 2.6.18-194.26.1.el5xen; DRBD version drbd-8.3.8.1-30.el5.

Each has two partitions that are used for separate DRBD devices: /dev/md0 (software RAID1) and /dev/sdd2. On both systems:

partition /dev/md0  => device drbd1
partition /dev/sdd2 => device drbd2

The DRBD traffic goes over a single Ethernet cable that connects the two systems.

For drbd1, the control hierarchy is Corosync->DRBD->LVM->Xen. For drbd2, the control is Corosync->DRBD->Just mount the thing.

The complicated one is drbd1, but it seems to work just fine. The problem appears to be with drbd2, which doesn't do much of anything; it's a work/backup directory which I use to take infrequent (roughly every two months) snapshots of the virtual machines on drbd1.

Every ten seconds, the error messages at the end of this post appear in the log of the primary system, and there are similar lines on the secondary system. It seems that drbd2 is losing its connection, re-establishing it, and doing a re-sync.

Everything works, most of the time. But once every few weeks there's enough of a delay that Corosync takes notice and STONITHs one of the systems, which is a big pain.

I've tried:

- switching from Protocol C to Protocol A
- setting "net {ping-timeout 100;}"
- throttling the connection by "syncer {rate 10M;}" (used to be 100M)

Any ideas?

Mar 4 12:26:25 hypatia kernel: block drbd2: meta connection shut down by peer.
Mar 4 12:26:25 hypatia kernel: block drbd2: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Mar 4 12:26:25 hypatia kernel: block drbd2: asender terminated
Mar 4 12:26:25 hypatia kernel: block drbd2: Terminating asender thread
Mar 4 12:26:25 hypatia kernel: block drbd2: sock was shut down by peer
Mar 4 12:26:25 hypatia kernel: block drbd2: short read expecting header on sock: r=0
Mar 4 12:26:25 hypatia kernel: block drbd2: Creating new current UUID
Mar 4 12:26:25 hypatia kernel: block drbd2: Connection closed
Mar 4 12:26:25 hypatia kernel: block drbd2: conn( NetworkFailure -> Unconnected )
Mar 4 12:26:25 hypatia kernel: block drbd2: receiver terminated
Mar 4 12:26:25 hypatia kernel: block drbd2: Restarting receiver thread
Mar 4 12:26:25 hypatia kernel: block drbd2: receiver (re)started
Mar 4 12:26:25 hypatia kernel: block drbd2: conn( Unconnected -> WFConnection )
Mar 4 12:26:25 hypatia kernel: block drbd2: Handshake successful: Agreed network protocol version 94
Mar 4 12:26:25 hypatia kernel: block drbd2: conn( WFConnection -> WFReportParams )
Mar 4 12:26:25 hypatia kernel: block drbd2: Starting asender thread (from drbd2_receiver [7920])
Mar 4 12:26:25 hypatia kernel: block drbd2: data-integrity-alg: <not-used>
Mar 4 12:26:25 hypatia kernel: block drbd2: drbd_sync_handshake:
Mar 4 12:26:25 hypatia kernel: block drbd2: self C4884637D2C418DF:922772A0478F5E1F:2DE51139CD7C3DF7:EB27F748FC21DC65 bits:0 flags:0
Mar 4 12:26:25 hypatia kernel: block drbd2: peer 922772A0478F5E1E:0000000000000000:2DE51139CD7C3DF6:EB27F748FC21DC65 bits:0 flags:0
Mar 4 12:26:25 hypatia kernel: block drbd2: uuid_compare()=1 by rule 70
Mar 4 12:26:25 hypatia kernel: block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Mar 4 12:26:25 hypatia kernel: block drbd2: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Mar 4 12:26:25 hypatia kernel: block drbd2: Began resync as SyncSource (will sync 0 KB [0 bits set]).
Mar 4 12:26:25 hypatia kernel: block drbd2: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Mar 4 12:26:25 hypatia kernel: block drbd2: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
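
To check the "every ten seconds" observation against the logs rather than by eye, something like the following Python sketch may help. It is only an illustration, not part of the setup described above: the function names `disconnect_times` and `intervals_seconds` are made up for this example, the regular expression assumes the syslog line layout shown in the excerpt, and the year is an assumed parameter because syslog timestamps omit it.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpt in this post:
#   Mar 4 12:26:25 hypatia kernel: block drbd2: conn( Connected -> NetworkFailure )
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"block\s+(?P<dev>drbd\d+):\s+(?P<msg>.*)$"
)


def disconnect_times(lines, device="drbd2", year=2011):
    """Return the times at which `device` entered the NetworkFailure state.

    `year` is an assumption supplied by the caller; syslog does not record it.
    """
    times = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("dev") != device:
            continue
        # Only count transitions *into* NetworkFailure, not out of it.
        if "-> NetworkFailure" in m.group("msg"):
            stamp = f"{year} {m.group('ts')}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def intervals_seconds(times):
    """Return the gaps, in seconds, between consecutive disconnects."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Feeding it the lines of the kernel log (for example, the contents of whatever file syslog writes kernel messages to on the primary) and printing `intervals_seconds(disconnect_times(lines))` would show whether the gaps really cluster around ten seconds, which might hint at which timeout is firing.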