On 3/4/11 12:38 PM, William Seligman wrote:
> I've RTFM'ed and google'd on this problem. Now I ask the experts.

Now that I've joined this list, I looked at the archives directly. I see that
Cory Coager reported the same problem:

http://lists.linbit.com/pipermail/drbd-user/2011-March/015735.html

Lars Ellenberg suggested that the problem was due to a bad NIC. Maybe... but
what are the odds that two different systems have a bad NIC?

> Setup: Two systems; hypatia is primary, orestes is secondary. OS is Scientific
> Linux 5.5: kernel 2.6.18-194.26.1.el5xen; DRBD version drbd-8.3.8.1-30.el5.
>
> Each has two partitions that are used for separate DRBD devices: /dev/md0
> (software RAID1) and /dev/sdd2. On both systems:
>
> partition /dev/md0  => device drbd1
> partition /dev/sdd2 => device drbd2
>
> The DRBD traffic goes over a single Ethernet cable that connects the two systems.
>
> For drbd1, the control hierarchy is Corosync->DRBD->LVM->Xen.
> For drbd2, the control is Corosync->DRBD->Just mount the thing.
>
> The complicated one is drbd1, but it seems to work just fine. The problem
> appears to be with drbd2, which doesn't do much of anything; it's a work/backup
> directory which I use to take infrequent (~two months) snapshots of the virtual
> machines on drbd1.
>
> Every ten seconds, the error messages at the end of this post appear in the log
> of the primary system and there are similar lines on the secondary system. It
> seems that drbd2 is losing its connection, re-establishing, and doing a re-sync.
>
> Everything works, most of the time. But once every few weeks there's enough of a
> delay that Corosync takes notice and STONITHs one of the systems, which is a big
> pain.
>
> I've tried:
> - switching from Protocol C to Protocol A
> - setting "net {ping-timeout 100;}"
> - throttling the connection by "syncer {rate 10M;}" (used to be 100M)
>
> Any ideas?
>
> Mar 4 12:26:25 hypatia kernel: block drbd2: meta connection shut down by peer.
> Mar 4 12:26:25 hypatia kernel: block drbd2: peer( Secondary -> Unknown ) conn(
> Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Mar 4 12:26:25 hypatia kernel: block drbd2: asender terminated
> Mar 4 12:26:25 hypatia kernel: block drbd2: Terminating asender thread
> Mar 4 12:26:25 hypatia kernel: block drbd2: sock was shut down by peer
> Mar 4 12:26:25 hypatia kernel: block drbd2: short read expecting header on
> sock: r=0
> Mar 4 12:26:25 hypatia kernel: block drbd2: Creating new current UUID
> Mar 4 12:26:25 hypatia kernel: block drbd2: Connection closed
> Mar 4 12:26:25 hypatia kernel: block drbd2: conn( NetworkFailure -> Unconnected )
> Mar 4 12:26:25 hypatia kernel: block drbd2: receiver terminated
> Mar 4 12:26:25 hypatia kernel: block drbd2: Restarting receiver thread
> Mar 4 12:26:25 hypatia kernel: block drbd2: receiver (re)started
> Mar 4 12:26:25 hypatia kernel: block drbd2: conn( Unconnected -> WFConnection )
> Mar 4 12:26:25 hypatia kernel: block drbd2: Handshake successful: Agreed
> network protocol version 94
> Mar 4 12:26:25 hypatia kernel: block drbd2: conn( WFConnection -> WFReportParams )
> Mar 4 12:26:25 hypatia kernel: block drbd2: Starting asender thread (from
> drbd2_receiver [7920])
> Mar 4 12:26:25 hypatia kernel: block drbd2: data-integrity-alg: <not-used>
> Mar 4 12:26:25 hypatia kernel: block drbd2: drbd_sync_handshake:
> Mar 4 12:26:25 hypatia kernel: block drbd2: self
> C4884637D2C418DF:922772A0478F5E1F:2DE51139CD7C3DF7:EB27F748FC21DC65 bits:0 flags:0
> Mar 4 12:26:25 hypatia kernel: block drbd2: peer
> 922772A0478F5E1E:0000000000000000:2DE51139CD7C3DF6:EB27F748FC21DC65 bits:0 flags:0
> Mar 4 12:26:25 hypatia kernel: block drbd2: uuid_compare()=1 by rule 70
> Mar 4 12:26:25 hypatia kernel: block drbd2: peer( Unknown -> Secondary ) conn(
> WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
> Mar 4 12:26:25 hypatia kernel: block drbd2: conn( WFBitMapS -> SyncSource )
> pdsk( UpToDate -> Inconsistent )
> Mar 4 12:26:25 hypatia kernel: block drbd2: Began resync as SyncSource (will
> sync 0 KB [0 bits set]).
> Mar 4 12:26:25 hypatia kernel: block drbd2: Resync done (total 1 sec; paused 0
> sec; 0 K/sec)
> Mar 4 12:26:25 hypatia kernel: block drbd2: conn( SyncSource -> Connected )
> pdsk( Inconsistent -> UpToDate )
>

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
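
One way to check the "every ten seconds" observation is to pull the
Connected -> NetworkFailure transitions out of the kernel log and look at
the spacing between them. The following is only a rough sketch, not part
of the original report. It assumes each syslog entry sits on a single line
(as in the raw log file, not the wrapped quote above), that the line format
matches the excerpt above, and that the year is supplied by the caller,
since syslog timestamps omit it. The function names are made up for
illustration.

```python
import re
from datetime import datetime

# Matches lines shaped like the excerpt above, e.g.
# "Mar 4 12:26:25 hypatia kernel: block drbd2: conn( ... )"
LINE_RE = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d+)\s+(?P<time>\d\d:\d\d:\d\d)\s+\S+\s+"
    r"kernel: block (?P<dev>drbd\d+): (?P<msg>.*)$"
)
# Matches a connection-state transition such as "conn( Connected -> NetworkFailure )"
CONN_RE = re.compile(r"conn\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def network_failures(lines, year=2011):
    """Return {device: [datetime, ...]} for each transition into NetworkFailure.

    `year` is an assumption supplied by the caller; syslog lines carry no year.
    """
    failures = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        c = CONN_RE.search(m.group("msg"))
        if not c or c.group(2) != "NetworkFailure":
            continue
        stamp = datetime.strptime(
            f"{year} {m['mon']} {m['day']} {m['time']}", "%Y %b %d %H:%M:%S"
        )
        failures.setdefault(m["dev"], []).append(stamp)
    return failures


def intervals_seconds(stamps):
    """Seconds between consecutive timestamps, in the order given."""
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```

Feeding it the open log file (for example /var/log/messages, if that is
where kernel messages land on these systems) and calling
intervals_seconds() on the drbd2 entry would show whether the drops are
strictly periodic, which might help tell a timer-driven disconnect from
random link trouble.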