[DRBD-user] DRBD sync messages every ten seconds

William Seligman seligman at nevis.columbia.edu
Fri Mar 4 22:58:11 CET 2011


On 3/4/11 12:38 PM, William Seligman wrote:
> I've RTFM'ed and google'd on this problem. Now I ask the experts.

Now that I've joined this list, I looked at the archives directly. I see that
Cory Coager reported the same problem:

http://lists.linbit.com/pipermail/drbd-user/2011-March/015735.html

Lars Ellenberg suggested that the problem was due to a bad NIC. Maybe... but
what are the odds that two different systems have a bad NIC?

> Setup: Two systems; hypatia is primary, orestes is secondary. OS is Scientific
> Linux 5.5: kernel 2.6.18-194.26.1.el5xen; DRBD version drbd-8.3.8.1-30.el5.
> 
> Each has two partitions that are used for separate DRBD devices: /dev/md0
> (software RAID1) and /dev/sdd2. On both systems:
> 
> partition /dev/md0 => device drbd1
> partition /dev/sdd2 => device drbd2
> 
> The DRBD traffic goes over a single Ethernet cable that connects the two systems.
> 
> For drbd1, the control hierarchy is Corosync->DRBD->LVM->Xen.
> For drbd2, the control is Corosync->DRBD->Just mount the thing.
> 
> The complicated one is drbd1, but it seems to work just fine. The problem
> appears to be with drbd2, which doesn't do much of anything; it's a work/backup
> directory which I use to take infrequent (~two months) snapshots of the virtual
> machines on drbd1.
> 
> Every ten seconds, the error messages at the end of this post appear in the log
> of the primary system and there are similar lines on the secondary system. It
> seems that drbd2 is losing its connection, re-establishing, and doing a re-sync.
> 
> Everything works, most of the time. But once every few weeks there's enough of a
> delay that Corosync takes notice and STONITHs one of the systems, which is a big
> pain.
> 
> I've tried:
> - switching from Protocol C to Protocol A
> - setting "net {ping-timeout 100;}"
> - throttling the connection by "syncer {rate 10M;}" (used to be 100M)
> 
> Any ideas?
> 
> Mar  4 12:26:25 hypatia kernel: block drbd2: meta connection shut down by peer.
> Mar  4 12:26:25 hypatia kernel: block drbd2: peer( Secondary -> Unknown ) conn(
> Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Mar  4 12:26:25 hypatia kernel: block drbd2: asender terminated
> Mar  4 12:26:25 hypatia kernel: block drbd2: Terminating asender thread
> Mar  4 12:26:25 hypatia kernel: block drbd2: sock was shut down by peer
> Mar  4 12:26:25 hypatia kernel: block drbd2: short read expecting header on
> sock: r=0
> Mar  4 12:26:25 hypatia kernel: block drbd2: Creating new current UUID
> Mar  4 12:26:25 hypatia kernel: block drbd2: Connection closed
> Mar  4 12:26:25 hypatia kernel: block drbd2: conn( NetworkFailure -> Unconnected )
> Mar  4 12:26:25 hypatia kernel: block drbd2: receiver terminated
> Mar  4 12:26:25 hypatia kernel: block drbd2: Restarting receiver thread
> Mar  4 12:26:25 hypatia kernel: block drbd2: receiver (re)started
> Mar  4 12:26:25 hypatia kernel: block drbd2: conn( Unconnected -> WFConnection )
> Mar  4 12:26:25 hypatia kernel: block drbd2: Handshake successful: Agreed
> network protocol version 94
> Mar  4 12:26:25 hypatia kernel: block drbd2: conn( WFConnection -> WFReportParams )
> Mar  4 12:26:25 hypatia kernel: block drbd2: Starting asender thread (from
> drbd2_receiver [7920])
> Mar  4 12:26:25 hypatia kernel: block drbd2: data-integrity-alg: <not-used>
> Mar  4 12:26:25 hypatia kernel: block drbd2: drbd_sync_handshake:
> Mar  4 12:26:25 hypatia kernel: block drbd2: self
> C4884637D2C418DF:922772A0478F5E1F:2DE51139CD7C3DF7:EB27F748FC21DC65 bits:0 flags:0
> Mar  4 12:26:25 hypatia kernel: block drbd2: peer
> 922772A0478F5E1E:0000000000000000:2DE51139CD7C3DF6:EB27F748FC21DC65 bits:0 flags:0
> Mar  4 12:26:25 hypatia kernel: block drbd2: uuid_compare()=1 by rule 70
> Mar  4 12:26:25 hypatia kernel: block drbd2: peer( Unknown -> Secondary ) conn(
> WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
> Mar  4 12:26:25 hypatia kernel: block drbd2: conn( WFBitMapS -> SyncSource )
> pdsk( UpToDate -> Inconsistent )
> Mar  4 12:26:25 hypatia kernel: block drbd2: Began resync as SyncSource (will
> sync 0 KB [0 bits set]).
> Mar  4 12:26:25 hypatia kernel: block drbd2: Resync done (total 1 sec; paused 0
> sec; 0 K/sec)
> Mar  4 12:26:25 hypatia kernel: block drbd2: conn( SyncSource -> Connected )
> pdsk( Inconsistent -> UpToDate )
> 
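To check that the drops really are ten seconds apart, and not just roughly periodic, the timestamps can be pulled out of the syslog. This is only a rough sketch: the function name drop_intervals and the default year are my own choices, and it assumes the log lines have the same layout as the excerpt above (without the "> " quoting).

```python
import re
from datetime import datetime

# Matches the line that opens each disconnect sequence in the excerpt above.
DROP = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel: block (drbd\d+): "
    r"meta connection shut down by peer"
)

def drop_intervals(lines, device="drbd2", year=2011):
    """Return the seconds between consecutive disconnects of `device`."""
    stamps = []
    for line in lines:
        m = DROP.match(line)
        if m and m.group(2) == device:
            # syslog timestamps carry no year, so one has to be supplied;
            # split/join collapses the padded day ("Mar  4" -> "Mar 4").
            stamp = " ".join(m.group(1).split())
            stamps.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```

Feeding it the lines of the system log (on my systems that would be /var/log/messages, but that path is an assumption about the syslog setup) gives a list of gaps in seconds; a list of nearly identical values would point at a timer rather than random network trouble.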

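One more observation on the drbd_sync_handshake lines. As far as I can tell the four fields are current:bitmap:history:history, and the lowest bit of each is a flag rather than part of the ID; both of those are my reading, not something the log states. Under that assumption, the "self" bitmap UUID matches the "peer" current UUID, which would fit the "Creating new current UUID" line: the primary starts a new generation on every disconnect, and the secondary is one generation behind with nothing to sync (bits:0). A small sketch of that comparison, with helper names that are mine:

```python
def parse_uuids(field):
    """Split a 'CUR:BM:H1:H2' string from the handshake log into four ints."""
    return [int(part, 16) for part in field.split(":")]

def strip_flag(uuid):
    # Assumption: the lowest bit is a flag, so it is masked before comparing.
    return uuid & ~1

def peer_is_one_generation_behind(self_uuids, peer_uuids):
    """True if our bitmap UUID equals the peer's current UUID (low bit ignored)."""
    return strip_flag(self_uuids[1]) == strip_flag(peer_uuids[0])

# Values copied from the log excerpt above.
self_uuids = parse_uuids(
    "C4884637D2C418DF:922772A0478F5E1F:2DE51139CD7C3DF7:EB27F748FC21DC65")
peer_uuids = parse_uuids(
    "922772A0478F5E1E:0000000000000000:2DE51139CD7C3DF6:EB27F748FC21DC65")

print(peer_is_one_generation_behind(self_uuids, peer_uuids))  # True
```

If that reading is right, the resync itself is harmless bookkeeping and the real question is still why the meta connection gets shut down in the first place.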

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
