[DRBD-user] DRBD sync messages every ten seconds

William Seligman seligman at nevis.columbia.edu
Fri Mar 4 18:38:27 CET 2011


I've RTFM'ed and Googled this problem. Now I'm asking the experts.

Setup: Two systems; hypatia is primary, orestes is secondary. OS is Scientific
Linux 5.5: kernel 2.6.18-194.26.1.el5xen; DRBD version drbd-8.3.8.1-30.el5.

Each has two partitions that are used for separate DRBD devices: /dev/md0
(software RAID1) and /dev/sdd2. On both systems:

partition /dev/md0 => device drbd1
partition /dev/sdd2 => device drbd2

The DRBD traffic goes over a single Ethernet cable that connects the two systems.

For drbd1, the control hierarchy is Corosync->DRBD->LVM->Xen.
For drbd2, the control is Corosync->DRBD->Just mount the thing.

The complicated one is drbd1, but it seems to work just fine. The problem
appears to be with drbd2, which doesn't do much of anything; it's a work/backup
directory which I use to take infrequent (roughly every two months) snapshots
of the virtual machines on drbd1.

Every ten seconds, the error messages at the end of this post appear in the log
of the primary system and there are similar lines on the secondary system. It
seems that drbd2 is losing its connection, re-establishing, and doing a re-sync.
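
One way to check the ten-second spacing against the log itself is to pull out the timestamps of the disconnect lines and look at the gaps between them. The following is a minimal Python sketch, not a tested tool. It assumes plain syslog lines in the format of the excerpt at the end of this post; the function name disconnect_gaps, the choice of "meta connection shut down by peer" as the marker, and the default year (syslog timestamps carry none) are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Matches lines such as:
#   "Mar  4 12:26:25 hypatia kernel: block drbd2: meta connection shut down by peer."
# Groups: timestamp, DRBD device name, message text.
LINE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: block (drbd\d+): (.*)$"
)


def disconnect_gaps(lines, device="drbd2",
                    marker="meta connection shut down by peer", year=2011):
    """Return the gaps, in seconds, between successive disconnect messages.

    `year` is an assumption: syslog timestamps omit it, so one is supplied
    only to make the timestamps parseable.
    """
    times = []
    for line in lines:
        m = LINE.match(line)
        if m and m.group(2) == device and marker in m.group(3):
            stamp = f"{year} {m.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    # Pair each disconnect with the next one and measure the difference.
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Feeding it the lines of the kernel log (for example the contents of /var/log/messages, if that is where kernel messages end up on the system) should return a list of values close to 10.0 if the disconnects really are periodic. Log lines that an email client has wrapped need to be rejoined first.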

Everything works, most of the time. But once every few weeks there's enough of a
delay that Corosync takes notice and STONITHs one of the systems, which is a big
pain.

I've tried:
- switching from Protocol C to Protocol A
- setting "net {ping-timeout 100;}"
- throttling the connection by "syncer {rate 10M;}" (used to be 100M)

Any ideas?

Mar  4 12:26:25 hypatia kernel: block drbd2: meta connection shut down by peer.
Mar  4 12:26:25 hypatia kernel: block drbd2: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Mar  4 12:26:25 hypatia kernel: block drbd2: asender terminated
Mar  4 12:26:25 hypatia kernel: block drbd2: Terminating asender thread
Mar  4 12:26:25 hypatia kernel: block drbd2: sock was shut down by peer
Mar  4 12:26:25 hypatia kernel: block drbd2: short read expecting header on sock: r=0
Mar  4 12:26:25 hypatia kernel: block drbd2: Creating new current UUID
Mar  4 12:26:25 hypatia kernel: block drbd2: Connection closed
Mar  4 12:26:25 hypatia kernel: block drbd2: conn( NetworkFailure -> Unconnected )
Mar  4 12:26:25 hypatia kernel: block drbd2: receiver terminated
Mar  4 12:26:25 hypatia kernel: block drbd2: Restarting receiver thread
Mar  4 12:26:25 hypatia kernel: block drbd2: receiver (re)started
Mar  4 12:26:25 hypatia kernel: block drbd2: conn( Unconnected -> WFConnection )
Mar  4 12:26:25 hypatia kernel: block drbd2: Handshake successful: Agreed network protocol version 94
Mar  4 12:26:25 hypatia kernel: block drbd2: conn( WFConnection -> WFReportParams )
Mar  4 12:26:25 hypatia kernel: block drbd2: Starting asender thread (from drbd2_receiver [7920])
Mar  4 12:26:25 hypatia kernel: block drbd2: data-integrity-alg: <not-used>
Mar  4 12:26:25 hypatia kernel: block drbd2: drbd_sync_handshake:
Mar  4 12:26:25 hypatia kernel: block drbd2: self C4884637D2C418DF:922772A0478F5E1F:2DE51139CD7C3DF7:EB27F748FC21DC65 bits:0 flags:0
Mar  4 12:26:25 hypatia kernel: block drbd2: peer 922772A0478F5E1E:0000000000000000:2DE51139CD7C3DF6:EB27F748FC21DC65 bits:0 flags:0
Mar  4 12:26:25 hypatia kernel: block drbd2: uuid_compare()=1 by rule 70
Mar  4 12:26:25 hypatia kernel: block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Mar  4 12:26:25 hypatia kernel: block drbd2: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Mar  4 12:26:25 hypatia kernel: block drbd2: Began resync as SyncSource (will sync 0 KB [0 bits set]).
Mar  4 12:26:25 hypatia kernel: block drbd2: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Mar  4 12:26:25 hypatia kernel: block drbd2: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
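
The "self" and "peer" UUID lines in the handshake above can be compared by hand, which may help in reading the "uuid_compare()=1 by rule 70" line. The sketch below rests on two assumptions that are not stated anywhere in this log: that the four colon-separated fields are printed in the order current, bitmap, older history, oldest history, and that the lowest bit of each UUID is a flag to be ignored when comparing. The helper names parse_uuids and same_generation are made up for this illustration.

```python
def parse_uuids(field):
    """Split a colon-separated UUID list from the log into integers."""
    return [int(part, 16) for part in field.split(":")]


def same_generation(a, b):
    """Compare two UUIDs while ignoring the lowest bit (assumed to be a flag)."""
    return (a & ~1) == (b & ~1)


# Values copied from the "self" and "peer" lines of the log excerpt.
self_uuids = parse_uuids(
    "C4884637D2C418DF:922772A0478F5E1F:2DE51139CD7C3DF7:EB27F748FC21DC65"
)
peer_uuids = parse_uuids(
    "922772A0478F5E1E:0000000000000000:2DE51139CD7C3DF6:EB27F748FC21DC65"
)

# Assumed field order: index 0 = current UUID, index 1 = bitmap UUID.
current_matches = same_generation(self_uuids[0], peer_uuids[0])
bitmap_matches_peer_current = same_generation(self_uuids[1], peer_uuids[0])

print("current UUIDs match:", current_matches)
print("self bitmap UUID matches peer current UUID:", bitmap_matches_peer_current)
```

Under those assumptions, the primary's current UUID differs from the secondary's, while the primary's bitmap UUID matches the secondary's current UUID. That would fit the "Creating new current UUID" line logged at the disconnect and the zero-size resync ("0 bits set") that follows it, though it says nothing about why the connection drops in the first place.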

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
