[DRBD-user] disconnecting hangs after ko-count failure

Walter Haidinger walter.haidinger at gmx.at
Sat Jan 19 15:12:23 CET 2008


Hi!

I've got a reproducible problem with drbd-8.0.8. The problem surfaced after migrating to v8; it never showed up with drbd v7.

Both peers run openSUSE 10.3 and a self-compiled drbd 8.0.8. They're connected over an 802.11b WLAN link with a throughput of about 550 kB/s. The exchanged data is tunneled through OpenVPN. The link is pretty stable: I had a ping running for over a week and did not lose a single packet, with an average RTT of 2.4 ms.

Ever since going to v8, the peer is considered dead due to the ko-count (already raised to 25). I have yet to figure out why this happens, but there is a more annoying issue:

Any subsequent attempt to shut down drbd on either node makes drbd hang in the Disconnecting state. For example, 'drbdadm disconnect res0' prints:
Child process does not terminate!
Exiting.

/proc/drbd on both nodes then shows:
 0: cs:Disconnecting st:Secondary/Unknown ds:UpToDate/Inconsistent C r---
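
To watch for this state from a script rather than by eye, a minimal sketch along these lines might help. It assumes the device line always has the layout of the single example above (minor number, then cs:, st: and ds: fields); the function names are hypothetical and not part of drbd.

```python
import re

# Assumed layout of a device line in /proc/drbd, based only on the one
# example in this post:
#   "<minor>: cs:<state> st:<local>/<peer> ds:<local>/<peer> ..."
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+st:(?P<st>\S+)\s+ds:(?P<ds>\S+)"
)


def parse_status_line(line):
    """Return the fields of one device line as a dict, or None if it does not match."""
    m = STATUS_RE.match(line)
    if m is None:
        return None
    # Roles and disk states are printed as "local/peer".
    local_role, _, peer_role = m.group("st").partition("/")
    local_disk, _, peer_disk = m.group("ds").partition("/")
    return {
        "minor": int(m.group("minor")),
        "cs": m.group("cs"),
        "local_role": local_role,
        "peer_role": peer_role,
        "local_disk": local_disk,
        "peer_disk": peer_disk,
    }


def is_disconnecting(line):
    """True if the line describes a device whose connection state is Disconnecting."""
    status = parse_status_line(line)
    return status is not None and status["cs"] == "Disconnecting"
```

Feeding it each line of /proc/drbd and calling is_disconnecting() repeatedly over time would show whether a device stays in that state; a single sample only says the state was seen once.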

The logs show:
 Becoming sync source due to disk states.
 peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
 [drbd0_receiver/13018] sock_sendmsg time expired, ko = 24
 [drbd0_receiver/13018] sock_sendmsg time expired, ko = 23
 [drbd0_receiver/13018] sock_sendmsg time expired, ko = 22
 [drbd0_receiver/13018] sock_sendmsg time expired, ko = 21
 [drbd0_receiver/13018] sock_sendmsg time expired, ko = 20
 [drbd0_receiver/13018] sock_sendmsg time expired, ko = 19
 PingAck did not arrive in time.
 peer( Secondary -> Unknown ) conn( WFBitMapS -> NetworkFailure )
 asender terminated
 short sent ReportBitMap size=4096 sent=2012
 Writing meta data super block now.
 md_sync_timer expired! Worker calls drbd_md_sync().
 role( Primary -> Secondary )
 conn( NetworkFailure -> Disconnecting )
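
For going through longer logs, the ko countdown can be pulled out with a few lines of Python. This is only a sketch: it assumes the message text is exactly as in the excerpt above, and the function name is made up for illustration.

```python
import re

# Matches the countdown messages as they appear in the excerpt above.
KO_RE = re.compile(r"sock_sendmsg time expired, ko = (\d+)")


def ko_countdown(log_lines):
    """Return the ko values in the order they were logged."""
    values = []
    for line in log_lines:
        m = KO_RE.search(line)
        if m:
            values.append(int(m.group(1)))
    return values
```

Applied to the excerpt above it returns [24, 23, 22, 21, 20, 19], i.e. the counter was still at 19 when the PingAck timeout was logged.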

The other node logs:
 Becoming sync target due to disk states.
 peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
 Writing meta data super block now.
 sock_recvmsg returned -104
 peer( Primary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
 asender terminated
 error receiving ReportBitMap, l: 4088!
 tl_clear()
 Connection closed
 Writing meta data super block now.
 conn( NetworkFailure -> Unconnected )

Is there any way to force a disconnect? So far, only rebooting both nodes solves this, since drbd reconnects afterwards, at least until the next network failure.

How can I diagnose this further? Do you need more information, such as the drbd.conf setup?

Any hints are appreciated!

Thanks, Walter