Hi!

I've got a reproducible problem with drbd-8.0.8. Well, the problem
surfaced after migrating to v8; it never showed with drbd v7.

Both peers run openSUSE 10.3 and a self-compiled drbd 8.0.8. They're
connected over an 11b WLAN link with a throughput of about 550 kB/s.
Exchanged data is tunneled using OpenVPN. The link is pretty stable: I had
a ping running for over a week and did not lose a single packet, with an
average rtt of 2.4 ms.

Now, ever since going to v8, the peer is considered dead due to the
ko-count (raised to 25 already). I've yet to figure out why this happens,
but there is a more annoying issue: any subsequent attempt to shut down
drbd on either node makes drbd hang in the disconnecting state, i.e.
'drbdadm disconnect res0' will show:

  Child process does not terminate! Exiting.

/proc/drbd on both nodes then:

  0: cs:Disconnecting st:Secondary/Unknown ds:UpToDate/Inconsistent C r---

The logs show:

  Becoming sync source due to disk states.
  peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
  [drbd0_receiver/13018] sock_sendmsg time expired, ko = 24
  [drbd0_receiver/13018] sock_sendmsg time expired, ko = 23
  [drbd0_receiver/13018] sock_sendmsg time expired, ko = 22
  [drbd0_receiver/13018] sock_sendmsg time expired, ko = 21
  [drbd0_receiver/13018] sock_sendmsg time expired, ko = 20
  [drbd0_receiver/13018] sock_sendmsg time expired, ko = 19
  PingAck did not arrive in time.
  peer( Secondary -> Unknown ) conn( WFBitMapS -> NetworkFailure )
  asender terminated
  short sent ReportBitMap size=4096 sent=2012
  Writing meta data super block now.
  md_sync_timer expired! Worker calls drbd_md_sync().
  role( Primary -> Secondary ) conn( NetworkFailure -> Disconnecting )

The other node logs:

  Becoming sync target due to disk states.
  peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
  Writing meta data super block now.
  sock_recvmsg returned -104
  peer( Primary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
  asender terminated
  error receiving ReportBitMap, l: 4088!
  tl_clear()
  Connection closed
  Writing meta data super block now.
  conn( NetworkFailure -> Unconnected )

Is there any way to force a disconnect? So far only rebooting both nodes
solves this, as drbd will reconnect then. Well, until the next network
failure.

How can I further diagnose this? Do you need more information, like the
drbd.conf setup?

Any hints are appreciated!

Thanks,
Walter
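
To catch the moment a resource gets stuck, the /proc/drbd line quoted in
this message could be polled and checked for cs:Disconnecting. Below is a
minimal sketch in Python, not anything shipped with drbd. The field layout
(cs:, st:, ds:) is assumed purely from the one status line shown in this
message, and the function names parse_status_line, is_stuck_disconnecting
and read_statuses are made up for illustration. Other drbd versions may
print different fields.

```python
import re

# Assumption: layout taken from the single /proc/drbd line quoted in this
# message, e.g.
#   0: cs:Disconnecting st:Secondary/Unknown ds:UpToDate/Inconsistent C r---
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+st:(?P<st>\S+)\s+ds:(?P<ds>\S+)"
)


def parse_status_line(line):
    """Return a dict for a device status line, or None for any other line."""
    m = STATUS_RE.match(line)
    if m is None:
        return None
    # st: and ds: are printed as local/peer pairs.
    local_role, _, peer_role = m.group("st").partition("/")
    local_disk, _, peer_disk = m.group("ds").partition("/")
    return {
        "minor": int(m.group("minor")),
        "cs": m.group("cs"),
        "local_role": local_role,
        "peer_role": peer_role,
        "local_disk": local_disk,
        "peer_disk": peer_disk,
    }


def is_stuck_disconnecting(status):
    """True if the parsed status reports the Disconnecting connection state."""
    return status is not None and status["cs"] == "Disconnecting"


def read_statuses(path="/proc/drbd"):
    """Parse every device status line found in the given file."""
    with open(path) as f:
        parsed = (parse_status_line(line) for line in f)
        return [s for s in parsed if s is not None]
```

Calling read_statuses() from a cron job or a small loop and logging a
timestamp whenever is_stuck_disconnecting() returns True would at least
show how the hang lines up with the kernel log entries above.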