Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
drbd.conf content:

net {
    after-sb-0pri discard-older-primary;
    after-sb-1pri call-pri-lost-after-sb;
    after-sb-2pri call-pri-lost-after-sb;
}
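For context, options like these live in the net section of a full resource definition. Below is a minimal sketch only: the resource name matches the res0 used in the report below, but the hostnames, backing disks and addresses are placeholders, and the ko-count value simply reflects the 25 mentioned there.

resource res0 {
    protocol C;
    net {
        ko-count 25;                          # consider the peer dead after 25 timed-out requests
        after-sb-0pri discard-older-primary;
        after-sb-1pri call-pri-lost-after-sb;
        after-sb-2pri call-pri-lost-after-sb;
    }
    on node-a {                               # placeholder hostname
        device    /dev/drbd0;
        disk      /dev/sda7;                  # placeholder backing device
        address   10.0.0.1:7788;              # placeholder address/port
        meta-disk internal;
    }
    on node-b {                               # placeholder hostname
        device    /dev/drbd0;
        disk      /dev/sda7;
        address   10.0.0.2:7788;
        meta-disk internal;
    }
}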
----- Original Message -----
From: Walter Haidinger
To: drbd-user at lists.linbit.com
Sent: Saturday, January 19, 2008 10:12 PM
Subject: [DRBD-user] disconnecting hangs after ko-count failure
Hi!
I've got a reproducible problem with drbd-8.0.8. The problem surfaced after migrating to v8; it never showed with drbd v7.
Both peers run openSUSE 10.3 and a self-compiled drbd 8.0.8. They're connected over an 802.11b WLAN link with a throughput of about 550 kB/s. Exchanged data is tunneled using OpenVPN. The link is pretty stable: I had a ping running for over a week and did not lose a single packet, with an average RTT of 2.4 ms.
Now, ever since going to v8, the peer is considered dead due to the ko-count (raised to 25 already). I've yet to figure out why this happens, but there is a more annoying issue:
Any subsequent attempt to shut down drbd on either node makes drbd hang in the disconnecting state, i.e. 'drbdadm disconnect res0' will show:
Child process does not terminate!
Exiting.
/proc/drbd on both nodes then:
0: cs:Disconnecting st:Secondary/Unknown ds:UpToDate/Inconsistent C r---
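For readers less familiar with the DRBD 8.0 /proc/drbd format, the fields in that status line break down roughly as follows (the field meanings come from general DRBD documentation, not from this thread):

0:    device minor number (/dev/drbd0)
cs:   connection state           -> Disconnecting (stuck; it never reaches StandAlone)
st:   local/peer roles           -> Secondary/Unknown
ds:   local/peer disk states     -> UpToDate/Inconsistent
C:    replication protocol C
r---: I/O status flags (r = local I/O is not suspended)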
The logs show:
Becoming sync source due to disk states.
peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
[drbd0_receiver/13018] sock_sendmsg time expired, ko = 24
[drbd0_receiver/13018] sock_sendmsg time expired, ko = 23
[drbd0_receiver/13018] sock_sendmsg time expired, ko = 22
[drbd0_receiver/13018] sock_sendmsg time expired, ko = 21
[drbd0_receiver/13018] sock_sendmsg time expired, ko = 20
[drbd0_receiver/13018] sock_sendmsg time expired, ko = 19
PingAck did not arrive in time.
peer( Secondary -> Unknown ) conn( WFBitMapS -> NetworkFailure )
asender terminated
short sent ReportBitMap size=4096 sent=2012
Writing meta data super block now.
md_sync_timer expired! Worker calls drbd_md_sync().
role( Primary -> Secondary )
conn( NetworkFailure -> Disconnecting )
The other node logs:
Becoming sync target due to disk states.
peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Writing meta data super block now.
sock_recvmsg returned -104
peer( Primary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
asender terminated
error receiving ReportBitMap, l: 4088!
tl_clear()
Connection closed
Writing meta data super block now.
conn( NetworkFailure -> Unconnected )
Is there any way to force a disconnect? So far only rebooting both nodes solves this, since drbd then reconnects. Well, until the next network failure.
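For completeness, a minimal sketch of what one would normally try here, assuming the resource and device names used above (whether any of these gets past the stuck receiver is exactly the open question):

drbdadm disconnect res0          # hangs, as described above
drbdadm down res0                # disconnect + detach, i.e. unconfigure the resource
drbdsetup /dev/drbd0 disconnect  # the low-level equivalent, bypassing drbdadm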
How can I further diagnose this? Do you need more information, like the drbd.conf setup?
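One way to see where the receiver thread is actually stuck is to dump blocked tasks to the kernel log, assuming the magic SysRq key is enabled (kernel.sysrq=1) and the running kernel supports the 'w' trigger:

echo w > /proc/sysrq-trigger     # dump blocked (D-state) tasks to the kernel log
dmesg | tail -n 100              # look for drbd0_receiver / drbd0_worker stack traces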
Any hints are appreciated!
Thanks, Walter