[DRBD-user] Resource hangs with "time expired" errors

AZ 9901 az9901 at gmail.com
Sat Aug 3 22:07:05 CEST 2013



On 20 June 2013 at 22:47, AZ 9901 wrote:
> Hello,
> 
> I already faced this issue sporadically a few months ago, and it occurred again last night.
> Here is what happens.
> 
> 
> 
> Online verification is running (on a weekly basis):
> 
> root@srv2-1:~# cat /proc/drbd
> version: 8.3.15 (api:88/proto:86-97)
> GIT-hash: 0ce4d235fc02b5c53c1c52c53433d11a694eab8c build by root@srv2-1, 2013-05-20 13:24:15
> 1: cs:VerifyT ro:Primary/Secondary ds:UpToDate/UpToDate C r---d-
>    ns:1717212 nr:0 dw:1720216 dr:844934114 al:7737 bm:0 lo:1 pe:4238 ua:2048 ap:2049 ep:1 wo:b oos:0
>    [=======>............] verified: 41.9% (1105704/1902544)M
>    finish: 10095:28:53 speed: 28 (9,648) K/sec
> 
> With the following settings:
> syncer {
>  rate 10M;
>  verify-alg crc32c;
> }
> 
> During this verification, the primary's network input rate is about 3Mbps and its output rate about 1Mbps (out of 100Mbps).
> 
> 
> 
> Some activity then starts on the resource, using between 4Mbps and 10Mbps of network bandwidth (out of 100Mbps).
> After about one hour, the resource hangs completely: reads and writes are impossible, and even a simple "ls" hangs.
> A great many errors like the following one appear in the syslog:
> Jun 19 21:08:10 srv2-1 kernel: block drbd1: [drbd1_worker/26788] sock_sendmsg time expired, ko = 4294967295
> 
> 
> 
> At this point, the only way I found to bring the resource back into production was to cut network communication between the two nodes (using netfilter/iptables).
> I did not think of testing "drbdadm disconnect".
> I first tried "/etc/init.d/drbd stop" on the secondary node, but it hung until network communication was cut.
> 
> 
> 
> Questions:
> 
> 1 - Is there a bug that makes DRBD / online verification behave as if it were in an infinite loop, producing "sock_sendmsg time expired" messages?
> 2 - Could the DRBD team investigate this?
> 3 - As a workaround, is there any DRBD configuration that would, for example, make the primary go StandAlone (disconnect) when this error occurs?
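
For anyone trying to spot the same condition in their own logs, here is a minimal sketch that picks out the "sock_sendmsg time expired" messages and counts them per device. The regular expression is modeled only on the single log line quoted above, so other DRBD versions may format the message differently; the function names are made up for this example. One observation: the logged value 4294967295 equals 2^32 - 1, which is what -1 looks like when printed as an unsigned 32-bit integer. Whether that means a counter was decremented below zero is an assumption, not something confirmed here.

```python
import re

# Pattern modeled on the one log line quoted in this thread (an assumption:
# other kernels or DRBD versions may word or format the message differently).
PATTERN = re.compile(
    r"block (?P<dev>drbd\d+): \[(?P<thread>[^/\]]+)/(?P<pid>\d+)\] "
    r"sock_sendmsg time expired, ko = (?P<ko>\d+)"
)


def parse_expired(line):
    """Return (device, thread, ko) for a 'time expired' line, else None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group("dev"), m.group("thread"), int(m.group("ko"))


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit value."""
    return value - (1 << 32) if value >= (1 << 31) else value


def count_expired(lines):
    """Count 'time expired' messages per DRBD device."""
    counts = {}
    for line in lines:
        hit = parse_expired(line)
        if hit is not None:
            counts[hit[0]] = counts.get(hit[0], 0) + 1
    return counts
```

It could be fed the lines of a syslog file, for example `count_expired(open("/var/log/syslog"))`, where the path depends on the distribution.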




Hello,

Could you help me with this issue, please?

Thank you very much for your support!

Best regards,

Ben



