On 20 June 2013 at 22:47, AZ 9901 wrote:

> Hello,
>
> I already faced this issue sporadically a few months ago; it occurred again last night.
> Here is what happens.
>
> Online verification is running (on a weekly basis):
>
> root@srv2-1:~# cat /proc/drbd
> version: 8.3.15 (api:88/proto:86-97)
> GIT-hash: 0ce4d235fc02b5c53c1c52c53433d11a694eab8c build by root@srv2-1, 2013-05-20 13:24:15
>  1: cs:VerifyT ro:Primary/Secondary ds:UpToDate/UpToDate C r---d-
>     ns:1717212 nr:0 dw:1720216 dr:844934114 al:7737 bm:0 lo:1 pe:4238 ua:2048 ap:2049 ep:1 wo:b oos:0
>         [=======>............] verified: 41.9% (1105704/1902544)M
>         finish: 10095:28:53 speed: 28 (9,648) K/sec
>
> With the following settings:
>
> syncer {
>     rate 10M;
>     verify-alg crc32c;
> }
>
> During this verification, the primary's network input rate is about 3Mbps and its output rate about 1Mbps (out of 100Mbps).
>
> Then some activity starts on the resource, bringing the network rate to between 4Mbps and 10Mbps (out of 100Mbps).
> After about one hour, the resource hangs completely: reads and writes are impossible, and even a simple "ls" hangs.
> Many errors like the following one appear in the syslog:
>
> Jun 19 21:08:10 srv2-1 kernel: block drbd1: [drbd1_worker/26788] sock_sendmsg time expired, ko = 4294967295
>
> At that point, the only way I have found to bring the resource back into production is to stop network communication between the two nodes (using netfilter/iptables).
> I did not think of testing "drbdadm disconnect".
> I initially tried "/etc/init.d/drbd stop" on the secondary node, but it hung until network communication was cut.
>
> Questions:
>
> 1 - Is there a bug that makes DRBD / online verification behave as if it were in an infinite loop, producing "sock_sendmsg time expired" messages?
> 2 - Could the DRBD team investigate this?
> 3 - As a workaround, is there any DRBD configuration that would, for example, make the primary go StandAlone (disconnect) when this error occurs?

Hello,

Could you help me with this issue, please?
Thank you very much for your support!

Best regards,
Ben