[DRBD-user] "Concurrent local write detected!"
worleys at gmail.com
Mon Dec 20 21:06:41 CET 2010
I've read in the archives that this is a severe error, even in a
primary/primary setup, but I have found no fix for it. I see these
messages logged constantly whenever DRBD is in use, on both primary
systems (with or without GFS on top).
I'm using RHEL5.5/2.6.18-194.3.1.el5 and IB/SDP.
This seems to have eventually led to the following message being logged
repeatedly on one primary:
block drbd0: [drbd0_worker/7157] sock_sendmsg time expired, ko = ...
...and later timeouts like (I'm using fio to benchmark):
INFO: task fio:9015 blocked for more than 120 seconds.
fio D ffffffff80150462 0 9015 8972 9016 (NOTLB)
ffff8109c9083d78 0000000000000086 0000000000000000 0000000000000000
0000000000000000 0000000000000007 ffff8108c286c080 ffff8106552750c0
00005f2aba62be3a 0000000000000365 ffff8108c286c268 0000000500000080
...on the other primary, I see:
block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
block drbd1: Split-Brain detected but unresolved, dropping connection!
block drbd1: helper command: /sbin/drbdadm split-brain minor-1
block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
block drbd1: conn( NetworkFailure -> Disconnecting )
block drbd1: error receiving ReportState, l: 4!
block drbd1: Connection closed
block drbd1: conn( Disconnecting -> StandAlone )
block drbd1: receiver terminated
block drbd1: Terminating receiver thread
... followed by more "Concurrent local write detected!" messages, then
the "sock_sendmsg time expired, ko =" messages on that primary too.
The IB link is fine. The split-brain resolution failure is due to the
fencing mechanism not working, but I'm not worried about that yet (it
should have never gotten to the state of detecting split brain): I'm
worried about resolving the "concurrent local write" issue, and
keeping GFS from hanging.
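
To put numbers on how often each of these messages shows up per device, a saved kernel log could be tallied with something like the sketch below. This is only an illustration, not part of the setup described above: it assumes Python 3 (so it would be run on a copy of the log rather than on the stock RHEL 5 interpreter), it assumes every relevant line has the "block drbdN: ..." form quoted in this post, and the names LINE_RE, PATTERNS and tally are made up for the example.

```python
import re
from collections import Counter

# Matches kernel lines of the form "block drbdN: <message>", as quoted in this post.
LINE_RE = re.compile(r"block (drbd\d+): (.*)")

# Substrings copied from the messages quoted in this post.
PATTERNS = (
    "Concurrent local write detected!",
    "sock_sendmsg time expired",
    "Split-Brain detected",
)


def tally(lines):
    """Count occurrences of each watched message per DRBD device."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a DRBD block-device message
        device, message = match.groups()
        for pattern in PATTERNS:
            if pattern in message:
                counts[(device, pattern)] += 1
    return counts


# Small demonstration using two lines quoted earlier in this post.
sample = [
    "block drbd1: Split-Brain detected but unresolved, dropping connection!",
    "block drbd1: Connection closed",
]
for (device, pattern), n in sorted(tally(sample).items()):
    print(device, n, pattern)
```

Passing an open log file instead of the sample list (any iterable of lines works) would show whether the "Concurrent local write detected!" count climbs before the first "sock_sendmsg time expired" line on each node.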