[DRBD-user] "Concurrent local write detected!"

Chris Worley worleys at gmail.com
Mon Dec 20 21:06:41 CET 2010

I've read in the archives that this is a severe error, even in a
primary/primary setup, but I have found nothing on how to fix it. I see
these messages logged constantly on both primary systems whenever DRBD
is in use, with or without GFS on top.

I'm using RHEL 5.5 (kernel 2.6.18-194.3.1.el5) and IB/SDP.

This seems to have eventually led to the following message spewing on
one primary:

block drbd0: [drbd0_worker/7157] sock_sendmsg time expired, ko = ...

...and later hung-task timeouts like this (I'm using fio to benchmark):

INFO: task fio:9015 blocked for more than 120 seconds.
fio           D ffffffff80150462     0  9015   8972          9016       (NOTLB)
 ffff8109c9083d78 0000000000000086 0000000000000000 0000000000000000
 0000000000000000 0000000000000007 ffff8108c286c080 ffff8106552750c0
 00005f2aba62be3a 0000000000000365 ffff8108c286c268 0000000500000080
Call Trace:
 [<ffffffff80063c6f>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80063cb9>] .text.lock.mutex+0xf/0x14
 [<ffffffff80063c06>] __mutex_unlock_slowpath+0x2a/0x33
 [<ffffffff887d346a>] :gfs:__gfs_write+0x82/0xc6
 [<ffffffff800eef32>] aio_pwrite+0x2c/0x75
 [<ffffffff800ef9f3>] aio_run_iocb+0xef/0x18a
 [<ffffffff800f055d>] io_submit_one+0x396/0x499
 [<ffffffff800f0b74>] sys_io_submit+0xbe/0x1a4
 [<ffffffff8005d116>] system_call+0x7e/0x83

...on the other primary, I see:

block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
block drbd1: Split-Brain detected but unresolved, dropping connection!
block drbd1: helper command: /sbin/drbdadm split-brain minor-1
block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
block drbd1: conn( NetworkFailure -> Disconnecting )
block drbd1: error receiving ReportState, l: 4!
block drbd1: Connection closed
block drbd1: conn( Disconnecting -> StandAlone )
block drbd1: receiver terminated
block drbd1: Terminating receiver thread

... followed by more "Concurrent local write detected!" messages, then
the "sock_sendmsg time expired, ko =" spew on that primary too.

The IB link is fine.  The split-brain resolution failure is due to the
fencing mechanism not working, but I'm not worried about that yet (it
should never have reached the point of detecting split brain). What I
am worried about is resolving the "concurrent local write" issue and
keeping GFS from hanging.
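
To put numbers on how often each of the messages quoted above shows up,
and on which device, a minimal Python sketch could tally them from the
kernel log. The patterns are only fragments of the messages quoted in
this post; the function name tally and the per-device grouping are
illustrative assumptions, not anything DRBD provides.

```python
import re
from collections import Counter

# Message fragments copied from the log excerpts quoted in this post.
PATTERNS = {
    "concurrent_local_write": re.compile(r"Concurrent local write detected!"),
    "sendmsg_timeout": re.compile(r"sock_sendmsg time expired"),
    "split_brain": re.compile(r"Split-Brain detected"),
    "hung_task": re.compile(r"blocked for more than \d+ seconds"),
}

# Matches the "block drbdN:" prefix so counts can be grouped per device.
DEVICE = re.compile(r"block (drbd\d+):")


def tally(lines):
    """Count each known message per device ('-' when there is no DRBD prefix)."""
    counts = Counter()
    for line in lines:
        match = DEVICE.search(line)
        device = match.group(1) if match else "-"
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(device, name)] += 1
    return counts
```

Passing any iterable of log lines works, for example an open handle on
a saved copy of the kernel log from each node. Comparing the counts
between the two primaries would show whether the "concurrent local
write" messages begin before or after the connection drops.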

Any ideas?


