I've read in the archives that this ("Concurrent local write detected!") is a severe error, even in a primary/primary setup, but I have seen nothing on how to fix it. I see these messages spew constantly whenever I use DRBD, on both primary systems (with or without GFS on top). I'm using RHEL5.5/2.6.18-194.3.1.el5 and IB/SDP.

This seems to have eventually led to the following message spewing on one primary:

  block drbd0: [drbd0_worker/7157] sock_sendmsg time expired, ko = ...

...and later to timeouts like this one (I'm using fio to benchmark):

  INFO: task fio:9015 blocked for more than 120 seconds.
  fio D ffffffff80150462 0 9015 8972 9016 (NOTLB)
   ffff8109c9083d78 0000000000000086 0000000000000000 0000000000000000
   0000000000000000 0000000000000007 ffff8108c286c080 ffff8106552750c0
   00005f2aba62be3a 0000000000000365 ffff8108c286c268 0000000500000080
  Call Trace:
   [<ffffffff80063c6f>] __mutex_lock_slowpath+0x60/0x9b
   [<ffffffff80063cb9>] .text.lock.mutex+0xf/0x14
   [<ffffffff80063c06>] __mutex_unlock_slowpath+0x2a/0x33
   [<ffffffff887d346a>] :gfs:__gfs_write+0x82/0xc6
   [<ffffffff800eef32>] aio_pwrite+0x2c/0x75
   [<ffffffff800ef9f3>] aio_run_iocb+0xef/0x18a
   [<ffffffff800f055d>] io_submit_one+0x396/0x499
   [<ffffffff800f0b74>] sys_io_submit+0xbe/0x1a4
   [<ffffffff8005d116>] system_call+0x7e/0x83

...on the other primary, I see:

  block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
  block drbd1: Split-Brain detected but unresolved, dropping connection!
  block drbd1: helper command: /sbin/drbdadm split-brain minor-1
  block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
  block drbd1: conn( NetworkFailure -> Disconnecting )
  block drbd1: error receiving ReportState, l: 4!
  block drbd1: Connection closed
  block drbd1: conn( Disconnecting -> StandAlone )
  block drbd1: receiver terminated
  block drbd1: Terminating receiver thread

...followed by more "Concurrent local write detected!" messages, and then the "sock_sendmsg time expired, ko =" spewage on that primary too. The IB link is fine.

The split-brain resolution failure is due to the fencing mechanism not working, but I'm not worried about that yet (it should never have gotten to the point of detecting split brain). What I'm worried about is resolving the "concurrent local write" issue and keeping GFS from hanging.

Any ideas?

Thanks,
Chris