[DRBD-user] DRBD + OCFS2 - Split-Brain detected but unresolved

Jacek Osiecki cjosh at silvercube.pl
Tue Apr 17 17:06:31 CEST 2012


Hello,

I am currently testing a dual-primary setup with DRBD+OCFS2.
I finally got it working well on kernel 2.6.39.4 with DRBD version
8.3.10 (userland version: 8.4.1) and OCFS2 version 1.5.0.

I have had some trouble with broken replication: automatic recovery
sometimes works and sometimes does not. What's strange is that these
are still only tests, and while one server is fully functional, the
second one has no processes that even touch the synchronized
partition.

In dmesg on the active server it looks like this:

[707152.209885] block drbd0: Handshake successful: Agreed network protocol version 96
[707152.209895] block drbd0: conn( WFConnection -> WFReportParams )
[707152.210068] block drbd0: Starting asender thread (from drbd0_receiver [1096])
[707152.210341] block drbd0: data-integrity-alg: <not-used>
[707152.210352] block drbd0: max BIO size = 130560
[707152.210359] block drbd0: drbd_sync_handshake:
[707152.210363] block drbd0: self 8631CEC3370B5C31:A9BC3587FC5AA879:0BF8587A4ABA37B5:0BF7587A4ABA37B5 bits:21 flags:0
[707152.210368] block drbd0: peer 73E08E8E06754D97:A9BC3587FC5AA879:0BF8587A4ABA37B5:0BF7587A4ABA37B5 bits:0 flags:0
[707152.210371] block drbd0: uuid_compare()=100 by rule 90
[707152.210377] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
[707152.212439] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
[707152.212442] block drbd0: Split-Brain detected but unresolved, dropping connection!
[707152.212445] block drbd0: helper command: /sbin/drbdadm split-brain minor-0
[707152.214134] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
[707152.214137] block drbd0: conn( WFReportParams -> Disconnecting )
[707152.214141] block drbd0: error receiving ReportState, l: 4!
[707152.214150] block drbd0: asender terminated
[707152.214154] block drbd0: Terminating drbd0_asender
[707152.214177] block drbd0: Connection closed
[707152.214180] block drbd0: conn( Disconnecting -> StandAlone )
[707152.214188] block drbd0: receiver terminated
[707152.214190] block drbd0: Terminating drbd0_receiver
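
As an aid for reading the "self" and "peer" lines above, here is a minimal
sketch that pulls the four generation identifiers out of each line and checks
one condition. It assumes the field order current:bitmap:history:history
described in the DRBD documentation, and it assumes that "rule 90" in this log
corresponds to both nodes having the same non-zero bitmap UUID while their
current UUIDs differ. The function and key names are made up for illustration;
this is a simplification, not DRBD's actual uuid_compare() logic.

```python
import re

# Matches the "self"/"peer" lines logged after drbd_sync_handshake.
# Assumed field order: current:bitmap:history1:history2.
UUID_LINE = re.compile(
    r"block drbd\d+: (self|peer) "
    r"([0-9A-F]{16}):([0-9A-F]{16}):([0-9A-F]{16}):([0-9A-F]{16}) "
    r"bits:(\d+) flags:(\d+)"
)


def parse_handshake(log_text):
    """Return {'self': {...}, 'peer': {...}} built from the handshake lines."""
    nodes = {}
    for match in UUID_LINE.finditer(log_text):
        role, current, bitmap, hist1, hist2, bits, flags = match.groups()
        nodes[role] = {
            "current": current,
            "bitmap": bitmap,
            "history": (hist1, hist2),
            "bits": int(bits),
            "flags": int(flags),
        }
    return nodes


def looks_like_split_brain(nodes):
    """Simplified check (an assumption, not the kernel's full rule set):
    current UUIDs differ, bitmap UUIDs are equal and non-zero."""
    me, peer = nodes["self"], nodes["peer"]
    return (
        me["current"] != peer["current"]
        and me["bitmap"] == peer["bitmap"]
        and int(me["bitmap"], 16) != 0
    )


# The two lines below are copied from the dmesg output in this message.
LOG = """\
[707152.210363] block drbd0: self 8631CEC3370B5C31:A9BC3587FC5AA879:0BF8587A4ABA37B5:0BF7587A4ABA37B5 bits:21 flags:0
[707152.210368] block drbd0: peer 73E08E8E06754D97:A9BC3587FC5AA879:0BF8587A4ABA37B5:0BF7587A4ABA37B5 bits:0 flags:0
"""

nodes = parse_handshake(LOG)
print("split brain suspected:", looks_like_split_brain(nodes))
print("out-of-sync bits (self, peer):", nodes["self"]["bits"], nodes["peer"]["bits"])
```

For the log above, the check comes out true: the current UUIDs differ while
the bitmap UUID (A9BC3587FC5AA879) is the same on both nodes.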

Is there any remedy for this situation? I don't understand why the
split brain isn't resolved automatically, since the second server
doesn't write to drbd0; sometimes the partition wasn't even mounted
there (I can't be 100% sure, but it seems so).

I would be grateful for any hint on how to make this configuration
stable without sacrificing data on one of the nodes (right now, in
order to recover, I have to set the second node to secondary). Any
ideas what is wrong in my setup?

P.S. Any suggestions on how to measure the real performance
(read/write/copy) of DRBD+OCFS2? UnixBench gives crazy results (read
performance is about 10% of the local filesystem's)...

Best regards,
-- 
Jacek Osiecki
josiecki at silvercube.pl

Silvercube s.c.
ul. Makuszynskiego 4
31-752 Kraków
+48 (12) 684 21 00

