Hello,

I am currently testing a dual-master setup with DRBD+OCFS2. I finally managed to get it working well on kernel 2.6.39.4, DRBD version 8.3.10 (userland version: 8.4.1) and OCFS2 version 1.5.0.

I have had some trouble with broken replication: automatic recovery sometimes works and sometimes does not. What is strange is that these are still tests, so while one server is fully functional, the second one has no processes that even touch the synchronized partition.

In dmesg on the active server it looks like this:

[707152.209885] block drbd0: Handshake successful: Agreed network protocol version 96
[707152.209895] block drbd0: conn( WFConnection -> WFReportParams )
[707152.210068] block drbd0: Starting asender thread (from drbd0_receiver [1096])
[707152.210341] block drbd0: data-integrity-alg: <not-used>
[707152.210352] block drbd0: max BIO size = 130560
[707152.210359] block drbd0: drbd_sync_handshake:
[707152.210363] block drbd0: self 8631CEC3370B5C31:A9BC3587FC5AA879:0BF8587A4ABA37B5:0BF7587A4ABA37B5 bits:21 flags:0
[707152.210368] block drbd0: peer 73E08E8E06754D97:A9BC3587FC5AA879:0BF8587A4ABA37B5:0BF7587A4ABA37B5 bits:0 flags:0
[707152.210371] block drbd0: uuid_compare()=100 by rule 90
[707152.210377] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
[707152.212439] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
[707152.212442] block drbd0: Split-Brain detected but unresolved, dropping connection!
[707152.212445] block drbd0: helper command: /sbin/drbdadm split-brain minor-0
[707152.214134] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
[707152.214137] block drbd0: conn( WFReportParams -> Disconnecting )
[707152.214141] block drbd0: error receiving ReportState, l: 4!
[707152.214150] block drbd0: asender terminated
[707152.214154] block drbd0: Terminating drbd0_asender
[707152.214177] block drbd0: Connection closed
[707152.214180] block drbd0: conn( Disconnecting -> StandAlone )
[707152.214188] block drbd0: receiver terminated
[707152.214190] block drbd0: Terminating drbd0_receiver

Is there any help for this situation? I don't understand why the split brain isn't resolved automatically, since the second server doesn't write to drbd0, and sometimes the partition wasn't even mounted there (I can't be 100% sure, but it seems so).

I would be grateful for any hint on how to make this configuration stable without sacrificing the data on one of the nodes (at the moment, in order to recover I have to set the second node to slave). Any ideas what is wrong in my setup?

P.S. Any suggestions on how to measure the real performance (read/write/copy) of DRBD+OCFS2? UnixBench gives crazy results (read performance is about 10% of the local filesystem)...

Best regards,
-- 
Jacek Osiecki
josiecki at silvercube.pl
Silvercube s.c.
ul. Makuszynskiego 4
31-752 Kraków
+48 (12) 684 21 00
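
To make the drbd_sync_handshake lines in the log above easier to read, here is a minimal Python sketch that pulls out the "self" and "peer" UUID sets and reports which slots differ. The slot names (current, bitmap, two history entries) and the reading of "bits" as a count of out-of-sync bitmap bits are assumptions based on the usual DRBD 8.3 layout, and the helper names parse_handshake and differing_slots are made up for illustration. It only compares the values; it does not reproduce DRBD's own uuid_compare() logic.

```python
import re

# Matches the "self"/"peer" lines that drbd_sync_handshake prints.
HANDSHAKE = re.compile(
    r"block drbd\d+: (self|peer) ([0-9A-F:]+) bits:(\d+) flags:(\d+)"
)

# Assumed slot order for DRBD 8.3: current, bitmap, then two history UUIDs.
SLOTS = ("current", "bitmap", "history1", "history2")


def parse_handshake(log_text):
    """Return {'self': {...}, 'peer': {...}} parsed from dmesg text."""
    nodes = {}
    for match in HANDSHAKE.finditer(log_text):
        side, uuids, bits, _flags = match.groups()
        nodes[side] = {
            "uuids": dict(zip(SLOTS, uuids.split(":"))),
            "bits": int(bits),
        }
    return nodes


def differing_slots(nodes):
    """List the UUID slots whose values differ between self and peer."""
    return [
        slot
        for slot in SLOTS
        if nodes["self"]["uuids"][slot] != nodes["peer"]["uuids"][slot]
    ]


# The two handshake lines copied from the log in this message.
LOG = """\
[707152.210363] block drbd0: self 8631CEC3370B5C31:A9BC3587FC5AA879:0BF8587A4ABA37B5:0BF7587A4ABA37B5 bits:21 flags:0
[707152.210368] block drbd0: peer 73E08E8E06754D97:A9BC3587FC5AA879:0BF8587A4ABA37B5:0BF7587A4ABA37B5 bits:0 flags:0
"""

nodes = parse_handshake(LOG)
print("differing slots:", differing_slots(nodes))
print("bits self/peer:", nodes["self"]["bits"], nodes["peer"]["bits"])
```

For this log, only the first (current) slot differs while the second (bitmap) slot is identical on both sides. That matches the "Split-Brain detected" message: each node appears to have started a new data generation from the same earlier state while disconnected from the other.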