I read the doc (http://www.drbd.org/users-guide-emb/s-use-online-verify.html), and it says to resync after a verify failure via:

  drbdadm disconnect
  drbdadm connect

I did a verify on my main node (DRBD0, below) and it found some OOS blocks. Since the disconnect causes the node to become unavailable, I did the disconnect/reconnect on the other node. This resulted in a split-brain. So I infer that the disconnect/connect must be done on the same node that requested the verify.

The next time, I did the verify on the second node (DRBD1) and the disconnect/connect on the same node. Again, split brain.

What am I doing wrong?

Dan Barker

Configuration files below, dmesg output inline.

DRBD0 (172.30.0.40) is Primary, iSCSITarget Active for I/O
DRBD1 (172.30.0.41) is Primary, iSCSITarget Active (idle)

DRBD1 verify all
  block drbd3: conn( Connected -> VerifyS )
  block drbd3: Starting Online Verify from sector 0
  block drbd1: conn( Connected -> VerifyS )
  block drbd1: Starting Online Verify from sector 0
  block drbd2: conn( Connected -> VerifyS )
  block drbd2: Starting Online Verify from sector 0
  block drbd2: Online verify done (total 10375 sec; paused 0 sec; 25264 K/sec)
  block drbd2: conn( VerifyS -> Connected )
  block drbd3: Out of sync: start=722699032, size=8 (sectors)
  block drbd3: Out of sync: start=722709312, size=8 (sectors)
  block drbd3: Out of sync: start=751133504, size=8 (sectors)
  block drbd1: Online verify done (total 16418 sec; paused 0 sec; 25544 K/sec)
  block drbd1: conn( VerifyS -> Connected )
  block drbd3: Online verify done (total 16880 sec; paused 0 sec; 24844 K/sec)
  block drbd3: Online verify found 3 4k block out of sync!
  block drbd3: conn( VerifyS -> Connected )
  block drbd3: Writing the whole bitmap, due to failed kmalloc
  block drbd3: helper command: /sbin/drbdadm out-of-sync minor-3
  block drbd3: helper command: /sbin/drbdadm out-of-sync minor-3 exit code 0 (0x0)
  block drbd3: 12 KB (3 bits) marked out-of-sync by on disk bit-map.
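
As a cross-check on the verify output above, the "Out of sync" lines can be tallied per device. This is only a minimal sketch: it assumes each reported sector is 512 bytes (which agrees with the log's own "3 4k block" and "12 KB" figures), and the function name out_of_sync_kib is made up for illustration.

```python
import re

# Sample lines copied from the verify log above (drbd3 only).
LOG = """\
block drbd3: Out of sync: start=722699032, size=8 (sectors)
block drbd3: Out of sync: start=722709312, size=8 (sectors)
block drbd3: Out of sync: start=751133504, size=8 (sectors)
"""

SECTOR_BYTES = 512  # assumption: sizes are reported in 512-byte sectors

OOS_RE = re.compile(
    r"block (drbd\d+): Out of sync: start=(\d+), size=(\d+) \(sectors\)"
)


def out_of_sync_kib(log_text):
    """Return {device: KiB reported out of sync} for a chunk of kernel log."""
    sectors = {}
    for dev, _start, size in OOS_RE.findall(log_text):
        sectors[dev] = sectors.get(dev, 0) + int(size)
    # Convert the per-device sector totals to KiB.
    return {dev: n * SECTOR_BYTES // 1024 for dev, n in sectors.items()}


print(out_of_sync_kib(LOG))
```

For the three lines shown, this reports 12 KiB for drbd3, the same amount the kernel says it marked out-of-sync.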

DRBD1 iscsitarget stop
  iscsi_trgt: Removing all connections, sessions and targets

DRBD1 disconnect r2
  block drbd3: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
  block drbd3: Creating new current UUID
  block drbd3: meta connection shut down by peer.
  block drbd3: asender terminated
  block drbd3: Terminating asender thread
  block drbd3: Connection closed
  block drbd3: conn( Disconnecting -> StandAlone )
  block drbd3: receiver terminated
  block drbd3: Terminating receiver thread

DRBD1 connect r2
  block drbd3: conn( StandAlone -> Unconnected )
  block drbd3: Starting receiver thread (from drbd3_worker [2936])
  block drbd3: receiver (re)started
  block drbd3: conn( Unconnected -> WFConnection )
  block drbd3: Handshake successful: Agreed network protocol version 91
  block drbd3: conn( WFConnection -> WFReportParams )
  block drbd3: Starting asender thread (from drbd3_receiver [6831])
  block drbd3: data-integrity-alg: <not-used>
  block drbd3: drbd_sync_handshake:
  block drbd3: self 0AA964F1A2B09569:5582778FF39681F5:956F8322C0AB90FB:4C3D617FEA1E0097 bits:3 flags:0
  block drbd3: peer B32E410E11613899:5582778FF39681F5:956F8322C0AB90FA:4C3D617FEA1E0097 bits:193 flags:0
  block drbd3: uuid_compare()=100 by rule 90
  block drbd3: Split-Brain detected, dropping connection!
  block drbd3: helper command: /sbin/drbdadm split-brain minor-3
  block drbd3: helper command: /sbin/drbdadm split-brain minor-3 exit code 0 (0x0)
  block drbd3: conn( WFReportParams -> Disconnecting )
  block drbd3: error receiving ReportState, l: 4!
  block drbd3: asender terminated
  block drbd3: Terminating asender thread
  block drbd3: Connection closed
  block drbd3: conn( Disconnecting -> StandAlone )
  block drbd3: receiver terminated
  block drbd3: Terminating receiver thread

DRBD1 secondary r2
  block drbd3: role( Primary -> Secondary )

DRBD1 -- --discard-my-data connect r2
  block drbd3: conn( StandAlone -> Unconnected )
  block drbd3: Starting receiver thread (from drbd3_worker [2936])
  block drbd3: receiver (re)started
  block drbd3: conn( Unconnected -> WFConnection )

DRBD0 connect r2
  block drbd3: Handshake successful: Agreed network protocol version 91
  block drbd3: conn( WFConnection -> WFReportParams )
  block drbd3: Starting asender thread (from drbd3_receiver [6854])
  block drbd3: data-integrity-alg: <not-used>
  block drbd3: drbd_sync_handshake:
  block drbd3: self 0AA964F1A2B09568:5582778FF39681F5:956F8322C0AB90FB:4C3D617FEA1E0097 bits:3 flags:0
  block drbd3: peer B32E410E11613899:5582778FF39681F5:956F8322C0AB90FA:4C3D617FEA1E0097 bits:1283 flags:0
  block drbd3: uuid_compare()=100 by rule 90
  block drbd3: Split-Brain detected, manually solved. Sync from peer node
  block drbd3: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
  block drbd3: conn( WFBitMapT -> WFSyncUUID )
  block drbd3: helper command: /sbin/drbdadm before-resync-target minor-3
  block drbd3: helper command: /sbin/drbdadm before-resync-target minor-3 exit code 0 (0x0)
  block drbd3: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
  block drbd3: Began resync as SyncTarget (will sync 5132 KB [1283 bits set]).
  block drbd3: Resync done (total 3 sec; paused 0 sec; 1708 K/sec)
  block drbd3: 6 % had equal check sums, eliminated: 352K; transferred 4780K total 5132K
  block drbd3: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
  block drbd3: helper command: /sbin/drbdadm after-resync-target minor-3
  block drbd3: helper command: /sbin/drbdadm after-resync-target minor-3 exit code 0 (0x0)

DRBD1 primary r2
  block drbd3: role( Secondary -> Primary )

DRBD1 iscsitarget start
  iSCSI Enterprise Target Software - version 1.4.20
  iscsi_trgt: Registered io type fileio
  iscsi_trgt: Registered io type blockio
  iscsi_trgt: Registered io type nullio

All is back to normal.

CONFIGURATION FILES:

/etc/iet/ietd.conf

Target iqn.2010-03.com.visioncomm.Storage00:Storage00
  Lun 0 Path=/dev/drbd1,Type=blockio,ScsiSN=SPIDSK-090311-00
  Lun 1 Path=/dev/drbd2,Type=blockio,ScsiSN=SPIDSK-090312-00
  Lun 2 Path=/dev/drbd3,Type=blockio,ScsiSN=SPIDSK-090319-00

/etc/drbd.d/global_common.conf

global { usage-count yes; }
common {
  protocol C;
  handlers {
    pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
  }
  startup { }
  disk { }
  net { allow-two-primaries; }
  syncer { csums-alg md5; rate 25M; verify-alg md5; }
}

/etc/drbd.d/r0.res

resource r0 {
  startup { become-primary-on both; }
  device /dev/drbd1;
  disk /dev/sdb;
  meta-disk internal;
  on Storage00 { address 172.30.0.40:7789; }
  on Storage01 { address 172.30.0.41:7789; }
}

/etc/drbd.d/r1.res

resource r1 {
  startup { become-primary-on both; }
  device /dev/drbd2;
  disk /dev/sdc;
  meta-disk internal;
  on Storage00 { address 172.30.0.40:7790; }
  on Storage01 { address 172.30.0.41:7790; }
}

/etc/drbd.d/r2.res

resource r2 {
  startup { become-primary-on both; }
  device /dev/drbd3;
  disk /dev/sdd;
  meta-disk internal;
  on Storage00 { address 172.30.0.40:7791; }
  on Storage01 { address 172.30.0.41:7791; }
}
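
As an aid to reading the drbd_sync_handshake lines in the dmesg output above, the two UUID lists from the first reconnect attempt can be compared field by field. This is a hedged sketch only: the field names (current, bitmap, history1, history2) are an assumption about the print order, since the log does not label them, and the script merely reports which fields match. It does not reproduce DRBD's actual uuid_compare() rules.

```python
# UUID lists copied from the first drbd_sync_handshake in the log above.
SELF = "0AA964F1A2B09569:5582778FF39681F5:956F8322C0AB90FB:4C3D617FEA1E0097"
PEER = "B32E410E11613899:5582778FF39681F5:956F8322C0AB90FA:4C3D617FEA1E0097"

# Assumed field order; the log itself does not name the fields.
FIELDS = ("current", "bitmap", "history1", "history2")


def parse(uuid_line):
    """Split a colon-separated UUID list into integers."""
    return [int(part, 16) for part in uuid_line.split(":")]


def compare(self_line, peer_line):
    """Map each field name to 'same', 'low bit only' or 'different'."""
    result = {}
    for name, a, b in zip(FIELDS, parse(self_line), parse(peer_line)):
        if a == b:
            result[name] = "same"
        elif a >> 1 == b >> 1:
            # Values are identical once the lowest bit is ignored.
            result[name] = "low bit only"
        else:
            result[name] = "different"
    return result


for name, verdict in compare(SELF, PEER).items():
    print(f"{name:9s} {verdict}")
```

On these two lines the first field differs between the nodes while the second is identical, which is at least consistent with the "Creating new current UUID" message logged on DRBD1 at disconnect time.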