I have two nodes, with internal metadata. Simulating a network failure while one node has a filesystem on the drbd resource mounted naturally gives me a split brain. However, I'm not able to recover from the split brain in an easy manner.

My question is: why won't it sync when the secondary, node2, is outdated?

My command and output history:

[root@node1 ~]# modprobe drbd
[root@node2 ~]# modprobe drbd
[root@node1 ~]# drbdadm up all
[root@node2 ~]# drbdadm up all
[root@node1 ~]# drbdadm primary all
[root@node2 ~]# drbdadm primary all
[root@node1 ~]# cat /proc/drbd
version: 8.0pre3 (api:82/proto:80)
SVN Revision: 2169 build by root@node1.itinerary.com, 2006-06-29 14:42:49
 0: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/7 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
 1: cs:Unconfigured

[root@node1 ~]# mount /dev/drbdvg/test-root /mnt/ -text3
[root@node2 ~]# ifconfig eth0 down
[root@node2 ~]# drbd0: PingAck did not arrive in time.
drbd0: short read expecting header on sock: r=-512
[root@node2 ~]# cat /proc/drbd
version: 8.0pre3 (api:82/proto:80)
SVN Revision: 2169 build by root@node1.itinerary.com, 2006-06-29 14:42:49
 0: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown r---
    ns:0 nr:484 dw:484 dr:416 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/7 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
 1: cs:Unconfigured

[root@node2 ~]# ifconfig eth0 up
[root@node2 ~]# drbd0: Split-Brain detected, dropping connection!
drbd0: error receiving ReportState, l: 4!
[root@node2 ~]# reboot

(later)

[root@node2 ~]# modprobe drbd
[root@node2 ~]# drbdadm outdate all
[root@node2 ~]# drbdadm up all
drbd0: Backing device has merge_bvec_fn()!
drbd0: No usable activity log found.
[root@node2 ~]# cat /proc/drbd
version: 8.0pre3 (api:82/proto:80)
SVN Revision: 2169 build by root@node1.itinerary.com, 2006-06-29 14:42:49
 0: cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/7 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
 1: cs:Unconfigured

[root@node1 ~]# drbdadm connect all
[root@node2 ~]# drbd0: Split-Brain detected, dropping connection!
drbd0: error receiving ReportState, l: 4!
[root@node2 ~]# drbdadm down all
[root@node2 ~]# drbdadm create-md all
Valid meta-data already in place, recreate new?
[need to type 'yes' to confirm] yes
Creating meta data...
initialising activity log
initialising bitmap (3296 KB)
99%New drbd meta data block sucessfully created.
[root@node2 ~]# drbdadm up all
drbd0: No usable activity log found.
[root@node2 ~]# cat /proc/drbd
version: 8.0pre3 (api:82/proto:80)
SVN Revision: 2169 build by root@node1.itinerary.com, 2006-06-29 14:42:49
 0: cs:WFConnection st:Secondary/Unknown ds:Inconsistent/DUnknown r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/7 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
 1: cs:Unconfigured

[root@node1 ~]# drbdadm connect all
[root@node2 ~]# cat /proc/drbd
version: 8.0pre3 (api:82/proto:80)
SVN Revision: 2169 build by root@node1.itinerary.com, 2006-06-29 14:42:49
 0: cs:SyncTarget st:Secondary/Primary ds:Inconsistent/UpToDate r---
    ns:0 nr:18193408 dw:18193376 dr:0 al:0 bm:1110 lo:9 pe:1351 ua:12 ap:0
        [===>................] sync'ed: 16.9% (87656/105423)M
        finish: 0:14:15 speed: 104,896 (92,348) K/sec
        resync: used:4/7 hits:1137313 misses:1114 starving:0 dirty:0 changed:1114
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
 1: cs:Unconfigured

So is create-md the only solution when in a split-brain?
I can't invalidate node2 with drbdadm invalidate all, since it's not in a connected state, and connecting it just prints the split-brain message and disconnects the nodes.

In case it's of interest, my conf looks like this:

global {
    usage-count yes;
}

common {
    syncer { rate 200M; }
}

resource r0 {
    protocol C;

    handlers {
        pri-on-incon-degr "halt -f";
        pri-lost-after-sb "halt -f";
        outdate-peer "/usr/lib/drbd/outdate-peer.sh on amd 192.168.22.11 192.168.23.11 on alf 192.168.22.12 192.168.23.12";
    }

    startup {
        degr-wfc-timeout 120;    # 2 minutes.
    }

    disk {
        on-io-error pass_on;
    }

    net {
        allow-two-primaries;
        cram-hmac-alg "sha1";
        shared-secret "password";
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
    }

    syncer {
        al-extents 257;
    }

    on node1.domain.com {
        device /dev/drbd0;
        disk /dev/md0;
        address 10.1.1.26:7788;
        flexible-meta-disk internal;
    }

    on node2.domain.com {
        device /dev/drbd0;
        disk /dev/md0;
        address 10.1.1.30:7788;
        meta-disk internal;
    }
}

Best regards,
/Håkan
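
For anyone comparing the /proc/drbd snapshots in this post, the fields that change between steps are cs (connection state), st (local/peer role) and ds (local/peer disk state). Below is a minimal Python sketch for pulling those fields out of the text. It assumes the 8.0-era line layout shown in this post. The names parse_proc_drbd, STATUS_RE and SAMPLE are illustrative only and are not part of DRBD or its tools.

```python
import re

# Matches one per-device status line in the layout shown in this post, e.g.
# " 0: cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r---".
# The st:/ds: part is optional because unconfigured devices only print cs:.
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"(?:\s+st:(?P<st_local>[^/\s]+)/(?P<st_peer>\S+)"
    r"\s+ds:(?P<ds_local>[^/\s]+)/(?P<ds_peer>\S+))?"
)


def parse_proc_drbd(text):
    """Return {minor: {field: value}} for every device line found in text."""
    devices = {}
    for line in text.splitlines():
        match = STATUS_RE.match(line)
        if match:
            fields = match.groupdict()
            minor = int(fields.pop("minor"))
            devices[minor] = fields  # fields missing from the line are None
    return devices


# Sample text copied from the node2 output after "drbdadm outdate all".
SAMPLE = """\
version: 8.0pre3 (api:82/proto:80)
 0: cs:WFConnection st:Secondary/Unknown ds:Outdated/DUnknown r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
 1: cs:Unconfigured
"""

if __name__ == "__main__":
    for minor, state in sorted(parse_proc_drbd(SAMPLE).items()):
        print(minor, state)
```

On a live node the same function could be fed the contents of /proc/drbd read as a string; the counter lines (ns, nr, resync, act_log) are deliberately ignored here.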