[DRBD-user] Problem with StandAlone

venkatesh prabhu vikieethechip at gmail.com
Thu Feb 23 12:13:24 CET 2012



Hi,
I configured a two-node cluster and am able to mirror the data between the nodes.
However, I am facing an issue when testing some scenarios.

These are the steps I followed, which end up in StandAlone mode:
1. Shut down the primary node <N1>.
2. Make the other node <N2> primary and make some changes in the file system.
3. Shut down N2.
4. Start N1, make it primary, make some changes in the file system, then
reboot N1 and start N2.
5. When they come up, the state of N1 is:
     0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
6. The state of N2 is:
     0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
But I have configured the automatic split-brain recovery policies (after-sb-*)
in my config file; please see the config file at the end.
Please let me know how to resolve this StandAlone situation automatically.
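
For anyone scripting a check for this, the two state lines in steps 5 and 6 appear to be the per-minor lines printed by `cat /proc/drbd` on DRBD 8.3. Below is a minimal Python sketch that splits such a line into named fields; the regex is an assumption derived only from the two lines quoted above (note that the StandAlone line has no protocol letter), not a complete grammar of `/proc/drbd`, and `parse_status` is a hypothetical helper name.

```python
import re

# Hypothetical pattern, inferred from the two status lines quoted above:
# "<minor>: cs:<conn> ro:<role>/<peer role> ds:<disk>/<peer disk> [<proto>] <flags>"
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+"
    r"cs:(?P<cs>\S+)\s+"
    r"ro:(?P<role>\w+)/(?P<peer_role>\w+)\s+"
    r"ds:(?P<disk>\w+)/(?P<peer_disk>\w+)"
    r"(?:\s+(?P<protocol>[ABC]))?"   # protocol letter is absent when StandAlone
    r"\s+(?P<flags>\S+)"
)


def parse_status(line: str) -> dict:
    """Split one /proc/drbd resource line into its named fields."""
    m = STATUS_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised status line: {line!r}")
    return m.groupdict()


n1 = parse_status("0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----")
n2 = parse_status("0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----")
print(n1["cs"], n1["protocol"])  # WFConnection C
print(n2["cs"], n2["protocol"])  # StandAlone None
```

Both nodes report `ro:Primary`, which is the combination the logs below show DRBD refusing to reconcile on its own.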

Logs from N2:
kernel: drbd: initialized. Version: 8.3.12 (api:88/proto:86-96)
Feb 23 16:03:19 lab1602 kernel: drbd: GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag at Build32R6, 2011-11-20 10:55:07
Feb 23 16:03:19 lab1602 kernel: drbd: registered as block device major 147
Feb 23 16:03:19 lab1602 kernel: drbd: minor_table @ 0xf40ac700
Feb 23 16:03:19 lab1602 kernel: block drbd0: Starting worker thread (from cqueue [1538])
Feb 23 16:03:19 lab1602 kernel: block drbd0: disk( Diskless -> Attaching )
Feb 23 16:03:19 lab1602 kernel: block drbd0: Found 4 transactions (6 active extents) in activity log.
Feb 23 16:03:19 lab1602 kernel: block drbd0: Method to ensure write ordering: drain
Feb 23 16:03:19 lab1602 kernel: block drbd0: max BIO size = 131072
Feb 23 16:03:19 lab1602 kernel: block drbd0: drbd_bm_resize called with capacity == 2097152
Feb 23 16:03:19 lab1602 kernel: block drbd0: resync bitmap: bits=262144 words=8192 pages=8
Feb 23 16:03:19 lab1602 kernel: block drbd0: size = 1024 MB (1048576 KB)
Feb 23 16:03:19 lab1602 kernel: block drbd0: bitmap READ of 8 pages took 1 jiffies
Feb 23 16:03:19 lab1602 kernel: block drbd0: recounting of set bits took additional 0 jiffies
Feb 23 16:03:19 lab1602 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Feb 23 16:03:19 lab1602 kernel: block drbd0: Marked additional 12 MB as out-of-sync based on AL.
Feb 23 16:03:19 lab1602 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Feb 23 16:03:19 lab1602 kernel: block drbd0: 12 MB (3072 bits) marked out-of-sync by on disk bit-map.
Feb 23 16:03:19 lab1602 kernel: block drbd0: disk( Attaching -> UpToDate )
Feb 23 16:03:19 lab1602 kernel: block drbd0: attached to UUIDs 1CAAF7BF141F6FAB:77403D46BDEFCA05:022A37BAC8B2DC72:022937BAC8B2DC72
Feb 23 16:03:19 lab1602 kernel: block drbd0: conn( StandAlone -> Unconnected )
Feb 23 16:03:19 lab1602 kernel: block drbd0: Starting receiver thread (from drbd0_worker [1547])
Feb 23 16:03:19 lab1602 kernel: block drbd0: receiver (re)started
Feb 23 16:03:19 lab1602 kernel: block drbd0: conn( Unconnected -> WFConnection )
Feb 23 16:03:37 lab1602 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
Feb 23 16:03:37 lab1602 kernel: block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Feb 23 16:03:37 lab1602 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Feb 23 16:03:37 lab1602 kernel: block drbd0: Starting asender thread (from drbd0_receiver [1565])
Feb 23 16:03:37 lab1602 kernel: block drbd0: data-integrity-alg: <not-used>
Feb 23 16:03:37 lab1602 kernel: block drbd0: drbd_sync_handshake:
Feb 23 16:03:37 lab1602 kernel: block drbd0: self 1CAAF7BF141F6FAB:77403D46BDEFCA05:022A37BAC8B2DC72:022937BAC8B2DC72 bits:3072 flags:0
Feb 23 16:03:37 lab1602 kernel: block drbd0: peer 1C7B811C8FAD3496:77403D46BDEFCA04:022A37BAC8B2DC72:022937BAC8B2DC72 bits:25 flags:0
Feb 23 16:03:37 lab1602 kernel: block drbd0: uuid_compare()=100 by rule 90
Feb 23 16:03:37 lab1602 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Feb 23 16:03:37 lab1602 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Feb 23 16:03:37 lab1602 kernel: block drbd0: State change failed: Device is held open by someone
Feb 23 16:03:37 lab1602 kernel: block drbd0:   state = { cs:WFReportParams ro:Primary/Unknown ds:UpToDate/DUnknown r----- }
Feb 23 16:03:37 lab1602 kernel: block drbd0:  wanted = { cs:WFReportParams ro:Secondary/Unknown ds:UpToDate/DUnknown r----- }
Feb 23 16:03:37 lab1602 kernel: block drbd0: State change failed: Device is held open by someone
Feb 23 16:03:37 lab1602 kernel: block drbd0:   state = { cs:WFReportParams ro:Primary/Unknown ds:UpToDate/DUnknown r----- }
Feb 23 16:03:37 lab1602 kernel: block drbd0:  wanted = { cs:WFReportParams ro:Secondary/Unknown ds:UpToDate/DUnknown r----- }
Feb 23 16:03:37 lab1602 kernel: block drbd0: helper command: /sbin/drbdadm pri-lost-after-sb minor-0
Feb 23 16:03:37 lab1602 kernel: block drbd0: helper command: /sbin/drbdadm pri-lost-after-sb minor-0 exit code 0 (0x0)
Feb 23 16:03:37 lab1602 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Feb 23 16:03:37 lab1602 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Feb 23 16:03:37 lab1602 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Feb 23 16:03:37 lab1602 kernel: block drbd0: conn( WFReportParams -> Disconnecting )
Feb 23 16:03:37 lab1602 kernel: block drbd0: error receiving ReportState, l: 4!
Feb 23 16:03:37 lab1602 kernel: block drbd0: asender terminated
Feb 23 16:03:37 lab1602 kernel: block drbd0: Terminating asender thread
Feb 23 16:03:37 lab1602 kernel: block drbd0: Connection closed
Feb 23 16:03:37 lab1602 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Feb 23 16:03:37 lab1602 kernel: block drbd0: receiver terminated
Feb 23 16:03:37 lab1602 kernel: block drbd0: Terminating receiver thread
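
The size and bitmap figures at the top of these logs are related by simple arithmetic. Below is a small Python sketch that reproduces them, assuming capacity is counted in 512-byte sectors, each bitmap bit covers 4 KiB, and bitmap words are 32 bits wide (inferred from `words=8192`, which only works out with 32-bit words). The function names are hypothetical.

```python
# Assumed constants: 512-byte sectors, 4 KiB tracked per bitmap bit,
# 32-bit bitmap words (consistent with words=8192 in the log), 4 KiB pages.
SECTOR_BYTES = 512
KIB_PER_BIT = 4
WORD_BITS = 32
PAGE_BYTES = 4096


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def bitmap_geometry(capacity_sectors: int) -> dict:
    """Reproduce the 'size' and 'resync bitmap' figures from the capacity."""
    size_kib = capacity_sectors * SECTOR_BYTES // 1024
    bits = ceil_div(size_kib, KIB_PER_BIT)
    words = ceil_div(bits, WORD_BITS)
    pages = ceil_div(words * WORD_BITS // 8, PAGE_BYTES)
    return {"size_kib": size_kib, "bits": bits, "words": words, "pages": pages}


def out_of_sync_kib(bits: int) -> int:
    """Each set bit marks one 4 KiB block as out of sync."""
    return bits * KIB_PER_BIT


print(bitmap_geometry(2097152))
# {'size_kib': 1048576, 'bits': 262144, 'words': 8192, 'pages': 8}
print(out_of_sync_kib(3072) // 1024, "MB")  # 12 MB on N2
print(out_of_sync_kib(25), "KB")            # 100 KB on N1
```

So N2 carries 3072 dirty bits (12 MB) and N1 carries 25 (100 KB); both nodes hold changes the other lacks.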




Logs From N1:

Feb 23 16:03:54 lab1601 kernel: drbd: initialized. Version: 8.3.12 (api:88/proto:86-96)
Feb 23 16:03:54 lab1601 kernel: drbd: GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag at Build32R6, 2011-11-20 10:55:07
Feb 23 16:03:54 lab1601 kernel: drbd: registered as block device major 147
Feb 23 16:03:54 lab1601 kernel: drbd: minor_table @ 0xf3d3e300
Feb 23 16:03:54 lab1601 kernel: block drbd0: Starting worker thread (from cqueue [1537])
Feb 23 16:03:54 lab1601 kernel: block drbd0: disk( Diskless -> Attaching )
Feb 23 16:03:54 lab1601 kernel: block drbd0: Found 4 transactions (6 active extents) in activity log.
Feb 23 16:03:54 lab1601 kernel: block drbd0: Method to ensure write ordering: drain
Feb 23 16:03:54 lab1601 kernel: block drbd0: max BIO size = 131072
Feb 23 16:03:54 lab1601 kernel: block drbd0: drbd_bm_resize called with capacity == 2097152
Feb 23 16:03:54 lab1601 kernel: block drbd0: resync bitmap: bits=262144 words=8192 pages=8
Feb 23 16:03:54 lab1601 kernel: block drbd0: size = 1024 MB (1048576 KB)
Feb 23 16:03:54 lab1601 kernel: block drbd0: bitmap READ of 8 pages took 2 jiffies
Feb 23 16:03:54 lab1601 kernel: block drbd0: recounting of set bits took additional 0 jiffies
Feb 23 16:03:54 lab1601 kernel: block drbd0: 100 KB (25 bits) marked out-of-sync by on disk bit-map.
Feb 23 16:03:54 lab1601 kernel: block drbd0: disk( Attaching -> UpToDate )
Feb 23 16:03:54 lab1601 kernel: block drbd0: attached to UUIDs 1C7B811C8FAD3497:77403D46BDEFCA04:022A37BAC8B2DC72:022937BAC8B2DC72
Feb 23 16:03:54 lab1601 kernel: block drbd0: conn( StandAlone -> Unconnected )
Feb 23 16:03:54 lab1601 kernel: block drbd0: Starting receiver thread (from drbd0_worker [1547])
Feb 23 16:03:54 lab1601 kernel: block drbd0: receiver (re)started
Feb 23 16:03:54 lab1601 kernel: block drbd0: conn( Unconnected -> WFConnection )
Feb 23 16:03:55 lab1601 kernel: block drbd0: Handshake successful: Agreed network protocol version 96
Feb 23 16:03:55 lab1601 kernel: block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Feb 23 16:03:55 lab1601 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Feb 23 16:03:55 lab1601 kernel: block drbd0: Starting asender thread (from drbd0_receiver [1564])
Feb 23 16:03:55 lab1601 kernel: block drbd0: data-integrity-alg: <not-used>
Feb 23 16:03:55 lab1601 kernel: block drbd0: drbd_sync_handshake:
Feb 23 16:03:55 lab1601 kernel: block drbd0: self 1C7B811C8FAD3496:77403D46BDEFCA04:022A37BAC8B2DC72:022937BAC8B2DC72 bits:25 flags:0
Feb 23 16:03:55 lab1601 kernel: block drbd0: peer 1CAAF7BF141F6FAB:77403D46BDEFCA05:022A37BAC8B2DC72:022937BAC8B2DC72 bits:3072 flags:2
Feb 23 16:03:55 lab1601 kernel: block drbd0: uuid_compare()=100 by rule 90
Feb 23 16:03:55 lab1601 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
Feb 23 16:03:55 lab1601 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Feb 23 16:03:55 lab1601 kernel: block drbd0: Split-Brain detected, 1 primaries, automatically solved. Sync from this node
Feb 23 16:03:55 lab1601 kernel: block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
Feb 23 16:03:55 lab1601 kernel: block drbd0: sock was shut down by peer
Feb 23 16:03:55 lab1601 kernel: block drbd0: peer( Primary -> Unknown ) conn( WFBitMapS -> BrokenPipe ) pdsk( Consistent -> DUnknown )
Feb 23 16:03:55 lab1601 kernel: block drbd0: short read expecting header on sock: r=0
Feb 23 16:03:55 lab1601 kernel: block drbd0: meta connection shut down by peer.
Feb 23 16:03:55 lab1601 kernel: block drbd0: asender terminated
Feb 23 16:03:55 lab1601 kernel: block drbd0: Terminating asender thread
Feb 23 16:03:55 lab1601 kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Feb 23 16:03:55 lab1601 kernel: block drbd0: 100 KB (25 bits) marked out-of-sync by on disk bit-map.
Feb 23 16:03:55 lab1601 kernel: block drbd0: Connection closed
Feb 23 16:03:55 lab1601 kernel: block drbd0: conn( BrokenPipe -> Unconnected )
Feb 23 16:03:55 lab1601 kernel: block drbd0: receiver terminated
Feb 23 16:03:55 lab1601 kernel: block drbd0: Restarting receiver thread
Feb 23 16:03:55 lab1601 kernel: block drbd0: receiver (re)started
Feb 23 16:03:55 lab1601 kernel: block drbd0: conn( Unconnected -> WFConnection )
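
Both nodes log `uuid_compare()=100 by rule 90`, which is where split brain is declared. Below is a simplified Python sketch of that comparison. It assumes that rule 90 in DRBD 8.3's `uuid_compare()` matches when both bitmap UUIDs are equal and non-zero after masking off bit 0 (which only records whether the node was Primary). It models that single rule, not DRBD's full ordered rule chain; the extra "current UUIDs differ" check loosely stands in for the earlier rules that would otherwise have matched first. Function names are hypothetical.

```python
# Assumption: bit 0 of each UUID is ignored in comparisons (it only marks
# whether the node was Primary when that UUID was generated).
IGNORE_PRIMARY_BIT = ~1


def parse_uuids(field: str) -> list[int]:
    """Split 'CURRENT:BITMAP:HISTORY1:HISTORY2' (hex) into integers."""
    return [int(part, 16) for part in field.split(":")]


def looks_like_split_brain(self_uuids: list[int], peer_uuids: list[int]) -> bool:
    """Hypothetical check modelled on one uuid_compare() rule only.

    Same non-zero bitmap UUID on both sides (both started tracking changes
    from the same generation), but different current UUIDs (both wrote since).
    """
    self_cur = self_uuids[0] & IGNORE_PRIMARY_BIT
    peer_cur = peer_uuids[0] & IGNORE_PRIMARY_BIT
    self_bm = self_uuids[1] & IGNORE_PRIMARY_BIT
    peer_bm = peer_uuids[1] & IGNORE_PRIMARY_BIT
    return self_cur != peer_cur and self_bm == peer_bm and self_bm != 0


# UUID sets copied from the drbd_sync_handshake lines logged on N2.
n2_self = parse_uuids("1CAAF7BF141F6FAB:77403D46BDEFCA05:022A37BAC8B2DC72:022937BAC8B2DC72")
n2_peer = parse_uuids("1C7B811C8FAD3496:77403D46BDEFCA04:022A37BAC8B2DC72:022937BAC8B2DC72")
print(looks_like_split_brain(n2_self, n2_peer))  # True
```

The bitmap UUIDs 77403D46BDEFCA05 and 77403D46BDEFCA04 differ only in bit 0, so under this assumption they count as equal, while the current UUIDs differ.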

-- 
global {
        usage-count no;
}

common {

        protocol C;

        startup {
                degr-wfc-timeout 3;     # 3 sec.

                wfc-timeout 120;        # 2 min.

        #       become-primary-on both;
        } # end of startup

        handlers {
        } # end of handlers

        disk {
                on-io-error  detach;
                no-disk-barrier;
                no-disk-flushes;
                no-md-flushes;
        } # end of disk

        net {
                timeout       60;    #  6 seconds  (unit = 0.1 seconds)
                connect-int   10;    # 10 seconds  (unit = 1 second)
                ping-int      10;    # 10 seconds  (unit = 1 second)
                ping-timeout   5;    # 500 ms (unit = 0.1 seconds)
                cram-hmac-alg sha1;
                shared-secret "DRBD disk mirroring for BTP-R";
                after-sb-0pri discard-older-primary;
                #after-sb-1pri  discard-secondary;
                #after-sb-2pri violently-as0p;
                after-sb-1pri   call-pri-lost-after-sb;
                after-sb-2pri   call-pri-lost-after-sb;
                rr-conflict     call-pri-lost;

        } # end of net
} # end of common

resource mfs_drbd {

        syncer {
                rate 100M;
        }

        on lab1602 {
                device     /dev/drbd0;
                disk       /dev/vg0/mirror;
                address    10.203.230.136:7788;
                meta-disk  /dev/vg0/drbdmeta[0];
                #meta-disk      internal;
        }

        on lab1601 {
                device    /dev/drbd0;
                disk      /dev/vg0/mirror;
                address   10.203.230.135:7788;
                meta-disk /dev/vg0/drbdmeta[0];
                #meta-disk      internal;
        }
} # end of resource mfs_drbd
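
The timing options in this config use two different units, which makes the inline comments easy to get wrong. Below is a small Python sketch that converts the raw values to seconds, assuming (as the comments in the net section state) that `timeout` and `ping-timeout` count tenths of a second while `connect-int`, `ping-int`, `degr-wfc-timeout` and `wfc-timeout` count whole seconds. `to_seconds` and `DIVISOR` are hypothetical names.

```python
# Assumed units: timeout/ping-timeout in 0.1 s, everything else in 1 s.
DIVISOR = {
    "timeout": 10,
    "ping-timeout": 10,
    "connect-int": 1,
    "ping-int": 1,
    "degr-wfc-timeout": 1,
    "wfc-timeout": 1,
}


def to_seconds(option: str, value: int) -> float:
    """Convert a raw drbd.conf timing value into seconds."""
    return value / DIVISOR[option]


# Raw values copied from the startup and net sections above.
settings = {
    "degr-wfc-timeout": 3,
    "wfc-timeout": 120,
    "timeout": 60,
    "connect-int": 10,
    "ping-int": 10,
    "ping-timeout": 5,
}

for name, raw in settings.items():
    print(f"{name:<17} {raw:>4} -> {to_seconds(name, raw):g} s")
```

With these units, `wfc-timeout 120` is 2 minutes and `timeout 60` is 6 seconds.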

Thanks in advance.

vengatesh


