[DRBD-user] cs:WFBitMapT/cs:WFBitMapS status remains, no sync

Ger Apeldoorn ger.apeldoorn at sara.nl
Wed May 6 11:37:13 CEST 2009


Hi,

Problem:
    The connection state stays at cs:WFBitMapS/cs:WFBitMapT after a change
of network, and no sync takes place. (More detailed description below.)

Environment:
    Packages:
        drbd82-8.2.6-1.el5.centos
        kmod-drbd82-8.2.6-2  

    OS:
        Linux node01 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 
x86_64 x86_64 x86_64 GNU/Linux

    /proc/drbd:
       Primary:
--------------------%<--------------------
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by 
buildsvn at c5-x8664-build, 2008-10-03 11:30:17
 0: cs:WFBitMapS st:Secondary/Secondary ds:UpToDate/Outdated C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 oos:380
--------------------%<--------------------
       Secondary:
--------------------%<--------------------
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by 
buildsvn at c5-x8664-build, 2008-10-03 11:30:17
 0: cs:WFBitMapT st:Secondary/Secondary ds:Outdated/UpToDate C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 oos:72
--------------------%<--------------------
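
When comparing the two nodes it can help to reduce /proc/drbd to the few
fields that matter here. The following is a minimal sketch, not part of the
original report: the function name drbd_summary is made up for illustration,
and the field names (cs:, st:, ds:, oos:) are simply those visible in the
8.2.6 output quoted above.

```shell
#!/bin/sh
# Print "cs st ds oos" for each device found in /proc/drbd-style text on stdin.
# drbd_summary is a hypothetical helper; field names come from the output above.
drbd_summary() {
    awk '
        # Device line, e.g. " 0: cs:WFBitMapS st:Secondary/Secondary ds:... C r---"
        /^ *[0-9]+: cs:/ {
            cs = st = ds = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^cs:/) cs = substr($i, 4)
                if ($i ~ /^st:/) st = substr($i, 4)
                if ($i ~ /^ds:/) ds = substr($i, 4)
            }
        }
        # Counter line, e.g. "    ns:0 nr:0 ... oos:380"
        /oos:/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^oos:/) print cs, st, ds, substr($i, 5)
        }
    '
}
```

On a live node this would be fed with "drbd_summary < /proc/drbd"; running it
repeatedly shows whether cs ever leaves WFBitMapS/WFBitMapT.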

    drbdadm get-gi all
       Primary:
--------------------%<--------------------
A6DAADF5FB98B6D0:95905CADE9DCFB23:3CF9226F2C03C538:0000000000000004:1:1:0:1:0:1
--------------------%<--------------------
       Secondary:
--------------------%<--------------------
CDCCE6A3D5BA3FFC:95905CADE9DCFB23:3CF9226F2C03C538:0000000000000004:1:0:0:1:0:0
--------------------%<--------------------

    tail -n /var/log/messages | grep -i drbd
       Primary:
--------------------%<--------------------
May  6 10:36:22 node01 kernel: drbd0: Split-Brain detected, dropping 
connection!
May  6 10:36:22 node01 kernel: drbd0: self 
A6DAADF5FB98B6D0:95905CADE9DCFB23:3CF9226F2C03C538:0000000000000004
May  6 10:36:22 node01 kernel: drbd0: peer 
CDCCE6A3D5BA3FFC:95905CADE9DCFB23:3CF9226F2C03C538:0000000000000004
May  6 10:36:22 node01 kernel: drbd0: helper command: /sbin/drbdadm 
split-brain
May  6 10:36:22 node01 kernel: drbd0: conn( WFReportParams -> 
Disconnecting )
May  6 10:36:22 node01 kernel: drbd0: error receiving ReportState, l: 4!
May  6 10:36:22 node01 kernel: drbd0: asender terminated
May  6 10:36:22 node01 kernel: drbd0: Terminating asender thread
May  6 10:36:22 node01 kernel: drbd0: tl_clear()
May  6 10:36:22 node01 kernel: drbd0: Connection closed
May  6 10:36:22 node01 kernel: drbd0: conn( Disconnecting -> StandAlone )
May  6 10:36:22 node01 kernel: drbd0: receiver terminated
May  6 10:36:22 node01 kernel: drbd0: Terminating receiver thread
May  6 10:40:43 node01 kernel: drbd0: conn( StandAlone -> Unconnected )
May  6 10:40:43 node01 kernel: drbd0: Starting receiver thread (from 
drbd0_worker [16858])
May  6 10:40:43 node01 kernel: drbd0: receiver (re)started
May  6 10:40:43 node01 kernel: drbd0: conn( Unconnected -> WFConnection )
May  6 10:40:53 node01 kernel: drbd0: Handshake successful: Agreed 
network protocol version 88
May  6 10:40:53 node01 kernel: drbd0: conn( WFConnection -> 
WFReportParams )
May  6 10:40:53 node01 kernel: drbd0: Starting asender thread (from 
drbd0_receiver [17038])
May  6 10:40:53 node01 kernel: drbd0: data-integrity-alg: <not-used>
May  6 10:40:53 node01 kernel: drbd0: Split-Brain detected, manually 
solved. Sync from this node
May  6 10:40:53 node01 kernel: drbd0: peer( Unknown -> Secondary ) conn( 
WFReportParams -> WFBitMapS )
May  6 10:40:53 node01 kernel: drbd0: Writing meta data super block now.
May  6 10:41:05 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967295
May  6 10:41:11 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967294
May  6 10:41:17 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967293
May  6 10:41:23 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967292
May  6 10:41:29 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967291
May  6 10:41:35 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967290
May  6 10:41:41 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967289
May  6 10:41:47 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967288
May  6 10:41:53 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967287
May  6 10:41:59 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967286
May  6 10:42:05 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967285
May  6 10:42:11 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967284
May  6 10:42:17 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967283
May  6 10:42:23 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967282
May  6 10:42:29 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967281
May  6 10:42:35 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967280
May  6 10:42:41 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967279
May  6 10:42:47 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967278
May  6 10:42:53 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967277
May  6 10:42:59 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967276
May  6 10:43:05 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967275
May  6 10:43:11 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967274
May  6 10:43:17 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967273
May  6 10:43:23 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967272
May  6 10:43:29 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967271
May  6 10:43:35 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967270
May  6 10:43:41 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967269
May  6 10:43:47 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967268
May  6 10:43:53 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967267
May  6 10:43:59 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967266
May  6 10:44:05 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967265
May  6 10:44:11 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967264
May  6 10:44:17 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967263
May  6 10:44:23 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967262
May  6 10:44:29 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967261
May  6 10:44:35 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967260
May  6 10:44:41 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967259
May  6 10:44:47 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967258
May  6 10:44:53 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967257
May  6 10:44:59 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967256
May  6 10:45:05 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967255
May  6 10:45:11 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967254
May  6 10:45:17 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967253
May  6 10:45:23 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967252
May  6 10:45:29 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967251
May  6 10:45:35 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967250
May  6 10:45:41 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967249
May  6 10:45:47 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967248
May  6 10:45:53 node01 kernel: drbd0: [drbd0_worker/16858] sock_sendmsg 
time expired, ko = 4294967247
--------------------%<--------------------

       Secondary:
--------------------%<--------------------
May  6 10:36:22 node02 kernel: drbd0: Split-Brain detected, dropping 
connection!
May  6 10:36:22 node02 kernel: drbd0: self 
CDCCE6A3D5BA3FFC:95905CADE9DCFB23:3CF9226F2C03C538:0000000000000004
May  6 10:36:22 node02 kernel: drbd0: peer 
A6DAADF5FB98B6D0:95905CADE9DCFB23:3CF9226F2C03C538:0000000000000004
May  6 10:36:22 node02 kernel: drbd0: helper command: /sbin/drbdadm 
split-brain
May  6 10:36:22 node02 kernel: drbd0: conn( WFReportParams -> 
Disconnecting )
May  6 10:36:22 node02 kernel: drbd0: error receiving ReportState, l: 4!
May  6 10:36:22 node02 kernel: drbd0: asender terminated
May  6 10:36:22 node02 kernel: drbd0: Terminating asender thread
May  6 10:36:22 node02 kernel: drbd0: tl_clear()
May  6 10:36:22 node02 kernel: drbd0: Connection closed
May  6 10:36:22 node02 kernel: drbd0: conn( Disconnecting -> StandAlone )
May  6 10:36:22 node02 kernel: drbd0: receiver terminated
May  6 10:36:22 node02 kernel: drbd0: Terminating receiver thread
May  6 10:40:53 node02 kernel: drbd0: conn( StandAlone -> Unconnected )
May  6 10:40:53 node02 kernel: drbd0: Starting receiver thread (from 
drbd0_worker [13338])
May  6 10:40:53 node02 kernel: drbd0: receiver (re)started
May  6 10:40:53 node02 kernel: drbd0: conn( Unconnected -> WFConnection )
May  6 10:40:53 node02 kernel: drbd0: Handshake successful: Agreed 
network protocol version 88
May  6 10:40:53 node02 kernel: drbd0: conn( WFConnection -> 
WFReportParams )
May  6 10:40:53 node02 kernel: drbd0: Starting asender thread (from 
drbd0_receiver [14123])
May  6 10:40:53 node02 kernel: drbd0: data-integrity-alg: <not-used>
May  6 10:40:53 node02 kernel: drbd0: Split-Brain detected, manually 
solved. Sync from peer node
May  6 10:40:53 node02 kernel: drbd0: peer( Unknown -> Secondary ) conn( 
WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
May  6 10:40:53 node02 kernel: drbd0: Writing meta data super block now.
--------------------%<--------------------
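
A note on the "ko" values in the primary's log: they start at 4294967295,
which is 2^32 - 1, and drop by one per message. The small sketch below only
checks that arithmetic against the first and last logged entries. One possible
reading, which is an assumption and not something the logs confirm, is an
unsigned 32-bit counter counting down from a ko-count of 0 (the documented
default), with one message per expired send timeout.

```shell
#!/bin/sh
# Largest unsigned 32-bit value; the first logged "ko" value equals this.
max_u32=$(( (1 << 32) - 1 ))

# First and last "sock_sendmsg time expired" entries from the primary's log,
# with their timestamps converted to seconds since midnight.
first_ko=4294967295; first_sec=$(( 10*3600 + 41*60 + 5 ))    # 10:41:05
last_ko=4294967247;  last_sec=$(( 10*3600 + 45*60 + 53 ))    # 10:45:53

# Number of decrements between the two entries, and the time per decrement.
steps=$(( first_ko - last_ko ))
interval=$(( (last_sec - first_sec) / steps ))

echo "max_u32=$max_u32 steps=$steps interval=${interval}s"
```

The worker on node01 is therefore retrying the same send about every six
seconds and never getting it through, while node02 logs nothing further after
entering WFBitMapT.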

   Configuration:
--------------------%<--------------------
# /usr/share/doc/drbd82/drbd.conf
#
common {
        syncer { rate 100M; }
        protocol C;
}

resource drbd0 {
        on node01 {
                device /dev/drbd0;
                disk /dev/datavg/drbdlv;
                meta-disk internal;
                address 192.168.1.1:8888;
        }

        on node02 {
                device /dev/drbd0;
                disk /dev/datavg/drbdlv;
                meta-disk internal;
                address 192.168.1.2:8888;
        }
}
--------------------%<--------------------


It was working properly while on a test network. Now that it is 
connected to the production network, the nodes can see each other, but 
syncing no longer works.
No firewalls are in the way.
I use a separate connection (over eth1) for the DRBD sync.

I tried to resolve the split-brain by issuing the following (the result
can be seen in the logs):
--------------------%<--------------------
drbdadm -- --discard-my-data connect drbd0
--------------------%<--------------------
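
For reference, manual split-brain recovery as described in the DRBD
documentation has two halves, one per node. The sketch below only prints the
commands for each role and runs nothing; the function name sb_recovery_cmds
and the role names "victim" and "survivor" are made up for illustration, and
which node plays which role is an assumption the operator has to decide.

```shell
#!/bin/sh
# Print (do not run) the manual split-brain recovery commands for one role.
# "victim" is the node whose changes are discarded; "survivor" keeps its data.
# The resource name defaults to drbd0, as in the configuration above.
sb_recovery_cmds() {
    role=$1
    res=${2:-drbd0}
    case "$role" in
        victim)
            echo "drbdadm secondary $res"
            echo "drbdadm -- --discard-my-data connect $res"
            ;;
        survivor)
            # Only needed if the survivor is also in StandAlone state.
            echo "drbdadm connect $res"
            ;;
        *)
            echo "usage: sb_recovery_cmds victim|survivor [resource]" >&2
            return 1
            ;;
    esac
}
```

Calling "sb_recovery_cmds victim" on one node and "sb_recovery_cmds survivor"
on the other lists what each side would run. The logs above show that this
step itself was accepted ("manually solved"), so it does not by itself explain
why the bitmap exchange never completes.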

How do I get out of this state and get the sync up and running again?

Thank you very much for your help!

Ger Apeldoorn



