[DRBD-user] stuck in WFBitMapS / WFBitMapT

alex at crackpot.org
Tue May 6 01:09:17 CEST 2008


On a test cluster, I was trying to tune drbd.conf.  I entered a very
large value for sndbuf-size (1024).  After 30 min, the dd command had still
not completed, though the file being written hadn't been updated in 27
min and was already the desired size.  (I used dd to create a 1GB file, and
the test file was 1GB.)

Node 23 (dellpe2950-23) was primary, node 22 (dellpe2950-22) was secondary.

The manual says anything larger than 1M may cause problems, and in my  
case it seems clear this is too large.  The trouble now is I cannot  
get my cluster usable again.

I edited drbd.conf on both nodes to restore the previous sndbuf-size
value (128), but was unable to make this take effect on the current
primary.  (Very sorry now, I did not note down the exact error.  It was
something like 'took more than 5 seconds to complete'.)

I was unable to shut 23 down cleanly.  'shutdown' noted 'system going
down for reboot' in the syslog and did nothing after that, so I forcibly
cycled the power.

I have rebooted both nodes.  The current primary is 22 (it took over when
23 rebooted).  I have been unable to get them to sync now, even after
invalidating the entire device on 23.  They are connected, but not
getting past the 'waiting for bitmap' stage (WFBitMapS / WFBitMapT).
It seems the bitmap is messed up in some respect.  I'm really unsure at
this point how to resolve this.  Any help is appreciated.
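
To see where the connection gets stuck, it can help to pull only the
conn( old -> new ) transitions out of the kernel log.  What follows is a
minimal sketch, not a DRBD tool: the function name and the sample text are
made up for illustration, and the sample lines are copied from the log in
this post.  It assumes the log may have been line-wrapped by a mail client,
so it collapses whitespace before matching.

```python
import re

# Matches the "conn( Old -> New )" fragment that appears in the kernel log.
CONN_RE = re.compile(r"conn\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def conn_transitions(log_text):
    """Return (old, new) connection-state pairs in order of appearance."""
    # Collapse newlines and runs of spaces so wrapped log lines still match.
    flat = " ".join(log_text.split())
    return CONN_RE.findall(flat)


# Illustrative sample: lines taken from the log in this post, still wrapped.
SAMPLE = """\
May  5 15:29:07 dellpe2950-23 kernel: drbd0: conn( StandAlone -> Unconnected )
May  5 15:29:07 dellpe2950-23 kernel: drbd0: conn( Unconnected ->
WFConnection )
May  5 15:29:07 dellpe2950-23 kernel: drbd0: conn( WFConnection ->
WFReportParams )
May  5 15:29:08 dellpe2950-23 kernel: drbd0: peer( Unknown -> Primary
) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
"""

if __name__ == "__main__":
    for old, new in conn_transitions(SAMPLE):
        print(old, "->", new)
```

The last pair printed is the state the connection is currently sitting in;
for the sample above that is WFBitMapT, with no later transition.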

alex

May  5 15:25:21 dellpe2950-23 kernel: drbd0: short sent ReportState size=12 sent=0
May  5 15:25:21 dellpe2950-23 kernel: drbd0: asender terminated
May  5 15:25:21 dellpe2950-23 kernel: drbd0: Terminating asender thread
May  5 15:25:21 dellpe2950-23 kernel: drbd0: tl_clear()
May  5 15:25:21 dellpe2950-23 kernel: drbd0: Connection closed
May  5 15:25:21 dellpe2950-23 kernel: drbd0: conn( Timeout -> Unconnected )
May  5 15:25:21 dellpe2950-23 kernel: drbd0: receiver terminated
May  5 15:25:21 dellpe2950-23 kernel: drbd0: receiver (re)started
May  5 15:25:21 dellpe2950-23 kernel: drbd0: conn( Unconnected -> WFConnection )
May  5 15:25:22 dellpe2950-23 kernel: drbd0: Handshake successful: DRBD Network Protocol version 86
May  5 15:25:22 dellpe2950-23 kernel: drbd0: conn( WFConnection -> WFReportParams )
May  5 15:25:22 dellpe2950-23 kernel: drbd0: Starting asender thread (from drbd0_receiver [6259])
May  5 15:25:28 dellpe2950-23 kernel: drbd0: conn( WFReportParams -> Timeout )
May  5 15:25:28 dellpe2950-23 kernel: drbd0: short sent ReportSizes size=40 sent=0
May  5 15:25:34 dellpe2950-23 kernel: drbd0: short sent ReportUUIDs size=56 sent=0
May  5 15:25:40 dellpe2950-23 kernel: drbd0: short sent ReportState size=12 sent=0


May  5 15:27:20 dellpe2950-23 kernel: drbd0: State change failed: Can not start resync since it is already active
May  5 15:27:20 dellpe2950-23 kernel: drbd0:   state = { cs:WFBitMapT st:Secondary/Primary ds:UpToDate/UpToDate r--- }
May  5 15:27:20 dellpe2950-23 kernel: drbd0:  wanted = { cs:StartingSyncT st:Secondary/Primary ds:Inconsistent/UpToDate r--- }
May  5 15:28:05 dellpe2950-23 kernel: drbd0: peer( Primary -> Unknown ) conn( WFBitMapT -> Disconnecting ) pdsk( UpToDate -> DUnknown )
May  5 15:28:05 dellpe2950-23 kernel: drbd0: error receiving ReportBitMap, l: 4088!
May  5 15:28:05 dellpe2950-23 kernel: drbd0: asender terminated
May  5 15:28:05 dellpe2950-23 kernel: drbd0: Terminating asender thread
May  5 15:28:05 dellpe2950-23 kernel: drbd0: Writing meta data super block now.
May  5 15:28:05 dellpe2950-23 kernel: drbd0: tl_clear()
May  5 15:28:05 dellpe2950-23 kernel: drbd0: Connection closed
May  5 15:28:05 dellpe2950-23 kernel: drbd0: conn( Disconnecting -> StandAlone )
May  5 15:28:05 dellpe2950-23 kernel: drbd0: receiver terminated
May  5 15:28:05 dellpe2950-23 kernel: drbd0: Terminating receiver thread


May  5 15:28:21 dellpe2950-23 kernel: drbd0: conn( StandAlone -> Unconnected )
May  5 15:28:21 dellpe2950-23 kernel: drbd0: Starting receiver thread (from drbd0_worker [4416])
May  5 15:28:21 dellpe2950-23 kernel: drbd0: receiver (re)started
May  5 15:28:21 dellpe2950-23 kernel: drbd0: conn( Unconnected -> WFConnection )
May  5 15:28:21 dellpe2950-23 kernel: drbd0: Handshake successful: DRBD Network Protocol version 86
May  5 15:28:21 dellpe2950-23 kernel: drbd0: conn( WFConnection -> WFReportParams )
May  5 15:28:21 dellpe2950-23 kernel: drbd0: Starting asender thread (from drbd0_receiver [6301])
May  5 15:28:22 dellpe2950-23 kernel: drbd0: Split-Brain detected, aborting!
May  5 15:28:22 dellpe2950-23 kernel: drbd0: self 99D56CF91187B3F4:8C1668A9CCF498F1:150E86C1B532DE51:FBA773E22A805495
May  5 15:28:22 dellpe2950-23 kernel: drbd0: peer C21D5DCBDE372E53:8C1668A9CCF498F0:150E86C1B532DE50:FBA773E22A805495
May  5 15:28:22 dellpe2950-23 kernel: drbd0: helper command: /sbin/drbdadm split-brain
May  5 15:28:22 dellpe2950-23 kernel: drbd0: conn( WFReportParams -> Disconnecting )
May  5 15:28:22 dellpe2950-23 kernel: drbd0: error receiving ReportState, l: 4!
May  5 15:28:22 dellpe2950-23 kernel: drbd0: asender terminated
May  5 15:28:22 dellpe2950-23 kernel: drbd0: Terminating asender thread
May  5 15:28:22 dellpe2950-23 kernel: drbd0: tl_clear()
May  5 15:28:22 dellpe2950-23 kernel: drbd0: Connection closed
May  5 15:28:22 dellpe2950-23 kernel: drbd0: conn( Disconnecting -> StandAlone )
May  5 15:28:22 dellpe2950-23 kernel: drbd0: receiver terminated
May  5 15:28:22 dellpe2950-23 kernel: drbd0: Terminating receiver thread
May  5 15:28:57 dellpe2950-23 kernel: drbd0: disk( UpToDate -> Inconsistent )
May  5 15:28:57 dellpe2950-23 kernel: drbd0: Queueing bitmap io: invalidate forced full sync
May  5 15:28:57 dellpe2950-23 kernel: drbd0: Writing meta data super block now.
May  5 15:28:57 dellpe2950-23 kernel: drbd0: Writing meta data super block now.
May  5 15:28:57 dellpe2950-23 kernel: drbd0: writing of bitmap took 13 jiffies
May  5 15:28:57 dellpe2950-23 kernel: drbd0: 259 GB (67774141 bits) marked out-of-sync by on disk bit-map.
May  5 15:28:57 dellpe2950-23 kernel: drbd0: Writing meta data super block now.
May  5 15:29:07 dellpe2950-23 kernel: drbd0: conn( StandAlone -> Unconnected )
May  5 15:29:07 dellpe2950-23 kernel: drbd0: Starting receiver thread (from drbd0_worker [4416])
May  5 15:29:07 dellpe2950-23 kernel: drbd0: receiver (re)started
May  5 15:29:07 dellpe2950-23 kernel: drbd0: conn( Unconnected -> WFConnection )
May  5 15:29:07 dellpe2950-23 kernel: drbd0: Handshake successful: DRBD Network Protocol version 86
May  5 15:29:07 dellpe2950-23 kernel: drbd0: conn( WFConnection -> WFReportParams )
May  5 15:29:07 dellpe2950-23 kernel: drbd0: Starting asender thread (from drbd0_receiver [6321])
May  5 15:29:08 dellpe2950-23 kernel: drbd0: Becoming sync target due to disk states.
May  5 15:29:08 dellpe2950-23 kernel: drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
May  5 15:29:08 dellpe2950-23 kernel: drbd0: Writing meta data super block now.
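
The 'self' and 'peer' UUID lines in the split-brain message above can be
compared slot by slot.  This is only a sketch under two assumptions that
the log itself does not confirm: that the four fields are ordered
current:bitmap:history1:history2, and that the lowest bit of each value is
a flag rather than part of the identifier.  The names compare_uuids and
SLOTS are made up for this example.

```python
# UUID strings copied from the "Split-Brain detected" log lines.
SELF = "99D56CF91187B3F4:8C1668A9CCF498F1:150E86C1B532DE51:FBA773E22A805495"
PEER = "C21D5DCBDE372E53:8C1668A9CCF498F0:150E86C1B532DE50:FBA773E22A805495"

# Assumed slot order; not stated anywhere in the log.
SLOTS = ("current", "bitmap", "history1", "history2")


def compare_uuids(self_line, peer_line):
    """Return (slot, exact_match, match_ignoring_low_bit) for each slot."""
    rows = []
    for name, a, b in zip(SLOTS, self_line.split(":"), peer_line.split(":")):
        x, y = int(a, 16), int(b, 16)
        # Setting bit 0 on both sides compares the values without that bit.
        rows.append((name, x == y, (x | 1) == (y | 1)))
    return rows


if __name__ == "__main__":
    for name, exact, masked in compare_uuids(SELF, PEER):
        print(f"{name:9s} exact={exact} ignoring_low_bit={masked}")
```

Under those assumptions, the two nodes share the older slots but each
carries a different current value, which would be consistent with both
sides having written independently after the forced power cycle.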

[root@dellpe2950-23]# cat /etc/drbd.conf
resource drbd-resource-0 {
   protocol C;
   startup {
     degr-wfc-timeout 5;
   }

   net {
     #on-disconnect reconnect;
     after-sb-0pri disconnect;
     after-sb-1pri disconnect;
     max-buffers 4096;
     unplug-watermark 128;
     sndbuf-size 128;
   }

   disk {
     on-io-error detach;
   }

   syncer {
     rate 12M;
     al-extents 577;
   }

   on dellpe2950-22 {
     device /dev/drbd0;
     disk   /dev/sda7; # db partition
     address 10.99.210.33:7789; # Private subnet IP
     meta-disk internal;
   }

   on dellpe2950-23 {
     device /dev/drbd0;
     disk   /dev/sda7;   # db partition
     address 10.99.210.34:7789;  # Private subnet IP
     meta-disk internal;
   }
}
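
As a rough cross-check of the numbers above: the log reports 67774141 bits
out of sync, and the syncer section sets rate 12M.  The arithmetic below is
a sketch that assumes one bitmap bit covers 4 KiB of storage and that 12M
means 12 MiB per second; both are assumptions, and the constant and
function names are invented for the example.

```python
BITS_OUT_OF_SYNC = 67774141          # from the kernel log
BYTES_PER_BIT = 4096                 # assumption: one bit per 4 KiB block
RATE_BYTES_PER_S = 12 * 1024 ** 2    # assumption: "rate 12M" = 12 MiB/s


def out_of_sync_bytes(bits, bytes_per_bit=BYTES_PER_BIT):
    """Amount of data the bitmap marks as out of sync, in bytes."""
    return bits * bytes_per_bit


def full_sync_seconds(bits, rate=RATE_BYTES_PER_S):
    """Lower bound on resync time if the configured rate were sustained."""
    return out_of_sync_bytes(bits) / rate


if __name__ == "__main__":
    gib = out_of_sync_bytes(BITS_OUT_OF_SYNC) / 1024 ** 3
    hours = full_sync_seconds(BITS_OUT_OF_SYNC) / 3600
    print(f"{gib:.1f} GiB out of sync, about {hours:.1f} h at the set rate")
```

With these assumptions the size rounds to the 259 GB shown in the log, and
a full resync at the configured rate would take roughly six hours once it
actually starts.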


