[DRBD-user] Slow sync

James Pifer jep at obrien-pifer.com
Wed Jul 28 20:46:53 CEST 2010


On Wed, 2010-07-28 at 17:29 +0200, Frederic Emmelmann wrote:
> Hi,
> 
> Try this on both sides:
> 
> drbdadm adjust data
> 
> also:
> 
> syncer rates are in MB/Sec so 1150Mb = 1.150Gb per sec. this is overhead.
> 
> 
> Greetz
> Frederic
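
To put the figure in the quoted reply into perspective, here is a small unit-conversion sketch. It assumes "MB" means decimal megabytes and takes the 1150 value as quoted above; the function and variable names are only for illustration.

```python
# Assumption: "MB" is read as decimal megabytes, "Mbit" as decimal megabits.
BITS_PER_BYTE = 8


def mbyte_to_mbit(mbyte_per_s):
    """Convert a throughput in MB/s to Mbit/s."""
    return mbyte_per_s * BITS_PER_BYTE


def mbit_to_mbyte(mbit_per_s):
    """Convert a throughput in Mbit/s to MB/s."""
    return mbit_per_s / BITS_PER_BYTE


quoted_rate_mbyte = 1150   # figure from the reply above
gigabit_link_mbit = 1000   # nominal line rate of gigabit Ethernet

needed_mbit = mbyte_to_mbit(quoted_rate_mbyte)   # bandwidth the quoted rate would need
link_mbyte = mbit_to_mbyte(gigabit_link_mbit)    # theoretical ceiling of the link

print(f"{quoted_rate_mbyte} MB/s needs {needed_mbit} Mbit/s on the wire")
print(f"a gigabit link carries at most {link_mbyte:.0f} MB/s")
print(f"quoted rate is {quoted_rate_mbyte / link_mbyte:.1f}x the link ceiling")
```

Under these assumptions the quoted rate is several times what a single gigabit link can carry, so the link, not the syncer setting, would be the limit.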

The problem was with the NICs I was using for the direct connection.
I ran iperf and found that the link between them was really poor. I switched
to the internal NICs, and they connect at gigabit speed.

I got everything installed, switched to dual-primary mode, and installed
OCFS2 like I wanted. Everything works great until I reboot.

When DRBD starts up after the reboot, it goes into split brain. Here are
the logs from both servers:

host1:
Jul 28 14:36:58 xenhost1 kernel: [ 1064.135566] drbd: initialized. Version: 8.3.7 (api:88/proto:86-91)
Jul 28 14:36:58 xenhost1 kernel: [ 1064.135570] drbd: GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by phil at fat-tyre, 2010-01-13 17:17:27
Jul 28 14:36:58 xenhost1 kernel: [ 1064.135574] drbd: registered as block device major 147
Jul 28 14:36:58 xenhost1 kernel: [ 1064.135577] drbd: minor_table @ 0xffff88007dfa10c0
Jul 28 14:37:29 xenhost1 kernel: [ 1095.791798] block drbd1: Starting worker thread (from cqueue [5498])
Jul 28 14:37:29 xenhost1 kernel: [ 1095.426400] block drbd1: disk( Diskless -> Attaching ) 
Jul 28 14:37:29 xenhost1 kernel: [ 1095.900498] block drbd1: Found 57 transactions (3080 active extents) in activity log.
Jul 28 14:37:29 xenhost1 kernel: [ 1095.900503] block drbd1: Method to ensure write ordering: drain
Jul 28 14:37:29 xenhost1 kernel: [ 1095.900508] block drbd1: max_segment_size ( = BIO size ) = 32768
Jul 28 14:37:29 xenhost1 kernel: [ 1095.900513] block drbd1: drbd_bm_resize called with capacity == 650082440
Jul 28 14:37:29 xenhost1 kernel: [ 1095.902473] block drbd1: resync bitmap: bits=81260305 words=1269693
Jul 28 14:37:29 xenhost1 kernel: [ 1095.902480] block drbd1: size = 310 GB (325041220 KB)
Jul 28 14:37:29 xenhost1 kernel: [ 1095.951341] block drbd1: recounting of set bits took additional 3 jiffies
Jul 28 14:37:29 xenhost1 kernel: [ 1095.951348] block drbd1: 12 GB (3110912 bits) marked out-of-sync by on disk bit-map.
Jul 28 14:37:29 xenhost1 kernel: [ 1095.951357] block drbd1: disk( Attaching -> UpToDate ) 
Jul 28 14:37:29 xenhost1 kernel: [ 1095.199721] block drbd1: conn( StandAlone -> Unconnected ) 
Jul 28 14:37:29 xenhost1 kernel: [ 1095.985824] block drbd1: Starting receiver thread (from drbd1_worker [6086])
Jul 28 14:37:29 xenhost1 kernel: [ 1095.199816] block drbd1: receiver (re)started
Jul 28 14:37:29 xenhost1 kernel: [ 1095.199823] block drbd1: conn( Unconnected -> WFConnection ) 
Jul 28 14:37:30 xenhost1 kernel: [ 1095.299912] block drbd1: Handshake successful: Agreed network protocol version 91
Jul 28 14:37:30 xenhost1 kernel: [ 1095.299921] block drbd1: conn( WFConnection -> WFReportParams ) 
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300029] block drbd1: Starting asender thread (from drbd1_receiver [6099])
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300259] block drbd1: data-integrity-alg: <not-used>
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300274] block drbd1: drbd_sync_handshake:
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300279] block drbd1: self D57628D842FD0424:C41E460BB976C3AB:A26D1EC8FBF252BC:AE658353ED7587BF bits:3110912 flags:0
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300285] block drbd1: peer F9387700F1203DA8:C41E460BB976C3AB:A26D1EC8FBF252BD:AE658353ED7587BF bits:3072 flags:2
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300290] block drbd1: uuid_compare()=100 by rule 90
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300293] block drbd1: Split-Brain detected, dropping connection!
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300299] block drbd1: helper command: /sbin/drbdadm split-brain minor-1
Jul 28 14:37:30 xenhost1 kernel: [ 1095.305528] block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
Jul 28 14:37:30 xenhost1 kernel: [ 1095.305536] block drbd1: conn( WFReportParams -> Disconnecting ) 
Jul 28 14:37:30 xenhost1 kernel: [ 1095.305549] block drbd1: error receiving ReportState, l: 4!
Jul 28 14:37:30 xenhost1 kernel: [ 1095.785351] block drbd1: asender terminated
Jul 28 14:37:30 xenhost1 kernel: [ 1095.785362] block drbd1: Terminating asender thread
Jul 28 14:37:30 xenhost1 kernel: [ 1095.305905] block drbd1: Connection closed
Jul 28 14:37:30 xenhost1 kernel: [ 1095.305972] block drbd1: conn( Disconnecting -> StandAlone ) 
Jul 28 14:37:30 xenhost1 kernel: [ 1095.306016] block drbd1: receiver terminated
Jul 28 14:37:30 xenhost1 kernel: [ 1095.306020] block drbd1: Terminating receiver thread
Jul 28 14:37:30 xenhost1 kernel: [ 1096.007604] block drbd1: role( Secondary -> Primary ) 


host2:
Jul 28 14:37:34 xenhost2 kernel: [  942.863857] drbd: initialized. Version: 8.3.7 (api:88/proto:86-91)
Jul 28 14:37:34 xenhost2 kernel: [  942.863861] drbd: GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by phil at fat-tyre, 2010-01-13 17:17:27
Jul 28 14:37:34 xenhost2 kernel: [  942.863865] drbd: registered as block device major 147
Jul 28 14:37:34 xenhost2 kernel: [  942.863868] drbd: minor_table @ 0xffff88007d2d3280
Jul 28 14:37:34 xenhost2 kernel: [  943.307311] block drbd1: Starting worker thread (from cqueue [4481])
Jul 28 14:37:34 xenhost2 kernel: [  943.050659] block drbd1: disk( Diskless -> Attaching ) 
Jul 28 14:37:34 xenhost2 kernel: klogd 1.4.1, ---------- state change ----------
Jul 28 14:37:35 xenhost2 kernel: [  943.081457] block drbd1: Found 3 transactions (3 active extents) in activity log.
Jul 28 14:37:35 xenhost2 kernel: [  943.081462] block drbd1: Method to ensure write ordering: drain
Jul 28 14:37:35 xenhost2 kernel: [  943.081467] block drbd1: max_segment_size ( = BIO size ) = 32768
Jul 28 14:37:35 xenhost2 kernel: [  943.081471] block drbd1: drbd_bm_resize called with capacity == 650082440
Jul 28 14:37:35 xenhost2 kernel: [  943.083402] block drbd1: resync bitmap: bits=81260305 words=1269693
Jul 28 14:37:35 xenhost2 kernel: [  943.083408] block drbd1: size = 310 GB (325041220 KB)
Jul 28 14:37:35 xenhost2 kernel: [  943.183900] block drbd1: recounting of set bits took additional 3 jiffies
Jul 28 14:37:35 xenhost2 kernel: [  943.183907] block drbd1: 12 MB (3072 bits) marked out-of-sync by on disk bit-map.
Jul 28 14:37:35 xenhost2 kernel: [  943.183915] block drbd1: disk( Attaching -> UpToDate ) 
Jul 28 14:37:35 xenhost2 kernel: [  943.195879] block drbd1: conn( StandAlone -> Unconnected ) 
Jul 28 14:37:35 xenhost2 kernel: [  943.195913] block drbd1: Starting receiver thread (from drbd1_worker [5836])
Jul 28 14:37:35 xenhost2 kernel: [  943.024854] block drbd1: receiver (re)started
Jul 28 14:37:35 xenhost2 kernel: [  943.024861] block drbd1: conn( Unconnected -> WFConnection ) 
Jul 28 14:37:41 xenhost2 kernel: [  949.182843] block drbd1: Handshake successful: Agreed network protocol version 91
Jul 28 14:37:41 xenhost2 kernel: [  949.182853] block drbd1: conn( WFConnection -> WFReportParams ) 
Jul 28 14:37:41 xenhost2 kernel: [  949.182875] block drbd1: Starting asender thread (from drbd1_receiver [5849])
Jul 28 14:37:41 xenhost2 kernel: [  949.183555] block drbd1: data-integrity-alg: <not-used>
Jul 28 14:37:41 xenhost2 kernel: [  949.183574] block drbd1: drbd_sync_handshake:
Jul 28 14:37:41 xenhost2 kernel: [  949.183579] block drbd1: self F9387700F1203DA8:C41E460BB976C3AB:A26D1EC8FBF252BD:AE658353ED7587BF bits:3072 flags:0
Jul 28 14:37:41 xenhost2 kernel: [  949.183584] block drbd1: peer D57628D842FD0424:C41E460BB976C3AB:A26D1EC8FBF252BC:AE658353ED7587BF bits:3110912 flags:2
Jul 28 14:37:41 xenhost2 kernel: [  949.183589] block drbd1: uuid_compare()=100 by rule 90
Jul 28 14:37:41 xenhost2 kernel: [  949.183592] block drbd1: Split-Brain detected, dropping connection!
Jul 28 14:37:41 xenhost2 kernel: [  949.183598] block drbd1: helper command: /sbin/drbdadm split-brain minor-1
Jul 28 14:37:41 xenhost2 kernel: [  949.203957] block drbd1: meta connection shut down by peer.
Jul 28 14:37:41 xenhost2 kernel: [  949.203964] block drbd1: conn( WFReportParams -> NetworkFailure ) 
Jul 28 14:37:41 xenhost2 kernel: [  949.203976] block drbd1: asender terminated
Jul 28 14:37:41 xenhost2 kernel: [  949.203979] block drbd1: Terminating asender thread
Jul 28 14:37:41 xenhost2 kernel: [  949.189312] block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
Jul 28 14:37:41 xenhost2 kernel: [  949.189321] block drbd1: conn( NetworkFailure -> Disconnecting ) 
Jul 28 14:37:41 xenhost2 kernel: [  949.189332] block drbd1: error receiving ReportState, l: 4!
Jul 28 14:37:41 xenhost2 kernel: [  949.189428] block drbd1: Connection closed
Jul 28 14:37:41 xenhost2 kernel: [  949.189438] block drbd1: conn( Disconnecting -> StandAlone ) 
Jul 28 14:37:41 xenhost2 kernel: [  949.189685] block drbd1: receiver terminated
Jul 28 14:37:41 xenhost2 kernel: [  949.189690] block drbd1: Terminating receiver thread
Jul 28 14:37:41 xenhost2 kernel: [  949.197702] block drbd1: role( Secondary -> Primary ) 
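
To make the two drbd_sync_handshake lines easier to compare, here is a rough Python sketch that parses them. It rests on some assumptions: the four UUID fields are read as current:bitmap:history:history, the lowest bit of each UUID is treated as a flag and masked off, and the check is a simplified reading of what "rule 90" seems to report, not DRBD's actual uuid_compare(). The 4 KiB-per-bit figure comes from the log itself (325041220 KB / 81260305 bits). All function names are made up for illustration.

```python
import re

# Handshake lines copied from the xenhost1 log above (timestamps trimmed).
LOG = """\
block drbd1: self D57628D842FD0424:C41E460BB976C3AB:A26D1EC8FBF252BC:AE658353ED7587BF bits:3110912 flags:0
block drbd1: peer F9387700F1203DA8:C41E460BB976C3AB:A26D1EC8FBF252BD:AE658353ED7587BF bits:3072 flags:2
"""

LINE = re.compile(
    r"(self|peer) ([0-9A-F]{16}):([0-9A-F]{16}):([0-9A-F]{16}):([0-9A-F]{16}) bits:(\d+)"
)

KIB_PER_BIT = 4  # 325041220 KB / 81260305 bits, both taken from the log


def parse_handshake(text):
    """Return {'self': {...}, 'peer': {...}} with UUIDs as ints and the bit count."""
    nodes = {}
    for who, current, bitmap, _hist1, _hist2, bits in LINE.findall(text):
        nodes[who] = {
            "current": int(current, 16),
            "bitmap": int(bitmap, 16),
            "bits": int(bits),
        }
    return nodes


def mask(uuid):
    """Drop the lowest bit (assumed to be a flag, not part of the ID)."""
    return uuid & ~1


def looks_like_split_brain(nodes):
    """Simplified check: different current UUIDs, same non-zero bitmap UUID."""
    me, peer = nodes["self"], nodes["peer"]
    diverged = mask(me["current"]) != mask(peer["current"])
    same_ancestor = mask(me["bitmap"]) == mask(peer["bitmap"])
    return diverged and same_ancestor and mask(me["bitmap"]) != 0


def dirty_mib(bits):
    """Out-of-sync data in MiB for a given number of set bitmap bits."""
    return bits * KIB_PER_BIT / 1024


nodes = parse_handshake(LOG)
print("split brain pattern:", looks_like_split_brain(nodes))
for who, info in nodes.items():
    print(f"{who}: {dirty_mib(info['bits']):.0f} MiB marked out-of-sync")
```

Read this way, both nodes share the same bitmap UUID but have different current UUIDs, which fits both sides having written data independently since they were last connected.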

Any suggestions?

Thanks,
James



