On Wed, 2010-07-28 at 17:29 +0200, Frederic Emmelmann wrote:
> Hi,
>
> Try this on both sides:
>
> drbdadm adjust data
>
> also:
>
> syncer rates are in MB/Sec so 1150Mb = 1.150Gb per sec. this is overhead.
>
>
> Greetz
> Frederic

The problem was with the NICs I was using for the direct connection. I ran iperf and found that the link was performing very poorly. I switched to the internal NICs, and they connect at gigabit speed.

I got everything installed, switched to dual primary, and installed ocfs2 as I wanted. Everything works great until I reboot: when drbd starts up, it goes into split brain. Here are the logs from both servers:

host1:
Jul 28 14:36:58 xenhost1 kernel: [ 1064.135566] drbd: initialized. Version: 8.3.7 (api:88/proto:86-91)
Jul 28 14:36:58 xenhost1 kernel: [ 1064.135570] drbd: GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by phil at fat-tyre, 2010-01-13 17:17:27
Jul 28 14:36:58 xenhost1 kernel: [ 1064.135574] drbd: registered as block device major 147
Jul 28 14:36:58 xenhost1 kernel: [ 1064.135577] drbd: minor_table @ 0xffff88007dfa10c0
Jul 28 14:37:29 xenhost1 kernel: [ 1095.791798] block drbd1: Starting worker thread (from cqueue [5498])
Jul 28 14:37:29 xenhost1 kernel: [ 1095.426400] block drbd1: disk( Diskless -> Attaching )
Jul 28 14:37:29 xenhost1 kernel: [ 1095.900498] block drbd1: Found 57 transactions (3080 active extents) in activity log.
Jul 28 14:37:29 xenhost1 kernel: [ 1095.900503] block drbd1: Method to ensure write ordering: drain
Jul 28 14:37:29 xenhost1 kernel: [ 1095.900508] block drbd1: max_segment_size ( = BIO size ) = 32768
Jul 28 14:37:29 xenhost1 kernel: [ 1095.900513] block drbd1: drbd_bm_resize called with capacity == 650082440
Jul 28 14:37:29 xenhost1 kernel: [ 1095.902473] block drbd1: resync bitmap: bits=81260305 words=1269693
Jul 28 14:37:29 xenhost1 kernel: [ 1095.902480] block drbd1: size = 310 GB (325041220 KB)
Jul 28 14:37:29 xenhost1 kernel: [ 1095.951341] block drbd1: recounting of set bits took additional 3 jiffies
Jul 28 14:37:29 xenhost1 kernel: [ 1095.951348] block drbd1: 12 GB (3110912 bits) marked out-of-sync by on disk bit-map.
Jul 28 14:37:29 xenhost1 kernel: [ 1095.951357] block drbd1: disk( Attaching -> UpToDate )
Jul 28 14:37:29 xenhost1 kernel: [ 1095.199721] block drbd1: conn( StandAlone -> Unconnected )
Jul 28 14:37:29 xenhost1 kernel: [ 1095.985824] block drbd1: Starting receiver thread (from drbd1_worker [6086])
Jul 28 14:37:29 xenhost1 kernel: [ 1095.199816] block drbd1: receiver (re)started
Jul 28 14:37:29 xenhost1 kernel: [ 1095.199823] block drbd1: conn( Unconnected -> WFConnection )
Jul 28 14:37:30 xenhost1 kernel: [ 1095.299912] block drbd1: Handshake successful: Agreed network protocol version 91
Jul 28 14:37:30 xenhost1 kernel: [ 1095.299921] block drbd1: conn( WFConnection -> WFReportParams )
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300029] block drbd1: Starting asender thread (from drbd1_receiver [6099])
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300259] block drbd1: data-integrity-alg: <not-used>
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300274] block drbd1: drbd_sync_handshake:
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300279] block drbd1: self D57628D842FD0424:C41E460BB976C3AB:A26D1EC8FBF252BC:AE658353ED7587BF bits:3110912 flags:0
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300285] block drbd1: peer F9387700F1203DA8:C41E460BB976C3AB:A26D1EC8FBF252BD:AE658353ED7587BF bits:3072 flags:2
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300290] block drbd1: uuid_compare()=100 by rule 90
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300293] block drbd1: Split-Brain detected, dropping connection!
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300299] block drbd1: helper command: /sbin/drbdadm split-brain minor-1
Jul 28 14:37:30 xenhost1 kernel: [ 1095.305528] block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
Jul 28 14:37:30 xenhost1 kernel: [ 1095.305536] block drbd1: conn( WFReportParams -> Disconnecting )
Jul 28 14:37:30 xenhost1 kernel: [ 1095.305549] block drbd1: error receiving ReportState, l: 4!
Jul 28 14:37:30 xenhost1 kernel: [ 1095.785351] block drbd1: asender terminated
Jul 28 14:37:30 xenhost1 kernel: [ 1095.785362] block drbd1: Terminating asender thread
Jul 28 14:37:30 xenhost1 kernel: [ 1095.305905] block drbd1: Connection closed
Jul 28 14:37:30 xenhost1 kernel: [ 1095.305972] block drbd1: conn( Disconnecting -> StandAlone )
Jul 28 14:37:30 xenhost1 kernel: [ 1095.306016] block drbd1: receiver terminated
Jul 28 14:37:30 xenhost1 kernel: [ 1095.306020] block drbd1: Terminating receiver thread
Jul 28 14:37:30 xenhost1 kernel: [ 1096.007604] block drbd1: role( Secondary -> Primary )

host2:
Jul 28 14:37:34 xenhost2 kernel: [ 942.863857] drbd: initialized. Version: 8.3.7 (api:88/proto:86-91)
Jul 28 14:37:34 xenhost2 kernel: [ 942.863861] drbd: GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by phil at fat-tyre, 2010-01-13 17:17:27
Jul 28 14:37:34 xenhost2 kernel: [ 942.863865] drbd: registered as block device major 147
Jul 28 14:37:34 xenhost2 kernel: [ 942.863868] drbd: minor_table @ 0xffff88007d2d3280
Jul 28 14:37:34 xenhost2 kernel: [ 943.307311] block drbd1: Starting worker thread (from cqueue [4481])
Jul 28 14:37:34 xenhost2 kernel: [ 943.050659] block drbd1: disk( Diskless -> Attaching )
Jul 28 14:37:34 xenhost2 kernel: klogd 1.4.1, ---------- state change ----------
Jul 28 14:37:35 xenhost2 kernel: [ 943.081457] block drbd1: Found 3 transactions (3 active extents) in activity log.
Jul 28 14:37:35 xenhost2 kernel: [ 943.081462] block drbd1: Method to ensure write ordering: drain
Jul 28 14:37:35 xenhost2 kernel: [ 943.081467] block drbd1: max_segment_size ( = BIO size ) = 32768
Jul 28 14:37:35 xenhost2 kernel: [ 943.081471] block drbd1: drbd_bm_resize called with capacity == 650082440
Jul 28 14:37:35 xenhost2 kernel: [ 943.083402] block drbd1: resync bitmap: bits=81260305 words=1269693
Jul 28 14:37:35 xenhost2 kernel: [ 943.083408] block drbd1: size = 310 GB (325041220 KB)
Jul 28 14:37:35 xenhost2 kernel: [ 943.183900] block drbd1: recounting of set bits took additional 3 jiffies
Jul 28 14:37:35 xenhost2 kernel: [ 943.183907] block drbd1: 12 MB (3072 bits) marked out-of-sync by on disk bit-map.
Jul 28 14:37:35 xenhost2 kernel: [ 943.183915] block drbd1: disk( Attaching -> UpToDate )
Jul 28 14:37:35 xenhost2 kernel: [ 943.195879] block drbd1: conn( StandAlone -> Unconnected )
Jul 28 14:37:35 xenhost2 kernel: [ 943.195913] block drbd1: Starting receiver thread (from drbd1_worker [5836])
Jul 28 14:37:35 xenhost2 kernel: [ 943.024854] block drbd1: receiver (re)started
Jul 28 14:37:35 xenhost2 kernel: [ 943.024861] block drbd1: conn( Unconnected -> WFConnection )
Jul 28 14:37:41 xenhost2 kernel: [ 949.182843] block drbd1: Handshake successful: Agreed network protocol version 91
Jul 28 14:37:41 xenhost2 kernel: [ 949.182853] block drbd1: conn( WFConnection -> WFReportParams )
Jul 28 14:37:41 xenhost2 kernel: [ 949.182875] block drbd1: Starting asender thread (from drbd1_receiver [5849])
Jul 28 14:37:41 xenhost2 kernel: [ 949.183555] block drbd1: data-integrity-alg: <not-used>
Jul 28 14:37:41 xenhost2 kernel: [ 949.183574] block drbd1: drbd_sync_handshake:
Jul 28 14:37:41 xenhost2 kernel: [ 949.183579] block drbd1: self F9387700F1203DA8:C41E460BB976C3AB:A26D1EC8FBF252BD:AE658353ED7587BF bits:3072 flags:0
Jul 28 14:37:41 xenhost2 kernel: [ 949.183584] block drbd1: peer D57628D842FD0424:C41E460BB976C3AB:A26D1EC8FBF252BC:AE658353ED7587BF bits:3110912 flags:2
Jul 28 14:37:41 xenhost2 kernel: [ 949.183589] block drbd1: uuid_compare()=100 by rule 90
Jul 28 14:37:41 xenhost2 kernel: [ 949.183592] block drbd1: Split-Brain detected, dropping connection!
Jul 28 14:37:41 xenhost2 kernel: [ 949.183598] block drbd1: helper command: /sbin/drbdadm split-brain minor-1
Jul 28 14:37:41 xenhost2 kernel: [ 949.203957] block drbd1: meta connection shut down by peer.
Jul 28 14:37:41 xenhost2 kernel: [ 949.203964] block drbd1: conn( WFReportParams -> NetworkFailure )
Jul 28 14:37:41 xenhost2 kernel: [ 949.203976] block drbd1: asender terminated
Jul 28 14:37:41 xenhost2 kernel: [ 949.203979] block drbd1: Terminating asender thread
Jul 28 14:37:41 xenhost2 kernel: [ 949.189312] block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
Jul 28 14:37:41 xenhost2 kernel: [ 949.189321] block drbd1: conn( NetworkFailure -> Disconnecting )
Jul 28 14:37:41 xenhost2 kernel: [ 949.189332] block drbd1: error receiving ReportState, l: 4!
Jul 28 14:37:41 xenhost2 kernel: [ 949.189428] block drbd1: Connection closed
Jul 28 14:37:41 xenhost2 kernel: [ 949.189438] block drbd1: conn( Disconnecting -> StandAlone )
Jul 28 14:37:41 xenhost2 kernel: [ 949.189685] block drbd1: receiver terminated
Jul 28 14:37:41 xenhost2 kernel: [ 949.189690] block drbd1: Terminating receiver thread
Jul 28 14:37:41 xenhost2 kernel: [ 949.197702] block drbd1: role( Secondary -> Primary )

Any suggestions?

Thanks,
James
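
For anyone reading the drbd_sync_handshake lines in logs like these, here is a rough Python sketch that pulls out the self/peer UUID sets and reports which slots differ and how much data each side has marked out-of-sync. Several things are assumptions on my part rather than anything stated in the thread: that the four UUID fields are ordered current:bitmap:history1:history2, that the lowest bit of each UUID is a flag and should be ignored when comparing, and that each bitmap bit covers 4 KB (which at least agrees with the log, since 81260305 bits x 4 KB = 325041220 KB). The function names are made up for illustration.

```python
import re

# Assumed field order for the UUID set printed by drbd_sync_handshake.
SLOTS = ("current", "bitmap", "history1", "history2")

# Matches the "self ..." and "peer ..." lines shown in the logs.
HANDSHAKE = re.compile(
    r"block drbd\d+: (self|peer) ([0-9A-F:]+) bits:(\d+) flags:(\d+)"
)

# Inferred from the log: 81260305 bits * 4 KB = 325041220 KB.
BYTES_PER_BIT = 4096

SAMPLE = """\
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300279] block drbd1: self D57628D842FD0424:C41E460BB976C3AB:A26D1EC8FBF252BC:AE658353ED7587BF bits:3110912 flags:0
Jul 28 14:37:30 xenhost1 kernel: [ 1095.300285] block drbd1: peer F9387700F1203DA8:C41E460BB976C3AB:A26D1EC8FBF252BD:AE658353ED7587BF bits:3072 flags:2
"""


def parse_handshake(log_text):
    """Return {'self': {...}, 'peer': {...}} built from the handshake lines."""
    result = {}
    for who, uuids, bits, flags in HANDSHAKE.findall(log_text):
        entry = dict(zip(SLOTS, (int(u, 16) for u in uuids.split(":"))))
        entry["bits"] = int(bits)
        entry["flags"] = int(flags)
        result[who] = entry
    return result


def differing_slots(info):
    """List UUID slots that differ between self and peer.

    The lowest bit is masked off before comparing (assumed to be a flag bit).
    """
    return [
        slot for slot in SLOTS
        if (info["self"][slot] & ~1) != (info["peer"][slot] & ~1)
    ]


def out_of_sync_mib(bits):
    """Convert a count of out-of-sync bitmap bits to MiB."""
    return bits * BYTES_PER_BIT / 2**20


if __name__ == "__main__":
    info = parse_handshake(SAMPLE)
    print("differing UUID slots:", differing_slots(info))
    for who in ("self", "peer"):
        print(who, "out-of-sync MiB:", out_of_sync_mib(info[who]["bits"]))
```

Run against the two xenhost1 lines above, only the first (assumed "current") slot differs while the second slot is identical on both nodes, and both sides have bits marked out-of-sync. If the assumptions hold, that would be consistent with each node having written data on its own while disconnected, but treat that reading as a guess rather than a diagnosis.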