Hi,
Problem:
After a network interruption the connection state stays at cs:WFReportParams on ha1 and cs:WFBitMapS on ha2, and the nodes never resync. (More detailed description below.)
Environment:
Packages:
drbd8-utils (2:8.3.0-1~bpo50+1)
drbd8-source (2:8.3.0-1~bpo50+1)
drbd8-2.6.26-2-686 (2:8.3.0-1~bpo50+1+2.6.26-15lenny3)
OS:
Debian 5.0.1 (lenny)
Linux ha1 2.6.26-2-686 #1 SMP Thu May 28 15:39:35 UTC 2009 i686 GNU/Linux
/proc/drbd:
ha1:
----------------------------------------
version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by phil at fat-tyre, 2008-12-18 15:26:13
0: cs:WFReportParams ro:Primary/Unknown ds:UpToDate/DUnknown C r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:1 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
----------------------------------------
ha2:
----------------------------------------
version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by phil at fat-tyre, 2008-12-18 15:26:13
0: cs:WFBitMapS ro:Primary/Primary ds:UpToDate/UpToDate C r---
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
----------------------------------------
drbdadm get-gi all
ha1:
----------------------------------------
2022486FE91F8A06:1078AA65592ABF21:F0C8494FC827E7A1:CAB5686435C06CF3:1:1:0:1:0:0
----------------------------------------
ha2:
----------------------------------------
FC9995E47E613507:1078AA65592ABF21:F0C8494FC827E7A1:CAB5686435C06CF3:1:1:1:0:0:0
----------------------------------------
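To make the two tuples above easier to read, here is a small sketch that compares the four UUID fields of two get-gi lines and ignores the trailing flags. The field order (current, bitmap, younger history, older history) is my assumption, taken from the legend that "drbdadm show-gi" prints. The helper names gi_field and compare_gi are made up for this example.

```shell
#!/bin/sh
# Compare the UUID fields of two "drbdadm get-gi" tuples.
# Assumed field order (from the "drbdadm show-gi" legend):
#   1 = current, 2 = bitmap, 3 = younger history, 4 = older history

gi_field() {
    # $1 = get-gi tuple, $2 = field number (1..4)
    echo "$1" | cut -d: -f"$2"
}

compare_gi() {
    # $1 = tuple from the first node, $2 = tuple from the second node
    for n in 1 2 3 4; do
        a=$(gi_field "$1" "$n")
        b=$(gi_field "$2" "$n")
        if [ "$a" = "$b" ]; then
            echo "field $n: same"
        else
            echo "field $n: differ"
        fi
    done
}
```

Called with the ha1 and ha2 tuples above, this reports that only the first field differs. If the assumed field order is right, both nodes have generated their own current UUID on top of the same history.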
ifconfig eth3
ha1:
----------------------------------------
eth3 Link encap:Ethernet HWaddr 00:04:23:b3:f6:4c
inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::204:23ff:feb3:f64c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:16110 Metric:1
RX packets:1186159 errors:0 dropped:5 overruns:0 frame:0
TX packets:1852294 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1047687951 (999.1 MiB) TX bytes:3681795935 (3.4 GiB)
----------------------------------------
ha2:
----------------------------------------
eth3 Link encap:Ethernet HWaddr 00:04:23:af:27:68
inet addr:192.168.0.2 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::204:23ff:feaf:2768/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:16110 Metric:1
RX packets:1852133 errors:0 dropped:154 overruns:0 frame:0
TX packets:1186168 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3681660323 (3.4 GiB) TX bytes:1047695375 (999.1 MiB)
----------------------------------------
Configuration
ha1:
----------------------------------------
global {
usage-count no;
}
common {
syncer { rate 10M; }
}
resource shared {
protocol C;
handlers {
}
startup {
degr-wfc-timeout 120; # 2 minutes.
become-primary-on both;
}
disk {
on-io-error pass_on;
}
net {
allow-two-primaries;
cram-hmac-alg "sha256";
shared-secret "DRBD2KVM";
after-sb-0pri discard-least-changes;
after-sb-1pri call-pri-lost-after-sb;
after-sb-2pri call-pri-lost-after-sb;
rr-conflict disconnect;
}
syncer {
rate 10M;
al-extents 257;
}
on ha1 {
device /dev/drbd0;
disk /dev/data/shared;
address 192.168.0.1:7788;
flexible-meta-disk internal;
}
on ha2 {
device /dev/drbd0;
disk /dev/data/shared;
address 192.168.0.2:7788;
meta-disk internal;
}
}
----------------------------------------
ha2:
----------------------------------------
global {
usage-count no;
}
common {
syncer { rate 10M; }
}
resource shared {
protocol C;
handlers {
}
startup {
degr-wfc-timeout 120; # 2 minutes.
become-primary-on both;
}
disk {
on-io-error pass_on;
}
net {
allow-two-primaries;
cram-hmac-alg "sha256";
shared-secret "DRBD2KVM";
after-sb-0pri discard-least-changes;
after-sb-1pri call-pri-lost-after-sb;
after-sb-2pri call-pri-lost-after-sb;
rr-conflict disconnect;
}
syncer {
rate 10M;
al-extents 257;
}
on ha2 {
device /dev/drbd0;
disk /dev/data/shared;
address 192.168.0.2:7788;
flexible-meta-disk internal;
}
on ha1 {
device /dev/drbd0;
disk /dev/data/shared;
address 192.168.0.1:7788;
meta-disk internal;
}
}
----------------------------------------
/var/log/syslog
ha1:
----------------------------------------
Jun 9 14:57:37 ha1 drbd:: START
Jun 9 14:57:58 ha1 kernel: [18213.499269] drbd: initialised. Version: 8.3.0 (api:88/proto:86-89)
Jun 9 14:57:58 ha1 kernel: [18213.499310] drbd: GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by phil at fat-tyre, 2008-12-18 15:26:13
Jun 9 14:57:58 ha1 kernel: [18213.499348] drbd: registered as block device major 147
Jun 9 14:57:58 ha1 kernel: [18213.499369] drbd: minor_table @ 0xf77ff680
Jun 9 14:57:59 ha1 kernel: [18213.516964] drbd0: disk( Diskless -> Attaching )
Jun 9 14:57:59 ha1 kernel: [18213.517006] drbd0: Starting worker thread (from cqueue [4755])
Jun 9 14:57:59 ha1 kernel: [18213.542668] drbd0: Found 6 transactions (34 active extents) in activity log.
Jun 9 14:57:59 ha1 kernel: [18213.542708] drbd0: Method to ensure write ordering: barrier
Jun 9 14:57:59 ha1 kernel: [18213.542733] drbd0: max_segment_size ( = BIO size ) = 32768
Jun 9 14:57:59 ha1 kernel: [18213.542759] drbd0: drbd_bm_resize called with capacity == 516008
Jun 9 14:57:59 ha1 kernel: [18213.542828] drbd0: resync bitmap: bits=64501 words=2016
Jun 9 14:57:59 ha1 kernel: [18213.544257] drbd0: size = 252 MB (258004 KB)
Jun 9 14:57:59 ha1 kernel: [18213.546040] drbd0: recounting of set bits took additional 0 jiffies
Jun 9 14:57:59 ha1 kernel: [18213.546070] drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jun 9 14:57:59 ha1 kernel: [18213.546116] drbd0: disk( Attaching -> UpToDate )
Jun 9 14:57:59 ha1 kernel: [18213.546160] drbd0: Barriers not supported on meta data device - disabling
Jun 9 14:57:59 ha1 kernel: [18213.594392] drbd0: conn( StandAlone -> Unconnected )
Jun 9 14:57:59 ha1 kernel: [18213.594455] drbd0: Starting receiver thread (from drbd0_worker [7981])
Jun 9 14:57:59 ha1 kernel: [18213.594552] drbd0: receiver (re)started
Jun 9 14:57:59 ha1 kernel: [18213.594581] drbd0: conn( Unconnected -> WFConnection )
Jun 9 14:58:05 ha1 kernel: [18220.108066] drbd0: Handshake successful: Agreed network protocol version 89
Jun 9 14:58:05 ha1 kernel: [18220.109359] drbd0: Peer authenticated using 32 bytes of 'sha256' HMAC
Jun 9 14:58:05 ha1 kernel: [18220.109421] drbd0: conn( WFConnection -> WFReportParams )
Jun 9 14:58:05 ha1 kernel: [18220.109529] drbd0: Starting asender thread (from drbd0_receiver [7988])
Jun 9 14:58:05 ha1 kernel: [18220.109816] drbd0: data-integrity-alg: <not-used>
Jun 9 14:58:05 ha1 kernel: [18220.109917] drbd0: drbd_sync_handshake:
Jun 9 14:58:05 ha1 kernel: [18220.109953] drbd0: self CAB5686435C06CF2:C09BFAC5A5B8F389:E3BF213E87EB37B2:695057964073073D
Jun 9 14:58:05 ha1 kernel: [18220.109988] drbd0: peer 192CCD8368AFE826:C09BFAC5A5B8F389:E3BF213E87EB37B2:695057964073073D
Jun 9 14:58:05 ha1 kernel: [18220.110025] drbd0: uuid_compare()=100 by rule 9
Jun 9 14:58:05 ha1 kernel: [18220.110047] drbd0: Split-Brain detected, 0 primaries, automatically solved. Sync from this node
Jun 9 14:58:05 ha1 kernel: [18220.110116] drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Jun 9 14:58:05 ha1 kernel: [18220.122034] drbd0: role( Secondary -> Primary )
Jun 9 14:58:05 ha1 kernel: [18220.159079] drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Jun 9 14:58:05 ha1 kernel: [18220.159143] drbd0: Began resync as SyncSource (will sync 0 KB [0 bits set]).
Jun 9 14:58:05 ha1 kernel: [18220.159175] drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Jun 9 14:58:05 ha1 kernel: [18220.159205] drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Jun 9 14:58:05 ha1 kernel: [18220.159288] drbd0: peer( Secondary -> Primary )
Jun 9 14:58:05 ha1 kernel: [18220.162340] drbd0: drbd_sync_handshake:
Jun 9 14:58:05 ha1 kernel: [18220.162376] drbd0: self CAB5686435C06CF3:0000000000000000:A69BFEC8ACF135DB:C09BFAC5A5B8F389
Jun 9 14:58:05 ha1 kernel: [18220.162415] drbd0: peer CAB5686435C06CF2:0000000000000000:A69BFEC8ACF135DB:C09BFAC5A5B8F389
Jun 9 14:58:05 ha1 kernel: [18220.162451] drbd0: uuid_compare()=0 by rule 4
Jun 9 14:59:03 ha1 drbd:: BLOCK
Jun 9 14:59:06 ha1 kernel: [18280.616168] drbd0: meta connection shut down by peer.
Jun 9 14:59:06 ha1 kernel: [18280.616168] drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jun 9 14:59:06 ha1 kernel: [18280.616168] drbd0: asender terminated
Jun 9 14:59:06 ha1 kernel: [18280.616168] drbd0: Terminating asender thread
Jun 9 14:59:06 ha1 kernel: [18280.616168] drbd0: Creating new current UUID
Jun 9 14:59:06 ha1 kernel: [18280.616168] drbd0: short read expecting header on sock: r=-512
Jun 9 14:59:06 ha1 kernel: [18280.654545] drbd0: Connection closed
Jun 9 14:59:06 ha1 kernel: [18280.654545] drbd0: conn( NetworkFailure -> Unconnected )
Jun 9 14:59:06 ha1 kernel: [18280.654545] drbd0: receiver terminated
Jun 9 14:59:06 ha1 kernel: [18280.654545] drbd0: Restarting receiver thread
Jun 9 14:59:06 ha1 kernel: [18280.654545] drbd0: receiver (re)started
Jun 9 14:59:06 ha1 kernel: [18280.654545] drbd0: conn( Unconnected -> WFConnection )
Jun 9 14:59:31 ha1 drbd:: UNBLOCK
Jun 9 14:59:31 ha1 kernel: [18306.055929] drbd0: Handshake successful: Agreed network protocol version 89
Jun 9 14:59:31 ha1 kernel: [18306.056887] drbd0: Peer authenticated using 32 bytes of 'sha256' HMAC
Jun 9 14:59:31 ha1 kernel: [18306.056934] drbd0: conn( WFConnection -> WFReportParams )
Jun 9 14:59:31 ha1 kernel: [18306.056992] drbd0: Starting asender thread (from drbd0_receiver [7988])
Jun 9 14:59:31 ha1 kernel: [18306.057256] drbd0: data-integrity-alg: <not-used>
Jun 9 14:59:31 ha1 kernel: [18306.057311] drbd0: drbd_sync_handshake:
Jun 9 14:59:31 ha1 kernel: [18306.057336] drbd0: self 1078AA65592ABF21:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
Jun 9 14:59:31 ha1 kernel: [18306.057372] drbd0: peer 06A5AF91DACA3D7F:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
Jun 9 14:59:31 ha1 kernel: [18306.057408] drbd0: uuid_compare()=100 by rule 9
----------------------------------------
ha2:
----------------------------------------
Jun 9 14:57:51 ha2 drbd:: START
Jun 9 14:58:05 ha2 kernel: [18238.993868] drbd: initialised. Version: 8.3.0 (api:88/proto:86-89)
Jun 9 14:58:05 ha2 kernel: [18238.993913] drbd: GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by phil at fat-tyre, 2008-12-18 15:26:13
Jun 9 14:58:05 ha2 kernel: [18238.993952] drbd: registered as block device major 147
Jun 9 14:58:05 ha2 kernel: [18238.993975] drbd: minor_table @ 0xf797a600
Jun 9 14:58:05 ha2 kernel: [18239.012188] drbd0: disk( Diskless -> Attaching )
Jun 9 14:58:05 ha2 kernel: [18239.012229] drbd0: Starting worker thread (from cqueue [4075])
Jun 9 14:58:05 ha2 kernel: [18239.047152] drbd0: Found 6 transactions (19 active extents) in activity log.
Jun 9 14:58:05 ha2 kernel: [18239.047152] drbd0: Method to ensure write ordering: barrier
Jun 9 14:58:05 ha2 kernel: [18239.047152] drbd0: max_segment_size ( = BIO size ) = 32768
Jun 9 14:58:05 ha2 kernel: [18239.047152] drbd0: drbd_bm_resize called with capacity == 516008
Jun 9 14:58:05 ha2 kernel: [18239.047152] drbd0: resync bitmap: bits=64501 words=2016
Jun 9 14:58:05 ha2 kernel: [18239.047152] drbd0: size = 252 MB (258004 KB)
Jun 9 14:58:05 ha2 kernel: [18239.049133] drbd0: recounting of set bits took additional 0 jiffies
Jun 9 14:58:05 ha2 kernel: [18239.049165] drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jun 9 14:58:05 ha2 kernel: [18239.049210] drbd0: disk( Attaching -> UpToDate )
Jun 9 14:58:05 ha2 kernel: [18239.049255] drbd0: Barriers not supported on meta data device - disabling
Jun 9 14:58:05 ha2 kernel: [18239.079460] drbd0: conn( StandAlone -> Unconnected )
Jun 9 14:58:05 ha2 kernel: [18239.079521] drbd0: Starting receiver thread (from drbd0_worker [7160])
Jun 9 14:58:05 ha2 kernel: [18239.079620] drbd0: receiver (re)started
Jun 9 14:58:05 ha2 kernel: [18239.079649] drbd0: conn( Unconnected -> WFConnection )
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: Handshake successful: Agreed network protocol version 89
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: Peer authenticated using 32 bytes of 'sha256' HMAC
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: conn( WFConnection -> WFReportParams )
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: Starting asender thread (from drbd0_receiver [7167])
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: data-integrity-alg: <not-used>
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: drbd_sync_handshake:
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: self 192CCD8368AFE826:C09BFAC5A5B8F389:E3BF213E87EB37B2:695057964073073D
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: peer CAB5686435C06CF2:C09BFAC5A5B8F389:E3BF213E87EB37B2:695057964073073D
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: uuid_compare()=100 by rule 9
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: Split-Brain detected, 0 primaries, automatically solved. Sync from peer node
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Jun 9 14:58:05 ha2 kernel: [18239.184236] drbd0: conn( WFBitMapT -> WFSyncUUID )
Jun 9 14:58:05 ha2 kernel: [18239.193997] drbd0: peer( Secondary -> Primary )
Jun 9 14:58:05 ha2 kernel: [18239.231620] drbd0: role( Secondary -> Primary )
Jun 9 14:58:05 ha2 kernel: [18239.234903] drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Jun 9 14:58:05 ha2 kernel: [18239.240198] drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Jun 9 14:58:05 ha2 kernel: [18239.240256] drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Jun 9 14:58:05 ha2 kernel: [18239.240298] drbd0: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
Jun 9 14:58:05 ha2 kernel: [18239.240326] drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Jun 9 14:58:05 ha2 kernel: [18239.240352] drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Jun 9 14:58:05 ha2 kernel: [18239.240415] drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
Jun 9 14:58:05 ha2 kernel: [18239.247101] drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
Jun 9 14:59:01 ha2 drbd:: BLOCK
Jun 9 14:59:06 ha2 kernel: [18299.688034] drbd0: PingAck did not arrive in time.
Jun 9 14:59:06 ha2 kernel: [18299.688076] drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jun 9 14:59:06 ha2 kernel: [18299.688129] drbd0: asender terminated
Jun 9 14:59:06 ha2 kernel: [18299.688148] drbd0: Terminating asender thread
Jun 9 14:59:06 ha2 kernel: [18299.688212] drbd0: Creating new current UUID
Jun 9 14:59:06 ha2 kernel: [18299.688319] drbd0: short read expecting header on sock: r=-512
Jun 9 14:59:06 ha2 kernel: [18299.706987] drbd0: Connection closed
Jun 9 14:59:06 ha2 kernel: [18299.706987] drbd0: conn( NetworkFailure -> Unconnected )
Jun 9 14:59:06 ha2 kernel: [18299.706987] drbd0: receiver terminated
Jun 9 14:59:06 ha2 kernel: [18299.706987] drbd0: Restarting receiver thread
Jun 9 14:59:06 ha2 kernel: [18299.706987] drbd0: receiver (re)started
Jun 9 14:59:06 ha2 kernel: [18299.706987] drbd0: conn( Unconnected -> WFConnection )
Jun 9 14:59:30 ha2 drbd:: UNBLOCK
Jun 9 14:59:31 ha2 kernel: [18325.128067] drbd0: Handshake successful: Agreed network protocol version 89
Jun 9 14:59:31 ha2 kernel: [18325.128917] drbd0: Peer authenticated using 32 bytes of 'sha256' HMAC
Jun 9 14:59:31 ha2 kernel: [18325.128979] drbd0: conn( WFConnection -> WFReportParams )
Jun 9 14:59:31 ha2 kernel: [18325.129035] drbd0: Starting asender thread (from drbd0_receiver [7167])
Jun 9 14:59:31 ha2 kernel: [18325.129570] drbd0: data-integrity-alg: <not-used>
Jun 9 14:59:31 ha2 kernel: [18325.129642] drbd0: drbd_sync_handshake:
Jun 9 14:59:31 ha2 kernel: [18325.129667] drbd0: self 06A5AF91DACA3D7F:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
Jun 9 14:59:31 ha2 kernel: [18325.129717] drbd0: peer 1078AA65592ABF21:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
Jun 9 14:59:31 ha2 kernel: [18325.129752] drbd0: uuid_compare()=100 by rule 9
Jun 9 14:59:31 ha2 kernel: [18325.129774] drbd0: Split-Brain detected, 2 primaries, automatically solved. Sync from this node
Jun 9 14:59:31 ha2 kernel: [18325.129845] drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
----------------------------------------
How to reproduce it:
- both nodes: logger -t drbd: START
- both nodes: /etc/init.d/drbd start (perfect sync; both primary)
- both nodes: logger -t drbd: BLOCK
- ha1: /sbin/iptables -I OUTPUT -p tcp --dport 7788 -j REJECT
       /sbin/iptables -I INPUT -p tcp --dport 7788 -j REJECT
- wait until both nodes are in
"0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r---"
- both nodes: logger -t drbd: UNBLOCK
- ha1: /sbin/iptables -D OUTPUT -p tcp --dport 7788 -j REJECT
       /sbin/iptables -D INPUT -p tcp --dport 7788 -j REJECT
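The ha1 side of the steps above can be sketched as a few shell functions. This is only a rough helper, not something I have run as-is: the function names and the DRY_RUN switch are made up for illustration, and it assumes minor 0 and port 7788 as in the configuration above. The logger call on ha2 still has to be done by hand.

```shell
#!/bin/sh
# Sketch of the ha1-side reproduction steps. Needs root on ha1.
# Set DRY_RUN=1 to print the commands instead of running them.
PORT=7788   # DRBD port taken from the resource configuration

run() {
    # Execute the command, or only print it when DRY_RUN=1.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

conn_state() {
    # Read /proc/drbd-style text on stdin and print the cs: value of minor 0.
    sed -n 's/^ *0: cs:\([A-Za-z]*\).*/\1/p'
}

block_drbd() {
    run logger -t drbd: BLOCK
    run /sbin/iptables -I OUTPUT -p tcp --dport "$PORT" -j REJECT
    run /sbin/iptables -I INPUT -p tcp --dport "$PORT" -j REJECT
}

unblock_drbd() {
    run logger -t drbd: UNBLOCK
    run /sbin/iptables -D OUTPUT -p tcp --dport "$PORT" -j REJECT
    run /sbin/iptables -D INPUT -p tcp --dport "$PORT" -j REJECT
}

wait_for_state() {
    # Poll /proc/drbd once per second until minor 0 reports state $1.
    while [ "$(conn_state < /proc/drbd)" != "$1" ]; do
        sleep 1
    done
}
```

After sourcing the file on ha1, the sequence would be: block_drbd, then wait_for_state WFConnection, then unblock_drbd.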
This is on a test network, but I don't understand why DRBD won't fully sync.
It's a point-to-point connection between the hosts (crossover cable) with 1 Gbit network cards and
an MTU of 16110 (I get the same problem with the MTU at 1500).
What do I need to do or change so that the sync completes automatically?
Thank you very much for your help!
Kind regards,
Reinier Haasjes