Hi,

Problem: cs:WFReportParams/cs:WFBitMapS status remains after changes in network. (Not syncing)
(More detailed description below)

Environment:

Packages:
drbd8-utils (2:8.3.0-1~bpo50+1)
drbd8-source (2:8.3.0-1~bpo50+1)
drbd8-2.6.26-2-686 (2:8.3.0-1~bpo50+1+2.6.26-15lenny3)

OS:
Debian 5.0.1 (lenny)
Linux ha1 2.6.26-2-686 #1 SMP Thu May 28 15:39:35 UTC 2009 i686 GNU/Linux

/proc/drbd:
ha1:
----------------------------------------
version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by phil at fat-tyre, 2008-12-18 15:26:13
 0: cs:WFReportParams ro:Primary/Unknown ds:UpToDate/DUnknown C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:1 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
----------------------------------------
ha2:
----------------------------------------
version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by phil at fat-tyre, 2008-12-18 15:26:13
 0: cs:WFBitMapS ro:Primary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
----------------------------------------

drbdadm get-gi all
ha1:
----------------------------------------
2022486FE91F8A06:1078AA65592ABF21:F0C8494FC827E7A1:CAB5686435C06CF3:1:1:0:1:0:0
----------------------------------------
ha2:
----------------------------------------
FC9995E47E613507:1078AA65592ABF21:F0C8494FC827E7A1:CAB5686435C06CF3:1:1:1:0:0:0
----------------------------------------

ifconfig eth3
ha1:
----------------------------------------
eth3      Link encap:Ethernet HWaddr 00:04:23:b3:f6:4c
          inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
          inet6 addr: fe80::204:23ff:feb3:f64c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:16110 Metric:1
          RX packets:1186159 errors:0 dropped:5 overruns:0 frame:0
          TX packets:1852294 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1047687951 (999.1 MiB) TX bytes:3681795935 (3.4 GiB)
----------------------------------------
ha2:
----------------------------------------
eth3      Link encap:Ethernet HWaddr 00:04:23:af:27:68
          inet addr:192.168.0.2 Bcast:192.168.0.255 Mask:255.255.255.0
          inet6 addr: fe80::204:23ff:feaf:2768/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:16110 Metric:1
          RX packets:1852133 errors:0 dropped:154 overruns:0 frame:0
          TX packets:1186168 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3681660323 (3.4 GiB) TX bytes:1047695375 (999.1 MiB)
----------------------------------------

Configuration
ha1:
----------------------------------------
global { usage-count no; }
common { syncer { rate 10M; } }
resource shared {
    protocol C;
    handlers {
    }
    startup {
        degr-wfc-timeout 120; # 2 minutes.
        become-primary-on both;
    }
    disk {
        on-io-error pass_on;
    }
    net {
        allow-two-primaries;
        cram-hmac-alg "sha256";
        shared-secret "DRBD2KVM";
        after-sb-0pri discard-least-changes;
        after-sb-1pri call-pri-lost-after-sb;
        after-sb-2pri call-pri-lost-after-sb;
        rr-conflict disconnect;
    }
    syncer {
        rate 10M;
        al-extents 257;
    }
    on ha1 {
        device /dev/drbd0;
        disk /dev/data/shared;
        address 192.168.0.1:7788;
        flexible-meta-disk internal;
    }
    on ha2 {
        device /dev/drbd0;
        disk /dev/data/shared;
        address 192.168.0.2:7788;
        meta-disk internal;
    }
}
----------------------------------------
ha2:
----------------------------------------
global { usage-count no; }
common { syncer { rate 10M; } }
resource shared {
    protocol C;
    handlers {
    }
    startup {
        degr-wfc-timeout 120; # 2 minutes.
        become-primary-on both;
    }
    disk {
        on-io-error pass_on;
    }
    net {
        allow-two-primaries;
        cram-hmac-alg "sha256";
        shared-secret "DRBD2KVM";
        after-sb-0pri discard-least-changes;
        after-sb-1pri call-pri-lost-after-sb;
        after-sb-2pri call-pri-lost-after-sb;
        rr-conflict disconnect;
    }
    syncer {
        rate 10M;
        al-extents 257;
    }
    on ha2 {
        device /dev/drbd0;
        disk /dev/data/shared;
        address 192.168.0.2:7788;
        flexible-meta-disk internal;
    }
    on ha1 {
        device /dev/drbd0;
        disk /dev/data/shared;
        address 192.168.0.1:7788;
        meta-disk internal;
    }
}
----------------------------------------

/var/log/syslog
ha1:
----------------------------------------
Jun 9 14:57:37 ha1 drbd:: START
Jun 9 14:57:58 ha1 kernel: [18213.499269] drbd: initialised. Version: 8.3.0 (api:88/proto:86-89)
Jun 9 14:57:58 ha1 kernel: [18213.499310] drbd: GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by phil at fat-tyre, 2008-12-18 15:26:13
Jun 9 14:57:58 ha1 kernel: [18213.499348] drbd: registered as block device major 147
Jun 9 14:57:58 ha1 kernel: [18213.499369] drbd: minor_table @ 0xf77ff680
Jun 9 14:57:59 ha1 kernel: [18213.516964] drbd0: disk( Diskless -> Attaching )
Jun 9 14:57:59 ha1 kernel: [18213.517006] drbd0: Starting worker thread (from cqueue [4755])
Jun 9 14:57:59 ha1 kernel: [18213.542668] drbd0: Found 6 transactions (34 active extents) in activity log.
Jun 9 14:57:59 ha1 kernel: [18213.542708] drbd0: Method to ensure write ordering: barrier
Jun 9 14:57:59 ha1 kernel: [18213.542733] drbd0: max_segment_size ( = BIO size ) = 32768
Jun 9 14:57:59 ha1 kernel: [18213.542759] drbd0: drbd_bm_resize called with capacity == 516008
Jun 9 14:57:59 ha1 kernel: [18213.542828] drbd0: resync bitmap: bits=64501 words=2016
Jun 9 14:57:59 ha1 kernel: [18213.544257] drbd0: size = 252 MB (258004 KB)
Jun 9 14:57:59 ha1 kernel: [18213.546040] drbd0: recounting of set bits took additional 0 jiffies
Jun 9 14:57:59 ha1 kernel: [18213.546070] drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jun 9 14:57:59 ha1 kernel: [18213.546116] drbd0: disk( Attaching -> UpToDate )
Jun 9 14:57:59 ha1 kernel: [18213.546160] drbd0: Barriers not supported on meta data device - disabling
Jun 9 14:57:59 ha1 kernel: [18213.594392] drbd0: conn( StandAlone -> Unconnected )
Jun 9 14:57:59 ha1 kernel: [18213.594455] drbd0: Starting receiver thread (from drbd0_worker [7981])
Jun 9 14:57:59 ha1 kernel: [18213.594552] drbd0: receiver (re)started
Jun 9 14:57:59 ha1 kernel: [18213.594581] drbd0: conn( Unconnected -> WFConnection )
Jun 9 14:58:05 ha1 kernel: [18220.108066] drbd0: Handshake successful: Agreed network protocol version 89
Jun 9 14:58:05 ha1 kernel: [18220.109359] drbd0: Peer authenticated using 32 bytes of 'sha256' HMAC
Jun 9 14:58:05 ha1 kernel: [18220.109421] drbd0: conn( WFConnection -> WFReportParams )
Jun 9 14:58:05 ha1 kernel: [18220.109529] drbd0: Starting asender thread (from drbd0_receiver [7988])
Jun 9 14:58:05 ha1 kernel: [18220.109816] drbd0: data-integrity-alg: <not-used>
Jun 9 14:58:05 ha1 kernel: [18220.109917] drbd0: drbd_sync_handshake:
Jun 9 14:58:05 ha1 kernel: [18220.109953] drbd0: self CAB5686435C06CF2:C09BFAC5A5B8F389:E3BF213E87EB37B2:695057964073073D
Jun 9 14:58:05 ha1 kernel: [18220.109988] drbd0: peer 192CCD8368AFE826:C09BFAC5A5B8F389:E3BF213E87EB37B2:695057964073073D
Jun 9 14:58:05 ha1 kernel: [18220.110025] drbd0: uuid_compare()=100 by rule 9
Jun 9 14:58:05 ha1 kernel: [18220.110047] drbd0: Split-Brain detected, 0 primaries, automatically solved. Sync from this node
Jun 9 14:58:05 ha1 kernel: [18220.110116] drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Jun 9 14:58:05 ha1 kernel: [18220.122034] drbd0: role( Secondary -> Primary )
Jun 9 14:58:05 ha1 kernel: [18220.159079] drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Jun 9 14:58:05 ha1 kernel: [18220.159143] drbd0: Began resync as SyncSource (will sync 0 KB [0 bits set]).
Jun 9 14:58:05 ha1 kernel: [18220.159175] drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Jun 9 14:58:05 ha1 kernel: [18220.159205] drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Jun 9 14:58:05 ha1 kernel: [18220.159288] drbd0: peer( Secondary -> Primary )
Jun 9 14:58:05 ha1 kernel: [18220.162340] drbd0: drbd_sync_handshake:
Jun 9 14:58:05 ha1 kernel: [18220.162376] drbd0: self CAB5686435C06CF3:0000000000000000:A69BFEC8ACF135DB:C09BFAC5A5B8F389
Jun 9 14:58:05 ha1 kernel: [18220.162415] drbd0: peer CAB5686435C06CF2:0000000000000000:A69BFEC8ACF135DB:C09BFAC5A5B8F389
Jun 9 14:58:05 ha1 kernel: [18220.162451] drbd0: uuid_compare()=0 by rule 4
Jun 9 14:59:03 ha1 drbd:: BLOCK
Jun 9 14:59:06 ha1 kernel: [18280.616168] drbd0: meta connection shut down by peer.
Jun 9 14:59:06 ha1 kernel: [18280.616168] drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jun 9 14:59:06 ha1 kernel: [18280.616168] drbd0: asender terminated
Jun 9 14:59:06 ha1 kernel: [18280.616168] drbd0: Terminating asender thread
Jun 9 14:59:06 ha1 kernel: [18280.616168] drbd0: Creating new current UUID
Jun 9 14:59:06 ha1 kernel: [18280.616168] drbd0: short read expecting header on sock: r=-512
Jun 9 14:59:06 ha1 kernel: [18280.654545] drbd0: Connection closed
Jun 9 14:59:06 ha1 kernel: [18280.654545] drbd0: conn( NetworkFailure -> Unconnected )
Jun 9 14:59:06 ha1 kernel: [18280.654545] drbd0: receiver terminated
Jun 9 14:59:06 ha1 kernel: [18280.654545] drbd0: Restarting receiver thread
Jun 9 14:59:06 ha1 kernel: [18280.654545] drbd0: receiver (re)started
Jun 9 14:59:06 ha1 kernel: [18280.654545] drbd0: conn( Unconnected -> WFConnection )
Jun 9 14:59:31 ha1 drbd:: UNBLOCK
Jun 9 14:59:31 ha1 kernel: [18306.055929] drbd0: Handshake successful: Agreed network protocol version 89
Jun 9 14:59:31 ha1 kernel: [18306.056887] drbd0: Peer authenticated using 32 bytes of 'sha256' HMAC
Jun 9 14:59:31 ha1 kernel: [18306.056934] drbd0: conn( WFConnection -> WFReportParams )
Jun 9 14:59:31 ha1 kernel: [18306.056992] drbd0: Starting asender thread (from drbd0_receiver [7988])
Jun 9 14:59:31 ha1 kernel: [18306.057256] drbd0: data-integrity-alg: <not-used>
Jun 9 14:59:31 ha1 kernel: [18306.057311] drbd0: drbd_sync_handshake:
Jun 9 14:59:31 ha1 kernel: [18306.057336] drbd0: self 1078AA65592ABF21:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
Jun 9 14:59:31 ha1 kernel: [18306.057372] drbd0: peer 06A5AF91DACA3D7F:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
Jun 9 14:59:31 ha1 kernel: [18306.057408] drbd0: uuid_compare()=100 by rule 9
----------------------------------------
ha2:
----------------------------------------
Jun 9 14:57:51 ha2 drbd:: START
Jun 9 14:58:05 ha2 kernel: [18238.993868] drbd: initialised. Version: 8.3.0 (api:88/proto:86-89)
Jun 9 14:58:05 ha2 kernel: [18238.993913] drbd: GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by phil at fat-tyre, 2008-12-18 15:26:13
Jun 9 14:58:05 ha2 kernel: [18238.993952] drbd: registered as block device major 147
Jun 9 14:58:05 ha2 kernel: [18238.993975] drbd: minor_table @ 0xf797a600
Jun 9 14:58:05 ha2 kernel: [18239.012188] drbd0: disk( Diskless -> Attaching )
Jun 9 14:58:05 ha2 kernel: [18239.012229] drbd0: Starting worker thread (from cqueue [4075])
Jun 9 14:58:05 ha2 kernel: [18239.047152] drbd0: Found 6 transactions (19 active extents) in activity log.
Jun 9 14:58:05 ha2 kernel: [18239.047152] drbd0: Method to ensure write ordering: barrier
Jun 9 14:58:05 ha2 kernel: [18239.047152] drbd0: max_segment_size ( = BIO size ) = 32768
Jun 9 14:58:05 ha2 kernel: [18239.047152] drbd0: drbd_bm_resize called with capacity == 516008
Jun 9 14:58:05 ha2 kernel: [18239.047152] drbd0: resync bitmap: bits=64501 words=2016
Jun 9 14:58:05 ha2 kernel: [18239.047152] drbd0: size = 252 MB (258004 KB)
Jun 9 14:58:05 ha2 kernel: [18239.049133] drbd0: recounting of set bits took additional 0 jiffies
Jun 9 14:58:05 ha2 kernel: [18239.049165] drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jun 9 14:58:05 ha2 kernel: [18239.049210] drbd0: disk( Attaching -> UpToDate )
Jun 9 14:58:05 ha2 kernel: [18239.049255] drbd0: Barriers not supported on meta data device - disabling
Jun 9 14:58:05 ha2 kernel: [18239.079460] drbd0: conn( StandAlone -> Unconnected )
Jun 9 14:58:05 ha2 kernel: [18239.079521] drbd0: Starting receiver thread (from drbd0_worker [7160])
Jun 9 14:58:05 ha2 kernel: [18239.079620] drbd0: receiver (re)started
Jun 9 14:58:05 ha2 kernel: [18239.079649] drbd0: conn( Unconnected -> WFConnection )
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: Handshake successful: Agreed network protocol version 89
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: Peer authenticated using 32 bytes of 'sha256' HMAC
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: conn( WFConnection -> WFReportParams )
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: Starting asender thread (from drbd0_receiver [7167])
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: data-integrity-alg: <not-used>
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: drbd_sync_handshake:
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: self 192CCD8368AFE826:C09BFAC5A5B8F389:E3BF213E87EB37B2:695057964073073D
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: peer CAB5686435C06CF2:C09BFAC5A5B8F389:E3BF213E87EB37B2:695057964073073D
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: uuid_compare()=100 by rule 9
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: Split-Brain detected, 0 primaries, automatically solved. Sync from peer node
Jun 9 14:58:05 ha2 kernel: [18239.180299] drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Jun 9 14:58:05 ha2 kernel: [18239.184236] drbd0: conn( WFBitMapT -> WFSyncUUID )
Jun 9 14:58:05 ha2 kernel: [18239.193997] drbd0: peer( Secondary -> Primary )
Jun 9 14:58:05 ha2 kernel: [18239.231620] drbd0: role( Secondary -> Primary )
Jun 9 14:58:05 ha2 kernel: [18239.234903] drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Jun 9 14:58:05 ha2 kernel: [18239.240198] drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Jun 9 14:58:05 ha2 kernel: [18239.240256] drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Jun 9 14:58:05 ha2 kernel: [18239.240298] drbd0: Began resync as SyncTarget (will sync 0 KB [0 bits set]).
Jun 9 14:58:05 ha2 kernel: [18239.240326] drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Jun 9 14:58:05 ha2 kernel: [18239.240352] drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Jun 9 14:58:05 ha2 kernel: [18239.240415] drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
Jun 9 14:58:05 ha2 kernel: [18239.247101] drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
Jun 9 14:59:01 ha2 drbd:: BLOCK
Jun 9 14:59:06 ha2 kernel: [18299.688034] drbd0: PingAck did not arrive in time.
Jun 9 14:59:06 ha2 kernel: [18299.688076] drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jun 9 14:59:06 ha2 kernel: [18299.688129] drbd0: asender terminated
Jun 9 14:59:06 ha2 kernel: [18299.688148] drbd0: Terminating asender thread
Jun 9 14:59:06 ha2 kernel: [18299.688212] drbd0: Creating new current UUID
Jun 9 14:59:06 ha2 kernel: [18299.688319] drbd0: short read expecting header on sock: r=-512
Jun 9 14:59:06 ha2 kernel: [18299.706987] drbd0: Connection closed
Jun 9 14:59:06 ha2 kernel: [18299.706987] drbd0: conn( NetworkFailure -> Unconnected )
Jun 9 14:59:06 ha2 kernel: [18299.706987] drbd0: receiver terminated
Jun 9 14:59:06 ha2 kernel: [18299.706987] drbd0: Restarting receiver thread
Jun 9 14:59:06 ha2 kernel: [18299.706987] drbd0: receiver (re)started
Jun 9 14:59:06 ha2 kernel: [18299.706987] drbd0: conn( Unconnected -> WFConnection )
Jun 9 14:59:30 ha2 drbd:: UNBLOCK
Jun 9 14:59:31 ha2 kernel: [18325.128067] drbd0: Handshake successful: Agreed network protocol version 89
Jun 9 14:59:31 ha2 kernel: [18325.128917] drbd0: Peer authenticated using 32 bytes of 'sha256' HMAC
Jun 9 14:59:31 ha2 kernel: [18325.128979] drbd0: conn( WFConnection -> WFReportParams )
Jun 9 14:59:31 ha2 kernel: [18325.129035] drbd0: Starting asender thread (from drbd0_receiver [7167])
Jun 9 14:59:31 ha2 kernel: [18325.129570] drbd0: data-integrity-alg: <not-used>
Jun 9 14:59:31 ha2 kernel: [18325.129642] drbd0: drbd_sync_handshake:
Jun 9 14:59:31 ha2 kernel: [18325.129667] drbd0: self 06A5AF91DACA3D7F:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
Jun 9 14:59:31 ha2 kernel: [18325.129717] drbd0: peer 1078AA65592ABF21:CAB5686435C06CF3:A69BFEC8ACF135DB:C09BFAC5A5B8F389
Jun 9 14:59:31 ha2 kernel: [18325.129752] drbd0: uuid_compare()=100 by rule 9
Jun 9 14:59:31 ha2 kernel: [18325.129774] drbd0: Split-Brain detected, 2 primaries, automatically solved. Sync from this node
Jun 9 14:59:31 ha2 kernel: [18325.129845] drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
----------------------------------------

How to reproduce it:
- both nodes: logger -t drbd: START
- both nodes: /etc/init.d/drbd start
  (perfect sync; both primary)
- both nodes: logger -t drbd: BLOCK
- ha1: /sbin/iptables -I OUTPUT -p tcp --dport 7788 -j REJECT
       /sbin/iptables -I INPUT -p tcp --dport 7788 -j REJECT
- wait until both nodes are in
  "0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r---"
- both nodes: logger -t drbd: UNBLOCK
- ha1: /sbin/iptables -D OUTPUT -p tcp --dport 7788 -j REJECT
       /sbin/iptables -D INPUT -p tcp --dport 7788 -j REJECT

This is on a test network, but I don't understand why drbd won't fully sync.
It's a p2p connection between the hosts (cross-cable) with a 1Gbit network
card and an MTU of 16110 (I got the same problem with the MTU at 1500).

What do I need to do/change to get the sync to complete automatically?

Thank you very much for your help!

Kind regards,

Reinier Haasjes
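
PS: The "wait until both nodes are in cs:WFConnection" step in the reproduction can be scripted instead of watched by hand. Below is a minimal sketch, assuming a POSIX shell with awk and that /proc/drbd keeps the 8.3 layout quoted in this message (a line such as " 0: cs:WFConnection ro:..."). The helper names drbd_cs and wait_for_cs are made up for illustration; they are not part of drbd8-utils.

```shell
# Print the connection state (the value after "cs:") of one DRBD minor.
# $1 = minor number, $2 = status file (defaults to /proc/drbd).
# The body runs in a subshell, so the variables stay local to the function.
drbd_cs() (
    minor=$1
    file=${2:-/proc/drbd}
    awk -v m="$minor:" '
        $1 == m {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^cs:/) {
                    sub(/^cs:/, "", $i)
                    print $i
                    exit
                }
            }
        }' "$file"
)

# Poll once per second until the minor reports the wanted state.
# $1 = minor, $2 = wanted state, $3 = timeout in seconds (default 60),
# $4 = status file (defaults to /proc/drbd).
# Exit status 0 when the state was seen, 1 on timeout.
wait_for_cs() (
    minor=$1
    want=$2
    timeout=${3:-60}
    file=${4:-/proc/drbd}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # "exit" only leaves this subshell, i.e. the function.
        [ "$(drbd_cs "$minor" "$file")" = "$want" ] && exit 0
        sleep 1
        elapsed=$((elapsed + 1))
    done
    exit 1
)

# Example use on each node, between the BLOCK and UNBLOCK steps:
#   wait_for_cs 0 WFConnection 120 && logger -t drbd: UNBLOCK
```

After the UNBLOCK step the same helper can report the stuck state, for example `drbd_cs 0` on ha1 would be expected to print WFReportParams in the situation described above.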