Hi,

Another issue I've come across is that invalidating the secondary node doesn't automatically start a resync. Here is the sequence of commands that triggers this bug:

Proc1:~ # cat /proc/drbd
version: 0.7.4 (api:76/proto:74)
SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:3004 nr:0 dw:3004 dr:4600 al:37 bm:4024 lo:0 pe:0 ua:0 ap:0
Proc1:~ # drbdadm invalidate all
ioctl(,INVALIDATE,) failed: Operation now in progress
Only in 'Connected' cstate possible.
Command '/sbin/drbdsetup /dev/drbd0 invalidate' terminated with exit code 20
drbdsetup exited with code 20
Proc1:~ # cat /proc/drbd
version: 0.7.4 (api:76/proto:74)
SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:3004 nr:0 dw:3004 dr:4600 al:37 bm:4024 lo:0 pe:0 ua:0 ap:0

OK, fair enough, we don't want to invalidate the primary side. Try to do the same thing on the other side.

Proc2:~ # cat /proc/drbd
version: 0.7.4 (api:76/proto:74)
SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
 0: cs:Connected st:Secondary/Primary ld:Consistent
    ns:0 nr:3004 dw:3004 dr:0 al:0 bm:4024 lo:0 pe:0 ua:0 ap:0
Proc2:~ # drbdadm invalidate all
Proc2:~ # cat /proc/drbd
version: 0.7.4 (api:76/proto:74)
SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
 0: cs:WFBitMapT st:Secondary/Primary ld:Inconsistent
    ns:0 nr:3004 dw:3004 dr:0 al:0 bm:8048 lo:0 pe:0 ua:0 ap:0

Proc2: /var/log/messages
Nov 26 18:02:15 Proc2 kernel: drbd0: drbdsetup [3415]: cstate Connected --> WFBitMapT
Nov 26 18:02:16 Proc2 kernel: drbd0: 65928176 KB now marked out-of-sync by on disk bit-map.

Looks OK, the secondary has initiated the sync. Now let's look at the primary side.

Proc1:~ # cat /proc/drbd
version: 0.7.4 (api:76/proto:74)
SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:3004 nr:0 dw:3004 dr:4600 al:37 bm:4024 lo:0 pe:0 ua:0 ap:0

Proc1: /var/log/messages
(contains nothing new)

This is where the problem starts. Even though the nodes are connected, the primary node has no idea that the secondary node wants to synchronize.

Proc1:~ # drbdadm connect all
Proc1:~ # cat /proc/drbd
version: 0.7.4 (api:76/proto:74)
SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
 0: cs:SyncSource st:Primary/Secondary ld:Consistent
    ns:11100 nr:0 dw:3004 dr:19960 al:37 bm:4024 lo:0 pe:42 ua:1065 ap:0
        [>...................] sync'ed: 0.1% (64372/64382)M
        finish: 1:38:05 speed: 10,936 (10,936) K/sec

Running "drbdadm connect" on the already connected primary side seems to help drbd understand that the other side is waiting for the sync. Once the sync has started, it completes just fine.

Nov 26 18:05:02 Proc1 kernel: drbd0: drbd0_receiver [1099]: cstate Connected --> BrokenPipe
Nov 26 18:05:02 Proc1 kernel: drbd0: short read expecting header on sock: r=-512
Nov 26 18:05:02 Proc1 kernel: drbd0: worker terminated
Nov 26 18:05:02 Proc1 kernel: drbd0: asender terminated
Nov 26 18:05:02 Proc1 kernel: drbd0: drbd0_receiver [1099]: cstate BrokenPipe --> StandAlone
Nov 26 18:05:02 Proc1 kernel: drbd0: Connection lost.
Nov 26 18:05:02 Proc1 kernel: drbd0: receiver terminated
Nov 26 18:05:02 Proc1 kernel: drbd0: drbdsetup [3820]: cstate StandAlone --> Unconnected
Nov 26 18:05:02 Proc1 kernel: drbd0: drbd0_receiver [3822]: cstate Unconnected --> WFConnection
Nov 26 18:05:02 Proc1 kernel: drbd0: drbd0_receiver [3822]: cstate WFConnection --> WFReportParams
Nov 26 18:05:02 Proc1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Nov 26 18:05:02 Proc1 kernel: drbd0: Connection established.
Nov 26 18:05:02 Proc1 kernel: drbd0: I am(P): 1:00000007:00000001:00000021:0000000c:10
Nov 26 18:05:02 Proc1 kernel: drbd0: Peer(S): 0:00000007:00000001:00000020:0000000c:01
Nov 26 18:05:02 Proc1 kernel: drbd0: drbd0_receiver [3822]: cstate WFReportParams --> WFBitMapS
Nov 26 18:05:02 Proc1 kernel: drbd0: Primary/Unknown --> Primary/Secondary
Nov 26 18:05:03 Proc1 kernel: drbd0: drbd0_receiver [3822]: cstate WFBitMapS --> SyncSource
Nov 26 18:05:03 Proc1 kernel: drbd0: Resync started as SyncSource (need to sync 65928176 KB [16482044 bits set]).

Proc2:~ # cat /proc/drbd
version: 0.7.4 (api:76/proto:74)
SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
 0: cs:SyncTarget st:Secondary/Primary ld:Inconsistent
    ns:0 nr:753244 dw:753244 dr:0 al:0 bm:8093 lo:0 pe:1368 ua:5 ap:0
        [>...................] sync'ed: 1.2% (63650/64382)M
        finish: 1:28:40 speed: 12,224 (10,872) K/sec

Nov 26 18:02:15 Proc2 kernel: drbd0: drbdsetup [3415]: cstate Connected --> WFBitMapT
Nov 26 18:02:16 Proc2 kernel: drbd0: 65928176 KB now marked out-of-sync by on disk bit-map.
Nov 26 18:05:02 Proc2 kernel: drbd0: sock was shut down by peer
Nov 26 18:05:02 Proc2 kernel: drbd0: drbd0_receiver [1111]: cstate WFBitMapT --> BrokenPipe
Nov 26 18:05:02 Proc2 kernel: drbd0: short read expecting header on sock: r=0
Nov 26 18:05:02 Proc2 kernel: drbd0: meta connection shut down by peer.
Nov 26 18:05:02 Proc2 kernel: drbd0: asender terminated
Nov 26 18:05:02 Proc2 kernel: drbd0: worker terminated
Nov 26 18:05:02 Proc2 kernel: drbd0: drbd0_receiver [1111]: cstate BrokenPipe --> Unconnected
Nov 26 18:05:02 Proc2 kernel: drbd0: Connection lost.
Nov 26 18:05:02 Proc2 kernel: drbd0: drbd0_receiver [1111]: cstate Unconnected --> WFConnection
Nov 26 18:05:02 Proc2 kernel: drbd0: drbd0_receiver [1111]: cstate WFConnection --> WFReportParams
Nov 26 18:05:02 Proc2 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Nov 26 18:05:02 Proc2 kernel: drbd0: Connection established.
Nov 26 18:05:02 Proc2 kernel: drbd0: I am(S): 0:00000007:00000001:00000020:0000000c:01
Nov 26 18:05:02 Proc2 kernel: drbd0: Peer(P): 1:00000007:00000001:00000021:0000000c:10
Nov 26 18:05:02 Proc2 kernel: drbd0: drbd0_receiver [1111]: cstate WFReportParams --> WFBitMapT
Nov 26 18:05:02 Proc2 kernel: drbd0: Secondary/Unknown --> Secondary/Primary
Nov 26 18:05:03 Proc2 kernel: drbd0: drbd0_receiver [1111]: cstate WFBitMapT --> SyncTarget
Nov 26 18:05:03 Proc2 kernel: drbd0: Resync started as SyncTarget (need to sync 65928176 KB [16482044 bits set]).

Nov 26 19:46:19 Proc1 kernel: drbd0: Resync done (total 6076 sec; paused 0 sec; 10848 K/sec)
Nov 26 19:46:19 Proc1 kernel: drbd0: drbd0_worker [3821]: cstate SyncSource --> Connected
Nov 26 19:46:19 Proc2 kernel: drbd0: Resync done (total 6076 sec; paused 0 sec; 10848 K/sec)
Nov 26 19:46:19 Proc2 kernel: drbd0: drbd0_worker [3432]: cstate SyncTarget --> Connected

/Per
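
For anyone who wants to check for this state without eyeballing both nodes, here is a rough Python sketch. It only assumes the cs:/st:/ld: line layout of the /proc/drbd output quoted in this mail; the function names (parse_status, resync_is_stalled) are made up for illustration and are not part of drbd or its tools. How the peer's /proc/drbd text gets fetched (ssh or otherwise) is left out. The sketch flags the situation described in this report: the local node still says Connected while the peer is already waiting in WFBitMapT. It also checks the arithmetic in the logs: 16482044 bits at 4 KB per bit is exactly the 65928176 KB reported; the 4 KB granularity is inferred from those two numbers, not taken from the drbd source.

```python
import re

# Matches the per-device status line as quoted above, e.g.
#  " 0: cs:Connected st:Primary/Secondary ld:Consistent"
STATUS_RE = re.compile(r"cs:(?P<cs>\S+)\s+st:(?P<st>\S+)\s+ld:(?P<ld>\S+)")


def parse_status(text):
    """Return the cs/st/ld fields of the first device in /proc/drbd text."""
    match = STATUS_RE.search(text)
    if match is None:
        raise ValueError("no DRBD device status line found")
    return match.groupdict()


def resync_is_stalled(local_text, peer_text):
    """Hypothetical check: peer waits for the bitmap, local node has not noticed."""
    local = parse_status(local_text)
    peer = parse_status(peer_text)
    return local["cs"] == "Connected" and peer["cs"] == "WFBitMapT"


# Status lines copied from the sessions above.
PROC1 = " 0: cs:Connected st:Primary/Secondary ld:Consistent\n"
PROC2 = " 0: cs:WFBitMapT st:Secondary/Primary ld:Inconsistent\n"

# Bitmap arithmetic from the "Resync started" log line, assuming 4 KB per bit.
BITS_SET = 16482044
KB_PER_BIT = 4
OUT_OF_SYNC_KB = BITS_SET * KB_PER_BIT

if __name__ == "__main__":
    print("stalled:", resync_is_stalled(PROC1, PROC2))
    print("out of sync (KB):", OUT_OF_SYNC_KB)
```

In the sessions above this check would come out true between 18:02:15 and 18:05:02, and false once "drbdadm connect all" has kicked the primary into SyncSource.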