Hi,
Another issue I've come across is that invalidating the secondary node
doesn't automatically start a resync. Here follows the sequence of
commands that triggers this bug:
Proc1:~ # cat /proc/drbd
version: 0.7.4 (api:76/proto:74)
SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
0: cs:Connected st:Primary/Secondary ld:Consistent
ns:3004 nr:0 dw:3004 dr:4600 al:37 bm:4024 lo:0 pe:0 ua:0 ap:0
Proc1:~ # drbdadm invalidate all
ioctl(,INVALIDATE,) failed: Operation now in progress
Only in 'Connected' cstate possible.
Command '/sbin/drbdsetup /dev/drbd0 invalidate' terminated with exit code 20
drbdsetup exited with code 20
Proc1:~ # cat /proc/drbd
version: 0.7.4 (api:76/proto:74)
SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
0: cs:Connected st:Primary/Secondary ld:Consistent
ns:3004 nr:0 dw:3004 dr:4600 al:37 bm:4024 lo:0 pe:0 ua:0 ap:0
Ok, fair enough, we don't want to invalidate the primary side.
Let's try the same thing on the other side.
Proc2:~ # cat /proc/drbd
version: 0.7.4 (api:76/proto:74)
SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
0: cs:Connected st:Secondary/Primary ld:Consistent
ns:0 nr:3004 dw:3004 dr:0 al:0 bm:4024 lo:0 pe:0 ua:0 ap:0
Proc2:~ # drbdadm invalidate all
Proc2:~ # cat /proc/drbd
version: 0.7.4 (api:76/proto:74)
SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
0: cs:WFBitMapT st:Secondary/Primary ld:Inconsistent
ns:0 nr:3004 dw:3004 dr:0 al:0 bm:8048 lo:0 pe:0 ua:0 ap:0
Proc2: /var/log/messages
Nov 26 18:02:15 Proc2 kernel: drbd0: drbdsetup [3415]: cstate Connected --> WFBitMapT
Nov 26 18:02:16 Proc2 kernel: drbd0: 65928176 KB now marked out-of-sync by on disk bit-map.
Looks ok, the secondary has marked itself Inconsistent and is waiting in
WFBitMapT for the sync to begin. Now let's look at the primary side.
Proc1:~ # cat /proc/drbd
version: 0.7.4 (api:76/proto:74)
SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
0: cs:Connected st:Primary/Secondary ld:Consistent
ns:3004 nr:0 dw:3004 dr:4600 al:37 bm:4024 lo:0 pe:0 ua:0 ap:0
Proc1: /var/log/messages
(contains nothing new)
This is where the problem starts. Even though the nodes are connected,
the primary node has no idea that the secondary node wants to synchronize:
it still reports cs:Connected while the secondary sits in cs:WFBitMapT.
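The mismatch described above could be detected mechanically by comparing the
status lines of both nodes. What follows is only a rough sketch, assuming the
0.7-style /proc/drbd layout shown in this mail; the function names
(parse_proc_drbd, stuck_invalidate) and the sample strings are made up for
illustration and are not part of drbd.

```python
import re

# Matches the per-device status line of a DRBD 0.7 /proc/drbd dump, e.g.
# " 0: cs:Connected st:Primary/Secondary ld:Consistent"
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+st:(?P<st>\S+)\s+ld:(?P<ld>\S+)"
)


def parse_proc_drbd(text):
    """Return {minor: {"cs": ..., "st": ..., "ld": ...}} for each device line."""
    devices = {}
    for line in text.splitlines():
        m = STATUS_RE.match(line)
        if m:
            devices[int(m.group("minor"))] = {
                "cs": m.group("cs"),
                "st": m.group("st"),
                "ld": m.group("ld"),
            }
    return devices


def stuck_invalidate(primary_text, secondary_text):
    """List minors where the secondary waits in WFBitMapT while the
    primary still reports Connected (the situation in this report)."""
    primary = parse_proc_drbd(primary_text)
    secondary = parse_proc_drbd(secondary_text)
    return sorted(
        minor
        for minor, sec in secondary.items()
        if sec["cs"] == "WFBitMapT"
        and primary.get(minor, {}).get("cs") == "Connected"
    )


# Sample input copied from the transcripts in this mail.
PROC1 = """version: 0.7.4 (api:76/proto:74)
SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:3004 nr:0 dw:3004 dr:4600 al:37 bm:4024 lo:0 pe:0 ua:0 ap:0
"""
PROC2 = """version: 0.7.4 (api:76/proto:74)
SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
 0: cs:WFBitMapT st:Secondary/Primary ld:Inconsistent
    ns:0 nr:3004 dw:3004 dr:0 al:0 bm:8048 lo:0 pe:0 ua:0 ap:0
"""

if __name__ == "__main__":
    print("devices stuck after invalidate:", stuck_invalidate(PROC1, PROC2))
```

In practice the two texts would have to be collected from each node
separately (for example over ssh); that part is left out here.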
Proc1:~ # drbdadm connect all
Proc1:~ # cat /proc/drbd
version: 0.7.4 (api:76/proto:74)
SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
0: cs:SyncSource st:Primary/Secondary ld:Consistent
ns:11100 nr:0 dw:3004 dr:19960 al:37 bm:4024 lo:0 pe:42 ua:1065 ap:0
[>...................] sync'ed: 0.1% (64372/64382)M
finish: 1:38:05 speed: 10,936 (10,936) K/sec
Running "drbdadm connect all" on the already connected primary side
seems to help drbd understand that the other side is waiting for the
sync. As the logs below show, this drops and re-establishes the
connection, after which the resync starts. Once the sync is started it
completes just fine.
Nov 26 18:05:02 Proc1 kernel: drbd0: drbd0_receiver [1099]: cstate Connected --> BrokenPipe
Nov 26 18:05:02 Proc1 kernel: drbd0: short read expecting header on sock: r=-512
Nov 26 18:05:02 Proc1 kernel: drbd0: worker terminated
Nov 26 18:05:02 Proc1 kernel: drbd0: asender terminated
Nov 26 18:05:02 Proc1 kernel: drbd0: drbd0_receiver [1099]: cstate BrokenPipe --> StandAlone
Nov 26 18:05:02 Proc1 kernel: drbd0: Connection lost.
Nov 26 18:05:02 Proc1 kernel: drbd0: receiver terminated
Nov 26 18:05:02 Proc1 kernel: drbd0: drbdsetup [3820]: cstate StandAlone --> Unconnected
Nov 26 18:05:02 Proc1 kernel: drbd0: drbd0_receiver [3822]: cstate Unconnected --> WFConnection
Nov 26 18:05:02 Proc1 kernel: drbd0: drbd0_receiver [3822]: cstate WFConnection --> WFReportParams
Nov 26 18:05:02 Proc1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Nov 26 18:05:02 Proc1 kernel: drbd0: Connection established.
Nov 26 18:05:02 Proc1 kernel: drbd0: I am(P): 1:00000007:00000001:00000021:0000000c:10
Nov 26 18:05:02 Proc1 kernel: drbd0: Peer(S): 0:00000007:00000001:00000020:0000000c:01
Nov 26 18:05:02 Proc1 kernel: drbd0: drbd0_receiver [3822]: cstate WFReportParams --> WFBitMapS
Nov 26 18:05:02 Proc1 kernel: drbd0: Primary/Unknown --> Primary/Secondary
Nov 26 18:05:03 Proc1 kernel: drbd0: drbd0_receiver [3822]: cstate WFBitMapS --> SyncSource
Nov 26 18:05:03 Proc1 kernel: drbd0: Resync started as SyncSource (need to sync 65928176 KB [16482044 bits set]).
Proc2:~ # cat /proc/drbd
version: 0.7.4 (api:76/proto:74)
SVN Revision: 1539 build by lmb at chip, 2004-09-14 10:21:07
0: cs:SyncTarget st:Secondary/Primary ld:Inconsistent
ns:0 nr:753244 dw:753244 dr:0 al:0 bm:8093 lo:0 pe:1368 ua:5 ap:0
[>...................] sync'ed: 1.2% (63650/64382)M
finish: 1:28:40 speed: 12,224 (10,872) K/sec
Nov 26 18:02:15 Proc2 kernel: drbd0: drbdsetup [3415]: cstate Connected --> WFBitMapT
Nov 26 18:02:16 Proc2 kernel: drbd0: 65928176 KB now marked out-of-sync by on disk bit-map.
Nov 26 18:05:02 Proc2 kernel: drbd0: sock was shut down by peer
Nov 26 18:05:02 Proc2 kernel: drbd0: drbd0_receiver [1111]: cstate WFBitMapT --> BrokenPipe
Nov 26 18:05:02 Proc2 kernel: drbd0: short read expecting header on sock: r=0
Nov 26 18:05:02 Proc2 kernel: drbd0: meta connection shut down by peer.
Nov 26 18:05:02 Proc2 kernel: drbd0: asender terminated
Nov 26 18:05:02 Proc2 kernel: drbd0: worker terminated
Nov 26 18:05:02 Proc2 kernel: drbd0: drbd0_receiver [1111]: cstate BrokenPipe --> Unconnected
Nov 26 18:05:02 Proc2 kernel: drbd0: Connection lost.
Nov 26 18:05:02 Proc2 kernel: drbd0: drbd0_receiver [1111]: cstate Unconnected --> WFConnection
Nov 26 18:05:02 Proc2 kernel: drbd0: drbd0_receiver [1111]: cstate WFConnection --> WFReportParams
Nov 26 18:05:02 Proc2 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Nov 26 18:05:02 Proc2 kernel: drbd0: Connection established.
Nov 26 18:05:02 Proc2 kernel: drbd0: I am(S): 0:00000007:00000001:00000020:0000000c:01
Nov 26 18:05:02 Proc2 kernel: drbd0: Peer(P): 1:00000007:00000001:00000021:0000000c:10
Nov 26 18:05:02 Proc2 kernel: drbd0: drbd0_receiver [1111]: cstate WFReportParams --> WFBitMapT
Nov 26 18:05:02 Proc2 kernel: drbd0: Secondary/Unknown --> Secondary/Primary
Nov 26 18:05:03 Proc2 kernel: drbd0: drbd0_receiver [1111]: cstate WFBitMapT --> SyncTarget
Nov 26 18:05:03 Proc2 kernel: drbd0: Resync started as SyncTarget (need to sync 65928176 KB [16482044 bits set]).
Nov 26 19:46:19 Proc1 kernel: drbd0: Resync done (total 6076 sec; paused 0 sec; 10848 K/sec)
Nov 26 19:46:19 Proc1 kernel: drbd0: drbd0_worker [3821]: cstate SyncSource --> Connected
Nov 26 19:46:19 Proc2 kernel: drbd0: Resync done (total 6076 sec; paused 0 sec; 10848 K/sec)
Nov 26 19:46:19 Proc2 kernel: drbd0: drbd0_worker [3432]: cstate SyncTarget --> Connected
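To make the state changes in the logs above easier to follow, the cstate
transitions can be pulled out of the syslog text per host. This is just a
sketch, assuming the "cstate OLD --> NEW" message format seen in these logs;
the helper names (cstate_transitions, state_path) are invented for this
example.

```python
import re

# Matches kernel log lines such as:
# "Nov 26 18:05:02 Proc1 kernel: drbd0: drbd0_receiver [1099]: cstate Connected --> BrokenPipe"
CSTATE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) kernel: "
    r"(?P<dev>drbd\d+): .*cstate (?P<old>\w+) --> (?P<new>\w+)"
)


def cstate_transitions(log_text, host):
    """Return [(timestamp, old_state, new_state), ...] for one host."""
    transitions = []
    for line in log_text.splitlines():
        m = CSTATE_RE.match(line.strip())
        if m and m.group("host") == host:
            transitions.append((m.group("ts"), m.group("old"), m.group("new")))
    return transitions


def state_path(transitions):
    """Collapse a transition list into the sequence of states visited."""
    if not transitions:
        return []
    return [transitions[0][1]] + [new for _, _, new in transitions]


# A few lines copied from the Proc1 log in this mail.
SAMPLE_LOG = """\
Nov 26 18:05:02 Proc1 kernel: drbd0: drbd0_receiver [1099]: cstate Connected --> BrokenPipe
Nov 26 18:05:02 Proc1 kernel: drbd0: short read expecting header on sock: r=-512
Nov 26 18:05:02 Proc1 kernel: drbd0: drbd0_receiver [1099]: cstate BrokenPipe --> StandAlone
Nov 26 18:05:02 Proc1 kernel: drbd0: drbdsetup [3820]: cstate StandAlone --> Unconnected
"""

if __name__ == "__main__":
    print(" -> ".join(state_path(cstate_transitions(SAMPLE_LOG, "Proc1"))))
```

Fed the full Proc1 and Proc2 logs, this should give one state sequence per
node, which makes it easier to see that the primary only leaves Connected
after the manual "drbdadm connect".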
/Per