[DRBD-user] secondary sync behaviour after drbd stop and start

Gianluca Cecchi gianluca.cecchi at gmail.com
Tue Jun 21 17:55:24 CEST 2011



Hello,
I am using DRBD 8.3.8 on CentOS 5.6, as provided by the extras repo.
I am testing the case where the secondary is taken down and some of
its data is modified before it is restarted
(or where you want to verify that, if some problem or bug has modified
the data, you get back to a safe state when you restart the secondary).

The servers are drbdsrv1 and drbdsrv2, in Primary/Secondary with drbdsrv1
as Primary, and the status is:
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----

Resource r0 is backed by /dev/vdb1 on both nodes, with an ext3 file system on top of drbd0.

1) mount drbd0 on the primary and write a test file
[root@drbdsrv1 ~]# mount /dev/drbd0 /mnt/
[root@drbdsrv1 ~]# dd if=/dev/zero of=/mnt/testfile bs=1024k count=2048

The secondary receives the 2 GB over the wire and stays up to date.

2) take down the secondary

[root@drbdsrv2 ~]# service drbd stop

3) mount the backing device on the secondary
[root@drbdsrv2 ~]# mount /dev/vdb1 /mnt
[root@drbdsrv2 /]# ll /mnt
total 2099220
drwx------ 2 root root      16384 Jun 21 12:37 lost+found
-rw-r--r-- 1 root root 2147483648 Jun 21 12:40 testfile

4) remove testfile on the secondary
[root@drbdsrv2 /]# rm /mnt/testfile
rm: remove regular file `/mnt/testfile'? y
[root@drbdsrv2 /]# ll /mnt
total 16
drwx------ 2 root root 16384 Jun 21 12:37 lost+found

5) restart drbd on the secondary
[root@drbdsrv2 /]# service drbd start
Starting DRBD resources: [
r0
Found valid meta data in the expected location, 4294914048 bytes into /dev/vdb1.
d(r0) s(r0) n(r0) ].

Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: Handshake successful:
Agreed network protocol version 94
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: conn( WFConnection ->
WFReportParams )
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: Starting asender thread
(from drbd0_receiver [2240])
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: data-integrity-alg: <not-used>
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: drbd_sync_handshake:
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: self
4D7B0F99C46751ED:3F596D601C18DA13:97CF1359A37B373C:598B9390C1D0D38D
bits:0 flags:0
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: peer
3F596D601C18DA12:0000000000000000:97CF1359A37B373C:598B9390C1D0D38D
bits:0 flags:0
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: uuid_compare()=1 by rule 70
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: peer( Unknown ->
Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown ->
UpToDate )
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: conn( WFBitMapS ->
SyncSource ) pdsk( UpToDate -> Inconsistent )
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: Began resync as
SyncSource (will sync 0 KB [0 bits set]).
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: Resync done (total 1
sec; paused 0 sec; 0 K/sec)
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: conn( SyncSource ->
Connected ) pdsk( Inconsistent -> UpToDate )

But if I repeat 2) and 3) on the secondary, testfile is not there at all.
Is this the expected behaviour, with no check of the data at all at
connection time?
Is there any config parameter to force a validation?

To get the file back and restore consistency, I can
put verify-alg in drbd.conf and run drbdadm adjust on both nodes.
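
As an illustration of the verify-alg setting mentioned above, here is a
minimal sketch of the relevant drbd.conf fragment in DRBD 8.3 syntax. The
choice of sha1 is only an assumption for the example (any digest the
kernel provides should work), and the rest of the r0 definition is left
out.

```
resource r0 {
  syncer {
    # assumption: sha1 picked for illustration only
    verify-alg sha1;
  }
  # ... existing settings for r0 stay unchanged ...
}
```

After editing the file on both nodes, "drbdadm adjust r0" on each node
should apply the change to the running resource.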

Then, after 5), on the secondary:
6) drbdadm verify r0
Jun 21 17:26:31 drbdsrv2 kernel: block drbd0: conn( Connected -> VerifyS )
Jun 21 17:26:31 drbdsrv2 kernel: block drbd0: Starting Online Verify
from sector 0
Jun 21 17:26:31 drbdsrv2 kernel: block drbd0: Out of sync: start=0,
size=16 (sectors)
Jun 21 17:26:31 drbdsrv2 kernel: block drbd0: Out of sync: start=2056,
size=24 (sectors)
...
Jun 21 17:29:57 drbdsrv2 kernel: block drbd0: Out of sync:
start=4194304, size=8 (sectors)
Jun 21 17:33:22 drbdsrv2 kernel: block drbd0: Online verify  done
(total 410 sec; paused 0 sec; 10228 K/sec)
Jun 21 17:33:22 drbdsrv2 kernel: block drbd0: Online verify found 48
4k block out of sync!
Jun 21 17:33:22 drbdsrv2 kernel: block drbd0: conn( VerifyS -> Connected )
Jun 21 17:33:22 drbdsrv2 kernel: block drbd0: Writing the whole
bitmap, due to failed kmalloc
Jun 21 17:33:22 drbdsrv2 kernel: block drbd0: helper command:
/sbin/drbdadm out-of-sync minor-0
Jun 21 17:33:22 drbdsrv2 kernel: block drbd0: helper command:
/sbin/drbdadm out-of-sync minor-0 exit code 0 (0x0)
Jun 21 17:33:22 drbdsrv2 kernel: block drbd0: 192 KB (48 bits) marked
out-of-sync by on disk bit-map.
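
The relation between the two figures in the log above (48 blocks and
192 KB) is simply one bitmap bit per 4 KiB block. A small sketch that
extracts the block count from such a log line and converts it to KB;
the helper name verify_oos_kb is made up for this example, and it
assumes the kernel message arrives unwrapped on a single line.

```shell
# Read kernel log lines on stdin and print the out-of-sync amount in KB
# for each "Online verify found N 4k block out of sync!" message.
# One bitmap bit covers one 4 KiB block, so KB = N * 4.
verify_oos_kb() {
    sed -n 's/.*Online verify found \([0-9][0-9]*\) 4k block.*/\1/p' |
    while read -r n; do
        echo $(( n * 4 ))
    done
}

# Example with the message from the log above (prints 192).
echo 'block drbd0: Online verify found 48 4k block out of sync!' | verify_oos_kb
```

On a live system the input could come from "dmesg | verify_oos_kb"
instead of the echo.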


7) drbdadm disconnect
Jun 21 17:33:48 drbdsrv2 kernel: block drbd0: peer( Primary -> Unknown
) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Jun 21 17:33:48 drbdsrv2 kernel: block drbd0: short read expecting
header on sock: r=-512
Jun 21 17:33:48 drbdsrv2 kernel: block drbd0: asender terminated
Jun 21 17:33:48 drbdsrv2 kernel: block drbd0: Terminating asender thread
Jun 21 17:33:48 drbdsrv2 kernel: block drbd0: Connection closed
Jun 21 17:33:48 drbdsrv2 kernel: block drbd0: conn( Disconnecting ->
StandAlone )
Jun 21 17:33:48 drbdsrv2 kernel: block drbd0: receiver terminated
Jun 21 17:33:48 drbdsrv2 kernel: block drbd0: Terminating receiver thread


8) drbdadm connect
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: conn( StandAlone -> Unconnected )
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: Starting receiver thread
(from drbd0_worker [4597])
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: receiver (re)started
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: conn( Unconnected ->
WFConnection )
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: Handshake successful:
Agreed network protocol version 94
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: conn( WFConnection ->
WFReportParams )
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: Starting asender thread
(from drbd0_receiver [4781])
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: data-integrity-alg: <not-used>
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: drbd_sync_handshake:
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: self
413994C8186E5E70:0000000000000000:8E750AA871E2A456:CFB2C0B3736F0BD5
bits:48 flags:0
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: peer
7C2F5FC1C424E3E5:413994C8186E5E71:8E750AA871E2A457:CFB2C0B3736F0BD5
bits:48 flags:0
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: uuid_compare()=-1 by rule 50
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: peer( Unknown -> Primary
) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: helper command:
/sbin/drbdadm before-resync-target minor-0
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: helper command:
/sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: conn( WFSyncUUID ->
SyncTarget ) disk( UpToDate -> Inconsistent )
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: Began resync as
SyncTarget (will sync 192 KB [48 bits set]).
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: Resync done (total 1
sec; paused 0 sec; 192 K/sec)
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: conn( SyncTarget ->
Connected ) disk( Inconsistent -> UpToDate )
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: helper command:
/sbin/drbdadm after-resync-target minor-0
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: helper command:
/sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
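
Steps 6) to 8) can be put together as a small script. This is only a
sketch under some assumptions: the resource is r0, verify-alg is already
configured on both nodes, and the end of the verify run is detected by
polling /proc/drbd for the VerifyS/VerifyT connection state. The
function name and the DRBDADM and PROC_DRBD variables are made up for
the example; DRBDADM=echo gives a dry run.

```shell
# Sketch of steps 6) to 8): online verify, then disconnect/connect so the
# blocks that verify marked out of sync get resynced from the peer.
DRBDADM="${DRBDADM:-drbdadm}"          # set DRBDADM=echo for a dry run
PROC_DRBD="${PROC_DRBD:-/proc/drbd}"   # status file polled during verify

verify_and_resync() {
    res="$1"
    $DRBDADM verify "$res" || return 1
    # Online verify runs in the background: wait until the connection
    # state is no longer VerifyS or VerifyT.
    while grep -q 'cs:Verify[ST]' "$PROC_DRBD" 2>/dev/null; do
        sleep 5
    done
    # Reconnecting triggers the resync of the blocks marked out of sync.
    $DRBDADM disconnect "$res" || return 1
    $DRBDADM connect "$res"
}
```

It would be called as "verify_and_resync r0" on the secondary.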

Does the above mean that it is always safer to run a full resync
of the secondary when I reattach it?

Please let me know if I am misunderstanding anything.

Thanks in advance,
Gianluca


