Hello,

I'm using DRBD 8.3.8 on CentOS 5.6, as provided by the extras repo.

I'm testing/simulating the condition where the secondary is taken down and some of its data is modified before it is restarted (or where you want to verify that, if some problem or bug has modified the data, you come back to safety when you restart the secondary).

The servers are drbdsrv1 and drbdsrv2, in Primary/Secondary with drbdsrv1 as Primary:

 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----

DRBD resource r0 is on /dev/vdb1 on both nodes, with an ext3 file system on top of drbd0.

1)
[root@drbdsrv1 ~]# mount /dev/drbd0 /mnt/
[root@drbdsrv1 ~]# dd if=/dev/zero of=/mnt/testfile bs=1024k count=2048

The secondary receives 2 GB over the wire and remains up to date.

2) take down the secondary
[root@drbdsrv2 ~]# service drbd stop

3) mount the backing device on the secondary
[root@drbdsrv2 ~]# mount /dev/vdb1 /mnt
[root@drbdsrv2 /]# ll /mnt
total 2099220
drwx------ 2 root root      16384 Jun 21 12:37 lost+found
-rw-r--r-- 1 root root 2147483648 Jun 21 12:40 testfile

4) remove testfile on the secondary
[root@drbdsrv2 /]# rm /mnt/testfile
rm: remove regular file `/mnt/testfile'? y
[root@drbdsrv2 /]# ll /mnt
total 16
drwx------ 2 root root 16384 Jun 21 12:37 lost+found

5) restart drbd on the secondary
[root@drbdsrv2 /]# service drbd start
Starting DRBD resources: [
r0
Found valid meta data in the expected location, 4294914048 bytes into /dev/vdb1.
d(r0) s(r0) n(r0) ].

Log on the primary (drbdsrv1) at reconnect:

Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: Handshake successful: Agreed network protocol version 94
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: Starting asender thread (from drbd0_receiver [2240])
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: data-integrity-alg: <not-used>
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: drbd_sync_handshake:
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: self 4D7B0F99C46751ED:3F596D601C18DA13:97CF1359A37B373C:598B9390C1D0D38D bits:0 flags:0
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: peer 3F596D601C18DA12:0000000000000000:97CF1359A37B373C:598B9390C1D0D38D bits:0 flags:0
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: uuid_compare()=1 by rule 70
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: Began resync as SyncSource (will sync 0 KB [0 bits set]).
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Jun 21 13:04:24 drbdsrv1 kernel: block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )

But if I repeat 2) and 3) on the secondary, there is no testfile at all: the resync transferred 0 KB, so the deletion made on the backing device was not repaired.
Is this the expected behaviour, with no check at all during connection?
Is there any config parameter to force a validation?

To get the file back and restore consistency, I can put verify-alg in drbd.conf and run adjust on both nodes. Then, after 5), on the secondary:

6) drbdadm verify r0

Jun 21 17:26:31 drbdsrv2 kernel: block drbd0: conn( Connected -> VerifyS )
Jun 21 17:26:31 drbdsrv2 kernel: block drbd0: Starting Online Verify from sector 0
Jun 21 17:26:31 drbdsrv2 kernel: block drbd0: Out of sync: start=0, size=16 (sectors)
Jun 21 17:26:31 drbdsrv2 kernel: block drbd0: Out of sync: start=2056, size=24 (sectors)
...
Jun 21 17:29:57 drbdsrv2 kernel: block drbd0: Out of sync: start=4194304, size=8 (sectors)
Jun 21 17:33:22 drbdsrv2 kernel: block drbd0: Online verify done (total 410 sec; paused 0 sec; 10228 K/sec)
Jun 21 17:33:22 drbdsrv2 kernel: block drbd0: Online verify found 48 4k block out of sync!
Jun 21 17:33:22 drbdsrv2 kernel: block drbd0: conn( VerifyS -> Connected )
Jun 21 17:33:22 drbdsrv2 kernel: block drbd0: Writing the whole bitmap, due to failed kmalloc
Jun 21 17:33:22 drbdsrv2 kernel: block drbd0: helper command: /sbin/drbdadm out-of-sync minor-0
Jun 21 17:33:22 drbdsrv2 kernel: block drbd0: helper command: /sbin/drbdadm out-of-sync minor-0 exit code 0 (0x0)
Jun 21 17:33:22 drbdsrv2 kernel: block drbd0: 192 KB (48 bits) marked out-of-sync by on disk bit-map.

7) drbdadm disconnect

Jun 21 17:33:48 drbdsrv2 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Jun 21 17:33:48 drbdsrv2 kernel: block drbd0: short read expecting header on sock: r=-512
Jun 21 17:33:48 drbdsrv2 kernel: block drbd0: asender terminated
Jun 21 17:33:48 drbdsrv2 kernel: block drbd0: Terminating asender thread
Jun 21 17:33:48 drbdsrv2 kernel: block drbd0: Connection closed
Jun 21 17:33:48 drbdsrv2 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Jun 21 17:33:48 drbdsrv2 kernel: block drbd0: receiver terminated
Jun 21 17:33:48 drbdsrv2 kernel: block drbd0: Terminating receiver thread

8) drbdadm connect

Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: conn( StandAlone -> Unconnected )
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: Starting receiver thread (from drbd0_worker [4597])
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: receiver (re)started
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: conn( Unconnected -> WFConnection )
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: Handshake successful: Agreed network protocol version 94
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: Starting asender thread (from drbd0_receiver [4781])
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: data-integrity-alg: <not-used>
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: drbd_sync_handshake:
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: self 413994C8186E5E70:0000000000000000:8E750AA871E2A456:CFB2C0B3736F0BD5 bits:48 flags:0
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: peer 7C2F5FC1C424E3E5:413994C8186E5E71:8E750AA871E2A457:CFB2C0B3736F0BD5 bits:48 flags:0
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: uuid_compare()=-1 by rule 50
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: Began resync as SyncTarget (will sync 192 KB [48 bits set]).
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: Resync done (total 1 sec; paused 0 sec; 192 K/sec)
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
Jun 21 17:33:52 drbdsrv2 kernel: block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)

Does the above mean that in any case it is safer to run a full resync of the secondary when I reattach it?
Any comments, in case I'm misunderstanding anything?

Thanks in advance,
Gianluca
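
For reference, the figures in the verify log of step 6) can be cross-checked with a little shell. This is only a rough sketch and not part of DRBD: the function names sum_out_of_sync and bits_to_kib are made up for illustration. It assumes 512-byte sectors in the "Out of sync" lines and 4 KiB per bitmap bit, as the "48 4k block" and "192 KB (48 bits)" messages suggest.

```shell
#!/bin/sh
# Sketch only; both function names are hypothetical helpers, not DRBD tools.

# Reads kernel log text on stdin and totals the ranges reported by
# "Out of sync: start=..., size=... (sectors)" lines.
# Prints: <number of ranges> <total sectors> <total KiB>
sum_out_of_sync() {
    awk '
        /Out of sync: start=[0-9]+, size=[0-9]+ \(sectors\)/ {
            s = $0
            sub(/.*size=/, "", s)   # drop everything up to the size value
            sub(/ .*/, "", s)       # drop the trailing " (sectors)"
            ranges++
            sectors += s
        }
        END {
            # assumption: one sector is 512 bytes, so 2 sectors = 1 KiB
            printf "%d %d %d\n", ranges, sectors, sectors / 2
        }
    '
}

# Converts a count of set bitmap bits to KiB,
# assuming each bit covers one 4 KiB block.
bits_to_kib() {
    echo $(( $1 * 4 ))
}
```

Something like `grep 'block drbd0' /var/log/messages | sum_out_of_sync` would feed it the relevant lines, assuming kernel messages end up in /var/log/messages on your system. `bits_to_kib 48` gives 192, which matches the "will sync 192 KB [48 bits set]" line logged after step 8).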