[DRBD-user] fs errors on replacing secondary's disk

Tue Jun 15 18:24:18 CEST 2004

/ 2004-06-15 12:08:37 -0400
\ george young:
> [drbd 0.6.10, SuSE 8.3 x86 linux, 2.4.20, lsi megaraid scsi 320-2]
> nodes are "pig-app" and "pig-db".
> 
> One of the disks on node "pig-db" failed, (RAID 5, so no crash), so I
> forced pig-db to secondary for both drbd file systems and rebooted pig-db without
> drbd.  After replacing the disk, and wiping and reconfiguring the RAID
> box (don't ask why...), I need to bring the system back up as drbd secondary.
> 
> Starting drbd seems fine:
> May  9 15:58:21 pig-db drbd: modprobe -s drbd minor_count=2
> May  9 15:58:21 pig-db kernel: drbd: initialised. Version: 0.6.10 (api:64/proto:62)
> May  9 15:58:21 pig-db drbd: drbdsetup /dev/nb0 disk /dev/sdc1 --disk-size=36708492
> May  9 15:58:21 pig-db drbd: drbdsetup /dev/nb0 net 10.0.0.114:7788 10.0.0.115:7788 C --sync-nice=-15 --sync-min=10M --sync-max=100M --tl-size=5000 --timeout=60 --connect-int=10 --ping-int=10 --sync-group=1
> May  9 15:58:21 pig-db drbd: drbdsetup /dev/nb1 disk /dev/sdb1 --disk-size=1060290k
> May  9 15:58:21 pig-db drbd: drbdsetup /dev/nb1 net 10.0.0.114:7789 10.0.0.115:7789 C --sync-nice=-15 --sync-min=10M --sync-max=100M --tl-size=5000 --timeout=60 --connect-int=10 --ping-int=10 --sync-group=2
> May  9 15:58:21 pig-db drbd: drbdsetup /dev/nb0 wait_connect -t 0
> May  9 15:58:21 pig-db drbd: drbdsetup /dev/nb1 wait_connect -t 0
> May  9 15:58:21 pig-db kernel: drbd0: Connection established. size=36708492 KB / blksize=4096 B
> May  9 15:58:21 pig-db kernel: drbd1: Connection established. size=1060290 KB / blksize=4096 B
> May  9 15:58:21 pig-db drbd: 'drbd_home' SyncingQuick, waiting for this to finish
> May  9 15:58:21 pig-db drbd: drbdsetup /dev/nb0 wait_sync
> May  9 15:58:21 pig-db drbd: 'drbd_db' SyncingQuick, waiting for this to finish
> May  9 15:58:21 pig-db drbd: drbdsetup /dev/nb1 wait_sync
> May  9 15:59:42 pig-db drbd: 'drbd_home' SyncingQuick finished, issue drbdsetup /dev/nb0 secondary_remote
> May  9 16:00:14 pig-db drbd: 'drbd_db' SyncingQuick finished, issue drbdsetup /dev/nb1 secondary_remo

SyncingQuick assumes you have "almost" up-to-date data on the lower
level disk. You actually have a blank disk. DRBD has no way to know that,
so you must tell it...

> I do "datadisk drbd_home start", which switches drbd_home partition to primary and mounts it:
> May  9 17:38:46 pig-db datadisk: ===> datadisk drbd_db start <===
> May  9 17:38:46 pig-db datadisk: drbdsetup /dev/nb1 primary
> May  9 17:38:46 pig-db datadisk: /bin/true /dev/nb1
> May  9 17:38:46 pig-db datadisk: mount -v /dev/nb1
> May  9 17:38:46 pig-db kernel: drbd1: blksize=1024 B
> May  9 17:38:46 pig-db kernel: drbd1: blksize=4096 B
> May  9 17:38:46 pig-db kernel: reiserfs: found format "3.6" with standard journal
> May  9 17:38:46 pig-db kernel: reiserfs: enabling write barrier flush mode
> May  9 17:38:46 pig-db kernel: reiserfs: using ordered data mode
> May  9 17:38:46 pig-db kernel: reiserfs: checking transaction log (drbd(43,1)) for (drbd(43,1))
> May  9 17:38:46 pig-db kernel: Using r5 hash to sort names
> May  9 17:38:46 pig-db datadisk: 'drbd_db' activated
> 
> But I get a stream of kernel errors, like:
> 
>   is_tree_node: node level 0 does not match to the expected one 1
>   vs-5150: search_by_key: invalid format found in block 8831. Fsck?
>   zam-7001: io error in reiserfs_find_entry
> 
> I double checked the device sizes (that has bitten me before).  Is there
> some special initialization I have to do with a blank new disk device?
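The size double-check mentioned above can be scripted. Below is a minimal POSIX sh sketch. It assumes two things: the backing partitions are listed in /proc/partitions, whose third column is the size in 1 KiB blocks, and the value to compare against is the one passed as --disk-size in the drbdsetup lines of the log (36708492 for sdc1, 1060290 for sdb1, both in KB). The function name check_backing_size is hypothetical and is not part of drbd.

```shell
#!/bin/sh
# Hypothetical helper (not a drbd tool): check that a backing partition is
# at least as large as the size in KB given to "drbdsetup ... disk --disk-size=".
# /proc/partitions columns: major minor #blocks name (#blocks in 1 KiB units).
# The optional third argument points at another copy of that table.
check_backing_size() {
    part="$1"                       # e.g. sdc1
    want_kb="$2"                    # e.g. 36708492 (plain number, no "k")
    table="${3:-/proc/partitions}"

    # Print the #blocks column of the row whose name matches the partition.
    have_kb=$(awk -v p="$part" '$4 == p { print $3 }' "$table")
    if [ -z "$have_kb" ]; then
        echo "$part: not found in $table" >&2
        return 2
    fi
    if [ "$have_kb" -lt "$want_kb" ]; then
        echo "$part: $have_kb KB < $want_kb KB (too small)" >&2
        return 1
    fi
    echo "$part: $have_kb KB >= $want_kb KB (ok)"
}
```

For example, on pig-db: check_backing_size sdc1 36708492 and check_backing_size sdb1 1060290.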

Either (now) do "drbdsetup /dev/nbX replicate" on the Secondary (the
out-of-date node! not on the other one...),
or (before you "drbd start" it) do "rm /var/lib/drbd/*".
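The two options above can be sketched as a small POSIX sh script for the 0.6.x case. This is a sketch under stated assumptions, not an official drbd tool: the function names, the DRY_RUN switch (which only prints the commands) and the metadata-directory argument are hypothetical additions. Only "drbdsetup /dev/nbX replicate" and "rm /var/lib/drbd/*" come from the advice itself. Run it on the out-of-date Secondary only.

```shell
#!/bin/sh
# Sketch of the two DRBD 0.6.x recovery options. Run ONLY on the
# out-of-date Secondary (here: pig-db), never on the node with the good data.
# Assumption: DRY_RUN=1 makes run() print commands instead of executing them.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Option 1: drbd is already running and connected.
# Ask for a full resync of each given device (e.g. /dev/nb0 /dev/nb1).
force_full_resync() {
    for dev in "$@"; do
        run drbdsetup "$dev" replicate
    done
}

# Option 2: drbd is NOT started yet.
# Remove the local drbd state files so the next "drbd start" does not
# assume the blank disk is almost up to date. The directory defaults to
# /var/lib/drbd; the argument exists only so the sketch can be tried safely.
wipe_local_metadata() {
    dir="${1:-/var/lib/drbd}"
    run rm -f "$dir"/*
}
```

For instance, with drbd connected on pig-db: DRY_RUN=1 force_full_resync /dev/nb0 /dev/nb1 previews the commands, and the same call without DRY_RUN issues them. wipe_local_metadata belongs to the other path and must run before "drbd start".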

( In case this is found in some archive later: this was for 0.6.x;
  the procedure for 0.7 is different. )

	lge