[DRBD-user] QuickSync not happening on secondary reboot

Thu Jul 1 11:08:52 CEST 2004

Hi list,
I have two nodes running SuSE 8.0, set up to replicate 64GB of data through
drbd.
On top of /dev/nb0 I have ext3 with blocksize=1024 (mke2fs -j -b 1024
/dev/nb0).
I serve a Pervasive 8.1 db and samba from that disk.

The versions are as follows:
drbd: 0.6.12
heartbeat: 1.0.3
Linux: (uname -a) Linux nodeb 2.4.18-64GB-SMP #1 SMP Wed Mar 27 13:58:12 UTC 2002 i686 unknown

The two nodes are called nodea and nodeb. Nodea is primary and nodeb is
secondary. They replicate through a gigabit ethernet crossover.

If I understand correctly how drbd works, when nodea is primary and I shut
down nodeb, a quick sync should happen as soon as nodeb boots again,
provided that nodea was not rebooted in the meantime.
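The expectation above can be stated as a toy model: while the peer is away, the primary remembers which blocks were written, and on reconnect it resends only those. This is a simplified sketch under that assumption, not DRBD 0.6.12's actual code; the class `Primary` and its methods are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Toy primary node that remembers which blocks the peer has missed."""
    secondary: dict                             # peer's copy: block number -> data
    blocks: dict = field(default_factory=dict)  # local copy: block number -> data
    dirty: set = field(default_factory=set)     # blocks written while disconnected
    connected: bool = True

    def write(self, block: int, data: bytes) -> None:
        self.blocks[block] = data
        if self.connected:
            self.secondary[block] = data  # mirrored right away while connected
        else:
            self.dirty.add(block)         # peer is down: mark block for later

    def disconnect(self) -> None:
        self.connected = False

    def reconnect(self) -> int:
        """Quick sync: resend only the blocks marked dirty, return how many."""
        self.connected = True
        resent = sorted(self.dirty)
        for block in resent:
            self.secondary[block] = self.blocks[block]
        self.dirty.clear()
        return len(resent)


if __name__ == "__main__":
    peer = {}
    node_a = Primary(secondary=peer)
    node_a.write(0, b"before shutdown")
    node_a.disconnect()                 # nodeb is shut down
    node_a.write(1, b"new file")
    node_a.write(2, b"db update")
    print(node_a.reconnect())           # 2: only the missed blocks are resent
    print(peer == node_a.blocks)        # True
```

In this model a resync that reports zero blocks after writes were made while the peer was down would mean the dirty set was lost or never filled.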

This is not happening to me.
What I do is as follows:
1) boot both nodes; nodea eventually becomes primary and mounts /dev/nb0
2) nodeb is secondary
3) shut down nodeb
4) copy data onto the drbd device or run some queries on the db. If I ls
/pervasive (the mount point of /dev/nb0), I see the files are there and
they are ok (verified with md5 checksums)
5) boot nodeb again
6) nodeb does not quick sync
7) check that the nodes are not syncing by looking at /proc/drbd
8) make nodeb primary (run heartbeat restart on nodea)
9) ls /pervasive yields unpredictable results, ranging from fs corruption
to missing files (usually missing files)
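The md5 check in steps 4 and 9 can be made repeatable by snapshotting checksums on nodea and comparing them on nodeb after the failover. This is a minimal sketch assuming Python 3 is available on the nodes (an assumption; it is not part of the setup described), and the script name and manifest path are hypothetical. The manifest must be kept off /pervasive and copied to nodeb, since it would otherwise live on the very device being tested.

```python
import hashlib
import os


def md5_tree(root: str) -> dict:
    """Return {relative path: md5 hex digest} for every regular file under root."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):   # skip sockets, broken symlinks, etc.
                continue
            h = hashlib.md5()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            sums[os.path.relpath(path, root)] = h.hexdigest()
    return sums


def compare(before: dict, after: dict):
    """Return (missing, changed): files gone or with a different checksum."""
    missing = sorted(set(before) - set(after))
    changed = sorted(p for p in before if p in after and before[p] != after[p])
    return missing, changed


if __name__ == "__main__":
    import json
    import sys
    # check.py snapshot /pervasive /tmp/sums.json   (on nodea, after step 4)
    # check.py verify   /pervasive /tmp/sums.json   (on nodeb, after step 9)
    mode, root, manifest = sys.argv[1:4]
    if mode == "snapshot":
        with open(manifest, "w") as f:
            json.dump(md5_tree(root), f)
    else:
        with open(manifest) as f:
            missing, changed = compare(json.load(f), md5_tree(root))
        print(f"missing: {len(missing)}, changed: {len(changed)}")
```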

I thought it might be the fs cache, so I put a sync command in the
background to run every 5 minutes, and also in haresources, but I still
don't get the QuickSync to happen.

On the other hand, if I do not reboot the secondary but only cause a
failover by running /etc/init.d/heartbeat restart on the primary, all data
gets migrated just fine.

The problem occurs consistently, and only when I reboot the secondary.
Configuration and dmesg log are attached.

What am I doing wrong?
Many thanks in advance,
Umberto
-------------- next part --------------
resource drbd0 {
  protocol=C
  fsckcmd=/bin/true

  disk {
       disk-size=65776188
       do-panic
  }
  net {
       tl-size = 5000
       sndbuf-size = 1280
       sync-rate=160M # bytes/sec
       timeout=60
       connect-int=10
       ping-int=10
  }
  on nodea {
       device=/dev/nb0
       disk=/dev/sda6
       address=192.168.1.1
       port=7789
  }
  on nodeb {
       device=/dev/nb0
       disk=/dev/sda6
       address=192.168.1.2
       port=7789
  }
}
-------------- next part --------------
Following is the dmesg log of nodea seeing nodeb reboot:

<-- boot of nodea

drbd: initialised. Version: 0.6.12 (api:64/proto:62)
drbd0: Creating state file "/var/lib/drbd/drbd0"
bcm5700: eth0 NIC Link is Down
bcm5700: eth0 NIC Link is Up, 1000 Mbps full duplex
drbd0: Connection established. size=65776188 KB / blksize=4096 B

<-- nodeb is up too

isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
IPv6 v0.8 for NET4.0
IPv6 over IPv4 tunneling driver
eth0: no IPv6 routers present
get_hw_addr uses obsolete (PF_INET,SOCK_PACKET)
eth1: no IPv6 routers present
Journalled Block Device driver loaded
drbd0: blksize=1024 B

<-- heartbeat ran datadisk start

kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.17, 10 Jan 2002 on drbd(43,0), internal journal
EXT3-fs: mounted filesystem with ordered data mode.

<-- here I shutdown nodeb manually

drbd0: Connection lost.
bcm5700: eth0 NIC Link is Down
bcm5700: eth0 NIC Link is Up, 100 Mbps full duplex
bcm5700: eth0 NIC Link is Down
bcm5700: eth0 NIC Link is Up, 1000 Mbps full duplex
bcm5700: eth0 NIC Link is Down
bcm5700: eth0 NIC Link is Up, 1000 Mbps full duplex
bcm5700: eth0 NIC Link is Down

<-- the network is eventually up again and the two nodes reconnect to each other

bcm5700: eth0 NIC Link is Up, 1000 Mbps full duplex
drbd0: Connection established. size=65776188 KB / blksize=1024 B

<-- this is strange: 0 blks???

drbd0: Synchronisation started blks=0 
drbd0: Synchronisation done.
drbd0: blksize=1024 B
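To see at a glance how many blocks each resync after a reconnect actually reported, the "Synchronisation started blks=N" lines can be pulled out of dmesg. This is a small sketch that only assumes the message format shown in the log above (from drbd 0.6.12); other drbd versions may word it differently, and the function name is hypothetical.

```python
import re
import subprocess

# Matches e.g. "drbd0: Synchronisation started blks=0" as seen in the log above.
SYNC_RE = re.compile(r"(drbd\d+): Synchronisation started blks=(\d+)")


def resync_counts(dmesg_text: str):
    """Return [(device, blks), ...] for every resync start found in the text."""
    return [(m.group(1), int(m.group(2))) for m in SYNC_RE.finditer(dmesg_text)]


if __name__ == "__main__":
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    for dev, blks in resync_counts(log):
        note = "  <-- zero blocks reported" if blks == 0 else ""
        print(f"{dev}: resync started with blks={blks}{note}")
```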
-------------- next part --------------
nodea 192.168.100.20 sync datadisk::drbd0 sync smb psql

#
# Sync is my homegrown script. It just calls sync.
#
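For reference, a script that "just calls sync" and can sit in haresources might look like the following. This is a hypothetical stand-in, not the poster's script; it assumes heartbeat invokes resource scripts init-style with a start or stop argument, and it deliberately does not implement any status handling.

```python
#!/usr/bin/env python3
# Hypothetical equivalent of the homegrown "sync" resource script.
# Assumption: heartbeat calls it as "<script> start" or "<script> stop".
import os
import sys


def main(argv) -> int:
    action = argv[1] if len(argv) > 1 else ""
    if action in ("start", "stop"):
        os.sync()  # flush dirty buffers to disk, like the sync(1) command
        return 0
    print(f"usage: {argv[0]} start|stop", file=sys.stderr)
    return 1


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```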