[DRBD-user] drbd-0.7.10 data loss with protocol 'C'

Wed Jun 1 12:07:57 CEST 2005

Hi DRBD users,

I'm running two machines with SLES9 (kernel-bigsmp-2.6.5-7.151).

A few days ago I set up drbd (drbd-0.7.10) and heartbeat
(heartbeat-1.2.3-2.3) to form an HA cluster.

The first manual trials were very promising!!!

But after implementing the heartbeat functionality I ran into trouble.
I decided to use 'protocol C' to reduce the risk of data loss.
My DRBD data disk is a logical volume (500 MB, reiserfs) mounted as /andre.

Okay, here is my first successful try:

pdxjs:/ # drbdadm state all
Primary/Secondary

pdxjs:/ # ls -l /andre
total 1
drwxr-xr-x   4 root root  80 Jun  1 11:12 .
drwxr-xr-x  23 root root 592 Jun  1 10:17 ..

pdxjs:/ # cp /boot/vmlinuz /andre/alinuxkernel; rcheartbeat stop
Stopping High-Availability services                                 done
pdxjs:/ #

As expected, my 'packet' (the heartbeat resource group) was switched over to the other node:

pdxbs:~ # drbdadm state all
Primary/Secondary

pdxbs:/ # ls -l /andre
total 1626
drwxr-xr-x   4 root root     112 Jun  1 11:38 .
drwxr-xr-x  22 root root     544 Jun  1 11:35 ..
-rw-r--r--   1 root root 1661938 Jun  1 11:38 alinuxkernel

Fine, that worked perfectly. After switching back, my next try:

pdxjs:/ # cp -R /etc/ /andre; cp /boot/vmlinuz /andre/secondkern; ifconfig bond0 down

As expected, pdxbs now runs the 'packet'...

pdxbs:/ # drbdadm state all
Primary/Unknown

Split-brain situation: the former primary node was powered down by heartbeat.

OOPSSSSS

pdxbs:/ # ls /andre
.  ..  alinuxkernel
pdxbs:/ #

WHERE ARE MY DATA???

If I understand the manual correctly, I would expect my cp command to
return only after all blocks have been committed by the second node.
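One thing worth ruling out (an assumption, not something confirmed in this report): on Linux, `cp` returns as soon as its data has been handed to the kernel's page cache, not necessarily when it has been written to /dev/drbd0. Protocol C only makes a write complete once both nodes have it on disk, and that guarantee can only apply after the kernel actually submits the dirty pages to the DRBD device. The first test presumably succeeded because `rcheartbeat stop` unmounts the filesystem, which flushes it; `ifconfig bond0 down` does not. A minimal sketch of the second test with an explicit flush, where `copy_and_flush` is a hypothetical helper and not part of DRBD or heartbeat:

```shell
# Hypothetical helper (not part of DRBD or heartbeat): copy a file, then
# push dirty page-cache data out to the underlying block device(s), so that
# DRBD protocol C's "written on both nodes" guarantee can apply to it.
copy_and_flush() {
    src=$1
    dst=$2
    cp "$src" "$dst" || return 1   # returns once the data is in the page cache
    sync                           # on Linux, waits until dirty buffers are written
}

# Usage, mirroring the failing test (run on the primary, as root):
#   copy_and_flush /boot/vmlinuz /andre/secondkern && ifconfig bond0 down
```

Note that `sync` flushes every filesystem, not just /andre; that is coarse but sufficient for a failover test.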

Here is my drbd.conf:

resource r0 {
     protocol               C;
     incon-degr-cmd       "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
     on pdxbs {
         device           /dev/drbd0;
         disk             /dev/vg00/lvol7;
         address          149.221.5.28:7788;
         meta-disk        internal;
     }
     on pdxjs {
         device           /dev/drbd0;
         disk             /dev/vg00/lvol7;
         address          149.221.5.23:7788;
         meta-disk        internal;
     }
     disk {
         on-io-error      detach;
     }
     syncer {
         rate             10M;
         group            1;
         al-extents       257;
     }
     startup {
         degr-wfc-timeout 120;
     }
}

Any ideas???

Best regards

Andre