[DRBD-user] DRBD & network failures - lost files

Lars Ellenberg lars.ellenberg at linbit.com
Sun Dec 26 12:17:36 CET 2004


On Thu, Dec 23, 2004 at 08:02:59PM +0200, Artis Caune wrote:
> Hello DRBD users!
> 
> Strange problem here:
> 
> We set up DRBD as shown in
> http://linuxha.trick.ca/DRBD_2fQuickStart07
> 
> And all works as expected when doing failure test with:

fine.

> BUT if we do a real failure on master: files are not synced when master
> comes back...
> 
> 
> 
> # touch /mnt/ha0/test_2
> # ifconfig eth0 down; \
>    umount /mnt/ha0; \
>    drbdadm secondary all; \
>    sleep 300; \
>    ifconfig eth0 up; \
>    drbdadm primary all; \
>    mount /mnt/ha0
> 
>          --- doing this while 'sleeping 300s' ---
> % drbdadm primary all
> % mount /mnt/ha0 && ls /mnt/ha0/test_2
> % touch /mnt/ha0/lost_file
> % umount /mnt/ha0
> % drbdadm secondary all
> 
> and now after sleep there is no such file
> /mnt/ha0/lost_file, but nodes are
> cs:Connected st:Primary/Secondary ld:Consistent
> cs:Connected st:Secondary/Primary ld:Consistent
> 
> 
> 
> what to do?

in short: you created an artificial split brain situation,
which is hard (if not impossible) to resolve automatically.

even though it is several years old, and many drbd internals have
changed since, please read
http://www.drbd.org/fileadmin/drbd/publications/drbd_paper_for_NLUUG_2001.pdf
section 6.3, meta-data.
what is described there is what we refer to as generation counters.

try to write down the generation counters for each step you do.
you will notice that drbd has little choice but to do exactly
what you did not expect.
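
as a rough illustration of that bookkeeping, here is a toy model in
Python. it is NOT drbd's actual algorithm: the two counters, the
rules for bumping them, the starting values and all names (Node,
make_primary, choose_sync_source, replay_scenario) are simplifying
assumptions made for this sketch; the real rules are in section 6.3
of the paper. it also assumes that the old master is promoted again
before the link is re-established.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for one drbd node; not drbd's real meta-data layout."""
    name: str
    connected_count: int = 1  # toy rule: bumped on promotion with the peer reachable
    arbitrary_count: int = 0  # toy rule: bumped on promotion without the peer
    files: set = field(default_factory=lambda: {"test_2"})

    def counters(self):
        return (self.connected_count, self.arbitrary_count)


def make_primary(node, peer_reachable):
    """Promote a node and bump one counter (assumed, simplified rule)."""
    if peer_reachable:
        node.connected_count += 1
    else:
        node.arbitrary_count += 1


def choose_sync_source(a, b):
    """Return the node with the higher counters, or None if they tie."""
    if a.counters() > b.counters():
        return a
    if b.counters() > a.counters():
        return b
    return None


def replay_scenario():
    """Replay the steps from the original post and log the counters."""
    master, backup = Node("master"), Node("backup")
    log = [("start", master.counters(), backup.counters())]

    # master: ifconfig eth0 down; umount; drbdadm secondary all
    # (a demotion, so nothing is bumped in this toy model)

    # backup, during the sleep: promoted without its peer, then written to
    make_primary(backup, peer_reachable=False)
    backup.files.add("lost_file")
    log.append(("backup primary", master.counters(), backup.counters()))

    # master, after the sleep: eth0 up, then promoted right away.
    # ASSUMPTION: this happens before the nodes have reconnected.
    make_primary(master, peer_reachable=False)
    log.append(("master primary", master.counters(), backup.counters()))

    # on reconnect the counters are compared to pick a sync source
    source = choose_sync_source(master, backup)
    return master, backup, source, log


if __name__ == "__main__":
    master, backup, source, log = replay_scenario()
    for step, m, b in log:
        print(f"{step:15} master={m} backup={b}")
    print("sync source:", source.name if source else "none (tie)")
```

in this toy model both nodes end up with the same counters, so
neither can be identified as holding the newer data, and the write
made on the backup is never pulled over to the master. had the nodes
reconnected before the master was promoted again, the comparison
would have been made while only the backup had been bumped, and the
backup would have been picked as sync source.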


> Please CC me.
ok.


	Lars



