[DRBD-user] Lost whole disk data while sync

Lars Ellenberg Lars.Ellenberg at linbit.com
Mon Sep 12 09:55:40 CEST 2005

please subscribe. currently I have to explicitly let your messages
through by hand, which costs time. and please don't send HTML.

thanks.

> > > I have installed DRBD-0.7.10 on AS3, ha1 & ha2.
> > 
> > you want to use 0.7.13, in case you have SMP.
> > 
> > > ha1 is the primary and there are many files on it.
> > > ha2 is the secondary; it will be replicated after I start up DRBD.
> > >
> > > I checked that ha1 was primary and ha2 was secondary before I
> > > connected the cross-link cable between them. All files on ha1 were
> > > gone after I connected this cable. I don't know what is going on.
> > > This isn't the first time I have installed DRBD. I have done it
> > > before, and it was OK then.
> > >
> > > Which commands can I use to force the data to sync from ha1 to ha2
> > > when ha1 and ha2 are first connected, so that I can avoid this
> > > mistake?
> > 
> > what do you mean by "gone"? where exactly have "all files" been before?
> 
> 
> DRBD is very good software for HA, and it replicates data automatically.
> I just recover the host, DRBD takes care of HA, and I don't have to
> worry about anything.
>
> When DRBD is initialized for the first time, it automatically syncs data
> from primary to secondary. But sometimes it syncs data from secondary to
> primary, if I get some step wrong while starting DRBD.

drbd will _never_ sync to a node that is in primary state.
it is possible to make a sync target primary,
but it is not possible to make a primary into a sync target.
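A toy model of that rule may make it easier to see. The names `Role` and `after_handshake` are made up for illustration; the sketch assumes the sync direction has already been decided and ignores the case where no resync is needed at all.

```python
from enum import Enum


class Role(Enum):
    PRIMARY = "Primary"
    SECONDARY = "Secondary"


def after_handshake(local_role: Role, local_is_sync_target: bool) -> str:
    """Return what a node does once the sync direction has been decided.

    A node that is currently Primary must never become SyncTarget, so the
    connection is dropped instead (DRBD logs "Current Primary shall become
    sync TARGET!  Aborting to prevent data corruption.").
    """
    if local_is_sync_target and local_role is Role.PRIMARY:
        return "abort"
    if local_is_sync_target:
        # A Secondary may become SyncTarget; it may even be promoted later,
        # while the resync is still running.
        return "sync-target"
    return "sync-source"


print(after_handshake(Role.PRIMARY, True))    # abort
print(after_handshake(Role.SECONDARY, True))  # sync-target
```

The asymmetry is the point: the check happens when the sync direction is chosen, not when a node is promoted.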

don't confuse what you call the boxes with which state they are in.

> I have installed DRBD many times, and it has worked well. I know I did
> something wrong this time. Does DRBD provide a command to sync data
> from primary to secondary manually, and to make sure the data is
> replicated first, to prevent human error?

why not read the manpage?
drbdadm invalidate
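As the drbdadm man page describes it, `invalidate` makes DRBD consider the complete local disk out of sync and copy every block from the peer. So it is run on the node whose data should be discarded: here ha2, e.g. `drbdadm invalidate r0` for the resource `r0` named in the haresources line further down. The Python below only models that effect on a list of blocks; `invalidate` and `resync` are hypothetical helpers, not DRBD code.

```python
from __future__ import annotations


def invalidate(n_blocks: int) -> set[int]:
    """Model of `drbdadm invalidate` on the node whose data is to be discarded:
    every local block is marked out of sync, so this node becomes SyncTarget."""
    return set(range(n_blocks))


def resync(target: list[bytes], source: list[bytes], out_of_sync: set[int]) -> None:
    """Copy each out-of-sync block from the SyncSource over the SyncTarget."""
    for block in sorted(out_of_sync):
        target[block] = source[block]
    out_of_sync.clear()


# ha1 holds the data to keep; ha2 is the node `drbdadm invalidate r0` is run on.
ha1 = [b"file-a", b"file-b", b"file-c"]
ha2 = [b"\x00"] * len(ha1)
dirty = invalidate(len(ha2))
resync(ha2, ha1, dirty)
print(ha2 == ha1)  # True
```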

> ha1: /dev/sdb1 : there are many files in it before started DRBD.
> ha2: /dev/sda4 : did 'mkfs -t ext3 /dev/sda4' before started DRBD.

I really hope you use external meta data.
even then, you should not touch the lower-level devices directly.
so, if you want to create a file system, you run "mkfs /dev/drbd0"
(obviously this is only possible while Primary).

on the other node (the to-be sync target), whatever you do to the disk
is overwritten by the synchronization anyway, so it is pointless.
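The two rules above (format the DRBD device rather than the lower-level disk, and only on the Primary) can be written as a pre-flight check. `mkfs_allowed` is a hypothetical helper, and the device-name pattern assumes the `/dev/drbd0`, `/dev/drbd1`, ... naming used in this thread.

```python
import re

# Assumption: DRBD block devices are named /dev/drbd0, /dev/drbd1, ...
DRBD_DEVICE = re.compile(r"/dev/drbd\d+")


def mkfs_allowed(device: str, role: str) -> bool:
    """Hypothetical check before running mkfs.

    Format only the DRBD device (never a lower-level disk such as
    /dev/sda4), and only on the node that is currently Primary.
    """
    return DRBD_DEVICE.fullmatch(device) is not None and role == "Primary"


print(mkfs_allowed("/dev/sda4", "Secondary"))  # False
print(mkfs_allowed("/dev/drbd0", "Primary"))   # True
```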

> I started DRBD on both nodes, and DRBD synced data from ha2 /dev/sda4 to
> ha1 /dev/sdb1, wiping /dev/sdb1 on ha1. I wanted it to sync from ha1 to ha2.
>  
> > where is the drbd mounted?
> 
> 
> on ha1 {
> device /dev/drbd0;
> disk /dev/sdb1;
> address 192.168.0.192:7788;
> meta-disk internal;
> }

this has nothing to do with the question (mount point?),
but it is interesting nonetheless.
to quote from the example drbd.conf:

# NOTE that if you do not have some dedicated partition to use for
# the meta-data, you may use 'internal' meta-data.
#
#	THIS HOWEVER WILL DESTROY THE LAST 128M
#	OF THE LOWER LEVEL DEVICE.
#
# So you better make sure you shrink the filesystem by 128M FIRST!
# or by 132M just to be sure... :)

do you notice something?
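The arithmetic behind that warning, as a sketch: it assumes the internal meta data occupies exactly the last 128 MiB (131072 KB) of the lower-level device, as the drbd.conf comment says, and works in KB like the DRBD log lines. The partition size is made up. A file system that spans the whole partition, which is what you get when mkfs runs on the bare partition before DRBD is set up, fails the check.

```python
KIB_PER_MIB = 1024
INTERNAL_MD_KB = 128 * KIB_PER_MIB  # meta data area at the end of the lower device
SAFE_SHRINK_KB = 132 * KIB_PER_MIB  # "or by 132M just to be sure"


def usable_kb(lower_device_kb: int) -> int:
    """Space left for the file system when meta-disk is 'internal'."""
    return lower_device_kb - INTERNAL_MD_KB


def fs_survives(fs_kb: int, lower_device_kb: int) -> bool:
    """An existing file system stays intact only if it ends before the meta data."""
    return fs_kb <= usable_kb(lower_device_kb)


lower = 1_000_000  # hypothetical partition size in KB
print(fs_survives(lower, lower))                   # False: fs fills the partition
print(fs_survives(lower - SAFE_SHRINK_KB, lower))  # True: shrunk by 132M first
```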

> > do you have heartbeat configured?
> > did your actions interfere with heartbeat actions?
> 
> 
> Yes, I have heartbeat configured.
> ---------------------- haresources --------------------------------------------
> ha1 drbddisk::r0 Filesystem::/dev/drbd0::/raid::ext3 210.59.22.14 haj

so you told heartbeat to mount /dev/drbd0 on /raid.
btw, you probably should list the IP address first on the resource line.
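With the IP address moved to the front, the line would read `ha1 210.59.22.14 drbddisk::r0 Filesystem::/dev/drbd0::/raid::ext3 haj` (assuming `haj` belongs to the same resource line). Heartbeat starts the resources on a haresources line from left to right and stops them from right to left; the sketch below, with made-up helper names, only shows what that ordering means for this line.

```python
from __future__ import annotations


def parse_haresources(line: str) -> tuple[str, list[str]]:
    """Split a haresources line into the preferred node and its resources."""
    node, *resources = line.split()
    return node, resources


def start_order(resources: list[str]) -> list[str]:
    # heartbeat acquires the resources from left to right ...
    return list(resources)


def stop_order(resources: list[str]) -> list[str]:
    # ... and releases them from right to left.
    return list(reversed(resources))


node, res = parse_haresources(
    "ha1 210.59.22.14 drbddisk::r0 Filesystem::/dev/drbd0::/raid::ext3 haj"
)
print(node)              # ha1
print(start_order(res))  # IP address first
print(stop_order(res))   # IP address last
```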

> > please provide some log messages from that event.
> 
> Sep 10 14:19:56 ha1 kernel: drbd: initialised. Version: 0.7.10(api:77/proto:74)
> Sep 10 14:19:56 ha1 kernel: drbd: SVN Revision: 1743 build by root at baumeister, 2005-01-31 13:08:05
> Sep 10 14:19:56 ha1 kernel: drbd: registered as block device major 147
> Sep 10 14:19:56 ha1 kernel: drbd0: Creating state block
> drbd0: resync bitmap: bits=137184413 words=4287014
> drbd0: size = 523 GB (548737652 KB)
> drbd0: Assuming that all blocks are out of sync (aka FullSync)
> drbd0: 548737652 KB now marked out-of-sync by on disk bit-map.
> drbd0: drbdsetup [2809]: cstate Unconfigured --> StandAlone
> drbd0: drbdsetup [2844]: cstate StandAlone --> Unconnected
> drbd0: drbd0_receiver [2845]: cstate Unconnected --> WFConnection
> drbd0: Secondary/Unknown --> Primary/Unknown
> drbd0: Primary/Unknown --> Secondary/Unknown
> drbd0: Secondary/Unknown --> Primary/Unknown
> 
> Sep 10 14:24:58 ha1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
> drbd0: Connection established.
> drbd0: I am(P): 1:00000002:00000001:00000001:00000002:10
> drbd0: Peer(S): 1:00000006:00000004:00000027:00000006:01
> drbd0: Current Primary shall become sync TARGET!  Aborting to prevent data corruption.

ok. I guess you understand plain English.

> drbd0: drbd0_receiver [2845]: cstate WFReportParams --> StandAlone
> drbd0: error receiving ReportParams, l: 72!
> drbd0: worker terminated
> drbd0: asender terminated
> drbd0: drbd0_receiver [2845]: cstate StandAlone --> StandAlone
> drbd0: Connection lost.
> drbd0: receiver terminated
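Both handshakes in these logs can be reproduced with a simplified model. It assumes, without checking the 0.7 source, that the colon-separated fields are a consistency flag, four hexadecimal generation counters and two flag bits, and that the node with the lexicographically smaller counters becomes SyncTarget. The real comparison in DRBD is more involved; `parse` and `local_becomes_sync_target` are hypothetical names.

```python
from __future__ import annotations

import re

# Assumed layout of the 0.7 handshake lines, e.g.
#   "drbd0: I am(P): 1:00000002:00000001:00000001:00000002:10"
# role in parentheses, a consistency flag, four hex counters, two flag bits.
GC_LINE = re.compile(
    r"\((?P<role>[PS])\): (?P<consistent>[01])"
    r":(?P<c1>[0-9a-f]{8}):(?P<c2>[0-9a-f]{8})"
    r":(?P<c3>[0-9a-f]{8}):(?P<c4>[0-9a-f]{8}):(?P<flags>[01]{2})"
)


def parse(line: str) -> tuple[str, tuple[int, ...]]:
    """Return the role and the comparable part of a generation-count line."""
    m = GC_LINE.search(line)
    if m is None:
        raise ValueError(f"not a generation-count line: {line!r}")
    counters = tuple(int(m[f"c{i}"], 16) for i in range(1, 5))
    return m["role"], (int(m["consistent"]),) + counters


def local_becomes_sync_target(me: str, peer: str) -> bool:
    """Simplified rule: the side with the smaller tuple syncs from the other."""
    return parse(me)[1] < parse(peer)[1]


first_me = "drbd0: I am(P): 1:00000002:00000001:00000001:00000002:10"
first_peer = "drbd0: Peer(S): 1:00000006:00000004:00000027:00000006:01"
print(parse(first_me)[0], local_becomes_sync_target(first_me, first_peer))  # P True
```

On the first attempt ha1 was Primary and would have become SyncTarget, hence the abort. With the second pair of lines below (both `S`), the same comparison makes ha1 SyncTarget, which is what the log then shows.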


next try.


> Sep 10 14:32:50 ha1 drbd: Starting DRBD resources: 
> Sep 10 14:32:50 ha1 kernel: drbd: initialised. Version: 0.7.10(api:77/proto:74)
> Sep 10 14:32:50 ha1 kernel: drbd: SVN Revision: 1743 build by root at baumeister, 2005-01-31 13:08:05
> Sep 10 14:32:50 ha1 kernel: drbd: registered as block device major 147
> Sep 10 14:32:50 ha1 kernel: drbd0: resync bitmap: bits=137184413 words=4287014
> drbd0: size = 523 GB (548737652 KB)
> drbd0: 523 GB marked out-of-sync by on disk bit-map.
> drbd0: Found 6 transactions (324 active extents) in activity log.
> drbd0: drbdsetup [2573]: cstate Unconfigured --> StandAlone
> drbd0: drbdsetup [2586]: cstate StandAlone --> Unconnected
> drbd0: drbd0_receiver [2587]: cstate Unconnected --> WFConnection
> drbd0: drbd0_receiver [2587]: cstate WFConnection --> WFReportParams
> drbd0: Handshake successful: DRBD Network Protocol version 74
> drbd0: Connection established.
> drbd0: I am(S): 1:00000002:00000001:00000002:00000002:00
> drbd0: Peer(S): 1:00000006:00000004:00000027:00000006:00

ok, this time it is Secondary/Secondary,
so you just forced it to sync your freshly made, empty file system over
to the previously populated one.
your bad.

> Sep 10 14:33:03 ha1 kernel: drbd0: drbd0_receiver [2587]: cstate WFReportParams --> WFBitMapT
> Sep 10 14:33:03 ha1 kernel: drbd0: Secondary/Unknown --> Secondary/Secondary
> Sep 10 14:33:03 ha1 rc: Starting drbd: succeeded
> Sep 10 14:33:04 ha1 kernel: drbd0: drbd0_receiver [2587]: cstate WFBitMapT --> SyncTarget
> Sep 10 14:33:04 ha1 kernel: drbd0: Resync started as SyncTarget (need to sync 548737652 KB [137184413 bits set]).
> Sep 10 14:33:32 ha1 kernel: drbd0: Secondary/Secondary --> Primary/Secondary 

too late: you made the node that was already the sync target Primary,
thus you are accessing the remote data.
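A minimal sketch of what "accessing the remote data" means, assuming a simplified model in which a Primary that is still SyncTarget serves every not-yet-resynced block from the peer, and the resync then overwrites the local copy. `PromotedSyncTarget` is a made-up name, and writes are left out.

```python
from __future__ import annotations


class PromotedSyncTarget:
    """Toy model of a node made Primary while it is still SyncTarget."""

    def __init__(self, local: list[bytes], peer: list[bytes]) -> None:
        self.local = local
        self.peer = peer
        # Full sync: every block starts out of sync.
        self.out_of_sync = set(range(len(local)))

    def read(self, block: int) -> bytes:
        # Blocks that have not been resynced yet are read from the peer
        # (the SyncSource), not from the stale local disk.
        if block in self.out_of_sync:
            return self.peer[block]
        return self.local[block]

    def resync_block(self, block: int) -> None:
        # The resync overwrites the local copy with the peer's data.
        self.local[block] = self.peer[block]
        self.out_of_sync.discard(block)


ha1 = PromotedSyncTarget(local=[b"old files"], peer=[b"empty ext3"])
print(ha1.read(0))  # b'empty ext3'
ha1.resync_block(0)
print(ha1.local[0])  # b'empty ext3'
```

Either way the application on ha1 sees the peer's (empty) file system, and once the resync finishes the old local data is gone.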

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
__
please use the "List-Reply" function of your email client.


