[DRBD-user] Re: DRBD doing full resync on configuration with two masters

Tue Apr 1 16:24:44 CEST 2008

posting to drbd-user is subscribers-only.
but I'll answer this anyways :-)

On Tue, Apr 01, 2008 at 11:42:19AM +0200, Rafał Krypa wrote:
> Hi.
> I'm experiencing some problems with DRBD and would like to ask you for
> advice.
> My configuration is somewhat exotic. I've configured one of the nodes
> (let's call it slave) to set up the DRBD device in the initrd and use it
> as the root device, mounted read-only. The second node (let's call it
> master) usually does not mount the DRBD device at all. I've set
> 'allow-two-primaries' and 'become-primary-on both' in the config file,
> despite using the ReiserFS file system on top of it, because the purpose
> of this setup is to maintain the file system on the central server
> (master node), keeping it up to date on the slave and allowing the slave
> to start even when the master is down (I think of it as a kind of nfsroot
> with local cache).
> I am aware that if I mount and change the file system on the master node,
> it will become inconsistent with the caches on the slave, but that's fine,
> I can accept rebooting the slave machine after making updates, as it won't
> happen often.

this is a good recipe for complete data destruction.  do NOT, under ANY
circumstances, do "two-primary" with a conventional file system.
really.
don't.

if you insist on using DRBD as a sort of netboot image, and you want to
update it on your netboot server, shut down the netboot client, or
disconnect, before you mount the image on the netboot server for the update.
everything else is a sure way to corruption.

> However, this configuration is giving me unexpected problems. When I
> reboot the slave node, during its startup sequence, almost the whole device
> is resynchronized from the master. With DRBD running on top of a 1GB LV
> with 'flexible-meta-disk  internal' it wants to synchronize 708608 KB,
> even though the data was not altered at all on either node.
> The messages from DRBD, as seen on the master node, are as follows:

why don't you use an established way to net-boot a "live" image?

there are plenty of working projects. at least have a look at
 LTSP http://www.ltsp.org/
or the (unfortunately not really forthcoming)
 http://fedoraproject.org/wiki/StatelessLinux/CachedClient
also, this may be interesting:
 http://w3.linux-magazine.com/issue/70/Network_Block_Devices.pdf

just use the right tool for the job.
DRBD-based netboot clients are probably a bad idea.
even if you'd get it working properly,
and deal correctly with the implications of coherence,
it would still not scale to multiple clients.

> ### Now the slave node went down
> drbd1: PingAck did not arrive in time.
> drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure )

hard reboot of drbd netboot client.

> ### Now the slave node set up its DRBD device during the initrd sequence
> drbd1: Handshake successful: DRBD Network Protocol version 86
> drbd1: conn( WFConnection -> WFReportParams ) 
> drbd1: Starting asender thread (from drbd1_receiver [4881])
> drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
> drbd1: Writing meta data super block now.
> drbd1: peer( Secondary -> Primary ) 
> drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent ) 
> drbd1: Began resync as SyncSource (will sync 708608 KB [177152 bits set]).
> drbd1: Writing meta data super block now.
> drbd1: local disk flush failed with status -5
> drbd1: Resync done (total 104 sec; paused 0 sec; 6812 K/sec)
> drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate ) 
> drbd1: Writing meta data super block now.
>
> I would like to avoid this costly synchronization, but don't know how.
> Is it caused by the slave node not stopping its DRBD resource cleanly
> while going down

exactly. the way you are doing this,
the netboot client is a crashed primary,
and needs to do the crashed primary recovery.
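The figures in the quoted log fit that explanation. As a rough check, assuming DRBD 8's bitmap tracks 4 KiB per bit and taking the 4 MiB activity-log extent size from the drbd.conf comment quoted further down, the dirty bits convert to exactly the reported resync size, and to a whole number of activity-log extents, which is what resyncing the "hot" extents after a crashed primary would produce. The constants and helper names below are illustrative assumptions, not values read from DRBD itself:

```python
# Sanity check of the resync size in the quoted log.
# ASSUMPTIONS: one bitmap bit covers 4 KiB (DRBD 8 bitmap granularity),
# one activity-log (AL) extent covers 4 MiB (per the quoted drbd.conf comment).
# The helper names are hypothetical, for illustration only.

BITMAP_BLOCK_KIB = 4           # KiB covered by one bitmap bit
AL_EXTENT_KIB = 4 * 1024       # KiB covered by one activity-log extent


def resync_kib(bits_set: int) -> int:
    """Amount of data (KiB) covered by the given number of dirty bits."""
    return bits_set * BITMAP_BLOCK_KIB


def bits_to_extents(bits_set: int) -> float:
    """Number of AL extents that many dirty bits correspond to."""
    bits_per_extent = AL_EXTENT_KIB // BITMAP_BLOCK_KIB  # 1024 bits per extent
    return bits_set / bits_per_extent


bits = 177152                   # from "Began resync ... [177152 bits set]"
print(resync_kib(bits))         # 708608, matching "will sync 708608 KB"
print(bits_to_extents(bits))    # 173.0 whole extents, below al-extents 257
```

A result that lands on a whole number of extents is consistent with the resync being driven by the activity log rather than by scattered writes.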

> (it cannot, as /dev/drbd0 is still being used as its
> root device)?

well. you could have /bin /sbin and whatever else is needed
copied into some ramdisk, before you mounted 

> But why so many bits are set if the device is not even
> mounted read/write on any node?
> I am attaching full configuration file.

you could have at least stripped the garbage and comments.
but since you left them in, I'll just quote them back:

>     # Configures the size of the active set. Each extent is 4M,
>     # 257 Extents ~> 1GB active set size. In case your syncer
        ^^^^^^^^^^^^^^^^^^
>     # runs @ 10MB/sec, all resync after a primary's crash will last
>     # 1GB / ( 10MB/sec ) ~ 102 seconds ~ One Minute and 42 Seconds.
>     # BTW, the hash algorithm works best if the number of al-extents
>     # is prime. (To test the worst case performance use a power of 2)
>     al-extents 257;
>   }

for more information about what exactly the "al-extents" are,
see http://www.drbd.org/users-guide/s-activity-log.html
or http://www.drbd.org/fileadmin/drbd/publications/drbd8.linux-conf.eu.2007.pdf
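The arithmetic in that config comment can be redone as a small Python sketch. It assumes, as the comment does, that in the worst case every extent in the activity log has to be resynced after a primary crash, and it uses the comment's loose "MB" units. The function name and the 127-extent comparison are hypothetical, chosen only for illustration:

```python
# Worst-case resync time after a primary crash, following the quoted
# drbd.conf comment: active set = al-extents * 4 MB, divided by sync rate.
# ASSUMPTION: every AL extent is dirty (the comment's worst case).

AL_EXTENT_MB = 4  # size of one activity-log extent, per the quoted comment


def worst_case_resync_seconds(al_extents: int, sync_rate_mb_s: float) -> float:
    """Seconds to resync a fully populated activity log at the given rate."""
    active_set_mb = al_extents * AL_EXTENT_MB
    return active_set_mb / sync_rate_mb_s


# al-extents 257 at 10 MB/s: 1028 MB / 10 MB/s
print(worst_case_resync_seconds(257, 10))   # 102.8, the comment's "~102 seconds"

# Hypothetical smaller active set, for comparison
print(worst_case_resync_seconds(127, 10))   # 50.8

# The resync actually observed in the quoted log: 708608 KB at 6812 K/sec
print(round(708608 / 6812))                 # 104, matching "total 104 sec"
```

A smaller active set bounds the resync after a crash, at the cost of more frequent meta-data updates during normal writes; the activity-log documentation linked above discusses that trade-off.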

enough homework for today?
 ;)

-- 
: Lars Ellenberg                           http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe     Fax +43-1-8178292-82 :
__
please use the "List-Reply" function of your email client.