[DRBD-user] Setting an "outdated" disk back to "UpToDate" ?

Mon Aug 18 11:09:20 CEST 2008

Hi Lars,
thanks for this tip (and sorry in advance for the long reply). With the
command

  drbdadm -- --overwrite-data-of-peer primary db

I'm able to bring the secondary into the primary role. Now if I power on
the former primary, I get a split brain (well, as expected, I guess). I
then run "drbdadm -- --discard-my-data connect all" on the secondary, and
the changes (200 MB) are synced from the primary to the secondary;
everything works fine after that. Thanks!

The test for a simulated data-center fire, with the node recovered from
the old data center, went like this:

- Node1 = Primary
- Node2 = Standby
- Disconnect the replication link on Node1
- Heartbeat uses dopd to outdate the peer disk on Node2
- Power off Node1 (hard power off, then plug the network cable back in)
- Node2 tries to take over the resources (fails: outdated disk)
- drbdadm -- --overwrite-data-of-peer primary db on Node2
- /usr/lib/heartbeat/ResourceManager takegroup drbddisk on Node2
- Write 200 MB of data to the DRBD device (a file on the mount point)
- Power on Node1
- Split brain
- Try drbdadm -- --discard-my-data connect all on Node1
... DRBD syncs the 200 MB of changes ...
END
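The split-brain recovery step from this test could be scripted along the following lines. This is a minimal sketch, not a drbdadm feature: the function names are hypothetical, the `drbdadm secondary` step before `--discard-my-data connect` is an added precaution (the node that discards its changes should not be Primary), and `db` is the resource name used in this thread. Setting `DRBDADM="echo drbdadm"` prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helpers for manual split-brain recovery (sketch only).
# Set DRBDADM="echo drbdadm" to print the commands instead of running them.
DRBDADM=${DRBDADM:-drbdadm}

# Run on the node whose changes are to be thrown away (Node1 in the test).
# Assumption: demote it first so it is Secondary when it discards its data.
discard_local_changes() {
    res=$1
    $DRBDADM secondary "$res" || return 1
    $DRBDADM -- --discard-my-data connect "$res"
}

# Run on the surviving node (Node2) only if it has dropped to StandAlone
# and does not reconnect on its own.
reconnect_survivor() {
    $DRBDADM connect "$1"
}
```

For example, `discard_local_changes db` on Node1, followed by `reconnect_survivor db` on Node2 if Node2 stays StandAlone.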

When I run the same test with a fresh DRBD device on Node1 after the
reboot (i.e. the server burned down, and I bring in a replacement machine
restored from tape), everything syncs as expected (full sync).

The test for a simulated data-center fire, without recovering the node
from the old data center, went like this:

- Node1 = Primary
- Node2 = Standby
- Disconnect the replication link on Node2 (and keep the network cable pulled)
- Heartbeat uses dopd to outdate the peer disk on Node2
- Power off Node1 (hard power off)
- Node2 tries to take over the resources (fails: outdated disk)
- drbdadm -- --overwrite-data-of-peer primary db on Node2
- /usr/lib/heartbeat/ResourceManager takegroup drbddisk on Node2
- Write 200 MB of data to the DRBD device (a file on the mount point)
- Power on Node1 (and keep the network cable pulled)
- drbdadm down db on Node1
- drbdadm wipe-md db on Node1
- drbdadm create-md db on Node1
- drbdadm up db on Node1
... DRBD syncs ALL DATA (full sync) ...
END
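The Node1 steps above (down, wipe-md, create-md, up) could be wrapped in a small shell function. This is a hypothetical helper, not part of drbdadm; `wipe-md` and `create-md` may ask for interactive confirmation, so the sketch assumes an operator is at the console. It must only ever be run on the node whose data is known to be obsolete.

```shell
#!/bin/sh
# Hypothetical helper (sketch): throw away the DRBD meta data on a node whose
# data is obsolete, so that it rejoins and receives a full sync.
# Run ONLY on the stale node (Node1 in the test), never on the survivor.
# Set DRBDADM="echo drbdadm" to print the commands instead of running them.
DRBDADM=${DRBDADM:-drbdadm}

reinit_stale_node() {
    res=$1
    # Each step may prompt for confirmation; stop at the first failure.
    $DRBDADM down "$res"      || return 1
    $DRBDADM wipe-md "$res"   || return 1
    $DRBDADM create-md "$res" || return 1
    # As observed in the test, the node then gets a full sync from the peer.
    $DRBDADM up "$res"
}
```

Stopping at the first failing step avoids bringing the resource up on half-initialised meta data.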

Here's a short summary of this thread for the list archive:

==== Thread Summary =====
Dopd and failover
-------------------------
An administrator can use dopd to outdate the remote disk via all
available heartbeat communication channels. DRBD currently supports only
a single IP address/interface for replication. Bonding can be used to
make the replication link redundant; however, clusters usually also have
a second, fully independent link (redundant heartbeat) for
communication. This link can be used to outdate the disk on the standby
side when the replication link dies, which prevents the standby from
later going into service with stale data.

If you use dopd to outdate remote disks and the primary fails while your
standby disk is outdated, you have to run

   drbdadm -- --overwrite-data-of-peer primary RESOURCE

on the standby side to enable the resource. If your former primary comes
back with its old meta data, you will get a split-brain situation. To
resolve it, use the standard split-brain recovery (--discard-my-data
connect) on the node whose changes should be discarded.
=====================
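A hypothetical wrapper around the forced promotion described in the summary could first check that the local disk really is Outdated (and not, say, Inconsistent) before overriding DRBD's safety check. Sketch only: the function name is made up, and it assumes `drbdadm dstate RESOURCE` prints the disk states as `local/peer` (e.g. `Outdated/DUnknown`).

```shell
#!/bin/sh
# Hypothetical helper (sketch): force the standby to Primary only when its
# local disk is Outdated, i.e. dopd marked it and then the primary died.
# Assumes "drbdadm dstate RES" prints "<local>/<peer>", e.g. Outdated/DUnknown.
DRBDADM=${DRBDADM:-drbdadm}

force_primary_if_outdated() {
    res=$1
    state=$($DRBDADM dstate "$res") || return 1
    case $state in
        Outdated/*)
            # Deliberately override the outdated flag, as discussed above.
            $DRBDADM -- --overwrite-data-of-peer primary "$res"
            ;;
        *)
            # UpToDate, Inconsistent, ...: leave the decision to the operator.
            echo "not forcing $res: disk state is $state" >&2
            return 1
            ;;
    esac
}
```

Refusing anything other than Outdated keeps the override limited to the dopd scenario from this thread; an Inconsistent disk should never be promoted this way.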

Thanks again,
Robert

Lars Ellenberg wrote:
> On Fri, Aug 15, 2008 at 05:27:02PM +0200, Robert wrote:
>   
>> Ok, found a way:
>>
>> debnode2:~# drbdadm down db
>> debnode2:~# drbdadm -- :::::1:::: set-gi db
>> previously  
>> 1E65F6C2CB7D5B5C:0000000000000000:E9ACAE2A5D5F54A8:0173BFF26274A71F:1:0:0:0:0:0
>> set GI to   
>> 1E65F6C2CB7D5B5C:0000000000000000:E9ACAE2A5D5F54A8:0173BFF26274A71F:1:1:0:0:0:0
>>
>> Write new GI to disk?
>> [need to type 'yes' to confirm] yes
>>
>> debnode2:~# drbdadm up db
>> debnode2:~# /usr/lib/heartbeat/ResourceManager takegroup drbddisk
>>
>> Why not add a simple "drbdadm uptodate" to drbdadm to issue the
>> corresponding commands?
>>     
>
> drbdadm -- --overwrite-data-of-peer primary db
>
> is expected to do that for you.