[DRBD-user] Recover node after hard-crash (drbd9)

Robert Altnoeder robert.altnoeder at linbit.com
Tue Jun 30 11:20:06 CEST 2015

Hello Christiaan,

thank you for your detailed report.

On a test cluster, I was able to reproduce the situation where node A 
stays "Inconsistent" after resyncing from node B, and node C refuses to 
connect to node A. This seems to be a generic problem, not related to 
the hard crash of node A, but related to the fact that node A is "weakly 
connected" at the time of the resync ("weak" meaning that it is not 
directly connected to the "Primary", which would be impossible in this 
case, because the primary node is a diskless client).

We are currently investigating this issue.
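The symptom described above (local disk stuck at Inconsistent while a peer such as node C never gets past Connecting) can be spotted mechanically. The Python sketch below is not part of drbd-utils: `parse_status` and `stuck_peers` are hypothetical helpers. The parser assumes only the indentation layout visible in the `drbdadm status` excerpt quoted in the original report, and it assumes that a peer line without a `connection:` field means the peer is connected.

```python
"""Sketch: flag 'Inconsistent disk + unconnected peer' in `drbdadm status` text.

Assumed layout (taken from the excerpt in this thread, not a spec):
resource line at column 0, local and peer lines indented by two spaces,
per-peer replication details indented by four spaces.
"""
from dataclasses import dataclass, field


@dataclass
class PeerState:
    name: str
    fields: dict[str, str] = field(default_factory=dict)


@dataclass
class ResourceState:
    name: str
    fields: dict[str, str] = field(default_factory=dict)
    peers: list[PeerState] = field(default_factory=list)


def _pairs(tokens):
    """Turn ['key:value', ...] tokens into a dict, skipping tokens without ':'."""
    out = {}
    for tok in tokens:
        key, sep, value = tok.partition(":")
        if sep:
            out[key] = value
    return out


def parse_status(text: str) -> list[ResourceState]:
    resources: list[ResourceState] = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        tokens = raw.split()
        if indent == 0:
            # e.g. "testvm_prolocation_net role:Secondary"
            resources.append(ResourceState(tokens[0], _pairs(tokens[1:])))
        elif indent == 2 and resources:
            if ":" in tokens[0]:
                # local line, e.g. "disk:Inconsistent"
                resources[-1].fields.update(_pairs(tokens))
            else:
                # peer line, e.g. "cluster1-c... connection:Connecting"
                resources[-1].peers.append(PeerState(tokens[0], _pairs(tokens[1:])))
        elif indent >= 4 and resources and resources[-1].peers:
            # replication details belong to the most recently seen peer
            resources[-1].peers[-1].fields.update(_pairs(tokens))
    return resources


def stuck_peers(res: ResourceState) -> list[str]:
    """Peers that are not connected while the local disk is Inconsistent."""
    if res.fields.get("disk") != "Inconsistent":
        return []
    # Assumption: a missing "connection:" field means the peer is connected.
    return [p.name for p in res.peers
            if p.fields.get("connection", "Connected") != "Connected"]


SAMPLE = """\
testvm_prolocation_net role:Secondary
  disk:Inconsistent
  cluster1-b.storage.as41887.net role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:91.61
  cluster1-c.storage.as41887.net connection:Connecting
"""

if __name__ == "__main__":
    for res in parse_status(SAMPLE):
        print(res.name, stuck_peers(res))
```

Run against the excerpt from the report, this flags only cluster1-c, matching the behaviour where node C refuses to connect to node A.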

By the way, recovery of a failed resource (such as resolving a split 
brain) can be done directly through the drbd-utils without having to 
perform additional actions in drbdmanage, so your use of drbdmanage and 
the drbd-utils in the logs you posted appears to be correct.
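As a rough illustration of what resolving a split brain "directly through the drbd-utils" involves, the Python sketch below assembles the manual recovery sequence described in the DRBD User's Guide. `split_brain_recovery` is a hypothetical helper that only builds and prints the `drbdadm` command lines; it does not run them. Choosing which node becomes the "victim" (the one whose changes are discarded) remains an administrative decision.

```python
"""Sketch: build (not run) the manual split-brain recovery commands for drbdadm.

The victim discards its own changes; the survivor keeps its data and only
reconnects. Nothing here talks to drbdmanage.
"""
import shlex


def split_brain_recovery(resource: str) -> dict[str, list[list[str]]]:
    """Return the drbdadm invocations for the victim and the survivor node."""
    return {
        # Node whose changes are thrown away: drop the connection,
        # demote, then reconnect while discarding local data.
        "victim": [
            ["drbdadm", "disconnect", resource],
            ["drbdadm", "secondary", resource],
            ["drbdadm", "connect", "--discard-my-data", resource],
        ],
        # Node whose data is kept: reconnect, needed only if it is
        # also in StandAlone state.
        "survivor": [
            ["drbdadm", "connect", resource],
        ],
    }


if __name__ == "__main__":
    plan = split_brain_recovery("testvm_prolocation_net")
    for node, commands in plan.items():
        for argv in commands:
            # shlex.join (Python 3.8+) quotes each argument for a POSIX shell
            print(f"{node}: {shlex.join(argv)}")
```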

Best regards,
-- 
: Robert Altnoeder
: LINBIT | Your Way to High Availability
:
: http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT.


On 06/29/2015 09:26 PM, Christiaan den Besten wrote:
> Hi !
>
> I just posted this on the drbd-dev list, but I think it is better shared with others on drbd-user:
>
> I just started testing the (official) drbd9 modules and tools on three CentOS 7 VMs (running on the Xen hypervisor) in PV mode.
>
> My DRBD ‘cluster’ has 3 nodes:
>
> cluster1-a.storage.as41887.net
> cluster1-b.storage.as41887.net
> cluster1-c.storage.as41887.net
>
> [...snip...]
>
> [root@cluster1-a ~]# drbdadm status | grep 'testvm_prolocation_net' -A 6
> testvm_prolocation_net role:Secondary
>   disk:Inconsistent
>   cluster1-b.storage.as41887.net role:Secondary
>     replication:SyncTarget peer-disk:UpToDate done:91.61
>   cluster1-c.storage.as41887.net connection:Connecting
>
> [...snip...]
>
> The steps above were a replay I did today after having seen this behaviour last night, so it seems to be reproducible. I am also wondering whether the ‘recover’ steps should be done through drbdmanage as well. I can’t seem to find any complete documentation on drbdmanage besides the man page.
>
> Yours,
> Chris
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user



