[DRBD-user] drbd device not ready on reboot

Tue Jan 27 07:10:20 CET 2009

On Tue, Jan 27, 2009 at 7:06 AM, Michael Grant <mgrant at grant.org> wrote:
> On Fri, Jan 23, 2009 at 5:26 PM, Lars Ellenberg
> <lars.ellenberg at linbit.com> wrote:
>> On Fri, Jan 23, 2009 at 02:46:10PM +0100, Michael Grant wrote:
>>> I'm in the process of building my cluster.  I have one node up named
>>> a.example.org.  However, when I boot this machine (a), the boot
>>> process blocks with this message:
>>>
>>>  DRBD's startup script waits for the peer node(s) to appear.
>>>  - In case this node was already a degraded cluster before the
>>>    reboot the timeout is 120 seconds. [degr-wfc-timeout]
>>>  - If the peer was available before the reboot the timeout will
>>>    expire after 0 seconds. [wfc-timeout]
>>>    (These values are for resource 'vm1-root'; 0 sec -> wait forever)
>>>  To abort waiting enter 'yes' [  340]:
>>>
>>> In my case, the peer (I assume b is the peer) has never been
>>> available.  Therefore, this node should have been a degraded cluster
>>> before the reboot, hence, the timeout should be 120 seconds, but the
>>> counter keeps on ticking and ticking, so it seems there's a problem
>>> here.
>>>
>>> Did I do something wrong?  How do I convince a that it is a degraded
>>> cluster and to come up with my drbd devices ready?
>>
>> misconception about when degr-wfc-timeout is used.
>> please read
>> http://thread.gmane.org/gmane.linux.network.drbd/15849/focus=15854
>
> Thank you. Ok, I see the logic of this behavior now.  Until I have my
> two nodes fully online for the first time I need to temporarily set
> degr-wfc-timeout to 0.
>
> Now, for the moment, when I type 'yes' to abort waiting and the
> machine fully comes up, when I try to mount one of these
> never-yet-degraded resources, I get this error:
>
>    mount: block device /dev/drbd0 is write-protected, mounting read-only
>    mount: Wrong medium type
>
> I find that I cannot mount this resource until I again do:
>
>    drbdadm -- --overwrite-data-of-peer primary vm2-root
>
> Why do I need to do this each time I boot?  Is there some way to avoid
> this until I get my other half of the resource up?
>
> Michael Grant
>
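
For reference, both timeouts named in the boot message are set in the startup section of drbd.conf. The fragment below is only a sketch: the value of 120 seconds for wfc-timeout is an assumption chosen for illustration, not something taken from this thread. Per the boot message, 0 means "wait forever", so a nonzero wfc-timeout is what bounds the wait when the node was not already degraded before the reboot.

```
resource vm1-root {
  startup {
    # Assumed value: give up waiting for the peer after 120 seconds.
    # 0 would mean "wait forever".
    wfc-timeout      120;
    # Applies only if this node was already a degraded cluster before the reboot.
    degr-wfc-timeout 120;
  }
  # rest of the resource definition unchanged
}
```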

This might provide some more info. After a reboot it looks like this:

[#838] cat /proc/drbd
version: 8.0.14 (api:86/proto:86)
GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by phil@fat-tyre, 2008-11-12 16:40:33
 0: cs:WFConnection st:Secondary/Unknown ds:UpToDate/DUnknown C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:68 lo:0 pe:0 ua:0 ap:0
        resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
 1: cs:WFConnection st:Secondary/Unknown ds:UpToDate/DUnknown C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:43 lo:0 pe:0 ua:0 ap:0
        resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

[#839] drbdadm -- --overwrite-data-of-peer primary all

[#840] cat /proc/drbd
version: 8.0.14 (api:86/proto:86)
GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by phil@fat-tyre, 2008-11-12 16:40:33
 0: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:68 lo:0 pe:0 ua:0 ap:0
        resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
 1: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:43 lo:0 pe:0 ua:0 ap:0
        resync: used:0/61 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
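
The only fields that differ between the two listings above are the st: values (Secondary/Unknown before the drbdadm call, Primary/Unknown after). A small helper can pull out just the connection state, role, and disk state per device. This is a sketch that assumes the 8.0-style /proc/drbd layout shown here; drbd_states is a made-up name, not part of DRBD.

```shell
# Hypothetical helper (not part of DRBD): print "minor cs st ds" for each
# device line of /proc/drbd-style text read from stdin.
# Assumes the 8.0 layout shown above, e.g. " 0: cs:... st:... ds:... C r---".
drbd_states() {
    awk '/^ *[0-9]+: cs:/ {
        minor = $1
        sub(/:$/, "", minor)              # "0:" -> "0"
        cs = st = ds = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^cs:/) cs = substr($i, 4)   # connection state
            if ($i ~ /^st:/) st = substr($i, 4)   # role: local/peer
            if ($i ~ /^ds:/) ds = substr($i, 4)   # disk state: local/peer
        }
        print minor, cs, st, ds
    }'
}
```

It would be run as "drbd_states < /proc/drbd". For device 0 in the first listing this prints "0 WFConnection Secondary/Unknown UpToDate/DUnknown".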