[DRBD-user] Help, cannot get drbd processes to connect between two nodes

Doug Knight dknight@wsi.com
Thu May 3 22:08:17 CEST 2007



Note: if I change the order in which I bring drbd up (drbdadm down on
both nodes, then bring up node2 first), the cs states are reversed
(node1 is StandAlone and node2 is WFConnection).

Doug
WSI, Inc.
On Thu, 2007-05-03 at 16:03 -0400, Doug Knight wrote:

> I'm not sure where to start on this one. I've been working with drbd
> and heartbeat, trying to track down an issue where one of the two
> nodes doesn't fail over resources correctly when heartbeat is shut
> down. I discovered that at some point drbd stopped talking across my
> dedicated network link, and even manually I cannot get the two nodes
> to see each other through drbd. Pings across the link work fine in
> both directions. I completely unloaded and reloaded the drbd kernel
> modules, which corrected this issue the last time I saw it, but not
> this time. I've rebooted one of the nodes, but I can't reboot the
> other yet (other activity on it requires scheduling the reboot). Can
> someone point me down a troubleshooting road to determine why drbd
> doesn't reconnect? Here's what /proc/drbd looks like on each node
> after the usual (modprobe drbd; service drbd start) sequence:
> 
> Node1
> [root@arc-dknightlx ~]# modprobe drbd
> [root@arc-dknightlx ~]# service drbd start
> Starting DRBD resources:    [ d0 s0 n0 ].
> ..........
> ***************************************************************
> DRBD's startup script waits for the peer node(s) to appear.
> - In case this node was already a degraded cluster before the
>    reboot the timeout is 60 seconds. [degr-wfc-timeout]
> - If the peer was available before the reboot the timeout will
>    expire after 0 seconds. [wfc-timeout]
>    (These values are for resource 'pgsql'; 0 sec -> wait forever)
> To abort waiting enter 'yes' [  12]:yes
> 
> [root@arc-dknightlx ~]# cat /proc/drbd
> version: 8.0.1 (api:86/proto:86)
> SVN Revision: 2784 build by root@arc-dknightlx, 2007-04-23 13:19:33
> 0: cs:WFConnection st:Secondary/Unknown ds:UpToDate/DUnknown C r---
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
>         resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
>         act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
> 
> 
> Node2
> [root@arc-tkincaidlx log]# modprobe drbd
> [root@arc-tkincaidlx log]# service drbd start
> Starting DRBD resources:    [ d0 s0 n0 ].
> ..........
> ***************************************************************
> DRBD's startup script waits for the peer node(s) to appear.
> - In case this node was already a degraded cluster before the
>    reboot the timeout is 60 seconds. [degr-wfc-timeout]
> - If the peer was available before the reboot the timeout will
>    expire after 0 seconds. [wfc-timeout]
>    (These values are for resource 'pgsql'; 0 sec -> wait forever)
> To abort waiting enter 'yes' [  12]:yes
> 
> [root@arc-tkincaidlx log]# cat /proc/drbd
> version: 8.0.1 (api:86/proto:86)
> SVN Revision: 2784 build by root@arc-tkincaidlx.wsicorp.com, 2007-04-23 13:20:47
> 0: cs:StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown   r---
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
>         resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
>         act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
> 
> Any help would be greatly appreciated. 
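Since the end goal here is heartbeat integration, it may also help to read the connection state from a script rather than eyeballing /proc/drbd. A minimal sketch, assuming the 8.0-style output shown above and device 0:

```shell
#!/bin/sh
# Print the cs: value for device 0 from /proc/drbd-style input.
# Reads stdin so it is easy to test; in practice:
#   sh cstate.sh < /proc/drbd
awk '/^ *0:/ {
    for (i = 1; i <= NF; i++)
        if ($i ~ /^cs:/) { sub(/^cs:/, "", $i); print $i }
}'
```

Fed the node2 status line above, this prints "StandAlone"; on node1 it prints "WFConnection", which a monitor script can then act on.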
> 
> _______________________________________________
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user