[DRBD-user] Help, cannot get drbd processes to connect between two nodes

Thu May 3 22:03:09 CEST 2007

I'm not sure where to start on this one. I've been working with drbd and
heartbeat, trying to track down an issue where one of the two nodes
doesn't fail over resources correctly when heartbeat is shut down. While
investigating, I found that at some point drbd stopped talking across my
dedicated network link, and even manually I cannot get the two nodes to
see each other through drbd. Pings across the network link work fine in
both directions. I have completely unloaded and reloaded the drbd modules
from the kernel, which corrected this issue the last time I saw it, but
it didn't help this time. I've rebooted one of the nodes, but I can't
reboot the other yet (other activity on that node means the reboot has
to be scheduled). Can someone point me down a troubleshooting road to
determine why drbd doesn't reconnect? Here is how /proc/drbd looks on
each node after I've run the usual commands (modprobe drbd; service drbd
start):

Node1
[root@arc-dknightlx ~]# modprobe drbd
[root@arc-dknightlx ~]# service drbd start
Starting DRBD resources:    [ d0 s0 n0 ].
..........
***************************************************************
 DRBD's startup script waits for the peer node(s) to appear.
 - In case this node was already a degraded cluster before the
   reboot the timeout is 60 seconds. [degr-wfc-timeout]
 - If the peer was available before the reboot the timeout will
   expire after 0 seconds. [wfc-timeout]
   (These values are for resource 'pgsql'; 0 sec -> wait forever)
 To abort waiting enter 'yes' [  12]:yes

[root@arc-dknightlx ~]# cat /proc/drbd
version: 8.0.1 (api:86/proto:86)
SVN Revision: 2784 build by root@arc-dknightlx, 2007-04-23 13:19:33
 0: cs:WFConnection st:Secondary/Unknown ds:UpToDate/DUnknown C r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

Node2
[root@arc-tkincaidlx log]# modprobe drbd
[root@arc-tkincaidlx log]# service drbd start
Starting DRBD resources:    [ d0 s0 n0 ].
..........
***************************************************************
 DRBD's startup script waits for the peer node(s) to appear.
 - In case this node was already a degraded cluster before the
   reboot the timeout is 60 seconds. [degr-wfc-timeout]
 - If the peer was available before the reboot the timeout will
   expire after 0 seconds. [wfc-timeout]
   (These values are for resource 'pgsql'; 0 sec -> wait forever)
 To abort waiting enter 'yes' [  12]:yes

[root@arc-tkincaidlx log]# cat /proc/drbd
version: 8.0.1 (api:86/proto:86)
SVN Revision: 2784 build by root@arc-tkincaidlx.wsicorp.com, 2007-04-23 13:20:47
 0: cs:StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown   r---
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
        act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0
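
To make the difference between the two nodes easier to see, the connection
state (the cs: field) can be pulled out of /proc/drbd on each side. The
following is only a rough sketch: the function name drbd_cstate is made up
for illustration, and it assumes the /proc/drbd layout shown above, where
each device line starts with the minor number followed by "cs:".

```shell
# Hypothetical helper (not part of DRBD): read /proc/drbd-style text on
# stdin and print "<minor> <connection state>" for each device line.
drbd_cstate() {
    # Device lines look like " 0: cs:WFConnection st:Secondary/Unknown ..."
    sed -n 's/^ *\([0-9][0-9]*\): cs:\([A-Za-z]*\).*/\1 \2/p'
}

# Usage on a live node:
#   drbd_cstate < /proc/drbd
```

Run against the output above, this would report WFConnection for device 0
on Node1 and StandAlone for device 0 on Node2, so the two nodes are not in
the same connection state.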

Any help would be greatly appreciated.