[DRBD-user] Primary node fails to establish a connection with Secondary when drbd is started

Shaun Mccullagh Shaun.Mccullagh at espritxb.nl
Tue Feb 10 12:00:03 CET 2009

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi,

I've set up a simple two-node DRBD cluster to implement an HA-NFS solution.

We are running DRBD on CentOS 5.2, using the CentOS RPMs kmod-drbd82-8.2.6-2 and drbd82-8.2.6-1.el5.centos on both nodes.

The Primary node is running kernel 2.6.18-92.1.22.el5 and the Secondary is running kernel 2.6.18-92.1.18.el5.

When I initially set the system up both nodes were running kernel 2.6.18-92.1.18.el5.

A few weeks ago I shut down the Primary; the Secondary took over as expected and everything was hunky dory.

For various reasons I had to rebuild the Primary, which is why it is running a different kernel.
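
(In case the version skew matters: my understanding is that the loaded module and the userland tools should agree on each node. A quick check along these lines, using stock tools, should show whether the kmod matches the running kernel:)

    uname -r                 # running kernel
    cat /proc/drbd           # version of the module actually loaded
    modinfo drbd | head      # version of the module on disk
    rpm -qa | grep -i drbd   # installed drbd82/kmod-drbd82 packages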

/etc/drbd.conf looks like this on both nodes:

common { syncer { rate 100M; } }

resource r0 {
    protocol C;
    net {
        cram-hmac-alg sha1;
        shared-secret "marviQ";
    }
    on data01.tmf.qinip.net {
        device    /dev/drbd1;
        disk      /dev/data01_ext3/main_sites;
        address   172.16.1.102:7789;
        meta-disk internal;
    }
    on data00.tmf.qinip.net {
        device    /dev/drbd1;
        disk      /dev/data00_ext3/main_sites_backup;
        address   172.16.1.101:7789;
        meta-disk internal;
    }
}
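
(One thing worth ruling out, I think: 'uname -n' on each node has to match one of the 'on <hostname>' sections exactly, otherwise drbdadm treats the resource as not configured for that host. Since the Primary was rebuilt, a check along these lines on both nodes seems prudent:)

    uname -n          # must print data00.tmf.qinip.net or data01.tmf.qinip.net
    drbdadm dump r0   # re-emits the parsed config; complains if the host does not match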

I've set up a dedicated NIC on both nodes that is used for DRBD sync traffic. The nodes are connected by a crossover cable.
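
(For completeness, basic reachability over that link can be confirmed with the addresses from drbd.conf above; nothing DRBD-specific here:)

    # from data00 (Primary) towards data01 (Secondary)
    ip addr show             # confirm 172.16.1.101 sits on the sync NIC
    ping -c 3 172.16.1.102   # the peer's sync address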

When I execute 'service drbd start' on the Primary, I get this:

[root@data00 ~]# service drbd start
Starting DRBD resources:    [ ].
..........
***************************************************************
 DRBD's startup script waits for the peer node(s) to appear.
 - In case this node was already a degraded cluster before the
   reboot the timeout is 0 seconds. [degr-wfc-timeout]
 - If the peer was available before the reboot the timeout will
   expire after 0 seconds. [wfc-timeout]
   (These values are for resource 'r0'; 0 sec -> wait forever)
 To abort waiting enter 'yes' [ -- ]:[  10]:[  11]:[  12]:[  13]:[  14]:[  15]:[  16]:[  17]:[  18]:[  19]:[  20]:[  21]:[  22]:[  23]:[  24]:[  25]:[  26]:[  27]:[  28]:[  29]:[  30]:[  31]:[  32]:[  33]:[  34]:[  35]:[  36]:[  37]:[  38]:[  39]:[  40]:[  41]:[  42]:[  43]:[  44]:[  45]:[  46]:[  47]:[  48]:[  49]:[  50]:[  51]:[  52]:[  53]:[  54]:[  55]:[  56]:[  57]:[  58]:[  59]:[  60]:[  61]:[  62]:[  63]:[  64]:[  65]:[  66]:[  67]:[  68]:[  69]:[  70]:[  71]:[  72]:[  73]:[  74]:[  75]:[  76]:[  77]:[  78]:[  79]:[  80]:[  81]:[  82]:[  83]....

This goes on forever.
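
(As I read the message above, 0 means the script waits forever. Presumably a finite value in a startup section would make it give up instead; something like this, with made-up numbers, though that would only mask the underlying problem:)

    startup {
        wfc-timeout      120;   # stop waiting for the peer after 120 s
        degr-wfc-timeout  60;   # shorter wait if the cluster was degraded before the reboot
    }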

Tcpdump shows that the Primary is not able to establish a connection with the Secondary.

Every time the Primary sends a SYN to the Secondary, the Secondary responds with RST, ACK.
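
(If I understand TCP correctly, a RST,ACK in reply to a SYN means nothing is listening on port 7789 on the Secondary, or a firewall there is rejecting with tcp-reset. The obvious things to check on the Secondary would be, I assume:)

    # on data01 (Secondary); stock CentOS tools, output omitted
    cat /proc/drbd             # is the module loaded, and is r0 configured?
    netstat -tln | grep 7789   # is anything listening on the DRBD port?
    iptables -L -n             # any REJECT rule that could send the RST?
    service drbd status        # resource state according to the init script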

When I rebuilt the Primary I executed 'drbdadm create-md r0', which completed successfully.
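
(After create-md I relied on the init script to bring the resource up; my understanding is that the manual equivalent is roughly the following, which at least lets me watch the connection state directly:)

    drbdadm up r0    # attach the backing disk and start connecting
    cat /proc/drbd   # cs: should go from WFConnection to Connected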

But every time I try to start drbd on the Primary it waits forever.

I guess I have done something silly, but if someone could explain how to fix this I would be very grateful.

I would like to thank the creators of DRBD for making this open source.

TIA

Shaun

A disclaimer applies to this e-mail message; it can be found at http://www.espritxb.nl/disclaimer
NB: As of now, all our e-mail addresses have changed to ... at espritxb.nl! Please update this in your address book.


