[DRBD-user] drbd-8.3.3rc1.tar.gz

Tue Sep 1 12:28:01 CEST 2009

Philipp Reisner wrote:
> Hi,
>
> Please help testing this pre release...
>
>   

I see the same problems here that I see on 8.3.2, which I described a
couple of days ago when I resumed an old thread.
See "Re: [DRBD-user] bind before connect failed, err = -99"

errno 99 is:

#define EADDRNOTAVAIL   99      /* Cannot assign requested address */

and that is definitely not true: when I start drbd, the IP addresses are
already assigned, but drbd fails to bind() the socket.
As I wrote, I'm testing on a pair of KVM virtual machines running
Debian, using 2.6.35.5.

What I found is that if I "relax" the init scripts, that is, if I add a
1-second delay when drbd starts (sleep 1 added in /etc/init.d/drbd),
this error doesn't show up (though there are other problems).

I had a look at the code and noticed that, compared to 8.0, it hasn't
changed much except for the IPv6 support.
Thinking that the problem was IPv6-related, I disabled IPv6 in the
kernel, but the problem is still there.

If I set up KVM to use only 1 processor, skipping SMP, the "bind before
connect" error seems to be gone (I did about 5 reboots).

My conclusion (reinforced by the "sleep 1" test) is that on SMP
machines network initialization takes some time and drbd, even though
it is started later in the init scripts, cannot bind to the correct IP.
The proper solution in this case is to add a retry loop that waits a
second between one bind attempt and the next.
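
The retry idea can be sketched in user space as follows. This is only
an illustration in Python, not DRBD's kernel code: the function name
bind_with_retry, its parameters and the default of 5 attempts are
hypothetical. It retries only on EADDRNOTAVAIL and waits between
attempts.

```python
import errno
import socket
import time


def bind_with_retry(sock, address, attempts=5, delay=1.0, sleep=time.sleep):
    """Call sock.bind(address), retrying while the error is EADDRNOTAVAIL.

    Returns the number of attempts used. Any other error, or
    EADDRNOTAVAIL on the last attempt, is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            sock.bind(address)
            return attempt
        except OSError as exc:
            # Only "Cannot assign requested address" is worth retrying;
            # everything else is a real failure.
            if exc.errno != errno.EADDRNOTAVAIL or attempt == attempts:
                raise
            sleep(delay)


if __name__ == "__main__":
    # Example with a loopback address and an ephemeral port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        used = bind_with_retry(s, ("127.0.0.1", 0))
        print("bound after", used, "attempt(s)")
```

The sleep function is passed in as a parameter so that the waiting
behaviour can be replaced in a test without actually sleeping.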

Anyway, having pointed out this race bug, which prevents multi-primary
clusters from coming up cleanly after a reboot on SMP machines, there
is another problem.

Let me use the logs to explain the problem:

NODE A:

version: 8.3.3rc1 (api:88/proto:86-91)
GIT-hash: 026d60bb0e6a7d5758c6c3e6245f38f6d8b921aa build by
phil at fat-tyre, 2009-08-28 15:07:52

1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:0 nr:0 dw:0 dr:1344 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

NODE B:
wsos2:~# cat /proc/drbd
version: 8.3.3rc1 (api:88/proto:86-91)
GIT-hash: 026d60bb0e6a7d5758c6c3e6245f38f6d8b921aa build by
phil at fat-tyre, 2009-08-28 15:07:52

1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
    ns:0 nr:0 dw:0 dr:272 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Note that node B is synced, UpToDate and in Secondary mode.
This time (KVM is set to SMP=1, so it's using 1 processor) the node B
virtual machine is rebooted.

NODE A:

version: 8.3.3rc1 (api:88/proto:86-91)
GIT-hash: 026d60bb0e6a7d5758c6c3e6245f38f6d8b921aa build by
phil at fat-tyre, 2009-08-28 15:07:52

1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
    ns:0 nr:0 dw:0 dr:1344 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Node A correctly switches to WFConnection, waiting for the other node
to come back.

When the other node has rebooted, the situation is:

NODE A:
wsos1:~# cat /proc/drbd
version: 8.3.3rc1 (api:88/proto:86-91)
GIT-hash: 026d60bb0e6a7d5758c6c3e6245f38f6d8b921aa build by
phil at fat-tyre, 2009-08-28 15:07:52

1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r----
    ns:0 nr:0 dw:0 dr:1344 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

NODE B:
version: 8.3.3rc1 (api:88/proto:86-91)
GIT-hash: 026d60bb0e6a7d5758c6c3e6245f38f6d8b921aa build by
phil at fat-tyre, 2009-08-28 15:07:52

1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r----
    ns:0 nr:0 dw:0 dr:272 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
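
The state fields in the two status lines above can also be compared
programmatically. A minimal sketch, assuming the device line keeps the
"minor: key:value ..." layout shown here; the helper name
parse_drbd_status is hypothetical and not part of any DRBD tool.

```python
import re


def parse_drbd_status(line):
    """Parse one /proc/drbd device line into a dict of its key:value fields."""
    minor, _, rest = line.strip().partition(":")
    # Collect tokens such as cs:StandAlone or ro:Primary/Unknown;
    # flag tokens without a colon (e.g. r----) are skipped.
    fields = dict(re.findall(r"(\w+):(\S+)", rest))
    fields["minor"] = int(minor)
    return fields


node_a = parse_drbd_status(
    "1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r----")
node_b = parse_drbd_status(
    "1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r----")

for name, status in (("A", node_a), ("B", node_b)):
    local_role = status["ro"].split("/")[0]
    print("node", name, "connection:", status["cs"], "local role:", local_role)
```

For the lines above, both nodes report cs:StandAlone with a local role
of Primary.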

dmesg shows the following:

NODE A:
[35983.365303] block drbd1: conn( Unconnected -> WFConnection )
[36014.210363] block drbd1: Handshake successful: Agreed network protocol version 91
[36014.232528] block drbd1: conn( WFConnection -> WFReportParams )
[36014.244034] block drbd1: Starting asender thread (from drbd1_receiver [2518])
[36014.256209] block drbd1: data-integrity-alg: <not-used>
[36014.268233] block drbd1: drbd_sync_handshake:
[36014.309949] block drbd1: self 7DD6E87914B74E79:B9EEEB11D2967349:D535828070E7A253:003A471B39E868FF bits:0 flags:0
[36014.332785] block drbd1: peer D9D79CC42EAF2A29:B9EEEB11D2967348:D535828070E7A252:003A471B39E868FF bits:0 flags:0
[36014.355486] block drbd1: uuid_compare()=100 by rule 90
[36014.367067] block drbd1: Split-Brain detected, dropping connection!
[36014.388802] block drbd1: helper command: /sbin/drbdadm split-brain minor-1
[36014.403331] block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
[36014.425760] block drbd1: conn( WFReportParams -> Disconnecting )
[36014.437527] block drbd1: error receiving ReportState, l: 4!
[36014.442417] block drbd1: meta connection shut down by peer.
[36014.442421] block drbd1: asender terminated
[36014.442424] block drbd1: Terminating asender thread
[36014.483153] block drbd1: Connection closed
[36014.494704] block drbd1: conn( Disconnecting -> StandAlone )
[36014.506243] block drbd1: receiver terminated
[36014.517494] block drbd1: Terminating receiver thread

NODE B (which was rebooted):
[   10.592528] drbd: initialized. Version: 8.3.3rc1 (api:88/proto:86-91)
[   10.606004] drbd: GIT-hash: 026d60bb0e6a7d5758c6c3e6245f38f6d8b921aa build by phil at fat-tyre, 2009-08-28 15:07:52
[   10.629019] drbd: registered as block device major 147
[   10.640617] drbd: minor_table @ 0xffff8800bcded5c0
[   10.657765] block drbd1: Starting worker thread (from cqueue [1718])
[   10.683786] block drbd1: disk( Diskless -> Attaching )
[   10.704007] block drbd1: Found 6 transactions (6 active extents) in activity log.
[   10.716021] block drbd1: Method to ensure write ordering: barrier
[   10.727881] block drbd1: max_segment_size ( = BIO size ) = 32768
[   10.749612] block drbd1: drbd_bm_resize called with capacity == 104854328
[   10.761830] block drbd1: resync bitmap: bits=13106791 words=204794
[   10.773466] block drbd1: size = 50 GB (52427164 KB)
[   10.787784] block drbd1: recounting of set bits took additional 0 jiffies
[   10.799555] block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[   10.821465] block drbd1: disk( Attaching -> UpToDate )
[   10.833017] block drbd1: Barriers not supported on meta data device - disabling
[   10.903226] block drbd1: conn( StandAlone -> Unconnected )
[   10.914793] block drbd1: Starting receiver thread (from drbd1_worker [1723])
[   10.930485] block drbd1: receiver (re)started
[   10.955876] block drbd1: conn( Unconnected -> WFConnection )
[   11.989830] block drbd1: role( Secondary -> Primary )
[   12.001469] block drbd1: Creating new current UUID
[   14.772142] block drbd1: Handshake successful: Agreed network protocol version 91
[   14.794147] block drbd1: conn( WFConnection -> WFReportParams )
[   14.805471] block drbd1: Starting asender thread (from drbd1_receiver [1741])
[   14.818916] block drbd1: data-integrity-alg: <not-used>
[   14.840835] block drbd1: drbd_sync_handshake:
[   14.842195] block drbd1: self D9D79CC42EAF2A29:B9EEEB11D2967348:D535828070E7A252:003A471B39E868FF bits:0 flags:0
[   14.865068] block drbd1: peer 7DD6E87914B74E79:B9EEEB11D2967349:D535828070E7A253:003A471B39E868FF bits:0 flags:0
[   14.887991] block drbd1: uuid_compare()=100 by rule 90
[   14.899527] block drbd1: Split-Brain detected, dropping connection!
[   14.921356] block drbd1: helper command: /sbin/drbdadm split-brain minor-1
[   14.935335] block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
[   14.957930] block drbd1: conn( WFReportParams -> Disconnecting )
[   14.969624] block drbd1: error receiving ReportState, l: 4!
[   14.981213] block drbd1: asender terminated
[   14.992568] block drbd1: Terminating asender thread
[   15.004137] block drbd1: Connection closed
[   15.015599] block drbd1: conn( Disconnecting -> StandAlone )
[   15.027267] block drbd1: receiver terminated
[   15.038693] block drbd1: Terminating receiver thread

Both nodes detect a split-brain situation, but the node that was
rebooted was in secondary mode.
No writes were allowed on it, so the split brain should have been
resolved automatically.
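
The two UUID sets printed in the handshake lines of the logs can be
compared field by field. A minimal sketch, with two assumptions about
DRBD 8.3 internals that the logs themselves do not state: that the four
fields are the current, bitmap and two history UUIDs in that order, and
that the lowest bit of each UUID is a role flag rather than part of the
generation identifier. The helper names are hypothetical.

```python
def parse_uuids(text):
    """Split a 'current:bitmap:history1:history2' string into integers."""
    names = ("current", "bitmap", "history1", "history2")  # assumed order
    return dict(zip(names, (int(part, 16) for part in text.split(":"))))


def same_generation(a, b):
    """Compare two UUIDs, ignoring the lowest bit (assumed to be a flag)."""
    return (a & ~1) == (b & ~1)


# Values copied from the "self" lines in the dmesg output of each node.
node_a = parse_uuids(
    "7DD6E87914B74E79:B9EEEB11D2967349:D535828070E7A253:003A471B39E868FF")
node_b = parse_uuids(
    "D9D79CC42EAF2A29:B9EEEB11D2967348:D535828070E7A252:003A471B39E868FF")

current_match = same_generation(node_a["current"], node_b["current"])
bitmap_match = same_generation(node_a["bitmap"], node_b["bitmap"])

print("current UUIDs match:", current_match)
print("bitmap UUIDs match: ", bitmap_match)
```

With the values from the logs, the current UUIDs differ while the bitmap
UUIDs match once the lowest bit is ignored. That would be consistent
with node B creating a new current UUID when it was promoted at
[11.989830], before the handshake at [14.772142], though this is only
one reading of the logs and not a confirmed diagnosis.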

The drbd.conf i'm using is the following:

global {
    # dialog-refresh 5; # 5 seconds
    # disable-ip-verification;
    usage-count no;
}

common {
  syncer { rate 75M; }
}

# *******************************************************************
# *******************************************************************

resource www-storage {
  protocol C;

  startup {
    wfc-timeout  1;
    degr-wfc-timeout 120;    # 2 minutes.
    become-primary-on both;
  }

  disk {
    on-io-error   detach;
  }

  net {
    # timeout       60;    #  6 seconds  (unit = 0.1 seconds)
    # connect-int   10;    # 10 seconds  (unit = 1 second)
    # ping-int      10;    # 10 seconds  (unit = 1 second)
    # ping-timeout   5;    # 500 ms (unit = 0.1 seconds)

    after-sb-0pri discard-least-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
    rr-conflict disconnect;

Something definitely doesn't work as expected, or I'm too blind to see
the error in my config.

Additional note:
I rebooted the node that was in secondary mode and this problem showed
up.
If I reboot the node that is in primary mode (the other is secondary),
the rebooted node comes up in primary mode and this seems to work well.

I hope this information is enough to investigate.

Max