[DRBD-user] Some weird behaviour

Thu May 13 04:40:53 CEST 2004

I'm very sorry for the incompleteness of the previous post.

On Wed, 12 May 2004 18:13:22 +0200, Lars Ellenberg wrote:

>> 1) Sometimes I get the following message:
>>    "drbd0: predetermined states are in contradiction to GC's"
>>    That happened, at least, with host1/Primary and when booting host2.
>>    What does it mean?
> 
> 
> you have Paul running as Primary, and then
> Silas connects, and says
>  "Hey, I have more recent data than you, so I want to be the Primary,
>   and you become sync target!"
> And Paul answers
>  "I won't cooperate, I may have running services using me.
>   You just go away and leave me alone."
>
> This can happen if you *forced*, maybe after some timeout,
> a node to become Primary, without knowing that it had the best data.

The only situation I can think of where this might happen is when Silas
*was* the Primary *before* Paul became Primary, and there was a POWERFAIL
(so heartbeat probably forced Paul to become Primary).
I think this IS the case. I'm conducting POWERFAIL tests.

Just wondering (I'll test this tomorrow): if I write to DRBD through Paul
(with Silas off), does Paul's copy of the data become the more recent one?

If so, would it make any sense to have a (heartbeat) script write a byte
to the DRBD device each time a node switches to Primary?

>> 2) Also, I'm using heartbeat to control the NFS resource that relies
>> upon DRBD, and setting DRBD to be "passive", ie, cluster-managed (see
>> configuration file below).
> 
> No, I don't see it?

Very sorry. Including it now, see below.

>> Setting it to "auto_failback=off", and booting
>> host1 and host2 at the same time (they launch INIT/drbd script almost at
>> the same time) I get this sometimes:
>> on host1: /proc/drbd: primary/unknown, NEEDS_SYNC
>> on host2: it starts dumping "drbd0: !page in drbd_put_ee()" in an infinite
>> loop.
>>    What could be wrong? This is the only situation that is unstable.
> 
> I don't understand this one.
> I do not even understand exactly what you do here.
> Please explain.

This has to be my weirdest post ever :)

I've set HEARTBEAT to "auto_failback=off" (which means to leave the
resource's state as it is, instead of reacquiring it).
Then, I've booted both nodes at the same time (so that the INIT drbd script 
runs (almost) at the same time) - so no one is Primary.

When the boot process reaches "/etc/init.d/drbd start" I sometimes get
these states:
host1: "cat /proc/drbd" shows "st:Primary/Unknown" and "NEEDS_SYNC";
host2: it just goes berserk! It enters an infinite loop of
"drbd0: !page in drbd_put_ee()" messages.

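To record which state each node ends up in during repeated boot tests,
something like the following could be run right after the drbd init
script. It is only a sketch: it assumes /proc/drbd contains a token of
the form "st:Local/Peer", as in the "st:Primary/Unknown" output quoted
above, and the function name drbd_state and its optional file argument
are hypothetical, not part of DRBD.

```shell
#!/bin/sh
# Print the "Local/Peer" part of the st: token from a /proc/drbd-style file.
# Assumption: the state shows up as "st:X/Y" somewhere on a line, as in
# the "st:Primary/Unknown" output quoted in this post.
drbd_state() {
    file="${1:-/proc/drbd}"   # default to the real proc file
    # Keep only lines carrying an st: token and print the X/Y part.
    sed -n 's/.*st:\([A-Za-z]*\/[A-Za-z]*\).*/\1/p' "$file"
}
```

Calling "drbd_state" with no argument reads /proc/drbd; passing a file
name lets a saved copy of the output be checked instead.
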
Here goes my /etc/drbd.conf:
# cat /etc/drbd.conf | grep -v "^[[:space:]]*#" | grep -v "^[[:space:]]*$"
resource drbd0 {
  protocol = C
  fsckcmd  = /bin/true
  inittimeout=-10
  disk {
    do-panic
    disk-size=256M
  }
  net {
    sndbuf-size = 512k
    sync-min   = 4M   # syncer tries hard to not drop below this rate
    sync-max   = 100M # if you don't care about network saturation
    tl-size     = 5000  # transfer log size, ensures strict write ordering
    timeout     = 60    # unit: 0.1 seconds
    connect-int = 10    # unit: seconds
    ping-int    = 10    # unit: seconds
    ko-count    = 4     # if some block send times out this many times,
  }
  on host1 {
    device  = /dev/nb0
    disk    = /dev/hdb1
    address = 192.168.130.201
    port    = 7788
  }
  on host2 {
    device  = /dev/nb0
    disk    = /dev/hdb1
    address = 192.168.130.202
    port    = 7788
  }
}

-- 
-
Nuno Tavares
http://nthq.cjb.net/