[DRBD-user] drbd resource fencing - 2nd try with more information

Lars Ellenberg lars.ellenberg at linbit.com
Wed Dec 5 06:08:37 CET 2007

On Tue, Dec 04, 2007 at 10:48:43AM +0100, Dominik Klein wrote:
> Hi Florian, drbd-users
> 
> I see I have been very short on info here. Sorry for that.
> 
> So I want to learn about resource fencing in DRBD. I read the recent
> thread about it, and about the different fencing modes DRBD offers. As
> I don't have a STONITH device, I went for resource-only.
> 
> Here's my configuration, what I did and what I got.
> 
> Nodes: dktest1debian, dktest2debian
> OS: Debian Etch 32 bit
> DRBD: 8.0.7
> Heartbeat: 2.1.12-24
> Kernel 2.6.18-4-686
> Network: eth0 10.250.250.0/24 for drbd and heartbeat
> 	 eth1 10.2.50.0/24 for normal networking and heartbeat
> 
> ha.cf:
> keepalive 2
> deadtime 30
> warntime 10
> ucast eth1 10.2.50.100
> ucast eth0 10.250.250.100
> node    dktest1debian
> node    dktest2debian
> ping 10.2.50.32
> ping 10.2.50.2
> ping 10.2.50.34
> ping 10.2.50.250
> ping 10.2.50.11
> respawn root /usr/lib/heartbeat/pingd -p /var/run/pingd.pid -d 5s -m 100
> respawn hacluster /usr/lib/heartbeat/dopd
> apiauth dopd gid=haclient uid=hacluster
> use_logd        yes
> crm on
> 
> drbd.conf:
> global {
>   usage-count no;
> }
> common {
>  handlers {
>    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater";
>  }
> }
> resource drbd2 {
>   protocol C;
>   startup {
>     wfc-timeout  15;
>     degr-wfc-timeout 120;
>   }
>   disk {
>     on-io-error   detach;
>     fencing       resource-only;
>   }
>   net {
>     after-sb-0pri disconnect;
>     after-sb-1pri disconnect;
>     after-sb-2pri disconnect;
>     rr-conflict disconnect;
>     max-buffers      20480;
>     max-epoch-size   16384;
>     unplug-watermark 20480;
>   }
>   syncer {
>     rate 140M;
>   }
>   on dktest1debian {
>     device     /dev/drbd2;
>     disk       /dev/sda3;
>     address    10.250.250.100:7790;
>     meta-disk  internal;
>   }
>   on dktest2debian {
>     device    /dev/drbd2;
>     disk      /dev/sda3;
>     address   10.250.250.101:7790;
>     meta-disk internal;
>   }
> }
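
Just to recap what "fencing resource-only" with that handler is supposed
to do: when the primary loses the replication link, it calls
drbd-peer-outdater, which asks dopd on the other node to mark that
node's disk as Outdated. The manual equivalent would be roughly this
(resource name taken from the config above, the rest is only a sketch):

  # on the secondary, i.e. the peer of the primary that called the
  # handler, mark the local disk Outdated so it cannot be promoted:
  drbdadm outdate drbd2
  # verify the disk states (look for "Outdated" in the ds: field):
  cat /proc/drbd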
> 
> Now I do:
> reboot both nodes
> rm /var/lib/heartbeat/crm/* on both nodes
> 
> So we start off real clean.
> 
> /etc/init.d/heartbeat start on both nodes
> 
> Wait until both nodes show online/online, a DC has been chosen, and dopd is started.
> 
> At this point, I have no resources configured and Linux-HA is running 
> with all defaults (no STONITH).
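
For the record, a quick way to confirm that state (both nodes online, a
DC elected, dopd running) would be something like:

  # one-shot cluster status; shows node states and the current DC
  crm_mon -1
  # make sure dopd really is running on both nodes
  ps -ef | grep [d]opd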
> 
> Now I promote drbd2 on dktest1debian.
> 
> After that, I unplug the DRBD link (eth0).
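
(I assume "promote" and "unplug" boil down to something like the
following; the interface name is the one from the setup above:)

  # on dktest1debian: make drbd2 primary
  drbdadm primary drbd2
  # then take the replication link down, either by pulling the cable
  # or by downing the interface to simulate it:
  ifconfig eth0 down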
> 
> Then in the logs I see:
> 
> Dec  4 10:27:39 dktest1debian drbd-peer-outdater: [2674]: debug: drbd peer: dktest2debian
> Dec  4 10:27:39 dktest1debian drbd-peer-outdater: [2674]: debug: drbd resource: drbd2
> Dec  4 10:27:39 dktest1debian drbd-peer-outdater: [2674]: ERROR: cl_free: Bad magic number in object at 0xbfc405e8

This is coming from the heartbeat messaging layer. Something in there
is broken, or you somehow got a broken build, or, most likely, the
versions of "drbd-peer-outdater" and "dopd" do not match.
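
To rule that out, check which packages the two binaries come from and
whether their versions agree; on Debian, roughly:

  # which packages ship the two binaries?
  dpkg -S /usr/lib/heartbeat/dopd /usr/lib/heartbeat/drbd-peer-outdater
  # then compare the installed versions of those packages:
  dpkg -l | grep -E 'drbd|heartbeat'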

> What does this mean (bad magic number)?

Something in your heartbeat communication channels, or in the way
"dopd" uses them, is broken.

As long as I cannot reproduce this, I cannot help you.
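
In the meantime, you could sanity-check the heartbeat communication
layer yourself; from memory (check the man pages for the exact syntax),
something like:

  # is heartbeat itself happy with its nodes?
  cl_status listnodes
  cl_status nodestatus dktest2debian
  # watch the log while you unplug eth0 again
  # (path depends on your logd configuration):
  tail -f /var/log/ha-log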

Is there anything "special" about your architectures and distributions?
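
Concretely, the output of something like this from both nodes would
help:

  # kernel, architecture, distribution and DRBD version, for the record
  uname -rm
  cat /etc/debian_version
  head -1 /proc/drbd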

-- 
: commercial DRBD/HA support and consulting: sales at linbit.com :
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.


