Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello guys,
we have a problem getting Pacemaker and drbd-8.3.11 running under RHEL 6.1.
We also tried drbd-8.3.9.
Pacemaker itself works perfectly when taking over IPs or starting daemons,
but we can't get it to manage drbd resources and mount a drbd filesystem.
The problem is: when Pacemaker tries to mount the drbd filesystem,
we get the error:
fs0_start_0 (node=lb02, call=31, rc=1, status=complete): unknown error
We compiled drbd using:
./configure --enable-spec --with-km --prefix=/
make tgz
cp drbd*.tar.gz `rpm -E %_sourcedir`
rpmbuild -bb drbd.spec
rpmbuild -bb drbd-km.spec
and then installed the resulting RPMs.
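For reference, the install step was roughly this (a sketch; the arch
directory and exact package file names depend on the build host and
kernel version):

rpm -Uvh `rpm -E %_rpmdir`/x86_64/drbd*.rpm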
We know drbd very well; the drbd resources are fine and running, we can
manage them with drbdadm and see that everything is OK. Only then do we
hand the drbd devices over to Pacemaker. When we cold-boot the two drbd
servers, the first server comes up with a valid mounted filesystem, but
whenever we try to migrate or fail over, we see the unknown error.
The log files give no further hints.
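For the record, this is roughly how we check the resources by hand and
how we trigger the failover that fails (a condensed sketch; the resource
name app and the node names are from the configs below):

cat /proc/drbd        # both nodes Connected, disks UpToDate/UpToDate
drbdadm role app      # Primary/Secondary on the active node
drbdadm cstate app    # Connected

crm resource migrate fs0 lb02    # this is where the unknown error shows up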
The cluster was configured using:
crm configure <<END
property no-quorum-policy=ignore
property stonith-enabled=false
primitive drbd0 ocf:linbit:drbd \
    params drbd_resource=app \
    op monitor role=Master interval=59s timeout=30s \
    op monitor role=Slave interval=60s timeout=30s
ms ms-drbd0 drbd0 \
    meta clone-max=2 notify=true globally-unique=false
primitive fs0 ocf:heartbeat:Filesystem \
    params fstype=ext3 directory=/app device=/dev/drbd/by-res/app
commit
exit
END
crm status shows:
Stack: openais
Current DC: lb01 - partition with quorum
Version: 1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ lb02 lb01 ]
Master/Slave Set: ms-drbd0 [drbd0]
Masters: [ lb01 ]
Slaves: [ lb02 ]
Failed actions:
fs0_start_0 (node=lb02, call=31, rc=1, status=complete): unknown error
fs0_start_0 (node=lb01, call=8, rc=1, status=complete): unknown error
We can run the /usr/lib/ocf/resource.d/heartbeat/Filesystem resource agent
with the above parameters from the shell successfully, without getting any
errors.
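Concretely, the manual test looked roughly like this (a sketch; run as
root on the node where drbd0 is currently Primary):

ls -l /dev/drbd/by-res/app    # the udev symlink exists and points to /dev/drbd0

OCF_ROOT=/usr/lib/ocf \
OCF_RESKEY_device=/dev/drbd/by-res/app \
OCF_RESKEY_directory=/app \
OCF_RESKEY_fstype=ext3 \
/usr/lib/ocf/resource.d/heartbeat/Filesystem start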
The drbd setup is quite simple: it uses the standard global_common.conf
and the following res file:
resource app {
    protocol C;
    startup { wfc-timeout 0; degr-wfc-timeout 120; }
    disk { on-io-error detach; }
    syncer {
        rate 60M;
    }
    on lb01.domain.net {
        device    /dev/drbd0;
        disk      /dev/mapper/vg0-app;
        address   192.168.0.1:7792;
        meta-disk internal;
    }
    on lb02.domain.net {
        device    /dev/drbd0;
        disk      /dev/mapper/vg0-app;
        address   192.168.0.2:7792;
        meta-disk internal;
    }
}
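Both nodes carry an identical copy of this file. As a sanity check
(again just a sketch), the configuration each node actually parsed can
be compared with:

drbdadm dump app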
Any ideas on how we can debug this problem?
Christoph
--
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Registered office: Moenchengladbach
District Court of Moenchengladbach, HRB 6303
Managing Directors:
Christoph Adomeit, Hans Wilhelm Terstappen
Christoph.Adomeit at gatworks.de   Finest internet solutions
Phone +49 2166 9149-32  Fax +49 2166 9149-10