[DRBD-user] drbd kernel BUG: unable to handle kernel NULL pointer dereference at 0000000000000038

Fri Mar 16 13:29:24 CET 2012

On Fri, Mar 16, 2012 at 10:36 AM, France <mailinglists at isg.si> wrote:
> Hi,
>
> I'm hitting a bug in DRBD with the latest CentOS and DRBD 8.3.12, using
> GFS2 on top with cman and rgmanager.
>
> Here is the simplest method to have it occur.
> 1. Start drbd on node s2
> 2. Start drbd on node s3
> They sync up:
> [root@s3 ~]# cat /proc/drbd
> version: 8.3.12 (api:88/proto:86-96)
> GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag@Build64R6, 2011-11-20 10:57:03
>  0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
>    ns:0 nr:45060 dw:45056 dr:660 al:0 bm:11 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
> 3. Start cman on s2 & s3, so I can use GFS2. The cluster comes up OK:
> [root@s3 ~]# cman_tool status
> Version: 6.2.0
> Config Version: 8
> Cluster Name: stor
> Cluster Id: 61164
> Cluster Member: Yes
> Cluster Generation: 140
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 1
> Total votes: 2
> Node votes: 1
> Quorum: 1
> Active subsystems: 7
> Flags: 2node
> Ports Bound: 0
> Node name: s3alt.c.XX.si
> Node ID: 3
> Multicast addresses: 239.192.238.219 239.192.0.2
> Node addresses: 192.168.168.3 10.31.0.42
> 4. Start gfs2 on both nodes:
> Mar 16 10:29:41 s3 kernel: GFS2 (built Mar  7 2012 00:54:51) installed
> Mar 16 10:29:41 s3 kernel: GFS2: fsid=: Trying to join cluster "lock_dlm", "stor:drbdstor"
> Mar 16 10:29:41 s3 kernel: dlm: Using SCTP for communications

That's the root cause of your problem.

This is a known issue. Although the panic apparently occurs in DRBD,
DRBD's own code isn't to blame. DLM-over-SCTP isn't fully supported,
and unless you can entice (i.e. pay) someone to fix it, you won't get
this to work reliably.
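
If you want to check which transport the DLM picked on a given node, the
kernel log line quoted above is the thing to look for. Here is a minimal
sketch of that check. The helper name dlm_transport is made up for
illustration, and the "Using TCP" wording is my assumption about what the
TCP counterpart of that message looks like; only the SCTP line appears in
your log.

```shell
# Hypothetical helper: report which transport the DLM announced in a kernel log.
# Reads log text on stdin and prints "sctp", "tcp", or "unknown".
# The last matching line wins, so a later DLM restart overrides an earlier one.
# Assumption: the TCP message mirrors the SCTP one seen in the log above.
dlm_transport() {
    awk '
        /dlm: Using SCTP for communications/ { t = "sctp" }
        /dlm: Using TCP for communications/  { t = "tcp" }
        END { print (t == "" ? "unknown" : t) }
    '
}

# Example usage (not executed here):
#   dmesg | dlm_transport
#   dlm_transport < /var/log/messages
```

If this prints "sctp" on a node, that node is in the situation described
above.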

Sadly, it's apparently impossible to force DLM-over-TCP on multihomed
hosts, so the only workaround seems to be to run the DLM on a box with
a single (possibly bonded) network interface, silly as that may sound.
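
A quick way to see whether a node counts as multihomed from the cluster's
point of view is to count the entries on the "Node addresses:" line of
cman_tool status (yours lists two). The sketch below does only that. The
helper name node_address_count is made up for illustration, and treating
"more than one address" as "multihomed" is my reading of that output, not
something the tool itself states.

```shell
# Hypothetical helper: count the addresses on the "Node addresses:" line
# of `cman_tool status` output read from stdin. Prints 0 if no such line.
node_address_count() {
    awk '
        /^Node addresses:/ {
            sub(/^Node addresses:[ \t]*/, "")   # drop the label, keep the addresses
            n = split($0, addrs, " ")          # whitespace-separated address list
        }
        END { print n + 0 }
    '
}

# Example usage (not executed here):
#   cman_tool status | node_address_count
```

A result greater than 1 suggests the node has more than one cluster
address configured.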

A more detailed discussion of this issue is here:

http://www.mail-archive.com/drbd-user@lists.linbit.com/msg04492.html

Hope this helps.

Cheers,
Florian
