[DRBD-user] Can't unload ib_sdp module after "drbdadm down all" (fishy module refcount)

Lars Ellenberg lars.ellenberg at linbit.com
Wed Jan 4 16:40:57 CET 2012



On Tue, Jan 03, 2012 at 08:16:15PM +0100, Florian Haas wrote:
> Hi,
> 
> DRBD 8.3.12 on CentOS 6.2; SDP from kernel-ib-1.5.3, built locally

You too, of all people?

Something crops up, and since DRBD is used at the same time,
it has to be DRBD's fault?

I mean, that's possible, of course. But ...

> from ofa_kernel-1.5.3-OFED srpm. DRBD resource config is as follows:

How would the drbd configuration influence module refcount imbalance?

> resource vg_cluster1 {
>     on alice {
>         device           /dev/drbd1 minor 1;
>         disk             /dev/sda;
>         address          sdp 192.168.100.12:7789;
>         meta-disk        internal;
>     }
>     on bob {
>         device           /dev/drbd1 minor 1;
>         disk             /dev/sdb;
>         address          sdp 192.168.100.13:7789;
>         meta-disk        internal;
>     }
> }
> 
> The 192.168.100.0/24 network is a directly connected IB link, and DRBD
> demonstrably does use SDP:
> 
> # sdpnetstat -Sn
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State
> sdp        0      0 192.168.100.13:55825    192.168.100.12:7788     ESTABLISHED
> sdp        0      0 192.168.100.13:55826    192.168.100.12:7789     ESTABLISHED
> sdp        0      0 192.168.100.13:7789     192.168.100.12:41104    ESTABLISHED
> sdp        0      0 192.168.100.13:7788     192.168.100.12:41105    ESTABLISHED
> 
> 
> The ib_sdp module refcount looks normal at this time (at least, I
> would expect the 2 in lsmod's "used by" column, one per SDP-enabled
> DRBD resource -- but please correct me if this is a misconception):
> 
> lsmod | grep ib_sdp
> ib_sdp                130827  2
> 
> 
> Now, "drbdadm down all" seems not to have the expected effect on the refcount:
> 
> # drbdadm down all; lsmod | grep ib_sdp
> ib_sdp                130827  4294967294
> 
> 4 billion references on that module look excessive. :) I suppose the
> refcount incorrectly goes negative.

Sure. That's a -2.
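Reading 4294967294 as -2 is plain 32-bit wraparound: 4294967294 = 2^32 - 2, so a use count that has dropped two below zero and is then printed as an unsigned 32-bit number looks exactly like this. Below is a minimal sketch of the arithmetic. It assumes a POSIX shell with 64-bit arithmetic, as in bash or dash. The "two references, four releases" sequence is hypothetical and was chosen only because it lands on -2.

```shell
#!/bin/sh
# Hypothetical sequence: start from the use count lsmod showed (2)
# and release the module four times, i.e. two more puts than gets.
refs=2
refs=$((refs - 4))                # signed value: -2

# Keep only the low 32 bits and print them as unsigned, which is how
# a negative 32-bit counter looks when shown with %u.
unsigned=$((refs & 0xFFFFFFFF))   # 4294967294, i.e. 2^32 - 2

printf 'signed: %d  unsigned 32-bit: %u\n' "$refs" "$unsigned"
```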

> This is inconvenient as you're now unable to unload ib_sdp. I presume
> this is a bug;

/me too ;-)

At this point, though, I doubt it is a DRBD bug.
All module refcounting here happens implicitly, so I would expect
the refcounts of all the other network-related modules to go wrong as well.
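One way to test that expectation is to snapshot the use counts of several modules before and after "drbdadm down all" and compare them. Below is a minimal sketch that assumes the usual /proc/modules layout, where each line holds the module name, its size and its use count, followed by further fields. The helper name module_refs is hypothetical. Which modules to watch, for example ib_core and rdma_cm alongside ib_sdp, is a judgment call.

```shell
#!/bin/sh
# module_refs FILE MODULE...
# For each MODULE listed in FILE (normally /proc/modules, where the
# third whitespace-separated field is the use count), print
# "name count". Modules that are not loaded are silently skipped.
module_refs() {
    file=$1
    shift
    for m in "$@"; do
        awk -v m="$m" '$1 == m { print $1, $3 }' "$file"
    done
}
```

For example, run "module_refs /proc/modules ib_sdp ib_core rdma_cm > before.txt", then "drbdadm down all", then the same module_refs command redirected into after.txt. Diff the two files to see which counts moved.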

Besides, I think I already complained about this to the OFED guys
about two and a half years ago,
when I helped fix their memory leak and frame corruption.

I never pressed the issue, though,
and cannot remember any useful response.

Of course it _may_ be DRBD, or something that DRBD could work around,
but I suspect it is something in the OFED stack.
If they reason otherwise, I'll listen.

> if I can provide any traces or debug logs to narrow
> down the issue I'll be happy to.

Let us know what you find out.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


