On Tue, Jan 03, 2012 at 08:16:15PM +0100, Florian Haas wrote:
> Hi,
>
> DRBD 8.3.12 on CentOS 6.2; SDP from kernel-ib-1.5.3, built locally
You too, of all people?
Something crops up, and since DRBD is used at the same time,
it has to be DRBD's fault?
I mean, that's possible, of course. But ...
> from ofa_kernel-1.5.3-OFED srpm. DRBD resource config is as follows:
How would the drbd configuration influence module refcount imbalance?
> resource vg_cluster1 {
> on alice {
> device /dev/drbd1 minor 1;
> disk /dev/sda;
> address sdp 192.168.100.12:7789;
> meta-disk internal;
> }
> on bob {
> device /dev/drbd1 minor 1;
> disk /dev/sdb;
> address sdp 192.168.100.13:7789;
> meta-disk internal;
> }
> }
>
> The 192.168.100.0/24 network is a directly connected IB link, and DRBD
> demonstrably does use SDP:
>
> # sdpnetstat -Sn
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address Foreign Address State
> sdp 0 0 192.168.100.13:55825 192.168.100.12:7788 ESTABLISHED
> sdp 0 0 192.168.100.13:55826 192.168.100.12:7789 ESTABLISHED
> sdp 0 0 192.168.100.13:7789 192.168.100.12:41104 ESTABLISHED
> sdp 0 0 192.168.100.13:7788 192.168.100.12:41105 ESTABLISHED
>
>
> The ib_sdp module refcount looks normal at this time (at least, I
> would expect the 2 in lsmod's "used by" column, one per SDP-enabled
> DRBD resource -- but please correct me if this is a misconception):
>
> lsmod | grep ib_sdp
> ib_sdp 130827 2
>
>
> Now, "drbdadm down" all seems to not have the expected effect on the refcount:
>
> # drbdadm down all; lsmod | grep ib_sdp
> ib_sdp 130827 4294967294
>
> 4 billion references on that module look excessive. :) I suppose the
> refcount incorrectly goes negative.
Sure. That's a -2.
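To spell out the arithmetic: if the counter is printed as an unsigned 32-bit value (an assumption about how this kernel exposes the refcount), a count of -2 shows up as 2^32 - 2. A minimal sketch; the helper names are made up for illustration:

```python
def as_unsigned32(value):
    """Reinterpret a (possibly negative) integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Two references taken, four dropped: the counter ends up at -2.
print(as_unsigned32(2 - 4))     # 4294967294
print(as_signed32(4294967294))  # -2
```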
> This is inconvenient as you're now unable to unload ib_sdp. I presume
> this is a bug;
/me too ;-)
Only, at this point I doubt it is a DRBD bug.
All module refcount stuff is implicit, so I would expect the module
count on all other network related modules to go wrong as well.
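One way to check that expectation is to scan the whole module list for counters that have gone negative, not just ib_sdp. A rough sketch, assuming input in the /proc/modules layout (name, size, refcount as the first three fields) and assuming a wrapped counter may be printed either as a large unsigned number or as a plain negative one. The function name and the sample data are hypothetical; only the ib_sdp size and refcount are taken from the lsmod output quoted above.

```python
def suspicious_refcounts(proc_modules_text):
    """Return {module_name: refcount} for modules whose refcount is negative.

    Values of 2**31 or more are treated as wrapped 32-bit counters and
    converted back to their signed value.
    """
    result = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # not a module line
        name, refcount = fields[0], int(fields[2])
        if refcount >= 1 << 31:
            refcount -= 1 << 32
        if refcount < 0:
            result[name] = refcount
    return result


# Illustrative sample only; on a real system, read /proc/modules instead.
SAMPLE = """\
ib_sdp 130827 4294967294 - Live 0x0000000000000000
example_mod 4096 1 - Live 0x0000000000000000
"""
print(suspicious_refcounts(SAMPLE))  # {'ib_sdp': -2}
```

If only the SDP module shows up in such a scan while other network-related modules stay sane, that would point at the SDP code rather than at DRBD.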
Besides, I think I complained about that to the OFED guys
about two and a half years ago already,
when I helped to fix their memleak and frame corruption.
Never pressed the issue, though,
and cannot remember any useful response.
Of course it _may_ be DRBD, or something that DRBD could work around,
but I suspect it is something in the OFED stack.
If they reason otherwise, I'll listen.
> if I can provide any traces or debug logs to narrow
> down the issue I'll be happy to.
Let us know what you find out.
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.