On Wed, Jan 04, 2012 at 04:40:57PM +0100, Lars Ellenberg wrote:
> On Tue, Jan 03, 2012 at 08:16:15PM +0100, Florian Haas wrote:
> > Hi,
> >
> > DRBD 8.3.12 on CentOS 6.2; SDP from kernel-ib-1.5.3, built locally
>
> You too, of all people?
>
> Something crops up, and since DRBD is used at the same time,
> it has to be DRBD's fault?
>
> I mean, that's possible, of course. But ...

Hm, well, I love to correct my own arrogant statements ;-)

> > # drbdadm down all; lsmod | grep ib_sdp
> > ib_sdp 130827 4294967294
> >
> > 4 billion references on that module look excessive. :) I suppose the
> > refcount incorrectly goes negative.
>
> Sure. That's a -2.
>
> > This is inconvenient as you're now unable to unload ib_sdp. I presume
> > this is a bug;
>
> /me too ;-)
>
> Only at this point I doubt it is a DRBD bug.
>
> All module refcount stuff is implicit, so I would expect the module
> count on all other network related modules to go wrong as well.

Hm. We'll see about that.

> Besides, I think I complained about that to the OFED guys
> about two and a half years ago already,
> when I helped to fix their memleak and frame corruption.
>
> Never pressed the issue, though,
> and can not remember any useful response.
>
> Of course it _may_ be DRBD, or something that DRBD could work around,
> but I suspect it is something in the OFED stack.
> If they reason otherwise, I'll listen.
>
> > if I can provide any traces or debug logs to narrow
> > down the issue I'll be happy to.
>
> Let us know what you find out.

Based on some git blaming, I found
  drbd: 53eb779 (July 2008)
  kernel: ac5a488e (long ago), 1b08534e (Dec 2008)

The relevant part of the latter is:

commit 1b08534e562dae7b084326f8aa8cc12a4c1b6593
    net: Fix module refcount leak in kernel_accept()
...
diff --git a/net/socket.c b/net/socket.c
index 92764d8..76ba80a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2307,6 +2307,7 @@ int kernel_accept(struct socket *sock, struct socket **newsock, int flags)
 	}
 
 	(*newsock)->ops = sock->ops;
+	__module_get((*newsock)->ops->owner);
 
 done:
 	return err;

So. We are doing it as the kernel was doing it back in July 2008,
only the kernel was doing it wrong, and got fixed in December :-/

You can verify if you see such imbalance when using ipv6 (as a module)
as well.

And you can try a patch:

diff --git a/drbd/drbd_receiver.c b/drbd/drbd_receiver.c
index 0e55c45..7decee3 100644
--- a/drbd/drbd_receiver.c
+++ b/drbd/drbd_receiver.c
@@ -528,6 +528,7 @@ STATIC int drbd_accept(struct drbd_conf *mdev, const char **what,
 		goto out;
 	}
 	(*newsock)->ops = sock->ops;
+	__module_get((*newsock)->ops->owner);
 
 out:
 	return err;

Thanks,
	Lars

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.