Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Tue, Jan 03, 2012 at 08:16:15PM +0100, Florian Haas wrote:
> Hi,
>
> DRBD 8.3.12 on CentOS 6.2; SDP from kernel-ib-1.5.3, built locally

You too, of all people?
Something crops up, and since DRBD is used at the same time,
it has to be DRBD's fault?

I mean, that's possible, of course. But ...

> from ofa_kernel-1.5.3-OFED srpm. DRBD resource config is as follows:

How would the drbd configuration influence module refcount imbalance?

> resource vg_cluster1 {
>   on alice {
>     device /dev/drbd1 minor 1;
>     disk /dev/sda;
>     address sdp 192.168.100.12:7789;
>     meta-disk internal;
>   }
>   on bob {
>     device /dev/drbd1 minor 1;
>     disk /dev/sdb;
>     address sdp 192.168.100.13:7789;
>     meta-disk internal;
>   }
> }
>
> The 192.168.100.0/24 network is a directly connected IB link, and DRBD
> demonstrably does use SDP:
>
> # sdpnetstat -Sn
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address         Foreign Address       State
> sdp        0      0 192.168.100.13:55825  192.168.100.12:7788   ESTABLISHED
> sdp        0      0 192.168.100.13:55826  192.168.100.12:7789   ESTABLISHED
> sdp        0      0 192.168.100.13:7789   192.168.100.12:41104  ESTABLISHED
> sdp        0      0 192.168.100.13:7788   192.168.100.12:41105  ESTABLISHED
>
>
> The ib_sdp module refcount looks normal at this time (at least, I
> would expect the 2 in lsmod's "used by" column, one per SDP-enabled
> DRBD resource -- but please correct me if this is a misconception):
>
> lsmod | grep ib_sdp
> ib_sdp 130827 2
>
>
> Now, "drbdadm down" all seems to not have the expected effect on the refcount:
>
> # drbdadm down all; lsmod | grep ib_sdp
> ib_sdp 130827 4294967294
>
> 4 billion references on that module look excessive. :) I suppose the
> refcount incorrectly goes negative.

Sure. That's a -2.

> This is inconvenient as you're now unable to unload ib_sdp. I presume
> this is a bug;

/me too ;-)

Only at this point I doubt it is a DRBD bug.
All module refcount stuff is implicit, so I would expect the module
count on all other network related modules to go wrong as well.
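
For reference, the "-2" reading of 4294967294 can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming the "used by" column shows a 32-bit counter printed as unsigned (which is what the value suggests); the helper name as_signed32 is made up for this illustration and is not part of any tool.

```python
def as_signed32(raw: int) -> int:
    """Reinterpret an unsigned 32-bit value as two's-complement signed.

    Assumption: the counter is 32 bits wide and was printed as unsigned.
    """
    raw &= 0xFFFFFFFF                  # keep only the low 32 bits
    if raw & 0x80000000:               # top bit set means a negative value
        return raw - (1 << 32)
    return raw


print(as_signed32(4294967294))         # the value lsmod reported after "drbdadm down all"
print(as_signed32(2))                  # the value lsmod reported before
```

So the counter started at 2 and ended up four references lower, at -2.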

Besides, I think I complained about that to the OFED guys about
two and a half years ago already, when I helped to fix their memleak
and frame corruption. Never pressed the issue, though, and can not
remember any useful response.

Of course it _may_ be DRBD, or something that DRBD could work around,
but I suspect it is something in the OFED stack.
If they reason otherwise, I'll listen.

> if I can provide any traces or debug logs to narrow
> down the issue I'll be happy to.

Let us know what you find out.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.