[Drbd-dev] BUG: NULL pointer dereference triggered by drbdsetup events2
Lars Ellenberg
lars.ellenberg at linbit.com
Thu Mar 2 22:33:12 CET 2017
On Wed, Mar 01, 2017 at 01:36:55PM -0800, Eric Wheeler wrote:
> On Wed, 1 Mar 2017, Lars Ellenberg wrote:
>
> > On Tue, Feb 28, 2017 at 04:28:46PM -0800, Eric Wheeler wrote:
> > > Hello all,
> > >
> > > We found a relatively easy-to-reproduce bug that crashes the kernel. Start
> > > with all resources down and only the DRBD module loaded, then run the
> > > following:
> > >
> > > drbdadm up foo
> > > drbdsetup events2 foo | true
> > > drbdadm down foo; ls -d /sys/devices/virtual/bdi/147:7740
> > > drbdadm up foo
> > >
> > > The backtrace is below. The ls above simply illustrates the problem: after
> > > a down, the sysfs entry should no longer exist. The bug manifests because
> > > add_disk tries to create the same sysfs entry again, cannot, and the
> > > failure propagates up the stack. I have a feeling that the interface used
> > > by events2 is still holding a reference after the down, so the sysfs entry
> > > is never removed.
> > >
> > > We are piping into true because true exits immediately without reading
> > > any stdin, so perhaps the kernel is blocked trying to flush a buffer
> > > into userspace and never releases a reference count (speculation).
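A rough, purely illustrative sketch of that suspicion (hypothetical names, not
the actual DRBD code): if the events2 path takes an extra reference on some
per-device state and nothing ever drops it once the listener is gone, the
release callback that would tear down the 147:<minor> bdi/sysfs entry never
runs, and the next "up" collides with the stale entry.

#include <linux/kernel.h>
#include <linux/kref.h>
#include <linux/slab.h>

/* hypothetical per-device state; in this picture the bdi/sysfs entry
 * lives exactly as long as this object does */
struct dev_state {
	struct kref ref;
};

static void dev_state_release(struct kref *kref)
{
	struct dev_state *ds = container_of(kref, struct dev_state, ref);

	/* this is where the 147:<minor> sysfs/bdi entry would be removed */
	kfree(ds);
}

static struct dev_state *on_up(void)
{
	struct dev_state *ds = kzalloc(sizeof(*ds), GFP_KERNEL);

	if (ds)
		kref_init(&ds->ref);	/* "drbdadm up" holds the initial reference */
	return ds;
}

static void on_down(struct dev_state *ds)
{
	/* "drbdadm down" drops its reference ... */
	kref_put(&ds->ref, dev_state_release);
}

/* ... but if the events2 listener also holds a reference that is never
 * put after it exits, release() never runs and the entry stays behind. */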
> > >
> > > This was tested using the 4.10.1 kernel with userspace tools
> > > drbd-utils-8.9.4. I suspect this could be worked around in userspace, but
> > > it would be ideal if the kernel module could be fixed up to prevent a
> > > crash.
> > >
> > > Please let me know if you need additional information or if I can provide
> > > any other testing.
> > >
> > > Thank you for your help!
> >
> > The "events2" part of drbd 8.4 was a "backport" from drbd 9.
> > I found this apparently relevant commit in our DRBD 9 repo.
> >
> > In our out-of-tree 8.4 code, we later dropped the .done parts again,
> > for compat with RHEL 5, which is likely why this fix fell through.
> >
> > I basically only re-diffed it and compile-tested it against the current upstream kernel.
> > Does this help already, or do we need to dig deeper?
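For context, that fix presumably boils down to giving the dump op a .done
callback again, so that a dump aborted because the listener went away (which
is exactly what the pipe into true produces) still releases whatever it
pinned. A minimal sketch with made-up names, not the actual commit:

#include <net/genetlink.h>

static int events2_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
{
	/* the first call would take a reference and stash it in cb->args[];
	 * returning 0 ends the dump */
	return 0;
}

static int events2_done(struct netlink_callback *cb)
{
	/* runs on normal completion and also when the dump is aborted
	 * (e.g. the listening process exited); drop the reference here */
	return 0;
}

static const struct genl_ops events2_ops[] = {
	{
		.cmd    = 1,			/* hypothetical command number */
		.dumpit = events2_dumpit,
		.done   = events2_done,
	},
};

The point of .done is that it fires even when the dump is torn down early, so
it is the natural place to drop references the dump took.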
>
> Hi Lars,
>
> Thank you for responding so quickly!
>
> The events2 feature is neat: it lets us avoid polling and gives us
> better responsiveness in our infrastructure. We applied the patch, and this
> is what we found:
>
> It no longer crashes the kernel with a NULL pointer dereference, but now
> the drbdsetup calls never leave the kernel. The first time we ran the test
> case it worked fine, but the second time 'drbdadm down foo' hung. I
> checked the in-kernel stack traces for each hung process and have also
> included the output of this test:
>
>
> # This test succeeded
> ~]# drbdadm up foo
> ~]# drbdsetup events2 foo | true
> ~]# drbdadm down foo; ls -d /sys/devices/virtual/bdi/147:7740
> ls: cannot access /sys/devices/virtual/bdi/147:7740: No such file or directory
> ~]# drbdadm up foo
right.
> # This test failed
> ~]# drbdsetup events2 foo | true
> ~]# drbdadm down foo; ls -d /sys/devices/virtual/bdi/147:7740
> Command 'drbdsetup-84 down foo' did not terminate within 600 seconds
> /sys/devices/virtual/bdi/147:7740
> ~]# drbdadm up foo
> Command 'drbdsetup-84 new-resource foo --on-no-data-accessible=suspend-io' did not terminate within 5 seconds
There is something else going on here.
> The second test that failed left hung drbdsetup processes:
>
> ~]# ps ax |grep drbdsetup-84
> 1579 pts/0 D 0:00 drbdsetup-84 down foo
> 1585 ? D 0:00 drbdsetup-84 role 7740
Where is that "role" coming from?
> 1737 pts/0 D 0:00 drbdsetup-84 new-resource foo --on-no-data-accessible=suspend-io
>
> These are the in-kernel stack traces for each pid:
>
> ##### 1579 pts/0 D+ 0:00 drbdsetup-84 down foo
> [root@hvtest2 ~]# cat /proc/1579/stack
> [<ffffffff81102a8c>] wait_rcu_gp+0x5c/0x80
> [<ffffffff81106685>] synchronize_sched+0x45/0x60
> [<ffffffffa02a8ca2>] adm_del_resource+0xf2/0x110 [drbd]
Hm.
That's just doing a synchronize_rcu() (see the sketch after this trace).
> [<ffffffffa02ab5f3>] drbd_adm_down+0x233/0x240 [drbd]
> [<ffffffff8161235d>] genl_family_rcv_msg+0x1cd/0x400
> [<ffffffff81612621>] genl_rcv_msg+0x91/0xd0
> [<ffffffff816113f1>] netlink_rcv_skb+0xc1/0xe0
> [<ffffffff81611adc>] genl_rcv+0x2c/0x40
> [<ffffffff81610a76>] netlink_unicast+0x106/0x210
> [<ffffffff81610fb4>] netlink_sendmsg+0x434/0x690
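To spell out what pid 1579 is sleeping on: adm_del_resource() appears to
follow the usual unpublish-then-wait pattern (generic illustration below, not
the literal DRBD source). If the grace period never completes, the caller
sits in wait_rcu_gp() exactly as in the trace above.

#include <linux/list.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct resource_obj {			/* illustrative, not drbd_resource */
	struct list_head list;
};

static void del_resource(struct resource_obj *res)
{
	list_del_rcu(&res->list);	/* unpublish from an RCU-protected list */
	synchronize_rcu();		/* wait until no RCU reader can still see it */
	kfree(res);			/* only now is it safe to free */
}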
> ##### 1585 ? D 0:00 drbdsetup-84 role 7740
> ~]# cat /proc/1585/stack
> [<ffffffff81612658>] genl_rcv_msg+0xc8/0xd0
This is likely waiting for the "genl_lock()".
> [<ffffffff816113f1>] netlink_rcv_skb+0xc1/0xe0
> [<ffffffff81611adc>] genl_rcv+0x2c/0x40
> [<ffffffff81610a76>] netlink_unicast+0x106/0x210
> [<ffffffff81610fb4>] netlink_sendmsg+0x434/0x690
> ##### 1737 pts/0 D 0:00 drbdsetup-84 new-resource foo --on-no-data-accessible=suspend-io
> ~]# cat /proc/1737/stack
> [<ffffffff81612658>] genl_rcv_msg+0xc8/0xd0
The same.
> [<ffffffff816113f1>] netlink_rcv_skb+0xc1/0xe0
> [<ffffffff81611adc>] genl_rcv+0x2c/0x40
> [<ffffffff81610a76>] netlink_unicast+0x106/0x210
> [<ffffffff81610fb4>] netlink_sendmsg+0x434/0x690
The two drbdsetup processes waiting for genl_lock()
may be ugly, but they are expected here,
as the drbdsetup down still holds that lock.
The question is: why does that down not make progress?
Why does the RCU grace period not complete?
Anything suspicious in the logs before that?
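For completeness, the pile-up behind genl_lock() is just what you get when one
request sleeps while holding the per-family mutex; a generic illustration, not
the netlink or DRBD source:

#include <linux/mutex.h>
#include <linux/rcupdate.h>

static DEFINE_MUTEX(family_mutex);	/* stands in for genl_lock() */

static void down_request(void)		/* "drbdsetup down" (pid 1579) */
{
	mutex_lock(&family_mutex);
	synchronize_rcu();		/* stuck if the grace period never ends */
	mutex_unlock(&family_mutex);
}

static void later_request(void)		/* "role" (1585), "new-resource" (1737) */
{
	mutex_lock(&family_mutex);	/* sleeps in D state behind the stuck down */
	mutex_unlock(&family_mutex);
}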
> Please let me know if you would like any additional information.
Just as an additional data point,
can you try to reproduce this with our out-of-tree DRBD 8.4 git?
(plus that .done() fix... I have not pushed it yet)
--
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support
DRBD® and LINBIT® are registered trademarks of LINBIT