[Drbd-dev] BUG: NULL pointer dereference triggered by drbdsetup events2

Lars Ellenberg lars.ellenberg at linbit.com
Wed Mar 1 14:06:14 CET 2017


On Tue, Feb 28, 2017 at 04:28:46PM -0800, Eric Wheeler wrote:
> Hello all,
> 
> We found a relatively easy to reproduce bug that crashes the kernel. Start 
> with all of the resources down and only the DRBD module loaded and run the 
> following:
> 
> drbdadm up foo
> drbdsetup events2 foo | true
> drbdadm down foo; ls -d /sys/devices/virtual/bdi/147:7740
> drbdadm up foo
> 
> The backtrace is below. The ls above simply illustrates the problem. After 
> a down, the sysfs entry should not exist. The bug manifests because 
> add_disk attempts to create the same sysfs entry but it cannot and fails 
> up the stack. I have a feeling that the interface used by events2 is 
> holding open a reference count after down so the sysfs entry is never 
> removed.
> 
> We are piping into true because it fails quickly without reading any 
> stdio, so perhaps the kernel is blocked trying to flush a buffer into 
> userspace and never releases a resource count (speculation).
> 
> This was tested using the 4.10.1 kernel with userspace tools 
> drbd-utils-8.9.4. I suspect this could be worked around in userspace, but 
> it would be ideal if the kernel module could be fixed up to prevent a 
> crash.
> 
> Please let me know if you need additional information or if I can provide 
> any other testing.
> 
> Thank you for your help!

The "events2" part of drbd 8.4 was a "backport" from drbd 9.
I found this apparently relevant commit in our DRBD 9 repo.

In our out-of-tree 8.4 code, we later dropped the .done parts again,
for compat with RHEL 5, which is likely why this fix fell through.

I basically only re-diffed and compile-tested for current upstream kernel.
Does this help already, or do we need to dig deeper?

Thanks,
	Lars


commit 74635bbfbd03546f19ed015cec6e470062af44af
Author: Lars Ellenberg <lars.ellenberg at linbit.com>
Date:   Wed Jul 27 14:53:00 2016 +0200

    drbd: fix potential refcount leak, add missing drbd_adm_get_initial_state_done
    
    If some drbdsetup events2 / wait-* command aborts early,
    or the netlink socket currently in .dumpit = drbd_adm_get_initial_state
    is destroyed early for some other reason,
    the cleanup code inside that .dumpit function would never run,
    and we end up with a reference count leak.
    
    Move cleanup to .done = drbd_adm_get_initial_state_done.

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index f35db29..115b468 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -84,6 +84,7 @@ int drbd_adm_dump_connections_done(struct netlink_callback *cb);
 int drbd_adm_dump_peer_devices(struct sk_buff *skb, struct netlink_callback *cb);
 int drbd_adm_dump_peer_devices_done(struct netlink_callback *cb);
 int drbd_adm_get_initial_state(struct sk_buff *skb, struct netlink_callback *cb);
+int drbd_adm_get_initial_state_done(struct netlink_callback *cb);
 
 #include <linux/drbd_genl_api.h>
 #include "drbd_nla.h"
@@ -4932,6 +4933,21 @@ static int get_initial_state(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
+int drbd_adm_get_initial_state_done(struct netlink_callback *cb)
+{
+	LIST_HEAD(head);
+	if (cb->args[0]) {
+		struct drbd_state_change *state_change =
+			(struct drbd_state_change *)cb->args[0];
+		cb->args[0] = 0;
+
+		/* connect list to head */
+		list_add(&head, &state_change->list);
+		free_state_changes(&head);
+	}
+	return 0;
+}
+
 int drbd_adm_get_initial_state(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	struct drbd_resource *resource;
@@ -4940,14 +4956,6 @@ int drbd_adm_get_initial_state(struct sk_buff *skb, struct netlink_callback *cb)
 	if (cb->args[5] >= 1) {
 		if (cb->args[5] > 1)
 			return get_initial_state(skb, cb);
-		if (cb->args[0]) {
-			struct drbd_state_change *state_change =
-				(struct drbd_state_change *)cb->args[0];
-
-			/* connect list to head */
-			list_add(&head, &state_change->list);
-			free_state_changes(&head);
-		}
 		return 0;
 	}
 
diff --git a/include/linux/drbd_genl.h b/include/linux/drbd_genl.h
index c934d3a..aa9bf5d 100644
--- a/include/linux/drbd_genl.h
+++ b/include/linux/drbd_genl.h
@@ -521,6 +521,7 @@ GENL_op(
 	DRBD_ADM_GET_INITIAL_STATE, 38,
 	GENL_op_init(
 	        .dumpit = drbd_adm_get_initial_state,
+	        .done = drbd_adm_get_initial_state_done,
 	),
 	GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY))
 


More information about the drbd-dev mailing list