[Drbd-dev] [PATCH] drbdadm: Fix segfault while starting stacked resource in DRBD9

Wed Jul 6 05:35:48 CEST 2016

Hi Roland,

>>> On 2016-7-4 at 17:30, in message <20160704093004.GK1324 at rck.sh>, Roland
Kammerer <roland.kammerer at linbit.com> wrote:
> The segfault happens because conn->peer is NULL? It could still be NULL
> with your patch. I need a real/complete configuration and please provide
> more information how and why you think this fixes anything. As is, I can
> not apply this patch, the intention is unclear to me and it looks broken.

Sorry for not describing this clearly. The details are listed below:
# DRBD version:
drbd: 9.0.1-1 (GIT-hash: 86e443973082570aeb651848db89e0c7b995c306)
drbd-utils: 8.9.6

# Steps to reproduce this issue:
Run "drbdadm up r0-U" on node "siteB-node1" to start the stacked resource (this can be reproduced whenever the peer node is stacked); a segmentation fault occurs.

# r0.res 
resource r0 {
  protocol C;
  device    /dev/drbd2;
  disk      /dev/vdb3;
  meta-disk internal;

  on siteA-node1 {
    address    192.168.122.57:7900;
  }

  on siteA-node2 {
    address    192.168.122.29:7900;
  }
}

resource r0-U {
  protocol A;
  device     /dev/drbd5;

  stacked-on-top-of r0 {
    address    10.0.0.1:7910;
  }

  on siteB-node1 {
    disk       /dev/vdb3;
    address    10.0.0.2:7910; # Public IP of the backup node
    meta-disk  internal;
  }
}

# Coredump file:
#0  0x000000000040c547 in adm_new_peer (ctx=<optimized out>) at drbdadm_main.c:1617
1617		argv[NA(argc)] = ssprintf("%s", conn->peer->node_id);
(gdb) bt
#0  0x000000000040c547 in adm_new_peer (ctx=<optimized out>) at drbdadm_main.c:1617
#1  0x000000000040ba23 in __call_cmd_fn (ctx=ctx@entry=0x2478800, on_error=on_error@entry=KEEP_RUNNING) at drbdadm_main.c:607
#2  0x000000000040e078 in _run_deferred_cmds (stage=stage@entry=CFG_NET_PREP_UP) at drbdadm_main.c:722
#3  0x000000000040e15f in run_deferred_cmds () at drbdadm_main.c:758
#4  0x0000000000403368 in main (argc=3, argv=0x7ffdc0b4dcc8) at drbdadm_main.c:3339
(gdb) p *conn
$3 = {name = 0x0, paths = {stqh_first = 0x2478550, stqh_last = 0x2478598}, config_line = 0, peer_devices = {stqh_first = 0x2473750, 
    stqh_last = 0x2473780}, peer = 0x0, net_options = {stqh_first = 0x2478790, stqh_last = 0x2478790}, pd_options = {stqh_first = 0x0, 
    stqh_last = 0x2478528}, ignore = 0, ignore_tmp = 0, implicit = 1, is_standalone = 0, link = {stqe_next = 0x0}}

conn->peer was set to 0x0, which causes this segmentation fault.

# Debug:
In create_implicit_connections() in post_parse, the name stored in the hostname/address struct for a stacked peer is a fake name, built by concatenating the names of the lower-level nodes; in this case it is "siteA-node1_siteA-node2". However, set_peer_in_connection() looks up the host info by comparing against the on_hosts list, whether or not the peer is stacked, and "siteA-node1_siteA-node2" is not in that list. The lookup therefore fails to find the corresponding peer, and conn->peer is set to NULL.
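
To make the mismatch concrete, here is a minimal sketch of the lookup. All names in it (struct host, find_host, peer_from_on_hosts, peer_with_stacked) are hypothetical stand-ins and not the drbd-utils code; only the host names are taken from the configuration above. The second lookup illustrates one possible way to also consult the stacked section, and is not necessarily what the patch does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a host section; not the drbd-utils type. */
struct host {
    const char *name;
    struct host *next;
};

/* Walk a singly linked list and return the entry whose name matches, or NULL. */
static struct host *find_host(struct host *list, const char *name)
{
    for (struct host *h = list; h != NULL; h = h->next)
        if (strcmp(h->name, name) == 0)
            return h;
    return NULL;
}

/* r0-U has one "on" section ... */
static struct host on_site_b = { "siteB-node1", NULL };
/* ... and one stacked section, named after the lower-level nodes. */
static struct host stacked_r0 = { "siteA-node1_siteA-node2", NULL };

/* Lookup restricted to the "on" hosts: cannot find the stacked peer. */
static struct host *peer_from_on_hosts(const char *name)
{
    return find_host(&on_site_b, name);
}

/* Illustrative alternative (an assumption): fall back to the stacked list. */
static struct host *peer_with_stacked(const char *name)
{
    struct host *h = find_host(&on_site_b, name);
    return h ? h : find_host(&stacked_r0, name);
}
```

With these definitions, peer_from_on_hosts("siteA-node1_siteA-node2") returns NULL, which corresponds to the conn->peer == NULL seen in the core dump, while peer_with_stacked() returns the stacked entry for the same name.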

The patch sent in the previous thread, "[PATCH] drbdadm: Fix segfault while starting stacked resource in DRBD9", only changes how the host info of a stacked peer is looked up. I tested it against both regular and stacked resources; the network connection is found and established successfully after applying the patch.

Best regards,
Nick