[Drbd-dev] [PATCH] drbdadm: Fix segfault while starting stacked resource in DRBD9
Nick Wang
nwang at suse.com
Wed Jul 6 05:35:48 CEST 2016
Hi Roland,
>>> On 2016-7-4 at 17:30, in message <20160704093004.GK1324 at rck.sh>, Roland
Kammerer <roland.kammerer at linbit.com> wrote:
> The segfault happens because conn->peer is NULL? It could still be NULL
> with your patch. I need a real/complete configuration and please provide
> more information how and why you think this fixes anything. As is, I can
> not apply this patch, the intention is unclear to me and it looks broken.
Sorry for not describing this clearly. The details are listed below:
# DRBD version:
drbd: 9.0.1-1 (GIT-hash: 86e443973082570aeb651848db89e0c7b995c306)
drbd-utils: 8.9.6
# Steps to reproduce this issue:
Run "drbdadm up r0-U" on node "siteB-node1" to start the stacked resource; drbdadm then segfaults. (This can be reproduced whenever the peer node is stacked.)
# r0.res
resource r0 {
    protocol C;
    device /dev/drbd2;
    disk /dev/vdb3;
    meta-disk internal;
    on siteA-node1 {
        address 192.168.122.57:7900;
    }
    on siteA-node2 {
        address 192.168.122.29:7900;
    }
}
resource r0-U {
    protocol A;
    device /dev/drbd5;
    stacked-on-top-of r0 {
        address 10.0.0.1:7910;
    }
    on siteB-node1 {
        disk /dev/vdb3;
        address 10.0.0.2:7910; # Public IP of the backup node
        meta-disk internal;
    }
}
# Coredump file:
#0 0x000000000040c547 in adm_new_peer (ctx=<optimized out>) at drbdadm_main.c:1617
1617 argv[NA(argc)] = ssprintf("%s", conn->peer->node_id);
(gdb) bt
#0 0x000000000040c547 in adm_new_peer (ctx=<optimized out>) at drbdadm_main.c:1617
#1 0x000000000040ba23 in __call_cmd_fn (ctx=ctx@entry=0x2478800, on_error=on_error@entry=KEEP_RUNNING) at drbdadm_main.c:607
#2 0x000000000040e078 in _run_deferred_cmds (stage=stage@entry=CFG_NET_PREP_UP) at drbdadm_main.c:722
#3 0x000000000040e15f in run_deferred_cmds () at drbdadm_main.c:758
#4 0x0000000000403368 in main (argc=3, argv=0x7ffdc0b4dcc8) at drbdadm_main.c:3339
(gdb) p *conn
$3 = {name = 0x0, paths = {stqh_first = 0x2478550, stqh_last = 0x2478598}, config_line = 0, peer_devices = {stqh_first = 0x2473750,
stqh_last = 0x2473780}, peer = 0x0, net_options = {stqh_first = 0x2478790, stqh_last = 0x2478790}, pd_options = {stqh_first = 0x0,
stqh_last = 0x2478528}, ignore = 0, ignore_tmp = 0, implicit = 1, is_standalone = 0, link = {stqe_next = 0x0}}
conn->peer is 0x0 (NULL), so dereferencing it at drbdadm_main.c:1617 causes the segmentation fault.
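For illustration only, the failing access and a defensive check around it can be sketched as below. The struct layouts and the function name format_peer_node_id() are hypothetical simplifications; this is neither the drbdadm source nor the submitted patch, and it assumes only that node_id is a string, as the "%s" format at line 1617 suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, reduced stand-ins for the drbdadm structures. */
struct peer_host {
    const char *node_id;            /* assumed to be a string, per the "%s" at line 1617 */
};

struct connection {
    const struct peer_host *peer;   /* NULL when no matching host was found */
};

/*
 * Copy the peer's node id into buf.
 * Returns 0 on success, -1 if the peer (or its node id) is missing
 * or the buffer is too small, instead of dereferencing a NULL pointer.
 */
static int format_peer_node_id(const struct connection *conn,
                               char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (conn == NULL || conn->peer == NULL || conn->peer->node_id == NULL)
        return -1;

    int n = snprintf(buf, len, "%s", conn->peer->node_id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A check like this would only turn the crash into an error return; it would not by itself make the stacked peer resolvable.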
# Debug:
In create_implicit_connections() (called from post_parse), the name stored in the hostname/address struct for a stacked peer is a fake name built by concatenating the names of the lower nodes; in this case it is "siteA-node1_siteA-node2". However, set_peer_in_connection() looks up the host info by comparing that name against the on_hosts list, regardless of whether the peer is stacked. Since "siteA-node1_siteA-node2" is not in that list, the lookup fails to find the corresponding peer, and conn->peer is set to NULL.
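The mismatch described above can be reproduced with a small stand-alone sketch. Everything here (struct host_info, join_names(), find_host_by_name(), the r0_U_hosts table) is a hypothetical simplification for illustration, not the real drbdadm data structures; it assumes only that the fake name is the lower node names joined with "_" and that the lookup matches on "on <host>" names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced model of one host section of a resource. */
struct host_info {
    const char *on_host;   /* name from "on <host> { }", NULL for a stacked section */
    const char *lower;     /* lower resource of "stacked-on-top-of", else NULL */
};

/* The two sections of r0-U from the configuration above. */
static const struct host_info r0_U_hosts[] = {
    { NULL,          "r0" },   /* stacked-on-top-of r0 */
    { "siteB-node1", NULL },   /* on siteB-node1 */
};

/* Build the fake peer name by joining the lower node names with '_'. */
static void join_names(const char *const *names, size_t n,
                       char *out, size_t len)
{
    size_t used = 0;

    assert(out != NULL && len > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n && used < len; i++) {
        int w = snprintf(out + used, len - used, "%s%s",
                         i ? "_" : "", names[i]);
        if (w < 0)
            break;
        used += (size_t)w;
    }
}

/* Name-only lookup: matches "on <host>" names and nothing else. */
static const struct host_info *find_host_by_name(const struct host_info *hosts,
                                                 size_t n, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++)
        if (hosts[i].on_host && strcmp(hosts[i].on_host, name) == 0)
            return &hosts[i];
    return NULL;   /* the fake stacked name ends up here */
}
```

Joining "siteA-node1" and "siteA-node2" and passing the result to find_host_by_name() on r0_U_hosts returns NULL, while "siteB-node1" is found. A lookup for a stacked peer would need to match on the stacked section rather than on an "on <host>" name.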
The patch sent in the previous thread, "[PATCH] drbdadm: Fix segfault while starting stacked resource in DRBD9", only changes how the host info of a stacked peer is looked up. I tested it against both regular and stacked resources; with the patch applied, the network connection is found and established successfully.
Best regards,
Nick