[Drbd-dev] NULL pointer dereference in 8.4.7-1
Eric Wheeler
drbd-dev at lists.ewheeler.net
Wed Mar 30 04:19:07 CEST 2016
Hello all,
We are getting kernel crashes in linux 4.1.20 with the drbd-8.4.git tree
at commit 3a6a769340ef93b1ba2792c6461250790795db49 .
I don't see anything in the newer commits that addresses this issue, so
I'm posting it here, but I'll also try the latest commit in master just in case.
Please see the backtrace below. I also included our global_common.conf
further down. This is protocol A and the link is quite slow. This NULL
ptr dereference appears to show up when the drbd kernel thread is blocked
for a long time. It might happen at reconnect time because the BUG didn't
show up until 13 seconds after the P_BARRIER error.
The problem is pretty reproducible, so I can probably test patches.
Please let me know what I can do to help test.
Thanks!
--
Eric Wheeler
[ 2467.187849] block drbd7994: We did not send a P_BARRIER for 27003ms > ko-count (3) * timeout (90 * 0.1s); drbd kernel thread blocked?
About 13 seconds elapse... and then:
[ 2480.674208] BUG: unable to handle kernel NULL pointer dereference at 0000000000000003
[ 2480.676403] IP: [<ffffffff81357a96>] memcpy_erms+0x6/0x10
[ 2480.678547] PGD 0
[ 2480.680628] Oops: 0000 [#1]
[...]
[ 2480.700341] CPU: 7 PID: 23962 Comm: drbd_w_www3.ewh Tainted: G O 4.1.20
[ 2480.702612] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
[ 2480.704921] task: ffff8807c01d6e00 ti: ffff8807b7a88000 task.ti: ffff8807b7a88000
[ 2480.707206] RIP: 0010:[<ffffffff81357a96>] [<ffffffff81357a96>] memcpy_erms+0x6/0x10
[ 2480.709537] RSP: 0018:ffff8807b7a8ba50 EFLAGS: 00010286
[ 2480.711774] RAX: ffff8807dc0cc4d8 RBX: 00000000000004a6 RCX: 00000000000004a6
[ 2480.714079] RDX: 00000000000004a6 RSI: 0000000000000003 RDI: ffff8807dc0cc4d8
[ 2480.716324] RBP: ffff8807b7a8ba98 R08: ffff8807b7a8bbd8 R09: ffff8807c01d7b28
[ 2480.718605] R10: 0000000000000000 R11: ffff88075891b800 R12: 0000000000000a50
[ 2480.720853] R13: ffff8807b7a8bbf8 R14: ffff8807b7a8bbf8 R15: 0000000000000000
[ 2480.723052] FS: 0000000000000000(0000) GS:ffff88082fdc0000(0000) knlGS:000000000000000
[ 2480.725221] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2480.727436] CR2: 0000000000000003 CR3: 0000000001a22000 CR4: 00000000001426e0
[ 2480.729500] Stack:
[ 2480.731538] ffffffff8135c40f ffff8807b7a8bbd8 ffff8807dc0cc97e ffff8807b7a8ba98
[ 2480.733562] ffff8807738cb600 ffff8807f7949000 000000000000fa9b ffff8807b7a8bbe8
[ 2480.735620] 0000000000000a50 ffff8807b7a8bb48 ffffffff8161b76a ffff88070000000c
[ 2480.737704] Call Trace:
[ 2480.739750] [<ffffffff8135c40f>] ? copy_from_iter+0x2bf/0x2e0
[ 2480.741834] [<ffffffff8161b76a>] tcp_sendmsg+0xa2a/0xb50
[ 2480.743914] [<ffffffff81646c54>] inet_sendmsg+0x64/0xa0
[ 2480.745905] [<ffffffff812d1403>] ? selinux_socket_sendmsg+0x23/0x30
[ 2480.747877] [<ffffffff815ac54d>] sock_sendmsg+0x3d/0x50
[ 2480.749813] [<ffffffff815ac67b>] kernel_sendmsg+0x2b/0x30
[ 2480.751713] [<ffffffffa06b0f26>] drbd_send+0xe6/0x200 [drbd]
[ 2480.753608] [<ffffffffa06b2b81>] _drbd_no_send_page.isra.40+0x71/0xb0 [drbd]
[ 2480.755463] [<ffffffffa06b3178>] drbd_send_dblock+0x3e8/0x7a0 [drbd]
[ 2480.757263] [<ffffffffa06a5874>] ? complete_master_bio+0x94/0x170 [drbd]
[ 2480.759073] [<ffffffffa06935cf>] w_send_dblock+0xaf/0x1e0 [drbd]
[ 2480.760844] [<ffffffffa06949a9>] drbd_worker+0xf9/0x3a0 [drbd]
[ 2480.762567] [<ffffffffa06aed00>] ? drbd_destroy_connection+0x190/0x190 [drbd]
[ 2480.764181] [<ffffffffa06aed1d>] drbd_thread_setup+0x1d/0x110 [drbd]
[ 2480.765777] [<ffffffffa06aed00>] ? drbd_destroy_connection+0x190/0x190 [drbd]
[ 2480.767337] [<ffffffff810c0b08>] kthread+0xd8/0xf0
[ 2480.768873] [<ffffffff810c0a30>] ? kthread_create_on_node+0x1b0/0x1b0
[ 2480.770409] [<ffffffff816e94e2>] ret_from_fork+0x42/0x70
[ 2480.771868] [<ffffffff810c0a30>] ? kthread_create_on_node+0x1b0/0x1b0
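
One possible reading of the register dump above, offered as a sketch rather than a diagnosis: under the x86-64 System V calling convention, memcpy receives its destination in RDI, its source in RSI and its length in RDX. The snippet below uses values copied by hand from the oops (the dictionary and variable names are purely illustrative) and checks whether the faulting address in CR2 matches the source pointer. If it does, the copy appears to have faulted while reading from address 0x3, before any bytes were moved.

```python
# Register values copied by hand from the oops above.
regs = {
    "RAX": 0xffff8807dc0cc4d8,
    "RCX": 0x4a6,
    "RDX": 0x4a6,
    "RSI": 0x3,
    "RDI": 0xffff8807dc0cc4d8,
    "CR2": 0x3,
}

# x86-64 System V convention: memcpy(dest=RDI, src=RSI, n=RDX).
dest, src, length = regs["RDI"], regs["RSI"], regs["RDX"]

# CR2 holds the address whose access triggered the page fault.
fault_on_source = regs["CR2"] == src

# The string copy counts RCX down as it goes; RCX still equal to RDX
# suggests that nothing had been copied yet when the fault hit.
no_bytes_copied = regs["RCX"] == regs["RDX"]

print(f"dest=0x{dest:x} src=0x{src:x} len={length} bytes")
print(f"fault on source pointer: {fault_on_source}")
print(f"fault before first byte copied: {no_bytes_copied}")
```

Read this way, the bad value is the source buffer handed down through drbd_send to kernel_sendmsg, not the socket's destination buffer. That is an interpretation of the dump, not a confirmed root cause.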
===> /etc/drbd.d/global_common.conf <===
common {
    startup {
        wfc-timeout 30;
        outdated-wfc-timeout 20;
        degr-wfc-timeout 30;
    }
    options {
        on-no-data-accessible suspend-io;
    }
    syncer {
        rate 500M;
    }
    disk {
        al-extents 3389;
        c-fill-target 10240;
        c-delay-target 100;
        c-plan-ahead 70;
        c-min-rate 1024;
        c-max-rate 400M;
        on-io-error pass_on;
        read-balancing when-congested-remote;
    }
    net {
        after-sb-0pri discard-zero-changes;
        after-sb-1pri call-pri-lost-after-sb;
        after-sb-2pri disconnect;
        allow-two-primaries no;
        protocol A;
        cram-hmac-alg sha1;
        verify-alg crc32c;
        csums-alg crc32c;
        max-buffers 8192;
        max-epoch-size 8192;
        tcp-cork yes;
        sndbuf-size 1M;
        rcvbuf-size 2M;
        unplug-watermark 128;
        ko-count 3;
        timeout 90;
        ping-int 10;
        ping-timeout 30;
    }
}
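
For reference, the threshold quoted in the P_BARRIER message follows from the ko-count and timeout values in this configuration, with timeout taken in tenths of a second as the log message itself indicates. A minimal sketch of that arithmetic, together with the gap between the two log timestamps (variable names are illustrative only):

```python
# Values from the net section of the configuration above.
ko_count = 3
timeout_tenths = 90  # 'timeout 90;' is in units of 0.1 s per the log message

# ko-count * timeout, converted to milliseconds (0.1 s == 100 ms).
threshold_ms = ko_count * timeout_tenths * 100

# Elapsed time reported in the P_BARRIER message.
reported_ms = 27003

# Kernel log timestamps (seconds) of the P_BARRIER message and the BUG line.
barrier_ts = 2467.187849
oops_ts = 2480.674208
gap_s = oops_ts - barrier_ts

print(f"threshold: {threshold_ms} ms")
print(f"reported wait exceeds threshold: {reported_ms > threshold_ms}")
print(f"gap between P_BARRIER message and BUG: {gap_s:.1f} s")
```

The reported wait is only a few milliseconds over the 27 s threshold, and the BUG follows roughly 13 seconds after the P_BARRIER message.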