[DRBD-user] Kernel Oops on peer when removing LVM snapshot

Paul Gideon Dann pdgiddie at gmail.com
Tue Jun 16 12:51:07 CEST 2015

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


This is an interesting (though frustrating) issue that I've run into with
DRBD+LVM, and having finally exhausted everything I can think of or find
myself, I'm hoping the mailing list might be able to offer some help!

My setup involves DRBD resources that are backed by LVM LVs; each DRBD device is
then used as a PV in its own right and forms its own VG.

System VG -> Backing LV -> DRBD -> Resource VG -> Resource LVs
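
For context, each resource's stack is put together roughly like this (a sketch
only; the backing LV, DRBD device and resource name match the config further
down, but the Resource VG/LV names and the sizes are placeholders):

====================
# Backing LV in the system VG (one per DRBD resource)
lvcreate -L 100G -n drbd-backend-sql-server mars

# DRBD resource on top of that LV (config further down)
drbdadm create-md vm-sql-server
drbdadm up vm-sql-server
drbdadm primary vm-sql-server        # on the primary only

# The DRBD device itself becomes a PV with its own VG
pvcreate /dev/drbd5
vgcreate vg-sql-server /dev/drbd5    # placeholder VG name

# Resource LVs live in that VG, so they are replicated via DRBD
lvcreate -L 50G -n sql-data vg-sql-server
====================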

The problem I'm having happens only for one DRBD resource, and not for any
of the others. This is what I do:

I create a snapshot of the Resource LV (meaning that the snapshot is also
replicated via DRBD), and everything is fine. However, when I *remove* the
snapshot, the *secondary* peer oopses immediately.
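
In terms of commands it's nothing exotic; roughly this, run on the primary
(the Resource VG/LV/snapshot names here are placeholders, not my real ones):

====================
# On the primary, inside the Resource VG (names are placeholders)
lvcreate -s -L 5G -n sql-data-snap vg-sql-server/sql-data   # creating the snapshot is fine
lvremove -f vg-sql-server/sql-data-snap                     # removing it triggers the oops below
====================

This is the oops on the secondary: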

====================
[  738.167953] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  738.167984] IP: [<ffffffffc09176fc>] drbd_endio_write_sec_final+0x9c/0x490 [drbd]
[  738.168004] PGD 0
[  738.168010] Oops: 0002 [#1] SMP
[  738.168028] Modules linked in: dm_snapshot dm_bufio vhost_net vhost macvtap macvlan ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables 8021q garp mrp drbd lru_cache libcrc32c bridge stp llc adt7475 hwmon_vid nouveau mxm_wmi wmi video ttm drm_kms_helper
[  738.168192] CPU: 5 PID: 1963 Comm: drbd_r_vm-sql-s Not tainted 3.16.0-39-generic #53~14.04.1-Ubuntu
[  738.168199] Hardware name: Intel S5000XVN/S5000XVN, BIOS S5000.86B.10.00.0084.101720071530 10/17/2007
[  738.168206] task: ffff8808292632f0 ti: ffff880824b60000 task.ti: ffff880824b60000
[  738.168212] RIP: 0010:[<ffffffffc09176fc>]  [<ffffffffc09176fc>] drbd_endio_write_sec_final+0x9c/0x490 [drbd]
[  738.168225] RSP: 0018:ffff880824b63ca0  EFLAGS: 00010093
[  738.168230] RAX: 0000000000000000 RBX: ffff88081647de80 RCX: 000000000000b028
[  738.168236] RDX: ffff88081647da00 RSI: 0000000000000202 RDI: ffff880829cc26d0
[  738.168242] RBP: ffff880824b63d18 R08: 0000000000000246 R09: 0000000000000002
[  738.168247] R10: 0000000000000246 R11: 0000000000000005 R12: ffff88082f8ffae0
[  738.168253] R13: ffff8804b5f46428 R14: ffff880829fd9800 R15: ffff880829fd9bb0
[  738.168259] FS:  0000000000000000(0000) GS:ffff88085fd40000(0000) knlGS:0000000000000000
[  738.168265] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  738.168270] CR2: 0000000000000000 CR3: 000000082a097000 CR4: 00000000000027e0
[  738.168276] Stack:
[  738.168279]  ffff880824b63ca8 ffff880800000000 0000000000060006 ffff88081647deb8
[  738.168290]  0000000000000000 0000000000000000 0000000007600800 0000000000400000
[  738.168300]  0000000000000000 0000000000000000 0000000007600800 0000000000000000
[  738.168310] Call Trace:
[  738.168321]  [<ffffffffc0927696>] drbd_submit_peer_request+0x86/0x360 [drbd]
[  738.168333]  [<ffffffffc09282d1>] receive_Data+0x3a1/0xfa0 [drbd]
[  738.168342]  [<ffffffffc091c73a>] ? drbd_recv+0x2a/0x1c0 [drbd]
[  738.168353]  [<ffffffffc092a255>] drbd_receiver+0x115/0x250 [drbd]
[  738.168364]  [<ffffffffc09345a0>] ? drbd_destroy_connection+0xc0/0xc0 [drbd]
[  738.168375]  [<ffffffffc09345eb>] drbd_thread_setup+0x4b/0x130 [drbd]
[  738.168385]  [<ffffffffc09345a0>] ? drbd_destroy_connection+0xc0/0xc0 [drbd]
[  738.168395]  [<ffffffff81091522>] kthread+0xd2/0xf0
[  738.168402]  [<ffffffff81091450>] ? kthread_create_on_node+0x1c0/0x1c0
[  738.168410]  [<ffffffff8176dd98>] ret_from_fork+0x58/0x90
[  738.168416]  [<ffffffff81091450>] ? kthread_create_on_node+0x1c0/0x1c0
[  738.168422] Code: 48 8d b8 d0 00 00 00 e8 73 62 e5 c0 8b 53 58 49 89 c2 c1 ea 09 41 01 96 54 02 00 00 49 83 fd ff 48 8b 13 48 8b 43 08 48 89 42 08 <48> 89 10 49 8b 86 c8 03 00 00 49 8d 96 c0 03 00 00 49 89 9e c8
[  738.168513] RIP  [<ffffffffc09176fc>] drbd_endio_write_sec_final+0x9c/0x490 [drbd]
[  738.168524]  RSP <ffff880824b63ca0>
[  738.168528] CR2: 0000000000000000
====================

At that point, the I/O stack seems to be completely frozen: the DRBD kernel
threads are stuck in D state, and the system becomes unresponsive.
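
If anyone wants to inspect the same hang on their end, the stuck threads and
their kernel stacks can be seen with something along these lines (a sketch;
the SysRq part assumes kernel.sysrq permits it):

====================
# List tasks in uninterruptible sleep (D state); the drbd threads show up here
ps axo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'

# Dump stacks of all blocked tasks to the kernel log (assumes SysRq is enabled)
echo w > /proc/sysrq-trigger
dmesg | tail -n 100
====================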

The system is Ubuntu Trusty 14.04.
Kernel is 3.16.0-39-generic
drbd-utils is 2:8.4.4-1ubuntu1

DRBD config for the resource is:

====================
resource vm-sql-server {
  device    /dev/drbd5;
  meta-disk internal;
  net {
    protocol A;
  }
  on mars {
    disk    /dev/mars/drbd-backend-sql-server;
    address 192.168.254.101:7794;
  }
  on venus {
    disk    /dev/venus/drbd-backend-sql-server;
    address 192.168.254.102:7794;
  }
}
====================

My LVM filter in lvm.conf looks like this:
filter = [ "a|^/dev/sd.[0-9]+|", "a|^/dev/md[0-9]+|", "a|^/dev/drbd/by-res/.*|", "r|.*|" ]

I've tried switching the protocol to C, and I've tried completely resyncing
the secondary. I'm out of ideas. Any help would be greatly appreciated!

Cheers,
Paul