This is an interesting (though frustrating) issue that I've run into with DRBD+LVM, and having finally exhausted everything I can think of or find myself, I'm hoping the mailing list might be able to offer some help!

My setup involves DRBD resources that are backed by LVM LVs, and then formatted as PVs themselves, each forming its own VG.

System VG -> Backing LV -> DRBD -> Resource VG -> Resource LVs

The problem I'm having happens only for one DRBD resource, and not for any of the others. This is what I do:

I create a snapshot of the Resource LV (meaning that the snapshot will also be replicated via DRBD), and everything is fine. However, when I *remove* the snapshot, the *secondary* peer oopses immediately:

====================
[ 738.167953] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 738.167984] IP: [<ffffffffc09176fc>] drbd_endio_write_sec_final+0x9c/0x490 [drbd]
[ 738.168004] PGD 0
[ 738.168010] Oops: 0002 [#1] SMP
[ 738.168028] Modules linked in: dm_snapshot dm_bufio vhost_net vhost macvtap macvlan ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables 8021q garp mrp drbd lru_cache libcrc32c bridge stp llc adt7475 hwmon_vid nouveau mxm_wmi wmi video ttm drm_kms_helper
[ 738.168192] CPU: 5 PID: 1963 Comm: drbd_r_vm-sql-s Not tainted 3.16.0-39-generic #53~14.04.1-Ubuntu
[ 738.168199] Hardware name: Intel S5000XVN/S5000XVN, BIOS S5000.86B.10.00.0084.101720071530 10/17/2007
[ 738.168206] task: ffff8808292632f0 ti: ffff880824b60000 task.ti: ffff880824b60000
[ 738.168212] RIP: 0010:[<ffffffffc09176fc>] [<ffffffffc09176fc>] drbd_endio_write_sec_final+0x9c/0x490 [drbd]
[ 738.168225] RSP: 0018:ffff880824b63ca0 EFLAGS: 00010093
[ 738.168230] RAX: 0000000000000000 RBX: ffff88081647de80 RCX: 000000000000b028
[ 738.168236] RDX: ffff88081647da00 RSI: 0000000000000202 RDI: ffff880829cc26d0
[ 738.168242] RBP: ffff880824b63d18 R08: 0000000000000246 R09: 0000000000000002
[ 738.168247] R10: 0000000000000246 R11: 0000000000000005 R12: ffff88082f8ffae0
[ 738.168253] R13: ffff8804b5f46428 R14: ffff880829fd9800 R15: ffff880829fd9bb0
[ 738.168259] FS: 0000000000000000(0000) GS:ffff88085fd40000(0000) knlGS:0000000000000000
[ 738.168265] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 738.168270] CR2: 0000000000000000 CR3: 000000082a097000 CR4: 00000000000027e0
[ 738.168276] Stack:
[ 738.168279] ffff880824b63ca8 ffff880800000000 0000000000060006 ffff88081647deb8
[ 738.168290] 0000000000000000 0000000000000000 0000000007600800 0000000000400000
[ 738.168300] 0000000000000000 0000000000000000 0000000007600800 0000000000000000
[ 738.168310] Call Trace:
[ 738.168321] [<ffffffffc0927696>] drbd_submit_peer_request+0x86/0x360 [drbd]
[ 738.168333] [<ffffffffc09282d1>] receive_Data+0x3a1/0xfa0 [drbd]
[ 738.168342] [<ffffffffc091c73a>] ? drbd_recv+0x2a/0x1c0 [drbd]
[ 738.168353] [<ffffffffc092a255>] drbd_receiver+0x115/0x250 [drbd]
[ 738.168364] [<ffffffffc09345a0>] ? drbd_destroy_connection+0xc0/0xc0 [drbd]
[ 738.168375] [<ffffffffc09345eb>] drbd_thread_setup+0x4b/0x130 [drbd]
[ 738.168385] [<ffffffffc09345a0>] ? drbd_destroy_connection+0xc0/0xc0 [drbd]
[ 738.168395] [<ffffffff81091522>] kthread+0xd2/0xf0
[ 738.168402] [<ffffffff81091450>] ? kthread_create_on_node+0x1c0/0x1c0
[ 738.168410] [<ffffffff8176dd98>] ret_from_fork+0x58/0x90
[ 738.168416] [<ffffffff81091450>] ? kthread_create_on_node+0x1c0/0x1c0
[ 738.168422] Code: 48 8d b8 d0 00 00 00 e8 73 62 e5 c0 8b 53 58 49 89 c2 c1 ea 09 41 01 96 54 02 00 00 49 83 fd ff 48 8b 13 48 8b 43 08 48 89 42 08 <48> 89 10 49 8b 86 c8 03 00 00 49 8d 96 c0 03 00 00 49 89 9e c8
[ 738.168513] RIP [<ffffffffc09176fc>] drbd_endio_write_sec_final+0x9c/0x490 [drbd]
[ 738.168524] RSP <ffff880824b63ca0>
[ 738.168528] CR2: 0000000000000000
====================

At that point, the IO stack seems to be completely frozen up: the drbd kernel threads are stuck in D state, and the system becomes completely unresponsive.

The system is Ubuntu Trusty 14.04.
Kernel is 3.16.0-39-generic
drbd-utils is 2:8.4.4-1ubuntu1

DRBD config for the resource is:

====================
resource vm-sql-server {
    device /dev/drbd5;
    meta-disk internal;
    net {
        protocol A;
    }
    on mars {
        disk /dev/mars/drbd-backend-sql-server;
        address 192.168.254.101:7794;
    }
    on venus {
        disk /dev/venus/drbd-backend-sql-server;
        address 192.168.254.102:7794;
    }
}
====================

My LVM filter looks like this:

filter = [ "a|^/dev/sd.[0-9]+|" "a|^/dev/md[0-9]+|" "a|^/dev/drbd/by-res/.*|" "r|.*|" ]

I've tried switching the protocol to C, and I've tried completely resyncing the secondary. I'm out of ideas.

Any help would be greatly appreciated!

Cheers,
Paul

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20150616/9b65c404/attachment.htm>
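
For anyone checking the LVM filter quoted above: as far as I understand lvm.conf, the patterns are tried in order and the first one that matches a path name decides whether it is accepted ("a") or rejected ("r"); a path that matches nothing is accepted. The following is only a rough Python sketch of that rule, not LVM's actual implementation. It uses Python's `re` in place of LVM's own regex engine (an assumption that should hold for patterns this simple), looks at one path name at a time, and the helper names `parse_entry` and `path_accepted` are made up for illustration. The test paths are built from names in the config above and are illustrative as well.

```python
import re

# Filter entries copied from the lvm.conf line in the post.
LVM_FILTER = [
    "a|^/dev/sd.[0-9]+|",
    "a|^/dev/md[0-9]+|",
    "a|^/dev/drbd/by-res/.*|",
    "r|.*|",
]


def parse_entry(entry):
    """Split one filter entry into (action, compiled regex).

    The first character is the action ('a' or 'r'); the second is the
    delimiter that surrounds the pattern.
    """
    action, delim = entry[0], entry[1]
    pattern = entry[2:entry.rindex(delim)]
    return action, re.compile(pattern)


RULES = [parse_entry(e) for e in LVM_FILTER]


def path_accepted(path):
    """Return True if the first matching rule accepts this path name."""
    for action, regex in RULES:
        if regex.search(path):
            return action == "a"
    # No pattern matched: the path is accepted by default.
    return True


if __name__ == "__main__":
    for p in (
        "/dev/sda1",
        "/dev/md0",
        "/dev/drbd/by-res/vm-sql-server",
        "/dev/drbd5",
        "/dev/mars/drbd-backend-sql-server",
    ):
        print(p, "accepted" if path_accepted(p) else "rejected")
```

Under this reading, the backing LV path and the bare /dev/drbd5 name are rejected, while the by-res name is accepted. Since LVM considers every path name of a block device, the DRBD device should still be scanned through its by-res alias, assuming that symlink exists on the host.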