Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
So no ideas concerning this, then? I've now seen the same thing happen on another resource. Actually, it doesn't need to be a snapshot: removing any logical volume causes the oops. It doesn't happen for every resource, though. I wonder if it's something to do with the frequency of other I/O? Both affected resources have intermittent spikes in I/O (from databases), but on average are not under heavy load.

I've tried destroying the resource completely, re-creating both sides from scratch, and creating a new LV on the resource and copying the data back onto it, but the same thing is happening again (an oops on the remote node when I create and then remove an LV). What can I do to debug this further?

Paul

On 16 June 2015 at 11:51, Paul Gideon Dann <pdgiddie at gmail.com> wrote:
> This is an interesting (though frustrating) issue that I've run into with
> DRBD+LVM, and having finally exhausted everything I can think of or find
> myself, I'm hoping the mailing list might be able to offer some help!
>
> My setup involves DRBD resources that are backed by LVM LVs, and are then
> formatted as PVs themselves, each forming its own VG:
>
> System VG -> Backing LV -> DRBD -> Resource VG -> Resource LVs
>
> The problem I'm having happens only for one DRBD resource, and not for any
> of the others. This is what I do:
>
> I create a snapshot of the Resource LV (meaning that the snapshot will
> also be replicated via DRBD), and everything is fine. However, when I
> *remove* the snapshot, the *secondary* peer oopses immediately:
>
> ====================
> [  738.167953] BUG: unable to handle kernel NULL pointer dereference at (null)
> [  738.167984] IP: [<ffffffffc09176fc>] drbd_endio_write_sec_final+0x9c/0x490 [drbd]
> [  738.168004] PGD 0
> [  738.168010] Oops: 0002 [#1] SMP
> [  738.168028] Modules linked in: dm_snapshot dm_bufio vhost_net vhost macvtap macvlan ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables 8021q garp mrp drbd lru_cache libcrc32c bridge stp llc adt7475 hwmon_vid nouveau mxm_wmi wmi video ttm drm_kms_helper
> [  738.168192] CPU: 5 PID: 1963 Comm: drbd_r_vm-sql-s Not tainted 3.16.0-39-generic #53~14.04.1-Ubuntu
> [  738.168199] Hardware name: Intel S5000XVN/S5000XVN, BIOS S5000.86B.10.00.0084.101720071530 10/17/2007
> [  738.168206] task: ffff8808292632f0 ti: ffff880824b60000 task.ti: ffff880824b60000
> [  738.168212] RIP: 0010:[<ffffffffc09176fc>]  [<ffffffffc09176fc>] drbd_endio_write_sec_final+0x9c/0x490 [drbd]
> [  738.168225] RSP: 0018:ffff880824b63ca0  EFLAGS: 00010093
> [  738.168230] RAX: 0000000000000000 RBX: ffff88081647de80 RCX: 000000000000b028
> [  738.168236] RDX: ffff88081647da00 RSI: 0000000000000202 RDI: ffff880829cc26d0
> [  738.168242] RBP: ffff880824b63d18 R08: 0000000000000246 R09: 0000000000000002
> [  738.168247] R10: 0000000000000246 R11: 0000000000000005 R12: ffff88082f8ffae0
> [  738.168253] R13: ffff8804b5f46428 R14: ffff880829fd9800 R15: ffff880829fd9bb0
> [  738.168259] FS:  0000000000000000(0000) GS:ffff88085fd40000(0000) knlGS:0000000000000000
> [  738.168265] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  738.168270] CR2: 0000000000000000 CR3: 000000082a097000 CR4: 00000000000027e0
> [  738.168276] Stack:
> [  738.168279]  ffff880824b63ca8 ffff880800000000 0000000000060006 ffff88081647deb8
> [  738.168290]  0000000000000000 0000000000000000 0000000007600800 0000000000400000
> [  738.168300]  0000000000000000 0000000000000000 0000000007600800 0000000000000000
> [  738.168310] Call Trace:
> [  738.168321]  [<ffffffffc0927696>] drbd_submit_peer_request+0x86/0x360 [drbd]
> [  738.168333]  [<ffffffffc09282d1>] receive_Data+0x3a1/0xfa0 [drbd]
> [  738.168342]  [<ffffffffc091c73a>] ? drbd_recv+0x2a/0x1c0 [drbd]
> [  738.168353]  [<ffffffffc092a255>] drbd_receiver+0x115/0x250 [drbd]
> [  738.168364]  [<ffffffffc09345a0>] ? drbd_destroy_connection+0xc0/0xc0 [drbd]
> [  738.168375]  [<ffffffffc09345eb>] drbd_thread_setup+0x4b/0x130 [drbd]
> [  738.168385]  [<ffffffffc09345a0>] ? drbd_destroy_connection+0xc0/0xc0 [drbd]
> [  738.168395]  [<ffffffff81091522>] kthread+0xd2/0xf0
> [  738.168402]  [<ffffffff81091450>] ? kthread_create_on_node+0x1c0/0x1c0
> [  738.168410]  [<ffffffff8176dd98>] ret_from_fork+0x58/0x90
> [  738.168416]  [<ffffffff81091450>] ? kthread_create_on_node+0x1c0/0x1c0
> [  738.168422] Code: 48 8d b8 d0 00 00 00 e8 73 62 e5 c0 8b 53 58 49 89 c2 c1 ea 09 41 01 96 54 02 00 00 49 83 fd ff 48 8b 13 48 8b 43 08 48 89 42 08 <48> 89 10 49 8b 86 c8 03 00 00 49 8d 96 c0 03 00 00 49 89 9e c8
> [  738.168513] RIP  [<ffffffffc09176fc>] drbd_endio_write_sec_final+0x9c/0x490 [drbd]
> [  738.168524]  RSP <ffff880824b63ca0>
> [  738.168528] CR2: 0000000000000000
> ====================
>
> At that point, the IO stack seems to be completely frozen up: the drbd
> kernel threads are stuck in D state, and the system becomes completely
> unresponsive.
>
> The system is Ubuntu Trusty 14.04.
> Kernel is 3.16.0-39-generic.
> drbd-utils is 2:8.4.4-1ubuntu1.
>
> DRBD config for the resource is:
>
> ====================
> resource vm-sql-server {
>     device    /dev/drbd5;
>     meta-disk internal;
>     net {
>         protocol A;
>     }
>     on mars {
>         disk    /dev/mars/drbd-backend-sql-server;
>         address 192.168.254.101:7794;
>     }
>     on venus {
>         disk    /dev/venus/drbd-backend-sql-server;
>         address 192.168.254.102:7794;
>     }
> }
> ====================
>
> My LVM filter looks like this:
> filter = [ "a|^/dev/sd.[0-9]+|" "a|^/dev/md[0-9]+|" "a|^/dev/drbd/by-res/.*|" "r|.*|" ]
>
> I've tried switching the protocol to C, and I've tried completely
> resyncing the secondary. I'm out of ideas. Any help would be greatly
> appreciated!
>
> Cheers,
> Paul
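
For anyone trying to reproduce the stacked layout described in the quoted message, here is a minimal sketch of how such a "System VG -> Backing LV -> DRBD -> Resource VG -> Resource LV" stack is typically assembled. The backing LV name, resource name, and node VG names ("mars"/"venus") are taken from the config above; the resource VG name "vg_sql" and all sizes are assumptions for illustration only, not the poster's actual values.

====================
# On each node: carve a backing LV for DRBD out of the system VG
# (VG is "mars" on one node and "venus" on the other; the size is an assumed value)
lvcreate -L 100G -n drbd-backend-sql-server mars

# Initialise DRBD metadata on the backing device and bring the resource up
drbdadm create-md vm-sql-server
drbdadm up vm-sql-server

# On the chosen primary only: promote, then stack LVM on top of /dev/drbd5
drbdadm primary --force vm-sql-server   # --force only for the very first promotion
pvcreate /dev/drbd5                     # the DRBD device itself becomes a PV...
vgcreate vg_sql /dev/drbd5              # ...forming its own Resource VG ("vg_sql" is an assumed name)
lvcreate -L 50G -n data vg_sql          # a Resource LV whose contents are replicated via DRBD
====================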
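The snapshot create/remove cycle that triggers the oops on the secondary would then look roughly like this (again with assumed LV and snapshot names). As the reply notes, a plain non-snapshot LV create/remove reportedly reproduces it as well.

====================
# On the primary: snapshot a Resource LV inside the replicated VG
# (the snapshot's copy-on-write activity is replicated to the peer like any other write)
lvcreate -s -L 5G -n data-snap /dev/vg_sql/data

# ... use the snapshot, e.g. for a backup ...

# Removing the snapshot is the step after which the *secondary* node oopses
# in drbd_endio_write_sec_final()
lvremove -y /dev/vg_sql/data-snap

# Per the reply above, creating and removing an ordinary LV shows the same behaviour:
lvcreate -L 1G -n scratch vg_sql
lvremove -y /dev/vg_sql/scratch
====================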