Howdy folks,

I’m having some problems getting a very basic two-node environment set up with DRBDv9 on Ubuntu 16.04 in Google’s cloud. I’m using the ubuntu-1604-xenial-v20160627 image with basically this additional customization:

    add-apt-repository ppa:linbit/linbit-drbd9-stack
    apt update
    apt install drbd-utils python-drbdmanage drbd-dkms

That appears to work, and it compiles the kernel module.

My VMs each have a 100 GB disk attached as /dev/sdb, and I have been able to mostly get something working with:

on elysium-test01 vm:

    vgcreate drbdpool /dev/sdb
    drbdmanage init 10.12.0.2
    drbdmanage add-node elysium-test02 10.12.0.3
    drbdmanage add-resource data01
    drbdmanage add-volume data01 90gb
    drbdmanage assign-resource data01 elysium-test01
    drbdmanage assign-resource data01 elysium-test02

on elysium-test02 vm:

    vgcreate drbdpool /dev/sdb
    drbdmanage join -p 6999 10.12.0.3 1 elysium-test01 10.12.0.2 0 mUEU/uPLZOAFpkZGgmlT

At this point, checking with drbd-overview, it looks like everything is happy and connected, though elysium-test02 is inconsistent.

on elysium-test01 vm:

    mkfs.ext4 -F -E discard /dev/drbd100
    mkdir -p /mnt/disks/data01
    mount -o discard,defaults /dev/drbd100 /mnt/disks/data01

At this point everything looks okay, and the logs show that elysium-test01 is now the primary for data01.

Then the problems start: on the elysium-test02 node, after a few seconds, the logs show "BUG: unable to handle kernel NULL pointer dereference at (null)".

<snip>
Jul 14 01:11:57 ubuntu kernel: [ 444.132269] drbd data01/0 drbd100 elysium-test01: received new current UUID: 096A8FD96D357D37
Jul 14 01:11:59 ubuntu kernel: [ 445.934934] drbd data01/0 drbd100 elysium-test01: Resync done (total 70 sec; paused 0 sec; 1255580 K/sec)
Jul 14 01:11:59 ubuntu kernel: [ 445.934942] drbd data01/0 drbd100 elysium-test01: updated UUIDs 096A8FD96D357D36:0000000000000000:0000000000000000:0000000000000000
Jul 14 01:11:59 ubuntu kernel: [ 445.934954] drbd data01/0 drbd100: disk( Inconsistent -> UpToDate )
Jul 14 01:11:59 ubuntu kernel: [ 445.934957] drbd data01/0 drbd100 elysium-test01: repl( SyncTarget -> Established )
Jul 14 01:11:59 ubuntu kernel: [ 445.936014] drbd data01/0 drbd100 elysium-test01: helper command: /sbin/drbdadm after-resync-target
Jul 14 01:11:59 ubuntu drbdadm[12829]: Don't know which config file belongs to resource data01, trying default ones...
Jul 14 01:11:59 ubuntu kernel: [ 445.942075] drbd data01/0 drbd100 elysium-test01: helper command: /sbin/drbdadm after-resync-target exit code 0 (0x0)
Jul 14 01:12:01 ubuntu kernel: [ 448.254494] drbd data01 elysium-test01: peer( Primary -> Secondary )
Jul 14 01:12:12 ubuntu kernel: [ 458.528749] drbd data01 elysium-test01: Preparing remote state change 504089555 (primary_nodes=0, weak_nodes=0)
Jul 14 01:12:12 ubuntu kernel: [ 458.530153] drbd data01 elysium-test01: Committing remote state change 504089555
Jul 14 01:12:12 ubuntu kernel: [ 458.530168] drbd data01 elysium-test01: peer( Secondary -> Primary )
Jul 14 01:12:14 ubuntu kernel: [ 460.832177] BUG: unable to handle kernel NULL pointer dereference at (null)
Jul 14 01:12:14 ubuntu kernel: [ 460.840403] IP: [<ffffffff813f91ed>] memcpy_orig+0x9d/0x110
Jul 14 01:12:14 ubuntu kernel: [ 460.846205] PGD 0
Jul 14 01:12:14 ubuntu kernel: [ 460.848811] Oops: 0002 [#1] SMP
Jul 14 01:12:14 ubuntu kernel: [ 460.852394] Modules linked in: drbd_transport_tcp(OE) drbd(OE) ip6table_filter ip6_tables iptable_filter ip_tables x_tables ppdev serio_raw parport_pc pvpanic parport ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
Jul 14 01:12:14 ubuntu kernel: [ 460.905902] CPU: 0 PID: 12729 Comm: drbd_r_data01 Tainted: G OE 4.4.0-28-generic #47-Ubuntu
Jul 14 01:12:14 ubuntu kernel: [ 460.915567] Hardware name: Google Google/Google, BIOS Google 01/01/2011
Jul 14 01:12:14 ubuntu kernel: [ 460.922378] task: ffff8800b991e040 ti: ffff8800b9af8000 task.ti: ffff8800b9af8000
Jul 14 01:12:14 ubuntu kernel: [ 460.930084] RIP: 0010:[<ffffffff813f91ed>] [<ffffffff813f91ed>] memcpy_orig+0x9d/0x110
Jul 14 01:12:14 ubuntu kernel: [ 460.938329] RSP: 0018:ffff8800b9afb9a8 EFLAGS: 00010202
Jul 14 01:12:14 ubuntu kernel: [ 460.943760] RAX: 0000000000000000 RBX: 0000000000000012 RCX: 0000000000000200
Jul 14 01:12:14 ubuntu kernel: [ 460.951091] RDX: 0000000000000012 RSI: ffff8800b9db80ae RDI: 0000000000000000
Jul 14 01:12:14 ubuntu kernel: [ 460.958348] RBP: ffff8800b9afb9e0 R08: 0000000000000000 R09: 0000000000000000
Jul 14 01:12:14 ubuntu kernel: [ 460.965591] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800b9afbbb0
Jul 14 01:12:14 ubuntu kernel: [ 460.972841] R13: 0000000000000012 R14: 0000000000000012 R15: ffff8800b9afbb90
Jul 14 01:12:14 ubuntu kernel: [ 460.980087] FS: 0000000000000000(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
Jul 14 01:12:14 ubuntu kernel: [ 460.988289] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 14 01:12:14 ubuntu kernel: [ 460.994144] CR2: 0000000000000000 CR3: 00000000ba48c000 CR4: 00000000001406f0
Jul 14 01:12:14 ubuntu kernel: [ 461.001397] Stack:
Jul 14 01:12:14 ubuntu kernel: [ 461.003631] ffffffff813fde16 ffff8800b9db80c0 0000000000000200 000000000000003e
Jul 14 01:12:14 ubuntu kernel: [ 461.011633] 0000000000000012 0000000000000012 000000000000002c ffff8800b9afba40
Jul 14 01:12:14 ubuntu kernel: [ 461.019793] ffffffff8170f018 0000000000000000 ffff88012aa42580 0000000000000002
Jul 14 01:12:14 ubuntu kernel: [ 461.027911] Call Trace:
Jul 14 01:12:14 ubuntu kernel: [ 461.030577] [<ffffffff813fde16>] ? copy_to_iter+0x1b6/0x260
Jul 14 01:12:14 ubuntu kernel: [ 461.036358] [<ffffffff8170f018>] skb_copy_datagram_iter+0x68/0x280
Jul 14 01:12:14 ubuntu kernel: [ 461.042960] [<ffffffff817694e3>] tcp_recvmsg+0x613/0xbe0
Jul 14 01:12:14 ubuntu kernel: [ 461.048567] [<ffffffff8179740e>] inet_recvmsg+0x7e/0xb0
Jul 14 01:12:14 ubuntu kernel: [ 461.053987] [<ffffffff816ffa3b>] sock_recvmsg+0x3b/0x50
Jul 14 01:12:14 ubuntu kernel: [ 461.059409] [<ffffffff816ffb91>] kernel_recvmsg+0x61/0x80
Jul 14 01:12:14 ubuntu kernel: [ 461.065002] [<ffffffffc02a9703>] dtt_recv_short+0x63/0x80 [drbd_transport_tcp]
Jul 14 01:12:14 ubuntu kernel: [ 461.072666] [<ffffffffc02a97e0>] dtt_recv+0xc0/0x180 [drbd_transport_tcp]
Jul 14 01:12:14 ubuntu kernel: [ 461.079771] [<ffffffffc0335f88>] drbd_recv+0x48/0x1f0 [drbd]
Jul 14 01:12:14 ubuntu kernel: [ 461.085894] [<ffffffff816ffa3b>] ? sock_recvmsg+0x3b/0x50
Jul 14 01:12:14 ubuntu kernel: [ 461.091699] [<ffffffffc033ef98>] read_in_block+0xa8/0x350 [drbd]
Jul 14 01:12:14 ubuntu kernel: [ 461.097937] [<ffffffffc0342140>] ? e_end_resync_block+0x110/0x110 [drbd]
Jul 14 01:12:14 ubuntu kernel: [ 461.104848] [<ffffffffc0342250>] receive_Data+0x110/0xcb0 [drbd]
Jul 14 01:12:14 ubuntu kernel: [ 461.111071] [<ffffffffc02a9803>] ? dtt_recv+0xe3/0x180 [drbd_transport_tcp]
Jul 14 01:12:14 ubuntu kernel: [ 461.118253] [<ffffffffc0335f88>] ? drbd_recv+0x48/0x1f0 [drbd]
Jul 14 01:12:14 ubuntu kernel: [ 461.124304] [<ffffffffc0342140>] ? e_end_resync_block+0x110/0x110 [drbd]
Jul 14 01:12:14 ubuntu kernel: [ 461.131361] [<ffffffffc0342140>] ? e_end_resync_block+0x110/0x110 [drbd]
Jul 14 01:12:14 ubuntu kernel: [ 461.138269] [<ffffffffc0345ee4>] drbd_receiver+0x3e4/0x620 [drbd]
Jul 14 01:12:14 ubuntu kernel: [ 461.144573] [<ffffffffc0350420>] ? idr_has_entry+0x10/0x10 [drbd]
Jul 14 01:12:14 ubuntu kernel: [ 461.150873] [<ffffffffc035047e>] drbd_thread_setup+0x5e/0x110 [drbd]
Jul 14 01:12:14 ubuntu kernel: [ 461.157453] [<ffffffffc0350420>] ? idr_has_entry+0x10/0x10 [drbd]
Jul 14 01:12:14 ubuntu kernel: [ 461.163750] [<ffffffff810a0808>] kthread+0xd8/0xf0
Jul 14 01:12:14 ubuntu kernel: [ 461.168754] [<ffffffff810a0730>] ? kthread_create_on_node+0x1e0/0x1e0
Jul 14 01:12:14 ubuntu kernel: [ 461.175394] [<ffffffff81827a4f>] ret_from_fork+0x3f/0x70
Jul 14 01:12:14 ubuntu kernel: [ 461.180914] [<ffffffff810a0730>] ? kthread_create_on_node+0x1e0/0x1e0
Jul 14 01:12:14 ubuntu kernel: [ 461.187563] Code: 57 e8 4c 89 5f e0 48 8d 7f e0 73 d2 83 c2 20 48 29 d6 48 29 d7 83 fa 10 72 24 4c 8b 06 4c 8b 4e 08 4c 8b 54 16 f0 4c 8b 5c 16 f8 <4c> 89 07 4c 89 4f 08 4c 89 54 17 f0 4c 89 5c 17 f8 c3 90 83 fa
Jul 14 01:12:14 ubuntu kernel: [ 461.214595] RIP [<ffffffff813f91ed>] memcpy_orig+0x9d/0x110
Jul 14 01:12:14 ubuntu kernel: [ 461.220601] RSP <ffff8800b9afb9a8>
Jul 14 01:12:14 ubuntu kernel: [ 461.224205] CR2: 0000000000000000
Jul 14 01:12:14 ubuntu kernel: [ 461.227643] ---[ end trace 670dbe9e8d37a576 ]---
</snip>

After this, nothing seems to work properly on either VM. Attempts to unmount the volume hang, and other commands like drbd-overview hang too; eventually I have to reboot both VMs to get back to some sort of sanity, yet DRBD is still basically non-functional and causing kernel errors :-(

Anyone have any idea what's wrong?

Thanks!
-jason
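
For anyone triaging a trace like the one above, here is a minimal Python sketch that pulls out the BUG line and the reliable call-trace frames (those not marked with "?") together with the module each frame lives in. It assumes the log has one syslog entry per line. The names summarize_oops, FRAME_RE and SAMPLE are illustrative only and not part of DRBD or any kernel tooling; SAMPLE is a short excerpt of the log above.

```python
import re

# One call-trace frame, e.g.
#   [<ffffffffc0335f88>] ? drbd_recv+0x48/0x1f0 [drbd]
# group 1: optional "?" (frame the kernel could not verify)
# group 2: function name, group 3: module name (absent for built-in code)
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def summarize_oops(log_text):
    """Return (bug_message, frames) for the first oops in log_text.

    frames lists (function, module) for reliable frames only;
    module is None when the function is built into the kernel image.
    Assumes one syslog entry per line.
    """
    bug = None
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if bug is None and "BUG:" in line:
            bug = line[line.index("BUG:"):].strip()
        elif "Call Trace:" in line:
            in_trace = True
        elif in_trace:
            match = FRAME_RE.search(line)
            if match is None:
                break  # first non-frame line (e.g. "Code:") ends the trace
            if match.group(1) is None:  # skip frames marked "?"
                frames.append((match.group(2), match.group(3)))
    return bug, frames


# Short excerpt of the log in this post, one entry per line.
SAMPLE = """\
Jul 14 01:12:14 ubuntu kernel: [ 460.832177] BUG: unable to handle kernel NULL pointer dereference at (null)
Jul 14 01:12:14 ubuntu kernel: [ 461.027911] Call Trace:
Jul 14 01:12:14 ubuntu kernel: [ 461.030577] [<ffffffff813fde16>] ? copy_to_iter+0x1b6/0x260
Jul 14 01:12:14 ubuntu kernel: [ 461.036358] [<ffffffff8170f018>] skb_copy_datagram_iter+0x68/0x280
Jul 14 01:12:14 ubuntu kernel: [ 461.065002] [<ffffffffc02a9703>] dtt_recv_short+0x63/0x80 [drbd_transport_tcp]
Jul 14 01:12:14 ubuntu kernel: [ 461.079771] [<ffffffffc0335f88>] drbd_recv+0x48/0x1f0 [drbd]
Jul 14 01:12:14 ubuntu kernel: [ 461.187563] Code: 57 e8 4c 89 5f e0
"""

if __name__ == "__main__":
    bug, frames = summarize_oops(SAMPLE)
    print(bug)
    for func, module in frames:
        print(f"  {func} [{module or 'kernel'}]")
```

Listing only the reliable frames makes the path easier to read: the receive path runs from the drbd module through drbd_transport_tcp into the kernel's TCP code, and the fault is raised inside memcpy_orig.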