[DRBD-user] Problems with DRBDv9 on Ubuntu 16.04 in Google Cloud

Jason Dillon jason at planet57.com
Thu Jul 14 04:13:34 CEST 2016



Howdy folks, I’m having some problems getting a very basic two-node environment set up with DRBDv9 on Ubuntu 16.04 in Google’s cloud.

I’m using the ubuntu-1604-xenial-v20160627 image with basically this additional customization:

add-apt-repository ppa:linbit/linbit-drbd9-stack
apt update
apt install drbd-utils python-drbdmanage drbd-dkms

That appears to work, and the kernel module compiles.
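
For anyone reproducing this, a quick sanity check of which drbd module the running kernel will load might look like the sketch below. It assumes `modinfo` is available (and optionally `dkms`); `report_drbd_module` is just an illustrative helper name, not part of drbd-utils.

```shell
#!/bin/sh
# Report the version of the drbd module that modinfo resolves for the
# running kernel. Hypothetical helper, for illustration only.
report_drbd_module() {
    kernel=$(uname -r)
    if ver=$(modinfo -F version drbd 2>/dev/null) && [ -n "$ver" ]; then
        echo "drbd module version: $ver (kernel $kernel)"
    else
        echo "drbd module version: unknown (no drbd module found for kernel $kernel)"
    fi
}

# Show the DKMS build state if dkms is installed, then the module version.
command -v dkms >/dev/null 2>&1 && dkms status
report_drbd_module
# With the PPA packages the version should start with 9; an 8.4.x version
# would suggest the in-tree module is being picked up instead.
```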

My VMs each have a 100 GB disk attached as /dev/sdb, and I have mostly been able to get something working with:

On the elysium-test01 VM:

vgcreate drbdpool /dev/sdb
drbdmanage init 10.12.0.2
drbdmanage add-node elysium-test02 10.12.0.3
drbdmanage add-resource data01
drbdmanage add-volume data01 90gb
drbdmanage assign-resource data01 elysium-test01
drbdmanage assign-resource data01 elysium-test02

On the elysium-test02 VM:

vgcreate drbdpool /dev/sdb
drbdmanage join -p 6999 10.12.0.3 1 elysium-test01 10.12.0.2 0 mUEU/uPLZOAFpkZGgmlT

At this point, checking with drbd-overview, everything looks happy and connected, though elysium-test02 is Inconsistent.

On the elysium-test01 VM:

mkfs.ext4 -F -E discard /dev/drbd100
mkdir -p /mnt/disks/data01
mount -o discard,defaults /dev/drbd100 /mnt/disks/data01

At this point everything looks okay, and logs show that elysium-test01 is now the primary for data01.

Then the problems start: on the elysium-test02 node, after a few seconds, the logs show "BUG: unable to handle kernel NULL pointer dereference at (null)".

<snip>
Jul 14 01:11:57 ubuntu kernel: [  444.132269] drbd data01/0 drbd100 elysium-test01: received new current UUID: 096A8FD96D357D37
Jul 14 01:11:59 ubuntu kernel: [  445.934934] drbd data01/0 drbd100 elysium-test01: Resync done (total 70 sec; paused 0 sec; 1255580 K/sec)
Jul 14 01:11:59 ubuntu kernel: [  445.934942] drbd data01/0 drbd100 elysium-test01: updated UUIDs 096A8FD96D357D36:0000000000000000:0000000000000000:0000000000000000
Jul 14 01:11:59 ubuntu kernel: [  445.934954] drbd data01/0 drbd100: disk( Inconsistent -> UpToDate )
Jul 14 01:11:59 ubuntu kernel: [  445.934957] drbd data01/0 drbd100 elysium-test01: repl( SyncTarget -> Established )
Jul 14 01:11:59 ubuntu kernel: [  445.936014] drbd data01/0 drbd100 elysium-test01: helper command: /sbin/drbdadm after-resync-target
Jul 14 01:11:59 ubuntu drbdadm[12829]: Don't know which config file belongs to resource data01, trying default ones...
Jul 14 01:11:59 ubuntu kernel: [  445.942075] drbd data01/0 drbd100 elysium-test01: helper command: /sbin/drbdadm after-resync-target exit code 0 (0x0)
Jul 14 01:12:01 ubuntu kernel: [  448.254494] drbd data01 elysium-test01: peer( Primary -> Secondary )
Jul 14 01:12:12 ubuntu kernel: [  458.528749] drbd data01 elysium-test01: Preparing remote state change 504089555 (primary_nodes=0, weak_nodes=0)
Jul 14 01:12:12 ubuntu kernel: [  458.530153] drbd data01 elysium-test01: Committing remote state change 504089555
Jul 14 01:12:12 ubuntu kernel: [  458.530168] drbd data01 elysium-test01: peer( Secondary -> Primary )
Jul 14 01:12:14 ubuntu kernel: [  460.832177] BUG: unable to handle kernel NULL pointer dereference at           (null)
Jul 14 01:12:14 ubuntu kernel: [  460.840403] IP: [<ffffffff813f91ed>] memcpy_orig+0x9d/0x110
Jul 14 01:12:14 ubuntu kernel: [  460.846205] PGD 0 
Jul 14 01:12:14 ubuntu kernel: [  460.848811] Oops: 0002 [#1] SMP 
Jul 14 01:12:14 ubuntu kernel: [  460.852394] Modules linked in: drbd_transport_tcp(OE) drbd(OE) ip6table_filter ip6_tables iptable_filter ip_tables x_tables ppdev serio_raw parport_pc pvpanic parport ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
Jul 14 01:12:14 ubuntu kernel: [  460.905902] CPU: 0 PID: 12729 Comm: drbd_r_data01 Tainted: G           OE   4.4.0-28-generic #47-Ubuntu
Jul 14 01:12:14 ubuntu kernel: [  460.915567] Hardware name: Google Google/Google, BIOS Google 01/01/2011
Jul 14 01:12:14 ubuntu kernel: [  460.922378] task: ffff8800b991e040 ti: ffff8800b9af8000 task.ti: ffff8800b9af8000
Jul 14 01:12:14 ubuntu kernel: [  460.930084] RIP: 0010:[<ffffffff813f91ed>]  [<ffffffff813f91ed>] memcpy_orig+0x9d/0x110
Jul 14 01:12:14 ubuntu kernel: [  460.938329] RSP: 0018:ffff8800b9afb9a8  EFLAGS: 00010202
Jul 14 01:12:14 ubuntu kernel: [  460.943760] RAX: 0000000000000000 RBX: 0000000000000012 RCX: 0000000000000200
Jul 14 01:12:14 ubuntu kernel: [  460.951091] RDX: 0000000000000012 RSI: ffff8800b9db80ae RDI: 0000000000000000
Jul 14 01:12:14 ubuntu kernel: [  460.958348] RBP: ffff8800b9afb9e0 R08: 0000000000000000 R09: 0000000000000000
Jul 14 01:12:14 ubuntu kernel: [  460.965591] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800b9afbbb0
Jul 14 01:12:14 ubuntu kernel: [  460.972841] R13: 0000000000000012 R14: 0000000000000012 R15: ffff8800b9afbb90
Jul 14 01:12:14 ubuntu kernel: [  460.980087] FS:  0000000000000000(0000) GS:ffff88012fc00000(0000) knlGS:0000000000000000
Jul 14 01:12:14 ubuntu kernel: [  460.988289] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 14 01:12:14 ubuntu kernel: [  460.994144] CR2: 0000000000000000 CR3: 00000000ba48c000 CR4: 00000000001406f0
Jul 14 01:12:14 ubuntu kernel: [  461.001397] Stack:
Jul 14 01:12:14 ubuntu kernel: [  461.003631]  ffffffff813fde16 ffff8800b9db80c0 0000000000000200 000000000000003e
Jul 14 01:12:14 ubuntu kernel: [  461.011633]  0000000000000012 0000000000000012 000000000000002c ffff8800b9afba40
Jul 14 01:12:14 ubuntu kernel: [  461.019793]  ffffffff8170f018 0000000000000000 ffff88012aa42580 0000000000000002
Jul 14 01:12:14 ubuntu kernel: [  461.027911] Call Trace:
Jul 14 01:12:14 ubuntu kernel: [  461.030577]  [<ffffffff813fde16>] ? copy_to_iter+0x1b6/0x260
Jul 14 01:12:14 ubuntu kernel: [  461.036358]  [<ffffffff8170f018>] skb_copy_datagram_iter+0x68/0x280
Jul 14 01:12:14 ubuntu kernel: [  461.042960]  [<ffffffff817694e3>] tcp_recvmsg+0x613/0xbe0
Jul 14 01:12:14 ubuntu kernel: [  461.048567]  [<ffffffff8179740e>] inet_recvmsg+0x7e/0xb0
Jul 14 01:12:14 ubuntu kernel: [  461.053987]  [<ffffffff816ffa3b>] sock_recvmsg+0x3b/0x50
Jul 14 01:12:14 ubuntu kernel: [  461.059409]  [<ffffffff816ffb91>] kernel_recvmsg+0x61/0x80
Jul 14 01:12:14 ubuntu kernel: [  461.065002]  [<ffffffffc02a9703>] dtt_recv_short+0x63/0x80 [drbd_transport_tcp]
Jul 14 01:12:14 ubuntu kernel: [  461.072666]  [<ffffffffc02a97e0>] dtt_recv+0xc0/0x180 [drbd_transport_tcp]
Jul 14 01:12:14 ubuntu kernel: [  461.079771]  [<ffffffffc0335f88>] drbd_recv+0x48/0x1f0 [drbd]
Jul 14 01:12:14 ubuntu kernel: [  461.085894]  [<ffffffff816ffa3b>] ? sock_recvmsg+0x3b/0x50
Jul 14 01:12:14 ubuntu kernel: [  461.091699]  [<ffffffffc033ef98>] read_in_block+0xa8/0x350 [drbd]
Jul 14 01:12:14 ubuntu kernel: [  461.097937]  [<ffffffffc0342140>] ? e_end_resync_block+0x110/0x110 [drbd]
Jul 14 01:12:14 ubuntu kernel: [  461.104848]  [<ffffffffc0342250>] receive_Data+0x110/0xcb0 [drbd]
Jul 14 01:12:14 ubuntu kernel: [  461.111071]  [<ffffffffc02a9803>] ? dtt_recv+0xe3/0x180 [drbd_transport_tcp]
Jul 14 01:12:14 ubuntu kernel: [  461.118253]  [<ffffffffc0335f88>] ? drbd_recv+0x48/0x1f0 [drbd]
Jul 14 01:12:14 ubuntu kernel: [  461.124304]  [<ffffffffc0342140>] ? e_end_resync_block+0x110/0x110 [drbd]
Jul 14 01:12:14 ubuntu kernel: [  461.131361]  [<ffffffffc0342140>] ? e_end_resync_block+0x110/0x110 [drbd]
Jul 14 01:12:14 ubuntu kernel: [  461.138269]  [<ffffffffc0345ee4>] drbd_receiver+0x3e4/0x620 [drbd]
Jul 14 01:12:14 ubuntu kernel: [  461.144573]  [<ffffffffc0350420>] ? idr_has_entry+0x10/0x10 [drbd]
Jul 14 01:12:14 ubuntu kernel: [  461.150873]  [<ffffffffc035047e>] drbd_thread_setup+0x5e/0x110 [drbd]
Jul 14 01:12:14 ubuntu kernel: [  461.157453]  [<ffffffffc0350420>] ? idr_has_entry+0x10/0x10 [drbd]
Jul 14 01:12:14 ubuntu kernel: [  461.163750]  [<ffffffff810a0808>] kthread+0xd8/0xf0
Jul 14 01:12:14 ubuntu kernel: [  461.168754]  [<ffffffff810a0730>] ? kthread_create_on_node+0x1e0/0x1e0
Jul 14 01:12:14 ubuntu kernel: [  461.175394]  [<ffffffff81827a4f>] ret_from_fork+0x3f/0x70
Jul 14 01:12:14 ubuntu kernel: [  461.180914]  [<ffffffff810a0730>] ? kthread_create_on_node+0x1e0/0x1e0
Jul 14 01:12:14 ubuntu kernel: [  461.187563] Code: 57 e8 4c 89 5f e0 48 8d 7f e0 73 d2 83 c2 20 48 29 d6 48 29 d7 83 fa 10 72 24 4c 8b 06 4c 8b 4e 08 4c 8b 54 16 f0 4c 8b 5c 16 f8 <4c> 89 07 4c 89 4f 08 4c 89 54 17 f0 4c 89 5c 17 f8 c3 90 83 fa 
Jul 14 01:12:14 ubuntu kernel: [  461.214595] RIP  [<ffffffff813f91ed>] memcpy_orig+0x9d/0x110
Jul 14 01:12:14 ubuntu kernel: [  461.220601]  RSP <ffff8800b9afb9a8>
Jul 14 01:12:14 ubuntu kernel: [  461.224205] CR2: 0000000000000000
Jul 14 01:12:14 ubuntu kernel: [  461.227643] ---[ end trace 670dbe9e8d37a576 ]---
</snip>
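
The oops and the DRBD frames in it can be pulled out of a saved log with a short script. This is a rough sketch, assuming the kernel messages are in a syslog-style file (the /var/log/kern.log path in the usage comment is an assumption); `extract_oops` and `count_module_frames` are illustrative helper names, not part of DRBD.

```shell
#!/bin/sh
# Print the kernel oops from a syslog-style file: everything from the first
# "BUG: " line through the following "---[ end trace" line.
extract_oops() {
    awk '/BUG: /{p=1} p{print} /---\[ end trace/{if(p) exit}' "$1"
}

# Count call-trace frames per DRBD module tag, e.g. "[drbd]" or
# "[drbd_transport_tcp]", to see which modules were on the stack.
count_module_frames() {
    extract_oops "$1" | grep -o '\[drbd[a-z_]*]' | sort | uniq -c
}

# Usage (path is an assumption):
#   extract_oops /var/log/kern.log
#   count_module_frames /var/log/kern.log
```

Run against the trace above, this lists frames from both drbd and drbd_transport_tcp, all in the drbd_r_data01 thread.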

After this, nothing seems to work properly on either VM. Attempts to unmount the volume hang, and other commands such as drbd-overview hang as well. Eventually I have to reboot both VMs to get back to some sort of sanity, yet DRBD is still basically non-functional and causing kernel errors :-(

Does anyone have any idea what's wrong?

Thanks!

—jason


