[DRBD-user] DRBD Issues causing high server load

FAYE Jean Jacques Amos jean-jacques-amos.faye at laposte.fr
Thu May 3 10:18:06 CEST 2018


Hello,

We are experiencing problems with our DRBD replication system.

We upgraded from Ubuntu 12 to Ubuntu 14.04 (kernel 4.4.0-31-generic).

DRBD kernel module version: 8.4.5
DRBD userland version: 8.4.4

We often see disconnections between the primary and secondary nodes.

Here is the log from when the problem occurs:

Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.085940] BUG: unable to handle kernel paging request at 0000000000001000
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.085984] IP: [<ffffffff813e1d96>] memcpy_orig+0x16/0x110
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086046] PGD 4f48b067 PUD 4f458067 PMD 0
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086075] Oops: 0000 [#1] SMP
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086097] Modules linked in: nfsv3 drbd vmw_vsock_vmci_transport vsock lru_cache libcrc32c nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache isofs coretemp crct10dif_pclmul crc32_pclmul ppdev aesni_intel aes_x86_64 lrw gf128mul vmw_balloon glue_helper ablk_helper cryptd input_leds joydev serio_raw vmwgfx ttm drm_kms_helper 8250_fintek parport_pc drm fb_sys_fops vmw_vmci syscopyarea shpchp sysfillrect i2c_piix4 sysimgblt mac_hid lp parport psmouse mptspi e1000 mptscsih mptbase scsi_transport_spi pata_acpi floppy vmw_pvscsi fjes vmxnet3 [last unloaded: drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086364] CPU: 0 PID: 11696 Comm: drbd_w_r0 Not tainted 4.4.0-31-generic #50~14.04.1-Ubuntu
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086388] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086418] task: ffff880077e52940 ti: ffff880078490000 task.ti: ffff880078490000
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086439] RIP: 0010:[<ffffffff813e1d96>]  [<ffffffff813e1d96>] memcpy_orig+0x16/0x110
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086477] RSP: 0018:ffff880078493ae0  EFLAGS: 00010202
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086500] RAX: ffff880077f523ec RBX: 00000000000002d4 RCX: 00000000000002bc
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086521] RDX: 0000000000000294 RSI: 0000000000001000 RDI: ffff880077f523ec
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086541] RBP: ffff880078493b20 R08: ffff880078493c30 R09: 352e31303a31303a
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086564] R10: 302b3334352e3130 R11: 6e6f635b20303032 R12: 0000000000000590
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.086594] R13: ffff880078493c50 R14: ffff880078493c60 R15: 0000000000000000
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.087575] FS:  0000000000000000(0000) GS:ffff88007c600000(0000) knlGS:0000000000000000
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.088273] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.089051] CR2: 0000000000001000 CR3: 000000004f46d000 CR4: 00000000000406f0
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.089899] Stack:
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.090670]  ffffffff813e65b7 ffff880078493c30 ffff880077f526c0 ffff880078675a00
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.091474]  ffff880077ecbc00 000000000000faf0 ffff880078493c40 0000000000000590
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.092233]  ffff880078493bb0 ffffffff8173f72d ffffffff8137c2b8 ffff880077e53218
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.092980] Call Trace:
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.093702]  [<ffffffff813e65b7>] ? copy_from_iter+0x2b7/0x2d0
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.094415]  [<ffffffff8173f72d>] tcp_sendmsg+0x61d/0xad0
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.095030]  [<ffffffff8137c2b8>] ? aa_sk_perm+0x78/0x230
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.095747]  [<ffffffff81769937>] inet_sendmsg+0x67/0xa0
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.096335]  [<ffffffff816d6d58>] sock_sendmsg+0x38/0x50
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.096982]  [<ffffffff816d6e6b>] kernel_sendmsg+0x2b/0x30
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.097549]  [<ffffffffc0438c40>] drbd_send+0xc0/0x1b0 [drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.098159]  [<ffffffffc043a641>] _drbd_no_send_page.isra.38+0x71/0xb0 [drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.098704]  [<ffffffffc043ab6c>] drbd_send_dblock+0x33c/0x630 [drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.099232]  [<ffffffff810bd784>] ? __wake_up+0x44/0x50
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.099756]  [<ffffffffc0420bf4>] w_send_dblock+0x94/0x150 [drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.100284]  [<ffffffffc0421eca>] drbd_worker+0xea/0x350 [drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.100996]  [<ffffffffc0437210>] ? drbd_destroy_connection+0x160/0x160 [drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.101562]  [<ffffffffc043725b>] drbd_thread_setup+0x4b/0x130 [drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.102120]  [<ffffffffc0437210>] ? drbd_destroy_connection+0x160/0x160 [drbd]
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.102612]  [<ffffffff8109b849>] kthread+0xc9/0xe0
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.103108]  [<ffffffff8109b780>] ? kthread_park+0x60/0x60
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.103573]  [<ffffffff817f72cf>] ret_from_fork+0x3f/0x70
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.104023]  [<ffffffff8109b780>] ? kthread_park+0x60/0x60
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.104461] Code: 0f 1f 44 00 00 48 89 f8 48 89 d1 f3 a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe 7c 35 48 83 ea 20 48 83 ea 20 <4c> 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 76 20 4c 89
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.105828] RIP  [<ffffffff813e1d96>] memcpy_orig+0x16/0x110
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.106322]  RSP <ffff880078493ae0>
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.106766] CR2: 0000000000001000
Apr 28 05:30:28 XXXXXXXXXXXX kernel: [55510.108163] ---[ end trace 4291e6854eec7dff ]---
Apr 28 05:31:46 XXXXXXXXXXXX kernel: [55587.826993] block drbd1: Remote failed to finish a request within ko-count * timeout
Apr 28 05:31:46 XXXXXXXXXXXX kernel: [55587.828030] drbd r0: peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
Apr 28 05:40:12 XXXXXXXXXXXX rsyncd[54980]: name lookup failed for XXX.XX.XXX.XXX: Temporary failure in name resolution
Apr 28 05:40:12 XXXXXXXXXXXX rsyncd[54980]: connect from UNKNOWN (XXX.XX.XXX.XXX)
Apr 28 05:40:12 XXXXXXXXXXXX rsyncd[54980]: rsync to log/XXXXXXXXXXXX from UNKNOWN (XXX.XX.XXX.XXX)
Apr 28 05:40:12 XXXXXXXXXXXX rsyncd[54980]: receiving file list



Has anyone seen this kind of problem before? Should we downgrade the kernel or upgrade DRBD?

Thanks in advance for your help.

Regards
Amos
