[DRBD-user] ASSERT FAILED: drbd_worker - big trouble

Krzysztof Witek kroko at witek.fr
Thu Oct 13 22:09:40 CEST 2016



Hi guys,

I am having big trouble with DRBD.
The resources detach themselves. I get a lot of these logs from the kernel
on both nodes:

[11304.180242] PAX: refcount overflow detected in: drbd2_worker:2212, uid/euid: 0/0
[11304.180243] CPU 1
[11304.180243] Modules linked in: ipt_LOG drbd lru_cache ip6table_filter 
ip6table_mangle ip6_tables ipt_REJECT xt_recent xt_state xt_tcpudp 
iptable_filter iptable_mangle kvm_intel kvm authenc esp4 ah4 
xfrm4_mode_transport deflate zlib_deflate ctr twofish_generic 
twofish_x86_64_3way twofish_x86_64 twofish_common camellia serpent 
blowfish_generic blowfish_x86_64 blowfish_common cast5 des_generic xcbc 
rmd160 sha512_generic crypto_null af_key xt_addrtype i7core_edac 
ipt_MASQUERADE nfsd shpchp iptable_nat nf_nat nfs nf_conntrack_ipv4 
nf_conntrack nf_defrag_ipv4 ip_tables x_tables lockd psmouse dcdbas 
mac_hid serio_raw acpi_power_meter wmi edac_core bridge fscache tpm_tis 
auth_rpcgss stp nfs_acl sunrpc lp parport usbhid hid ses enclosure 
megaraid_sas bnx2
[11304.180261]
[11304.180262] Pid: 2212, comm: drbd2_worker Not tainted 3.2.29-grsec
[11304.180264] RIP: 0010:[<ffffffffa03fd9d3>] [<ffffffffa03fd9d3>] bm_page_io_async+0x1a3/0x250 [drbd]
[11304.180267] RSP: 0018:ffff880612fb5c60 EFLAGS: 00000a12
[11304.180268] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000000
[11304.180269] RDX: 000000000000a23e RSI: ffffffff8132d670 RDI: ffff8805fae32144
[11304.180271] RBP: ffff880612fb5cf0 R08: 0000000000000000 R09: ffff88060907c0c0
[11304.180272] R10: 00000001b77407b0 R11: 0000000000000000 R12: ffff88060907c0c0
[11304.180273] R13: ffff8806125ba000 R14: ffff880612fb5d20 R15: ffff8805fc629200
[11304.180274] FS: 0000000000000000(0000) GS:ffff88063f620000(0000) knlGS:0000000000000000
[11304.180275] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[11304.180276] CR2: 000001866c3c1158 CR3: 00000000016bf000 CR4: 00000000000006f0
[11304.180277] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[11304.180279] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[11304.180280] Process drbd2_worker (pid: 2212, threadinfo ffff8805fa419630, task ffff8805fa419100)
[11304.180281] Stack:
[11304.180281] ffff8805fa419630 00000000dbb9ffff fffe484800000001 00000000dbb9fff8
[11304.180283] ffff880612fb5c90 ffffea0018288670 ffff880612fb5cb0 ffff8805fa419630
[11304.180285] 0000000000000a9b 0000000000000000 0000000000002a9a ffff8806125ba100
[11304.180288] Call Trace:
[11304.180291] [<ffffffffa03fdf28>] bm_rw+0x178/0x440 [drbd]
[11304.180294] [<ffffffffa03ff7a5>] drbd_bm_write+0x15/0x20 [drbd]
[11304.180298] [<ffffffffa041c672>] w_bitmap_io+0xe2/0x2a0 [drbd]
[11304.180302] [<ffffffffa04059be>] drbd_worker+0x21e/0x4c0 [drbd]
[11304.180306] [<ffffffffa04193d0>] ? drbd_open+0xb0/0xb0 [drbd]
[11304.180310] [<ffffffffa0419434>] drbd_thread_setup+0x64/0xf0 [drbd]
[11304.180314] [<ffffffffa04193d0>] ? drbd_open+0xb0/0xb0 [drbd]
[11304.180316] [<ffffffff8108dafc>] kthread+0x8c/0xa0
[11304.180318] [<ffffffff8169c744>] kernel_thread_helper+0x4/0x10
[11304.180320] [<ffffffff8108da70>] ? flush_kthread_worker+0xa0/0xa0
[11304.180322] [<ffffffff8169c740>] ? gs_change+0x13/0x13
[11304.180322] Code: 80 4d 89 74 24 58 4c 89 e6 49 c7 44 24 50 80 da 3f 
a0 e8 a1 2d f0 e0 f0 41 01 9d d8 0a 00 00 71 0a f0 41 29 9d d8 0a 00 00 
cd 04 <48> 83 c4 68 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 44 00 00
[11304.180335] Call Trace:
[11304.180338] [<ffffffffa03fdf28>] bm_rw+0x178/0x440 [drbd]
[11304.180341] [<ffffffffa03ff7a5>] drbd_bm_write+0x15/0x20 [drbd]
[11304.180345] [<ffffffffa041c672>] w_bitmap_io+0xe2/0x2a0 [drbd]
[11304.180349] [<ffffffffa04059be>] drbd_worker+0x21e/0x4c0 [drbd]
[11304.180353] [<ffffffffa04193d0>] ? drbd_open+0xb0/0xb0 [drbd]
[11304.180357] [<ffffffffa0419434>] drbd_thread_setup+0x64/0xf0 [drbd]
[11304.180362] [<ffffffffa04193d0>] ? drbd_open+0xb0/0xb0 [drbd]
[11304.180363] [<ffffffff8108dafc>] kthread+0x8c/0xa0
[11304.180365] [<ffffffff8169c744>] kernel_thread_helper+0x4/0x10
[11304.180367] [<ffffffff8108da70>] ? flush_kthread_worker+0xa0/0xa0
[11304.180369] [<ffffffff8169c740>] ? gs_change+0x13/0x13
[11304.180371] block drbd2: bitmap WRITE of 5871 pages took 265 jiffies
[11304.180382] block drbd2: 734 GB (192322088 bits) marked out-of-sync by on disk bit-map.
[11304.180386] block drbd2: ASSERT FAILED: drbd_worker: (get_t_state(thi) == Running) in drivers/block/drbd/drbd_worker.c:1645
[11304.180465] block drbd2: Connection closed
[11304.180469] block drbd2: conn( BrokenPipe -> Unconnected )
[11304.180471] block drbd2: receiver terminated
[11304.180472] block drbd2: Restarting drbd2_receiver
[11304.180473] block drbd2: receiver (re)started
[11304.180476] block drbd2: conn( Unconnected -> WFConnection )
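
For reference, the bitmap figures in the log are self-consistent under a few assumptions that the log itself does not state: one bitmap bit tracks 4 KiB of storage, each bitmap page is 4096 bytes, and the size is printed in 1024-based units rounded to the nearest value. Under those assumptions, 192322088 bits works out to the printed "734 GB". Also, 5871 pages can hold at most 5871 × 32768 = 192380928 bits, so if those pages are the whole bitmap, practically the entire device is flagged out-of-sync. The minimal Python sketch below reproduces that arithmetic. `human_size` and `out_of_sync_summary` are hypothetical helpers written for illustration, not DRBD functions.

```python
# Sanity-check the DRBD bitmap figures from the log.
# Assumptions (not stated in the log): 4 KiB of storage per bitmap bit,
# 4096-byte bitmap pages, and 1024-based units rounded to nearest,
# stepping to the next unit while the value is >= 10000.

BLOCK_KIB = 4          # assumed bitmap granularity: 4 KiB per bit
PAGE_BITS = 4096 * 8   # bits held by one 4096-byte bitmap page


def human_size(kib: int) -> str:
    """Format a KiB count such as '734 GB' (1024-based, round to nearest)."""
    units = ["KB", "MB", "GB", "TB", "PB"]
    i = 0
    while kib >= 10000 and i < len(units) - 1:
        # Drop one unit; bit 9 is the "half" bit, so adding it rounds.
        kib = (kib >> 10) + (1 if kib & (1 << 9) else 0)
        i += 1
    return f"{kib} {units[i]}"


def out_of_sync_summary(oos_bits: int, bitmap_pages: int) -> tuple:
    """Return (printed size, lower bound on the out-of-sync fraction)."""
    size = human_size(oos_bits * BLOCK_KIB)
    capacity_bits = bitmap_pages * PAGE_BITS  # upper bound on total bitmap bits
    return size, oos_bits / capacity_bits


size, fraction = out_of_sync_summary(192322088, 5871)
print(size)  # 734 GB
print(f"at least {fraction:.2%} of the bitmap capacity is marked out-of-sync")
```

The fraction is only a lower bound because the last bitmap page may be partly unused. It also assumes that the "bitmap WRITE" line covers every bitmap page.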

Does this ring a bell for anyone?
What is the risk to the data, and what action would you recommend?
Thank you
