[DRBD-user] online-verify crashed drbd-resource

joseph joseph at digiconcept.net
Fri Mar 26 12:36:04 CET 2010


Hello drbd-users,

I started the verify command on the primary of one of my drbd-resources 
(a mysql-db on drbd2: 63G, of which 1.2G are used). It never actually 
started verifying (in /proc/drbd it stayed at 0%) and instead pushed the 
load to over 50, so I immediately disconnected the resource. It happened 
so fast that I didn't notice whether one of the drbd2_* processes or 
mysql was responsible for the load.

Anyhow, drbd2_receiver on my secondary is still running and can't 
even be killed with kill -9. That means that without rebooting I 
probably won't be able to reconnect the two resources, right? Or does 
anyone have an idea?


Here is the output of dmesg on my secondary:

[3465134.224587] block drbd2: Online Verify start sector: 0
[3465134.232913] BUG: unable to handle kernel NULL pointer dereference 
at 0000000000000030
[3465134.232913] IP: [<ffffffffa02308b2>] :drbd:w_e_end_ov_req+0x32/0x114
[3465134.232913] PGD 0
[3465134.232913] Oops: 0000 [1] SMP
[3465134.232913] CPU: 5
[3465134.232913] Modules linked in: tcp_diag inet_diag fuse ext2 
nls_utf8 cifs nls_base sha1_generic vzethdev vznetdev simfs vzrst vzcpt 
tun vzdquota vzmon vzdev xt_length ipt_ttl xt_tcpmss xt_multiport 
xt_dscp ipt_MASQUERADE xt_TCPMSS xt_tcpudp xt_state ipt_REJECT ipt_LOG 
xt_limit iptable_mangle iptable_nat nf_nat iptable_filter 
nf_conntrack_ftp nf_conntrack_irc nf_conntrack_ipv4 nf_conntrack 
ip_tables x_tables acpi_cpufreq cpufreq_powersave cpufreq_ondemand 
cpufreq_userspace cpufreq_conservative cpufreq_stats ocfs2_dlmfs 
ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs 
ipv6 f71882fg drbd cn loop snd_pcm snd_timer snd pcspkr soundcore wmi 
i2c_i801 snd_page_alloc evdev i2c_core button ext3 jbd mbcache dm_mirror 
dm_log dm_snapshot dm_mod e1000 ehci_hcd uhci_hcd sd_mod thermal fan 
r8168 freq_table processor thermal_sys raid10 raid456 async_xor 
async_memcpy async_tx xor raid1 raid0 md_mod atiixp ahci sata_nv 
sata_sil sata_via libata dock via82cxxx ide_core 3w_9xxx 3w_xxxx 
scsi_mod [last unloaded: scsi_wait_scan]
[3465134.232913] Pid: 7984, comm: drbd2_worker Not tainted 
2.6.26-2-openvz-amd64 #1 036test001
[3465134.232913] RIP: 0010:[<ffffffffa02308b2>]  [<ffffffffa02308b2>] 
:drbd:w_e_end_ov_req+0x32/0x114
[3465134.232913] RSP: 0018:ffff810313c73e90  EFLAGS: 00010202
[3465134.232913] RAX: 0000000000000000 RBX: ffff81031f887000 RCX: 
ffff81033d9ce000
[3465134.232913] RDX: 0000000000000000 RSI: 0000000000000010 RDI: 
ffff81031f887000
[3465134.232913] RBP: ffff81031f887000 R08: ffff81005b2881d0 R09: 
0000000000000004
[3465134.232913] R10: ffff81031f887108 R11: ffff81031f887000 R12: 
ffff81031f887630
[3465134.232913] R13: ffff8103374f80d0 R14: ffffffffa0254be2 R15: 
ffff81031f887640
[3465134.232913] FS:  0000000000000000(0000) GS:ffff81033d9bb0c0(0000) 
knlGS:0000000000000000
[3465134.232913] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[3465134.232913] CR2: 0000000000000030 CR3: 0000000000201000 CR4: 
00000000000006e0
[3465134.232913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[3465134.232913] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
0000000000000400
[3465134.232913] Process drbd2_worker (pid: 7984, veid=0, threadinfo 
ffff810313c72000, task ffff810048193810)
[3465134.232913] Stack:  ffff81031f887128 ffff8103374f80d0 
ffff81031f887000 ffff81031f887630
[3465134.232913]  0000000000000000 ffffffffa022f10e ffff810313c73ec0 
ffff810313c73ec0
[3465134.232913]  ffffffff80423446 0000000164627264 0000000000646165 
ffff81031f887630
[3465134.232913] Call Trace:
[3465134.232913]  [<ffffffffa022f10e>] ? :drbd:drbd_worker+0x23e/0x409
[3465134.232913]  [<ffffffff80423446>] ? schedule_timeout+0x85/0xad
[3465134.232913]  [<ffffffffa02458c6>] ? 
:drbd:drbd_thread_setup+0x124/0x1bb
[3465134.232913]  [<ffffffff8020d048>] ? child_rip+0xa/0x12
[3465134.232913]  [<ffffffffa02457a2>] ? :drbd:drbd_thread_setup+0x0/0x1bb
[3465134.232913]  [<ffffffff8020d03e>] ? child_rip+0x0/0x12
[3465134.232913]
[3465134.232913]
[3465134.232913] Code: 55 53 48 89 fb 48 83 ec 08 85 d2 0f 85 ac 00 00 
00 48 8b 46 20 f6 40 18 01 0f 84 9e 00 00 00 48 8b 87 d8 05 00 00 be 10 
00 00 00 <44> 8b 60 30 49 63 fc e8 9f b8 06 e0 48 85 c0 48 89 c5 74 7e 49
[3465134.232913] RIP  [<ffffffffa02308b2>] :drbd:w_e_end_ov_req+0x32/0x114
[3465134.232913]  RSP <ffff810313c73e90>
[3465134.232913] CR2: 0000000000000030
[3465134.232913] ---[ end trace 1a320c0fb997ccd3 ]---


[3465665.496416] block drbd2: Online Verify reached sector 0
[3465665.497035] block drbd2: drbd_pp_alloc interrupted!
[3465665.497035] block drbd2: alloc_ee: Allocation of a page failed
[3465665.497035] block drbd2: error receiving OVRequest, l: 24!
[3465665.499844] block drbd2: asender terminated
[3465665.499844] block drbd2: Terminating asender thread
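
To make the oops in the secondary's output easier to read, the following 
Python sketch pulls out the faulting function, its offset, and the fault 
address. It is an illustration only: the summarize_oops helper and the 
OOPS excerpt (a few lines copied from the log above, with the wrapped 
lines rejoined) are made up for this example and are not part of DRBD or 
any kernel tool.

```python
import re

# Excerpt of the oops above, with the mail-wrapped lines joined again.
OOPS = """\
[3465134.232913] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[3465134.232913] IP: [<ffffffffa02308b2>] :drbd:w_e_end_ov_req+0x32/0x114
[3465134.232913] RAX: 0000000000000000 RBX: ffff81031f887000 RCX: ffff81033d9ce000
[3465134.232913] CR2: 0000000000000030 CR3: 0000000000201000 CR4: 00000000000006e0
"""


def summarize_oops(text):
    """Extract module, function, offset, fault address and RAX from an oops."""
    ip = re.search(
        r"IP: \[<[0-9a-f]+>\] :(\w+):(\w+)\+0x([0-9a-f]+)/0x[0-9a-f]+", text
    )
    cr2 = re.search(r"CR2: ([0-9a-f]{16})", text)
    rax = re.search(r"RAX: ([0-9a-f]{16})", text)
    return {
        "module": ip.group(1),
        "function": ip.group(2),
        "offset": int(ip.group(3), 16),           # offset inside the function
        "fault_address": int(cr2.group(1), 16),   # address that faulted
        "rax": int(rax.group(1), 16),
    }


summary = summarize_oops(OOPS)
for key, value in summary.items():
    print(key, "=", hex(value) if isinstance(value, int) else value)
```

For this log it reports w_e_end_ov_req at offset 0x32 in the drbd module, 
with a fault address of 0x30 while RAX is zero. That pattern is what one 
would expect from reading a struct field through a NULL pointer; which 
pointer was NULL cannot be told from the log alone.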

This is what the primary had to say about that:

[4432776.531522] block drbd2: conn( Connected -> VerifyS )
[4432776.531522] block drbd2: Starting Online Verify from sector 0
[4433306.255700] block drbd2: peer( Secondary -> Unknown ) conn( VerifyS 
-> TearDown ) pdsk( UpToDate -> DUnknown )
[4433306.255700] block drbd2: Online Verify reached sector 0
[4433306.255852] block drbd2: Creating new current UUID
[4433306.256527] block drbd2: meta connection shut down by peer.
[4433306.256527] block drbd2: asender terminated
[4433306.256527] block drbd2: Terminating asender thread
[4433306.284947] block drbd2: Connection closed
[4433306.284947] block drbd2: conn( TearDown -> Unconnected )
[4433306.284947] block drbd2: receiver terminated
[4433306.284947] block drbd2: Restarting receiver thread
[4433306.284947] block drbd2: receiver (re)started
[4433306.284947] block drbd2: conn( Unconnected -> WFConnection )
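
The state changes in the primary's log can be listed with a short Python 
sketch, which also shows how long the resource sat in VerifyS before the 
disconnect. This is only an illustration: the transitions helper and the 
two regular expressions are assumptions based on the message layout seen 
in this log, not a DRBD interface, and LOG holds lines copied from above 
with the wrapped line rejoined.

```python
import re

# State-change lines copied from the primary's log above.
LOG = """\
[4432776.531522] block drbd2: conn( Connected -> VerifyS )
[4432776.531522] block drbd2: Starting Online Verify from sector 0
[4433306.255700] block drbd2: peer( Secondary -> Unknown ) conn( VerifyS -> TearDown ) pdsk( UpToDate -> DUnknown )
[4433306.284947] block drbd2: conn( TearDown -> Unconnected )
[4433306.284947] block drbd2: conn( Unconnected -> WFConnection )
"""

# "[timestamp] block <device>: <message>"
LINE = re.compile(r"\[\s*(\d+\.\d+)\] block (\w+): (.*)")
# "field( Old -> New )", possibly several per message
CHANGE = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")


def transitions(log):
    """Return (timestamp, device, field, old, new) for every state change."""
    out = []
    for line in log.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        ts, dev, msg = float(m.group(1)), m.group(2), m.group(3)
        for field, old, new in CHANGE.findall(msg):
            out.append((ts, dev, field, old, new))
    return out


events = transitions(LOG)
conn = [e for e in events if e[2] == "conn"]
# Seconds between entering VerifyS and leaving it for TearDown.
stalled = conn[1][0] - conn[0][0]

for ts, dev, field, old, new in events:
    print(f"{ts:.6f} {dev} {field}: {old} -> {new}")
print(f"VerifyS lasted about {stalled:.0f} s")
```

By these timestamps the connection stayed in VerifyS for roughly 530 
seconds before it went to TearDown.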

thanks a lot for reading,

Joe


