Hello drbd-users,

I started the verify command on the primary of one of my drbd resources (a mysql-db on drbd2: 63G, of which 1.2G are used). It didn't actually start verifying (in /proc/drbd it stayed at 0%) but instead resulted in a load of over 50, so I immediately disconnected the resource. (It happened so fast that I didn't notice whether one of the drbd2_* processes or mysql was responsible for the load.)

Anyhow, drbd2_receiver on my secondary is now still running and can't even be killed with kill -9. That means that without rebooting I probably won't be able to reconnect the two resources, right? Or does anyone have an idea?

Here is the output of dmesg on my secondary:

[3465134.224587] block drbd2: Online Verify start sector: 0
[3465134.232913] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[3465134.232913] IP: [<ffffffffa02308b2>] :drbd:w_e_end_ov_req+0x32/0x114
[3465134.232913] PGD 0
[3465134.232913] Oops: 0000 [1] SMP
[3465134.232913] CPU: 5
[3465134.232913] Modules linked in: tcp_diag inet_diag fuse ext2 nls_utf8 cifs nls_base sha1_generic vzethdev vznetdev simfs vzrst vzcpt tun vzdquota vzmon vzdev xt_length ipt_ttl xt_tcpmss xt_multiport xt_dscp ipt_MASQUERADE xt_TCPMSS xt_tcpudp xt_state ipt_REJECT ipt_LOG xt_limit iptable_mangle iptable_nat nf_nat iptable_filter nf_conntrack_ftp nf_conntrack_irc nf_conntrack_ipv4 nf_conntrack ip_tables x_tables acpi_cpufreq cpufreq_powersave cpufreq_ondemand cpufreq_userspace cpufreq_conservative cpufreq_stats ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs ipv6 f71882fg drbd cn loop snd_pcm snd_timer snd pcspkr soundcore wmi i2c_i801 snd_page_alloc evdev i2c_core button ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod e1000 ehci_hcd uhci_hcd sd_mod thermal fan r8168 freq_table processor thermal_sys raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 md_mod atiixp ahci sata_nv sata_sil sata_via libata dock via82cxxx ide_core 3w_9xxx 3w_xxxx scsi_mod [last unloaded: scsi_wait_scan]
[3465134.232913] Pid: 7984, comm: drbd2_worker Not tainted 2.6.26-2-openvz-amd64 #1 036test001
[3465134.232913] RIP: 0010:[<ffffffffa02308b2>] [<ffffffffa02308b2>] :drbd:w_e_end_ov_req+0x32/0x114
[3465134.232913] RSP: 0018:ffff810313c73e90 EFLAGS: 00010202
[3465134.232913] RAX: 0000000000000000 RBX: ffff81031f887000 RCX: ffff81033d9ce000
[3465134.232913] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffff81031f887000
[3465134.232913] RBP: ffff81031f887000 R08: ffff81005b2881d0 R09: 0000000000000004
[3465134.232913] R10: ffff81031f887108 R11: ffff81031f887000 R12: ffff81031f887630
[3465134.232913] R13: ffff8103374f80d0 R14: ffffffffa0254be2 R15: ffff81031f887640
[3465134.232913] FS: 0000000000000000(0000) GS:ffff81033d9bb0c0(0000) knlGS:0000000000000000
[3465134.232913] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[3465134.232913] CR2: 0000000000000030 CR3: 0000000000201000 CR4: 00000000000006e0
[3465134.232913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[3465134.232913] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[3465134.232913] Process drbd2_worker (pid: 7984, veid=0, threadinfo ffff810313c72000, task ffff810048193810)
[3465134.232913] Stack: ffff81031f887128 ffff8103374f80d0 ffff81031f887000 ffff81031f887630
[3465134.232913] 0000000000000000 ffffffffa022f10e ffff810313c73ec0 ffff810313c73ec0
[3465134.232913] ffffffff80423446 0000000164627264 0000000000646165 ffff81031f887630
[3465134.232913] Call Trace:
[3465134.232913] [<ffffffffa022f10e>] ? :drbd:drbd_worker+0x23e/0x409
[3465134.232913] [<ffffffff80423446>] ? schedule_timeout+0x85/0xad
[3465134.232913] [<ffffffffa02458c6>] ? :drbd:drbd_thread_setup+0x124/0x1bb
[3465134.232913] [<ffffffff8020d048>] ? child_rip+0xa/0x12
[3465134.232913] [<ffffffffa02457a2>] ? :drbd:drbd_thread_setup+0x0/0x1bb
[3465134.232913] [<ffffffff8020d03e>] ? child_rip+0x0/0x12
[3465134.232913]
[3465134.232913]
[3465134.232913] Code: 55 53 48 89 fb 48 83 ec 08 85 d2 0f 85 ac 00 00 00 48 8b 46 20 f6 40 18 01 0f 84 9e 00 00 00 48 8b 87 d8 05 00 00 be 10 00 00 00 <44> 8b 60 30 49 63 fc e8 9f b8 06 e0 48 85 c0 48 89 c5 74 7e 49
[3465134.232913] RIP [<ffffffffa02308b2>] :drbd:w_e_end_ov_req+0x32/0x114
[3465134.232913] RSP <ffff810313c73e90>
[3465134.232913] CR2: 0000000000000030
[3465134.232913] ---[ end trace 1a320c0fb997ccd3 ]---
[3465665.496416] block drbd2: Online Verify reached sector 0
[3465665.497035] block drbd2: drbd_pp_alloc interrupted!
[3465665.497035] block drbd2: alloc_ee: Allocation of a page failed
[3465665.497035] block drbd2: error receiving OVRequest, l: 24!
[3465665.499844] block drbd2: asender terminated
[3465665.499844] block drbd2: Terminating asender thread

This is what the primary had to say about that:

[4432776.531522] block drbd2: conn( Connected -> VerifyS )
[4432776.531522] block drbd2: Starting Online Verify from sector 0
[4433306.255700] block drbd2: peer( Secondary -> Unknown ) conn( VerifyS -> TearDown ) pdsk( UpToDate -> DUnknown )
[4433306.255700] block drbd2: Online Verify reached sector 0
[4433306.255852] block drbd2: Creating new current UUID
[4433306.256527] block drbd2: meta connection shut down by peer.
[4433306.256527] block drbd2: asender terminated
[4433306.256527] block drbd2: Terminating asender thread
[4433306.284947] block drbd2: Connection closed
[4433306.284947] block drbd2: conn( TearDown -> Unconnected )
[4433306.284947] block drbd2: receiver terminated
[4433306.284947] block drbd2: Restarting receiver thread
[4433306.284947] block drbd2: receiver (re)started
[4433306.284947] block drbd2: conn( Unconnected -> WFConnection )

Thanks a lot for reading,
Joe