Hi,

I have a dual-node Xen/DRBD cluster, running 2.6.24-23-xen from Ubuntu with a self-compiled drbd 8.3.1, that ran into a problem last weekend. For some reason a SAS disk got kicked by its controller, and drbd properly detected it:

May 17 07:44:20 ifvm1 kernel: [9678143.435599] scsi 0:0:3:0: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
May 17 07:44:20 ifvm1 kernel: [9678143.435604] end_request: I/O error, dev sdd, sector 255302231
May 17 07:44:20 ifvm1 kernel: [9678143.435609] drbd1: Method to ensure write ordering: flush
May 17 07:44:20 ifvm1 kernel: [9678143.435622] drbd1: disk( UpToDate -> Failed )
May 17 07:44:20 ifvm1 kernel: [9678143.435625] drbd1: Local IO failed. Detaching...
May 17 07:44:20 ifvm1 kernel: [9692200.309651] drbd1: disk( Failed -> Diskless )
May 17 07:44:20 ifvm1 kernel: [9692200.309675] drbd1: Notified peer that my disk is broken.

That drbd was secondary on that host and the primary was still running fine on the other node, so I left it untouched until Monday, when I removed the failing drive from the server. After adding a working drive to the server, I wanted to attach it to drbd, but this failed. I am not sure which exact command I used at the time, probably "drbdadm attach r1", which gave:

May 18 10:59:00 ifvm1 kernel: [9791064.037185] drbd1: drbd_nl_disk_conf: mdev->bc not NULL.

So I tried to down this drbd, and it oopsed:

May 18 10:59:37 ifvm1 kernel: [9791101.108093] drbd1: drbd_nl_disk_conf: mdev->bc not NULL.
May 18 10:59:37 ifvm1 kernel: [9791101.108127] Unable to handle kernel paging request at 000000010000002c RIP:
May 18 10:59:37 ifvm1 kernel: [9791101.108138] [<ffffffff8029f130>] fput+0x0/0x20
May 18 10:59:37 ifvm1 kernel: [9791101.108150] PGD 9b30067 PUD 0
May 18 10:59:37 ifvm1 kernel: [9791101.108154] Oops: 0002 [1] SMP
May 18 10:59:37 ifvm1 kernel: [9791101.108157] CPU 1
May 18 10:59:37 ifvm1 kernel: [9791101.108159] Modules linked in: drbd af_packet xt_physdev ipt_LOG xt_state xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack bridge 8021q sbs dock sbshc ac battery container video output iptable_filter ip_tables x_tables cn coretemp ipmi_devintf ipmi_si ipmi_watchdog ipmi_poweroff ipmi_msghandler parport_pc lp parport loop ipv6 megaraid_sas psmouse serio_raw iTCO_wdt evdev dcdbas iTCO_vendor_support 8250_pnp button pcspkr i5000_edac edac_core shpchp pci_hotplug 8250 serial_core ext3 jbd mbcache sr_mod cdrom pata_acpi ata_piix sg sd_mod ata_generic libata bnx2 ehci_hcd uhci_hcd usbcore mptsas mptscsih mptbase scsi_transport_sas scsi_mod raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_snapshot dm_mod thermal processor fan fuse
May 18 10:59:37 ifvm1 kernel: [9791101.108249] Pid: 6693, comm: cqueue/1 Not tainted 2.6.24-23-xen #1
May 18 10:59:37 ifvm1 kernel: [9791101.108253] RIP: e030:[<ffffffff8029f130>] [<ffffffff8029f130>] fput+0x0/0x20
May 18 10:59:37 ifvm1 kernel: [9791101.108258] RSP: e02b:ffff88001d7c9dc8 EFLAGS: 00010202
May 18 10:59:37 ifvm1 kernel: [9791101.108261] RAX: 0000000000000041 RBX: 0000000000000005 RCX: 0000000000000001
May 18 10:59:37 ifvm1 kernel: [9791101.108264] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000100000004
May 18 10:59:37 ifvm1 kernel: [9791101.108268] RBP: ffff88001f672800 R08: 000015d2cd1350db R09: 0000000000000000
May 18 10:59:37 ifvm1 kernel: [9791101.108271] R10: ffff880001ce6fe0 R11: ffffffff80217eb0 R12: ffff880011a93000
May 18 10:59:37 ifvm1 kernel: [9791101.108274] R13: 000000000000007c R14: ffff880011a93000 R15: ffff88000aefb354
May 18 10:59:37 ifvm1 kernel: [9791101.108280] FS: 00007fc3f762c6e0(0000) GS:ffffffff805c7080(0000) knlGS:0000000000000000
May 18 10:59:37 ifvm1 kernel: [9791101.108283] CS: e033 DS: 0000 ES: 0000
May 18 10:59:37 ifvm1 kernel: [9791101.108287] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 18 10:59:37 ifvm1 kernel: [9791101.108290] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 18 10:59:38 ifvm1 kernel: [9791101.108293] Process cqueue/1 (pid: 6693, threadinfo ffff88001d7c8000, task ffff8800016cc800)
May 18 10:59:38 ifvm1 kernel: [9791101.108297] Stack: ffffffff88419168 ffff88001d7c9e60 0000000000000000 ffffffff8061cf80
May 18 10:59:38 ifvm1 kernel: [9791101.108303] ffffffff8061cf80 ffffffff8061c0c0 ffffffff8061cf80 0000000000000000
May 18 10:59:38 ifvm1 kernel: [9791101.108309] 000000011f672800 00000000000000d0 0000000000000030 ffff88001c213410
May 18 10:59:38 ifvm1 kernel: [9791101.108313] Call Trace:
May 18 10:59:38 ifvm1 kernel: [9791101.108327] [<ffffffff88419168>] :drbd:drbd_nl_disk_conf+0xb8/0xf30
May 18 10:59:38 ifvm1 kernel: [9791101.108346] [<ffffffff88418bcc>] :drbd:drbd_connector_callback+0x11c/0x210
May 18 10:59:38 ifvm1 kernel: [9791101.108354] [cn:cn_queue_wrapper+0x0/0x30] :cn:cn_queue_wrapper+0x0/0x30
May 18 10:59:38 ifvm1 kernel: [9791101.108360] [cn:cn_queue_wrapper+0xf/0x30] :cn:cn_queue_wrapper+0xf/0x30
May 18 10:59:38 ifvm1 kernel: [9791101.108367] [run_workqueue+0xb2/0x190] run_workqueue+0xb2/0x190
May 18 10:59:38 ifvm1 kernel: [9791101.108373] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
May 18 10:59:38 ifvm1 kernel: [9791101.108378] [worker_thread+0xa3/0x110] worker_thread+0xa3/0x110
May 18 10:59:38 ifvm1 kernel: [9791101.108384] [<ffffffff8024cc80>] autoremove_wake_function+0x0/0x30
May 18 10:59:38 ifvm1 kernel: [9791101.108390] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
May 18 10:59:38 ifvm1 kernel: [9791101.108395] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
May 18 10:59:38 ifvm1 kernel: [9791101.108399] [kthread+0x4b/0x80] kthread+0x4b/0x80
May 18 10:59:38 ifvm1 kernel: [9791101.108405] [child_rip+0xa/0x12] child_rip+0xa/0x12
May 18 10:59:38 ifvm1 kernel: [9791101.108413] [xen_send_IPI_mask+0x0/0x110] xen_send_IPI_mask+0x0/0x110
May 18 10:59:38 ifvm1 kernel: [9791101.108420] [kthread+0x0/0x80] kthread+0x0/0x80
May 18 10:59:38 ifvm1 kernel: [9791101.108424] [child_rip+0x0/0x12] child_rip+0x0/0x12
May 18 10:59:38 ifvm1 kernel: [9791101.108429]
May 18 10:59:38 ifvm1 kernel: [9791101.108430]
May 18 10:59:38 ifvm1 kernel: [9791101.108431] Code: f0 ff 4f 28 0f 94 c0 84 c0 75 05 f3 c3 0f 1f 00 e9 cb fb ff
May 18 10:59:38 ifvm1 kernel: [9791101.108445] RIP [<ffffffff8029f130>] fput+0x0/0x20
May 18 10:59:38 ifvm1 kernel: [9791101.108449] RSP <ffff88001d7c9dc8>
May 18 10:59:38 ifvm1 kernel: [9791101.108451] CR2: 000000010000002c
May 18 10:59:38 ifvm1 kernel: [9791101.109019] ---[ end trace e8e64f3da06e3ef3 ]---

After that, this drbd was dead. The other drbd devices kept running (their kernel threads were fine), but I could no longer change any configuration with drbdadm, and I had to forcibly reboot this node to get it back in working order.

Cheers,
Mik