Hi,

My setup is: 2 hosts connected through OpenVPN point-to-point in UDP mode. I want to do a first sync of a 100GB device disk. The DRBD device uses 2 LVs for data/meta (it's only part of my setup, so no, I can't use the 'basic' layout with DRBD under a PV).

After about 10 hours of sync, I get a kernel Oops:

[54599.996257] block drbd4: error receiving RSDataReply, l: 4120!
[54599.996322] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[54599.996371] IP: [<ffffffff8040d3ab>] sk_stream_wait_memory+0x88/0x1e5
[54599.996404] PGD 400cb9067 PUD 41f156067 PMD 0
[54599.996431] Oops: 0002 [#1] SMP
[54599.996455] last sysfs file: /sys/devices/virtual/block/drbd4/dev
[54599.996483] CPU 0
[54599.996502] Modules linked in: hmac tun kvm_amd kvm bridge stp drbd cn loop snd_pcsp snd_pcm snd_timer i2c_nforce2 snd shpchp soundcore k8temp snd_page_alloc i2c_core serio_raw pci_hotplug psmouse evdev button processor ext3 jbd mbcache dm_mod sd_mod crc_t10dif ata_generic ide_pci_generic ohci_hcd sata_nv amd74xx libata forcedeth scsi_mod ide_core ehci_hcd floppy thermal fan thermal_sys
[54599.996692] Pid: 5185, comm: drbd4_worker Not tainted 2.6.30-2-amd64 #1 H8DMR-82
[54599.996736] RIP: 0010:[<ffffffff8040d3ab>] [<ffffffff8040d3ab>] sk_stream_wait_memory+0x88/0x1e5
[54599.996783] RSP: 0018:ffff88040e139a40 EFLAGS: 00010246
[54599.996809] RAX: 0000000000000000 RBX: 00000000000005dc RCX: 000000000000c8d8
[54599.996838] RDX: 0000000000000008 RSI: 0000000000000000 RDI: ffffffff804065c6
[54599.996867] RBP: ffff8803e24f1440 R08: 0000000000000000 R09: ffff8803e24f1440
[54599.996896] R10: 0000000000000000 R11: ffff88040e139b08 R12: 00000000000005dc
[54599.996925] R13: 0000000000000000 R14: ffff88040e139b08 R15: 7fffffffffffffff
[54599.996954] FS: 00007fa0859c6950(0000) GS:ffffc20000000000(0000) knlGS:0000000000000000
[54599.996998] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[54599.997025] CR2: 0000000000000008 CR3: 00000004010aa000 CR4: 00000000000006e0
[54599.997054] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[54599.997083] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[54599.997112] Process drbd4_worker (pid: 5185, threadinfo ffff88040e138000, task ffff88041f176080)
[54599.997157] Stack:
[54599.997176] 0000000000000000 ffff88041f176080 ffffffff80254742 ffff88040e139a58
[54599.997207] ffff88040e139a58 0000000000000000 ffff8802060f3880 ffff8803e35e6140
[54599.997252] ffff8803e24f1440 ffff88040e139b14 ffff880401c9c000 0000000000000000
[54599.997313] Call Trace:
[54599.997334] [<ffffffff80254742>] ? autoremove_wake_function+0x0/0x2e
[54599.997364] [<ffffffff8043efda>] ? tcp_sendmsg+0x6fa/0x85b
[54599.997391] [<ffffffff80403f24>] ? sock_sendmsg+0xa3/0xbb
[54599.997419] [<ffffffff80254742>] ? autoremove_wake_function+0x0/0x2e
[54599.997447] [<ffffffff8020e5a9>] ? __switch_to+0xae/0x263
[54599.997475] [<ffffffff80235f65>] ? dequeue_entity+0xf/0x11f
[54599.997503] [<ffffffff804041f2>] ? kernel_sendmsg+0x2c/0x3e
[54599.997530] [<ffffffffa0222386>] ? drbd_send+0xb9/0x1cf [drbd]
[54599.997578] [<ffffffff804b45e8>] ? schedule+0x9/0x1e
[54599.997605] [<ffffffffa0222cec>] ? _drbd_send_cmd+0x170/0x184 [drbd]
[54599.997644] [<ffffffffa022320d>] ? drbd_send_cmd+0x64/0x8c [drbd]
[54599.997682] [<ffffffffa0223388>] ? drbd_send_b_ack+0x37/0x40 [drbd]
[54599.997720] [<ffffffffa0210210>] ? drbd_may_finish_epoch+0x122/0x2f8 [drbd]
[54599.997759] [<ffffffffa021070f>] ? w_flush+0x54/0x5d [drbd]
[54599.997795] [<ffffffffa020a8b2>] ? drbd_worker+0x4c2/0x4cf [drbd]
[54599.997831] [<ffffffff804b47df>] ? schedule_timeout+0x9b/0xb6
[54599.997859] [<ffffffff804b47cf>] ? schedule_timeout+0x8b/0xb6
[54599.997886] [<ffffffffa0223d03>] ? drbd_thread_setup+0x16f/0x231 [drbd]
[54599.997925] [<ffffffff80210aca>] ? child_rip+0xa/0x20
[54599.997951] [<ffffffffa0223b94>] ? drbd_thread_setup+0x0/0x231 [drbd]
[54599.997989] [<ffffffff80210ac0>] ? child_rip+0x0/0x20
[54599.998015] Code: f4 ff ba 32 00 00 00 89 d1 31 d2 f7 f1 83 c2 02 41 89 d4 4d 89 e5 49 bf ff ff ff ff ff ff ff 7f 48 8b 85 e8 01 00 00 48 8d 50 08 <f0> 80 48 08 01 48 8b 7d 78 ba 01 00 00 00 48 89 e6 e8 09 75 e4
[54599.998161] RIP [<ffffffff8040d3ab>] sk_stream_wait_memory+0x88/0x1e5
[54599.998190] RSP <ffff88040e139a40>
[54599.998211] CR2: 0000000000000008
[54599.998505] ---[ end trace 71b4da7bbc4c7789 ]---

The DRBD device then went into the "Timeout" state. I tried:

drbdsetup /dev/drbd4 down

(my DRBD setup is managed by software that handles all the virtualisation stuff and calls drbdsetup directly). I got 2 messages:

No response from the DRBD driver! Is the module loaded?
No response from the DRBD driver! Is the module loaded?

Of course it is loaded; 3 other devices work. The device is now in the Disconnecting state, but it never finishes disconnecting:

4: cs:Disconnecting ro:Secondary/Unknown ds:Inconsistent/DUnknown C r---- ns:0 nr:75029472 dw:75029472 dr:0 al:0 bm:4574 lo:0 pe:0 ua:1278 ap:0 ep:2 wo:b oos:30857852

On the primary, the device is in the WFConnection state:

16: cs:WFConnection ro:Primary/Unknown ds:UpToDate/Inconsistent C r---- ns:75034364 nr:0 dw:1038560 dr:75513180 al:284 bm:4544 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:30942364

The crash happens on the node running the 8.3.6 module; the one with 8.3.2 still works well. Now it seems the only way to free /dev/drbd4 is to reboot the host :'(

Any help would be welcome.

Regards,
Maxence

--
Maxence DUNNEWIND
Contact : maxence at dunnewind.net
Site : http://www.dunnewind.net
06 32 39 39 93
GPG : 18AE 61E4 D0B0 1C7C AAC9 E40D 4D39 68DB 0D2E B533
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: Digital signature
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20100126/7024dc39/attachment.pgp>
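The two device status lines quoted in the message use the DRBD 8.3 /proc/drbd layout: "key:value" tokens (cs = connection state, ro = roles, ds = disk states) plus KiB counters such as nr (received over the network) and oos (still out of sync). A minimal Python sketch, assuming exactly that single-line layout, can split such a line into fields and give a rough idea of how far the initial sync got before the Oops. The function names parse_drbd_status and resync_fraction are hypothetical helpers, not part of any DRBD tool, and the progress estimate assumes nr contains only resync traffic (true for a fresh initial sync with no application writes on the secondary).

```python
import re

# A "key:value" token, e.g. "cs:Disconnecting" or "oos:30857852".
# Bare flags such as "C" (protocol) or "r----" contain no colon and are skipped.
FIELD_RE = re.compile(r"(\w+):(\S+)")


def parse_drbd_status(line):
    """Split one /proc/drbd device line into (minor, fields).

    Numeric values become ints; everything else stays a string.
    """
    minor, sep, rest = line.strip().partition(":")
    if not sep or not minor.isdigit():
        raise ValueError("not a /proc/drbd device line: %r" % line)
    fields = {}
    for key, value in FIELD_RE.findall(rest):
        fields[key] = int(value) if value.isdigit() else value
    return int(minor), fields


def resync_fraction(fields):
    """Rough share of the device a SyncTarget has already received.

    Assumption: nr (KiB received) is pure resync traffic, so
    nr + oos approximates the device size.
    """
    done, left = fields["nr"], fields["oos"]
    total = done + left
    return done / total if total else 1.0


if __name__ == "__main__":
    secondary = ("4: cs:Disconnecting ro:Secondary/Unknown ds:Inconsistent/DUnknown "
                 "C r---- ns:0 nr:75029472 dw:75029472 dr:0 al:0 bm:4574 lo:0 "
                 "pe:0 ua:1278 ap:0 ep:2 wo:b oos:30857852")
    minor, f = parse_drbd_status(secondary)
    print(minor, f["cs"], f["ds"])       # 4 Disconnecting Inconsistent/DUnknown
    print(f"{resync_fraction(f):.1%}")   # 70.9%
```

On the secondary's line this gives roughly 70.9% done, with oos:30857852 KiB (about 29.4 GiB) still to transfer when the worker thread crashed; nr + oos comes to about 101 GiB, which is consistent with the 100GB device being synced.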