Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi all, [xen-users: Please CC me in replies, I am not subscribed to this list.] First of all, please excuse this cross-posting, but I am, so to say, stumped. My cluster of 2 Debian etch boxes (on Intel CPUs with vmx support) running drbd8 on Linux kernel packages linux-image-2.6.18-4-xen-686 (2.6.18.dfsg.1-12etch2) or linux-image-2.6.18-5-xen-686 (2.6.18.dfsg.1-13etch4), the latter sometimes exposing a Xen networking bug similar to http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=451297 on one of the systems (but I haven't started debugging this one). I have compiled my own drbd8 kernel modules, so that Debian package version is 8.0.7-1 for both drbd8-utils and drbd8-2.6.18-*-xen-686. Xen packages are from Debian etch, i.e. xen-utils-3.0.3-1, xen-hypervisor-3.0.3-1-i386-pae, etc. version 3.0.3-0-4. domU instances are also Debian etch using linux-image-2.6.18-4-xen-686 to start from dom0 and linux-modules-2.6.18-4-xen-686 installed in the domUs. Both boxes have a ca. 60GB LVM2 VG with >20 LVs of equal size and name on both. There is one drbd ressource for each LV (currently up to drbd28, but using only every second one for now), which are used as backend devices for the Xen domUs. At any time, only one node switches the respective drbd ressource to primary and then starts the domU (live migration is not yet used, so primary-primary does not happen right now). The drbd ressources all have a similar configuration: resource "xen-xxx" { protocol C; startup { wfc-timeout 120; ## 2 minutes. degr-wfc-timeout 120; ## 2 minutes. } disk { on-io-error detach; } net { max-buffers 128; allow-two-primaries; cram-hmac-alg "sha256"; shared-secret "xxxxxxxxxxxxxxxxxxxxxxxx"; after-sb-0pri disconnect; after-sb-1pri disconnect; after-sb-2pri disconnect; rr-conflict disconnect; } syncer { rate 50M; after "xen-yyy"; } on aaaa { device /dev/drbd26; disk /dev/mapper/xendomains-xxx; address 192.168.255.30:7814; meta-disk internal; } on bbbb { device /dev/drbd26; disk /dev/mapper/xendomains-xxx; address 192.168.255.29:7814; meta-disk internal; } } That is, syncer rate is moderate (1Gbps connection), max-buffers low so as not to run into out-of-memory issues (this used to be a problem), and the syncer is serialized. Sometimes, when rebooting one of the nodes, it will crash the respective other. This is reproducible, but not ever time. Just an hour ago, it happened again and we were able to capture the kernel output on the serial console after reboot was started on the other node (so to be clear: this is the one that wasn't touched by any manual intervention): drbd0: meta connection shut down by peer. drbd26: meta connection shut down by peer. drbd0: tl_clear() drbd26: tl_clear() drbd28: meta connection shut down by peer. drbd28: tl_clear() drbd24: PingAck did not arrive in time. drbd24: short read expecting header on sock: r=-512 drbd24: tl_clear() drbd20: PingAck did not arrive in time. drbd20: short read expecting header on sock: r=-512 drbd20: tl_clear() drbd14: PingAck did not arrive in time. drbd14: short read expecting header on sock: r=-512 drbd14: tl_clear() drbd4: PingAck did not arrive in time. drbd4: short read expecting header on sock: r=-512 drbd4: tl_clear() drbd10: PingAck did not arrive in time. drbd10: short read expecting header on sock: r=-512 drbd10: tl_clear() drbd6: PingAck did not arrive in time. drbd6: short read expecting header on sock: r=-512 drbd6: tl_clear() drbd22: PingAck did not arrive in time. drbd22: short read expecting header on sock: r=-512 drbd22: tl_clear() BUG: unable to handle kernel paging request at virtual address c081e000 printing eip: c01bb090 19529000 -> *pde = 00000001:19f48001 27948000 -> *pme = 00000000:07099067 01099000 -> *pte = 00000000:00000000 Oops: 0000 [#1] SMP Modules linked in: sha256 drbd cn xt_physdev button ac battery ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs bridge ip6t_LOG ipt_LOG xt_state ipt_REJECT xt_tcpudp ip6table_filter ip6_tables ipv6 iptable_mangle iptable_nat ip_nat ip_conntrack nfnetlink iptable_filter ip_tables x_tables xfs dm_crypt loop serio_raw serial_core psmouse rtc pcspkr shpchp pci_hotplug tsdev evdev ext3 jbd mbcache dm_mirror dm_snapshot dm_mod raid1 md_mod ide_generic ide_cd cdrom sd_mod generic usbhid tg3 skge ahci libata piix scsi_mod ide_core ehci_hcd uhci_hcd usbcore thermal processor fan CPU: 0 EIP: 0061:[<c01bb090>] Not tainted VLI EFLAGS: 00010202 (2.6.18-5-xen-686 #1) EIP is at csum_partial+0x88/0x120 eax: 00000000 ebx: c01bb0f0 ecx: 00000007 edx: 00000400 esi: c081e080 edi: 00000408 ebp: 00000064 esp: c031fd54 ds: 007b es: 007b ss: 0069 Process swapper (pid: 0, ti=c031e000 task=c02cf660 task.ti=c031e000) Stack: c081e000 00000064 c022d626 c081e000 00000400 00000000 00000010 e4f00d2c 00000000 00000050 00000464 c911cef4 c022e532 c911c800 00000400 e4f00d2c e58258ec e5825900 c031fe54 c0231f33 92a6a1ff c83c4a00 00000003 c031fe54 Call Trace: [<c022d626>] skb_checksum+0x112/0x27e [<c022e532>] pskb_expand_head+0xce/0x112 [<c0231f33>] skb_checksum_help+0x5d/0xac [<e93ef2ea>] ip_nat_fn+0x42/0x184 [iptable_nat] [<e93f8092>] ipt_local_hook+0x76/0xcc [iptable_mangle] [<e93ef61e>] ip_nat_local_fn+0x34/0xaa [iptable_nat] [<c024def0>] dst_output+0x0/0x7 [<c0246e28>] nf_iterate+0x30/0x61 [<c024def0>] dst_output+0x0/0x7 [<c0246f4e>] nf_hook_slow+0x3a/0x90 [<c024def0>] dst_output+0x0/0x7 [<c02500e8>] ip_queue_xmit+0x35f/0x3b3 [<c024def0>] dst_output+0x0/0x7 [<c0115f49>] rebalance_tick+0x116/0x2ae [<c025dab0>] tcp_transmit_skb+0x604/0x632 [<c025e80c>] tcp_retransmit_skb+0x4e2/0x5c7 [<c012e066>] hrtimer_run_queues+0x147/0x15f [<c0257960>] tcp_enter_loss+0x1a1/0x1fd [<c02608e3>] tcp_write_timer+0x0/0x5c9 [<c0260cdb>] tcp_write_timer+0x3f8/0x5c9 [<c0123440>] run_timer_softirq+0x101/0x15c [<c011f41e>] __do_softirq+0x5e/0xc3 [<c011f4bd>] do_softirq+0x3a/0x4a [<c0106131>] do_IRQ+0x48/0x53 [<c020c1cc>] evtchn_do_upcall+0x64/0x9b [<c0104a51>] hypervisor_callback+0x3d/0x48 [<c0107342>] raw_safe_halt+0x8c/0xaf [<c0102c5f>] xen_idle+0x22/0x2e [<c0102d7e>] cpu_idle+0x91/0xab [<c03236fc>] start_kernel+0x378/0x37f Code: 00 74 b6 83 e9 02 77 cd 74 16 83 c1 02 0f 84 9f 00 00 00 0f b6 1e 01 d8 83 d0 00 e9 92 00 00 00 66 03 06 83 d0 00 e9 87 00 00 00 <03> 46 80 13 46 84 13 46 88 13 46 8c 13 46 90 13 46 94 13 46 98 EIP: [<c01bb090>] csum_partial+0x88/0x120 SS:ESP 0069:c031fd54 <0>Kernel panic - not syncing: Fatal exception in interrupt BUG: warning at arch/i386/kernel/smp-xen.c:526/smp_call_function() [<c010cddd>] smp_call_function+0x53/0xf8 [<c011b4be>] printk+0x14/0x18 [<c010ce95>] smp_send_stop+0x13/0x1e [<c011aaeb>] panic+0x45/0xde [<c0105235>] die+0x242/0x276 [<c0111d05>] do_page_fault+0xa53/0xb76 [<c01bb0f0>] csum_partial+0xe8/0x120 [<c01112b2>] do_page_fault+0x0/0xb76 [<c0104a0f>] error_code+0x2b/0x30 [<c01bb0f0>] csum_partial+0xe8/0x120 [<c01bb090>] csum_partial+0x88/0x120 [<c022d626>] skb_checksum+0x112/0x27e [<c022e532>] pskb_expand_head+0xce/0x112 [<c0231f33>] skb_checksum_help+0x5d/0xac [<e93ef2ea>] ip_nat_fn+0x42/0x184 [iptable_nat] [<e93f8092>] ipt_local_hook+0x76/0xcc [iptable_mangle] [<e93ef61e>] ip_nat_local_fn+0x34/0xaa [iptable_nat] [<c024def0>] dst_output+0x0/0x7 [<c0246e28>] nf_iterate+0x30/0x61 [<c024def0>] dst_output+0x0/0x7 [<c0246f4e>] nf_hook_slow+0x3a/0x90 [<c024def0>] dst_output+0x0/0x7 [<c02500e8>] ip_queue_xmit+0x35f/0x3b3 [<c024def0>] dst_output+0x0/0x7 [<c0115f49>] rebalance_tick+0x116/0x2ae [<c025dab0>] tcp_transmit_skb+0x604/0x632 [<c025e80c>] tcp_retransmit_skb+0x4e2/0x5c7 [<c012e066>] hrtimer_run_queues+0x147/0x15f [<c0257960>] tcp_enter_loss+0x1a1/0x1fd [<c02608e3>] tcp_write_timer+0x0/0x5c9 [<c0260cdb>] tcp_write_timer+0x3f8/0x5c9 [<c0123440>] run_timer_softirq+0x101/0x15c [<c011f41e>] __do_softirq+0x5e/0xc3 [<c011f4bd>] do_softirq+0x3a/0x4a [<c0106131>] do_IRQ+0x48/0x53 [<c020c1cc>] evtchn_do_upcall+0x64/0x9b [<c0104a51>] hypervisor_callback+0x3d/0x48 [<c0107342>] raw_safe_halt+0x8c/0xaf [<c0102c5f>] xen_idle+0x22/0x2e [<c0102d7e>] cpu_idle+0x91/0xab [<c03236fc>] start_kernel+0x378/0x37f (XEN) Domain 0 crashed: rebooting machine in 5 seconds. Then a reboot. ksymoops doesn't want to read the symbols from /proc/kallsyms (and /proc/ksyms) doesn't exist, so the decoding isn't complete: ksymoops 2.4.11 on i686 2.6.18-5-xen-686. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.6.18-5-xen-686/ (default) -m /boot/System.map-2.6.18-5-xen-686 (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options. Error (regular_file): read_ksyms stat /proc/ksyms failed No modules in ksyms, skipping objects No ksyms, skipping lsmod BUG: unable to handle kernel paging request at virtual address c081e000 c01bb090 19529000 -> *pde = 00000001:19f48001 27948000 -> *pme = 00000000:07099067 01099000 -> *pte = 00000000:00000000 Oops: 0000 [#1] CPU: 0 EIP: 0061:[<c01bb090>] Not tainted VLI Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010202 (2.6.18-5-xen-686 #1) eax: 00000000 ebx: c01bb0f0 ecx: 00000007 edx: 00000400 esi: c081e080 edi: 00000408 ebp: 00000064 esp: c031fd54 ds: 007b es: 007b ss: 0069 Stack: c081e000 00000064 c022d626 c081e000 00000400 00000000 00000010 e4f00d2c 00000000 00000050 00000464 c911cef4 c022e532 c911c800 00000400 e4f00d2c e58258ec e5825900 c031fe54 c0231f33 92a6a1ff c83c4a00 00000003 c031fe54 Call Trace: [<c022d626>] skb_checksum+0x112/0x27e [<c022e532>] pskb_expand_head+0xce/0x112 [<c0231f33>] skb_checksum_help+0x5d/0xac [<e93ef2ea>] ip_nat_fn+0x42/0x184 [iptable_nat] [<e93f8092>] ipt_local_hook+0x76/0xcc [iptable_mangle] [<e93ef61e>] ip_nat_local_fn+0x34/0xaa [iptable_nat] [<c024def0>] dst_output+0x0/0x7 [<c0246e28>] nf_iterate+0x30/0x61 [<c024def0>] dst_output+0x0/0x7 [<c0246f4e>] nf_hook_slow+0x3a/0x90 [<c024def0>] dst_output+0x0/0x7 [<c02500e8>] ip_queue_xmit+0x35f/0x3b3 [<c024def0>] dst_output+0x0/0x7 [<c0115f49>] rebalance_tick+0x116/0x2ae [<c025dab0>] tcp_transmit_skb+0x604/0x632 [<c025e80c>] tcp_retransmit_skb+0x4e2/0x5c7 [<c012e066>] hrtimer_run_queues+0x147/0x15f [<c0257960>] tcp_enter_loss+0x1a1/0x1fd [<c02608e3>] tcp_write_timer+0x0/0x5c9 [<c0260cdb>] tcp_write_timer+0x3f8/0x5c9 [<c0123440>] run_timer_softirq+0x101/0x15c [<c011f41e>] __do_softirq+0x5e/0xc3 [<c011f4bd>] do_softirq+0x3a/0x4a [<c0106131>] do_IRQ+0x48/0x53 [<c020c1cc>] evtchn_do_upcall+0x64/0x9b [<c0104a51>] hypervisor_callback+0x3d/0x48 [<c0107342>] raw_safe_halt+0x8c/0xaf [<c0102c5f>] xen_idle+0x22/0x2e [<c0102d7e>] cpu_idle+0x91/0xab [<c03236fc>] start_kernel+0x378/0x37f Code: 00 74 b6 83 e9 02 77 cd 74 16 83 c1 02 0f 84 9f 00 00 00 0f b6 1e 01 d8 83 d0 00 e9 92 00 00 00 66 03 06 83 d0 00 e9 87 00 00 00 <03> 46 80 13 46 84 13 46 88 13 46 8c 13 46 90 13 46 94 13 46 98 >>EIP; c01bb090 <csum_partial+88/120> <===== >>ebx; c01bb0f0 <csum_partial+e8/120> >>esp; c031fd54 <init_thread_union+1d54/2000> Trace; c022d626 <skb_checksum+112/27e> Trace; c022e532 <pskb_expand_head+ce/112> Trace; c0231f33 <skb_checksum_help+5d/ac> Trace; e93ef2ea <END_OF_CODE+2905a2ea/????> Trace; e93f8092 <END_OF_CODE+29063092/????> Trace; e93ef61e <END_OF_CODE+2905a61e/????> Trace; c024def0 <dst_output+0/7> Trace; c0246e28 <nf_iterate+30/61> Trace; c024def0 <dst_output+0/7> Trace; c0246f4e <nf_hook_slow+3a/90> Trace; c024def0 <dst_output+0/7> Trace; c02500e8 <ip_queue_xmit+35f/3b3> Trace; c024def0 <dst_output+0/7> Trace; c0115f49 <rebalance_tick+116/2ae> Trace; c025dab0 <tcp_transmit_skb+604/632> Trace; c025e80c <tcp_retransmit_skb+4e2/5c7> Trace; c012e066 <hrtimer_run_queues+147/15f> Trace; c0257960 <tcp_enter_loss+1a1/1fd> Trace; c02608e3 <tcp_write_timer+0/5c9> Trace; c0260cdb <tcp_write_timer+3f8/5c9> Trace; c0123440 <run_timer_softirq+101/15c> Trace; c011f41e <__do_softirq+5e/c3> Trace; c011f4bd <do_softirq+3a/4a> Trace; c0106131 <do_IRQ+48/53> Trace; c020c1cc <evtchn_do_upcall+64/9b> Trace; c0104a51 <hypervisor_callback+3d/48> Trace; c0107342 <raw_safe_halt+8c/af> Trace; c0102c5f <xen_idle+22/2e> Trace; c0102d7e <cpu_idle+91/ab> Trace; c03236fc <start_kernel+378/37f> This architecture has variable length instructions, decoding before eip is unreliable, take these instructions with a pinch of salt. Code; c01bb065 <csum_partial+5d/120> 00000000 <_EIP>: Code; c01bb065 <csum_partial+5d/120> 0: 00 74 b6 83 add %dh,0xffffff83(%esi,%esi,4) Code; c01bb069 <csum_partial+61/120> 4: e9 02 77 cd 74 jmp 74cd770b <_EIP+0x74cd770b> Code; c01bb06e <csum_partial+66/120> 9: 16 push %ss Code; c01bb06f <csum_partial+67/120> a: 83 c1 02 add $0x2,%ecx Code; c01bb072 <csum_partial+6a/120> d: 0f 84 9f 00 00 00 je b2 <_EIP+0xb2> Code; c01bb078 <csum_partial+70/120> 13: 0f b6 1e movzbl (%esi),%ebx Code; c01bb07b <csum_partial+73/120> 16: 01 d8 add %ebx,%eax Code; c01bb07d <csum_partial+75/120> 18: 83 d0 00 adc $0x0,%eax Code; c01bb080 <csum_partial+78/120> 1b: e9 92 00 00 00 jmp b2 <_EIP+0xb2> Code; c01bb085 <csum_partial+7d/120> 20: 66 03 06 add (%esi),%ax Code; c01bb088 <csum_partial+80/120> 23: 83 d0 00 adc $0x0,%eax Code; c01bb08b <csum_partial+83/120> 26: e9 87 00 00 00 jmp b2 <_EIP+0xb2> This decode from eip onwards should be reliable Code; c01bb090 <csum_partial+88/120> 00000000 <_EIP>: Code; c01bb090 <csum_partial+88/120> <===== 0: 03 46 80 add 0xffffff80(%esi),%eax <===== Code; c01bb093 <csum_partial+8b/120> 3: 13 46 84 adc 0xffffff84(%esi),%eax Code; c01bb096 <csum_partial+8e/120> 6: 13 46 88 adc 0xffffff88(%esi),%eax Code; c01bb099 <csum_partial+91/120> 9: 13 46 8c adc 0xffffff8c(%esi),%eax Code; c01bb09c <csum_partial+94/120> c: 13 46 90 adc 0xffffff90(%esi),%eax Code; c01bb09f <csum_partial+97/120> f: 13 46 94 adc 0xffffff94(%esi),%eax Code; c01bb0a2 <csum_partial+9a/120> 12: 13 46 98 adc 0xffffff98(%esi),%eax EIP: [<c01bb090>] csum_partial+0x88/0x120 SS:ESP 0069:c031fd54 <0>Kernel panic - not syncing: Fatal exception in interrupt [<c010cddd>] smp_call_function+0x53/0xf8 [<c011b4be>] printk+0x14/0x18 [<c010ce95>] smp_send_stop+0x13/0x1e [<c011aaeb>] panic+0x45/0xde [<c0105235>] die+0x242/0x276 [<c0111d05>] do_page_fault+0xa53/0xb76 [<c01bb0f0>] csum_partial+0xe8/0x120 [<c01112b2>] do_page_fault+0x0/0xb76 [<c0104a0f>] error_code+0x2b/0x30 [<c01bb0f0>] csum_partial+0xe8/0x120 [<c01bb090>] csum_partial+0x88/0x120 [<c022d626>] skb_checksum+0x112/0x27e [<c022e532>] pskb_expand_head+0xce/0x112 [<c0231f33>] skb_checksum_help+0x5d/0xac [<e93ef2ea>] ip_nat_fn+0x42/0x184 [iptable_nat] [<e93f8092>] ipt_local_hook+0x76/0xcc [iptable_mangle] [<e93ef61e>] ip_nat_local_fn+0x34/0xaa [iptable_nat] [<c024def0>] dst_output+0x0/0x7 [<c0246e28>] nf_iterate+0x30/0x61 [<c024def0>] dst_output+0x0/0x7 [<c0246f4e>] nf_hook_slow+0x3a/0x90 [<c024def0>] dst_output+0x0/0x7 [<c02500e8>] ip_queue_xmit+0x35f/0x3b3 [<c024def0>] dst_output+0x0/0x7 [<c0115f49>] rebalance_tick+0x116/0x2ae [<c025dab0>] tcp_transmit_skb+0x604/0x632 [<c025e80c>] tcp_retransmit_skb+0x4e2/0x5c7 [<c012e066>] hrtimer_run_queues+0x147/0x15f [<c0257960>] tcp_enter_loss+0x1a1/0x1fd [<c02608e3>] tcp_write_timer+0x0/0x5c9 [<c0260cdb>] tcp_write_timer+0x3f8/0x5c9 [<c0123440>] run_timer_softirq+0x101/0x15c [<c011f41e>] __do_softirq+0x5e/0xc3 [<c011f4bd>] do_softirq+0x3a/0x4a [<c0106131>] do_IRQ+0x48/0x53 [<c020c1cc>] evtchn_do_upcall+0x64/0x9b [<c0104a51>] hypervisor_callback+0x3d/0x48 [<c0107342>] raw_safe_halt+0x8c/0xaf [<c0102c5f>] xen_idle+0x22/0x2e [<c0102d7e>] cpu_idle+0x91/0xab [<c03236fc>] start_kernel+0x378/0x37f Warning (Oops_read): Code line not seen, dumping what data is available >>EIP; c01bb090 <csum_partial+88/120> <===== These are production boxes, so I should avoid rebooting them too often for debugging purposes. Any hints on how to approach and solve this issue would be highly appreciated. with best regards, Rene -- ------------------------------------------------- Gibraltar firewall http://www.gibraltar.at/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20071203/75edd362/attachment.pgp>