[DRBD-user] Kernel BUG on drbd disconnect

Mon Dec 3 13:16:51 CET 2007

Hi all,

[xen-users: Please CC me in replies, I am not subscribed to this list.]

First of all, please excuse this cross-posting, but I am, so to say, stumped. 

My cluster of 2 Debian etch boxes (on Intel CPUs with vmx support) running 
drbd8 on Linux kernel packages linux-image-2.6.18-4-xen-686 
(2.6.18.dfsg.1-12etch2) or linux-image-2.6.18-5-xen-686 
(2.6.18.dfsg.1-13etch4), the latter sometimes exposing a Xen networking bug 
similar to http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=451297 on one of 
the systems (but I haven't started debugging this one). I have compiled my 
own drbd8 kernel modules, so that Debian package version is 8.0.7-1 for both 
drbd8-utils and drbd8-2.6.18-*-xen-686. Xen packages are from Debian etch, 
i.e. xen-utils-3.0.3-1, xen-hypervisor-3.0.3-1-i386-pae, etc. version 
3.0.3-0-4. domU instances are also Debian etch using 
linux-image-2.6.18-4-xen-686 to start from dom0 and 
linux-modules-2.6.18-4-xen-686 installed in the domUs.

Both boxes have a ca. 60GB LVM2 VG with >20 LVs of equal size and name on 
both. There is one drbd ressource for each LV (currently up to drbd28, but 
using only every second one for now), which are used as backend devices for 
the Xen domUs. At any time, only one node switches the respective drbd 
ressource to primary and then starts the domU (live migration is not yet 
used, so primary-primary does not happen right now). The drbd ressources all 
have a similar configuration:

resource "xen-xxx" {
  protocol C;
  startup {
    wfc-timeout       120;  ## 2 minutes.
    degr-wfc-timeout  120;  ## 2 minutes.
  }
  disk {
    on-io-error detach;
  }
  net {
    max-buffers 128;
    allow-two-primaries;
    cram-hmac-alg "sha256";
    shared-secret "xxxxxxxxxxxxxxxxxxxxxxxx";
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }
  syncer {
    rate 50M;
    after "xen-yyy";
  }
  on aaaa {
    device      /dev/drbd26;
    disk        /dev/mapper/xendomains-xxx;
    address     192.168.255.30:7814;
    meta-disk   internal;
  }
  on bbbb {
    device      /dev/drbd26;
    disk        /dev/mapper/xendomains-xxx;
    address    192.168.255.29:7814;
    meta-disk  internal;
  }
}

That is, syncer rate is moderate (1Gbps connection), max-buffers low so as not 
to run into out-of-memory issues (this used to be a problem), and the syncer 
is serialized.

Sometimes, when rebooting one of the nodes, it will crash the respective 
other. This is reproducible, but not ever time. Just an hour ago, it happened 
again and we were able to capture the kernel output on the serial console 
after reboot was started on the other node (so to be clear: this is the one 
that wasn't touched by any manual intervention):

drbd0: meta connection shut down by peer.
drbd26: meta connection shut down by peer.
drbd0: tl_clear()
drbd26: tl_clear()
drbd28: meta connection shut down by peer.
drbd28: tl_clear()
drbd24: PingAck did not arrive in time.
drbd24: short read expecting header on sock: r=-512
drbd24: tl_clear()
drbd20: PingAck did not arrive in time.
drbd20: short read expecting header on sock: r=-512
drbd20: tl_clear()
drbd14: PingAck did not arrive in time.
drbd14: short read expecting header on sock: r=-512
drbd14: tl_clear()
drbd4: PingAck did not arrive in time.
drbd4: short read expecting header on sock: r=-512
drbd4: tl_clear()
drbd10: PingAck did not arrive in time.
drbd10: short read expecting header on sock: r=-512
drbd10: tl_clear()
drbd6: PingAck did not arrive in time.
drbd6: short read expecting header on sock: r=-512
drbd6: tl_clear()
drbd22: PingAck did not arrive in time.
drbd22: short read expecting header on sock: r=-512
drbd22: tl_clear()
BUG: unable to handle kernel paging request at virtual address c081e000
 printing eip:
c01bb090
19529000 -> *pde = 00000001:19f48001
27948000 -> *pme = 00000000:07099067
01099000 -> *pte = 00000000:00000000
Oops: 0000 [#1]
SMP 
Modules linked in: sha256 drbd cn xt_physdev button ac battery ocfs2_dlmfs 
ocfs2_dlm ocfs2_nodemanager configfs bridge ip6t_LOG ipt_LOG xt_state 
ipt_REJECT xt_tcpudp ip6table_filter ip6_tables ipv6 iptable_mangle 
iptable_nat ip_nat ip_conntrack nfnetlink iptable_filter ip_tables x_tables 
xfs dm_crypt loop serio_raw serial_core psmouse rtc pcspkr shpchp pci_hotplug 
tsdev evdev ext3 jbd mbcache dm_mirror dm_snapshot dm_mod raid1 md_mod 
ide_generic ide_cd cdrom sd_mod generic usbhid tg3 skge ahci libata piix 
scsi_mod ide_core ehci_hcd uhci_hcd usbcore thermal processor fan
CPU:    0
EIP:    0061:[<c01bb090>]    Not tainted VLI
EFLAGS: 00010202   (2.6.18-5-xen-686 #1) 
EIP is at csum_partial+0x88/0x120
eax: 00000000   ebx: c01bb0f0   ecx: 00000007   edx: 00000400
esi: c081e080   edi: 00000408   ebp: 00000064   esp: c031fd54
ds: 007b   es: 007b   ss: 0069
Process swapper (pid: 0, ti=c031e000 task=c02cf660 task.ti=c031e000)
Stack: c081e000 00000064 c022d626 c081e000 00000400 00000000 00000010 e4f00d2c 
       00000000 00000050 00000464 c911cef4 c022e532 c911c800 00000400 e4f00d2c 
       e58258ec e5825900 c031fe54 c0231f33 92a6a1ff c83c4a00 00000003 c031fe54 
Call Trace:
 [<c022d626>] skb_checksum+0x112/0x27e
 [<c022e532>] pskb_expand_head+0xce/0x112
 [<c0231f33>] skb_checksum_help+0x5d/0xac
 [<e93ef2ea>] ip_nat_fn+0x42/0x184 [iptable_nat]
 [<e93f8092>] ipt_local_hook+0x76/0xcc [iptable_mangle]
 [<e93ef61e>] ip_nat_local_fn+0x34/0xaa [iptable_nat]
 [<c024def0>] dst_output+0x0/0x7
 [<c0246e28>] nf_iterate+0x30/0x61
 [<c024def0>] dst_output+0x0/0x7
 [<c0246f4e>] nf_hook_slow+0x3a/0x90
 [<c024def0>] dst_output+0x0/0x7
 [<c02500e8>] ip_queue_xmit+0x35f/0x3b3
 [<c024def0>] dst_output+0x0/0x7
 [<c0115f49>] rebalance_tick+0x116/0x2ae
 [<c025dab0>] tcp_transmit_skb+0x604/0x632
 [<c025e80c>] tcp_retransmit_skb+0x4e2/0x5c7
 [<c012e066>] hrtimer_run_queues+0x147/0x15f
 [<c0257960>] tcp_enter_loss+0x1a1/0x1fd
 [<c02608e3>] tcp_write_timer+0x0/0x5c9
 [<c0260cdb>] tcp_write_timer+0x3f8/0x5c9
 [<c0123440>] run_timer_softirq+0x101/0x15c
 [<c011f41e>] __do_softirq+0x5e/0xc3
 [<c011f4bd>] do_softirq+0x3a/0x4a
 [<c0106131>] do_IRQ+0x48/0x53
 [<c020c1cc>] evtchn_do_upcall+0x64/0x9b
 [<c0104a51>] hypervisor_callback+0x3d/0x48
 [<c0107342>] raw_safe_halt+0x8c/0xaf
 [<c0102c5f>] xen_idle+0x22/0x2e
 [<c0102d7e>] cpu_idle+0x91/0xab
 [<c03236fc>] start_kernel+0x378/0x37f
Code: 00 74 b6 83 e9 02 77 cd 74 16 83 c1 02 0f 84 9f 00 00 00 0f b6 1e 01 d8 
83 d0 00 e9 92 00 00 00 66 03 06 83 d0 00 e9 87 00 00 00 <03> 46 80 13 46 84 
13 46 88 13 46 8c 13 46 90 13 46 94 13 46 98 
EIP: [<c01bb090>] csum_partial+0x88/0x120 SS:ESP 0069:c031fd54
 <0>Kernel panic - not syncing: Fatal exception in interrupt
 BUG: warning at arch/i386/kernel/smp-xen.c:526/smp_call_function()
 [<c010cddd>] smp_call_function+0x53/0xf8
 [<c011b4be>] printk+0x14/0x18
 [<c010ce95>] smp_send_stop+0x13/0x1e
 [<c011aaeb>] panic+0x45/0xde
 [<c0105235>] die+0x242/0x276
 [<c0111d05>] do_page_fault+0xa53/0xb76
 [<c01bb0f0>] csum_partial+0xe8/0x120
 [<c01112b2>] do_page_fault+0x0/0xb76
 [<c0104a0f>] error_code+0x2b/0x30
 [<c01bb0f0>] csum_partial+0xe8/0x120
 [<c01bb090>] csum_partial+0x88/0x120
 [<c022d626>] skb_checksum+0x112/0x27e
 [<c022e532>] pskb_expand_head+0xce/0x112
 [<c0231f33>] skb_checksum_help+0x5d/0xac
 [<e93ef2ea>] ip_nat_fn+0x42/0x184 [iptable_nat]
 [<e93f8092>] ipt_local_hook+0x76/0xcc [iptable_mangle]
 [<e93ef61e>] ip_nat_local_fn+0x34/0xaa [iptable_nat]
 [<c024def0>] dst_output+0x0/0x7
 [<c0246e28>] nf_iterate+0x30/0x61
 [<c024def0>] dst_output+0x0/0x7
 [<c0246f4e>] nf_hook_slow+0x3a/0x90
 [<c024def0>] dst_output+0x0/0x7
 [<c02500e8>] ip_queue_xmit+0x35f/0x3b3
 [<c024def0>] dst_output+0x0/0x7
 [<c0115f49>] rebalance_tick+0x116/0x2ae
 [<c025dab0>] tcp_transmit_skb+0x604/0x632
 [<c025e80c>] tcp_retransmit_skb+0x4e2/0x5c7
 [<c012e066>] hrtimer_run_queues+0x147/0x15f
 [<c0257960>] tcp_enter_loss+0x1a1/0x1fd
 [<c02608e3>] tcp_write_timer+0x0/0x5c9
 [<c0260cdb>] tcp_write_timer+0x3f8/0x5c9
 [<c0123440>] run_timer_softirq+0x101/0x15c
 [<c011f41e>] __do_softirq+0x5e/0xc3
 [<c011f4bd>] do_softirq+0x3a/0x4a
 [<c0106131>] do_IRQ+0x48/0x53
 [<c020c1cc>] evtchn_do_upcall+0x64/0x9b
 [<c0104a51>] hypervisor_callback+0x3d/0x48
 [<c0107342>] raw_safe_halt+0x8c/0xaf
 [<c0102c5f>] xen_idle+0x22/0x2e
 [<c0102d7e>] cpu_idle+0x91/0xab
 [<c03236fc>] start_kernel+0x378/0x37f
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

Then a reboot. ksymoops doesn't want to read the symbols from /proc/kallsyms 
(and /proc/ksyms) doesn't exist, so the decoding isn't complete:

ksymoops 2.4.11 on i686 2.6.18-5-xen-686.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.6.18-5-xen-686/ (default)
     -m /boot/System.map-2.6.18-5-xen-686 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Error (regular_file): read_ksyms stat /proc/ksyms failed
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
BUG: unable to handle kernel paging request at virtual address c081e000
c01bb090
19529000 -> *pde = 00000001:19f48001
27948000 -> *pme = 00000000:07099067
01099000 -> *pte = 00000000:00000000
Oops: 0000 [#1]
CPU:    0
EIP:    0061:[<c01bb090>]    Not tainted VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202   (2.6.18-5-xen-686 #1) 
eax: 00000000   ebx: c01bb0f0   ecx: 00000007   edx: 00000400
esi: c081e080   edi: 00000408   ebp: 00000064   esp: c031fd54
ds: 007b   es: 007b   ss: 0069
Stack: c081e000 00000064 c022d626 c081e000 00000400 00000000 00000010 e4f00d2c 
       00000000 00000050 00000464 c911cef4 c022e532 c911c800 00000400 e4f00d2c 
       e58258ec e5825900 c031fe54 c0231f33 92a6a1ff c83c4a00 00000003 c031fe54 
Call Trace:
 [<c022d626>] skb_checksum+0x112/0x27e
 [<c022e532>] pskb_expand_head+0xce/0x112
 [<c0231f33>] skb_checksum_help+0x5d/0xac
 [<e93ef2ea>] ip_nat_fn+0x42/0x184 [iptable_nat]
 [<e93f8092>] ipt_local_hook+0x76/0xcc [iptable_mangle]
 [<e93ef61e>] ip_nat_local_fn+0x34/0xaa [iptable_nat]
 [<c024def0>] dst_output+0x0/0x7
 [<c0246e28>] nf_iterate+0x30/0x61
 [<c024def0>] dst_output+0x0/0x7
 [<c0246f4e>] nf_hook_slow+0x3a/0x90
 [<c024def0>] dst_output+0x0/0x7
 [<c02500e8>] ip_queue_xmit+0x35f/0x3b3
 [<c024def0>] dst_output+0x0/0x7
 [<c0115f49>] rebalance_tick+0x116/0x2ae
 [<c025dab0>] tcp_transmit_skb+0x604/0x632
 [<c025e80c>] tcp_retransmit_skb+0x4e2/0x5c7
 [<c012e066>] hrtimer_run_queues+0x147/0x15f
 [<c0257960>] tcp_enter_loss+0x1a1/0x1fd
 [<c02608e3>] tcp_write_timer+0x0/0x5c9
 [<c0260cdb>] tcp_write_timer+0x3f8/0x5c9
 [<c0123440>] run_timer_softirq+0x101/0x15c
 [<c011f41e>] __do_softirq+0x5e/0xc3
 [<c011f4bd>] do_softirq+0x3a/0x4a
 [<c0106131>] do_IRQ+0x48/0x53
 [<c020c1cc>] evtchn_do_upcall+0x64/0x9b
 [<c0104a51>] hypervisor_callback+0x3d/0x48
 [<c0107342>] raw_safe_halt+0x8c/0xaf
 [<c0102c5f>] xen_idle+0x22/0x2e
 [<c0102d7e>] cpu_idle+0x91/0xab
 [<c03236fc>] start_kernel+0x378/0x37f
Code: 00 74 b6 83 e9 02 77 cd 74 16 83 c1 02 0f 84 9f 00 00 00 0f b6 1e 01 d8 
83 d0 00 e9 92 00 00 00 66 03 06 83 d0 00 e9 87 00 00 00 <03> 46 80 13 46 84 
13 46 88 13 46 8c 13 46 90 13 46 94 13 46 98 

>>EIP; c01bb090 <csum_partial+88/120>   <=====

>>ebx; c01bb0f0 <csum_partial+e8/120>
>>esp; c031fd54 <init_thread_union+1d54/2000>

Trace; c022d626 <skb_checksum+112/27e>
Trace; c022e532 <pskb_expand_head+ce/112>
Trace; c0231f33 <skb_checksum_help+5d/ac>
Trace; e93ef2ea <END_OF_CODE+2905a2ea/????>
Trace; e93f8092 <END_OF_CODE+29063092/????>
Trace; e93ef61e <END_OF_CODE+2905a61e/????>
Trace; c024def0 <dst_output+0/7>
Trace; c0246e28 <nf_iterate+30/61>
Trace; c024def0 <dst_output+0/7>
Trace; c0246f4e <nf_hook_slow+3a/90>
Trace; c024def0 <dst_output+0/7>
Trace; c02500e8 <ip_queue_xmit+35f/3b3>
Trace; c024def0 <dst_output+0/7>
Trace; c0115f49 <rebalance_tick+116/2ae>
Trace; c025dab0 <tcp_transmit_skb+604/632>
Trace; c025e80c <tcp_retransmit_skb+4e2/5c7>
Trace; c012e066 <hrtimer_run_queues+147/15f>
Trace; c0257960 <tcp_enter_loss+1a1/1fd>
Trace; c02608e3 <tcp_write_timer+0/5c9>
Trace; c0260cdb <tcp_write_timer+3f8/5c9>
Trace; c0123440 <run_timer_softirq+101/15c>
Trace; c011f41e <__do_softirq+5e/c3>
Trace; c011f4bd <do_softirq+3a/4a>
Trace; c0106131 <do_IRQ+48/53>
Trace; c020c1cc <evtchn_do_upcall+64/9b>
Trace; c0104a51 <hypervisor_callback+3d/48>
Trace; c0107342 <raw_safe_halt+8c/af>
Trace; c0102c5f <xen_idle+22/2e>
Trace; c0102d7e <cpu_idle+91/ab>
Trace; c03236fc <start_kernel+378/37f>

This architecture has variable length instructions, decoding before eip
is unreliable, take these instructions with a pinch of salt.

Code;  c01bb065 <csum_partial+5d/120>
00000000 <_EIP>:
Code;  c01bb065 <csum_partial+5d/120>
   0:   00 74 b6 83               add    %dh,0xffffff83(%esi,%esi,4)
Code;  c01bb069 <csum_partial+61/120>
   4:   e9 02 77 cd 74            jmp    74cd770b <_EIP+0x74cd770b>
Code;  c01bb06e <csum_partial+66/120>
   9:   16                        push   %ss
Code;  c01bb06f <csum_partial+67/120>
   a:   83 c1 02                  add    $0x2,%ecx
Code;  c01bb072 <csum_partial+6a/120>
   d:   0f 84 9f 00 00 00         je     b2 <_EIP+0xb2>
Code;  c01bb078 <csum_partial+70/120>
  13:   0f b6 1e                  movzbl (%esi),%ebx
Code;  c01bb07b <csum_partial+73/120>
  16:   01 d8                     add    %ebx,%eax
Code;  c01bb07d <csum_partial+75/120>
  18:   83 d0 00                  adc    $0x0,%eax
Code;  c01bb080 <csum_partial+78/120>
  1b:   e9 92 00 00 00            jmp    b2 <_EIP+0xb2>
Code;  c01bb085 <csum_partial+7d/120>
  20:   66 03 06                  add    (%esi),%ax
Code;  c01bb088 <csum_partial+80/120>
  23:   83 d0 00                  adc    $0x0,%eax
Code;  c01bb08b <csum_partial+83/120>
  26:   e9 87 00 00 00            jmp    b2 <_EIP+0xb2>

This decode from eip onwards should be reliable

Code;  c01bb090 <csum_partial+88/120>
00000000 <_EIP>:
Code;  c01bb090 <csum_partial+88/120>   <=====
   0:   03 46 80                  add    0xffffff80(%esi),%eax   <=====
Code;  c01bb093 <csum_partial+8b/120>
   3:   13 46 84                  adc    0xffffff84(%esi),%eax
Code;  c01bb096 <csum_partial+8e/120>
   6:   13 46 88                  adc    0xffffff88(%esi),%eax
Code;  c01bb099 <csum_partial+91/120>
   9:   13 46 8c                  adc    0xffffff8c(%esi),%eax
Code;  c01bb09c <csum_partial+94/120>
   c:   13 46 90                  adc    0xffffff90(%esi),%eax
Code;  c01bb09f <csum_partial+97/120>
   f:   13 46 94                  adc    0xffffff94(%esi),%eax
Code;  c01bb0a2 <csum_partial+9a/120>
  12:   13 46 98                  adc    0xffffff98(%esi),%eax

EIP: [<c01bb090>] csum_partial+0x88/0x120 SS:ESP 0069:c031fd54
 <0>Kernel panic - not syncing: Fatal exception in interrupt
 [<c010cddd>] smp_call_function+0x53/0xf8
 [<c011b4be>] printk+0x14/0x18
 [<c010ce95>] smp_send_stop+0x13/0x1e
 [<c011aaeb>] panic+0x45/0xde
 [<c0105235>] die+0x242/0x276
 [<c0111d05>] do_page_fault+0xa53/0xb76
 [<c01bb0f0>] csum_partial+0xe8/0x120
 [<c01112b2>] do_page_fault+0x0/0xb76
 [<c0104a0f>] error_code+0x2b/0x30
 [<c01bb0f0>] csum_partial+0xe8/0x120
 [<c01bb090>] csum_partial+0x88/0x120
 [<c022d626>] skb_checksum+0x112/0x27e
 [<c022e532>] pskb_expand_head+0xce/0x112
 [<c0231f33>] skb_checksum_help+0x5d/0xac
 [<e93ef2ea>] ip_nat_fn+0x42/0x184 [iptable_nat]
 [<e93f8092>] ipt_local_hook+0x76/0xcc [iptable_mangle]
 [<e93ef61e>] ip_nat_local_fn+0x34/0xaa [iptable_nat]
 [<c024def0>] dst_output+0x0/0x7
 [<c0246e28>] nf_iterate+0x30/0x61
 [<c024def0>] dst_output+0x0/0x7
 [<c0246f4e>] nf_hook_slow+0x3a/0x90
 [<c024def0>] dst_output+0x0/0x7
 [<c02500e8>] ip_queue_xmit+0x35f/0x3b3
 [<c024def0>] dst_output+0x0/0x7
 [<c0115f49>] rebalance_tick+0x116/0x2ae
 [<c025dab0>] tcp_transmit_skb+0x604/0x632
 [<c025e80c>] tcp_retransmit_skb+0x4e2/0x5c7
 [<c012e066>] hrtimer_run_queues+0x147/0x15f
 [<c0257960>] tcp_enter_loss+0x1a1/0x1fd
 [<c02608e3>] tcp_write_timer+0x0/0x5c9
 [<c0260cdb>] tcp_write_timer+0x3f8/0x5c9
 [<c0123440>] run_timer_softirq+0x101/0x15c
 [<c011f41e>] __do_softirq+0x5e/0xc3
 [<c011f4bd>] do_softirq+0x3a/0x4a
 [<c0106131>] do_IRQ+0x48/0x53
 [<c020c1cc>] evtchn_do_upcall+0x64/0x9b
 [<c0104a51>] hypervisor_callback+0x3d/0x48
 [<c0107342>] raw_safe_halt+0x8c/0xaf
 [<c0102c5f>] xen_idle+0x22/0x2e
 [<c0102d7e>] cpu_idle+0x91/0xab
 [<c03236fc>] start_kernel+0x378/0x37f
Warning (Oops_read): Code line not seen, dumping what data is available

>>EIP; c01bb090 <csum_partial+88/120>   <=====

These are production boxes, so I should avoid rebooting them too often for 
debugging purposes. Any hints on how to approach and solve this issue would 
be highly appreciated.

with best regards,
Rene

-- 
-------------------------------------------------
Gibraltar firewall       http://www.gibraltar.at/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20071203/75edd362/attachment.pgp>