[DRBD-user] kernel ( 2.4.25) panic with drbd0.7.10

Thu Apr 14 11:28:05 CEST 2005

Hi Philipp,
this is the crash dump I generated by running "ksymoops" over the
panic trace.

I would like to add again that this panic occurs on the primary server
when the secondary comes up and tries to sync with the primary. It does
not happen every time, only about 30-40% of the time.

Here is the crash dump:
------------------------------------------------------------------------------------

ksymoops 2.4.9 on i686 2.4.25-SGkernel.  Options used
kernel BUG at page_alloc.c:98!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c013cfe0>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 00000001   ebx: c1af5380   ecx: 00000000   edx: 00000000
esi: e7f00080   edi: 00000000   ebp: f702a500   esp: c02a3d88
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c02a3000)
Stack: f7cad400 f7cad400 00000000 c02a2000 c01ff170 f7cad400 e817b780 c02a2000
       00000001 e7f00080 f702a5d8 f702a500 c01fa8ec e7efe180 e7f00080 e7f00080
       c01fa927 e7f00080 f7c70400 e7f00080 f702a500 c01faa5f e7f00080 00000000
Call Trace:    [<c01ff170>] [<c01fa8ec>] [<c01fa927>] [<c01faa5f>] [<c022bd21>]
  [<c021b143>] [<c0223e88>] [<c022b0c5>] [<c022b614>] [<c021049c>] [<c0210689>]
  [<c01ff723>] [<c01ff85c>] [<c01ff999>] [<c0124cc9>] [<c010af69>] [<c01070b0>]
  [<c010daa8>] [<c01070b0>] [<c01070d9>] [<c0107172>] [<c0105000>]
Code: 0f 0b 62 00 6f 23 26 c0 e9 69 fd ff ff 8d 76 00 55 57 56 53

>>EIP; c013cfe0 <__free_pages_ok+2c0/2d0>   <=====

>>ebx; c1af5380 <_end+17ce748/386e0428>
>>esi; e7f00080 <_end+27bd9448/386e0428>
>>ebp; f702a500 <_end+36d038c8/386e0428>
>>esp; c02a3d88 <init_task_union+1d88/2000>

Trace; c01ff170 <dev_queue_xmit+290/350>
Trace; c01fa8ec <skb_release_data+7c/a0>
Trace; c01fa927 <kfree_skbmem+17/80>
Trace; c01faa5f <__kfree_skb+cf/120>
Trace; c022bd21 <tcp_v4_destroy_sock+71/170>
Trace; c021b143 <tcp_destroy_sock+73/200>
Trace; c0223e88 <tcp_rcv_state_process+898/ac0>
Trace; c022b0c5 <tcp_v4_do_rcv+b5/150>
Trace; c022b614 <tcp_v4_rcv+4b4/5e0>
Trace; c021049c <ip_local_deliver_finish+12c/140>
Trace; c0210689 <ip_rcv_finish+1d9/236>
Trace; c01ff723 <netif_receive_skb+d3/190>
Trace; c01ff85c <process_backlog+7c/120>
Trace; c01ff999 <net_rx_action+99/140>
Trace; c0124cc9 <do_softirq+d9/e0>
Trace; c010af69 <do_IRQ+e9/f0>
Trace; c01070b0 <default_idle+0/50>
Trace; c010daa8 <call_do_IRQ+5/d>
Trace; c01070b0 <default_idle+0/50>
Trace; c01070d9 <default_idle+29/50>
Trace; c0107172 <cpu_idle+52/70>
Trace; c0105000 <_stext+0/0>

Code;  c013cfe0 <__free_pages_ok+2c0/2d0>
00000000 <_EIP>:
Code;  c013cfe0 <__free_pages_ok+2c0/2d0>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c013cfe2 <__free_pages_ok+2c2/2d0>
   2:   62 00                     bound  %eax,(%eax)
Code;  c013cfe4 <__free_pages_ok+2c4/2d0>
   4:   6f                        outsl  %ds:(%esi),(%dx)
Code;  c013cfe5 <__free_pages_ok+2c5/2d0>
   5:   23 26                     and    (%esi),%esp
Code;  c013cfe7 <__free_pages_ok+2c7/2d0>
   7:   c0 e9 69                  shr    $0x69,%cl
Code;  c013cfea <__free_pages_ok+2ca/2d0>
   a:   fd                        std    
Code;  c013cfeb <__free_pages_ok+2cb/2d0>
   b:   ff                        (bad)  
Code;  c013cfec <__free_pages_ok+2cc/2d0>
   c:   ff 8d 76 00 55 57         decl   0x57550076(%ebp)
Code;  c013cff2 <rmqueue+2/250>
  12:   56                        push   %esi
Code;  c013cff3 <rmqueue+3/250>
  13:   53                        push   %ebx

-------------------------------------------------------------------------------------
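
To make the trace above easier to skim, here is a minimal Python sketch that
pulls the symbol names out of the "Trace;" lines and checks whether
do_softirq appears in the call chain. It is only an illustration and not part
of ksymoops or drbd; the names parse_trace and runs_in_softirq are made up
for this example, and the regular expression assumes the line layout shown
in the dump above.

```python
import re

# Assumed layout of a ksymoops trace line, e.g.:
#   Trace; c01faa5f <__kfree_skb+cf/120>
TRACE_RE = re.compile(
    r"^Trace;\s+([0-9a-f]{8})\s+<([^+>]+)\+([0-9a-f]+)/([0-9a-f]+)>"
)


def parse_trace(text):
    """Return (address, symbol, offset, size) tuples in printed order."""
    frames = []
    for line in text.splitlines():
        m = TRACE_RE.match(line.strip())
        if m:
            addr, sym, off, size = m.groups()
            frames.append((int(addr, 16), sym, int(off, 16), int(size, 16)))
    return frames


def runs_in_softirq(frames):
    """True if do_softirq shows up anywhere in the parsed call chain."""
    return any(sym == "do_softirq" for _, sym, _, _ in frames)


# A few lines copied from the dump above, used as sample input.
sample = """\
Trace; c01faa5f <__kfree_skb+cf/120>
Trace; c022bd21 <tcp_v4_destroy_sock+71/170>
Trace; c0124cc9 <do_softirq+d9/e0>
"""

if __name__ == "__main__":
    frames = parse_trace(sample)
    for addr, sym, off, size in frames:
        print(f"{addr:08x} {sym} (+0x{off:x} of 0x{size:x})")
    print("softirq context:", runs_in_softirq(frames))
```

Feeding it the full ksymoops output instead of the three sample lines works
the same way; lines that do not start with "Trace;" are simply skipped.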

TIA
-sunil

On 4/12/05, Philipp Reisner <philipp.reisner at linbit.com> wrote:
> On Thursday, 7 April 2005 12:35, sunil arora wrote:
> > Hi all,
> >
> > this is my first post to this mailing list. I am running a 2-node HA
> > cluster using heartbeat and drbd. I am using drbd0.7.10 with kernel
> > 2.4.25 (custom-compiled with SMP support).
> >
> > The kernel panic happens when we perform the following steps on the cluster:
> >
> > The 2 nodes are working in an HA cluster:
> >
> >                  node1 --------------------- node2
> >                 primary                    secondary
> >
> > 1)  node1 is primary and node2 is secondary
> > 2)  reboot node1, ..node2 takes over as primary
> > 3)  node1 comes back up after the reboot and tries to sync with the
> > current primary (node2). In about 50% of the cases the kernel oopses,
> > always with the same dump.
> >
> > The crash happens at line 98 of page_alloc.c and is triggered from
> > drbd_receiver. I dug around in the kernel a bit and found that the
> > panic is caused by pages being freed while in interrupt context.
> >
> > Has anybody faced this problem?
> > Or does drbd0.7.10 have a problem with kernel 2.4.25?
> >
> 
> Is it possible for you to get the whole oops message, run it
> through ksymoops, and post it to the list?
> 
> -Philipp
> --
> : Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
> : LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
> : Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :
>