Hi Philipp,

this is the crash dump I generated by running "ksymoops" over the panic
trace. I would again like to add that this panic occurs on the primary
server when the secondary comes up and tries to sync with the primary,
and not every time, but in 30-40% of cases.

Here is the crash dump:
------------------------------------------------------------------------------------
ksymoops 2.4.9 on i686 2.4.25-SGkernel. Options used

kernel BUG at page_alloc.c:98!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c013cfe0>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 00000001 ebx: c1af5380 ecx: 00000000 edx: 00000000
esi: e7f00080 edi: 00000000 ebp: f702a500 esp: c02a3d88
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c02a3000)
Stack: f7cad400 f7cad400 00000000 c02a2000 c01ff170 f7cad400 e817b780 c02a2000
       00000001 e7f00080 f702a5d8 f702a500 c01fa8ec e7efe180 e7f00080 e7f00080
       c01fa927 e7f00080 f7c70400 e7f00080 f702a500 c01faa5f e7f00080 00000000
Call Trace: [<c01ff170>] [<c01fa8ec>] [<c01fa927>] [<c01faa5f>] [<c022bd21>] [<c021b143>]
  [<c0223e88>] [<c022b0c5>] [<c022b614>] [<c021049c>] [<c0210689>] [<c01ff723>]
  [<c01ff85c>] [<c01ff999>] [<c0124cc9>] [<c010af69>] [<c01070b0>] [<c010daa8>]
  [<c01070b0>] [<c01070d9>] [<c0107172>] [<c0105000>]
Code: 0f 0b 62 00 6f 23 26 c0 e9 69 fd ff ff 8d 76 00 55 57 56 53

>>EIP; c013cfe0 <__free_pages_ok+2c0/2d0> <=====
>>ebx; c1af5380 <_end+17ce748/386e0428>
>>esi; e7f00080 <_end+27bd9448/386e0428>
>>ebp; f702a500 <_end+36d038c8/386e0428>
>>esp; c02a3d88 <init_task_union+1d88/2000>

Trace; c01ff170 <dev_queue_xmit+290/350>
Trace; c01fa8ec <skb_release_data+7c/a0>
Trace; c01fa927 <kfree_skbmem+17/80>
Trace; c01faa5f <__kfree_skb+cf/120>
Trace; c022bd21 <tcp_v4_destroy_sock+71/170>
Trace; c021b143 <tcp_destroy_sock+73/200>
Trace; c0223e88 <tcp_rcv_state_process+898/ac0>
Trace; c022b0c5 <tcp_v4_do_rcv+b5/150>
Trace; c022b614 <tcp_v4_rcv+4b4/5e0>
Trace; c021049c <ip_local_deliver_finish+12c/140>
Trace; c0210689 <ip_rcv_finish+1d9/236>
Trace; c01ff723 <netif_receive_skb+d3/190>
Trace; c01ff85c <process_backlog+7c/120>
Trace; c01ff999 <net_rx_action+99/140>
Trace; c0124cc9 <do_softirq+d9/e0>
Trace; c010af69 <do_IRQ+e9/f0>
Trace; c01070b0 <default_idle+0/50>
Trace; c010daa8 <call_do_IRQ+5/d>
Trace; c01070b0 <default_idle+0/50>
Trace; c01070d9 <default_idle+29/50>
Trace; c0107172 <cpu_idle+52/70>
Trace; c0105000 <_stext+0/0>

Code; c013cfe0 <__free_pages_ok+2c0/2d0>
00000000 <_EIP>:
Code; c013cfe0 <__free_pages_ok+2c0/2d0> <=====
   0: 0f 0b ud2a <=====
Code; c013cfe2 <__free_pages_ok+2c2/2d0>
   2: 62 00 bound %eax,(%eax)
Code; c013cfe4 <__free_pages_ok+2c4/2d0>
   4: 6f outsl %ds:(%esi),(%dx)
Code; c013cfe5 <__free_pages_ok+2c5/2d0>
   5: 23 26 and (%esi),%esp
Code; c013cfe7 <__free_pages_ok+2c7/2d0>
   7: c0 e9 69 shr $0x69,%cl
Code; c013cfea <__free_pages_ok+2ca/2d0>
   a: fd std
Code; c013cfeb <__free_pages_ok+2cb/2d0>
   b: ff (bad)
Code; c013cfec <__free_pages_ok+2cc/2d0>
   c: ff 8d 76 00 55 57 decl 0x57550076(%ebp)
Code; c013cff2 <rmqueue+2/250>
  12: 56 push %esi
Code; c013cff3 <rmqueue+3/250>
  13: 53 push %ebx
-------------------------------------------------------------------------------------

TIA
-sunil

On 4/12/05, Philipp Reisner <philipp.reisner at linbit.com> wrote:
> On Thursday, 7 April 2005 12:35, sunil arora wrote:
> > Hi all,
> >
> > this is my first post to this mailing list. I am using an HA cluster (of
> > 2 nodes) with heartbeat and drbd. I am using drbd0.7.10 with kernel
> > 2.4.25 (custom compiled with SMP support).
> >
> > A kernel panic happens when we perform the following steps on the cluster:
> >
> > 2 nodes are working in the HA cluster
> >
> > node1 ---------------------node 2
> > primary                    secondary
> >
> > 1) node1 is primary and node2 is secondary
> > 2) reboot node1, ..node2 takes over as primary
> > 3) now node1 comes up after the reboot and tries to sync with the
> > current primary (node2). In 50% of the cases the kernel oopses, always
> > with the following dump
> >
> > The crash happens at line no. 98 (page_alloc.c), caused in
> > drbd_receiver. I tried to dig around in the kernel a bit and found out
> > that the panic is being caused by
> >
> > freeing of some pages while in interrupt context.
> >
> > Has anybody faced this problem?
> > Or does drbd0.7.10 have some problem with kernel 2.4.25?
> >
>
> Is it possible for you to get the whole oops message, to
> run it through ksymoops and to post it to the list?
>
> -Philipp
> --
> : Dipl-Ing Philipp Reisner Tel +43-1-8178292-50 :
> : LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
> : Schönbrunnerstr 244, 1120 Vienna, Austria http://www.linbit.com :
>
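
For anyone who wants to compare several such dumps, here is a minimal sketch of how the "Trace;" lines in the ksymoops output above could be split into address, symbol, offset and size. The function name parse_trace and the four-line sample are illustrative assumptions, not part of ksymoops or drbd. The last step only checks whether do_softirq appears in the trace, which would be consistent with the report that pages are freed in interrupt context; it does not prove the cause.

```python
import re

# Matches ksymoops lines such as "Trace; c01ff170 <dev_queue_xmit+290/350>".
# Groups: address, symbol, offset into the symbol, symbol size (all hex).
TRACE_RE = re.compile(
    r"Trace;\s+([0-9a-f]{8})\s+<([^+>]+)\+([0-9a-f]+)/([0-9a-f]+)>"
)

def parse_trace(text):
    """Return (address, symbol, offset, size) tuples in call-trace order.

    parse_trace is a hypothetical helper written for this note.
    """
    return [
        (int(addr, 16), sym, int(off, 16), int(size, 16))
        for addr, sym, off, size in TRACE_RE.findall(text)
    ]

# A few lines copied from the dump above; paste the full output instead.
sample = """\
Trace; c01fa8ec <skb_release_data+7c/a0>
Trace; c01fa927 <kfree_skbmem+17/80>
Trace; c01faa5f <__kfree_skb+cf/120>
Trace; c0124cc9 <do_softirq+d9/e0>
"""

frames = parse_trace(sample)
for addr, sym, off, size in frames:
    print(f"{addr:08x}  {sym} (+0x{off:x} of 0x{size:x} bytes)")

# True when the trace passes through softirq processing.
in_softirq = any(sym == "do_softirq" for _, sym, _, _ in frames)
print("softirq context in trace:", in_softirq)
```

Because the pattern allows any whitespace between the fields, it also copes with dumps in which the mail client has wrapped a "Trace;" line.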