<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.5730.11" name=GENERATOR></HEAD>
<BODY>
<DIV><FONT face=Arial size=2><SPAN class=672582021-09052007>We are seeing a
panic in <FONT face="Times New Roman"
size=3>drbd_al_to_on_disk_bm()</FONT></SPAN></FONT></DIV>
<DIV><FONT face="Times New Roman" size=3><SPAN
class=672582021-09052007></SPAN></FONT> </DIV>
<DIV><FONT><SPAN class=672582021-09052007>Below is the stack and a possible
cause:</SPAN></FONT></DIV>
<DIV><FONT><SPAN class=672582021-09052007>May 3 05:17:59 choip kernel: EIP
is at drbd_al_to_on_disk_bm+0x18/0x470 [drbd]<BR>May 3 05:17:59 choip
kernel: eax: 00000000 ebx: ecbd013c ecx: 00000000 edx:
00000000<BR>May 3 05:17:59 choip kernel: esi: ecbd013c edi:
00000001 ebp: eb2b3ebc esp: eb2b3e34<BR>May 3 05:17:59 choip
heartbeat: [5898]: info: standby: acquire [all] resources from
chois.sn.stratus.com<BR>May 3 05:17:59 choip kernel: ds: 007b es:
007b ss: 0069<BR>May 3 05:17:59 choip heartbeat: [13993]: info:
acquire all HA resources (standby).<BR>May 3 05:17:59 choip kernel:
Process drbd15_receiver (pid: 5758, threadinfo=eb2b2000
task=c591c030)<BR>May 3 05:17:59 choip kernel: Stack: <0>00000000
e7d8fe98 eb2b3eaa cabe263a eb2b3e78 ee238515 00001000 000000d0 <BR>May 3
05:17:59 choip kernel: 0000002c 0000009f 0000003c
00000004 000000d0 eb2b3e80 00000002 eb2b3e94 <BR>May 3 05:17:59 choip
ResourceManager[14004]: info: Acquiring resource group: choip.sn.stratus.com
drbddisk::shared.fs Filesystem::/dev/drbd15::/shared 134.111.32.220 httpd
smd<BR>May 3 05:17:59 choip kernel: eb2b3e80
eb2b3ebc ee42533c 00000004 00000001 0000009f 00000000 00000016 <BR>May 3
05:17:59 choip kernel: Call Trace:<BR>May 3 05:17:59 choip kernel:
[<c0105a01>] show_stack_log_lvl+0xa1/0xe0<BR>May 3 05:17:59 choip
ResourceManager[14004]: info: Running /etc/ha.d/resource.d/drbddisk shared.fs
start<BR>May 3 05:17:59 choip kernel: [<c0105bf1>]
show_registers+0x181/0x200<BR>May 3 05:17:59 choip kernel:
[<c0105e10>] die+0x100/0x1b0<BR>May 3 05:17:59 choip kernel:
[<c01168f6>] do_page_fault+0x3c6/0x8c1<BR>May 3 05:17:59 choip
kernel: [<c010565f>] error_code+0x2b/0x30<BR>May 3 05:17:59
choip kernel: [<ee41ad8e>] after_state_ch+0x77e/0xa70
[drbd]<BR>May 3 05:17:59 choip kernel: [<ee40e1b1>]
receive_state+0x281/0x3c0 [drbd]<BR>May 3 05:17:59 choip kernel:
[<ee40e8a2>] drbdd+0x42/0x170 [drbd]<BR>May 3 05:17:59 choip
kernel: [<ee40fc05>] drbdd_init+0x1c5/0x210 [drbd]<BR>May 3
05:17:59 choip kernel: [<ee41b10c>] drbd_thread_setup+0x8c/0x100
[drbd]<BR>May 3 05:17:59 choip kernel: [<c0103485>]
kernel_thread_helper+0x5/0x10<BR>May 3 05:17:59 choip kernel: Code: ff ff
ff 8b 52 0c eb 94 8d 74 26 00 8d bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec 7c c7
45 90 00 10 00 00 89 c3 8b 80 c0 03 00 00 <f0> 0f ba 68 28 01 19 d2 31 c0
85 d2 0f 94 c0 85 c0 75 76 fc b9 <BR>May 3 05:17:59 choip kernel:
<0>Fatal exception: panic in 5 seconds </SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=672582021-09052007></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN
class=672582021-09052007>======================</SPAN></FONT></DIV>
<DIV><FONT><SPAN class=672582021-09052007>wait_event() is a macro and 
lc_try_lock() is inline, so both are expanded inside drbd_al_to_on_disk_bm() and do not show up in the call trace. The access to lc->flags below is likely where we 
died.<BR>static inline int lc_try_lock(struct lru_cache* lc)<BR>{<BR> 
return !test_and_set_bit(__LC_DIRTY,&lc->flags);
<=====I think we are here!!!!!<BR>}<BR><BR>Dump of assembler code for
function drbd_al_to_on_disk_bm:<BR>0x00013520
<drbd_al_to_on_disk_bm+0>: push %ebp<BR>0x00013521
<drbd_al_to_on_disk_bm+1>: mov %esp,%ebp<BR>0x00013523
<drbd_al_to_on_disk_bm+3>: push %edi<BR>0x00013524
<drbd_al_to_on_disk_bm+4>: push %esi<BR>0x00013525
<drbd_al_to_on_disk_bm+5>: push %ebx<BR>0x00013526
<drbd_al_to_on_disk_bm+6>: sub $0x7c,%esp<BR>0x00013529
<drbd_al_to_on_disk_bm+9>: movl
$0x1000,0xffffff90(%ebp)<BR>0x00013530 <drbd_al_to_on_disk_bm+16>:
mov %eax,%ebx<BR>0x00013532 <drbd_al_to_on_disk_bm+18>:
mov 0x3c0(%eax),%eax <=====loads the lc pointer; eax is 00000000 in the register dump<BR>0x00013538 
<drbd_al_to_on_disk_bm+24>: lock btsl $0x1,0x28(%eax) <=====dead here!!! (EIP is +0x18 = +24)<BR>0x0001353e 
<drbd_al_to_on_disk_bm+30>: sbb %edx,%edx<BR>0x00013540
<drbd_al_to_on_disk_bm+32>: xor %eax,%eax<BR>0x00013542
<drbd_al_to_on_disk_bm+34>: test %edx,%edx<BR>0x00013544
<drbd_al_to_on_disk_bm+36>: sete %al<BR>0x00013547
<drbd_al_to_on_disk_bm+39>: test %eax,%eax<BR>0x00013549
<drbd_al_to_on_disk_bm+41>: jne 0x135c1
<drbd_al_to_on_disk_bm+161><BR>0x0001354b
<drbd_al_to_on_disk_bm+43>: cld<BR></SPAN></FONT></DIV>
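<DIV><FONT><SPAN class=672582021-09052007>As a cross-check of the dump above against the oops: the reported EIP offset +0x18 is 24 decimal, and eax is 0 in the register dump, so the access for lc->flags would land at address 0x28. A minimal user-space sketch, assuming a hypothetical struct mock_lru_cache whose padding is chosen only so that flags sits at 0x28 as in the btsl operand (the real struct lru_cache layout is not shown in this report):</SPAN></FONT></DIV>
<PRE>
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct lru_cache. The 0x28 bytes of padding are
 * an assumption, chosen only so that 'flags' lands at the displacement seen
 * in "lock btsl $0x1,0x28(%eax)". */
struct mock_lru_cache {
    unsigned char pad[0x28];
    unsigned long flags;
};

/* Offset reported in the oops: "drbd_al_to_on_disk_bm+0x18". */
enum { OOPS_EIP_OFFSET = 0x18 };

/* Address that an access to lc->flags touches for a given lc pointer value. */
static uintptr_t flags_access_addr(uintptr_t lc)
{
    /* Sanity check on the mock layout, not on the real driver. */
    assert(offsetof(struct mock_lru_cache, flags) == 0x28);
    return lc + offsetof(struct mock_lru_cache, flags);
}
```
</PRE>
<DIV><FONT><SPAN class=672582021-09052007>With an lc pointer of 0 this yields 0x28, an unmapped low address, which would explain the do_page_fault entry in the call trace.</SPAN></FONT></DIV>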
<DIV><FONT><SPAN class=672582021-09052007>Here is my theory, since I cannot 
reproduce this at will. It seems that the local disk on the panicked node had a 
fault </SPAN></FONT></DIV>
<DIV><FONT><SPAN class=672582021-09052007>inserted, so we went diskless. 
At that point we called after_state_ch() and did this:<BR> if ( os.disk > 
Diskless && ns.disk == Diskless ) {<BR>
/* since inc_local() only works as long as
disk>=Inconsistent,<BR>
and it is Diskless here, local_cnt can only go down, it
can<BR> not
increase... It will reach zero */<BR>
wait_event(mdev->misc_wait,
!atomic_read(&mdev->local_cnt));<BR><BR>
drbd_free_bc(mdev->bc); mdev->bc =
NULL;<BR>
lc_free(mdev->resync); mdev->resync = NULL;<BR>
lc_free(mdev->act_log); mdev->act_log =
NULL; //We free things
here!!!!<BR> }<BR>So we freed the 
LRU caches and set mdev->act_log to NULL.<BR><BR>But later, the peer also had a fault inserted and went 
diskless, so we set the peer to Secondary. The panicked node received that 
state change, called after_state_ch() again, and did this:<BR> if( ns.pdsk < 
Inconsistent ) {<BR> /*
Diskless Peer becomes primary */<BR>
if (os.peer == Secondary && ns.peer == Primary )
{<BR>
drbd_uuid_new_current(mdev);<BR>
}<BR>
/* Diskless Peer becomes secondary */<BR>
if (os.peer == Primary && ns.peer ==
Secondary ) {<BR>
drbd_al_to_on_disk_bm(mdev); <BR>
}<BR> }<BR>But mdev->act_log had already been freed and set to NULL by the 
time we called drbd_al_to_on_disk_bm(), which is consistent with eax being 00000000 in the register dump. If this is correct, I am 
still not sure</SPAN></FONT></DIV>
<DIV><FONT><SPAN class=672582021-09052007>how best to fix 
this.</SPAN></FONT></DIV>
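<DIV><FONT><SPAN class=672582021-09052007>One possible direction, shown only as a user-space model of the sequence described above and not as a patch: check the act_log pointer before touching it. The types mock_lru_cache and mock_dev and the functions mock_go_diskless() and mock_al_to_on_disk_bm() are hypothetical stand-ins, not DRBD code.</SPAN></FONT></DIV>
<PRE>
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; not the real DRBD types. */
struct mock_lru_cache { unsigned long flags; };
struct mock_dev { struct mock_lru_cache *act_log; };

/* Models the Diskless branch of after_state_ch(): free the cache and
 * clear the pointer, as lc_free(mdev->act_log); mdev->act_log = NULL; does. */
static void mock_go_diskless(struct mock_dev *mdev)
{
    assert(mdev != NULL);
    free(mdev->act_log);
    mdev->act_log = NULL;
}

/* Models drbd_al_to_on_disk_bm() with an assumed guard: return -1 instead
 * of dereferencing a NULL act_log, 0 when the cache was present. */
static int mock_al_to_on_disk_bm(struct mock_dev *mdev)
{
    assert(mdev != NULL);
    if (mdev->act_log == NULL)
        return -1;                      /* local disk already detached */
    mdev->act_log->flags |= 1UL << 1;   /* stands in for lc_try_lock(), bit 1 as in the btsl */
    return 0;
}
```
</PRE>
<DIV><FONT><SPAN class=672582021-09052007>In the driver itself a bare NULL check could still race with the teardown. The comment quoted above mentions inc_local(), so a guard built on that reference count may be the more appropriate place, but that is an assumption I have not verified.</SPAN></FONT></DIV>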
<DIV><FONT face=Arial size=2><SPAN
class=672582021-09052007></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN
class=672582021-09052007>Thanks</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=672582021-09052007>EM--</SPAN></FONT></DIV>
<DIV><FONT><SPAN 
class=672582021-09052007> </SPAN></FONT></DIV></BODY></HTML>