[DRBD-user] Kernel Panic

Lars Ellenberg lars.ellenberg at linbit.com
Tue Dec 16 14:22:06 CET 2008



On Tue, Dec 16, 2008 at 08:53:58AM +0000, Ben Clewett wrote:
>
>
> Hi Guys,
>
> I had a fault with 8.2.7 last night.  During the time where DRBD was  
> handling the fault I had a kernel panic/hang.  I believe the panic was  
> probably caused by DRBD.  Because of the panic/hang there is very little  
> in the log file.  What I have is listed below.
>
> Can any person suggest whether this may be a DRBD problem?   Only I want  
> to put this server live this evening, and I'm now very worried about it!
>
> Any help very welcome!
>
> Regards,  Ben

I don't see anything in the messages below that suggests DRBD is the
problem here.  From the information given so far, it could be anything.

Hook up a serial console and log its output, so that any future
oops/panic message gets captured.
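
As a rough sketch of the logging side (not a tested setup): assuming the
panicking box is booted with something like "console=ttyS0,115200n8
console=tty0" on the kernel command line, and a second machine is attached
to it with a null-modem cable, a small script on that second machine can
timestamp every line it receives.  The port name, the speed, the file names
and the helper name log_console are all assumptions made for illustration.

```python
import io
import sys
from datetime import datetime, timezone


def log_console(stream, sink, now=None):
    """Copy lines from a console stream to a log, prefixing a UTC timestamp.

    Returns the number of lines copied.
    """
    now = now or (lambda: datetime.now(timezone.utc))
    count = 0
    for line in stream:
        sink.write(f"{now().isoformat()} {line.rstrip()}\n")
        sink.flush()  # flush per line so nothing sits in a buffer if the logger dies
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical usage on the logging host:
    #   python3 conlog.py /dev/ttyS0 panic.log
    # The port speed is assumed to have been set beforehand (e.g. with stty).
    if len(sys.argv) == 3:
        with open(sys.argv[1], "r", errors="replace") as src, \
                open(sys.argv[2], "a") as dst:
            log_console(src, dst)
```

Any terminal program that can write a capture file does the same job; the
point is only that the oops text ends up on a machine that does not hang.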

> --------------------------
>
> Linux hp-tm-12 2.6.25.18-0.2-default
>
> version: 8.2.7 (api:88/proto:86-88)
> GIT-hash: 61b7f4c2fc34fe3d2acf7be6bcc1fc2684708a7d build by  
> root at hp-tm-12, 2008-11-25 17:19:15
>
> --------------------------
>
> /var/log/messages on dead server:
>
> 00:06:25 drbd0: sock was shut down by peer
> 00:06:25 drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe  
> ) pdsk( UpToDate -> DUnknown )
> 00:06:25 drbd0: short read expecting header on sock: r=0
>
> ** Kernel Hang/Panic until reboot **
>
> 08:22:48  [<ffffffff80217368>] mtrr_add_page+0x270/0x34d
> 08:22:48  [<ffffffff80217745>] mtrr_file_add+0x91/0xaa
> 08:22:48  [<ffffffff80217b12>] mtrr_ioctl+0x3b4/0x542
> 08:22:48  [<ffffffff802df107>] proc_reg_unlocked_ioctl+0x7c/0xd7
> 08:22:48  [<ffffffff802acada>] vfs_ioctl+0x2a/0x78
> 08:22:48  [<ffffffff802acd6f>] do_vfs_ioctl+0x247/0x261
> 08:22:48  [<ffffffff802acdde>] sys_ioctl+0x55/0x77
> 08:22:48  [<ffffffff8020bffa>] system_call_after_swapgs+0x8a/0x8f
> 08:22:48  [<00007fc201c72b67>]
> 08:22:48 kernel:
> 08:22:48 ---[ end trace 0a6413c31e348d2f ]---
> 08:22:48 ------------[ cut here ]------------
>
> --------------------------
>
> /var/log/messages on server which did not hang:
>
>
> 00:06:25 drbd0: PingAck did not arrive in time.
> 00:06:25 drbd0: peer( Secondary -> Unknown ) conn( Connected ->  
> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> 00:06:25 drbd0: asender terminated
> 00:06:25 drbd0: Terminating asender thread
> 00:06:25 drbd0: Creating new current UUID
> 00:06:25 drbd0: short read expecting header on sock: r=-512
> 00:06:25 drbd0: Connection closed
> 00:06:25 drbd0: conn( NetworkFailure -> Unconnected )
> 00:06:25 drbd0: receiver terminated
> 00:06:25 drbd0: Restarting receiver thread
> 00:06:25 drbd0: receiver (re)started
> 00:06:25 drbd0: conn( Unconnected -> WFConnection )
> 00:06:41 drbd0: Handshake successful: Agreed network protocol version 88
> 00:06:41 drbd0: conn( WFConnection -> WFReportParams )
> 00:06:41 drbd0: Starting asender thread (from drbd0_receiver [3491])
> 00:06:41 drbd0: data-integrity-alg: <not-used>
> 00:06:41 drbd0: drbd_sync_handshake:
> 00:06:41 drbd0: self  
> 6C43B920C2584C8B:65FCA875E302675F:B4D1BAA9F48439CF:90CE33222F63999E
> 00:06:41 drbd0: peer  
> 65FCA875E302675F:0000000000000000:B4D1BAA9F48439CE:90CE33222F63999E
> 00:06:41 drbd0: uuid_compare()=1 by rule 7
> 00:06:41 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams ->  
> WFBitMapS ) pdsk( DUnknown -> UpToDate )
> 00:06:51 drbd0: PingAck did not arrive in time.
> 00:06:51 drbd0: peer( Secondary -> Unknown ) conn( WFBitMapS ->  
> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> 00:06:51 drbd0: asender terminated
> 00:06:51 drbd0: Terminating asender thread
> 00:06:51 drbd0: error receiving ReportBitMap, l: 4088!
> 00:06:51 drbd0: Connection closed
> 00:06:51 drbd0: conn( NetworkFailure -> Unconnected )
> 00:06:51 drbd0: receiver terminated
> 00:06:51 drbd0: Restarting receiver thread
> 00:06:51 drbd0: receiver (re)started
> 00:06:51 drbd0: conn( Unconnected -> WFConnection )
> 00:07:16 drbd0: Handshake successful: Agreed network protocol version 88
> 00:07:16 drbd0: conn( WFConnection -> WFReportParams )
> 00:07:16 drbd0: Starting asender thread (from drbd0_receiver [3491])
> 00:07:16 drbd0: data-integrity-alg: <not-used>
> 00:07:16 drbd0: drbd_sync_handshake:
> 00:07:16 drbd0: self  
> 6C43B920C2584C8B:65FCA875E302675F:B4D1BAA9F48439CF:90CE33222F63999E
> 00:07:16 drbd0: peer  
> 65FCA875E302675F:0000000000000000:B4D1BAA9F48439CE:90CE33222F63999E
> 00:07:16 drbd0: uuid_compare()=1 by rule 7
> 00:07:16 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams ->  
> WFBitMapS ) pdsk( DUnknown -> UpToDate )
> 00:07:16 drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate ->  
> Inconsistent )
> 00:07:16 drbd0: Began resync as SyncSource (will sync 9768 KB [2442 bits  
> set]).
> 00:10:23 drbd1: PingAck did not arrive in time.
> 00:10:23 drbd1: peer( Primary -> Unknown ) conn( Connected ->  
> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> 00:10:23 drbd1: asender terminated
> 00:10:23 drbd1: Terminating asender thread
> 00:10:23 drbd1: short read expecting header on sock: r=-512
> 00:10:23 drbd1: Connection closed
> 00:10:23 drbd1: conn( NetworkFailure -> Unconnected )
> 00:10:23 drbd1: receiver terminated
> 00:10:23 drbd1: Restarting receiver thread
> 00:10:23 drbd1: receiver (re)started
> 00:10:23 drbd1: conn( Unconnected -> WFConnection )
> 00:10:26 drbd0: PingAck did not arrive in time.
> 00:10:26 drbd0: peer( Secondary -> Unknown ) conn( SyncSource ->  
> NetworkFailure )
> 00:10:26 drbd0: asender terminated
> 00:10:26 drbd0: Terminating asender thread
> 00:10:26 drbd0: short read expecting header on sock: r=-512
> 00:10:26 drbd0: Connection closed
> 00:10:26 drbd0: conn( NetworkFailure -> Unconnected )
> 00:10:26 drbd0: receiver terminated
> 00:10:26 drbd0: Restarting receiver thread
> 00:10:26 drbd0: receiver (re)started
> 00:10:26 drbd0: conn( Unconnected -> WFConnection )
> 00:12:10 drbd1: Handshake successful: Agreed network protocol version 88
> 00:12:10 drbd1: conn( WFConnection -> WFReportParams )
> 00:12:10 drbd1: Starting asender thread (from drbd1_receiver [3494])
> 00:12:10 drbd1: data-integrity-alg: <not-used>
> 00:12:10 drbd1: drbd_sync_handshake:
> 00:12:10 drbd1: self  
> D5FEF42F4E1EBD62:0000000000000000:FBB62BAB73B59A46:0437B1B84EC0633D
> 00:12:10 drbd1: peer  
> 1419EE6C8C122AAF:D5FEF42F4E1EBD63:FBB62BAB73B59A46:0437B1B84EC0633D
> 00:12:10 drbd1: uuid_compare()=-1 by rule 5
> 00:12:10 drbd1: peer( Unknown -> Primary ) conn( WFReportParams ->  
> WFBitMapT ) pdsk( DUnknown -> UpToDate )
> 00:12:21 drbd1: PingAck did not arrive in time.
> 00:12:21 drbd1: peer( Primary -> Unknown ) conn( WFBitMapT ->  
> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> 00:12:21 drbd1: asender terminated
> 00:12:21 drbd1: Terminating asender thread
> 00:12:21 drbd1: error receiving ReportBitMap, l: 4088!
> 00:12:21 drbd1: Connection closed
> 00:12:21 drbd1: conn( NetworkFailure -> Unconnected )
> 00:12:21 drbd1: receiver terminated
> 00:12:21 drbd1: Restarting receiver thread
> 00:12:21 drbd1: receiver (re)started
> 00:12:21 drbd1: conn( Unconnected -> WFConnection )
>
>
> Last entry until peer was rebooted.
>
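
To make the sequence of connection losses in a log like the one above
easier to read, the state-transition lines can be pulled out with a short
script.  This is only a sketch: it assumes the log lines have been
un-wrapped and stripped of the "> " quote prefix, and that they look like
"HH:MM:SS drbdN: message" as in this mail; the function names are made up
for illustration.

```python
import re

# Matches groups such as "conn( Connected -> BrokenPipe )"
TRANSITION = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")
# Matches "00:06:25 drbd0: <message>" as it appears in the quoted log
LINE = re.compile(r"^(\d\d:\d\d:\d\d) (drbd\d+): (.*)$")


def parse_transitions(message):
    """Return {field: (old, new)} for every state change in one message."""
    return {m.group(1): (m.group(2), m.group(3))
            for m in TRANSITION.finditer(message)}


def conn_changes(lines):
    """Yield (time, device, old_state, new_state) for each conn transition."""
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue
        changes = parse_transitions(m.group(3))
        if "conn" in changes:
            yield (m.group(1), m.group(2)) + changes["conn"]


if __name__ == "__main__":
    sample = [
        "00:06:25 drbd0: peer( Primary -> Unknown ) "
        "conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )",
        "00:06:25 drbd0: short read expecting header on sock: r=0",
    ]
    for change in conn_changes(sample):
        print(*change)
```

This only lists what the log already says; it does not tell you why the
link dropped.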

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


