On Tue, Dec 16, 2008 at 08:53:58AM +0000, Ben Clewett wrote:
>
> Hi Guys,
>
> I had a fault with 8.2.7 last night.  During the time where DRBD was
> handling the fault I had a kernel panic/hang.  I believe the panic was
> probably caused by DRBD.  Because of the panic/hang there is very little
> in the log file.  What I have is listed below.
>
> Can any person suggest whether this may be a DRBD problem?  Only I want
> to put this server live this evening, and I'm now very worried about it!
>
> Any help very welcome!
>
> Regards, Ben

I don't see anything in the messages below
that suggests drbd is the problem here.
for the information given so far, it can be anything.

hook up a serial console and log it
to capture any future oops/panic message.

> --------------------------
>
> Linux hp-tm-12 2.6.25.18-0.2-default
>
> version: 8.2.7 (api:88/proto:86-88)
> GIT-hash: 61b7f4c2fc34fe3d2acf7be6bcc1fc2684708a7d build by root at hp-tm-12, 2008-11-25 17:19:15
>
> --------------------------
>
> /var/log/messages on dead server:
>
> 00:06:25 drbd0: sock was shut down by peer
> 00:06:25 drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
> 00:06:25 drbd0: short read expecting header on sock: r=0
>
> ** Kernel Hang/Panic until reboot **
>
> 08:22:48 [<ffffffff80217368>] mtrr_add_page+0x270/0x34d
> 08:22:48 [<ffffffff80217745>] mtrr_file_add+0x91/0xaa
> 08:22:48 [<ffffffff80217b12>] mtrr_ioctl+0x3b4/0x542
> 08:22:48 [<ffffffff802df107>] proc_reg_unlocked_ioctl+0x7c/0xd7
> 08:22:48 [<ffffffff802acada>] vfs_ioctl+0x2a/0x78
> 08:22:48 [<ffffffff802acd6f>] do_vfs_ioctl+0x247/0x261
> 08:22:48 [<ffffffff802acdde>] sys_ioctl+0x55/0x77
> 08:22:48 [<ffffffff8020bffa>] system_call_after_swapgs+0x8a/0x8f
> 08:22:48 [<00007fc201c72b67>]
> 08:22:48 kernel:
> 08:22:48 ---[ end trace 0a6413c31e348d2f ]---
> 08:22:48 ------------[ cut here ]------------
>
> --------------------------
>
> /var/log/messages on server which did not hang:
>
> 00:06:25 drbd0: PingAck did not arrive in time.
> 00:06:25 drbd0: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> 00:06:25 drbd0: asender terminated
> 00:06:25 drbd0: Terminating asender thread
> 00:06:25 drbd0: Creating new current UUID
> 00:06:25 drbd0: short read expecting header on sock: r=-512
> 00:06:25 drbd0: Connection closed
> 00:06:25 drbd0: conn( NetworkFailure -> Unconnected )
> 00:06:25 drbd0: receiver terminated
> 00:06:25 drbd0: Restarting receiver thread
> 00:06:25 drbd0: receiver (re)started
> 00:06:25 drbd0: conn( Unconnected -> WFConnection )
> 00:06:41 drbd0: Handshake successful: Agreed network protocol version 88
> 00:06:41 drbd0: conn( WFConnection -> WFReportParams )
> 00:06:41 drbd0: Starting asender thread (from drbd0_receiver [3491])
> 00:06:41 drbd0: data-integrity-alg: <not-used>
> 00:06:41 drbd0: drbd_sync_handshake:
> 00:06:41 drbd0: self 6C43B920C2584C8B:65FCA875E302675F:B4D1BAA9F48439CF:90CE33222F63999E
> 00:06:41 drbd0: peer 65FCA875E302675F:0000000000000000:B4D1BAA9F48439CE:90CE33222F63999E
> 00:06:41 drbd0: uuid_compare()=1 by rule 7
> 00:06:41 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
> 00:06:51 drbd0: PingAck did not arrive in time.
> 00:06:51 drbd0: peer( Secondary -> Unknown ) conn( WFBitMapS -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> 00:06:51 drbd0: asender terminated
> 00:06:51 drbd0: Terminating asender thread
> 00:06:51 drbd0: error receiving ReportBitMap, l: 4088!
> 00:06:51 drbd0: Connection closed
> 00:06:51 drbd0: conn( NetworkFailure -> Unconnected )
> 00:06:51 drbd0: receiver terminated
> 00:06:51 drbd0: Restarting receiver thread
> 00:06:51 drbd0: receiver (re)started
> 00:06:51 drbd0: conn( Unconnected -> WFConnection )
> 00:07:16 drbd0: Handshake successful: Agreed network protocol version 88
> 00:07:16 drbd0: conn( WFConnection -> WFReportParams )
> 00:07:16 drbd0: Starting asender thread (from drbd0_receiver [3491])
> 00:07:16 drbd0: data-integrity-alg: <not-used>
> 00:07:16 drbd0: drbd_sync_handshake:
> 00:07:16 drbd0: self 6C43B920C2584C8B:65FCA875E302675F:B4D1BAA9F48439CF:90CE33222F63999E
> 00:07:16 drbd0: peer 65FCA875E302675F:0000000000000000:B4D1BAA9F48439CE:90CE33222F63999E
> 00:07:16 drbd0: uuid_compare()=1 by rule 7
> 00:07:16 drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
> 00:07:16 drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
> 00:07:16 drbd0: Began resync as SyncSource (will sync 9768 KB [2442 bits set]).
> 00:10:23 drbd1: PingAck did not arrive in time.
> 00:10:23 drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> 00:10:23 drbd1: asender terminated
> 00:10:23 drbd1: Terminating asender thread
> 00:10:23 drbd1: short read expecting header on sock: r=-512
> 00:10:23 drbd1: Connection closed
> 00:10:23 drbd1: conn( NetworkFailure -> Unconnected )
> 00:10:23 drbd1: receiver terminated
> 00:10:23 drbd1: Restarting receiver thread
> 00:10:23 drbd1: receiver (re)started
> 00:10:23 drbd1: conn( Unconnected -> WFConnection )
> 00:10:26 drbd0: PingAck did not arrive in time.
> 00:10:26 drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> NetworkFailure )
> 00:10:26 drbd0: asender terminated
> 00:10:26 drbd0: Terminating asender thread
> 00:10:26 drbd0: short read expecting header on sock: r=-512
> 00:10:26 drbd0: Connection closed
> 00:10:26 drbd0: conn( NetworkFailure -> Unconnected )
> 00:10:26 drbd0: receiver terminated
> 00:10:26 drbd0: Restarting receiver thread
> 00:10:26 drbd0: receiver (re)started
> 00:10:26 drbd0: conn( Unconnected -> WFConnection )
> 00:12:10 drbd1: Handshake successful: Agreed network protocol version 88
> 00:12:10 drbd1: conn( WFConnection -> WFReportParams )
> 00:12:10 drbd1: Starting asender thread (from drbd1_receiver [3494])
> 00:12:10 drbd1: data-integrity-alg: <not-used>
> 00:12:10 drbd1: drbd_sync_handshake:
> 00:12:10 drbd1: self D5FEF42F4E1EBD62:0000000000000000:FBB62BAB73B59A46:0437B1B84EC0633D
> 00:12:10 drbd1: peer 1419EE6C8C122AAF:D5FEF42F4E1EBD63:FBB62BAB73B59A46:0437B1B84EC0633D
> 00:12:10 drbd1: uuid_compare()=-1 by rule 5
> 00:12:10 drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
> 00:12:21 drbd1: PingAck did not arrive in time.
> 00:12:21 drbd1: peer( Primary -> Unknown ) conn( WFBitMapT -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> 00:12:21 drbd1: asender terminated
> 00:12:21 drbd1: Terminating asender thread
> 00:12:21 drbd1: error receiving ReportBitMap, l: 4088!
> 00:12:21 drbd1: Connection closed
> 00:12:21 drbd1: conn( NetworkFailure -> Unconnected )
> 00:12:21 drbd1: receiver terminated
> 00:12:21 drbd1: Restarting receiver thread
> 00:12:21 drbd1: receiver (re)started
> 00:12:21 drbd1: conn( Unconnected -> WFConnection )
>
> Last entry until peer was rebooted.
>

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed