On Thu, Jul 19, 2007 at 01:58:38AM -0700, Alex Dean wrote:
> Thanks for the reply. Answers to your questions below.
>
> Lars Ellenberg wrote:
>
> >On Wed, Jul 18, 2007 at 04:19:02PM -0500, alex at crackpot.org wrote:
> >>My 2 drbd boxen are called 42 and 43.
> >>drbd version: 0.7.16 (api:77/proto:74)
> >>
> >>* Today, 42 was primary.
> >>* A co-worker noticed that it was not connected to 43. (42 =
> >>'st:Primary/Unknown ld:Consistent', 43 = 'st:Secondary/Unknown
> >>ld:Consistent')
> >>* I saw that 43 said 'cs:WFConnection'. Co-worker did 'drbdadm
> >>connect' on 42, and it kernel panicked.
> >
> >what cs: was 42 in, before the "drbdadm connect" ?
>
> Looks like it was in 'WFReportParams'. These are the last 'drbd' notices
> in /var/log/messages before yesterday.
>
> Jul 12 05:37:42 dellpe2850-42 kernel: drbd0: [kjournald/5686]
> sock_sendmsg time expired, ko = 4294967295
> Jul 12 05:38:03 dellpe2850-42 kernel: drbd0: [kjournald/5686]
> sock_sendmsg time expired, ko = 4294967295
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: PingAck did not arrive in time.
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: drbd0_asender [13219]:
> cstate Connected --> NetworkFailure
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: asender terminated
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: drbd0_receiver [17511]:
> cstate NetworkFailure --> BrokenPipe
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: short read expecting header
> on sock: r=-512
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: short sent UnplugRemote
> size=8 sent=-1001
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: worker terminated
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: drbd0_receiver [17511]:
> cstate BrokenPipe --> Unconnected
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: Connection lost.
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: drbd0_receiver [17511]:
> cstate Unconnected --> WFConnection
> Jul 12 05:38:26 dellpe2850-42 kernel: drbd0: drbd0_receiver [17511]:
> cstate WFConnection --> WFReportParams
>
> >what is in the kernel logs,
>
> Jul 18 12:51:19 dellpe2850-42 kernel: drbd0: interrupted during initial
> handshake
> Jul 18 12:51:19 dellpe2850-42 kernel: drbd0: worker terminated
> Jul 18 12:51:19 dellpe2850-42 kernel: Unable to handle kernel NULL
> pointer dereference at 000000000000080c RIP:
>
> This is the last entry in /var/log/messages before reboot.
>
> >what led to them being disconnected in the first place?
>
> Most likely temporary network failure.
>
> >
> >what does the panic/oops look like?
>
> I have only a screen-shot, so I can't paste in the full panic message.
> I will transcribe it in full if you'd like. There's a call trace
> containing (among other things) : 'force_sig_info+35',
ok...
"interrupted during initial handshake", followed by a
NULL pointer dereference in force_sig_info...
this looks like a race condition bug I vaguely remember.
I'm not sure exactly which changelog item of which 0.7.x release it corresponds to,
but it should be fixed in the newest 0.7.
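
for anyone retracing a case like this: a minimal sketch, not part of drbd,
of how to pull the cstate transitions out of a log excerpt like the one
quoted above. The helper name and the regex are assumptions made for
illustration; it only understands the "cstate A --> B" wording seen in
these 0.7 logs, and it tolerates mail quoting and wrapped lines.

```python
import re

# Hypothetical helper, not part of drbd: matches the "cstate A --> B"
# wording that appears in the kernel log lines quoted in this thread.
CSTATE_RE = re.compile(r"cstate\s+(\w+)\s+-->\s+(\w+)")


def cstate_transitions(log_text):
    """Return (old_state, new_state) pairs in the order they were logged."""
    # Strip leading mail quote markers ("> ", ">> ") from every line.
    lines = [re.sub(r"^(>\s?)+", "", line).strip()
             for line in log_text.splitlines()]
    # Join the lines so a transition that the mail client wrapped onto
    # the following line still matches as one piece of text.
    joined = " ".join(line for line in lines if line)
    return CSTATE_RE.findall(joined)


sample = """\
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: drbd0_receiver [17511]:
> cstate Unconnected --> WFConnection
> Jul 12 05:38:26 dellpe2850-42 kernel: drbd0: drbd0_receiver [17511]:
> cstate WFConnection --> WFReportParams
"""

for old, new in cstate_transitions(sample):
    print(old, "-->", new)
```

the target of the last pair is the state the node was left in, which for
the excerpt above is WFReportParams, matching what Alex reported for 42.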
--
: Lars Ellenberg Tel +43-1-8178292-0 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :