[DRBD-user] 'drbdadm connect' panic?

Lars Ellenberg lars.ellenberg at linbit.com
Thu Jul 19 13:57:27 CEST 2007


On Thu, Jul 19, 2007 at 01:58:38AM -0700, Alex Dean wrote:
> Thanks for the reply.  Answers to your questions below.
> 
> Lars Ellenberg wrote:
> 
> >On Wed, Jul 18, 2007 at 04:19:02PM -0500, alex at crackpot.org wrote:
> >>My 2 drbd boxen are called 42 and 43.
> >>drbd version: 0.7.16 (api:77/proto:74)
> >>
> >>* Today, 42 was primary.
> >>* A co-worker noticed that it was not connected to 43.  (42 =  
> >>'st:Primary/Unknown ld:Consistent', 43 = 'st:Secondary/Unknown  
> >>ld:Consistent')
> >>* I saw that 43 said 'cs:WFConnection'.  Co-worker did 'drbdadm  
> >>connect' on 42, and the kernel panicked.
> >
> >what cs: was 42 in, before the "drbdadm connect" ?
> 
> Looks like it was in 'WFReportParams'.  These are the last 'drbd' notices
> in /var/log/messages before yesterday.
> 
> Jul 12 05:37:42 dellpe2850-42 kernel: drbd0: [kjournald/5686] 
> sock_sendmsg time expired, ko = 4294967295
> Jul 12 05:38:03 dellpe2850-42 kernel: drbd0: [kjournald/5686] 
> sock_sendmsg time expired, ko = 4294967295
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: PingAck did not arrive in time.
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: drbd0_asender [13219]: 
> cstate Connected --> NetworkFailure
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: asender terminated
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: drbd0_receiver [17511]: 
> cstate NetworkFailure --> BrokenPipe
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: short read expecting header 
> on sock: r=-512
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: short sent UnplugRemote 
> size=8 sent=-1001
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: worker terminated
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: drbd0_receiver [17511]: 
> cstate BrokenPipe --> Unconnected
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: Connection lost.
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: drbd0_receiver [17511]: 
> cstate Unconnected --> WFConnection
> Jul 12 05:38:26 dellpe2850-42 kernel: drbd0: drbd0_receiver [17511]: 
> cstate WFConnection --> WFReportParams
> 
> >what is in the kernel logs,
> 
> Jul 18 12:51:19 dellpe2850-42 kernel: drbd0: interrupted during initial 
> handshake
> Jul 18 12:51:19 dellpe2850-42 kernel: drbd0: worker terminated
> Jul 18 12:51:19 dellpe2850-42 kernel: Unable to handle kernel NULL 
> pointer dereference at 000000000000080c RIP:
> 
> This is the last entry in /var/log/messages before reboot.
> 
> >what led to them being disconnected in the first place?
> 
> Most likely temporary network failure.
> 
> >
> >what does the panic/oops look like?
> 
> I have only a screen-shot, so I can't paste in the full panic message. 
> I will transcribe it in full if you'd like.  There's a call trace 
> containing (among other things): 'force_sig_info+35',

OK...
 "interrupted during initial handshake", then a
 NULL pointer dereference in force_sig_info...
This looks like a race-condition bug that I vaguely remember.

I'm not sure exactly which changelog item of which 0.7.x release fixed it,
but it should be fixed in the newest 0.7.
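
For anyone retracing a similar failure: the connection state a node was
left in can be read off the "cstate A --> B" lines in the kernel log.
Below is a minimal sketch in Python, assuming the log format matches the
0.7.x lines quoted above. The names cstate_transitions, last_cstate and
SAMPLE are made up for illustration and are not part of DRBD.

```python
import re

# Sample lines copied from the quoted /var/log/messages excerpt, including
# the "> " quote markers and the mail client's hard line wraps.
SAMPLE = """\
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: drbd0_asender [13219]:
> cstate Connected --> NetworkFailure
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: asender terminated
> Jul 12 05:38:06 dellpe2850-42 kernel: drbd0: drbd0_receiver [17511]:
> cstate NetworkFailure --> BrokenPipe
> Jul 12 05:38:26 dellpe2850-42 kernel: drbd0: drbd0_receiver [17511]:
> cstate WFConnection --> WFReportParams
"""

# Matches the transition text as it appears in the quoted log lines.
CSTATE_RE = re.compile(r"cstate\s+(\w+)\s+-->\s+(\w+)")


def cstate_transitions(log_text):
    """Return the (old, new) cstate pairs in the order they were logged."""
    # Drop leading quote markers, then join everything with single spaces so
    # that a transition wrapped onto the next line is matched as well.
    lines = (re.sub(r"^[>\s]+", "", line) for line in log_text.splitlines())
    flat = " ".join(" ".join(lines).split())
    return CSTATE_RE.findall(flat)


def last_cstate(log_text):
    """Return the most recently logged cstate, or None if there is none."""
    transitions = cstate_transitions(log_text)
    return transitions[-1][1] if transitions else None


if __name__ == "__main__":
    for old, new in cstate_transitions(SAMPLE):
        print(f"{old} -> {new}")
    print("last logged cstate:", last_cstate(SAMPLE))
```

Run against the full excerpt, this kind of script gives the same answer
Alex read off by hand: the last logged transition ends in WFReportParams.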

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.


