[DRBD-user] drbd-0.7.15 sock_sendmsg time expired for no apparent reason during fsck journal recovery (repost)

Lars Ellenberg Lars.Ellenberg at linbit.com
Mon Feb 20 10:21:01 CET 2006


/ 2006-02-19 14:21:03 -0500
\ Maurice Volaski:
> First posted on 2/7 with no reply. It hasn't happened since, but once
> is enough to mean something is wrong somewhere.....
> 
> 
> On the second go around, drbd successfully resynced all 5600 GB
> without crashing. =D>
> 
> I ran a quick fsck to recover the journals from the earlier crash,
> and it worked OK for every resource but one. For that one, it just
> sat there. I looked at the log and saw numerous messages like so:
> 
> Feb  6 15:18:32 [kernel] drbd6: [fsck.ext3/4455] sock_sendmsg time
> expired, ko = 4294967295

this message is triggered whenever the (larger) data packets get stuck,
i.e. the peer does not process writes "in time",
but it still answers "drbd ping" packets on the other socket in time;
otherwise we'd declare the connection/peer dead and go into WFConnection.

we then start to decrement the ko count (which defaults to zero; that's why
you see ((unsigned int)-1), i.e. 4294967295, above).  once that counter hits
zero, we declare the io-subsystem of the peer dead / the peer uncooperative,
and go into StandAlone.
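
to illustrate where the 4294967295 in the log line comes from, here is a
minimal sketch of decrementing an unsigned counter that starts at zero. it
assumes a 32-bit unsigned int; the names U32_MASK and decrement_u32 are made
up for this example and are not DRBD identifiers.

```python
# Hypothetical illustration, not DRBD source code.
# Assumption: the ko counter behaves like a 32-bit C "unsigned int".
U32_MASK = 0xFFFFFFFF


def decrement_u32(value: int) -> int:
    """Decrement like a C unsigned int: the result wraps modulo 2**32."""
    return (value - 1) & U32_MASK


ko = 0                                  # default ko count, per the explanation above
ko_after_first_timeout = decrement_u32(ko)
print(ko_after_first_timeout)           # 4294967295, i.e. (unsigned int)-1
```

starting from zero, the first decrement wraps to the largest 32-bit value, so
under this assumption the counter is nowhere near zero after one expired send.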

the count is reset as soon as the peer processes one single write.
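
the decrement/reset behaviour described here can be sketched as a small model.
this is only an illustration under stated assumptions: the class KoCounter and
its method names are hypothetical, the counter is assumed to be a 32-bit
unsigned value, and "reset" is assumed to mean "back to the configured value".

```python
# Hypothetical model of the behaviour described above, not DRBD source code.
U32 = 0xFFFFFFFF  # assumption: 32-bit unsigned counter


class KoCounter:
    def __init__(self, configured: int) -> None:
        self.configured = configured & U32
        self.remaining = self.configured

    def on_send_timeout(self) -> bool:
        """A data send timed out. Return True if the peer should be given up."""
        self.remaining = (self.remaining - 1) & U32
        return self.remaining == 0

    def on_write_processed(self) -> None:
        """The peer processed a write: start counting from scratch again."""
        self.remaining = self.configured


busy_peer = KoCounter(configured=3)
print(busy_peer.on_send_timeout())   # False, 2 left
print(busy_peer.on_send_timeout())   # False, 1 left
busy_peer.on_write_processed()       # one single write resets the count
print(busy_peer.remaining)           # 3

stuck_peer = KoCounter(configured=3)
results = [stuck_peer.on_send_timeout() for _ in range(3)]
print(results)                       # [False, False, True]
```

in this model a peer that is merely slow keeps getting its count refilled,
while a peer that never completes a write runs the count down to zero.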

this is a hint that the io subsystem of your peer is either broken
or was very busy at that time.

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :


