/ 2006-02-19 14:21:03 -0500
\ Maurice Volaski:
> First posted on 2/7 with no reply. It hasn't happened since, but once
> is enough to mean something is wrong somewhere.....
>
>
> On the second go around, drbd successfully resynced all 5600 GB
> without crashing. =D>
>
> I ran a quick fsck to recover the journals from the earlier crash,
> and it worked OK for every resource but one. For that one, it just
> sat there. I looked at the log and saw numerous messages like so:
>
> Feb 6 15:18:32 [kernel] drbd6: [fsck.ext3/4455] sock_sendmsg time
> expired, ko = 4294967295

this message is triggered whenever the (larger) data packets get stuck,
i.e. the peer does not process writes "in time", but it still answers
the "drbd ping" packets on the other socket in time. otherwise we'd
declare the connection/peer dead and go into WFConnection.

each time this happens, we decrement the ko count (which defaults to
zero, that's why you see ((unsigned int)-1) above: 0 - 1 wraps around
to 4294967295). once that counter hits zero, we declare the
io-subsystem of the peer dead/the peer uncooperative, and go into
StandAlone. the count is reset as soon as the peer processes one
single write.

this is a hint that the io subsystem of your peer is either broken,
or was very busy at that time.

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
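
To make the ko-count behaviour described in the reply concrete, here is a minimal Python sketch of the counter logic. It is not DRBD code: the function name `simulate_ko`, the event names, and the "Connected" state label are hypothetical, and it assumes the counter is decremented before it is logged, which is what the "ko = 4294967295" line suggests for a ko count of zero.

```python
U32_MASK = 0xFFFFFFFF  # model the counter as an unsigned 32-bit int


def simulate_ko(ko_count, events):
    """Hypothetical model of the ko counter described in the reply.

    events is a sequence of "timeout" (a data send timed out while the
    peer still answered pings) or "write" (the peer processed a write).
    Returns (final_state, list of ko values that would have been logged).
    """
    ko = ko_count
    logged = []
    for event in events:
        if event == "write":
            ko = ko_count  # one processed write resets the counter
        elif event == "timeout":
            ko = (ko - 1) & U32_MASK  # 0 - 1 wraps to 4294967295
            logged.append(ko)
            if ko == 0:
                return "StandAlone", logged  # peer declared uncooperative
        else:
            raise ValueError("unknown event: %r" % (event,))
    return "Connected", logged


if __name__ == "__main__":
    # ko count of zero: the first timeout logs the wrapped value.
    print(simulate_ko(0, ["timeout"]))
    # ko count of 3 with three timeouts in a row and no write in between.
    print(simulate_ko(3, ["timeout", "timeout", "timeout"]))
```

In this model a ko count of zero never reaches zero in practice, because the first decrement wraps around to the largest unsigned value; a small non-zero ko count gives up after that many consecutive timeouts without an intervening write.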