/ 2006-10-25 14:28:46 +0200
\ Leroy van Logchem:
>
> >/ 2006-10-21 14:14:34 +0200
> >\ Sim:
> >
> >>Hi!
> >>
> >>Today I saw this, and the server locked up:
> >>
> >>Oct 21 11:28:51 mx kernel: drbd0: [kjournald/1476] sock_sendmsg time expired, ko = 4294967295
> >>
> >...
> >
> >>Oct 21 11:29:50 mx kernel: drbd0: [kjournald/1476] sock_sendmsg time expired, ko = 4294967283
> >>
> >
> >google for drbd ko-count ...
> >I assume your Secondary's IO subsystem got stuck.
> >
> >you should configure ko-count = 6 or so.
> >it would have started to count ".... ko = 6",
> >and at "ko = 0" it would have gone StandAlone,
> >recovering from this situation.
> >
> >you should also upgrade to a 2.6 kernel,
> >where this won't lock up the box but only the drbd partition.
> >2.4 has only _one_ runqueue for _all_ disks; if it gets stuck
> >on one disk, all io is stuck.
> >
> Same here. Running a 2.6 kernel without ko-count. We see about 400
> "sock_sendmsg time expired" messages per day. Is there a tuning
> method to reduce these?

get network equipment that can handle your traffic,
or alternatively disks that can cope with what your network delivers :)

maybe using a different io-scheduler helps, too (I still prefer deadline).

otherwise, the timeout parameter in drbd.conf is your tunable.
note that its unit is 0.1 seconds; the default is 60 (== 6 seconds).

maybe using a larger (or smaller) sndbuf-size makes a difference, too.

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
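
For reference, a minimal sketch of where the three tunables named in this thread (timeout, ko-count, sndbuf-size) would sit in drbd.conf. It assumes the net-section syntax of the DRBD release current at the time of this thread. The resource name r0 is hypothetical, and the sndbuf-size value is only a placeholder to experiment from, not a recommendation made in the thread. Check the drbd.conf man page of your installed version before using any of it.

```
resource r0 {              # hypothetical resource name
  net {
    timeout      60;       # unit is 0.1 seconds, so 60 == 6 seconds (the default cited above)
    ko-count     6;        # "6 or so", as suggested above; when the count reaches 0 the node goes StandAlone
    sndbuf-size  256k;     # placeholder value (assumption); try larger or smaller and compare
  }
}
```

Changing one parameter at a time and counting the "sock_sendmsg time expired" lines per day before and after is probably the simplest way to tell whether a setting helps.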