Note: "permalinks" may not be as permanent as we would like;
direct links from old sources may well be a few messages off.
/ 2005-09-28 16:01:23 +0930
\ Jonathan Soong:
> Hi guys,
>
> Seem to have hit a snag on a 2.6 FC2 machine running drbd-0.7.5-3

don't use 0.7.5; we are at 0.7.13, meanwhile.

> This pair had been running fine for a couple of months, then the Primary (Machine A) died, a manual failover
> worked and we're currently running on the Secondary (Machine B).
>
> Machine B is now running and in WFConnection. Whenever I try to sync Machine A back up with it I get a hard kernel
> panic:
> Sep 26 11:41:54 ipadca kernel: drbd2: Secondary/Unknown --> Secondary/Primary
> Sep 26 11:42:00 ipadca kernel: drbd1: [drbd1_worker/3807] sock_sendmsg time expired, ko = 4294967295
> Sep 26 11:42:03 ipadca kernel: drbd1: [drbd1_worker/3807] sock_sendmsg time expired, ko = 4294967294
> |

you have either hardware problems on the other disk, or your nic is broken.

> Sep 26 11:42:21 ipadca kernel: drbd2: [drbd2_receiver/3871] sock_sendmsg time expired, ko = 4294967289
> Sep 26 11:42:24 ipadca kernel: drbd1: [drbd1_worker/3807] sock_sendmsg time expired, ko = 4294967287
> Sep 26 11:42:24 ipadca kernel: drbd1: Got NegRSDReply. WE ARE LOST. We lost our up-to-date disk.
> Sep 26 11:42:24 ipadca kernel: Kernel panic: drbd1: Got NegRSDReply. WE ARE LOST. We lost our up-to-date disk.

that is an error message in plain english.
"Got NegRSDReply. WE ARE LOST. We lost our up-to-date disk."
what else shall we do when we lose our only up-to-date disk, but panic?

> I tried to "drbdadm invalidate all" on Machine A, but had no luck...
> I notice in the CHANGELOG for drbd 0.7.6 that "'drbdadm invalidate [res]' was not working correct", perhaps that
> is why I can't invalidate my Machine A.
> Is there some other way I can invalidate the data so the sync happens ok?
>
> Is it possible to upgrade my Machine A to 0.7.6 whilst leaving Machine
> B on 0.7.5 (They both use proto 74, but different api's)????

"api" is just how the local user space tools (drbdadm, drbdsetup) talk with the module.
"protocol" is what the modules talk on the wire.

> The big problem is that these machines are about 400km away in the middle of the Australian outback :)

sounds interesting :)

grep the kernel log for hard io failures on the lower level devices.
I guess some part of one of your disks is no longer readable.

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
__
please use the "List-Reply" function of your email client.