[DRBD-user] Kernel Panic

Lars Ellenberg Lars.Ellenberg at linbit.com
Wed Sep 28 12:51:01 CEST 2005

/ 2005-09-28 16:01:23 +0930
\ Jonathan Soong:
> Hi guys,
> 
> Seem to have hit a snag on a 2.6 FC2 machine running drbd-0.7.5-3

don't use 0.7.5; we are at 0.7.13 by now.

> This pair had been running fine for a couple of months, then the Primary (Machine A) died, a manual failover 
> worked and we're currently running on the Secondary (Machine B).
> 
> Machine B is now running and in WFConnection. Whenever I try to sync Machine A back up with it I get a hard kernel 
> panic:
> Sep 26 11:41:54 ipadca kernel: drbd2: Secondary/Unknown --> Secondary/Primary
> Sep 26 11:42:00 ipadca kernel: drbd1: [drbd1_worker/3807] sock_sendmsg time expired, ko = 4294967295
> Sep 26 11:42:03 ipadca kernel: drbd1: [drbd1_worker/3807] sock_sendmsg time expired, ko = 4294967294
>  |

you have either hardware problems on the other disk,
or your nic is broken.
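
An aside on the "ko" values in the quoted log: they sit just below 2^32,
which is what a small negative number looks like when printed as an
unsigned 32-bit integer. A minimal sketch of that reading, assuming (not
confirmed against the drbd source here) that the counter is a 32-bit value
printed unsigned; as_signed32 is a hypothetical helper, not part of drbd:

```python
def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


# "ko" values copied from the log lines quoted in this message.
ko_values = [4294967295, 4294967294, 4294967289, 4294967287]

# Assumption: the counter is 32 bits wide and was printed as unsigned.
signed = [as_signed32(v) for v in ko_values]

for raw, s in zip(ko_values, signed):
    print(f"ko = {raw} -> {s}")
```

Under that assumption the counter is simply counting down past zero while
the send timeouts keep piling up.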

> Sep 26 11:42:21 ipadca kernel: drbd2: [drbd2_receiver/3871] sock_sendmsg time expired, ko = 4294967289
> Sep 26 11:42:24 ipadca kernel: drbd1: [drbd1_worker/3807] sock_sendmsg time expired, ko = 4294967287
> Sep 26 11:42:24 ipadca kernel: drbd1: Got NegRSDReply. WE ARE LOST. We lost our up-to-date disk.
> Sep 26 11:42:24 ipadca kernel: Kernel panic: drbd1: Got NegRSDReply. WE ARE LOST. We lost our up-to-date disk.

that is an error message in plain English.
"Got NegRSDReply. WE ARE LOST. We lost our up-to-date disk."
what else shall we do when we lose our only up-to-date disk, but panic?

> I tried to "drbdadm invalidate all" on Machine A, but had no luck...
> I notice in the CHANGELOG for drbd 0.7.6 that "'drbdadm invalidate [res]' was not working correct", perhaps that 
> is why I can't invalidate my Machine A.
> Is there some other way I can invalidate the data so the sync happens ok?
> 
> Is it possible to upgrade my Machine A to 0.7.6 whilst leaving Machine
> B on 0.7.5 (They both use proto 74, but different api's)????

"api" is just how the local user space tools (drbdadm , drbdsetup) talk
with the module.

"protocol" is what the modules talk on the wire.

> The big problem is that these machines are about 400km away in the middle of the Australian outback :)

sounds interesting :)

grep the kernel log for hard I/O failures on the lower-level devices.
I guess some part of one of your disks is no longer readable.
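
If plain grep is not handy, the same search can be sketched in a few lines
of Python. This is only an illustration: the "I/O error" pattern is an
assumption about how your kernel words such failures, find_io_errors is a
hypothetical helper, and the sample lines are the ones quoted earlier in
this message.

```python
import re

# Assumed patterns; adjust them to what your kernel actually logs.
IO_ERROR_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),  # assumption: generic block-layer wording
    re.compile(r"NegRSDReply"),               # taken from the drbd message quoted above
]


def find_io_errors(lines):
    """Return the log lines that match any of the assumed patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(p.search(line) for p in IO_ERROR_PATTERNS)
    ]


# Sample input: two lines copied from the log excerpt in this thread.
sample = [
    "Sep 26 11:42:24 ipadca kernel: drbd1: [drbd1_worker/3807] "
    "sock_sendmsg time expired, ko = 4294967287",
    "Sep 26 11:42:24 ipadca kernel: drbd1: Got NegRSDReply. "
    "WE ARE LOST. We lost our up-to-date disk.",
]

hits = find_io_errors(sample)
for hit in hits:
    print(hit)
```

To scan a real log, pass an open file instead of the sample list, for
example find_io_errors(open("/var/log/messages")); that path is an
assumption about where your syslog writes kernel messages. Run it on the
node that holds the up-to-date disk, since that is where the read failed.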

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :
__
please use the "List-Reply" function of your email client.


