[Drbd-dev] DRBD-8: recent regression causing corruption and crashes

Lars Ellenberg Lars.Ellenberg at linbit.com
Fri Aug 11 20:45:59 CEST 2006


/ 2006-08-11 12:01:23 -0400
\ Graham, Simon:
> Quick update:
> 

How exactly do you "test"?
Kernel and hardware?
(sorry if you posted that earlier; just point me to it)

I triggered a full sync (drbdadm invalidate),
and while that was running, accessed the Primary (SyncSource)
(cp -av /somethinghuge/ /mnt/drbd-mount-point/)

> > 1. I get errors during initial synchronization of a volume like this
> > that cause the resync to be aborted:
> > 
> > drbd15: tl_verify: failed to find req e51a4da0, sector 0 in list

I don't see those here.

> DRBD, Cmd: WriteAck, BlkId: SYNCER Sector: 0, AckLen: 8000

I don't see these either.

> > 2. I get panics with the following signature:- these look like they are
> > happening when a local write
> >     on the primary (which this node is) completes.
> 
> The panic signature seems to change - for example, I just got one like
> this in the receiver thread:
> 
> drbd15: ASSERT( drbd_req_get_sector(i) == sector ) in /sandbox/sgraham/sn/trunk/platform/drbd/8.0/drbd/drbd_main.c:313
> drbd15: tl_verify: found req e63d0240 but it has wrong sector (8 versus 0)

nor these.
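
The two tl_verify messages quoted above describe two different failure
modes: the request an ack refers to is not in the transfer log at all, or
it is in the log but was recorded for a different sector. As a minimal
sketch, assuming the transfer log is just a singly linked list of requests
and the ack carries a pointer to its request (struct req, enum tl_result
and tl_verify_sketch are hypothetical names, not DRBD's actual code), the
check looks roughly like this:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified request; NOT DRBD's real request struct. */
struct req {
    unsigned long long sector;   /* start sector of the write */
    struct req *next;            /* next entry in the transfer log */
};

enum tl_result { TL_OK, TL_NOT_FOUND, TL_WRONG_SECTOR };

/* Check that `id` (the request an ack refers to) is still in the
 * transfer log `log` and was issued for `sector`. */
static enum tl_result tl_verify_sketch(struct req *log, struct req *id,
                                       unsigned long long sector)
{
    for (struct req *r = log; r != NULL; r = r->next) {
        if (r != id)
            continue;
        if (r->sector != sector) {
            printf("found req %p but it has wrong sector (%llu versus %llu)\n",
                   (void *)r, r->sector, sector);
            return TL_WRONG_SECTOR;
        }
        return TL_OK;
    }
    printf("failed to find req %p, sector %llu in list\n", (void *)id, sector);
    return TL_NOT_FOUND;
}
```

Either way, the ack and the transfer log disagree about which requests are
still outstanding.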

> drbd15: in tl_clear_barrier:374: ap_pending_cnt = -1 < 0 !

this is bad...
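
A negative ap_pending_cnt means the counter was decremented more often than
it was incremented, for instance a write that was released once when its
ack arrived and once more when the connection was torn down. The message
has the shape "in <function>:<line>: <counter> = <value> < 0 !". A minimal
sketch of such a counter and check, assuming a plain int instead of an
atomic counter (struct dev_sketch, WARN_IF_NEGATIVE, sketch_inc_pending and
sketch_dec_pending are hypothetical names, not the driver's macros):

```c
#include <stdio.h>

/* Hypothetical device state; a kernel driver would use atomic
 * counters, this sketch uses a plain int for clarity. */
struct dev_sketch {
    int ap_pending_cnt;   /* writes sent to the peer, ack not yet seen */
};

/* Print a warning in the "in func:line: name = value < 0 !" shape. */
#define WARN_IF_NEGATIVE(dev, cnt)                                   \
    do {                                                             \
        if ((dev)->cnt < 0)                                          \
            fprintf(stderr, "in %s:%d: " #cnt " = %d < 0 !\n",       \
                    __func__, __LINE__, (dev)->cnt);                 \
    } while (0)

/* Increment when a write is handed to the peer. */
static void sketch_inc_pending(struct dev_sketch *dev)
{
    dev->ap_pending_cnt++;
}

/* Decrement when the write is acked (or given up on at disconnect).
 * A macro, not a function, so __func__/__LINE__ name the caller. */
#define sketch_dec_pending(dev)                                      \
    do {                                                             \
        (dev)->ap_pending_cnt--;                                     \
        WARN_IF_NEGATIVE(dev, ap_pending_cnt);                       \
    } while (0)
```

If the real check works the same way, the line above points at
tl_clear_barrier as the caller that decremented once too often.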

What I do see here is "ap_pending > 0" still showing up too often when I
disconnect during a resync with concurrent write activity, which
effectively blocks the Primary's I/O subsystem. Seemingly we still have
bugs in tl_clear :(
I need to look into that further.
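
The "ap_pending > 0" symptom is the mirror image of the negative count:
after a disconnect no more acks will arrive, so whatever cleans up the
transfer log has to release every request that is still pending, exactly
once. Miss one and the counter never drains, so anything waiting for it to
reach zero blocks; release one twice (once on the ack, once on the clear)
and you get the "< 0" message above. A minimal sketch of one way to make
this idempotent, assuming a per-request "still pending" flag (struct
tl_req, struct tl_dev, mark_pending, put_pending and tl_clear_sketch are
hypothetical; this is not DRBD 8's tl_clear):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request in the transfer log (not DRBD's struct). */
struct tl_req {
    bool ap_pending;        /* sent to peer, ack not yet received */
    struct tl_req *next;
};

struct tl_dev {
    int ap_pending_cnt;     /* number of requests with ap_pending set */
    struct tl_req *log;     /* head of the transfer log */
};

/* Called when a write is sent to the peer. */
static void mark_pending(struct tl_dev *dev, struct tl_req *r)
{
    r->ap_pending = true;
    dev->ap_pending_cnt++;
}

/* Drop the pending reference at most once per request.  Both the ack
 * path and the clear path go through here, so a late ack after a clear
 * (or a clear after an ack) cannot decrement the counter twice. */
static bool put_pending(struct tl_dev *dev, struct tl_req *r)
{
    if (!r->ap_pending)
        return false;
    r->ap_pending = false;
    dev->ap_pending_cnt--;
    return true;
}

/* On disconnect: no acks will come for anything still in the log, so
 * every pending request is released here; skipping one would leave
 * ap_pending_cnt > 0 forever. */
static void tl_clear_sketch(struct tl_dev *dev)
{
    for (struct tl_req *r = dev->log; r != NULL; r = r->next)
        put_pending(dev, r);
    dev->log = NULL;  /* request memory is owned elsewhere in this sketch */
}
```

The flag turns "release" into a test-and-clear, so the ack path and the
clear path can race in either order without skewing the counter.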

> Code:  Bad EIP value.
>  <0>Fatal exception: panic in 5 seconds

ouch.

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :

