[Drbd-dev] Primary/Diskless node cannot reconnect

Graham, Simon Simon.Graham at stratus.com
Tue Nov 3 13:40:44 CET 2009


> 
> No.
> The correct fix for your problem probably is not only this,
> but some addition to the "exposed data uuid" stuff as well.
> 
> Because it is Primary, there may be cached pages,
> file system and applications usually have a rough idea
> what data they expect to live where.
> 
> What this is supposed to do is avoid a timewarp into stale data,
> if you lose network first, hum along for hours,
> and then lose the disk as well.
> 
> Or vice versa.
> 
> You are then only allowed to attach or connect to the
> data you had last access to, not to the other set,
> as the other set would mean a time warp into stale data.
> 

Good point -- if you lose the network first then I agree. However, if
you lose the primary side disk first then I don't think you can hit this
'time warp'.

My first thought when looking at this was to NOT attempt to update the
current UUID on the Primary if it is diskless when you lose the
connection. However, this doesn't work in the specific case that caused
us to see this problem: there we had a DRBD device sitting on
a physical disk which had actually gone bad, but we didn't see this
until we tried to write the meta-data with the updated UUID when we lost
the network connection...

Maybe we just need to back out the UUID update if you can't flush it to
disk...
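
A minimal sketch of that "back out on failed flush" idea, purely for
illustration. Everything here is hypothetical: struct sketch_device,
sketch_md_flush() and sketch_new_current_uuid() are made-up names and
are not DRBD's real structures or functions; the failing disk is
simulated with a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified device state (not DRBD's real one). */
struct sketch_device {
    uint64_t current_uuid;  /* in-memory current UUID */
    uint64_t on_disk_uuid;  /* what the meta-data on disk holds */
    bool     disk_ok;       /* simulates whether meta-data writes succeed */
};

/* Stand-in for a meta-data flush: 0 on success, -1 on I/O error. */
static int sketch_md_flush(struct sketch_device *dev)
{
    if (!dev->disk_ok)
        return -1;
    dev->on_disk_uuid = dev->current_uuid;
    return 0;
}

/*
 * Install new_uuid as the current UUID, but restore the previous value
 * if it cannot be flushed to disk. Returns true only if the new UUID
 * was persisted.
 */
static bool sketch_new_current_uuid(struct sketch_device *dev,
                                    uint64_t new_uuid)
{
    assert(dev != NULL);

    uint64_t saved = dev->current_uuid;

    dev->current_uuid = new_uuid;
    if (sketch_md_flush(dev) != 0) {
        /* Flush failed: back out so memory matches what is on disk. */
        dev->current_uuid = saved;
        return false;
    }
    return true;
}
```

With disk_ok == false the call returns false and current_uuid keeps its
old value, so the Primary would still present the UUID the peer knows
about when it tries to reconnect.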

Simon

