[DRBD-user] drbd 8.0pre3 lost pingack

Tue Aug 15 22:59:56 CEST 2006

Hello All,

I'm experiencing a problem with drbd-8.0pre3 under heavy load. I have a
two-primary cluster that runs gfs1 on top of drbd. When I copy several
gigabytes (typically spread across many small files) to or from the shared
partitions, the following sequence shows up in the kernel log at random:

drbd1: PingAck did not arrive in time.
drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
drbd1: Creating new current UUID
drbd1: asender terminated
drbd1: conn( NetworkFailure -> BrokenPipe )
drbd1: short read expecting header on sock: r=-512
drbd1: worker terminated
drbd1: conn( BrokenPipe -> Unconnected )
drbd1: State change failed: Refusing to be Primary without at least one consistent disk
drbd1:   state = { cs:Unconnected st:Primary/Unknown ds:UpToDate/DUnknown r--- }
drbd1:  wanted = { cs:Unconnected st:Primary/Unknown ds:Outdated/DUnknown r--- }
drbd1: outdate-peer helper broken, returned 0
drbd1: Writing meta data super block now.
drbd1: Connection lost.
drbd1: State change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
drbd1:  old = { cs:Unconnected st:Primary/Unknown ds:UpToDate/DUnknown r--- }
drbd1:  new = { cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown r--- }
drbd1: conn( Unconnected -> WFConnection )
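
To make the sequence above easier to follow, here is a minimal Python sketch
that pulls the state transitions out of log lines in the format shown. The
function name parse_transitions and both regular expressions are only an
illustration and not part of DRBD; the sketch assumes the lines look like the
excerpt above.

```python
import re

# Assumed format: "drbdN: " followed by one or more "field( Old -> New )" groups.
DEVICE = re.compile(r"\b(drbd\d+): ")
TRANSITION = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")


def parse_transitions(line):
    """Return (device, [(field, old, new), ...]) or None if the line has no transition."""
    match = DEVICE.search(line)
    if match is None:
        return None
    changes = TRANSITION.findall(line[match.end():])
    if not changes:
        return None
    return match.group(1), changes


# Lines copied from the log excerpt above.
sample = [
    "drbd1: PingAck did not arrive in time.",
    "drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )",
    "drbd1: conn( NetworkFailure -> BrokenPipe )",
    "drbd1: conn( BrokenPipe -> Unconnected )",
    "drbd1: conn( Unconnected -> WFConnection )",
]

for line in sample:
    parsed = parse_transitions(line)
    if parsed is not None:
        device, changes = parsed
        for field, old, new in changes:
            print(f"{device} {field}: {old} -> {new}")
```

Run over the full kernel log, this gives a compact list of the connection and
disk state changes that follow each missed PingAck.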

Does anyone have an idea why this PingAck gets lost? The nodes are
connected directly via a gigabit Ethernet crossover link, so there are no
buggy switches or routers in between.

Any help would be highly appreciated.

Regards,
Balint