Hello All,
I'm experiencing a problem with drbd-8.0pre3 under heavy load. I have a
two-primary cluster that uses GFS1 over DRBD. When I copy several
gigabytes (typically split across many small files) to or from the shared
partitions, I randomly run into this issue in the kernel log:
drbd1: PingAck did not arrive in time.
drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
drbd1: Creating new current UUID
drbd1: asender terminated
drbd1: conn( NetworkFailure -> BrokenPipe )
drbd1: short read expecting header on sock: r=-512
drbd1: worker terminated
drbd1: conn( BrokenPipe -> Unconnected )
drbd1: State change failed: Refusing to be Primary without at least one consistent disk
drbd1: state = { cs:Unconnected st:Primary/Unknown ds:UpToDate/DUnknown r--- }
drbd1: wanted = { cs:Unconnected st:Primary/Unknown ds:Outdated/DUnknown r--- }
drbd1: outdate-peer helper broken, returned 0
drbd1: Writing meta data super block now.
drbd1: Connection lost.
drbd1: State change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
drbd1: old = { cs:Unconnected st:Primary/Unknown ds:UpToDate/DUnknown r--- }
drbd1: new = { cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown r--- }
drbd1: conn( Unconnected -> WFConnection )
Does anyone have an idea why this PingAck gets lost? The nodes are
interconnected via a gigabit Ethernet crossover link, so there are no buggy
switches or routers in between.
Any help would be highly appreciated.
Regards,
Balint