Hello All,

I'm experiencing a problem with drbd-8.0pre3 under massive load. I have a
two-primary cluster, which uses gfs1 over drbd. When I copy several
gigabytes (typically split into many small files) to or from the shared
partitions, I randomly run into this issue in the kernel log:

  drbd1: PingAck did not arrive in time.
  drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
  drbd1: Creating new current UUID
  drbd1: asender terminated
  drbd1: conn( NetworkFailure -> BrokenPipe )
  drbd1: short read expecting header on sock: r=-512
  drbd1: worker terminated
  drbd1: conn( BrokenPipe -> Unconnected )
  drbd1: State change failed: Refusing to be Primary without at least one consistent disk
  drbd1: state = { cs:Unconnected st:Primary/Unknown ds:UpToDate/DUnknown r--- }
  drbd1: wanted = { cs:Unconnected st:Primary/Unknown ds:Outdated/DUnknown r--- }
  drbd1: outdate-peer helper broken, returned 0
  drbd1: Writing meta data super block now.
  drbd1: Connection lost.
  drbd1: State change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
  drbd1: old = { cs:Unconnected st:Primary/Unknown ds:UpToDate/DUnknown r--- }
  drbd1: new = { cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown r--- }
  drbd1: conn( Unconnected -> WFConnection )

Does anyone have any idea why this PingAck gets lost? The nodes are
interconnected via a gigabit cross-link Ethernet connection, so there are
no buggy switches or routers in between.

Any help would be highly appreciated.

Regards,
Balint