[DRBD-user] pingAck failed - help to avoid it?

Theophanis Kontogiannis theophanis_kontogiannis at yahoo.gr
Mon Mar 16 00:48:12 CET 2009

Hello all

I have a two-node cluster on CentOS 5.2 (kernel
2.6.18-92.1.22.el5.centos.plus) running drbd-8.3.0-3 and
drbd-km-2.6.18_92.1.22.el5.centos.plus-8.3.0-3, both of which I compiled
and installed as RPMs myself.

Although I have two Gigabit Ethernet NICs connected back-to-back for DRBD
and cluster traffic, from time to time, especially during heavy traffic on
the nodes' public Gigabit Ethernet interfaces, I get the following:

drbd0: PingAck did not arrive in time.
drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
drbd0: asender terminated
drbd0: Terminating asender thread
drbd0: short read expecting header on sock: r=-512
drbd0: Creating new current UUID
drbd0: Connection closed
drbd0: helper command: /sbin/drbdadm fence-peer minor-0
drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 2 (0x200)
drbd0: fence-peer helper broken, returned 2
drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
drbd0:  old = { cs:NetworkFailure ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
drbd0:  new = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
drbd0: conn( NetworkFailure -> Unconnected )
drbd0: receiver terminated
drbd0: Restarting receiver thread
drbd0: receiver (re)started
drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
drbd0:  old = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
drbd0:  new = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
drbd0: conn( Unconnected -> WFConnection )

drbd1: PingAck did not arrive in time.
drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
drbd1: asender terminated
drbd1: Terminating asender thread
drbd1: short read expecting header on sock: r=-512
drbd1: Creating new current UUID
drbd1: Connection closed
drbd1: helper command: /sbin/drbdadm fence-peer minor-1
drbd1: helper command: /sbin/drbdadm fence-peer minor-1 exit code 2 (0x200)
drbd1: fence-peer helper broken, returned 2
drbd1: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
drbd1:  old = { cs:NetworkFailure ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
drbd1:  new = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
drbd1: conn( NetworkFailure -> Unconnected )
drbd1: receiver terminated
drbd1: Restarting receiver thread
drbd1: receiver (re)started
drbd1: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
drbd1:  old = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
drbd1:  new = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
drbd1: conn( Unconnected -> WFConnection )
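
For reading logs like the one above, the "old = { ... }" / "new = { ... }" state dumps can be split into their fields and compared. The sketch below is a minimal Python parser, assuming the exact line layout shown in this log; the field names cs, ro and ds are taken as-is from the log (read here as connection state, local/peer roles and local/peer disk states), the trailing token such as "s---" is kept as an opaque flag string, and the helper names parse_state and diff_states are hypothetical.

```python
import re

# Matches a DRBD state dump as it appears in this log, e.g.
# "drbd0:  old = { cs:NetworkFailure ro:Primary/Unknown ds:UpToDate/DUnknown s--- }"
STATE_RE = re.compile(r"^(?P<dev>drbd\d+):\s+(?P<which>old|new) = \{ (?P<body>[^}]*)\}")


def parse_state(line):
    """Return (device, 'old' or 'new', fields) or None if the line is not a state dump."""
    m = STATE_RE.match(line)
    if m is None:
        return None
    fields = {}
    for token in m.group("body").split():
        key, sep, value = token.partition(":")
        if sep:
            fields[key] = value      # key:value pairs such as cs, ro, ds
        else:
            fields["flags"] = token  # bare flag string such as "s---"
    return m.group("dev"), m.group("which"), fields


def diff_states(old, new):
    """Return {field: (old_value, new_value)} for every field that changed."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


if __name__ == "__main__":
    _, _, old = parse_state(
        "drbd0:  old = { cs:NetworkFailure ro:Primary/Unknown ds:UpToDate/DUnknown s--- }")
    _, _, new = parse_state(
        "drbd0:  new = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }")
    print(diff_states(old, new))  # {'cs': ('NetworkFailure', 'Unconnected')}
```

Applied to each old/new pair in this log, only the connection state changes; the local role stays Primary and the peer disk state stays DUnknown, which is consistent with the repeated "Refusing to be Primary while peer is not outdated" message.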

Fencing itself does work: the node that failed to send the PingAck gets
fenced (and rebooted).
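
The two fence-peer lines in the log report one result in two forms: "exit code 2 (0x200)" shows the decoded exit code next to what appears to be the raw wait status, whose high byte carries the exit code, and "returned 2" repeats the decoded value. DRBD presumably calls the helper "broken" because 2 is not one of the results it expects from a fence-peer handler. The Python sketch below shows that decoding. The table of expected results (exit codes 3 to 7) is an assumption based on the fence-peer handler convention described in the DRBD documentation and should be checked against the drbd.conf man page for the installed version; the names FENCE_PEER_RESULTS, decode_wait_status and explain_fence_peer are hypothetical.

```python
# ASSUMPTION: exit codes a DRBD fence-peer handler is expected to return,
# based on the documented handler convention; verify for your DRBD version.
FENCE_PEER_RESULTS = {
    3: "peer disk is inconsistent",
    4: "peer was outdated",
    5: "peer is unreachable",
    6: "peer refused to be outdated (it is primary)",
    7: "peer was fenced off the cluster",
}


def decode_wait_status(status):
    """Return the exit code from a raw wait() status word, or None if a signal ended the process."""
    if status & 0x7F:  # non-zero low 7 bits: terminated by a signal, no exit code
        return None
    return (status >> 8) & 0xFF  # the exit code lives in the second byte


def explain_fence_peer(status):
    code = decode_wait_status(status)
    if code is None:
        return "helper was killed by a signal"
    return FENCE_PEER_RESULTS.get(
        code, f"unexpected exit code {code}: treated as a broken helper")


if __name__ == "__main__":
    print(decode_wait_status(0x200))  # 2
    print(explain_fence_peer(0x200))  # unexpected exit code 2: treated as a broken helper
```

In other words, the cluster may well be fencing the peer successfully by other means, while from DRBD's point of view the handler's answer still does not confirm that the peer was outdated.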

However, does anyone have an idea why this happens, given that DRBD has its
own private link?

The machines are AMD X2 2GHz with 4GB Ram each.

Also, I have not been able to find in the man pages or the online
tutorial/manual which parameters would let me fine-tune this behavior, so I
would appreciate some help with that as well.

Thank you all for your time.

Theophanis Kontogiannis
