Hello all,

I have a two-node cluster on CentOS 5.2 (kernel 2.6.18-92.1.22.el5.centos.plus) running drbd-8.3.0-3 and drbd-km-2.6.18_92.1.22.el5.centos.plus-8.3.0-3, which I compiled into RPMs and installed myself.

Although I have two Gigabit Ethernet NICs connected back-to-back for DRBD and cluster traffic, I occasionally get the following, especially during heavy traffic on the public Gigabit Ethernet interfaces of the cluster nodes:

drbd0: PingAck did not arrive in time.
drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
drbd0: asender terminated
drbd0: Terminating asender thread
drbd0: short read expecting header on sock: r=-512
drbd0: Creating new current UUID
drbd0: Connection closed
drbd0: helper command: /sbin/drbdadm fence-peer minor-0
drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 2 (0x200)
drbd0: fence-peer helper broken, returned 2
drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
drbd0: old = { cs:NetworkFailure ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
drbd0: new = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
drbd0: conn( NetworkFailure -> Unconnected )
drbd0: receiver terminated
drbd0: Restarting receiver thread
drbd0: receiver (re)started
drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
drbd0: old = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
drbd0: new = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
drbd0: conn( Unconnected -> WFConnection )

drbd1: PingAck did not arrive in time.
drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
drbd1: asender terminated
drbd1: Terminating asender thread
drbd1: short read expecting header on sock: r=-512
drbd1: Creating new current UUID
drbd1: Connection closed
drbd1: helper command: /sbin/drbdadm fence-peer minor-1
drbd1: helper command: /sbin/drbdadm fence-peer minor-1 exit code 2 (0x200)
drbd1: fence-peer helper broken, returned 2
drbd1: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
drbd1: old = { cs:NetworkFailure ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
drbd1: new = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
drbd1: conn( NetworkFailure -> Unconnected )
drbd1: receiver terminated
drbd1: Restarting receiver thread
drbd1: receiver (re)started
drbd1: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
drbd1: old = { cs:Unconnected ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
drbd1: new = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown s--- }
drbd1: conn( Unconnected -> WFConnection )

Fencing itself works: the node that failed to send the PingAck gets fenced (and rebooted). However, does anyone have an idea why this happens at all, given that DRBD runs over a dedicated private link? The machines are AMD X2 2 GHz with 4 GB RAM each.

Also, I have not been able to find, in the man pages or the online manual, which parameters would let me fine-tune this behavior, so I would appreciate some help with that too.

Thank you all for your time.

Theophanis Kontogiannis
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20090316/feab0149/attachment.htm>
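A small log-reading aid for the "exit code 2 (0x200)" lines above. The hex value in those lines appears to be the raw wait(2)-style status of the helper, with the exit code in bits 8 to 15, so 0x200 means exit code 2. The exit-code table below is an assumption based on my reading of how DRBD 8.3 interprets fence-peer results (codes 3 to 7; anything else is logged as "fence-peer helper broken"). Please check it against the documentation for your DRBD version. `decode_helper_line`, `HELPER_RE` and `FENCE_PEER_EXIT_CODES` are hypothetical names and are not part of drbd-utils.

```python
import re

# ASSUMPTION: fence-peer exit codes as I understand DRBD 8.3 interprets
# them. Any other value makes the kernel log
# "fence-peer helper broken, returned N".
FENCE_PEER_EXIT_CODES = {
    3: "peer is inconsistent or worse",
    4: "peer was fenced (outdated)",
    5: "peer is unreachable, assumed to be dead",
    6: "peer is active (primary); outdating myself",
    7: "peer was stonithed",
}

# Matches fence-peer result lines such as:
# drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 2 (0x200)
HELPER_RE = re.compile(
    r"(?P<dev>drbd\d+): helper command: (?P<cmd>\S+ fence-peer \S+) "
    r"exit code \d+ \(0x(?P<raw>[0-9a-fA-F]+)\)"
)


def decode_helper_line(line):
    """Return (device, command, exit_code, meaning), or None if no match.

    Hypothetical helper for reading DRBD kernel logs; not part of drbd-utils.
    """
    m = HELPER_RE.search(line)
    if m is None:
        return None
    raw = int(m.group("raw"), 16)
    # The hex value is a wait(2)-style status word. The exit code
    # lives in bits 8..15, so 0x200 decodes to exit code 2.
    exit_code = (raw >> 8) & 0xFF
    meaning = FENCE_PEER_EXIT_CODES.get(
        exit_code, "unrecognised code: DRBD treats the helper as broken"
    )
    return m.group("dev"), m.group("cmd"), exit_code, meaning


if __name__ == "__main__":
    sample = ("drbd0: helper command: /sbin/drbdadm fence-peer "
              "minor-0 exit code 2 (0x200)")
    print(decode_helper_line(sample))
```

When run against the log above, both fence-peer result lines decode to exit code 2. That value falls outside 3 to 7, which is consistent with DRBD reporting "fence-peer helper broken, returned 2" even though the peer did get fenced. This suggests the handler script's exit status, rather than the fencing itself, is worth checking.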