CentOS 5.5, x86_64, drbd 8.3.8.

After running fine for a month, this started a couple of days ago on only one of three drbd volumes (the other two being fine):

Jan 8 09:54:53 tiger kernel: block drbd11: Began resync as SyncSource (will sync 1882624060 KB [470656015 bits set]).
Jan 8 09:54:57 tiger kernel: block drbd11: sock_recvmsg returned -104
Jan 8 09:54:57 tiger kernel: block drbd11: peer( Secondary -> Unknown ) conn( SyncSource -> NetworkFailure )
Jan 8 09:54:57 tiger kernel: block drbd11: asender terminated
Jan 8 09:54:57 tiger kernel: block drbd11: Terminating asender thread
Jan 8 09:54:57 tiger kernel: block drbd11: sock was shut down by peer
Jan 8 09:54:57 tiger kernel: block drbd11: short read expecting header on sock: r=0
Jan 8 09:54:57 tiger kernel: block drbd11: drbd_send_block/ack() failed
Jan 8 09:54:57 tiger kernel: block drbd11: Connection closed
Jan 8 09:54:57 tiger kernel: block drbd11: conn( NetworkFailure -> Unconnected )
Jan 8 09:54:57 tiger kernel: block drbd11: receiver terminated
Jan 8 09:54:57 tiger kernel: block drbd11: Restarting receiver thread
Jan 8 09:54:57 tiger kernel: block drbd11: receiver (re)started
Jan 8 09:54:57 tiger kernel: block drbd11: conn( Unconnected -> WFConnection )
Jan 8 09:54:57 tiger kernel: block drbd11: Handshake successful: Agreed network protocol version 94
Jan 8 09:54:57 tiger kernel: block drbd11: Peer authenticated using 20 bytes of 'sha1' HMAC
Jan 8 09:54:57 tiger kernel: block drbd11: conn( WFConnection -> WFReportParams )
Jan 8 09:54:57 tiger kernel: block drbd11: Starting asender thread (from drbd11_receiver [4148])
Jan 8 09:54:57 tiger kernel: block drbd11: data-integrity-alg: sha1
Jan 8 09:54:57 tiger kernel: block drbd11: drbd_sync_handshake:
Jan 8 09:54:57 tiger kernel: block drbd11: self 59D4BF73D5EC82B7:0CC8A25C011E7E57:1FF443C818621FD3:D319C97BC05715A4 bits:470570058 flags:0
Jan 8 09:54:57 tiger kernel: block drbd11: peer 0CC8A25C011E7E56:0000000000000000:3AEE8F0D28607364:01DF314E083700DD bits:470569999 flags:0
Jan 8 09:54:57 tiger kernel: block drbd11: uuid_compare()=1 by rule 70
Jan 8 09:54:57 tiger kernel: block drbd11: Becoming sync source due to disk states.
Jan 8 09:54:57 tiger kernel: block drbd11: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
Jan 8 09:54:58 tiger kernel: block drbd11: conn( WFBitMapS -> SyncSource )

which repeats itself continuously. The resync gets to 0.1% and then starts over.

The replication link is a dual bonded GbE point-to-point pair, and is in full operating condition. All other hardware is fine, and has been all along. Nothing has been changed.

Can anyone give me a clue as to why this is happening?

Steve