On 3/4/11 12:38 PM, William Seligman wrote:
> I've RTFM'ed and google'd on this problem. Now I ask the experts.
Now that I've joined this list, I looked at the archives directly. I see that
Cory Coager reported the same problem:
http://lists.linbit.com/pipermail/drbd-user/2011-March/015735.html
Lars Ellenberg suggested that the problem was due to a bad NIC. Maybe... but
what are the odds that two different systems have a bad NIC?
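One way to test the bad-NIC theory rather than guess at the odds is to watch the kernel's per-interface error counters on both hosts. The following is a minimal sketch, assuming Python 3 is available and that the kernel exposes counters under /sys/class/net/<iface>/statistics/; the counter list and the helper names are illustrative choices, not anything DRBD provides.

```python
import os

# Counters that commonly point at a flaky NIC, cable, or driver.
# Assumption: these files exist under <base>/<iface>/statistics/.
COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped",
            "rx_crc_errors", "collisions")


def read_nic_counters(iface, base="/sys/class/net"):
    """Return {counter_name: value} for one interface, skipping missing files."""
    stats_dir = os.path.join(base, iface, "statistics")
    result = {}
    for name in COUNTERS:
        try:
            with open(os.path.join(stats_dir, name)) as fh:
                result[name] = int(fh.read().strip())
        except (OSError, ValueError):
            # Counter not exposed by this driver/kernel, or unreadable.
            continue
    return result


def counter_deltas(before, after):
    """Return only the counters that increased between two snapshots."""
    return {name: after[name] - before[name]
            for name in after
            if name in before and after[name] > before[name]}
```

Taking one snapshot with read_nic_counters("eth1") (the interface name is hypothetical), waiting through a few of the reconnect cycles, taking a second snapshot, and passing both to counter_deltas() shows whether any error counter moved. Counters that stay flat on both hosts would argue against a hardware fault, though they would not rule it out.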
> Setup: Two systems; hypatia is primary, orestes is secondary. OS is Scientific
> Linux 5.5: kernel 2.6.18-194.26.1.el5xen; DRBD version drbd-8.3.8.1-30.el5.
>
> Each has two partitions that are used for separate DRBD devices: /dev/md0
> (software RAID1) and /dev/sdd2. On both systems:
>
> partition /dev/md0 => device drbd1
> partition /dev/sdd2 => device drbd2
>
> The DRBD traffic goes over a single Ethernet cable that connects the two systems.
>
> For drbd1, the control hierarchy is Corosync->DRBD->LVM->Xen.
> For drbd2, the control is Corosync->DRBD->Just mount the thing.
>
> The complicated one is drbd1, but it seems to work just fine. The problem
> appears to be with drbd2, which doesn't do much of anything; it's a work/backup
> directory which I use to take infrequent (~two months) snapshots of the virtual
> machines on drbd1.
>
> Every ten seconds, the error messages at the end of this post appear in the log
> of the primary system and there are similar lines on the secondary system. It
> seems that drbd2 is losing its connection, re-establishing, and doing a re-sync.
>
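The "every ten seconds" observation can be checked against the raw syslog file instead of by eye. The sketch below assumes Python 3, one log entry per line (as in /var/log/messages, not as wrapped in this email), and the message format shown in the log excerpt at the end of this post. Syslog timestamps carry no year, so the year parameter is an assumption the caller supplies.

```python
import re
from datetime import datetime

# Matches the kernel line DRBD logs when a device leaves the Connected state.
DROP_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"block\s+(?P<dev>drbd\d+):.*conn\(\s*Connected\s*->\s*NetworkFailure\s*\)"
)


def drop_times(lines, year=2011):
    """Map each DRBD device to the datetimes at which its connection dropped."""
    drops = {}
    for line in lines:
        match = DROP_RE.search(line)
        if not match:
            continue
        # Collapse the padding syslog uses for single-digit days ("Mar  4").
        stamp = " ".join(match.group("ts").split())
        when = datetime.strptime("%d %s" % (year, stamp), "%Y %b %d %H:%M:%S")
        drops.setdefault(match.group("dev"), []).append(when)
    return drops


def gaps_in_seconds(times):
    """Return the intervals, in seconds, between consecutive drops."""
    ordered = sorted(times)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
```

Feeding the open log file to drop_times() and then passing the drbd2 list to gaps_in_seconds() gives the spacing between disconnects. A strictly regular interval would suggest a timer-driven cause, while irregular gaps would look more like a noisy link.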
> Everything works, most of the time. But once every few weeks there's enough of a
> delay that Corosync takes notice and STONITHs one of the systems, which is a big
> pain.
>
> I've tried:
> - switching from Protocol C to Protocol A
> - setting "net {ping-timeout 100;}"
> - throttling the connection by "syncer {rate 10M;}" (used to be 100M)
>
> Any ideas?
>
> Mar 4 12:26:25 hypatia kernel: block drbd2: meta connection shut down by peer.
> Mar 4 12:26:25 hypatia kernel: block drbd2: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Mar 4 12:26:25 hypatia kernel: block drbd2: asender terminated
> Mar 4 12:26:25 hypatia kernel: block drbd2: Terminating asender thread
> Mar 4 12:26:25 hypatia kernel: block drbd2: sock was shut down by peer
> Mar 4 12:26:25 hypatia kernel: block drbd2: short read expecting header on sock: r=0
> Mar 4 12:26:25 hypatia kernel: block drbd2: Creating new current UUID
> Mar 4 12:26:25 hypatia kernel: block drbd2: Connection closed
> Mar 4 12:26:25 hypatia kernel: block drbd2: conn( NetworkFailure -> Unconnected )
> Mar 4 12:26:25 hypatia kernel: block drbd2: receiver terminated
> Mar 4 12:26:25 hypatia kernel: block drbd2: Restarting receiver thread
> Mar 4 12:26:25 hypatia kernel: block drbd2: receiver (re)started
> Mar 4 12:26:25 hypatia kernel: block drbd2: conn( Unconnected -> WFConnection )
> Mar 4 12:26:25 hypatia kernel: block drbd2: Handshake successful: Agreed network protocol version 94
> Mar 4 12:26:25 hypatia kernel: block drbd2: conn( WFConnection -> WFReportParams )
> Mar 4 12:26:25 hypatia kernel: block drbd2: Starting asender thread (from drbd2_receiver [7920])
> Mar 4 12:26:25 hypatia kernel: block drbd2: data-integrity-alg: <not-used>
> Mar 4 12:26:25 hypatia kernel: block drbd2: drbd_sync_handshake:
> Mar 4 12:26:25 hypatia kernel: block drbd2: self C4884637D2C418DF:922772A0478F5E1F:2DE51139CD7C3DF7:EB27F748FC21DC65 bits:0 flags:0
> Mar 4 12:26:25 hypatia kernel: block drbd2: peer 922772A0478F5E1E:0000000000000000:2DE51139CD7C3DF6:EB27F748FC21DC65 bits:0 flags:0
> Mar 4 12:26:25 hypatia kernel: block drbd2: uuid_compare()=1 by rule 70
> Mar 4 12:26:25 hypatia kernel: block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
> Mar 4 12:26:25 hypatia kernel: block drbd2: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
> Mar 4 12:26:25 hypatia kernel: block drbd2: Began resync as SyncSource (will sync 0 KB [0 bits set]).
> Mar 4 12:26:25 hypatia kernel: block drbd2: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
> Mar 4 12:26:25 hypatia kernel: block drbd2: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
>
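The "self" and "peer" lines in the handshake can be read as four colon-separated UUID slots each. A plausible reading, not confirmed here against the DRBD source, is that the slots are current, bitmap, and two history entries, and that the lowest bit is a flag ignored during comparison. Under those assumptions, the sketch below (Python 3) checks which slots line up between the two nodes.

```python
def parse_uuids(field):
    """Split 'A:B:C:D' from a handshake line into four integers."""
    return [int(part, 16) for part in field.split(":")]


# Assumption: bit 0 of each UUID is a flag and is masked off before comparing.
LOW_BIT = 1


def same_generation(a, b):
    """Compare two UUIDs while ignoring the lowest bit."""
    return (a & ~LOW_BIT) == (b & ~LOW_BIT)


# Values copied from the "self" and "peer" log lines on hypatia.
self_uuids = parse_uuids(
    "C4884637D2C418DF:922772A0478F5E1F:2DE51139CD7C3DF7:EB27F748FC21DC65")
peer_uuids = parse_uuids(
    "922772A0478F5E1E:0000000000000000:2DE51139CD7C3DF6:EB27F748FC21DC65")

# Assumed slot order: index 0 = current, index 1 = bitmap.
current_matches = same_generation(self_uuids[0], peer_uuids[0])
bitmap_matches_peer_current = same_generation(self_uuids[1], peer_uuids[0])

print("current UUIDs match:", current_matches)
print("self bitmap UUID matches peer current UUID:", bitmap_matches_peer_current)
```

If that reading is right, the primary's bitmap slot lining up with the secondary's current slot fits the "Creating new current UUID" message logged when the connection dropped, and the "will sync 0 KB" line would mean no data actually diverged during the outage. That points at the link dropping rather than at the data on either side.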
--
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman@nevis.columbia.edu
PO Box 137 |
Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/