[DRBD-user] Sync Error DRBD Utils 8.9.2rc1 / DRBD Version 8.4.7 / Debian Jessie latest / Kernel 4.9.15

Lars Ellenberg lars.ellenberg at linbit.com
Thu Mar 23 15:25:53 CET 2017



On Tue, Mar 21, 2017 at 09:14:18PM +0000, Ditzler, Walter Robert wrote:
> Hi,
> 
> I have been using DRBD for years and never had any complaints. Last week I started to test it in a new environment:
> 
> server: debian jessie 8.7 with custom kernel 4.9.15 (.config attached)
> drbd-utils: installed via aptitude 8.9.2rc1
> drbd kernel Version: 8.4.7 nothing changed

Are you sure you are running the DRBD version you think you are?
Check the version tags in /proc/drbd and in modinfo drbd,
and look for stray modules with find /lib/modules -name "drbd*"
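
As a rough sketch of that check in Python: the version of the module that is
actually loaded is on the first line of /proc/drbd. The function name and the
sample string are made up for illustration, they are not taken from the
poster's system.

```python
import re

def loaded_drbd_version(proc_drbd_text):
    """Return the version string from the 'version:' line of /proc/drbd, or None."""
    m = re.search(r"^version:\s*(\S+)", proc_drbd_text, re.MULTILINE)
    return m.group(1) if m else None

# Illustrative sample only; on a real host read the file instead:
#   loaded_drbd_version(open("/proc/drbd").read())
sample = "version: 8.4.7-1 (api:1/proto:86-101)\n"
print(loaded_drbd_version(sample))
```

If the result does not start with the version you expect, a different module
build was loaded.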

> file system: disk > lvs2 > drbd > xen vm
> 
> 
> |---------|
> |  host1  |-----------------------------------
> |---------|
>    |   |
>    |   |
>    |   | br255 (bond255 / drbd sync)         br0 (bond0)
>    |   |
>    |   |
> |---------|
> |  host2  |-----------------------------------
> |---------|
> 
> 
> The problem is that I had great difficulty getting the sync to work,
> even after various reboots. Now I see in cat /proc/drbd that DRBD starts
> to sync, but it gets interrupted all the time.

It complains about timeouts.
So either you have a problem there, and the other node really *is* too
slow, or overloaded, or you need to adjust the timeouts.

> When it starts over, it doesn't resume where it was interrupted;
> it starts again from the beginning, at 0%.

Well, yes, *each* resync starts at "0%" and goes to "100%",
for that specific resync.
But of course it will not resync again what it already resynced during
the previous partial resync, unless those blocks have been re-dirtied
in the meantime.


> any idea on this?
> 
> Below I have included my configs and the syslog. I hope someone can see what I'm doing wrong.
> 
> regards walter.

> Mar 21 21:47:56 srv-ldje-xend02 kernel: [21036.968588] block drbd13:
>   Remote failed to finish a request within 12140ms >
>   ko-count (4) * timeout (30 * 0.1s)

That ^^ for example.
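
The limit in that message is plain arithmetic: ko-count times timeout, with
timeout given in tenths of a second. A minimal sketch using the numbers from
the log line above (the function name is made up for illustration):

```python
def ko_threshold_ms(ko_count, timeout_tenths):
    """ko-count * timeout, in milliseconds; timeout is in units of 0.1 s."""
    return ko_count * timeout_tenths * 100

threshold = ko_threshold_ms(4, 30)   # values from the log line
elapsed = 12140                      # ms, reported in the log line
print(threshold, elapsed > threshold)
```

So the peer took 12140 ms, which is over the 12000 ms limit that ko-count 4
and timeout 30 allow.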


> Mar 21 21:47:26 drbd12: Began resync as SyncSource (will sync 3832200 KB [958050 bits set]).
> Mar 21 21:48:58 drbd12: Began resync as SyncSource (will sync 3578732 KB [894683 bits set]).

> Mar 21 21:47:26 drbd13: Began resync as SyncSource (will sync 4399228 KB [1099807 bits set]).
> Mar 21 21:48:57 drbd13: Began resync as SyncSource (will sync 4185484 KB [1046371 bits set]).

> Mar 21 21:47:26 drbd18: Began resync as SyncSource (will sync 49941016 KB [12485254 bits set]).
> Mar 21 21:48:57 drbd18: Began resync as SyncSource (will sync 49834304 KB [12458576 bits set]).

See how those bit counts decrease?
But yes, it seems to be very slow there,
or rather, it is quickly running into hard congestion.
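
To put numbers on that, here is a small Python sketch that takes the "Began
resync" lines quoted above and computes how much each device got cleared
between the two resync starts. In these lines every bit stands for 4 KB
(3832200 KB / 958050 bits). The year is an assumption taken from the date of
this mail, and the pattern only fits the shortened log format shown here.

```python
import re
from datetime import datetime

# Lines copied from the log excerpt quoted above.
LOG = """\
Mar 21 21:47:26 drbd12: Began resync as SyncSource (will sync 3832200 KB [958050 bits set]).
Mar 21 21:48:58 drbd12: Began resync as SyncSource (will sync 3578732 KB [894683 bits set]).
Mar 21 21:47:26 drbd13: Began resync as SyncSource (will sync 4399228 KB [1099807 bits set]).
Mar 21 21:48:57 drbd13: Began resync as SyncSource (will sync 4185484 KB [1046371 bits set]).
Mar 21 21:47:26 drbd18: Began resync as SyncSource (will sync 49941016 KB [12485254 bits set]).
Mar 21 21:48:57 drbd18: Began resync as SyncSource (will sync 49834304 KB [12458576 bits set]).
"""

LINE = re.compile(
    r"(\w{3} +\d+ \d\d:\d\d:\d\d) (drbd\d+): Began resync as \w+ "
    r"\(will sync (\d+) KB \[(\d+) bits set\]\)"
)

def resync_progress(log_text, year=2017):
    """Map device -> (KB cleared, seconds elapsed) between first and last resync start."""
    starts = {}
    for m in LINE.finditer(log_text):
        stamp, dev, kb, _bits = m.groups()
        # syslog stamps carry no year, so one is supplied (assumption: 2017)
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        starts.setdefault(dev, []).append((when, int(kb)))
    progress = {}
    for dev, events in starts.items():
        (t0, kb0), (t1, kb1) = events[0], events[-1]
        progress[dev] = (kb0 - kb1, int((t1 - t0).total_seconds()))
    return progress

for dev, (kb, secs) in sorted(resync_progress(LOG).items()):
    print(f"{dev}: {kb} KB in {secs} s, about {kb / secs:.0f} KB/s")
```

For these lines that works out to roughly 1 to 3 MB/s per device. That is a
net figure: blocks re-dirtied in between are counted against it.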

Since you claim you did not change anything regarding DRBD (version or config),
I'd suggest that some of your other changes half-broke something.

Exactly what is hard to tell from the information available.

I'd start with double-checking the bonded NICs,
or the bond itself for "flapping",
and look at the performance of the IO backends on both nodes.
Then drill up or down depending on what you find.
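
One quick way to look for flapping is the per-slave "Link Failure Count" in
/proc/net/bonding/<bond>. A sketch in Python that pulls those counters out of
the file contents; the sample text and interface names are invented for
illustration, on your hosts you would read /proc/net/bonding/bond255 instead.

```python
def link_failure_counts(bonding_text):
    """Map slave interface name -> Link Failure Count from a bonding status file."""
    counts = {}
    iface = None
    for line in bonding_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            iface = value
        elif key == "Link Failure Count" and iface is not None:
            counts[iface] = int(value)
    return counts

# Invented sample, shaped like the kernel's bonding status output.
SAMPLE = """\
Slave Interface: eth2
MII Status: up
Link Failure Count: 0

Slave Interface: eth3
MII Status: up
Link Failure Count: 7
"""

print(link_failure_counts(SAMPLE))
```

A counter that keeps growing between two reads points at a link that goes
down and up again.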

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed


