On Tue, Mar 21, 2017 at 09:14:18PM +0000, Ditzler, Walter Robert wrote:
> Hi,
>
> I have been using DRBD for years and never had any complaints. Last week
> I started to test it in a new environment:
>
> server: Debian Jessie 8.7 with custom kernel 4.9.15 (.config attached)
> drbd-utils: 8.9.2rc1, installed via aptitude
> DRBD kernel version: 8.4.7 (nothing changed)

Are you sure you are running the DRBD version you think you are?
Check the version tags in /proc/drbd, the output of modinfo drbd,
and find /lib/modules -name "drbd*".

> file system: disk -> lvs2 -> drbd -> xen vm
>
>  |---------|
>  |  host1  |-----------------------------------
>  |---------|
>       |                                       |
>       |                                       |
>       |  br255 (bond255 / drbd sync)          |  br0 (bond0)
>       |                                       |
>       |                                       |
>  |---------|
>  |  host2  |-----------------------------------
>  |---------|
>
> The problem is that I had great difficulty getting the sync to work,
> across several reboots. Now cat /proc/drbd shows that DRBD starts to
> sync, but the sync is interrupted all the time. It complains about
> timeouts.

So either you have a problem there, and the other node really *is* too
slow or overloaded, or you need to adjust the timeouts.

> When it starts over, it does not resume where it broke off; it starts
> again from 0%.

Well, yes, *each* resync starts at "0%" and goes to "100%", for that
specific resync. But of course it will not resync again what it already
resynced during the previous partial resync, unless those blocks have
been re-dirtied in the meantime.

> Any idea on this?
>
> Below are my configs and the syslog. I hope someone can see what I am
> doing wrong.
>
> Regards, Walter
>
> Mar 21 21:47:56 srv-ldje-xend02 kernel: [21036.968588] block drbd13:
>   Remote failed to finish a request within 12140ms > ko-count (4) * timeout (30 * 0.1s)

That ^^ for example.

> Mar 21 21:47:26 drbd12: Began resync as SyncSource (will sync 3832200 KB [958050 bits set]).
> Mar 21 21:48:58 drbd12: Began resync as SyncSource (will sync 3578732 KB [894683 bits set]).
> Mar 21 21:47:26 drbd13: Began resync as SyncSource (will sync 4399228 KB [1099807 bits set]).
> Mar 21 21:48:57 drbd13: Began resync as SyncSource (will sync 4185484 KB [1046371 bits set]).
> Mar 21 21:47:26 drbd18: Began resync as SyncSource (will sync 49941016 KB [12485254 bits set]).
> Mar 21 21:48:57 drbd18: Began resync as SyncSource (will sync 49834304 KB [12458576 bits set]).

See how those bit counts decrease? But yes, it seems to be very slow
there, or rather, it quickly runs into hard congestion.

Since you claim you did not change anything regarding DRBD (version or
config), I'd suggest that some of your other changes half-broke
something. What exactly is hard to tell from the information available.

I'd start by double-checking the bonded NICs, or the bond itself, for
"flapping", and look at the performance of the IO backends on both
nodes. Then drill up or down depending on what you find.

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed
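The numbers in the quoted log excerpts can be cross-checked with a short script. This is a minimal sketch, assuming (as the log message itself states) that the net `timeout` is counted in tenths of a second, and that each DRBD bitmap bit covers 4 KiB, which every KB/bits pair in the quoted "Began resync" lines agrees with. The regular expression only matches the shortened log format quoted in the mail, not full syslog lines, and the helper names `ko_threshold_ms` and `parse_resyncs` are hypothetical, not part of drbd-utils.

```python
import re

# Assumption: one DRBD bitmap bit tracks 4 KiB; every KB/bits pair in the
# quoted log satisfies KB == 4 * bits.
BM_BLOCK_KIB = 4


def ko_threshold_ms(ko_count: int, timeout_ds: int) -> int:
    """Hypothetical helper: ko-count * timeout, with timeout in 0.1 s units."""
    return ko_count * timeout_ds * 100


# Matches only the shortened "Began resync" lines as quoted in the mail;
# real syslog lines carry extra prefixes (host, kernel timestamp, "block").
RESYNC_RE = re.compile(
    r"(?P<ts>\d\d:\d\d:\d\d) (?P<dev>drbd\d+): Began resync as SyncSource "
    r"\(will sync (?P<kb>\d+) KB \[(?P<bits>\d+) bits set\]\)"
)


def parse_resyncs(lines):
    """Hypothetical helper: return {device: [(ts, kb, bits), ...]} in log order."""
    per_dev = {}
    for line in lines:
        m = RESYNC_RE.search(line)
        if m:
            per_dev.setdefault(m["dev"], []).append(
                (m["ts"], int(m["kb"]), int(m["bits"]))
            )
    return per_dev


# The six resync lines quoted in the thread.
LOG = [
    "Mar 21 21:47:26 drbd12: Began resync as SyncSource (will sync 3832200 KB [958050 bits set]).",
    "Mar 21 21:48:58 drbd12: Began resync as SyncSource (will sync 3578732 KB [894683 bits set]).",
    "Mar 21 21:47:26 drbd13: Began resync as SyncSource (will sync 4399228 KB [1099807 bits set]).",
    "Mar 21 21:48:57 drbd13: Began resync as SyncSource (will sync 4185484 KB [1046371 bits set]).",
    "Mar 21 21:47:26 drbd18: Began resync as SyncSource (will sync 49941016 KB [12485254 bits set]).",
    "Mar 21 21:48:57 drbd18: Began resync as SyncSource (will sync 49834304 KB [12458576 bits set]).",
]

if __name__ == "__main__":
    # Values taken from the "Remote failed to finish a request" message.
    limit = ko_threshold_ms(4, 30)
    observed = 12140
    print(f"threshold {limit} ms, observed {observed} ms, exceeded: {observed > limit}")

    for dev, runs in parse_resyncs(LOG).items():
        for _ts, kb, bits in runs:
            if kb != bits * BM_BLOCK_KIB:
                print(f"{dev}: unexpected KB/bits ratio at {_ts}")
        first_bits, last_bits = runs[0][2], runs[-1][2]
        net_kib = (first_bits - last_bits) * BM_BLOCK_KIB
        print(f"{dev}: {first_bits} -> {last_bits} bits set, net decrease {net_kib} KB")
```

Run as a script, it reports that the 12140 ms stall exceeds the 12000 ms (4 * 30 * 0.1 s) threshold, and that each restarted resync had fewer bits set than the one before it. The decrease is a net figure: blocks re-dirtied between the two attempts would be counted again.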