[DRBD-user] drbd resyncing entire device after each reboot
Hanspeter Kunz
hkunz at ifi.uzh.ch
Mon Oct 8 10:45:36 CEST 2018
On Mon, 2018-10-08 at 09:52 +0200, Hanspeter Kunz wrote:
> On Sat, 2018-10-06 at 00:02 -0400, Digimer wrote:
> > On 2018-10-05 04:02 PM, Hanspeter Kunz wrote:
> > > Hi there,
> > >
> > > I see a strange behavior on a freshly set up pair of machines
> > > (debian stretch, drbd 8.4.7):
> > >
> > > after each reboot, the whole drbd device is resynced from scratch,
> > > even if both drbd devices report to be uptodate before the reboot.
> > > I never experienced this on other drbd installations I have.
> > >
> > > I just rebooted the secondary machine, after starting drbd syslog
> > > gives me the following information on that machine:
> > >
> > > Oct 5 21:36:43 claire drbd[3578]: Starting DRBD resources:[
> > > Oct 5 21:36:43 claire drbd[3578]: create res: nfs
> > > Oct 5 21:36:43 claire drbd[3578]: prepare disk: nfs
> > > Oct 5 21:36:43 claire kernel: [ 379.663592] drbd nfs: Starting worker thread (from drbdsetup-84 [3596])
> > > Oct 5 21:36:43 claire kernel: [ 379.664004] block drbd0: disk( Diskless -> Attaching )
> > > Oct 5 21:36:43 claire kernel: [ 379.664629] drbd nfs: Method to ensure write ordering: flush
> > > Oct 5 21:36:43 claire kernel: [ 379.664634] block drbd0: max BIO size = 1048576
> > > Oct 5 21:36:43 claire kernel: [ 379.664642] block drbd0: drbd_bm_resize called with capacity == 53685452728
> > > Oct 5 21:36:43 claire kernel: [ 379.875816] block drbd0: resync bitmap: bits=6710681591 words=104854400 pages=204794
> > > Oct 5 21:36:43 claire kernel: [ 379.875819] block drbd0: size = 25 TB (26842726364 KB)
> > > Oct 5 21:36:44 claire drbd[3578]: adjust disk: nfs
> > > Oct 5 21:36:44 claire kernel: [ 381.510770] block drbd0: recounting of set bits took additional 32 jiffies
> > > Oct 5 21:36:44 claire kernel: [ 381.510772] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> > > Oct 5 21:36:44 claire kernel: [ 381.510778] block drbd0: disk( Attaching -> UpToDate )
> > > Oct 5 21:36:44 claire kernel: [ 381.510789] block drbd0: attached to UUIDs 0000000000000004:0000000000000000:B6D88D552E97D8B6:B6D78D552E97D8B7
> > > Oct 5 21:36:44 claire drbd[3578]: adjust net: nfs
> > > Oct 5 21:36:44 claire drbd[3578]: ]
> > > Oct 5 21:36:44 claire kernel: [ 381.516705] drbd nfs: conn( StandAlone -> Unconnected )
> > > Oct 5 21:36:44 claire kernel: [ 381.516756] drbd nfs: Starting receiver thread (from drbd_w_nfs [3598])
> > > Oct 5 21:36:44 claire kernel: [ 381.516823] drbd nfs: receiver (re)started
> > > Oct 5 21:36:44 claire kernel: [ 381.516883] drbd nfs: conn( Unconnected -> WFConnection )
> > > Oct 5 21:36:45 claire kernel: [ 382.250879] drbd nfs: Handshake successful: Agreed network protocol version 101
> > > Oct 5 21:36:45 claire kernel: [ 382.250884] drbd nfs: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
> > > Oct 5 21:36:45 claire kernel: [ 382.251202] drbd nfs: Peer authenticated using 20 bytes HMAC
> > > Oct 5 21:36:45 claire kernel: [ 382.251307] drbd nfs: conn( WFConnection -> WFReportParams )
> > > Oct 5 21:36:45 claire kernel: [ 382.251366] drbd nfs: Starting ack_recv thread (from drbd_r_nfs [3607])
> > > Oct 5 21:36:45 claire kernel: [ 382.310672] block drbd0: drbd_sync_handshake:
> > > Oct 5 21:36:45 claire kernel: [ 382.310680] block drbd0: self 0000000000000004:0000000000000000:B6D88D552E97D8B6:B6D78D552E97D8B7 bits:0 flags:0
> > > Oct 5 21:36:45 claire kernel: [ 382.310687] block drbd0: peer 06D17ADE18B89143:0000000000000005:B6D88D552E97D8B7:B6D78D552E97D8B7 bits:0 flags:0
> > > Oct 5 21:36:45 claire kernel: [ 382.310691] block drbd0: uuid_compare()=-2 by rule 20
> > > Oct 5 21:36:45 claire kernel: [ 382.310696] block drbd0: Writing the whole bitmap, full sync required after drbd_sync_handshake.
> > > Oct 5 21:36:47 claire kernel: [ 383.728620] block drbd0: bitmap WRITE of 204794 pages took 1228 ms
> > > Oct 5 21:36:47 claire kernel: [ 383.728626] block drbd0: 25 TB (6710681591 bits) marked out-of-sync by on disk bit-map.
> > > Oct 5 21:36:47 claire kernel: [ 383.728693] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
> > > Oct 5 21:36:47 claire drbd[3578]: WARN: stdin/stdout is not a TTY; using /dev/console.
> > > Oct 5 21:36:47 claire systemd[1]: Started LSB: Control DRBD resources..
> > > Oct 5 21:36:47 claire kernel: [ 384.049775] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> > > Oct 5 21:36:47 claire kernel: [ 384.145044] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
> > > Oct 5 21:36:47 claire kernel: [ 384.145049] block drbd0: conn( WFBitMapT -> WFSyncUUID )
> > > Oct 5 21:36:47 claire kernel: [ 384.275789] block drbd0: updated sync uuid 0001000000000004:0000000000000000:B6D88D552E97D8B6:B6D78D552E97D8B7
> > > Oct 5 21:36:47 claire kernel: [ 384.275945] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
> > > Oct 5 21:36:47 claire kernel: [ 384.279872] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
> > > Oct 5 21:36:47 claire kernel: [ 384.279905] block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
> > > Oct 5 21:36:47 claire kernel: [ 384.279949] block drbd0: Began resync as SyncTarget (will sync 26842726364 KB [6710681591 bits set]).
> > >
> > > Probably the explanation is simple, I just do not see it.
> > >
> > > If you need the configuration (although it should be identical to
> > > similar drbd configs which are working without problems) I am
> > > happy to provide it.
> > >
> > > Best and many thanks if anybody could shed some light on this,
> > > Hp
> >
> > Can you share your config? Are you using thin LVM?
>
> this is my config as reported by "drbdsetup show":
>
> resource nfs {
>     options {
>     }
>     net {
>         max-buffers 131072;
>         cram-hmac-alg "sha1";
>         shared-secret "REMOVED";
>         verify-alg "sha1";
>     }
>     _remote_host {
>         address ipv4 192.168.3.182:7788;
>     }
>     _this_host {
>         address ipv4 192.168.3.181:7788;
>         volume 0 {
>             device minor 0;
>             disk "/dev/storage/nfs";
>             meta-disk internal;
>             disk {
>                 resync-rate 122880k; # bytes/second
>                 al-extents 3389;
>                 c-fill-target 40960s; # bytes
>                 c-max-rate 4096000k; # bytes/second
>                 c-min-rate 81920k; # bytes/second
>             }
>         }
>     }
> }
>
> this is the volume information for /dev/storage/nfs
>
> lvdisplay /dev/storage/nfs
>   --- Logical volume ---
>   LV Path                /dev/storage/nfs
>   LV Name                nfs
>   VG Name                storage
>   LV UUID                TcncF5-uhtd-d9ea-C1fO-cu4U-eo06-2Y0UCq
>   LV Write Access        read/write
>   LV Creation host, time claris, 2018-09-27 14:28:14 +0200
>   LV Status              available
>   # open                 2
>   LV Size                25.00 TiB
>   Current LE             6553600
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     auto
>   - currently set to     256
>   Block device           254:0
>
> > Also, 8.4.7 is _ancient_. Nearly countless bug fixes since then,
> > which may or may not relate. In any case, updating is _strongly_
> > recommended.
>
> ok, I might give this a try (right now I use what is shipped with
> debian stable). Remember, I have more or less exactly the same setup
> running on quite a few other machines (for many years) without
> problems, so I do not think that updating will solve the above
> problem.
I just switched the drbd primaries and rebooted the new secondary. That
worked. Now, even after switching the primary back to the first machine
or having two secondaries, rebooting either of the machines, drbd
starts up as expected (without re-syncing the whole device).
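
As a cross-check that the resync in the quoted log really covered the whole device, the figures from the kernel messages can be compared with each other. This is only a sketch: the 512-byte sector, the 4 KiB covered per bitmap bit and the 4096-byte page are assumptions, not values stated in the log; all other numbers are copied from the log.

```python
# Figures copied from the kernel log quoted earlier in this thread.
capacity_sectors = 53685452728   # "drbd_bm_resize called with capacity == ..."
bitmap_bits = 6710681591         # "resync bitmap: bits=..."
bitmap_words = 104854400         # "words=..."
bitmap_pages = 204794            # "pages=..."
device_kb = 26842726364          # "size = 25 TB (26842726364 KB)"
resync_kb = 26842726364          # "will sync 26842726364 KB"

# Assumptions (not stated in the log).
SECTOR_BYTES = 512     # capacity is assumed to be in 512-byte sectors
BIT_COVERS_KB = 4      # one bitmap bit is assumed to cover 4 KiB
PAGE_BYTES = 4096      # bitmap page size is assumed to be 4 KiB


def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


computed_device_kb = capacity_sectors * SECTOR_BYTES // 1024
computed_bits = ceil_div(computed_device_kb, BIT_COVERS_KB)
computed_words = ceil_div(computed_bits, 64)
computed_pages = ceil_div(computed_bits, PAGE_BYTES * 8)

# True when every bit of the bitmap was marked out-of-sync.
full_resync = bitmap_bits * BIT_COVERS_KB == resync_kb == device_kb

print("device KB :", computed_device_kb, computed_device_kb == device_kb)
print("bits      :", computed_bits, computed_bits == bitmap_bits)
print("words     :", computed_words, computed_words == bitmap_words)
print("pages     :", computed_pages, computed_pages == bitmap_pages)
print("full sync :", full_resync)
```

Under these assumptions all figures agree, so the 6710681591 bits marked out-of-sync correspond to the complete 25 TB volume rather than to a partial resync.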
Although it seems to work as expected now, I would still be interested
in knowing what might have caused this (and why switching primaries
apparently repaired it), if anybody has an idea.
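
For anyone who wants to look into the cause, the two UUID lines of the drbd_sync_handshake in the quoted log are the natural starting point. The sketch below just pulls them apart. It rests on assumptions that are not confirmed anywhere in this thread: that the four fields are ordered current:bitmap:history:history, that the lowest bit is a "primary" flag, and that a current UUID of 4 is the value DRBD 8.4 uses for freshly created metadata. The names UUID_JUST_CREATED, PRIMARY_FLAG, parse_uuids and looks_just_created are made up for this example.

```python
import re

# Assumptions (see the text above): value 4 marks "just created" metadata,
# and bit 0 of a UUID is a primary flag that is ignored when comparing.
UUID_JUST_CREATED = 0x4
PRIMARY_FLAG = 0x1

# The two handshake lines from the log, with the mail line wrapping removed.
LOG = """\
block drbd0: self 0000000000000004:0000000000000000:B6D88D552E97D8B6:B6D78D552E97D8B7 bits:0 flags:0
block drbd0: peer 06D17ADE18B89143:0000000000000005:B6D88D552E97D8B7:B6D78D552E97D8B7 bits:0 flags:0
"""

UUID_LINE = re.compile(r"\b(self|peer) ((?:[0-9A-F]{16}:){3}[0-9A-F]{16})\b")


def parse_uuids(log):
    """Return {'self': {...}, 'peer': {...}} with the UUID fields as integers."""
    result = {}
    for role, fields in UUID_LINE.findall(log):
        current, bitmap, hist1, hist2 = (int(f, 16) for f in fields.split(":"))
        result[role] = {
            "current": current,
            "bitmap": bitmap,
            "history": (hist1, hist2),
        }
    return result


def looks_just_created(current):
    """True if the current UUID, ignoring the assumed primary flag, equals 4."""
    return (current & ~PRIMARY_FLAG) == UUID_JUST_CREATED


uuids = parse_uuids(LOG)
for role, u in uuids.items():
    state = "just created" if looks_just_created(u["current"]) else "initialized"
    print(f"{role}: current={u['current']:016X} bitmap={u['bitmap']:016X} ({state})")
```

If those assumptions hold, the rebooted secondary came up with a current UUID that still looked freshly created while the peer's did not, which would be consistent with the "uuid_compare()=-2 by rule 20" line and the full sync that followed. It does not by itself explain why the UUID had that value after a completed sync.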
Best and many thanks,
Hp
--
Hanspeter Kunz University of Zurich
Systems Administrator Department of Informatics
Email: hkunz at ifi.uzh.ch Binzmühlestrasse 14
Tel: +41.(0)44.63-56714 Office 2.E.07
http://www.ifi.uzh.ch CH-8050 Zurich, Switzerland
Spamtraps: hkunz.bogus at ailab.ch hkunz.bogus at ifi.uzh.ch
---
A word to the wise is enough.
-- Miguel de Cervantes