[DRBD-user] DRBD 9 auto-promote not changing role to Primary, but is writable
Robert Altnoeder
robert.altnoeder at linbit.com
Fri Nov 15 10:27:18 CET 2019
Could you try a few things, so we can get a better picture of what's
happening there:
- Can you get a hash of the data on the backend device that DRBD is
writing to, from before and after one of those dubious Secondary-mode
writes, to verify whether or not any data is actually changed?
- Can you switch the other peer into the Primary role manually (so that
the node where the problem occurs should refuse to become a Primary) and
see what happens when ZFS tries to write to that log?
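
For the first check, something along these lines might do. It is only a minimal sketch: it assumes you pass it the path of the backing device (I don't know the path on your system, so none is filled in here), that you can read that device as root, and that nothing other than the writes under investigation touches it between the two runs.

```python
import hashlib


def hash_device(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of everything readable from path.

    The device (or file) is read in chunks so that a large backing
    device does not have to fit into memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as dev:
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()
```

Call hash_device() once before and once after one of the suspicious writes and compare the two digests; if they differ, data on the backend really did change.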
I tried to reproduce the problem from user space (with auto-promote off
and trying to read/write from/to a Secondary), where it does not seem to
happen (everything is normal, cannot even read from a Secondary).
However, I expect those ZFS operations to be done by some code in the
kernel itself, and something may not be playing by the rules there -
maybe ZFS is causing some I/O without doing a proper open/close cycle,
or we are missing something in DRBD for some I/O case that's supposed to
be valid.
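
The user-space test mentioned above boils down to trying to open the device and seeing whether the open is refused. A rough sketch of such a probe follows; probe_open is just a name made up for this example, and the device path is whatever your DRBD device node is called. Keep in mind that with auto-promote enabled, a successful read-write open would itself be expected to promote the resource for as long as the descriptor is held, so run it with auto-promote off if you want to compare with my test.

```python
import errno
import os


def probe_open(path, write=True):
    """Try to open path and report the outcome without doing any I/O.

    Returns "ok" if the open succeeds, otherwise the symbolic errno
    name (for example "ENOENT") of the failure.
    """
    flags = os.O_RDWR if write else os.O_RDONLY
    try:
        fd = os.open(path, flags)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return "ok"
```

Running probe_open(path, write=False) and probe_open(path, write=True) on the Secondary shows whether user-space opens are refused there, which is what I would expect.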
br,
Robert
On 11/14/19 10:28 PM, Doug Cahill wrote:
> I spent some more time looking into this with another developer and I
> can see while running "drbdsetup events2 r0" that there is a quick
> blip when I add the drbd r0 resource to my pool as the log device:
>
> change resource name:r0 role:Primary
> change resource name:r0 role:Secondary
>
> However, if I export and/or import the pool, the event never registers
> again. When I write to a vdisk on this pool I can see the nr:11766480
> dw:11766452 counts increase on the peer which leads me to believe
> blocks are being written, yet the state never changes.
>
> I also tried to run dd against the "peer" side drbd device while the
> "active" side was writing data, and found a message in my syslog
> stating that the peer may not become primary while the device is
> opened read-only, which doesn't make sense. The device is being written
> to, so how is the block device state being tricked into thinking it is
> read-only?
>
> =========in the log from the node I'm writing to the drbd resource
> drbd r0 dccdx0: Preparing remote state change 892694821
> drbd r0: State change failed: Peer may not become primary while device is opened read-only
> kernel: [92771.927574] drbd r0 dccdx0: Aborting remote state change 892694821
>
> On Thu, Nov 14, 2019 at 10:39 AM Doug Cahill <handruin at gmail.com> wrote:
>
> On Thu, Nov 14, 2019 at 4:52 AM Roland Kammerer
> <roland.kammerer at linbit.com> wrote:
>
> On Wed, Nov 13, 2019 at 03:08:37PM -0500, Doug Cahill wrote:
> > I'm configuring a two node setup with drbd 9.0.20-1 on CentOS 7
> > (3.10.0-957.1.3.el7.x86_64) with a single resource backed by an SSD. I've
> > explicitly enabled auto-promote in my resource configuration to use this
> > feature.
> >
> > The drbd device is being used in a single-primary configuration as a zpool
> > SLOG device. The zpool is only ever imported on one node at a time and the
> > import is successful during cluster failover events between nodes. I
> > confirmed through zdb that the zpool includes the configured drbd device
> > path.
> >
> > My concern is that the drbdadm status output shows the Role of the drbd
> > resource as "Secondary" on both sides. The documentation states that the
> > drbd resource will be auto-promoted to primary when it is opened for
> > writing.
>
> But also demoted when closed (don't know if this happens in your
> scenario).
>
> > drbdadm status
> > r0 role:Secondary
> >   disk:UpToDate
> >   dccdx0 role:Secondary
> >     peer-disk:UpToDate
>
> Maybe it is closed and demoted again and you look at it at the wrong
> points in time? Better look into the syslog for role changes, or monitor
> with "drbdsetup events2 r0". Do you see switches to Primary there?
>
>
> I checked the drbdadm status while my dd write session was in
> progress and I see no change from Secondary to Primary. I also
> checked the stats under /sys/kernel/debug and they look the same.
>
> cat /sys/kernel/debug/drbd/resources/r0/connections/dccdx0/0/proc_drbd
>  0: cs:Established ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
>     ns:3330728 nr:0 dw:20103080 dr:26292 al:131 bm:0 lo:0 pe:[0;0] ua:0 ap:[0;0] ep:1 wo:1 oos:0
>     resync: used:0/61 hits:64 misses:4 starving:0 locked:0 changed:2
>     act_log: used:0/1237 hits:28951 misses:536 starving:0 locked:0 changed:132
>     blocked on activity log: 0/0/0
>
>
> Best, rck
> _______________________________________________
> Star us on GITHUB: https://github.com/LINBIT
> drbd-user mailing list
> drbd-user at lists.linbit.com <mailto:drbd-user at lists.linbit.com>
> https://lists.linbit.com/mailman/listinfo/drbd-user
--
Robert ALTNOEDER - Software Developer
+43-1-817-82-92 x72
robert.altnoeder at linbit.com