[DRBD-user] DRBD 9 auto-promote not changing role to Primary, but is writable

Robert Altnoeder robert.altnoeder at linbit.com
Fri Nov 15 10:27:18 CET 2019


Could you try a few things, so we can get a better picture of what's
happening there:
- Can you get a hash of the data on the backing device that DRBD is
writing to, before and after one of those dubious Secondary-mode
writes, to verify whether any data actually changes?
- Can you switch the other peer into the Primary role manually (so that
the node where the problem occurs should refuse to become a Primary) and
see what happens when ZFS tries to write to that log?
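For the first check, here is a minimal sketch of a chunked hash of a device. Python is used only for illustration. It assumes you pass the path of DRBD's backing device (not the /dev/drbdX node) and run it as root; the backing device path is not named in this thread. The sketch also assumes that DRBD's writes bypass the backing device's page cache, in which case a repeated buffered read could return stale cached data. To avoid that, it first asks the kernel to drop cached pages with posix_fadvise. That call is advisory only.

```python
import hashlib
import os

CHUNK_SIZE = 4 * 1024 * 1024  # read 4 MiB per call to keep memory use flat


def hash_device(path: str, chunk_size: int = CHUNK_SIZE) -> str:
    """Return the SHA-256 hex digest of all data readable at `path`."""
    digest = hashlib.sha256()
    fd = os.open(path, os.O_RDONLY)
    try:
        # Advise the kernel to drop clean cached pages for this file/device,
        # so a second run re-reads the media instead of stale page cache.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # end of file/device reached
                break
            digest.update(chunk)
    finally:
        os.close(fd)
    return digest.hexdigest()
```

Call it once before and once after a suspect write, e.g. hash_device("/dev/sdX"), where /dev/sdX is a hypothetical placeholder for the real backing device. Different digests mean the data did change. Keep the pool otherwise idle between the two runs, or unrelated writes will change the digest too.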

I tried to reproduce the problem from user space (with auto-promote off,
trying to read from and write to a Secondary), but it does not seem to
happen there: everything behaves normally, and I cannot even read from a
Secondary.
However, I expect those ZFS operations to be done by some code in the
kernel itself, and something may not be playing by the rules there -
maybe ZFS is causing some I/O without doing a proper open/close cycle,
or we are missing something in DRBD for some I/O case that's supposed to
be valid.
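A user-space probe of the kind described above can be sketched like this. It only exercises the ordinary open() path, which is where auto-promote acts according to the documentation quoted in this thread ("when it is opened for writing"). It cannot show what in-kernel ZFS I/O does. The device path is a placeholder: /dev/drbd0 is hypothetical, because the thread only names the resource r0. Opening a block device normally requires root.

```python
import errno
import os
from typing import Dict


def probe_open(path: str, flags: int) -> str:
    """Try to open `path` with `flags`; return "ok" or the errno name."""
    try:
        fd = os.open(path, flags)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)  # close immediately, so an auto-promotion would be undone
    return "ok"


def probe_device(path: str) -> Dict[str, str]:
    """Probe a read-only and a read-write open of the same device."""
    return {
        "read-only": probe_open(path, os.O_RDONLY),
        "read-write": probe_open(path, os.O_RDWR),
    }
```

On a Secondary with auto-promote off, both entries would be expected to show an errno name rather than "ok". With auto-promote on, a successful read-write open should appear as a brief Primary/Secondary pair in "drbdsetup events2 r0".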

br,
Robert

On 11/14/19 10:28 PM, Doug Cahill wrote:
> I spent some more time looking into this with another developer, and
> while running "drbdsetup events2 r0" I can see a quick blip when I add
> the drbd r0 resource to my pool as the log device:
>
> change resource name:r0 role:Primary
> change resource name:r0 role:Secondary
>
> However, if I export and/or import the pool, the event never registers
> again. When I write to a vdisk on this pool, I can see the nr:11766480
> dw:11766452 counters increase on the peer, which leads me to believe
> blocks are being written, yet the state never changes.
>
> I also tried running dd against the "peer" side drbd device while the
> "active" side was writing data, and found a message in my syslog
> stating that the peer may not become primary while the device is opened
> read-only, which doesn't make sense. The device is being written to, so
> how is the block device state being tricked into thinking it is read-only?
>
> ========= In the log on the node where I'm writing to the drbd resource:
> drbd r0 dccdx0: Preparing remote state change 892694821
> drbd r0: State change failed: Peer may not become primary while device is opened read-only
> kernel: [92771.927574] drbd r0 dccdx0: Aborting remote state change 892694821
>
> On Thu, Nov 14, 2019 at 10:39 AM Doug Cahill <handruin at gmail.com> wrote:
>
>     On Thu, Nov 14, 2019 at 4:52 AM Roland Kammerer
>     <roland.kammerer at linbit.com> wrote:
>
>         On Wed, Nov 13, 2019 at 03:08:37PM -0500, Doug Cahill wrote:
>         > I'm configuring a two-node setup with drbd 9.0.20-1 on CentOS 7
>         > (3.10.0-957.1.3.el7.x86_64) with a single resource backed by an
>         > SSD. I've explicitly enabled auto-promote in my resource
>         > configuration to use this feature.
>         >
>         > The drbd device is being used in a single-primary configuration
>         > as a zpool SLOG device. The zpool is only ever imported on one
>         > node at a time, and the import is successful during cluster
>         > failover events between nodes. I confirmed through zdb that the
>         > zpool includes the configured drbd device path.
>         >
>         > My concern is that the drbdadm status output shows the role of
>         > the drbd resource as "Secondary" on both sides. The
>         > documentation says that the drbd resource will be auto-promoted
>         > to Primary when it is opened for writing.
>
>         But it is also demoted when closed (I don't know if this happens
>         in your scenario).
>
>         > drbdadm status
>         > r0 role:Secondary
>         >   disk:UpToDate
>         >   dccdx0 role:Secondary
>         >     peer-disk:UpToDate
>
>         Maybe it is closed and demoted again, and you are looking at it at
>         the wrong points in time? Better to look in the syslog for role
>         changes, or monitor with "drbdsetup events2 r0". Do you see
>         switches to Primary there?
>
>
>     I checked the drbdadm status while my dd write session was in
>     progress, and I see no change from Secondary to Primary. I also
>     checked the stats under /sys/class, and they look the same.
>
>     cat /sys/kernel/debug/drbd/resources/r0/connections/dccdx0/0/proc_drbd
>      0: cs:Established ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
>         ns:3330728 nr:0 dw:20103080 dr:26292 al:131 bm:0 lo:0 pe:[0;0] ua:0 ap:[0;0] ep:1 wo:1 oos:0
>         resync: used:0/61 hits:64 misses:4 starving:0 locked:0 changed:2
>         act_log: used:0/1237 hits:28951 misses:536 starving:0 locked:0 changed:132
>         blocked on activity log: 0/0/0
>
>
>         Best, rck
>         _______________________________________________
>         Star us on GITHUB: https://github.com/LINBIT
>         drbd-user mailing list
>         drbd-user at lists.linbit.com
>         https://lists.linbit.com/mailman/listinfo/drbd-user
>
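The quoted thread shows nr/dw counters rising while the ro: field stays Secondary/Secondary. A sketch like the following compares two snapshots of the proc_drbd-style dump quoted above to show both in one report. It assumes the field layout of that dump. Because the dump comes from debugfs, the layout may differ between DRBD versions.

```python
import re
from typing import Dict

# key:value fields seen in the quoted dump; other lines (resync:, act_log:) are ignored
_FIELD = re.compile(r"\b(cs|ro|ds|ns|nr|dw|dr|al|bm|lo|ua|ep|wo|oos):(\S+)")


def parse_proc_drbd(text: str) -> Dict[str, str]:
    """Collect the key:value status fields from a proc_drbd-style dump."""
    return dict(_FIELD.findall(text))


def counter_delta(before: str, after: str) -> Dict[str, int]:
    """Growth of the ns/nr/dw/dr counters between two snapshots."""
    a, b = parse_proc_drbd(before), parse_proc_drbd(after)
    return {k: int(b[k]) - int(a[k]) for k in ("ns", "nr", "dw", "dr") if k in a and k in b}
```

Read the debugfs file once before and once after a burst of writes. If the counters grow (dw and ns on the writing node, nr and dw on the peer) while ro: still reads Secondary/Secondary, that reproduces the mismatch described above.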
>
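The quick Primary/Secondary blip quoted above is easy to miss when watching the raw stream. A minimal filter for "drbdsetup events2" output might look like the sketch below. It assumes the plain "action object key:value ..." line layout shown in this thread, with no timestamp prefix. It reports only "resource" lines for the named resource, i.e. the local role. "connection" lines, which carry the peer's role, are ignored.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_event(line: str) -> Tuple[str, str, Dict[str, str]]:
    """Split one events2 line into (action, object, key:value fields)."""
    parts = line.split()
    if len(parts) < 2:
        return "", "", {}
    action, obj = parts[0], parts[1]
    fields = dict(p.split(":", 1) for p in parts[2:] if ":" in p)
    return action, obj, fields


def role_changes(lines: Iterable[str], resource: str) -> Iterator[str]:
    """Yield each local role reported for `resource`, in stream order."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _action, obj, fields = parse_event(line)
        if obj == "resource" and fields.get("name") == resource and "role" in fields:
            yield fields["role"]
```

For example, pipe "drbdsetup events2 r0" into a script that loops over role_changes(sys.stdin, "r0") and prints each role with flush=True. It then prints one line per reported local role while the pool is imported, exported, or written to.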


-- 
Robert ALTNOEDER - Software Developer
+43-1-817-82-92 x72
robert.altnoeder at linbit.com

LINBIT | Keeping the Digital World Running
DRBD HA - Disaster Recovery - Software-defined Storage

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

