[DRBD-user] DRBD 9 auto-promote not changing role to Primary, but is writable

Doug Cahill handruin at gmail.com
Fri Nov 15 20:57:10 CET 2019


On Fri, Nov 15, 2019 at 4:34 AM Robert Altnoeder
<robert.altnoeder at linbit.com> wrote:

> Could you try a few things, so we can get a better picture of what's
> happening there:
> - Can you get a hash of the data on the backend device that DRBD is
> writing to, from before and after one of those dubious Secondary-mode
> writes, to verify whether or not any data is actually changed?
>

Looks like the sha256sum changes on both the primary and secondary nodes'
local backing devices for drbd when I write to my zpool that has the drbd
log device.

node0 before sha256sum
[root at dccdx0 ~]# dd if=/dev/sda1 bs=8M iflag=direct | sha256sum
17179869184 bytes (17 GB) copied, 69.4921 s, 247 MB/s
730d0f908c64a42ffc168211350b87a72c72ed56de2feef4be0904342acf20ac  -

node1 before sha256sum
[root at dccdx1 ~]# dd if=/dev/sda1 bs=8M iflag=direct | sha256sum
17179869184 bytes (17 GB) copied, 70.4586 s, 244 MB/s
adbab9ee2a96ed476fe649cd10dc17994767190ae350a7be146c40427e272a73  -

Write test:
[root at dccdx0 ~]# dd if=/dev/urandom of=/dev/zvol/act_per_pool000/test_drbd \
    bs=4k count=100000 oflag=sync,direct
409600000 bytes (410 MB) copied, 44.8633 s, 9.1 MB/s

node0 after sha256sum
[root at dccdx0 ~]# dd if=/dev/sda1 bs=8M iflag=direct | sha256sum
17179869184 bytes (17 GB) copied, 71.6324 s, 240 MB/s
e8c02e50daf281973b04ea1b76e6cdb8760a789245ade987ba5410deba68067d  -

node1 after sha256sum
[root at dccdx1 ~]# dd if=/dev/sda1 bs=8M iflag=direct | sha256sum
2048+0 records in
2048+0 records out
17179869184 bytes (17 GB) copied, 68.326 s, 251 MB/s
da8b90e8f57c20e4ea47a498157cb2865249d8b8cefc36aedb49a4467572924f  -
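In case anyone wants to repeat this, the same check scripted as a loop -- a
rough sketch only, assuming passwordless ssh from the writing node to both
nodes and /dev/sda1 as the backing device on each (which matches my setup):

#!/bin/bash
# hash both backing devices, run the write test, hash again, compare
for node in dccdx0 dccdx1; do
    ssh "$node" 'dd if=/dev/sda1 bs=8M iflag=direct 2>/dev/null | sha256sum' \
        > "/tmp/${node}.before"
done
# write through the zvol on the pool that uses the drbd log device
dd if=/dev/urandom of=/dev/zvol/act_per_pool000/test_drbd \
    bs=4k count=100000 oflag=sync,direct
for node in dccdx0 dccdx1; do
    ssh "$node" 'dd if=/dev/sda1 bs=8M iflag=direct 2>/dev/null | sha256sum' \
        > "/tmp/${node}.after"
    if cmp -s "/tmp/${node}.before" "/tmp/${node}.after"; then
        echo "$node: backing device unchanged"
    else
        echo "$node: backing device data changed"
    fi
done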


> - Can you switch the other peer into the Primary role manually (so that
> the node where the problem occurs should refuse to become a Primary) and
> see what happens when ZFS tries to write to that log?
>

Attempt 1, with the zpool imported (on the primary node, dccdx0):
This is the secondary side, where the zpool is not imported; I'm promoting
the drbd resource to Primary here:
[root at dccdx1 ~]# drbdadm primary r0
r0: State change failed: (-10) State change was refused by peer node
additional info from kernel:
Declined by peer dccdx0 (id: 1), see the kernel log there
Command 'drbdsetup primary r0' terminated with exit code 11

Info logged in /var/log/messages:
dccdx1: Preparing remote state change 1839699090
dccdx0 kernel: [171910.178046] drbd r0: State change failed: Peer may not
become primary while device is opened read-only
dccdx0 kernel: [171910.195954] drbd r0 dccdx1: Aborting remote state change
1839699090
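Out of curiosity I also want to see what (if anything) holds the device open
on dccdx0 and in what mode. From user space something like the following
should show it (/dev/drbd0 is a stand-in -- substitute whatever minor r0
actually uses), though since ZFS opens the vdev from kernel space I half
expect both to come up empty, which is part of the puzzle:

# FD column of lsof shows the open mode (r = read-only, w/u = writable)
lsof /dev/drbd0
fuser -v /dev/drbd0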

Attempt 2, with the zpool exported:
I export the pool on the primary node, then on the secondary node I promote
the drbd resource to Primary:
[root at dccdx1 ~]# drbdadm primary r0
[root at dccdx1 ~]# drbdadm status
r0 role:Primary
  disk:UpToDate
  dccdx0 role:Secondary
    peer-disk:UpToDate

Then I try to import the zpool on the original primary node while its drbd
device is still Secondary:
[root at dccdx0 ~]# zpool import -f -o cachefile=none \
    -d /dev/drbd/by-disk/disk/by-path -d /dev/disk/by-path \
    -d /dev/mapper act_per_pool000
The devices below are missing, use '-m' to import the pool anyway:
   pci-0000:18:00.0-scsi-0:0:2:0-part1 [log]
cannot import 'act_per_pool000': one or more devices is currently
unavailable
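To get back to a working state I'd just reverse those steps -- demote dccdx1
again and re-import on dccdx0 with the same import line as above:

# on dccdx1: drop back to Secondary
drbdadm secondary r0
# on dccdx0: re-import with the drbd log device visible again
zpool import -f -o cachefile=none \
    -d /dev/drbd/by-disk/disk/by-path -d /dev/disk/by-path \
    -d /dev/mapper act_per_pool000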



> I tried to reproduce the problem from user space (with auto-promote off
> and trying to read/write from/to a Secondary), where it does not seem to
> happen (everything is normal, cannot even read from a Secondary).
> However, I expect those ZFS operations to be done by some code in the
> kernel itself, and something may not be playing by the rules there -
> maybe ZFS is causing some I/O without doing a proper open/close cycle,
> or we are missing something in DRBD for some I/O case that's supposed to
> be valid.
>

Another developer is looking into using SystemTap so we can trace the kernel
calls that open the block device and see why they aren't being flagged as
opening it writable.  We're equally puzzled about how the zfs vdisk open in
the kernel is (or isn't) being intercepted such that drbd would detect it
and auto-promote.
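For anyone curious, this is roughly the direction: a minimal SystemTap
sketch, assuming the drbd module was built with debug info and assuming its
block-device open callback is really named drbd_open (both unverified on my
side):

stap -e '
probe module("drbd").function("drbd_open") {
    # FMODE_WRITE is 0x2; if ZFS never opens the device with that
    # bit set, auto-promote has nothing to react to
    printf("drbd_open mode=0x%x comm=%s pid=%d\n",
           $mode, execname(), pid())
}'

If the zfs module opens the backing vdev without FMODE_WRITE, that would
explain why drbd keeps treating the open as read-only.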


>
> br,
> Robert
>
> On 11/14/19 10:28 PM, Doug Cahill wrote:
> > I spent some more time looking into this with another developer and I
> > can see while running "drbdsetup events2 r0" that there is a quick
> > blip when I add the drbd r0 resource to my pool as the log device:
> >
> > change resource name:r0 role:Primary
> > change resource name:r0 role:Secondary
> >
> > However, if I export and/or import the pool, the event never registers
> > again.  When I write to a vdisk on this pool I can see the nr:11766480
> > dw:11766452 counts increase on the peer, which leads me to believe
> > blocks are being written, yet the role never changes.
> >
> > I also tried to run dd to the "peer" side drbd device while the
> > "active" side was writing data and found a message stating the peer
> > may not become primary while the device is opened read-only in my
> > syslog, which doesn't make sense.  The device is being written to, so
> > how is the block device state being tricked into thinking it is read-only?
> >
> > =========in the log from the node I'm writing to the drbd resource
> > drbd r0 dccdx0: Preparing remote state change 892694821
> > drbd r0: State change failed: Peer may not become primary while device
> > is opened read-only
> > kernel: [92771.927574] drbd r0 dccdx0: Aborting remote state change
> > 892694821
> >
> > On Thu, Nov 14, 2019 at 10:39 AM Doug Cahill <handruin at gmail.com> wrote:
> >
> >     On Thu, Nov 14, 2019 at 4:52 AM Roland Kammerer
> >     <roland.kammerer at linbit.com> wrote:
> >
> >         On Wed, Nov 13, 2019 at 03:08:37PM -0500, Doug Cahill wrote:
> >         > I'm configuring a two node setup with drbd 9.0.20-1 on CentOS 7
> >         > (3.10.0-957.1.3.el7.x86_64) with a single resource backed by
> >         an SSD.  I've
> >         > explicitly enabled auto-promote in my resource configuration
> >         to use this
> >         > feature.
> >         >
> >         > The drbd device is being used in a single-primary
> >         configuration as a zpool
> >         > SLOG device.  The zpool is only ever imported on one node at
> >         a time and the
> >         > import is successful during cluster failover events between
> >         nodes.  I
> >         > confirmed through zdb that the zpool includes the configured
> >         drbd device
> >         > path.
> >         >
> >         > My concern is that the drbdadm status output shows the Role
> >         of the drbd
> >         resource as "Secondary" on both sides.  The documentation
> >         reads that the
> >         > drbd resource will be auto promoted to primary when it is
> >         opened for
> >         > writing.
> >
> >         But also demoted when closed (don't know if this happens in your
> >         scenario).
> >
> >         > drbdadm status
> >         > r0 role:Secondary
> >         >   disk:UpToDate
> >         >   dccdx0 role:Secondary
> >         >     peer-disk:UpToDate
> >
> >         Maybe it is closed and demoted again and you look at it at the
> >         wrong
> >         points in time? Better look into the syslog for role changes,
> >         or monitor
> >         with "drbdsetup events2 r0". Do you see switches to Primary there?
> >
> >
> >     I checked the drbdadm status while my dd write session was in
> >     progress and I see no change from Secondary to Primary.  I also
> >     checked the stats under /sys/class and they look the same.
> >
> >     cat /sys/kernel/debug/drbd/resources/r0/connections/dccdx0/0/proc_drbd
> >      0: cs:Established ro:Secondary/Secondary ds:UpToDate/UpToDate C
> >     r-----
> >         ns:3330728 nr:0 dw:20103080 dr:26292 al:131 bm:0 lo:0 pe:[0;0]
> >     ua:0 ap:[0;0] ep:1 wo:1 oos:0
> >     resync: used:0/61 hits:64 misses:4 starving:0 locked:0 changed:2
> >     act_log: used:0/1237 hits:28951 misses:536 starving:0 locked:0
> >     changed:132
> >     blocked on activity log: 0/0/0
> >
> >
> >         Best, rck
>
>
> --
> Robert ALTNOEDER - Software Developer
> +43-1-817-82-92 x72
> robert.altnoeder at linbit.com
>
> LINBIT <http://www.linbit.com/en/> | Keeping the Digital World Running
> DRBD HA - Disaster Recovery - Software-defined Storage
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.