[DRBD-user] DRBD 9 auto-promote not changing role to Primary, but is writable
Doug Cahill
handruin at gmail.com
Mon Nov 18 16:14:03 CET 2019
To follow up: one of our other engineers may have discovered why drbd won't
auto-promote in this use case. It turns out that in ZFS 0.7.12 the device is
opened with the flag FMODE_EXCL passed into blkdev_get_by_path(), which drbd
does not treat as a write open during auto-promote. He plans to back-port a
change from ZFS 0.8 that opens the device with FMODE_WRITE, which should
trigger the auto-promote code path in drbd.
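For anyone following along, the distinction can be illustrated from userspace with POSIX open flags, where O_EXCL roughly plays the role of the kernel's FMODE_EXCL and the access mode that of FMODE_WRITE. This is only a sketch of the logic auto-promote appears to key on, not DRBD or ZFS code, and the function name is illustrative:

```python
import os

# On Linux the access mode occupies the low two bits of the open flags:
# O_RDONLY = 0, O_WRONLY = 1, O_RDWR = 2.
ACCMODE = 0o3

def wants_write(flags: int) -> bool:
    """Userspace analogue of a driver testing FMODE_WRITE at open time."""
    return (flags & ACCMODE) in (os.O_WRONLY, os.O_RDWR)

# An exclusive open does not by itself signal write intent:
print(wants_write(os.O_EXCL))              # False: exclusive, but read mode
print(wants_write(os.O_RDWR | os.O_EXCL))  # True: exclusive and write mode
```

If auto-promote keys on the access mode, an exclusive-but-read-mode open like the first case would never trigger it, which would match the behavior we saw.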
On Fri, Nov 15, 2019 at 2:57 PM Doug Cahill <handruin at gmail.com> wrote:
>
>
>
> On Fri, Nov 15, 2019 at 4:34 AM Robert Altnoeder
> <robert.altnoeder at linbit.com> wrote:
>
>> Could you try a few things, so we can get a better picture of what's
>> happening there:
>> - Can you get a hash of the data on the backend device that DRBD is
>> writing to, from before and after one of those dubious Secondary-mode
>> writes, to verify whether or not any data is actually changed?
>>
>
> It looks like the sha256sum changes on both the primary and secondary
> local backing devices behind drbd when I write to the zpool that has the
> drbd log device.
>
> node0 before sha256sum
> [root at dccdx0 ~]# dd if=/dev/sda1 bs=8M iflag=direct | sha256sum
> 17179869184 bytes (17 GB) copied, 69.4921 s, 247 MB/s
> 730d0f908c64a42ffc168211350b87a72c72ed56de2feef4be0904342acf20ac -
>
> node1 before sha256sum
> [root at dccdx1 ~]# dd if=/dev/sda1 bs=8M iflag=direct | sha256sum
> 17179869184 bytes (17 GB) copied, 70.4586 s, 244 MB/s
> adbab9ee2a96ed476fe649cd10dc17994767190ae350a7be146c40427e272a73 -
>
> Write test:
> [root at dccdx0 ~]# dd if=/dev/urandom
> of=/dev/zvol/act_per_pool000/test_drbd bs=4k count=100000 oflag=sync,direct
> 409600000 bytes (410 MB) copied, 44.8633 s, 9.1 MB/s
>
> node0 after sha256sum
> [root at dccdx0 ~]# dd if=/dev/sda1 bs=8M iflag=direct | sha256sum
> 17179869184 bytes (17 GB) copied, 71.6324 s, 240 MB/s
> e8c02e50daf281973b04ea1b76e6cdb8760a789245ade987ba5410deba68067d -
>
> node1 after sha256sum
> [root at dccdx1 ~]# dd if=/dev/sda1 bs=8M iflag=direct | sha256sum
> 2048+0 records in
> 2048+0 records out
> 17179869184 bytes (17 GB) copied, 68.326 s, 251 MB/s
> da8b90e8f57c20e4ea47a498157cb2865249d8b8cefc36aedb49a4467572924f -
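For repeating this check, the `dd ... | sha256sum` pipeline above can be wrapped in a few lines of script. This is a sketch only: it uses a scratch file in place of the real backing device (/dev/sda1), and the block size is illustrative:

```python
import hashlib
import os
import tempfile

def digest(path: str, bs: int = 8 * 1024 * 1024) -> str:
    """Stream a file/device through sha256, like `dd bs=8M | sha256sum`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bs):
            h.update(chunk)
    return h.hexdigest()

# Demo with a scratch file standing in for the backing device:
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

before = digest(path)
with open(path, "r+b") as f:   # simulate a write that reaches the device
    f.write(b"new log records")
after = digest(path)

print(before != after)   # True: the on-disk data actually changed
os.unlink(path)
```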
>
>
>> - Can you switch the other peer into the Primary role manually (so that
>> the node where the problem occurs should refuse to become a Primary) and
>> see what happens when ZFS tries to write to that log?
>>
>
> Attempt 1 with zpool imported:
> This is run on the secondary side, where the zpool is not imported:
> [root at dccdx1 ~]# drbdadm primary r0
> r0: State change failed: (-10) State change was refused by peer node
> additional info from kernel:
> Declined by peer dccdx0 (id: 1), see the kernel log there
> Command 'drbdsetup primary r0' terminated with exit code 11
>
> Info logged in /var/log/messages:
> dccdx1: Preparing remote state change 1839699090
> dccdx0 kernel: [171910.178046] drbd r0: State change failed: Peer may not
> become primary while device is opened read-only
> dccdx0 kernel: [171910.195954] drbd r0 dccdx1: Aborting remote state
> change 1839699090
>
> Attempt 2 with zpool exported:
> export the pool on primary node.
> secondary node I promote drbd resource to primary:
> [root at dccdx1 ~]# drbdadm primary r0
> [root at dccdx1 ~]# drbdadm status
> r0 role:Primary
> disk:UpToDate
> dccdx0 role:Secondary
> peer-disk:UpToDate
>
> import zpool on primary node with drbd device as secondary:
> [root at dccdx0 ~]# zpool import -f -o cachefile=none -d
> /dev/drbd/by-disk/disk/by-path -d /dev/disk/by-path -d /dev/mapper
> act_per_pool000
> The devices below are missing, use '-m' to import the pool anyway:
> pci-0000:18:00.0-scsi-0:0:2:0-part1 [log]
> cannot import 'act_per_pool000': one or more devices is currently
> unavailable
>
>
>
>> I tried to reproduce the problem from user space (with auto-promote off
>> and trying to read/write from/to a Secondary), where it does not seem to
>> happen (everything is normal, cannot even read from a Secondary).
>> However, I expect those ZFS operations to be done by some code in the
>> kernel itself, and something may not be playing by the rules there -
>> maybe ZFS is causing some I/O without doing a proper open/close cycle,
>> or we are missing something in DRBD for some I/O case that's supposed to
>> be valid.
>>
>
> Another dev is looking into using SystemTap so we can trace the kernel
> calls that open block devices and see why they aren't being flagged as
> writable opens. We are equally puzzled as to how the zfs vdisk open is
> (or is not) being seen by drbd so that it auto-promotes.
>
>
>>
>> br,
>> Robert
>>
>> On 11/14/19 10:28 PM, Doug Cahill wrote:
>> > I spent some more time looking into this with another developer and I
>> > can see while running "drbdsetup events2 r0" that there is a quick
>> > blip when I add the drbd r0 resource to my pool as the log device:
>> >
>> > change resource name:r0 role:Primary
>> > change resource name:r0 role:Secondary
>> >
>> > However, if I export and/or import the pool, the event never registers
>> > again. When I write to a vdisk on this pool I can see the nr:11766480
>> > dw:11766452 counts increase on the peer which leads me to believe
>> > blocks are being written, yet the state never changes.
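To catch that blip without watching the terminal, the events2 stream can be filtered with a small script. This is a sketch; the sample lines are the ones quoted above, and in real use you would feed it the output of `drbdsetup events2 r0` line by line:

```python
def role_changes(lines):
    """Yield (resource, role) for each role change in `drbdsetup events2` output."""
    for line in lines:
        if line.startswith("change resource") and "role:" in line:
            fields = dict(f.split(":", 1) for f in line.split()[2:])
            yield fields["name"], fields["role"]

# The two sample lines are the ones quoted in this thread:
sample = [
    "change resource name:r0 role:Primary",
    "change resource name:r0 role:Secondary",
]
print(list(role_changes(sample)))
# [('r0', 'Primary'), ('r0', 'Secondary')]  <- the quick blip
```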
>> >
>> > I also tried to run dd against the "peer" side drbd device while the
>> > "active" side was writing data, and found a message in my syslog
>> > stating the peer may not become primary while the device is opened
>> > read-only, which doesn't make sense. The device is being written to,
>> > so how is the block device being tricked into thinking it is opened
>> > read-only?
>> >
>> > =========in the log from the node I'm writing to the drbd resource
>> > drbd r0 dccdx0: Preparing remote state change 892694821
>> > drbd r0: State change failed: Peer may not become primary while device
>> > is opened read-only
>> > kernel: [92771.927574] drbd r0 dccdx0: Aborting remote state change
>> > 892694821
>> >
>> > On Thu, Nov 14, 2019 at 10:39 AM Doug Cahill <handruin at gmail.com> wrote:
>> >
>> > On Thu, Nov 14, 2019 at 4:52 AM Roland Kammerer
>> > <roland.kammerer at linbit.com> wrote:
>> >
>> > On Wed, Nov 13, 2019 at 03:08:37PM -0500, Doug Cahill wrote:
>> > > I'm configuring a two-node setup with drbd 9.0.20-1 on CentOS 7
>> > > (3.10.0-957.1.3.el7.x86_64) with a single resource backed by an
>> > > SSD. I've explicitly enabled auto-promote in my resource
>> > > configuration to use this feature.
>> > >
>> > > The drbd device is being used in a single-primary configuration as
>> > > a zpool SLOG device. The zpool is only ever imported on one node at
>> > > a time, and the import is successful during cluster failover events
>> > > between nodes. I confirmed through zdb that the zpool includes the
>> > > configured drbd device path.
>> > >
>> > > My concern is that the drbdadm status output shows the Role of the
>> > > drbd resource as "Secondary" on both sides. The documentation reads
>> > > that the drbd resource will be auto-promoted to primary when it is
>> > > opened for writing.
>> >
>> > But also demoted when closed (don't know if this happens in your
>> > scenario).
>> >
>> > > drbdadm status
>> > > r0 role:Secondary
>> > > disk:UpToDate
>> > > dccdx0 role:Secondary
>> > > peer-disk:UpToDate
>> >
>> > Maybe it is closed and demoted again, and you look at it at the
>> > wrong points in time? Better to look into the syslog for role
>> > changes, or monitor with "drbdsetup events2 r0". Do you see switches
>> > to Primary there?
>> >
>> >
>> > I checked the drbdadm status while my dd write session was in
>> > progress and I see no change from Secondary to Primary. I also
>> > checked the stats under /sys/class and it looks the same.
>> >
>> > cat /sys/kernel/debug/drbd/resources/r0/connections/dccdx0/0/proc_drbd
>> > 0: cs:Established ro:Secondary/Secondary ds:UpToDate/UpToDate C
>> > r-----
>> > ns:3330728 nr:0 dw:20103080 dr:26292 al:131 bm:0 lo:0 pe:[0;0]
>> > ua:0 ap:[0;0] ep:1 wo:1 oos:0
>> > resync: used:0/61 hits:64 misses:4 starving:0 locked:0 changed:2
>> > act_log: used:0/1237 hits:28951 misses:536 starving:0 locked:0
>> > changed:132
>> > blocked on activity log: 0/0/0
>> >
>> >
>> > Best, rck
>> > _______________________________________________
>> > Star us on GITHUB: https://github.com/LINBIT
>> > drbd-user mailing list
>> > drbd-user at lists.linbit.com
>> > https://lists.linbit.com/mailman/listinfo/drbd-user
>> >
>> >
>>
>>
>> --
>> Robert ALTNOEDER - Software Developer
>> +43-1-817-82-92 x72
>> robert.altnoeder at linbit.com
>>
>> LINBIT | Keeping the Digital World Running
>> DRBD HA - Disaster Recovery - Software-defined Storage
>>
>> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
>>
>