[DRBD-user] DRBD Trouble (block drbd0: local WRITE IO error sector)

Mon Feb 20 13:51:30 CET 2017

On Mon, Feb 20, 2017 at 02:01:54PM +0900, Seiichirou Hiraoka wrote:
> Hello Lars,
> Thank you for the kind answer.
> 
> As you can see below, we found that this problem can be avoided by
> disabling the WRITE SAME command.
> 
> 1) echo 0 | tee /sys/block/sdc/device/../0:0:2:0/scsi_disk/0:0:2:0/max_write_same_blocks
> This changes max_write_same_blocks from 65535 to 0. (We use sdc as the DRBD area.)
> 2) vgchange -an; vgchange -ay
> Because we use LVM, this makes LVM pick up the above change. (Is it OK to run
> this while the system is running? Please let me know if you know.)

"deactivating" a volume group deactivates all LVs in it.
You cannot deactivate an LV that is in use.
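
To see up front which LVs would block a "vgchange -an", something like
the following sketch may help. It assumes the usual lv_attr layout of
lvs, where the sixth character is "o" for an open device; the function
names lv_is_open and list_open_lvs are made up for this example.

```shell
#!/bin/sh
# Sketch only: list LVs of a volume group that are currently open (in use).
# Assumption: the sixth character of lv_attr is "o" when the device is open.

lv_is_open() {
    # $1: lv_attr string as printed by lvs, e.g. "-wi-ao----"
    case "$1" in
        ?????o*) return 0 ;;
        *)       return 1 ;;
    esac
}

list_open_lvs() {
    # $1: volume group name
    lvs --noheadings -o lv_name,lv_attr "$1" | while read -r name attr; do
        if lv_is_open "$attr"; then
            echo "$name"
        fi
    done
}
```

Calling "list_open_lvs <vgname>" prints nothing if no LV in that VG is
open; any name it prints has to be released (unmounted, DRBD detached or
stopped) before the VG can be deactivated.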

> Also, do you worry about performance degradation by disabling WRITE SAME?

No.

"WRITE SAME" can only be used to initialize larger areas with identical
blocks, and is typically used to efficiently zero-out stuff, without
re-sending tons of zeros through all stacks down to the backend device.

Ext4 for example uses it during first mount to finish the "lazy" journal
and inode table initialization. You could avoid that by simply disabling
the "lazy" flags on mkfs.ext4, i.e.
  mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 ...

> 2017-02-17 22:14 GMT+09:00 Lars Ellenberg <lars.ellenberg at linbit.com>:
> > On Fri, Feb 03, 2017 at 03:32:39PM +0900, Seiichirou Hiraoka wrote:
> >> Hello.
> >>
> >> I use DRBD in the following environment.
> >>
> >> OS: Redhat Enterprise Linux 7.1
> >> Pacemaker: 1.1.12 (CentOS Repository)
> >> Corosync: 2.3.4 (CentOS Repository)
> >> DRBD: 8.4.9 (ELRepo)
> >> # rpm -qi drbd84-utils
> >> Name        : drbd84-utils
> >> Version     : 8.9.2
> >> Release     : 2.el7.elrepo
> >> Architecture: x86_64
> >> Vendor: The ELRepo Project (http://elrepo.org)
> >> # rpm -qi kmod-drbd84
> >> Name        : kmod-drbd84
> >> Version     : 8.4.9
> >> Release     : 1.el7.elrepo
> >> Architecture: x86_64
> >>
> >> DRBD runs on two servers (server1, server2).
> >> The following error messages suddenly appeared,
> >> and writing to the DRBD area was no longer possible.
> >>
> >> . server1(master)
> >> Jan 20 10:41:16 server1 kernel: block drbd0: local WRITE IO error sector 118616936+40 on dm-0
> >> Jan 20 10:41:16 server1 kernel: block drbd0: disk( UpToDate -> Failed )
> >> Jan 20 10:41:16 server1 kernel: block drbd0: Local IO failed in __req_mod. Detaching...
> >> Jan 20 10:41:16 server1 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> >> Jan 20 10:41:16 server1 kernel: block drbd0: disk( Failed -> Diskless )
> >> Jan 20 10:41:16 server1 kernel: block drbd0: Got NegDReply; Sector 117512416s, len 4096.
> >> Jan 20 10:41:16 server1 kernel: drbd0: WRITE SAME failed. Manually zeroing.
> >
> >
> > That ^^ is the relevant hint.
> >
> > VMware "virtual" disks seem to love to pretend to be able to do WRITE SAME,
> > but when they actually see such requests, they fail them with an IO error.
> > (Not blaming VMware here, maybe other (real/virtual) disks show the same
> > behavior. It's just the most frequent "offender" currently.)
> >
> > That's not easy for DRBD to handle.
> > Next DRBD release will have a config switch to turn off write-same
> > support for specific DRBD volumes.
> >
> > Meanwhile, available workarounds:
> >
> > use a different type of virtual disk ("sata" may work), something that
> > does not claim to support something it then does not handle.
> >
> > or, *early* in the boot process (before you bring up DRBD),
> > disable write same like this:
> > echo 0 | tee /sys/block/*/device/../*/scsi_disk/*/max_write_same_blocks
> > (for the relevant backend devices)
> >
> > If you use LVM, you may need to vgchange -an ; vgchange -ay after that,
> > (at least for the relevant VGs), if they have already been activated.
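
For illustration, the sysfs workaround quoted above could be scripted
roughly as below, to be run early in boot before DRBD and LVM come up.
This is a sketch, not a tested init script: the function name
disable_write_same and the SYSFS_ROOT override (only there to allow a
dry run against a fake directory tree) are invented for this example,
and it assumes that device/scsi_disk/*/ is the same directory as the
device/../*/scsi_disk/*/ form used above.

```shell
#!/bin/sh
# Sketch only: set max_write_same_blocks to 0 for one SCSI backend disk.
# SYSFS_ROOT is a hypothetical override for dry runs; default is /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

disable_write_same() {
    # $1: kernel name of the backend disk, e.g. sdc
    dev="$1"
    found=0
    for f in "$SYSFS_ROOT"/block/"$dev"/device/scsi_disk/*/max_write_same_blocks; do
        # Skip if the glob did not match or the attribute is not writable.
        [ -w "$f" ] || continue
        echo 0 > "$f" && found=1
    done
    # Succeed only if at least one attribute was actually written.
    [ "$found" -eq 1 ]
}
```

Usage would be "disable_write_same sdc" (as root), once per relevant
backend device. A non-zero return means nothing was changed, for example
because the disk is not a SCSI disk or the name is wrong.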

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed