[DRBD-user] Xen DomU on DRBD device: barrier errors

Wiebe Cazemier wiebe at halfgaar.net
Thu May 31 09:12:50 CEST 2012

I can't seem to find any option to disable disk caching in Xen. Perhaps it has to do with the fact that Xen shares its disk driver with the dom0, because of the paravirtualized drivers? I can see that you use virtio and not emulation, so I guess that shouldn't be an issue, but I don't know.
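
For reference, the DomU's disk line would presumably look something like the sketch below (xm/xl syntax; the backing device is taken from the DRBD config further down, the file name and target device are assumed), and I don't see any cache-related field that could go in it:

    # /etc/xen/drbdvm.cfg -- sketch, file name and target device assumed
    disk = [ 'phy:/dev/drbd3,xvda,w' ]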

----- Original Message -----

> From: "Marcel Kraan" <marcel at kraan.net>
> To: "Wiebe Cazemier" <wiebe at halfgaar.net>
> Cc: drbd-user at lists.linbit.com
> Sent: Thursday, 31 May, 2012 8:19:31 AM
> Subject: Re: [DRBD-user] Xen DomU on DRBD device: barrier errors

> They say that the original host does the caching.
> In KVM it's easy to turn down the cache.
> I use libvirtd and qemu-kvm to start the VMs; see attachment.
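
Since that attachment was scrubbed from the archive: a libvirt disk stanza with host caching turned off typically looks something like the sketch below (source path and target are placeholders, not the actual values from the attachment):

    <disk type='block' device='disk'>
      <!-- cache='none' bypasses the host page cache for this disk -->
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/vg0/guest-disk'/>
      <target dev='vda' bus='virtio'/>
    </disk>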

> On 31 May 2012, at 08:07, Wiebe Cazemier wrote:

> > Hi,
> >
> > How did you turn off the cache? And how much did that influence
> > performance?
> >
> > I'll try, but I'd rather find a solution that doesn't involve
> > turning off the cache.
> >
> > Wiebe
> >
> > ----- Original Message -----
> >
> > > From: "Marcel Kraan" <marcel at kraan.net>
> > > To: "Wiebe Cazemier" <wiebe at halfgaar.net>
> > > Cc: drbd-user at lists.linbit.com
> > > Sent: Thursday, 31 May, 2012 7:58:10 AM
> > > Subject: Re: [DRBD-user] Xen DomU on DRBD device: barrier errors

> > > Hello Wiebe,
> > >
> > > I had that also. But when I turned down the cache on the storage
> > > and on the other VM clients, the error was gone. So with no cache
> > > on the VMs it worked very well.
> > >
> > > marcel
> > >
> > > On 30 May 2012, at 17:13, Wiebe Cazemier wrote:

> > > > Hi,
> > > >
> > > > I'm testing setting up a Xen DomU with DRBD storage for easy
> > > > failover. Most of the time, immediately after booting the DomU,
> > > > I get an I/O error:
> > > > [ 3.153370] EXT3-fs (xvda2): using internal journal
> > > > [ 3.277115] ip_tables: (C) 2000-2006 Netfilter Core Team
> > > > [ 3.336014] nf_conntrack version 0.5.0 (3899 buckets, 15596 max)
> > > > [ 3.515604] init: failsafe main process (397) killed by TERM signal
> > > > [ 3.801589] blkfront: barrier: write xvda2 op failed
> > > > [ 3.801597] blkfront: xvda2: barrier or flush: disabled
> > > > [ 3.801611] end_request: I/O error, dev xvda2, sector 52171168
> > > > [ 3.801630] end_request: I/O error, dev xvda2, sector 52171168
> > > > [ 3.801642] Buffer I/O error on device xvda2, logical block 6521396
> > > > [ 3.801652] lost page write due to I/O error on xvda2
> > > > [ 3.801755] Aborting journal on device xvda2.
> > > > [ 3.804415] EXT3-fs (xvda2): error: ext3_journal_start_sb: Detected aborted journal
> > > > [ 3.804434] EXT3-fs (xvda2): error: remounting filesystem read-only
> > > > [ 3.814754] journal commit I/O error
> > > > [ 6.973831] init: udev-fallback-graphics main process (538) terminated with status 1
> > > > [ 6.992267] init: plymouth-splash main process (546) terminated with status 1

> > > > The manpage of drbdsetup says that LVM (which I use) doesn't
> > > > support barriers (better known as "tagged command queuing" or
> > > > "native command queuing"), so I configured the DRBD device not
> > > > to use barriers. This can be seen in /proc/drbd ("wo:f", meaning
> > > > flush, the next method DRBD chooses after barrier):
> > > >
> > > > 3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
> > > >    ns:2160152 nr:520204 dw:2680344 dr:2678107 al:3549 bm:9183 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> > > >
> > > > And on the other host:
> > > >
> > > > 3: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
> > > >    ns:0 nr:2160152 dw:2160152 dr:0 al:0 bm:8052 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
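
(A quick way to check which write-ordering method a resource ended up with; in DRBD 8.3 the single-letter codes are b = barrier, f = flush, d = drain, n = none:)

    grep 'wo:' /proc/drbd    # wo:f here confirms flush is in use instead of barrier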

> > > > I also enabled the option disable_sendpage, as per the DRBD docs:
> > > >
> > > > cat /sys/module/drbd/parameters/disable_sendpage
> > > > Y
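
(For completeness, a sketch of how that parameter can be set, assuming the usual module-parameter mechanisms; the sysfs write only works if the module exposes the parameter as writable:)

    # at runtime, if writable
    echo 1 > /sys/module/drbd/parameters/disable_sendpage
    # persistently, picked up the next time the drbd module is loaded
    echo 'options drbd disable_sendpage=1' >> /etc/modprobe.d/drbd.conf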

> > > > I also tried adding barriers=0 to fstab as a mount option. Still it says:
> > > >
> > > > [ 58.603896] blkfront: barrier: write xvda2 op failed
> > > > [ 58.603903] blkfront: xvda2: barrier or flush: disabled

> > > > I don't even know if ext3 has a nobarrier option, but it does
> > > > seem to work. But because only one of my storage systems is
> > > > battery-backed, it would not be smart.
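
(As far as I know, ext3 spells the mount option barrier=0 rather than nobarrier; a sketch of the fstab line, with the device and mount point assumed:)

    # /etc/fstab inside the DomU -- sketch
    /dev/xvda2   /   ext3   defaults,barrier=0   0   1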

> > > > Why does it still complain about barriers when I have disabled them?

> > > > Both hosts are:
> > > >
> > > > Debian: 6.0.4
> > > > uname -a: Linux 2.6.32-5-xen-amd64
> > > > drbd: 8.3.7
> > > > Xen: 4.0.1
> > > >
> > > > Guest:
> > > >
> > > > Ubuntu 12.04 LTS
> > > > uname -a: Linux 3.2.0-24-generic (pvops)

> > > > drbd resource:
> > > >
> > > > resource drbdvm
> > > > {
> > > >     meta-disk internal;
> > > >     device    /dev/drbd3;
> > > >
> > > >     startup
> > > >     {
> > > >         # The timeout value when the last known state of the other
> > > >         # side was available. 0 means infinite.
> > > >         wfc-timeout 0;
> > > >
> > > >         # Timeout value when the last known state was disconnected.
> > > >         # 0 means infinite.
> > > >         degr-wfc-timeout 180;
> > > >     }
> > > >
> > > >     syncer
> > > >     {
> > > >         # This is recommended only for low-bandwidth lines, to only
> > > >         # send those blocks which really have changed.
> > > >         #csums-alg md5;
> > > >
> > > >         # Set to about half your net speed.
> > > >         rate 60M;
> > > >
> > > >         # It seems that this option moved to the 'net' section in
> > > >         # drbd 8.4 (a later release than Debian currently has).
> > > >         verify-alg md5;
> > > >     }
> > > >
> > > >     net
> > > >     {
> > > >         # The manpage says this is recommended only in pre-production
> > > >         # (because of its performance impact), to determine if your
> > > >         # LAN card has a TCP checksum offloading bug.
> > > >         #data-integrity-alg md5;
> > > >     }
> > > >
> > > >     disk
> > > >     {
> > > >         # Detach causes the device to work over-the-network-only after
> > > >         # the underlying disk fails. Detach is not the default for
> > > >         # historical reasons, but is recommended by the docs.
> > > >         # However, the Debian defaults in drbd.conf suggest the machine
> > > >         # will reboot in that event...
> > > >         on-io-error detach;
> > > >
> > > >         # LVM doesn't support barriers, so disable them. DRBD will
> > > >         # revert to flush; check wo: in /proc/drbd. If you don't
> > > >         # disable them, you get I/O errors.
> > > >         no-disk-barrier;
> > > >     }
> > > >
> > > >     on host1
> > > >     {
> > > >         # universe is a VG
> > > >         disk    /dev/universe/drbdvm-disk;
> > > >         address 10.0.0.1:7792;
> > > >     }
> > > >
> > > >     on host2
> > > >     {
> > > >         # universe is a VG
> > > >         disk    /dev/universe/drbdvm-disk;
> > > >         address 10.0.0.2:7792;
> > > >     }
> > > > }
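
(A sketch of how a change like no-disk-barrier gets applied to a running resource with the standard tooling; resource name taken from the config above:)

    drbdadm adjust drbdvm    # re-read the edited config and apply it to the live resource
    cat /proc/drbd           # wo: should now show f (flush) rather than b (barrier)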

> > > > In my test setup: the primary host's storage is a 9650SE SATA-II
> > > > RAID PCIe controller with battery. The secondary is software RAID1.

> > > > Isn't DRBD+Xen widely used? With these problems, it's not going
> > > > to work.
> > > >
> > > > Any help welcome.

> > > > _______________________________________________
> > > > drbd-user mailing list
> > > > drbd-user at lists.linbit.com
> > > > http://lists.linbit.com/mailman/listinfo/drbd-user
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2012-05-31 at 08.17.55.png
Type: image/png
Size: 173706 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20120531/adfe932d/attachment.png>

