Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi,
How did you turn off the cache? And how much did that influence performance?
I'll try it, but I'd rather find a solution that doesn't involve turning off the cache.
Wiebe
----- Original Message -----
> From: "Marcel Kraan" <marcel at kraan.net>
> To: "Wiebe Cazemier" <wiebe at halfgaar.net>
> Cc: drbd-user at lists.linbit.com
> Sent: Thursday, 31 May, 2012 7:58:10 AM
> Subject: Re: [DRBD-user] Xen DomU on DRBD device: barrier errors
> Hello Wiebe,
> I had that too.
> But when I turned off the cache on the storage and on the other VM
> clients, the error was gone.
> So with no cache on the VMs it worked very well.
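> On a 3ware controller like the 9650SE you mention, turning the unit
> write cache off should be something like this (the controller and
> unit numbers here are a guess for your box; "tw_cli show" lists them):
>
> tw_cli /c0/u0 set cache=off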
> marcel
> On 30 mei 2012, at 17:13, Wiebe Cazemier wrote:
> > Hi,
>
> > I'm setting up a test Xen DomU on DRBD storage for easy failover.
> > Most of the time, immediately after booting the DomU, I get an I/O
> > error:
>
> > [ 3.153370] EXT3-fs (xvda2): using internal journal
> > [ 3.277115] ip_tables: (C) 2000-2006 Netfilter Core Team
> > [ 3.336014] nf_conntrack version 0.5.0 (3899 buckets, 15596 max)
> > [ 3.515604] init: failsafe main process (397) killed by TERM signal
> > [ 3.801589] blkfront: barrier: write xvda2 op failed
> > [ 3.801597] blkfront: xvda2: barrier or flush: disabled
> > [ 3.801611] end_request: I/O error, dev xvda2, sector 52171168
> > [ 3.801630] end_request: I/O error, dev xvda2, sector 52171168
> > [ 3.801642] Buffer I/O error on device xvda2, logical block 6521396
> > [ 3.801652] lost page write due to I/O error on xvda2
> > [ 3.801755] Aborting journal on device xvda2.
> > [ 3.804415] EXT3-fs (xvda2): error: ext3_journal_start_sb: Detected aborted journal
> > [ 3.804434] EXT3-fs (xvda2): error: remounting filesystem read-only
> > [ 3.814754] journal commit I/O error
> > [ 6.973831] init: udev-fallback-graphics main process (538) terminated with status 1
> > [ 6.992267] init: plymouth-splash main process (546) terminated with status 1
>
> > The drbdsetup manpage says that LVM (which I use) doesn't support
> > barriers (better known as "tagged command queuing" or "native
> > command queuing"), so I configured the DRBD device not to use
> > barriers. This can be seen in /proc/drbd (the "wo:f" field, meaning
> > flush, the next method DRBD chooses after barrier):
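> > (Checked like this on the primary; the output below is trimmed to
> > the resource in question:)
> >
> > cat /proc/drbd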
>
> > 3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
> >    ns:2160152 nr:520204 dw:2680344 dr:2678107 al:3549 bm:9183 lo:0
> >    pe:0 ua:0 ap:0 ep:1 wo:f oos:0
>
> > And on the other host:
>
> > 3: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
> >    ns:0 nr:2160152 dw:2160152 dr:0 al:0 bm:8052 lo:0 pe:0 ua:0 ap:0
> >    ep:1 wo:f oos:0
>
> > I also enabled the option disable_sendpage, as per the DRBD docs:
>
> > cat /sys/module/drbd/parameters/disable_sendpage
> > Y
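> > One way to make that stick across reboots is a module option (the
> > file name is arbitrary, any .conf file in /etc/modprobe.d works):
> >
> > # /etc/modprobe.d/drbd.conf
> > options drbd disable_sendpage=1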
>
> > I also tried adding barriers=0 to fstab as a mount option. Still it
> > says:
>
> > [ 58.603896] blkfront: barrier: write xvda2 op failed
> > [ 58.603903] blkfront: xvda2: barrier or flush: disabled
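> > For reference, the fstab line I was testing with looked roughly
> > like this (if I read the ext3 documentation right, the option is
> > actually spelled barrier=0; the mount point and pass numbers are
> > just my setup):
> >
> > /dev/xvda2  /  ext3  defaults,barrier=0  0  1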
>
> > I don't even know if ext3 has a nobarrier option, but it does seem
> > to work. However, because only one of my storage systems is battery
> > backed, disabling barriers there would not be smart.
>
> > Why does it still complain about barriers when I have disabled them?
>
> > Both hosts are:
> >
> > Debian: 6.0.4
> > uname -a: Linux 2.6.32-5-xen-amd64
> > drbd: 8.3.7
> > Xen: 4.0.1
> >
> > Guest:
> >
> > Ubuntu 12.04 LTS
> > uname -a: Linux 3.2.0-24-generic pvops
>
> > drbd resource:
> >
> > resource drbdvm
> > {
> >     meta-disk internal;
> >     device /dev/drbd3;
> >
> >     startup
> >     {
> >         # The timeout value when the last known state of the other
> >         # side was available. 0 means infinite.
> >         wfc-timeout 0;
> >
> >         # Timeout value when the last known state was disconnected.
> >         # 0 means infinite.
> >         degr-wfc-timeout 180;
> >     }
> >
> >     syncer
> >     {
> >         # This is recommended only for low-bandwidth lines, to only
> >         # send those blocks which really have changed.
> >         #csums-alg md5;
> >
> >         # Set to about half your net speed.
> >         rate 60M;
> >
> >         # It seems that this option moved to the 'net' section in
> >         # drbd 8.4 (a later release than Debian currently has).
> >         verify-alg md5;
> >     }
> >
> >     net
> >     {
> >         # The manpage says this is recommended only in pre-production
> >         # (because of its performance cost), to determine if your
> >         # LAN card has a TCP checksum offloading bug.
> >         #data-integrity-alg md5;
> >     }
> >
> >     disk
> >     {
> >         # Detach causes the device to work over-the-network-only
> >         # after the underlying disk fails. Detach is not the default
> >         # for historical reasons, but is recommended by the docs.
> >         # However, the Debian defaults in drbd.conf suggest the
> >         # machine will reboot in that event...
> >         on-io-error detach;
> >
> >         # LVM doesn't support barriers, so disable them. DRBD will
> >         # revert to flush. Check wo: in /proc/drbd. If you don't
> >         # disable them, you get I/O errors.
> >         no-disk-barrier;
> >     }
> >
> >     on host1
> >     {
> >         # universe is a VG
> >         disk /dev/universe/drbdvm-disk;
> >         address 10.0.0.1:7792;
> >     }
> >
> >     on host2
> >     {
> >         # universe is a VG
> >         disk /dev/universe/drbdvm-disk;
> >         address 10.0.0.2:7792;
> >     }
> > }
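> > For completeness, the DomU config attaches the device with the
> > plain phy: backend, roughly like this (the config file name and the
> > xvda2 mapping are just my setup; DRBD's docs also describe a drbd:
> > disk type that uses a helper script):
> >
> > # /etc/xen/drbdvm.cfg -- disk line only
> > disk = [ 'phy:/dev/drbd3,xvda2,w' ]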
>
> > In my test setup, the primary host's storage is a 3ware 9650SE
> > SATA-II PCIe RAID controller with battery backup; the secondary is
> > software RAID1.
>
> > Isn't DRBD+Xen widely used? With these problems, it's not going to
> > work.
>
> > Any help welcome.