[DRBD-user] Effects of zeroing a DRBD device before use

Patrick Zwahlen paz at navixia.com
Fri Oct 8 13:47:28 CEST 2010

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi Sunny, and thanks for your input.

> -----Original Message-----
> From: Sunny [mailto:sunyucong at gmail.com]
> Sent: jeudi 7 octobre 2010 18:41
> To: Patrick Zwahlen
> Cc: drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] Effects of zeroing a DRBD device before use
> 
> For option 2) I am having the same problem. I roughly traced it to
> having enabled "Storage IO Control" on the parent VMFS datastore,
> which has a congestion-control throttle from 30ms to 100ms; that's
> not necessarily enough in this setup.

My current semi-prod setup is running ESX 4.0u2, so I haven't touched/enabled "Storage IO Control". I will make sure it remains disabled when moving to ESX 4.1.
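
For what it's worth, if latency throttling is indeed the culprit, a quick sanity check from inside the Fedora VM during a stall would be something like this (just a sketch; it assumes the sysstat package is installed and that sdb/sdc are the DRBD backing disks):

    # 'await' climbing into the hundreds of milliseconds points at
    # throttling/congestion underneath the VM rather than at DRBD itself
    iostat -x 2 sdb sdc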

> So the chain is: VM writes to iSCSI -> iSCSI target writes to VMFS ->
> VMFS writes get throttled -> iSCSI target stalls -> VM hangs.
> 
> In my case, ESXi itself also locks up because of the iSCSI I/O queue
> overflow, which is very bad :-(
> 
> So, disabling Storage IO Control on the VMFS datastore where your
> iSCSI target host resides may help. Also, moving to a dedicated
> machine with a backup plan is not that expensive and is more reliable.
> 
> 
> On Thu, Oct 7, 2010 at 7:43 AM, Patrick Zwahlen <paz at navixia.com>
> wrote:
> > Hi all,
> >
> > I'm looking for some input from the experts!
> >
> > Short story
> > -----------
> > Zeroing my DRBD device before using it turns a non-working system
> > into a working one, and I'm trying to figure out why. I'm also trying
> > to understand whether I will have other problems down the road.
> >
> > Long story
> > ----------
> > I am building a pair of redundant iSCSI targets for VMware ESX 4.1,
> > using the following software components:
> > - Fedora 12 x86_64
> > - DRBD 8.3.8.1
> > - pacemaker 1.0.9
> > - corosync 1.2.8
> > - SCST iSCSI Target (using SVN trunk, almost 2.0)
> >
> > SCST isn't cluster aware, so I'm using DRBD in primary/secondary
> > mode. I'm creating two iSCSI targets, one on each node, with mutual
> > failover and no multipath. As a reference for the discussion, I'm
> > attaching my resource agent, my CIB and my DRBD config files. The
> > resource agent is a modification of iSCSITarget/iSCSILun with some
> > SCST specifics.
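> >
> > For context, the failover logic in the CIB boils down to something
> > like the following in crm shell syntax (a simplified sketch, not the
> > literal attached config; 'r0', 'p_drbd_r0' and 'g_iscsi_tgt0' are
> > placeholder names):
> >
> >     primitive p_drbd_r0 ocf:linbit:drbd params drbd_resource="r0"
> >     ms ms_drbd_r0 p_drbd_r0 \
> >         meta master-max="1" master-node-max="1" \
> >              clone-max="2" clone-node-max="1" notify="true"
> >     # the SCST target/LUN group may only run where DRBD is Primary
> >     colocation c_tgt0_on_drbd inf: g_iscsi_tgt0 ms_drbd_r0:Master
> >     order o_drbd_before_tgt0 inf: ms_drbd_r0:promote g_iscsi_tgt0:start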
> >
> > When running this setup on a pair of physical hosts, everything
> > works fine. However, my interest is in small setups and I want to run
> > the two targets in VMs, hosted on the ESX hosts that will be the
> > iSCSI initiators. The market calls this a virtual SAN... I know, I
> > know, this is not recommended, but it definitely exists as commercial
> > solutions and makes a lot of sense for small setups. I'm not looking
> > for performance, but for high availability.
> >
> > This being said, I have two ways to present physical disk space to
> > DRBD (it shows up as /dev/sdb and /dev/sdc in the VMs); rough
> > ESX-side commands for both are sketched below:
> >
> > 1) Map RAID volumes to the Fedora VMs using RDM (Raw Device Mapping)
> > 2) Format the RAID volumes with VMFS, and create virtual disks (VMDKs)
> >    in that datastore for the Fedora VMs.
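> >
> > On the ESX side these two options translate roughly into the
> > following (illustrative only; the naa device ID and datastore/VM
> > paths are placeholders):
> >
> >     # Option 1: create an RDM pointer file for the raw LUN
> >     # (physical compatibility mode) and attach it to the Fedora VM
> >     vmkfstools -z /vmfs/devices/disks/naa.<device_id> \
> >         /vmfs/volumes/datastore1/fedora-san1/rdm_sdb.vmdk
> >
> >     # Option 2: create a plain (lazy-zeroed thick) virtual disk on
> >     # the VMFS datastore
> >     vmkfstools -c 200G /vmfs/volumes/datastore1/fedora-san1/sdb.vmdk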
> >
> > Option 1) obviously works better, but is not always possible (many
> > restrictions on RAID controllers, for instance).
> >
> > Option 2) works fine until I put iSCSI WRITE load on my Fedora VM.
> > When using large blocks, I quickly end up with stalled VMs. The iSCSI
> > target complains that the backend device doesn't respond, and the
> > kernel gives me 120-second timeouts for the DRBD threads. The DRBD
> > backend devices appear dead. At this stage there is no iSCSI traffic
> > anymore, CPU usage is zero, memory is fine: pure starvation.
> > Rebooting the Fedora VM solves the problem. Seen from a DRBD/SCST
> > point of view, it's as if the backend hardware were failing. However,
> > the physical disks/arrays are fine. The problem is clearly within
> > VMware.
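> >
> > The symptom inside the VM is the kernel's hung-task warnings; a quick
> > way to confirm (just a sketch) is:
> >
> >     # DRBD/SCST threads show up as blocked tasks in the kernel log
> >     dmesg | grep "blocked for more than 120 seconds"
> >     # the 120s value comes from this sysctl; raising it only hides
> >     # the warning, it doesn't fix the underlying stall
> >     sysctl kernel.hung_task_timeout_secs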
> >
> > One of the VMware recommendations is to create the large VMDKs as
> > 'eagerZeroedThick', which basically zeroes everything before use.
> > This helps, but doesn't solve the problem completely.
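> >
> > Creating such a disk from the command line looks roughly like this
> > (path and size are placeholders):
> >
> >     # pre-allocate and zero the whole VMDK up front instead of on
> >     # first write
> >     vmkfstools -c 200G -d eagerzeroedthick \
> >         /vmfs/volumes/datastore1/fedora-san1/sdb.vmdk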
> >
> > I then tried a third option: format /dev/drbd0 with XFS, create one
> > BIG file (using dd) on that filesystem, and export this file via
> > iSCSI/SCST (instead of exporting the /dev/drbd0 block device
> > directly). I couldn't crash this setup, but I don't like the idea of
> > having a single 200G file on a 99% full filesystem.
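> >
> > For reference, option 3 was set up roughly like this (mount point and
> > file name are just examples; SCST then exports the file as a
> > file-backed LUN):
> >
> >     mkfs.xfs /dev/drbd0
> >     mount /dev/drbd0 /mnt/export
> >     # one big backing file for the exported LUN (~200G)
> >     dd if=/dev/zero of=/mnt/export/lun0.img bs=1M count=204800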
> >
> > This brought me to option 4: I directly export /dev/drbd0 via SCST
> > (same as options 1 and 2), but before using it, I issue:
> >
> >        dd if=/dev/zero of=/dev/drbd0 bs=4096
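> >
> > A larger block size makes the zeroing pass noticeably faster; the
> > exact numbers are only an example:
> >
> >        dd if=/dev/zero of=/dev/drbd0 bs=1M oflag=direct
> >        # dd exits with "No space left on device" once the whole
> >        # device has been written; that's expected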
> >
> > I have now been running this setup for 2 weeks, trying to put as
> > much load on it as I can (mainly using dd, bonnie++, DiskTT and
> > running VMware Storage vMotion). The only issue I have faced is that
> > sometimes the pacemaker 'monitor' action takes more than 20 seconds
> > to run on DRBD, so I have increased this timeout to 60s. Since then,
> > no problem at all!
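> >
> > In crm syntax that just means raising the timeout on the DRBD
> > primitive's monitor operations (a sketch with placeholder names and
> > intervals, not the literal attached CIB):
> >
> >     primitive p_drbd_r0 ocf:linbit:drbd \
> >         params drbd_resource="r0" \
> >         op monitor interval="20s" role="Master" timeout="60s" \
> >         op monitor interval="30s" role="Slave" timeout="60s"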
> >
> > As you can imagine, I'm pretty happy with the setup, but I still
> > don't fully understand why it now works. I hate these situations...
> >
> > Can zeroing make such a big difference? Does it just make a
> > difference at the RAID/disk level, or does it also make a difference
> > at the DRBD level?
> >
> > Sorry for the long e-mail, and thanks a ton for any input. - Patrick -
> >
> > PS: Based on my reading, many people are trying to implement such
> > solutions. XtraVirt had a VM at some point, but not anymore. People
> > are trying to do it with OpenFiler, but IET and VMware don't like
> > each other. My setup is not documented the way it should be, but I'm
> > ready to share if anyone wants to play with it.