[DRBD-user] DRBD + ZFS

David Bruzos david.bruzos at jaxport.com
Wed Sep 1 00:30:14 CEST 2021


Eric,
    Cool, I'll try to help where I can.  I am not intimately familiar with MySQL internals, but the information in the article will apply to anything that writes to ZFS in blocks, so it is probably still applicable in your case, with whatever size adjustments make sense.  The key is to determine what your primary goal with ZFS is, then run some benchmarks and see whether your IOPS are where you need them.
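For the benchmarking step, one common way to approximate MySQL's 16KB random-write pattern is fio. This is only a sketch, not something from the thread; the directory path, sizes, and queue depth are placeholders you would adjust for your hardware:

```shell
# Hypothetical fio run approximating InnoDB's 16KB random writes.
# /tank/mysql-test is a placeholder directory on the ZFS dataset under test.
# Note: ZFS ARC caching can inflate results, so use a --size well beyond RAM
# or watch zpool iostat while the job runs.
fio --name=innodb-sim \
    --directory=/tank/mysql-test \
    --rw=randwrite --bs=16k \
    --ioengine=libaio \
    --iodepth=16 --numjobs=4 \
    --size=4G --runtime=60 --time_based \
    --group_reporting
```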

Good luck!


-- 
David Bruzos (Systems Administrator)
Jacksonville Port Authority
2831 Talleyrand Ave.
Jacksonville, FL  32206
Cell: (904) 625-0969
Office: (904) 357-3069
Email: david.bruzos at jaxport.com

On Tue, Aug 31, 2021 at 08:34:21PM +0000, Eric Robinson wrote:
> EXTERNAL
> This message is from an external sender.
> Please use caution when opening attachments, clicking links, and responding.
> If in doubt, contact the person or the helpdesk by phone.
> ________________________________
> 
> 
> David --
> 
> That is good feedback and thanks much for the link. If I gather correctly, the thrust of the article is related to InnoDB optimization. Believe it or not, we employ a hybrid model. Each of our databases consists of approximately 5000 tables of different sizes and structures. Most of them are still on MyISAM with only 20 or so on InnoDB. (In my experience over the past 15 years of hosting hundreds of MySQL databases, InnoDB is a bloated, fragile, resource-gulping freakshow, so we only use it for the handful of tables that demand it. That said, I realize most other people would see it differently.)
> 
> I hope you won't mind if I circle back and ask you some questions when the new servers get here and I start testing different approaches to storage.
> 
> > -----Original Message-----
> > From: David Bruzos <david.bruzos at jaxport.com>
> > Sent: Monday, August 30, 2021 6:26 AM
> > To: Eric Robinson <eric.robinson at psmnv.com>
> > Cc: rabin at isoc.org.il; drbd-user at lists.linbit.com
> > Subject: Re: [DRBD-user] DRBD + ZFS
> >
> > Hi Eric,
> >     Sorry about the delay.  The article you provided is interesting, but rather
> > specific to a workload that would show rather dramatic results on VDO.  In
> > your case, the main objective is making the most out of your NVMe storage,
> > while maintaining good performance.  The article would be very much
> > applicable if you were doing replication over a slow WAN link or something
> > like that, but I imagine that the network is not going to be a bottleneck for
> > you, so saving throughput at the DRBD layer is probably not a big advantage.
> >     The real space and performance killer (if done wrong) in your case is going
> > to be proper block alignments to optimize the mysql workload.  Depending
> > on your underlying storage's optimal block size (usually 4KB) and the vdev
> > type you want to use (e.g. raidz, mirror), you will have to make sure that
> > everything is optimized for mysql's 16KB writes.  As I pointed out earlier,
> > mirror will be simplest/fastest and raidz is doable, but will be slower for
> > writes (may not matter if you have enough IOPS).  The key is that with raidz,
> > you will have to take more factors into account to ensure everything is
> > optimal.  In my case for example, my newest setup uses raidz and
> > compression for making the most out of my NVMe, but I use ashift=9 (512
> > byte blocks) to be able to make 4K zvols for my VMs and still greatly benefit
> > from compression.
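To make the alignment arithmetic above concrete, here is a small sketch (not from the original mail; the helper name is made up for illustration) showing why ashift matters for MySQL's 16KB pages:

```shell
# sectors_per_block: how many physical sectors one logical block spans.
# ashift is the log2 of the physical sector size ZFS assumes for the vdev.
sectors_per_block() {
  local block_bytes=$1 ashift=$2
  echo $(( block_bytes / (1 << ashift) ))
}

# A 16KB InnoDB page maps onto 32 sectors with ashift=9 (512B sectors),
# but only 4 sectors with ashift=12 (4KB sectors) -- fewer, larger stripes
# is what makes raidz parity and padding overhead easier to reason about.
sectors_per_block 16384 9    # -> 32
sectors_per_block 16384 12   # -> 4
```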
> >     It is important to point out that the raidz details are not unique to ZFS.
> > Most people that use traditional raid5 setups use it in a suboptimal manner
> > and actually have terrible performance and either can't tell, or eventually
> > move to raid10, because "raid5 sucks".  In any case, to answer your question,
> > I would still use ZFS instead of VDO for multiple reasons and I would still use it
> > only under DRBD in this case.  You have a standard workload, so you should
> > be able to optimize it to fit your objectives.
> >
> > Here is a good article about mysql on ZFS that should get you started:
> >
> >     https://shatteredsilicon.net/blog/2020/06/05/mysql-mariadb-innodb-on-zfs/
> >
> >
> > David
> >
> > --
> > David Bruzos (Systems Administrator)
> > Jacksonville Port Authority
> > 2831 Talleyrand Ave.
> > Jacksonville, FL  32206
> > Cell: (904) 625-0969
> > Office: (904) 357-3069
> > Email: david.bruzos at jaxport.com
> >
> > On Tue, Aug 24, 2021 at 09:26:22PM +0000, Eric Robinson wrote:
> > >
> > >
> > > Hi David --
> > >
> > > Here is a link to a Linbit article about using DRBD with VDO. While the focus
> > of this article is VDO, I assume the compression recommendation would
> > apply to other technologies such as ZFS. As the article states, their goal was
> > to compress data before it gets passed off to DRBD, because then DRBD
> > replication is faster and more efficient. This was echoed in some follow-up
> > conversation I had with a Linbit rep (or someone from Red Hat, I forget
> > which).
> > >
> > > https://linbit.com/blog/albireo-virtual-data-optimizer-vdo-on-drbd/
> > >
> > > My use case is multi-tenant MySQL servers. I'll have 125+ separate
> > instances of MySQL running on each cluster node, all out of separate
> > directories and listening on separate ports. The instances will be divided into
> > 4 sets of 50, which live on 4 separate filesystems, on 4 separate DRBD disks.
> > I've used this approach before very successfully with up to 60 MySQL
> > instances, and now I'm dramatically increasing the server power and doubling
> > the number of instances. 4 separate DRBD threads will handle the replication.
> > I'll be using corosync+pacemaker for the HA stack. I'd really like to compress
> > the data and make the most of the available NVME media. The servers do
> > not have RAID controllers. I'll be using ZFS, mdraid, or LVM to create 4
> > separate arrays for my DRBD backing disks.
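The four-array layout described above could be sketched with ZFS zvols as the DRBD backing devices. Everything here (pool name, device names, sizes) is hypothetical and only illustrates the shape of the setup:

```shell
# Hypothetical: one pool, four zvols, each backing one DRBD resource.
zpool create -o ashift=12 tank mirror /dev/nvme0n1 /dev/nvme1n1

for i in 0 1 2 3; do
    # 16K volblocksize to line up with MySQL/InnoDB page writes,
    # compression below DRBD as discussed in the thread
    zfs create -V 500G -o volblocksize=16k -o compression=lz4 tank/mysql$i
    # each zvol then becomes the backing disk of DRBD resource mysql$i:
    # drbdadm create-md mysql$i && drbdadm up mysql$i
done
```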
> > >
> > > --Eric
> > >
> > > > -----Original Message-----
> > > > From: David Bruzos <david.bruzos at jaxport.com>
> > > > Sent: Tuesday, August 24, 2021 2:03 PM
> > > > To: Eric Robinson <eric.robinson at psmnv.com>
> > > > Cc: rabin at isoc.org.il; drbd-user at lists.linbit.com
> > > > Subject: Re: [DRBD-user] DRBD + ZFS
> > > >
> > > > Hello Eric:
> > > >
> > > > > What degree of performance degradation have you observed with DRBD
> > > > > over ZFS? Our servers will be using NVME drives with 25Gbit networking:
> > > >
> > > >     Unfortunately, I have not had the time to properly benchmark and
> > > > compare a setup like yours with DRBD on top of ZFS.  Very
> > > > superficial tests show that my I/O is more than sufficient for my
> > > > workload, so I'm then more interested in the data integrity,
> > > > snapshotting, compression, etc.  I would not want to create
> > > > misinformation by sharing I/O stats that are not taking into account
> > > > the many aspects of a proper ZFS benchmark and that are not being
> > compared against an alternative setup.
> > > >     In the days of spinning rust storage, I always used mirrored
> > > > vdevs, always added a fast ZIL, lots of RAM for ARC and a couple of
> > > > caching devices for L2ARC, so the performance was great when
> > compared with the alternatives.
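That spinning-disk layout (mirrored vdevs plus a fast ZIL device and L2ARC) would look roughly like this; device names are placeholders:

```shell
# Hypothetical pool of mirrored vdevs with a fast SLOG and L2ARC devices.
zpool create tank \
    mirror /dev/sda /dev/sdb \
    mirror /dev/sdc /dev/sdd \
    log   /dev/nvme0n1p1 \
    cache /dev/nvme0n1p2 /dev/nvme0n1p3
```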
> > > >
> > > > > Since you don't recommend having ZFS above DRBD, what filesystem
> > > > > do you use over DRBD?
> > > >
> > > >     I've always had good results with XFS on LVM (very thin).  That
> > > > combination usually gives you good flexibility at the VM level and
> > > > the performance is great.  These days, ext4 is a reasonable choice,
> > > > but I still use XFS most of the time.
> > > >     I would like to see what other folks think about the XFS+LVM
> > > > combination for VMs vs something like ext4+LVM.
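The "XFS on LVM (very thin)" combination could be sketched like this; volume group, pool, and sizes are hypothetical:

```shell
# Hypothetical: thin LVM pool on a DRBD device, with XFS on a thin volume.
pvcreate /dev/drbd0
vgcreate vg_vm /dev/drbd0
lvcreate --type thin-pool -L 400G -n pool0 vg_vm
# thin volumes are allocated on demand, giving the flexibility mentioned above
lvcreate --type thin -V 50G --thinpool pool0 -n vm1 vg_vm
mkfs.xfs /dev/vg_vm/vm1
```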
> > > >
> > > > > Linbit recommends that compression take place above DRBD rather than
> > > > > below. What are your thoughts about their recommendation versus your
> > > > > approach?
> > > >
> > > >     If you can provide a link to their recommendation, I can be more
> > specific.
> > > > In any case, I'm sure their recommendation is reasonable depending
> > > > on what your specific workload is.  In my case, I mostly use
> > > > compression at the backing storage level, because it gives me a
> > > > predictable and well understood VM environment where I can run a
> > > > wide variety of guest operating systems, applications, workloads,
> > > > etc, without having to worry about the specifics for each possible VM
> > scenario.
> > > >     The reason I normally don't use ZFS for VMs is because I believe
> > > > it best serves its purpose at the backing storage level for many
> > > > reasons.  ZFS is designed to leverage lots of RAM for ARC, to handle
> > > > the storage directly, to do many things with your hardware that are
> > > > very much abstracted away at the guest level.
> > > >
> > > > What is your specific usage scenario?
> > > >
> > > >
> > > > --
> > > > David Bruzos (Systems Administrator) Jacksonville Port Authority
> > > > 2831 Talleyrand Ave.
> > > > Jacksonville, FL  32206
> > > > Cell: (904) 625-0969
> > > > Office: (904) 357-3069
> > > > Email: david.bruzos at jaxport.com
> > > >
> > > > On Tue, Aug 24, 2021 at 03:21:10PM +0000, Eric Robinson wrote:
> > > > >
> > > > >
> > > > > Hi David --
> > > > >
> > > > > Thanks for your feedback! I do have a couple of follow-up
> > > > questions/comments.
> > > > >
> > > > > What degree of performance degradation have you observed with DRBD
> > > > > over ZFS? Our servers will be using NVME drives with 25Gbit networking.
> > > > > Since you don't recommend having ZFS above DRBD, what filesystem do
> > > > > you use over DRBD?
> > > > > Linbit recommends that compression take place above DRBD rather than
> > > > > below. What are your thoughts about their recommendation versus your
> > > > > approach?
> > > > >
> > > > > --Eric
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: David Bruzos <david.bruzos at jaxport.com>
> > > > > > Sent: Saturday, August 21, 2021 8:34 AM
> > > > > > To: Eric Robinson <eric.robinson at psmnv.com>
> > > > > > Cc: rabin at isoc.org.il; drbd-user at lists.linbit.com
> > > > > > Subject: Re: [DRBD-user] DRBD + ZFS
> > > > > >
> > > > > > Hello folks,
> > > > > >     I've used DRBD over ZFS for many years and my experience has
> > > > > > been very positive.  My primary use case has been virtual
> > > > > > machine backing storage for Xen hypervisors, with dom0 running ZFS
> > and DRBD.
> > > > > > The realtime nature of DRBD replication allows for VM
> > > > > > migrations, etc, and ZFS makes remote incremental backups
> > > > > > awesome.  Overall, it is a combination that is hard to beat.
> > > > > >
> > > > > > * Key things to keep in mind:
> > > > > >
> > > > > >     . The performance of DRBD on ZFS is not the best in the
> > > > > > world, but the benefits of a properly configured and used setup
> > > > > > far outweigh the performance costs.
> > > > > >     . If you are not limited by storage size (typical when
> > > > > > using rotating disks), I would absolutely recommend mirror vdevs
> > > > > > with ashift=12 for best results in most circumstances.
> > > > > >     . If space is a limiting factor (typical with SSD/NVME), I
> > > > > > use raidz, but careful considerations have to be made, so you
> > > > > > don't end up wasting tons of space, because of
> > ashift/blocksize/striping issues.
> > > > > >     . Compression works great under the DRBD devices, but
> > > > > > volblocksize/ashift details are extremely important to get the
> > > > > > most out of
> > > > it.
> > > > > >     . I would not create additional ZFS file systems on top of
> > > > > > the DRBD devices for compression or any other intensive feature,
> > > > > > just not worth it, you want that as close to the physical storage as
> > possible.
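The compression-under-DRBD point in the list above might look like this in practice; the pool and zvol names are made up for illustration:

```shell
# Compression on the backing zvol, under DRBD, not on a filesystem on top of it.
zfs create -V 200G -o volblocksize=16k tank/drbd_r0
zfs set compression=lz4 tank/drbd_r0
# after some real use, check how well the data actually compresses
zfs get compressratio tank/drbd_r0
```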
> > > > > >
> > > > > >     I do run a few ZFS file systems on virtual machines that are
> > > > > > backed by DRBD devices on top of ZFS, but I am after other ZFS
> > > > > > features in those cases.  The VMs running ZFS have
> > > > > > compression=off, no vdev redundancy, optimized volblocksize for
> > > > > > the situation/workload in question, etc.  My typical goto
> > > > > > filesystem for VMs is XFS, because it is lean-and-mean and has
> > > > > > the kind of features that
> > > > everyone should want in a general purpose FS.
> > > > > >
> > > > > > If you have specific questions, let me know.
> > > > > >
> > > > > > David
> > > > > >
> > > > > > --
> > > > > > David Bruzos (Systems Administrator) Jacksonville Port Authority
> > > > > > 2831 Talleyrand Ave.
> > > > > > Jacksonville, FL  32206
> > > > > > Cell: (904) 625-0969
> > > > > > Office: (904) 357-3069
> > > > > > Email: david.bruzos at jaxport.com
> > > > > >
> > > > > > On Fri, Aug 20, 2021 at 11:32:31AM +0000, Eric Robinson wrote:
> > > > > > >
> > > > > > > My main motivation is the desire for a compressed filesystem.
> > > > > > > I have
> > > > > > experimented with using VDO for that purpose and it works, but
> > > > > > the setup is complex and I don’t know if I trust it to work well
> > > > > > when VDO is in a stack of Pacemaker cluster resources. Is there
> > > > > > a better way of getting compression to work above DRBD?
> > > > > > >
> > > > > > > -Eric
> > > > > > >
> > > > > > >
> > > > > > > From: rabin at isoc.org.il <rabin at isoc.org.il>
> > > > > > > Sent: Thursday, August 19, 2021 4:43 PM
> > > > > > > To: Eric Robinson <eric.robinson at psmnv.com>
> > > > > > > Cc: drbd-user at lists.linbit.com
> > > > > > > Subject: Re: [DRBD-user] DRBD + ZFS
> > > > > > >
> > > > > > > Not sure ZFS is the right choice as an underlay for a
> > > > > > > resource; it is powerful but also complex (as a code base),
> > > > > > > which will probably make it slow.
> > > > > > >
> > > > > > > Unless you are going to expose the ZVOL or the dataset
> > > > > > > directly to be consumed, stacking ZFS over DRBD over ZFS
> > > > > > > seems to me like a bad idea.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Rabin
> > > > > > >
> > > > > > >
> > > > > > > On Wed, 18 Aug 2021 at 09:37, Eric Robinson
> > > > > > <eric.robinson at psmnv.com<mailto:eric.robinson at psmnv.com>>
> > wrote:
> > > > > > > I’m considering deploying DRBD between ZFS layers. The lowest
> > > > > > > layer
> > > > > > RAIDZ will serve as the DRBD backing device. Then I would build
> > > > > > another ZFS filesystem on top to benefit from compression. Any
> > > > > > thoughts, experiences, opinions, positive or negative?
> > > > > > >
> > > > > > > --Eric
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Disclaimer : This email and any files transmitted with it are
> > > > > > > confidential and
> > > > > > intended solely for intended recipients. If you are not the
> > > > > > named addressee you should not disseminate, distribute, copy or
> > > > > > alter this email. Any views or opinions presented in this email
> > > > > > are solely those of the author and might not represent those of
> > > > > > Physician Select Management. Warning: Although Physician Select
> > > > > > Management
> > > > has
> > > > > > taken reasonable precautions to ensure no viruses are present in
> > > > > > this email, the company cannot accept responsibility for any
> > > > > > loss or
> > > > damage arising from the use of this email or attachments.
> > > > > > > _______________________________________________
> > > > > > > Star us on GITHUB: https://github.com/LINBIT drbd-user mailing
> > > > > > > list
> > > > > > > drbd-user at lists.linbit.com<mailto:drbd-user at lists.linbit.com>
> > > > > > > https://lists.linbit.com/mailman/listinfo/drbd-user
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > __________________________________________________________
> > > > > >
> > > > > > Please note that under Florida's public records law (F.S.
> > > > > > 668.6076), most written communications to or from the
> > > > > > Jacksonville Port Authority are public records, available to the
> > > > > > public and media upon request. Your email communications may
> > > > > > therefore be subject to public disclosure. If you have received
> > > > > > this email in error, please notify the sender by return email
> > > > > > and delete immediately without
> > > > forwarding to others.
> > > >
> >
