Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi,
I have a fairly simple goal for which DRBD seems a perfect choice. The
only thing is, I will need an additional component to get where I want
to be. After searching up and down the Internet I am now at the point
where there are too many options whose impact and complexity I cannot
judge, so I am hoping for some advice from people who have been there.
I have two application servers that need to do failover and load
balancing. On both servers I need read/write access to the same local
files so processing on a certain file can restart on the other server in
case of failover. Additionally, I would like file-locking (flock and/or
lockf) to work on the shared files.
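To make the locking requirement concrete, this is roughly what the Java
side would look like; a minimal sketch with a made-up path on the shared
volume. As far as I understand, FileChannel.lock() uses fcntl()-style
POSIX locks on Linux, so whatever ends up underneath would have to
propagate those between the two servers:

    import java.io.RandomAccessFile;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileLock;

    public class SharedFileLockDemo {
        public static void main(String[] args) throws Exception {
            // Hypothetical file on the shared volume; the real path depends
            // on where the shared storage ends up being mounted.
            RandomAccessFile raf =
                    new RandomAccessFile("/shared/tasks/job-123.pdf", "rw");
            FileChannel channel = raf.getChannel();
            // Blocks until an exclusive lock is granted. On Linux this maps
            // to fcntl()-style (POSIX) locks, so the file system underneath
            // has to honour those across both nodes.
            FileLock lock = channel.lock();
            try {
                // ... process the file ...
            } finally {
                lock.release();
                raf.close();
            }
        }
    }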
I think it boils down to these choices:
* dual-primary setup of DRBD with some clustered file system (GFS
maybe),
* primary-secondary setup of DRBD on my database servers with NFS,
* primary-secondary setup of DRBD with Lustre,
* use MySQL Cluster instead of DRBD.
The first option is discouraged for production purposes on drbd.org (I
forgot where, but it is mentioned).
Using NFS with DRBD will probably only work if I run it on the database
servers (we have two of those as well), because running an NFS client on
the same machine as the server is discouraged (see
http://www.linux-ha.org/HaNFS). I don't really like not having the files
local, even on the master application server. Since I have no actual
experience with this kind of setup I am worried that failover will not
always be transparent to client processes.
Using Lustre with DRBD seems a good fit, but I wonder whether it is
overkill for our situation and whether it will be complex to deploy.
Since we are going to use MySQL Cluster for storing meta information
about the tasks we need to process, we might store our files in the
database as well. As with NFS, this has the drawback that I will be
storing files on a different node. The files we process for a task
regularly grow either large (in excess of 100MB) or plentiful (more than
100,000 files), so I am not thrilled about storing all of that in a
database when a local file is what I need.
For the sake of simplicity I am leaning toward NFS with DRBD, but Lustre
seems to fit my wishes better. Or should I simply go for primary-primary?
Any alternative suggestions are also welcome. I looked into all kinds of
other things, but none of them seems to offer a better option:
* clustered LVM is intended for something entirely different I think
* pNFS is not usable and maybe also not intended for my purpose
* other clustered file systems (GlusterFS, GPFS,...?) would probably
still need something like dual-primary DRBD
Any advice on the direction to take would be great.
This is our architecture:
* 2 application servers and 2 database servers
* all servers run Debian 5
* the application is written in Java, is based on the Spring framework,
and runs in Tomcat
* we use ActiveMQ for JMS messaging and this should also help us
achieve failover and load balancing
* MySQL Cluster is used for application persistence and will be set
up with two data nodes on our database servers, and two management
nodes together with two MySQL front-ends on the application servers.
File locking on the shared storage would allow ActiveMQ to have
automatic fail-back.
(http://activemq.apache.org/shared-file-system-master-slave.html)
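As I read that page, both brokers just point their persistence store at
the same directory on the shared storage and the file lock decides who
is master. Below is a minimal sketch with an embedded broker (the
directory and names are placeholders; the same thing can be configured
in activemq.xml instead):

    import org.apache.activemq.broker.BrokerService;

    public class SharedStoreBroker {
        public static void main(String[] args) throws Exception {
            BrokerService broker = new BrokerService();
            broker.setBrokerName("app-broker"); // placeholder name
            // Both application servers point their persistence store at the
            // same directory on the shared storage. The broker that grabs
            // the file lock becomes master; the other waits for the lock and
            // takes over when the master dies.
            broker.setDataDirectory("/shared/activemq-data");
            broker.addConnector("tcp://0.0.0.0:61616");
            broker.start();
            broker.waitUntilStopped();
        }
    }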
Below is some more background info on what we are doing.
We are developing an application that will process potentially large
files (PDF and PostScript files that can contain upward of 100.000
pages, which can amount to hundreds of MBs per file) or more than
100.000 separate files (some 20KB each). The intention is to achieve
failover and load balancing using 2 application servers and 2 database
servers.
For the database technology we have chosen MySQL Cluster which seems to
be able to achieve load balancing and failover out of the box if we use
our two database servers as data nodes and let the application servers
host MySQL front-ends and management nodes. The database will be used to
store meta information about the tasks we are processing. Because of the
potential size of our content files I don't want to store these in the
database.
For the application we intend to rely on the failover and load balancing
capabilities of ActiveMQ once we are processing a task. Submission of
tasks will mainly occur by polling hotfolders. I don't think we will
have much trouble implementing the hotfolder in a master/slave setup and
letting heartbeat (or some clever JMS messaging) take care of failover.
Processing a content file is triggered by a JMS message, so as long as
ActiveMQ does its job I don't really need to worry about both servers
processing the same file.
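For the hotfolder part, something along these lines is what I have in
mind; a rough sketch, with the broker URL, queue name and paths made up
for the example:

    import java.io.File;
    import javax.jms.Connection;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class HotfolderPoller {
        public static void main(String[] args) throws Exception {
            // Failover URL so task submission keeps working if one broker
            // is down.
            ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
                    "failover:(tcp://app1:61616,tcp://app2:61616)");
            Connection connection = factory.createConnection();
            connection.start();
            Session session =
                    connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("tasks.incoming");
            MessageProducer producer = session.createProducer(queue);

            File hotfolder = new File("/shared/hotfolder");
            while (true) {
                File[] files = hotfolder.listFiles();
                if (files != null) {
                    for (File file : files) {
                        // Only the path travels over JMS; the file itself
                        // stays on the shared storage so either application
                        // server can process it.
                        TextMessage message =
                                session.createTextMessage(file.getAbsolutePath());
                        producer.send(message);
                        // ... move the file to a "submitted" directory so it
                        // is not picked up again on the next poll ...
                    }
                }
                Thread.sleep(5000);
            }
        }
    }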
So that leaves me with one issue: we need to store our content files in
a way that gives both application servers read/write access to them.
When an application server fails, we depend on ActiveMQ to resend the
JMS message on the surviving server, which will then have to restart
processing. So the only thing needed is that the same files are
available on both servers.
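The consuming side would then look something like the sketch below. If
the acknowledgement only happens after processing, an unacknowledged
message should be redelivered and end up on the surviving server when
the other one dies (again, the queue name and the processing call are
placeholders):

    import java.io.File;
    import javax.jms.Connection;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class TaskConsumer {
        public static void main(String[] args) throws Exception {
            ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
                    "failover:(tcp://app1:61616,tcp://app2:61616)");
            Connection connection = factory.createConnection();
            connection.start();
            // CLIENT_ACKNOWLEDGE: the message is only acked after processing,
            // so if this server dies mid-task the broker redelivers it.
            Session session =
                    connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
            Queue queue = session.createQueue("tasks.incoming");
            MessageConsumer consumer = session.createConsumer(queue);

            while (true) {
                Message message = consumer.receive();
                File contentFile = new File(((TextMessage) message).getText());
                processFile(contentFile); // placeholder for the actual work
                message.acknowledge();
            }
        }

        private static void processFile(File file) {
            // ... read the file from the shared storage and do the work ...
        }
    }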
If you read up to here, thanks for your patience ;)
Thanks,
Manuel.