[DRBD-user] DRBD and iSCSI

Roof, Morey R. MRoof at admin.nmt.edu
Wed Sep 8 17:22:15 CEST 2010


I use a very similar setup here.  We have no data loss at all because
we run IET against a block device and use DRBD protocol C.  If the
failover completes in less than 60 seconds, with everything up and
running on the other node, you will be fine.  For some clients you will
have to adjust the SCSI timeout values, but Windows 2003 and later and
any recent Linux system are generally fine as they are.  VMware ESX 3.5
and later has no problem either.
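
For Linux initiators, the SCSI command timeout of a disk is exposed
through sysfs as /sys/block/<device>/device/timeout (in seconds).  The
following is only a minimal sketch of how it could be raised; the
function name, the device name "sdb" and the value 120 are my own
illustrative assumptions, not values taken from this setup.

```python
from pathlib import Path


def set_scsi_timeout(device: str, seconds: int, sysfs_root: str = "/sys/block") -> int:
    """Write a new SCSI command timeout for one disk and return the old value.

    `sysfs_root` is a parameter only so the function can be tried against
    a scratch directory; on a real system the default path is used.
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    timeout_file = Path(sysfs_root) / device / "device" / "timeout"
    previous = int(timeout_file.read_text().strip())
    timeout_file.write_text(f"{seconds}\n")
    return previous


# Hypothetical usage (needs root; "sdb" and 120 are assumptions):
# old_value = set_scsi_timeout("sdb", 120)
```

A value written this way does not survive a reboot, so it would have to
be reapplied at boot by whatever mechanism the distribution provides.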

I have done extensive testing with all sorts of failure scenarios
and everything passed without issue.  File copies completed after a
short pause, and the copies matched the originals perfectly at the bit
level.
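
A comparison of that kind can be sketched as below.  This is not
necessarily the tool that was used in the tests above, just one
straightforward way to check that a copy written through a failover is
byte-for-byte identical to its source; the function name and chunk size
are assumptions.

```python
def files_identical(path_a: str, path_b: str, chunk_size: int = 1 << 20) -> bool:
    """Return True only if both files have exactly the same bytes."""
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                # Different content, or one file ended before the other.
                return False
            if not chunk_a:
                # Both files reached end-of-file with no difference.
                return True


# Hypothetical usage; both paths are placeholders:
# files_identical("/local/source.img", "/mnt/iscsi/copy.img")
```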

There are some corner cases that are extremely hard to simulate.  I
believe they work, but I have been unable to fully verify them on ESX
systems.  These cases relate to the SCSI RESERVE/RELEASE commands used
by VMware.  If an ESX server issues a RESERVE command and the iSCSI
system then fails over, the RESERVE won't be present on the other node
when iSCSI service is restored.  So, if another ESX system in the
cluster issues a SCSI command to a critical area that the first ESX
server was trying to protect, data corruption could occur.

However, from my testing, what appears to happen is that the ESX
systems see the iSCSI session as gone and do a new login (on the other
node, once service is restored).  This is in effect just like a SCSI
LUN or bus reset, and since ESX uses SCSI-2 based RESERVE/RELEASE, it
assumes those reservations have been reset and issues new RESERVE
commands.  The only problem is that I haven't been able to capture
exactly this pattern on the wire with Wireshark in my testing.  I have
come close, but not completely, as it is hard to cause a failure at the
exact moment between a RESERVE/RELEASE pair.
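
To make the corner case concrete, here is a toy model of it.  It is
only an illustration of the reasoning above, not IET, DRBD or ESX code:
the class, the function names and the initiator names "esx1"/"esx2" are
all made up, and the model simply assumes that reservation state lives
in the memory of one target node and is not carried over on failover.

```python
class ToyTarget:
    """One target node; the reservation exists only in this node's memory."""

    def __init__(self):
        self.reserved_by = None

    def reserve(self, initiator):
        if self.reserved_by not in (None, initiator):
            return "RESERVATION CONFLICT"
        self.reserved_by = initiator
        return "GOOD"

    def release(self, initiator):
        if self.reserved_by == initiator:
            self.reserved_by = None
        return "GOOD"

    def write(self, initiator):
        if self.reserved_by not in (None, initiator):
            return "RESERVATION CONFLICT"
        return "GOOD"


def failover(old_node):
    """Return the standby node; it starts with no reservation (assumption)."""
    return ToyTarget()


def run_scenario():
    results = []
    node_a = ToyTarget()
    results.append(node_a.reserve("esx1"))  # esx1 protects a critical area
    results.append(node_a.write("esx2"))    # esx2 is correctly refused
    node_b = failover(node_a)               # failover between RESERVE and RELEASE
    results.append(node_b.write("esx2"))    # the hazard: nothing refuses esx2 now
    results.append(node_b.reserve("esx1"))  # esx1 logs in again and re-reserves
    results.append(node_b.write("esx2"))    # protection is back in place
    return results


if __name__ == "__main__":
    for status in run_scenario():
        print(status)
```

The third step is the window I am worried about; the fourth step is the
behaviour that ESX appears to show after its new login.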

Perhaps others can share their insights on this aspect of such a


-----Original Message-----
From: drbd-user-bounces at lists.linbit.com
[mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Mark Watts
Sent: Wednesday, September 08, 2010 9:03 AM
To: Jiri Vitek; drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] DRBD and iSCSI

On 09/08/2010 03:57 PM, Jiri Vitek wrote:
> you need properly configured heartbeat with ietd which will provide 
> failover to ietd and ip on which ietd listen. With this setup 
> initiator will detect connection error and wait with data for failover
> reconnection.
> I'm using this setup in production for 1 year, and works perfectly.

How quickly will clients retry and reconnect, or is that configurable?
Would this give much/any data loss?


- --
Mark Watts BSc RHCE MBCS
Senior Systems Engineer, IPR Secure Managed Hosting www.QinetiQ.com
QinetiQ - Delivering customer-focused solutions GPG Key:
