[DRBD-user] DRBD and iSCSI

Wed Sep 8 17:51:05 CEST 2010

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 09/08/2010 04:22 PM, Roof, Morey R. wrote:
> Hello,
> 
> 
> I also use a very similar setup here.  We don't have any data loss at
> all because we run IET against a block device and use protocol C.  If
> the failover completes in less than 60 seconds, with everything up and
> running on the other node, then you will be fine.  For some clients you
> will have to adjust the SCSI timeout values, but generally Windows 2003
> and up and any recent Linux system will be just fine.  VMware ESX 3.5
> and up doesn't have any problem either.
> 
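
A minimal sketch of the timeout check on a Linux initiator, assuming the
disk exposes its SCSI command timeout (in seconds) at
/sys/block/<dev>/device/timeout. The device name "sdb", the helper names
and the simple "timeout >= failover time" rule are assumptions for
illustration only; the initiator's own session recovery timeout and SCSI
retries also affect how long I/O is held, so treat this as a first
sanity check rather than a guarantee.

```python
from pathlib import Path


def read_scsi_timeout(device: str, sysfs_root: str = "/sys/block") -> int:
    """Return the SCSI command timeout, in seconds, for a disk such as 'sdb'."""
    path = Path(sysfs_root, device, "device", "timeout")
    return int(path.read_text().strip())


def survives_failover(timeout_s: int, failover_s: int) -> bool:
    """True if the command timeout is at least as long as the expected failover."""
    return timeout_s >= failover_s


if __name__ == "__main__":
    dev = "sdb"        # hypothetical iSCSI-backed disk
    failover_s = 60    # failover budget mentioned in the quoted message
    try:
        timeout_s = read_scsi_timeout(dev)
    except (OSError, ValueError) as exc:
        print(f"could not read timeout for {dev}: {exc}")
    else:
        ok = survives_failover(timeout_s, failover_s)
        print(f"{dev}: timeout={timeout_s}s, budget={failover_s}s, ok={ok}")
```

If the check fails, the value in that sysfs file is the one to raise on
the client (as root), or the failover needs to be made faster.
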
> I have done extensive testing with all sorts of failure possibilities
> and everything passed without issue.  Copies completed after a short
> pause and the copies compared perfectly at the bit level.  
> 
> There are some corner cases that are extremely hard to simulate, which I
> believe work but have been unable to fully verify with ESX systems.
> These cases relate to the SCSI RESERVE/RELEASE commands used by VMware.
> If an ESX server issues a RESERVE command and the iSCSI system then
> fails over, the RESERVE won't be present on the other node when iSCSI
> service is restored.  So, if another ESX system in the cluster issues a
> SCSI command to a critical area that the first ESX server was trying to
> protect, data corruption could occur.
> 
> However, from my testing, what appears to happen is that the ESX systems
> see the iSCSI session as gone and do a new login (on the other node
> when service is restored).  This is in effect just like a SCSI LUN or
> bus reset, and since ESX uses SCSI-2 based RESERVE/RELEASE, it assumes
> those reservations have been reset and issues new RESERVE commands.  The
> only problem is that I haven't been able to capture exactly this pattern
> on the wire with Wireshark in my testing.  I have come close, but not
> completely, as it is hard to cause a failure at the exact moment between
> a RESERVE/RELEASE pair.
> 
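
To make the window concrete, here is a toy model of the sequence
described in the quoted text. It is not an implementation of SCSI or of
IET: the Target class, the node and initiator names, and the rule that a
reservation lives only in the memory of the active node are all
assumptions for illustration.

```python
class ReservationConflict(Exception):
    """Raised when a write hits a LUN reserved by another initiator."""


class Target:
    """Toy target node; the SCSI-2 style reservation is held in memory only."""

    def __init__(self, name):
        self.name = name
        self.reserved_by = None

    def reserve(self, initiator):
        if self.reserved_by not in (None, initiator):
            raise ReservationConflict(self.reserved_by)
        self.reserved_by = initiator

    def release(self, initiator):
        if self.reserved_by == initiator:
            self.reserved_by = None

    def write(self, initiator):
        if self.reserved_by not in (None, initiator):
            raise ReservationConflict(self.reserved_by)
        return f"{initiator} wrote via {self.name}"


def is_blocked(target, initiator):
    """Return True if a write from this initiator is refused."""
    try:
        target.write(initiator)
    except ReservationConflict:
        return True
    return False


node = Target("node-a")
node.reserve("esx1")
blocked_before = is_blocked(node, "esx2")   # esx1 holds the reservation

node = Target("node-b")                     # failover: reservation not carried over
unprotected = node.write("esx2")            # the window in question

node.reserve("esx1")                        # esx1 logs in again and re-reserves
blocked_after = is_blocked(node, "esx2")

print(blocked_before, unprotected, blocked_after)
```

The interesting line is the write between the failover and the second
reserve: in this model nothing stops esx2 there, which is the case that
is hard to catch on the wire.
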
> Perhaps others can share their insights on this aspect of such a
> cluster.
> 
> -Morey
> 
> -----Original Message-----
> From: drbd-user-bounces at lists.linbit.com
> [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Mark Watts
> Sent: Wednesday, September 08, 2010 9:03 AM
> To: Jiri Vitek; drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] DRBD and iSCSI
> 
> On 09/08/2010 03:57 PM, Jiri Vitek wrote:
>> You need a properly configured heartbeat with ietd, which will fail
>> over ietd and the IP address on which ietd listens. With this setup the
>> initiator will detect the connection error and hold its data until it
>> reconnects after the failover.
> 
>> I have been using this setup in production for 1 year, and it works
>> perfectly.
> 
> 
> How quickly will clients retry and reconnect, or is that configurable?
> Would this give much/any data loss?
> 
> Mark.
> 

Excellent. Thank you for this detailed reply - it's giving me far more
confidence than I've had previously :)

Mark.

- -- 
Mark Watts BSc RHCE MBCS
Senior Systems Engineer, IPR Secure Managed Hosting
www.QinetiQ.com
QinetiQ - Delivering customer-focused solutions
GPG Key: http://www.linux-corner.info/mwatts.gpg
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAkyHsOkACgkQBn4EFUVUIO0cTQCg2OaTAsn8xlHBWzLr4mQZL49E
X3AAoMhKVxEuW92tGVzjda2r8cRL7U0H
=n0H7
-----END PGP SIGNATURE-----