Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 09/08/2010 04:22 PM, Roof, Morey R. wrote:
> Hello,
>
> I also use a very similar setup here. We don't have any data loss at
> all, because we run IET against a block device and use protocol C. If
> the failover happens in less than 60 seconds, so that everything is up
> and running on the other node, then you will be fine. However, for
> some clients you will have to adjust the SCSI timeout values; generally,
> Windows 2003 and up and any recent Linux system will be just fine.
> VMware ESX 3.5 and up doesn't have any problem either.
>
> I have done extensive testing with all sorts of failure possibilities,
> and everything passed without issue. Copies completed after a short
> pause, and the copies compared perfectly at the bit level.
>
> There are some corner cases, relating to ESX systems, that are
> extremely hard to simulate; I believe they work, but I have been unable
> to verify them completely. These cases involve the SCSI RESERVE/RELEASE
> commands used by VMware. If an ESX server issues a RESERVE command and
> the iSCSI system then fails over, the RESERVE won't be present on the
> other node when iSCSI service is restored. So if another ESX system in
> the cluster issues a SCSI command to a critical area that the first ESX
> server was trying to protect, data corruption could occur.
>
> However, from my testing, what appears to happen is that the ESX
> systems see the iSCSI session as gone and do a new login (on the other
> node, once service is restored). This is in effect just like a SCSI LUN
> or bus reset, and since ESX uses SCSI-2 based RESERVE/RELEASE, it
> assumes those reservations are reset and issues new RESERVE commands.
> The only problem is that I haven't been able to capture exactly this
> pattern on the wire with Wireshark in my testing. I have come close,
> but not completely, as it is hard to cause a failure at that precise
> moment between a RESERVE/RELEASE pair.
>
> Perhaps others can share their insights on this aspect of such a
> cluster.
>
> -Morey
>
> -----Original Message-----
> From: drbd-user-bounces at lists.linbit.com
> [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Mark Watts
> Sent: Wednesday, September 08, 2010 9:03 AM
> To: Jiri Vitek; drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] DRBD and iSCSI
>
> On 09/08/2010 03:57 PM, Jiri Vitek wrote:
>> You need properly configured Heartbeat with ietd, which will provide
>> failover of ietd and of the IP on which ietd listens. With this setup,
>> the initiator will detect the connection error and hold its data until
>> failover and reconnection.
>>
>> I have been using this setup in production for a year, and it works
>> perfectly.
>
> How quickly will clients retry and reconnect, or is that configurable?
> Would this give much/any data loss?
>
> Mark.

Excellent. Thank you for this detailed reply - it's giving me far more
confidence than I've had previously :)

Mark.

--
Mark Watts BSc RHCE MBCS
Senior Systems Engineer, IPR Secure Managed Hosting
www.QinetiQ.com
QinetiQ - Delivering customer-focused solutions
GPG Key: http://www.linux-corner.info/mwatts.gpg
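
For context, the "protocol C" Morey refers to is DRBD's synchronous
replication mode: a write is acknowledged only once it has reached
stable storage on both nodes, which is why a clean failover loses no
data. A minimal sketch of such a resource in DRBD 8.x syntax follows;
the hostnames, devices and addresses are placeholders, not taken from
the thread:

    # /etc/drbd.conf -- minimal sketch, DRBD 8.x syntax; names illustrative
    resource r0 {
      protocol C;                # synchronous: ack only after both disks have the write
      on node1 {
        device    /dev/drbd0;    # the block device IET exports as a LUN
        disk      /dev/sdb1;
        address   192.168.1.1:7789;
        meta-disk internal;
      }
      on node2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.1.2:7789;
        meta-disk internal;
      }
    }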
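
The client-side SCSI timeout adjustment Morey mentions would, on a
Linux initiator running open-iscsi, typically mean checking that
replacement_timeout covers the failover window, so outstanding I/O is
queued rather than failed while the session is down. The value below is
an illustration, assuming the roughly 60-second failover discussed
above:

    # /etc/iscsi/iscsid.conf (open-iscsi initiator) -- illustrative value
    # Seconds to wait for a dropped session to re-establish before failing
    # outstanding I/O back to the SCSI layer; keep this well above the
    # expected failover time.
    node.session.timeo.replacement_timeout = 120

On Windows, the corresponding knob is the disk class timeout: the
TimeoutValue DWORD (in seconds) under
HKLM\SYSTEM\CurrentControlSet\Services\Disk.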
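
Jiri's arrangement, with Heartbeat moving ietd and its listen address
between nodes as a unit, could be expressed in Heartbeat v1 style
roughly as below. The node name, the floating IP and the target
init-script name are assumptions (the script ships as iscsitarget or
iscsi-target depending on the distribution):

    # /etc/ha.d/haresources -- one line: preferred node, then resources
    # started left to right (DRBD promotion, floating IP, iSCSI target)
    node1 drbddisk::r0 IPaddr::192.168.1.100/24/eth0 iscsitarget

Starting the target last, after the DRBD device is primary and the IP
is up, is what lets initiators simply see one connection error and then
log in again on the surviving node.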