Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello,

I also use a very similar setup here. We don't see any data loss at all, because we run IET against a block device and use protocol C. If the failover completes in less than 60 seconds, with everything up and running on the other node, you will be fine. For some clients you will have to adjust the SCSI timeout values, but generally Windows 2003 and up, and any recent Linux system, will be fine. VMware ESX 3.5 and up has no problem either.

I have done extensive testing with all sorts of failure scenarios, and everything passed without issue: copies completed after a short pause, and the copies compared perfectly at the bit level.

There are some corner cases involving ESX systems that are extremely hard to simulate; I believe they work, but I have not been able to verify them completely. These cases relate to the SCSI RESERVE/RELEASE commands used by VMware. If an ESX server issues a RESERVE command and the iSCSI system then fails over, the RESERVE will not be present on the other node when iSCSI service is restored. So if another ESX system in the cluster issues a SCSI command against a critical area the first ESX server was trying to protect, data corruption could occur.

From my testing, however, what appears to happen is that the ESX systems see the iSCSI session as gone and perform a new login (against the other node, once service is restored). This is in effect just like a SCSI LUN or bus reset, and since ESX uses SCSI-2 based RESERVE/RELEASE, it assumes those reservations have been cleared and issues new RESERVE commands. The only problem is that I have not been able to capture this exact pattern on the wire with Wireshark; I have come close, but it is hard to trigger a failure at precisely the right moment between a RESERVE/RELEASE pair. Perhaps others can share their insights on this aspect of such a cluster.

-Morey

-----Original Message-----
From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Mark Watts
Sent: Wednesday, September 08, 2010 9:03 AM
To: Jiri Vitek; drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] DRBD and iSCSI

On 09/08/2010 03:57 PM, Jiri Vitek wrote:
> You need a properly configured Heartbeat with ietd, which provides
> failover for ietd and for the IP on which ietd listens. With this
> setup, the initiator will detect the connection error and hold its
> data until it reconnects after failover.
>
> I've been using this setup in production for a year, and it works
> perfectly.

How quickly will clients retry and reconnect, or is that configurable?
Would this give much/any data loss?

Mark.

--
Mark Watts BSc RHCE MBCS
Senior Systems Engineer, IPR Secure Managed Hosting
www.QinetiQ.com
QinetiQ - Delivering customer-focused solutions
GPG Key: http://www.linux-corner.info/mwatts.gpg
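
To make the setup above concrete: Morey's "protocol C" is DRBD's fully synchronous mode, in which a write is acknowledged only after it has reached stable storage on both nodes; that property is what the no-data-loss claim rests on. A minimal sketch of such a resource definition, with invented hostnames, disks, and addresses (not Morey's actual configuration):

  # /etc/drbd.conf -- illustrative sketch only; names, disks and
  # addresses are made up. Protocol C = synchronous replication: a
  # write completes only once it is on stable storage on BOTH nodes.
  resource iscsi-store {
      protocol C;
      on node-a {
          device    /dev/drbd0;
          disk      /dev/sdb1;
          address   10.0.0.1:7788;
          meta-disk internal;
      }
      on node-b {
          device    /dev/drbd0;
          disk      /dev/sdb1;
          address   10.0.0.2:7788;
          meta-disk internal;
      }
  }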
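
Running IET "against a block device" would then mean exporting the DRBD device in blockio mode, which bypasses the target-side page cache, so an acknowledged write has actually been handed to /dev/drbd0 (and, with protocol C, to both nodes). A hedged sketch; the IQN is invented:

  # /etc/ietd.conf -- illustrative only. Type=blockio skips the
  # target's page cache; every completed write is on the DRBD device.
  Target iqn.2010-09.com.example:storage.drbd0
      Lun 0 Path=/dev/drbd0,Type=blockio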
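
On adjusting client SCSI timeouts: with a Linux open-iscsi initiator, the usual knobs look roughly like the following. The 120-second figure is only an illustration; it simply has to exceed the worst-case failover time comfortably (under 60 seconds in Morey's setup):

  # /etc/iscsi/iscsid.conf -- how long the initiator queues I/O for a
  # lost session before failing it up to the SCSI layer:
  node.session.timeo.replacement_timeout = 120

  # Per-device SCSI command timeout (run in a shell; replace sdX with
  # the disk the iSCSI session exposes):
  echo 120 > /sys/block/sdX/device/timeout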
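
Finally, Jiri's Heartbeat arrangement is typically expressed, in classic v1-style Heartbeat, as a single haresources line: promote DRBD, bring up the floating IP the initiators log in to, then start the target (stopping in reverse order on failover). Names below are invented, and the IET init script may be called iscsitarget or iscsi-target depending on the distribution:

  # /etc/ha.d/haresources -- illustrative. node-a is the preferred
  # node; resources start left to right and stop right to left.
  node-a drbddisk::iscsi-store IPaddr::10.0.0.100/24/eth0 iscsitarget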