On Wednesday 08 September 2010 17:51:05 Mark Watts wrote:
> On 09/08/2010 04:22 PM, Roof, Morey R. wrote:
> > Hello,
> >
> > I also use a very similar setup here. We don't have any data loss
> > at all, because we run IET against a block device and use protocol
> > C. If the failover happens in less than 60 seconds, so that
> > everything is up and running on the other node, then you will be
> > fine. However, for some clients you will have to adjust the SCSI
> > timeout values, but generally Windows 2003 and up and any recent
> > Linux system will be just fine. Also, VMware ESX 3.5 and up doesn't
> > have any problem either.
> >
> > I have done extensive testing with all sorts of failure
> > possibilities and everything passed without issue. Copies completed
> > after a short pause, and the copies compared perfectly at the bit
> > level.
> >
> > There are some corner cases that are extremely hard to simulate,
> > which I believe work but have been unable to verify completely with
> > regard to ESX systems. These cases relate to the SCSI
> > RESERVE/RELEASE commands used by VMware. If an ESX server issues a
> > RESERVE command and the iSCSI system then fails over, the RESERVE
> > won't be present on the other node when iSCSI service is restored.
> > So, if another ESX system in the cluster then issues a SCSI command
> > to a critical area that the first ESX server was trying to protect,
> > data corruption could occur.
> >
> > However, from my testing what appears to happen is that the ESX
> > systems see the iSCSI session as gone and do a new login (on the
> > other node once service is restored). This is in effect just like a
> > SCSI LUN or bus reset, and since ESX uses SCSI-2 based
> > RESERVE/RELEASE it assumes those reservations have been cleared and
> > issues new RESERVE commands. The only problem is that I haven't
> > been able to capture this exact pattern on the wire with Wireshark
> > in my testing. I have come close, but not completely, as it is hard
> > to cause a failure at precisely the right moment between a
> > RESERVE/RELEASE pair.
> >
> > Perhaps others can share their insights on this aspect of such a
> > cluster.
> >
> > -Morey
> >
> > -----Original Message-----
> > From: drbd-user-bounces at lists.linbit.com
> > [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Mark Watts
> > Sent: Wednesday, September 08, 2010 9:03 AM
> > To: Jiri Vitek; drbd-user at lists.linbit.com
> > Subject: Re: [DRBD-user] DRBD and iSCSI
> >
> > On 09/08/2010 03:57 PM, Jiri Vitek wrote:
> >> You need a properly configured Heartbeat with ietd, which provides
> >> failover for ietd and for the IP on which ietd listens. With this
> >> setup the initiator will detect the connection error and hold its
> >> data until failover and reconnection.
> >>
> >> I have been using this setup in production for a year, and it
> >> works perfectly.
> >
> > How quickly will clients retry and reconnect, or is that
> > configurable? Would this give much/any data loss?
> >
> > Mark.
>
> Excellent. Thank you for this detailed reply - it's giving me far
> more confidence than I've had previously :)
>
> Mark.

Maybe I might throw in a warning: I use an active/passive iSCSI server
based on Heartbeat and DRBD as well, and it serves both block devices
for Xen guests and storage for a Novell NSS file system. After
deliberately failing over (so with minimal disconnection time), the
volume survived every time except one: it was corrupted.
Novell technical services pointed out that this could happen to any
filesystem, since it is the equivalent of pulling the hard drive cable.

Rgds,
B.
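
For the client-side SCSI timeout tuning Morey mentions above, the sketch
below shows one way it is often done on a Linux initiator. It is only an
illustration: it assumes the per-device SCSI command timeout is exposed
through sysfs at /sys/block/<dev>/device/timeout, and the device names
listed are hypothetical iSCSI-backed disks. On open-iscsi initiators you
would typically also raise node.session.timeo.replacement_timeout in
/etc/iscsi/iscsid.conf so the session outlives the failover window.

    #!/usr/bin/env python3
    # Sketch: raise the per-device SCSI command timeout so in-flight I/O
    # can ride out an active/passive iSCSI failover of up to about a
    # minute. Run as root; DEVICES and TIMEOUT_SECONDS are assumptions
    # made for this example.

    import sys
    from pathlib import Path

    TIMEOUT_SECONDS = 120     # comfortably above the expected failover window
    DEVICES = ["sdb", "sdc"]  # hypothetical iSCSI-backed disks on this host

    def set_scsi_timeout(dev: str, seconds: int) -> None:
        """Write the new timeout (seconds) to /sys/block/<dev>/device/timeout."""
        attr = Path("/sys/block") / dev / "device" / "timeout"
        if not attr.exists():
            print(f"{dev}: no SCSI timeout attribute found, skipping",
                  file=sys.stderr)
            return
        attr.write_text(f"{seconds}\n")
        print(f"{dev}: timeout is now {attr.read_text().strip()} seconds")

    if __name__ == "__main__":
        for dev in DEVICES:
            set_scsi_timeout(dev, TIMEOUT_SECONDS)

The sysfs value does not survive a device rescan or reboot, so in
practice the same change is usually made persistent with a udev rule or
through the initiator's own configuration.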
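And since Morey confirms failovers by comparing copies "perfectly at the
bit level", here is a minimal sketch of such a check: a block-by-block
comparison of an original file against the copy written across the
failover. The paths are placeholders, and this is just one way to do it.

    #!/usr/bin/env python3
    # Sketch: compare two files block by block and report where they
    # first differ; the kind of bit-level check used to confirm that a
    # copy made across an iSCSI failover arrived intact.

    import sys

    CHUNK = 1024 * 1024   # read 1 MiB at a time

    def files_identical(path_a: str, path_b: str) -> bool:
        with open(path_a, "rb") as a, open(path_b, "rb") as b:
            offset = 0
            while True:
                block_a = a.read(CHUNK)
                block_b = b.read(CHUNK)
                if block_a != block_b:
                    print(f"files differ within {CHUNK} bytes of offset {offset}")
                    return False
                if not block_a:          # both files ended at the same point
                    return True
                offset += len(block_a)

    if __name__ == "__main__":
        # e.g. the source file and its copy on the iSCSI-backed volume
        source, copy = sys.argv[1], sys.argv[2]
        sys.exit(0 if files_identical(source, copy) else 1)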