[DRBD-user] Concurrent local writes questions

Tue Sep 2 01:35:27 CEST 2014

Hello everyone,

I have been researching "Concurrent local writes" and read through the excellent exchanges in the April '09 archives:
http://lists.linbit.com/pipermail/drbd-user/2009-April/thread.html#11873

The message Morey Roof posted about not seeing the issue got me thinking... I have only been seeing this problem in our test lab, where the hardware of the two nodes is not equivalent.

Our test lab SAN is built primarily on older equipment we have lying around. The secondary node is an older Dell PowerEdge 2950 backed by an older FC array, and the primary is a Dell R710 with direct-attached enterprise SATA disks.

Our production SAN, which is R720xd-based with SAS drives, has no issues.  We are using blockio on both.  But it got me wondering...

All the Concurrent local write messages were for the same sector and size...
  - When the first message is 'in flight', has the write acknowledgment been sent up the stack yet?
  - Is it possible that ESXi's shared filesystem is just retrying the same write (in which case it doesn't matter if the last write is dropped)?
  - Could this all just be a timing issue where the I/O is assumed by ESXi to be taking too long and being reissued (as if the packet was dropped on the network)?
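
To make the "same sector and size" observation concrete, here is a minimal sketch of how two write extents could be compared. It is only an illustration, not DRBD's actual code: the Write class, the overlaps() and same_extent() helpers, and the 512-byte sector unit are all assumptions made for the example.

```python
from dataclasses import dataclass

SECTOR_BYTES = 512  # assumption: sectors are 512 bytes, sizes are in bytes


@dataclass(frozen=True)
class Write:
    """Hypothetical description of one write: start sector and length."""
    sector: int  # first sector touched by the write
    size: int    # length of the write in bytes

    def end_sector(self) -> int:
        # Exclusive end sector, rounding a partial sector up.
        return self.sector + (self.size + SECTOR_BYTES - 1) // SECTOR_BYTES


def overlaps(a: Write, b: Write) -> bool:
    """True if the two writes touch at least one common sector."""
    return a.sector < b.end_sector() and b.sector < a.end_sector()


def same_extent(a: Write, b: Write) -> bool:
    """True if both writes cover exactly the same sector and size,
    which is the pattern a plain retry of one write would produce."""
    return a.sector == b.sector and a.size == b.size


pending = Write(sector=100, size=4096)   # covers sectors 100..107
retry = Write(sector=100, size=4096)     # identical extent
neighbor = Write(sector=108, size=4096)  # starts right after 'pending'
```

If every logged pair satisfies same_extent(), that would be consistent with a reissued write; pairs that merely overlap would point toward genuinely different writes racing each other.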

When I get back to the lab, I will certainly try switching to fileio and doing some Wireshark captures.

Thanks everyone.
With kind regards. -Peter

==================== 
Peter Brunnengräber
Sr. Engineer
BCC Global, Inc.
  pbrunnen at bccglobal.com