On Fri, Sep 17, 2010 at 11:22:53AM +0200, Fabrice Charlier wrote:
> Hi,
>
> Three months ago we deployed a "web cluster" for LAMP hosting. We
> based this solution on DRBD in active/active mode combined with
> OCFS2. This solution fits our needs well, but several times
> (~ once a month) a problem appears: we have enabled the
> "data-integrity-alg" option to (try to) avoid silent corruption of
> data, and several times this feature has detected that data was
> altered in transit between the two nodes. As we have
> active/active nodes and use the automatic split-brain recovery
> policies proposed in the official documentation, the two members of
> the mirror get disconnected and we have to resync them manually to
> continue normal operation. We have already disabled all TCP
> offloading capabilities of all NICs, without success.
>
> Is it possible to ask DRBD to retry sending the block until it
> succeeds in this kind of situation?

No.
It is likely one of those cases where in-flight buffers are changed
(the page is modified again after the checksum was computed, while it
is still being sent).
The problem has been known for a long time, but has recently drawn
some attention again:
http://lwn.net/Articles/399148/
http://www.spinics.net/lists/linux-scsi/msg44074.html

Maybe disable the DRBD-level checksum and trust the TCP checksum.

> If not, are you planning to implement this feature?

Maybe, but it does not have any particular priority.
We may rather wait for the VM and VFS layers to fix it for us.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
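The "in-flight buffers are changed" explanation can be illustrated with a small simulation. This is a minimal sketch, not DRBD code: the function names `send_block` and `receiver_accepts` and the `snapshot` flag are hypothetical, and SHA-1 is used only because it is one of the kernel digest algorithms `data-integrity-alg` can typically be set to. The point is that a digest computed over a page that is modified before it actually goes out will not match what the peer receives, even though nothing was corrupted on the network; copying the page first (roughly what "stable pages" aim to guarantee) removes the mismatch.

```python
import hashlib
from typing import Tuple


def send_block(page: bytearray, modify_in_flight: bool, snapshot: bool) -> Tuple[bytes, bytes]:
    """Simulate sending one block; return (digest_sent, payload_on_the_wire).

    Hypothetical model: the digest is computed first, the data is sent afterwards.
    """
    # With snapshot=True we send from a private copy, so later writes to `page`
    # cannot change what goes out. With snapshot=False we send from the live page.
    source = bytes(page) if snapshot else page

    # Sender computes the integrity digest over the current page contents.
    digest = hashlib.sha1(source).digest()

    if modify_in_flight:
        # Some upper layer (e.g. an application writing through mmap) changes the
        # page after the digest was computed but before it is put on the wire.
        page[0] ^= 0xFF

    # The bytes that actually leave the node.
    payload = bytes(source)
    return digest, payload


def receiver_accepts(digest: bytes, payload: bytes) -> bool:
    """Receiver recomputes the digest over the received bytes and compares."""
    return hashlib.sha1(payload).digest() == digest
```

With `modify_in_flight=True, snapshot=False` the receiver rejects the block, although the bytes it received are exactly the bytes that were sent. In this model a transport checksum computed over the bytes as they leave the node would not flag anything, which is consistent with the suggestion to rely on the TCP checksum instead of the DRBD-level one.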