2016-05-20 17:29 GMT+08:00 Lars Ellenberg <lars.ellenberg at linbit.com>:
>  * no "stable pages"
>    If the content of the buffer that was originally submitted changes
>    after that has been submitted, but before we notify completion, the
>    local IO subsystem and the network may DMA (or otherwise use)
>    different generations of that buffer to their respective destinations.
>    Assumption would be that if someone "changes" such a buffer while it
>    is being written, that someone would later re-submit the changes, and
>    eventually the affected block would become identical again,
>    because there would be no further change during the last write.
>    In that case this would only be a temporary state.
>    But sometimes that re-submit does not really happen.
>    Which may even be "legal", for example if that was for some
>    already-deleted temp file, file system optimizations could
>    decide that there is no reason to do the actual write-out,
>    unless there is memory pressure. Or some application manages
>    its own data buffers, and uses direct-IO to submit them,
>    then does further changes, without explicitly re-submitting,
>    and without waiting for the write to be complete.
>    Or someone else has mmap'ed the same stuff,
>    and keeps modifying it, without ever explicitly syncing.
>    BTW, behaviour changes with kernel versions.
>    In which case this can become a permanent state -- until the next
>    stable write to the same block.
> Differing data caused by lack of stable pages is usually harmless,
> because it typically affects areas which are "unreachable".
> But I know of no easy-to-automate way to decide which of these caused
> the data to be different just by looking at it during online-verify.
> So there.
> What next?

    Thanks a lot for such a detailed explanation. This information should
be put into the documentation, or at least into a blog post.

   The "no stable pages" part may be what I am suffering from, since
the layer underneath DRBD is software RAID1. I knew that software RAID1
may have inconsistent sectors between disks, but previously I only knew
that a swap partition or swap file could cause this. It seems there are
other situations as well.
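
   To check my understanding of the "no stable pages" effect quoted
above, here is a minimal sketch in Python. It is only a toy model: the
two "destinations" are plain byte copies taken one after the other, and
the names mirrored_write and scribble are made up for illustration, not
DRBD, md, or kernel APIs.

```python
def mirrored_write(buf, mutate_between=None):
    """Copy one in-flight buffer to two destinations, one after the other."""
    # First consumer reads the buffer (stands in for the local disk).
    local_copy = bytes(buf)
    # The submitter may change the buffer while the write is still in flight.
    if mutate_between is not None:
        mutate_between(buf)
    # Second consumer reads the same buffer later (stands in for the peer).
    peer_copy = bytes(buf)
    return local_copy, peer_copy


def scribble(buf):
    """Hypothetical in-flight modification of the first four bytes."""
    buf[0:4] = b"NEW!"


page = bytearray(b"old data")

# Buffer changes mid-write: the two destinations get different generations.
local, peer = mirrored_write(page, scribble)
print(local != peer)

# A later write with no change in flight makes both sides identical again.
local2, peer2 = mirrored_write(page)
print(local2 == peer2)
```

   If that second, undisturbed write never happens, the difference
between the two copies stays, which is the permanent state described
in the quoted text.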

   I will now try to force a repair of the software RAID1, to see if
that makes DRBD verify happier. If this is the reason, then next time I
should use software RAID5 or RAID6 under DRBD to ensure data integrity.

  Thanks again for your kind help!


