[DRBD-user] Block out of sync right after full resync

Fri Oct 10 09:44:56 CEST 2014

"buffer modified by upper layers during write" means whatever sits on 
top of DRBD changes the data while it is "in flight".
Please search the list archives; this is a FAQ.
Swap and some file systems do that, usually as some kind of 
optimization. I suspect VMware VMs hosted on ext4 do that too.
There's probably nothing wrong, but DRBD can't know. You should do your 
data integrity checking at some higher level (fsck, for example).
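
A toy illustration of that effect, in case it helps. This is not DRBD
code, only a sketch of the general idea: a digest is taken over a buffer
at one moment, the owner of the buffer changes it before it is actually
sent, and the digest no longer matches. zlib.crc32 stands in for crc32c,
and simulate_write is a made-up helper name.

```python
import zlib

BLOCK_SIZE = 4096  # same size as the "+4096" in the log line


def checksum(buf):
    # Assumption: plain CRC-32 from zlib is used here instead of crc32c.
    # The algorithm differs, the principle is the same.
    return zlib.crc32(bytes(buf))


def simulate_write(modify_in_flight):
    """Return True if the digest taken at submit time still matches
    the data that is actually transmitted."""
    buf = bytearray(BLOCK_SIZE)      # page owned by the upper layer
    digest_at_submit = checksum(buf)  # digest computed when the write is submitted
    if modify_in_flight:
        buf[0] ^= 0xFF               # upper layer touches the page again
    payload = bytes(buf)             # what really goes out
    return checksum(payload) == digest_at_submit


print(simulate_write(False))  # True: buffer left alone, digests agree
print(simulate_write(True))   # False: buffer changed in flight, digest mismatch
```

The second call models an upper layer that keeps modifying a page after
the write was submitted. Nothing is damaged on the wire or on disk in
this model; the digest and the payload simply reflect the page at two
different moments.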
Lionel.

On 10/10/2014 01:45, aurelien panizza wrote:
> Hi all,
>
> I've got a problem in my environment.
> I set up my primary server (pacemaker + drbd), which ran alone for a 
> while, and then I added the second server (currently only DRBD).
> Both servers can see each other and /proc/drbd reports "UpToDate/UpToDate".
> If I run a verify on that resource (right after the full resync), it 
> reports some blocks out of sync (generally from 100 to 1500 on my 
> 80 GB LVM partition).
> So I disconnect/connect the slave and oos reports 0 blocks.
> I run a verify again and some blocks are still out of sync. What I've 
> noticed is that it seems to be almost always the same blocks which are 
> out of sync.
> I tried to do a full resync multiple times but had the same issue.
> I also tried replacing the physical secondary server with a virtual 
> machine (to check whether the issue came from the secondary 
> server) but had the same issue.
>
> I then activated "data-integrity-alg crc32c" and got a couple of 
> "Digest mismatch, buffer modified by upper layers during write: 
> 167134312s +4096" in the primary log.
>
> I tried on a different network card but got the same errors.
>
> My full configuration file:
>
>   protocol C;
>   meta-disk internal;
>   device /dev/drbd0;
>   disk /dev/sysvg/drbd;
>
>   handlers {
>          split-brain "/usr/lib/drbd/notify-split-brain.sh xxx at xxx";
>          out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh xxx at xxx";
>          fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>          after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>   }
>
>   net {
>          cram-hmac-alg "sha1";
>          shared-secret "drbd";
>          sndbuf-size 512k;
>          max-buffers 8000;
>          max-epoch-size 8000;
>          verify-alg md5;
>          after-sb-0pri disconnect;
>          after-sb-1pri disconnect;
>          after-sb-2pri disconnect;
>          data-integrity-alg crc32c;
>   }
>
>   disk {
>         al-extents 3389;
>         fencing resource-only;
>   }
>
>   syncer {
>         rate 90M;
>   }
>   on host1 {
>         address 10.110.1.71:7799;
>   }
>   on host2 {
>         address 10.110.1.72:7799;
>   }
> }
>
> My OS: Red Hat 6, kernel 2.6.32-431.20.3.el6.x86_64
> DRBD version: drbd84-8.4.4-1
>
> ethtool -k eth0
> Features for eth0:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp-segmentation-offload: on
> udp-fragmentation-offload: off
> generic-segmentation-offload: on
> generic-receive-offload: off
> large-receive-offload: off
> ntuple-filters: off
> receive-hashing: off
>
>
> The secondary server is currently not in the HA cluster (pacemaker), but I 
> don't think this is the problem.
> I have another HA pair on 2 physical hosts with the exact same 
> configuration and drbd/OS versions (but not the same server model) and 
> everything's OK.
>
> As the primary server is in production, I can't stop the application 
> (database) to check whether the alerts are false positives.
>
> Do you have any advice?
> Could it be that the primary server has corrupted blocks or wrong 
> metadata?
>
> Regards,
>
>
>
