[DRBD-user] Repeated out-of-sync blocks

Tue Dec 9 00:37:10 CET 2008

Hello,

We run an online-verify from cron nightly, and certain blocks are 
frequently turning up out-of-sync. Notice how the same blocks show up 
repeatedly:

Nov  9 20:27:34: Out of sync: start=182520104, size=8 (sectors)
Nov  9 21:52:30: Out of sync: start=182520104, size=8 (sectors)
Nov 10 21:28:25: Out of sync: start=109249744, size=8 (sectors)
Nov 10 21:46:23: Out of sync: start=182520104, size=8 (sectors)
Nov 11 20:28:48: Out of sync: start=182520104, size=8 (sectors)
Nov 12 20:11:09: Out of sync: start=109249744, size=8 (sectors)
Nov 12 20:28:47: Out of sync: start=182520104, size=8 (sectors)
Nov 13 20:28:54: Out of sync: start=182520104, size=8 (sectors)
Nov 14 20:10:24: Out of sync: start=109249744, size=8 (sectors)
Nov 14 20:28:03: Out of sync: start=182520104, size=8 (sectors)
Nov 15 20:26:27: Out of sync: start=182520104, size=8 (sectors)
Nov 16 20:10:23: Out of sync: start=109249744, size=8 (sectors)
Nov 16 20:27:56: Out of sync: start=182520104, size=8 (sectors)
Nov 17 20:11:05: Out of sync: start=109249744, size=8 (sectors)
Nov 17 20:29:20: Out of sync: start=182520104, size=8 (sectors)
Nov 18 20:28:36: Out of sync: start=182520104, size=8 (sectors)
Nov 24 20:35:25: Out of sync: start=182520104, size=8 (sectors)
Dec  2 20:04:52: Out of sync: start=109249744, size=8 (sectors)
Dec  2 20:17:30: Out of sync: start=182520104, size=8 (sectors)
Dec  3 20:04:35: Out of sync: start=109249744, size=8 (sectors)
Dec  6 20:03:47: Out of sync: start=109249744, size=8 (sectors)
Dec  6 20:16:15: Out of sync: start=182520104, size=8 (sectors)
Dec  7 20:03:36: Out of sync: start=109249744, size=8 (sectors)
Dec  7 20:16:00: Out of sync: start=182520104, size=8 (sectors)
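
To make the repetition easier to see, log lines like the ones above can be tallied per block. This is only a minimal sketch: the regular expression is written against the exact line format quoted here, the helper name tally_out_of_sync is made up for illustration, and the 512-byte sector size is an assumption used to turn sectors into byte offsets. Only four of the lines are included as sample input.

```python
import re
from collections import Counter

# Matches the log format quoted in this post; other DRBD versions may differ.
OOS_RE = re.compile(r"Out of sync: start=(\d+), size=(\d+) \(sectors\)")
SECTOR_BYTES = 512  # assumption: sectors are reported in 512-byte units


def tally_out_of_sync(lines):
    """Count how often each (start, size) extent is reported out of sync."""
    counts = Counter()
    for line in lines:
        m = OOS_RE.search(line)
        if m:
            counts[(int(m.group(1)), int(m.group(2)))] += 1
    return counts


# A few lines copied from the log above, used as sample input.
sample = [
    "Nov  9 20:27:34: Out of sync: start=182520104, size=8 (sectors)",
    "Nov  9 21:52:30: Out of sync: start=182520104, size=8 (sectors)",
    "Nov 10 21:28:25: Out of sync: start=109249744, size=8 (sectors)",
    "Nov 10 21:46:23: Out of sync: start=182520104, size=8 (sectors)",
]

# Print the extents from most to least frequently reported.
for (start, size), hits in tally_out_of_sync(sample).most_common():
    print(
        f"start={start} size={size} sectors "
        f"(byte offset {start * SECTOR_BYTES}, {size * SECTOR_BYTES} bytes): "
        f"{hits} hits"
    )
```

Under that sector-size assumption, each size=8 entry is a single 4096-byte extent, so the whole log comes down to two small extents being reported again and again.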

What does it mean when the same blocks are repeatedly marked 
out-of-sync? Each night this happens, we repair manually by running:

drbdadm disconnect <resource>
drbdadm connect <resource>
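
If the repair were scripted rather than typed by hand, it might look roughly like the sketch below. The function names and the resource name "r0" are placeholders, not part of our setup, and the script defaults to a dry run that only prints the two commands.

```python
import subprocess


def repair_commands(resource):
    """Return the two drbdadm invocations from this post, in order."""
    return [
        ["drbdadm", "disconnect", resource],
        ["drbdadm", "connect", resource],
    ]


def repair(resource, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in repair_commands(resource):
        print(" ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit,
            # so connect is not attempted if disconnect fails.
            subprocess.run(argv, check=True)


# "r0" is a hypothetical resource name; this call is a dry run only.
repair("r0")
```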

When we run another online verify immediately after the repair, it 
reports the nodes as in sync again, so the repair seems to work ... 
at least temporarily.

(Both the online verify and the drbdadm commands above are run from 
the Secondary node, though my understanding is that this doesn't 
matter.)

I've read some threads on this list about possible race conditions 
during online verification, but that seems unlikely in my case, since 
it is consistently the same two blocks that turn up out of sync.

We are using DRBD Protocol C on LVM on raid1, hosting an ext3 
filesystem. Any ideas as to the reasons for this behavior are 
appreciated.

Thank you,
Jeffrey