[DRBD-user] Repeated out-of-sync blocks
Jeffrey Froman
drbd.tcijf at olympus.net
Tue Dec 9 00:37:10 CET 2008
Hello,
We run an online verify from cron every night, and certain blocks
frequently turn up out of sync. Notice how the same blocks show up
repeatedly:
Nov 9 20:27:34: Out of sync: start=182520104, size=8 (sectors)
Nov 9 21:52:30: Out of sync: start=182520104, size=8 (sectors)
Nov 10 21:28:25: Out of sync: start=109249744, size=8 (sectors)
Nov 10 21:46:23: Out of sync: start=182520104, size=8 (sectors)
Nov 11 20:28:48: Out of sync: start=182520104, size=8 (sectors)
Nov 12 20:11:09: Out of sync: start=109249744, size=8 (sectors)
Nov 12 20:28:47: Out of sync: start=182520104, size=8 (sectors)
Nov 13 20:28:54: Out of sync: start=182520104, size=8 (sectors)
Nov 14 20:10:24: Out of sync: start=109249744, size=8 (sectors)
Nov 14 20:28:03: Out of sync: start=182520104, size=8 (sectors)
Nov 15 20:26:27: Out of sync: start=182520104, size=8 (sectors)
Nov 16 20:10:23: Out of sync: start=109249744, size=8 (sectors)
Nov 16 20:27:56: Out of sync: start=182520104, size=8 (sectors)
Nov 17 20:11:05: Out of sync: start=109249744, size=8 (sectors)
Nov 17 20:29:20: Out of sync: start=182520104, size=8 (sectors)
Nov 18 20:28:36: Out of sync: start=182520104, size=8 (sectors)
Nov 24 20:35:25: Out of sync: start=182520104, size=8 (sectors)
Dec 2 20:04:52: Out of sync: start=109249744, size=8 (sectors)
Dec 2 20:17:30: Out of sync: start=182520104, size=8 (sectors)
Dec 3 20:04:35: Out of sync: start=109249744, size=8 (sectors)
Dec 6 20:03:47: Out of sync: start=109249744, size=8 (sectors)
Dec 6 20:16:15: Out of sync: start=182520104, size=8 (sectors)
Dec 7 20:03:36: Out of sync: start=109249744, size=8 (sectors)
Dec 7 20:16:00: Out of sync: start=182520104, size=8 (sectors)
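To make the repetition easier to see, the entries above can be tallied by start sector; all 24 of them fall on just two extents. The following is only a minimal sketch: the function name `tally_out_of_sync` is made up for illustration, and the regular expression assumes the trimmed line format quoted above rather than the full kernel log format.

```python
import re
from collections import Counter

# Matches the trimmed log format quoted above. This is an assumption:
# real kernel log lines usually carry extra prefixes (hostname, device).
OOS_RE = re.compile(r"Out of sync: start=(\d+), size=(\d+) \(sectors\)")


def tally_out_of_sync(lines):
    """Count how often each (start, size) extent is reported out of sync."""
    counts = Counter()
    for line in lines:
        match = OOS_RE.search(line)
        if match:
            start, size = int(match.group(1)), int(match.group(2))
            counts[(start, size)] += 1
    return counts


if __name__ == "__main__":
    # A few of the lines from the log above, used as sample input.
    sample = [
        "Nov 9 20:27:34: Out of sync: start=182520104, size=8 (sectors)",
        "Nov 10 21:28:25: Out of sync: start=109249744, size=8 (sectors)",
        "Nov 10 21:46:23: Out of sync: start=182520104, size=8 (sectors)",
    ]
    for (start, size), n in tally_out_of_sync(sample).most_common():
        print(f"start={start} size={size} sectors: seen {n} time(s)")
```

Feeding the whole log through the same function would show whether any new extents start appearing over time.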
What does it mean that the same blocks are regularly marked
out of sync? Each night that this happens, we repair manually by running:
drbdadm disconnect <resource>
drbdadm connect <resource>
When we run another online verify immediately after the repair, the
nodes are in sync again, so the repair seems to be working, at least
temporarily.
(Both the online verify and the drbdadm commands above are run from the
Secondary node, though my understanding is that this doesn't
matter.)
I've read some threads on this list regarding the possibility of race
conditions during online-verification, but this seems unlikely in my
case since the out-of-sync blocks are so regularly the same blocks.
We are using DRBD Protocol C on LVM on RAID1, hosting an ext3
filesystem. Any ideas as to the reasons for this behavior are
appreciated.
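Each reported extent is 8 sectors long, so it may correspond to a single filesystem block, which could help in tracing what keeps diverging. The arithmetic below is a rough sketch under assumptions that are not confirmed anywhere above: 512-byte sectors, a 4096-byte ext3 block size, and the filesystem starting at offset 0 of the DRBD device. The helper name `sector_to_fs_block` is hypothetical.

```python
SECTOR_BYTES = 512      # assumption: sectors in the log are 512 bytes
FS_BLOCK_BYTES = 4096   # assumption: ext3 was created with 4 KiB blocks


def sector_to_fs_block(start_sector,
                       sector_bytes=SECTOR_BYTES,
                       block_bytes=FS_BLOCK_BYTES):
    """Return (block_number, byte_offset_within_block) for a device sector.

    Assumes the filesystem begins at byte 0 of the device being verified.
    """
    byte_offset = start_sector * sector_bytes
    return divmod(byte_offset, block_bytes)


if __name__ == "__main__":
    # The two start sectors that recur in the log above.
    for start in (182520104, 109249744):
        block, remainder = sector_to_fs_block(start)
        print(f"sector {start} -> block {block}, "
              f"{remainder} bytes into that block")
```

If those assumptions hold, the resulting block numbers could be looked up with a tool such as debugfs (its `icheck` and `ncheck` requests) to see which file, if any, owns them.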
Thank you,
Jeffrey