[DRBD-user] drbd resource ahead / behind problem

envisionrx ron.wells at envision-rx.com
Tue Mar 6 23:39:48 CET 2012


Hey all, I have another problem with our DR cluster.  I'm using the
on-congestion pull-ahead option with our stacked resources.  It's a fairly
new install and is working well except for a couple of issues.  The issue
in this post is that the primary and secondary nodes end up disagreeing
about what state the resource is in.  When I look at the DR node, it shows
something like this:

14: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate A r-----
    ns:0 nr:15140 dw:15140 dr:2624 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:442

while the primary node shows something like this:

14: cs:Ahead ro:Primary/Secondary ds:UpToDate/Inconsistent A r-----
    ns:1190696 nr:0 dw:2371072 dr:408433 al:0 bm:70 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:555382

As you can see, the DR node thinks it is Connected and UpToDate rather than
Behind, yet it still reports out-of-sync blocks (oos:442), though far fewer
than the primary does (oos:555382).  The primary node, meanwhile, knows that
it is Ahead.
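
A rough way to catch this kind of disagreement automatically is to parse the
status text from both nodes and compare what each side reports.  The Python
sketch below is only an illustration and not part of DRBD: the function names
and the small table of expected peer connection states are my own
assumptions, and the table covers just a few states.

```python
import re

# Status text as shown above (primary and DR node).
PRIMARY = """14: cs:Ahead ro:Primary/Secondary ds:UpToDate/Inconsistent A r-----
    ns:1190696 nr:0 dw:2371072 dr:408433 al:0 bm:70 lo:0 pe:0 ua:0 ap:0 ep:1
wo:b oos:555382"""
DR_NODE = """14: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate A r-----
    ns:0 nr:15140 dw:15140 dr:2624 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b
oos:442"""

# Assumed, simplified mapping: the connection state one node reports and the
# state its peer would be expected to report. Only a few states are listed.
EXPECTED_PEER_CS = {
    "Connected": "Connected",
    "Ahead": "Behind",
    "Behind": "Ahead",
    "SyncSource": "SyncTarget",
    "SyncTarget": "SyncSource",
}


def parse_status(text):
    """Return the key:value fields (cs, ro, ds, oos, ...) of one device entry."""
    flat = " ".join(text.split())  # undo any line wrapping
    return dict(re.findall(r"\b([a-z]{2,3}):(\S+)", flat))


def compare_nodes(a, b):
    """List the ways in which node a's view and node b's view disagree."""
    problems = []
    want = EXPECTED_PEER_CS.get(a["cs"])
    if want is not None and b["cs"] != want:
        problems.append(
            f"cs: one side is {a['cs']}, peer is {b['cs']} (expected {want})"
        )
    # ds is local/peer on each node, so b's view should be a's view reversed.
    if a["ds"].split("/") != b["ds"].split("/")[::-1]:
        problems.append(f"ds: {a['ds']} does not mirror {b['ds']}")
    return problems


if __name__ == "__main__":
    for line in compare_nodes(parse_status(PRIMARY), parse_status(DR_NODE)):
        print(line)
```

Fed the two status blocks above, it flags both the cs mismatch (Ahead vs.
Connected) and the ds mismatch.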

The logs show a failed assert: ASSERT FAILED cstate = Connected, expected:
WFSyncUUID|WFBitMapT|Behind

Here is some info on the servers (which are identical, except that the DR
server has more hard disk space):
The servers are 16-core AMD-based Supermicro servers with 16 GB of memory and
a 7 TB RAID 5 array running off an Adaptec 6805 controller.

I'm using Openfiler 2.99.2 as the basis of the storage servers, although I
don't use the web interface, since I have DRBD and Corosync configured and
the web interface is useless for my case.

drbdadm -V
DRBDADM_BUILDTAG=GIT-hash:\ 0de839cee13a4160eed6037c4bddd066645e23c5\ build\ by\ rmake-chroot at localhost.localdomain\,\ 2011-08-12\ 18:38:56
DRBDADM_API_VERSION=88
DRBD_KERNEL_VERSION_CODE=0x08030b
DRBDADM_VERSION_CODE=0x08030b
DRBDADM_VERSION=8.3.11

uname -a
Linux openfiler2 2.6.32-131.17.1.el6-0.11.smp.gcc4.4.x86_64 #1 SMP Sat Nov 19 14:13:16 WET 2011 x86_64 x86_64 x86_64 GNU/Linux

Log snippet:
local5.info<174>: Mar  6 16:03:04 openfiler3 snapshot-resync-target-lvm.sh[12280]:   Logical volume "1024data4backing-before-resync" created
kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: helper command: /sbin/drbdadm before-resync-target minor-14 exit code 0 (0x0)
kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: conn( Behind -> SyncTarget ) disk( Outdated -> Inconsistent )
kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: Began resync as SyncTarget (will sync 1104 KB [276 bits set]).
kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: conn( SyncTarget -> Behind )
kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: Resync done (total 1 sec; paused 0 sec; 1104 K/sec)
kern.err<3>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: ASSERT( (n_oos - mdev->rs_failed) == 0 ) in /tmp/rmake/builds/kernel/linux-2.6.32-131.17.1.el6/drbd-8.3.git/drbd/drbd_worker.c:872
kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: 3 % had equal checksums, eliminated: 36K; transferred 1068K total 1104K
kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: updated UUIDs 0002000000000000:0000000000000000:0001000000000000:0001000000000000
kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: conn( Behind -> Connected ) disk( Inconsistent -> UpToDate )
kern.warn<4>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: cs:Connected rs_left=51 > rs_total=0 (rs_failed 0)
kern.warn<4>: Mar  6 16:03:05 openfiler3 kernel:last message repeated 7 times
kern.info<6>: Mar  6 16:03:05 openfiler3 kernel: block drbd14: bitmap WRITE of 3 pages took 44 jiffies
kern.info<6>: Mar  6 16:03:05 openfiler3 kernel: block drbd14: 204 KB (51 bits) marked out-of-sync by on disk bit-map.
kern.err<3>: Mar  6 16:03:05 openfiler3 kernel: block drbd14: ASSERT FAILED cstate = Connected, expected: WFSyncUUID|WFBitMapT|Behind
kern.err<3>: Mar  6 16:03:07 openfiler3 kernel:last message repeated 31 times
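
To make the order of connection-state changes in a log like this easier to
follow, the conn( X -> Y ) entries can be pulled out with a few lines of
Python.  This is just a sketch for reading the log, not a DRBD tool; the
function name and the SAMPLE variable are made up for the example, and the
sample lines are copied from the snippet above.

```python
import re

# Three of the kernel messages from the snippet above.
SAMPLE = """
kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: conn( Behind
-> SyncTarget ) disk( Outdated -> Inconsistent )
kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: conn(
SyncTarget -> Behind )
kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: conn( Behind
-> Connected ) disk( Inconsistent -> UpToDate )
"""

# \s also matches newlines, so entries wrapped by the mail client still match.
CONN_RE = re.compile(r"conn\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def conn_transitions(log_text):
    """Return the (old, new) connection-state pairs in the order logged."""
    return CONN_RE.findall(log_text)


if __name__ == "__main__":
    for old, new in conn_transitions(SAMPLE):
        print(f"{old} -> {new}")
```

On the sample this lists Behind -> SyncTarget, SyncTarget -> Behind and
Behind -> Connected, so the node ends up in Connected, which is the state
the later ASSERT FAILED message complains about.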

-- 
View this message in context: http://old.nabble.com/drbd-resource-ahead---behind-problem-tp33454636p33454636.html
Sent from the DRBD - User mailing list archive at Nabble.com.



