[DRBD-user] drbd resource ahead / behind problem

envisionrx ron.wells at envision-rx.com
Tue Mar 6 23:42:16 CET 2012


I should also have mentioned that the only way I know to get out of this state
is to disconnect the resource and then reconnect it.  At that point it marks
the entire disk out of date and basically does a verify of the whole disk.
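
For reference, the workaround amounts to something like the sketch below.  It
is only an illustration: the function name reconnect_resource and the resource
name r14 in the usage line are placeholders I made up, and the DRBDADM variable
is there only so the sequence can be dry-run with echo.  For a stacked resource
drbdadm would presumably also need its --stacked option, which this sketch does
not pass.

```shell
#!/bin/sh
# Sketch of the disconnect/reconnect workaround described above.
# Assumptions: the function name and any resource name passed to it are
# placeholders; DRBDADM can be overridden (e.g. DRBDADM=echo) for a dry run.
DRBDADM="${DRBDADM:-drbdadm}"

reconnect_resource() {
    res="$1"
    if [ -z "$res" ]; then
        echo "usage: reconnect_resource <resource>" >&2
        return 1
    fi
    # Drop the replication link for this resource; stop if that fails.
    "$DRBDADM" disconnect "$res" || return 1
    # Re-establish the link.  In my case the whole disk is then treated
    # as out of date and gets gone over again.
    "$DRBDADM" connect "$res"
}
```

Usage would be along the lines of `reconnect_resource r14` on one node, then
watching /proc/drbd until the connection state settles.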


envisionrx wrote:
> 
> Hey all, I have another problem with our DR cluster.  I'm using the
> on-congestion pull-ahead option with our stacked resources.  It's a pretty
> new install and seems to be working pretty well, except for a couple of
> issues.  This one is that, for some reason, the primary and secondary
> nodes disagree about what connection state they are in.  When I look at the
> DR node, it shows something like this:
> 
> 14: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate A r-----
>     ns:0 nr:15140 dw:15140 dr:2624 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:442
> 
> while the primary node shows something like this:
> 
> 14: cs:Ahead ro:Primary/Secondary ds:UpToDate/Inconsistent A r-----
>     ns:1190696 nr:0 dw:2371072 dr:408433 al:0 bm:70 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:555382
> 
> As you can see, the DR node thinks it is Connected and UpToDate rather than
> Behind, yet it still reports out-of-sync data (oos:442, far less than the
> primary's oos:555382), while the primary node knows that it is Ahead.
> 
> The logs show an assert: ASSERT FAILED cstate = Connected, expected:
> WFSyncUUID|WFBitMapT|Behind
> 
> Here is some info on the servers (which are identical, except that the DR
> server has more hard disk space):
> The servers are 16-core AMD-based Supermicro servers with 16 GB of memory
> and a 7 TB RAID 5 array running off an Adaptec 6805 controller.
> 
> I'm using Openfiler 2.99.2 as the basis of the storage servers, although I
> don't use the web interface, since I have DRBD and Corosync configured and
> the web interface is useless for my case.
> 
> drbdadm -V
> DRBDADM_BUILDTAG=GIT-hash:\ 0de839cee13a4160eed6037c4bddd066645e23c5\ build\ by\ rmake-chroot at localhost.localdomain\,\ 2011-08-12\ 18:38:56
> DRBDADM_API_VERSION=88
> DRBD_KERNEL_VERSION_CODE=0x08030b
> DRBDADM_VERSION_CODE=0x08030b
> DRBDADM_VERSION=8.3.11
> 
> uname -a
> Linux openfiler2 2.6.32-131.17.1.el6-0.11.smp.gcc4.4.x86_64 #1 SMP Sat Nov 19 14:13:16 WET 2011 x86_64 x86_64 x86_64 GNU/Linux
> 
> Log snippet:
> local5.info<174>: Mar  6 16:03:04 openfiler3 snapshot-resync-target-lvm.sh[12280]:   Logical volume "1024data4backing-before-resync" created
> kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: helper command: /sbin/drbdadm before-resync-target minor-14 exit code 0 (0x0)
> kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: conn( Behind -> SyncTarget ) disk( Outdated -> Inconsistent )
> kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: Began resync as SyncTarget (will sync 1104 KB [276 bits set]).
> kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: conn( SyncTarget -> Behind )
> kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: Resync done (total 1 sec; paused 0 sec; 1104 K/sec)
> kern.err<3>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: ASSERT( (n_oos - mdev->rs_failed) == 0 ) in /tmp/rmake/builds/kernel/linux-2.6.32-131.17.1.el6/drbd-8.3.git/drbd/drbd_worker.c:872
> kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: 3 % had equal checksums, eliminated: 36K; transferred 1068K total 1104K
> kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: updated UUIDs 0002000000000000:0000000000000000:0001000000000000:0001000000000000
> kern.info<6>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: conn( Behind -> Connected ) disk( Inconsistent -> UpToDate )
> kern.warn<4>: Mar  6 16:03:04 openfiler3 kernel: block drbd14: cs:Connected rs_left=51 > rs_total=0 (rs_failed 0)
> kern.warn<4>: Mar  6 16:03:05 openfiler3 kernel:last message repeated 7 times
> kern.info<6>: Mar  6 16:03:05 openfiler3 kernel: block drbd14: bitmap WRITE of 3 pages took 44 jiffies
> kern.info<6>: Mar  6 16:03:05 openfiler3 kernel: block drbd14: 204 KB (51 bits) marked out-of-sync by on disk bit-map.
> kern.err<3>: Mar  6 16:03:05 openfiler3 kernel: block drbd14: ASSERT FAILED cstate = Connected, expected: WFSyncUUID|WFBitMapT|Behind
> kern.err<3>: Mar  6 16:03:07 openfiler3 kernel:last message repeated 31 times
> 
> 
