I should also have mentioned that the only way I know to get out of this state is to disconnect the resource and then reconnect it. At that point it marks the whole disk out of date and effectively runs a verify of the entire disk.

envisionrx wrote:
>
> Hey all, I have another problem with our DR cluster. I'm using the
> on-congestion pull-ahead option with our stacked resources. It's a pretty
> new install and seems to be working pretty well, except for a couple of
> issues. This issue is that, for some reason, the primary and secondary
> nodes get out of sync as to what state they are in. When I look at the DR
> node, it indicates something like this:
>
>  14: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate A r-----
>     ns:0 nr:15140 dw:15140 dr:2624 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:442
>
> while the primary node shows something like this:
>
>  14: cs:Ahead ro:Primary/Secondary ds:UpToDate/Inconsistent A r-----
>     ns:1190696 nr:0 dw:2371072 dr:408433 al:0 bm:70 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:555382
>
> As you can see, the DR node thinks it is up to date and not Behind, but it
> also thinks it is out of sync (though not as far out of sync as the
> primary), while the primary node knows that it is Ahead.
>
> The logs show an assert: ASSERT FAILED cstate = Connected, expected:
> WFSyncUUID|WFBitMapT|Behind
>
> Here is some info on the servers (which are identical, except that the DR
> server has more hard disk space):
> The servers are 16-core AMD-based Supermicro servers with 16 GB of memory
> and a 7 TB RAID 5 array running off an Adaptec 6805 controller.
>
> I'm using Openfiler 2.99.2 as the basis of the storage servers, although I
> don't use the web interface, since I have DRBD and Corosync configured and
> the web interface is useless in my case.
>
> drbdadm -V
> DRBDADM_BUILDTAG=GIT-hash:\ 0de839cee13a4160eed6037c4bddd066645e23c5\ build\ by\ rmake-chroot at localhost.localdomain\,\ 2011-08-12\ 18:38:56
> DRBDADM_API_VERSION=88
> DRBD_KERNEL_VERSION_CODE=0x08030b
> DRBDADM_VERSION_CODE=0x08030b
> DRBDADM_VERSION=8.3.11
>
> uname -a
> Linux openfiler2 2.6.32-131.17.1.el6-0.11.smp.gcc4.4.x86_64 #1 SMP Sat Nov 19 14:13:16 WET 2011 x86_64 x86_64 x86_64 GNU/Linux
>
> Log snippet:
> local5.info<174>: Mar 6 16:03:04 openfiler3 snapshot-resync-target-lvm.sh[12280]: Logical volume "1024data4backing-before-resync" created
> kern.info<6>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: helper command: /sbin/drbdadm before-resync-target minor-14 exit code 0 (0x0)
> kern.info<6>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: conn( Behind -> SyncTarget ) disk( Outdated -> Inconsistent )
> kern.info<6>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: Began resync as SyncTarget (will sync 1104 KB [276 bits set]).
> kern.info<6>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: conn( SyncTarget -> Behind )
> kern.info<6>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: Resync done (total 1 sec; paused 0 sec; 1104 K/sec)
> kern.err<3>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: ASSERT( (n_oos - mdev->rs_failed) == 0 ) in /tmp/rmake/builds/kernel/linux-2.6.32-131.17.1.el6/drbd-8.3.git/drbd/drbd_worker.c:872
> kern.info<6>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: 3 % had equal checksums, eliminated: 36K; transferred 1068K total 1104K
> kern.info<6>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: updated UUIDs 0002000000000000:0000000000000000:0001000000000000:0001000000000000
> kern.info<6>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: conn( Behind -> Connected ) disk( Inconsistent -> UpToDate )
> kern.warn<4>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: cs:Connected rs_left=51 > rs_total=0 (rs_failed 0)
> kern.warn<4>: Mar 6 16:03:05 openfiler3 kernel:last message repeated 7 times
> kern.info<6>: Mar 6 16:03:05 openfiler3 kernel: block drbd14: bitmap WRITE of 3 pages took 44 jiffies
> kern.info<6>: Mar 6 16:03:05 openfiler3 kernel: block drbd14: 204 KB (51 bits) marked out-of-sync by on disk bit-map.
> kern.err<3>: Mar 6 16:03:05 openfiler3 kernel: block drbd14: ASSERT FAILED cstate = Connected, expected: WFSyncUUID|WFBitMapT|Behind
> kern.err<3>: Mar 6 16:03:07 openfiler3 kernel:last message repeated 31 times
>

--
View this message in context: http://old.nabble.com/drbd-resource-ahead---behind-problem-tp33454636p33454665.html
Sent from the DRBD - User mailing list archive at Nabble.com.
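The disconnect/reconnect workaround from the top of this thread could be scripted roughly as follows. This is a minimal POSIX shell sketch, not a tested tool: the helper names (`drbd_cstate`, `run`, `bounce_stacked_resource`), the example resource name `r0-U` and the `DRY_RUN` switch are all hypothetical. It assumes the stuck device is the stacked resource, hence `drbdadm --stacked`; drop that flag if the affected resource is a lower-level one. `drbd_cstate` pulls the `cs:` field out of /proc/drbd so the two nodes can be compared (Ahead on the primary, Connected on the DR node) before bouncing the link.

```shell
#!/bin/sh
# Sketch only: helper names, resource name and DRY_RUN are hypothetical.

# Print the connection state (value of the "cs:" field) of DRBD minor $1,
# read from /proc/drbd-style text in $2 (default: /proc/drbd).
# Device lines look like: " 14: cs:Ahead ro:Primary/Secondary ..."
drbd_cstate() {
    awk -v m="$1:" '$1 == m { sub(/^cs:/, "", $2); print $2; exit }' \
        "${2:-/proc/drbd}"
}

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Disconnect and then reconnect the stacked resource named in $1.
# Per the report above, this marks the whole disk out of date, so expect
# a full-device resync/verify afterwards.
bounce_stacked_resource() {
    run drbdadm --stacked disconnect "$1" || return 1
    run drbdadm --stacked connect "$1"
}
```

For example, running `drbd_cstate 14` on each node shows whether the states disagree, and `DRY_RUN=1 bounce_stacked_resource r0-U` prints the two drbdadm commands without executing them, which is a cheap way to check the resource name before touching a production link.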