Hey all,

I have another problem with our DR cluster. I'm using the on-congestion pull-ahead option with our stacked resources. It's a fairly new install and seems to be working well except for a couple of issues. This one is that the primary and secondary nodes disagree about what state they are in.

The DR node shows something like this:

  14: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate A r-----
      ns:0 nr:15140 dw:15140 dr:2624 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:442

while the primary node shows something like this:

  14: cs:Ahead ro:Primary/Secondary ds:UpToDate/Inconsistent A r-----
      ns:1190696 nr:0 dw:2371072 dr:408433 al:0 bm:70 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:555382

As you can see, the DR node thinks it is Connected and UpToDate rather than Behind, yet it still reports out-of-sync blocks (oos:442), though far fewer than the primary (oos:555382). The primary node, meanwhile, knows that it is Ahead.

The logs show an assert:

  ASSERT FAILED cstate = Connected, expected: WFSyncUUID|WFBitMapT|Behind

Here is some info on the servers (which are identical, except that the DR server has more hard disk space). They are 16-core AMD-based Supermicro servers with 16 GB of memory and a 7 TB RAID 5 array running off an Adaptec 6805 controller. I'm using Openfiler 2.99.2 as the basis of the storage servers, although I don't use the web interface, since I have DRBD and corosync configured and the web interface is useless for my case.
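For anyone who wants to check this kind of disagreement mechanically, here is a minimal sketch in Python that compares the two status lines quoted above. The names parse_status, find_mismatches and EXPECTED_PEER_CSTATE are made up for this illustration, and the state map is an assumption covering only a few well-known pairs, not the full DRBD state machine. The oos comparison is informational only, since the two counters may legitimately lag each other while replication is running ahead/behind.

```python
import re

# Illustrative, non-exhaustive map (an assumption, not taken from DRBD itself):
# the connection state one node reports -> the state its peer should report.
EXPECTED_PEER_CSTATE = {
    "Connected": "Connected",
    "Ahead": "Behind",
    "Behind": "Ahead",
    "SyncSource": "SyncTarget",
    "SyncTarget": "SyncSource",
}


def parse_status(line):
    """Return the key:value fields of one /proc/drbd-style status line as a dict."""
    # The leading "14:" minor number is skipped because no value follows the colon.
    return dict(re.findall(r"(\w+):(\S+)", line))


def find_mismatches(local, peer):
    """List the ways in which the peer's view does not mirror the local view."""
    problems = []
    expected = EXPECTED_PEER_CSTATE.get(local["cs"])
    if expected is not None and peer["cs"] != expected:
        problems.append(
            f"cs: local is {local['cs']}, so peer should be {expected}, "
            f"but peer reports {peer['cs']}"
        )
    # ro and ds are printed as local/peer, so the peer should show them swapped.
    for key in ("ro", "ds"):
        mirrored = "/".join(reversed(local[key].split("/")))
        if peer[key] != mirrored:
            problems.append(f"{key}: expected {mirrored} on peer, got {peer[key]}")
    # Informational: the counters can differ briefly even on a healthy pair.
    if local["oos"] != peer["oos"]:
        problems.append(f"oos: local {local['oos']} vs peer {peer['oos']}")
    return problems


PRIMARY = ("14: cs:Ahead ro:Primary/Secondary ds:UpToDate/Inconsistent A r----- "
           "ns:1190696 nr:0 dw:2371072 dr:408433 al:0 bm:70 lo:0 pe:0 ua:0 ap:0 "
           "ep:1 wo:b oos:555382")
DR_NODE = ("14: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate A r----- "
           "ns:0 nr:15140 dw:15140 dr:2624 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 "
           "ep:1 wo:b oos:442")

for problem in find_mismatches(parse_status(PRIMARY), parse_status(DR_NODE)):
    print(problem)
```

With the two lines from this post, the sketch flags cs, ds and oos, while ro mirrors correctly.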
drbdadm -V
DRBDADM_BUILDTAG=GIT-hash:\ 0de839cee13a4160eed6037c4bddd066645e23c5\ build\ by\ rmake-chroot at localhost.localdomain\,\ 2011-08-12\ 18:38:56
DRBDADM_API_VERSION=88
DRBD_KERNEL_VERSION_CODE=0x08030b
DRBDADM_VERSION_CODE=0x08030b
DRBDADM_VERSION=8.3.11

uname -a
Linux openfiler2 2.6.32-131.17.1.el6-0.11.smp.gcc4.4.x86_64 #1 SMP Sat Nov 19 14:13:16 WET 2011 x86_64 x86_64 x86_64 GNU/Linux

Log snippet:

local5.info<174>: Mar 6 16:03:04 openfiler3 snapshot-resync-target-lvm.sh[12280]: Logical volume "1024data4backing-before-resync" created
kern.info<6>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: helper command: /sbin/drbdadm before-resync-target minor-14 exit code 0 (0x0)
kern.info<6>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: conn( Behind -> SyncTarget ) disk( Outdated -> Inconsistent )
kern.info<6>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: Began resync as SyncTarget (will sync 1104 KB [276 bits set]).
kern.info<6>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: conn( SyncTarget -> Behind )
kern.info<6>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: Resync done (total 1 sec; paused 0 sec; 1104 K/sec)
kern.err<3>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: ASSERT( (n_oos - mdev->rs_failed) == 0 ) in /tmp/rmake/builds/kernel/linux-2.6.32-131.17.1.el6/drbd-8.3.git/drbd/drbd_worker.c:872
kern.info<6>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: 3 % had equal checksums, eliminated: 36K; transferred 1068K total 1104K
kern.info<6>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: updated UUIDs 0002000000000000:0000000000000000:0001000000000000:0001000000000000
kern.info<6>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: conn( Behind -> Connected ) disk( Inconsistent -> UpToDate )
kern.warn<4>: Mar 6 16:03:04 openfiler3 kernel: block drbd14: cs:Connected rs_left=51 > rs_total=0 (rs_failed 0)
kern.warn<4>: Mar 6 16:03:05 openfiler3 kernel:last message repeated 7 times
kern.info<6>: Mar 6 16:03:05 openfiler3 kernel: block drbd14: bitmap WRITE of 3 pages took 44 jiffies
kern.info<6>: Mar 6 16:03:05 openfiler3 kernel: block drbd14: 204 KB (51 bits) marked out-of-sync by on disk bit-map.
kern.err<3>: Mar 6 16:03:05 openfiler3 kernel: block drbd14: ASSERT FAILED cstate = Connected, expected: WFSyncUUID|WFBitMapT|Behind
kern.err<3>: Mar 6 16:03:07 openfiler3 kernel:last message repeated 31 times

--
View this message in context: http://old.nabble.com/drbd-resource-ahead---behind-problem-tp33454636p33454636.html
Sent from the DRBD - User mailing list archive at Nabble.com.