Well, this is still happening, and with any resource that sees much use at all. I moved some VMs off the problematic resource, and it started happening on the new resource. I could really use some help figuring out what in the world is going on, as this makes our DR node pretty useless.

I've been poring over the logs and have seen some troubling messages, but I don't know if they are really a problem or just normal. I see these a LOT:

kern.info<6>: Mar 16 11:22:05 openfiler2 kernel: block drbd15: conn( Ahead -> SyncSource ) pdsk( Outdated -> Inconsistent )
kern.info<6>: Mar 16 11:22:05 openfiler2 kernel: block drbd15: Began resync as SyncSource (will sync 748 KB [187 bits set]).
...
kern.warn<4>: Mar 16 11:22:12 openfiler2 kernel: block drbd15: cs:Ahead rs_left=465 > rs_total=187 (rs_failed 0)

From what I can tell, that last one seems to indicate there is more data out of sync than the driver expected. Is that correct? What causes it, and should I be concerned?

I get these quite often too, both between the two local nodes and between the local and offsite node:

kern.warn<4>: Mar 16 11:22:06 openfiler2 kernel: block drbd5: Digest mismatch, buffer modified by upper layers during write: 331859168s +4096

Sometimes these cause the connection to be dropped and re-established, and sometimes not. Here is a case where it did:

kern.warn<4>: Mar 16 11:26:06 openfiler2 kernel: block drbd15: Digest mismatch, buffer modified by upper layers during write: 330987176s +4096
kern.err<3>: Mar 16 11:26:06 openfiler2 kernel: block drbd15: meta connection shut down by peer.

I'm also seeing these:

kern.warn<4>: Mar 16 10:23:01 openfiler2 kernel: block drbd4: helper command: /sbin/drbdadm fence-peer minor-4 exit code 1 (0x100)
kern.err<3>: Mar 16 10:23:01 openfiler2 kernel: block drbd4: fence-peer helper broken, returned 1

And when the nodes get into the state where the remote node believes it is UpToDate but isn't, I see these on the local node:

kern.warn<4>: Mar 16 10:31:05 openfiler2 kernel: block drbd14: Local backing block device frozen?

I've found that if I force the network down on the remote node like this:

ifdown eth0; sleep 15; ifup eth0
drbdadm connect all

then the nodes will reconnect and not resync all the data on the drives, but that is very messy and scares the crap out of me.

Insights, thoughts, anything??

envisionrx wrote:
>
> This seems to be a problem with this one resource; the other resources
> aren't exhibiting this issue. Every time I disconnect and reconnect and
> then let it sync, it finishes and is left in this weird state. Migrating
> files off this resource while I dig more into the logs for a clue as to
> what might be up. I guess I'll rebuild the resource after I get
> everything off of it to see if that clears it up. If anyone has any
> suggestions in the meantime, please let me know!
>

--
View this message in context: http://old.nabble.com/drbd-resource-ahead---behind-problem-tp33454636p33518386.html
Sent from the DRBD - User mailing list archive at Nabble.com.
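For reference, the Ahead/Behind connection states in the first excerpt only occur when DRBD's congestion policy is enabled for the resource. A minimal sketch of what that looks like in drbd.conf, assuming a setup along these lines (the resource name and threshold values are placeholders, not taken from the post):

resource r15 {
    net {
        # When the replication send buffer fills past these thresholds,
        # DRBD stops blocking application I/O and goes Ahead/Behind instead,
        # marking the skipped writes in the bitmap for a later resync.
        on-congestion pull-ahead;
        congestion-fill 2G;
        congestion-extents 2000;
    }
}

When the congestion clears, DRBD resyncs the marked blocks, which is the "conn( Ahead -> SyncSource )" transition in the log. The "rs_left > rs_total" warning matches the reading above: more blocks were dirty at that point than the resync expected when it started.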
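The "Digest mismatch, buffer modified by upper layers during write" messages only appear when data-integrity checking is turned on in the net section: DRBD hashes each write before sending it, and if the filesystem or page cache modifies the buffer while it is in flight, the digest no longer matches on the peer and DRBD may drop the connection, which fits the "meta connection shut down by peer" line that follows. A sketch of the relevant option, assuming it is currently enabled here (resource name and algorithm are placeholders):

resource r5 {
    net {
        # Per-request checksumming of replicated writes. Buffers that are
        # legitimately rewritten in flight (swap, some filesystems, VM
        # workloads) trigger "Digest mismatch ... modified by upper layers"
        # without any real corruption on disk.
        data-integrity-alg crc32c;
    }
}

Removing or commenting out that line disables the check and usually stops the disconnect/reconnect cycles it causes, at the cost of losing the end-to-end integrity verification.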
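"fence-peer helper broken, returned 1" means the configured fencing handler exited with status 1, which is not one of the exit codes DRBD understands, so the fencing attempt is treated as failed. The handler and the policy that invokes it live in drbd.conf roughly as below; the script path is the Pacemaker helper shipped with DRBD and is an assumption about this setup, not something confirmed in the post:

resource r4 {
    disk {
        fencing resource-only;    # or resource-and-stonith
    }
    handlers {
        # Called when the peer becomes unreachable. It must exit with a
        # code DRBD recognizes (for example 4 = peer successfully outdated,
        # 7 = peer was fenced); anything else is reported as "broken".
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    }
}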
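On the ifdown/ifup workaround: bouncing the whole interface should not be needed just to force DRBD to re-handshake. Assuming the goal is only to drop and re-establish the replication link for the affected resource, a per-resource disconnect/connect cycle does the same thing without touching the NIC (the resource name below is a placeholder):

# Check current connection and disk states first
cat /proc/drbd
drbdadm cstate r15    # connection state of one resource
drbdadm dstate r15    # local/peer disk states

# Tear down and re-establish only the DRBD connection
drbdadm disconnect r15
sleep 5
drbdadm connect r15

Like the interface bounce, this only forces a new handshake; it does not explain why the peer ends up claiming UpToDate when it isn't.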