[Csync2] Csync2 latest source code : replication not working... is it a regression issue?

Lars Ellenberg lars.ellenberg at linbit.com
Thu Nov 10 22:39:04 CET 2011


On Thu, Nov 10, 2011 at 11:06:27PM +0530, Samba wrote:
> Here is the debugging output for the problematic file (the one that is
> modified on master), /opt/test/db.c, when the command "csync2 -xvvv" is run.
> 
> SQL: SELECT peername FROM dirty GROUP BY peername
> >
> > SQL Query finished.
> >
> > SQL: SELECT filename, myname, forced FROM dirty WHERE peername = '
> > pdev22vm2.platform.avaya.com' ORDER by filename ASC
> >
> > SQL Query finished.
> >
> > Connecting to host pdev22vm2.platform.avaya.com (PLAIN) ...
> >
> > Local> CONFIG \n
> > Peer> OK (cmd_finished).\n
> >  check: /opt/test/db.c 10, /opt/test/ 10, 10.
> >
> > Dirty item /opt/test/db.c pdev22vm1.platform.avaya.com 1
> > Local> HELLO pdev22vm1.platform.avaya.com\n
> > Peer> OK (cmd_finished).\n
> > Match (+): /opt/test on /opt/test/db.c
> >
> > Updating /opt/test/db.c on pdev22vm2.platform.avaya.com ...
> >
> > Local> FLUSH
> > s5XgBkj_4nlkSAUZZ6pjNT0klWL3d0m4wI0hV3hMir93GFl68iYVKYct1qKEM9ff
> > /opt/test/db.c\n
> > Peer> OK (cmd_finished).\n
> > Local> PATCH
> > s5XgBkj_4nlkSAUZZ6pjNT0klWL3d0m4wI0hV3hMir93GFl68iYVKYct1qKEM9ff
> > /opt/test/db.c\n
> > Peer> Stating original file /opt rc: 0 mode: 40755\n
> > While syncing file /opt/test/db.c:
> >
> > ERROR from peer(/opt/test/db.c): pdev22vm2.platform.avaya.com Stating
> > original file /opt rc: 0 mode: 40755
> >
> > Match (+): /opt/test on /opt/test/db.c
> >
> > Auto-resolving conflict: Won 'master/slave' test.
> >
> > Updating /opt/test/db.c on pdev22vm2.platform.avaya.com ...
> >
> > Local> FLUSH
> > s5XgBkj_4nlkSAUZZ6pjNT0klWL3d0m4wI0hV3hMir93GFl68iYVKYct1qKEM9ff
> > /opt/test/db.c\n
> > Peer> Changing owner of /var/backups/csync2/opt to user 0 and group 0, rc=
> > 0 \n
> > While syncing file /opt/test/db.c:
> >
> > ERROR from peer(/opt/test/db.c): pdev22vm2.platform.avaya.com Changing
> > owner of /var/backups/csync2/opt to user 0 and group 0, rc= 0


Well, it says it cannot change the owner and group of /var/backups/csync2/opt to 0.
So, can it?
Does that exist?
Bug in the backup code path?
Problem with that backup mount point?

What does strace have to say (on the receiving side)?

> >
> > ERROR: Auto-resolving failed. Giving up.
> >
> > File stays in dirty state. Try again later...
> >
> > Local> BYE\n
> > Peer> \n
> > ERROR from peer(<no file>): pdev22vm2.platform.avaya.com
> >
> > SQL: SELECT command, logfile FROM action GROUP BY command, logfile
> >
> > SQL Query finished.
> >
> > SQL: COMMIT
> >
> > Finished with 3 errors.
> >
> 
> I'm not sure what is causing this issue and whether it is a mistake on my
> part or a regression introduced recently in the code on git.
> 
> Thanks and Regards,
> Samba
> 
> > Interestingly, many of the files (that are reported as L by command
> > "csync2 -T") which could not be replicated when I ran "csync2 -x" have been
> > replicated smoothly when I explicitly ran "csync2 -x file_name" for each
> > such file. The only problem has been the file which has been modified on
> > the master and thus marked X by 'csync2 -T' command.
> >
> > Does csync2 stop replicating all the remaining files when it
> > encounters the first conflict? (Although this is not a conflict and is
> > perhaps a regression issue.)

When it encounters the first error, it stops, yes.

> > It would be great if a fix can be delivered for this issue, or at least
> > some hints as to where this needs to be fixed and how?

> >>> ERROR from peer(<no file>): slave.abc.com Changing mode of
> >>> /var/backups/csync2/opt/test/db.c to mode 33188, rc= 0

That mode should be printed in octal.  0100644
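
To check that arithmetic, here is a minimal Python sketch (not csync2 code) that converts the decimal value 33188 from the error message to octal and decodes it with the standard `stat` module. It also decodes the `40755` from the earlier "Stating original file /opt" message, on the assumption that this one was already printed in octal.

```python
import stat

# Decimal mode value as printed in the peer's error message.
mode = 33188

# Same value in octal with a C-style leading zero.
octal_text = f"0{mode:o}"            # "0100644"

# Split into file type and permission bits.
is_regular_file = stat.S_ISREG(mode)
permission_bits = stat.S_IMODE(mode)  # 0o644
listing = stat.filemode(mode)         # ls-style string

# Assumption: "mode: 40755" in the earlier message is already octal.
dir_mode = int("40755", 8)
dir_listing = stat.filemode(dir_mode)

print(octal_text, listing, dir_listing)
```

So 33188 is a regular file with permissions 0644, and 40755 is a directory with permissions 0755.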

Again, it says it cannot change the mode of some backup file.
Are you trying to back up to some file system that does not support
ownership / modes? Does it actually fail earlier, and the message is
misleading?

> >> Csync2 1.34 is working properly but the latest git codebase is giving
> >> this error. Am I doing something wrong or is this a regression issue?

If it really works with 1.34, and no longer with 2.0, then yes,
that fits my understanding of what is called a "regression" :-/

> >> If it
> >> is a regression issue, then I can test and verify a patch if you can
> >> provide one (maybe I can try if some hints can be given, but I'm not so
> >> good at C, hence the hesitation :) ).

I'm confident that you will manage ;)

> >> PS:
> >>
> >> I had faced another minor issue with the latest codebase from Git where
> >> it complained "no such column: TRUE" pointing to the line:218 in the file

Yeah, I have a patch pending for that (and a few other issues).
I just need to find some time to sanitize and push them...

> >> I had changed "OR TRUE" to "OR 1=1" which I think would work on all the
> >> databases.
> >>
> >> I'm using sqlite3, and the issue may be with the standard compliance of
> >> sqlite3 but I think it is better to go with the lowest common denominator and
> >> hence would suggest using "OR 1=1" instead of "OR TRUE" in the above
> >> mentioned context.

Correct.

It used to be "OR 1", but I think postgres complained "not a boolean",
so it became "OR TRUE", now sqlite complains,
so it needs to become "OR 1=1".
Or split into different code paths.
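
As a rough illustration of why "OR 1=1" is the portable spelling, here is a minimal Python sketch using the standard `sqlite3` module. The table layout and the query are hypothetical stand-ins, not the actual csync2 schema or the statement at the line mentioned above. A comparison such as `1=1` is an ordinary expression that evaluates to true, so it does not depend on the database knowing a `TRUE` keyword. Whether `OR TRUE` itself is accepted depends on the SQLite version in use, so the sketch does not test it.

```python
import sqlite3

# In-memory database with a simplified, hypothetical "dirty" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dirty (filename TEXT, peername TEXT)")
conn.executemany(
    "INSERT INTO dirty VALUES (?, ?)",
    [("/opt/test/db.c", "peer-a"), ("/opt/test/other.c", "peer-b")],
)

# "OR 1=1" makes the WHERE clause always true, so every row is returned
# regardless of the peername filter.
rows = conn.execute(
    "SELECT filename FROM dirty WHERE peername = ? OR 1=1 ORDER BY filename",
    ("peer-a",),
).fetchall()

filenames = [row[0] for row in rows]
print(filenames)
conn.close()
```

Both rows come back even though only one matches the `peername` filter.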


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

