[Csync2] Write-error while sending data

Lars Ellenberg Lars.Ellenberg at linbit.com
Wed Oct 25 11:06:28 CEST 2006


/ 2006-10-06 15:54:49 +0200
\ Giampaolo Tomassoni:
> Dears,
> 
> I have a couple of machines loosely connected through an IPSEC channel (racoon) by a couple of ADSL lines.
> 
> These machines actually run linux-2.6.16 from Gentoo.
> 
> I'm using csync2-1.32 to keep the web content of these two servers in sync and, since my channel is already authenticated and encrypted by IPSEC, I'm running csync2 in 'clean' channel mode.
> 
> That said, I sometimes experience a 'Write-error while sending data'.
> 
> I see there was a previous report about this error from Vic Berdin, who started a thread about it on Oct. 25, 2005. That thread does not seem to have found the cause of the problem, so I tried to dig into it on my own.
> 
> My findings are that this error is not triggered by a crash of the
> csync2 peer (as first suggested by Clifford Wolf in the
> aforementioned thread); instead, it is due to a SIGALRM signal
> interrupting a blocking write on the connection channel.

...
> Database idle in transaction. Forcing COMMIT.
> SQL: COMMIT TRANSACTION
> Write-error while sending data.

> The expiring alarm seems somehow related to committing an active SQL
> transaction, and it seems to fire roughly 2 seconds after the file
> transfer begins. This means that if you have a fast connection to your
> peer you will probably never see this error, since either your files
> take less than 2 seconds to transfer or the write() enters and leaves
> the blocking state fast enough to keep the chance of being
> interrupted low.

as far as I can see, csync2 only uses signal() to set its signal
handlers. this should have BSD semantics and make read() and write()
restart themselves if interrupted.

one could do that explicitly by using sigaction with SA_RESTART instead.
that would probably be the correct patch, replacing signal(SIGALRM,...)
with sigaction(...) with SA_RESTART set.

> You'll find a small patch attached to this post which seems to fix the
> problem. It is made with respect to csync2-1.33.


diff -rud csync2-1.33/conn.c csync2-1.33+write-error-fix/conn.c
--- csync2-1.33/conn.c	2006-08-08 19:46:54.000000000 +0200
+++ csync2-1.33+write-error-fix/conn.c	2006-10-06 15:07:56.000000000 +0200
@@ -324,8 +324,14 @@
 
 int conn_write(const void *buf, size_t count)
 {
+	int nw;
+
 	conn_debug("Local", buf, count);
-	return WRITE(buf, count);
+
+	do nw = WRITE(buf, count);
+	while(nw <= 0 && errno == EINTR);
+
+	return nw;
 }

and what if it is interrupted by something other than SIGALRM?
still retry?  I don't think this is correct.
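
if a retry loop is wanted anyway, one way to narrow it down is to let the
SIGALRM handler set a flag and retry only when that flag is set. this is
only a sketch under assumptions: got_sigalrm, on_sigalrm() and
write_retry_on_alarm() are hypothetical names, and it calls plain write()
on a file descriptor, not the WRITE() macro from conn.c.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical flag, set by the SIGALRM handler below. */
static volatile sig_atomic_t got_sigalrm;

/* Hypothetical SIGALRM handler: only records that the alarm fired. */
void on_sigalrm(int signo)
{
	(void)signo;
	got_sigalrm = 1;
}

/*
 * Hypothetical wrapper: retry write() only if it failed with EINTR
 * and the flag shows that SIGALRM arrived during the call. An EINTR
 * with the flag unset is returned to the caller as an error.
 * A short (partial) write is returned as-is, like write() itself.
 */
ssize_t write_retry_on_alarm(int fd, const void *buf, size_t count)
{
	ssize_t nw;

	for (;;) {
		got_sigalrm = 0;
		nw = write(fd, buf, count);
		if (nw >= 0 || errno != EINTR || !got_sigalrm)
			return nw;
		/* the alarm fired before any byte was written: try again */
	}
}
```

note that errno is only looked at when write() returned -1; after a
return of 0 it may still hold a stale value from an earlier call.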

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :

