[DRBD-user] Sync slow with drbd 0.7.10 and linux 2.6.11

Harry Edmon harry at atmos.washington.edu
Mon Mar 14 18:51:15 CET 2005


I tried your suggestions.  At first everything seemed to be faster.  
Then I started up my normal processing on the primary node.  The I/O got 
slow.  I then shut down drbd on the secondary/inconsistent node.  When I 
started it up again, almost all the I/O on the primary stopped.  Here 
are the messages I saw on the secondary:

drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967294
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967293
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967292
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967291
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967290
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967289
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967288
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967287
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967286
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967285 ...

When I stopped the secondary, the primary took off with normal I/O.  Apart 
from the messages you would normally expect from the secondary going up 
and down, the primary had no errors except for:

drbd0: drbd0_receiver [4190]: cstate NetworkFailure --> BrokenPipe
drbd0: error receiving ReportBitMap, l: 4088!
drbd0: worker terminated
drbd0: drbd0_receiver [4190]: cstate BrokenPipe --> Unconnected
drbd0: Connection lost.
drbd0: drbd0_receiver [4190]: cstate Unconnected --> WFConnection


Philipp Reisner wrote:

>On Wednesday, 9 March 2005 at 23:10, Harry Edmon wrote:
>  
>
>>I have two 2.6.11 systems with 3ware cards hooked up via drbd 0.7.10.
>>When I set up freshair2 (8000 3ware card) as the primary and sync it to
>>funnel1 (9000 3ware card), it runs at 20-30 MBytes/sec.  However, when I
>>reverse this (funnel1->freshair2) the sync rate is 7 MBytes/sec.  I have
>>tested the network and disk bandwidth by doing an rcp from funnel1 to
>>freshair2, and I get 27 MBytes/sec, which seems to eliminate the
>>network and the disk.  This is with nothing else running on either
>>machine.  All that is left appears to be drbd.
>>
>>Both units are dual Xeon systems, and I have tried them with both
>>hyperthreading on and off.  Does anyone have any ideas?  I have attached the
>>drbd.conf file.
>>    
>>
>
>It seems that the 2.6.11 I/O scheduling code has some new surprises
>in store for DRBD. I need some time to understand these new issues
>completely.
>
>What you can do so far:
>
>  max-buffers 4096
>  max-epoch-size 1024
>
>  __AND__
>  
>  you need to tune the nr_requests parameter of the backing device
>  via sysfs; 1024 gave me reasonable performance...
>
>-Philipp
>  
>
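
For reference, in DRBD 0.7 both of those options belong in the net section 
of the resource in drbd.conf, so applying the suggestion should look roughly 
like the following.  The resource name and the backing device are 
placeholders here; adjust them to match your own drbd.conf and disk:

  # drbd.conf (0.7 syntax): raise the request buffers in the net section
  resource r0 {
    net {
      max-buffers     4096;
      max-epoch-size  1024;
    }
    # ... disk, syncer and on-host sections unchanged ...
  }

  # and, on each node, raise nr_requests for the backing device via sysfs
  # (/dev/sda is just an example; use whatever disk backs the DRBD resource)
  echo 1024 > /sys/block/sda/queue/nr_requests

Note that the sysfs setting does not survive a reboot, so it would need to 
go into a boot script to stay in effect.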



