Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
I tried your suggestions. At first everything seemed to be faster. Then I started up my normal processing on the primary node, and the I/O got slow. I then shut down drbd on the secondary/inconsistent node. When I started it up again, almost all the I/O on the primary stopped. Here are the messages I saw on the secondary:

drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967294
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967293
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967292
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967291
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967290
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967289
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967288
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967287
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967286
drbd0: [drbd0_receiver/26463] sock_sendmsg time expired, ko = 4294967285
...

When I stopped the secondary, the primary took off with normal I/O. The primary had no error messages beyond the normal ones you would expect from the secondary going up and down, except for:

drbd0: drbd0_receiver [4190]: cstate NetworkFailure --> BrokenPipe
drbd0: error receiving ReportBitMap, l: 4088!
drbd0: worker terminated
drbd0: drbd0_receiver [4190]: cstate BrokenPipe --> Unconnected
drbd0: Connection lost.
drbd0: drbd0_receiver [4190]: cstate Unconnected --> WFConnection

Philipp Reisner wrote:

>On Wednesday, 9 March 2005 at 23:10, Harry Edmon wrote:
>
>
>>I have two 2.6.11 systems with 3ware cards hooked up via drbd 0.7.10.
>>When I set up freshair2 (8000 3ware card) as the primary and sync it to
>>funnel1 (9000 3ware card), it runs at 20-30 MBytes/sec. However, when I
>>reverse this (funnel1 -> freshair2) the sync rate is 7 MBytes/sec. I have
>>tested the network and disk bandwidth by doing an rcp from funnel1 to
>>freshair2, and I get 27 MBytes/sec, which seems to eliminate the
>>network and the disk. This is with nothing else running on either
>>machine. All that is left appears to be drbd.
>>
>>Both units are dual Xeon systems, and I have tried them with
>>hyperthreading both on and off. Does anyone have any ideas? I have
>>attached the drbd.conf file.
>>
>>
>
>It seems that the 2.6.11 IO-scheduling code has some new surprises
>ready for DRBD. I need some time to understand these new issues
>completely.
>
>What you can do so far:
>
> max-buffers 4096
> max-epoch-size 1024
>
> __AND__
>
> you need to tune the nr_requests parameter of the backing device
> via sysfs. -> 1024 gave me reasonable performance...
>
>-Philipp
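
For anyone applying these suggestions, they map onto drbd.conf and sysfs
roughly as follows. This is only a sketch: the resource name "r0" and the
backing device "sda" are placeholders, not taken from the drbd.conf that
was attached to the original message.

The max-buffers and max-epoch-size settings belong in the net section of
the resource in drbd.conf (DRBD 0.7 syntax):

  resource r0 {
    net {
      max-buffers    4096;
      max-epoch-size 1024;
    }
  }

The nr_requests parameter is set per backing device through sysfs, and has
to be done on each node (it does not persist across reboots, so it would
typically go into a boot script):

  echo 1024 > /sys/block/sda/queue/nr_requests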