Hello all,

I'm running DRBD on two identical machines:

  Scientific Linux 6.1
  Intel Xeon E5620 quad core
  12 GB RAM
  LSI 9280-4i4e + CacheCade 1.0 (CacheCade is currently disabled)
  12x 1 TB Seagate Constellation SAS 7200 RPM drives

DRBD is configured in master/slave fashion, and there is a dedicated 10 GbE link between the machines for DRBD traffic. The drives are configured in RAID-10 with a 64 kB stripe size. The DRBD version is 8.4.0 (api:1/proto:86-100).

A couple of weeks ago I had to do some maintenance work on node1 (normally the master node), so I moved all services over to node2. Everything went well, and since the machines are identical I didn't bother moving the services back to node1. A few days later I noticed that performance was somewhat degraded, but the difference was so small that I didn't pay attention to it. A little later I was asked to run some simple read performance tests. Everything looked fine with direct reads on the DRBD device:

dd if=/dev/drbd0 of=/dev/null bs=1M iflag=direct
^C17317+0 records in
17316+0 records out
18157142016 bytes (18 GB) copied, 21.3465 s, 851 MB/s

But with buffered reads things get slow:

dd if=/dev/drbd0 of=/dev/null bs=1M
^C11131+0 records in
11130+0 records out
11670650880 bytes (12 GB) copied, 105.299 s, 111 MB/s

However, the underlying disk seems to be fine:

dd if=/dev/sdb of=/dev/null bs=1M
^C14087+0 records in
14086+0 records out
14770241536 bytes (15 GB) copied, 19.8579 s, 744 MB/s

I moved the services back to node1 and the problem was gone (dd if=/dev/drbd0 of=/dev/null bs=1M: 37312528384 bytes (37 GB) copied, 54.8519 s, 680 MB/s). I then started investigating: I moved the services to node2 again, the performance problems hit again, so I figured something had to be wrong on node2. I compared the settings on the two machines to make sure they really are identical and found nothing unusual. The RAID sets are fine and there are no error messages in the log files.
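For reference, here is the small harness I've been using to repeat the buffered vs. direct comparison. TARGET is a placeholder: on the cluster I point it at /dev/drbd0 or /dev/sdb; it defaults to a scratch file here so the script can be run anywhere (note that O_DIRECT may not be supported on some filesystems, in which case the direct dd just reports an error).

```shell
#!/bin/sh
# Compare buffered vs. O_DIRECT read throughput on TARGET.
# TARGET is a placeholder: set TARGET=/dev/drbd0 (or /dev/sdb) on the
# cluster; it defaults to a scratch file so the script runs anywhere.
TARGET="${TARGET:-./read_test.img}"
COUNT="${COUNT:-64}"   # number of 1 MiB blocks to read

# create a scratch file if TARGET does not exist yet
[ -e "$TARGET" ] || dd if=/dev/zero of="$TARGET" bs=1M count="$COUNT" 2>/dev/null

echo "buffered:"
dd if="$TARGET" of=/dev/null bs=1M count="$COUNT" 2>&1 | tail -n 1

echo "direct:"
dd if="$TARGET" of=/dev/null bs=1M count="$COUNT" iflag=direct 2>&1 | tail -n 1
```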
At this point I decided to reboot node1 and afterwards moved the services back to it. After the reboot, performance dropped on node1 as well, and I haven't been able to find anything that gets it back up again. So it seems DRBD has a huge effect on buffered reads. It may well be that I forgot to redo some sysctl or similar tuning after the reboot, but I can't figure out what it could be. Any ideas on how to work this out, or is this expected behaviour?

I'm currently using the following settings on my disks:

echo deadline > /sys/block/sdb/queue/scheduler
echo 0 > /sys/block/sdb/queue/iosched/front_merges
echo 150 > /sys/block/sdb/queue/iosched/read_expire
echo 1500 > /sys/block/sdb/queue/iosched/write_expire
echo 32000000 > /proc/sys/vm/dirty_background_bytes
echo 384000000 > /proc/sys/vm/dirty_bytes
echo 1024 > /sys/block/sdb/queue/nr_requests

and this DRBD resource configuration:

resource drbd0 {
    device /dev/drbd0;
    disk /dev/sdb1;
    meta-disk internal;

    options {
        cpu-mask 15;
    }

    net {
        protocol C;
        max-buffers 8000;
        max-epoch-size 8000;
        unplug-watermark 16;
        sndbuf-size 0;
    }

    disk {
        al-extents 3389;
        disk-barrier no;
        disk-flushes no;
    }

    on node1 {
        address 10.10.10.1:7789;
    }
    on node2 {
        address 10.10.10.2:7789;
    }
}

Best regards,
Samuli Heinonen
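P.S. Since none of the echo settings above survive a reboot, I'm thinking of reapplying them at boot (e.g. from /etc/rc.local) with something like the sketch below. The values are the ones quoted above; the commented-out read-ahead line is only my own guess (buffered reads go through the page cache and read-ahead, direct reads bypass it), so treat that part as an assumption, not a known fix.

```shell
#!/bin/sh
# Reapply the I/O tuning from this post at boot (e.g. from /etc/rc.local).
# DEV and all values are the ones quoted earlier in the post.
DEV="${DEV:-sdb}"

set_knob() {
    # write $2 into the sysfs/procfs file $1, but only if it is writable
    [ -w "$1" ] && echo "$2" > "$1"
    return 0
}

set_knob "/sys/block/$DEV/queue/scheduler"            deadline
set_knob "/sys/block/$DEV/queue/iosched/front_merges" 0
set_knob "/sys/block/$DEV/queue/iosched/read_expire"  150
set_knob "/sys/block/$DEV/queue/iosched/write_expire" 1500
set_knob "/sys/block/$DEV/queue/nr_requests"          1024
set_knob /proc/sys/vm/dirty_background_bytes          32000000
set_knob /proc/sys/vm/dirty_bytes                     384000000

# Assumption on my part: the read-ahead on /dev/drbd0 also resets at
# boot and would only affect buffered reads; uncomment to test.
# blockdev --setra 4096 /dev/drbd0
```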