Hi. I have a 2-node setup that functions fine, but I recently (last
night/this morning) upgraded to the current Xen source, the Xen patches
for linux 2.6.18, and DRBD 8.3.0. It still *functions* fine, so far,
but while re-syncing my data mirrors on both hosts I noticed something
really disturbing: the host holding only Secondary roles was re-syncing
its disk at a substantially slower rate than it should have been. The
host with the Primary roles may have been impacted as well, but I
didn't take metrics at the time. When I went to pass out at 5am, both
nodes were giving me an ETA of 4 hours to sync. When I checked on them
~7h later, the host with the Primary roles was complete and the host
with the Secondary roles was not! I grabbed some data points:

md1 : active raid1 sdb3[2] sda3[0]
      727688192 blocks [2/1] [U_]
      [===================>.]  recovery = 98.9% (720040384/727688192) finish=5.1min speed=24888K/sec

Just for fun, I shut down the DRBD devices and module
(/etc/init.d/drbd stop) and checked again:

md1 : active raid1 sdb3[2] sda3[0]
      727688192 blocks [2/1] [U_]
      [===================>.]  recovery = 98.2% (715188224/727688192) finish=4.8min speed=43250K/sec

That's a pretty significant difference. When I loaded the DRBD module
and brought the devices back up, I observed the sync performance drop
back to the original values. So I took the DRBD module and devices back
down, let the sync complete, and have restarted it (poor hard drives).
Here's my starting point:

md1 : active raid1 sdb3[2] sda3[0]
      727688192 blocks [2/1] [U_]
      [>....................]  recovery =  1.2% (8791680/727688192) finish=131.6min speed=91042K/sec

(Side note: it's interesting to see the performance difference between
the inside and outside tracks of the physical disk, eh?)

Starting the DRBD module:

# modprobe drbd

md1 : active raid1 sdb3[2] sda3[0]
      727688192 blocks [2/1] [U_]
      [=>...................]  recovery =  9.0% (65796864/727688192) finish=122.8min speed=89828K/sec

I see no significant change to re-sync speed. As the sync progresses
through the disk it will slow a bit, so the <2M/s difference is not
surprising.

Starting one device (Secondary role):

# drbdadm up ftp01-root
# cat /proc/drbd
version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by root at build-hardy-x64, 2008-12-22 04:14:22
 2: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:2346 dw:2346 dr:0 al:0 bm:10 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

md1 : active raid1 sdb3[2] sda3[0]
      727688192 blocks [2/1] [U_]
      [==>..................]  recovery = 10.0% (72998592/727688192) finish=143.3min speed=76120K/sec

That's a bit more significant: a loss of >12.7M/s in sync performance.
Watching the re-sync speed, I see it fluctuating between 83M/s and
62M/s. From /proc/drbd, we can see the device is not very active,
however:

 2: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r---
    ns:0 nr:2735 dw:2735 dr:0 al:0 bm:10 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

Firing up additional devices in the Secondary role just continues to
beat down the re-sync performance (down into the 40M/s range).
Unconfiguring all of the devices brings the speed back up into the
upper 80s (near full speed).

Snippets from the drbd.conf:

common {
    syncer {
        rate 70M;
        verify-alg md5;
    }
    protocol C;
    startup {
        wfc-timeout       0;    ## Infinite!
        degr-wfc-timeout  120;  ## 2 minutes.
    }
    disk {
        on-io-error detach;
    }
    net {
        allow-two-primaries;
    }
}

######################################################################

resource ftp01-root {
    device  /dev/drbd2;
    disk    /dev/datavg/ftp01-root;
    flexible-meta-disk internal;
    on xen-33-18-02 {
        address 192.168.250.12:7702;
    }
    on xen-33-18-03 {
        address 192.168.250.13:7702;
    }
}

All of the rest of the devices are similar.
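To watch how the re-sync speed reacts as DRBD devices come and go, it's
easier to sample /proc/mdstat programmatically than to eyeball it. Here
is a minimal sketch of a parser for the recovery line format quoted
above; this is my own throwaway helper, not anything from DRBD or md,
and the regex is an assumption based only on the output shown here:

```python
import re

# Matches the mdstat recovery status line quoted above, e.g.:
#   [==>..................]  recovery = 10.0% (72998592/727688192)
#   finish=143.3min speed=76120K/sec
RECOVERY_RE = re.compile(
    r"recovery\s*=\s*(?P<pct>[\d.]+)%"
    r".*?finish=(?P<finish>[\d.]+)\s*min"
    r"\s+speed=(?P<speed>\d+)K/sec"
)

def parse_recovery(mdstat_text):
    """Return (percent_done, finish_minutes, speed_KiB_per_sec), or
    None if no recovery is in progress."""
    m = RECOVERY_RE.search(mdstat_text)
    if m is None:
        return None
    return (float(m.group("pct")),
            float(m.group("finish")),
            int(m.group("speed")))

sample = ("md1 : active raid1 sdb3[2] sda3[0]\n"
          "      727688192 blocks [2/1] [U_]\n"
          "      [==>..................]  recovery = 10.0% "
          "(72998592/727688192) finish=143.3min speed=76120K/sec\n")
print(parse_recovery(sample))  # (10.0, 143.3, 76120)
```

Run in a loop against open("/proc/mdstat").read(), that gives a speed
series you can line up against the exact moments each drbdadm up was
issued.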
Server configuration:

Linux: 2.6.18.8-xen-x64-20081222 #8 SMP Mon Dec 22 05:08:39 EST 2008 x86_64 GNU/Linux
Xen:   Xen 3.3.1-rc4
DRBD:  8.3.0 (built against the above kernel tree)

/dev/md1 is a Linux software mirror of 2 SATA2 drives (/dev/sda3 and
/dev/sdb3), about 694GB in size. LVM sits on top of the md mirror,
presenting a single "disk", and is used to chop it into LVs:

ftp01-data datavg -wi-a- 140.00G
ftp01-root datavg -wi-a-   4.00G

Each LV is used by DRBD to present a "disk" device to the Xen guest:

root = '/dev/xvda1 ro'
disk = [ 'drbd:ftp01-root,xvda1,w', 'drbd:ftp01-data,xvda2,w', ]

Aside from these performance issues, and a pretty disturbing issue with
resizing an LV/DRBD device/filesystem with flexible-meta-disk: internal,
everything has been running OK. I do not suspect the upgrade to the Xen,
linux, or DRBD source; I've seen these performance issues in the past,
but was never annoyed enough to capture them in detail.

Any ideas what I can do to pep up the performance?

My ultimate goal is to use the new stacking feature in DRBD 8.3.0 to
put a 3rd node at a remote (WAN) location. Is there anything I should
consider for that with my configuration as detailed above?

Thanks,
Sam
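For reference, the shape I have in mind for the 3rd node follows the
DRBD 8.3 stacked-resource syntax: a second resource layered on top of
ftp01-root, replicating asynchronously to the remote site. This is only
a sketch from my reading of the 8.3 docs, not a tested config; the
resource name, backup hostname, WAN address, port, and drbd minor are
all placeholders I made up:

```
resource ftp01-root-U {
    protocol A;            ## async is the usual choice for a WAN leg

    stacked-on-top-of ftp01-root {
        device  /dev/drbd12;               ## placeholder minor
        address 192.168.250.12:7712;       ## placeholder port
    }

    on backup-node {                       ## placeholder hostname
        device     /dev/drbd12;
        disk       /dev/datavg/ftp01-root;
        address    10.0.0.1:7712;          ## placeholder WAN address
        meta-disk  internal;
    }
}
```

As I understand it, the stacked device can only be brought up on
whichever node currently holds the lower resource Primary, so this
interacts with where the Xen guest runs; corrections welcome.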