Hi all,

I'm in the process of trying to debug what I suspect is an I/O issue on a highly-available SCADA server operated by a mining company.

The systems run Ubuntu 10.04 LTS with two network interfaces, one on their business network, and one on their control network, both gigabit links.

drbdadm --version reports:

> DRBDADM_BUILDTAG=GIT-hash:\ ea9e28dbff98e331a62bcbcc63a6135808fe2917\ build\ by\ buildd@panlong\,\ 2012-05-18\ 08:21:18
> DRBDADM_API_VERSION=88
> DRBD_KERNEL_VERSION_CODE=0x080307
> DRBDADM_VERSION_CODE=0x080307
> DRBDADM_VERSION=8.3.7

The system logs PLC-generated process data every 5 seconds, and at two times of the day, at midnight and midday, it misses a sample with the logging taking 6 seconds.

There's no obvious CPU spike at this time, so my hunch is I/O, and so I'm looking at ways to try and improve this. iotop didn't show any huge spikes that I'd imagine the disks would have trouble with. Then again, since it's effectively polling, I could have "blinked" and missed it.

DRBD is configured with a disk partition on a RAID array as its backing store, the same array being shared with the OS and swap space. I don't know if the array has RAM or flash based cache, all I know is it uses the cciss driver.

The configuration file looks like this:

> global {
>     usage-count yes;
> }
> common {
>     syncer { rate 50M; }
> }
> resource r0 {
>     protocol C;
>     handlers {
>     }
>     startup {
>         wfc-timeout 10; # 10 seconds
>         degr-wfc-timeout 120; # 2 minutes.
>     }
>     disk {
>         on-io-error pass_on;
>     }
>     net {
>         sndbuf-size 512k;
>         timeout 60; # 6 seconds (unit = 0.1 seconds)
>         ping-int 10; # 10 seconds (unit = 1 second)
>         ping-timeout 10; # 500 ms (unit = 0.1 seconds)
>         max-buffers 4096;
>         max-epoch-size 4096;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri consensus;
>         after-sb-2pri disconnect;
>         rr-conflict disconnect;
>     }
>     syncer {
>         rate 50M;
>         al-extents 257;
>     }
>     on node1 {
>         device /dev/drbd0;
>         disk /dev/cciss/c0d0p4;
>         address 10.20.30.1:7788;
>         meta-disk internal;
>     }
>     on node2 {
>         device /dev/drbd0;
>         disk /dev/cciss/c0d0p4;
>         address 10.20.30.2:7788;
>         meta-disk internal;
>     }
> }

One thing I'm looking to adjust is the al-extents option, as reading the literature, 257 looks a little small. The historian will be writing little bits of data every 5 seconds as part of its logging function, and so I suspect raising this may help.

However, I cannot bring the system down to adjust it at this time. I read that some settings can be changed on-the-fly, so I tried setting it to 521 and issued the following dry-run command:

> root@node1:~# drbdadm -d adjust all
> drbdsetup 0 syncer --set-defaults --create-device --rate=50M --al-extents=521

I saw --create-device and thought, what does "--create-device" mean? Is it a *destructive* re-creation of the block device?

I've since reverted my changes back to what's shown above on both nodes, and have not proceeded. The documentation on exactly what that command does is unclear to me.

Is it a sane thing to try and adjust this parameter (or others) on-the-fly like this, and am I going the right way about it?

Thanks in advance.

Regards,
-- 
Stuart Longland
Systems Engineer
VRT Systems
38b Douglas Street, Milton QLD 4064
T: +61 7 3535 9619
F: +61 7 3535 9699
http://www.vrt.com.au
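
On the worry above that a polling tool such as iotop could "blink" and miss a short stall: one option is to sample the kernel's cumulative counters in /proc/diskstats around midnight and midday, since those counters keep accumulating between samples and a brief burst still shows up in the next delta. What follows is a minimal sketch in Python, not a tested monitoring tool. The function names, the 0.2 second interval, and the watched device names (cciss/c0d0 and drbd0) are assumptions for illustration; the field positions follow the kernel's documented /proc/diskstats layout.

```python
import sys
import time


def parse_diskstats(text):
    """Map device name -> (writes_completed, ms_writing, ms_doing_io)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or short lines
        # fields[2] is the device name; counters are cumulative since boot
        stats[fields[2]] = (int(fields[7]), int(fields[10]), int(fields[12]))
    return stats


def delta(prev, cur):
    """Per-device difference between two parse_diskstats() snapshots."""
    out = {}
    for dev, now in cur.items():
        if dev in prev:
            out[dev] = tuple(n - p for n, p in zip(now, prev[dev]))
    return out


def watch(devices, interval=0.2, path="/proc/diskstats"):
    """Print one line per watched device every `interval` seconds."""
    with open(path) as fh:
        prev = parse_diskstats(fh.read())
    while True:
        time.sleep(interval)
        with open(path) as fh:
            cur = parse_diskstats(fh.read())
        stamp = time.strftime("%H:%M:%S")
        for dev, (writes, ms_write, ms_io) in sorted(delta(prev, cur).items()):
            if dev in devices:
                sys.stdout.write("%s %s writes=%d ms_writing=%d ms_io=%d\n"
                                 % (stamp, dev, writes, ms_write, ms_io))
        prev = cur


if __name__ == "__main__":
    # Device names are assumptions; check /proc/diskstats for the real ones.
    watch(set(["cciss/c0d0", "drbd0"]))
```

If ms_io for the backing device approaches the full sampling interval at the moment a sample is missed, that would support the I/O hunch; if it stays low, the delay is probably elsewhere. Redirecting the output to a file on a different disk (or tmpfs) avoids adding load to the array being measured.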