Mystery appears to be solved! The Ethernet card being used for DRBD replication was flaky in the old secondary. Apparently replication was sometimes going super slow. That explains why BOTH nodes had the high iowait problem when they were primary, but NEITHER had high iowait when they were secondary. We're using Protocol C, so processes on the primary kept queuing up waiting for I/O calls to complete because DRBD could not write them to the other node fast enough.

It also explains my other question about the resync continually stalling. Seriously, did NOBODY on the list notice when I said the resync was going at a max of about 80K? Now that the NIC issue is resolved, resyncs happen at 30,000K. :-)

I discovered the problem when I rebooted the server again and it said "PCIe training error, slot3" and the system halted. You guessed it, slot 3 was the NIC doing the replication. I reseated the card and it came up fine, and now replication is fast and I do not expect any more iowaits.

Next task... Replace that NIC.

Thanks for everyone's help and suggestions.

--
Eric Robinson
February 18, 2011
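For anyone who finds this thread later, here is a rough back-of-the-envelope sketch in Python of why the slow link hurt so much. It is only an illustration, not DRBD code. The two rates (80K and 30,000K) are the ones reported above; treating "K" as KiB per second, the 100 GiB device size, and the millisecond figures are assumptions made up for the example. The latency function is a simplified model of Protocol C, where a write on the primary is not complete until both the local disk and the peer have confirmed it.

```python
# Illustrative model only; the function names are hypothetical, not DRBD APIs.

SLOW_RATE_KIB_S = 80        # resync rate with the flaky NIC (from the post)
FAST_RATE_KIB_S = 30_000    # resync rate after reseating the NIC (from the post)
DEVICE_KIB = 100 * 1024 * 1024  # assumed 100 GiB device, for illustration


def resync_seconds(device_kib: int, rate_kib_per_s: float) -> float:
    """Time to resync a whole device at a steady rate."""
    return device_kib / rate_kib_per_s


def sync_write_latency_ms(local_ms: float, replicate_ms: float) -> float:
    """Simplified Protocol C model: the local write and the replicated
    write proceed in parallel, and the caller waits for the slower one."""
    return max(local_ms, replicate_ms)


if __name__ == "__main__":
    slow_h = resync_seconds(DEVICE_KIB, SLOW_RATE_KIB_S) / 3600
    fast_min = resync_seconds(DEVICE_KIB, FAST_RATE_KIB_S) / 60
    print(f"full resync at 80K/s:     about {slow_h:.0f} hours")
    print(f"full resync at 30,000K/s: about {fast_min:.0f} minutes")
    print(f"speedup: {FAST_RATE_KIB_S / SLOW_RATE_KIB_S:.0f}x")

    # Made-up latencies: a healthy link adds little, a flaky one dominates.
    print(sync_write_latency_ms(local_ms=5.0, replicate_ms=1.0))
    print(sync_write_latency_ms(local_ms=5.0, replicate_ms=400.0))
```

The point of the second function is that under synchronous replication the primary's write latency is bounded below by the replication path, which is consistent with iowait showing up only on whichever node was primary.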