During the (necessary[*]) transition to nonblocking write hints somewhere in 0.6.10+cvs, I introduced a bug that triggers under memory pressure and high IO load, leading to a pattern like this:

[*] Blocking write hints can, under certain circumstances, block bdflush or kupdated, which might then prevent other bottom halves from running and lead to a "stalled" system that still answers pings. So where the behaviour shown below now occurs, the system would previously most likely have stalled for anywhere from milliseconds to hours.

syslog on PRIMARY node:
> kernel: drbd0: send_cmd_dontwait returned 4
> kernel: drbd0: [bonnie++/20787] sock_sendmsg returned -32
> kernel: drbd0: send_cmd_dontwait returned -1000
> kernel: drbd0: send_cmd_dontwait returned -1000
> kernel: drbd0: Connection lost.
> kernel: drbd0: Connection established.
> kernel: drbd0: Synchronisation started blks=64
> kernel: drbd0: Synchronisation done.

syslog on SECONDARY node:
> kernel: drbd0: unknown packet type!
> kernel: drbd0: Connection lost.
> kernel: drbd0: Connection established.

The problem was that, because the IO hints are now nonblocking, they were sometimes only partially sent. With the guarantee that IO hints are sent either completely or not at all, but nonblocking in any case, we believe it is fixed in current CVS.

If any of you encounter the problem, or can reproduce it in a test setup, please report whether a fresh CVS checkout solves it.

Thanks,
Lars Ellenberg
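Why a partially sent hint shows up as "unknown packet type!" on the secondary can be seen in a toy model. This is a hypothetical sketch, not DRBD code: the 3-byte wire format (magic, type, payload length), the simulated channel, and the names `chan_write`, `send_hint_partial`, `send_hint_atomic` and `run_scenario` are all assumptions made for illustration. The "fixed" sender assumes one simple way of getting the all-or-nothing guarantee (check free space first, then write the whole packet or drop it); the actual CVS change may achieve it differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wire format: magic byte, type byte, payload length byte, payload. */
#define MAGIC   0xA5
#define T_HINT  1
#define T_DATA  2
#define HDR_LEN 3

/* Simulated nonblocking socket: a byte stream that accepts at most `cap` bytes. */
struct chan {
    unsigned char buf[64];
    size_t len;   /* bytes queued so far */
    size_t cap;   /* bytes the stream can hold right now */
};

/* Behaves like a nonblocking send(): writes what fits, returns bytes written. */
static size_t chan_write(struct chan *c, const unsigned char *p, size_t n)
{
    assert(c->cap <= sizeof c->buf);
    size_t room = c->cap - c->len;
    size_t k = n < room ? n : room;
    memcpy(c->buf + c->len, p, k);
    c->len += k;
    return k;
}

/* Builds a packet with a zero-filled payload into out[]; returns its length. */
static size_t make_pkt(unsigned char *out, unsigned char type, unsigned char plen)
{
    out[0] = MAGIC;
    out[1] = type;
    out[2] = plen;
    memset(out + HDR_LEN, 0x00, plen);
    return HDR_LEN + (size_t)plen;
}

/* Buggy: a short write leaves a truncated packet in the stream. */
static int send_hint_partial(struct chan *c, const unsigned char *p, size_t n)
{
    return chan_write(c, p, n) == n ? 0 : -1;
}

/* Fixed (assumed approach): check free space first, send all of it or nothing. */
static int send_hint_atomic(struct chan *c, const unsigned char *p, size_t n)
{
    if (c->cap - c->len < n)
        return -1;          /* drop the hint; packet framing stays intact */
    chan_write(c, p, n);
    return 0;
}

/* Receiver: counts complete packets, or returns -1 on bad magic/unknown type. */
static int parse(const struct chan *c)
{
    size_t off = 0;
    int count = 0;
    while (off + HDR_LEN <= c->len) {
        const unsigned char *h = c->buf + off;
        if (h[0] != MAGIC || (h[1] != T_HINT && h[1] != T_DATA))
            return -1;      /* "unknown packet type!" */
        size_t total = HDR_LEN + (size_t)h[2];
        if (off + total > c->len)
            break;          /* incomplete packet: wait for more bytes */
        off += total;
        count++;
    }
    return count;
}

/* Data packet, then a hint that does not fit, then more room and another data packet. */
static int run_scenario(int atomic)
{
    struct chan c = { .len = 0, .cap = 16 };
    unsigned char pkt[32];
    size_t n;

    n = make_pkt(pkt, T_DATA, 9);       /* 12 bytes, fits */
    chan_write(&c, pkt, n);

    n = make_pkt(pkt, T_HINT, 2);       /* 5 bytes, but only 4 are free */
    if (atomic)
        send_hint_atomic(&c, pkt, n);
    else
        send_hint_partial(&c, pkt, n);

    c.cap = sizeof c.buf;               /* the peer drained the socket */
    n = make_pkt(pkt, T_DATA, 9);
    chan_write(&c, pkt, n);

    return parse(&c);
}
```

With the partial sender, `run_scenario(0)` returns -1: the truncated hint's header still claims a 2-byte payload, so the receiver swallows the first byte of the next data packet and then finds a type byte where it expects the magic. With the all-or-nothing sender, `run_scenario(1)` returns 2, because the hint is simply dropped and both data packets arrive intact.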