[DRBD-user] 0.6.11 "unknown packet type" ... connection lost

Mon Feb 23 18:15:07 CET 2004

During the (necessary[*]) transition to nonblocking write hints
somewhere in 0.6.10+cvs, I introduced a bug that triggers under
memory pressure and high IO load, leading to the syslog pattern
shown further below.

[*]
 Blocking write hints can, under certain circumstances, block
 bdflush or kupdated, which might then prevent other bottom halves from
 running and lead to a "stalled" system that still answers pings.
 So where it now shows the behaviour below, previously it would most
 likely have been stalled for anywhere from milliseconds to hours ...

syslog on PRIMARY node:
> kernel: drbd0: send_cmd_dontwait returned 4
> kernel: drbd0: [bonnie++/20787] sock_sendmsg returned -32
> kernel: drbd0: send_cmd_dontwait returned -1000
> kernel: drbd0: send_cmd_dontwait returned -1000
> kernel: drbd0: Connection lost.
> kernel: drbd0: Connection established.
> kernel: drbd0: Synchronisation started blks=64
> kernel: drbd0: Synchronisation done.

syslog on SECONDARY node:
> kernel: drbd0: unknown packet type!
> kernel: drbd0: Connection lost.
> kernel: drbd0: Connection established.

The problem was that, because the IO hints are now nonblocking,
they were sometimes only partially sent.  Current CVS guarantees
that an IO hint is sent either completely or not at all, and
without blocking in either case; we believe this fixes the problem.
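
To illustrate the "completely, or not at all, but nonblocking" idea,
here is a minimal userspace sketch.  It is NOT the actual DRBD code:
struct sndbuf, SNDBUF_CAP, sndbuf_room() and send_hint_all_or_nothing()
are made-up names, and the small byte array merely stands in for a
socket send buffer with limited room.  The point is only that the
sender checks for enough room first and otherwise skips the hint,
so a fragment of a packet never reaches the wire.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a socket send buffer with limited room. */
#define SNDBUF_CAP 16

struct sndbuf {
    unsigned char data[SNDBUF_CAP];
    size_t used;                /* bytes currently queued */
};

/* Free space left in the buffer; never blocks. */
static size_t sndbuf_room(const struct sndbuf *sb)
{
    assert(sb->used <= SNDBUF_CAP);
    return SNDBUF_CAP - sb->used;
}

/*
 * Queue a whole packet or nothing at all.
 * Returns len if the packet was queued, 0 if it was skipped.
 * It never waits and never queues a fragment.
 */
static size_t send_hint_all_or_nothing(struct sndbuf *sb,
                                       const void *pkt, size_t len)
{
    if (len > sndbuf_room(sb))
        return 0;               /* not enough room: skip the hint */
    memcpy(sb->data + sb->used, pkt, len);
    sb->used += len;
    return len;
}
```

With a 16 byte buffer, a first 10 byte hint is queued in full, and a
second 10 byte hint is skipped entirely instead of being cut off after
6 bytes.  Skipping is acceptable in this sketch only because a hint is
assumed to be advisory; real data packets would need different handling.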

If any of you encounter the problem, or can reproduce it in
a test setup, please report whether a fresh CVS checkout
solves it.

Thanks,

	Lars Ellenberg