Tony Willoughby wrote:
>
> Greetings,
>
> We've had an incident that I am trying to understand.
>
> Configuration:
> Two IBM E-Server x330's running Heartbeat/DRBD (0.6.4).
> Redhat 7.3
> Protocol C
> Crossover Ethernet
>
> (I know that 0.6.4 is old, but we have a rather staggered release
> cycle and our customers tend to upgrade infrequently.)
>
> At some point the secondary machine started reporting SCSI errors (the
> disk eventually failed). It is not known how long the system was
> having these errors.
>
> The primary machine started to become unresponsive.
>
> Here is the odd thing: Any command that accessed the filesystem above
> DRBD (e.g. "ls /the/mirrored/partition") would hang. Once the
> secondary was shut down the commands that were hung suddenly
> completed.
>
> I'm not necessarily looking for a fix (although if I were told this
> was fixed in a later release you'd make my day :^), I'm trying to
> understand why this would happen.
>
> Anyone have any ideas?

Note: I am a user, not a writer, of drbd, and I have some Promise RAID
boxes that put me in the above situation ALL too often.

0.6.10 behaves the same way.

Proto C requires that before the primary returns "data written", both
hosts' subsystems have to return "data written".

IIRC ls (and many other commands) at a minimum may end up updating
things like access time on some file/directory entries. That's a write
that requires a "data written" on both systems, so you get to wait
until Proto C is satisfied.

--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter
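
As a rough illustration of the Protocol C behaviour described above, here is a toy Python model. It is not DRBD code: the names `Disk`, `protocol_c_write` and `ls` are made up for this sketch, the timeout exists only so the example terminates (a real hung disk would simply never answer), and treating a shut-down peer as "only the local acknowledgement is needed" is an assumption drawn from the observation in this thread, not from the DRBD sources.

```python
import threading


class Disk:
    """Toy backing store. Clearing `healthy` simulates a disk that stops answering."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.healthy = threading.Event()
        self.healthy.set()

    def write(self, block, data, timeout):
        # Wait for the disk to respond; give up after `timeout` seconds so the
        # sketch terminates (assumption: a real failing disk may never answer).
        if not self.healthy.wait(timeout):
            return False
        self.blocks[block] = data
        return True


def protocol_c_write(primary, secondary, block, data, timeout=0.2):
    """Return True only if every connected node acknowledged the write.

    `secondary=None` models a peer that has been shut down or disconnected,
    in which case only the local acknowledgement is needed (assumption).
    """
    disks = [primary] if secondary is None else [primary, secondary]
    acks = {}

    def send(disk):
        acks[disk.name] = disk.write(block, data, timeout)

    threads = [threading.Thread(target=send, args=(d,)) for d in disks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(acks.values())


def ls(primary, secondary, path, timeout=0.2):
    """Hypothetical 'ls': the listing itself is local, but the access-time
    update is a write and therefore has to satisfy Protocol C."""
    return protocol_c_write(primary, secondary, "atime:" + path, "now", timeout)


if __name__ == "__main__":
    primary, secondary = Disk("primary"), Disk("secondary")
    path = "/the/mirrored/partition"
    print(ls(primary, secondary, path))                # True: both nodes acknowledged
    secondary.healthy.clear()                          # secondary disk starts failing
    print(ls(primary, secondary, path, timeout=0.05))  # False: still waiting on the peer
    print(ls(primary, None, path))                     # True: peer shut down, local ack only
```

In this model the "hang" is simply the primary waiting for an acknowledgement that the failing secondary never sends, and the stalled command completes once the secondary is out of the picture.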