[DRBD-user] Secondary SCSI Errors causing Primary Unresponsiveness

Tony Willoughby tony.willoughby at bigbandnet.com
Wed Sep 15 17:43:34 CEST 2004


Greetings,

We've had an incident that I am trying to understand.  

Configuration:
  Two IBM E-Server x330's running Heartbeat/DRBD (0.6.4).
  Red Hat Linux 7.3
  Protocol C
  Crossover Ethernet

(I know that 0.6.4 is old, but we have a rather staggered release
cycle and our customers tend to upgrade infrequently.)

At some point the secondary machine started reporting SCSI errors (the
disk eventually failed). We do not know how long the secondary had
been reporting these errors.

The primary machine started to become unresponsive.

Here is the odd thing: any command that accessed the filesystem on top
of DRBD (e.g. "ls /the/mirrored/partition") would hang. Once the
secondary was shut down, the commands that were hung suddenly
completed.

I'm not necessarily looking for a fix (although if I were told this
was fixed in a later release you'd make my day :^). I'm trying to
understand why this would happen.

Anyone have any ideas?


-- 
Tony Willoughby
Bigband Networks
mailto:tony.willoughby at bigbandnet.com



