[DRBD-user] drbd-0.7.5 on alpha, process stuck in D state after (simulated) disk failure

Nils Juergens ju at isf.rwth-aachen.de
Mon Dec 20 14:15:57 CET 2004

Folks,

We've had a couple of alpha boxen lying around, on which I want to build 
an HA web and email server. The two machines are identical hardware-wise, 
with one system disk and one data disk (for drbd) each. Both systems are 
running a fresh install of Debian sarge, and I used the sarge drbd and 
heartbeat packages. I'm using linux-2.4.27, compiled from the Debian 
kernel-source package with the default Debian .config.

The initial resync ran without a problem, and I was able to switch 
to/from primary on both nodes and mount the filesystem on the primary. 
Failing over to the other node with heartbeat also works like a charm.

The next step for me was to try the case where a disk goes bad, so I 
simulated a disk failure by spinning the disk down (using scsi-stop from 
scsi-idle). drbd detected this as a disk error and reacted accordingly:

------------------------------------------------------------------------
Log excerpt from the primary (phobos), on which the disk failure occurred:
------------------------------------------------------------------------
phobos:~# dmesg | tail -n 9
drbd0: Secondary/Secondary --> Primary/Secondary
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on drbd(147,0), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Device 08:10 not ready.
  I/O error: dev 08:10, sector 143112608
drbd0: drbd_md_sync_page_io(,143112608,WRITE) failed!
drbd0: Local IO failed. Detaching...
drbd0: Notified peer that my disk is broken.

phobos:~# cat /proc/drbd
version: 0.7.5 (api:76/proto:74)
SVN Revision: 1578 build by @wide, 2004-10-17 00:16:00
  0: cs:DiskLessClient st:Primary/Secondary ld:Inconsistent
     ns:1161252 nr:54509116 dw:55670368 dr:3076 al:2526 bm:4093 lo:2 pe:0 ua:0 ap:0
  1: cs:Unconfigured
------------------------------------------------------------------------
Log excerpt from the secondary (deimos):
------------------------------------------------------------------------
deimos:~# dmesg | tail -n 2

drbd0: Secondary/Secondary --> Secondary/Primary
drbd0: PARTNER DISKLESS

deimos:/# cat /proc/drbd
version: 0.7.5 (api:76/proto:74)
SVN Revision: 1578 build by @wide, 2004-10-17 00:16:00
  0: cs:ServerForDLess st:Secondary/Primary ld:Consistent
     ns:54509116 nr:1161252 dw:1251732 dr:54457734 al:883 bm:3836 lo:0 pe:0 ua:0 ap:0
  1: cs:Unconfigured
------------------------------------------------------------------------
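
For anyone who wants to compare the two nodes' state programmatically 
rather than by eye, here is a minimal Python sketch that pulls the 
key:value fields (cs, st, ld, lo, ...) out of /proc/drbd text in the 
0.7-style layout shown above. The function name parse_proc_drbd is made 
up for illustration and is not part of drbd; the parser assumes a single 
/proc/drbd dump and tolerates the counter line being wrapped.

```python
import re

# A device line looks like "  0: cs:DiskLessClient st:Primary/Secondary ..."
DEVICE_RE = re.compile(r"^\s*(\d+):\s+(.*)$")


def parse_proc_drbd(text):
    """Return {minor: {field: value}} for one /proc/drbd dump (0.7-style).

    Hypothetical helper, not a drbd API. All values are kept as strings.
    """
    devices = {}
    current = None
    for line in text.splitlines():
        match = DEVICE_RE.match(line)
        if match:
            # Start of a new device section.
            current = devices.setdefault(int(match.group(1)), {})
            fields = match.group(2)
        elif current is not None:
            # Continuation line carrying the counters (ns, nr, ..., ap).
            fields = line
        else:
            # Header lines (version, build info) before the first device.
            continue
        for token in fields.split():
            key, sep, value = token.partition(":")
            if sep:
                current[key] = value
    return devices
```

On the phobos excerpt this yields cs, st and ld for device 0 as strings, 
so a check such as "is the local disk gone while requests are still 
outstanding (lo greater than 0)?" becomes a simple dictionary lookup.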

So far everything looks good, but I ran into a big problem: all 
processes trying to access the mounted filesystem (ext3, mounted on 
/var/redundant) are stuck in D state and never return. drbd does not 
seem to be processing any requests, even though it appears to be in a 
sane state.
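
To enumerate the stuck processes without relying on ps, the state can be 
read directly from /proc. The following is a minimal Python sketch, 
assuming the usual Linux /proc/<pid>/stat layout of "pid (comm) state 
..."; the function names parse_stat and uninterruptible are made up for 
illustration.

```python
import os


def parse_stat(text):
    """Split the contents of /proc/<pid>/stat into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of on whitespace.
    left, _, rest = text.rpartition(")")
    pid_str, _, comm = left.partition("(")
    return int(pid_str), comm, rest.split()[0]


def uninterruptible(proc_root="/proc"):
    """Return [(pid, comm)] for every process currently in D state."""
    found = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, name, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry unreadable; skip it
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling uninterruptible() on the primary and printing the result should 
list every process blocked on /var/redundant, which gives a starting 
point for working out where in the kernel each one is waiting.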

Has anyone experienced something akin to this? Has anyone successfully 
used drbd on alpha? Maybe I stumbled upon a 64-bit problem? (alpha is 
64-bit little endian).

If this is a bug, maybe one of the developers can point me in the right 
direction on how to debug this. I'd be more than happy to put in a 
couple of hours of debugging :)

I'm going to try using the 2.6 kernel tree, but I really would prefer 
the 2.4 tree for production, because IMHO 2.6 isn't there yet (close, 
but still).

Thanks a lot,
Nils Juergens



