Folks,
we've had a couple of Alpha boxen lying around on which I want to build
an HA web and email server. The two machines are identical hardware-wise,
with one system disk and one data disk (for drbd) each. Both systems are
running a fresh install of Debian sarge, and I used the sarge drbd and
heartbeat packages. I'm using linux-2.4.27, compiled from the Debian
kernel-source package, using the default Debian .config.
The initial resync ran without a problem, and I was able to switch
to/from primary on both nodes and mount the filesystem on the primary.
Failing over to the other node with heartbeat also works like a charm.
The next step for me was to test the case where a disk goes bad. I
simulated a disk failure by spinning the disk down (using scsi-stop from
scsi-idle). drbd detected this as a disk error and reacted accordingly:
------------------------------------------------------------------------
Log excerpt from the primary (phobos) on which the disk failure occurred:
------------------------------------------------------------------------
phobos:~# dmesg | tail -n 9
drbd0: Secondary/Secondary --> Primary/Secondary
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on drbd(147,0), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Device 08:10 not ready.
I/O error: dev 08:10, sector 143112608
drbd0: drbd_md_sync_page_io(,143112608,WRITE) failed!
drbd0: Local IO failed. Detaching...
drbd0: Notified peer that my disk is broken.
phobos:~# cat /proc/drbd
version: 0.7.5 (api:76/proto:74)
SVN Revision: 1578 build by @wide, 2004-10-17 00:16:00
0: cs:DiskLessClient st:Primary/Secondary ld:Inconsistent
ns:1161252 nr:54509116 dw:55670368 dr:3076 al:2526 bm:4093 lo:2
pe:0 ua:0 ap:0
1: cs:Unconfigured
------------------------------------------------------------------------
Log excerpt from the secondary (deimos):
------------------------------------------------------------------------
deimos:~# dmesg | tail -n 2
drbd0: Secondary/Secondary --> Secondary/Primary
drbd0: PARTNER DISKLESS
deimos:/# cat /proc/drbd
version: 0.7.5 (api:76/proto:74)
SVN Revision: 1578 build by @wide, 2004-10-17 00:16:00
0: cs:ServerForDLess st:Secondary/Primary ld:Consistent
ns:54509116 nr:1161252 dw:1251732 dr:54457734 al:883 bm:3836 lo:0
pe:0 ua:0 ap:0
1: cs:Unconfigured
------------------------------------------------------------------------
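To compare the two nodes' /proc/drbd output mechanically (for instance while repeating the test), the status can be split into fields. Below is a minimal Python 3 sketch, assuming the DRBD 0.7 layout shown above: an `N: cs:... st:... ld:...` line followed by `key:number` counters that may wrap onto several lines. `parse_drbd_status` is a hypothetical helper, not part of drbd or its userland tools.

```python
import re

# Device line in the DRBD 0.7 layout, e.g.
# " 0: cs:DiskLessClient st:Primary/Secondary ld:Inconsistent"
# st/ld are optional because unconfigured devices only report cs.
STATUS_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)(?:\s+st:(\S+))?(?:\s+ld:(\S+))?")

# Two-letter counters such as "ns:1161252" or "lo:2".
COUNTER_RE = re.compile(r"\b([a-z]{2}):(\d+)\b")


def parse_drbd_status(text):
    """Return {minor: {"cs", "st", "ld", "counters"}} for each device.

    Counter lines are attached to the most recent device line, so
    counters wrapped over several lines are still collected. Header
    lines (version, SVN revision) appear before any device and are
    ignored.
    """
    devices = {}
    current = None
    for line in text.splitlines():
        m = STATUS_RE.match(line)
        if m:
            minor, cs, st, ld = m.groups()
            current = {"cs": cs, "st": st, "ld": ld, "counters": {}}
            devices[int(minor)] = current
        elif current is not None:
            for key, value in COUNTER_RE.findall(line):
                current["counters"][key] = int(value)
    return devices
```

On a live node it could be fed with `open("/proc/drbd").read()`. For the phobos excerpt above, device 0 comes back with `cs` set to `DiskLessClient` and counter `lo` equal to 2, while `pe`, `ua` and `ap` are all 0.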
So far everything looks good, but I ran into a big problem: all
processes trying to access the mounted filesystem (ext3, mounted on
/var/redundant) are stuck in D state and never return. drbd does not
seem to process any requests, even though it appears to be in a sane state.
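To list exactly which processes are stuck in uninterruptible sleep (state D), one can scan /proc, which is what `ps` reports in its STAT column. A minimal Python 3 sketch, assuming the standard Linux /proc/&lt;pid&gt;/stat format in which the command name is in parentheses and the one-letter state follows it; `parse_stat_line` and `processes_in_state` are hypothetical helper names.

```python
import os


def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the first " (" and the last ") " instead of on whitespace.
    """
    pid_part, _, rest = line.partition(" (")
    comm, _, after = rest.rpartition(") ")
    state = after.split()[0]
    return int(pid_part), comm, state


def processes_in_state(state="D", proc="/proc"):
    """Yield (pid, comm) for every process whose state letter equals `state`."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, st = parse_stat_line(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if st == state:
            yield pid, comm
```

Running `list(processes_in_state("D"))` on phobos while the filesystem hangs would show which commands are blocked.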
Has anyone experienced something akin to this? Has anyone successfully
used drbd on Alpha? Maybe I stumbled upon a 64-bit problem? (Alpha is
64-bit little endian).
If this is a bug, maybe one of the developers can point me in the right
direction on how to debug this. I'd be more than happy to put in a
couple of hours of debugging :)
I'm going to try using the 2.6 kernel tree, but I really would prefer
the 2.4 tree for production, because IMHO 2.6 isn't there yet (close,
but still).
thanks a lot,
Nils Juergens