Folks,

we've had a couple of alpha boxen lying around on which I want to build
a HA web and email server. The two machines are identical hardware-wise,
with one system disk and one data disk (for drbd) each.

Both systems are running a fresh install of debian sarge, and I used the
sarge drbd and heartbeat packages. I'm using linux-2.4.27, compiled from
the debian kernel-source package, using the default debian .config.

The initial resync ran without a problem, and I was able to switch
to/from primary on both nodes and mount the filesystem on the primary.
Failing over to the other node with heartbeat also works like a charm.

The next step for me was to try the case where a disk goes bad, so I
simulated a disk failure by spinning the disk down (using scsi-stop from
scsi-idle). drbd detected this as a disk error and reacted accordingly:

------------------------------------------------------------------------
Log excerpt from the primary (phobos) on which the disk failure occurred:
------------------------------------------------------------------------
phobos:~# dmesg | tail -n 9
drbd0: Secondary/Secondary --> Primary/Secondary
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on drbd(147,0), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Device 08:10 not ready.
I/O error: dev 08:10, sector 143112608
drbd0: drbd_md_sync_page_io(,143112608,WRITE) failed!
drbd0: Local IO failed. Detaching...
drbd0: Notified peer that my disk is broken.

phobos:~# cat /proc/drbd
version: 0.7.5 (api:76/proto:74)
SVN Revision: 1578 build by @wide, 2004-10-17 00:16:00
 0: cs:DiskLessClient st:Primary/Secondary ld:Inconsistent
    ns:1161252 nr:54509116 dw:55670368 dr:3076 al:2526 bm:4093 lo:2 pe:0 ua:0 ap:0
 1: cs:Unconfigured

------------------------------------------------------------------------
Log excerpt from the secondary (deimos):
------------------------------------------------------------------------
deimos:~# dmesg | tail -n 2
drbd0: Secondary/Secondary --> Secondary/Primary
drbd0: PARTNER DISKLESS

deimos:/# cat /proc/drbd
version: 0.7.5 (api:76/proto:74)
SVN Revision: 1578 build by @wide, 2004-10-17 00:16:00
 0: cs:ServerForDLess st:Secondary/Primary ld:Consistent
    ns:54509116 nr:1161252 dw:1251732 dr:54457734 al:883 bm:3836 lo:0 pe:0 ua:0 ap:0
 1: cs:Unconfigured
------------------------------------------------------------------------

So far everything looks good, but I ran into a big problem: all
processes trying to access the mounted filesystem (ext3, mounted on
/var/redundant) are stuck in D state and never return. drbd does not
appear to be processing any requests, even though /proc/drbd shows it in
a sane state on both nodes.

Has anyone experienced something akin to this? Has anyone successfully
used drbd on alpha? Maybe I stumbled upon a 64bit problem? (alpha is
64bit little endian).

If this is a bug, maybe one of the developers can point me in the right
direction on how to debug this. I'd be more than happy to put in a
couple of hours of debugging :)

I'm going to try the 2.6 kernel tree, but I would really prefer the 2.4
tree for production, because IMHO 2.6 isn't there yet (close, but
still).

thanks a lot,
Nils Juergens
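
P.S. In case someone wants to compare the state of the two nodes mechanically while reproducing this, here is a minimal Python sketch that splits /proc/drbd text into per-device fields. It is only an illustration: the function name parse_proc_drbd is made up, and it assumes the usual 0.7-style layout (one "N: cs:... st:... ld:..." line per device, followed by an indented counter line). The sample text is the phobos output quoted above.

```python
import re

# Sample copied from the phobos output in this mail; the line breaks
# follow the assumed 0.7-style /proc/drbd layout.
SAMPLE = """\
version: 0.7.5 (api:76/proto:74)
SVN Revision: 1578 build by @wide, 2004-10-17 00:16:00
 0: cs:DiskLessClient st:Primary/Secondary ld:Inconsistent
    ns:1161252 nr:54509116 dw:55670368 dr:3076 al:2526 bm:4093 lo:2 pe:0 ua:0 ap:0
 1: cs:Unconfigured
"""

# A device line starts with the minor number and a colon, e.g. " 0: cs:...".
DEVICE_LINE = re.compile(r"^\s*(\d+):\s+(.*)$")
# Every field is a "key:value" token without embedded whitespace.
FIELD = re.compile(r"(\w+):(\S+)")


def parse_proc_drbd(text):
    """Return {minor: {field: value}} for each device found in the text."""
    devices = {}
    current = None
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            # New device: remember it so indented counter lines attach to it.
            current = devices.setdefault(int(match.group(1)), {})
            rest = match.group(2)
        elif current is not None and line.startswith((" ", "\t")):
            # Indented continuation line holding the counters.
            rest = line
        else:
            # Header lines (version, build info) carry no device fields.
            current = None
            continue
        for key, value in FIELD.findall(rest):
            current[key] = int(value) if value.isdigit() else value
    return devices


if __name__ == "__main__":
    for minor, fields in sorted(parse_proc_drbd(SAMPLE).items()):
        print(minor, fields.get("cs"), fields.get("st"), fields.get("ld"))
```

On a live node the same function could be fed the contents of /proc/drbd instead of SAMPLE, which makes it easy to log the cs/st/ld fields from both machines side by side while the disk is spun down.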