Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 2006-11-17 13:00:32 +0100, Lars Knudsen wrote:
> Hello,
>
> I'm currently using DRBD for test purposes on two Dell PE2950s with a
> megasas disk system.
> My system is a SuSE 10.1 installation - kernel 2.6.16.13-4-smp, drbd:
> drbd: initialised. Version: 0.7.17 (api:77/proto:74)
> drbd: SVN Revision: 2125 build by lmb at chip, 2006-03-27 17:40:22
>
> I have created a simple DRBD volume using the config below (no linux-ha,
> heartbeat etc. yet):
>
> --
> resource drbd0 {
>   protocol C;
>   incon-degr-cmd "halt -f";
From the example config file:

  disk {
    # if the lower level device reports an io-error, you have the choice of
    # "pass_on" -> report the io-error to the upper layers.
    #              on the Primary: report it to the mounted file system.
    #              on the Secondary: ignore it.
    # "panic"   -> the node leaves the cluster by doing a kernel panic.
    # "detach"  -> the node drops its backing storage device and
    #              continues in diskless mode.
    #
    on-io-error detach;
  }

Also have a look at the "ko-count" parameter.
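For illustration, this is roughly how the two settings could look together (just
a sketch; the ko-count value of 4 and its placement in the "net" section are
meant as an example only, please check the drbd.conf man page of your 0.7.17):

  disk {
    # drop the backing device on io-error and continue diskless,
    # instead of blocking or passing the error up
    on-io-error detach;
  }
  net {
    # if the peer fails to complete a single write request within
    # ko-count * timeout, it is kicked out of the cluster
    ko-count 4;
  }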
>   syncer {
>     rate 110M;
>     group 1;
>     al-extents 257;
>   }
>
>   on dkvm1 {
>     device    /dev/drbd1;
>     disk      /dev/sda6;
>     address   10.100.10.101:7789;
>     meta-disk internal;
>   }
>
>   on dkvm2 {
>     device    /dev/drbd1;
>     disk      /dev/sda6;
>     address   10.100.10.102:7789;
>     meta-disk internal;
>   }
> }
> --
> A manual test of unplugging my primary node was successful - the state change
> was detected, and fsck plus remount gave me fs access on my secondary node.
>
> Today I hit this bug on my secondary node:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0610.1/2599.html and got log
> entries like:
> --
> sd 0:2:0:0: megasas: RESET -2189309 cmd=2a
> megasas: [ 0]waiting for 16 commands to complete
> megasas: [ 5]waiting for 16 commands to complete
> megasas: [10]waiting for 16 commands to complete
> --
> I would expect my secondary node to fail and my primary to stay online.
> Instead /dev/drbd1 seemed to be in a "hanging" state, and my primary node
> hung as well.
The fact is that drbd 0.7 does not handle backing storage io errors as well as
it could in all cases. And doing a kernel panic is not the most polite thing
for a driver to do either, but it effectively works in a two-node drbd cluster
(as long as the other node is healthy).
Configuring "detach" or "panic" for on-io-error would have done the trick in
this case, I think.
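As a sketch of what "detach" means in practice (assuming the resource name
"drbd0" from your config, and that the failed backing device has been repaired
or replaced in the meantime):

  # the detached node keeps running in diskless mode; once the backing
  # storage is usable again, re-attach it and let drbd resync
  drbdadm attach drbd0
  cat /proc/drbd          # watch the connection/disk state and the resync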
We improved the behaviour of drbd after hardware errors considerably in drbd8,
and the good news is that we expect it to be released as 8.0 during December.
--
: Lars Ellenberg Tel +43-1-8178292-0 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe http://www.linbit.com :
__
please use the "List-Reply" function of your email client.