Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
/ 2006-11-17 13:00:32 +0100 \ Lars Knudsen:
> Hello,
>
> I'm currently using DRBD for test purposes on two Dell PE2950's with a
> megasas disksystem.
> My system is a SuSE 10.1 installation - kernel 2.6.16.13-4-smp, drbd
> drbd: initialised. Version: 0.7.17 (api:77/proto:74)
> drbd: SVN Revision: 2125 build by lmb at chip, 2006-03-27 17:40:22
>
> I have created a simple DRBD volume using the config below (no linux-ha,
> heartbeat etc yet):
>
> --
> resource drbd0 {
>   protocol C;
>   incon-degr-cmd "halt -f";

from the example config file:

  disk {
    # if the lower level device reports io-error you have the choice of
    # "pass_on" -> Report the io-error to the upper layers.
    #              Primary   -> report it to the mounted file system.
    #              Secondary -> ignore it.
    # "panic"   -> The node leaves the cluster by doing a kernel panic.
    # "detach"  -> The node drops its backing storage device, and
    #              continues in diskless mode.
    #
    on-io-error detach;
  }

also have a look at the "ko-count" parameter.

>   syncer {
>     rate 110M;
>     group 1;
>     al-extents 257;
>   }
>
>   on dkvm1 {
>     device    /dev/drbd1;
>     disk      /dev/sda6;
>     address   10.100.10.101:7789;
>     meta-disk internal;
>   }
>
>   on dkvm2 {
>     device    /dev/drbd1;
>     disk      /dev/sda6;
>     address   10.100.10.102:7789;
>     meta-disk internal;
>   }
> }
> --
>
> A manual test unplugging my primary node has been successful - state
> detected - fsck - remount gave me fs access on my secondary node.
>
> Today I hit this bug on my secondary node:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0610.1/2599.html
> getting log entries like:
> --
> sd 0:2:0:0: megasas: RESET -2189309 cmd=2a
> megasas: [ 0]waiting for 16 commands to complete
> megasas: [ 5]waiting for 16 commands to complete
> megasas: [10]waiting for 16 commands to complete
> --
>
> I would expect my secondary node to fail, and my primary to stay online.
> Instead /dev/drbd1 seemed to be in a "hanging" state where my primary
> node hung as well.

the fact is that drbd 0.7 does not handle backend storage io errors as
well as it could in all cases.  and doing a kernel panic is not the most
polite thing for a driver to do either, but it effectively works in a
two node drbd cluster (as long as the other node is healthy).

configuring "detach" or "panic" for on-io-error would have done the
trick in this case, I think.

we improved the behaviour of drbd after hardware errors considerably in
drbd8, and the good news is, we expect it to be released as 8.0 during
december.

--
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
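
For illustration, a minimal sketch of how the two suggestions from the reply
(on-io-error and ko-count) might look in the poster's drbd 0.7 config. The
placement of ko-count in a net section and the value 4 are assumptions made
here; the drbd.conf(5) man page for the installed version is authoritative.

--
resource drbd0 {
  protocol C;
  incon-degr-cmd "halt -f";

  disk {
    # on a lower level io error, drop the backing device and continue
    # diskless instead of hanging (or use "panic" to leave the cluster)
    on-io-error detach;
  }

  net {
    # assumed section and value: give up on a peer whose write does not
    # complete within ko-count * timeout, so the healthy node stays usable
    ko-count 4;
  }

  syncer {
    rate 110M;
    group 1;
    al-extents 257;
  }

  # "on dkvm1" and "on dkvm2" sections unchanged from the original post
}
--

With "detach" a failing secondary goes diskless while the primary keeps
serving from its own disk; "panic" removes the sick node entirely, which
also works as long as the peer is healthy.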