[DRBD-user] Secondary node halts primary

Lars Knudsen lakn at hcdk.com
Fri Nov 17 13:00:32 CET 2006



Hello,

I'm currently testing DRBD on two Dell PE2950s with a megasas disk
system.
My system is a SuSE 10.1 installation with kernel 2.6.16.13-4-smp; the
DRBD module reports:
drbd: initialised. Version: 0.7.17 (api:77/proto:74)
drbd: SVN Revision: 2125 build by lmb at chip, 2006-03-27 17:40:22

I have created a simple DRBD volume using the config below (no linux-ha,
heartbeat, etc. yet):

--
resource drbd0 {
  protocol C;
  incon-degr-cmd "halt -f";

  syncer {
    rate 110M;
    group 1;
    al-extents 257;
  }

  on dkvm1 {
    device    /dev/drbd1;
    disk      /dev/sda6;
    address   10.100.10.101:7789;
    meta-disk internal;
  }

  on dkvm2 {
    device    /dev/drbd1;
    disk      /dev/sda6;
    address   10.100.10.102:7789;
    meta-disk internal;
  }
}
--
A manual test in which I unplugged my primary node was successful: the
state change was detected, and after an fsck and remount I had filesystem
access on my secondary node.

Today I hit this bug on my secondary node:
http://www.ussg.iu.edu/hypermail/linux/kernel/0610.1/2599.html
with log entries like:
--
sd 0:2:0:0: megasas: RESET -2189309 cmd=2a
megasas: [ 0]waiting for 16 commands to complete
megasas: [ 5]waiting for 16 commands to complete
megasas: [10]waiting for 16 commands to complete
--
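
For anyone wanting to spot this condition in a kernel log, here is a
minimal sketch that picks out lines of the shape quoted above. It assumes
the log format is exactly as shown; the function names, the `min_reports`
threshold, and the "stuck" heuristic are hypothetical and not part of
megasas or DRBD.

```python
import re

# Matches lines such as:
#   megasas: [ 5]waiting for 16 commands to complete
# Group 1 is the bracketed counter, group 2 the outstanding command count.
WAIT_RE = re.compile(
    r"megasas: \[\s*(\d+)\]waiting for (\d+) commands to complete"
)


def pending_commands(log_text):
    """Return a list of (counter, outstanding) tuples, in log order."""
    return [(int(m.group(1)), int(m.group(2)))
            for m in WAIT_RE.finditer(log_text)]


def looks_stuck(log_text, min_reports=3):
    """Hypothetical heuristic: True if the last `min_reports` reports all
    show the same non-zero number of outstanding commands."""
    reports = pending_commands(log_text)
    if len(reports) < min_reports:
        return False
    tail = [outstanding for _, outstanding in reports[-min_reports:]]
    return tail[0] > 0 and len(set(tail)) == 1


if __name__ == "__main__":
    sample = (
        "sd 0:2:0:0: megasas: RESET -2189309 cmd=2a\n"
        "megasas: [ 0]waiting for 16 commands to complete\n"
        "megasas: [ 5]waiting for 16 commands to complete\n"
        "megasas: [10]waiting for 16 commands to complete\n"
    )
    print(pending_commands(sample))
    print(looks_stuck(sample))
```

Such a check could feed a watchdog on the secondary node, but what that
watchdog should then do is a separate question.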
I would expect my secondary node to fail and my primary to stay online.
Instead, /dev/drbd1 seemed to be in a "hanging" state, and my primary
node hung as well.

When I issued 'ifconfig eth0 down' (eth0 is the DRBD interface) on my
secondary node, the primary woke up and continued.
Did I forget something in my config to handle the situation where SCSI
bus errors leave the secondary node in a semi-working condition (kernel
up, networking up, but SCSI dead)?

I placed the last part of dmesg from both nodes here:
http://spa.ceman.info/drbd

If more info is needed, please let me know what.

regards, Lars


