[DRBD-user] Problem updating from 8.3.11 to 8.4.1: drbdsetup hangs when "downing" resources

Sat Jun 23 15:44:21 CEST 2012

Hello all,

I have a problem updating an SLES11SP2 cluster from DRBD 8.3.11 to 
8.4.1, following the manual at 
http://www.drbd.org/users-guide/s-upgrading-drbd.html.  I will post my 
DRBD config below.

What I've done so far:
  - Stopped Pacemaker/Corosync/OpenAIS on node A.
  - Installed latest DRBD RPMs from Novell (8.4.1) on node A.
  - Node B remained on DRBD 8.3.11, running all services normally.
  - Rebooted node A, verified that everything is installed properly.
  - Started DRBD ok on node A, using "/etc/init.d/drbd start".
  - "/etc/init.d/drbd status" looks fine on both nodes, i.e., resources 
are up-to-date in Secondary (node A)/Primary (node B) state.
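
The status check in the last step can also be scripted. The following is 
only a minimal sketch, assuming /proc/drbd has the usual layout (a 
"version:" line followed by per-device lines such as 
" 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----"). 
The function names parse_proc_drbd and unhealthy are made up for this 
example and are not part of DRBD.

```python
import re

# Assumed layout of a per-device status line in /proc/drbd:
#  0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)


def parse_proc_drbd(text):
    """Return (version, devices) parsed from the text of /proc/drbd."""
    version = None
    devices = {}
    for line in text.splitlines():
        if line.startswith("version:"):
            # "version: 8.4.1 (api:...)" -> "8.4.1"
            version = line.split()[1]
            continue
        match = STATUS_RE.match(line)
        if match:
            devices[int(match.group("minor"))] = {
                "cs": match.group("cs"),  # connection state
                "ro": match.group("ro"),  # roles, local/peer
                "ds": match.group("ds"),  # disk states, local/peer
            }
    return version, devices


def unhealthy(devices):
    """List minors that are not Connected with UpToDate/UpToDate disks."""
    return sorted(
        minor
        for minor, dev in devices.items()
        if dev["cs"] != "Connected" or dev["ds"] != "UpToDate/UpToDate"
    )
```

Calling parse_proc_drbd(open("/proc/drbd").read()) on each node would 
also show which module version is actually loaded, which is worth 
comparing against the installed userland tools after an upgrade.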

The problem now is that I cannot stop DRBD on node A! As soon as I 
issue "/etc/init.d/drbd stop", the command hangs and nothing happens. If 
I press Ctrl-C and run "ps aux", I see that "drbdsetup down r0" is stuck 
in the "D" (uninterruptible sleep) state:
--------
root      4018  0.0  0.0   4080   312 pts/1    D+   Jun22   0:00 drbdsetup down r0
--------

After this, when I issue "/etc/init.d/drbd status" (which was working 
fine before), it also hangs and I see:
--------
root      4912  0.0  0.0   4080   312 pts/0    D    15:27   0:00 drbdsetup sh-status 0
--------
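
To spot such stuck processes without reading the whole process list, 
the "ps aux" output can be filtered for state "D". This is just a small 
sketch, assuming the standard 11-column "ps aux" layout with STAT in the 
8th column; the function name uninterruptible is made up for 
illustration.

```python
def uninterruptible(ps_aux_output, needle="drbdsetup"):
    """Return (pid, stat, command) for D-state processes matching needle.

    Assumes the standard `ps aux` columns:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    hits = []
    for line in ps_aux_output.splitlines():
        # Split into at most 11 fields so COMMAND keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header row or malformed line
        pid, stat, command = fields[1], fields[7], fields[10]
        # STAT starts with "D" for uninterruptible sleep ("D", "D+", ...).
        if stat.startswith("D") and needle in command:
            hits.append((int(pid), stat, command))
    return hits
```

The text to pass in could come from 
subprocess.check_output(["ps", "aux"]) decoded to a string.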

From now on, all commands that depend on "drbdsetup" hang. I can 
reboot node A, but when I do, I see a "network failure" in the DRBD log 
files of node B. It looks like DRBD is not shutting down cleanly 
on node A.
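
For anyone who wants to see where in the kernel such a process is 
blocked: a rough sketch, assuming a kernel that exposes 
/proc/<pid>/wchan and /proc/<pid>/stack (the latter usually needs root 
and is not available on every kernel build). The helper name blocked_in 
and its proc_root parameter are made up for this example.

```python
import os


def blocked_in(pid, proc_root="/proc"):
    """Return {'wchan': ..., 'stack': ...} for pid.

    A value is None if the file is missing or not readable, e.g. when
    not running as root or when the kernel does not provide it.
    """
    info = {}
    for name in ("wchan", "stack"):
        path = os.path.join(proc_root, str(pid), name)
        try:
            with open(path) as handle:
                info[name] = handle.read().strip()
        except OSError:
            info[name] = None
    return info
```

Running blocked_in(4018) as root while "drbdsetup down r0" hangs would 
return the kernel function the process is waiting in, if the kernel 
exposes it.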

As mentioned, there is no cluster manager running/interfering on node A. 
I basically boot the system and start DRBD, but cannot stop it! Of 
course, everything was OK on DRBD 8.3.11...

My config is as follows (two DRBD resources):
---------
Node A = athene
Node B = apollon

/etc/drbd.conf:
--------
include "/etc/drbd.d/global_common.conf";
include "/etc/drbd.d/r0.res";
include "/etc/drbd.d/r1.res";

/etc/drbd.d/global_common.conf:
--------
global {
         dialog-refresh  1;
}
common {
         net {
                 protocol        C;

                 max-buffers     16k;
                 max-epoch-size  16k;

                 # Auto-negotiate TCP send buffer
                 sndbuf-size     0;

                 verify-alg      md5;
         }
         disk {
                 # On IO error, detach DRBD
                 on-io-error     detach;

                 # We have UPS to protect the systems and are aware of the risks:
                 disk-flushes    no;
                 md-flushes      no;
                 disk-barrier    no;

                 fencing         resource-only;

                 # Max sync rate (use 50% of hard drive write speed)
                 rate            25M;

                 al-extents      3001;
         }
         startup {
                 degr-wfc-timeout        1;
                 wfc-timeout             1;
         }
         handlers {
                 fence-peer              "/usr/lib/drbd/crm-fence-peer.sh";
                 after-resync-target     "/usr/lib/drbd/crm-unfence-peer.sh";
                 split-brain             "/usr/lib/drbd/notify-split-brain.sh <EM at il>";
                 local-io-error          "/usr/lib/drbd/notify-io-error.sh <EM at il>";
         }
}

/etc/drbd.d/r0.res:
--------
resource r0 {
         on athene {
                 device          /dev/drbd0 minor 0;
                 address         ipv4 10.0.0.1:7788;
                 meta-disk       internal;
                 disk            /dev/md2;
         }
         on apollon {
                 device          /dev/drbd0 minor 0;
                 address         ipv4 10.0.0.2:7788;
                 meta-disk       internal;
                 disk            /dev/md2;
         }
}

/etc/drbd.d/r1.res:
--------
resource r1 {
         on athene {
                 device          /dev/drbd1 minor 1;
                 address         ipv4 10.0.0.1:7789;
                 meta-disk       internal;
                 disk            /dev/md3;
         }
         on apollon {
                 device          /dev/drbd1 minor 1;
                 address         ipv4 10.0.0.2:7789;
                 meta-disk       internal;
                 disk            /dev/md3;
         }
}
---------
I would welcome any comments on our config (we are aware of the 
risks of turning off barriers).

Has anyone else experienced these problems with "drbdsetup" on 8.4.1?

For the moment, I can live with our cluster running on node B only. 
If I cannot resolve the problem, I will try to revert to DRBD 8.3.11...

Thanks!
-- 
Dipl.-Ing. Joschi Brauchle, M.S.

Institute for Communications Engineering (LNT)
Technische Universitaet Muenchen (TUM)
80290 Munich, Germany

Tel (work): +49 89 289-23474
Fax (work): +49 89 289-23490
E-mail: joschi.brauchle at tum.de
Web: http://www.lnt.ei.tum.de/
