Hello all,
I have a problem updating an SLES11SP2 cluster from DRBD 8.3.11 to
8.4.1, following the manual at
http://www.drbd.org/users-guide/s-upgrading-drbd.html. I will post my
DRBD config below.
What I've done so far:
- Stopped Pacemaker/Corosync/OpenAIS on node A.
- Installed latest DRBD RPMs from Novell (8.4.1) on node A.
- Node B remained on DRBD 8.3.11, running all services normally.
- Rebooted node A, verified that everything is installed properly.
- Started DRBD ok on node A, using "/etc/init.d/drbd start".
- Checked with "/etc/init.d/drbd status" that DRBD is fine on both nodes,
i.e., resources are up-to-date, with node A Secondary and node B Primary.
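
After a package upgrade like the one in the steps above, one thing worth ruling out is a mismatch between the DRBD kernel module that is actually loaded and the one installed on disk. The following is a minimal sketch, not part of the original report. It assumes that the first line of /proc/drbd has the usual "version: X.Y.Z (api:.../proto:...)" form and that the drbd module exports a version field readable by modinfo. The function names are made up for illustration.

```shell
#!/bin/sh
# Extract the DRBD version from a file in /proc/drbd format.
# Assumption: the first line looks like "version: 8.4.1 (api:1/proto:86-100)".
drbd_version_from_proc() {
    sed -n 's/^version: \([0-9][0-9.]*\).*/\1/p' "${1:-/proc/drbd}" | head -n 1
}

# Compare the loaded module (from /proc/drbd) with the module file on disk.
# Returns 0 only if both versions are known and identical.
check_drbd_versions() {
    loaded=$(drbd_version_from_proc /proc/drbd 2>/dev/null)
    installed=$(modinfo -F version drbd 2>/dev/null)
    echo "loaded kernel module:    ${loaded:-not loaded}"
    echo "installed kernel module: ${installed:-unknown}"
    [ -n "$loaded" ] && [ "$loaded" = "$installed" ]
}
```

Running check_drbd_versions on node A after the reboot would show whether the 8.4.1 module is really the one in use. It does not call drbdsetup, so it should not be affected by the hang described below.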
The problem now is that I cannot stop DRBD on node A! As soon as I
issue "/etc/init.d/drbd stop", the command hangs and nothing happens. If
I press Ctrl-C and run "ps aux", I see that "drbdsetup down r0" is stuck
in state D (uninterruptible sleep), see:
--------
root      4018  0.0  0.0   4080   312 pts/1    D+   Jun22   0:00 drbdsetup down r0
--------
After this, "/etc/init.d/drbd status" (which was working fine before)
also hangs, and I see:
--------
root      4912  0.0  0.0   4080   312 pts/0    D    15:27   0:00 drbdsetup sh-status 0
--------
From now on, all commands that depend on "drbdsetup" hang. I can
reboot node A, but when I do, I see a "network failure" in the DRBD log
files on node B. It looks like DRBD is not shutting down cleanly
on node A.
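
To see where the stuck drbdsetup processes are waiting, it can help to list all tasks in state D together with their kernel wait channel. This is a minimal sketch, assuming the procps version of ps; the function names are made up for illustration.

```shell
#!/bin/sh
# Keep only lines whose second column (STAT) starts with "D", i.e. tasks in
# uninterruptible sleep. Expects "pid stat wchan args..." lines on stdin.
filter_d_state() {
    awk '$2 ~ /^D/'
}

# List blocked tasks with the kernel function they are sleeping in (WCHAN).
list_blocked_tasks() {
    ps -eo pid,stat,wchan:32,args | filter_d_state
}
```

For a PID reported by list_blocked_tasks, "cat /proc/PID/stack" (as root, if the kernel provides it) may show the full kernel stack of the blocked task, and "echo w > /proc/sysrq-trigger" asks the kernel to log all blocked tasks, provided SysRq is enabled. Either output would be useful to include in a report like this one.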
As mentioned, there is no cluster manager running/interfering on node A.
I basically boot the system and start DRBD, but cannot stop it! Of
course, everything was OK on DRBD 8.3.11...
My config is as follows (two DRBD resources):
---------
Node A = athene
Node B = apollon
/etc/drbd.conf:
--------
include "/etc/drbd.d/global_common.conf";
include "/etc/drbd.d/r0.res";
include "/etc/drbd.d/r1.res";
/etc/drbd.d/global_common.conf:
--------
global {
    dialog-refresh 1;
}
common {
    net {
        protocol C;
        max-buffers 16k;
        max-epoch-size 16k;
        # Auto-negotiate TCP send buffer
        sndbuf-size 0;
        verify-alg md5;
    }
    disk {
        # On IO error, detach DRBD
        on-io-error detach;
        # We have UPS to protect the systems and are aware of the risks:
        disk-flushes no;
        md-flushes no;
        disk-barrier no;
        fencing resource-only;
        # Max sync rate (use 50% of hard drive write speed)
        rate 25M;
        al-extents 3001;
    }
    startup {
        degr-wfc-timeout 1;
        wfc-timeout 1;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        split-brain "/usr/lib/drbd/notify-split-brain.sh <EM at il>";
        local-io-error "/usr/lib/drbd/notify-io-error.sh <EM at il>";
    }
}
/etc/drbd.d/r0.res
--------
resource r0 {
    on athene {
        device /dev/drbd0 minor 0;
        address ipv4 10.0.0.1:7788;
        meta-disk internal;
        disk /dev/md2;
    }
    on apollon {
        device /dev/drbd0 minor 0;
        address ipv4 10.0.0.2:7788;
        meta-disk internal;
        disk /dev/md2;
    }
}
/etc/drbd.d/r1.res
--------
resource r1 {
    on athene {
        device /dev/drbd1 minor 1;
        address ipv4 10.0.0.1:7789;
        meta-disk internal;
        disk /dev/md3;
    }
    on apollon {
        device /dev/drbd1 minor 1;
        address ipv4 10.0.0.2:7789;
        meta-disk internal;
        disk /dev/md3;
    }
}
---------
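
Since the configuration above was written for 8.3.11, it may be worth checking how the 8.4.1 tools interpret it. "drbdadm dump all" parses the configuration files and prints the result back. The sketch below wraps that call; the function name is made up, and the assumption that "dump" only parses the files and does not need a working drbdsetup should be verified on the affected node.

```shell
#!/bin/sh
# Ask drbdadm to parse the configuration and print it back.
# Returns 127 if drbdadm is not installed, otherwise drbdadm's exit status.
check_drbd_config() {
    if ! command -v drbdadm >/dev/null 2>&1; then
        echo "drbdadm not found in PATH" >&2
        return 127
    fi
    drbdadm dump all
}
```

Comparing the dumped output on node A (8.4.1) with the files above would show whether any option is parsed differently or dropped after the upgrade.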
I would be happy to get any comments on our config (we are aware of the
risks of turning off barriers).
Did anyone experience these problems with "drbdsetup" on 8.4.1?
For the moment, I can live with our clustering just running on node B.
If I cannot resolve the problem, I may try to revert to DRBD 8.3.11...
Thanks!
--
Dipl.-Ing. Joschi Brauchle, M.S.
Institute for Communications Engineering (LNT)
Technische Universitaet Muenchen (TUM)
80290 Munich, Germany
Tel (work): +49 89 289-23474
Fax (work): +49 89 289-23490
E-mail: joschi.brauchle at tum.de
Web: http://www.lnt.ei.tum.de/