Hello all,

I have a problem updating an SLES11SP2 cluster from DRBD 8.3.11 to 8.4.1, following the manual at http://www.drbd.org/users-guide/s-upgrading-drbd.html. I will post my DRBD config below.

What I've done so far:
- Stopped Pacemaker/Corosync/OpenAIS on node A.
- Installed the latest DRBD RPMs from Novell (8.4.1) on node A.
- Node B remained on DRBD 8.3.11, running all services normally.
- Rebooted node A and verified that everything is installed properly.
- Started DRBD on node A without errors, using "/etc/init.d/drbd start".
- DRBD status is fine on both nodes according to "/etc/init.d/drbd status", i.e., resources are up-to-date in Secondary (node A)/Primary (node B) state.

The problem now is that I cannot stop DRBD on node A! As soon as I issue "/etc/init.d/drbd stop", the command hangs and nothing happens. If I press Ctrl-C and run "ps aux", I see that "drbdsetup down r0" is stuck in the "D" (uninterruptible sleep) state:
--------
root 4018 0.0 0.0 4080 312 pts/1 D+ Jun22 0:00 drbdsetup down r0
--------
After this, "/etc/init.d/drbd status" (which was working fine before) also hangs, and I see:
--------
root 4912 0.0 0.0 4080 312 pts/0 D 15:27 0:00 drbdsetup sh-status 0
--------
From now on, all commands that depend on "drbdsetup" hang.

I can reboot node A, but when I do, I see a "network failure" in the DRBD log files on node B. It looks like DRBD is not shutting down cleanly on node A. As mentioned, there is no cluster manager running or interfering on node A. I basically boot the system and start DRBD, but cannot stop it! Of course, everything was OK on DRBD 8.3.11...

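In case it helps with reproducing the check above, here is a minimal sketch of how the hung processes could be picked out and inspected. The function names filter_dstate and show_wchan are made up for illustration and are not part of DRBD. The sketch assumes the usual "ps aux" column layout (STAT in the eighth column) and a Linux /proc filesystem.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not DRBD tools.

# Print the lines of "ps aux"-style input whose STAT column (field 8)
# starts with "D", i.e. processes in uninterruptible sleep.
# Reads stdin, so it also works on saved output.
filter_dstate() {
    awk '$8 ~ /^D/'
}

# For each PID given, print the kernel symbol the process is currently
# waiting in, as reported by /proc/<pid>/wchan (may be empty or
# unreadable depending on kernel and permissions).
show_wchan() {
    for pid in "$@"; do
        printf '%s: ' "$pid"
        cat "/proc/$pid/wchan" 2>/dev/null
        echo
    done
}

# Typical use on the affected node:
#   ps aux | filter_dstate
#   show_wchan 4018 4912
```

The wait channel does not explain the hang by itself, but it shows where in the kernel "drbdsetup" is blocked, which may be useful to include in a report.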
My config is as follows (two DRBD resources):
---------
Node A = athene
Node B = apollon

/etc/drbd.conf:
--------
include "/etc/drbd.d/global_common.conf";
include "/etc/drbd.d/r0.res";
include "/etc/drbd.d/r1.res";

/etc/drbd.d/global_common.conf:
--------
global {
    dialog-refresh 1;
}
common {
    net {
        protocol C;
        max-buffers 16k;
        max-epoch-size 16k;
        # Auto-negotiate TCP send buffer
        sndbuf-size 0;
        verify-alg md5;
    }
    disk {
        # On IO error, detach DRBD
        on-io-error detach;
        # We have UPS to protect the systems and are aware of the risks:
        disk-flushes no;
        md-flushes no;
        disk-barrier no;
        fencing resource-only;
        # Max sync rate (use 50% of hard drive write speed)
        rate 25M;
        al-extents 3001;
    }
    startup {
        degr-wfc-timeout 1;
        wfc-timeout 1;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        split-brain "/usr/lib/drbd/notify-split-brain.sh <EM at il>";
        local-io-error "/usr/lib/drbd/notify-io-error.sh <EM at il>";
    }
}

/etc/drbd.d/r0.res
--------
resource r0 {
    on athene {
        device /dev/drbd0 minor 0;
        address ipv4 10.0.0.1:7788;
        meta-disk internal;
        disk /dev/md2;
    }
    on apollon {
        device /dev/drbd0 minor 0;
        address ipv4 10.0.0.2:7788;
        meta-disk internal;
        disk /dev/md2;
    }
}

/etc/drbd.d/r1.res
--------
resource r1 {
    on athene {
        device /dev/drbd1 minor 1;
        address ipv4 10.0.0.1:7789;
        meta-disk internal;
        disk /dev/md3;
    }
    on apollon {
        device /dev/drbd1 minor 1;
        address ipv4 10.0.0.2:7789;
        meta-disk internal;
        disk /dev/md3;
    }
}
---------

I would appreciate any comments on our config (we are aware of the risks of turning off barriers).

Did anyone experience these problems with "drbdsetup" on 8.4.1? For the moment, I can live with our clustering just running on node B. Eventually, I would try to revert to DRBD 8.3.11 if I cannot resolve the problem...

Thanks!

--
Dipl.-Ing. Joschi Brauchle, M.S.
Institute for Communications Engineering (LNT)
Technische Universitaet Muenchen (TUM)
80290 Munich, Germany
Tel (work): +49 89 289-23474
Fax (work): +49 89 289-23490
E-mail: joschi.brauchle at tum.de
Web: http://www.lnt.ei.tum.de/