Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello!
On 24.02.2017 at 15:53, Lars Ellenberg wrote:
> On Fri, Feb 24, 2017 at 03:08:04PM +0100, Dr. Volker Jaenisch wrote:
>> If both 10Gbit links fail then the bond0 aka the worker connection fails
>> and DRBD goes - as expected - into split brain. But that is not the problem.
>
> DRBD will be *disconnected*, yes.
Sorry, I was not precise in my wording. I assumed that after DRBD goes
into the disconnected state, the cluster manager would be informed and
reflect this somehow.
I have now noticed that a CIB rule is set on the former primary to keep
it primary (please have a look at the cluster state at the end of this
email), but I still wonder why this is not reflected in the crm status.
I was misled by this missing status information and wrongly concluded
that the ocf:linbit:drbd agent does not inform the CRM/CIB. Sorry for
blaming DRBD.
But I am still confused about Pacemaker's behavior of not reflecting
the DRBD state change in the crm status. Maybe this question should go
to the Pacemaker list.
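(For reference, in case someone runs into the same confusion: the fencing
constraint only shows up in the configuration, not in the status output.
A quick way to spot it is, for example:

mail1# crm configure show | grep -A1 drbd-fence
mail1# cibadmin -Q -o constraints | grep drbd-fence
)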
> But no reason for it to be "split brain"ed yet.
> and with proper fencing configured, it won't.
Here is our DRBD config; it is all quite basic:
resource r0 {
    disk {
        fencing resource-only;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    on mail1 {
        device    /dev/drbd1;
        disk      /dev/sda1;
        address   172.27.250.8:7789;
        meta-disk internal;
    }
    on mail2 {
        device    /dev/drbd1;
        disk      /dev/sda1;
        address   172.27.250.9:7789;
        meta-disk internal;
    }
}
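As a sanity check that the fencing policy and the handlers really ended
up in the running configuration, dumping the parsed config should show
them:

mail1# drbdadm dump r0
(the output should contain "fencing resource-only;" and the two handler lines)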
*What did we miss?* We have not configured STONITH yet. But IMHO a
missing STONITH configuration should not interfere with the DRBD state
change. Or am I wrong with this assumption?
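Just to illustrate what is still missing: a STONITH setup for this pair
would presumably look roughly like the sketch below. The external/ipmi
agent, the IPMI addresses and the credentials are placeholders and
assumptions, not our actual hardware:

# hypothetical IPMI fencing, one stonith resource per node, each fencing the *other* node
primitive st-mail1 stonith:external/ipmi \
        params hostname=mail1 ipaddr=10.0.0.11 userid=admin passwd=secret interface=lanplus
primitive st-mail2 stonith:external/ipmi \
        params hostname=mail2 ipaddr=10.0.0.12 userid=admin passwd=secret interface=lanplus
# a fencing device must not run on the node it is supposed to shoot
location l-st-mail1 st-mail1 -inf: mail1
location l-st-mail2 st-mail2 -inf: mail2
property stonith-enabled=true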
State after bond0 goes down:
root@mail1:/home/volker# crm status
Stack: corosync
Current DC: mail2 (version 1.1.15-e174ec8) - partition with quorum
Last updated: Fri Feb 24 16:56:44 2017
Last change: Fri Feb 24 16:45:19 2017 by root via cibadmin on mail2
2 nodes and 7 resources configured
Online: [ mail1 mail2 ]
Full list of resources:
 Master/Slave Set: ms_drbd_mail [drbd_mail]
     Masters: [ mail2 ]
     Slaves: [ mail1 ]
 Resource Group: FS_IP
     fs_mail            (ocf::heartbeat:Filesystem):    Started mail2
     vip_193.239.30.23  (ocf::heartbeat:IPaddr2):       Started mail2
     vip_172.27.250.7   (ocf::heartbeat:IPaddr2):       Started mail2
 Resource Group: Services
     postgres_pg2       (ocf::heartbeat:pgsql):         Started mail2
     Dovecot            (lsb:dovecot):                  Started mail2
Failed Actions:
* vip_172.27.250.7_monitor_30000 on mail2 'not running' (7): call=55,
status=complete, exitreason='none',
last-rc-change='Fri Feb 24 16:47:07 2017', queued=0ms, exec=0ms
root@mail2:/home/volker# drbd-overview
 1:r0/0  StandAlone Primary/Unknown UpToDate/Outdated /shared/data ext4 916G 12G 858G 2%
root@mail1:/home/volker# drbd-overview
 1:r0/0  WFConnection Secondary/Unknown UpToDate/DUnknown
And after bringing bond0 up again, the state stays the same on both machines.
After a cleanup of the failed VIP resource, still the same state:
root@mail2:/home/volker# crm status
Stack: corosync
Current DC: mail2 (version 1.1.15-e174ec8) - partition with quorum
Last updated: Fri Feb 24 17:01:05 2017
Last change: Fri Feb 24 16:59:32 2017 by hacluster via crmd on mail2
2 nodes and 7 resources configured
Online: [ mail1 mail2 ]
Full list of resources:
 Master/Slave Set: ms_drbd_mail [drbd_mail]
     Masters: [ mail2 ]
     Slaves: [ mail1 ]
 Resource Group: FS_IP
     fs_mail            (ocf::heartbeat:Filesystem):    Started mail2
     vip_193.239.30.23  (ocf::heartbeat:IPaddr2):       Started mail2
     vip_172.27.250.7   (ocf::heartbeat:IPaddr2):       Started mail2
 Resource Group: Services
     postgres_pg2       (ocf::heartbeat:pgsql):         Started mail2
     Dovecot            (lsb:dovecot):                  Started mail2
root@mail2:/home/volker# drbd-overview
 1:r0/0  StandAlone Primary/Unknown UpToDate/Outdated /shared/data ext4 916G 12G 858G 2%
root@mail1:/home/volker# drbd-overview
 1:r0/0  WFConnection Secondary/Unknown UpToDate/DUnknown
After issuing a
mail2# drbdadm connect all
the nodes resync and everything is back in order (the "sticky" fencing
constraint is cleared as well).
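So the whole recovery on our side boils down to roughly this sequence
(the unfencing itself happens automatically via the after-resync-target
handler once the resync finishes):

mail2# drbdadm connect all                      # re-establish the replication link
mail2# watch cat /proc/drbd                     # wait for Connected / UpToDate on both sides
mail2# crm configure show | grep drbd-fence     # constraint should be gone after the resync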
Cheers,
Volker
General setup: stock Debian Jessie without any modifications; DRBD,
Pacemaker etc. are all from Debian packages.
Here is our crm config:
node 740030984: mail1 \
        attributes standby=off
node 740030985: mail2 \
        attributes standby=off
primitive Dovecot lsb:dovecot \
        op monitor interval=20s timeout=15s \
        meta target-role=Started
primitive drbd_mail ocf:linbit:drbd \
        params drbd_resource=r0 \
        op monitor interval=15s role=Master \
        op monitor interval=16s role=Slave \
        op start interval=0 timeout=240s \
        op stop interval=0 timeout=100s
...
ms ms_drbd_mail drbd_mail \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
        notify=true is-managed=true target-role=Started
order FS_IP_after_drbd inf: ms_drbd_mail:promote FS_IP:start
order dovecot_after_FS_IP inf: FS_IP:start Services:start
location drbd-fence-by-handler-r0-ms_drbd_mail ms_drbd_mail \
        *rule $role=Master -inf: #uname ne mail2*
colocation mail_fs_on_drbd inf: FS_IP Services ms_drbd_mail:Master
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.15-e174ec8 \
        cluster-infrastructure=corosync \
        cluster-name=mail \
        stonith-enabled=false \
        last-lrm-refresh=1487951972 \
        no-quorum-policy=ignore
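The highlighted location constraint is the one created by
crm-fence-peer.sh; it is removed again by crm-unfence-peer.sh after the
resync, as we observed. If it ever got stuck, it could presumably also
be removed by hand:

mail2# crm configure delete drbd-fence-by-handler-r0-ms_drbd_mail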
--
=========================================================
inqbus Scientific Computing Dr. Volker Jaenisch
Richard-Strauss-Straße 1 +49(08861) 690 474 0
86956 Schongau-West http://www.inqbus.de
=========================================================