Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
See below >>>>
-----Original Message-----
From: drbd-user-bounces at lists.linbit.com
[mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Bala Ramakrishnan
Sent: Saturday, September 01, 2012 4:24 PM
To: drbd-user at lists.linbit.com
Subject: [DRBD-user] DRBD volumes have different files, but DRBD reports
them being in sync
I have a DRBD installation on two machines running CentOS 6.3 and DRBD 8.4.1.
I have a single resource 'agalaxy' being synced across these two machines.
This resource has two volumes:
Volume 0: /dev/drbd0 mounted on /a10data
Volume 1: /dev/drbd1 mounted on /a10
Volume 0 is running Postgres. I did a lot of other activities with DRBD
shut down.
>>>> Activities with DRBD shut down? Did any of these activities affect
lv_a10data or lv_a10? If so, the DRBD metadata won't know about them and
can't sync them.
However after a while, I found that the contents in the directory /a10data
on one of the machines was different (some intermediate level directories
were missing), yet DRBD (cat /proc/drbd) reported that the file systems were
in sync.
>>>> You can't view the secondary node - it can't be mounted. So what steps
are you not telling us?
Ultimately, I had to re-initialize and resync the volume by invalidating it:
drbdadm invalidate agalaxy/0
>>>> Be careful that you do this on the correct node. One of the nodes should
have the "correct" data and the other should have obsolete data; DRBD can
sync in either direction. Until/unless you identify how you got the volumes
to diverge, you'll probably repeat the problem. If the DRBD nodes get
disconnected AND you promote the secondary to primary and mount/write the
disk AND you then reconnect the nodes, DRBD will recognize the situation
(called split-brain). You didn't mention split-brain, so you must have
written to one of the LVs while DRBD was down.
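>>>> A minimal sketch of the recovery steps above (names assumed from this
thread: resource 'agalaxy', stale node balar-lnx3; these commands need a
live DRBD stack, so treat this as an illustration, not a recipe):

```shell
# Run ONLY on the node whose data is obsolete (assumed here: balar-lnx3).
drbdadm secondary agalaxy        # make sure the stale node is not primary
drbdadm invalidate agalaxy/0     # discard its copy of volume 0 and resync
# or 'drbdadm invalidate agalaxy' to resync every volume of the resource
cat /proc/drbd                   # watch cs:SyncTarget until oos: reaches 0
```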
Has anyone run into this kind of issue?
>>>> In summary, you can't say "DRBD reports them being in sync" when DRBD
is down, and you can't access the secondary volumes while DRBD is up -
something is missing from your question.
>>>> Dan
===================================
For example, on balar-lnx3, the contents of /a10data/db/data/system were:
[root@balar-lnx3 system]# ls
12531     12664      12670      12675      12681      12687  12779
12531_fsm 12666      12671      12677      12682      12688  pg_control
12531_vm  12667      12672      12678      12683      12773  pg_filenode.map
12533     12668      12672_fsm  12679      12683_fsm  12775  pg_internal.init
12534     12668_fsm  12672_vm   12679_fsm  12683_vm   12777  pgstat.stat
12662     12668_vm   12674      12679_vm   12685      12778
The contents of /a10/data/db/data/system on balar-lnx was:
[root@balar-lnx system]# ls
base pg_ident.conf pg_stat_tmp PG_VERSION
global pg_multixact pg_subtrans pg_xlog
pg_clog pg_notify pg_tblspc postgresql.conf
pg_hba.conf pg_serial pg_twophase postmaster.opts
[root@balar-lnx system]#
The contents of /a10data/db/data/system on balar-lnx3 was actually the
contents of /a10data/db/data/system/global on balar-lnx. Yet, DRBD was
reporting the status:
[root@balar-lnx3 ~]# cat /proc/drbd
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil@Build64R6, 2012-04-17 11:28:08
0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
ns:0 nr:173009724 dw:173009724 dr:0 al:0 bm:10560 lo:0 pe:0 ua:0 ap:0
ep:1 wo:f oos:0
1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
[root@balar-lnx3 ~]#
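>>>> Note that ds:UpToDate/UpToDate and oos:0 only reflect DRBD's own
dirty-block bitmap; writes made to the backing LVs while DRBD was stopped
never enter that bitmap, so this status can look clean while the file
systems differ. A small sketch of pulling out the out-of-sync counter
(it reuses the sample line above; on a real node you would read /proc/drbd
directly):

```shell
#!/bin/sh
# Sketch: extract the per-device out-of-sync ('oos:') counter.
# 'sample' reuses device 0's status lines from this thread.
sample='0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:173009724 dw:173009724 dr:0 al:0 bm:10560 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0'

echo "$sample" | grep -o 'oos:[0-9]*' | cut -d: -f2    # prints 0
```

A nonzero value here means DRBD itself knows blocks differ; zero only means
DRBD never saw a divergent write.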
#=====================================================================
Here is my drbd conf:
1. Global:
global {
    usage-count yes;
    # minor-count dialog-refresh disable-ip-verification
}
common {
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh;
            /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ;
            reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh;
            /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ;
            reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh;
            /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ;
            halt -f";
    }
    startup {
        wfc-timeout 0;
        degr-wfc-timeout 120;
    }
    options {
        # cpu-mask on-no-data-accessible
    }
    disk {
        on-io-error detach;
    }
    net {
        protocol C;
    }
}
2. agalaxy.res:
resource agalaxy {
    disk {
        resync-rate 100M;
        fencing resource-only;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    on balar-lnx {
        address 10.0.1.1:7788;
        volume 0 {
            device /dev/drbd0;
            disk /dev/vg_balarlnx3/lv_a10data;
            meta-disk internal;
        }
        volume 1 {
            device /dev/drbd1;
            disk /dev/vg_balarlnx3/lv_a10;
            meta-disk internal;
        }
    }
    on balar-lnx3 {
        address 10.0.1.2:7788;
        volume 0 {
            device /dev/drbd0;
            disk /dev/vg_balarlnx2/lv_a10data;
            meta-disk internal;
        }
        volume 1 {
            device /dev/drbd1;
            disk /dev/vg_balarlnx2/lv_a10;
            # meta-disk /dev/vg_balarlnx2/lv_a10_drbdmeta;
            meta-disk internal;
        }
    }
}
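>>>> One way to catch this kind of silent divergence in future is DRBD's
online verify feature. A hedged sketch (assumes DRBD 8.4 with a checksum
algorithm added to the resource's net section; none of this is from the
original post):

```shell
# Config fragment: add a verify algorithm to the resource's net section, e.g.
#   net {
#       protocol C;
#       verify-alg sha1;
#   }
# Then, from either connected node:
drbdadm adjust agalaxy    # pick up the configuration change
drbdadm verify agalaxy    # compare both replicas block by block
cat /proc/drbd            # mismatched blocks show up in the oos: counter
```

Verification only reads; a nonzero oos: afterwards still needs a manual
disconnect/connect or invalidate on the stale side to actually resync.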
_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user