I think it was my mistake. For any file system that is to be managed by DRBD, I should not mount the underlying logical volume directly and make changes while DRBD is not running, as DRBD will not be aware of the changes.

-----Original Message-----
From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Dan Barker
Sent: Sunday, September 02, 2012 8:51 AM
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] DRBD volumes have different files, but DRBD reports them being in sync

See below

-----Original Message-----
From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Bala Ramakrishnan
Sent: Saturday, September 01, 2012 4:24 PM
To: drbd-user at lists.linbit.com
Subject: [DRBD-user] DRBD volumes have different files, but DRBD reports them being in sync

I have a DRBD installation on two machines running CentOS 6.3 and DRBD 8.4.1. I have a single resource 'agalaxy' being synced across these two machines. This resource has two volumes:

Volume 0: /dev/drbd0, mounted on /a10data
Volume 1: /dev/drbd1, mounted on /a10

Volume 0 is running Postgres. I did a lot of other activities with DRBD shutdown.

>>>> Activities with DRBD shutdown? Did any of these activities affect
>>>> lv_a10data or lv_a10? If so, the DRBD metadata won't know about it
>>>> and can't sync it.

However, after a while, I found that the contents of the directory /a10data on one of the machines were different (some intermediate-level directories were missing), yet DRBD (cat /proc/drbd) reported that the file systems were in sync.

>>>> You can't view the secondary node - it can't be mounted. So what
>>>> steps are you not telling us?

Ultimately, I had to re-initialize and resync the volume by invalidating it:

drbdadm invalidate agalaxy/0

>>>> Careful that you do this on the correct node. One of the nodes
>>>> should have the "correct" data and the other should have obsolete
>>>> data. DRBD can sync in either direction.
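A sketch of the role check implied by Dan's warning: before running the invalidate, confirm which role the local node holds. The resource/volume name `agalaxy/0` comes from the post; the status line below is inlined from the /proc/drbd output later in this thread, and on a live node you would read /proc/drbd itself.

```shell
# Sketch: confirm the local node's DRBD role before invalidating.
# The status line is a sample copied from this thread's /proc/drbd output.
status=' 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----'
# In the ro: field, the role before the slash is the local node's role.
local_role=$(printf '%s\n' "$status" | grep -o 'ro:[A-Za-z]*' | cut -d: -f2)
echo "$local_role"
# Only once you are sure this node holds the stale copy:
#   drbdadm invalidate agalaxy/0    # discard local data, full resync from peer
# (or run "drbdadm invalidate-remote agalaxy/0" from the node with good data)
```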
Until/unless you identify how you got the volumes to diverge, you'll probably repeat the problem. If the DRBD nodes get disconnected AND you promote the secondary to primary and mount/write the disk AND you then reconnect the nodes, DRBD will recognize the situation (called split-brain). You didn't mention split-brain, so you must have written to one of the LVs while DRBD was down.

Has anyone run into this kind of issue?

>>>> In summary, you can't say "DRBD reports them being in sync" when
>>>> DRBD is down, and you can't access the secondary volumes while DRBD
>>>> is up - something is missing from your question.
>>>> Dan

===================================

For example, on balar-lnx3, the contents of /a10data/db/data/system were:

[root at balar-lnx3 system]# ls
12531      12664      12670      12675      12681      12687  12779
12531_fsm  12666      12671      12677      12682      12688  pg_control
12531_vm   12667      12672      12678      12683      12773  pg_filenode.map
12533      12668      12672_fsm  12679      12683_fsm  12775  pg_internal.init
12534      12668_fsm  12672_vm   12679_fsm  12683_vm   12777  pgstat.stat
12662      12668_vm   12674      12679_vm   12685      12778

The contents of /a10/data/db/data/system on balar-lnx were:

[root at balar-lnx system]# ls
base         pg_ident.conf  pg_stat_tmp  PG_VERSION
global       pg_multixact   pg_subtrans  pg_xlog
pg_clog      pg_notify      pg_tblspc    postgresql.conf
pg_hba.conf  pg_serial      pg_twophase  postmaster.opts
[root at balar-lnx system]#

The contents of /a10data/db/data/system on balar-lnx3 were actually the contents of /a10data/db/data/system/global on balar-lnx.
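One way to reconcile Dan's summary with what was observed: ds:UpToDate/UpToDate only covers writes that passed through DRBD, so writes made straight to the backing LV stay invisible to it. A hedged sketch of checking for silent divergence follows; note that `drbdadm verify` requires a verify-alg (e.g. `verify-alg md5;`) in the resource's net section, which the configuration in this thread does not set, and the /proc/drbd counter line is inlined here as a sample.

```shell
# Sketch: an online verify can surface divergence that ds:UpToDate/UpToDate
# cannot. It needs e.g. "verify-alg md5;" in the net section (not present in
# the config in this thread):
#   drbdadm verify agalaxy/0
# Afterwards, a non-zero oos: counter flags out-of-sync blocks. Sample line
# copied from the /proc/drbd output in this thread:
line='ns:0 nr:173009724 dw:173009724 dr:0 al:0 bm:10560 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0'
oos=$(printf '%s\n' "$line" | grep -o 'oos:[0-9]*' | cut -d: -f2)
echo "$oos"
```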
Yet, DRBD was reporting the status:

[root at balar-lnx3 ~]# cat /proc/drbd
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil at Build64R6, 2012-04-17 11:28:08
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:173009724 dw:173009724 dr:0 al:0 bm:10560 lo:0 pe:0 ua:0
    ep:1 wo:f oos:0
 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
[root at balar-lnx3 ~]#

#=====================================================================

Here is my drbd conf:

1. Global:

global {
        usage-count yes;
        # minor-count dialog-refresh disable-ip-verification
}

common {
        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        }
        startup {
                wfc-timeout 0;
                degr-wfc-timeout 120;
        }
        options {
                # cpu-mask on-no-data-accessible
        }
        disk {
                on-io-error detach;
        }
        net {
                protocol C;
        }
}

2. Agalaxy.res:

resource agalaxy {
        disk {
                resync-rate 100M;
                fencing resource-only;
        }
        handlers {
                fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
        on balar-lnx {
                address 10.0.1.1:7788;
                volume 0 {
                        device /dev/drbd0;
                        disk /dev/vg_balarlnx3/lv_a10data;
                        meta-disk internal;
                }
                volume 1 {
                        device /dev/drbd1;
                        disk /dev/vg_balarlnx3/lv_a10;
                        meta-disk internal;
                }
        }
        on balar-lnx3 {
                address 10.0.1.2:7788;
                volume 0 {
                        device /dev/drbd0;
                        disk /dev/vg_balarlnx2/lv_a10data;
                        meta-disk internal;
                }
                volume 1 {
                        device /dev/drbd1;
                        disk /dev/vg_balarlnx2/lv_a10;
                        # meta-disk /dev/vg_balarlnx2/lv_a10_drbdmeta;
                        meta-disk internal;
                }
        }
}

_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
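The takeaway at the top of this thread (never mount the backing LV directly while DRBD is stopped) can be encoded as a small guard. The helper name `safe_mount_check` is hypothetical, not a DRBD tool; the device paths are the ones from the configuration above.

```shell
# Hypothetical helper (not part of DRBD): refuse to mount anything that is
# not a /dev/drbdX device, so writes always pass through DRBD and get
# replicated. Writes made directly to the backing LV are invisible to DRBD.
safe_mount_check() {
    case "$1" in
        /dev/drbd*) echo "ok to mount $1" ;;
        *)          echo "refusing $1: not a DRBD device" ;;
    esac
}
safe_mount_check /dev/drbd0
safe_mount_check /dev/vg_balarlnx3/lv_a10data
```

In real use one would promote first (`drbdadm primary agalaxy`) and then mount /dev/drbd0 and /dev/drbd1, never /dev/vg_balarlnx3/lv_a10data or lv_a10 directly.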