[DRBD-user] node shows uptodate and oos:****** ?

Lee Musgrave lee at sclinternet.co.uk
Fri Sep 18 01:07:51 CEST 2015

On 17 September 2015 at 21:25, Eric Blevins <ericlb100 at gmail.com> wrote:

> The nodes marked Diskless are just that: diskless.
> Where it's diskless, no data is being written locally.
>
> DRBD will read/write from the other UpToDate node when the local node is
> diskless.
> For example, resource 1 is reading/writing all of its data from node2.
>
>
OK, that's what I thought was happening; nice to have it confirmed.
That means if necessary (i.e. as a last resort) I can completely remove node 1
and rebuild it from scratch.
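
(For my own notes, I'm guessing the per-resource rebuild on node 1 would go
roughly like this once the underlying disk/metadata problem is fixed. Untested,
just a sketch, and iscsi.target.0 is only an example resource name from the
configs below:)

    # on nas1, after the backing disk and the metadata partition are healthy again
    drbdadm create-md iscsi.target.0   # recreate the metadata for this resource
    drbdadm attach iscsi.target.0      # reattach the backing device
    # DRBD should then resync the device from nas2, which is UpToDate
    cat /proc/drbd                     # watch the resync progress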

What confused me is that the diskless resources on node 1 can't be writing
data, and the resources on node 2 are fully up to date, so why is it node 2
that shows values like oos:1825092?
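
(In case it's useful, this is how I've been checking the per-resource state,
just the standard tools as far as I know; iscsi.ocfs2 is one of the resources
defined below:)

    cat /proc/drbd               # full status, including the oos counters
    drbdadm dstate iscsi.ocfs2   # disk state, e.g. UpToDate/Diskless
    drbdadm cstate iscsi.ocfs2   # connection state, e.g. Connected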



> You need to figure out why those resources are diskless and fix it.
>
> What do your DRBD configs look like?
>




global {
    usage-count no;
}


resource iscsi.config {
    protocol C;

    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }

    startup {
        # become-primary-on nas1;
        wfc-timeout 120;
        degr-wfc-timeout 120;
    }

    disk {
        on-io-error detach;
    }

    net {
        cram-hmac-alg sha1;
        shared-secret "password";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }

    syncer {
        rate 30M;
        verify-alg sha1;
        al-extents 257;
    }

    on nas1 {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 192.168.61.1:7788;
        meta-disk /dev/sda2[0];
    }

    on nas2 {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 192.168.61.2:7788;
        meta-disk /dev/sda2[0];
    }
}


resource iscsi.target.0 {
    protocol C;

    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }

    startup {
        # become-primary-on nas1;
        wfc-timeout 120;
        degr-wfc-timeout 120;
    }

    disk {
        on-io-error detach;
    }

    net {
        cram-hmac-alg sha1;
        shared-secret "password";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }

    syncer {
        rate 30M;
        verify-alg sha1;
        al-extents 257;
    }

    on nas1 {
        device /dev/drbd1;
        disk /dev/sdb2;
        address 192.168.61.1:7789;
        meta-disk /dev/sda2[1];
    }

    on nas2 {
        device /dev/drbd1;
        disk /dev/sdb2;
        address 192.168.61.2:7789;
        meta-disk /dev/sda2[1];
    }
}

resource iscsi.ocfs2 {
    protocol C;

    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }

    startup {
        become-primary-on nas1;
        wfc-timeout 120;
        degr-wfc-timeout 120;
    }

    disk {
        on-io-error detach;
        # fencing resource-and-stonith;
    }

    syncer {
        rate 30M;
        verify-alg sha1;
    }

    net {
        # allow-two-primaries;
        cram-hmac-alg sha1;
        shared-secret "password";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }

    on nas1 {
        device /dev/drbd2;
        disk /dev/sdc1;
        address 192.168.61.1:7790;
        meta-disk /dev/sda2[2];
    }

    on nas2 {
        device /dev/drbd2;
        disk /dev/sdc1;
        address 192.168.61.2:7790;
        meta-disk /dev/sda2[2];
    }
}

resource iscsi.db-bak {
    protocol C;

    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }

    startup {
        become-primary-on nas1;
        wfc-timeout 120;
        degr-wfc-timeout 120;
    }

    disk {
        on-io-error detach;
        # fencing resource-and-stonith;
    }

    syncer {
        rate 30M;
        verify-alg sha1;
        al-extents 257;
    }

    net {
        cram-hmac-alg sha1;
        shared-secret "password";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }

    on nas1 {
        device /dev/drbd3;
        disk /dev/sdc2;
        address 192.168.61.1:7791;
        meta-disk /dev/sda2[3];
    }

    on nas2 {
        device /dev/drbd3;
        disk /dev/sdc2;
        address 192.168.61.2:7791;
        meta-disk /dev/sda2[3];
    }
}


resource iscsi.mail {
    protocol C;

    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }

    startup {
        become-primary-on nas1;
        wfc-timeout 120;
        degr-wfc-timeout 120;
    }

    disk {
        on-io-error detach;
    }

    net {
        cram-hmac-alg sha1;
        shared-secret "password";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }

    syncer {
        rate 30M;
        verify-alg sha1;
        al-extents 257;
    }

    on nas1 {
        device /dev/drbd4;
        disk /dev/sdb3;
        address 192.168.61.1:7792;
        meta-disk /dev/sda2[4];
    }

    on nas2 {
        device /dev/drbd4;
        disk /dev/sdb3;
        address 192.168.61.2:7792;
        meta-disk /dev/sda2[4];
    }
}
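
(If it helps, I can also post the output of the following, which should show
the configuration exactly as DRBD parses it and the current role of each
resource; standard drbdadm commands as far as I know:)

    drbdadm dump all   # dump the parsed configuration for all resources
    drbdadm role all   # report Primary/Secondary for each resource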






>
> On Thu, Sep 17, 2015 at 9:18 AM, Lee Musgrave <lee at sclinternet.co.uk>
> wrote:
> > Hi,
> >
> > I'm a little confused by this, and I don't want to do anything with these
> > systems until I've got some clarity.
> >
> > I'm using DRBD 8.3.13 on Ubuntu 12.04.3 (I know I should upgrade, but
> > that isn't an option right now).
> >
> > The metadata partition is on /dev/sda2; the OS is installed on /dev/sda1.
> >
> >
> > On node1, cat /proc/drbd shows:
> >
> >
> >  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
> >     ns:40 nr:0 dw:12 dr:1104 al:2 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> >  1: cs:Connected ro:Primary/Secondary ds:Diskless/UpToDate C r-----
> >     ns:2015462262 nr:516042356 dw:1353605609 dr:1812380379 al:67433064 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> >  2: cs:Connected ro:Primary/Secondary ds:Diskless/UpToDate C r-----
> >     ns:536493384 nr:80 dw:502334308 dr:1646088 al:28554 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> >  3: cs:Connected ro:Primary/Secondary ds:Diskless/UpToDate C r-----
> >     ns:201451152 nr:0 dw:131716884 dr:84892 al:211 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> >  4: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
> >     ns:3202744 nr:0 dw:3202744 dr:349700 al:57 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> >
> >
> > On node2:
> >
> >  0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
> >     ns:0 nr:40 dw:40 dr:0 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> >  1: cs:Connected ro:Secondary/Primary ds:UpToDate/Diskless C r-----
> >     ns:516043328 nr:2015478830 dw:2015478830 dr:516043328 al:33272078 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:289286616
> >  2: cs:Connected ro:Secondary/Primary ds:UpToDate/Diskless C r-----
> >     ns:80 nr:536493908 dw:536493908 dr:80 al:536 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1825092
> >  3: cs:Connected ro:Secondary/Primary ds:UpToDate/Diskless C r-----
> >     ns:0 nr:201451968 dw:201451968 dr:0 al:298 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:91228
> >  4: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
> >     ns:0 nr:3202744 dw:3202744 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> >
> >
> > Device 0 is on /dev/sdb1, 1 is on /dev/sdb2, 2 is on /dev/sdc1, 3 is on
> > /dev/sdc2, and 4 is on /dev/sdb3.
> >
> > /dev/sdb and /dev/sdc are two separate RAID 10 arrays, each of four disks.
> >
> >
> > I don't believe there is any problem with the RAID arrays or their
> > constituent disks. On node 1 the filesystem is currently mounted
> > read-only, so I believe the problem is the OS disk, which also holds the
> > metadata partition.
> >
> > Does this seem the most likely explanation to you?
> >
> > What's confusing me is that it's the node 2 partitions showing as
> > out-of-sync. How? Surely it's the diskless partitions that should be oos?
> > Is it keeping everything in memory? Are changes actually getting written
> > to the disks on node2?
> >
> > As I said, I believe the problem to be access to the metadata, since all
> > 5 DRBD partitions share the same metadata partition. What would be the
> > recommended recovery method? I don't even want to try remounting / as rw
> > until I've got a bit more information. Right now things are still
> > working, although in a degraded state. It's not live, but I want to treat
> > it as live so I know I can recover from the same situation when it is in
> > production, so downtime or data loss needs to be avoided if at all
> > possible.
> >
> >
> > Thanks,
> > Lee.
> >