Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 17 September 2015 at 21:25, Eric Blevins <ericlb100 at gmail.com> wrote: > The nodes marked diskless are just that, diskless. > Wherever its diskless no data is being written. > > DRBD will read/write from the other up2date node when the local node is > diskless > Such as resource 1 is reading/writing all data from node2. > > ok, that's what i thought was happening, nice to have it confirmed. that means if necessary (ie. last resort) i can completely remove node 1 and rebuild it from scratch. what confused me is the diskless resources on node 1 can't be writing data, and the resources on node 2 are fully up to date, so why is it node 2 that shows data like oos:1825092 > You need to figure out why those resources are diskless and fix it. > > What do your DRBD configs look like? > global { usage-count no; } resource iscsi.config { protocol C; handlers { pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f"; pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f"; local-io-error "echo o > /proc/sysrq-trigger ; halt -f"; outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5"; } startup { # become-primary-on nas1; wfc-timeout 120; degr-wfc-timeout 120; } disk { on-io-error detach; } net { cram-hmac-alg sha1; shared-secret "password"; after-sb-0pri discard-zero-changes; after-sb-1pri discard-secondary; after-sb-2pri disconnect; rr-conflict disconnect; } syncer { rate 30M; verify-alg sha1; al-extents 257; } on nas1 { device /dev/drbd0; disk /dev/sdb1; address 192.168.61.1:7788; meta-disk /dev/sda2[0]; } on nas2 { device /dev/drbd0; disk /dev/sdb1; address 192.168.61.2:7788; meta-disk /dev/sda2[0]; } } resource iscsi.target.0 { protocol C; handlers { pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f"; pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f"; local-io-error "echo o > /proc/sysrq-trigger ; halt -f"; outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5"; } startup { # become-primary-on nas1; wfc-timeout 120; degr-wfc-timeout 120; } disk { on-io-error detach; } net { cram-hmac-alg sha1; shared-secret "password"; after-sb-0pri discard-zero-changes; after-sb-1pri discard-secondary; after-sb-2pri disconnect; rr-conflict disconnect; } syncer { rate 30M; verify-alg sha1; al-extents 257; } on nas1 { device /dev/drbd1; disk /dev/sdb2; address 192.168.61.1:7789; meta-disk /dev/sda2[1]; } on nas2 { device /dev/drbd1; disk /dev/sdb2; address 192.168.61.2:7789; meta-disk /dev/sda2[1]; } } resource iscsi.ocfs2 { protocol C; handlers { pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f"; pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f"; local-io-error "echo o > /proc/sysrq-trigger ; halt -f"; outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5"; } startup { become-primary-on nas1; wfc-timeout 120; degr-wfc-timeout 120; } disk { on-io-error detach; # fencing resource-and-stonith; } syncer { rate 30M; verify-alg sha1; } net { # allow-two-primaries; cram-hmac-alg sha1; shared-secret "password"; after-sb-0pri discard-zero-changes; after-sb-1pri discard-secondary; after-sb-2pri disconnect; rr-conflict disconnect; } on nas1 { device /dev/drbd2; disk /dev/sdc1; address 192.168.61.1:7790; meta-disk /dev/sda2[2]; } on nas2 { device /dev/drbd2; disk /dev/sdc1; address 192.168.61.2:7790; meta-disk /dev/sda2[2]; } } resource iscsi.db-bak { protocol C; handlers { pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f"; pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f"; local-io-error "echo o > /proc/sysrq-trigger ; halt -f"; outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5"; } startup { become-primary-on nas1; wfc-timeout 120; degr-wfc-timeout 120; } disk { on-io-error detach; # fencing resource-and-stonith; } syncer { rate 30M; verify-alg sha1; al-extents 257; } net { cram-hmac-alg sha1; shared-secret "password"; after-sb-0pri discard-zero-changes; after-sb-1pri discard-secondary; after-sb-2pri disconnect; rr-conflict disconnect; } on nas1 { device /dev/drbd3; disk /dev/sdc2; address 192.168.61.1:7791; meta-disk /dev/sda2[3]; } on nas2 { device /dev/drbd3; disk /dev/sdc2; address 192.168.61.2:7791; meta-disk /dev/sda2[3]; } } resource iscsi.mail { protocol C; handlers { pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f"; pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f"; local-io-error "echo o > /proc/sysrq-trigger ; halt -f"; outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5"; } startup { become-primary-on nas1; wfc-timeout 120; degr-wfc-timeout 120; } disk { on-io-error detach; } net { cram-hmac-alg sha1; shared-secret "password"; after-sb-0pri discard-zero-changes; after-sb-1pri discard-secondary; after-sb-2pri disconnect; rr-conflict disconnect; } syncer { rate 30M; verify-alg sha1; al-extents 257; } on nas1 { device /dev/drbd4; disk /dev/sdb3; address 192.168.61.1:7792; meta-disk /dev/sda2[4]; } on nas2 { device /dev/drbd4; disk /dev/sdb3; address 192.168.61.2:7792; meta-disk /dev/sda2[4]; } } > > On Thu, Sep 17, 2015 at 9:18 AM, Lee Musgrave <lee at sclinternet.co.uk> > wrote: > > Hi, > > > > i'm a little confused by this, and i don't want to do anything with these > > systems until i've got some clarity. > > > > i'm using drbd 8.3.13 on ubuntu 12.04.3, ( i know i should upgrade, but > > that isn't an option right now). > > > > the metadata partition is on /dev/sda2, the os is installed on /dev/sda1 > > > > > > on node1 cat /proc/drbd shows: > > > > > > 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----- > > ns:40 nr:0 dw:12 dr:1104 al:2 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f > oos:0 > > 1: cs:Connected ro:Primary/Secondary ds:Diskless/UpToDate C r----- > > ns:2015462262 nr:516042356 dw:1353605609 dr:1812380379 al:67433064 > bm:3 > > lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0 > > 2: cs:Connected ro:Primary/Secondary ds:Diskless/UpToDate C r----- > > ns:536493384 nr:80 dw:502334308 dr:1646088 al:28554 bm:0 lo:0 pe:0 > ua:0 > > ap:0 ep:1 wo:f oos:0 > > 3: cs:Connected ro:Primary/Secondary ds:Diskless/UpToDate C r----- > > ns:201451152 nr:0 dw:131716884 dr:84892 al:211 bm:0 lo:0 pe:0 ua:0 > ap:0 > > ep:1 wo:f oos:0 > > 4: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----- > > ns:3202744 nr:0 dw:3202744 dr:349700 al:57 bm:0 lo:0 pe:0 ua:0 ap:0 > ep:1 > > wo:f oos:0 > > > > > > on node2 : > > > > 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----- > > ns:0 nr:40 dw:40 dr:0 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0 > > 1: cs:Connected ro:Secondary/Primary ds:UpToDate/Diskless C r----- > > ns:516043328 nr:2015478830 dw:2015478830 dr:516043328 al:33272078 > bm:3 > > lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:289286616 > > 2: cs:Connected ro:Secondary/Primary ds:UpToDate/Diskless C r----- > > ns:80 nr:536493908 dw:536493908 dr:80 al:536 bm:0 lo:0 pe:0 ua:0 ap:0 > > ep:1 wo:f oos:1825092 > > 3: cs:Connected ro:Secondary/Primary ds:UpToDate/Diskless C r----- > > ns:0 nr:201451968 dw:201451968 dr:0 al:298 bm:0 lo:0 pe:0 ua:0 ap:0 > ep:1 > > wo:f oos:91228 > > 4: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----- > > ns:0 nr:3202744 dw:3202744 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 > wo:f > > oos:0 > > > > > > 0 is on /dev/sdb1, 1 is on /dev/sdb2, 2 is on /dev/sdc1, 3 is on > /dev/sdc2, > > 4 is on /dev/sdb3 > > > > /dev/sdb and /dev/sdc are 2 separate raid 10 arrays, each of 4 disks. > > > > > > i don't believe there is any problem with the raid arrays, or their > > composite disks. on node 1, the filesystem is currently mounted > readonly, so > > i believe the problem is the os disk, which also has the metadata > partition > > on it. > > > > does this seem the most likely to you? > > > > what's confusing me, is it's the node 2 partitions showing as > out-of-sync, > > how? surely it's the diskless partitions that are oos? is it keeping > > everything in memory? are changes actually getting written to the disks > on > > node2? > > > > as i said, i believe the problem to be access to the metadata, since all > 5 > > drbd partitions share the same metadata partition, what would be the > > recommended recovery method, i don't even want to try remounting / as rw > > until i've got a bit more information, right now things are still > working, > > although in a degraded state, and it's not live, but i want to treat it > as > > live so i know i can recover from the same situation when it is in > > production, so downtime or data loss needs to be avoided if at all > possible. > > > > > > thanks > > lee. > > > > _______________________________________________ > > drbd-user mailing list > > drbd-user at lists.linbit.com > > http://lists.linbit.com/mailman/listinfo/drbd-user > > > _______________________________________________ > drbd-user mailing list > drbd-user at lists.linbit.com > http://lists.linbit.com/mailman/listinfo/drbd-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20150918/8902c48e/attachment.htm>