On Fri, Jun 19, 2009 at 11:16:09AM +0200, Fabrice LE CREURER wrote:
> Hi,
>
> We use DRBD 8.3 with three nodes managed with Heartbeat. We've got a
> problem between the first active lower device SV1 and the stacked SV3
> device. After this problem, the drbd device was still active on SV1 but
> only in read mode! The third node was synchronized correctly and we
> could use the DRBD device /dev/drbd1 normally.
> We have detected many network failures between the first virtual device and
> the stacked device, but without problems after the resynchronizations. However,
> we get the following message during production mode:
>
> Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899392, limit=253891608
> Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
>
> Thanks for your help.
>
> Regards
>
>
> The configuration :
> SV1                        -|
> lower device /dev/drbd0     |
> device /dev/sda9
>                             |
>                             |   SV3
>                             |   stacked device /dev/drbd1
>                                 device /dev/sda9
> SV2                         |
> lower device /dev/drbd0    -|
> device /dev/sda9

> Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899432, limit=253891608

> Partition /dev/drbd0 :
>
> fdisk -s /dev/drbd0
> 126949716
>
> We have the same disk structure on all nodes : /dev/sda9
>
> fdisk -s /dev/sda9
> 126953631

fdisk -s reports size in kB (1024-byte blocks).
the kernel messages are in 512 byte sectors.

let's bring them both to sectors:
sda9  -> 253907262
drbd0 -> 253899432  # this happens to be the "want=" above.
drbd1 -> 253891608  # from the "limit=" above.

sda9 253907262 sectors,
 - (size of internal meta data, aligned to 4k)
resulting drbd0 size: 253899432. correct.

drbd0 size
 - (size of internal meta data of the "upper", stacked, drbd1)
resulting drbd1 size: 253891608. correct.

but your file system tries to access the sector number
corresponding to the last sector of drbd0.

conclusion:
you created your file system on the (lower) drbd0,
and when creating the meta data for the (upper) drbd1,
you truncated your file system.

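for anyone who wants to redo the arithmetic above, here is a minimal sketch in Python. the kB-to-sector conversion is exactly what is described above. the meta data estimate is an assumption that is not stated in this mail: it uses the rule of thumb from the DRBD 8.x user's guide, ceil(sectors / 2^18) * 8 + 72 sectors, and it assumes the backing device is aligned down to 4k before the meta data is subtracted. the function names are made up for illustration.

```python
SECTOR_BYTES = 512   # unit used in the kernel messages
KIB_BYTES = 1024     # unit reported by "fdisk -s"


def kib_to_sectors(kib):
    """Convert an 'fdisk -s' size (1024-byte blocks) to 512-byte sectors."""
    return kib * KIB_BYTES // SECTOR_BYTES


def internal_md_sectors(dev_sectors):
    """ASSUMPTION: DRBD 8.x user's guide estimate for internal meta data,
    ceil(sectors / 2**18) * 8 + 72, in 512-byte sectors."""
    return -(-dev_sectors // 2**18) * 8 + 72


def usable_sectors(backing_sectors):
    """ASSUMPTION: align the backing device down to 4k (8 sectors),
    then subtract the estimated internal meta data."""
    aligned = backing_sectors - backing_sectors % 8
    return aligned - internal_md_sectors(backing_sectors)


sda9 = kib_to_sectors(126953631)    # fdisk -s /dev/sda9
drbd0 = kib_to_sectors(126949716)   # fdisk -s /dev/drbd0
drbd1_limit = 253891608             # "limit=" from the kernel message

print("sda9 :", sda9)
print("drbd0:", drbd0, "estimated:", usable_sectors(sda9))
print("drbd1:", drbd1_limit, "estimated:", usable_sectors(drbd0))
```

under these assumptions the estimates reproduce both reported sizes, which is consistent with the file system having been created on drbd0 before the upper meta data was placed at its end.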
> An extract of the drbd configuration

not commented further,
but you configure your lower drbd to call a handler on IO error,
which calls drbdadm detach all;
but your upper drbd ignores IO errors, passing them on to the file
system, which in your case apparently was set to "remount readonly" on
io errors.

(the "all" only acts on the "lower" resources;
 for the "upper" resources you'd need an additional
 "drbdadm --stacked all".
 and I really recommend leaving the setting on the default "detach",
 not calling any handler, and monitoring the system via some other means.)

> resource drbd-lan {
>     protocol C;
>     net {
>         shared-secret "XXXX";
>         ping-timeout 10; # wait 1 second for the ping reply
>         after-sb-0pri discard-older-primary;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri call-pri-lost-after-sb;
>         rr-conflict call-pri-lost;
>     }
>
>     syncer {
>         rate 10M; # limit so as not to slow down the system
>         al-extents 257;
>     }
>
>     handlers {
>         pri-lost-after-sb "logger -t drbd pri-lost-after-sb ;
> /root/scripts/redondance/drbr_alert.sh pri-lost-after-sb;
> /etc/init.d/heartbeat stop; /etc/init.d/drbd stop";
>         pri-lost "logger -t drbd pri-lost ;
> /root/scripts/redondance/drbr_alert.sh pri-lost;
> /etc/init.d/heartbeat stop; /etc/init.d/drbd stop";
>         local-io-error "logger -t drbd local-io-error ;
> drbdadm detach all; /root/scripts/redondance/drbr_alert.sh local-io-error";
>         split-brain "logger -t drbd split-brain ; /root/scripts/redondance/drbr_alert.sh split-brain ";
>     }
>
>     startup {
>         wfc-timeout 10;
>         degr-wfc-timeout 10;
>     }
>
>     disk {
>         on-io-error call-local-io-error;
>     }
>
>     on sv1.//xxxxxx.net { # Replace with the full name of your cluster1. To find it: uname -n
>         device /dev/drbd0;
>         disk /dev/sda9; # Replace with a dedicated partition; create a partition that is NOT mounted
>         address 192.168.35.1:7789; # Replace with the IP of your cluster1. To find it: ifconfig
>         meta-disk internal;
>     }
>
>     on sv2.xxxxxx//.net { # Replace with the full name of your cluster2. To find it: uname -n
>         device /dev/drbd0;
>         disk /dev/sda9; # Replace with a dedicated partition; create a partition that is NOT mounted
>         address 192.168.35.2:7789; # Replace with the IP of your cluster1. To find it: ifconfig
>         meta-disk internal;
>     }
> }
>
> resource drbd-wan {
>     protocol B;
>     net {
>         shared-secret "XXXXX";
>     }
>
>     syncer {
>         after drbd-lan;
>         *rate 256;*
>         al-extents 257;
>     }
>
>     stacked-on-top-of drbd-lan {
>         device /dev/drbd1;
>         address 172.16.29.3:7789; # Replace with the generic IP of cluster1
>     }
>
>     on sv3.xxxxxx.net { # Replace with the full name of your cluster3. To find it: uname -n
>         device /dev/drbd1;
>         disk /dev/sda9; # Replace with a dedicated partition; create a partition that is NOT mounted
>         address 172.28.15.3:7789; # Replace with the IP of your cluster1. To find it: ifconfig
>         meta-disk internal;
>     }
> }

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed