[DRBD-user] DRBD 8.3: Active resource in read-only mode after a JFS bug

Fri Jun 19 11:41:21 CEST 2009

On Fri, Jun 19, 2009 at 11:16:09AM +0200, Fabrice LE CREURER wrote:
> Hi,
> 
> We use DRBD 8.3 with three nodes, managed by Heartbeat. We had a
> problem between the first active lower device on SV1 and the stacked
> device on SV3. After this problem, the DRBD device was still active on
> SV1, but only in read-only mode! The third node was synchronized
> correctly, and we could use the DRBD device /dev/drbd1 normally.
> We have detected many network failures between the first virtual device
> and the stacked device, with no problems after resynchronization, but
> we got the following messages in production:
> 
> Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899392, limit=253891608
> Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
> 
> Thanks for your help.
> 
> Regards
> 
> 
>     The configuration:
>        SV1                          -|
>           lower device /dev/drbd0    |
>           device /dev/sda9           |
>        |                             |
>        |                             |-- SV3
>        |                             |      stacked device /dev/drbd1
>        |                             |      device /dev/sda9
>        SV2                           |
>           lower device /dev/drbd0   -|
>           device /dev/sda9

> Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899432, limit=253891608

> Partition /dev/drbd0 :
> > fdisk -s /dev/drbd0
> 126949716
> 
> We have the same disk structure on all nodes: /dev/sda9
> > fdisk -s /dev/sda9
> 126953631

fdisk -s reports sizes in kB (1024-byte blocks); the kernel messages use
512-byte sectors. Let's convert both to sectors:
sda9  -> 253907262
drbd0 -> 253899432 # this happens to be the "want=" above.
drbd1 -> 253891608 # from the "limit=" above.

sda9: 253907262 sectors, minus the size of the internal meta data
(aligned to 4k), gives a drbd0 size of 253899432. Correct.
drbd0 size, minus the size of the internal meta data of the "upper"
(stacked) drbd1, gives a drbd1 size of 253891608. Correct.
But your file system tries to access the sector corresponding
to the last sector of drbd0.
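The sector arithmetic above can be reproduced with a small Python sketch. It assumes the internal meta data size formula from the DRBD user's guide, ceil(Cs / 2^18) * 8 + 72 sectors for a backing device of Cs sectors, and it assumes the usable size is then rounded down to a 4 KiB (8-sector) boundary. Both are assumptions for illustration, not DRBD's actual code, though with these inputs they reproduce the sizes quoted in this thread. The helper names are hypothetical.

```python
# Sketch: reproduce the sector arithmetic from this thread.
# Assumptions: the internal meta data formula from the DRBD user's guide,
# and rounding the usable size down to 4 KiB (8 sectors). Helper names are hypothetical.

SECTORS_PER_KB = 2  # fdisk -s reports 1 KiB blocks = 2 x 512-byte sectors


def kb_to_sectors(kb: int) -> int:
    """Convert an `fdisk -s` size (1 KiB blocks) to 512-byte sectors."""
    return kb * SECTORS_PER_KB


def internal_md_sectors(data_sectors: int) -> int:
    """Internal meta data size in sectors: ceil(Cs / 2**18) * 8 + 72."""
    chunks = -(-data_sectors // 2**18)  # integer ceiling division
    return chunks * 8 + 72


def usable_sectors(backing_sectors: int) -> int:
    """Backing size minus internal meta data, rounded down to 8 sectors (4 KiB)."""
    raw = backing_sectors - internal_md_sectors(backing_sectors)
    return raw - raw % 8


sda9 = kb_to_sectors(126953631)   # fdisk -s /dev/sda9
drbd0 = usable_sectors(sda9)      # lower resource on top of sda9
drbd1 = usable_sectors(drbd0)     # stacked resource on top of drbd0

print(sda9)    # 253907262
print(drbd0)   # 253899432, same as fdisk -s /dev/drbd0 (126949716 kB) and "want="
print(drbd1)   # 253891608, same as "limit="
```

Each DRBD layer takes its own meta data from the end of the device below it, which is why the stacked drbd1 ends up 7824 sectors smaller than drbd0.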

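Conclusion: you created your file system on the (lower) drbd0,
and when you created the meta data for the (upper) drbd1,
you truncated your file system (the internal meta data of drbd1
lives in the last sectors of drbd0, where the end of your file system was).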
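To see why the kernel rejects the file system's request, here is a simplified model, written as an assumption rather than the kernel's exact code, of the block layer's end-of-device check: a request covering sectors [start, start + count) fails when its end, logged as "want=", lies past the device size, logged as "limit=". The helper `beyond_end` is hypothetical; the numbers come from the log lines and sizes in this thread.

```python
# Simplified model (assumption) of the end-of-device check behind
# "attempt to access beyond end of device": a request ending at sector
# `want` (exclusive) does not fit in a device of `limit` sectors if want > limit.

SECTOR_BYTES = 512


def beyond_end(want: int, limit: int) -> bool:
    """True if a request ending at sector `want` exceeds a device of `limit` sectors."""
    return want > limit


limit = 253891608                      # drbd1 size in sectors, from "limit="
logged_wants = [253899392, 253899432]  # "want=" values from the two log lines

for want in logged_wants:
    print(want, beyond_end(want, limit))  # both True

# The file system was sized for drbd0, but drbd1 is smaller by its own meta data:
drbd0 = 253899432
lost = drbd0 - limit
print(lost, "sectors =", lost * SECTOR_BYTES, "bytes")  # 7824 sectors = 4005888 bytes
```

So roughly the last 3.8 MiB of the file system lie outside drbd1, and any access there fails.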

> An extract of the drbd configuration

I won't comment on it in detail, but: you configured your lower DRBD
to call a handler on I/O errors, which calls "drbdadm detach all";
but your upper DRBD ignores I/O errors,
passing them on to the file system, which in your case was apparently
set to "remount read-only" on I/O errors.

(The "all" only acts on the "lower" resources; for the "upper" resources
you'd need an additional "drbdadm --stacked detach all".
And I really recommend leaving the setting at the default, "detach",
not calling any handler, and monitoring the system by some other means.)

> resource drbd-lan {
>         protocol C;
>         net {
>             shared-secret "XXXX";
>             ping-timeout    10; # wait 1 second for the ping reply
>             after-sb-0pri   discard-older-primary;
>             after-sb-1pri   discard-secondary;
>             after-sb-2pri   call-pri-lost-after-sb;
>             rr-conflict     call-pri-lost;
>         }
> 
>         syncer {
>             rate 10M;     # limit so the system is not slowed down
>             al-extents 257;
>         }
> 
>         handlers {
>             pri-lost-after-sb "logger -t drbd pri-lost-after-sb ;
> 		/root/scripts/redondance/drbr_alert.sh pri-lost-after-sb;
> 		/etc/init.d/heartbeat stop; /etc/init.d/drbd stop";
>             pri-lost "logger -t drbd pri-lost ;
> 		/root/scripts/redondance/drbr_alert.sh pri-lost;
> 		/etc/init.d/heartbeat stop; /etc/init.d/drbd stop";
>             local-io-error "logger -t drbd local-io-error ;
> 		drbdadm detach all; /root/scripts/redondance/drbr_alert.sh local-io-error";
>             split-brain "logger -t drbd split-brain ; /root/scripts/redondance/drbr_alert.sh split-brain ";
>         }
> 
>         startup {
>             wfc-timeout 10;
>             degr-wfc-timeout 10;
>         }
> 
>         disk {
>             on-io-error call-local-io-error;
>         }
> 
>         on sv1.xxxxxx.net {         # Replace with the full name of your cluster1. To find it: uname -n
>             device    /dev/drbd0;
>             disk      /dev/sda9;        # Replace with a dedicated partition; create an UNMOUNTED partition
>             address   192.168.35.1:7789;  # Replace with the IP of your cluster1. To find it: ifconfig
>             meta-disk  internal;
>         }
> 
>         on sv2.xxxxxx.net {         # Replace with the full name of your cluster2. To find it: uname -n
>             device    /dev/drbd0;
>             disk      /dev/sda9;        # Replace with a dedicated partition; create an UNMOUNTED partition
>             address   192.168.35.2:7789;  # Replace with the IP of your cluster1. To find it: ifconfig
>             meta-disk  internal;
>         }
>     }
> 
> resource drbd-wan {
>         protocol B;
>         net {
>             shared-secret "XXXXX";
>         }
> 
>         syncer {
>             after drbd-lan;
>             rate 256;
>             al-extents 257;
>         }
> 
>         stacked-on-top-of drbd-lan {
>             device    /dev/drbd1;
>             address   172.16.29.3:7789;  # Replace with the generic IP of cluster1
>         }
> 
>         on sv3.xxxxxx.net {         # Replace with the full name of your cluster3. To find it: uname -n
>             device    /dev/drbd1;
>             disk      /dev/sda9;        # Replace with a dedicated partition; create an UNMOUNTED partition
>             address   172.28.15.3:7789;  # Replace with the IP of your cluster1. To find it: ifconfig
>             meta-disk  internal;
>         }
>     }

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
