[DRBD-user] DRBD 8.3: Active resource in read-only mode after a JFS bug

Fabrice LE CREURER f.lecreurer at numlog.fr
Fri Jun 19 11:16:09 CEST 2009



Hi,

We use DRBD 8.3 with three nodes managed by Heartbeat. We had a
problem between the active lower device on SV1 and the stacked device
on SV3. After this problem, the DRBD device was still active on SV1, but
only in read-only mode! The third node was synchronized correctly and we
could use the DRBD device /dev/drbd1 normally.
We have seen many network failures between the lower device and the
stacked device, but the resynchronizations afterwards completed without
problems. However, we got the following messages in production:

Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899392, limit=253891608
Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device

Thanks for your help.

Regards


    The configuration:

       SV1                           -|
          lower device /dev/drbd0     |
          device /dev/sda9            |
       |                              |
       |                              |----  SV3
       |                              |        stacked device /dev/drbd1
       |                              |        device /dev/sda9
       SV2                            |
          lower device /dev/drbd0    -|
          device /dev/sda9


An extract of the kernel log:

ACTIVE SV1:

Jun 18 15:54:08 sv1 kernel: drbd1: sock was shut down by peer
Jun 18 15:54:08 sv1 kernel: drbd1: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Jun 18 15:54:08 sv1 kernel: drbd1: short read expecting header on sock: r=0
Jun 18 15:54:08 sv1 kernel: drbd1: Creating new current UUID
Jun 18 15:54:08 sv1 kernel: drbd1: meta connection shut down by peer.
Jun 18 15:54:08 sv1 kernel: drbd1: asender terminated
Jun 18 15:54:08 sv1 kernel: drbd1: Terminating asender thread
Jun 18 15:54:08 sv1 kernel: drbd1: Connection closed
Jun 18 15:54:08 sv1 kernel: drbd1: conn( BrokenPipe -> Unconnected )
Jun 18 15:54:08 sv1 kernel: drbd1: receiver terminated
Jun 18 15:54:08 sv1 kernel: drbd1: Restarting receiver thread
Jun 18 15:54:08 sv1 kernel: drbd1: receiver (re)started
Jun 18 15:54:08 sv1 kernel: drbd1: conn( Unconnected -> WFConnection )
Jun 18 15:54:17 sv1 kernel: drbd1: Handshake successful: Agreed network protocol version 89
Jun 18 15:54:17 sv1 kernel: drbd1: conn( WFConnection -> WFReportParams )
Jun 18 15:54:17 sv1 kernel: drbd1: Starting asender thread (from drbd1_receiver [7041])
Jun 18 15:54:17 sv1 kernel: drbd1: data-integrity-alg: <not-used>
Jun 18 15:54:17 sv1 kernel: drbd1: drbd_sync_handshake:
Jun 18 15:54:17 sv1 kernel: drbd1: self B7DD0B7100D56B0F:CB96E403D0949D99:6BB6E20829B0E07E:49E18AD4C1339A69 bits:53 flags:0
Jun 18 15:54:17 sv1 kernel: drbd1: peer CB96E403D0949D98:0000000000000000:6BB6E20829B0E07E:49E18AD4C1339A69 bits:0 flags:0
Jun 18 15:54:17 sv1 kernel: drbd1: uuid_compare()=1 by rule 7
Jun 18 15:54:17 sv1 kernel: drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Jun 18 15:54:58 sv1 kernel: drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Jun 18 15:54:58 sv1 kernel: drbd1: Began resync as SyncSource (will sync 212 KB [53 bits set]).
Jun 18 15:55:03 sv1 kernel: drbd1: Resync done (total 4 sec; paused 0 sec; 52 K/sec)
Jun 18 15:55:03 sv1 kernel: drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
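The handshake lines above can be re-checked by hand. Below is a minimal Python sketch under two assumptions that are not stated in the logs: that DRBD 8.3's `drbd_uuid_compare()` masks off the lowest bit of each UUID (used as a flag) before comparing, and that one bitmap bit tracks 4 KiB of data. It reproduces only the single comparison that makes the local node SyncSource here (our bitmap UUID equals the peer's current UUID), not DRBD's full rule table; the helper names are hypothetical.

```python
# Minimal sketch: re-check the sync handshake values from the sv1 log.
# Assumptions (not from the original post): the lowest UUID bit is a flag
# that DRBD ignores when comparing, and one bitmap bit covers 4 KiB.

BITMAP_BIT_KIB = 4  # assumed amount of data tracked by one bitmap bit


def parse_uuids(field: str) -> list[int]:
    """Split 'current:bitmap:history1:history2' into integers."""
    return [int(part, 16) for part in field.split(":")]


def self_is_sync_source(self_uuids: list[int], peer_uuids: list[int]) -> bool:
    """True if our bitmap UUID equals the peer's current UUID, ignoring bit 0."""
    self_bitmap = self_uuids[1] & ~1
    peer_current = peer_uuids[0] & ~1
    return self_bitmap == peer_current


# Values copied from the "self" and "peer" lines of drbd_sync_handshake.
self_uuids = parse_uuids("B7DD0B7100D56B0F:CB96E403D0949D99:6BB6E20829B0E07E:49E18AD4C1339A69")
peer_uuids = parse_uuids("CB96E403D0949D98:0000000000000000:6BB6E20829B0E07E:49E18AD4C1339A69")
bits_set = 53  # "bits:53" on the self line

print("sv1 is SyncSource:", self_is_sync_source(self_uuids, peer_uuids))
print("data to resync:", bits_set * BITMAP_BIT_KIB, "KiB")
```

Running it prints `sv1 is SyncSource: True` and `data to resync: 212 KiB`, which is consistent with `uuid_compare()=1` and "will sync 212 KB [53 bits set]" in the log.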

Jun 18 16:40:43 sv1 kernel: ERROR: (device drbd1): diAllocIno: nfreeinos = 0, but iag on freelist

This is a JFS error on the partition.

Just before this problem, we got these lines (all nodes were initialized as described in the DRBD 8.3 guide, on Debian):

Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899392, limit=253891608
Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899400, limit=253891608
Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899408, limit=253891608
Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899416, limit=253891608
Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899424, limit=253891608
Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899432, limit=253891608
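The `want` and `limit` values are easier to compare side by side. A minimal Python sketch, assuming (as in the Linux block layer's "attempt to access beyond end of device" check) that both are counts of 512-byte sectors; it lists how far past the end of drbd1 each logged request reaches. The function name `overruns` is hypothetical.

```python
import re

# Sample lines copied from the sv1 kernel log above.
LOG = """\
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899392, limit=253891608
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899400, limit=253891608
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899408, limit=253891608
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899416, limit=253891608
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899424, limit=253891608
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899432, limit=253891608
"""

# Captures the device name, the requested end sector and the device limit.
PATTERN = re.compile(r"(\w+): rw=\d+, want=(\d+), limit=(\d+)")


def overruns(log: str) -> list[tuple[str, int, int]]:
    """Return (device, want, sectors past the end) for each logged request."""
    result = []
    for match in PATTERN.finditer(log):
        device, want, limit = match.group(1), int(match.group(2)), int(match.group(3))
        result.append((device, want, want - limit))
    return result


for device, want, excess in overruns(LOG):
    # Assumption: want/limit are 512-byte sectors, so two sectors make 1 KiB.
    print(f"{device}: request ends at sector {want}, "
          f"{excess} sectors ({excess // 2} KiB) past the end")
```

Each request is 8 sectors (4 KiB) further than the previous one, and the last one ends 7824 sectors (3912 KiB) past the end of drbd1.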

Partition /dev/drbd0:
> fdisk -s /dev/drbd0
126949716

We have the same disk layout on all nodes: /dev/sda9
> fdisk -s /dev/sda9
126953631
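These sizes can be compared with the `limit` in the log, keeping in mind that `fdisk -s` reports 1 KiB blocks while the kernel message counts 512-byte sectors. The sketch below uses the meta-data estimate from the DRBD 8.3 user's guide, Ms = ceil(Cs / 2^18) * 8 + 72 sectors, and assumes two things not stated in the post: that the stacked resource also keeps its meta-data internally (the `stacked-on-top-of` section does not set `meta-disk`), and that DRBD rounds the resulting size down to 4 KiB. Under those assumptions it reproduces 126949716 KiB for /dev/drbd0 and exactly `limit=253891608` for /dev/drbd1; note that the largest `want` (253899432) is the full size of /dev/drbd0. One possible reading, not confirmed here, is that the JFS filesystem was created with the size of /dev/drbd0 rather than the smaller stacked /dev/drbd1. The function names are hypothetical.

```python
# Minimal sketch: estimate the size DRBD exposes on top of a backing device
# with internal meta-data, using the DRBD 8.3 user's guide estimate
#   Ms = ceil(Cs / 2**18) * 8 + 72   (both in 512-byte sectors)
# Assumption (not from the post): the usable size is rounded down to 4 KiB.


def meta_data_sectors(backing_sectors: int) -> int:
    """Internal meta-data size in sectors (ceiling division by 2**18)."""
    return -(-backing_sectors // 2**18) * 8 + 72


def drbd_sectors(backing_sectors: int) -> int:
    """Usable DRBD device size in sectors, rounded down to 8 sectors (4 KiB)."""
    usable = backing_sectors - meta_data_sectors(backing_sectors)
    return usable - usable % 8


sda9 = 126953631 * 2         # fdisk -s reports 1 KiB blocks -> sectors
drbd0 = drbd_sectors(sda9)   # lower device on sv1/sv2
drbd1 = drbd_sectors(drbd0)  # stacked device, assumed internal meta-data

print("drbd0:", drbd0, "sectors =", drbd0 // 2, "KiB")
print("drbd1:", drbd1, "sectors =", drbd1 // 2, "KiB")
```

It prints `drbd0: 253899432 sectors = 126949716 KiB` and `drbd1: 253891608 sectors = 126945804 KiB`.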

SLAVE SV3:
Jun 18 15:53:52 sv3 kernel: drbd1: PingAck did not arrive in time.
Jun 18 15:53:52 sv3 kernel: drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jun 18 15:53:52 sv3 kernel: drbd1: asender terminated
Jun 18 15:53:52 sv3 kernel: drbd1: Terminating asender thread
Jun 18 15:53:52 sv3 kernel: drbd1: short read expecting header on sock: r=-512
Jun 18 15:53:52 sv3 kernel: drbd1: Connection closed
Jun 18 15:53:52 sv3 kernel: drbd1: conn( NetworkFailure -> Unconnected )


An extract of the DRBD configuration:

resource drbd-lan {
        protocol C;
        net {
            shared-secret "XXXX";
            ping-timeout    10; # wait 1 second for the ping reply
            after-sb-0pri   discard-older-primary;
            after-sb-1pri   discard-secondary;
            after-sb-2pri   call-pri-lost-after-sb;
            rr-conflict     call-pri-lost;
        }

        syncer {
            rate 10M;     # Limit so as not to slow down the system
            al-extents 257;
        }

        handlers {
            pri-lost-after-sb "logger -t drbd pri-lost-after-sb ;
		/root/scripts/redondance/drbr_alert.sh pri-lost-after-sb;
		/etc/init.d/heartbeat stop; /etc/init.d/drbd stop";
            pri-lost "logger -t drbd pri-lost ;
		/root/scripts/redondance/drbr_alert.sh pri-lost;
		/etc/init.d/heartbeat stop; /etc/init.d/drbd stop";
            local-io-error "logger -t drbd local-io-error ;
		drbdadm detach all; /root/scripts/redondance/drbr_alert.sh local-io-error";
            split-brain "logger -t drbd split-brain ; /root/scripts/redondance/drbr_alert.sh split-brain ";
        }

        startup {
            wfc-timeout 10;
            degr-wfc-timeout 10;
        }

        disk {
            on-io-error call-local-io-error;
        }

        on sv1.//xxxxxx.net {         # Replace with the full name of your cluster1. To find it: uname -n
            device    /dev/drbd0;
            disk      /dev/sda9;        # Replace with a dedicated partition; create an UNMOUNTED partition
            address   192.168.35.1:7789;  # Replace with the IP of your cluster1. To find it: ifconfig
            meta-disk  internal;
        }

        on sv2.xxxxxx//.net {         # Replace with the full name of your cluster2. To find it: uname -n
            device    /dev/drbd0;
            disk      /dev/sda9;        # Replace with a dedicated partition; create an UNMOUNTED partition
            address   192.168.35.2:7789;  # Replace with the IP of your cluster2. To find it: ifconfig
            meta-disk  internal;
        }
    }

resource drbd-wan {
        protocol B;
        net {
            shared-secret "XXXXX";
        }

        syncer {
            after drbd-lan;
            rate 256;
            al-extents 257;
        }

        stacked-on-top-of drbd-lan {
            device    /dev/drbd1;
            address   172.16.29.3:7789;  # Replace with the generic (virtual) IP of cluster1
        }

        on sv3.xxxxxx.net {         # Replace with the full name of your cluster3. To find it: uname -n
            device    /dev/drbd1;
            disk      /dev/sda9;        # Replace with a dedicated partition; create an UNMOUNTED partition
            address   172.28.15.3:7789;  # Replace with the IP of your cluster3. To find it: ifconfig
            meta-disk  internal;
        }
    }


-- 
Fabrice LE CREURER
Développement / Support technique EDTI FT-MASTER
Developer engineer / Helpdesk FT-MASTER product
NUMLOG - Internet : http://www.numlog.fr
Tel : (+33) 1 30 79 16 16 - Fax: (+33) 1 30 81 92 86



