Hi,

We use DRBD 8.3 with three nodes managed by Heartbeat. We had a problem between the first active lower device (SV1) and the stacked device (SV3). After this problem, the DRBD device was still active on SV1, but read-only!

All three nodes were synchronized correctly and we could use the DRBD device /dev/drbd1 normally. We have seen many network failures between the first virtual device and the stacked device, with no problem after resynchronization, but we got the following messages in production:

  Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899392, limit=253891608
  Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device

Thanks for your help.

Regards

The configuration:

  SV1: lower device /dev/drbd0, disk /dev/sda9 --.
                                                  >-- SV3: stacked device /dev/drbd1, disk /dev/sda9
  SV2: lower device /dev/drbd0, disk /dev/sda9 --'

An extract of the kernel log on the active node, SV1:

  Jun 18 15:54:08 sv1 kernel: drbd1: sock was shut down by peer
  Jun 18 15:54:08 sv1 kernel: drbd1: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
  Jun 18 15:54:08 sv1 kernel: drbd1: short read expecting header on sock: r=0
  Jun 18 15:54:08 sv1 kernel: drbd1: Creating new current UUID
  Jun 18 15:54:08 sv1 kernel: drbd1: meta connection shut down by peer.
  Jun 18 15:54:08 sv1 kernel: drbd1: asender terminated
  Jun 18 15:54:08 sv1 kernel: drbd1: Terminating asender thread
  Jun 18 15:54:08 sv1 kernel: drbd1: Connection closed
  Jun 18 15:54:08 sv1 kernel: drbd1: conn( BrokenPipe -> Unconnected )
  Jun 18 15:54:08 sv1 kernel: drbd1: receiver terminated
  Jun 18 15:54:08 sv1 kernel: drbd1: Restarting receiver thread
  Jun 18 15:54:08 sv1 kernel: drbd1: receiver (re)started
  Jun 18 15:54:08 sv1 kernel: drbd1: conn( Unconnected -> WFConnection )
  Jun 18 15:54:17 sv1 kernel: drbd1: Handshake successful: Agreed network protocol version 89
  Jun 18 15:54:17 sv1 kernel: drbd1: conn( WFConnection -> WFReportParams )
  Jun 18 15:54:17 sv1 kernel: drbd1: Starting asender thread (from drbd1_receiver [7041])
  Jun 18 15:54:17 sv1 kernel: drbd1: data-integrity-alg: <not-used>
  Jun 18 15:54:17 sv1 kernel: drbd1: drbd_sync_handshake:
  Jun 18 15:54:17 sv1 kernel: drbd1: self B7DD0B7100D56B0F:CB96E403D0949D99:6BB6E20829B0E07E:49E18AD4C1339A69 bits:53 flags:0
  Jun 18 15:54:17 sv1 kernel: drbd1: peer CB96E403D0949D98:0000000000000000:6BB6E20829B0E07E:49E18AD4C1339A69 bits:0 flags:0
  Jun 18 15:54:17 sv1 kernel: drbd1: uuid_compare()=1 by rule 7
  Jun 18 15:54:17 sv1 kernel: drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
  Jun 18 15:54:58 sv1 kernel: drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
  Jun 18 15:54:58 sv1 kernel: drbd1: Began resync as SyncSource (will sync 212 KB [53 bits set]).
  Jun 18 15:55:03 sv1 kernel: drbd1: Resync done (total 4 sec; paused 0 sec; 52 K/sec)
  Jun 18 15:55:03 sv1 kernel: drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
  Jun 18 16:40:43 sv1 kernel: ERROR: (device drbd1): diAllocIno: nfreeinos = 0, but iag on freelist

That last line is a JFS error on the partition.

Just before this problem, we got these lines (all nodes were initialized following the DRBD 8.3 guide, on Debian):

  Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899392, limit=253891608
  Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
  Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899400, limit=253891608
  Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
  Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899408, limit=253891608
  Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
  Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899416, limit=253891608
  Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
  Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899424, limit=253891608
  Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
  Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899432, limit=253891608

Partition /dev/drbd0:

  > fdisk -s /dev/drbd0
  126949716

We have the same disk layout on all nodes, /dev/sda9:

  > fdisk -s /dev/sda9
  126953631

Slave node SV3:

  Jun 18 15:53:52 sv3 kernel: drbd1: PingAck did not arrive in time.
  Jun 18 15:53:52 sv3 kernel: drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
  Jun 18 15:53:52 sv3 kernel: drbd1: asender terminated
  Jun 18 15:53:52 sv3 kernel: drbd1: Terminating asender thread
  Jun 18 15:53:52 sv3 kernel: drbd1: short read expecting header on sock: r=-512
  Jun 18 15:53:52 sv3 kernel: drbd1: Connection closed
  Jun 18 15:53:52 sv3 kernel: drbd1: conn( NetworkFailure -> Unconnected )

An extract of the DRBD configuration:

  resource drbd-lan {
    protocol C;
    net {
      shared-secret "XXXX";
      ping-timeout 10;  # wait 1 second for the ping reply
      after-sb-0pri discard-older-primary;
      after-sb-1pri discard-secondary;
      after-sb-2pri call-pri-lost-after-sb;
      rr-conflict call-pri-lost;
    }
    syncer {
      rate 10M;  # limited so as not to slow down the system
      al-extents 257;
    }
    handlers {
      pri-lost-after-sb "logger -t drbd pri-lost-after-sb ; /root/scripts/redondance/drbr_alert.sh pri-lost-after-sb; /etc/init.d/heartbeat stop; /etc/init.d/drbd stop";
      pri-lost "logger -t drbd pri-lost ; /root/scripts/redondance/drbr_alert.sh pri-lost; /etc/init.d/heartbeat stop; /etc/init.d/drbd stop";
      local-io-error "logger -t drbd local-io-error ; drbdadm detach all; /root/scripts/redondance/drbr_alert.sh local-io-error";
      split-brain "logger -t drbd split-brain ; /root/scripts/redondance/drbr_alert.sh split-brain ";
    }
    startup {
      wfc-timeout 10;
      degr-wfc-timeout 10;
    }
    disk {
      on-io-error call-local-io-error;
    }
    on sv1.xxxxxx.net {  # replace with the full hostname of your cluster1 (find it with uname -n)
      device /dev/drbd0;
      disk /dev/sda9;  # replace with a dedicated partition; create an UNMOUNTED partition
      address 192.168.35.1:7789;  # replace with the IP of your cluster1 (find it with ifconfig)
      meta-disk internal;
    }
    on sv2.xxxxxx.net {  # replace with the full hostname of your cluster2 (find it with uname -n)
      device /dev/drbd0;
      disk /dev/sda9;  # replace with a dedicated partition; create an UNMOUNTED partition
      address 192.168.35.2:7789;  # replace with the IP of your cluster2 (find it with ifconfig)
      meta-disk internal;
    }
  }

  resource drbd-wan {
    protocol B;
    net {
      shared-secret "XXXXX";
    }
    syncer {
      after drbd-lan;
      rate 256;
      al-extents 257;
    }
    stacked-on-top-of drbd-lan {
      device /dev/drbd1;
      address 172.16.29.3:7789;  # replace with the generic IP of cluster1
    }
    on sv3.xxxxxx.net {  # replace with the full hostname of your cluster3 (find it with uname -n)
      device /dev/drbd1;
      disk /dev/sda9;  # replace with a dedicated partition; create an UNMOUNTED partition
      address 172.28.15.3:7789;  # replace with the IP of your cluster3 (find it with ifconfig)
      meta-disk internal;
    }
  }

-- 
Fabrice LE CREURER
Development / Technical support, EDTI FT-MASTER
Developer engineer / Helpdesk FT-MASTER product
NUMLOG - Internet : http://www.numlog.fr
Tel : (+33) 1 30 79 16 16 - Fax: (+33) 1 30 81 92 86
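The size figures in the log can be cross-checked with a little arithmetic. The Python sketch below assumes the internal meta-data estimate from the DRBD User's Guide, Ms = ceil(Cs / 2^18) * 8 + 72, where Cs is the size of the backing device and both values are in 512-byte sectors. It also assumes that the backing device of the stacked resource drbd-wan on SV1 is /dev/drbd0, with the size reported by fdisk -s (in 1 KiB blocks). The helper name internal_md_sectors is hypothetical.

```python
# Sketch: compare the "attempt to access beyond end of device" numbers with an
# ESTIMATE of DRBD internal meta-data size. Assumed formula (DRBD User's Guide):
#   Ms = ceil(Cs / 2^18) * 8 + 72, with Cs and Ms in 512-byte sectors.

SECTORS_PER_KIB = 2  # fdisk -s reports 1 KiB blocks; the kernel log uses 512-byte sectors


def internal_md_sectors(data_sectors: int) -> int:
    """Hypothetical helper: estimated internal meta-data size, in sectors."""
    # Integer ceiling division, so no floating-point rounding is involved.
    return -(-data_sectors // 2**18) * 8 + 72


# Value reported by "fdisk -s /dev/drbd0" on SV1 (assumed backing device of drbd1).
drbd0_kib = 126949716
drbd0_sectors = drbd0_kib * SECTORS_PER_KIB

md_sectors = internal_md_sectors(drbd0_sectors)
drbd1_sectors = drbd0_sectors - md_sectors  # space left for the stacked device

# "want" and "limit" values copied from the SV1 kernel log.
wants = [253899392, 253899400, 253899408, 253899416, 253899424, 253899432]
logged_limit = 253891608

print(f"drbd0 size:           {drbd0_sectors} sectors")
print(f"estimated meta data:  {md_sectors} sectors")
print(f"estimated drbd1 size: {drbd1_sectors} sectors")
print(f"logged limit:         {logged_limit} sectors")

# Do all failed requests end past drbd1's end but still within drbd0's size?
inside_md_area = all(logged_limit < w <= drbd0_sectors for w in wants)
print("requests end inside the stacked meta-data area:", inside_md_area)
```

Under these assumptions the estimated size of /dev/drbd1 matches the logged limit exactly, and every failing request ends within the last sectors of /dev/drbd0, the area the stacked resource keeps for its own meta data. One possible reading is that the JFS file system was sized for /dev/drbd0 rather than /dev/drbd1; comparing the file system size with the output of blockdev --getsz /dev/drbd1 would confirm or rule this out.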