Hi,
We use DRBD 8.3 with three nodes managed by Heartbeat. We have a
problem between the active lower device on SV1 and the stacked device
on SV3. After this problem, the DRBD device was still active on SV1, but
read-only! All three nodes were synchronized correctly and we
could use the DRBD device /dev/drbd1 normally.
We have detected many network failures between the first virtual device and
the stacked device, with no problems after the resynchronizations, but we got
the following messages in production :
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899392, limit=253891608
Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
Thanks for your help.
Regards
The configuration :
SV1
  lower device /dev/drbd0 --|
  device /dev/sda9          |
                            |
                            |---- SV3
                            |       stacked device /dev/drbd1
                            |       device /dev/sda9
SV2                         |
  lower device /dev/drbd0 --|
  device /dev/sda9
An extract of the kernel log :
ACTIVE SV1 :
Jun 18 15:54:08 sv1 kernel: drbd1: sock was shut down by peer
Jun 18 15:54:08 sv1 kernel: drbd1: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Jun 18 15:54:08 sv1 kernel: drbd1: short read expecting header on sock: r=0
Jun 18 15:54:08 sv1 kernel: drbd1: Creating new current UUID
Jun 18 15:54:08 sv1 kernel: drbd1: meta connection shut down by peer.
Jun 18 15:54:08 sv1 kernel: drbd1: asender terminated
Jun 18 15:54:08 sv1 kernel: drbd1: Terminating asender thread
Jun 18 15:54:08 sv1 kernel: drbd1: Connection closed
Jun 18 15:54:08 sv1 kernel: drbd1: conn( BrokenPipe -> Unconnected )
Jun 18 15:54:08 sv1 kernel: drbd1: receiver terminated
Jun 18 15:54:08 sv1 kernel: drbd1: Restarting receiver thread
Jun 18 15:54:08 sv1 kernel: drbd1: receiver (re)started
Jun 18 15:54:08 sv1 kernel: drbd1: conn( Unconnected -> WFConnection )
Jun 18 15:54:17 sv1 kernel: drbd1: Handshake successful: Agreed network protocol version 89
Jun 18 15:54:17 sv1 kernel: drbd1: conn( WFConnection -> WFReportParams )
Jun 18 15:54:17 sv1 kernel: drbd1: Starting asender thread (from drbd1_receiver [7041])
Jun 18 15:54:17 sv1 kernel: drbd1: data-integrity-alg: <not-used>
Jun 18 15:54:17 sv1 kernel: drbd1: drbd_sync_handshake:
Jun 18 15:54:17 sv1 kernel: drbd1: self B7DD0B7100D56B0F:CB96E403D0949D99:6BB6E20829B0E07E:49E18AD4C1339A69 bits:53 flags:0
Jun 18 15:54:17 sv1 kernel: drbd1: peer CB96E403D0949D98:0000000000000000:6BB6E20829B0E07E:49E18AD4C1339A69 bits:0 flags:0
Jun 18 15:54:17 sv1 kernel: drbd1: uuid_compare()=1 by rule 7
Jun 18 15:54:17 sv1 kernel: drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Jun 18 15:54:58 sv1 kernel: drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Jun 18 15:54:58 sv1 kernel: drbd1: Began resync as SyncSource (will sync 212 KB [53 bits set]).
Jun 18 15:55:03 sv1 kernel: drbd1: Resync done (total 4 sec; paused 0 sec; 52 K/sec)
Jun 18 15:55:03 sv1 kernel: drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Jun 18 16:40:43 sv1 kernel: ERROR: (device drbd1): diAllocIno: nfreeinos = 0, but iag on freelist
JFS problem with the partition
Just before this problem, we got these lines (all nodes were initialized following the DRBD 8.3 guide on Debian) :
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899392, limit=253891608
Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899400, limit=253891608
Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899408, limit=253891608
Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899416, limit=253891608
Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899424, limit=253891608
Jun 18 15:43:45 sv1 kernel: attempt to access beyond end of device
Jun 18 15:43:45 sv1 kernel: drbd1: rw=25, want=253899432, limit=253891608
Partition /dev/drbd0 :
> fdisk -s /dev/drbd0
126949716
We have the same disk layout on all nodes : /dev/sda9
> fdisk -s /dev/sda9
126953631*
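The numbers above can be cross-checked with a little arithmetic. The sketch below is only that, a sketch: it assumes that the kernel reports want and limit in 512-byte sectors and that fdisk -s reports 1 KiB blocks, and the variable and function names are ours, chosen for illustration. Under those assumptions the highest want value is exactly the size of /dev/drbd0, which may mean that something addressed /dev/drbd1 as if it were as large as /dev/drbd0, but we cannot confirm that from the logs alone.

```python
# Assumption: kernel "want"/"limit" are 512-byte sectors, fdisk -s prints 1 KiB blocks.
SECTOR_BYTES = 512
KIB_BYTES = 1024

drbd0_kib = 126949716          # fdisk -s /dev/drbd0
sda9_kib = 126953631           # fdisk -s /dev/sda9
limit_sectors = 253891608      # "limit" from the kernel messages (size of drbd1)
want_max_sectors = 253899432   # highest "want" in the kernel messages


def sectors_to_kib(sectors):
    """Convert a count of 512-byte sectors to KiB."""
    return sectors * SECTOR_BYTES // KIB_BYTES


drbd1_kib = sectors_to_kib(limit_sectors)
want_max_kib = sectors_to_kib(want_max_sectors)

print("drbd1 size (limit):        ", drbd1_kib, "KiB")
print("highest want:              ", want_max_kib, "KiB")
print("drbd0 - drbd1:             ", drbd0_kib - drbd1_kib, "KiB")
print("sda9  - drbd0:             ", sda9_kib - drbd0_kib, "KiB")
print("highest want == drbd0 size:", want_max_kib == drbd0_kib)
```

The two differences (a few MiB each) would be consistent with the space taken by internal meta-data at each level, but that interpretation is also an assumption on our side.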
SLAVE SV3 :
Jun 18 15:53:52 sv3 kernel: drbd1: PingAck did not arrive in time.
Jun 18 15:53:52 sv3 kernel: drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jun 18 15:53:52 sv3 kernel: drbd1: asender terminated
Jun 18 15:53:52 sv3 kernel: drbd1: Terminating asender thread
Jun 18 15:53:52 sv3 kernel: drbd1: short read expecting header on sock: r=-512
Jun 18 15:53:52 sv3 kernel: drbd1: Connection closed
Jun 18 15:53:52 sv3 kernel: drbd1: conn( NetworkFailure -> Unconnected )
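To follow the state changes in extracts like the two above, the transitions can be pulled out mechanically. This is a minimal Python sketch, assuming the "field( Old -> New )" format seen in these lines; the regular expression and the function name parse_transitions are ours and are not part of DRBD.

```python
import re

# Matches e.g. "conn( Connected -> NetworkFailure )" as seen in the log extracts.
TRANSITION = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")


def parse_transitions(line):
    """Return a list of (field, old_state, new_state) tuples found in one log line."""
    return TRANSITION.findall(line)


sample = (
    "Jun 18 15:53:52 sv3 kernel: drbd1: peer( Primary -> Unknown ) "
    "conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )"
)

for field, old, new in parse_transitions(sample):
    print(f"{field}: {old} -> {new}")
```

Run over a whole log file, line by line, this gives a compact timeline of the peer, conn and pdsk changes on each node.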
An extract of the DRBD configuration :
resource drbd-lan {
  protocol C;
  net {
    shared-secret "XXXX";
    ping-timeout 10; # wait 1 second for the ping reply
    after-sb-0pri discard-older-primary;
    after-sb-1pri discard-secondary;
    after-sb-2pri call-pri-lost-after-sb;
    rr-conflict call-pri-lost;
  }
  syncer {
    rate 10M; # limit so as not to slow down the system
    al-extents 257;
  }
  handlers {
    pri-lost-after-sb "logger -t drbd pri-lost-after-sb ;
      /root/scripts/redondance/drbr_alert.sh pri-lost-after-sb;
      /etc/init.d/heartbeat stop; /etc/init.d/drbd stop";
    pri-lost "logger -t drbd pri-lost ;
      /root/scripts/redondance/drbr_alert.sh pri-lost;
      /etc/init.d/heartbeat stop; /etc/init.d/drbd stop";
    local-io-error "logger -t drbd local-io-error ;
      drbdadm detach all; /root/scripts/redondance/drbr_alert.sh local-io-error";
    split-brain "logger -t drbd split-brain ; /root/scripts/redondance/drbr_alert.sh split-brain ";
  }
  startup {
    wfc-timeout 10;
    degr-wfc-timeout 10;
  }
  disk {
    on-io-error call-local-io-error;
  }
  on sv1.//xxxxxx.net { # replace with the full name of your cluster1; find it with uname -n
    device /dev/drbd0;
    disk /dev/sda9; # replace with a dedicated partition; create a partition that is NOT mounted
    address 192.168.35.1:7789; # replace with the IP of your cluster1; find it with ifconfig
    meta-disk internal;
  }
  on sv2.xxxxxx//.net { # replace with the full name of your cluster2; find it with uname -n
    device /dev/drbd0;
    disk /dev/sda9; # replace with a dedicated partition; create a partition that is NOT mounted
    address 192.168.35.2:7789; # replace with the IP of your cluster1; find it with ifconfig
    meta-disk internal;
  }
}
resource drbd-wan {
  protocol B;
  net {
    shared-secret "XXXXX";
  }
  syncer {
    after drbd-lan;
    rate 256;
    al-extents 257;
  }
  stacked-on-top-of drbd-lan {
    device /dev/drbd1;
    address 172.16.29.3:7789; # replace with the generic IP of cluster1
  }
  on sv3.xxxxxx.net { # replace with the full name of your cluster3; find it with uname -n
    device /dev/drbd1;
    disk /dev/sda9; # replace with a dedicated partition; create a partition that is NOT mounted
    address 172.28.15.3:7789; # replace with the IP of your cluster1; find it with ifconfig
    meta-disk internal;
  }
}
--
Fabrice LE CREURER
Développement / Support technique EDTI FT-MASTER
Developer engineer / Helpdesk FT-MASTER product
NUMLOG - Internet : http://www.numlog.fr
Tel : (+33) 1 30 79 16 16 - Fax: (+33) 1 30 81 92 86