Hi,
I hope someone can help me and can understand what I am trying to describe.
According to our programming team, we have been encountering I/O errors.
I set up and tested the DRBD + HA servers myself (I am quite new to
this).
I have included all the information I can think of in this email.
Our problem is that every now and then 'something' starts running and
transmitting between my two DRBD servers. The programmers suspect that a
DRBD issue is what prevents them from reconnecting to the storage server.
Scenario:
For the last 2 months, at a certain time of day (unknown to us, because
it changes, but mostly after 8pm), DFS1 and DFS2 show very high transmit
activity over NFS, which leaves the other Fedora 7 / Fedora 8 servers
unable to reconnect over NFS. Those servers then get I/O errors reporting
a connection timeout, and they keep retrying.
I am not sure how to explain this better. Could anyone please check my
settings and tell me whether I should fine-tune anything?
We have checked DFS1 and DFS2, and nothing is scheduled to sync or
replicate between them.
Server OS detail:
Fedora 7
kernel-2.6.22.9-91.fc7,
heartbeat-2.0.8-1.fc7,
drbd-0.7.24-17.fc7
Server name: DFS1 and DFS2
Server specification:
HP DL320s 3060 (2.4 GHz, 1066 FSB), 6x 750GB SATA HDD (hardware RAID 5),
4GB PC2-5300 DDR2-667 RAM, Gigabit LAN (x2),
with SC11Xe Host Bus Adapter (PCI-E) for my HP Ultrium 1840 SCSI
External Tape Drive.
***********************#ifconfig*********************************************
eth0      Link encap:Ethernet HWaddr 00:1C:C4:C2:2D:64
          inet addr:172.16.0.10 Bcast:172.16.255.255 Mask:255.255.0.0
          inet6 addr: fe80::21c:c4ff:fec2:2d64/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:1823320 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1407195 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:304390272 (290.2 MiB) TX bytes:791761414 (755.0 MiB)
          Interrupt:16

eth0:0    Link encap:Ethernet HWaddr 00:1C:C4:C2:2D:64
          inet addr:172.16.0.100 Bcast:172.16.255.255 Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          Interrupt:16

eth1      Link encap:Ethernet HWaddr 00:1C:C4:C2:2D:65
          inet addr:192.168.1.10 Bcast:192.168.1.255 Mask:255.255.255.0
          inet6 addr: fe80::21c:c4ff:fec2:2d65/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:18899916 errors:0 dropped:285 overruns:0 frame:0
          TX packets:20938241 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:12150347110 (11.3 GiB) TX bytes:17866611442 (16.6 GiB)
          Interrupt:17

eth1:0    Link encap:Ethernet HWaddr 00:1C:C4:C2:2D:65
          inet addr:192.168.1.100 Bcast:192.168.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          Interrupt:17

eth2      Link encap:Ethernet HWaddr 00:1D:7E:00:B0:37
          inet addr:192.168.2.10 Bcast:192.168.2.255 Mask:255.255.255.0
          inet6 addr: fe80::21d:7eff:fe00:b037/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:46679 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1684 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:14921988 (14.2 MiB) TX bytes:207296 (202.4 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:7059291 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7059291 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2136509915 (1.9 GiB) TX bytes:2136509915 (1.9 GiB)
****************#service drbd status****************
drbd driver loaded OK; device status:
version: 0.7.24 (api:79/proto:74)
SVN Revision: 2875 build by bachbuilder@, 2007-10-06 07:08:38
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:14473752 nr:4321272 dw:18795008 dr:341027065 al:16716 bm:720 lo:0 pe:0 ua:0 ap:0
 1: cs:Connected st:Primary/Secondary ld:Consistent
    ns:10356 nr:520192 dw:530548 dr:84166121 al:51 bm:117 lo:0 pe:0 ua:0 ap:0
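
One way to check whether the evening bursts are really DRBD replication
traffic would be to sample the ns (network send) and nr (network receive)
counters from the status output at regular intervals and log how much they
grow. The following is only a minimal sketch, not something we are running:
it assumes Python 3 is available on the server, that /proc/drbd contains the
same text that 'service drbd status' prints, and that the counters are in
KiB. The function names (parse_drbd_status, traffic_delta, watch) are made
up for illustration.

```python
import re
import time

# Two-letter counters such as "ns:14473752" on the lines after each device.
COUNTER_RE = re.compile(r"\b([a-z]{2}):(\d+)")
# Device header line such as " 0: cs:Connected st:Primary/Secondary ...".
DEVICE_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)")


def parse_drbd_status(text):
    """Return {minor: {"cs": state, "ns": int, "nr": int, ...}}."""
    devices = {}
    current = None
    for line in text.splitlines():
        match = DEVICE_RE.match(line)
        if match:
            current = {"cs": match.group(2)}
            devices[int(match.group(1))] = current
            continue
        if current is not None:
            # Counter lines belong to the most recent device header.
            for name, value in COUNTER_RE.findall(line):
                current[name] = int(value)
    return devices


def traffic_delta(before, after, keys=("ns", "nr")):
    """Per-device counter growth between two parsed samples."""
    return {
        minor: {key: after[minor][key] - before[minor][key] for key in keys}
        for minor in after
        if minor in before
    }


def watch(path="/proc/drbd", interval=10):
    """Print per-device deltas every `interval` seconds until interrupted."""
    with open(path) as handle:
        previous = parse_drbd_status(handle.read())
    while True:
        time.sleep(interval)
        with open(path) as handle:
            current = parse_drbd_status(handle.read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for minor, delta in sorted(traffic_delta(previous, current).items()):
            print("%s drbd%d sent=%d recv=%d"
                  % (stamp, minor, delta["ns"], delta["nr"]))
        previous = current
```

Calling watch() from a terminal (or redirecting its output to a file) around
8pm should show whether ns/nr jump at the same time the NFS clients start
timing out. If they stay flat, the traffic is probably coming from something
other than DRBD.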
***************/etc/init.d/heartbeat status***************
heartbeat OK [pid 2496 et al] is running on dfs1 [dfs1]...
**********Harddisk Partitioning************************
Hard Drive: /dev/cciss/c0d0

/dev/cciss/c0d0p1   /boot        ext3     Y         500   500MB
/dev/cciss/c0d0p2   VolGroup00   LVM PV   Y     102,400   100GB
/dev/cciss/c0d0p3   VolGroup01   LVM PV   Y   1,536,000   1.5TB
/dev/cciss/c0d0p4   VolGroup01   LVM PV   Y    Extended
/dev/cciss/c0d0p5   VolGroup01   LVM PV   Y   1,222,579   ≈ 1.194TB

Using 64 MB Physical Extent

LVM Volume Groups

VolGroup00 (102,336)
  LogVol00   /        ext3   Y      92,160   90GB
  LogVol01   swap            Y       4,096   4GB
  LogVol02   /home    ext3   Y       6,016   ≈ 6GB

VolGroup01
  LogVol00   /metad   ext3   Y       1,024   1GB
  LogVol01   /winnt   ext3   Y     256,000   250GB
  LogVol02   /data    ext3   Y   2,048,000   2TB
  LogVol03   /data2   ext3   Y     453,376   442.75GB
*************drbd.conf**************************
resource r0 {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
  startup {
    wfc-timeout 60;
    degr-wfc-timeout 60; # 1 minute.
  }
  disk {
    on-io-error detach;
  }
  net {
    max-buffers 2048;
    unplug-watermark 128;
    max-epoch-size 2048;
  }
  syncer {
    rate 100M;
    group 1;
    al-extents 1801;
  }
  on dfs1 {
    device /dev/drbd0;
    disk /dev/VolGroup01/LogVol02;
    address 192.168.1.10:7788;
    meta-disk /dev/VolGroup01/LogVol00[0];
  }
  on dfs2 {
    device /dev/drbd0;
    disk /dev/VolGroup01/LogVol02;
    address 192.168.1.12:7788;
    meta-disk /dev/VolGroup01/LogVol00[0];
  }
}

resource r1 {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
  startup {
    wfc-timeout 60; ## 1 minute.
    degr-wfc-timeout 60; ## 1 minute.
  }
  disk {
    on-io-error detach;
  }
  net {
  }
  syncer {
    rate 100M;
    group 1; # sync concurrently with r0
  }
  on dfs1 {
    device /dev/drbd1;
    disk /dev/VolGroup01/LogVol03;
    address 192.168.1.10:7789;
    meta-disk /dev/VolGroup01/LogVol00[1];
  }
  on dfs2 {
    device /dev/drbd1;
    disk /dev/VolGroup01/LogVol03;
    address 192.168.1.12:7789;
    meta-disk /dev/VolGroup01/LogVol00[1];
  }
}
*****************# /etc/ha.d/ha.cf******************
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
warntime 5
deadtime 15
initdead 30
udpport 694
bcast eth0 eth1
auto_failback on
node dfs1 dfs2
ping 172.16.0.1
respawn hacluster /usr/lib/heartbeat/ipfail
*****************/etc/ha.d/haresources*****************
dfs1 172.16.0.100/16/172.16.255.255 192.168.1.100/24/192.168.1.255 drbddisk::r0 drbddisk::r1 Filesystem::/dev/drbd0::/data::ext3 Filesystem::/dev/drbd1::/data2::ext3 nfs smb nmb
***********************************
Warm Regards,
Cindy KS TOH
Dalas Technologies Sdn Bhd
2, Jalan PJU 5/15,
Dataran Sunway,
Kota Damansara,
47810 Petaling Jaya,
Selangor, Malaysia.
Tel: 03-6156 9000/8000 Ext:401
Fax: 03-6127 8660