Hi,

I hope someone can help me and can understand what I am trying to say. According to our programming team, we have been encountering I/O errors. I set up and tested the DRBD + HA servers myself (I am quite new to this). I have included all the information I can think of in this email.

Our problem is that every now and then 'something' runs and transmits data between my two DRBD servers. The programmers suspect that a DRBD issue is what prevents them from connecting back to the storage server.

Scenario:
For the last two months, at a certain time of day (unknown to us, because it sometimes changes, but mostly after 8pm), DFS1 and DFS2 show very high transmit activity over NFS, and the other Fedora 7 / Fedora 8 servers become unable to connect back using NFS. They then get an I/O error saying the connection timed out, and they keep retrying.

I am not sure how to explain this better. Could anyone please check my settings and tell me whether I should fine-tune anything? We have checked DFS1 and DFS2, and nothing is scheduled to sync or replicate between them.

Server OS detail: Fedora 7, kernel-2.6.22.9-91.fc7, heartbeat-2.0.8-1.fc7, drbd-0.7.24-17.fc7
Server name: DFS1 and DFS2
Server specification: HP DL320S 3060 (2.4 GHz, 1066 FSB), 6x 750GB SATA HDD (hardware RAID 5), 4GB PC2-5300 DDR2-667 RAM, GB LAN (x2), with SC11Xe Host Bus Adapter (PCI-E) for my HP Ultrium 1840 SCSI external tape drive.
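Since we do not know exactly when the bursts start, one thing that might help is logging the DRBD counters over time and comparing them with the times of the NFS timeouts. The following is only a minimal sketch, assuming the per-device status line has the same format as in the drbd status output further down; `parse_status_line` and `counter_delta` are hypothetical helper names, not part of DRBD.

```python
import re

# Counter names as they appear in the DRBD 0.7 per-device status line.
# Non-numeric fields such as cs:, st: and ld: are deliberately ignored.
COUNTER_RE = re.compile(r"\b(ns|nr|dw|dr|al|bm|lo|pe|ua|ap):(\d+)")


def parse_status_line(line):
    """Return the numeric counters from one status line as a dict."""
    return {name: int(value) for name, value in COUNTER_RE.findall(line)}


def counter_delta(before, after, keys=("ns", "nr")):
    """Return how much each selected counter grew between two samples."""
    return {key: after[key] - before[key] for key in keys}


# Sample line copied from the status output in this email.
sample = (
    "0: cs:Connected st:Primary/Secondary ld:Consistent "
    "ns:14473752 nr:4321272 dw:18795008 dr:341027065 "
    "al:16716 bm:720 lo:0 pe:0 ua:0 ap:0"
)
counters = parse_status_line(sample)
print(counters["ns"], counters["nr"])
```

Taking a sample every minute and passing two consecutive samples to `counter_delta` would show whether the network send (ns) and receive (nr) counters jump at the same time as the NFS clients start timing out, or whether the traffic is coming from somewhere else.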

***********************#ifconfig *********************************************
eth0      Link encap:Ethernet  HWaddr 00:1C:C4:C2:2D:64
          inet addr:172.16.0.10  Bcast:172.16.255.255  Mask:255.255.0.0
          inet6 addr: fe80::21c:c4ff:fec2:2d64/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1823320 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1407195 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:304390272 (290.2 MiB)  TX bytes:791761414 (755.0 MiB)
          Interrupt:16

eth0:0    Link encap:Ethernet  HWaddr 00:1C:C4:C2:2D:64
          inet addr:172.16.0.100  Bcast:172.16.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:16

eth1      Link encap:Ethernet  HWaddr 00:1C:C4:C2:2D:65
          inet addr:192.168.1.10  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::21c:c4ff:fec2:2d65/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:18899916 errors:0 dropped:285 overruns:0 frame:0
          TX packets:20938241 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:12150347110 (11.3 GiB)  TX bytes:17866611442 (16.6 GiB)
          Interrupt:17

eth1:0    Link encap:Ethernet  HWaddr 00:1C:C4:C2:2D:65
          inet addr:192.168.1.100  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:17

eth2      Link encap:Ethernet  HWaddr 00:1D:7E:00:B0:37
          inet addr:192.168.2.10  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::21d:7eff:fe00:b037/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:46679 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1684 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:14921988 (14.2 MiB)  TX bytes:207296 (202.4 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:7059291 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7059291 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2136509915 (1.9 GiB)  TX bytes:2136509915 (1.9 GiB)

****************#service drbd status****************
drbd driver loaded OK; device status:
version: 0.7.24 (api:79/proto:74)
SVN Revision: 2875 build by bachbuilder@, 2007-10-06 07:08:38
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:14473752 nr:4321272 dw:18795008 dr:341027065 al:16716 bm:720 lo:0 pe:0 ua:0 ap:0
 1: cs:Connected st:Primary/Secondary ld:Consistent
    ns:10356 nr:520192 dw:530548 dr:84166121 al:51 bm:117 lo:0 pe:0 ua:0 ap:0

***************/etc/init.d/heartbeat status***************
heartbeat OK [pid 2496 et al] is running on dfs1 [dfs1]...

**********Harddisk Partitioning************************
Hard Drive /dev/cciss/c0d0
/dev/cciss/c0d0p1   /boot        ext3     Y   500         500MB
/dev/cciss/c0d0p2   VolGroup00   LVM PV   Y   102,400     100GB
/dev/cciss/c0d0p3   VolGroup01   LVM PV   Y   1,536,000   1.5TB
/dev/cciss/c0d0p4   VolGroup01   LVM PV   Y   Extended
/dev/cciss/c0d0p5   VolGroup01   LVM PV   Y   1,222,579   ≈ 1.194TB
Using 64 MB Physical Extent

LVM Volume Groups
VolGroup00                                    102,336
    LogVol00        /            ext3     Y   92,160      90GB
    LogVol01                     swap     Y   4,096       4GB
    LogVol02        /home        ext3     Y   6,016       ≈ 6GB
VolGroup01
    LogVol00        /metad       ext3     Y   1,024       1GB
    LogVol01        /winnt       ext3     Y   256,000     250GB
    LogVol02        /data        ext3     Y   2,048,000   2TB
    LogVol03        /data2       ext3     Y   453,376     442.75GB

*************drbd.conf**************************
resource r0 {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

  startup {
    wfc-timeout 60;
    degr-wfc-timeout 60;    # 1 minute.
  }

  disk {
    on-io-error detach;
  }

  net {
    max-buffers 2048;
    unplug-watermark 128;
    max-epoch-size 2048;
  }

  syncer {
    rate 100M;
    group 1;
    al-extents 1801;
  }

  on dfs1 {
    device /dev/drbd0;
    disk /dev/VolGroup01/LogVol02;
    address 192.168.1.10:7788;
    meta-disk /dev/VolGroup01/LogVol00[0];
  }

  on dfs2 {
    device /dev/drbd0;
    disk /dev/VolGroup01/LogVol02;
    address 192.168.1.12:7788;
    meta-disk /dev/VolGroup01/LogVol00[0];
  }
}

resource r1 {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

  startup {
    wfc-timeout 60;         ## 1 minute.
    degr-wfc-timeout 60;    ## 1 minute.
  }

  disk {
    on-io-error detach;
  }

  net {
  }

  syncer {
    rate 100M;
    group 1;                # sync concurrently with r0
  }

  on dfs1 {
    device /dev/drbd1;
    disk /dev/VolGroup01/LogVol03;
    address 192.168.1.10:7789;
    meta-disk /dev/VolGroup01/LogVol00[1];
  }

  on dfs2 {
    device /dev/drbd1;
    disk /dev/VolGroup01/LogVol03;
    address 192.168.1.12:7789;
    meta-disk /dev/VolGroup01/LogVol00[1];
  }
}

*****************# /etc/ha.d/ha.cf******************
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
warntime 5
deadtime 15
initdead 30
udpport 694
bcast eth0 eth1
auto_failback on
node dfs1 dfs2
ping 172.16.0.1
respawn hacluster /usr/lib/heartbeat/ipfail

*****************/etc/ha.d/haresources*****************
dfs1 172.16.0.100/16/172.16.255.255 192.168.1.100/24/192.168.1.255 drbddisk::r0 drbddisk::r1 Filesystem::/dev/drbd0::/data::ext3 Filesystem::/dev/drbd1::/data2::ext3 nfs smb nmb

***********************************

Warm Regards,
Cindy KS TOH
Dalas Technologies Sdn Bhd
2, Jalan PJU 5/15, Dataran Sunway, Kota Damansara,
47810 Petaling Jaya, Selangor, Malaysia.
Tel: 03-6156 9000/8000 Ext:401
Fax: 03-6127 8660