[DRBD-user] DRBD I/O error? or....

Cindy KS TOH kstoh at dlsjubm.com.my
Mon Jul 14 09:39:38 CEST 2008


Hi,

I hope someone can help me and can follow what I am trying to describe.
According to our programming team, we have encountered some I/O errors.
I set up and tested the DRBD + HA servers myself (I am quite new to
this).
I have included all the information I can think of in this email.

Our problem is that every now and then 'something' starts running and
transmitting between my two DRBD servers. The programmers suspect that a
DRBD issue is what prevents them from reconnecting to the storage server.

Scenario:
For the last two months, at a certain time of day (we do not know exactly
when, because it varies, but mostly after 8pm), DFS1 and DFS2 show very
heavy NFS traffic, and the other Fedora 7 / Fedora 8 servers become
unable to reconnect over NFS. They then get an I/O error reporting a
connection timeout, and they keep retrying.
I am not sure how to explain this better. Could anyone please check my
settings and tell me whether I should fine-tune anything?
We have checked DFS1 and DFS2, and nothing is scheduled to sync or
replicate between them.
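
One way to narrow down when the traffic burst starts, and on which
interface, would be to sample the kernel's byte counters at a fixed
interval. The following is only a minimal sketch, assuming Python 3 and a
Linux /proc/net/dev in its usual layout; the function names and the
10-second interval are illustrative choices, not part of our current setup.

```python
import time


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates_mib_per_s(before, after, seconds):
    """Per-interface (rx, tx) throughput in MiB/s between two samples."""
    return {
        name: tuple(
            (after[name][i] - before[name][i]) / seconds / 2**20 for i in (0, 1)
        )
        for name in after
        if name in before
    }


def log_rates(path="/proc/net/dev", interval=10):
    """Print one timestamped throughput line per interface, until interrupted."""
    with open(path) as f:
        before = parse_net_dev(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            after = parse_net_dev(f.read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        rates = rates_mib_per_s(before, after, interval)
        for name, (rx, tx) in sorted(rates.items()):
            # A counter wrap between samples would show up as a negative rate.
            print(f"{stamp} {name} rx={rx:.2f} MiB/s tx={tx:.2f} MiB/s")
        before = after
```

Calling log_rates() on both DFS1 and DFS2 and redirecting the output to a
file should show whether the evening burst is on eth1 (the DRBD link in
drbd.conf) or on eth0.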

Server OS detail:
Fedora 7
kernel-2.6.22.9-91.fc7,
heartbeat-2.0.8-1.fc7,
drbd-0.7.24-17.fc7

Server name: DFS1 and DFS2

Server specification:
HP DL320S 3060 (2.4 GHz, 1066 FSB), 6x 750GB SATA HDD (hardware RAID 5),
4GB PC2-5300 DDR2-667 RAM, Gigabit LAN (x2),
with an SC11Xe Host Bus Adapter (PCI-E) for my HP Ultrium 1840 SCSI
external tape drive.

****************#ifconfig****************

eth0 Link encap:Ethernet HWaddr 00:1C:C4:C2:2D:64
inet addr:172.16.0.10 Bcast:172.16.255.255 Mask:255.255.0.0
inet6 addr: fe80::21c:c4ff:fec2:2d64/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1823320 errors:0 dropped:0 overruns:0 frame:0
TX packets:1407195 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:304390272 (290.2 MiB) TX bytes:791761414 (755.0 MiB)
Interrupt:16

eth0:0 Link encap:Ethernet HWaddr 00:1C:C4:C2:2D:64
inet addr:172.16.0.100 Bcast:172.16.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:16

eth1 Link encap:Ethernet HWaddr 00:1C:C4:C2:2D:65
inet addr:192.168.1.10 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::21c:c4ff:fec2:2d65/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:18899916 errors:0 dropped:285 overruns:0 frame:0
TX packets:20938241 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:12150347110 (11.3 GiB) TX bytes:17866611442 (16.6 GiB)
Interrupt:17

eth1:0 Link encap:Ethernet HWaddr 00:1C:C4:C2:2D:65
inet addr:192.168.1.100 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:17

eth2 Link encap:Ethernet HWaddr 00:1D:7E:00:B0:37
inet addr:192.168.2.10 Bcast:192.168.2.255 Mask:255.255.255.0
inet6 addr: fe80::21d:7eff:fe00:b037/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:46679 errors:0 dropped:0 overruns:0 frame:0
TX packets:1684 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:14921988 (14.2 MiB) TX bytes:207296 (202.4 KiB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:7059291 errors:0 dropped:0 overruns:0 frame:0
TX packets:7059291 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2136509915 (1.9 GiB) TX bytes:2136509915 (1.9 GiB)
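
The only non-zero error counter above is dropped:285 on the eth1 RX line.
To put that in proportion, the ratio of dropped to received packets can be
computed from the line itself. This is a small sketch that assumes the
old-style ifconfig output format shown here; the function name is made up
for illustration.

```python
import re


def rx_drop_ratio(ifconfig_line):
    """Fraction of received packets that were dropped, from an ifconfig RX line."""
    m = re.search(r"RX packets:(\d+) errors:(\d+) dropped:(\d+)", ifconfig_line)
    if m is None:
        raise ValueError("not an ifconfig RX packets line")
    packets, _errors, dropped = map(int, m.groups())
    return dropped / packets if packets else 0.0


eth1 = "RX packets:18899916 errors:0 dropped:285 overruns:0 frame:0"
print(f"eth1 RX drop ratio: {rx_drop_ratio(eth1):.6%}")
```

For eth1 this works out to roughly 0.0015% of received packets.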

****************#service drbd status****************

drbd driver loaded OK; device status:
version: 0.7.24 (api:79/proto:74)
SVN Revision: 2875 build by bachbuilder@, 2007-10-06 07:08:38
0: cs:Connected st:Primary/Secondary ld:Consistent
ns:14473752 nr:4321272 dw:18795008 dr:341027065 al:16716 bm:720 lo:0 pe:0 ua:0 ap:0
1: cs:Connected st:Primary/Secondary ld:Consistent
ns:10356 nr:520192 dw:530548 dr:84166121 al:51 bm:117 lo:0 pe:0 ua:0 ap:0
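
For anyone comparing these counters over time, the status lines can be
parsed mechanically. The sketch below assumes that ns, nr, dw and dr
(network send, network receive, disk write, disk read) are reported in
KiB, which is my understanding of the /proc/drbd format; the helper names
are illustrative only.

```python
import re


def parse_drbd_counters(line):
    """Return the two-letter numeric counters of a DRBD status line as a dict."""
    return {key: int(value) for key, value in re.findall(r"\b([a-z]{2}):(\d+)", line)}


def kib_to_gib(kib):
    """Convert KiB to GiB (assumes the counter unit really is KiB)."""
    return kib / 2**20


r0 = ("ns:14473752 nr:4321272 dw:18795008 dr:341027065 "
      "al:16716 bm:720 lo:0 pe:0 ua:0 ap:0")
counters = parse_drbd_counters(r0)
for key in ("ns", "nr", "dw", "dr"):
    print(f"{key}: {kib_to_gib(counters[key]):.1f} GiB")
```

Sampling these values before and after the evening burst would show
whether DRBD replication traffic (ns/nr) actually grows during it.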



***************/etc/init.d/heartbeat status***************
heartbeat OK [pid 2496 et al] is running on dfs1 [dfs1]...

**********Harddisk Partitioning************************

Hard Drive
/dev/cciss/c0d0
/dev/cciss/c0d0p1   /boot        ext3     Y         500   500MB
/dev/cciss/c0d0p2   VolGroup00   LVM PV   Y     102,400   100GB
/dev/cciss/c0d0p3   VolGroup01   LVM PV   Y   1,536,000   1.5TB
/dev/cciss/c0d0p4   VolGroup01   LVM PV   Y    Extended
/dev/cciss/c0d0p5   VolGroup01   LVM PV   Y   1,222,579   ≈ 1.194TB

Using 64 MB Physical Extent

LVM Volume Groups
VolGroup00                                      102,336
LogVol00            /            ext3     Y      92,160   90GB
LogVol01                         swap     Y       4,096   4GB
LogVol02            /home        ext3     Y       6,016   ≈ 6GB

VolGroup01
LogVol00            /metad       ext3     Y       1,024   1GB
LogVol01            /winnt       ext3     Y     256,000   250GB
LogVol02            /data        ext3     Y   2,048,000   2TB
LogVol03            /data2       ext3     Y     453,376   442.75GB
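
As a sanity check on the VolGroup01 figures, the logical volumes can be
compared against the two physical volumes in units of 64 MB extents. This
is only a rough sketch: it assumes the numeric size column is in MB and it
ignores the small amount of space LVM reserves for its own metadata on
each physical volume.

```python
EXTENT_MB = 64  # "Using 64 MB Physical Extent"

# Sizes in MB, copied from the partition listing (assumed unit: MB).
pv_mb = {"c0d0p3": 1_536_000, "c0d0p5": 1_222_579}
lv_mb = {
    "LogVol00": 1_024,      # /metad
    "LogVol01": 256_000,    # /winnt
    "LogVol02": 2_048_000,  # /data
    "LogVol03": 453_376,    # /data2
}


def whole_extents(size_mb, extent_mb=EXTENT_MB):
    """Number of complete extents that fit into size_mb."""
    return size_mb // extent_mb


pv_extents = sum(whole_extents(size) for size in pv_mb.values())
lv_extents = sum(whole_extents(size) for size in lv_mb.values())

print("extents available on PVs:", pv_extents)
print("extents used by LVs:     ", lv_extents)
print("extents left over:       ", pv_extents - lv_extents)
```

Under these assumptions the logical volumes fit inside VolGroup01 with
almost nothing left over, so the volume group is effectively full.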



*************drbd.conf**************************

resource r0 {

  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

  startup {
    wfc-timeout 60;
    degr-wfc-timeout 60; # 1 minute.
  }

  disk {
    on-io-error detach;
  }

  net {
    max-buffers 2048;
    unplug-watermark 128;
    max-epoch-size 2048;
  }

  syncer {
    rate 100M;
    group 1;
    al-extents 1801;
  }

  on dfs1 {
    device /dev/drbd0;
    disk /dev/VolGroup01/LogVol02;
    address 192.168.1.10:7788;
    meta-disk /dev/VolGroup01/LogVol00[0];
  }

  on dfs2 {
    device /dev/drbd0;
    disk /dev/VolGroup01/LogVol02;
    address 192.168.1.12:7788;
    meta-disk /dev/VolGroup01/LogVol00[0];
  }
}

resource r1 {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
  startup {
    wfc-timeout 60; ## 1 minute.
    degr-wfc-timeout 60; ## 1 minute.
  }
  disk {
    on-io-error detach;
  }
  net {
  }
  syncer {
    rate 100M;
    group 1; # sync concurrently with r0
  }

  on dfs1 {
    device /dev/drbd1;
    disk /dev/VolGroup01/LogVol03;
    address 192.168.1.10:7789;
    meta-disk /dev/VolGroup01/LogVol00[1];
  }

  on dfs2 {
    device /dev/drbd1;
    disk /dev/VolGroup01/LogVol03;
    address 192.168.1.12:7789;
    meta-disk /dev/VolGroup01/LogVol00[1];
  }
}



*****************# /etc/ha.d/ha.cf******************
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
warntime 5
deadtime 15
initdead 30
udpport 694
bcast eth0 eth1
auto_failback on
node dfs1 dfs2
ping 172.16.0.1
respawn hacluster /usr/lib/heartbeat/ipfail


*****************/etc/ha.d/haresources*****************
dfs1 172.16.0.100/16/172.16.255.255 192.168.1.100/24/192.168.1.255 drbddisk::r0 drbddisk::r1 Filesystem::/dev/drbd0::/data::ext3 Filesystem::/dev/drbd1::/data2::ext3 nfs smb nmb

***********************************

Warm Regards,
Cindy KS TOH

Dalas Technologies Sdn Bhd
2, Jalan PJU 5/15,
Dataran Sunway,
Kota Damansara,
47810 Petaling Jaya,
Selangor, Malaysia.

Tel: 03-6156 9000/8000   Ext:401
Fax: 03-6127 8660





