Hello,

I am testing drbd + heartbeat for an HA setup consisting of two cluster
members. The first is a Dell 2400, 256MB, dual PIII 500, HW RAID. The second
is a Dell 2300, 128MB, single PIII 500, software RAID. Both systems are
running RedHat 9 with the 2.4.20-31.9smp kernel (the single-processor box
runs the SMP kernel because of a bug in the 440GX chipset: APIC only works
with an SMP kernel). I am using 0.6.12, as 0.7 was hell on my machines
(loads of kernel oopses, panics, hangs, etc.). So far I have been having
good results: I tested failover between the nodes, and it all worked well,
until I decided to test the all-out disaster scenario.

First I took down my primary cluster node (by disconnecting all of its
NICs). Failover went as expected. Then I went for the all-out scenario by
also gracefully shutting down the secondary node. In this scenario you would
boot the secondary cluster node first, as it has the latest data set, and
since I want HA I decided not to wait for the other side of drbd to show up
before making the disks primary. Up to this point there was still no
problem: the disks were mounted and data was served from the secondary
cluster node.

But when I booted my primary cluster node, the shit really hit the fan (you
should see my office, it smells terrible ;-). As soon as it started
replicating data from the secondary cluster node, the problems started.
Immediately both nodes showed lock-up symptoms (e.g. not being able to log
in on the console or via ssh). Already logged-in sessions kept working,
except that running su would lock up as well. A 'cat /proc/drbd' would
initially show acceptable speeds (around 5MB/s, my sync-min; syncing from
the primary node to the secondary would reach 10MB/s+). The system load
would also slowly increase, up to the point where heartbeat triggered a
failover (with softdog running, it would even just reset the machine):
 11:09:37  up 10:23,  1 user,  load average: 3.58, 3.00, 2.41
85 processes: 75 sleeping, 7 running, 3 zombie, 0 stopped
CPU states:  70.9% user  29.0% system  0.0% nice  0.0% iowait  0.0% idle
Mem:   125412k av,  122820k used,    2592k free,      0k shrd,  36628k buff
                     78112k actv,     796k in_d,   1624k in_c
Swap:  787064k av,    1184k used,  785880k free                 54192k cached
(CPU was usually not at 100%, more like 25 to 30%.) A load of 3+ on a
single-CPU machine that is not using that much memory or CPU time is weird.

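If anyone wants to watch this happen, something along these lines shows the
resync progress and the load side by side (just an illustration; any
interval will do):

  # watch resync progress and system load together, refreshing every 2 seconds
  watch -n 2 'cat /proc/drbd; uptime'
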
Also at this point the sync speed would drop to under 1MB/s, and the console
got flooded with these messages:
drbd1: [drbd_syncer_1/4321] sock_sendmsg time expired, ko = 4294967295
drbd1: [drbd_syncer_1/4321] sock_sendmsg time expired, ko = 4294967294
drbd1: [drbd_syncer_1/4321] sock_sendmsg time expired, ko = 4294967295
drbd1: [drbd_syncer_1/4321] sock_sendmsg time expired, ko = 4294967295
drbd1: [drbd_syncer_1/4321] sock_sendmsg time expired, ko = 4294967294
drbd1: [drbd_syncer_1/4321] sock_sendmsg time expired, ko = 4294967295
drbd1: [drbd_syncer_1/4321] sock_sendmsg time expired, ko = 4294967294
drbd1: [drbd_syncer_1/4321] sock_sendmsg time expired, ko = 4294967295
drbd1: [drbd_syncer_1/4321] sock_sendmsg time expired, ko = 4294967295
drbd1: [drbd_syncer_1/4321] sock_sendmsg time expired, ko = 4294967294
drbd1: [drbd_syncer_1/4321] sock_sendmsg time expired, ko = 4294967295
drbd1: [drbd_syncer_1/4321] sock_sendmsg time expired, ko = 4294967294
drbd1: [drbd_syncer_1/4321] sock_sendmsg time expired, ko = 4294967295
drbd1: [drbd_syncer_1/4321] sock_sendmsg time expired, ko = 4294967295

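(As an aside, 4294967295 is 2^32 - 1, i.e. what -1 looks like when printed
as an unsigned 32-bit number, so my assumption is that the ko countdown has
simply run past zero and wrapped:)

  # 2^32 - 1 = 4294967295, i.e. (unsigned)-1 on a 32-bit counter
  echo '2^32 - 1' | bc
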
I've tried fiddling with the sync parameters (sync-nice, sync-group,
tl-size, etc.). Nothing helped, although the symptoms did vary (time before
the systems locked up, time before heartbeat failed over, more or fewer of
these sock_sendmsg messages).

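To give an idea of what I mean by fiddling, throttling the resync of the big
resource would look something like this in its net section (example values
only, not a recommendation):

  # example: everything lowered compared to what I normally run
  net {
    sync-group = 1
    sync-rate = 4M
    sync-min = 1M
    sync-max = 2M
    sync-nice = 19
    tl-size = 5000

    ping-int = 10
    timeout = 9
  }
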
As soon as heartbeat had shut itself down, the sync speed would sometimes go
up again, but other times it remained low. Same thing with the load:
sometimes it went back down to normal values, sometimes not. The system
lock-ups behaved the same way. Stopping the sync by disconnecting the
secondary cluster node always brought the systems back to normal.

The only way the systems remained stable was doing the sync in single-user
mode. But as it's 70GB of data we're talking about, and a 5MB/s sync would
take 3hrs+, that would be unacceptable downtime. I will now start with a new
data set and see if I can reproduce the problem; I am not going to wait for
the sync to finish in single-user mode. I would not mind if, in a situation
like this, syncing the data back to the primary node takes a day, but it has
to be stable and the secondary node has to serve the data in the meantime.

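For reference, the back-of-the-envelope numbers behind that estimate (taking
70GB as roughly 71680MB):

  # full resync time at the configured sync-min of 5MB/s, in minutes
  echo '71680 / 5 / 60' | bc     # ~238 minutes, i.e. close to 4 hours
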
My drbd.conf:

resource drbd0 {
  protocol = C
  fsckcmd = /bin/true

  disk {
    disk-size = 4890000k
    do-panic
  }

  net {
    sync-group = 0
    sync-rate = 8M
    sync-min = 5M
    sync-max = 10M
    sync-nice = 0
    tl-size = 5000

    ping-int = 10
    timeout = 9
  }

  on syslogcs-cla {
    device = /dev/nb0
    disk = /dev/sdb2
    address = 10.0.0.1
    port = 7788
  }

  on syslogcs-clb {
    device = /dev/nb0
    disk = /dev/md14
    address = 10.0.0.2
    port = 7788
  }
}

resource drbd1 {
  protocol = C
  fsckcmd = /bin/true

  disk {
    disk-size = 64700000k
    do-panic
  }

  net {
    sync-group = 1
    sync-rate = 8M
    sync-min = 5M
    sync-max = 10M
    sync-nice = 19
    tl-size = 5000

    ping-int = 10
    timeout = 9
  }

  on syslogcs-cla {
    device = /dev/nb1
    disk = /dev/sdb3
    address = 10.0.0.1
    port = 7789
  }

  on syslogcs-clb {
    device = /dev/nb1
    disk = /dev/md15
    address = 10.0.0.2
    port = 7789
  }
}

/dev/md14 is RAID0 made of two RAID1 pairs (md9 & md10)
/dev/md15 is RAID0 made of two RAID1 pairs (md11 & md12)

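For completeness, that kind of nested layout is defined roughly like this in
/etc/raidtab (illustrative entry, not my exact file):

  # RAID0 striped across two existing RAID1 devices
  raiddev /dev/md14
      raid-level            0
      nr-raid-disks         2
      persistent-superblock 1
      chunk-size            64
      device                /dev/md9
      raid-disk             0
      device                /dev/md10
      raid-disk             1
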
Output of mount commands:
drbd1: blksize=1024 B
drbd1: blksize=4096 B
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on drbd(43,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Why the different block sizes? Both disks show this when mounting.

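My assumption is that the 1024 is the default soft block size used while the
superblock is read and the 4096 is ext3 switching the device over to its
filesystem block size; the two are easy enough to compare:

  tune2fs -l /dev/nb1 | grep 'Block size'   # filesystem block size (probably 4096)
  blockdev --getbsz /dev/nb1                # soft block size the kernel currently uses
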
Sometimes I get the message that an md device used an obsolete ioctl, but
that should only be cosmetic.
Sometimes I got the message on the SW RAID system that the block size could
not be determined and 512 bytes was assumed.
The SW RAID seems to outperform the HW RAID by 100%.
On rare occasions I saw lock-ups of fsck or mount during heartbeat start-up,
one time even causing the entire system to hang during reboot (killall was
not able to kill a hanging mount process).
Maybe also important: some md devices were syncing at the same time the drbd
devices were syncing. That md sync was not achieving high speeds either. You
would expect that while the drbd sync uses 5MB/s, but not when the drbd sync
drops; then you would expect the md sync to go faster, but it didn't, it
stayed at 100-300KB/s.

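If the md resync itself turns out to be a factor, its limits can be checked
and tuned at runtime through the 2.4 procfs knobs (values are in KB/s; the
echo below is only an example):

  cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
  echo 1000 > /proc/sys/dev/raid/speed_limit_min
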
Lots of information, but probably more is needed. I will let you know
whether I can reproduce the problem once I have created new data sets to
test with.

Sietse