Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi guys,

O.K., I have some reply from the customer. Lars, I'm still waiting to hear back on your question concerning write rates, but the response I'm pasting in here might give a few more clues as to what is going on. I should explain that there are also 4 separate partitions being replicated, hence his reference to /dev/drbd2, which is one of them (this should be clear from the config file below). So, here's what we have so far (kind of long). His response:

------------------------------------------------------------------------

If I understood the e-mail response correctly, it said to set al-extents to 257. I did this and restarted DRBD. Unfortunately it didn't change the situation. Let me explain in detail what is happening, the way I see it:

1 - The al-extents, protocol and sndbuf-size parameters were changed.
2 - DRBD was taken down and brought back up on both sides (to make sure the changes took effect).
3 - A copy of a 300 MB file onto the /dev/drbd2 partition was started.
4 - The copy runs all the way through.
5 - About 30 seconds to 1 minute after the copy finishes, we lose access to the /dev/drbd2 partition (whether through Samba from Windows or just doing an ls on the partition). All the other DRBD partitions and the system itself show no degradation.
6 - In cat /proc/drbd we see the bytes for this partition going from primary to secondary.
7 - Once the copy from primary to secondary is done, the /dev/drbd2 partition becomes available again and performance returns to normal on this partition (no other part of the Linux system is affected by this).

So here is how I read all this: it looks like DRBD doesn't really do its copy from primary to secondary in the background. My impression was that DRBD would complete its copy in the background without slowing down access to the filesystem on the primary machine. I really hope this is not a conceptual issue. If I copy, say, a 500 MB file, the same thing happens, except it happens even before the copy to the primary finishes, and it can even abort the copy.

<snip>

To make all partitions hang, I just copy 3 files, one into each partition, and I can hang all the DRBD partitions at once. They only become available again when the copy from primary to secondary completes.

<snip>
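(Side note, not part of the customer's reply: on 0.7 a full down/up shouldn't be needed just to pick up drbd.conf changes -- drbdadm adjust re-applies the settings to a running resource -- and watching /proc/drbd during the test copy shows which counters pile up while the partition is unreachable. A rough sketch only; double-check against the 0.7.19 man pages:)

    # re-read drbd.conf and apply any changed settings to the running resource
    drbdadm adjust drbd2

    # during the test copy: ap: (application) and pe: (pending) counts climbing
    # while the copy drains to the secondary would match the hang described above
    cat /proc/drbd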
Below is the drbd.conf present on both machines:

global { minor-count 4; dialog-refresh 600; }

resource drbd0 {
  protocol A;
  incon-degr-cmd "/bin/true";
  syncer { rate 70M; al-extents 257; }
  net    { sndbuf-size 512k; }
  on smb-mtl01 {
    device    /dev/drbd0;
    disk      /dev/mapper/VGPrivate_orion-LVPrivate_orion;
    address   10.10.1.3:7789;
    meta-disk /dev/sdh1[0];
  }
  on smb-sjo01 {
    device    /dev/drbd0;
    disk      /dev/mapper/VGPrivate_orion-LVPrivate_orion;
    address   10.10.2.66:7789;
    meta-disk /dev/sde1[0];
  }
}

resource drbd1 {
  protocol A;
  incon-degr-cmd "/bin/true";
  syncer { rate 70M; al-extents 257; }
  net    { sndbuf-size 512k; }
  on smb-mtl01 {
    device    /dev/drbd1;
    disk      /dev/mapper/VGProfiles_orion-LVProfiles_orion;
    address   10.10.1.3:7790;
    meta-disk /dev/sdh1[1];
  }
  on smb-sjo01 {
    device    /dev/drbd1;
    disk      /dev/mapper/VGProfiles_orion-LVProfiles_orion;
    address   10.10.2.66:7790;
    meta-disk /dev/sde1[1];
  }
}

resource drbd2 {
  protocol A;
  incon-degr-cmd "/bin/true";
  syncer { rate 70M; al-extents 257; }
  net    { sndbuf-size 512k; }
  on smb-mtl01 {
    device    /dev/drbd2;
    disk      /dev/mapper/VGPublic_orion-LVPublic_orion;
    address   10.10.1.3:7791;
    meta-disk /dev/sdh1[2];
  }
  on smb-sjo01 {
    device    /dev/drbd2;
    disk      /dev/mapper/VGPublic_orion-LVPublic_orion;
    address   10.10.2.66:7791;
    meta-disk /dev/sde1[2];
  }
}

-----------------------------------------------------------
End customer response.

Hope this sheds some light for somebody. It looks like they were also playing with dialog-refresh to see if that might help, but everything else looks o.k. to me.

Thanks,
Tim

Tim Johnson
Senior Software Engineer
Vision Solutions, Inc.

17911 Von Karman Ave, 5th Floor
Irvine, CA 92614
UNITED STATES

Tel: +1 (949) 253-6528
Fax: +1 (949) 225-0287
Email: tjohnson at visionsolutions.com
<http://www.visionsolutions.com/>

Disclaimer - 6/22/2006
The contents of this e-mail (and any attachments) are confidential, may be privileged, and may contain copyright material of Vision Solutions, Inc. or third parties. You may only reproduce or distribute the material if you are expressly authorized by Vision Solutions to do so. If you are not the intended recipient, any use, disclosure or copying of this e-mail (and any attachments) is unauthorized. If you have received this e-mail in error, please immediately delete it and any copies of it from your system and notify us via e-mail at helpdesk at visionsolutions.com
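(For reference -- my sketch, not the customer's config: the other knobs mentioned further down in this thread, max-buffers and max-epoch-size, also belong in the net section alongside sndbuf-size. The values below are purely illustrative, roughly the 0.7 defaults as I recall them, not a recommendation for this site.)

    net {
      sndbuf-size    512k;
      max-buffers    2048;   # buffers drbd may allocate for incoming data (matters on the receiving side)
      max-epoch-size 2048;   # highest number of write requests between two write barriers
    }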
-----Original Message-----
From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of jeffb
Sent: Wednesday, June 21, 2006 11:38 AM
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] Apologies... wrong subject: should have been drbd performance issue..

I think you can also mess with your al-extents setting to increase performance, but I think it only helps you get through bursts of size [al-extents * 4M], and it has other repercussions later as well -- you'll resync that much after a primary crash (see the quick back-of-envelope below). I think this mostly helps on systems with sub-optimal disks or controller cards. We had to set it high with our 3ware cards, but with our Arecas we've been just fine.

The problem with our 3ware cards was that when we had transfers > al-extents * 4M, the system would be fine until it had transferred that amount, or a little bit more, but then it would nearly deadlock once that limit was reached -- the sort of thing that could keep people from logging in or keeping their active connections. Our system would go into disk I/O deadlock for about 30 seconds to a minute, come out of it for about 10 seconds, then go right back into another deadlock. This would continue until shortly after our large transfers were done. Any transfer smaller than al-extents * 4M never had this problem (unless it was run too close to another large transfer).

I'm 100% sure that our problem was down to either our disks or our RAID controller (3ware 85xx), and not to drbd. The 3ware cards seemed to be OK if you only had a couple or a few disks, but as you kept adding more drives the problem would get progressively worse when running in a RAID 5 configuration. We had either 8 or 12 drives on our system and it was ugly and painful.
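(A quick back-of-envelope on the sizes jeffb mentions -- my numbers, not measured on either system, assuming the 4 MB-per-extent figure above:)

    # coverage of the activity log vs. the size of a single transfer
    al_extents=257
    transfer_mb=500
    echo "AL hot area : ~$(( al_extents * 4 )) MB"              # 257 * 4 MB ~= 1 GB
    echo "500 MB copy : ~$(( (transfer_mb + 3) / 4 )) extents"  # 125 extents, well under 257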
On Wed, 2006-06-21 at 10:47 -0700, Tim Johnson wrote:
> Tim Johnson
> Senior Software Engineer
> Vision Solutions, Inc.
>
> -----Original Message-----
> From: drbd-user-bounces at lists.linbit.com
> [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Tim Johnson
> Sent: Wednesday, June 21, 2006 10:47 AM
> To: drbd-user at lists.linbit.com
> Subject: RE: [DRBD-user] drbd and lvm understanding question
>
> Hi guys,
>
> We've got a bit of a problem at a customer site and I was wondering if
> anybody had any suggestions. With drbd up and running on both the
> primary and the backup, massive amounts of data were copied over to the
> relevant mount points on the primary. Apparently this slowed the
> machine down so much that users were getting kicked off (I haven't yet
> been able to find out from them whether it was CPU usage or memory).
> When drbd was taken down on the backup, everything was o.k. They
> started with drbd version 0.7.13. I perused the archives of this
> mailing list and found something which suggested that this was a
> problem fixed after 0.7.13, so they upgraded to 0.7.19 and are still
> having the problem.
>
> Parameters we've thought might be appropriate in the drbd.conf file
> are protocol (using protocol A; I'm sure this is fine), sndbuf-size
> (there are warnings about using large values like 1M), max-buffers
> (this looks promising to me), max-epoch-size, and maybe rate. I'm a
> bit nervous about changing anything, so does anybody have some good
> ideas?
>
> Appropriate environmental information such as /proc/drbd and system
> info is below.
>
> Thanks,
> Tim
>
> -----------------------------------------------------------------
> Output from /proc/drbd:
>
> smb-mtl01:~ # cat /proc/drbd
> version: 0.7.19 (api:78/proto:74)
> SVN Revision: 2212 build by root at smb-mtl01, 2006-06-13 10:56:44
>  0: cs:Connected st:Primary/Secondary ld:Consistent
>     ns:0 nr:0 dw:2112 dr:209304 al:522 bm:0 lo:0 pe:0 ua:0 ap:0
>  1: cs:Connected st:Primary/Secondary ld:Consistent
>     ns:32 nr:0 dw:352 dr:86220 al:3 bm:2 lo:0 pe:0 ua:0 ap:0
>  2: cs:Connected st:Primary/Secondary ld:Consistent
>     ns:32 nr:0 dw:1632 dr:123328 al:393 bm:3 lo:0 pe:0 ua:0 ap:0
>
> smb-sjo01:~ # cat /proc/drbd
> version: 0.7.19 (api:78/proto:74)
> SVN Revision: 2212 build by root at smb-sjo01, 2006-06-12 14:55:54
>  0: cs:Connected st:Secondary/Primary ld:Consistent
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
>  1: cs:Connected st:Secondary/Primary ld:Consistent
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
>  2: cs:Connected st:Secondary/Primary ld:Consistent
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
> ------------------------------------------------------------------
>
> Machine 1 (primary, I believe):
>   Link speed: 10 MBps (T1 line)
>   1 NIC, 100 MB/s
>   SuSE 9
>   Hardware: manufactured by IBM (iSeries), running on top of an AS/400 810
>   1 CPU at 1 GHz
>   1024 MB memory
>   200 GB disk
>
> Machine 2:
>   Also SuSE 9
>   Same IBM hardware, running on iSeries
>   1 CPU at 1 GHz
>   512 MB memory
>   150 GB disk space
>   1 NIC at 10 MB/s
>
> Replicating about 100 GB of data.
>
> Tim Johnson
> Senior Software Engineer
> Vision Solutions, Inc.

_______________________________________________
drbd-user mailing list
drbd-user at lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user