[DRBD-user] Re: drbd-user Digest, Vol 2, Issue 35

Thu Sep 23 16:09:55 CEST 2004

Dear Lars Ellenberg,
OK, your idea is also good: set up a monitor script to watch for
disk failure. In case of a disk failure, we switch to the secondary and
may even stop heartbeat.
I have 2 questions:
1) on-io-error detach. On which I/O errors will the node detach from
the cluster? A local disk I/O error only, or both local and remote disk
I/O errors?

2) Suppose I have written a script to monitor syslog. Then I need
to test my script. So far, my disks work very well, so I can't
test the correctness of my script.
Besides disconnecting the IDE cable or removing the hard disk power
cable, what can I do to create an I/O error?

Thank you
Regards,
Seki

On Thu, 23 Sep 2004 14:31:43 +0200 (CEST),
drbd-user-request at lists.linbit.com
<drbd-user-request at lists.linbit.com> wrote:
> Today's Topics:
> 
>   1. Re: Inaccurate disk usage (Lars Ellenberg)
>   2. Re: How to simulate disk failure (Lars Ellenberg)
>   3. drbd tranfer limitations (random and sequential acccess)
>      using raw (crsurf)
>   4. Re: drbd tranfer limitations (random and sequential acccess)
>      using raw (Lars Ellenberg)
>   5. Re: "syncer" crash when doing full resync (Philipp Reisner)
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Thu, 23 Sep 2004 01:10:44 +0200
> From: Lars Ellenberg <Lars.Ellenberg at linbit.com>
> Subject: Re: [DRBD-user] Inaccurate disk usage
> To: drbd-user at linbit.com
> Message-ID: <dMfAR12aVvG+me+kaJQ/qBM=lge at web.de>
> Content-Type: text/plain; charset=us-ascii
> 
> / 2004-09-23 10:16:31 +1200
> \ James Doherty:
> > Hi guys,
> >
> > I've been copying files over from our live servers to my DRBD setup in
> > preparation for putting it live. We're running DRBD on top of LVM1 and
> > using ext3 as the filesystem. I'm using 'internal' for meta-disk.
> >
> > However, in the process of copying files over, I managed to fill the
> > DRBD resource up. df -h reports this however:
> >
> > fileserver:~# df -h
> > Filesystem            Size  Used Avail Use% Mounted on
> > /dev/hda1             2.8G  309M  2.3G  12% /
> > /dev/drbd0            1.8G  408M  1.3G  23% /data
> > /dev/drbd1             20G   18G  912M  96% /shares
> >
> > I can't copy files to the /shares drbd resource as I get a "disk is
> > full" error. I'm not really sure what is to blame here: LVM,
> > DRBD or ext3. Has anyone come across this before?
> 
> hint:
> man tune2fs
> and then probably: tune2fs -m 0 /dev/drbd1
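
For reference, the df figures quoted above fit ext3's default 5% reserve for root: on a 20G filesystem that is about 1G, which roughly matches the gap between Size and Used plus Avail. A minimal sketch of the arithmetic (reserved_mb is a hypothetical helper, not part of e2fsprogs):

```shell
#!/bin/sh
# Space set aside for root, given a filesystem size and a reserve percentage.
reserved_mb() {  # $1 = filesystem size in MB, $2 = reserved percentage
    echo $(( $1 * $2 / 100 ))
}

# A 20G filesystem (20 * 1024 MB) at the default 5% reserve.
reserved_mb 20480 5   # prints 1024
```

The actual value on a given device can be read with "tune2fs -l /dev/drbd1 | grep -i 'reserved block count'".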
> 
>  :-)
> 
> but I'd feel uneasy with fs usage > 90% in any case
> 
>        Lars Ellenberg
> 
> --
> please use the "List-Reply" function of your email client.
> 
> ------------------------------
> 
> Message: 2
> Date: Thu, 23 Sep 2004 12:35:30 +0200
> From: Lars Ellenberg <Lars.Ellenberg at linbit.com>
> Subject: Re: [DRBD-user] How to simulate disk failure
> To: drbd-user at lists.linbit.com
> Message-ID: <Am4ls6mGgx0TTs0Ao6x8Edg=lge at web.de>
> Content-Type: text/plain; charset=us-ascii
> 
> / 2004-09-23 14:52:00 +0800
> \ Seki Lau:
> > Dear All,
> > As I asked a couple of hours ago, I have simulated network failure and
> > power failure for my HA file server. My last and major concern is how
> > to simulate a disk failure. The system has 5 disks (not RAIDed, 4
> > completely shared and 1 partially shared). I want to confirm that, if
> > any one of them fails, the failover will occur and the secondary will
> > become the primary.
> >
> > I have 2 ideas in mind for testing this: 1) I unplug the IDE cable to
> > simulate the failure; 2) I remove dm_mod.
> > The first solution may not be good because I can't estimate the
> > consequences for the hardware.
> > For the second solution, I got the idea from the CTH, which says it
> > uses iptables to simulate network failure and DM to simulate disk
> > failure. I don't really get the exact meaning of "DM". I guess it is
> > dm_mod. I checked, and dm_mod is something related to the LVM2 disk
> > mapper. I am not sure what it is about.
> >
> > So, can any one of you shed some light on simulating disk failure?
> 
> it is "device mapper", and it is used by lvm2, among other things. it
> knows about several "targets", one of them "linear" (so it is just a
> linear remapping of block numbers from the virtual to the real device),
> one of them is "error", which does what you'd expect.
> man dmsetup ...
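
A minimal sketch of how the "linear" and "error" targets described above could be used to fake a dying disk, assuming a scratch backing device you can afford to lose. The names faultdev and /dev/loop0 and all four helper functions are hypothetical; the dmsetup and blockdev calls need root and are only defined here, not run.

```shell
#!/bin/sh
# Sketch only: faultdev and /dev/loop0 are hypothetical placeholders.

# Table line that maps the whole device 1:1 onto a backing device.
linear_table() {  # $1 = size in 512-byte sectors, $2 = backing device
    echo "0 $1 linear $2 0"
}

# Table line that fails every I/O on the device.
error_table() {   # $1 = size in 512-byte sectors
    echo "0 $1 error"
}

# Create the mapped device on top of a scratch backing device (needs root).
create_faultdev() {  # $1 = dm name, $2 = backing device
    size=$(blockdev --getsz "$2") || return 1
    dmsetup create "$1" --table "$(linear_table "$size" "$2")"
}

# Swap the live table for the error target to simulate the disk failing.
break_faultdev() {   # $1 = dm name, $2 = size in sectors
    dmsetup suspend "$1" &&
    dmsetup load "$1" --table "$(error_table "$2")" &&
    dmsetup resume "$1"
}
```

Usage would be along the lines of "create_faultdev faultdev /dev/loop0", pointing the drbd "disk" setting at /dev/mapper/faultdev, and later calling break_faultdev with the same size to make every request fail.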
> 
> interesting drbd config options are "on-io-error" ...
> I'd recommend to have "on-io-error detach;", and some monitor
> script reading syslog, and doing a graceful failover
> (hbstandby/heartbeat stop) in case related messages show up.
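
A rough sketch of such a monitor, under stated assumptions: PATTERN is only a guess at what the kernel might log and has to be replaced with the text that really shows up during a test I/O error, and FAILOVER_CMD is a dry-run placeholder for whatever heartbeat command the setup uses. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only. PATTERN is an assumption about the log text; adjust it to
# the messages actually logged when a test I/O error occurs.
PATTERN=${PATTERN:-'drbd.*(I/O error|[Dd]etach)'}
# FAILOVER_CMD is a placeholder; a real setup would call the heartbeat
# init script or hb_standby, whose paths vary by distribution.
FAILOVER_CMD=${FAILOVER_CMD:-'echo would-fail-over'}

# Succeed if a log line looks like a drbd disk error.
is_io_error_line() {
    printf '%s\n' "$1" | grep -Eq "$PATTERN"
}

# Read log lines on stdin; run the failover command once on the first match.
watch_log() {
    while IFS= read -r line; do
        if is_io_error_line "$line"; then
            $FAILOVER_CMD
            return 0
        fi
    done
    return 1
}
```

It could be fed with something like "tail -F /var/log/messages | watch_log". Keeping the default FAILOVER_CMD as an echo makes it possible to check the pattern against real log lines before wiring in an actual failover.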
> 
>        Lars Ellenberg
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Thu, 23 Sep 2004 09:01:50 -0300
> From: "crsurf" <crsurf at terra.com.br>
> Subject: [DRBD-user] drbd tranfer limitations (random and sequential
>        acccess) using raw
> To: "drbd-user" <drbd-user at lists.linbit.com>
> Message-ID: <I4HTF2$A18787FCDDCAEBD90F4AEFECEC1B8875 at terra.com.br>
> Content-Type: text/plain; charset="iso-8859-1"
> 
> Hello list
> I'm using drbd on a Gigabit network (crossover cable). When I synchronize data (sequential read/write), the transfer rate reaches 30~40 Mb/s, but when I make inserts into a database (random read/write) that accesses the drbd device via raw, the transfer rate reaches only 5~6 Mb/s. Is there some way to improve this performance?
> I increased the sndbuf-size to 256K, but it had no effect. I increased the interface MTU to 9000, and that had no effect either.
> One database process that runs in 4 min. without replication runs in 30 min. with replication.
> Now I'm trying the jfs filesystem instead of raw to see if the performance improves.
> Maybe sync-nice can help us?
> Here is my drbd.conf:
> [root at SRVSYB02 nagios]# cat /etc/drbd.conf
> # Sybase Master
>       resource drbd0 {
>         protocol=C
>         fsckcmd=/bin/true
>         #skip-wait
>         load-only
>         inittimeout=30
>         disk {
>           do-panic
>           disk-size = 63461
>         }
>         net {
>           skip-sync
>           sync-min    = 599999
>           sync-max    = 600000 # maximal average syncer bandwidth
>           tl-size     = 5000   # transfer log size, ensures strict write ordering
>           timeout     = 60     # 0.1 seconds
>           connect-int = 10     # seconds
>           ping-int    = 10     # seconds
>           sndbuf-size = 262144 #
>         }
>         on SRVSYB01 {
>           device=/dev/nb0
>           disk=/dev/vg01/syb.master
>           address=172.22.2.4
>           port=7789
>         }
>         on SRVSYB02 {
>           device=/dev/nb0
>           disk=/dev/vg01/syb.master
>           address=172.22.2.5
>           port=7789
>         }
>       }
> # Sybase Sybsystemprocs
>       resource drbd1 {
>         protocol=C
>         fsckcmd=/bin/true
>         #skip-wait
>         load-only
>         inittimeout=30
>         disk {
>           do-panic
>           disk-size = 206269
>         }
>         net {
>           skip-sync
>           sync-min    = 599999
>           sync-max    = 600000 # maximal average syncer bandwidth
>           tl-size     = 5000   # transfer log size, ensures strict write ordering
>           timeout     = 60     # 0.1 seconds
>           connect-int = 10     # seconds
>           ping-int    = 10     # seconds
>           sndbuf-size = 262144 #
>         }
>         on SRVSYB01 {
>           device=/dev/nb1
>           disk=/dev/vg01/syb.sybprocs
>           address=172.22.2.4
>           port=7790
>         }
>         on SRVSYB02 {
>           device=/dev/nb1
>           disk=/dev/vg01/syb.sybprocs
>           address=172.22.2.5
>           port=7790
>         }
>       }
> # Sybase Banco01
>       resource drbd2 {
>         protocol=C
>         fsckcmd=/bin/true
>         #skip-wait
>         load-only
>         inittimeout=30
>         disk {
>           do-panic
>           disk-size = 33555244
>         }
>         net {
>           skip-sync
>           sync-min    = 599999
>           sync-max    = 600000 # maximal average syncer bandwidth
>           tl-size     = 5000   # transfer log size, ensures strict write ordering
>           timeout     = 60     # 0.1 seconds
>           connect-int = 10     # seconds
>           ping-int    = 10     # seconds
>           sndbuf-size = 262144 #
>         }
>         on SRVSYB01 {
>           device=/dev/nb2
>           disk=/dev/vg01/syb.bco01
>           address=172.22.2.4
>           port=7791
>         }
>         on SRVSYB02 {
>           device=/dev/nb2
>           disk=/dev/vg01/syb.bco01
>           address=172.22.2.5
>           port=7791
>         }
>       }
> # Sybase Log01
>       resource drbd3 {
>         protocol=C
>         fsckcmd=/bin/true
>         #skip-wait
>         load-only
>         inittimeout=30
>         disk {
>           do-panic
>           disk-size = 8389856
>         }
>         net {
>           skip-sync
>           sync-min    = 599999
>           sync-max    = 600000 # maximal average syncer bandwidth
>           tl-size     = 5000   # transfer log size, ensures strict write ordering
>           timeout     = 60     # 0.1 seconds
>           connect-int = 10     # seconds
>           ping-int    = 10     # seconds
>           sndbuf-size = 262144 #
>         }
>         on SRVSYB01 {
>           device=/dev/nb3
>           disk=/dev/vg01/syb.log01
>           address=172.22.2.4
>           port=7792
>         }
>         on SRVSYB02 {
>           device=/dev/nb3
>           disk=/dev/vg01/syb.log01
>           address=172.22.2.5
>           port=7792
>         }
>       }
> 
> Grateful
> Cristiano da Costa
> 
> ------------------------------
> 
> Message: 4
> Date: Thu, 23 Sep 2004 14:26:04 +0200
> From: Lars Ellenberg <Lars.Ellenberg at linbit.com>
> Subject: Re: [DRBD-user] drbd tranfer limitations (random and
>        sequential      acccess) using raw
> To: drbd-user <drbd-user at lists.linbit.com>
> Message-ID: <ZeTnUDGdNGOjGjvw4RtsrKc=lge at web.de>
> Content-Type: text/plain; charset=us-ascii
> 
> / 2004-09-23 09:01:50 -0300
> \ crsurf:
> > Hello list
> > I'm using drbd on a Gigabit network (crossover cable). When I
> > synchronize data (sequential read/write), the transfer rate reaches
> > 30~40 Mb/s, but when I make inserts into a database (random
> > read/write) that accesses the drbd device via raw, the transfer rate
> > reaches only 5~6 Mb/s. Is there some way to improve this performance?
> > I increased the sndbuf-size to 256K, but it had no effect. I increased
> > the interface MTU to 9000, and that had no effect either. One database
> > process that runs in 4 min. without replication runs in 30 min. with
> > replication. Now I'm trying the jfs filesystem instead of raw to see
> > if the performance improves.
> > Maybe sync-nice can help us?
> 
> I am not sure which transfer rate you are talking about.
> the throughput of your applications, or the resync throughput?
> 
> or the resync throughput while you have resynchronization and
> applications running concurrently?
> 
> and when you talk about sync-nice, you are using drbd 0.6 ?
> 
> maybe you want to use 0.7 instead. it reduces the amount of data
> transferred to a minimum, thus reducing the time for resynchronization to
> some one to three minutes, typically.
> 
>        lge
> 
> ------------------------------
> 
> Message: 5
> Date: Thu, 23 Sep 2004 14:31:43 +0200
> From: Philipp Reisner <philipp.reisner at linbit.com>
> Subject: Re: [DRBD-user] "syncer" crash when doing full resync
> To: drbd-user at lists.linbit.com
> Message-ID: <200409231431.43092.philipp.reisner at linbit.com>
> Content-Type: text/plain;  charset="iso-8859-15"
> 
> On Wednesday 22 September 2004 23:26, Kohari, Moiz wrote:
> > Folks,
> >
> > I am seeing a drbdd oops almost identical in nature to the one below; it
> > certainly looks like memory corruption.  It is happening in the same spot
> > (drbd_end_req()).
> >
> > Because this is happening somewhat consistently within the drbd subsystem
> > and nowhere else, I wonder if the corruption is coming from within drbd.
> > Was this issue ever resolved?
> >
> > I am using drbd version 0.6.12; has anyone seen this problem with newer
> > versions of drbd?
> >
> 
> The thing is, the corrupt memory bank is in the secondary, but the
> primary crashes !!!
> 
> We have a fix for this on the DRBD-0.8 roadmap.
> 
> -Philipp
> --
> : Dipl-Ing Philipp Reisner                      Tel +43-1-8178292-50 :
> : LINBIT Information Technologies GmbH          Fax +43-1-8178292-82 :
> : Schönbrunnerstr 244, 1120 Vienna, Austria    http://www.linbit.com :
> 