[DRBD-user] drbd freeze / "out of range" message

Mon Nov 7 22:36:12 CET 2005

/ 2005-11-07 18:30:53 +0100
\ Markus Marquardt:
> Hi,
> 
> we encounter the same "freeze" problem with our dell PE2850 servers
> (2x Xeon 3 GHz, 4 GB ram, Perc Raid 4/Di (Megaraid), latest FC4 kernel
> 2.6.13, drbd-0.7.13, latest BIOS & raid firmware) as some other people
> here.

DRBD 0.7.13 contains a bug (introduced in 0.7.12, while fixing another,
completely harmless buglet) that triggers on SMP only, and will
freeze (deadlock) a SyncSource Primary if it is busy io-wise.

this is fixed in 0.7.14.
you should have received an announcement via the drbd-announce list.
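
to check quickly whether a node runs one of the two affected releases,
something like the following sketch may help. the function names are
made up for illustration, and it assumes that the first line of
/proc/drbd has the form "version: 0.7.13 (api:...)"; adjust it if yours
looks different.

```shell
#!/bin/sh
# Hypothetical helpers, not part of DRBD.

# Print the version of the loaded DRBD module.
# Assumption: the first line of the status file looks like
#   version: 0.7.13 (api:77/proto:74)
running_drbd_version() {
    sed -n '1s/^version: *\([^ ]*\).*/\1/p' "${1:-/proc/drbd}"
}

# Succeed (exit 0) if the given version is one of the two releases
# that carry the SMP deadlock described above.
drbd_version_affected() {
    case "$1" in
        0.7.12|0.7.13) return 0 ;;
        *)             return 1 ;;
    esac
}
```

usage would be along the lines of
  drbd_version_affected "$(running_drbd_version)" && echo "upgrade to 0.7.14"
keep in mind that the bug triggers on SMP only, which this sketch does
not check.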

> We can reproduce this by:
> 
> o On secondary: drbdadm invalidate drbd0
> o On primary: cat /dev/zero > /drbd/mybigfile

exactly.

> Another oddity after we've done some repartitioning of the underlying disk partitions:
> 
> When stopping heartbeat on a cluster node (which implies a /etc/ha.d/resource.d/drbddisk stop) some messages like
> 
> ... drbd_md_sync_page_io ... out of range ...
> 
> appear on the console. What is this?

well. that is another issue.
do you use "external" or "internal" drbd meta data?
how exactly did you "repartition"?

can you provide
  cat /proc/partitions
  drbdadm dump
  drbdsetup /dev/drbd0 show # (or which device shows it)
and the _complete_ line from syslog (i.e. including the ugly numbers)?
if there are many such lines (probably with consecutive numbers),
just the first and the last one will do.
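
if it is easier, the requested information could be gathered in one go
with a small script along these lines. this is only a sketch: the
function names are made up, and /var/log/messages as the syslog file is
an assumption (use whatever file your syslog writes kernel messages to).

```shell
#!/bin/sh
# Hypothetical helpers for collecting the requested information.

# Print the first and the last line of FILE that match PATTERN.
# If only one line matches, it is printed once.
first_last_match() {
    grep -- "$1" "$2" |
        awk 'NR == 1 { print } { last = $0 } END { if (NR > 1) print last }'
}

# Print everything asked for above.
# $1: DRBD device (default /dev/drbd0)
# $2: syslog file (default /var/log/messages, an assumption)
collect_drbd_report() {
    dev=${1:-/dev/drbd0}
    log=${2:-/var/log/messages}
    echo "== /proc/partitions";   cat /proc/partitions
    echo "== drbdadm dump";       drbdadm dump
    echo "== drbdsetup $dev show"; drbdsetup "$dev" show
    echo "== syslog, first and last matching line"
    first_last_match 'drbd_md_sync_page_io' "$log"
}
```

run it as root, e.g.
  collect_drbd_report /dev/drbd0 > drbd-report.txt
and send the resulting file to the list.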

thanks,

-- 
: Lars Ellenberg                                  Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH            Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe   http://www.linbit.com :