[DRBD-user] DRBD unable to do IO tasks - SLES 10 SP2

Lars Ellenberg lars.ellenberg at linbit.com
Tue Dec 9 16:50:21 CET 2008



On Tue, Dec 09, 2008 at 10:51:51AM -0200, Carlos Eduardo Pedroza Santiviago wrote:
> Hi Florian!
> 
> On Tue, Dec 9, 2008 at 7:54 AM, Florian Haas <florian.haas at linbit.com> wrote:
> > Carlos,
> >
> > you really want to check out the "DRBD fundamentals" chapter in the
> > User's Guide, specifically the section about resource roles at
> > http://www.drbd.org/users-guide/s-resource-roles.html.
> >
> 
> Thanks for pointing the guide, but i already have some knowledge about
> using DRBD. In fact, we've using it for several years! Currently we
> have 20 servers using it. It has always been straight and simple. But
> this one is somewhat different. I'll try to reproduce here what's
> going on:
> 
> chi02b:/dev/dados # cat /proc/drbd
> version: 0.7.22 (api:79/proto:74)
> SVN Revision: 2572 build by lmb at dale, 2006-10-25 18:17:21
>  0: cs:StandAlone st:Secondary/Unknown ld:Inconsistent
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
> chi02b:/dev/dados # drbdsetup /dev/drbd0 primary --do-what-I-say
> ioctl(,SET_STATE,) failed: Input/output error

if some ioctl returns EIO, that does not necessarily mean there was
actually a read/write error.

> Local replica is inconsistent (--do-what-I-say ?)

in this case it probably means that this drbd is diskless.
why it detached/never attached, the kernel logs may know.
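
to see at a glance what state each device is in, the cs:/st:/ld: fields
can be pulled out of /proc/drbd. this is only a minimal sketch: it assumes
the 0.7-style line format shown in the output quoted above, and the
function name drbd_states is made up for illustration.

```shell
# drbd_states: hypothetical helper, reads /proc/drbd-style text on stdin and
# prints one line per minor: "<minor> cs=<...> st=<...> ld=<...>".
# Assumes the 0.7 field names (cs, st, ld) seen in the quoted output;
# fields missing from a line (e.g. an Unconfigured device) print as "-".
drbd_states() {
  awk '
    /^ *[0-9]+: cs:/ {
      minor = $1; sub(/:$/, "", minor)   # "0:" becomes "0"
      cs = "-"; st = "-"; ld = "-"
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^cs:/) cs = substr($i, 4)
        if ($i ~ /^st:/) st = substr($i, 4)
        if ($i ~ /^ld:/) ld = substr($i, 4)
      }
      print minor, "cs=" cs, "st=" st, "ld=" ld
    }
  '
}

# usage: drbd_states < /proc/drbd
```

this only reports what the module says about itself; why the device ended
up in that state still has to come from the kernel logs.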

> chi02b:/dev/dados # rcdrbd stop
> Stopping all DRBD resourcesERROR: Module drbd is in use
> .
> chi02b:/dev/dados # rmmod drbd
> ERROR: Module drbd is in use
> chi02b:/dev/dados # cat /proc/drbd
> version: 0.7.22 (api:79/proto:74)
> SVN Revision: 2572 build by lmb at dale, 2006-10-25 18:17:21
>  0: cs:Unconfigured
> chi02b:/dev/dados # lsmod | grep drbd
> drbd                  218984  1


the big question is,
what happened before.

how did you get into that situation.

what do the kernel logs say since this drbd was last Connected and
happily replicating along?
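
a simple way to collect those log lines is to filter the kernel log for
anything mentioning drbd. a minimal sketch follows; the helper name is
made up, and the log file path in the usage comment is an assumption that
depends on the syslog setup of the box.

```shell
# drbd_kernel_lines: hypothetical helper; prints the lines mentioning drbd
# (case-insensitive) from the given log files, or from stdin if no file
# is given.
drbd_kernel_lines() {
  grep -i 'drbd' "$@"
}

# usage (log path is an assumption, adjust for your syslog setup):
#   drbd_kernel_lines /var/log/messages
#   dmesg | drbd_kernel_lines
```

the interesting part is everything from the last time the device was
Connected up to the first error, so older rotated logs may be needed too.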

and yes, it is absolutely possible that in some older 0.7 drbd,
some failure recovery code paths are buggy.

sorry, but on this list we just have to say "try to reproduce with a
current version of drbd".

> chi02b:/dev/dados # grep disk /etc/drbd.conf
>   disk {
>     disk       /dev/mapper/dados-shared;
>     meta-disk  internal;
>     disk      /dev/mapper/dados-shared;
>     meta-disk internal;
> chi02b:/dev/dados # ls -la /dev/mapper/dados-shared
> brw------- 1 root root 253, 15 Dec  6 16:26 /dev/mapper/dados-shared
> chi02b:/dev/dados # mkfs.ext3 /dev/mapper/dados-shared

irrelevant.

> chi02b:/dev/dados # drbdadm primary all
> ioctl(,SET_STATE,) failed: No such device or address
> Device not configured
> Command 'drbdsetup /dev/drbd0 primary' terminated with exit code 20
> drbdsetup exited with code 20
> chi02b:/dev/dados # rcdrbd start
> Starting DRBD resources:    [ d0 ioctl(,SET_DISK_CONFIG,) failed:
> Device or resource busy
> 
> cmd /sbin/drbdsetup /dev/drbd0 disk /dev/mapper/dados-shared internal
> -1 --on-io-error=panic  failed!
> chi02b:/dev/dados # rcdrbd stop
> Stopping all DRBD resourcesERROR: Module drbd is in use
> .


> So, at this point, i have some questions:
> 
> 1. if the backend storage is OK, i can write to it, why drbd is
> complaining about "device busy"?

it possibly complains about the drbd0 being busy.
I don't know.
it appears to be pretty much screwed up.

reboot the box and try again.

> 2. Why the drbd module cannot be unloaded, since there isn't resources
> being used?

again, guesswork. maybe because some error recovery code path in your
drbd version is buggy and forgot to decrease the module usage count.
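
the usage count in question is the third column of the lsmod output
quoted above (the "1" after the module size), which lsmod takes from
/proc/modules. a minimal sketch for reading it, with a made-up helper
name:

```shell
# module_refcount: hypothetical helper; reads /proc/modules-style text on
# stdin and prints the use count (third column) of the module named in $1.
# Prints nothing if the module is not loaded.
module_refcount() {
  awk -v mod="$1" '$1 == mod { print $3 }'
}

# usage: module_refcount drbd < /proc/modules
```

a non-zero count with no other module listed as a user only tells you
that something holds a reference, not what holds it.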

> 3. Is there any way to discover what is using the drbd module,
> preventing it from being rmmoded?

1. kernel logs

2. crashdump and forensic analysis.

3. if you feel comfortable with attaching gdb to a running kernel,
   and your kernel allows that, you could poke around there.
   chances are that you will crash the box anyway, though.

4. find a way to reliably reproduce the situation, preferably quickly,
   and do so in a well prepared test setup, possibly sprinkling printks
   all over the code.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


