[DRBD-user] Testing local-io-error handler -- blkid hangs and ties up drbd device

Thu Apr 12 15:40:46 CEST 2012

Thanks Lars, dmesg indeed reported the exit code of 0:

[  332.733554] block drbd575: role( Secondary -> Primary )
[  332.772827] block drbd575: disk( UpToDate -> Failed )
[  332.772840] block drbd575: Local IO failed in __req_mod. Detaching...
[  332.772925] block drbd575: helper command: /sbin/drbdadm local-io-error minor-575
[  332.790163] block drbd575: helper command: /sbin/drbdadm local-io-error minor-575 exit code 0 (0x0)
[  332.790189] block drbd575: disk( Failed -> Diskless )
[  332.803862] block drbd575: receiver updated UUIDs to effective data uuid: 2B81D15C3E0ADD80
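
To check for that "exit code" line without reading the whole log by eye,
something like the following could be used. It is only a sketch: the regular
expression is modelled on the two helper lines above and assumes other DRBD
versions print them in the same layout, and the function name helper_exits
is made up for illustration.

```python
import re

# Matches completed helper calls, e.g.
#   block drbd575: helper command: /sbin/drbdadm local-io-error minor-575 exit code 0 (0x0)
# The line printed when the helper is started has no "exit code" part
# and is therefore skipped.
HELPER_EXIT = re.compile(
    r"block (?P<dev>drbd\d+): helper command: (?P<cmd>\S+) (?P<event>\S+) "
    r"(?P<target>\S+) exit code (?P<code>\d+)"
)


def helper_exits(log_text):
    """Return (device, event, exit_code) for every completed helper call."""
    results = []
    for line in log_text.splitlines():
        match = HELPER_EXIT.search(line)
        if match:
            results.append(
                (match.group("dev"), match.group("event"), int(match.group("code")))
            )
    return results


# Sample input: the two helper lines from the log above.
SAMPLE = """\
[  332.772925] block drbd575: helper command: /sbin/drbdadm local-io-error minor-575
[  332.790163] block drbd575: helper command: /sbin/drbdadm local-io-error minor-575 exit code 0 (0x0)
"""

if __name__ == "__main__":
    for dev, event, code in helper_exits(SAMPLE):
        print(dev, event, code)
```

On a live system the text would come from dmesg rather than from SAMPLE. A
helper that was started but never shows up in the result would point at a
handler that is still running.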

The peer node is also locked up; every operation on it reports:

r575: State change failed: (-10) State change was refused by peer node

One question about 8.3.latest: one of the reasons I wanted to use 8.4 was the
support for more minor numbers. I don't necessarily need more than 256
devices on one machine, but my numbering scheme works best if I can assign
minor numbers greater than 255. Is there a quick hack somewhere in the
source that would raise this limit, or was this a more complex change made
for 8.4?

I was also interested in the prefer-remote read balancing method, but it is
not strictly necessary.
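
For context, here is a rough sketch of the kind of handler described in the
quoted message below (record the failure in a database, then let DRBD carry
on with the default detach). It is not the test handler that produced the
log above. Assumptions: the environment variable names DRBD_RESOURCE and
DRBD_MINOR should be checked against the drbd.conf man page for the version
in use, a local sqlite3 file stands in for the remote DB, and the path
/var/tmp/drbd-io-errors.db and the table name io_errors are hypothetical.

```python
#!/usr/bin/env python3
import os
import sqlite3
import sys
import time

# Hypothetical location; a real setup would talk to the remote DB instead.
DB_PATH = "/var/tmp/drbd-io-errors.db"


def record_io_error(db_path, resource, minor):
    """Insert one row for this failure and return the total row count."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS io_errors "
            "(ts REAL, resource TEXT, minor TEXT)"
        )
        conn.execute(
            "INSERT INTO io_errors VALUES (?, ?, ?)",
            (time.time(), resource, minor),
        )
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM io_errors").fetchone()[0]
    finally:
        conn.close()


def main():
    # Assumed variable names; fall back to "unknown" if they are not set.
    resource = os.environ.get("DRBD_RESOURCE", "unknown")
    minor = os.environ.get("DRBD_MINOR", "unknown")
    try:
        record_io_error(DB_PATH, resource, minor)
    except Exception as exc:
        # A logging failure is reported but does not abort the handler.
        print("could not record io error: %s" % exc, file=sys.stderr)


if __name__ == "__main__":
    main()
```

The script always ends with exit status 0, the same status the test handler
returned in the log above, and it keeps its work short so the helper returns
quickly.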

Thanks,

Chris

On Thu, Apr 12, 2012 at 9:24 AM, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:

> On Thu, Apr 12, 2012 at 09:14:38AM -0400, Chris Dickson wrote:
> > Thanks for the quick reply,
> >
> > My test handler currently isn't doing anything interesting; I just had it
> > echo 'hello world' to a file located on a different drive than the LVM
> > volume. The echo seems to have completed successfully, as the file was
> > written.
> >
> > The end goal for the handler is simply to insert a row into a remote DB.
> > Other than that, the default behavior of detaching on io-error is exactly
> > what I want.
> >
> > I just tried filtering out drbd in lvm.conf, and that doesn't seem to be
> > the issue. After another try, a quick ps auxf showed this:
> >
> > root       340  0.0  0.0  21392  1284 ?        Ss   12:59   0:00 udevd --daemon
> > root       415  0.0  0.0  21384   896 ?        S    12:59   0:00  \_ udevd --daemon
> > root      1775  0.0  0.0   8448   724 ?        D    13:04   0:00  |   \_ /sbin/blkid -o udev -p /dev/drbd575
> >
> > So it seems that udev is initiating the blkid call. Could it be doing
> > this before drbd has finished executing the handler?
>
> If the handler finished (drbd prints "... helper command .... exit code ..."
> to the kernel log), there is no reason for anything to hang.
>
> DRBD is supposed to retry failed local requests on the peer, and if that
> is not possible (no connection, or no good remote disk either), either
> freeze IO (if so configured) or report IO errors back up the stack.
>
> "Supposed to just work".
>
> Maybe rather downgrade to 8.3.latest; I know we fixed some issues
> in the retry logic on the way to 8.4.not-yet-but-"soon"-to-be-released.2
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com