[DRBD-user] Testing local-io-error handler -- blkid hangs and ties up drbd device

Thu Apr 12 17:18:23 CEST 2012

A little more info:

If I set the node with the good disk to Primary and then write 100MB to
the DRBD volume, the node with the bad disk calls my handler
successfully, detaches, and does not hang. It only seems to hang when I
switch the node with the bad disk to Primary.

On Thu, Apr 12, 2012 at 9:40 AM, Chris Dickson <chrisd1100 at gmail.com> wrote:

> Thanks, Lars. dmesg did report an exit code of 0:
>
> [  332.733554] block drbd575: role( Secondary -> Primary )
> [  332.772827] block drbd575: disk( UpToDate -> Failed )
> [  332.772840] block drbd575: Local IO failed in __req_mod. Detaching...
> [  332.772925] block drbd575: helper command: /sbin/drbdadm local-io-error minor-575
> [  332.790163] block drbd575: helper command: /sbin/drbdadm local-io-error minor-575 exit code 0 (0x0)
> [  332.790189] block drbd575: disk( Failed -> Diskless )
> [  332.803862] block drbd575: receiver updated UUIDs to effective data uuid: 2B81D15C3E0ADD80
>
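The "helper command ... exit code ..." line is how you confirm that the handler actually returned. The following is a small sketch that scans saved kernel log text (for example `dmesg` output) for those lines and for DRBD state transitions, so repeated test runs can be checked the same way. The regular expressions are assumptions based only on the message format quoted in this thread, not a documented log format; other DRBD versions may word these messages differently. The sample is the first six lines of the log above.

```python
import re

# Kernel log excerpt from this thread (wrapped lines rejoined).
SAMPLE_LOG = """\
[  332.733554] block drbd575: role( Secondary -> Primary )
[  332.772827] block drbd575: disk( UpToDate -> Failed )
[  332.772840] block drbd575: Local IO failed in __req_mod. Detaching...
[  332.772925] block drbd575: helper command: /sbin/drbdadm local-io-error minor-575
[  332.790163] block drbd575: helper command: /sbin/drbdadm local-io-error minor-575 exit code 0 (0x0)
[  332.790189] block drbd575: disk( Failed -> Diskless )
"""

# "block drbdN: " prefix seen on the DRBD kernel messages above.
PREFIX = re.compile(r"block drbd(\d+): ")
# A finished helper: "helper command: <cmd> <helper> minor-N exit code <rc>".
HELPER_EXIT = re.compile(r"helper command: (\S+) (\S+) minor-\d+ exit code (\d+)")
# A state change such as "disk( UpToDate -> Failed )"; one line may hold several.
STATE_CHANGE = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")


def helper_exit_codes(log_text):
    """Return (minor, helper, exit_code) for every helper that reported an exit code."""
    results = []
    for line in log_text.splitlines():
        prefix = PREFIX.search(line)
        if not prefix:
            continue
        m = HELPER_EXIT.search(line, prefix.end())
        if m:
            results.append((int(prefix.group(1)), m.group(2), int(m.group(3))))
    return results


def state_transitions(log_text):
    """Return (minor, state_name, old, new) for every state change, in log order."""
    results = []
    for line in log_text.splitlines():
        prefix = PREFIX.search(line)
        if not prefix:
            continue
        for m in STATE_CHANGE.finditer(line, prefix.end()):
            results.append((int(prefix.group(1)), m.group(1), m.group(2), m.group(3)))
    return results


if __name__ == "__main__":
    print(helper_exit_codes(SAMPLE_LOG))
    for change in state_transitions(SAMPLE_LOG):
        print(change)
```

A helper that was started but never reported an exit code shows up in the "started" line only, so it is absent from `helper_exit_codes`; comparing the two is a quick way to spot a handler that never returned.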
> The peer node is also locked up; every operation reports:
>
> r575: State change failed: (-10) State change was refused by peer node
>
> One question about 8.3.latest: one of the reasons I wanted to use 8.4 was
> its support for more minor numbers. I don't necessarily need more than 256
> devices on one machine, but my numbering scheme makes it convenient to
> assign minor numbers greater than 255. Is there a quick hack somewhere in
> the source to raise this limit, or was this a more complex change made
> for 8.4?
>
> I was also interested in the prefer-remote read-balancing method, but it
> isn't essential.
>
> Thanks,
>
> Chris
>
> On Thu, Apr 12, 2012 at 9:24 AM, Lars Ellenberg <lars.ellenberg at linbit.com> wrote:
>
>> On Thu, Apr 12, 2012 at 09:14:38AM -0400, Chris Dickson wrote:
>> > Thanks for the quick reply,
>> >
>> > My test handler currently isn't doing anything interesting; I just had it
>> > echo 'hello world' to a file located on a different drive than the
>> > LVM volume. The echo seems to have completed successfully, as the file is
>> > written.
>> >
>> > The end goal for the handler is simply to insert a row into a remote DB.
>> > Other than that, the default behavior on io-error of detaching is exactly
>> > what I would like to have happen.
>> >
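Since the handler only needs to record the event, it can be kept very small. Below is a minimal sketch of such a handler, assuming it is hooked up as the `local-io-error` helper that drbdadm invokes in the log above, and assuming drbdadm passes the resource name and minor to handlers as the environment variables `DRBD_RESOURCE` and `DRBD_MINOR` (treat both names as assumptions and check them against your DRBD version). The log path is hypothetical and stands in for the remote-DB insert. In the kernel log above, the helper's exit is reported just before the disk goes Diskless, so a handler that blocks (for example on an unreachable database) could plausibly delay the detach; the sketch therefore swallows notification errors and always returns promptly.

```python
import json
import os
import sys
import time

# Hypothetical destination, deliberately on a different drive than the
# DRBD backing device. Stands in for the remote-DB insert.
EVENT_LOG = "/var/log/drbd-local-io-error.jsonl"


def build_event(env):
    """Collect what the handler knows about the failure."""
    return {
        "time": int(time.time()),
        "handler": "local-io-error",
        # Assumption: drbdadm exports DRBD_RESOURCE and DRBD_MINOR to its
        # handlers; fall back to "unknown" if they are missing.
        "resource": env.get("DRBD_RESOURCE", "unknown"),
        "minor": env.get("DRBD_MINOR", "unknown"),
    }


def record_event(event, path=EVENT_LOG):
    """Append one JSON line. A real DB insert would go here, ideally with a
    short client-side timeout so the handler cannot block indefinitely."""
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")


def main():
    try:
        record_event(build_event(os.environ))
    except OSError as exc:
        # A failed notification should not keep the handler from returning.
        print(f"local-io-error handler: {exc}", file=sys.stderr)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```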
>> > I just tried filtering out drbd in lvm.conf, and that doesn't seem to be the
>> > issue. After another try, I ran a quick ps auxf and this showed up:
>> >
>> > root       340  0.0  0.0  21392  1284 ?        Ss   12:59   0:00 udevd --daemon
>> > root       415  0.0  0.0  21384   896 ?        S    12:59   0:00  \_ udevd --daemon
>> > root      1775  0.0  0.0   8448   724 ?        D    13:04   0:00  |   \_ /sbin/blkid -o udev -p /dev/drbd575
>> >
>> > So it seems like udev is initiating the blkid call. Could it be doing this
>> > before drbd has finished executing the handler?
>>
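The STAT column in the `ps auxf` output shows `blkid` in state D (uninterruptible sleep), which usually means a process is stuck waiting on IO, here on `/dev/drbd575`. A rough sketch for listing such processes directly from `/proc` after each test run follows; it is Linux-specific and only reads the state field of `/proc/<pid>/stat`, the same state that `ps` reports as `D`.

```python
import os


def parse_stat(stat_line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the first " (" and the last ") " instead of on whitespace.
    """
    pid_part, _, rest = stat_line.partition(" (")
    comm, _, after = rest.rpartition(") ")
    return int(pid_part), comm, after.split()[0]


def uninterruptible_processes(proc="/proc"):
    """Yield (pid, comm) for every process in state D (uninterruptible sleep)."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            yield pid, comm


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```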
>> If the handler finished
>> (drbd prints "... helper command .... exit code ..." to the kernel log),
>> there is no reason for anything to hang.
>>
>> DRBD is supposed to retry failed local requests on the peer, and if that
>> is not possible (no connection, or no good remote disk either), either
>> freeze IO (if so configured) or report IO errors back up the stack.
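Lars's description amounts to a three-way decision. The sketch below encodes it as a toy model; it is a simplification of what he says DRBD is *supposed* to do, not DRBD's implementation, and the flag names are hypothetical (`freeze_io` stands in for whatever configuration enables freezing IO).

```python
from enum import Enum


class Outcome(Enum):
    RETRIED_ON_PEER = "request retried on the peer"
    FROZEN = "IO frozen"
    IO_ERROR = "IO error reported up the stack"


def failed_local_request(connected, peer_disk_good, freeze_io):
    """Simplified model of the expected reaction to a failed local request.

    connected:      hypothetical flag, the replication link is up
    peer_disk_good: hypothetical flag, the peer has a usable disk
    freeze_io:      hypothetical flag, freezing IO has been configured
    """
    if connected and peer_disk_good:
        return Outcome.RETRIED_ON_PEER
    if freeze_io:
        return Outcome.FROZEN
    return Outcome.IO_ERROR


if __name__ == "__main__":
    # Chris's test as described: peer connected and holding the good disk.
    print(failed_local_request(connected=True, peer_disk_good=True, freeze_io=False))
```

In this model, Chris's case should resolve to a retry on the peer. A `blkid` stuck in D state matches none of the three outcomes, which is consistent with Lars's suspicion that the retry logic in this 8.4 build is at fault.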
>>
>> "Supposed to just work".
>>
>> Maybe downgrade to 8.3.latest instead; I know we fixed some issues
>> in the retry logic on the way to 8.4.not-yet-but-"soon"-to-be-released.2
>>
>> --
>> : Lars Ellenberg
>> : LINBIT | Your Way to High Availability
>> : DRBD/HA support and consulting http://www.linbit.com
>> _______________________________________________
>> drbd-user mailing list
>> drbd-user at lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>
>
>