Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
On Tue, May 13, 2008 at 10:58:29AM +0200, Iustin Pop wrote:
> On Tue, May 13, 2008 at 10:50:22AM +0200, Lars Ellenberg wrote:
> > On Sat, May 10, 2008 at 12:28:00PM +0200, Iustin Pop wrote:
> > > Philipp Reisner wrote:
> > > > On Sunday, 4 May 2008 02:19:12, Wolfgang Denk wrote:
> > > > > Hi,
> > > > >
> > > > > I'm trying to run DRBD on top of a LV, and get flooded with above
> > > > > error messages. I know this has been discussed before, see threads
> > > > > starting at
> > > > > http://lists.linbit.com/pipermail/drbd-user/2008-February/008665.html
> > > > > and
> > > > > http://lists.linbit.com/pipermail/drbd-user/2008-February/008519.html
> > > > >
> > > > > When this was discussed in February, it sounded (at least to me) as if
> > > > > a fix was on the way, see
> > > > > http://lists.linbit.com/pipermail/drbd-user/2008-February/008692.html
> > > > >
> > > > > However, even top of tree from the git repo still shows the same
> > > > > behaviour.
> > > > >
> > > > > Am I missing something, or is this usage mode so exotic that nobody
> > > > > cares?
> > > > >
> > > >
> > > > Hi Wolfgang,
> > > >
> > > > That is actually a kernel bug, I think in 2.6.24. Was fixed later, do not
> > > > know by heart with which "sucker" release. I guess it is fixed in 2.6.25.
> > > >
> > > > Starting with 8.0.12 we offer a workaround for this in DRBD (and 8.2.6
> > > > when I finally find the time to finish it):
> > > >
> > > > Add no-disk-flushes and no-md-flushes to your disk config.
> > >
> > > Because this happens not only with LVM, but with any I/O subsystem that
> > > returns wrong error codes from flushes (e.g. broken scsi drivers or
> > > controller, I think), would it be a sane thing to disable barriers
> > > automatically after a certain number of errors?
> > >
> > > (Looking at the barrier flush code I see that only the drbd_receiver.c
> > > has code for auto-disabling in case of EOPNOTSUPP, but drbd_actlog and
> > > drbd_bitmap.c don't; maybe these too should have this).
> >
> > hm?
> > I think we do have a retry-and-disable-barriers in those places too.
>
> I must be wrong then; I'm looking at the drbd 8.0 git tree, and I see in
> drbd_bitmap.c:
>
>     if (rw == WRITE) {
>         /* swap back endianness */
>         bm_lel_to_cpu(b);
>         /* flush bitmap to stable storage */
>         if (!test_bit(MD_NO_BARRIER,&mdev->flags))
>             blkdev_issue_flush(mdev->bc->md_bdev, NULL);
>
> (around line 745). This just issues the flush, and no retry/disable in place
> (it uses the same blkdev_issue_flush as drbd_receiver.c, and there's no check
> of the return value).
>
> What am I missing here? Wrong git tree?

grep for set_bit MD_NO_BARRIER

> > > The reason I propose this is because with many deployments on different
> > > machines it would be better to leave it always enabled at startup and
> > > allow it to autodisable if it sees EOPNOTSUPP
> >
> > that is the way we do it.
> >
> > > or too many other errors.
> >
> > and that is what we don't.
>
> Would it make sense to do it if no blkdev_issue_flush is ever successful?
>
> > > And people can't always track latest upstream kernel...
> >
> > if they are stuck with a kernel where DRBD spits out too much
> > noise due to barrier requests throwing IO errors,
> > then they have to disable use of barriers in the drbd config.
>
> Ok, let me explain some more. If you have deployments on the order of hundreds
> of machines, with various types of controllers, it would be easier to let the
> config always have barriers enabled and rely on auto-disable if *no single
> flush is ever successful*.

buy a support contract,
have a script parse log files and auto-adjust them,
or send a patch.

-- 
: Lars Ellenberg                            http://www.linbit.com :
: DRBD/HA support and consulting             sales at linbit.com :
: LINBIT Information Technologies GmbH      Tel +43-1-8178292-0  :
: Vivenotgasse 48, A-1120 Vienna/Europe    Fax +43-1-8178292-82  :
__
please don't Cc me, but send to list -- I'm subscribed
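
For readers following the thread, the two auto-disable policies under
discussion can be contrasted in a small user-space sketch. This is not DRBD
code: the struct, the function name flush_policy_record, and the threshold
FLUSH_FAIL_LIMIT are all hypothetical, and the barriers_disabled flag merely
stands in for the MD_NO_BARRIER bit mentioned above. The first branch models
the behaviour the thread says already exists (disable on EOPNOTSUPP); the
second models the proposed extension (disable if no flush has ever succeeded
after some number of errors).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is taken from DRBD. */
#define FLUSH_FAIL_LIMIT 8      /* assumed threshold, not from the thread */

struct flush_policy {
    bool barriers_disabled;     /* stands in for the MD_NO_BARRIER bit */
    unsigned long successes;    /* flushes that returned 0 */
    unsigned long failures;     /* flushes that returned an error */
};

/*
 * Record the result of one flush (0 or a negative errno value) and decide
 * whether to stop issuing flushes.
 * Returns true if barriers are disabled after this call.
 */
static bool flush_policy_record(struct flush_policy *p, int err)
{
    assert(p != NULL);

    if (p->barriers_disabled)
        return true;            /* already off, nothing more to decide */

    if (err == 0) {
        p->successes++;
        return false;
    }

    p->failures++;

    if (err == -EOPNOTSUPP) {
        /* Behaviour described in the thread: disable on EOPNOTSUPP. */
        p->barriers_disabled = true;
    } else if (p->successes == 0 && p->failures >= FLUSH_FAIL_LIMIT) {
        /* Proposed extension: give up if no flush ever succeeded. */
        p->barriers_disabled = true;
    }

    return p->barriers_disabled;
}
```

In this sketch a device that has completed even one flush successfully is
never auto-disabled by other errors such as -EIO, so transient I/O errors on
a working device do not silently turn barriers off.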