[DRBD-user] drbd on virtio: WARNING: at block/blk-core.c

Tue Nov 9 14:10:16 CET 2010

On Mon, Nov 08, 2010 at 01:56:43PM +0100, Thomas Vögtle wrote:
> Hello,
> 
> 
> For testing purposes only I test our software and drbd stuff on two
> Virtual Machines (kvm, virtio-net, virtio-blk)
> I'm using Kernel 2.6.32.25.
> 
> Since using drbd-8.3.9 I get following messages (or similar), again and
> again, when DRBD is starting to sync:
> 
> 
> [ 3830.713476] block drbd0: Began resync as SyncSource (will sync
> 7814892 KB [1953723 bits set]).
> [ 3829.057557] block drbd0: helper command: /sbin/drbdadm
> before-resync-target minor-0
> [ 3830.739016] ------------[ cut here ]------------
> [ 3830.739143] WARNING: at block/blk-core.c:337 blk_start_queue+0x29/0x42()

void blk_start_queue(struct request_queue *q)
{
	WARN_ON(!irqs_disabled());			<=== there

	queue_flag_clear(QUEUE_FLAG_STOPPED, q);
	__blk_run_queue(q);
}

> [ 3830.739145] Hardware name: Bochs
> [ 3830.739147] Modules linked in: ocfs2 jbd2 ocfs2_nodemanager
> ocfs2_stack_user ocfs2_stackglue dlm bonding dummy drbd cn 8021q garp
> bridge stp llc rpcsec_gss_krb5 nfsd exportfs nfs lockd fscache nfs_acl
> auth_rpcgss sunrpc xt_NOTRACK xt_TCPMSS xt_connmark xt_conntrack
> xt_CONNMARK xt_state xt_policy iptable_nat nf_nat_tftp nf_conntrack_tftp
> nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre
> nf_nat_irc nf_conntrack_irc nf_nat_sip nf_conntrack_sip nf_nat_ftp
> nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack
> autofs4 xfrm_user ipmi_devintf ipmi_msghandler 8139too lcd_module ppdev
> parport_pc parport st tpm_tis virtio_net tpm tpm_bios virtio_balloon
> i2c_piix4 rtc_cmos i2c_core rtc_core rtc_lib evdev button sg [last
> unloaded: ocfs2_stackglue]
> [ 3830.739351] Pid: 22400, comm: path_id Not tainted 2.6.32.25 #1
> [ 3830.739353] Call Trace:
> [ 3830.739355]  <IRQ>  [<ffffffff81183265>] ? blk_start_queue+0x29/0x42
> [ 3830.739416]  [<ffffffff8104ccdb>] warn_slowpath_common+0x77/0x8f
> [ 3830.739420]  [<ffffffff8104cd02>] warn_slowpath_null+0xf/0x11

> [ 3830.739422]  [<ffffffff81183265>] blk_start_queue+0x29/0x42
> [ 3830.739475]  [<ffffffff81239462>] blk_done+0xe0/0xfa

static void blk_done(struct virtqueue *vq)
{
        struct virtio_blk *vblk = vq->vdev->priv;
        struct virtblk_req *vbr;
        unsigned int len;
        unsigned long flags;

        spin_lock_irqsave(&vblk->lock, flags);
        while ((vbr = vblk->vq->vq_ops->get_buf(vblk->vq, &len)) != NULL) {
                int error;

                switch (vbr->status) {
                case VIRTIO_BLK_S_OK:
                        error = 0;
                        break;
                case VIRTIO_BLK_S_UNSUPP:
                        error = -ENOTTY;
                        break;
                default:
                        error = -EIO;
                        break;
                }

                if (blk_pc_request(vbr->req)) {
                        vbr->req->resid_len = vbr->in_hdr.residual;
                        vbr->req->sense_len = vbr->in_hdr.sense_len;
                        vbr->req->errors = vbr->in_hdr.errors;
                }

                __blk_end_request_all(vbr->req, error);
                list_del(&vbr->list);
                mempool_free(vbr, vblk->pool);
        }
        /* In case queue is stopped waiting for more buffers. */
        blk_start_queue(vblk->disk->queue);			<<<==== THERE
        spin_unlock_irqrestore(&vblk->lock, flags);
}

If your kernel source looks like mine, then this indicates that something
between spin_lock_irqsave and spin_unlock_irqrestore above enables
interrupts again, where it must not.
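
To illustrate how that can happen, here is a minimal userspace sketch. It is
not kernel code and not taken from DRBD: every name in it is hypothetical, and
a single boolean stands in for the CPU interrupt-enable state. It assumes the
offender is a callback, run from inside the irqsave region, that uses an
unconditional disable/enable pair (the pattern of spin_lock_irq and
spin_unlock_irq) instead of saving and restoring the previous state.

```c
#include <stdbool.h>

/* Userspace model only: one flag stands in for "interrupts enabled". */
static bool irqs_enabled = true;
static int warnings;

/* Models the irqsave half: remember the old state, then disable. */
unsigned long model_irq_save(void)
{
	unsigned long flags = irqs_enabled ? 1UL : 0UL;

	irqs_enabled = false;
	return flags;
}

/* Models the irqrestore half: put back whatever state was saved. */
void model_irq_restore(unsigned long flags)
{
	irqs_enabled = (flags != 0);
}

/* Models the check WARN_ON(!irqs_disabled()) in blk_start_queue(). */
void model_blk_start_queue(void)
{
	if (irqs_enabled)
		warnings++;
}

/* Hypothetical callback using the unconditional disable/enable pair. */
void irq_variant_callback(void)
{
	irqs_enabled = false;	/* like spin_lock_irq() */
	irqs_enabled = true;	/* like spin_unlock_irq(): always enables */
}

/* Hypothetical callback that saves and restores the caller's state. */
void irqsave_variant_callback(void)
{
	unsigned long flags = model_irq_save();

	model_irq_restore(flags);
}

/* Models the shape of blk_done(): the callback runs inside the region. */
int run_completion(void (*callback)(void))
{
	unsigned long flags;

	warnings = 0;
	irqs_enabled = true;

	flags = model_irq_save();
	callback();			/* stands in for request completion */
	model_blk_start_queue();	/* checks that interrupts are still off */
	model_irq_restore(flags);

	return warnings;
}
```

In this model, run_completion(irq_variant_callback) counts one warning, while
run_completion(irqsave_variant_callback) counts none, because only the
save/restore pair leaves the caller's interrupt state untouched.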

If that something is some part of DRBD, then that would be a serious bug.

If you run with spinlock debugging enabled, that may provide some more insight.
We'll try to reproduce it here anyway.

To confirm: you simply start DRBD 8 in a VM with virtio-blk,
and that warning triggers?

> [ 3830.739514]  [<ffffffff81090d6e>] ? __rcu_process_callbacks+0xf2/0x2a6
> [ 3830.739557]  [<ffffffff811f7a67>] vring_interrupt+0x27/0x30
> [ 3830.739572]  [<ffffffff8108d3e9>] handle_IRQ_event+0x2d/0xb7
> [ 3830.739575]  [<ffffffff8108f005>] handle_edge_irq+0xc1/0x102
> [ 3830.739607]  [<ffffffff810133b5>] handle_irq+0x89/0x94
> [ 3830.739610]  [<ffffffff8101326b>] do_IRQ+0x5a/0xab
> [ 3830.739613]  [<ffffffff81011593>] ret_from_intr+0x0/0x11
> [ 3830.739624]  <EOI>
> [ 3830.739627] ---[ end trace a9e0f5d8de037953 ]---
> [ 3830.739628] ------------[ cut here ]------------
> 
> 
> I don't get any message like this on real hardware.
> 
> This is absolutely reproducible and still exists in git head
> (drbd-8.3.9-5-g7fed7c2).
> 
> It didn't exist in 8.3.8.1.
> 
> Except for the warning DRBD is syncing fine.
> 
> Any clues?
> 
> 
>    Thomas

Thanks,

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed