[DRBD-user] kernel oops and null pointer dereference when using DRBD

David Bruzos david.bruzos at jaxport.com
Thu Jul 21 17:45:01 CEST 2016



Hello Lars,

* Answers:
    . Does it reproduce with drbd 8.4.8?
        - I don't know, I plan to test that.
    . Does it reproduce with non-xen?
        - I don't know; I probably cannot test that at this time.
    . Does it reproduce with kernel.org 3.10, 4.0, 4.2?
        - It has happened with 4.1.12, 4.1.15, and now 4.4.6 (the DRBD version has not changed, though).
 
* Questions:
    . Does 8.4.8 include fixes that might address this kind of issue?
    . Have other people experienced similar problems with Xen and DRBD?

Thank you for your help!

David

On Thu, Jul 21, 2016 at 10:09:13AM +0200, Lars Ellenberg wrote:
> On Wed, Jul 20, 2016 at 02:06:40PM -0400, David Bruzos wrote:
> > On Wed, Jul 20, 2016 at 07:00:33PM +0100, Trevor Hemsley wrote:
> > > On 20/07/16 17:34, David Bruzos wrote:
> > > > Hello all, I'm experiencing kernel oopses when using DRBD.  I'm
> > > > not sure whether DRBD is the root cause, but DRBD resources are
> > > > always affected by the oops, and I've seen other DRBD-related
> > > > messages on the web that look similar to mine.
> > > >
> > > > The system is a Xen dom0 running Gentoo with kernel 4.4.6, Xen
> > > > 4.6, ZFS 0.6.5.7 and DRBD 8.4.5.  It runs very well until, at
> > > > some unpredictable point, it hits this problem, and the DRBD
> > > > resource appears to stop working altogether.  It is a
> > > > multi-volume DRBD resource on top of ZFS zvols.
> > >
> > > My first step whenever I see Xen and DRBD mentioned in the same sentence
> > > is to ask: have you disabled sendpage?
> > > 
> > > echo "options drbd disable_sendpage=1" >> /etc/modprobe.d/drbd.conf
> > > 
> > > then reload the drbd module.
> > > 
> > Yes:
> >     # cat /etc/modprobe.d/drbd.conf
> >     options drbd minor_count=254 disable_sendpage=1
> >     # cat /sys/module/drbd/parameters/disable_sendpage
> >     Y
> > 
> > Thanks for the reply.
> 
> If this is sufficiently easy for you to reproduce,
> you could try to find out whether it is kernel-version dependent,
> or whether using a newer DRBD module changes anything.
> 
> Does it reproduce with drbd 8.4.8?
> Does it reproduce with non-xen?
> Does it reproduce with kernel.org 3.10, 4.0, 4.2?
> 
> > > > * Here is the relevant output from dmesg when the event happens:
> > > >
> > > > [260175.411938] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> > > > [260175.412054] Oops: 0000 [#1] SMP 
> 
> > > > [260175.413309] CPU: 0 PID: 8884 Comm: drbd_w_vm_ex13- Tainted: P           O    4.4.6-gentoo-xen0u #3
> > > > [260175.413349] Hardware name: Supermicro AS -2022G-URF4+/H8DGU-LN4, BIOS 3.5a       09/25/2015
> > > > [260175.413420] RIP: e030:[<ffffffff813fb602>]  [<ffffffff813fb602>] __memcpy+0x12/0x20
> 
> > > > [260175.413874] Call Trace:
> > > > [260175.413892]  [<ffffffff813fff95>] ? copy_from_iter+0x1f5/0x260
> > > > [260175.413923]  [<ffffffff8179ef25>] tcp_sendmsg+0x605/0xaf0
> > > > [260175.413951]  [<ffffffff817c9c45>] inet_sendmsg+0x65/0xa0
> > > > [260175.415401]  [<ffffffff817340a5>] kernel_sendmsg+0x35/0x50
> > > > [260175.416848]  [<ffffffffa0f5f8b1>] drbd_send+0xe1/0x200 [drbd]
> > > > [260175.418262]  [<ffffffffa0f5fa79>] __send_command.isra.37+0xa9/0x1d0 [drbd]
> > > > [260175.419674]  [<ffffffffa0f617a6>] drbd_send_dblock+0x286/0x640 [drbd]
> 
> -- 
> : Lars Ellenberg
> : LINBIT | Keeping the Digital World Running
> : DRBD -- Heartbeat -- Corosync -- Pacemaker
> 
> DRBD® and LINBIT® are registered trademarks of LINBIT
> __
> please don't Cc me, but send to list -- I'm subscribed
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
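In the exchange above, David confirms the `disable_sendpage` workaround by hand with two `cat` commands: one for the boot-time config and one for the value the loaded module actually uses. A minimal POSIX `sh` sketch of the same two checks follows. The function names `sendpage_status` and `sendpage_configured` are hypothetical helpers, not part of DRBD, and the default paths are simply the ones quoted in the thread. The sysfs value only reflects a changed `modprobe.d` option after the drbd module has been reloaded.

```shell
#!/bin/sh
# Hypothetical helpers (not part of DRBD): report whether the
# disable_sendpage workaround is configured and whether it is active.

# Runtime value of the module parameter; the default path is the sysfs
# file quoted in the thread. Pass another path to check a different file.
sendpage_status() {
    param_file="${1:-/sys/module/drbd/parameters/disable_sendpage}"
    if [ ! -r "$param_file" ]; then
        # File missing or unreadable, e.g. the drbd module is not loaded.
        echo "unknown"
        return 1
    fi
    case "$(cat "$param_file")" in
        Y|y|1) echo "disabled" ;;  # workaround active: sendpage is off
        *)     echo "enabled" ;;   # sendpage still in use
    esac
}

# Boot-time configuration; the default path is the modprobe.d file from
# the thread. Looks for an "options drbd ..." line setting the option on.
sendpage_configured() {
    conf_file="${1:-/etc/modprobe.d/drbd.conf}"
    if grep -Eq '^[[:space:]]*options[[:space:]]+drbd[[:space:]].*disable_sendpage=(1|Y|y)([[:space:]]|$)' \
        "$conf_file" 2>/dev/null; then
        echo "yes"
    else
        echo "no"
    fi
}
```

On the system described in the thread, where the sysfs file contains `Y` and the config line is `options drbd minor_count=254 disable_sendpage=1`, these sketches would be expected to print `disabled` and `yes` respectively.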
