Alright, I figured I'd update this thread, since six days have now passed.
I upgraded DRBD to version 8.4.8-1 last Saturday and I have not had the
problem yet. I've done plenty of the same I/O as I was doing before the
upgrade and it is looking good so far. I'm using the same exact kernel
(gentoo-sources-4.4.6), same drbd-utils, same Xen, etc. The only change is
the updated DRBD kernel module.

I'll update this thread again in a couple of weeks if no problems, sooner
if death happens!

Thanks!

David

--
David Bruzos (Systems Administrator)
Jacksonville Port Authority
2831 Talleyrand Ave.
Jacksonville, FL 32206
Cell: (904) 625-0969
Office: (904) 357-3069
Email: david.bruzos at jaxport.com

On Thu, Jul 21, 2016 at 11:45:01AM -0400, david bruzos wrote:
> Hello Lars,
>
> * Answers:
> . Does it reproduce with drbd 8.4.8?
> - I don't know, I plan to test that.
> . Does it reproduce with non-xen?
> - I don't know, I probably cannot test that at this time.
> . Does it reproduce with kernel.org 3.10, 4.0, 4.2?
> - It has happened with 4.1.12, 4.1.15, and now 4.4.6 (DRBD version has not changed though).
>
> * Questions:
> . Are there fixes in 8.4.8 that may fix these kinds of issues?
> . Do other people experience similar kinds of problems with Xen and DRBD?
>
> Thank you for your help!
>
> David
>
> On Thu, Jul 21, 2016 at 10:09:13AM +0200, Lars Ellenberg wrote:
> > On Wed, Jul 20, 2016 at 02:06:40PM -0400, David Bruzos wrote:
> > > On Wed, Jul 20, 2016 at 07:00:33PM +0100, Trevor Hemsley wrote:
> > > > On 20/07/16 17:34, David Bruzos wrote:
> > > > > Hello all, I'm experiencing some kernel oops when using DRBD. I'm
> > > > > not sure if the root cause of the problem is DRBD, but it appears
> > > > > that DRBD resources are always affected by the oops and I've seen
> > > > > other DRBD related messages on the web that look similar to mine.
> > > > >
> > > > > The system is a Xen dom0 Gentoo system, running kernel 4.4.6, Xen
> > > > > 4.6, ZFS 0.6.5.7 and DRBD 8.4.5. The system runs very well while
> > > > > it is running, but at some unpredictable point in time, it has
> > > > > this problem. The DRBD resource appears to stop working
> > > > > altogether when this happens. It is a multi-volume DRBD resource on
> > > > > top of ZFS zvols.
> > > >
> > > > My first step whenever I see xen and DRBD mentioned in the same sentence
> > > > is to ask if you have disabled sendpage?
> > > >
> > > > echo "options drbd disable_sendpage=1" >> /etc/modprobe.d/drbd.conf
> > > >
> > > > then reload the drbd module.
> > > >
> > > Yes:
> > > # cat /etc/modprobe.d/drbd.conf: options drbd minor_count=254 disable_sendpage=1
> > > # cat /sys/module/drbd/parameters/disable_sendpage: Y
> > >
> > > Thanks for the reply.
> >
> > If this is sufficiently easy for you to reproduce,
> > you could try to find out if this is kernel version dependent,
> > or if using a newer DRBD module would change anything.
> >
> > Does it reproduce with drbd 8.4.8?
> > Does it reproduce with non-xen?
> > Does it reproduce with kernel.org 3.10, 4.0, 4.2?
> >
> > > > > * Here is the relevant output from dmesg when the event happens:
> > > > >
> > > > > [260175.411938] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> > > > > [260175.412054] Oops: 0000 [#1] SMP
> >
> > > > > [260175.413309] CPU: 0 PID: 8884 Comm: drbd_w_vm_ex13- Tainted: P O 4.4.6-gentoo-xen0u #3
> > > > > [260175.413349] Hardware name: Supermicro AS -2022G-URF4+/H8DGU-LN4, BIOS 3.5a 09/25/2015
> > > > > [260175.413420] RIP: e030:[<ffffffff813fb602>] [<ffffffff813fb602>] __memcpy+0x12/0x20
> >
> > > > > [260175.413874] Call Trace:
> > > > > [260175.413892] [<ffffffff813fff95>] ? copy_from_iter+0x1f5/0x260
> > > > > [260175.413923] [<ffffffff8179ef25>] tcp_sendmsg+0x605/0xaf0
> > > > > [260175.413951] [<ffffffff817c9c45>] inet_sendmsg+0x65/0xa0
> > > > > [260175.415401] [<ffffffff817340a5>] kernel_sendmsg+0x35/0x50
> > > > > [260175.416848] [<ffffffffa0f5f8b1>] drbd_send+0xe1/0x200 [drbd]
> > > > > [260175.418262] [<ffffffffa0f5fa79>] __send_command.isra.37+0xa9/0x1d0 [drbd]
> > > > > [260175.419674] [<ffffffffa0f617a6>] drbd_send_dblock+0x286/0x640 [drbd]
> >
> > --
> > : Lars Ellenberg
> > : LINBIT | Keeping the Digital World Running
> > : DRBD -- Heartbeat -- Corosync -- Pacemaker
> >
> > DRBD® and LINBIT® are registered trademarks of LINBIT
> > __
> > please don't Cc me, but send to list -- I'm subscribed
> > _______________________________________________
> > drbd-user mailing list
> > drbd-user at lists.linbit.com
> > http://lists.linbit.com/mailman/listinfo/drbd-user