[DRBD-user] Problem with 3.5.3 and drbd 8.4.2

Lars Ellenberg lars.ellenberg at linbit.com
Mon Sep 17 09:55:23 CEST 2012


On Fri, Sep 14, 2012 at 11:00:35AM +0000, Holger Kiehl wrote:
> Hello,
> 
> Got the following error situation where I do not know why it happened. In
> /var/log/messages I found the following:

From what I see, drbd does a simple printk only.

Your fb console wants to print that message,
but thinks it needs to scroll something first to show it.

That results in some memory areas being mapped/unmapped,
which should already have triggered a "might sleep" warning before this point.

Anyway, the unmap ends up needing a TLB flush,
which iterates over all CPUs, and that triggers the
"someone calls smp_call_function_many with irqs disabled,
 and that could deadlock! WTF?" check right there.
Because, well, vprintk_emit disabled irqs.

So I guess, get rid of your funky fb console and be happy.

Or get someone to fix that mga g200 fb driver for you...
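
If you want to double-check that reading against the trace yourself,
here is a minimal Python sketch. It is only an illustration: the helper
names (reliable_frames, module_path) are made up for this mail, the
regular expression is an assumption about the usual "[<addr>] func+0xoff/0xlen [module]"
layout, and TRACE holds just a handful of frames copied from the log
quoted below. Frames marked with "?" are entries the stack unwinder is
not sure about, so the sketch skips them.

```python
import re

# A few frames copied from the quoted trace (innermost call first).
TRACE = """\
 [<ffffffff81060411>] ? smp_call_function_many+0x6c/0x1bb
 [<ffffffff81060411>] smp_call_function_many+0x6c/0x1bb
 [<ffffffff81024aa3>] flush_tlb_all+0x17/0x19
 [<ffffffff810a080d>] vunmap+0x26/0x28
 [<ffffffffa00e9fb7>] ttm_bo_kunmap+0x55/0xa3 [ttm]
 [<ffffffffa00fc6a3>] mga_dirty_update+0x10b/0x122 [mgag200]
 [<ffffffffa01071c5>] fbcon_scroll+0x687/0xc6c [fbcon]
 [<ffffffff8102c435>] vprintk_emit+0x302/0x364
 [<ffffffff8120636a>] dev_printk+0xa9/0xab
 [<ffffffffa018f551>] ? drbd_recv+0x26/0x15a [drbd]
 [<ffffffffa018e4d1>] drbd_sync_handshake+0x34b/0x548 [drbd]
"""

# Assumed frame layout: "[<addr>] [? ]func+0xoff/0xlen [module]"
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<guess>\?\s+)?(?P<func>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(text):
    """Return (function, module) pairs, skipping uncertain '?' frames."""
    frames = []
    for line in text.splitlines():
        m = FRAME.search(line)
        if m is None or m.group("guess"):
            continue
        # Frames without a [module] suffix belong to the core kernel image.
        frames.append((m.group("func"), m.group("module") or "kernel"))
    return frames


def module_path(frames):
    """Collapse consecutive frames of one module, outermost caller first."""
    path = []
    for _func, module in reversed(frames):
        if not path or path[-1] != module:
            path.append(module)
    return path


frames = reliable_frames(TRACE)
path = module_path(frames)
print(" -> ".join(path))
```

Fed the full trace instead of this excerpt, the same functions should
show drbd only at the outermost end of the chain, with the fbcon, mgag200
and ttm frames sitting between the printk and the TLB flush.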

>    Sep 14 08:32:06 praktifix kernel: WARNING: at kernel/smp.c:461 smp_call_function_many+0x6c/0x1bb()
>    Sep 14 08:32:06 praktifix kernel: Hardware name: PRIMERGY RX300 S4
>    Sep 14 08:32:06 praktifix kernel: Modules linked in: drbd(O) coretemp ipmi_devintf ipmi_si bonding binfmt_misc video acpi_ipmi ipmi_msghandler ac nvram sr_mod cdrom sg usbhid mgag200 fbcon ttm tileblit font bitblit softcursor drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect syscopyarea i5k_amb pata_acpi i2c_i801 ata_generic i2c_core i5000_edac ehci_hcd uhci_hcd usbcore usb_common [last unloaded: microcode]
>    Sep 14 08:32:06 praktifix kernel: Pid: 4442, comm: drbd_r_r0 Tainted: G           O 3.5.3 #1
>    Sep 14 08:32:06 praktifix kernel: Call Trace:
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff81060411>] ? smp_call_function_many+0x6c/0x1bb
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff8102ab0e>] warn_slowpath_common+0x80/0x99
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff8102ab3c>] warn_slowpath_null+0x15/0x17
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff81060411>] smp_call_function_many+0x6c/0x1bb
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff81024bd1>] ? leave_mm+0x43/0x43
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff81024bd1>] ? leave_mm+0x43/0x43
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff810605c2>] smp_call_function+0x20/0x24
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff810606a9>] on_each_cpu+0x16/0x32
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff81024aa3>] flush_tlb_all+0x17/0x19
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff8109f971>] __purge_vmap_area_lazy+0x122/0x17a
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff8109fa4b>] free_vmap_area_noflush+0x54/0x5b
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff810a05e9>] free_unmap_vmap_area+0x20/0x24
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff810a064a>] remove_vm_area+0x5d/0x71
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff810a076a>] __vunmap+0x38/0xb5
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff810a080d>] vunmap+0x26/0x28
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa00e9fb7>] ttm_bo_kunmap+0x55/0xa3 [ttm]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa00fc6a3>] mga_dirty_update+0x10b/0x122 [mgag200]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa00fc6e4>] mga_imageblit+0x2a/0x2f [mgag200]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa00ca7a4>] bit_putcs+0x44b/0x4b0 [bitblit]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa00cacf7>] ? bit_cursor+0x4ee/0x7f7 [bitblit]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa0103a74>] fbcon_putcs+0xa1/0x101 [fbcon]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa00ca359>] ? bit_clear+0xd6/0xd6 [bitblit]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa0105232>] fbcon_redraw+0xd8/0x16c [fbcon]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa0104494>] ? fbcon_cursor+0x127/0x150 [fbcon]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa00ca809>] ? bit_putcs+0x4b0/0x4b0 [bitblit]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa01071c5>] fbcon_scroll+0x687/0xc6c [fbcon]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff8102bc9f>] ? console_unlock+0x2e0/0x2ef
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff811ee543>] scrup+0x71/0xe8
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff811ee64e>] lf+0x2d/0x66
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff811f3119>] vt_console_print+0x1d9/0x304
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff8102afd5>] call_console_drivers+0x7b/0x8d
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff8102bc1f>] console_unlock+0x260/0x2ef
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff8102c435>] vprintk_emit+0x302/0x364
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff8102c97c>] printk_emit+0x88/0x8a
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff8104cd4b>] ? __wake_up+0x43/0x50
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff812fb18d>] ? netlink_broadcast_filtered+0x28e/0x2bb
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff81205d8b>] __dev_printk+0x1d2/0x1e4
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa01a5c82>] ? drbd_bcast_event+0xd7/0x11c [drbd]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa01a949f>] ? drbd_khelper+0x1cc/0x1ff [drbd]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff8120636a>] dev_printk+0xa9/0xab
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa018f551>] ? drbd_recv+0x26/0x15a [drbd]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa018f551>] ? drbd_recv+0x26/0x15a [drbd]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa018e4d1>] drbd_sync_handshake+0x34b/0x548 [drbd]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa0194d8d>] receive_state+0x3ce/0x75d [drbd]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa01908fc>] drbdd+0x9d/0x13a [drbd]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa019118c>] drbdd_init+0x79/0x98 [drbd]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa01a2b38>] drbd_thread_setup+0x97/0x13f [drbd]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff81377254>] kernel_thread_helper+0x4/0x10
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffffa01a2aa1>] ? drbd_bmio_clear_n_write+0x149/0x149 [drbd]
>    Sep 14 08:32:06 praktifix kernel:  [<ffffffff81377250>] ? gs_change+0xb/0xb
>    Sep 14 08:32:06 praktifix kernel: ---[ end trace 8b6e7b6ecbb1b906 ]---
>    Sep 14 08:32:06 praktifix kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
>    Sep 14 08:32:06 praktifix kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
>    Sep 14 08:32:06 praktifix kernel: d-con r0: conn( NetworkFailure -> Disconnecting )
>    Sep 14 08:32:06 praktifix kernel: d-con r0: error receiving ReportState, e: -5 l: 0!
>    Sep 14 08:32:06 praktifix kernel: d-con r0: Connection closed
>    Sep 14 08:32:06 praktifix kernel: d-con r0: conn( Disconnecting -> StandAlone )
>    Sep 14 08:32:06 praktifix kernel: d-con r0: receiver terminated
>    Sep 14 08:32:06 praktifix kernel: d-con r0: Terminating receiver thread
> 
> Before this it was running kernel 3.2.x and drbd 8.4.1 for a long time
> without any errors. Any clue why this happened?
> 
> If more information is needed please just ask.
> 
> Regards,
> Holger

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed