On Sun, Apr 04, 2010 at 01:44:24PM +0200, Alexander Thieme wrote:
> Hi List,
>
> I am using XenServer 5.5 Update 2 with 2 servers in a XenPool and
> DRBD 8.3.5. Both nodes share a DRBD device and both are primary.
>
> When I try to do an online verify of the resource one of the nodes crashes:
> Node 1, where I execute the command "drbdadm verify <resource>"
> continues to run, whereas node 2 crashes.

on freenode #drbd, you also mentioned
http://pastebin.com/sixfY1JE
which I quote here:

block drbd1: role( Secondary -> Primary )
block drbd1: conn( StandAlone -> Unconnected )
block drbd1: Starting receiver thread (from drbd1_worker [4987])
block drbd1: receiver (re)started
block drbd1: conn( Unconnected -> WFConnection )
block drbd1: Handshake successful: Agreed network protocol version 91
block drbd1: conn( WFConnection -> WFReportParams )
BUG: warning at kernel/softirq.c:143/local_bh_enable() (Not tainted)
 [<c010681a>] show_trace_log_lvl+0x1a/0x30
 [<c0107052>] show_trace+0x12/0x20
 [<c0107079>] dump_stack+0x19/0x20
 [<c012cf08>] local_bh_enable+0xa8/0xb0
 [<c02ba7ce>] lock_sock+0x8e/0xa0
 [<c02ed8c8>] tcp_setsockopt+0xb8/0x3b0
 [<c02b9fe2>] sock_common_setsockopt+0x22/0x30
 [<f0718a7f>] drbd_worker+0x29f/0x480 [drbd]
 [<f0734697>] drbd_thread_setup+0x137/0x1f0 [drbd]
 [<c0103005>] kernel_thread_helper+0x5/0x10
 =======================

Let me point you to 4.6 of the Xen Faq
http://wiki.xensource.com/xenwiki/XenFaq#head-a6ff59c593b136e2427534df391262f8b4ea7b1e

  4.6. I get "Badness in local_bh_enable at kernel/softirq.c" messages,
  why is this?

  This is fairly likely to be caused by a module compiled for native
  i386 rather than Xen. When building modules outside of the Xen build
  tree, use make ARCH=xen .... Alternative, this may be a driver that
  uses interrupt en/disabling instructions directly rather than the
  proper API....

Well, and in this case, I'd suggest this is simply a broken build.
And the below are results of the same. "Works for me".

> I have a crash dump and I am not complete sure which lines are most
> interesting. Probably it is:
> Call Trace:
>  [c01014a7] hypercall_page+0x4a7 (37: __HYPERVISOR_kexec_op)
>  c025f6c1 machine_kexec+0x21
>  c01d0069 sys_semctl+0x919
>  c01d0069 sys_semctl+0x919
>  c014a626 crash_kexec+0x66
>  c01dcf48 md5_update+0x88
>  c0106e8f die+0x34f
>  c011a4eb do_page_fault+0x68b
>  c0119e60 vmalloc_sync_all+0x4b0
>  c01060d3 error_code+0x2b
>  c01dcf48 md5_update+0x88
>  c01db7e5 update+0xa5
>  c0178da7 __kmalloc+0xc7
>  c02b9fe2 sock_common_setsockopt+0x22
>  c0130fd1 del_timer_sync+0x11
>  c0103005 kernel_thread_helper+0x5
>
> When I change the hash function to sha1, I get the following:
>
> Call Trace:
>  [c01014a7] hypercall_page+0x4a7 (37: __HYPERVISOR_kexec_op)
>  c025f6c1 machine_kexec+0x21
>  c0320069 xfrm_state_find+0x7e9
>  c0320069 xfrm_state_find+0x7e9
>  c014a626 crash_kexec+0x66
>  c0327fa9 sha_transform+0x19
>  c0106e8f die+0x34f
>  c011a4eb do_page_fault+0x68b
>  c02ba72e release_sock+0x9e
>  c02d3a07 __qdisc_run+0xd7
>  c0119e60 vmalloc_sync_all+0x4b0
>  c01060d3 error_code+0x2b
>  c0327fa9 sha_transform+0x19
>  c011c6a9 __kmap_atomic+0x189
>  c011c78f kmap_atomic+0x1f
>  c01db7e5 update+0xa5
>  c0178da7 __kmalloc+0xc7
>  c02b9fe2 sock_common_setsockopt+0x22
>  c0130fd1 del_timer_sync+0x11
>  c0103005 kernel_thread_helper+0x5
>
>
>
> If you need any further information, let me know. I am very
> interesting in getting this fixed.
>
> Best regards,
> Alexander Thieme

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed