[DRBD-user] Online Verify and Kernel Panic

Lars Ellenberg lars.ellenberg at linbit.com
Tue Oct 12 22:02:56 CEST 2010


On Tue, Oct 12, 2010 at 03:19:25PM +0200, Roland Friedwagner wrote:
> Hello,
> 
> On Monday, 11 October 2010, Fabrice Charlier wrote:
> > Hi all,
> >
> > We are running a web cluster based on a dual-primary DRBD configuration
> > and OCFS2. Every weekend we run an online verify on the DRBD
> > volume by executing "/sbin/drbdadm verify all" on one node. Last weekend,
> > one node (not the one executing the verify command) crashed completely,
> > and we found it this morning with a nice kernel panic message on the
> > console.
> >
> > Anybody else already observed this behavior?
> >
> 
> Yes, we (and Michael) did, on Sep 2 at 00:18:01.
> 
> The DRBD-User thread concerning this is 
> "8.3.8 Online Verify Oops on kernel 2.6.34"
> 
> 
> DRBD Version: 8.3.8.1
> HW: HP DL380G6 (1 x Xeon X5570)
> OS: RHEL 5.5 x86_64
> Kernel: 2.6.18-194.11.3.el5 #1 SMP Mon Aug 23 15:51:38 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
> 
> It was nearly the same address here (:drbd:w_e_end_ov_req+0x29/0x136),
> and Michael had w_e_end_ov_req+0x36/0x154.
> 
>  $ gdb drbd.ko -ex 'l *(w_e_end_ov_req+0x29)' -ex q
>
> 0x5fbf is in w_e_end_ov_req (include/linux/crypto.h:286).
> 281             return module_name(tfm->__crt_alg->cra_module);
> 282     }
> 283
> 284     static inline u32 crypto_tfm_alg_type(struct crypto_tfm *tfm)
> 285     {
> 286             return tfm->__crt_alg->cra_flags & CRYPTO_ALG_TYPE_MASK;

Which would mean that one of those pointers (tfm or tfm->__crt_alg) is
invalid. And that's hard to believe, given that they are used and
dereferenced all the time.
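
To make that concrete, here is a userspace sketch of the two dependent
loads in that helper. The struct layouts are simplified stand-ins rather
than the kernel definitions, the mask value is assumed for illustration,
and check_tfm_chain() is a hypothetical helper that exists neither in
DRBD nor in the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins, NOT the kernel definitions: only the fields
 * touched by crypto_tfm_alg_type() are modelled. */
#define CRYPTO_ALG_TYPE_MASK 0x0000000f /* assumed value, for illustration */

struct crypto_alg { uint32_t cra_flags; };
struct crypto_tfm { struct crypto_alg *__crt_alg; };

/* Same body as line 286 of the listing: load tfm->__crt_alg, then load
 * ->cra_flags through it. A bad pointer at either step faults here. */
static inline uint32_t crypto_tfm_alg_type(struct crypto_tfm *tfm)
{
    return tfm->__crt_alg->cra_flags & CRYPTO_ALG_TYPE_MASK;
}

/* Hypothetical checked variant: reports which link of the chain is NULL
 * instead of faulting. A dangling (already freed) pointer would still
 * pass this check, so it is a debugging aid, not a fix. */
enum tfm_check { TFM_OK = 0, TFM_NULL = 1, TFM_ALG_NULL = 2 };

static inline enum tfm_check check_tfm_chain(const struct crypto_tfm *tfm)
{
    if (tfm == NULL)
        return TFM_NULL;
    if (tfm->__crt_alg == NULL)
        return TFM_ALG_NULL;
    return TFM_OK;
}
```

A fault at w_e_end_ov_req+0x29 inside this inlined helper is consistent
with either of the two loads going wrong; the sketch alone cannot tell
which one.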

> 287     }
> 288
> 289     static inline unsigned int crypto_tfm_alg_min_keysize(struct crypto_tfm *tfm)
> 290     {
> 
> We do an online verify each night.
> It has not reproduced since.

As long as it does not reproduce, we cannot really fix it.
Give us a reproducer, and we'll fix it.

> We have slightly changed the config now:
> switched csums-alg and verify-alg from md5 to sha1.
> (But the reason for that was the lower hash collision probability at nearly the same speed.)
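
For reference, the change described above would look roughly like the
fragment below on DRBD 8.3. This is only a sketch: the resource name r0
is a placeholder, and placing both options in the syncer section is
assumed from the 8.3 configuration syntax, so check drbd.conf(5) for the
version actually in use.

```
resource r0 {            # r0 is a placeholder resource name
  syncer {
    verify-alg sha1;     # digest used by "drbdadm verify"
    csums-alg  sha1;     # digest used for checksum-based resync
  }
  # remaining resource settings unchanged
}
```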

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


