Hi all,

We are running a web cluster based on a dual-primary DRBD configuration with OCFS2. Every weekend we run an online verify on the DRBD volume by executing "/sbin/drbdadm verify all" on one node. Last weekend, one node (not the one executing the verify command) crashed completely, and we found it this morning with a kernel panic message on the console.

Has anybody else observed this behavior?

OS:
Linux server1.ucl.ac.be 2.6.18-194.3.1.el5 #1 SMP Thu May 13 13:08:30 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

DRBD:
# modinfo drbd
filename:       /lib/modules/2.6.18-194.3.1.el5/weak-updates/drbd83/drbd.ko
alias:          block-major-147-*
license:        GPL
version:        8.3.2
description:    drbd - Distributed Replicated Block Device v8.3.2
author:         Philipp Reisner <phil at linbit.com>, Lars Ellenberg <lars at linbit.com>
srcversion:     EB9EAE1FF5D024E96B05208
depends:
vermagic:       2.6.18-128.7.1.el5 SMP mod_unload gcc-4.1
parm:           minor_count:Maximum number of drbd devices (1-255) (uint)
parm:           disable_sendpage:bool
parm:           allow_oos:DONT USE! (bool)
parm:           cn_idx:uint
parm:           proc_details:int
parm:           enable_faults:int
parm:           fault_rate:int
parm:           fault_count:int
parm:           fault_devs:int
parm:           usermode_helper:string

Log on server1:
Oct 10 00:42:01 server1 kernel: block drbd0: conn( Connected -> VerifyS )
Oct 10 00:42:01 server1 kernel: block drbd0: Starting Online Verify from sector 0
Oct 10 00:42:11 server1 kernel: block drbd0: PingAck did not arrive in time.
Oct 10 00:42:11 server1 kernel: block drbd0: peer( Primary -> Unknown ) conn( VerifyS -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Oct 10 00:42:11 server1 kernel: block drbd0: Online Verify reached sector 0
Oct 10 00:42:11 server1 kernel: block drbd0: asender terminated
Oct 10 00:42:11 server1 kernel: block drbd0: Terminating asender thread
Oct 10 00:42:11 server1 kernel: block drbd0: short read expecting header on sock: r=-512
Oct 10 00:42:11 server1 kernel: block drbd0: Creating new current UUID
Oct 10 00:42:11 server1 kernel: block drbd0: Connection closed
Oct 10 00:42:11 server1 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Oct 10 00:42:11 server1 kernel: block drbd0: receiver terminated
Oct 10 00:42:11 server1 kernel: block drbd0: Restarting receiver thread
Oct 10 00:42:11 server1 kernel: block drbd0: receiver (re)started
Oct 10 00:42:11 server1 kernel: block drbd0: conn( Unconnected -> WFConnection )

Log on server2:
Oct 10 00:42:01 server2 kernel: block drbd0: conn( Connected -> VerifyT )
Oct 10 00:42:01 server2 kernel: block drbd0: Online Verify start sector: 0

-- 
--------------------------------------------------------------------
Fabrice Charlier - UCL/SGSI/SIPR
Office : +32.10.47.32.34
GSM    : +32.474.86.81.23
-------------------------------------------------------------------
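
P.S. For anyone who wants to line up the two logs side by side, here is a minimal sketch that pulls only the state transitions ("field( Old -> New )") out of the kernel messages. It assumes the syslog prefix looks exactly like the lines quoted in this mail; the function name state_changes and the sample list are just illustrative and not part of DRBD or any of its tools.

```python
import re

# One "field( Old -> New )" group, as printed in the drbd kernel messages.
TRANSITION = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")
# Syslog prefix (assumed format): timestamp, host, "kernel: block <device>:".
PREFIX = re.compile(r"^(\w{3}\s+\d+ [\d:]{8}) (\S+) kernel: block (\w+): (.*)$")


def state_changes(lines):
    """Yield (timestamp, host, device, field, old, new) per state transition."""
    for line in lines:
        m = PREFIX.match(line.strip())
        if not m:
            continue  # not a drbd block-device message in the assumed format
        stamp, host, device, message = m.groups()
        for field, old, new in TRANSITION.findall(message):
            yield stamp, host, device, field, old, new


# A few lines copied from the logs in this mail.
sample = [
    "Oct 10 00:42:01 server1 kernel: block drbd0: conn( Connected -> VerifyS )",
    "Oct 10 00:42:11 server1 kernel: block drbd0: PingAck did not arrive in time.",
    "Oct 10 00:42:11 server1 kernel: block drbd0: peer( Primary -> Unknown ) "
    "conn( VerifyS -> NetworkFailure ) pdsk( UpToDate -> DUnknown )",
    "Oct 10 00:42:01 server2 kernel: block drbd0: conn( Connected -> VerifyT )",
]

for stamp, host, device, field, old, new in state_changes(sample):
    print(f"{stamp} {host} {device} {field}: {old} -> {new}")
```

In practice the same function could be fed the lines of /var/log/messages from each node; messages without a transition, such as the PingAck timeout, are skipped.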