On 02/29/2012 07:33 AM, Dirk wrote:
> Hi folks,
>
> I have setup a 2 node cluster on CentOS 6 using DRBD (dual primary) and
> RHCS with CLVM and GFS2. So far, RHCS is just set up with node
> definition to make GFS2 usable.

Please ensure fencing is configured as part of that minimal setup.
Without it, the cluster will hang and you will start seeing 120s timeout
kernel dumps caused by gfs2/clvmd/rgmanager no longer getting locks ("a
hung cluster is better than a corrupt cluster").

> I have compiled DRBD 8.4.1 myself against the Xen 4.1.2 kernel from
> Steve Haigh I am using.

I've not used Xen since EL5, but I know my testing under EL6 was ...
difficult. Similarly, I've not tested 8.4 on EL6 and would strongly
advise using 8.3.12 in any production environment. The 8.4 release looks
promising but is, I would argue, far too young for production use.

To help narrow down your problem, can you first remove Xen and see if
the panics remain? If they do, can you put Xen back and try DRBD 8.3.12
instead, then see if the panics go away?

> I have followed the current DRBD documentation concerning usage in a
> Redhat Cluster and CLVM/GFS2 to the bit, but every time I start the
> Cluster and mount the DRBD based GFS2 volume, after short time (approx.
> 1 minute) the kernel of one node oopses like that:
>
>> Message from syslogd at pclus3cent6-01 at Feb 29 12:39:35 ...
>> kernel:Oops: 0000 [#1] SMP
>>
>> Message from syslogd at pclus3cent6-01 at Feb 29 12:39:35 ...
>> kernel:last sysfs file: /sys/kernel/dlm/VirtSpace01/control
>>
>> Message from syslogd at pclus3cent6-01 at Feb 29 12:39:35 ...
>> kernel:Stack:
>>
>> Message from syslogd at pclus3cent6-01 at Feb 29 12:39:35 ...
>> kernel:Call Trace:
>>
>> Message from syslogd at pclus3cent6-01 at Feb 29 12:39:35 ...
>> kernel:Code: 89 5d d8 4c 89 65 e0 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8
>> 0f 1f 44 00 00 4c 8b 6f 58 49 89 fc 85 f6 89 f3 49 8b 7d 08 4d 8b 7d
>> 00 <48> 8b 47 08 4d 8b 37 48 8b 40 48 75 0e 41 f6 44 24 18 01 ba fb
>>
>> Message from syslogd at pclus3cent6-01 at Feb 29 12:39:35 ...
>> kernel:CR2: 0000000000000008
>>
>
> and the message log contains
>
>> Feb 29 12:39:35 pclus3cent6-01 kernel: Modules linked in: gfs2 dlm
>> configfs drbd ebtable_nat ebtables tun libcrc32c sunrpc bridge stp llc
>> bonding ipt_REJECT ipt_LOG xt_recent ip6t_REJECT nf_conntrack_ipv6
>> xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_gntdev
>> xen_evtchn xenfs bnx2 ics932s401 ibmaem ibmpex ipmi_msghandler
>> serio_raw pcspkr i2c_i801 iTCO_wdt iTCO_vendor_support i5k_amb hwmon
>> i5000_edac edac_core ioatdma dca e1000e ses enclosure sg shpchp ext4
>> mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif ata_generic pata_acpi
>> ata_piix aacraid radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core
>> dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
>>
>> Feb 29 12:39:35 pclus3cent6-01 kernel: [<ffffffffa04de865>]
>> drbd_make_request+0x325/0x330 [drbd]
>
> I am sure I have seen a thread concerning this googling through mail
> archives, but I do not find it any more, so please bear with me if the
> problem has been already solved once.
>
> Is this a known issue with DRBD 8.4.1 on CentOS 6?
> What can I do to troubleshoot this? I do not have any idea on where to
> start yet.
>
> Any hint or help is appreciated.
>
> Dirk

This looks like, in part, a DLM issue (kernel:last sysfs file:
/sys/kernel/dlm/VirtSpace01/control). If you don't solve it here, I
would suggest asking on the linux-cluster mailing list as well. I
believe a couple of the DLM folks are on that list.

Please also share your cluster.conf (obfuscating passwords only,
please), your drbd global and per-resource configurations and what, if
any, lvm.conf changes you made. How your GFS2 partition is mounted is
also possibly useful.

-- 
Digimer
E-Mail: digimer at alteeve.com
Papers and Projects: https://alteeve.com
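
On sharing cluster.conf with only the passwords obfuscated: here is a
rough Python sketch of one way to do that before pasting the file to the
list. It is not part of any cluster tooling. The function name
redact_passwords, the REDACTED placeholder and the sample fence device
line are made up for illustration, and the assumption that secrets live
in attributes whose names end in "passwd" or "password" may not cover
every fence agent, so please read the output before posting it.

```python
import re

# Assumption: secrets are stored in XML attributes whose names end in
# "passwd" or "password" (for example passwd="..."). Attributes such as
# passwd_script are left alone because they hold a path, not a secret.
SECRET_ATTR = re.compile(
    r'\b(\w*pass(?:wd|word))(\s*=\s*)(["\'])(.*?)\3',
    re.IGNORECASE,
)


def redact_passwords(xml_text, placeholder="REDACTED"):
    """Return xml_text with the values of password-like attributes replaced."""
    def _swap(match):
        name, equals, quote = match.group(1), match.group(2), match.group(3)
        return f"{name}{equals}{quote}{placeholder}{quote}"

    return SECRET_ATTR.sub(_swap, xml_text)


# Hypothetical sample line, not taken from any real configuration.
sample = (
    '<fencedevice agent="fence_ipmilan" ipaddr="10.0.0.1" '
    'login="admin" name="ipmi_n01" passwd="secret"/>'
)
print(redact_passwords(sample))
```

To run it against a real file, read the file's contents into a string,
pass that to redact_passwords() and print the result. Everything other
than the matched attribute values is left untouched, which keeps node
names, fence methods and resource definitions readable for whoever is
helping to debug.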