Hi,

I have a cluster of 10 servers with many DRBD devices. The DRBD version is 8.3.7, and the module is loaded with:

drbd minor_count=128 usermode_helper=/bin/true

(because I use it with ganeti). I have about 40 DRBD devices per node (primaries and secondaries).

Our provider has a lot of network issues, which sometimes cause DRBD to disconnect and reconnect very often. There were about 500 NetworkFailure events in the hour before the last crash:

# grep "Connected -> NetworkFailure" /var/log/messages|grep -c "Mar 30 00"
483

Then the crash log:

Mar 30 00:52:48 z2-6 kernel: [1685605.588315] CPU 2
Mar 30 00:52:48 z2-6 kernel: [1685605.588337] Modules linked in: hmac ip6table_filter ip6_tables xt_time xt_connlimit xt_realm iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_table
Mar 30 00:52:48 z2-6 kernel: s kvm_intel kvm tun bridge stp drbd cn snd_pcm snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core pcspkr psmouse button evdev processor serio_raw ext3 jbd mbcache dm_mod raid10 raid456 raid6_pq async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod sd_mod crc_t10dif ata_generic ide_pci_generic ide_core ata_piix uhci_hcd libata scsi_mod ehci_hcd e1000e thermal fan thermal_sys
Mar 30 00:52:48 z2-6 kernel: [1685605.589086] Pid: 21781, comm: drbd0_worker Tainted: G W 2.6.30-2-amd64 #1 X8STi
Mar 30 00:52:48 z2-6 kernel: [1685605.594280] RIP: 0010:[<ffffffff802bbc80>] [<ffffffff802bbc80>] cache_alloc_refill+0xf6/0x1f9
Mar 30 00:52:48 z2-6 kernel: [1685605.594332] RSP: 0018:ffff8802e340d880 EFLAGS: 00010046
Mar 30 00:52:48 z2-6 kernel: [1685605.594358] RAX: 000000000000003b RBX: ffff88033fc05140 RCX: 000000000000001c
Mar 30 00:52:48 z2-6 kernel: [1685605.594401] RDX: ffff88033a076000 RSI: ffff88004d929000 RDI: ffff88033fc05150
Mar 30 00:52:48 z2-6 kernel: [1685605.594445] RBP: ffff88033e54d400 R08: ffff88033fc05160 R09: ffff88000001ec00
Mar 30 00:52:48 z2-6 kernel: [1685605.594488] R10: 00000000000011bc R11: 0000000000000002 R12: 0000000000000020
Mar 30 00:52:48 z2-6 kernel: [1685605.594531] R13: ffff88033fc08100 R14: 0000000000041210 R15: 0000000000000000
Mar 30 00:52:48 z2-6 kernel: [1685605.594574] FS: 0000000000000000(0000) GS:ffff880028139000(0000) knlGS:0000000000000000
Mar 30 00:52:48 z2-6 kernel: [1685605.594619] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Mar 30 00:52:48 z2-6 kernel: [1685605.594646] CR2: 00007f305f53ebc0 CR3: 0000000000201000 CR4: 00000000000026e0
Mar 30 00:52:48 z2-6 kernel: [1685605.594689] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 30 00:52:48 z2-6 kernel: [1685605.594732] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 30 00:52:48 z2-6 kernel: [1685605.594775] Process drbd0_worker (pid: 21781, threadinfo ffff8802e340c000, task ffff88033b612af0)
Mar 30 00:52:48 z2-6 kernel: [1685605.594841] ffff880000022a48 ffff88033fc08100 0000000000001210 0000000000000000
Mar 30 00:52:48 z2-6 kernel: [1685605.596180] RSP <ffff8802e340d880>
Mar 30 00:52:48 z2-6 kernel: [1685605.596443] ---[ end trace cf5f84225b823ee0 ]-

regards,
Maxence

-- 
Maxence DUNNEWIND
Contact : maxence at dunnewind.net
Site : http://www.dunnewind.net
06 32 39 39 93
GPG : 18AE 61E4 D0B0 1C7C AAC9 E40D 4D39 68DB 0D2E B533
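
The grep in the message counts disconnects for a single hour. To see how the "Connected -> NetworkFailure" transitions are spread over time, a small script could bucket them per hour. The following is only a sketch, not part of the original report: it assumes the classic syslog timestamp prefix seen in the crash log ("Mar 30 00:52:48 host ..."), and the function name `count_transitions_per_hour` is a hypothetical helper introduced for illustration.

```python
import re
import sys
from collections import Counter

# Start of a classic syslog line, e.g. "Mar 30 00:52:48 z2-6 kernel: ..."
# Captures month, day of month, and hour (assumed timestamp layout).
SYSLOG_PREFIX = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):\d{2}:\d{2}\s")


def count_transitions_per_hour(lines, needle="Connected -> NetworkFailure"):
    """Count lines containing `needle`, keyed by "Mon DD HH"."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = SYSLOG_PREFIX.match(line)
        if match is None:
            continue  # line does not start with a syslog timestamp; skip it
        month, day, hour = match.groups()
        counts[f"{month} {day} {hour}"] += 1
    return counts


if __name__ == "__main__":
    # Log path defaults to the file used in the grep above.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as log:
        # Buckets are printed in order of first appearance in the log.
        for bucket, n in count_transitions_per_hour(log).items():
            print(f"{bucket}:00  {n}")
```

Run against the same /var/log/messages, the "Mar 30 00" bucket should match what `grep -c` reports for that hour, and the neighbouring buckets show whether the flapping built up gradually or started abruptly.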