[DRBD-user] DRBD crash with bad network

Maxence DUNNEWIND maxence at dunnewind.net
Tue Mar 30 10:34:06 CEST 2010


Hi, 

I have a cluster of 10 servers with many DRBD devices. The DRBD version is
8.3.7, and the module is loaded with:
drbd minor_count=128 usermode_helper=/bin/true
(because I use it with Ganeti).

I have about 40 DRBD devices per node (primaries and secondaries). Our provider
has a lot of network issues, which sometimes cause DRBD to disconnect and
reconnect very often: about 500 NetworkFailure events in the hour before the
last crash:
# grep "Connected -> NetworkFailure" /var/log/messages|grep -c "Mar 30 00"
483
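
To see how the disconnects are spread over time rather than for a single
hour, the same count can be bucketed per hour. This is only a minimal sketch:
it assumes the traditional syslog timestamp prefix ("Mar 30 00:52:48 ...")
seen in the crash log, and the names failures_per_hour and print_report are
made up for illustration.

```python
import re
from collections import Counter

# Traditional syslog prefix, e.g. "Mar 30 00:52:48 z2-6 kernel: ..."
# (assumed format; adjust if your syslog daemon writes ISO timestamps).
STAMP = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):\d{2}:\d{2}\s")

# Same string as in the grep command above.
PATTERN = "Connected -> NetworkFailure"


def failures_per_hour(lines):
    """Count lines containing PATTERN, keyed by 'Mon DD HH'."""
    counts = Counter()
    for line in lines:
        if PATTERN not in line:
            continue
        match = STAMP.match(line)
        if match:
            month, day, hour = match.groups()
            counts[f"{month} {int(day):2d} {hour}"] += 1
    return counts


def print_report(path="/var/log/messages"):
    """Print one 'Mon DD HH count' line per hour, in log order."""
    with open(path, errors="replace") as fh:
        for bucket, count in failures_per_hour(fh).items():
            print(bucket, count)
```

Calling print_report() on the node's /var/log/messages should then give one
line per hour; the "Mar 30 00" bucket corresponds to the grep count above.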

Here is the crash log:

Mar 30 00:52:48 z2-6 kernel: [1685605.588315] CPU 2 
Mar 30 00:52:48 z2-6 kernel: [1685605.588337] Modules linked in: hmac
ip6table_filter ip6_tables xt_time xt_connlimit xt_realm iptable_raw xt_comment
xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE
ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic
nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp
nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp
nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp
nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns
nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype
xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK
xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP
xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY ipt_LOG
xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_table
Mar 30 00:52:48 z2-6 kernel: s kvm_intel kvm tun bridge stp drbd cn snd_pcm
snd_timer snd soundcore snd_page_alloc i2c_i801 i2c_core pcspkr psmouse button
evdev processor serio_raw ext3 jbd mbcache dm_mod raid10 raid456 raid6_pq
async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod sd_mod
crc_t10dif ata_generic ide_pci_generic ide_core ata_piix uhci_hcd libata
scsi_mod ehci_hcd e1000e thermal fan thermal_sys
Mar 30 00:52:48 z2-6 kernel: [1685605.589086] Pid: 21781, comm: drbd0_worker
Tainted: G        W  2.6.30-2-amd64 #1 X8STi
Mar 30 00:52:48 z2-6 kernel: [1685605.594280] RIP: 0010:[<ffffffff802bbc80>]
[<ffffffff802bbc80>] cache_alloc_refill+0xf6/0x1f9
Mar 30 00:52:48 z2-6 kernel: [1685605.594332] RSP: 0018:ffff8802e340d880
EFLAGS: 00010046
Mar 30 00:52:48 z2-6 kernel: [1685605.594358] RAX: 000000000000003b RBX:
ffff88033fc05140 RCX: 000000000000001c
Mar 30 00:52:48 z2-6 kernel: [1685605.594401] RDX: ffff88033a076000 RSI:
ffff88004d929000 RDI: ffff88033fc05150
Mar 30 00:52:48 z2-6 kernel: [1685605.594445] RBP: ffff88033e54d400 R08:
ffff88033fc05160 R09: ffff88000001ec00
Mar 30 00:52:48 z2-6 kernel: [1685605.594488] R10: 00000000000011bc R11:
0000000000000002 R12: 0000000000000020
Mar 30 00:52:48 z2-6 kernel: [1685605.594531] R13: ffff88033fc08100 R14:
0000000000041210 R15: 0000000000000000
Mar 30 00:52:48 z2-6 kernel: [1685605.594574] FS:  0000000000000000(0000)
GS:ffff880028139000(0000) knlGS:0000000000000000
Mar 30 00:52:48 z2-6 kernel: [1685605.594619] CS:  0010 DS: 0018 ES: 0018 CR0:
000000008005003b
Mar 30 00:52:48 z2-6 kernel: [1685605.594646] CR2: 00007f305f53ebc0 CR3:
0000000000201000 CR4: 00000000000026e0
Mar 30 00:52:48 z2-6 kernel: [1685605.594689] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Mar 30 00:52:48 z2-6 kernel: [1685605.594732] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Mar 30 00:52:48 z2-6 kernel: [1685605.594775] Process drbd0_worker (pid: 21781,
threadinfo ffff8802e340c000, task ffff88033b612af0)
Mar 30 00:52:48 z2-6 kernel: [1685605.594841]  ffff880000022a48 ffff88033fc08100
0000000000001210 0000000000000000
Mar 30 00:52:48 z2-6 kernel: [1685605.596180]  RSP <ffff8802e340d880>
Mar 30 00:52:48 z2-6 kernel: [1685605.596443] ---[ end trace cf5f84225b823ee0
]-


regards,

Maxence
-- 
Maxence DUNNEWIND
Contact : maxence at dunnewind.net
Site : http://www.dunnewind.net
06 32 39 39 93
GPG : 18AE 61E4 D0B0 1C7C AAC9  E40D 4D39 68DB 0D2E B533