[DRBD-user] Server reboot on DRBD heavy load problem

Proskurin Kirill proskurin-kv at fxclub.org
Tue Sep 28 09:19:10 CEST 2010


Hello.

I am really close to despair.
Could someone point out what is wrong? Kernel? DRBD? OCFS2? Hardware?

On 24/09/10 12:05, Proskurin Kirill wrote:
> On 24/09/10 11:56, Michael wrote:
>> try different kernel
>
> I tried:
> 2.6.26 stable
> 2.6.32-bpo5 from backports
> 2.6.32 testing
>
> Same thing.
> I tried OCFS2 1.4.1 and 1.4.4-3: same thing.
>
> So I think it is DRBD.
>


I have now also tried kernel 2.6.35.6 with DRBD 8.3.8,
and got this:

[55921.451479] ------------[ cut here ]------------
[55921.451506] kernel BUG at mm/slub.c:2834!
[55921.451530] invalid opcode: 0000 [#1] SMP
[55921.451557] last sysfs file: /sys/module/drbd/parameters/cn_idx
[55921.451584] CPU 1
[55921.451589] Modules linked in: ocfs2 jbd2 quota_tree drbd 
xt_multiport sha1_generic hmac lru_cache cn xt_tcpudp nf_conntrack_ipv4 
nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 
ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue 
configfs ext2 loop snd_pcm i5000_edac edac_core i5k_amb snd_timer 
processor snd evdev button rng_core shpchp soundcore snd_page_alloc 
tpm_tis pci_hotplug psmouse dcdbas tpm pcspkr tpm_bios serio_raw ext3 
jbd mbcache ide_cd_mod uhci_hcd cdrom ata_generic ata_piix libata ses 
sd_mod enclosure crc_t10dif ehci_hcd megaraid_sas piix ide_core usbcore 
scsi_mod nls_base bnx2 thermal thermal_sys [last unloaded: drbd]
[55921.451964]
[55921.451984] Pid: 2995, comm: udevd Not tainted 2.6.35.6 #1 
0NH278/PowerEdge 2950
[55921.452027] RIP: 0010:[<ffffffff810df05d>]  [<ffffffff810df05d>] 
kfree+0x5b/0xc8
[55921.452076] RSP: 0018:ffff88012aa61d58  EFLAGS: 00010246
[55921.452102] RAX: 0200000000000400 RBX: ffff880100000001 RCX: 
0000000000000002
[55921.452131] RDX: ffffea0000000000 RSI: ffffea0003800000 RDI: 
ffff880100000001
[55921.452160] RBP: ffff8800375d8f00 R08: 0000000000000000 R09: 
0000000000000000
[55921.452189] R10: ffff88012bce1070 R11: ffff8800375d8f00 R12: 
ffffffff810f061e
[55921.452219] R13: 0000000018000040 R14: ffff88012c375cf0 R15: 
ffff88012bce1070
[55921.452248] FS:  00007f7646a967a0(0000) GS:ffff880001a40000(0000) 
knlGS:0000000000000000
[55921.452293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[55921.452319] CR2: 00007f7646a9c000 CR3: 000000012d245000 CR4: 
00000000000006e0
[55921.452349] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[55921.452378] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
0000000000000400
[55921.452407] Process udevd (pid: 2995, threadinfo ffff88012aa60000, 
task ffff880121f4d890)
[55921.452451] Stack:
[55921.452471]  0000000000000000 ffff8800375d8f00 ffff88012bce1070 
ffffffff810f061e
[55921.452505] <0> ffff880108000080 000000002bce1070 ffff88012c3759d0 
ffff880100000001
[55921.452556] <0> 0000029d0000029d ffff8800375d8fa0 ffff88012f8a4900 
ffff8800375d8f00
[55921.452623] Call Trace:
[55921.452647]  [<ffffffff810f061e>] ? vfs_rename+0x3d3/0x3e4
[55921.452674]  [<ffffffff810f1c78>] ? sys_renameat+0x1aa/0x22b
[55921.452702]  [<ffffffff810d13ab>] ? free_pages_and_swap_cache+0x53/0x6e
[55921.452732]  [<ffffffff810c83fb>] ? tlb_finish_mmu+0x2a/0x33
[55921.452759]  [<ffffffff810c8470>] ? remove_vma+0x6c/0x74
[55921.452786]  [<ffffffff810c95d8>] ? do_munmap+0x307/0x329
[55921.452814]  [<ffffffff810089c2>] ? system_call_fastpath+0x16/0x1b
[55921.452841] Code: c5 10 48 83 7d 00 00 eb e6 48 83 fb 10 0f 86 80 00 
00 00 48 89 df e8 a9 f0 ff ff 48 89 c6 48 8b 00 84 c0 78 16 66 a9 00 c0 
75 04 <0f> 0b eb fe 5b 5d 41 5c 48 89 f7 e9 7d 75 fd ff 48 8b 4c 24 18
[55921.453030] RIP  [<ffffffff810df05d>] kfree+0x5b/0xc8
[55921.453057]  RSP <ffff88012aa61d58>
[55921.453437] ---[ end trace 3f96fca7c9cbfb03 ]---
[55921.454368] JBD: Ignoring recovery information on journal
[55921.461099] general protection fault: 0000 [#2] SMP
[55921.461269] last sysfs file: /sys/module/drbd/parameters/cn_idx
[55921.461338] CPU 1
[55921.461385] Modules linked in: ocfs2 jbd2 quota_tree drbd 
xt_multiport sha1_generic hmac lru_cache cn xt_tcpudp nf_conntrack_ipv4 
nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 
ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue 
configfs ext2 loop snd_pcm i5000_edac edac_core i5k_amb snd_timer 
processor snd evdev button rng_core shpchp soundcore snd_page_alloc 
tpm_tis pci_hotplug psmouse dcdbas tpm pcspkr tpm_bios serio_raw ext3 
jbd mbcache ide_cd_mod uhci_hcd cdrom ata_generic ata_piix libata ses 
sd_mod enclosure crc_t10dif ehci_hcd megaraid_sas piix ide_core usbcore 
scsi_mod nls_base bnx2 thermal thermal_sys [last unloaded: drbd]
[55921.464840]
[55921.464902] Pid: 9281, comm: mount.ocfs2 Tainted: G      D 
2.6.35.6 #1 0NH278/PowerEdge 2950
[55921.464990] RIP: 0010:[<ffffffff810dffaa>]  [<ffffffff810dffaa>] 
__kmalloc+0xd3/0x136
[55921.465065] RSP: 0018:ffff880103e21ba8  EFLAGS: 00010006
[55921.465065] RAX: 0000000000000000 RBX: 0800000000000000 RCX: 
ffffffffa0449421
[55921.465065] RDX: 0000000000000000 RSI: ffff88012cfaf000 RDI: 
0000000000000004
[55921.465065] RBP: ffffffff81625520 R08: ffff880001a524d0 R09: 
0000000000000000
[55921.465065] R10: ffff88012cfaf260 R11: ffff88012ca24420 R12: 
000000000000000a
[55921.465065] R13: 00000000000080d0 R14: 00000000000080d0 R15: 
0000000000000246
[55921.465065] FS:  00007fee60afe720(0000) GS:ffff880001a40000(0000) 
knlGS:0000000000000000
[55921.465065] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[55921.465065] CR2: 00007f764630ab8c CR3: 000000012eae3000 CR4: 
00000000000006e0
[55921.465065] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[55921.465065] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
0000000000000400
[55921.465065] Process mount.ocfs2 (pid: 9281, threadinfo 
ffff880103e20000, task ffff88012ca24420)
[55921.465065] Stack:
[55921.465065]  0000000000000000 ffffffffa0449421 ffff88012cfaf108 
ffff88012cfaf000
[55921.465065] <0> ffff88012cfaf000 ffff88012cfaf000 ffff88012aa2e000 
ffff88012ca24420
[55921.465065] <0> 0000000000000200 ffffffffa0449421 0000000000000000 
ffffffffa044ccec
[55921.465065] Call Trace:
[55921.465065]  [<ffffffffa0449421>] ? 
ocfs2_compute_replay_slots+0x31/0x10f [ocfs2]
[55921.465065]  [<ffffffffa0449421>] ? 
ocfs2_compute_replay_slots+0x31/0x10f [ocfs2]
[55921.465065]  [<ffffffffa044ccec>] ? ocfs2_journal_load+0x1d0/0x2b1 
[ocfs2]
[55921.465065]  [<ffffffffa0473525>] ? ocfs2_fill_super+0x19a2/0x2101 
[ocfs2]
[55921.465065]  [<ffffffff8118aa8f>] ? snprintf+0x36/0x3b
[55921.465065]  [<ffffffff810e9f9e>] ? get_sb_bdev+0x137/0x19a
[55921.465065]  [<ffffffffa0471b83>] ? ocfs2_fill_super+0x0/0x2101 [ocfs2]
[55921.465065]  [<ffffffff810e9675>] ? vfs_kern_mount+0xa6/0x196
[55921.465065]  [<ffffffff810e97c4>] ? do_kern_mount+0x49/0xe7
[55921.465065]  [<ffffffff810fdabb>] ? do_mount+0x75c/0x7d6
[55921.465065]  [<ffffffff810d829a>] ? alloc_pages_current+0x9f/0xc2
[55921.465065]  [<ffffffff810fdbbd>] ? sys_mount+0x88/0xc3
[55921.465065]  [<ffffffff810089c2>] ? system_call_fastpath+0x16/0x1b
[55921.465065] Code: 0f 1f 44 00 00 49 89 c7 fa 66 0f 1f 44 00 00 65 4c 
8b 04 25 b0 ea 00 00 48 8b 45 00 49 01 c0 49 8b 18 48 85 db 74 0d 48 63 
45 18 <48> 8b 04 03 49 89 00 eb 11 83 ca ff 44 89 f6 48 89 ef e8 a1 f1
[55921.465065] RIP  [<ffffffff810dffaa>] __kmalloc+0xd3/0x136
[55921.465065]  RSP <ffff880103e21ba8>
[55921.465065] ---[ end trace 3f96fca7c9cbfb04 ]---
[55941.839304] o2net: accepted connection from node mail02.fxclub.org 
(num 1) at 192.168.1.2:7777
[55946.003594] o2dlm: Node 1 joins domain E4B99C68B65449068DC403326917DC29
[55946.003673] o2dlm: Nodes in domain E4B99C68B65449068DC403326917DC29: 0 1


After a few minutes:

Message from syslogd at mail01 at Sep 28 07:27:03 ...
  kernel:[57519.645448] general protection fault: 0000 [#3] SMP

Message from syslogd at mail01 at Sep 28 07:27:03 ...
  kernel:[57519.645615] last sysfs file: /sys/module/drbd/parameters/cn_idx

Message from syslogd at mail01 at Sep 28 07:27:03 ...
  kernel:[57519.649409] Stack:

Message from syslogd at mail01 at Sep 28 07:27:03 ...
  kernel:[57519.649409] Call Trace:

Message from syslogd at mail01 at Sep 28 07:27:03 ...
  kernel:[57519.649409] Code: 0f 1f 44 00 00 49 89 c7 fa 66 0f 1f 44 00 
00 65 4c 8b 04 25 b0 ea 00 00 48 8b 45 00 49 01 c0 49 8b 18 48 85 db 74 
0d 48 63 45 18 <48> 8b 04 03 49 89 00 eb 11 83 ca ff 44 89 f6 48 89 ef 
e8 a1 f1

And then the machine halts.
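
For comparing traces like the ones above across kernel versions, a small script can reduce each oops to its headline, the process that hit it, and the faulting function. This is only a rough sketch: the function name `summarize_oopses` and the regular expressions are illustrative assumptions, not part of any kernel or DRBD tooling, and it expects unwrapped dmesg lines (one log record per line), unlike the mail-wrapped text in this message.

```python
import re

# Headline of an oops, e.g. "kernel BUG at mm/slub.c:2834!" or
# "general protection fault: 0000 [#2]". Only these two forms are handled.
HEADLINE = re.compile(
    r"(kernel BUG at \S+|general protection fault: \S+ \[#\d+\])"
)
# Process name from the "Pid: ..., comm: ..." line.
COMM = re.compile(r"comm: (\S+)")
# Final "RIP  [<addr>] symbol+off/len" line (not the "RIP: 0010:..." register line).
RIP = re.compile(r"RIP\s+\[<[0-9a-f]+>\]\s+(\S+)")


def summarize_oopses(log_text):
    """Return one dict per oops found in unwrapped dmesg text (hypothetical helper)."""
    reports = []
    current = None
    for line in log_text.splitlines():
        m = HEADLINE.search(line)
        if m:
            # A new headline starts a new report.
            current = {"headline": m.group(1), "comm": None, "rip": None}
            reports.append(current)
            continue
        if current is None:
            continue  # ignore lines before the first oops
        m = COMM.search(line)
        if m and current["comm"] is None:
            current["comm"] = m.group(1)
        m = RIP.search(line)
        if m:
            current["rip"] = m.group(1)
    return reports


# Lines taken from the traces in this message, unwrapped.
SAMPLE = """\
[55921.451506] kernel BUG at mm/slub.c:2834!
[55921.451984] Pid: 2995, comm: udevd Not tainted 2.6.35.6 #1 0NH278/PowerEdge 2950
[55921.453030] RIP  [<ffffffff810df05d>] kfree+0x5b/0xc8
[55921.461099] general protection fault: 0000 [#2] SMP
[55921.464902] Pid: 9281, comm: mount.ocfs2 Tainted: G      D 2.6.35.6 #1 0NH278/PowerEdge 2950
[55921.465065] RIP  [<ffffffff810dffaa>] __kmalloc+0xd3/0x136
"""

if __name__ == "__main__":
    for report in summarize_oopses(SAMPLE):
        print(report["headline"], "|", report["comm"], "|", report["rip"])
```

Run against the full output of `dmesg`, this gives one line per oops, which makes it easier to see whether the faulting function stays in the slab allocator (kfree, __kmalloc) from one crash to the next.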

-- 
Best regards,
Proskurin Kirill


