Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello all, We're experiencing page allocation errors when writing to a connected drbd device (connected via infiniband using IPoIB) that resulted in a kernel panic last night. When rsyncing data to the primary node, we get a slew of these errors in /var/log/messages: Sep 21 22:11:21 client-nfs5 kernel: drbd0_worker: page allocation failure. order:5, mode:0x10 Sep 21 22:11:21 client-nfs5 kernel: Pid: 6114, comm: drbd0_worker Tainted: P 2.6.27.25 #2 Sep 21 22:11:21 client-nfs5 kernel: Sep 21 22:11:21 client-nfs5 kernel: Call Trace: Sep 21 22:11:21 client-nfs5 kernel: [<ffffffff8026eec4>] __alloc_pages_internal+0x3a4/0x3c0 Sep 21 22:11:21 client-nfs5 kernel: [<ffffffff8028c61a>] kmem_getpages+0x6b/0x12b Sep 21 22:11:21 client-nfs5 kernel: [<ffffffff8028cb93>] fallback_alloc+0x11d/0x1b1 Sep 21 22:11:21 client-nfs5 kernel: [<ffffffff8028c816>] kmem_cache_alloc_node+0xa3/0xcf Sep 21 22:11:21 client-nfs5 kernel: [<ffffffff803ecf65>] __alloc_skb+0x64/0x12e Sep 21 22:11:21 client-nfs5 kernel: [<ffffffff8041af6f>] sk_stream_alloc_skb+0x2f/0xd5 Sep 21 22:11:21 client-nfs5 kernel: [<ffffffff8041beb0>] tcp_sendmsg+0x180/0x9d1 Sep 21 22:11:21 client-nfs5 kernel: [<ffffffff803e6d85>] sock_sendmsg+0xe2/0xff Sep 21 22:11:22 client-nfs5 kernel: [<ffffffff80246d06>] autoremove_wake_function+0x0/0x2e Sep 21 22:11:22 client-nfs5 kernel: [<ffffffff80246d06>] autoremove_wake_function+0x0/0x2e Sep 21 22:11:22 client-nfs5 kernel: [<ffffffff8027464b>] zone_statistics+0x3a/0x5d Sep 21 22:11:22 client-nfs5 kernel: [<ffffffff803e82f6>] kernel_sendmsg+0x2c/0x3e Sep 21 22:11:22 client-nfs5 kernel: [<ffffffffa0568d83>] drbd_send+0xb2/0x194 [drbd] Sep 21 22:11:22 client-nfs5 kernel: [<ffffffffa0569134>] _drbd_send_cmd+0x9c/0x116 [drbd] Sep 21 22:11:22 client-nfs5 kernel: [<ffffffffa0569285>] send_bitmap_rle_or_plain+0xd7/0x13a [drbd] Sep 21 22:11:22 client-nfs5 kernel: [<ffffffffa0569475>] _drbd_send_bitmap+0x18d/0x1ae [drbd] Sep 21 22:11:22 client-nfs5 kernel: [<ffffffffa056c242>] drbd_send_bitmap+0x39/0x4c [drbd] Sep 21 22:11:22 client-nfs5 kernel: [<ffffffffa056b9a1>] w_bitmap_io+0x45/0x95 [drbd] Sep 21 22:11:22 client-nfs5 kernel: [<ffffffffa05560c4>] drbd_worker+0x230/0x3eb [drbd] Sep 21 22:11:22 client-nfs5 kernel: [<ffffffffa056b0d9>] drbd_thread_setup+0x124/0x1ba [drbd] Sep 21 22:11:22 client-nfs5 kernel: [<ffffffff8020cd59>] child_rip+0xa/0x11 Sep 21 22:11:22 client-nfs5 kernel: [<ffffffffa056afb5>] drbd_thread_setup+0x0/0x1ba [drbd] Sep 21 22:11:22 client-nfs5 kernel: [<ffffffff8020cd4f>] child_rip+0x0/0x11 These were occurring for hours until a kernel panic: [Mon Sep 21 22:11:33 2009]INFO: task kjournald:7025 blocked for more than 120 seconds. [Mon Sep 21 22:11:33 2009]"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Mon Sep 21 22:11:33 2009] ffff88062e801d30 0000000000000046 ffff88062e801cf8 ffffffffa056b821 [Mon Sep 21 22:11:33 2009] ffff880637d617c0 ffff88063dd05120 ffff88063e668750 ffff88063dd05478 [Mon Sep 21 22:11:33 2009] 0000000900000002 000000011a942a2d ffffffffffffffff ffffffffffffffff [Mon Sep 21 22:11:33 2009]Call Trace: [Mon Sep 21 22:11:33 2009] [<ffffffffa056b821>] drbd_unplug_fn+0x14a/0x1aa [drbd] [Mon Sep 21 22:11:33 2009] [<ffffffff802b1fed>] sync_buffer+0x0/0x3f [Mon Sep 21 22:11:33 2009] [<ffffffff804903df>] io_schedule+0x5d/0x9f [Mon Sep 21 22:11:33 2009] [<ffffffff802b2028>] sync_buffer+0x3b/0x3f [Mon Sep 21 22:11:33 2009] [<ffffffff80490658>] __wait_on_bit+0x40/0x6f [Mon Sep 21 22:11:33 2009] [<ffffffff802b1fed>] sync_buffer+0x0/0x3f [Mon Sep 21 22:11:34 2009] [<ffffffff804906f3>] out_of_line_wait_on_bit+0x6c/0x78 [Mon Sep 21 22:11:34 2009] [<ffffffff80246d34>] wake_bit_function+0x0/0x23 [Mon Sep 21 22:11:34 2009] [<ffffffffa0031250>] journal_commit_transaction+0x7cd/0xc4d [jbd] [Mon Sep 21 22:11:34 2009] [<ffffffff8023dc9b>] lock_timer_base+0x26/0x4b [Mon Sep 21 22:11:34 2009] [<ffffffffa0033d32>] kjournald+0xc1/0x1fb [jbd] [Mon Sep 21 22:11:34 2009] [<ffffffff80246d06>] autoremove_wake_function+0x0/0x2e [Mon Sep 21 22:11:34 2009] [<ffffffffa0033c71>] kjournald+0x0/0x1fb [jbd] [Mon Sep 21 22:11:34 2009] [<ffffffff80246bd8>] kthread+0x47/0x73 [Mon Sep 21 22:11:34 2009] [<ffffffff8020cd59>] child_rip+0xa/0x11 [Mon Sep 21 22:11:34 2009] [<ffffffff80246b91>] kthread+0x0/0x73 [Mon Sep 21 22:11:34 2009] [<ffffffff8020cd4f>] child_rip+0x0/0x11 [Mon Sep 21 22:11:34 2009] [Mon Sep 21 22:11:34 2009]Kernel panic - not syncing: softlockup: blocked tasks [-- root at localhost.localdomain attached -- Mon Sep 21 22:17:24 2009] [-- root at localhost.localdomain detached -- Mon Sep 21 23:16:22 2009] [-- Console down -- Tue Sep 22 04:02:15 2009] The two systems are hardware and OS identical: [root at client-nfs5 log]# uname -a Linux client-nfs5 2.6.27.25 #2 SMP Fri Jun 26 00:07:23 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux [root at client-nfs5 log]# grep model\ name /proc/cpuinfo model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz model name : Intel(R) Xeon(R) CPU E5520 @ 2.27GHz [root at client-nfs5 log]# free -m total used free shared buffers cached Mem: 24110 2050 22060 0 157 1270 -/+ buffers/cache: 622 23488 Swap: 4094 0 4094 [root at client-nfs5 log]# cat /proc/partitions major minor #blocks name 8 0 1755310080 sda 8 1 1755302031 sda1 8 16 4881116160 sdb 8 17 4881116126 sdb1 8 32 62522712 sdc 8 33 58147236 sdc1 8 34 4192965 sdc2 147 0 1755248424 drbd0 147 1 4880967128 drbd1 [root at client-nfs5 log]# modinfo drbd filename: /lib/modules/2.6.27.25/kernel/drivers/block/drbd.ko alias: block-major-147-* license: GPL version: 8.3.3rc1 description: drbd - Distributed Replicated Block Device v8.3.3rc1 author: Philipp Reisner <phil at linbit.com>, Lars Ellenberg <lars at linbit.com> srcversion: A06488A2009E5EB94AF0825 depends: vermagic: 2.6.27.25 SMP mod_unload modversions parm: minor_count:Maximum number of drbd devices (1-255) (uint) parm: disable_sendpage:bool parm: allow_oos:DONT USE! (bool) parm: cn_idx:uint parm: proc_details:int parm: enable_faults:int parm: fault_rate:int parm: fault_count:int parm: fault_devs:int parm: usermode_helper:string [root at client-nfs5 log]# cat /etc/drbd.conf global { usage-count no; } common { syncer { rate 800M; } #This is temporary protocol C; handlers { } startup { degr-wfc-timeout 60; wfc-timeout 60; } disk { on-io-error detach; fencing dont-care; no-disk-flushes; no-md-flushes; no-disk-barrier; } net { ko-count 2; after-sb-2pri disconnect; max-buffers 16000; max-epoch-size 16000; unplug-watermark 16001; sndbuf-size 8m; rcvbuf-size 8m; } } resource drbd0 { on client-nfs5 { device /dev/drbd0; disk /dev/sda1; address 192.168.16.48:7789; flexible-meta-disk internal; } on client-nfs6 { device /dev/drbd0; disk /dev/sda1; address 192.168.16.49:7789; flexible-meta-disk internal; } } resource drbd1 { on client-nfs5 { device /dev/drbd1; disk /dev/sdb1; address 192.168.16.48:7790; flexible-meta-disk internal; } on client-nfs6 { device /dev/drbd1; disk /dev/sdb1; address 192.168.16.49:7790; flexible-meta-disk internal; } } [root at client-nfs5 log]# ibstat CA 'mlx4_0' CA type: MT26428 Number of ports: 2 Firmware version: 2.6.100 Hardware version: a0 Node GUID: 0x0002c9030004a79a System image GUID: 0x0002c9030004a79d Port 1: State: Active Physical state: LinkUp Rate: 40 Base lid: 1 LMC: 0 SM lid: 1 Capability mask: 0x0251086a Port GUID: 0x0002c9030004a79b Port 2: State: Active Physical state: LinkUp Rate: 40 Base lid: 1 LMC: 0 SM lid: 1 Capability mask: 0x0251086a Port GUID: 0x0002c9030004a79c [root at client-nfs5 log]# cat /etc/sysctl.conf # Kernel sysctl configuration file for Red Hat Linux # # For binary values, 0 is disabled, 1 is enabled. See sysctl(8) and # sysctl.conf(5) for more details. # Controls IP packet forwarding net.ipv4.ip_forward = 0 # Controls source route verification net.ipv4.conf.default.rp_filter = 1 # Do not accept source routing net.ipv4.conf.default.accept_source_route = 0 # Controls the System Request debugging functionality of the kernel kernel.sysrq = 0 # Controls whether core dumps will append the PID to the core filename # Useful for debugging multi-threaded applications kernel.core_uses_pid = 1 # Controls the use of TCP syncookies net.ipv4.tcp_syncookies = 1 # Controls the maximum size of a message, in bytes kernel.msgmnb = 65536 # Controls the default maxmimum size of a mesage queue kernel.msgmax = 65536 # Controls the maximum shared segment size, in bytes kernel.shmmax = 68719476736 # Controls the maximum number of shared memory segments, in pages kernel.shmall = 4294967296 net.ipv4.tcp_timestamps=0 net.ipv4.tcp_sack=0 net.core.netdev_max_backlog=250000 net.core.rmem_max=16777216 net.core.wmem_max=16777216 net.core.rmem_default=16777216 net.core.wmem_default=16777216 net.core.optmem_max=16777216 net.ipv4.tcp_mem=16777216 16777216 16777216 net.ipv4.tcp_rmem=4096 87380 16777216 net.ipv4.tcp_wmem=4096 65536 16777216 Since we noticed these errors, we've been searching for them on our logservers and found another case where this is occurirng using the same drbd version and infiniband. This other case, however, uses a stock redhat kernel. Has anyone else seen these errors? Do the tuning recommendations for send/receive buffers need to be revisited? Would switching to rdp vs IPoIB be a potential fix here? Thanks in advance for any insights. Jason -- Jason McKay Senior Engineer Logicworks