[DRBD-user] Page allocation errors and kernel panics with drbd 8.3.3rc1 and infiniband

Jason McKay jmckay at logicworks.net
Tue Sep 22 19:31:26 CEST 2009

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hello all,

We're experiencing page allocation errors when writing to a connected drbd device (connected via infiniband using IPoIB) that resulted in a kernel panic last night.

When rsyncing data to the primary node, we get a slew of these errors in /var/log/messages:

Sep 21 22:11:21 client-nfs5 kernel: drbd0_worker: page allocation failure. order:5, mode:0x10
Sep 21 22:11:21 client-nfs5 kernel: Pid: 6114, comm: drbd0_worker Tainted: P          2.6.27.25 #2
Sep 21 22:11:21 client-nfs5 kernel:
Sep 21 22:11:21 client-nfs5 kernel: Call Trace:
Sep 21 22:11:21 client-nfs5 kernel:  [<ffffffff8026eec4>] __alloc_pages_internal+0x3a4/0x3c0
Sep 21 22:11:21 client-nfs5 kernel:  [<ffffffff8028c61a>] kmem_getpages+0x6b/0x12b
Sep 21 22:11:21 client-nfs5 kernel:  [<ffffffff8028cb93>] fallback_alloc+0x11d/0x1b1
Sep 21 22:11:21 client-nfs5 kernel:  [<ffffffff8028c816>] kmem_cache_alloc_node+0xa3/0xcf
Sep 21 22:11:21 client-nfs5 kernel:  [<ffffffff803ecf65>] __alloc_skb+0x64/0x12e
Sep 21 22:11:21 client-nfs5 kernel:  [<ffffffff8041af6f>] sk_stream_alloc_skb+0x2f/0xd5
Sep 21 22:11:21 client-nfs5 kernel:  [<ffffffff8041beb0>] tcp_sendmsg+0x180/0x9d1
Sep 21 22:11:21 client-nfs5 kernel:  [<ffffffff803e6d85>] sock_sendmsg+0xe2/0xff
Sep 21 22:11:22 client-nfs5 kernel:  [<ffffffff80246d06>] autoremove_wake_function+0x0/0x2e
Sep 21 22:11:22 client-nfs5 kernel:  [<ffffffff80246d06>] autoremove_wake_function+0x0/0x2e
Sep 21 22:11:22 client-nfs5 kernel:  [<ffffffff8027464b>] zone_statistics+0x3a/0x5d
Sep 21 22:11:22 client-nfs5 kernel:  [<ffffffff803e82f6>] kernel_sendmsg+0x2c/0x3e
Sep 21 22:11:22 client-nfs5 kernel:  [<ffffffffa0568d83>] drbd_send+0xb2/0x194 [drbd]
Sep 21 22:11:22 client-nfs5 kernel:  [<ffffffffa0569134>] _drbd_send_cmd+0x9c/0x116 [drbd]
Sep 21 22:11:22 client-nfs5 kernel:  [<ffffffffa0569285>] send_bitmap_rle_or_plain+0xd7/0x13a [drbd]
Sep 21 22:11:22 client-nfs5 kernel:  [<ffffffffa0569475>] _drbd_send_bitmap+0x18d/0x1ae [drbd]
Sep 21 22:11:22 client-nfs5 kernel:  [<ffffffffa056c242>] drbd_send_bitmap+0x39/0x4c [drbd]
Sep 21 22:11:22 client-nfs5 kernel:  [<ffffffffa056b9a1>] w_bitmap_io+0x45/0x95 [drbd]
Sep 21 22:11:22 client-nfs5 kernel:  [<ffffffffa05560c4>] drbd_worker+0x230/0x3eb [drbd]
Sep 21 22:11:22 client-nfs5 kernel:  [<ffffffffa056b0d9>] drbd_thread_setup+0x124/0x1ba [drbd]
Sep 21 22:11:22 client-nfs5 kernel:  [<ffffffff8020cd59>] child_rip+0xa/0x11
Sep 21 22:11:22 client-nfs5 kernel:  [<ffffffffa056afb5>] drbd_thread_setup+0x0/0x1ba [drbd]
Sep 21 22:11:22 client-nfs5 kernel:  [<ffffffff8020cd4f>] child_rip+0x0/0x11

These were occurring for hours until a kernel panic:

[Mon Sep 21 22:11:33 2009]INFO: task kjournald:7025 blocked for more than 120 seconds.
[Mon Sep 21 22:11:33 2009]"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Sep 21 22:11:33 2009] ffff88062e801d30 0000000000000046 ffff88062e801cf8 ffffffffa056b821
[Mon Sep 21 22:11:33 2009] ffff880637d617c0 ffff88063dd05120 ffff88063e668750 ffff88063dd05478
[Mon Sep 21 22:11:33 2009] 0000000900000002 000000011a942a2d ffffffffffffffff ffffffffffffffff
[Mon Sep 21 22:11:33 2009]Call Trace:
[Mon Sep 21 22:11:33 2009] [<ffffffffa056b821>] drbd_unplug_fn+0x14a/0x1aa [drbd]
[Mon Sep 21 22:11:33 2009] [<ffffffff802b1fed>] sync_buffer+0x0/0x3f
[Mon Sep 21 22:11:33 2009] [<ffffffff804903df>] io_schedule+0x5d/0x9f
[Mon Sep 21 22:11:33 2009] [<ffffffff802b2028>] sync_buffer+0x3b/0x3f
[Mon Sep 21 22:11:33 2009] [<ffffffff80490658>] __wait_on_bit+0x40/0x6f
[Mon Sep 21 22:11:33 2009] [<ffffffff802b1fed>] sync_buffer+0x0/0x3f
[Mon Sep 21 22:11:34 2009] [<ffffffff804906f3>] out_of_line_wait_on_bit+0x6c/0x78
[Mon Sep 21 22:11:34 2009] [<ffffffff80246d34>] wake_bit_function+0x0/0x23
[Mon Sep 21 22:11:34 2009] [<ffffffffa0031250>] journal_commit_transaction+0x7cd/0xc4d [jbd]
[Mon Sep 21 22:11:34 2009] [<ffffffff8023dc9b>] lock_timer_base+0x26/0x4b
[Mon Sep 21 22:11:34 2009] [<ffffffffa0033d32>] kjournald+0xc1/0x1fb [jbd]
[Mon Sep 21 22:11:34 2009] [<ffffffff80246d06>] autoremove_wake_function+0x0/0x2e
[Mon Sep 21 22:11:34 2009] [<ffffffffa0033c71>] kjournald+0x0/0x1fb [jbd]
[Mon Sep 21 22:11:34 2009] [<ffffffff80246bd8>] kthread+0x47/0x73
[Mon Sep 21 22:11:34 2009] [<ffffffff8020cd59>] child_rip+0xa/0x11
[Mon Sep 21 22:11:34 2009] [<ffffffff80246b91>] kthread+0x0/0x73
[Mon Sep 21 22:11:34 2009] [<ffffffff8020cd4f>] child_rip+0x0/0x11
[Mon Sep 21 22:11:34 2009]
[Mon Sep 21 22:11:34 2009]Kernel panic - not syncing: softlockup: blocked tasks
[-- root at localhost.localdomain attached -- Mon Sep 21 22:17:24 2009]
[-- root at localhost.localdomain detached -- Mon Sep 21 23:16:22 2009]
[-- Console down -- Tue Sep 22 04:02:15 2009]



The two systems are hardware and OS identical:

[root at client-nfs5 log]# uname -a
Linux client-nfs5 2.6.27.25 #2 SMP Fri Jun 26 00:07:23 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

[root at client-nfs5 log]# grep model\ name /proc/cpuinfo 
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
model name	: Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz

[root at client-nfs5 log]# free -m
             total       used       free     shared    buffers     cached
Mem:         24110       2050      22060          0        157       1270
-/+ buffers/cache:        622      23488
Swap:         4094          0       4094

[root at client-nfs5 log]# cat /proc/partitions 
major minor  #blocks  name

   8     0 1755310080 sda
   8     1 1755302031 sda1
   8    16 4881116160 sdb
   8    17 4881116126 sdb1
   8    32   62522712 sdc
   8    33   58147236 sdc1
   8    34    4192965 sdc2
 147     0 1755248424 drbd0
 147     1 4880967128 drbd1

[root at client-nfs5 log]# modinfo drbd
filename:       /lib/modules/2.6.27.25/kernel/drivers/block/drbd.ko
alias:          block-major-147-*
license:        GPL
version:        8.3.3rc1
description:    drbd - Distributed Replicated Block Device v8.3.3rc1
author:         Philipp Reisner <phil at linbit.com>, Lars Ellenberg <lars at linbit.com>
srcversion:     A06488A2009E5EB94AF0825
depends:        
vermagic:       2.6.27.25 SMP mod_unload modversions 
parm:           minor_count:Maximum number of drbd devices (1-255) (uint)
parm:           disable_sendpage:bool
parm:           allow_oos:DONT USE! (bool)
parm:           cn_idx:uint
parm:           proc_details:int
parm:           enable_faults:int
parm:           fault_rate:int
parm:           fault_count:int
parm:           fault_devs:int
parm:           usermode_helper:string

[root at client-nfs5 log]# cat /etc/drbd.conf 
global {
  usage-count no;
}

common {
  syncer { rate 800M; }  #This is temporary

  protocol C;

  handlers {
  }

  startup {    degr-wfc-timeout        60;
    wfc-timeout             60;
  }

  disk {
    on-io-error     detach;
    fencing         dont-care;
    no-disk-flushes;
    no-md-flushes;
    no-disk-barrier;
  }

  net {
    ko-count        2;
    after-sb-2pri   disconnect;
    max-buffers 16000;
    max-epoch-size 16000;
    unplug-watermark 16001;
    sndbuf-size 8m;
    rcvbuf-size 8m;

  }
}

resource drbd0 {
  on client-nfs5 {
    device  /dev/drbd0;
    disk    /dev/sda1;
    address  192.168.16.48:7789;
    flexible-meta-disk internal;
  }

  on client-nfs6 {
    device  /dev/drbd0;
    disk    /dev/sda1;
    address  192.168.16.49:7789;
    flexible-meta-disk internal;
  }
}

resource drbd1 {
  on client-nfs5 {
    device  /dev/drbd1;
    disk    /dev/sdb1;
    address  192.168.16.48:7790;
    flexible-meta-disk internal;
  }

  on client-nfs6 {
    device  /dev/drbd1;
    disk    /dev/sdb1;
    address  192.168.16.49:7790;
    flexible-meta-disk internal;
  }
}

[root at client-nfs5 log]# ibstat
CA 'mlx4_0'
	CA type: MT26428
	Number of ports: 2
	Firmware version: 2.6.100
	Hardware version: a0
	Node GUID: 0x0002c9030004a79a
	System image GUID: 0x0002c9030004a79d
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 40
		Base lid: 1
		LMC: 0
		SM lid: 1
		Capability mask: 0x0251086a
		Port GUID: 0x0002c9030004a79b
	Port 2:
		State: Active
		Physical state: LinkUp
		Rate: 40
		Base lid: 1
		LMC: 0
		SM lid: 1
		Capability mask: 0x0251086a
		Port GUID: 0x0002c9030004a79c

[root at client-nfs5 log]# cat /etc/sysctl.conf 
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled.  See sysctl(8) and
# sysctl.conf(5) for more details.

# Controls IP packet forwarding
net.ipv4.ip_forward = 0

# Controls source route verification
net.ipv4.conf.default.rp_filter = 1

# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

# Controls whether core dumps will append the PID to the core filename
# Useful for debugging multi-threaded applications
kernel.core_uses_pid = 1

# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 1

# Controls the maximum size of a message, in bytes
kernel.msgmnb = 65536

# Controls the default maxmimum size of a mesage queue
kernel.msgmax = 65536

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296
net.ipv4.tcp_timestamps=0
net.ipv4.tcp_sack=0
net.core.netdev_max_backlog=250000
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.core.rmem_default=16777216
net.core.wmem_default=16777216
net.core.optmem_max=16777216
net.ipv4.tcp_mem=16777216 16777216 16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216



Since we noticed these errors, we've been searching for them on our logservers and found another case where this is occurirng using the same drbd version and infiniband.  This other case, however, uses a stock redhat kernel.

Has anyone else seen these errors?   Do the tuning recommendations for send/receive buffers need to be revisited? Would switching to rdp vs IPoIB be a potential fix here?  Thanks in advance for any insights.

Jason

-- 
Jason McKay
Senior Engineer
Logicworks


More information about the drbd-user mailing list