[DRBD-user] kernel panic with rhel 5.4, drbd 8.3.2 (and 4k hard sector size on lower level device)

Lars Ellenberg lars.ellenberg at linbit.com
Fri Feb 26 13:31:08 CET 2010

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


On Fri, Feb 26, 2010 at 11:21:13AM +0000, atp wrote:
> hi,
> 
>  I get a kernel panic when attempting to attach a resource to a 500GB
> iscsi volume. I don't get it when I try a local 500GB lvm volume. 
>  
>  I can dd or write data through a file system without an issue to the 
> iscsi volume. Any ideas as to what to do next? Is this a drbd bug?
> 
>  I'm attempting to set up a secondary. The primary is working fine. 
> We have several other pairs of systems configured pretty much
> identically, but this is the first that is running on iscsi. 
> 
>  Googling for this does not yield much. Also, if someone could point 
> me at the bug tracker, that would be great. 
> 
>   Thanks for any assistance. 
> 
> [root@drvms03 ~]# drbdadm create-md vm1
> Writing meta data...
> initializing activity log
> NOT initialized bitmap
> New drbd meta data block successfully created.
> 
> [root@drvms03 ~]# drbdadm attach vm1
> Unable to handle kernel NULL pointer dereference at 0000000000000000
> RIP: 
>  [<ffffffff8002de3c>] blk_recount_segments+0x74/0x36f
> PGD 223e04067 PUD 222e38067 PMD 0 
> Oops: 0000 [1] SMP 
> last sysfs file: /module/drbd/parameters/cn_idx
> CPU 2 
> Modules linked in: mptctl mptbase hidp l2cap bluetooth sg lockd sunrpc
> bridge bonding ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr
> iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i cxgb3 8021q
> libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi
> cpufreq_ondemand powernow_k8 freq_table drbd(U) dm_multipath scsi_dh
> video hwmon backlight sbs i2c_ec button battery asus_acpi
> acpi_memhotplug ac parport_pc lp parport ksm(U) kvm_amd(U) kvm(U) shpchp
> i2c_piix4 i2c_core tg3 e1000e pcspkr serio_raw dm_raid45 dm_message
> dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod
> sata_svw libata cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd
> ehci_hcd
> Pid: 347, comm: cqueue/2 Tainted: G      2.6.18-164.11.1.el5 #1
> RIP: 0010:[<ffffffff8002de3c>]  [<ffffffff8002de3c>]
> blk_recount_segments+0x74/0x36f
> RSP: 0000:ffff810427acda90  EFLAGS: 00010297
> RAX: 0000000000000000 RBX: ffff8102273d9340 RCX: 0000000000000000
> RDX: 0000000000000001 RSI: ffff8102273d9340 RDI: ffff8102273f26f0
> RBP: ffff810427a44e60 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000050 R12: 000000005732bdb0
> R13: 0000000000000000 R14: 000000005732bdb0 R15: 0000000000000000
> FS:  00002b23e6f4a230(0000) GS:ffff8101079591c0(0000)
> knlGS:00000000f7f6f6c0
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 000000021daaa000 CR4: 00000000000006e0
> Process cqueue/2 (pid: 347, threadinfo ffff810427acc000, task
> ffff810227968860)
> Stack:  ffff8102273f26f0 000000000000bdb0 0000000000000000
> ffff810200000000
>  0000000000000001 ffffffff800289b5 694420286b736964 ffff8102273d9340
>  ffff8102273d9340 000000005732bdb0 0000000000000008 000000005732bdb0
> Call Trace:
>  [<ffffffff800289b5>] get_request_wait+0x21/0x11f
>  [<ffffffff8004254a>] bio_phys_segments+0xf/0x15
>  [<ffffffff800258d4>] init_request_from_bio+0xc8/0x198
>  [<ffffffff8000c011>] __make_request+0x34f/0x401
>  [<ffffffff8001c028>] generic_make_request+0x211/0x228
>  [<ffffffff80033437>] submit_bio+0xe4/0xeb
>  [<ffffffff8835dc02>] :drbd:_drbd_md_sync_page_io+0x156/0x171
>  [<ffffffff8835e1e7>] :drbd:drbd_md_sync_page_io+0x3c6/0x4e6
>  [<ffffffff883675c2>] :drbd:drbd_md_read+0xb2/0x256
>  [<ffffffff8836d189>] :drbd:drbd_nl_disk_conf+0x6a6/0xc69
>  [<ffffffff800da72c>] __cache_alloc_node+0x9d/0xd2
>  [<ffffffff8836be5e>] :drbd:drbd_connector_callback+0xe9/0x1a8
>  [<ffffffff801b9722>] cn_queue_wrapper+0x0/0x23
>  [<ffffffff801b972d>] cn_queue_wrapper+0xb/0x23
>  [<ffffffff8004d8ed>] run_workqueue+0x94/0xe4
>  [<ffffffff8004a12f>] worker_thread+0x0/0x122
>  [<ffffffff8009fe9f>] keventd_create_kthread+0x0/0xc4
>  [<ffffffff8004a21f>] worker_thread+0xf0/0x122
>  [<ffffffff8008c86c>] default_wake_function+0x0/0xe
>  [<ffffffff8009fe9f>] keventd_create_kthread+0x0/0xc4
>  [<ffffffff80032950>] kthread+0xfe/0x132
>  [<ffffffff8005dfb1>] child_rip+0xa/0x11
>  [<ffffffff8009fe9f>] keventd_create_kthread+0x0/0xc4
>  [<ffffffff80032852>] kthread+0x0/0x132
>  [<ffffffff8005dfa7>] child_rip+0x0/0x11
> 
> 
> Code: 4d 8b 1a 49 c1 eb 33 4c 89 d8 48 c1 e8 09 48 8b 3c c5 80 8b 
> RIP  [<ffffffff8002de3c>] blk_recount_segments+0x74/0x36f
>  RSP <ffff810427acda90>
> CR2: 0000000000000000
>  <0>Kernel panic - not syncing: Fatal exception
> 
>  Details of the disk/system.
> 
> uname -a
> Linux drvms03.dr.tradefair 2.6.18-164.11.1.el5 #1 SMP Wed Jan 6 13:26:04
> EST 2010 x86_64 x86_64 x86_64 GNU/Linux
> 
> [root@drvms03 ~]# rpm -qa | grep drbd
> drbd83-8.3.2-6
> kmod-drbd83-8.3.2-6.el5_3
> 
> [root@drvms03 ~]# fdisk /dev/sda
> Note: sector size is 4096 (not 512)
                       ^^^^

This was the important bit of information.
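
As a side note, the value fdisk reports there is the device's logical
block size.  If you want to double check it on other nodes, something
like the little program below (an illustrative sketch only, not part of
DRBD) queries it with the BLKSSZGET ioctl:

/* Sketch: print the logical block size of a block device via the
 * BLKSSZGET ioctl -- the same "sector size" fdisk reported as 4096 above. */
#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int fd, lbs = 0;

	if (argc != 2) {
		fprintf(stderr, "usage: %s /dev/sdX\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (ioctl(fd, BLKSSZGET, &lbs) < 0) {
		perror("BLKSSZGET");
		close(fd);
		return 1;
	}
	printf("%s: logical block size %d bytes\n", argv[1], lbs);
	close(fd);
	return 0;
}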

We still use a meta data layout tailored to 512 byte hard sectors,
and work around larger sectors with a "bounce page".
That workaround is not exercised very often, and during some
restructuring of the code back then we ended up trying to use the
bounce page before allocating it :-(
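
To illustrate what the bounce page is for (a plain userspace sketch
with made-up names, not the actual DRBD code): meta data is addressed
in 512 byte units, so on a device with 4096 byte logical blocks every
meta data write becomes a read-modify-write of the containing block
through a spare buffer.  8.3.2 walks into that path with the spare page
still unallocated, hence the NULL pointer dereference in the block
layer above.

/* Userspace sketch only -- not the DRBD implementation.
 * Shows the read-modify-write that the bounce page exists for:
 * writing one 512-byte meta data "sector" on a device that only
 * accepts I/O in whole logical blocks (e.g. 4096 bytes). */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MD_SECTOR_SIZE 512

static int write_md_sector(int fd, uint64_t md_sector, const void *buf,
                           int logical_block_size)
{
	/* simple case: the device accepts 512 byte writes directly */
	if (logical_block_size == MD_SECTOR_SIZE) {
		off_t off = (off_t)md_sector * MD_SECTOR_SIZE;
		return pwrite(fd, buf, MD_SECTOR_SIZE, off) == MD_SECTOR_SIZE ? 0 : -1;
	}

	/* bounce buffer covering one full logical block;
	 * allocating this up front is exactly what the patch below fixes */
	char *bounce = malloc(logical_block_size);
	if (!bounce)
		return -1;

	/* assumes logical_block_size is a power of two */
	uint64_t byte_off  = md_sector * (uint64_t)MD_SECTOR_SIZE;
	uint64_t blk_start = byte_off & ~(uint64_t)(logical_block_size - 1);
	size_t   in_blk    = byte_off - blk_start;
	int ret = -1;

	/* read the containing block, patch in the 512 bytes, write it back */
	if (pread(fd, bounce, logical_block_size, blk_start) != logical_block_size)
		goto out;
	memcpy(bounce + in_blk, buf, MD_SECTOR_SIZE);
	if (pwrite(fd, bounce, logical_block_size, blk_start) != logical_block_size)
		goto out;
	ret = 0;
out:
	free(bounce);
	return ret;
}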

We are currently reworking our meta data layout.
For now, please use drbd 8.3.7 together with this patch, which simply
moves the bounce page allocation ahead of the first meta data read:


diff --git a/drbd/drbd_nl.c b/drbd/drbd_nl.c
index 9aa6eba..1b2bffc 100644
--- a/drbd/drbd_nl.c
+++ b/drbd/drbd_nl.c
@@ -957,6 +957,25 @@ STATIC int drbd_nl_disk_conf(struct drbd_conf *mdev, struct drbd_nl_cfg_req *nlp
 
 	drbd_md_set_sector_offsets(mdev, nbc);
 
+	/* allocate a second IO page if logical_block_size != 512 */
+	logical_block_size = bdev_logical_block_size(nbc->md_bdev);
+	if (logical_block_size == 0)
+		logical_block_size = MD_SECTOR_SIZE;
+
+	if (logical_block_size != MD_SECTOR_SIZE) {
+		if (!mdev->md_io_tmpp) {
+			struct page *page = alloc_page(GFP_NOIO);
+			if (!page)
+				goto force_diskless_dec;
+
+			dev_warn(DEV, "Meta data's bdev logical_block_size = %d != %d\n",
+			     logical_block_size, MD_SECTOR_SIZE);
+			dev_warn(DEV, "Workaround engaged (has performance impact).\n");
+
+			mdev->md_io_tmpp = page;
+		}
+	}
+
 	if (!mdev->bitmap) {
 		if (drbd_bm_init(mdev)) {
 			retcode = ERR_NOMEM;
@@ -996,25 +1015,6 @@ STATIC int drbd_nl_disk_conf(struct drbd_conf *mdev, struct drbd_nl_cfg_req *nlp
 		goto force_diskless_dec;
 	}
 
-	/* allocate a second IO page if logical_block_size != 512 */
-	logical_block_size = bdev_logical_block_size(nbc->md_bdev);
-	if (logical_block_size == 0)
-		logical_block_size = MD_SECTOR_SIZE;
-
-	if (logical_block_size != MD_SECTOR_SIZE) {
-		if (!mdev->md_io_tmpp) {
-			struct page *page = alloc_page(GFP_NOIO);
-			if (!page)
-				goto force_diskless_dec;
-
-			dev_warn(DEV, "Meta data's bdev logical_block_size = %d != %d\n",
-			     logical_block_size, MD_SECTOR_SIZE);
-			dev_warn(DEV, "Workaround engaged (has performance impact).\n");
-
-			mdev->md_io_tmpp = page;
-		}
-	}
-
 	/* Reset the "barriers don't work" bits here, then force meta data to
 	 * be written, to ensure we determine if barriers are supported. */
 	if (nbc->dc.no_md_flush)


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed


