Lars Ellenberg wrote:
>
>ok.
>now, to help to find the actual problem, you could revert that again,
>but now recompile and install a new kernel with
>"kernel-hacking" ->
> [*] Kernel debugging
> [*] Debug memory allocations
> [*] Page alloc debugging
>or even enable xfs debugging...
>then recompile drbd, of course.
>and then trigger it again, maybe the logs show something more
>interesting then...
>

I compiled the kernel as you said, and I get the trace below. (XFS debugging is not enabled; I still need to find out how to do it. I remember it was an option in the kernel, but I don't see it in this kernel, and I don't see it in vanilla 2.6.7 either.)

Jul 23 14:13:23 dell1 kernel: Bad page state at free_hot_cold_page (in process 'rm', page c169fe20)
Jul 23 14:13:23 dell1 kernel: flags:0x20000080 mapping:00000000 mapcount:0 count:0
Jul 23 14:13:23 dell1 kernel: Backtrace:
Jul 23 14:13:23 dell1 kernel: [<c0137f91>] bad_page+0x6d/0x99
Jul 23 14:13:23 dell1 kernel: [<c0138688>] free_hot_cold_page+0x7c/0x120
Jul 23 14:13:23 dell1 kernel: [<c02dfb3e>] skb_release_data+0x9b/0xae
Jul 23 14:13:23 dell1 kernel: [<c02dfc9e>] skb_clone+0x1e/0x183
Jul 23 14:13:23 dell1 kernel: [<c02dfb64>] kfree_skbmem+0x13/0x2c
Jul 23 14:13:23 dell1 kernel: [<c02dfc08>] __kfree_skb+0x8b/0x103
Jul 23 14:13:23 dell1 kernel: [<c0306358>] tcp_clean_rtx_queue+0x13d/0x3b4
Jul 23 14:13:23 dell1 kernel: [<c0306c26>] tcp_ack+0xec/0x57a
Jul 23 14:13:23 dell1 kernel: [<c030906c>] __tcp_data_snd_check+0xdd/0xec
Jul 23 14:13:23 dell1 kernel: [<c02dfc08>] __kfree_skb+0x8b/0x103
Jul 23 14:13:24 dell1 kernel: [<c03098ca>] tcp_rcv_established+0x460/0x8e0
Jul 23 14:13:24 dell1 kernel: [<c03129a4>] tcp_v4_do_rcv+0x139/0x13e
Jul 23 14:13:24 dell1 kernel: [<c02df08f>] __release_sock+0x3c/0x5c
Jul 23 14:13:24 dell1 kernel: [<c02df793>] release_sock+0x77/0x79
Jul 23 14:13:24 dell1 kernel: [<c02ff8cb>] tcp_sendpage+0x94/0x98
Jul 23 14:13:24 dell1 kernel: [<f89511d6>] _drbd_send_page+0x5f/0x100 [drbd]
Jul 23 14:13:24 dell1 kernel: [<f8951583>] drbd_send_dblock+0x30c/0x41c [drbd]
Jul 23 14:13:24 dell1 kernel: [<c0298efc>] blk_plug_device+0x57/0x84
Jul 23 14:13:24 dell1 kernel: [<f894c1ac>] drbd_make_request_common+0x3db/0x7a2 [drbd]
Jul 23 14:13:24 dell1 kernel: [<c01151cc>] __change_page_attr+0x25/0x1a7
Jul 23 14:13:24 dell1 kernel: [<c0134980>] find_lock_page+0x29/0xb7
Jul 23 14:13:24 dell1 kernel: [<f894c637>] drbd_make_request_26+0xc4/0x249 [drbd]
Jul 23 14:13:24 dell1 kernel: [<c029a787>] generic_make_request+0x113/0x194
Jul 23 14:13:24 dell1 kernel: [<c01376e5>] mempool_alloc+0x8b/0x150
Jul 23 14:13:24 dell1 kernel: [<c01195a5>] autoremove_wake_function+0x0/0x57
Jul 23 14:13:24 dell1 kernel: [<c029a878>] submit_bio+0x70/0x121
Jul 23 14:13:24 dell1 kernel: [<c0159a04>] __bio_add_page+0x118/0x11d
Jul 23 14:13:24 dell1 kernel: [<c0159a3d>] bio_add_page+0x34/0x38
Jul 23 14:13:24 dell1 kernel: [<c022f135>] _pagebuf_ioapply+0x1bd/0x2bb
Jul 23 14:13:24 dell1 kernel: [<c022f2d3>] pagebuf_iorequest+0xa0/0x16e
Jul 23 14:13:24 dell1 kernel: [<c013eb8f>] __kmalloc+0x1bc/0x259
Jul 23 14:13:24 dell1 kernel: [<c0117ae5>] default_wake_function+0x0/0x12
Jul 23 14:13:24 dell1 kernel: [<c022dbee>] _pagebuf_get_pages+0xef/0x15a
Jul 23 14:13:24 dell1 kernel: [<c0117ae5>] default_wake_function+0x0/0x12
Jul 23 14:13:24 dell1 kernel: [<c022e856>] pagebuf_associate_memory+0x6b/0x175
Jul 23 14:13:24 dell1 kernel: [<c020f620>] xlog_bdstrat_cb+0x1f/0x64
Jul 23 14:13:24 dell1 kernel: [<c02100aa>] xlog_sync+0x22b/0x491
Jul 23 14:13:24 dell1 kernel: [<c021085b>] xlog_write+0x3b7/0x4ea
Jul 23 14:13:24 dell1 kernel: [<c020f247>] xfs_log_write+0x67/0x99
Jul 23 14:13:24 dell1 kernel: [<c021e61e>] xfs_trans_commit+0x118/0x44a
Jul 23 14:13:24 dell1 kernel: [<c022051f>] xfs_trans_log_inode+0x2d/0x52
Jul 23 14:13:24 dell1 kernel: [<c0207eeb>] xfs_ifree+0xbc/0xe9
Jul 23 14:13:24 dell1 kernel: [<c0226535>] xfs_inactive+0x350/0x552
Jul 23 14:13:24 dell1 kernel: [<c01151cc>] __change_page_attr+0x25/0x1a7
Jul 23 14:13:24 dell1 kernel: [<c023688b>] vn_rele+0xb8/0xba
Jul 23 14:13:24 dell1 kernel: [<c0235273>] linvfs_clear_inode+0x18/0x30
Jul 23 14:13:24 dell1 kernel: [<c016ca13>] clear_inode+0xb8/0xd1
Jul 23 14:13:24 dell1 kernel: [<c016d742>] generic_delete_inode+0x106/0x12e
Jul 23 14:13:24 dell1 kernel: [<c016d90a>] iput+0x62/0x7c
Jul 23 14:13:24 dell1 kernel: [<c0163b55>] sys_unlink+0x86/0x138
Jul 23 14:13:24 dell1 kernel: [<c0105c1f>] syscall_call+0x7/0xb
Jul 23 14:13:24 dell1 kernel:
Jul 23 14:13:24 dell1 kernel: Trying to fix it up, but a reboot is needed

>solution approaches:
> a. we could disable zero copy networking completely (tcp_sendpage).
> b. we could make it configurable.
> c. we could simply fall back to tcp_sendmsg for slab pages.
>
>patch for c. is attached. if it works for Florin (please confirm),
>then it will go into svn soonish.
>

With the patch you posted, these are the errors I get:

Jul 23 18:12:30 dell1 kernel: drbd0: _drbd_send_page: (page_count(page) < 1) in /usr/local/src/drbd-0.7.0/drbd/drbd_main.c:895
Jul 23 18:12:30 dell1 kernel: drbd0: someone wants to send a free page!
Jul 23 18:12:30 dell1 kernel: [<f8952381>] _drbd_send_page+0x1ad/0x1ba [drbd]
Jul 23 18:12:30 dell1 kernel: [<f895269a>] drbd_send_dblock+0x30c/0x41c [drbd]
Jul 23 18:12:30 dell1 kernel: [<c0298efc>] blk_plug_device+0x57/0x84
Jul 23 18:12:30 dell1 kernel: [<f894d1ac>] drbd_make_request_common+0x3db/0x7a2 [drbd]
Jul 23 18:12:30 dell1 kernel: [<c01151cc>] __change_page_attr+0x25/0x1a7
Jul 23 18:12:30 dell1 kernel: [<f894d637>] drbd_make_request_26+0xc4/0x249 [drbd]
Jul 23 18:12:30 dell1 kernel: [<c029a787>] generic_make_request+0x113/0x194
Jul 23 18:12:30 dell1 kernel: [<c01376e5>] mempool_alloc+0x8b/0x150
Jul 23 18:12:30 dell1 kernel: [<c01195a5>] autoremove_wake_function+0x0/0x57
Jul 23 18:12:30 dell1 kernel: [<c029a878>] submit_bio+0x70/0x121
Jul 23 18:12:30 dell1 kernel: [<c0159a04>] __bio_add_page+0x118/0x11d
Jul 23 18:12:30 dell1 kernel: [<c0159a3d>] bio_add_page+0x34/0x38
Jul 23 18:12:30 dell1 kernel: [<c022f135>] _pagebuf_ioapply+0x1bd/0x2bb
Jul 23 18:12:30 dell1 kernel: [<c022f2d3>] pagebuf_iorequest+0xa0/0x16e
Jul 23 18:12:30 dell1 kernel: [<c013eb8f>] __kmalloc+0x1bc/0x259
Jul 23 18:12:30 dell1 kernel: [<c0117ae5>] default_wake_function+0x0/0x12
Jul 23 18:12:30 dell1 kernel: [<c022dbee>] _pagebuf_get_pages+0xef/0x15a
Jul 23 18:12:30 dell1 kernel: [<c0117ae5>] default_wake_function+0x0/0x12
Jul 23 18:12:30 dell1 kernel: [<c022e856>] pagebuf_associate_memory+0x6b/0x175
Jul 23 18:12:30 dell1 kernel: [<c020f620>] xlog_bdstrat_cb+0x1f/0x64
Jul 23 18:12:31 dell1 kernel: [<c02100aa>] xlog_sync+0x22b/0x491
Jul 23 18:12:31 dell1 kernel: [<c02108af>] xlog_write+0x40b/0x4ea
Jul 23 18:12:32 dell1 kernel: [<c020f247>] xfs_log_write+0x67/0x99
Jul 23 18:12:32 dell1 kernel: [<c021e61e>] xfs_trans_commit+0x118/0x44a
Jul 23 18:12:33 dell1 kernel: [<c01151cc>] __change_page_attr+0x25/0x1a7
Jul 23 18:12:33 dell1 kernel: [<c021db02>] xfs_trans_dup+0x36/0xff
Jul 23 18:12:33 dell1 kernel: [<c01154a7>] kernel_map_pages+0x33/0x64
Jul 23 18:12:33 dell1 kernel: [<c021db02>] xfs_trans_dup+0x36/0xff
Jul 23 18:12:33 dell1 kernel: [<c013e43f>] kmem_cache_alloc+0x179/0x1ff
Jul 23 18:12:33 dell1 kernel: [<c021db14>] xfs_trans_dup+0x48/0xff
Jul 23 18:12:34 dell1 kernel: [<c0220f3f>] xfs_dir_ialloc+0x13e/0x2ed
Jul 23 18:12:35 dell1 kernel: [<c0228057>] xfs_mkdir+0x3d1/0x767
Jul 23 18:12:36 dell1 kernel: [<c0232c2e>] linvfs_mknod+0x234/0x25d
Jul 23 18:12:37 dell1 kernel: [<c01edba3>] xfs_dir2_lookup+0x14c/0x14e
Jul 23 18:12:37 dell1 kernel: [<c013d4fd>] cache_init_objs+0xec/0x1ea
Jul 23 18:12:38 dell1 kernel: [<c0220d22>] xfs_dir_lookup_int+0x4c/0x12b
Jul 23 18:12:38 dell1 kernel: [<c0232c90>] linvfs_mkdir+0x2c/0x30
Jul 23 18:12:38 dell1 kernel: [<c016336c>] vfs_mkdir+0x8d/0x104
Jul 23 18:12:38 dell1 kernel: [<c01634a9>] sys_mkdir+0xc6/0xf5
Jul 23 18:12:38 dell1 kernel: [<c0105c1f>] syscall_call+0x7/0xb
Jul 23 18:12:38 dell1 kernel:

There are a lot of them. This one looks different from the others:

Jul 23 18:12:49 dell1 kernel: drbd0: _drbd_send_page: (page_count(page) < 1) in /usr/local/src/drbd-0.7.0/drbd/drbd_main.c:895
Jul 23 18:12:49 dell1 kernel: drbd0: someone want4ab>] permission+0x2f/0x4b
Jul 23 18:12:49 dell1 kernel: [<c016288a>] vfs_create+0x99/0x110
Jul 23 18:12:49 dell1 kernel: [<c0162ef1>] open_namei+0x3bb/0x40d
Jul 23 18:12:49 dell1 kernel: [<c01538db>] filp_open+0x43/0x69
Jul 23 18:12:49 dell1 kernel: [<c0153d25>] sys_open+0x5b/0x8b
Jul 23 18:12:49 dell1 kernel: [<c0105c1f>] syscall_call+0x7/0xb
Jul 23 18:12:49 dell1 kernel:

-----
Florin Cazacu