Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
cc'ed philipp and lmb, so they won't miss this mail, buried in some
"uninteresting" thread.

On 2004-07-22 20:51:20 +0000, Florin Cazacu wrote:
> Lars Ellenberg wrote:
> > so please revert that change for now,
> > and disable all use of drbd_send_page,
> > like below.
>
> I disabled drbd_send_page, and it looks like it is working ok. I ran a
> bonnie benchmark and it looks like it is holding ok.

ok.  now, to help find the actual problem, you could revert that again,
but now recompile and install a new kernel with

  "kernel-hacking" ->
     [*] Kernel debugging
     [*]   Debug memory allocations
     [*]   Page alloc debugging

or even enable xfs debugging...
then recompile drbd, of course.
and then trigger it again; maybe the logs will show something more
interesting then...

for the record, I think the problem is this:

facts:
xfs makes heavy use of slabs (kmem_zone_alloc, which maps to
kmem_cache_alloc).  all of those pages have PG_slab set.
eventually it submits them for io.
such a page then reaches drbd, which via tcp_sendpage puts a reference
to it into the tcp send buffer; the tcp stack first get_page()s it, of
course.  when the socket buffer is cleaned up after a tcp ack is
received, or the socket is shut down, or whatever: it put_page()s it
again.  the stack traces show that at this point the page_count()
reaches zero, so the page actually gets freed.  since it has PG_slab
set => BOOM.

analysis:
either:
  the page_count() _IS_ already zero when the page is submitted to drbd.
  that way the tcp stack held the only reference to it, and its
  put_page() would try to free a slab page.  this seems very unlikely,
  and we could easily put an assert early in the drbd code to prove it
  wrong (a sketch of such a check follows the patch below).
or:
  xfs for some reason kmem_zone_free's (kmem_cache_free) the submitted
  pages _before_ they are sent, i.e. before io on them has completed
  (no bio_endio called yet!).  which means that xfs "frees" a page to
  which the tcp stack still holds a reference.  this seems to be the
  likely code path.

now.  either no one except xfs may hold a reference to its pages; then
xfs should prominently state this somewhere.  or xfs just does something
it must not do: freeing pages that others still hold references to.
anyone want to ask the xfs guys about this?

solution approaches:
  a. we could disable zero copy networking completely (tcp_sendpage).
  b. we could make it configurable.
  c. we could simply fall back to tcp_sendmsg for slab pages.

a patch for c. is attached.  if it works for Florin (please confirm),
it will go into svn soonish.

any comments?

	Lars Ellenberg

well.  drbd-"user" seems to be a very mixed newbie, beginner, user,
power user, and developer list...  but as long as nobody complains ... :)

-------------- next part --------------
Index: drbd_main.c
===================================================================
--- drbd_main.c	(revision 1448)
+++ drbd_main.c	(working copy)
@@ -883,12 +883,35 @@
    that we do not reuse our own buffer pages (EEs) to early, therefore
    we have the net_ee list. */
 
+int _drbd_no_send_page(drbd_dev *mdev, struct page *page,
+		       int offset, size_t size)
+{
+	int ret;
+	ret = drbd_send(mdev, mdev->data.socket, kmap(page) + offset, size, 0);
+	kunmap(page);
+	return ret;
+}
+
 int _drbd_send_page(drbd_dev *mdev, struct page *page,
 		    int offset, size_t size)
 {
 	int sent,ok;
 	int len = size;
 
+	/* PARANOIA. if this ever triggers,
+	 * something in the layers above us is really kaputt */
+	ERR_IF (page_count(page) < 1) {
+		ERR("someone wants to send a free page!\n");
+		dump_stack();
+		return _drbd_no_send_page(mdev, page, offset, size);
+	}
+
+	if (PageSlab(page)) {
+		/* probably xfs. fall back to sendmsg instead of sendpage.
+		 */
+		return _drbd_no_send_page(mdev, page, offset, size);
+	}
+
 	spin_lock(&mdev->send_task_lock);
 	mdev->send_task=current;
 	spin_unlock(&mdev->send_task_lock);
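
For illustration, a minimal sketch of the "assert early in the drbd code"
idea from the analysis above.  It is not part of the attached patch:
drbd_check_bio_pages() and its call site are made up here, and it assumes
the 2.6-era bio_for_each_segment() that walks the bio with a struct
bio_vec pointer and an integer index.

#include <linux/bio.h>
#include <linux/mm.h>
#include <linux/kernel.h>

/* hypothetical check, to be called where drbd first sees a write bio,
 * before any of its pages can reach tcp_sendpage().  it only reports,
 * it does not change behaviour. */
static void drbd_check_bio_pages(struct bio *bio)
{
	struct bio_vec *bvec;
	int i;

	bio_for_each_segment(bvec, bio, i) {
		struct page *page = bvec->bv_page;

		/* the "either" case: the submitter already dropped its last
		 * reference, so the tcp stack's final put_page() would free
		 * the page out from under the slab cache. */
		if (page_count(page) < 1)
			printk(KERN_ERR "drbd: page %p submitted with count %d\n",
			       page, page_count(page));

		/* the "or" case needs no assert here: a slab page in a write
		 * request is normal for xfs; the bug would be in when it is
		 * freed again. */
		if (PageSlab(page))
			printk(KERN_INFO "drbd: PG_slab page %p in bio\n", page);
	}
}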
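
And for approach b., making the zero copy path configurable could be as
simple as a module parameter; again only a sketch, the name
disable_sendpage and the helper drbd_use_sendpage() are invented here and
not part of the patch.

#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/mm.h>

/* hypothetical knob: 0 = use tcp_sendpage (zero copy) where possible,
 * 1 = always take the copying tcp_sendmsg path. */
static int disable_sendpage = 0;
module_param(disable_sendpage, int, 0644);
MODULE_PARM_DESC(disable_sendpage, "never use sendpage, always copy via sendmsg");

/* helper for _drbd_send_page(): take the copying path if the admin
 * disabled sendpage, or if the page lives in a slab (the xfs case). */
static int drbd_use_sendpage(struct page *page)
{
	return !disable_sendpage && !PageSlab(page);
}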