<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<style>
<!--
/* Font Definitions */
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;
        mso-font-charset:0;
        mso-generic-font-family:swiss;
        mso-font-pitch:variable;
        mso-font-signature:1627421319 -2147483648 8 0 66047 0;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {mso-style-parent:"";
        margin:0in;
        margin-bottom:.0001pt;
        mso-pagination:widow-orphan;
        font-size:12.0pt;
        font-family:"Times New Roman";
        mso-fareast-font-family:"Times New Roman";}
a:link, span.MsoHyperlink
        {color:blue;
        text-decoration:underline;
        text-underline:single;}
a:visited, span.MsoHyperlinkFollowed
        {color:purple;
        text-decoration:underline;
        text-underline:single;}
span.EmailStyle17
        {mso-style-type:personal-compose;
        mso-style-noshow:yes;
        mso-ansi-font-size:10.0pt;
        mso-bidi-font-size:10.0pt;
        font-family:Arial;
        mso-ascii-font-family:Arial;
        mso-hansi-font-family:Arial;
        mso-bidi-font-family:Arial;
        color:windowtext;}
span.SpellE
        {mso-style-name:"";
        mso-spl-e:yes;}
span.GramE
        {mso-style-name:"";
        mso-gram-e:yes;}
@page Section1
        {size:8.5in 11.0in;
        margin:1.0in 1.25in 1.0in 1.25in;
        mso-header-margin:.5in;
        mso-footer-margin:.5in;
        mso-paper-source:0;}
div.Section1
        {page:Section1;}
-->
</style>
</head>
<body lang=EN-US link=blue vlink=purple style='tab-interval:.5in'>
<div class=Section1>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Hi all, <o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>This is my second time reporting this problem. The first time I did not
have much information other than the logs, but this time I think I understand
a bit more about the problem.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>So here it is.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=1 color=black face=Tahoma><span style='font-size:
9.0pt;font-family:Tahoma;color:black'>Problem:<br>
We are resyncing and we are the sync target.<br>
We submit a write I/O request, and when it completes, bio_endio() calls
drbd_endio_write_sec().<br>
When we read the sector field of the ee/bio in
this routine, it is poisoned because we have<br>
slab debugging turned on. The poison is
6b6b6b6b..., which indicates that the ee is being freed<br>
and then used afterwards.<br>
The symptom is that we get:<br>
Apr 12 02:36:22<span class=GramE> kernel</span>: drbd1: <span
class=SpellE>drbd_rs_complete_io</span>() called, but extent not found<br>
Apr 12 02:36:22<span class=GramE> kernel</span>: drbd1: <span
class=SpellE>al_complete_io</span>() called on inactive extent 1532713819<br>
<br>
<span class=GramE>and</span> we loop forever with:<br>
Apr 12 02:43:19 <span class=SpellE>ellwood</span> kernel: drbd2: Retrying <span
class=SpellE>drbd_rs_del_<span class=GramE>all</span></span><span class=GramE>(</span>)
later. <span class=SpellE><span class=GramE>refcnt</span></span><span
class=GramE>=</span>1<br>
<br>
A look at the code shows that<br>
we add to the done_ee list
in:<br>
./<span class=SpellE>src/drbd/drbd_receiver.c</span>: <span
class=SpellE>list_add_<span class=GramE>tail</span></span><span class=GramE>(</span>&e-><span
class=SpellE>w.list,&mdev</span>-><span class=SpellE>done_ee</span>);<br>
./<span class=SpellE>src/drbd/drbd_<span class=GramE>worker.c</span></span><span
class=GramE> :</span> <span class=SpellE>list_add_tail</span>(&e-><span
class=SpellE>w.list,&mdev</span>-><span class=SpellE>done_ee</span>);<br>
<br>
In ./src/drbd/drbd_receiver.c:receive_data() we add to the list, wake up the asender,
and return. We do<br>
not touch any of
the ee's afterwards, so all may be fine
there.<br>
<br>
However, in ./src/drbd/drbd_worker.c:drbd_endio_write_sec() we add to the list and
then call<br>
drbd_rs_complete_io(mdev,e->sector) and<br>
drbd_al_complete_io(mdev,e->sector), using the
ee after it was placed on the done list, where it could<br>
already have been freed. I suspect it is being
freed in process_done_ee(), where drbd_free_ee() is<br>
called.<br>
<br>
Instrumentation shows that the bio and <span class=SpellE>ee</span> are intact
at the beginning of <span class=SpellE>drbd_endio_write_<span class=GramE>sec</span></span><span
class=GramE>(</span>)<br>
<span class=GramE>as</span> shown in the log below:<br>
<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel: drbd1: <span
class=SpellE>drbd_endio_write_sec</span>: EM-- XX BUG!! <span class=GramE>e</span>->sector=7740398493674204011s
<span class=SpellE>cap_sector</span>=10267584s size=1802201963 <span
class=SpellE>bytes_done</span>=32768 bio-><span class=SpellE>bi_sector</span>=774039849367420400<br>
<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:
[&lt;c0105a67&gt;] dump_stack+0x17/0x20<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:
[&lt;ee2d0caf&gt;] drbd_endio_write_sec+0x16f/0x3a0 [<span class=SpellE>drbd</span>]<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:
[&lt;c0177319&gt;] bio_endio+0x59/0x90<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:
[&lt;ee0ab2b9&gt;] dec_pending+0x39/0x70 [<span class=SpellE>dm_mod</span>]<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:
[&lt;ee0ab391&gt;] clone_endio+0xa1/0xc0 [<span class=SpellE>dm_mod</span>]<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:
[&lt;c0177319&gt;] bio_endio+0x59/0x90<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:
[&lt;c022638a&gt;] __end_that_request_first+0xba/0x330<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:
[&lt;c0226618&gt;] end_that_request_chunk+0x8/0x10<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:
[&lt;ee0209f5&gt;] scsi_end_request+0x25/0xe0 [<span class=SpellE>scsi_mod</span>]<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:
[&lt;ee020c82&gt;] scsi_io_completion+0xd2/0x3c0 [<span class=SpellE>scsi_mod</span>]<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel: [&lt;ee06a02a&gt;]
sd_rw_intr+0x14a/0x2c0 [<span class=SpellE>sd_mod</span>]<br>
Notice the poisoned e->sector, captured just before we call
drbd_rs_complete_io() with it.<br>
More importantly, notice the value of cap_sector,
which is the sector that came out of the kernel when<br>
bio_endio() called our
callback.<br>
<br>
What is the best way to fix this? Is it a matter of just moving <span
class=SpellE>list_add_<span class=GramE>tail</span></span><span class=GramE>(</span>&e-><span
class=SpellE>w.list,&mdev</span>-><span class=SpellE>done_ee</span>);<br>
<span class=GramE>to</span> after we use e->sector?<o:p></o:p></span></font></p>
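<p class=MsoNormal><font size=1 color=black face=Tahoma><span style='font-size:
9.0pt;font-family:Tahoma;color:black'>To make the question concrete, here is a minimal user-space sketch of the
reordering I have in mind. It is not the DRBD code: struct fake_ee, publish_and_reclaim() and the two
endio_* functions are hypothetical stand-ins. publish_and_reclaim() models list_add_tail() onto done_ee
followed immediately by another context reclaiming the entry, and it overwrites the entry with 0x6b (the byte
pattern behind the e->sector value in the log above) instead of really freeing it, so the demo stays well defined.<o:p></o:p></span></font></p>

```c
/* Hypothetical stand-in for the epoch entry; only the field we care about. */
struct fake_ee {
    unsigned long long sector;
};

/*
 * Models "put e on the done list, after which someone else may free it".
 * We assume the worst case: the entry is reclaimed immediately. Instead of
 * freeing, we fill it with the poison byte 0x6b so reads remain legal C.
 */
static void publish_and_reclaim(struct fake_ee *e)
{
    unsigned char *p = (unsigned char *)e;
    unsigned long i;

    for (i = 0; i != sizeof(*e); i++)
        p[i] = 0x6b;
}

/* Current order: publish first, then read e->sector (reads poison). */
static unsigned long long endio_publish_then_read(struct fake_ee *e)
{
    publish_and_reclaim(e);
    return e->sector;
}

/* Proposed order: copy the sector to a local, then publish the entry. */
static unsigned long long endio_read_then_publish(struct fake_ee *e)
{
    unsigned long long sector = e->sector;

    publish_and_reclaim(e);
    return sector; /* value to hand to the *_complete_io() calls */
}
```

<p class=MsoNormal><font size=1 color=black face=Tahoma><span style='font-size:
9.0pt;font-family:Tahoma;color:black'>With sector set to 10267584 (the cap_sector from the log), the first function hands back
the poisoned value and the second hands back 10267584. Whether the real fix is this simple depends on
what else touches e after the list_add_tail() call.<o:p></o:p></span></font></p>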
<p class=MsoNormal><font size=1 color=black face=Tahoma><span style='font-size:
9.0pt;font-family:Tahoma;color:black'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=1 color=black face=Tahoma><span style='font-size:
9.0pt;font-family:Tahoma;color:black'>Thanks,<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=1 color=black face=Tahoma><span style='font-size:
9.0pt;font-family:Tahoma;color:black'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=1 color=black face=Tahoma><span style='font-size:
9.0pt;font-family:Tahoma;color:black'>EM--</span></font><font size=2
face=Arial><span style='font-size:10.0pt;font-family:Arial'><o:p></o:p></span></font></p>
</div>
</body>
</html>