<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 11">
<meta name=Originator content="Microsoft Word 11">
<link rel=File-List href="cid:filelist.xml@01C782AD.065B0550">
<style>
<!--
 /* Font Definitions */
 @font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;
        mso-font-charset:0;
        mso-generic-font-family:swiss;
        mso-font-pitch:variable;
        mso-font-signature:1627421319 -2147483648 8 0 66047 0;}
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
        {mso-style-parent:"";
        margin:0in;
        margin-bottom:.0001pt;
        mso-pagination:widow-orphan;
        font-size:12.0pt;
        font-family:"Times New Roman";
        mso-fareast-font-family:"Times New Roman";}
a:link, span.MsoHyperlink
        {color:blue;
        text-decoration:underline;
        text-underline:single;}
a:visited, span.MsoHyperlinkFollowed
        {color:purple;
        text-decoration:underline;
        text-underline:single;}
span.EmailStyle17
        {mso-style-type:personal-compose;
        mso-style-noshow:yes;
        mso-ansi-font-size:10.0pt;
        mso-bidi-font-size:10.0pt;
        font-family:Arial;
        mso-ascii-font-family:Arial;
        mso-hansi-font-family:Arial;
        mso-bidi-font-family:Arial;
        color:windowtext;}
span.SpellE
        {mso-style-name:"";
        mso-spl-e:yes;}
span.GramE
        {mso-style-name:"";
        mso-gram-e:yes;}
@page Section1
        {size:8.5in 11.0in;
        margin:1.0in 1.25in 1.0in 1.25in;
        mso-header-margin:.5in;
        mso-footer-margin:.5in;
        mso-paper-source:0;}
div.Section1
        {page:Section1;}
-->
</style>
</head>

<body lang=EN-US link=blue vlink=purple style='tab-interval:.5in'>

<div class=Section1>

<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>Hi all, <o:p></o:p></span></font></p>

<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>This is my second time reporting this problem. <span
style='mso-spacerun:yes'>&nbsp;</span>The first time I did not<o:p></o:p></span></font></p>

<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>have much info other than the logs, but this time I think I understand
a bit more about the problem.<span style='mso-spacerun:yes'>&nbsp; </span><o:p></o:p></span></font></p>

<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'>So here it is.<o:p></o:p></span></font></p>

<p class=MsoNormal><font size=2 face=Arial><span style='font-size:10.0pt;
font-family:Arial'><o:p>&nbsp;</o:p></span></font></p>

<p class=MsoNormal><font size=1 color=black face=Tahoma><span style='font-size:
9.0pt;font-family:Tahoma;color:black'>Problem:<br>
We are syncing and we are the sync target.<br>
We submit a write I/O request, and when it completes <span
class=SpellE>bio_endio</span>() calls <span class=SpellE>drbd_endio_write_sec</span>().<br>
When we access the sector field for the <span class=SpellE>ee</span>/bio in
this routine it is poisoned because we have<br>
slab debugging turned on. The poison is
6b6b6b6b... (the logged e-&gt;sector below is 0x6b6b6b6b6b6b6b6b), which indicates that we are freeing the <span class=SpellE>ee</span><br>
and then proceeding to use it.<br>
The symptom is that we get:<br>
Apr 12 02:36:22<span class=GramE>&nbsp; kernel</span>: drbd1: <span
class=SpellE>drbd_rs_complete_io</span>() called, but extent not found<br>
Apr 12 02:36:22<span class=GramE>&nbsp; kernel</span>: drbd1: <span
class=SpellE>al_complete_io</span>() called on inactive extent 1532713819<br>
<br>
<span class=GramE>and</span> we loop forever with:<br>
Apr 12 02:43:19 <span class=SpellE>ellwood</span> kernel: drbd2: Retrying <span
class=SpellE>drbd_rs_del_<span class=GramE>all</span></span><span class=GramE>(</span>)
later. <span class=SpellE><span class=GramE>refcnt</span></span><span
class=GramE>=</span>1<br>
<br>
A look at the code shows that we add to the <span class=SpellE>done_ee</span> list
in:<br>
./<span class=SpellE>src/drbd/drbd_receiver.c</span>:&nbsp; &nbsp; <span
class=SpellE>list_add_<span class=GramE>tail</span></span><span class=GramE>(</span>&amp;e-&gt;<span
class=SpellE>w.list,&amp;mdev</span>-&gt;<span class=SpellE>done_ee</span>);<br>
./<span class=SpellE>src/drbd/drbd_<span class=GramE>worker.c</span></span><span
class=GramE>&nbsp; :</span>&nbsp; &nbsp; <span class=SpellE>list_add_tail</span>(&amp;e-&gt;<span
class=SpellE>w.list,&amp;mdev</span>-&gt;<span class=SpellE>done_ee</span>);<br>
<br>
In ./<span class=SpellE>src/drbd/drbd_receiver.c:receive_<span class=GramE>data</span></span><span
class=GramE>(</span>) we add to the list and wake up the <span class=SpellE>asender</span>
and return.&nbsp; We do<br>
not attempt to use any of
the <span class=SpellE>ee's</span> afterwards, so all may be fine
there.<br>
<br>
However, in ./<span class=SpellE>src/drbd/drbd_worker.c:drbd_endio_write_sec</span>() we add to the list and
then call<br>
<span class=SpellE>drbd_rs_complete_io</span>(<span class=SpellE>mdev,e</span>-&gt;sector) and<br>
<span class=SpellE>drbd_al_complete_io</span>(<span class=SpellE>mdev,e</span>-&gt;sector), using the <span
class=SpellE>ee</span> after it was placed on the done list, at which point it could<br>
already have been freed.&nbsp; I suspect it is being
freed in <span class=SpellE>process_done_ee</span>(), where <span class=SpellE>drbd_free_ee</span>() is<br>
called.<br>
<br>
Instrumentation shows that the bio and <span class=SpellE>ee</span> are intact
at the beginning of <span class=SpellE>drbd_endio_write_<span class=GramE>sec</span></span><span
class=GramE>(</span>)<br>
<span class=GramE>as</span> shown in the log below:<br>
<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel: drbd1: <span
class=SpellE>drbd_endio_write_sec</span>: EM-- XX BUG!! <span class=GramE>e</span>-&gt;sector=7740398493674204011s
<span class=SpellE>cap_sector</span>=10267584s size=1802201963 <span
class=SpellE>bytes_done</span>=32768&nbsp; bio-&gt;<span class=SpellE>bi_sector</span>=774039849367420400<br>
<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:&nbsp;
[&lt;c0105a67&gt;] dump_stack+0x17/0x20<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:&nbsp;
[&lt;ee2d0caf&gt;] drbd_endio_write_sec+0x16f/0x3a0 [<span class=SpellE>drbd</span>]<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:&nbsp;
[&lt;c0177319&gt;] bio_endio+0x59/0x90<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:&nbsp;
[&lt;ee0ab2b9&gt;] dec_pending+0x39/0x70 [<span class=SpellE>dm_mod</span>]<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:&nbsp;
[&lt;ee0ab391&gt;] clone_endio+0xa1/0xc0 [<span class=SpellE>dm_mod</span>]<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:&nbsp;
[&lt;c0177319&gt;] bio_endio+0x59/0x90<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:&nbsp;
[&lt;c022638a&gt;] __end_that_request_first+0xba/0x330<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:&nbsp;
[&lt;c0226618&gt;] end_that_request_chunk+0x8/0x10<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:&nbsp;
[&lt;ee0209f5&gt;] scsi_end_request+0x25/0xe0 [<span class=SpellE>scsi_mod</span>]<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:&nbsp;
[&lt;ee020c82&gt;] scsi_io_completion+0xd2/0x3c0 [<span class=SpellE>scsi_mod</span>]<br>
Apr 19 17:28:12 <span class=SpellE>godzilla</span> kernel:&nbsp; [&lt;ee06a02a&gt;]
sd_rw_intr+0x14a/0x2c0 [<span class=SpellE>sd_mod</span>]<br>
Note the poisoned e-&gt;sector, captured just before we call <span
class=SpellE>drbd_rs_complete_io</span>() with it.<br>
More importantly, note the value of <span class=SpellE>cap_sector</span>,
which is the sector we captured when<br>
<span class=SpellE>bio_endio</span>() called our
callback; at that point the value was still intact.<br>
<br>
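As a cross-check of the poison reading, here is a small user-space sketch (not DRBD code; the helper name is_poison_pattern is made up for illustration). It tests whether a logged value consists of one repeated byte. The logged e-&gt;sector (7740398493674204011) and size (1802201963) both decode to the repeated byte 0x6b, which is the kernel's POISON_FREE pattern for freed slab objects, while cap_sector does not:<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: returns true if each of the
 * low `nbytes` bytes of `value` equals `byte`. nbytes must be at most 8. */
bool is_poison_pattern(uint64_t value, unsigned nbytes, uint8_t byte)
{
    assert(nbytes <= 8);
    for (unsigned i = 0; i < nbytes; i++) {
        /* Extract byte i, least significant first, and compare. */
        if (((value >> (8 * i)) & 0xff) != byte)
            return false;
    }
    return true;
}
```

For example, is_poison_pattern(7740398493674204011ULL, 8, 0x6b) would be expected to hold for the logged e-&gt;sector, and is_poison_pattern(10267584ULL, 8, 0x6b) would not hold for cap_sector.<br>
<br>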
What is the best way to fix this? Is it a matter of just moving <span
class=SpellE>list_add_<span class=GramE>tail</span></span><span class=GramE>(</span>&amp;e-&gt;<span
class=SpellE>w.list,&amp;mdev</span>-&gt;<span class=SpellE>done_ee</span>);<br>
<span class=GramE>to</span> after we use e-&gt;sector?<o:p></o:p></span></font></p>
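
<p class=MsoNormal><font size=1 color=black face=Tahoma><span style='font-size:
9.0pt;font-family:Tahoma;color:black'>To make the question concrete, here is a user-space sketch of the ordering I have in mind. It is not a patch and not the real DRBD code: struct ee is cut down to one field, and publish_and_free() and complete_write_safe() are made-up names. publish_and_free() models the worst case, where the consumer of the done list poisons and frees the entry the moment it is published. The point is only that everything we need from e is read before the entry is handed over.<o:p></o:p></span></font></p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the DRBD epoch entry; only the field used here. */
struct ee {
    uint64_t sector;
};

/* Hypothetical model of the consumer winning the race: as soon as the
 * entry is published it is poisoned (as slab debugging would do) and freed. */
static void publish_and_free(struct ee *e)
{
    memset(e, 0x6b, sizeof(*e));
    free(e);
}

/* Safe ordering: snapshot what we need, then publish, then never touch e. */
uint64_t complete_write_safe(struct ee *e)
{
    assert(e != NULL);
    uint64_t sector = e->sector;   /* read while we still own e */
    publish_and_free(e);           /* ownership passes to the consumer */
    return sector;                 /* the value the completion calls would use */
}
```

<p class=MsoNormal><font size=1 color=black face=Tahoma><span style='font-size:
9.0pt;font-family:Tahoma;color:black'>In the driver this would correspond either to calling the two complete_io routines before list_add_tail(), or to copying e-&gt;sector into a local variable first and passing the copy. I have not verified which of the two fits the locking in drbd_endio_write_sec().<o:p></o:p></span></font></p>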

<p class=MsoNormal><font size=1 color=black face=Tahoma><span style='font-size:
9.0pt;font-family:Tahoma;color:black'><o:p>&nbsp;</o:p></span></font></p>

<p class=MsoNormal><font size=1 color=black face=Tahoma><span style='font-size:
9.0pt;font-family:Tahoma;color:black'>Thanks,<o:p></o:p></span></font></p>

<p class=MsoNormal><font size=1 color=black face=Tahoma><span style='font-size:
9.0pt;font-family:Tahoma;color:black'><o:p>&nbsp;</o:p></span></font></p>

<p class=MsoNormal><font size=1 color=black face=Tahoma><span style='font-size:
9.0pt;font-family:Tahoma;color:black'>EM--</span></font><font size=2
face=Arial><span style='font-size:10.0pt;font-family:Arial'><o:p></o:p></span></font></p>

</div>

</body>

</html>