<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 12 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:"Calibri","sans-serif";
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page Section1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.Section1
        {page:Section1;}
-->
</style>
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="Section1">
<p class="MsoNormal">Hi,<o:p></o:p></p>
<p class="MsoNormal">I’m using DRBD 8.3.7 on a 2.6.32 kernel.<o:p></o:p></p>
<p class="MsoNormal">This is running in an embedded environment, in a card cage with 2 cards.<o:p></o:p></p>
<p class="MsoNormal">The cards are connected with an internal 10G MAC.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">My drbd configuration is:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">global {<o:p></o:p></p>
<p class="MsoNormal"> usage-count no;<o:p></o:p></p>
<p class="MsoNormal">}<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">common {<o:p></o:p></p>
<p class="MsoNormal"> protocol C;<o:p></o:p></p>
<p class="MsoNormal"> syncer {<o:p></o:p></p>
<p class="MsoNormal"> rate 10M;<o:p></o:p></p>
<p class="MsoNormal"> }<o:p></o:p></p>
<p class="MsoNormal"> net {<o:p></o:p></p>
<p class="MsoNormal"> ko-count 6;<o:p></o:p></p>
<p class="MsoNormal"> }<o:p></o:p></p>
<p class="MsoNormal"> handlers {<o:p></o:p></p>
<p class="MsoNormal"> # see also /etc/drbd.d/global_common.conf<o:p></o:p></p>
<p class="MsoNormal"> split-brain "/opt/compass/bin/drbd-notify-split-brain.sh";<o:p></o:p></p>
<p class="MsoNormal"> }<o:p></o:p></p>
<p class="MsoNormal">}<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">resource home {<o:p></o:p></p>
<p class="MsoNormal"> meta-disk internal;<o:p></o:p></p>
<p class="MsoNormal"> device /dev/drbd1;<o:p></o:p></p>
<p class="MsoNormal"> disk /dev/ssd2;<o:p></o:p></p>
<p class="MsoNormal"> on cpm0 {<o:p></o:p></p>
<p class="MsoNormal"> address 1.1.1.129:7788;<o:p></o:p></p>
<p class="MsoNormal"> }<o:p></o:p></p>
<p class="MsoNormal"> on cpm1 {<o:p></o:p></p>
<p class="MsoNormal"> address 1.1.1.130:7788;<o:p></o:p></p>
<p class="MsoNormal"> }<o:p></o:p></p>
<p class="MsoNormal">}<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">resource syslog {<o:p></o:p></p>
<p class="MsoNormal"> meta-disk internal;<o:p></o:p></p>
<p class="MsoNormal"> device /dev/drbd2;<o:p></o:p></p>
<p class="MsoNormal"> disk /dev/ssd3;<o:p></o:p></p>
<p class="MsoNormal"> on cpm0 {<o:p></o:p></p>
<p class="MsoNormal"> address 1.1.1.129:7789;<o:p></o:p></p>
<p class="MsoNormal"> }<o:p></o:p></p>
<p class="MsoNormal"> on cpm1 {<o:p></o:p></p>
<p class="MsoNormal"> address 1.1.1.130:7789;<o:p></o:p></p>
<p class="MsoNormal"> }<o:p></o:p></p>
<p class="MsoNormal">}<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">As you can see, my syncer rate is 10M, which should be only a small fraction of a 10G link.<o:p></o:p></p>
<p class="MsoNormal">BTW, tcpdump doesn’t show much traffic other than DRBD.<o:p></o:p></p>
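<p class="MsoNormal">To put numbers on that, here is a rough Python sketch of the arithmetic. It assumes that "rate 10M" means 10 MiB per second and that the KB figures in the DRBD log are units of 1024 bytes; the function names are my own, for illustration only.</p>
<pre>
```python
# Rough resync arithmetic. The unit interpretation is an assumption.
SYNC_RATE_BYTES_PER_S = 10 * 1024 * 1024   # "rate 10M", read as 10 MiB/s
LINK_BITS_PER_S = 10 * 10**9               # nominal 10G link


def link_share(rate_bytes_per_s, link_bits_per_s):
    """Fraction of the link the configured resync rate would occupy."""
    return rate_bytes_per_s * 8 / link_bits_per_s


def resync_seconds(out_of_sync_kb, rate_bytes_per_s):
    """Seconds needed to resync at the configured rate (1 KB = 1024 bytes)."""
    return out_of_sync_kb * 1024 / rate_bytes_per_s


if __name__ == "__main__":
    share = link_share(SYNC_RATE_BYTES_PER_S, LINK_BITS_PER_S)
    print("link share: {:.2%}".format(share))
    # Out-of-sync sizes taken from the "Began resync" log lines.
    for name, kb in (("drbd1", 7918188), ("drbd2", 1720548)):
        print(name, round(resync_seconds(kb, SYNC_RATE_BYTES_PER_S)), "s")
```
</pre>
<p class="MsoNormal">Under these assumptions the resync rate is below 1% of the link, and the 7918188 KB on drbd1 would need roughly 13 minutes at 10M.</p>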
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">When I get into a situation that requires a resync, I keep getting sock_sendmsg timeouts. The connection drops, reconnects, and restarts the resync, and the situation never heals.<o:p></o:p></p>
<p class="MsoNormal">Here’s the console output on the primary node:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">[ 116.784427] block drbd1: Starting worker thread (from cqueue [4659])<o:p></o:p></p>
<p class="MsoNormal">[ 116.803430] block drbd1: disk( Diskless -> Attaching )<o:p></o:p></p>
<p class="MsoNormal">[ 116.813328] block drbd1: Found 4 transactions (4 active extents) in activity log.<o:p></o:p></p>
<p class="MsoNormal">[ 116.825358] block drbd1: Method to ensure write ordering: barrier<o:p></o:p></p>
<p class="MsoNormal">[ 116.832006] block drbd1: max_segment_size ( = BIO size ) = 32768<o:p></o:p></p>
<p class="MsoNormal">[ 116.838643] block drbd1: drbd_bm_resize called with capacity == 88081624<o:p></o:p></p>
<p class="MsoNormal">[ 116.846592] block drbd1: resync bitmap: bits=11010203 words=172035<o:p></o:p></p>
<p class="MsoNormal">[ 116.853419] block drbd1: size = 42 GB (44040812 KB)<o:p></o:p></p>
<p class="MsoNormal">[ 116.877240] block drbd1: recounting of set bits took additional 3 jiffies<o:p></o:p></p>
<p class="MsoNormal">[ 116.885609] block drbd1: 7729 MB (1978523 bits) marked out-of-sync by on disk bit-map.<o:p></o:p></p>
<p class="MsoNormal">[ 116.894682] block drbd1: Marked additional 4096 KB as out-of-sync based on AL.<o:p></o:p></p>
<p class="MsoNormal">[ 116.903905] block drbd1: disk( Attaching -> UpToDate ) pdsk( DUnknown -> Outdated )<o:p></o:p></p>
<p class="MsoNormal">[ 117.019539] block drbd1: conn( StandAlone -> Unconnected )<o:p></o:p></p>
<p class="MsoNormal">[ 117.025479] block drbd1: Starting receiver thread (from drbd1_worker [4664])<o:p></o:p></p>
<p class="MsoNormal">[ 117.057752] block drbd1: receiver (re)started<o:p></o:p></p>
<p class="MsoNormal">[ 117.062463] block drbd1: conn( Unconnected -> WFConnection )<o:p></o:p></p>
<p class="MsoNormal">[ 117.147161] block drbd2: Starting worker thread (from cqueue [4659])<o:p></o:p></p>
<p class="MsoNormal">[ 117.154200] block drbd2: disk( Diskless -> Attaching )<o:p></o:p></p>
<p class="MsoNormal">[ 117.161682] block drbd2: Found 4 transactions (8 active extents) in activity log.<o:p></o:p></p>
<p class="MsoNormal">[ 117.169835] block drbd2: Method to ensure write ordering: barrier<o:p></o:p></p>
<p class="MsoNormal">[ 117.176383] block drbd2: max_segment_size ( = BIO size ) = 32768<o:p></o:p></p>
<p class="MsoNormal">[ 117.182833] block drbd2: drbd_bm_resize called with capacity == 20980168<o:p></o:p></p>
<p class="MsoNormal">[ 117.190127] block drbd2: resync bitmap: bits=2622521 words=40977<o:p></o:p></p>
<p class="MsoNormal">[ 117.196635] block drbd2: size = 10 GB (10490084 KB)<o:p></o:p></p>
<p class="MsoNormal">[ 117.206486] block drbd2: recounting of set bits took additional 1 jiffies<o:p></o:p></p>
<p class="MsoNormal">[ 117.214012] block drbd2: 1668 MB (427065 bits) marked out-of-sync by on disk bit-map.<o:p></o:p></p>
<p class="MsoNormal">[ 117.222494] block drbd2: Marked additional 12 MB as out-of-sync based on AL.<o:p></o:p></p>
<p class="MsoNormal">[ 117.230980] block drbd2: disk( Attaching -> UpToDate ) pdsk( DUnknown -> Outdated )<o:p></o:p></p>
<p class="MsoNormal">[ 117.313623] block drbd2: conn( StandAlone -> Unconnected )<o:p></o:p></p>
<p class="MsoNormal">[ 117.320820] block drbd2: Starting receiver thread (from drbd2_worker [4710])<o:p></o:p></p>
<p class="MsoNormal">[ 117.328676] block drbd2: receiver (re)started<o:p></o:p></p>
<p class="MsoNormal">[ 117.333448] block drbd2: conn( Unconnected -> WFConnection )<o:p></o:p></p>
<p class="MsoNormal">[ 117.813264] block drbd2: role( Secondary -> Primary )<o:p></o:p></p>
<p class="MsoNormal">[ 127.324098] block drbd1: Handshake successful: Agreed network protocol version 91<o:p></o:p></p>
<p class="MsoNormal">[ 127.332124] block drbd1: conn( WFConnection -> WFReportParams )<o:p></o:p></p>
<p class="MsoNormal">[ 127.338691] block drbd1: Starting asender thread (from drbd1_receiver [4692])<o:p></o:p></p>
<p class="MsoNormal">[ 127.368208] block drbd1: data-integrity-alg: &lt;not-used&gt;<o:p></o:p></p>
<p class="MsoNormal">[ 127.373997] block drbd1: drbd_sync_handshake:<o:p></o:p></p>
<p class="MsoNormal">[ 127.378829] block drbd1: self 3475127B7403BB0B:8D44B59FE2BB68BF:0E1B69D867AAE02D:2D27340FC037E077 bits:1979547 flags:0<o:p></o:p></p>
<p class="MsoNormal">[ 127.390381] block drbd1: peer 8D44B59FE2BB68BE:0000000000000000:0000000000000000:0000000000000000 bits:1978523 flags:0<o:p></o:p></p>
<p class="MsoNormal">[ 127.401931] block drbd1: uuid_compare()=1 by rule 70<o:p></o:p></p>
<p class="MsoNormal">[ 127.407382] block drbd1: Becoming sync source due to disk states.<o:p></o:p></p>
<p class="MsoNormal">[ 127.414065] block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> Inconsistent )<o:p></o:p></p>
<p class="MsoNormal">[ 127.566844] block drbd2: Handshake successful: Agreed network protocol version 91<o:p></o:p></p>
<p class="MsoNormal">[ 127.575037] block drbd2: conn( WFConnection -> WFReportParams )<o:p></o:p></p>
<p class="MsoNormal">[ 127.581654] block drbd2: Starting asender thread (from drbd2_receiver [4738])<o:p></o:p></p>
<p class="MsoNormal">[ 127.589652] block drbd2: data-integrity-alg: &lt;not-used&gt;<o:p></o:p></p>
<p class="MsoNormal">[ 127.595445] block drbd2: drbd_sync_handshake:<o:p></o:p></p>
<p class="MsoNormal">[ 127.600247] block drbd2: self B89B0FA9261A88A7:13E00A97958E8BFD:D469C59FC5DBB2D0:B46B3E465F64FBCE bits:430137 flags:0<o:p></o:p></p>
<p class="MsoNormal">[ 127.611695] block drbd2: peer 13E00A97958E8BFC:0000000000000000:0000000000000000:0000000000000000 bits:427065 flags:0<o:p></o:p></p>
<p class="MsoNormal">[ 127.623370] block drbd2: uuid_compare()=1 by rule 70<o:p></o:p></p>
<p class="MsoNormal">[ 127.628795] block drbd2: Becoming sync source due to disk states.<o:p></o:p></p>
<p class="MsoNormal">[ 127.635429] block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> Inconsistent )<o:p></o:p></p>
<p class="MsoNormal">[ 127.671232] block drbd1: conn( WFBitMapS -> SyncSource )<o:p></o:p></p>
<p class="MsoNormal">[ 127.677439] block drbd1: Began resync as SyncSource (will sync 7918188 KB [1979547 bits set]).<o:p></o:p></p>
<p class="MsoNormal">[ 127.867559] block drbd2: conn( WFBitMapS -> SyncSource )<o:p></o:p></p>
<p class="MsoNormal">[ 127.873426] block drbd2: Began resync as SyncSource (will sync 1720548 KB [430137 bits set]).<o:p></o:p></p>
<p class="MsoNormal">[ 130.103616] JBD: barrier-based sync failed on drbd2-8 - disabling barriers<o:p></o:p></p>
<p class="MsoNormal">[ 142.482440] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 5<o:p></o:p></p>
<p class="MsoNormal">[ 142.606298] block drbd1: [drbd1_worker/4664] sock_sendmsg time expired, ko = 5<o:p></o:p></p>
<p class="MsoNormal">[ 148.486473] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 4<o:p></o:p></p>
<p class="MsoNormal">[ 148.610322] block drbd1: [drbd1_worker/4664] sock_sendmsg time expired, ko = 4<o:p></o:p></p>
<p class="MsoNormal">[ 154.491468] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 3<o:p></o:p></p>
<p class="MsoNormal">[ 154.615339] block drbd1: [drbd1_worker/4664] sock_sendmsg time expired, ko = 3<o:p></o:p></p>
<p class="MsoNormal">[ 160.495491] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 2<o:p></o:p></p>
<p class="MsoNormal">[ 160.619364] block drbd1: [drbd1_worker/4664] sock_sendmsg time expired, ko = 2<o:p></o:p></p>
<p class="MsoNormal">[ 166.499522] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 1<o:p></o:p></p>
<p class="MsoNormal">[ 166.624422] block drbd1: [drbd1_worker/4664] sock_sendmsg time expired, ko = 1<o:p></o:p></p>
<p class="MsoNormal">[ 172.503544] block drbd2: drbd_send_block() failed<o:p></o:p></p>
<p class="MsoNormal">[ 172.508664] block drbd2: peer( Secondary -> Unknown ) conn( SyncSource -> NetworkFailure )<o:p></o:p></p>
<p class="MsoNormal">[ 172.517517] block drbd2: drbd_pp_alloc interrupted!<o:p></o:p></p>
<p class="MsoNormal">[ 172.522827] block drbd2: alloc_ee: Allocation of a page failed<o:p></o:p></p>
<p class="MsoNormal">[ 172.529116] block drbd2: error receiving RSDataRequest, l: 24!<o:p></o:p></p>
<p class="MsoNormal">[ 172.535380] block drbd2: asender terminated<o:p></o:p></p>
<p class="MsoNormal">[ 172.539924] block drbd2: Terminating drbd2_asender<o:p></o:p></p>
<p class="MsoNormal">[ 172.540060] block drbd2: Connection closed<o:p></o:p></p>
<p class="MsoNormal">[ 172.540067] block drbd2: conn( NetworkFailure -> Unconnected )<o:p></o:p></p>
<p class="MsoNormal">[ 172.540073] block drbd2: receiver terminated<o:p></o:p></p>
<p class="MsoNormal">[ 172.540076] block drbd2: Restarting drbd2_receiver<o:p></o:p></p>
<p class="MsoNormal">[ 172.540080] block drbd2: receiver (re)started<o:p></o:p></p>
<p class="MsoNormal">[ 172.540086] block drbd2: conn( Unconnected -> WFConnection )<o:p></o:p></p>
<p class="MsoNormal">[ 172.629405] block drbd1: drbd_send_block() failed<o:p></o:p></p>
<p class="MsoNormal">[ 172.634522] block drbd1: peer( Secondary -> Unknown ) conn( SyncSource -> NetworkFailure )<o:p></o:p></p>
<p class="MsoNormal">[ 172.643429] block drbd1: drbd_pp_alloc interrupted!<o:p></o:p></p>
<p class="MsoNormal">[ 172.648795] block drbd1: alloc_ee: Allocation of a page failed<o:p></o:p></p>
<p class="MsoNormal">[ 172.655137] block drbd1: error receiving RSDataRequest, l: 24!<o:p></o:p></p>
<p class="MsoNormal">[ 172.661566] block drbd1: asender terminated<o:p></o:p></p>
<p class="MsoNormal">[ 172.666273] block drbd1: Terminating drbd1_asender<o:p></o:p></p>
<p class="MsoNormal">[ 172.666366] block drbd1: Connection closed<o:p></o:p></p>
<p class="MsoNormal">[ 172.666375] block drbd1: conn( NetworkFailure -> Unconnected )<o:p></o:p></p>
<p class="MsoNormal">[ 172.666381] block drbd1: receiver terminated<o:p></o:p></p>
<p class="MsoNormal">[ 172.666385] block drbd1: Restarting drbd1_receiver<o:p></o:p></p>
<p class="MsoNormal">[ 172.666389] block drbd1: receiver (re)started<o:p></o:p></p>
<p class="MsoNormal">[ 172.666395] block drbd1: conn( Unconnected -> WFConnection )<o:p></o:p></p>
<p class="MsoNormal">[ 172.902451] block drbd2: Handshake successful: Agreed network protocol version 91<o:p></o:p></p>
<p class="MsoNormal">[ 172.910548] block drbd2: conn( WFConnection -> WFReportParams )<o:p></o:p></p>
<p class="MsoNormal">[ 172.917074] block drbd2: Starting asender thread (from drbd2_receiver [4738])<o:p></o:p></p>
<p class="MsoNormal">[ 172.924983] block drbd2: data-integrity-alg: &lt;not-used&gt;<o:p></o:p></p>
<p class="MsoNormal">[ 172.930761] block drbd2: drbd_sync_handshake:<o:p></o:p></p>
<p class="MsoNormal">[ 172.935531] block drbd2: self B89B0FA9261A88A7:EC151750C3343BBF:13E00A97958E8BFD:D469C59FC5DBB2D0 bits:423852 flags:0<o:p></o:p></p>
<p class="MsoNormal">[ 172.947034] block drbd2: peer EC151750C3343BBE:0000000000000000:0000000000000000:0000000000000000 bits:423841 flags:0<o:p></o:p></p>
<p class="MsoNormal">[ 172.958516] block drbd2: uuid_compare()=1 by rule 70<o:p></o:p></p>
<p class="MsoNormal">[ 172.963993] block drbd2: Becoming sync source due to disk states.<o:p></o:p></p>
<p class="MsoNormal">[ 172.970690] block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )<o:p></o:p></p>
<p class="MsoNormal">[ 172.992500] block drbd2: conn( WFBitMapS -> SyncSource )<o:p></o:p></p>
<p class="MsoNormal">[ 172.998432] block drbd2: Began resync as SyncSource (will sync 1695408 KB [423852 bits set]).<o:p></o:p></p>
<p class="MsoNormal">[ 173.058406] block drbd1: Handshake successful: Agreed network protocol version 91<o:p></o:p></p>
<p class="MsoNormal">[ 173.066554] block drbd1: conn( WFConnection -> WFReportParams )<o:p></o:p></p>
<p class="MsoNormal">[ 173.073052] block drbd1: Starting asender thread (from drbd1_receiver [4692])<o:p></o:p></p>
<p class="MsoNormal">[ 173.081008] block drbd1: data-integrity-alg: &lt;not-used&gt;<o:p></o:p></p>
<p class="MsoNormal">[ 173.086807] block drbd1: drbd_sync_handshake:<o:p></o:p></p>
<p class="MsoNormal">[ 173.091627] block drbd1: self 3475127B7403BB0B:FD0FED39356C1BD2:8D44B59FE2BB68BF:0E1B69D867AAE02D bits:1977979 flags:0<o:p></o:p></p>
<p class="MsoNormal">[ 173.103125] block drbd1: peer FD0FED39356C1BD2:0000000000000000:0000000000000000:0000000000000000 bits:1977979 flags:0<o:p></o:p></p>
<p class="MsoNormal">[ 173.114608] block drbd1: uuid_compare()=1 by rule 70<o:p></o:p></p>
<p class="MsoNormal">[ 173.120079] block drbd1: Becoming sync source due to disk states.<o:p></o:p></p>
<p class="MsoNormal">[ 173.126763] block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )<o:p></o:p></p>
<p class="MsoNormal">[ 173.169295] block drbd1: conn( WFBitMapS -> SyncSource )<o:p></o:p></p>
<p class="MsoNormal">[ 173.175262] block drbd1: Began resync as SyncSource (will sync 7911916 KB [1977979 bits set]).<o:p></o:p></p>
<p class="MsoNormal">[ 191.031348] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 5<o:p></o:p></p>
<p class="MsoNormal">[ 194.232777] block drbd1: [drbd1_worker/4664] sock_sendmsg time expired, ko = 5<o:p></o:p></p>
<p class="MsoNormal">[ 197.035373] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 4<o:p></o:p></p>
<p class="MsoNormal">[ 200.236775] block drbd1: [drbd1_worker/4664] sock_sendmsg time expired, ko = 4<o:p></o:p></p>
<p class="MsoNormal">[ 203.039389] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 3<o:p></o:p></p>
<p class="MsoNormal">[ 206.240799] block drbd1: [drbd1_worker/4664] sock_sendmsg time expired, ko = 3<o:p></o:p></p>
<p class="MsoNormal">[ 209.044437] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 2<o:p></o:p></p>
<p class="MsoNormal">[ 212.244823] block drbd1: [drbd1_worker/4664] sock_sendmsg time expired, ko = 2<o:p></o:p></p>
<p class="MsoNormal">[ 215.048434] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 1<o:p></o:p></p>
<p class="MsoNormal">[ 218.248841] block drbd1: [drbd1_worker/4664] sock_sendmsg time expired, ko = 1<o:p></o:p></p>
<p class="MsoNormal">[ 221.052458] block drbd2: drbd_send_block() failed<o:p></o:p></p>
<p class="MsoNormal">[ 221.057638] block drbd2: peer( Secondary -> Unknown ) conn( SyncSource -> NetworkFailure )<o:p></o:p></p>
<p class="MsoNormal">[ 221.066546] block drbd2: drbd_pp_alloc interrupted!<o:p></o:p></p>
<p class="MsoNormal">[ 221.071978] block drbd2: alloc_ee: Allocation of a page failed<o:p></o:p></p>
<p class="MsoNormal">[ 221.078315] block drbd2: error receiving RSDataRequest, l: 24!<o:p></o:p></p>
<p class="MsoNormal">[ 221.084632] block drbd2: asender terminated<o:p></o:p></p>
<p class="MsoNormal">[ 221.089256] block drbd2: Terminating drbd2_asender<o:p></o:p></p>
<p class="MsoNormal">[ 221.089381] block drbd2: Connection closed<o:p></o:p></p>
<p class="MsoNormal">[ 221.089389] block drbd2: conn( NetworkFailure -> Unconnected )<o:p></o:p></p>
<p class="MsoNormal">[ 221.089394] block drbd2: receiver terminated<o:p></o:p></p>
<p class="MsoNormal">[ 221.089396] block drbd2: Restarting drbd2_receiver<o:p></o:p></p>
<p class="MsoNormal">[ 221.089400] block drbd2: receiver (re)started<o:p></o:p></p>
<p class="MsoNormal">[ 221.089405] block drbd2: conn( Unconnected -> WFConnection )<o:p></o:p></p>
<p class="MsoNormal">[ 221.461213] block drbd2: Handshake successful: Agreed network protocol version 91<o:p></o:p></p>
<p class="MsoNormal">[ 221.469411] block drbd2: conn( WFConnection -> WFReportParams )<o:p></o:p></p>
<p class="MsoNormal">[ 221.476062] block drbd2: Starting asender thread (from drbd2_receiver [4738])<o:p></o:p></p>
<p class="MsoNormal">[ 221.483979] block drbd2: data-integrity-alg: &lt;not-used&gt;<o:p></o:p></p>
<p class="MsoNormal">[ 221.489887] block drbd2: drbd_sync_handshake:<o:p></o:p></p>
<p class="MsoNormal">[ 221.494746] block drbd2: self B89B0FA9261A88A7:91631364A53FB4F2:EC151750C3343BBF:13E00A97958E8BFD bits:423851 flags:0<o:p></o:p></p>
<p class="MsoNormal">[ 221.506290] block drbd2: peer 91631364A53FB4F2:0000000000000000:0000000000000000:0000000000000000 bits:423841 flags:0<o:p></o:p></p>
<p class="MsoNormal">[ 221.517838] block drbd2: uuid_compare()=1 by rule 70<o:p></o:p></p>
<p class="MsoNormal">[ 221.523408] block drbd2: Becoming sync source due to disk states.<o:p></o:p></p>
<p class="MsoNormal">[ 221.530182] block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )<o:p></o:p></p>
<p class="MsoNormal">[ 221.553601] block drbd2: conn( WFBitMapS -> SyncSource )<o:p></o:p></p>
<p class="MsoNormal">[ 221.559566] block drbd2: Began resync as SyncSource (will sync 1695404 KB [423851 bits set]).<o:p></o:p></p>
<p class="MsoNormal">[ 224.252896] block drbd1: drbd_send_block() failed<o:p></o:p></p>
<p class="MsoNormal">[ 224.258048] block drbd1: peer( Secondary -> Unknown ) conn( SyncSource -> NetworkFailure )<o:p></o:p></p>
<p class="MsoNormal">[ 224.266887] block drbd1: drbd_pp_alloc interrupted!<o:p></o:p></p>
<p class="MsoNormal">[ 224.272236] block drbd1: alloc_ee: Allocation of a page failed<o:p></o:p></p>
<p class="MsoNormal">[ 224.278451] block drbd1: error receiving RSDataRequest, l: 24!<o:p></o:p></p>
<p class="MsoNormal">[ 224.284793] block drbd1: asender terminated<o:p></o:p></p>
<p class="MsoNormal">[ 224.289390] block drbd1: Terminating drbd1_asender<o:p></o:p></p>
<p class="MsoNormal">[ 224.289523] block drbd1: Connection closed<o:p></o:p></p>
<p class="MsoNormal">[ 224.289533] block drbd1: conn( NetworkFailure -> Unconnected )<o:p></o:p></p>
<p class="MsoNormal">[ 224.289540] block drbd1: receiver terminated<o:p></o:p></p>
<p class="MsoNormal">[ 224.289545] block drbd1: Restarting drbd1_receiver<o:p></o:p></p>
<p class="MsoNormal">[ 224.289551] block drbd1: receiver (re)started<o:p></o:p></p>
<p class="MsoNormal">[ 224.289558] block drbd1: conn( Unconnected -> WFConnection )<o:p></o:p></p>
<p class="MsoNormal">[ 224.659743] block drbd1: Handshake successful: Agreed network protocol version 91<o:p></o:p></p>
<p class="MsoNormal">[ 224.667906] block drbd1: conn( WFConnection -> WFReportParams )<o:p></o:p></p>
<p class="MsoNormal">[ 224.674475] block drbd1: Starting asender thread (from drbd1_receiver [4692])<o:p></o:p></p>
<p class="MsoNormal">[ 224.682466] block drbd1: data-integrity-alg: &lt;not-used&gt;<o:p></o:p></p>
<p class="MsoNormal">[ 224.688127] block drbd1: drbd_sync_handshake:<o:p></o:p></p>
<p class="MsoNormal">[ 224.692940] block drbd1: self 3475127B7403BB0B:90A869D6EAB8005B:FD0FED39356C1BD2:8D44B59FE2BB68BF bits:1977979 flags:0<o:p></o:p></p>
<p class="MsoNormal">[ 224.704508] block drbd1: peer 90A869D6EAB8005A:0000000000000000:0000000000000000:0000000000000000 bits:1977979 flags:0<o:p></o:p></p>
<p class="MsoNormal">[ 224.716017] block drbd1: uuid_compare()=1 by rule 70<o:p></o:p></p>
<p class="MsoNormal">[ 224.721420] block drbd1: Becoming sync source due to disk states.<o:p></o:p></p>
<p class="MsoNormal">[ 224.728012] block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )<o:p></o:p></p>
<p class="MsoNormal">[ 224.776680] block drbd1: conn( WFBitMapS -> SyncSource )<o:p></o:p></p>
<p class="MsoNormal">[ 224.782791] block drbd1: Began resync as SyncSource (will sync 7911916 KB [1977979 bits set]).<o:p></o:p></p>
<p class="MsoNormal">[ 230.192821] IPv6 addrconf: prefix with wrong length 126<o:p></o:p></p>
<p class="MsoNormal">[ 237.015598] block drbd1: [drbd1_worker/4664] sock_sendmsg time expired, ko = 5<o:p></o:p></p>
<p class="MsoNormal">[ 239.595381] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 5<o:p></o:p></p>
<p class="MsoNormal">[ 243.020626] block drbd1: [drbd1_worker/4664] sock_sendmsg time expired, ko = 4<o:p></o:p></p>
<p class="MsoNormal">[ 245.599408] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 4<o:p></o:p></p>
<p class="MsoNormal">[ 249.024669] block drbd1: [drbd1_worker/4664] sock_sendmsg time expired, ko = 3<o:p></o:p></p>
<p class="MsoNormal">[ 251.604426] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 3<o:p></o:p></p>
<p class="MsoNormal">[ 255.029664] block drbd1: [drbd1_worker/4664] sock_sendmsg time expired, ko = 2<o:p></o:p></p>
<p class="MsoNormal">[ 257.609466] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 2<o:p></o:p></p>
<p class="MsoNormal">[ 261.034700] block drbd1: [drbd1_worker/4664] sock_sendmsg time expired, ko = 1<o:p></o:p></p>
<p class="MsoNormal">[ 263.614471] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 1<o:p></o:p></p>
<p class="MsoNormal">[ 267.039708] block drbd1: drbd_send_block() failed<o:p></o:p></p>
<p class="MsoNormal">[ 267.044821] block drbd1: peer( Secondary -> Unknown ) conn( SyncSource -> NetworkFailure )<o:p></o:p></p>
<p class="MsoNormal">[ 267.053725] block drbd1: drbd_pp_alloc interrupted!<o:p></o:p></p>
<p class="MsoNormal">[ 267.059179] block drbd1: alloc_ee: Allocation of a page failed<o:p></o:p></p>
<p class="MsoNormal">[ 267.065563] block drbd1: error receiving RSDataRequest, l: 24!<o:p></o:p></p>
<p class="MsoNormal">[ 267.071947] block drbd1: asender terminated<o:p></o:p></p>
<p class="MsoNormal">[ 267.076520] block drbd1: Terminating drbd1_asender<o:p></o:p></p>
<p class="MsoNormal">[ 267.076606] block drbd1: Connection closed<o:p></o:p></p>
<p class="MsoNormal">[ 267.076614] block drbd1: conn( NetworkFailure -> Unconnected )<o:p></o:p></p>
<p class="MsoNormal">[ 267.076619] block drbd1: receiver terminated<o:p></o:p></p>
<p class="MsoNormal">[ 267.076622] block drbd1: Restarting drbd1_receiver<o:p></o:p></p>
<p class="MsoNormal">[ 267.076627] block drbd1: receiver (re)started<o:p></o:p></p>
<p class="MsoNormal">[ 267.076634] block drbd1: conn( Unconnected -> WFConnection )<o:p></o:p></p>
<p class="MsoNormal">[ 267.446395] block drbd1: Handshake successful: Agreed network protocol version 91<o:p></o:p></p>
<p class="MsoNormal">[ 267.454619] block drbd1: conn( WFConnection -> WFReportParams )<o:p></o:p></p>
<p class="MsoNormal">[ 267.461377] block drbd1: Starting asender thread (from drbd1_receiver [4692])<o:p></o:p></p>
<p class="MsoNormal">[ 267.469379] block drbd1: data-integrity-alg: &lt;not-used&gt;<o:p></o:p></p>
<p class="MsoNormal">[ 267.475246] block drbd1: drbd_sync_handshake:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
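<p class="MsoNormal">To quantify the pattern in the log above, here is a small Python sketch (my own helper, not a DRBD tool) that extracts the sock_sendmsg expiry timestamps per device and prints the gaps between them. The regular expression and function names are made up for illustration, and only the first drbd2 cycle is pasted in as sample input.</p>
<pre>
```python
import re
from collections import defaultdict

# A few lines copied from the console output above (first drbd2 cycle).
LOG = """\
[ 142.482440] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 5
[ 148.486473] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 4
[ 154.491468] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 3
[ 160.495491] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 2
[ 166.499522] block drbd2: [drbd2_worker/4710] sock_sendmsg time expired, ko = 1
[ 172.503544] block drbd2: drbd_send_block() failed
"""

# Hypothetical pattern: timestamp, device name, remaining ko counter.
EXPIRED = re.compile(
    r"\[\s*(\d+\.\d+)\] block (drbd\d+): .*sock_sendmsg time expired, ko = (\d+)"
)


def expiry_times(log_text):
    """Map each device to the timestamps of its sock_sendmsg expiries."""
    times = defaultdict(list)
    for line in log_text.splitlines():
        match = EXPIRED.search(line)
        if match:
            times[match.group(2)].append(float(match.group(1)))
    return dict(times)


def intervals(times):
    """Gaps in seconds between consecutive timestamps."""
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for device, stamps in expiry_times(LOG).items():
        print(device, intervals(stamps))
```
</pre>
<p class="MsoNormal">On these lines the gaps come out at about 6 seconds each, and the connection is dropped one interval after the counter reaches ko = 1. The counter seems to start from my ko-count of 6, but I do not know where the 6-second interval itself comes from.</p>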
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Is this a bug in DRBD? Other TCP traffic works fine. <o:p></o:p></p>
<p class="MsoNormal">Is this timeout configurable from the DRBD side, or is it a TCP setting?<o:p></o:p></p>
<p class="MsoNormal">I ran iperf and measured TCP bandwidth of 1.15 Gbps from Primary to Secondary and 420 Mbps from Secondary to Primary.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Any help is appreciated.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks,<o:p></o:p></p>
<p class="MsoNormal">Jacob<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>