version: 8.3.11 (api:88/proto:86-96)<div>May 10 21:11:55 scale-192-168-54-14 kernel: block drbd1: [drbd1_worker/8936] sock_sendmsg time expired, ko = 4294967295</div><div>... (long countdown that will never get anywhere)</div>
<div>This is a "dual primary" setup (underneath GPFS) over a failover-bonded network interface.</div><div>Everything works fine (read/write/reboot/etc.) until I attempt a verify.</div><div><br></div><div>My configuration never sets ko-count, so according to the documentation it should default to 0, which disables the feature. Does the documentation actually mean to say that the default is 2^32 - 1 (the 4294967295 shown in the log line above)?</div>
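<div>One possible explanation for the odd value, offered as a guess and not confirmed against the DRBD source: 4294967295 is exactly what an unsigned 32-bit counter holds after it is decremented once from 0. The C sketch below shows only that arithmetic; the helper name ko_after_decrement is hypothetical and not taken from DRBD.</div>

```c
#include <stdint.h>

/* Hypothetical helper, NOT DRBD code: models a single decrement of an
 * unsigned 32-bit knock-out counter. */
static uint32_t ko_after_decrement(uint32_t ko_count)
{
    /* Unsigned arithmetic wraps modulo 2^32, so 0 - 1 becomes 4294967295. */
    return (uint32_t)(ko_count - 1u);
}
```

<div>If that is what happens, a ko-count of 0 would turn into a countdown from 2^32 - 1 instead of disabling the check, which would match the "long countdown" in the log.</div>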
<div>I'm building and running all of this on a clone of RHEL 6.2.</div><div><br></div><div>This occurs while attempting to 'verify' a dual primary DRBD device. Originally I received this message on every verify attempt, but after I reduced syncer { rate }, the message only crops up after a few iterations. There is no network/connectivity problem during this time, yet drbd commands such as the following hang:</div>
<div><br></div><div>strace -f drbdsetup 1 disconnect --force</div><div>...</div><div><div>stat("/proc/drbd", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0</div><div>open("/var/lock/drbd-147-1", O_RDWR|O_CREAT, 0600) = 3</div>
<div>rt_sigaction(SIGALRM, {0x406b30, [], SA_RESTORER, 0x3935232900}, {SIG_DFL, [], 0}, 8) = 0</div><div>alarm(1) = 0</div><div>fcntl(3, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0</div>
<div>alarm(0) = 1</div><div>rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x3935232900}, NULL, 8) = 0</div><div>socket(PF_NETLINK, SOCK_DGRAM, 11) = 4</div><div>getpid() = 13360</div>
<div>bind(4, {sa_family=AF_NETLINK, pid=13360, groups=ffffffff}, 12) = 0</div><div>sendto(4, "9\0\0\0\3\0\0\0\1\0\0\00004\0\0\4\0\0\0\1\0\0\0\1\0\0\00004\0\0"..., 57, 0, NULL, 0) = 57</div><div>poll([{fd=4, events=POLLIN}], 1, 120000</div>
</div><div>(This is where it hangs; it exits only after I terminate it with Ctrl-C.)</div><div>The only thing appearing in the dmesg output is the sock_sendmsg expiration reports.</div><div><br></div><div>The documentation quoted below would also be clearer if it used either <i>count</i> or <i>number</i> consistently.</div>
<div>
<blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">ko-count <i>number<br>
</i>In case the secondary node fails to complete a single write request for <span class="s1"><i>count</i></span> times the <span class="s1"><i>timeout</i></span>, it is expelled from the cluster. (I.e. the primary node goes into <span class="s1">StandAlone</span> mode.) The default value is 0, which disables this feature.</blockquote>
<p class="p2"><br></p></div>