<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Lars Ellenberg wrote:
<blockquote cite="mid:20090209223632.GB16024@barkeeper1-xen.linbit"
type="cite">
<pre wrap="">On Mon, Feb 09, 2009 at 01:04:37PM -0800, John Du wrote:
</pre>
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="">I still do not understand why iostat only shows DRBD devices on this
particular node with 8.2.7 and 8.3.0 but not other nodes with the
same hardware, same Linux Kernel and same DRBD version.
</pre>
</blockquote>
<pre wrap="">io stats accounting was introduced only in drbd-8.0.12 respective 8.2.6.
if you don't see drbd in iostats, you probably use an older DRBD version.
</pre>
</blockquote>
<pre wrap="">I obviously did not make myself clear. We were running 8.3 on six nodes
and only this node showed DRBD in iostat and only this node was having
the problem I reported. I reverted to 8.2 on this node to make our
production going.
</pre>
</blockquote>
<pre wrap="">
so you say
six nodes.
same hardware. same linux kernel. same drbd.
but ONE node behaves different.
pretty non-deterministic behaviour for software.
</pre>
</blockquote>
Yes. Everything is identical, yet this node works with 8.2 but not with
8.3. I know it is hard to believe; it is hard for me to believe too.
Assuming something is different on this node, what difference would make
DRBD 8.3 fail where 8.2 works? Is it possible that 8.3 reads the meta
data differently than 8.2?<br>
<br>
According to your message, iostat should show DRBD with 8.3. But it
does not on any of the other five nodes.<br>
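<br>
As far as I know, iostat takes its per-device numbers from
/proc/diskstats, so one way to compare the nodes is to check whether a
drbd device is listed there at all. A minimal sketch, assuming the usual
"major minor name ..." line layout; the function name and the sample
text are made up for illustration and are not output from any of the six
nodes:<br>

```python
def drbd_devices_in_diskstats(text):
    """Return names of DRBD devices listed in /proc/diskstats-style text."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        # Each line is: major minor name stat1 stat2 ...
        if len(fields) >= 3 and fields[2].startswith("drbd"):
            names.append(fields[2])
    return names


# Illustrative sample only (hypothetical counters).
sample = """\
   8    0 sda 1200 30 45000 800 900 40 32000 700 0 600 1500
 147    1 drbd1 500 0 4000 100 300 0 2400 90 0 150 190
"""
print(drbd_devices_in_diskstats(sample))
```

On a real node the text would come from open("/proc/diskstats").read();
an empty list would mean iostat has nothing to report for DRBD on that
node.<br>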
<blockquote cite="mid:20090209223632.GB16024@barkeeper1-xen.linbit"
type="cite">
<pre wrap="">I doubt I can help, as if that is true,
circumstantial evidence suggests that it has nothing to do with drbd,
but everything to do with whatever makes the non-behaving node behave
different.
though my guess is
that either these nodes are not all that identical as you think they are.
or you installed the new kernel module, but did not actually reload it.
</pre>
</blockquote>
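The second guess (module installed but never reloaded) can be tested by
reading the version the running module reports in /proc/drbd. A minimal
sketch, assuming the first line has the form "version: X.Y.Z (api:.../proto:...)";
the function name, the sample text and the expected value are
illustrative assumptions:<br>

```python
import re


def running_drbd_version(proc_drbd_text):
    """Return the version string from /proc/drbd-style text, or None."""
    m = re.search(r"^version:\s*(\S+)", proc_drbd_text, re.MULTILINE)
    return m.group(1) if m else None


# Illustrative sample; on a node, read the text from /proc/drbd instead.
sample = "version: 8.2.7 (api:88/proto:86-88)\n"
expected = "8.3.0"  # the version believed to be installed (assumption)

running = running_drbd_version(sample)
if running != expected:
    print("running module is", running, "but expected", expected)
```

If the two differ, the old module is still loaded and any measurement
was taken against the old version.<br>
<br>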
The log from the problematic node follows. You can see it went
from 8.3.0 to 8.2.7 to 8.2.0. The log does not show that the server was
slow, but trust me, it was very, very slow. I ran all three
versions of DRBD with the same config file shown in my
original message.<br>
<br>
Feb 6 22:22:17 newimapn kernel: drbd: initialised. Version: 8.3.0
(api:88/proto:86-89)<br>
Feb 6 22:22:17 newimapn kernel: drbd: GIT-hash:
9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by root@newimapr,
2009-02-02 23:57:10<br>
Feb 6 22:22:17 newimapn kernel: drbd: registered as block device major
147<br>
Feb 6 22:22:17 newimapn kernel: drbd: minor_table @ 0xffff81021f0294c0<br>
Feb 6 22:22:17 newimapn kernel: drbd1: disk( Diskless -> Attaching
) <br>
Feb 6 22:22:17 newimapn kernel: drbd1: Starting worker thread (from
cqueue/3 [257])<br>
Feb 6 22:22:17 newimapn kernel: klogd 1.4.1, ---------- state change
---------- <br>
Feb 6 22:22:17 newimapn kernel: drbd1: Found 4 transactions (192
active extents) in activity log.<br>
Feb 6 22:22:17 newimapn kernel: drbd1: Method to ensure write
ordering: barrier<br>
Feb 6 22:22:17 newimapn kernel: drbd1: max_segment_size ( = BIO size )
= 32768<br>
Feb 6 22:22:17 newimapn kernel: drbd1: drbd_bm_resize called with
capacity == 2571204968<br>
Feb 6 22:22:17 newimapn kernel: drbd1: resync bitmap: bits=321400621
words=5021885<br>
Feb 6 22:22:17 newimapn kernel: drbd1: size = 1226 GB (1285602484 KB)<br>
Feb 6 22:22:17 newimapn kernel: drbd1: recounting of set bits took
additional 43 jiffies<br>
Feb 6 22:22:17 newimapn kernel: drbd1: 148 KB (37 bits) marked
out-of-sync by on disk bit-map.<br>
Feb 6 22:22:17 newimapn kernel: drbd1: disk( Attaching -> UpToDate
) <br>
Feb 6 22:22:17 newimapn kernel: drbd1: conn( StandAlone ->
Unconnected ) <br>
Feb 6 22:22:17 newimapn kernel: drbd1: Starting receiver thread (from
drbd1_worker [5507])<br>
Feb 6 22:22:17 newimapn kernel: drbd1: receiver (re)started<br>
Feb 6 22:22:17 newimapn kernel: drbd1: conn( Unconnected ->
WFConnection ) <br>
Feb 6 22:22:52 newimapn kernel: drbd1: role( Secondary -> Primary )
<br>
Feb 6 22:22:53 newimapn kernel: kjournald starting. Commit interval 5
seconds<br>
Feb 6 22:22:53 newimapn kernel: EXT3-fs warning: maximal mount count
reached, running e2fsck is recommended<br>
Feb 6 22:22:53 newimapn kernel: EXT3 FS on drbd1, internal journal<br>
Feb 6 22:22:53 newimapn kernel: EXT3-fs: mounted filesystem with
ordered data mode.<br>
Feb 6 22:23:07 newimapn avahi-daemon[4967]: Withdrawing address record
for 10.100.2.239 on eth0.<br>
Feb 6 22:23:07 newimapn avahi-daemon[4967]: Leaving mDNS multicast
group on interface eth0.IPv4 with address 10.100.2.239.<br>
Feb 6 22:23:07 newimapn avahi-daemon[4967]: Joining mDNS multicast
group on interface eth0.IPv4 with address 10.100.2.232.<br>
Feb 6 22:23:09 newimapn kernel: drbd1: role( Primary -> Secondary )
<br>
Feb 6 22:24:45 newimapn kernel: drbd1: role( Secondary -> Primary )
<br>
Feb 6 22:24:45 newimapn kernel: kjournald starting. Commit interval 5
seconds<br>
Feb 6 22:24:45 newimapn kernel: EXT3-fs warning: maximal mount count
reached, running e2fsck is recommended<br>
Feb 6 22:24:45 newimapn kernel: EXT3 FS on drbd1, internal journal<br>
Feb 6 22:24:45 newimapn kernel: EXT3-fs: mounted filesystem with
ordered data mode.<br>
Feb 6 22:24:45 newimapn avahi-daemon[4967]: Registering new address
record for 10.100.2.239 on eth0.<br>
Feb 6 22:24:45 newimapn avahi-daemon[4967]: Withdrawing address record
for 10.100.2.239 on eth0.<br>
Feb 6 22:24:45 newimapn avahi-daemon[4967]: Registering new address
record for 10.100.2.239 on eth0.<br>
Feb 6 22:25:01 newimapn ntpd[4078]: synchronized to 10.100.2.249,
stratum 2<br>
Feb 6 22:25:03 newimapn avahi-daemon[4967]: Withdrawing address record
for 10.100.2.239 on eth0.<br>
Feb 6 22:25:04 newimapn kernel: drbd1: role( Primary -> Secondary )
<br>
Feb 6 22:26:08 newimapn kernel: drbd1: role( Secondary -> Primary )
<br>
Feb 6 22:26:58 newimapn kernel: kjournald starting. Commit interval 5
seconds<br>
Feb 6 22:26:58 newimapn kernel: EXT3-fs warning: maximal mount count
reached, running e2fsck is recommended<br>
Feb 6 22:26:58 newimapn kernel: EXT3 FS on drbd1, internal journal<br>
Feb 6 22:26:58 newimapn kernel: EXT3-fs: mounted filesystem with
ordered data mode.<br>
Feb 6 22:28:15 newimapn ntpd[4078]: synchronized to 70.86.250.6,
stratum 2<br>
Feb 6 22:28:18 newimapn ntpd[4078]: synchronized to 63.240.161.99,
stratum 2<br>
Feb 6 22:29:11 newimapn avahi-daemon[4967]: Registering new address
record for 10.100.2.239 on eth0.<br>
Feb 6 22:29:11 newimapn avahi-daemon[4967]: Withdrawing address record
for 10.100.2.239 on eth0.<br>
Feb 6 22:29:11 newimapn avahi-daemon[4967]: Registering new address
record for 10.100.2.239 on eth0.<br>
Feb 6 22:31:24 newimapn ntpd[4078]: synchronized to 70.86.250.6,
stratum 2<br>
Feb 6 22:36:43 newimapn ntpd[4078]: synchronized to 64.247.17.251,
stratum 2<br>
Feb 6 22:42:35 newimapn httpd: nss_ldap: reconnected to LDAP server
<a class="moz-txt-link-freetext" href="ldap://ldap">ldap://ldap</a> after 1 attempt<br>
Feb 6 22:43:47 newimapn httpd: nss_ldap: reconnected to LDAP server
<a class="moz-txt-link-freetext" href="ldap://ldap">ldap://ldap</a> after 1 attempt<br>
Feb 6 22:45:46 newimapn httpd: nss_ldap: reconnected to LDAP server
<a class="moz-txt-link-freetext" href="ldap://ldap">ldap://ldap</a> after 1 attempt<br>
Feb 6 22:48:18 newimapn avahi-daemon[4967]: Withdrawing address record
for 10.100.2.239 on eth0.<br>
Feb 6 22:48:19 newimapn kernel: drbd1: role( Primary -> Secondary )
<br>
Feb 6 22:48:47 newimapn kernel: drbd1: conn( WFConnection ->
Disconnecting ) <br>
Feb 6 22:48:47 newimapn kernel: drbd1: Discarding network
configuration.<br>
Feb 6 22:48:47 newimapn kernel: drbd1: Connection closed<br>
Feb 6 22:48:47 newimapn kernel: drbd1: conn( Disconnecting ->
StandAlone ) <br>
Feb 6 22:48:47 newimapn kernel: drbd1: receiver terminated<br>
Feb 6 22:48:47 newimapn kernel: drbd1: Terminating receiver thread<br>
Feb 6 22:48:47 newimapn kernel: drbd1: disk( UpToDate -> Diskless )
<br>
Feb 6 22:48:47 newimapn kernel: drbd1: drbd_bm_resize called with
capacity == 0<br>
Feb 6 22:48:47 newimapn kernel: drbd1: worker terminated<br>
Feb 6 22:48:47 newimapn kernel: drbd1: Terminating worker thread<br>
Feb 6 22:48:47 newimapn kernel: drbd: module cleanup done.<br>
Feb 6 22:51:31 newimapn kernel: drbd: initialised. Version: 8.2.7
(api:88/proto:86-88)<br>
Feb 6 22:51:31 newimapn kernel: drbd: GIT-hash:
61b7f4c2fc34fe3d2acf7be6bcc1fc2684708a7d build by root@newimapn,
2009-02-06 22:33:19<br>
Feb 6 22:51:31 newimapn kernel: drbd: registered as block device major
147<br>
Feb 6 22:51:31 newimapn kernel: drbd: minor_table @ 0xffff81023c700480<br>
Feb 6 22:51:31 newimapn kernel: drbd1: disk( Diskless -> Attaching
) <br>
Feb 6 22:51:31 newimapn kernel: drbd1: Starting worker thread (from
cqueue/5 [259])<br>
Feb 6 22:51:31 newimapn kernel: klogd 1.4.1, ---------- state change
---------- <br>
Feb 6 22:51:31 newimapn kernel: drbd1: Found 4 transactions (192
active extents) in activity log.<br>
Feb 6 22:51:31 newimapn kernel: drbd1: Method to ensure write
ordering: barrier<br>
Feb 6 22:51:31 newimapn kernel: drbd1: max_segment_size ( = BIO size )
= 32768<br>
Feb 6 22:51:31 newimapn kernel: drbd1: drbd_bm_resize called with
capacity == 2571204968<br>
Feb 6 22:51:31 newimapn kernel: drbd1: resync bitmap: bits=321400621
words=5021885<br>
Feb 6 22:51:31 newimapn kernel: drbd1: size = 1226 GB (1285602484 KB)<br>
Feb 6 22:51:31 newimapn kernel: drbd1: recounting of set bits took
additional 40 jiffies<br>
Feb 6 22:51:31 newimapn kernel: drbd1: 10 MB (2684 bits) marked
out-of-sync by on disk bit-map.<br>
Feb 6 22:51:31 newimapn kernel: drbd1: disk( Attaching -> UpToDate
) <br>
Feb 6 22:51:31 newimapn kernel: drbd1: conn( StandAlone ->
Unconnected ) <br>
Feb 6 22:51:31 newimapn kernel: drbd1: Starting receiver thread (from
drbd1_worker [7544])<br>
Feb 6 22:51:31 newimapn kernel: drbd1: receiver (re)started<br>
Feb 6 22:51:31 newimapn kernel: drbd1: conn( Unconnected ->
WFConnection ) <br>
Feb 6 22:52:19 newimapn kernel: drbd1: role( Secondary -> Primary )
<br>
Feb 6 22:52:19 newimapn kernel: kjournald starting. Commit interval 5
seconds<br>
Feb 6 22:52:19 newimapn kernel: EXT3-fs warning: maximal mount count
reached, running e2fsck is recommended<br>
Feb 6 22:52:19 newimapn kernel: EXT3 FS on drbd1, internal journal<br>
Feb 6 22:52:19 newimapn kernel: EXT3-fs: mounted filesystem with
ordered data mode.<br>
Feb 6 22:52:20 newimapn avahi-daemon[4967]: Registering new address
record for 10.100.2.239 on eth0.<br>
Feb 6 22:52:20 newimapn avahi-daemon[4967]: Withdrawing address record
for 10.100.2.239 on eth0.<br>
Feb 6 22:52:20 newimapn avahi-daemon[4967]: Registering new address
record for 10.100.2.239 on eth0.<br>
Feb 6 22:57:07 newimapn ntpd[4078]: synchronized to 70.86.250.6,
stratum 2<br>
Feb 6 23:00:13 newimapn avahi-daemon[4967]: Withdrawing address record
for 10.100.2.239 on eth0.<br>
Feb 6 23:00:15 newimapn kernel: drbd1: role( Primary -> Secondary )
<br>
Feb 6 23:01:01 newimapn kernel: drbd1: conn( WFConnection ->
Disconnecting ) <br>
Feb 6 23:01:01 newimapn kernel: drbd1: Discarding network
configuration.<br>
Feb 6 23:01:01 newimapn kernel: drbd1: Connection closed<br>
Feb 6 23:01:01 newimapn kernel: drbd1: conn( Disconnecting ->
StandAlone ) <br>
Feb 6 23:01:01 newimapn kernel: drbd1: receiver terminated<br>
Feb 6 23:01:01 newimapn kernel: drbd1: Terminating receiver thread<br>
Feb 6 23:01:01 newimapn kernel: drbd1: disk( UpToDate -> Diskless )
<br>
Feb 6 23:01:01 newimapn kernel: drbd1: drbd_bm_resize called with
capacity == 0<br>
Feb 6 23:01:01 newimapn kernel: drbd1: worker terminated<br>
Feb 6 23:01:01 newimapn kernel: drbd1: Terminating worker thread<br>
Feb 6 23:01:01 newimapn kernel: drbd: module cleanup done.<br>
Feb 6 23:04:01 newimapn kernel: drbd: initialised. Version: 8.2.0
(api:86/proto:86-87)<br>
Feb 6 23:04:01 newimapn kernel: drbd: SVN Revision: 3079 build by
root@newimapn, 2009-02-06 22:58:05<br>
Feb 6 23:04:01 newimapn kernel: drbd: registered as block device major
147<br>
Feb 6 23:04:01 newimapn kernel: drbd: minor_table @ 0xffff81023c700c80<br>
Feb 6 23:04:01 newimapn kernel: drbd1: disk( Diskless -> Attaching
) <br>
Feb 6 23:04:01 newimapn kernel: klogd 1.4.1, ---------- state change
---------- <br>
Feb 6 23:04:01 newimapn kernel: drbd1: Found 4 transactions (52 active
extents) in activity log.<br>
Feb 6 23:04:01 newimapn kernel: drbd1: max_segment_size ( = BIO size )
= 32768<br>
Feb 6 23:04:01 newimapn kernel: drbd1: drbd_bm_resize called with
capacity == 2571204968<br>
Feb 6 23:04:01 newimapn kernel: drbd1: resync bitmap: bits=321400621
words=5021885<br>
Feb 6 23:04:01 newimapn kernel: drbd1: size = 1226 GB (1285602484 KB)<br>
Feb 6 23:04:02 newimapn kernel: drbd1: reading of bitmap took 198
jiffies<br>
Feb 6 23:04:02 newimapn kernel: drbd1: recounting of set bits took
additional 39 jiffies<br>
Feb 6 23:04:02 newimapn kernel: drbd1: 11 MB marked out-of-sync by on
disk bit-map.<br>
Feb 6 23:04:02 newimapn kernel: drbd1: disk( Attaching -> UpToDate
) <br>
Feb 6 23:04:02 newimapn kernel: drbd1: Writing meta data super block
now.<br>
Feb 6 23:04:02 newimapn kernel: drbd1: conn( StandAlone ->
Unconnected ) <br>
Feb 6 23:04:02 newimapn kernel: drbd1: receiver (re)started<br>
Feb 6 23:04:02 newimapn kernel: drbd1: conn( Unconnected ->
WFConnection ) <br>
Feb 6 23:04:28 newimapn ntpd[4078]: synchronized to 63.240.161.99,
stratum 2<br>
Feb 6 23:04:46 newimapn kernel: drbd1: role( Secondary -> Primary )
<br>
Feb 6 23:04:46 newimapn kernel: drbd1: Writing meta data super block
now.<br>
Feb 6 23:04:46 newimapn kernel: kjournald starting. Commit interval 5
seconds<br>
Feb 6 23:04:46 newimapn kernel: EXT3-fs warning: maximal mount count
reached, running e2fsck is recommended<br>
Feb 6 23:04:46 newimapn kernel: EXT3 FS on drbd1, internal journal<br>
Feb 6 23:04:46 newimapn kernel: EXT3-fs: mounted filesystem with
ordered data mode.<br>
Feb 6 23:04:46 newimapn avahi-daemon[4967]: Registering new address
record for 10.100.2.239 on eth0.<br>
Feb 6 23:04:46 newimapn avahi-daemon[4967]: Withdrawing address record
for 10.100.2.239 on eth0.<br>
Feb 6 23:04:46 newimapn avahi-daemon[4967]: Registering new address
record for 10.100.2.239 on eth0.<br>
Feb 6 23:13:23 newimapn ntpd[4078]: synchronized to 70.86.250.6,
stratum 2<br>
Feb 6 23:16:24 newimapn kernel: drbd1: conn( WFConnection ->
WFReportParams ) <br>
Feb 6 23:16:24 newimapn kernel: drbd1: Handshake successful: Agreed
network protocol version 87<br>
Feb 6 23:16:24 newimapn kernel: drbd1: data-integrity-alg: <br>
Feb 6 23:16:24 newimapn kernel: drbd1: peer( Unknown -> Secondary )
conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate ) <br>
Feb 6 23:16:43 newimapn kernel: drbd1: Writing meta data super block
now.<br>
Feb 6 23:16:43 newimapn kernel: drbd1: BUG! md_sync_timer expired!
Worker calls drbd_md_sync().<br>
Feb 6 23:16:54 newimapn kernel: drbd1: conn( WFBitMapS ->
SyncSource ) pdsk( UpToDate -> Inconsistent ) <br>
Feb 6 23:16:54 newimapn kernel: drbd1: Began resync as SyncSource
(will sync 13052 KB [3263 bits set]).<br>
Feb 6 23:16:54 newimapn kernel: drbd1: Writing meta data super block
now.<br>
Feb 6 23:17:12 newimapn kernel: drbd1: Resync done (total 18 sec;
paused 0 sec; 724 K/sec)<br>
Feb 6 23:17:12 newimapn kernel: drbd1: conn( SyncSource ->
Connected ) pdsk( Inconsistent -> UpToDate ) <br>
Feb 6 23:17:12 newimapn kernel: drbd1: Writing meta data super block
now.<br>
Feb 6 23:18:12 newimapn kernel: drbd1: peer( Secondary -> Unknown )
conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown ) <br>
Feb 6 23:18:12 newimapn kernel: drbd1: Creating new current UUID<br>
Feb 6 23:18:12 newimapn kernel: drbd1: Writing meta data super block
now.<br>
Feb 6 23:18:12 newimapn kernel: drbd1: asender terminated<br>
Feb 6 23:18:12 newimapn kernel: drbd1: tl_clear()<br>
Feb 6 23:18:12 newimapn kernel: drbd1: Connection closed<br>
Feb 6 23:18:12 newimapn kernel: drbd1: conn( TearDown ->
Unconnected ) <br>
Feb 6 23:18:12 newimapn kernel: drbd1: receiver terminated<br>
Feb 6 23:18:12 newimapn kernel: drbd1: receiver (re)started<br>
Feb 6 23:18:12 newimapn kernel: drbd1: conn( Unconnected ->
WFConnection ) <br>
<br>
<blockquote cite="mid:20090209223632.GB16024@barkeeper1-xen.linbit"
type="cite">
<pre wrap="">if you still have the kernel logs, double check whether you find the
"drbd: initialised. Version 8.3.0 ..." line.
if not, you never loaded nor used nor benchmarked against 8.3.0.
</pre>
</blockquote>
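The check suggested in the quote can be sketched as a small log scan
that lists every module load banner with its timestamp. This is only a
sketch: the regular expression assumes the syslog layout seen in the log
above, the function name is made up, and the sample lines are copied
from that log with the wrapped lines rejoined:<br>

```python
import re

# Matches the banner the module prints when it loads, e.g.
# "... kernel: drbd: initialised. Version: 8.3.0 (api:88/proto:86-89)"
BANNER = re.compile(
    r"^(\w+\s+\d+\s+[\d:]+) \S+ kernel: drbd: initialised\. Version: (\S+)"
)


def loaded_versions(log_text):
    """Return (timestamp, version) for each DRBD module load in the log."""
    result = []
    for line in log_text.splitlines():
        m = BANNER.match(line)
        if m:
            result.append((m.group(1), m.group(2)))
    return result


sample = """\
Feb 6 22:22:17 newimapn kernel: drbd: initialised. Version: 8.3.0 (api:88/proto:86-89)
Feb 6 22:48:47 newimapn kernel: drbd: module cleanup done.
Feb 6 22:51:31 newimapn kernel: drbd: initialised. Version: 8.2.7 (api:88/proto:86-88)
Feb 6 23:04:01 newimapn kernel: drbd: initialised. Version: 8.2.0 (api:86/proto:86-87)
"""
for when, version in loaded_versions(sample):
    print(when, version)
```

Run against the full messages file, an 8.3.0 entry confirms that the
module was actually loaded at that time.<br>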
<br>
</body>
</html>