Lars Ellenberg wrote:
> On Mon, Feb 09, 2009 at 01:04:37PM -0800, John Du wrote:
>
>>>> I still do not understand why iostat only shows DRBD devices on this
>>>> particular node with 8.2.7 and 8.3.0 but not other nodes with the
>>>> same hardware, same Linux Kernel and same DRBD version.
>>>>
>>> io stats accounting was introduced only in drbd-8.0.12 respective 8.2.6.
>>> if you don't see drbd in iostats, you probably use an older DRBD version.
>>>
>> I obviously did not make myself clear. We were running 8.3 on six nodes
>> and only this node showed DRBD in iostat and only this node was having
>> the problem I reported. I reverted to 8.2 on this node to make our
>> production going.
>>
>
> so you say
> six nodes.
> same hardware. same linux kernel. same drbd.
> but ONE node behaves different.
>
> pretty non-deterministic behaviour for software.
>

Yes. Everything is identical. Only this node works with 8.2 but not with 8.3. I know it is hard to believe; it is hard for me to believe too. Assuming something is different on this node, what difference could make DRBD 8.3 fail while 8.2 works? Is it possible that 8.3 reads the meta data differently than 8.2? According to your message, iostat should show DRBD devices with 8.3, yet it does not on any of the other five nodes.

> I doubt I can help, as if that is true,
> circumstantial evidence suggests that it has nothing to do with drbd,
> but everything to do with whatever makes the non-behaving node behave
> different.
>
> though my guess is
> that either these nodes are not all that identical as you think they are.
> or you installed the new kernel module, but did not actually reload it.
>

The log from the problematic node is below. You can see it went from 8.3.0 to 8.2.7 to 8.2.0. You cannot see from the log that the server was slow, though; trust me, it was very, very slow. I ran all of these DRBD versions with the same config file shown in my original message.

Feb 6 22:22:17 newimapn kernel: drbd: initialised. Version: 8.3.0 (api:88/proto:86-89)
Feb 6 22:22:17 newimapn kernel: drbd: GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by root at newimapr, 2009-02-02 23:57:10
Feb 6 22:22:17 newimapn kernel: drbd: registered as block device major 147
Feb 6 22:22:17 newimapn kernel: drbd: minor_table @ 0xffff81021f0294c0
Feb 6 22:22:17 newimapn kernel: drbd1: disk( Diskless -> Attaching )
Feb 6 22:22:17 newimapn kernel: drbd1: Starting worker thread (from cqueue/3 [257])
Feb 6 22:22:17 newimapn kernel: klogd 1.4.1, ---------- state change ----------
Feb 6 22:22:17 newimapn kernel: drbd1: Found 4 transactions (192 active extents) in activity log.
Feb 6 22:22:17 newimapn kernel: drbd1: Method to ensure write ordering: barrier
Feb 6 22:22:17 newimapn kernel: drbd1: max_segment_size ( = BIO size ) = 32768
Feb 6 22:22:17 newimapn kernel: drbd1: drbd_bm_resize called with capacity == 2571204968
Feb 6 22:22:17 newimapn kernel: drbd1: resync bitmap: bits=321400621 words=5021885
Feb 6 22:22:17 newimapn kernel: drbd1: size = 1226 GB (1285602484 KB)
Feb 6 22:22:17 newimapn kernel: drbd1: recounting of set bits took additional 43 jiffies
Feb 6 22:22:17 newimapn kernel: drbd1: 148 KB (37 bits) marked out-of-sync by on disk bit-map.
Feb 6 22:22:17 newimapn kernel: drbd1: disk( Attaching -> UpToDate )
Feb 6 22:22:17 newimapn kernel: drbd1: conn( StandAlone -> Unconnected )
Feb 6 22:22:17 newimapn kernel: drbd1: Starting receiver thread (from drbd1_worker [5507])
Feb 6 22:22:17 newimapn kernel: drbd1: receiver (re)started
Feb 6 22:22:17 newimapn kernel: drbd1: conn( Unconnected -> WFConnection )
Feb 6 22:22:52 newimapn kernel: drbd1: role( Secondary -> Primary )
Feb 6 22:22:53 newimapn kernel: kjournald starting. Commit interval 5 seconds
Feb 6 22:22:53 newimapn kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Feb 6 22:22:53 newimapn kernel: EXT3 FS on drbd1, internal journal
Feb 6 22:22:53 newimapn kernel: EXT3-fs: mounted filesystem with ordered data mode.
Feb 6 22:23:07 newimapn avahi-daemon[4967]: Withdrawing address record for 10.100.2.239 on eth0.
Feb 6 22:23:07 newimapn avahi-daemon[4967]: Leaving mDNS multicast group on interface eth0.IPv4 with address 10.100.2.239.
Feb 6 22:23:07 newimapn avahi-daemon[4967]: Joining mDNS multicast group on interface eth0.IPv4 with address 10.100.2.232.
Feb 6 22:23:09 newimapn kernel: drbd1: role( Primary -> Secondary )
Feb 6 22:24:45 newimapn kernel: drbd1: role( Secondary -> Primary )
Feb 6 22:24:45 newimapn kernel: kjournald starting. Commit interval 5 seconds
Feb 6 22:24:45 newimapn kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Feb 6 22:24:45 newimapn kernel: EXT3 FS on drbd1, internal journal
Feb 6 22:24:45 newimapn kernel: EXT3-fs: mounted filesystem with ordered data mode.
Feb 6 22:24:45 newimapn avahi-daemon[4967]: Registering new address record for 10.100.2.239 on eth0.
Feb 6 22:24:45 newimapn avahi-daemon[4967]: Withdrawing address record for 10.100.2.239 on eth0.
Feb 6 22:24:45 newimapn avahi-daemon[4967]: Registering new address record for 10.100.2.239 on eth0.
Feb 6 22:25:01 newimapn ntpd[4078]: synchronized to 10.100.2.249, stratum 2
Feb 6 22:25:03 newimapn avahi-daemon[4967]: Withdrawing address record for 10.100.2.239 on eth0.
Feb 6 22:25:04 newimapn kernel: drbd1: role( Primary -> Secondary )
Feb 6 22:26:08 newimapn kernel: drbd1: role( Secondary -> Primary )
Feb 6 22:26:58 newimapn kernel: kjournald starting. Commit interval 5 seconds
Feb 6 22:26:58 newimapn kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Feb 6 22:26:58 newimapn kernel: EXT3 FS on drbd1, internal journal
Feb 6 22:26:58 newimapn kernel: EXT3-fs: mounted filesystem with ordered data mode.
Feb 6 22:28:15 newimapn ntpd[4078]: synchronized to 70.86.250.6, stratum 2
Feb 6 22:28:18 newimapn ntpd[4078]: synchronized to 63.240.161.99, stratum 2
Feb 6 22:29:11 newimapn avahi-daemon[4967]: Registering new address record for 10.100.2.239 on eth0.
Feb 6 22:29:11 newimapn avahi-daemon[4967]: Withdrawing address record for 10.100.2.239 on eth0.
Feb 6 22:29:11 newimapn avahi-daemon[4967]: Registering new address record for 10.100.2.239 on eth0.
Feb 6 22:31:24 newimapn ntpd[4078]: synchronized to 70.86.250.6, stratum 2
Feb 6 22:36:43 newimapn ntpd[4078]: synchronized to 64.247.17.251, stratum 2
Feb 6 22:42:35 newimapn httpd: nss_ldap: reconnected to LDAP server ldap://ldap after 1 attempt
Feb 6 22:43:47 newimapn httpd: nss_ldap: reconnected to LDAP server ldap://ldap after 1 attempt
Feb 6 22:45:46 newimapn httpd: nss_ldap: reconnected to LDAP server ldap://ldap after 1 attempt
Feb 6 22:48:18 newimapn avahi-daemon[4967]: Withdrawing address record for 10.100.2.239 on eth0.
Feb 6 22:48:19 newimapn kernel: drbd1: role( Primary -> Secondary )
Feb 6 22:48:47 newimapn kernel: drbd1: conn( WFConnection -> Disconnecting )
Feb 6 22:48:47 newimapn kernel: drbd1: Discarding network configuration.
Feb 6 22:48:47 newimapn kernel: drbd1: Connection closed
Feb 6 22:48:47 newimapn kernel: drbd1: conn( Disconnecting -> StandAlone )
Feb 6 22:48:47 newimapn kernel: drbd1: receiver terminated
Feb 6 22:48:47 newimapn kernel: drbd1: Terminating receiver thread
Feb 6 22:48:47 newimapn kernel: drbd1: disk( UpToDate -> Diskless )
Feb 6 22:48:47 newimapn kernel: drbd1: drbd_bm_resize called with capacity == 0
Feb 6 22:48:47 newimapn kernel: drbd1: worker terminated
Feb 6 22:48:47 newimapn kernel: drbd1: Terminating worker thread
Feb 6 22:48:47 newimapn kernel: drbd: module cleanup done.
Feb 6 22:51:31 newimapn kernel: drbd: initialised. Version: 8.2.7 (api:88/proto:86-88)
Feb 6 22:51:31 newimapn kernel: drbd: GIT-hash: 61b7f4c2fc34fe3d2acf7be6bcc1fc2684708a7d build by root at newimapn, 2009-02-06 22:33:19
Feb 6 22:51:31 newimapn kernel: drbd: registered as block device major 147
Feb 6 22:51:31 newimapn kernel: drbd: minor_table @ 0xffff81023c700480
Feb 6 22:51:31 newimapn kernel: drbd1: disk( Diskless -> Attaching )
Feb 6 22:51:31 newimapn kernel: drbd1: Starting worker thread (from cqueue/5 [259])
Feb 6 22:51:31 newimapn kernel: klogd 1.4.1, ---------- state change ----------
Feb 6 22:51:31 newimapn kernel: drbd1: Found 4 transactions (192 active extents) in activity log.
Feb 6 22:51:31 newimapn kernel: drbd1: Method to ensure write ordering: barrier
Feb 6 22:51:31 newimapn kernel: drbd1: max_segment_size ( = BIO size ) = 32768
Feb 6 22:51:31 newimapn kernel: drbd1: drbd_bm_resize called with capacity == 2571204968
Feb 6 22:51:31 newimapn kernel: drbd1: resync bitmap: bits=321400621 words=5021885
Feb 6 22:51:31 newimapn kernel: drbd1: size = 1226 GB (1285602484 KB)
Feb 6 22:51:31 newimapn kernel: drbd1: recounting of set bits took additional 40 jiffies
Feb 6 22:51:31 newimapn kernel: drbd1: 10 MB (2684 bits) marked out-of-sync by on disk bit-map.
Feb 6 22:51:31 newimapn kernel: drbd1: disk( Attaching -> UpToDate )
Feb 6 22:51:31 newimapn kernel: drbd1: conn( StandAlone -> Unconnected )
Feb 6 22:51:31 newimapn kernel: drbd1: Starting receiver thread (from drbd1_worker [7544])
Feb 6 22:51:31 newimapn kernel: drbd1: receiver (re)started
Feb 6 22:51:31 newimapn kernel: drbd1: conn( Unconnected -> WFConnection )
Feb 6 22:52:19 newimapn kernel: drbd1: role( Secondary -> Primary )
Feb 6 22:52:19 newimapn kernel: kjournald starting. Commit interval 5 seconds
Feb 6 22:52:19 newimapn kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Feb 6 22:52:19 newimapn kernel: EXT3 FS on drbd1, internal journal
Feb 6 22:52:19 newimapn kernel: EXT3-fs: mounted filesystem with ordered data mode.
Feb 6 22:52:20 newimapn avahi-daemon[4967]: Registering new address record for 10.100.2.239 on eth0.
Feb 6 22:52:20 newimapn avahi-daemon[4967]: Withdrawing address record for 10.100.2.239 on eth0.
Feb 6 22:52:20 newimapn avahi-daemon[4967]: Registering new address record for 10.100.2.239 on eth0.
Feb 6 22:57:07 newimapn ntpd[4078]: synchronized to 70.86.250.6, stratum 2
Feb 6 23:00:13 newimapn avahi-daemon[4967]: Withdrawing address record for 10.100.2.239 on eth0.
Feb 6 23:00:15 newimapn kernel: drbd1: role( Primary -> Secondary )
Feb 6 23:01:01 newimapn kernel: drbd1: conn( WFConnection -> Disconnecting )
Feb 6 23:01:01 newimapn kernel: drbd1: Discarding network configuration.
Feb 6 23:01:01 newimapn kernel: drbd1: Connection closed
Feb 6 23:01:01 newimapn kernel: drbd1: conn( Disconnecting -> StandAlone )
Feb 6 23:01:01 newimapn kernel: drbd1: receiver terminated
Feb 6 23:01:01 newimapn kernel: drbd1: Terminating receiver thread
Feb 6 23:01:01 newimapn kernel: drbd1: disk( UpToDate -> Diskless )
Feb 6 23:01:01 newimapn kernel: drbd1: drbd_bm_resize called with capacity == 0
Feb 6 23:01:01 newimapn kernel: drbd1: worker terminated
Feb 6 23:01:01 newimapn kernel: drbd1: Terminating worker thread
Feb 6 23:01:01 newimapn kernel: drbd: module cleanup done.
Feb 6 23:04:01 newimapn kernel: drbd: initialised. Version: 8.2.0 (api:86/proto:86-87)
Feb 6 23:04:01 newimapn kernel: drbd: SVN Revision: 3079 build by root at newimapn, 2009-02-06 22:58:05
Feb 6 23:04:01 newimapn kernel: drbd: registered as block device major 147
Feb 6 23:04:01 newimapn kernel: drbd: minor_table @ 0xffff81023c700c80
Feb 6 23:04:01 newimapn kernel: drbd1: disk( Diskless -> Attaching )
Feb 6 23:04:01 newimapn kernel: klogd 1.4.1, ---------- state change ----------
Feb 6 23:04:01 newimapn kernel: drbd1: Found 4 transactions (52 active extents) in activity log.
Feb 6 23:04:01 newimapn kernel: drbd1: max_segment_size ( = BIO size ) = 32768
Feb 6 23:04:01 newimapn kernel: drbd1: drbd_bm_resize called with capacity == 2571204968
Feb 6 23:04:01 newimapn kernel: drbd1: resync bitmap: bits=321400621 words=5021885
Feb 6 23:04:01 newimapn kernel: drbd1: size = 1226 GB (1285602484 KB)
Feb 6 23:04:02 newimapn kernel: drbd1: reading of bitmap took 198 jiffies
Feb 6 23:04:02 newimapn kernel: drbd1: recounting of set bits took additional 39 jiffies
Feb 6 23:04:02 newimapn kernel: drbd1: 11 MB marked out-of-sync by on disk bit-map.
Feb 6 23:04:02 newimapn kernel: drbd1: disk( Attaching -> UpToDate )
Feb 6 23:04:02 newimapn kernel: drbd1: Writing meta data super block now.
Feb 6 23:04:02 newimapn kernel: drbd1: conn( StandAlone -> Unconnected )
Feb 6 23:04:02 newimapn kernel: drbd1: receiver (re)started
Feb 6 23:04:02 newimapn kernel: drbd1: conn( Unconnected -> WFConnection )
Feb 6 23:04:28 newimapn ntpd[4078]: synchronized to 63.240.161.99, stratum 2
Feb 6 23:04:46 newimapn kernel: drbd1: role( Secondary -> Primary )
Feb 6 23:04:46 newimapn kernel: drbd1: Writing meta data super block now.
Feb 6 23:04:46 newimapn kernel: kjournald starting. Commit interval 5 seconds
Feb 6 23:04:46 newimapn kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Feb 6 23:04:46 newimapn kernel: EXT3 FS on drbd1, internal journal
Feb 6 23:04:46 newimapn kernel: EXT3-fs: mounted filesystem with ordered data mode.
Feb 6 23:04:46 newimapn avahi-daemon[4967]: Registering new address record for 10.100.2.239 on eth0.
Feb 6 23:04:46 newimapn avahi-daemon[4967]: Withdrawing address record for 10.100.2.239 on eth0.
Feb 6 23:04:46 newimapn avahi-daemon[4967]: Registering new address record for 10.100.2.239 on eth0.
Feb 6 23:13:23 newimapn ntpd[4078]: synchronized to 70.86.250.6, stratum 2
Feb 6 23:16:24 newimapn kernel: drbd1: conn( WFConnection -> WFReportParams )
Feb 6 23:16:24 newimapn kernel: drbd1: Handshake successful: Agreed network protocol version 87
Feb 6 23:16:24 newimapn kernel: drbd1: data-integrity-alg:
Feb 6 23:16:24 newimapn kernel: drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Feb 6 23:16:43 newimapn kernel: drbd1: Writing meta data super block now.
Feb 6 23:16:43 newimapn kernel: drbd1: BUG! md_sync_timer expired! Worker calls drbd_md_sync().
Feb 6 23:16:54 newimapn kernel: drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Feb 6 23:16:54 newimapn kernel: drbd1: Began resync as SyncSource (will sync 13052 KB [3263 bits set]).
Feb 6 23:16:54 newimapn kernel: drbd1: Writing meta data super block now.
Feb 6 23:17:12 newimapn kernel: drbd1: Resync done (total 18 sec; paused 0 sec; 724 K/sec)
Feb 6 23:17:12 newimapn kernel: drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Feb 6 23:17:12 newimapn kernel: drbd1: Writing meta data super block now.
Feb 6 23:18:12 newimapn kernel: drbd1: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
Feb 6 23:18:12 newimapn kernel: drbd1: Creating new current UUID
Feb 6 23:18:12 newimapn kernel: drbd1: Writing meta data super block now.
Feb 6 23:18:12 newimapn kernel: drbd1: asender terminated
Feb 6 23:18:12 newimapn kernel: drbd1: tl_clear()
Feb 6 23:18:12 newimapn kernel: drbd1: Connection closed
Feb 6 23:18:12 newimapn kernel: drbd1: conn( TearDown -> Unconnected )
Feb 6 23:18:12 newimapn kernel: drbd1: receiver terminated
Feb 6 23:18:12 newimapn kernel: drbd1: receiver (re)started
Feb 6 23:18:12 newimapn kernel: drbd1: conn( Unconnected -> WFConnection )

> if you still have the kernel logs, double check whether you find the
> "drbd: initialised. Version 8.3.0 ..." line.
>
> if not, you never loaded nor used nor benchmarked against 8.3.0.
>
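Lars's suggestion to look for the "drbd: initialised. Version: ..." line can be scripted when the logs are long. What follows is a minimal Python sketch, not a DRBD tool: it assumes the classic syslog layout seen in the excerpt above (month, day, time, host, then `kernel:`), and the helper names `drbd_module_loads` and `was_loaded` are hypothetical. It only shows which module versions were initialised and when; it says nothing about whether a given version behaved correctly.

```python
import re
from typing import Iterable, List, Tuple

# Matches kernel log lines such as:
#   Feb 6 22:22:17 newimapn kernel: drbd: initialised. Version: 8.3.0 (api:88/proto:86-89)
# Assumption: classic syslog prefix (month, day, HH:MM:SS, host). Adjust the
# pattern for journald or other timestamp formats.
INIT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"drbd: initialised\. Version: (?P<version>[\d.]+)"
)


def drbd_module_loads(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, version) for each DRBD module load, in log order."""
    loads = []
    for line in lines:
        match = INIT_RE.match(line)
        if match:
            loads.append((match.group("stamp"), match.group("version")))
    return loads


def was_loaded(lines: Iterable[str], version: str) -> bool:
    """True if the log shows a module with exactly this version being initialised."""
    return any(v == version for _, v in drbd_module_loads(lines))
```

For example, `drbd_module_loads(open("/var/log/messages"))` (the log path is an assumption; it varies by distribution) returns every load in order. Applied to the excerpt above, it would report the 8.3.0, 8.2.7 and 8.2.0 loads in that sequence.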