Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
John Du wrote:
> Lars Ellenberg wrote:
>> On Mon, Feb 09, 2009 at 01:04:37PM -0800, John Du wrote:
>>
>>>>> I still do not understand why iostat only shows DRBD devices on this
>>>>> particular node with 8.2.7 and 8.3.0 but not other nodes with the
>>>>> same hardware, same Linux Kernel and same DRBD version.
>>>>>
>>>> io stats accounting was introduced only in drbd-8.0.12 respective 8.2.6.
>>>> if you don't see drbd in iostats, you probably use an older DRBD version.
>>>>
>>> I obviously did not make myself clear. We were running 8.3 on six nodes
>>> and only this node showed DRBD in iostat and only this node was having
>>> the problem I reported. I reverted to 8.2 on this node to make our
>>> production going.
>>>
>>
>> so you say
>> six nodes.
>> same hardware. same linux kernel. same drbd.
>> but ONE node behaves different.
>>
>> pretty non-deterministic behaviour for software.
>>
> Yes. Everything is identical. Only this node works with 8.2 but not
> 8.3. I know it is hard to believe. It is hard for me to believe too.
> Assume something is different on this node, what difference would make
> DRBD 8.3 not work but 8.2 do? is that possible that 8.3 sees the meta
> data differently than 8.2?
>
> According to your message, iostat should show DRBD with 8.3. But it
> does not on all of the other five nodes.
>> I doubt I can help, as if that is true,
>> circumstantial evidence suggests that it has nothing to do with drbd,
>> but everything to do with whatever makes the non-behaving node behave
>> different.
>>
>> though my guess is
>> that either these nodes are not all that identical as you think they are.
>> or you installed the new kernel module, but did not actually reload it.
>>
> The log from the problematic node is as follows: You can see it went
> from 8.3.0 to 8.2.7 to 8.2.0. You cannot see the server was slow from
> the log though. Trust me, it was very very slow.
> Also I ran the different versions of DRBD with the same config file
> shown in my original message.
>
> Feb 6 22:22:17 newimapn kernel: drbd: initialised. Version: 8.3.0 (api:88/proto:86-89)
> Feb 6 22:22:17 newimapn kernel: drbd: GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by root at newimapr, 2009-02-02 23:57:10
> Feb 6 22:22:17 newimapn kernel: drbd: registered as block device major 147
> Feb 6 22:22:17 newimapn kernel: drbd: minor_table @ 0xffff81021f0294c0
> Feb 6 22:22:17 newimapn kernel: drbd1: disk( Diskless -> Attaching )
> Feb 6 22:22:17 newimapn kernel: drbd1: Starting worker thread (from cqueue/3 [257])
> Feb 6 22:22:17 newimapn kernel: klogd 1.4.1, ---------- state change ----------
> Feb 6 22:22:17 newimapn kernel: drbd1: Found 4 transactions (192 active extents) in activity log.
> Feb 6 22:22:17 newimapn kernel: drbd1: Method to ensure write ordering: barrier
> Feb 6 22:22:17 newimapn kernel: drbd1: max_segment_size ( = BIO size ) = 32768
> Feb 6 22:22:17 newimapn kernel: drbd1: drbd_bm_resize called with capacity == 2571204968
> Feb 6 22:22:17 newimapn kernel: drbd1: resync bitmap: bits=321400621 words=5021885
> Feb 6 22:22:17 newimapn kernel: drbd1: size = 1226 GB (1285602484 KB)
> Feb 6 22:22:17 newimapn kernel: drbd1: recounting of set bits took additional 43 jiffies
> Feb 6 22:22:17 newimapn kernel: drbd1: 148 KB (37 bits) marked out-of-sync by on disk bit-map.
> Feb 6 22:22:17 newimapn kernel: drbd1: disk( Attaching -> UpToDate )
> Feb 6 22:22:17 newimapn kernel: drbd1: conn( StandAlone -> Unconnected )
> Feb 6 22:22:17 newimapn kernel: drbd1: Starting receiver thread (from drbd1_worker [5507])
> Feb 6 22:22:17 newimapn kernel: drbd1: receiver (re)started
> Feb 6 22:22:17 newimapn kernel: drbd1: conn( Unconnected -> WFConnection )
> Feb 6 22:22:52 newimapn kernel: drbd1: role( Secondary -> Primary )
> Feb 6 22:22:53 newimapn kernel: kjournald starting. Commit interval 5 seconds
> Feb 6 22:22:53 newimapn kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
> Feb 6 22:22:53 newimapn kernel: EXT3 FS on drbd1, internal journal
> Feb 6 22:22:53 newimapn kernel: EXT3-fs: mounted filesystem with ordered data mode.
> Feb 6 22:23:07 newimapn avahi-daemon[4967]: Withdrawing address record for 10.100.2.239 on eth0.
> Feb 6 22:23:07 newimapn avahi-daemon[4967]: Leaving mDNS multicast group on interface eth0.IPv4 with address 10.100.2.239.
> Feb 6 22:23:07 newimapn avahi-daemon[4967]: Joining mDNS multicast group on interface eth0.IPv4 with address 10.100.2.232.
> Feb 6 22:23:09 newimapn kernel: drbd1: role( Primary -> Secondary )
> Feb 6 22:24:45 newimapn kernel: drbd1: role( Secondary -> Primary )
> Feb 6 22:24:45 newimapn kernel: kjournald starting. Commit interval 5 seconds
> Feb 6 22:24:45 newimapn kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
> Feb 6 22:24:45 newimapn kernel: EXT3 FS on drbd1, internal journal
> Feb 6 22:24:45 newimapn kernel: EXT3-fs: mounted filesystem with ordered data mode.
> Feb 6 22:24:45 newimapn avahi-daemon[4967]: Registering new address record for 10.100.2.239 on eth0.
> Feb 6 22:24:45 newimapn avahi-daemon[4967]: Withdrawing address record for 10.100.2.239 on eth0.
> Feb 6 22:24:45 newimapn avahi-daemon[4967]: Registering new address record for 10.100.2.239 on eth0.
> Feb 6 22:25:01 newimapn ntpd[4078]: synchronized to 10.100.2.249, stratum 2
> Feb 6 22:25:03 newimapn avahi-daemon[4967]: Withdrawing address record for 10.100.2.239 on eth0.
> Feb 6 22:25:04 newimapn kernel: drbd1: role( Primary -> Secondary )
> Feb 6 22:26:08 newimapn kernel: drbd1: role( Secondary -> Primary )
> Feb 6 22:26:58 newimapn kernel: kjournald starting. Commit interval 5 seconds
> Feb 6 22:26:58 newimapn kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
> Feb 6 22:26:58 newimapn kernel: EXT3 FS on drbd1, internal journal
> Feb 6 22:26:58 newimapn kernel: EXT3-fs: mounted filesystem with ordered data mode.
> Feb 6 22:28:15 newimapn ntpd[4078]: synchronized to 70.86.250.6, stratum 2
> Feb 6 22:28:18 newimapn ntpd[4078]: synchronized to 63.240.161.99, stratum 2
> Feb 6 22:29:11 newimapn avahi-daemon[4967]: Registering new address record for 10.100.2.239 on eth0.
> Feb 6 22:29:11 newimapn avahi-daemon[4967]: Withdrawing address record for 10.100.2.239 on eth0.
> Feb 6 22:29:11 newimapn avahi-daemon[4967]: Registering new address record for 10.100.2.239 on eth0.
> Feb 6 22:31:24 newimapn ntpd[4078]: synchronized to 70.86.250.6, stratum 2
> Feb 6 22:36:43 newimapn ntpd[4078]: synchronized to 64.247.17.251, stratum 2
> Feb 6 22:42:35 newimapn httpd: nss_ldap: reconnected to LDAP server ldap://ldap after 1 attempt
> Feb 6 22:43:47 newimapn httpd: nss_ldap: reconnected to LDAP server ldap://ldap after 1 attempt
> Feb 6 22:45:46 newimapn httpd: nss_ldap: reconnected to LDAP server ldap://ldap after 1 attempt
> Feb 6 22:48:18 newimapn avahi-daemon[4967]: Withdrawing address record for 10.100.2.239 on eth0.
> Feb 6 22:48:19 newimapn kernel: drbd1: role( Primary -> Secondary )
> Feb 6 22:48:47 newimapn kernel: drbd1: conn( WFConnection -> Disconnecting )
> Feb 6 22:48:47 newimapn kernel: drbd1: Discarding network configuration.
> Feb 6 22:48:47 newimapn kernel: drbd1: Connection closed
> Feb 6 22:48:47 newimapn kernel: drbd1: conn( Disconnecting -> StandAlone )
> Feb 6 22:48:47 newimapn kernel: drbd1: receiver terminated
> Feb 6 22:48:47 newimapn kernel: drbd1: Terminating receiver thread
> Feb 6 22:48:47 newimapn kernel: drbd1: disk( UpToDate -> Diskless )
> Feb 6 22:48:47 newimapn kernel: drbd1: drbd_bm_resize called with capacity == 0
> Feb 6 22:48:47 newimapn kernel: drbd1: worker terminated
> Feb 6 22:48:47 newimapn kernel: drbd1: Terminating worker thread
> Feb 6 22:48:47 newimapn kernel: drbd: module cleanup done.
> Feb 6 22:51:31 newimapn kernel: drbd: initialised. Version: 8.2.7 (api:88/proto:86-88)
> Feb 6 22:51:31 newimapn kernel: drbd: GIT-hash: 61b7f4c2fc34fe3d2acf7be6bcc1fc2684708a7d build by root at newimapn, 2009-02-06 22:33:19
> Feb 6 22:51:31 newimapn kernel: drbd: registered as block device major 147
> Feb 6 22:51:31 newimapn kernel: drbd: minor_table @ 0xffff81023c700480
> Feb 6 22:51:31 newimapn kernel: drbd1: disk( Diskless -> Attaching )
> Feb 6 22:51:31 newimapn kernel: drbd1: Starting worker thread (from cqueue/5 [259])
> Feb 6 22:51:31 newimapn kernel: klogd 1.4.1, ---------- state change ----------
> Feb 6 22:51:31 newimapn kernel: drbd1: Found 4 transactions (192 active extents) in activity log.
> Feb 6 22:51:31 newimapn kernel: drbd1: Method to ensure write ordering: barrier
> Feb 6 22:51:31 newimapn kernel: drbd1: max_segment_size ( = BIO size ) = 32768
> Feb 6 22:51:31 newimapn kernel: drbd1: drbd_bm_resize called with capacity == 2571204968
> Feb 6 22:51:31 newimapn kernel: drbd1: resync bitmap: bits=321400621 words=5021885
> Feb 6 22:51:31 newimapn kernel: drbd1: size = 1226 GB (1285602484 KB)
> Feb 6 22:51:31 newimapn kernel: drbd1: recounting of set bits took additional 40 jiffies
> Feb 6 22:51:31 newimapn kernel: drbd1: 10 MB (2684 bits) marked out-of-sync by on disk bit-map.
> Feb 6 22:51:31 newimapn kernel: drbd1: disk( Attaching -> UpToDate )
> Feb 6 22:51:31 newimapn kernel: drbd1: conn( StandAlone -> Unconnected )
> Feb 6 22:51:31 newimapn kernel: drbd1: Starting receiver thread (from drbd1_worker [7544])
> Feb 6 22:51:31 newimapn kernel: drbd1: receiver (re)started
> Feb 6 22:51:31 newimapn kernel: drbd1: conn( Unconnected -> WFConnection )
> Feb 6 22:52:19 newimapn kernel: drbd1: role( Secondary -> Primary )
> Feb 6 22:52:19 newimapn kernel: kjournald starting. Commit interval 5 seconds
> Feb 6 22:52:19 newimapn kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
> Feb 6 22:52:19 newimapn kernel: EXT3 FS on drbd1, internal journal
> Feb 6 22:52:19 newimapn kernel: EXT3-fs: mounted filesystem with ordered data mode.
> Feb 6 22:52:20 newimapn avahi-daemon[4967]: Registering new address record for 10.100.2.239 on eth0.
> Feb 6 22:52:20 newimapn avahi-daemon[4967]: Withdrawing address record for 10.100.2.239 on eth0.
> Feb 6 22:52:20 newimapn avahi-daemon[4967]: Registering new address record for 10.100.2.239 on eth0.
> Feb 6 22:57:07 newimapn ntpd[4078]: synchronized to 70.86.250.6, stratum 2
> Feb 6 23:00:13 newimapn avahi-daemon[4967]: Withdrawing address record for 10.100.2.239 on eth0.
> Feb 6 23:00:15 newimapn kernel: drbd1: role( Primary -> Secondary )
> Feb 6 23:01:01 newimapn kernel: drbd1: conn( WFConnection -> Disconnecting )
> Feb 6 23:01:01 newimapn kernel: drbd1: Discarding network configuration.
> Feb 6 23:01:01 newimapn kernel: drbd1: Connection closed
> Feb 6 23:01:01 newimapn kernel: drbd1: conn( Disconnecting -> StandAlone )
> Feb 6 23:01:01 newimapn kernel: drbd1: receiver terminated
> Feb 6 23:01:01 newimapn kernel: drbd1: Terminating receiver thread
> Feb 6 23:01:01 newimapn kernel: drbd1: disk( UpToDate -> Diskless )
> Feb 6 23:01:01 newimapn kernel: drbd1: drbd_bm_resize called with capacity == 0
> Feb 6 23:01:01 newimapn kernel: drbd1: worker terminated
> Feb 6 23:01:01 newimapn kernel: drbd1: Terminating worker thread
> Feb 6 23:01:01 newimapn kernel: drbd: module cleanup done.
> Feb 6 23:04:01 newimapn kernel: drbd: initialised. Version: 8.2.0 (api:86/proto:86-87)
> Feb 6 23:04:01 newimapn kernel: drbd: SVN Revision: 3079 build by root at newimapn, 2009-02-06 22:58:05
> Feb 6 23:04:01 newimapn kernel: drbd: registered as block device major 147
> Feb 6 23:04:01 newimapn kernel: drbd: minor_table @ 0xffff81023c700c80
> Feb 6 23:04:01 newimapn kernel: drbd1: disk( Diskless -> Attaching )
> Feb 6 23:04:01 newimapn kernel: klogd 1.4.1, ---------- state change ----------
> Feb 6 23:04:01 newimapn kernel: drbd1: Found 4 transactions (52 active extents) in activity log.
> Feb 6 23:04:01 newimapn kernel: drbd1: max_segment_size ( = BIO size ) = 32768
> Feb 6 23:04:01 newimapn kernel: drbd1: drbd_bm_resize called with capacity == 2571204968
> Feb 6 23:04:01 newimapn kernel: drbd1: resync bitmap: bits=321400621 words=5021885
> Feb 6 23:04:01 newimapn kernel: drbd1: size = 1226 GB (1285602484 KB)
> Feb 6 23:04:02 newimapn kernel: drbd1: reading of bitmap took 198 jiffies
> Feb 6 23:04:02 newimapn kernel: drbd1: recounting of set bits took additional 39 jiffies
> Feb 6 23:04:02 newimapn kernel: drbd1: 11 MB marked out-of-sync by on disk bit-map.
> Feb 6 23:04:02 newimapn kernel: drbd1: disk( Attaching -> UpToDate )
> Feb 6 23:04:02 newimapn kernel: drbd1: Writing meta data super block now.
> Feb 6 23:04:02 newimapn kernel: drbd1: conn( StandAlone -> Unconnected )
> Feb 6 23:04:02 newimapn kernel: drbd1: receiver (re)started
> Feb 6 23:04:02 newimapn kernel: drbd1: conn( Unconnected -> WFConnection )
> Feb 6 23:04:28 newimapn ntpd[4078]: synchronized to 63.240.161.99, stratum 2
> Feb 6 23:04:46 newimapn kernel: drbd1: role( Secondary -> Primary )
> Feb 6 23:04:46 newimapn kernel: drbd1: Writing meta data super block now.
> Feb 6 23:04:46 newimapn kernel: kjournald starting. Commit interval 5 seconds
> Feb 6 23:04:46 newimapn kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
> Feb 6 23:04:46 newimapn kernel: EXT3 FS on drbd1, internal journal
> Feb 6 23:04:46 newimapn kernel: EXT3-fs: mounted filesystem with ordered data mode.
> Feb 6 23:04:46 newimapn avahi-daemon[4967]: Registering new address record for 10.100.2.239 on eth0.
> Feb 6 23:04:46 newimapn avahi-daemon[4967]: Withdrawing address record for 10.100.2.239 on eth0.
> Feb 6 23:04:46 newimapn avahi-daemon[4967]: Registering new address record for 10.100.2.239 on eth0.
> Feb 6 23:13:23 newimapn ntpd[4078]: synchronized to 70.86.250.6, stratum 2
> Feb 6 23:16:24 newimapn kernel: drbd1: conn( WFConnection -> WFReportParams )
> Feb 6 23:16:24 newimapn kernel: drbd1: Handshake successful: Agreed network protocol version 87
> Feb 6 23:16:24 newimapn kernel: drbd1: data-integrity-alg:
> Feb 6 23:16:24 newimapn kernel: drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
> Feb 6 23:16:43 newimapn kernel: drbd1: Writing meta data super block now.
> Feb 6 23:16:43 newimapn kernel: drbd1: BUG! md_sync_timer expired! Worker calls drbd_md_sync().
> Feb 6 23:16:54 newimapn kernel: drbd1: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
> Feb 6 23:16:54 newimapn kernel: drbd1: Began resync as SyncSource (will sync 13052 KB [3263 bits set]).
> Feb 6 23:16:54 newimapn kernel: drbd1: Writing meta data super block now.
> Feb 6 23:17:12 newimapn kernel: drbd1: Resync done (total 18 sec; paused 0 sec; 724 K/sec)
> Feb 6 23:17:12 newimapn kernel: drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
> Feb 6 23:17:12 newimapn kernel: drbd1: Writing meta data super block now.
> Feb 6 23:18:12 newimapn kernel: drbd1: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
> Feb 6 23:18:12 newimapn kernel: drbd1: Creating new current UUID
> Feb 6 23:18:12 newimapn kernel: drbd1: Writing meta data super block now.
> Feb 6 23:18:12 newimapn kernel: drbd1: asender terminated
> Feb 6 23:18:12 newimapn kernel: drbd1: tl_clear()
> Feb 6 23:18:12 newimapn kernel: drbd1: Connection closed
> Feb 6 23:18:12 newimapn kernel: drbd1: conn( TearDown -> Unconnected )
> Feb 6 23:18:12 newimapn kernel: drbd1: receiver terminated
> Feb 6 23:18:12 newimapn kernel: drbd1: receiver (re)started
> Feb 6 23:18:12 newimapn kernel: drbd1: conn( Unconnected -> WFConnection )
>
>> if you still have the kernel logs, double check whether you find the
>> "drbd: initialised. Version 8.3.0 ..." line.
>>
>> if not, you never loaded nor used nor benchmarked against 8.3.0.
>>
> Forgot to show you the kernel version:

[root at newimapn# uname -a
Linux newimapn 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20090209/5b104b60/attachment.htm>
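
Archive note: the check Lars suggests (look for the "drbd: initialised. Version ..." banner in the kernel log to confirm which module was really loaded) can be scripted. What follows is only a minimal sketch in Python. The function name `loaded_versions` and the regular expression are illustrative assumptions, not part of DRBD or its tools, and the sample lines are copied from the log quoted above with the mail wrapping removed.

```python
import re

# Banner printed when the DRBD kernel module is loaded, as seen in the quoted log:
# "Feb 6 22:22:17 newimapn kernel: drbd: initialised. Version: 8.3.0 (api:88/proto:86-89)"
# The pattern is an assumption derived from those lines; the colon after
# "Version" is optional because the thread quotes the banner both ways.
BANNER = re.compile(
    r"(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+kernel:\s+"
    r"drbd: initialised\. Version:?\s*(?P<version>[\w.\-]+)"
)


def loaded_versions(lines):
    """Return (timestamp, host, version) for every DRBD load banner, in log order."""
    found = []
    for line in lines:
        match = BANNER.search(line)  # search() tolerates a leading "> " quote marker
        if match:
            found.append((match["stamp"], match["host"], match["version"]))
    return found


# Sample input: a few entries taken from the log in this message.
sample = [
    "Feb 6 22:22:17 newimapn kernel: drbd: initialised. Version: 8.3.0 (api:88/proto:86-89)",
    "Feb 6 22:22:17 newimapn kernel: drbd: registered as block device major 147",
    "Feb 6 22:48:47 newimapn kernel: drbd: module cleanup done.",
    "Feb 6 22:51:31 newimapn kernel: drbd: initialised. Version: 8.2.7 (api:88/proto:86-88)",
    "Feb 6 23:04:01 newimapn kernel: drbd: initialised. Version: 8.2.0 (api:86/proto:86-87)",
]

for stamp, host, version in loaded_versions(sample):
    print(f"{stamp} {host}: DRBD {version} loaded")
```

To run it against a real log, pass the open file object instead of `sample`. The log path differs between distributions, so none is assumed here. If a version never appears in the result, that module was never loaded on that node, which is the case Lars describes.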