Hello everyone,

I am still testing DRBD + OCFS2 with both nodes as primaries (dual-primary), and I noticed that when the network interface is close to saturated, DRBD drops the connection.

Nodes A and B both run CentOS 6 with drbd-8.3.12 built from vanilla source, on kernel 2.6.37.6 with the vserver patch vs2.3.0.37-rc5.

This has happened before. I do not know the cause, but I suspect a network-related problem. Today I was running rsync on a large directory from A to another host (not B), but over the same interface that DRBD uses (that is, I do not have a dedicated interface for DRBD). The rsync took a long time, and I found this in dmesg:

[ 1844.822895] block drbd0: Resync done (total 1 sec; paused 0 sec; 8 K/sec)
[ 1844.822904] block drbd0: updated UUIDs 56D7AD5A0D1052FC:0000000000000000:34133820BE9332EE:34123820BE9332EF
[ 1844.822910] block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
[ 1844.823072] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
[ 1844.825128] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
[ 1844.825584] block drbd0: bitmap WRITE of 22 pages took 0 jiffies
[ 1844.825592] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[ 1893.341247] block drbd0: peer( Secondary -> Primary )
[ 1899.914431] block drbd0: role( Secondary -> Primary )
[ 2639.422664] Loading kernel module for a network device with CAP_SYS_MODULE (deprecated). Use CAP_NET_ADMIN and alias netdev-dummy0 instead
[ 2648.421314] dummy0: no IPv6 routers present
[72370.860513] block drbd0: role( Primary -> Secondary )
[72370.860600] block drbd0: bitmap WRITE of 0 pages took 0 jiffies
[72370.860611] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[72393.260995] block drbd0: Considerable difference in lower level device sizes: 37747512s vs. 6291192s
[72491.794195] block drbd0: role( Secondary -> Primary )
[72566.421735] block drbd0: Considerable difference in lower level device sizes: 37747512s vs. 6291192s
[72664.997280] block drbd0: role( Primary -> Secondary )
[72664.997365] block drbd0: bitmap WRITE of 0 pages took 0 jiffies
[72664.997375] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[72708.334128] block drbd0: Considerable difference in lower level device sizes: 37747512s vs. 6291192s
[72708.334134] block drbd0: drbd_bm_resize called with capacity == 37747512
[72708.334238] block drbd0: resync bitmap: bits=4718439 words=73726 pages=144
[72708.334242] block drbd0: size = 18 GB (18873756 KB)
[72708.334247] block drbd0: Writing the whole bitmap, size changed and md moved
[72708.335349] block drbd0: bitmap WRITE of 120 pages took 1 jiffies
[72708.335357] block drbd0: 15 GB (3932040 bits) marked out-of-sync by on disk bit-map.
[72708.335564] block drbd0: Resync of new storage after online grow
[72708.335573] block drbd0: conn( Connected -> WFSyncUUID ) disk( UpToDate -> Outdated )
[72708.339931] block drbd0: updated sync uuid 067858B97D1E599C:0000000000000000:34133820BE9332EE:34123820BE9332EF
[72708.340130] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
[72708.342435] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
[72708.342443] block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
[72708.342454] block drbd0: Began resync as SyncTarget (will sync 15728160 KB [3932040 bits set]).
[72742.631740] block drbd0: role( Secondary -> Primary )
[75037.775246] block drbd0: Resync done (total 2335 sec; paused 0 sec; 6732 K/sec)
[75037.775254] block drbd0: updated UUIDs 56D7AD5A0D1052FD:0000000000000000:067858B97D1E599D:34133820BE9332EF
[75037.775262] block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
[75037.775455] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
[75037.777818] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
[75037.777837] block drbd0: bitmap WRITE of 0 pages took 0 jiffies
[75037.777846] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[759301.427734] block drbd0: sock was shut down by peer
[759301.427743] block drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
[759301.427751] block drbd0: short read expecting header on sock: r=0
[759301.427793] block drbd0: meta connection shut down by peer.
[759301.427812] block drbd0: asender terminated
[759301.427814] block drbd0: Terminating asender thread
[759301.427848] block drbd0: new current UUID 4E3A8EF514190575:56D7AD5A0D1052FD:067858B97D1E599D:34133820BE9332EF
[759301.428512] block drbd0: Connection closed
[759301.428518] block drbd0: conn( BrokenPipe -> Unconnected )
[759301.428523] block drbd0: receiver terminated
[759301.428526] block drbd0: Restarting receiver thread
[759301.428528] block drbd0: receiver (re)started
[759301.428532] block drbd0: conn( Unconnected -> WFConnection )
[759302.119164] block drbd0: Handshake successful: Agreed network protocol version 96
[759302.119192] block drbd0: conn( WFConnection -> WFReportParams )
[759302.119342] block drbd0: Starting asender thread (from drbd0_receiver [2247])
[759302.119563] block drbd0: data-integrity-alg: <not-used>
[759302.119671] block drbd0: drbd_sync_handshake:
[759302.119676] block drbd0: self 4E3A8EF514190575:56D7AD5A0D1052FD:067858B97D1E599D:34133820BE9332EF bits:0 flags:0
[759302.119681] block drbd0: peer B8D842CF6950FAB3:56D7AD5A0D1052FD:067858B97D1E599C:34133820BE9332EF bits:1 flags:0
[759302.119685] block drbd0: uuid_compare()=100 by rule 90
[759302.119689] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
[759302.121991] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
[759302.121995] block drbd0: Split-Brain detected but unresolved, dropping connection!
[759302.122016] block drbd0: helper command: /sbin/drbdadm split-brain minor-0
[759302.124199] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
[759302.124205] block drbd0: conn( WFReportParams -> Disconnecting )
[759302.124213] block drbd0: error receiving ReportState, l: 4!
[759302.124242] block drbd0: asender terminated
[759302.124247] block drbd0: Terminating asender thread
[759302.124288] block drbd0: Connection closed
[759302.124294] block drbd0: conn( Disconnecting -> StandAlone )
[759302.124401] block drbd0: receiver terminated
[759302.124405] block drbd0: Terminating receiver thread
[842477.436163] block drbd0: role( Primary -> Secondary )
[842477.436245] block drbd0: bitmap WRITE of 0 pages took 0 jiffies
[842477.436256] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[842477.436817] block drbd0: disk( UpToDate -> Failed )
[842477.436906] block drbd0: disk( Failed -> Diskless )
[842477.437531] block drbd0: drbd_bm_resize called with capacity == 0
[842477.437591] block drbd0: worker terminated
[842477.437594] block drbd0: Terminating worker thread
[842477.590493] drbd: module cleanup done.
[842488.657890] drbd: initialized. Version: 8.3.12 (api:88/proto:86-96)
[842488.657893] drbd: GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by root at cosmos, 2011-12-25 16:56:17
[842488.657896] drbd: registered as block device major 147
[842488.657898] drbd: minor_table @ 0xffff88041c35e100
[842488.680349] block drbd0: Starting worker thread (from kworker/u:3 [345])
[842488.680729] block drbd0: disk( Diskless -> Attaching )
[842488.694650] block drbd0: No usable activity log found.
[842488.694655] block drbd0: Method to ensure write ordering: barrier
[842488.694658] block drbd0: max BIO size = 131072
[842488.694664] block drbd0: drbd_bm_resize called with capacity == 37747512
[842488.694806] block drbd0: resync bitmap: bits=4718439 words=73726 pages=144
[842488.694809] block drbd0: size = 18 GB (18873756 KB)
[842488.703982] block drbd0: bitmap READ of 144 pages took 1 jiffies
[842488.704114] block drbd0: recounting of set bits took additional 0 jiffies
[842488.704117] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[842488.704123] block drbd0: disk( Attaching -> UpToDate )
[842488.704127] block drbd0: attached to UUIDs 4E3A8EF514190575:56D7AD5A0D1052FD:067858B97D1E599D:34133820BE9332EF
[842488.707024] block drbd0: conn( StandAlone -> Unconnected )
[842488.707043] block drbd0: Starting receiver thread (from drbd0_worker [20773])
[842488.707244] block drbd0: receiver (re)started
[842488.707253] block drbd0: conn( Unconnected -> WFConnection )
[842489.201687] block drbd0: Handshake successful: Agreed network protocol version 96
[842489.201713] block drbd0: conn( WFConnection -> WFReportParams )
[842489.201905] block drbd0: Starting asender thread (from drbd0_receiver [20782])
[842489.202075] block drbd0: data-integrity-alg: <not-used>
[842489.202090] block drbd0: drbd_sync_handshake:
[842489.202094] block drbd0: self 4E3A8EF514190574:56D7AD5A0D1052FD:067858B97D1E599D:34133820BE9332EF bits:0 flags:0
[842489.202099] block drbd0: peer B8D842CF6950FAB2:56D7AD5A0D1052FD:067858B97D1E599C:34133820BE9332EF bits:36737 flags:0
[842489.202102] block drbd0: uuid_compare()=100 by rule 90
[842489.202107] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
[842489.204369] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
[842489.204374] block drbd0: Split-Brain detected, 0 primaries, automatically solved. Sync from peer node
[842489.204381] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate )
[842489.297122] block drbd0: conn( WFBitMapT -> WFSyncUUID )
[842489.307270] block drbd0: updated sync uuid 56D8AD5A0D1052FC:0000000000000000:067858B97D1E599D:34133820BE9332EF
[842489.307479] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
[842489.309577] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
[842489.309585] block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent )
[842489.309595] block drbd0: Began resync as SyncTarget (will sync 146948 KB [36737 bits set]).
[842508.372386] block drbd0: Resync done (total 19 sec; paused 0 sec; 7732 K/sec)
[842508.372393] block drbd0: updated UUIDs B8D842CF6950FAB2:0000000000000000:56D8AD5A0D1052FC:56D7AD5A0D1052FD
[842508.372398] block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
[842508.372485] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
[842508.374683] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
[842508.375756] block drbd0: bitmap WRITE of 131 pages took 0 jiffies
[842508.375765] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[842525.184360] block drbd0: peer( Secondary -> Primary )
[842565.780288] block drbd0: role( Secondary -> Primary )
[1871271.922857] block drbd0: sock was shut down by peer
[1871271.922866] block drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
[1871271.922873] block drbd0: short read expecting header on sock: r=0
[1871271.922900] block drbd0: new current UUID 2EDBDEEE9284642B:B8D842CF6950FAB3:56D8AD5A0D1052FC:56D7AD5A0D1052FD
[1871271.922908] block drbd0: meta connection shut down by peer.
[1871271.922929] block drbd0: asender terminated
[1871271.922932] block drbd0: Terminating asender thread
[1871271.923606] block drbd0: Connection closed
[1871271.923612] block drbd0: conn( BrokenPipe -> Unconnected )
[1871271.923618] block drbd0: receiver terminated
[1871271.923620] block drbd0: Restarting receiver thread
[1871271.923623] block drbd0: receiver (re)started
[1871271.923627] block drbd0: conn( Unconnected -> WFConnection )
[1871272.616257] block drbd0: Handshake successful: Agreed network protocol version 96
[1871272.616282] block drbd0: conn( WFConnection -> WFReportParams )
[1871272.616433] block drbd0: Starting asender thread (from drbd0_receiver [20782])
[1871272.616598] block drbd0: data-integrity-alg: <not-used>
[1871272.616829] block drbd0: drbd_sync_handshake:
[1871272.616835] block drbd0: self 2EDBDEEE9284642B:B8D842CF6950FAB3:56D8AD5A0D1052FC:56D7AD5A0D1052FD bits:0 flags:0
[1871272.616839] block drbd0: peer C301948203A19D2F:B8D842CF6950FAB3:56D8AD5A0D1052FD:56D7AD5A0D1052FD bits:408 flags:0
[1871272.616843] block drbd0: uuid_compare()=100 by rule 90
[1871272.616848] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
[1871272.619242] block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
[1871272.619247] block drbd0: Split-Brain detected but unresolved, dropping connection!
[1871272.619268] block drbd0: helper command: /sbin/drbdadm split-brain minor-0
[1871272.621532] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
[1871272.621538] block drbd0: conn( WFReportParams -> Disconnecting )
[1871272.621547] block drbd0: error receiving ReportState, l: 4!
[1871272.621627] block drbd0: asender terminated
[1871272.621634] block drbd0: Terminating asender thread
[1871272.621817] block drbd0: Connection closed
[1871272.621825] block drbd0: conn( Disconnecting -> StandAlone )
[1871272.622039] block drbd0: receiver terminated
[1871272.622043] block drbd0: Terminating receiver thread

And now cat /proc/drbd shows:

version: 8.3.12 (api:88/proto:86-96)
GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by root at cosmos, 2011-12-25 16:56:17
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
    ns:0 nr:107240176 dw:107240176 dr:664 al:0 bm:24 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

I do not think this is expected. How can I prevent it?

Thanks in advance,
--
Steve Kieu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20120116/cdeb86bd/attachment.htm>
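All three handshakes in the log end with `uuid_compare()=100 by rule 90` followed by a split-brain message. In each of them, the `self` and `peer` lines show different current UUIDs (first field, e.g. `4E3A8EF514190575` vs. `B8D842CF6950FAB3`) but the same bitmap UUID (second field, `56D7AD5A0D1052FD`). That appears to mean both nodes started a new data generation on their own after the link dropped while both were Primary. The Python sketch below checks a handshake pair for that pattern. It assumes that each line lists the current, bitmap and two history UUIDs in that order and that bit 0 of each UUID is a flag to ignore when comparing. The helper names (`parse_handshake_line`, `looks_like_rule_90`, `UuidSet`) are hypothetical. It mirrors only this one condition. DRBD's own comparison in the module source evaluates several other rules first, so this is an illustration and not a replacement for it.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Matches the "self"/"peer" lines printed after "drbd_sync_handshake:", e.g.
# "[759302.119676] block drbd0: self 4E3A...:56D7...:0678...:3413... bits:0 flags:0"
HANDSHAKE_RE = re.compile(
    r"block (?P<dev>drbd\d+): (?P<side>self|peer) "
    r"(?P<uuids>[0-9A-F]{16}(?::[0-9A-F]{16}){3}) bits:(?P<bits>\d+)"
)


class UuidSet(NamedTuple):
    current: int              # 1st field: current data generation UUID
    bitmap: int               # 2nd field: bitmap UUID
    history: Tuple[int, ...]  # 3rd and 4th fields: history UUIDs
    bits: int                 # out-of-sync bit count reported on the line


def parse_handshake_line(line: str) -> Optional[Tuple[str, UuidSet]]:
    """Return ("self" or "peer", UuidSet) for a handshake line, else None."""
    m = HANDSHAKE_RE.search(line)
    if m is None:
        return None
    values = [int(u, 16) for u in m.group("uuids").split(":")]
    uuids = UuidSet(values[0], values[1], tuple(values[2:]), int(m.group("bits")))
    return m.group("side"), uuids


def looks_like_rule_90(local: UuidSet, peer: UuidSet) -> bool:
    """Simplified check of the rule-90 pattern only (assumption, see text).

    The nodes report different current UUIDs but share the same non-zero
    bitmap UUID. Bit 0 of every UUID is masked off before comparing.
    """
    mask = ~1  # clears the lowest bit of a non-negative int
    current_differs = (local.current & mask) != (peer.current & mask)
    shared_bitmap = (local.bitmap & mask) == (peer.bitmap & mask)
    return current_differs and shared_bitmap and (local.bitmap & mask) != 0


if __name__ == "__main__":
    # The first handshake pair from the dmesg output above.
    log = [
        "[759302.119676] block drbd0: self 4E3A8EF514190575:56D7AD5A0D1052FD:"
        "067858B97D1E599D:34133820BE9332EF bits:0 flags:0",
        "[759302.119681] block drbd0: peer B8D842CF6950FAB3:56D7AD5A0D1052FD:"
        "067858B97D1E599C:34133820BE9332EF bits:1 flags:0",
    ]
    sides = dict(p for p in map(parse_handshake_line, log) if p is not None)
    print(looks_like_rule_90(sides["self"], sides["peer"]))  # True
```

Masking bit 0 matters here: the second handshake reports `4E3A8EF514190574` where the first reported `4E3A8EF514190575`, and the sketch treats those as the same UUID.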