Pascal,

One thing is unclear: did this work in the past (and if so, what has changed recently that could explain this behavior), or is it a new feature you have just added to your customer's config?

Furthermore, I suspect you have scripted this whole process, haven't you? If so, have you identified which step causes the communication disruption? Have you tried running the sequence manually, and if so, at which step does it happen?

Best regards,
Pascal.

From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On behalf of Pascal Charest
Sent: Saturday, 27 August 2011 22:52
To: drbd-user at lists.linbit.com
Subject: [DRBD-user] Frequent disconnect when doing backup.

Hi,

I have a small issue with one of my DRBD setups. When my backup is running (see below for setup and backup details), I am getting these errors:

Aug 27 10:24:18 pig-two -- MARK --
Aug 27 10:27:26 pig-two kernel: drbd0: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Aug 27 10:27:26 pig-two kernel: drbd0: asender terminated
Aug 27 10:27:26 pig-two kernel: drbd0: Terminating asender thread
Aug 27 10:27:26 pig-two kernel: drbd0: sock was reset by peer
Aug 27 10:27:26 pig-two kernel: drbd0: _drbd_send_page: size=4096 len=3064 sent=-32
Aug 27 10:27:26 pig-two kernel: drbd0: Creating new current UUID
Aug 27 10:27:26 pig-two kernel: drbd0: Writing meta data super block now.
Aug 27 10:27:26 pig-two kernel: drbd0: tl_clear()
Aug 27 10:27:26 pig-two kernel: drbd0: Connection closed
Aug 27 10:27:26 pig-two kernel: drbd0: conn( NetworkFailure -> Unconnected )
Aug 27 10:27:26 pig-two kernel: drbd0: receiver terminated
Aug 27 10:27:26 pig-two kernel: drbd0: receiver (re)started
Aug 27 10:27:26 pig-two kernel: drbd0: conn( Unconnected -> WFConnection )
Aug 27 10:27:27 pig-two kernel: drbd0: Handshake successful: Agreed network protocol version 88
Aug 27 10:27:27 pig-two kernel: drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Aug 27 10:27:27 pig-two kernel: drbd0: conn( WFConnection -> WFReportParams )
Aug 27 10:27:27 pig-two kernel: drbd0: Starting asender thread (from drbd0_receiver [3066])
Aug 27 10:27:27 pig-two kernel: drbd0: data-integrity-alg: md5
Aug 27 10:27:27 pig-two kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Aug 27 10:27:27 pig-two kernel: drbd0: Writing meta data super block now.
Aug 27 10:27:27 pig-two kernel: drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Aug 27 10:27:27 pig-two kernel: drbd0: Began resync as SyncSource (will sync 2160 KB [540 bits set]).
Aug 27 10:27:27 pig-two kernel: drbd0: Writing meta data super block now.
Aug 27 10:27:27 pig-two kernel: drbd0: Resync done (total 1 sec; paused 0 sec; 2160 K/sec)
Aug 27 10:27:27 pig-two kernel: drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Aug 27 10:27:27 pig-two kernel: drbd0: Writing meta data super block now.
Aug 27 10:44:19 pig-two -- MARK --

and

Aug 27 11:04:19 pig-two -- MARK --
Aug 27 11:20:36 pig-two kernel: drbd0: _drbd_send_page: size=4096 len=4096 sent=-104
Aug 27 11:20:37 pig-two kernel: drbd0: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Aug 27 11:20:37 pig-two kernel: drbd0: Creating new current UUID
Aug 27 11:20:37 pig-two kernel: drbd0: Writing meta data super block now.
Aug 27 11:20:37 pig-two kernel: drbd0: asender terminated
Aug 27 11:20:37 pig-two kernel: drbd0: Terminating asender thread
Aug 27 11:20:37 pig-two kernel: drbd0: sock was shut down by peer
Aug 27 11:20:37 pig-two kernel: drbd0: tl_clear()
Aug 27 11:20:37 pig-two kernel: drbd0: Connection closed
Aug 27 11:20:37 pig-two kernel: drbd0: conn( NetworkFailure -> Unconnected )
Aug 27 11:20:37 pig-two kernel: drbd0: receiver terminated
Aug 27 11:20:37 pig-two kernel: drbd0: receiver (re)started
Aug 27 11:20:37 pig-two kernel: drbd0: conn( Unconnected -> WFConnection )
Aug 27 11:20:37 pig-two kernel: drbd0: Handshake successful: Agreed network protocol version 88
Aug 27 11:20:37 pig-two kernel: drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Aug 27 11:20:37 pig-two kernel: drbd0: conn( WFConnection -> WFReportParams )
Aug 27 11:20:37 pig-two kernel: drbd0: Starting asender thread (from drbd0_receiver [3066])
Aug 27 11:20:37 pig-two kernel: drbd0: data-integrity-alg: md5
Aug 27 11:20:37 pig-two kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
Aug 27 11:20:37 pig-two kernel: drbd0: Writing meta data super block now.
Aug 27 11:20:37 pig-two kernel: drbd0: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
Aug 27 11:20:37 pig-two kernel: drbd0: Began resync as SyncSource (will sync 5788 KB [1447 bits set]).
Aug 27 11:20:37 pig-two kernel: drbd0: Writing meta data super block now.
Aug 27 11:20:37 pig-two kernel: drbd0: Resync done (total 1 sec; paused 0 sec; 5788 K/sec)
Aug 27 11:20:37 pig-two kernel: drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Aug 27 11:20:37 pig-two kernel: drbd0: Writing meta data super block now.
Aug 27 11:44:19 pig-two -- MARK --

Analysis: it looks like the network is failing, then everything reconnects, resyncs, and works again within about a second. There is no impact on production. Does anyone have an idea why?
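
Both log excerpts show the same cycle: a drop to NetworkFailure, a small resync, and a return to Connected. A short script can pull that cycle out of a longer log so the timestamps can be compared with the backup schedule. The following is only a minimal sketch in Python, assuming syslog-style lines like the ones quoted; the function name `summarize_disconnects` and the two regular expressions are illustrative assumptions and not part of DRBD.

```python
import re

# One "name( Old -> New )" state transition inside a DRBD kernel log line.
TRANSITION = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")
# The resync size announcement, e.g. "will sync 2160 KB".
RESYNC = re.compile(r"will sync (\d+) KB")


def summarize_disconnects(lines):
    """Return one dict per disconnect: when the connection dropped, when it
    was Connected again, and how many KB were resynced in between."""
    events = []
    current = None
    for line in lines:
        timestamp = line[:15]  # syslog prefix, e.g. "Aug 27 10:27:26"
        for name, _old, new in TRANSITION.findall(line):
            if name != "conn":
                continue  # ignore peer( ... ) and pdsk( ... ) transitions
            if new == "NetworkFailure":
                current = {"down": timestamp, "up": None, "resync_kb": 0}
            elif new == "Connected" and current is not None:
                current["up"] = timestamp
                events.append(current)
                current = None
        match = RESYNC.search(line)
        if match and current is not None:
            current["resync_kb"] = int(match.group(1))
    return events


# Three lines taken from the first excerpt above.
sample = [
    "Aug 27 10:27:26 pig-two kernel: drbd0: peer( Secondary -> Unknown ) "
    "conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )",
    "Aug 27 10:27:27 pig-two kernel: drbd0: Began resync as SyncSource "
    "(will sync 2160 KB [540 bits set]).",
    "Aug 27 10:27:27 pig-two kernel: drbd0: conn( SyncSource -> Connected ) "
    "pdsk( Inconsistent -> UpToDate )",
]
for event in summarize_disconnects(sample):
    print(event)
```

Feeding it the full syslog file (for example, the lines of an open file object) would list every disconnect, which makes it easier to see whether they all fall inside the backup window.
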
Is it an error in my setup/design (see below)?

Some background on the setup: it's an old version. Very old, in fact. A roadmap to upgrade has been drafted and submitted to the client; I'm just wondering about this specific issue. I want to be sure it's not an infrastructure design problem.

pig-two:~# cat /proc/drbd
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by root at pig-two, 2008-08-19 15:02:28
 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:650469968 nr:0 dw:648856776 dr:16725553 al:5463958 bm:22571 lo:0 pe:0 ua:0 ap:0 oos:0

We are talking about:
- 4x SAS 15k drives in a hardware RAID-5 array (Dell PERC 5), presented to the OS as /dev/sda.
- /dev/sda is the back-end device for DRBD, presented to the OS as /dev/drbd0.
- /dev/drbd0 is the lone "physical volume" in a volume group (called SAN) from which logical volumes are created. Those are NOT locally mounted.
- Those logical volumes are exported with vblade (AoE protocol, layer 2) to another physical system (Xen dom0), where they are used as the back-end device (/dev/etherd/e0.1) for the root volume of a virtual system.

Everything works fine, but when I do a backup, I follow this process:
- Mount a CIFS exported share over the network.
- Take an LV snapshot, mount it, and copy everything to the CIFS share.
- Unmount the snapshot and delete it; repeat for all LVs.
- Unmount the network share.

The backups are consistent and valid (tested).

What have I missed? Should I move away from AoE to a Linux-based iSCSI?

P.

--
Pascal Charest - Cutting-edge technology consultant
https://www.labsphoenix.com
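
To narrow down which backup step coincides with the disconnects, it can help to have the sequence written out as discrete commands that can be run one at a time. The sketch below, in Python, only builds the command list for the process described in the message and executes nothing. The volume group name SAN comes from the message; the logical volume name, share path, mount points, snapshot size, and snapshot naming scheme are all hypothetical placeholders, and the actual backup script may use different tools or options (the CIFS mount, for instance, would normally need credentials options).

```python
import shlex


def backup_commands(volume_group, logical_volumes, share, share_mount,
                    snap_mount, snap_size="1G"):
    """Build the ordered commands for a snapshot-and-copy backup.

    Nothing is executed here; the caller decides how to run each step.
    The target directory for each LV on the share is assumed to exist.
    """
    # Step 1: mount the CIFS share.
    commands = [["mount", "-t", "cifs", share, share_mount]]
    for lv in logical_volumes:
        snap = f"{lv}-backup-snap"  # hypothetical snapshot naming scheme
        commands += [
            # Step 2: take an LV snapshot, mount it, copy everything.
            ["lvcreate", "--snapshot", "--size", snap_size,
             "--name", snap, f"/dev/{volume_group}/{lv}"],
            ["mount", f"/dev/{volume_group}/{snap}", snap_mount],
            ["cp", "-a", f"{snap_mount}/.", f"{share_mount}/{lv}/"],
            # Step 3: unmount the snapshot and delete it.
            ["umount", snap_mount],
            ["lvremove", "--force", f"/dev/{volume_group}/{snap}"],
        ]
    # Step 4: unmount the network share.
    commands.append(["umount", share_mount])
    return commands


# Placeholder values for illustration only.
plan = backup_commands("SAN", ["vm1-root"], "//backuphost/backups",
                       "/mnt/backup", "/mnt/snap")
for cmd in plan:
    print(shlex.join(cmd))
```

Running the printed commands by hand, and noting the time before each one, would give timestamps that can be matched against the DRBD kernel log to see which step lines up with the NetworkFailure transitions.
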