[DRBD-user] Cannot synchronize stacked device to backup server with DRBD9

Tue Jun 19 09:19:04 CEST 2018

Hi Lars, thanks for the answer.

On 18.06.2018 at 17:10, Lars Ellenberg wrote:
> On Wed, Jun 13, 2018 at 01:03:53PM +0200, Artur Kaszuba wrote:
>> I know about the 3-node solution and I have used it for some time (since ~9.0.8),
>> but I had problems with stability and decided to switch to a
>> stacked configuration, hoping it would be more stable. As a last
>> resort I will downgrade to DRBD 8, where I never had any problems with
>> stability, but I would like to stay with 9 and, after some time, switch back
>> to the 3-node config.
>>
>> By stability I mean the following situation:
>> - latest version of DRBD 9 (9.0.14)
>> - kernel 4.13.0-43-generic on Ubuntu 16.04
>> - high disk usage/IO on DRBD devices
>> - 3-node configuration
>> - random system crash on the "drbdadm disconnect/connect" command
>> When I disable one node, everything works without problems and
>> disconnect/connect works perfectly. Before 9.0.14 I didn't have these crashes,
>> but I had others, which are fixed now.
> 
> And you cannot be bothered to report "such crashes"
> in a way that makes it possible to understand and fix those?
> 
> "random system crash" is not good enough :-/
> 

Yes, I know it is not enough to find the reason for these crashes, and
that is why I didn't report them separately; I only asked why the
stacked solution does not work in my case :).

Sorry, but I cannot write much more; this is happening in a production
environment and I cannot run tests there.
I can add:
- simple reproduction tests without high disk usage do not trigger
the crashes
- the problems started after upgrading drbd from 9.0.12 to 9.0.14 and
drbd-utils from 9.3.0-1ppa1~xenial1 to 9.4.0-1ppa1~xenial1; before that
we didn't have such crashes
- we have ~15 DRBD resources in this environment, with high IO in a
random pattern (databases, indexers, git, file servers, KVM, etc.)

>> Unfortunately I cannot wait for the next fix,
>> I need a stable environment.
> 
> "I want it all, and I want it now" :-)
> 
> For the benefit of those that can afford to wait for the next fix,
> maybe you should still report the crashes in a way that we can work with.
> 

Sorry if I put it the wrong way; English is not my native language and
I did not mean to sound rude.
I was only describing this situation:
- the system has worked without crashes for months
- the system is the core production environment of the company
- the DRBD upgrade causes random crashes (3-node configuration for DRBD 9)
- we cannot manage/create DRBD resources because the system could crash
on any drbdadm connect/disconnect command (which already happened in the
middle of the day when we were trying to reconnect the backup server :/)

Such a situation does not allow me to wait for the next fix; I need to
find another solution/workaround.

>> I prefer to use the stacked configuration, even though it is deprecated in
>> DRBD 9. I decided to write this post because the stacked configuration is
>> still described in the documentation, so it should work? Unfortunately, for
>> now it is not possible to create such a configuration, or I missed
>> something :/
> 
> I know there are DRBD 9 users using "stacked" configurations out there.
> 

Hmm, maybe they created their resources some time ago, and DRBD works
for already-created resources. What I found is a problem with the
initial synchronization to the backup server:
- the source server pair is up and one node is Primary
- the backup server tries to synchronize the data (for the first time)
- the Primary server tries to enter the Source state for the stacked
device, and at this moment it fails with an error:

[1636671.252028] drbd system-test-U/0 drbd113 z1: helper command: /sbin/drbdadm before-resync-source
[1636671.255933] drbd system-test-U/0 drbd113: before-resync-source handler returned 1, dropping connection.
[1636671.255942] drbd system-test-U z1: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )

- the same error (exit code) occurred when I executed drbdadm
before-resync-source directly:
'system-test-U' is a stacked resource, and not available in normal mode.
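To pick out which handler failed and with what exit code when going
through a longer kernel log, a small filter over lines like the ones
above can help. This is a minimal sketch, assuming the message layout
"<handler> handler returned <code>, dropping connection." stays exactly
as in the excerpt; the function name `failed_handlers` is hypothetical.

```python
import re

# Assumption: handler failures are logged as
#   drbd <resource>/<volume> <minor>: <handler> handler returned <code>, ...
# as in the excerpt above. Other message formats are not matched.
HANDLER_RE = re.compile(
    r"drbd (?P<resource>[^/\s]+)/(?P<volume>\d+) (?P<minor>drbd\d+): "
    r"(?P<handler>[\w-]+) handler returned (?P<code>\d+)"
)


def failed_handlers(lines):
    """Return (resource, volume, minor, handler, exit_code) for each non-zero handler exit."""
    results = []
    for line in lines:
        m = HANDLER_RE.search(line)
        if m and int(m.group("code")) != 0:
            results.append((
                m.group("resource"),
                int(m.group("volume")),
                m.group("minor"),
                m.group("handler"),
                int(m.group("code")),
            ))
    return results


if __name__ == "__main__":
    log = [
        "[1636671.252028] drbd system-test-U/0 drbd113 z1: helper command: /sbin/drbdadm before-resync-source",
        "[1636671.255933] drbd system-test-U/0 drbd113: before-resync-source handler returned 1, dropping connection.",
        "[1636671.255942] drbd system-test-U z1: conn( Connected -> Disconnecting ) peer( Secondary -> Unknown )",
    ]
    for entry in failed_handlers(log):
        print(entry)
```

Only the second line matches: the first has a connection name (" z1") before the colon, and the third has no volume number.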

I think it could be a problem in drbd-utils or in the drbd module:
- drbdadm before-resync-source should detect the type of resource (not
require the --stacked switch)
- or the drbd module should call drbdadm before-resync-source with the
"--stacked" switch when it starts the before-resync-source handler for
stacked resources
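The first proposal above (let the handler invocation work out on its
own whether a resource is stacked) could look roughly like this. It is
only an illustration of the idea, not how drbdadm or the kernel module
actually behave: the function `helper_argv`, the argument order, and
the way the set of stacked resource names is obtained (in practice it
would come from parsing the resources that use `stacked-on-top-of` in
the config) are all assumptions. The `/sbin/drbdadm` path and handler
name come from the log above.

```python
DRBDADM = "/sbin/drbdadm"  # path taken from the kernel log above


def helper_argv(handler, resource, stacked_resources):
    """Build a hypothetical drbdadm command line for a handler call.

    Adds --stacked automatically when the resource is known to be
    stacked, so the call does not fail with "... is a stacked resource,
    and not available in normal mode."
    """
    argv = [DRBDADM]
    if resource in stacked_resources:
        argv.append("--stacked")
    argv += [handler, resource]
    return argv


if __name__ == "__main__":
    # Hypothetical: pretend the config declares only system-test-U as stacked.
    stacked = {"system-test-U"}
    print(helper_argv("before-resync-source", "system-test-U", stacked))
    print(helper_argv("before-resync-source", "system-test", stacked))
```

Whether such a check belongs in drbdadm or in the module's helper call
is exactly the open question; the sketch only shows the decision itself.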

Of course, it could be caused by a misconfiguration on my side (the test
config is in my initial mail), but I cannot find an error there :(

> Maybe you missed to upgrade your drbd-utils?
> Current drbd-utils version would be 9.4.0
> 

I already use the latest versions of drbd-utils and the drbd-dkms module:
ii  drbd-dkms   9.0.14-1ppa1~xenial1   all    RAID 1 over TCP/IP for Linux module source
ii  drbd-utils  9.4.0-1ppa1~xenial1    amd64  RAID 1 over TCP/IP for Linux (user utilities)

If someone could help me understand this situation, I would be really
grateful.

-- 
Artur Kaszuba