[DRBD-user] DRBD can not get it working

Frank Rust f.rust at tu-braunschweig.de
Mon May 29 15:23:22 CEST 2017

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hi folks,

I am trying to get drbd9 working on my servers. My configuration is as follows:

I have 5 nodes: 2 of them are the primary fileservers (fs1 and fs2),
and 3 are virtualisation hosts running Proxmox (virt1…virt3).

All of them have dedicated network cards for the DRBD connections:
 10.10.10.33/26

The virtualisation hosts have additional network cards.

I am running the newest DRBD from the LINBIT repo.
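For reference, I initialised the cluster roughly like this (the peer
addresses besides 10.10.10.33 are from memory and only illustrative):

    # on fs1, using its DRBD NIC address
    drbdmanage init 10.10.10.33

    # then, for each additional node: node name + its DRBD NIC address
    drbdmanage add-node fs2   10.10.10.34
    drbdmanage add-node virt1 10.10.10.35
    drbdmanage add-node virt2 10.10.10.36
    drbdmanage add-node virt3 10.10.10.37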

After several hours all nodes seem to be connected and talking to each other
(drbdmanage n shows all of them in the "ok" state):

+-------+-----------+-----------+-------+
| Name  | Pool Size | Pool Free | State |
|-------+-----------+-----------+-------|
| fs1   |   7630888 |   6479712 |    ok |
| fs2   |   7630888 |   6436592 |    ok |
| virt1 |     19260 |     19252 |    ok |
| virt2 |     19260 |     19252 |    ok |
| virt3 |     19260 |     19252 |    ok |
+-------+-----------+-----------+-------+

With drbdadm I can see that fs1 is the primary control node.
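For what it's worth, I checked that via the drbdmanage control volume,
along these lines:

    # .drbdctrl is the drbdmanage control resource;
    # the current leader holds the Primary role on it
    drbdadm status .drbdctrl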

When trying to deploy a VM or move storage to DRBD I nearly always get "TASK ERROR: storage migration failed: drbd error: Could not forward data to leader".
Sometimes this works with the setting "redundancy 1"; the setting "redundancy 2" or higher has never worked. With higher redundancy I can see something like "Initial split brain detected" in the log file, and the operation fails completely.
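The Proxmox storage definition in /etc/pve/storage.cfg looks roughly like
this (the storage ID "drbdstorage" is just an example name):

    drbd: drbdstorage
            content images,rootdir
            redundancy 2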

Once the storage is running correctly at redundancy level 1, I can assign it to the second fileserver.
If I do this while the deployment is still running, it will most likely fail at 99.8% or so, with some blocks never syncing.
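By "assign" I mean the drbdmanage command, roughly like this (the resource
name is only an example; the Proxmox plugin names volumes vm-<vmid>-disk-<n>):

    # add fs2 as a second replica of an already deployed resource
    drbdmanage assign vm-100-disk-1 fs2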

So what could I possibly be doing wrong?

Thanks in advance for useful hints,
cheers, Frank
