Hi all,

I'm trying to join a new node into an existing 2-node cluster and it seems to be broken somehow. I'm using DRBD9 on Ubuntu 16.04 LTS:

ii  drbd-dkms          9.0.9-1ppa1~xenial1    all    RAID 1 over TCP/IP for Linux module source
ii  drbd-utils         9.1.1-1ppa1~xenial1    amd64  RAID 1 over TCP/IP for Linux (user utilities)
ii  python-drbdmanage  0.99.10-1ppa1~xenial1  all    DRBD distributed resource management utility

The existing 2-node cluster:

- Node hv2 / HW-RAID5 = 5x 1TB (512n) / LVM-Crypt / DRBD uses LvmThinLv
- Node hv3 / HW-RAID5 = 3x 2TB (512n) / LVM-Crypt / DRBD uses LvmThinLv

The new one:

- Node hv1 / 1x 6TB (4Kn) / LVM-Crypt / DRBD uses LvmThinLv
  (I actually tried with SW-RAID1 before and had the same results)

State before joining:

root@hv2 ~ # drbd-overview
Resources:
  0:.drbdctrl/0  Connected(2*) Second/Primar UpToDa/UpToDa
  1:.drbdctrl/1  Connected(2*) Second/Primar UpToDa/UpToDa
100:vm_dc2/0     Connected(2*) Primar/Second UpToDa/UpToDa *dc2 sda scsi
101:vm_dc1/0     Connected(2*) Primar/Second UpToDa/UpToDa *dc1 sda scsi
... some more in the same state

Then I add hv1:

root@hv3 ~ # drbdmanage add-node hv1 192.168.42.2

In dmesg of hv1 I see:

[ 1186.205695] drbd .drbdctrl/0 drbd0: logical block size of local backend does not match (drbd:512, backend:4096); was this a late attach?
[ 1186.205702] drbd .drbdctrl/0 drbd0: logical block sizes do not match (me:512, peer:512); this may cause problems.

In dmesg of hv2 and hv3 I see:

[ 7765.951161] drbd .drbdctrl/0 drbd0: logical block sizes do not match (me:512, peer:4096); this may cause problems.
[ 7765.951165] drbd .drbdctrl/0 drbd0: current Primary must NOT adjust logical block size (512 -> 4096); hope for the best.
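To see where the 512 vs. 4096 comes from, I compare the sector sizes the kernel reports on each node. This is just a generic diagnostic sketch (it walks every block device; in my stack the device that matters is the dm-crypt/LVM volume backing DRBD, not only the raw disk):

```shell
#!/bin/sh
# Print logical vs. physical sector size for every block device the kernel
# knows about; run on each node and compare the DRBD backing device lines.
for q in /sys/block/*/queue; do
    [ -r "$q/logical_block_size" ] || continue
    dev=$(basename "$(dirname "$q")")
    printf '%s: logical=%s physical=%s\n' "$dev" \
        "$(cat "$q/logical_block_size")" \
        "$(cat "$q/physical_block_size")"
done
```

For a single device, `blockdev --getss <dev>` and `blockdev --getpbsz <dev>` report the same logical and physical sizes.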
State still looks good:

root@hv1 ~ # drbd-overview
0:.drbdctrl/0  Connected(3*) Seco(hv1,hv2)/Prim(hv3) UpTo(hv1)/UpTo(hv2,hv3)
1:.drbdctrl/1  Connected(3*) Seco(hv2,hv1)/Prim(hv3) UpTo(hv1)/UpTo(hv2,hv3)

Then I assign a resource:

root@hv3:~# drbdmanage assign-resource vm_dc2 hv1

and hv1 ends up diskless :-(

[ 4564.041711] drbd vm_dc2/0 drbd100 hv2: helper command: /sbin/drbdadm before-resync-target
[ 4564.044700] drbd vm_dc2/0 drbd100 hv2: helper command: /sbin/drbdadm before-resync-target exit code 0 (0x0)
[ 4564.044759] drbd vm_dc2/0 drbd100 hv2: repl( WFBitMapT -> SyncTarget )
[ 4564.044765] drbd vm_dc2/0 drbd100 hv3: resync-susp( no -> connection dependency )
[ 4564.044978] drbd vm_dc2/0 drbd100 hv2: Began resync as SyncTarget (will sync 16777216 KB [4194304 bits set]).
[ 4564.048031] drbd vm_dc2/0 drbd100 hv3: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 4564.050267] drbd vm_dc2/0 drbd100 hv3: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 4564.050285] drbd vm_dc2/0 drbd100 hv3: helper command: /sbin/drbdadm before-resync-target
[ 4564.053016] drbd vm_dc2/0 drbd100 hv3: helper command: /sbin/drbdadm before-resync-target exit code 0 (0x0)
[ 4564.053074] drbd vm_dc2/0 drbd100 hv3: repl( WFBitMapT -> PausedSyncT )
[ 4564.053286] drbd vm_dc2/0 drbd100 hv3: Began resync as PausedSyncT (will sync 16777216 KB [4194304 bits set]).
[ 4636.342184] sd 4:0:0:0: [sda] Bad block number requested
[ 4636.346976] drbd vm_dc2/0 drbd100: write: error=10 s=2050s
[ 4636.347015] drbd vm_dc2/0 drbd100: disk( Inconsistent -> Failed )
[ 4636.347021] drbd vm_dc2/0 drbd100 hv2: repl( SyncTarget -> Established )
[ 4636.347026] drbd vm_dc2/0 drbd100 hv3: repl( PausedSyncT -> Established ) resync-susp( connection dependency -> no )
[ 4636.347063] drbd vm_dc2/0 drbd100: Local IO failed in drbd_endio_write_sec_final. Detaching...
[ 4636.354204] drbd vm_dc2/0 drbd100: disk( Failed -> Diskless )

Any ideas what is going wrong?

Thanks and regards,
Sebastian Hasait
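P.S. If I'm reading the failing write right, it may simply be misaligned for the new 4Kn disk: kernel sector numbers are in 512-byte units, and sector 2050 (from "error=10 s=2050s" above) is not a multiple of 8, so that write cannot start on a 4096-byte logical block boundary. A quick check of the arithmetic (sector number taken from the log; 4096 is the new backend's logical block size):

```shell
#!/bin/sh
# Check whether a 512-byte-unit sector offset from the DRBD log is aligned
# to the 4K logical blocks of the new backend.
sector=2050
offset=$((sector * 512))                 # byte offset of the failed write
if [ $((offset % 4096)) -eq 0 ]; then
    echo "offset $offset is 4K-aligned"
else
    echo "offset $offset is NOT 4K-aligned (misaligned by $((offset % 4096)) bytes)"
fi
```

This prints that offset 1049600 is misaligned by 1024 bytes, which would explain the "[sda] Bad block number requested" error on the 4Kn device.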