[DRBD-user] pacemaker + drbd9 and prefers location constraint
Salatiel Filho
salatiel.filho at gmail.com
Mon Dec 11 20:51:00 CET 2023
Hello, everyone. I am seeing a behaviour that I cannot quite understand
when I have DRBD managed by Pacemaker together with a "prefers"
location constraint.
These are my resources:
Full List of Resources:
* Clone Set: DRBDData-clone [DRBDData] (promotable):
* Promoted: [ pcs01 ]
* Unpromoted: [ pcs02 pcs03 ]
* Resource Group: nfs:
* portblock_on_nfs (ocf:heartbeat:portblock): Started pcs01
* vip_nfs (ocf:heartbeat:IPaddr2): Started pcs01
* drbd_fs (ocf:heartbeat:Filesystem): Started pcs01
* nfsd (ocf:heartbeat:nfsserver): Started pcs01
* exportnfs (ocf:heartbeat:exportfs): Started pcs01
* portblock_off_nfs (ocf:heartbeat:portblock): Started pcs01
And I have a location preference for pcs01 on the DRBDData-clone resource:
resource 'DRBDData-clone' prefers node 'pcs01' with score INFINITY
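In case it matters, the constraint above corresponds to something like
the following pcs command (resource and node names as above; the exact
syntax may vary a bit between pcs releases):

# pcs constraint location DRBDData-clone prefers pcs01=INFINITY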
# drbdadm status
exports role:Primary
disk:UpToDate
pcs02.lan role:Secondary
peer-disk:UpToDate
pcs03.lan role:Secondary
peer-disk:UpToDate
Now, while pcs01 is providing the resources, I mount the NFS export on
a client and start copying a 15GB random file.
After about 5GB has been copied I pull the plug on the pcs01 node.
After a few seconds pcs02 is promoted and the copy resumes.
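For reference, the client side of the test is nothing fancy, roughly
the following (the VIP address, export path and mount point here are
placeholders, not my real values):

# mount -t nfs <vip_nfs address>:/exports /mnt/test
# dd if=/dev/urandom of=/mnt/test/random.bin bs=1M count=15360 status=progress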
Output for drbdadm status:
exports role:Primary
disk:UpToDate
pcs01.lan connection:Connecting
pcs03.lan role:Secondary congested:yes ap-in-flight:1032 rs-in-flight:0
peer-disk:UpToDate
Output for pcs status:
Node List:
* Online: [ pcs02 pcs03 ]
* OFFLINE: [ pcs01 ]
Full List of Resources:
* Clone Set: DRBDData-clone [DRBDData] (promotable):
* Promoted: [ pcs02 ]
* Unpromoted: [ pcs03 ]
* Stopped: [ pcs01 ]
* Resource Group: nfs:
* portblock_on_nfs (ocf:heartbeat:portblock): Started pcs02
* vip_nfs (ocf:heartbeat:IPaddr2): Started pcs02
* drbd_fs (ocf:heartbeat:Filesystem): Started pcs02
* nfsd (ocf:heartbeat:nfsserver): Started pcs02
* exportnfs (ocf:heartbeat:exportfs): Started pcs02
* portblock_off_nfs (ocf:heartbeat:portblock): Started pcs02
Now, after roughly 14GB of the 15GB file has been copied (so at least
9GB would need to be resynced to pcs01 once it comes back online), I
start pcs01 again.
Since pcs01 is the preferred node, as soon as Pacemaker detects it the
services move back there.
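The failback itself is expected from Pacemaker's point of view; for
what it is worth, the allocation scores driving it can be inspected
with something like this (crm_simulate ships with Pacemaker):

# crm_simulate -sL | grep -i drbd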
The question is: how can an "inconsistent/degraded" replica become
Primary before the resync has completed?
# drbdadm status
exports role:Primary
disk:Inconsistent
pcs02.lan role:Secondary
replication:SyncTarget peer-disk:UpToDate done:79.16
pcs03.lan role:Secondary
replication:PausedSyncT peer-disk:UpToDate done:78.24
resync-suspended:dependency
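If more detail on the resync state helps, I can also pull it from
drbdsetup directly (same resource name as above):

# drbdsetup status exports --verbose --statistics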
The service moved back to pcs01:
Node List:
* Online: [ pcs01 pcs02 pcs03 ]
Full List of Resources:
* Clone Set: DRBDData-clone [DRBDData] (promotable):
* Promoted: [ pcs01 ]
* Unpromoted: [ pcs02 pcs03 ]
* Resource Group: nfs:
* portblock_on_nfs (ocf:heartbeat:portblock): Started pcs01
* vip_nfs (ocf:heartbeat:IPaddr2): Started pcs01
* drbd_fs (ocf:heartbeat:Filesystem): Started pcs01
* nfsd (ocf:heartbeat:nfsserver): Started pcs01
* exportnfs (ocf:heartbeat:exportfs): Started pcs01
* portblock_off_nfs (ocf:heartbeat:portblock): Started pcs01
# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ fd0904f7bf256ecd380e1c19ec73c712f3855d40\
build\ by\ mockbuild at 42fe748df8a24339966f712147eb3bfd\,\ 2023-11-01\
01:47:26
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090111
DRBD_KERNEL_VERSION=9.1.17
DRBDADM_VERSION_CODE=0x091a00
DRBDADM_VERSION=9.26.0
# cat /etc/redhat-release
AlmaLinux release 9.3 (Shamrock Pampas Cat)
Is that a bug? Shouldn't that corrupt the filesystem?
Atenciosamente/Kind regards,
Salatiel