[DRBD-user] Update DRBD in product

matthieu le roy leroy.matthieu50 at gmail.com
Wed Mar 22 11:30:09 CET 2023


Hello,

I have two servers running in a high-availability setup.
Here is the info for the first server:

OS: Ubuntu 18.04.2 LTS
# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ 38a99411a8fcb883214a5300ad0ce1ef7ca37730\ build\ by\ buildd@lgw01-amd64-016\,\ 2019-05-27\ 12:45:18
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090012
DRBD_KERNEL_VERSION=9.0.18
DRBDADM_VERSION_CODE=0x090900
DRBDADM_VERSION=9.9.0

Here is the info for the second server after the update:

OS: Ubuntu 20.04.6 LTS
# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ e267c4413f7cb3d8ec5e793c3fa7f518e95f23b1\ build\ by\ buildd@lcy02-amd64-101\,\ 2023-03-14\ 09:57:26
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090202
DRBD_KERNEL_VERSION=9.2.2
DRBDADM_VERSION_CODE=0x091701
DRBDADM_VERSION=9.23.1
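
In case it helps, this is roughly how I cross-check which DRBD kernel module is actually loaded on each node (the userland and kernel versions above differ between the two servers); these are just the generic commands, not output from my machines:

# modinfo drbd | grep -E '^(version|srcversion)'
# cat /proc/drbd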

DRBD config:

# cat /etc/drbd.d/alfresco.conf
resource alfresco {
  handlers {
#    before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
#    after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
  }
  on storage1 {
    device /dev/drbd5;
    disk /dev/datavg/alfresco;
    node-id 10;
    address   10.50.20.1:7004;
    meta-disk internal;
  }
  on storage2 {
    device /dev/drbd5;
    disk /dev/datavg/appli;
    node-id 11;
    address   10.50.20.2:7004;
    meta-disk internal;
  }
}
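
For reference, I can verify that both nodes parse the same resource definition with something like the following on each server (generic command, no output included here):

# drbdadm dump alfresco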

# cat /etc/drbd.d/global_common.conf
global {
  usage-count yes;
  udev-always-use-vnr;
}

common {
  handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
  }
  net {
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
    data-integrity-alg crc32c;
    timeout 90;
    ping-timeout 20;
    ping-int 15;
    connect-int 10;
  }
}
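
Similarly, to check which options are actually in effect in the kernel for the resource, I would run something like this on each node (again, just the generic command):

# drbdsetup show alfresco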


Commands run after the update:


# drbdadm create-md appli
# drbdadm up appli
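
While the resync was running I followed it with roughly the following, nothing more exotic than the usual status commands:

# watch -n1 drbdadm status
# drbdsetup status --verbose --statistics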

The sync started and I was able to follow its progress, but once it reached 100% here is the status of the servers:

storage1:
# drbdadm status alfresco
alfresco role:Primary
  disk:UpToDate
  storage2 role:Secondary
    replication:SyncSource peer-disk:Inconsistent

# drbdsetup status --verbose --statistics alfresco
alfresco node-id:10 role:Primary suspended:no
    write-ordering:flush
  volume:0 minor:5 disk:UpToDate quorum:yes
      size:536854492 read:423078021 written:419423956 al-writes:9640 bm-writes:0 upper-pending:0 lower-pending:0
      al-suspended:no blocked:no
  storage2 node-id:11 connection:Connected role:Secondary congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:SyncSource peer-disk:Inconsistent resync-suspended:no
        received:0 sent:421584224 out-of-sync:0 pending:0 unacked:0

storage2:

# drbdadm status alfresco
alfresco role:Secondary
  disk:Inconsistent
  storage1 role:Primary
    replication:SyncTarget peer-disk:UpToDate

# drbdsetup status --verbose --statistics alfresco
alfresco node-id:11 role:Secondary suspended:no force-io-failures:no
    write-ordering:flush
  volume:0 minor:5 disk:Inconsistent backing_dev:/dev/datavg/alfresco quorum:yes
      size:536854492 read:0 written:421584224 al-writes:14 bm-writes:6112 upper-pending:0 lower-pending:0
      al-suspended:no blocked:no
  storage1 node-id:10 connection:Connected role:Primary congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:SyncTarget peer-disk:UpToDate resync-suspended:no
        received:421584224 sent:0 out-of-sync:0 pending:0 unacked:0
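
If it helps, I can also dump the state that each kernel currently reports for the resource; I assume drbdsetup events2 is the right tool for that:

# drbdsetup events2 --now --statistics alfresco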


Since the start of the sync I have had no DRBD-related logs on storage1, but on storage2 these logs repeat in a loop:

Mar 22 10:22:31 storage2 kernel: [ 4713.898381] INFO: task drbd_s_alfresco:2104 blocked for more than 120 seconds.
Mar 22 10:22:31 storage2 kernel: [ 4713.898465]       Tainted: G           OE     5.4.0-144-generic #161-Ubuntu
Mar 22 10:22:31 storage2 kernel: [ 4713.898530] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 22 10:22:31 storage2 kernel: [ 4713.898604] drbd_s_alfresco D    0  2104      2 0x80004000
Mar 22 10:22:31 storage2 kernel: [ 4713.898609] Call Trace:
Mar 22 10:22:31 storage2 kernel: [ 4713.898624]  __schedule+0x2e3/0x740
Mar 22 10:22:31 storage2 kernel: [ 4713.898633]  ? update_load_avg+0x7c/0x670
Mar 22 10:22:31 storage2 kernel: [ 4713.898641]  ? sched_clock+0x9/0x10
Mar 22 10:22:31 storage2 kernel: [ 4713.898648]  schedule+0x42/0xb0
Mar 22 10:22:31 storage2 kernel: [ 4713.898656]  rwsem_down_write_slowpath+0x244/0x4d0
Mar 22 10:22:31 storage2 kernel: [ 4713.898663]  ? put_prev_entity+0x23/0x100
Mar 22 10:22:31 storage2 kernel: [ 4713.898675]  down_write+0x41/0x50
Mar 22 10:22:31 storage2 kernel: [ 4713.898703]  drbd_resync_finished+0x97/0x7c0 [drbd]
Mar 22 10:22:31 storage2 kernel: [ 4713.898735]  ? drbd_cork+0x64/0x70 [drbd]
Mar 22 10:22:31 storage2 kernel: [ 4713.898754]  ? wait_for_sender_todo+0x21e/0x240 [drbd]
Mar 22 10:22:31 storage2 kernel: [ 4713.898777]  w_resync_finished+0x2c/0x40 [drbd]
Mar 22 10:22:31 storage2 kernel: [ 4713.898795]  drbd_sender+0x13e/0x3d0 [drbd]
Mar 22 10:22:31 storage2 kernel: [ 4713.898827]  drbd_thread_setup+0x87/0x1d0 [drbd]
Mar 22 10:22:31 storage2 kernel: [ 4713.898836]  kthread+0x104/0x140
Mar 22 10:22:31 storage2 kernel: [ 4713.898861]  ? drbd_destroy_connection+0x150/0x150 [drbd]
Mar 22 10:22:31 storage2 kernel: [ 4713.898866]  ? kthread_park+0x90/0x90
Mar 22 10:22:31 storage2 kernel: [ 4713.898873]  ret_from_fork+0x1f/0x40
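
If more detail on the blocked thread is useful, I can gather something along these lines (PID 2104 is the drbd_s_alfresco thread from the trace above):

# cat /proc/2104/stack
# journalctl -k -b | grep -i drbd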


I need help, please.