From rene.peinthor at linbit.com Tue May 4 12:55:42 2021
From: rene.peinthor at linbit.com (Rene Peinthor)
Date: Tue, 4 May 2021 12:55:42 +0200
Subject: [DRBD-announce] linstor-server release 1.12.2

Hi!

Another little bugfix release: the PostgreSQL DB upgrade didn't work as expected, and another migration script forgot about resources with suffix volumes (e.g. external metadata).

linstor-server 1.12.2
---------------------
 * Fix migration on PostgreSQL
 * Fixed DB migration of resources with suffix volumes (e.g. ext-metadata)

https://www.linbit.com/downloads/linstor/linstor-server-1.12.2.tar.gz

Linstor PPA:
https://launchpad.net/~linbit/+archive/ubuntu/linbit-drbd9-stack

Cheers,
Rene

From philipp.reisner at linbit.com Thu May 6 15:53:24 2021
From: philipp.reisner at linbit.com (Philipp Reisner)
Date: Thu, 6 May 2021 15:53:24 +0200
Subject: [DRBD-announce] drbd-9.0.29-1 & drbd-9.1.2

Hello,

I have the honor to announce new DRBD releases.

We got a report about data corruption on DRBD volumes if the backing device is a degraded Linux software RAID 5 (or RAID 6). Kernels >= 4.3 are affected, i.e. the distros RHEL8/CentOS8/AlmaLinux8/RockyLinux8, Ubuntu Xenial and newer, etc. Not affected: RHEL7 and all distros with kernels before 4.3.

To explain what was going on, let me first explain what a 'read-ahead' request is. When a user-space application reads a file via the file system, some (regular) read requests might get submitted to a block device. The page cache often appends a few read-ahead requests to those, to pre-load data that the application will probably want to see soon. These read-ahead requests are special in that a block device may decide, 'out of convenience', to fail them.

Here comes the relation to a degraded software RAID 5/6.
A degraded software RAID is not in a convenient position: its stripe cache is under pressure because it needs to restore data blocks by doing reverse parity calculations in the stripe cache. So it is not unusual for the md driver to fail read-ahead requests while it is missing one of its backing disks.

The bug in DRBD was this: when DRBD had to split a big IO request into smaller parts, and only some of those parts were failed by DRBD's backing disk, DRBD did not correctly combine the return codes of the individual parts. So it could happen that DRBD returned a large read-ahead request to the page cache as successfully read, although some parts of it were not filled with data from the storage. As a result, not only might the application get corrupt data; if one of those pages is partially written by user space, it might also be written back to storage.

This is fixed in DRBD now. How this bug came about is also an interesting story. It is closely connected to how upstream Linux evolved, and we suspect that some other places in the kernel are broken in the same way. We are looking into that.

In other news: after we introduced quorum to DRBD, the first adopters used it with on-no-quorum=io-error. That is the preferable setting for HA clusters, where you want to terminate your application in case a primary node loses quorum. The other possibility is on-no-quorum=suspend-io. That mode was neglected in the past and had a few bugs. With this release it works nicely, and you can recover a primary without quorum either by adding nodes or by changing the quorum setting. The frozen applications will then either unfreeze or get IO errors.

The last thing I want to mention is that the `invalidate` and `invalidate-remote` commands got a new option `--reset-bitmap=no`. It allows you to resync differences found by online verify.
Even if you are not on software RAID and not using quorum with on-no-quorum=suspend-io, this release brings several minor bug fixes, and I recommend upgrading to it.

9.0.29-1 (api:genl2/proto:86-120/transport:14)
--------
 * fix data corruption when DRBD's backing disk is a degraded Linux software RAID (MD)
 * add correct thawing of IO requests after IO was frozen due to loss of quorum
 * fix timeout detection after idle periods and for configs with ko-count when a disk on a secondary stops delivering IO-completion events
 * fixed an issue where UUIDs were not shifted in the history slots; that caused false "unrelated data" events
 * fix switching resync sources by letting resync requests drain before issuing resync requests to the new source; before the fix, the resync could fail to terminate because a late reply from the previous source caused an out-of-sync bit to be set after the "scan point"
 * fix a temporary deadlock you could trigger when exercising promotion races with some read-only openers mixed into the test case
 * fix for the bitmap-copy operation in a very specific and unlikely case where two nodes do a bitmap-based resync due to disk states
 * fix size negotiation when combining nodes of different CPU architectures that have different page sizes
 * fix a very rare race where DRBD reported a wrong magic in a header packet right after reconnecting
 * fix a case where DRBD ends up reporting unrelated data; it affected thinly allocated resources with a diskless node in a recreate-from-day0 event
 * speed up open() of DRBD devices if promote has no chance to go through
 * new option "--reset-bitmap=no" for the invalidate and invalidate-remote commands; this allows a resync after online verify found differences
 * changes to socket buffer sizes get applied to established connections immediately; before, they were applied after a reconnect
 * add exists events for path objects
 * forbid keyed hash algorithms for online verify, csums and HMAC base alg
 * following upstream changes to DRBD up to Linux 5.12 and updated compat rules to support up to Linux 5.12

9.1.2 (api:genl2/proto:110-120/transport:17)
--------
 * merged all fixes from drbd-9.0.29; other than that no changes in this branch

https://linbit.com/downloads/drbd/9.0/drbd-9.0.29-1.tar.gz
https://github.com/LINBIT/drbd/commit/9a7bc817880ab1ac800f4c53f2b832ddd5da87c5

https://linbit.com/downloads/drbd/9/drbd-9.1.2.tar.gz
https://github.com/LINBIT/drbd/commit/a60cffa380085d75c5f62b6bcb500c5b43ca801e

From rene.peinthor at linbit.com Fri May 7 11:27:26 2021
From: rene.peinthor at linbit.com (Rene Peinthor)
Date: Fri, 7 May 2021 11:27:26 +0200
Subject: [DRBD-announce] linstor-server release 1.12.3

Hi All!

There was still a problem with resource groups on PostgreSQL DB backends, which should now finally be fixed; two smaller bug fixes also made it into this release.

linstor-server 1.12.3
---------------------
 * Fix creating resource groups using PostgreSQL backend
 * Fixed re-auto-placing resource-definition on moving to different rscgrp
 * Updated automatic-verify algorithm list for DRBD

https://www.linbit.com/downloads/linstor/linstor-server-1.12.3.tar.gz

Linstor PPA:
https://launchpad.net/~linbit/+archive/ubuntu/linbit-drbd9-stack

Cheers,
Rene

From johannes at johannesthoma.com Mon May 17 15:37:25 2021
From: johannes at johannesthoma.com (Johannes Thoma)
Date: Mon, 17 May 2021 15:37:25 +0200
Subject: [DRBD-announce] WinDRBD 1.0.0-rc12 released
Message-ID: <1338eaaf-121b-9146-0c4a-99ac96990031@johannesthoma.com>

Dear DRBD and WinDRBD users,

We just released another rc: 1.0.0-rc12. This is a special release because it fixes 3 bugs that affected system stability. rc12 ran in a disconnect / connect loop (via the Windows firewall) with I/O for 5 days without issues (releases up to rc11 used to crash after at most
2 hours in this test), so we think that rc12 is a huge step forward towards a stable release.

Here are the changes in rc12 from the WHATSNEW.md file:

What's new in version 1.0.0-rc12
--------------------------------

Fixed a blue screen on disconnect: the reason was that sock_really_free called wait_event while in an APC (in interrupt context), and wait_event tried to sleep.

Fixed a missing mutex_lock in delete_multicast_elements_and_replies_for_file_object(), which caused a BSOD from time to time.

Fixed a deadlock in recursive calls to rcu_read_lock (which are legal): in the sequence rcu_read_lock (A) / synchronize_rcu (B) / rcu_read_lock (A), the inner rcu_read_lock used to hang forever at DISPATCH_LEVEL.

The WinDRBD root directory is now relocatable (currently the registry key WinDRBDRoot in HKLM/system/CurrentControlSet/services/WinDRBD must be edited manually; there is no installer support yet).

As always, an installable EXE can be downloaded from the Linbit homepage:
https://linbit.com/linbit-software-download-page-for-linstor-and-drbd-linux-driver/#windrbd

Thank you for using WinDRBD. We are always excited to hear feedback from you.

Best regards,

- Johannes

From moritz.wanzenboeck at linbit.com Mon May 17 16:25:16 2021
From: moritz.wanzenboeck at linbit.com (Moritz Wanzenböck)
Date: Mon, 17 May 2021 16:25:16 +0200
Subject: [DRBD-announce] linstor-operator 1.5.0 + linstor-csi 0.13.0 released

Dear LINSTOR on Kubernetes users,

We just released Linstor Operator 1.5.0 together with Linstor CSI 0.13.0.

This release contains a new and exciting feature: monitoring via Prometheus. With the help of the new drbd-reactor project, we can export DRBD-related metrics from each satellite. If you are already using the Prometheus Operator, the Linstor Operator will even configure the ServiceMonitor resource to start fetching metrics. For more information, take a look at the updated user guide. [1]

Behind the scenes, we cleaned up the labels applied to generated resources. Unless you have other services targeting Pods or configs created by the Linstor Operator, this change should be completely transparent. The upgrade might take a bit longer to complete, as Deployments need to be deleted and recreated by the Operator instead of simply patched.

Finally, the operator also defaults to the latest versions of DRBD 9.0 and Linstor.

On the CSI front, this release contains mostly bug fixes. One new feature is the possibility to set any Linstor property in a storage class by using the "property.linstor.csi.linbit.com" prefix in parameters. [2]

Unfortunately, the introduction of monitoring makes it necessary to upgrade the LinstorSatelliteSet CRD. This is not included in the normal upgrade path and requires you to run these steps before the actual upgrade:

  $ helm repo update
  $ helm pull linstor/linstor --untar
  $ kubectl replace -f linstor/crds/

After this step is done, the usual procedure applies:

  $ helm upgrade linstor-op linstor/linstor -f orig.yaml

For more information, please take a look at the upgrade guide. [3]

Best regards,
Moritz

[1]: https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-kubernetes-monitoring
[2]: https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-kubernetes-storage-class-properties
[3]: https://linbit.com/drbd-user-guide/linstor-guide-1_0-en/#s-kubernetes-upgrade

Linstor Operator Changelog
--------------------------

Added:

- All operator-managed workloads apply recommended labels. This requires the recreation of Deployments and DaemonSets on upgrade. This is handled automatically by the operator; however, any customizations applied to the deployments that are not managed by the operator will be reverted in the process.
- Use drbd-reactor to expose Prometheus endpoints on each satellite.
- Configure `ServiceMonitor` resources if they are supported by the cluster (i.e.
  Prometheus Operator is configured)

Changed:

- CSI Nodes no longer use `hostNetwork: true`. The pods already got the correct hostname via the downwardAPI and do not talk to DRBD's netlink interface directly.
- External: CSI snapshotter subchart now packages `v1` CRDs. Fixes deprecation warnings when installing the snapshot controller.
- Default images:
  * Linstor Server v1.12.3
  * Linstor CSI v0.13.0
  * DRBD v9.0.29

Linstor CSI Driver
------------------

Added:

- Allow setting arbitrary properties using a parameter prefixed with the `property.linstor.csi.linbit.com` namespace.

Changed:

- Generate a resource group name if none was provided. The name is generated based on the provided parameters. Since storage classes are immutable, volumes provisioned using the same storage class will always receive the same resource group name.

Removed:

- Resource groups no longer update existing properties or remove additional properties if they are not set in the storage class. A resource group is immutable from linstor-csi's point of view.

Fixed:

- Failed snapshots are now cleaned up and retried properly. This mitigates an issue whereby a snapshot failed for one reason or another, but the snapshot controller continuously polled it for "completion".

From rene.peinthor at linbit.com Tue May 18 14:35:17 2021
From: rene.peinthor at linbit.com (Rene Peinthor)
Date: Tue, 18 May 2021 14:35:17 +0200
Subject: [DRBD-announce] linstor-server release 1.12.4

Hi!

There were still some small problems with the 1.12 release, leading to this small bugfix release.
linstor-server 1.12.4
---------------------
 * Fix controller disable auto verify algorithm
 * Update auto-verify algorithm list to not use algorithms with optional keys
 * Exos: fixed setting correct properties
 * Exos: recache settings
 * Exos: support OverrideVlmId property

https://www.linbit.com/downloads/linstor/linstor-server-1.12.4.tar.gz

Linstor PPA:
https://launchpad.net/~linbit/+archive/ubuntu/linbit-drbd9-stack

Cheers,
Rene

From roland.kammerer at linbit.com Mon May 31 15:35:40 2021
From: roland.kammerer at linbit.com (Roland Kammerer)
Date: Mon, 31 May 2021 15:35:40 +0200
Subject: [DRBD-announce] drbd-utils v9.18.0-rc.1
Message-ID: <20210531133540.GE1398@rck.sh>

Dear DRBD users,

this is the first release candidate of drbd-utils 9.18.0. This release is mainly triggered by features/scripts that drbd-reactor will depend on soon, but it contains some additions of its own as well. There is no urgent need to upgrade.

Notable changes:

- As I mentioned previously in a discussion on drbd-user, the helper targets for rpm building are now gone. This allowed us to simplify the make magic substantially. If you know rpm building, you most likely have not used these "helpers" anyway. But if you did, you can find them in the git history/last release. Feel free to copy them to a GNUmakefile or whatever works for you.

- drbd.service is basically a wrapper around the old and not too pretty init shell script. This release contains a set of proper service files that should be used instead. Documentation is currently "sparse" (i.e., it does not exist). We will add documentation within drbd-utils and the users-guide soon. If you know systemd, the information in the commit message should actually be good enough:
  https://github.com/LINBIT/drbd-utils/commit/04eb3c13f062bc541683dc6d5f48a37f4a291f67

- The previously mentioned systemd service templates that drbd-reactor and its promoter plugin will use soon.
  This will allow simple HA via systemd without the complexity of Pacemaker. It will even support OCF resource agents, but all of that will be part of the soon-to-be-released RC1 of drbd-reactor v0.4.0.

Regards,
rck

9.18.0-rc.1
-----------
 * build: remove rpm related targets
 * drbdsetup,v84: fix minor compile warnings
 * systemd: resource specific activation
 * systemd: drbd-reactor promoter templates

GIT: https://github.com/LINBIT/drbd-utils/commit/5d8a547e4c0afdbe84409cfdd6431d6082ca5317
PPA: https://launchpad.net/~linbit/+archive/ubuntu/linbit-drbd9-stack/
TGZ: https://linbit.com/downloads/drbd/utils/drbd-utils-9.18.0-rc.1.tar.gz