Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
DRBD Bug report - I/O hangs on primary node Hello everyone, I use DRBD as a backup solution for many servers, one big storage server backs up several smaller servers. SMALL SERVERS My Small servers are KVM Hypervisors. I work with drbd over local storage. HW Configuration: Dual AMD Opteron 6272, 128 GB RAM, Dual 1 Gbps Intel 82576 NICs. SW Configuration: Centos 7.0 Kernel 3.10.0-123.el7.x86_64, DRBD 8.4.7-2 GIT-hash: 15c1a783b673d9c1a2a9f9de38b3d7e98c275709 The DRBD volume on the small servers works over a 4TB LSI Hardware RAID with DRBD external metadata over a SSD partition. I use the DRBD block device as Physical Volume for LVM, where I create Logical Volumes to represent RAW Virtual Hard disks for KVM Virtual Machines. THE BIG BACKUP SERVER The Big server is also Dual AMD Opteron 6272, 128 GB RAM, Dual 1 Gbps Intel 82576 NICs. SW Configuration: Centos 7.0 Kernel 3.10.0-123.el7.x86_64, DRBD 8.4.7-2 GIT-hash: 15c1a783b673d9c1a2a9f9de38b3d7e98c275709 I called it "BIG" because it has 60TB of storage over an Adaptec Hardware RAID. Its role is only to use DRBD to replicate the data from the small servers. I created 15 GPT 4TB partitions over that harware RAID, each one presents the backup volume for one of the small servers. I use also a SSD for external metadata, I have a LVM VG group called "metadata" where I created one Logical Volume for the metadata of each DRBD resource. THE BACKUP PROCESS All DRBD volumes works with protocol A, the small servers always hold the primary role and the corresponding drbd resource works disconnected most of the time. In the backup server I continuously execute a backup script, It iterates over all small servers in order to backup their data, one server at a time, the backup script connects to each small server via SSH, and executes the following commands as root in each small server: drbdadm connect <resource> drbdadm resume-sync <resource> <wait until the resource reaches the state UpToDate/UpToDate> drbdadm pause-sync <resource> drbdadm disconnect <resource> Then it continues to the next small server. This process runs all the time without stops. THE PROBLEM Everything works fine most of the time, but at least once a month i/o stops on at least one small server. (All small servers have experimented the problem). The DRBD resource "hangs" on the primary node and it is not possible to write any data to it. Of course all Virtual Machines running on the Hypervisor stop working and the server's Load Average grows without limit. Operating system boot does not work because it hangs when trying to umount filesystems that were mounted from the DRBD resource. A Hardware Boot is always required to recover control of the system. A normal backup session (DRBD resync) on the small servers generates the following set of messages in the syslog: This is actually the last successful sync session before the failure: May 1 10:48:39 node-80-13 kernel: drbd node-80-13: conn( StandAlone -> Unconnected ) May 1 10:48:39 node-80-13 kernel: drbd node-80-13: Starting receiver thread (from drbd_w_node-80- [3406]) May 1 10:48:39 node-80-13 kernel: drbd node-80-13: receiver (re)started May 1 10:48:39 node-80-13 kernel: drbd node-80-13: conn( Unconnected -> WFConnection ) May 1 10:48:39 node-80-13 kernel: drbd node-80-13: Handshake successful: Agreed network protocol version 101 May 1 10:48:39 node-80-13 kernel: drbd node-80-13: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME. May 1 10:48:39 node-80-13 kernel: drbd node-80-13: conn( WFConnection -> WFReportParams ) May 1 10:48:39 node-80-13 kernel: drbd node-80-13: Starting ack_recv thread (from drbd_r_node-80- [2607]) May 1 10:48:39 node-80-13 kernel: block drbd13: drbd_sync_handshake: May 1 10:48:39 node-80-13 kernel: block drbd13: self C38A5875B169C401:0690DCE7CD1B204B:F1B7EF7B79DF16F3:F1B6EF7B7 9DF16F3 bits:45856 flags:0 May 1 10:48:39 node-80-13 kernel: block drbd13: peer 0690DCE7CD1B204A:0000000000000000:F1B7EF7B79DF16F2:F1B5EF7B7 9DF16F3 bits:0 flags:0 May 1 10:48:39 node-80-13 kernel: block drbd13: uuid_compare()=1 by rule 70 May 1 10:48:39 node-80-13 kernel: block drbd13: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent ) May 1 10:48:39 node-80-13 kernel: block drbd13: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 8123(2), tota l 8123; compression: 100.0% May 1 10:48:39 node-80-13 kernel: block drbd13: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 8123(2), t otal 8123; compression: 100.0% May 1 10:48:39 node-80-13 kernel: block drbd13: helper command: /sbin/drbdadm before-resync-source minor-13 May 1 10:48:39 node-80-13 kernel: block drbd13: helper command: /sbin/drbdadm before-resync-source minor-13 exit code 0 (0x0) May 1 10:48:39 node-80-13 kernel: block drbd13: conn( WFBitMapS -> PausedSyncS ) pdsk( Consistent -> Inconsistent ) May 1 10:48:39 node-80-13 kernel: block drbd13: Began resync as PausedSyncS (will sync 183436 KB [45859 bits set] ). May 1 10:48:39 node-80-13 kernel: block drbd13: updated sync UUID C38A5875B169C401:0691DCE7CD1B204B:0690DCE7CD1B2 04B:F1B7EF7B79DF16F3 May 1 10:48:42 node-80-13 kernel: block drbd13: conn( PausedSyncS -> SyncSource ) user_isp( 1 -> 0 ) May 1 10:48:42 node-80-13 kernel: block drbd13: Syncer continues. May 1 10:48:47 node-80-13 kernel: block drbd13: Resync done (total 7 sec; paused 2 sec; 36684 K/sec) May 1 10:48:47 node-80-13 kernel: block drbd13: updated UUIDs C38A5875B169C401:0000000000000000:0691DCE7CD1B204B: 0690DCE7CD1B204B May 1 10:48:47 node-80-13 kernel: block drbd13: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate ) May 1 10:48:52 node-80-13 kernel: block drbd13: user_isp( 0 -> 1 ) May 1 10:48:54 node-80-13 kernel: drbd node-80-13: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown ) May 1 10:48:54 node-80-13 kernel: drbd node-80-13: ack_receiver terminated May 1 10:48:54 node-80-13 kernel: drbd node-80-13: Terminating drbd_a_node-80- May 1 10:48:54 node-80-13 kernel: block drbd13: new current UUID 598C5E0CEB5F8A4D:C38A5875B169C401:0691DCE7CD1B20 4B:0690DCE7CD1B204B May 1 10:48:54 node-80-13 kernel: drbd node-80-13: Connection closed May 1 10:48:54 node-80-13 kernel: drbd node-80-13: conn( Disconnecting -> StandAlone ) May 1 10:48:54 node-80-13 kernel: drbd node-80-13: receiver terminated May 1 10:48:54 node-80-13 kernel: drbd node-80-13: Terminating drbd_r_node-80- However the next backup session did not finish successfully: May 1 10:52:32 node-80-13 kernel: drbd node-80-13: conn( StandAlone -> Unconnected ) May 1 10:52:32 node-80-13 kernel: drbd node-80-13: Starting receiver thread (from drbd_w_node-80- [3406]) May 1 10:52:32 node-80-13 kernel: drbd node-80-13: receiver (re)started May 1 10:52:32 node-80-13 kernel: drbd node-80-13: conn( Unconnected -> WFConnection ) May 1 10:52:33 node-80-13 kernel: drbd node-80-13: Handshake successful: Agreed network protocol version 101 May 1 10:52:33 node-80-13 kernel: drbd node-80-13: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME. May 1 10:52:33 node-80-13 kernel: drbd node-80-13: conn( WFConnection -> WFReportParams ) May 1 10:52:33 node-80-13 kernel: drbd node-80-13: Starting ack_recv thread (from drbd_r_node-80- [5857]) In this moment i/o hangs and the DRBD resource remains forever in the WFReportParams state. From the BIG Server's perspective this is a successful session: May 1 10:00:02 storage-80-101 kernel: drbd node-80-13: Handshake successful: Agreed network protocol version 101 May 1 10:00:02 storage-80-101 kernel: drbd node-80-13: Feature flags enabled on protocol level: 0x7 TRIM THIN_RES YNC WRITE_SAME. May 1 10:00:02 storage-80-101 kernel: drbd node-80-13: conn( WFConnection -> WFReportParams ) May 1 10:00:02 storage-80-101 kernel: drbd node-80-13: Starting ack_recv thread (from drbd_r_node-80- [9039]) May 1 10:00:02 storage-80-101 kernel: block drbd13: drbd_sync_handshake: May 1 10:00:02 storage-80-101 kernel: block drbd13: self 86CB6F103DB64E5A:0000000000000000:86A850CE30BAFCC2:46121 5DE40F8C736 bits:0 flags:0 May 1 10:00:02 storage-80-101 kernel: block drbd13: peer BAA6F74140F9B5A1:86CB6F103DB64E5B:86A850CE30BAFCC2:46121 5DE40F8C737 bits:51667 flags:0 May 1 10:00:02 storage-80-101 kernel: block drbd13: uuid_compare()=-1 by rule 50 May 1 10:00:02 storage-80-101 kernel: block drbd13: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 ) May 1 10:00:02 storage-80-101 kernel: block drbd13: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 8985(3 ), total 8985; compression: 100.0% May 1 10:00:02 storage-80-101 kernel: block drbd13: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 8985(3), total 8985; compression: 100.0% May 1 10:00:02 storage-80-101 kernel: block drbd13: conn( WFBitMapT -> WFSyncUUID ) May 1 10:00:02 storage-80-101 kernel: block drbd13: updated sync uuid 86CC6F103DB64E5A:0000000000000000:86A850CE3 0BAFCC2:461215DE40F8C736 May 1 10:00:02 storage-80-101 kernel: block drbd13: helper command: /sbin/drbdadm before-resync-target minor-13 May 1 10:00:02 storage-80-101 kernel: block drbd13: helper command: /sbin/drbdadm before-resync-target minor-13 e xit code 0 (0x0) May 1 10:00:02 storage-80-101 kernel: block drbd13: conn( WFSyncUUID -> PausedSyncT ) disk( Outdated -> Inconsist ent ) May 1 10:00:02 storage-80-101 kernel: block drbd13: Began resync as PausedSyncT (will sync 209244 KB [52311 bits set]). May 1 10:00:04 storage-80-101 kernel: block drbd13: conn( PausedSyncT -> SyncTarget ) peer_isp( 1 -> 0 ) May 1 10:00:04 storage-80-101 kernel: block drbd13: Syncer continues. May 1 10:00:31 storage-80-101 kernel: block drbd13: Resync done (total 29 sec; paused 2 sec; 7748 K/sec) May 1 10:00:31 storage-80-101 kernel: block drbd13: updated UUIDs BAA6F74140F9B5A0:0000000000000000:86CC6F103DB64E5A:86CB6F103DB64E5B May 1 10:00:31 storage-80-101 kernel: block drbd13: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) May 1 10:00:31 storage-80-101 kernel: block drbd13: helper command: /sbin/drbdadm after-resync-target minor-13 May 1 10:00:31 storage-80-101 kernel: block drbd13: helper command: /sbin/drbdadm after-resync-target minor-13 exit code 0 (0x0) May 1 10:00:34 storage-80-101 kernel: block drbd13: conn( Connected -> Behind ) disk( UpToDate -> Outdated ) May 1 10:00:35 storage-80-101 kernel: block drbd13: peer_isp( 0 -> 1 ) May 1 10:00:35 storage-80-101 kernel: block drbd13: updated sync uuid 85C59D47AB684CF0:0000000000000000:86CC6F103DB64E5A:86CB6F103DB64E5B May 1 10:00:35 storage-80-101 kernel: block drbd13: helper command: /sbin/drbdadm before-resync-target minor-13 May 1 10:00:35 storage-80-101 kernel: block drbd13: helper command: /sbin/drbdadm before-resync-target minor-13 exit code 0 (0x0) May 1 10:00:35 storage-80-101 kernel: block drbd13: conn( Behind -> PausedSyncT ) disk( Outdated -> Inconsistent ) May 1 10:00:35 storage-80-101 kernel: block drbd13: Began resync as PausedSyncT (will sync 7860 KB [1965 bits set]). May 1 10:00:36 storage-80-101 kernel: block drbd13: conn( PausedSyncT -> Behind ) May 1 10:00:37 storage-80-101 kernel: block drbd13: peer_isp( 1 -> 0 ) May 1 10:00:37 storage-80-101 kernel: drbd node-80-13: peer( Primary -> Unknown ) conn( Behind -> TearDown ) pdsk( UpToDate -> DUnknown ) May 1 10:00:37 storage-80-101 kernel: drbd node-80-13: ack_receiver terminated May 1 10:00:37 storage-80-101 kernel: drbd node-80-13: Terminating drbd_a_node-80- May 1 10:00:37 storage-80-101 kernel: drbd node-80-13: Connection closed May 1 10:00:37 storage-80-101 kernel: drbd node-80-13: conn( TearDown -> Unconnected ) May 1 10:00:37 storage-80-101 kernel: drbd node-80-13: receiver terminated May 1 10:00:37 storage-80-101 kernel: drbd node-80-13: Restarting receiver thread May 1 10:00:37 storage-80-101 kernel: drbd node-80-13: receiver (re)started May 1 10:00:37 storage-80-101 kernel: drbd node-80-13: conn( Unconnected -> WFConnection ) And this is the failure from the BIG server's perspective: May 1 10:52:33 storage-80-101 kernel: drbd node-80-13: Handshake successful: Agreed network protocol version 101 May 1 10:52:33 storage-80-101 kernel: drbd node-80-13: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME. May 1 10:52:33 storage-80-101 kernel: drbd node-80-13: conn( WFConnection -> WFReportParams ) May 1 10:52:33 storage-80-101 kernel: drbd node-80-13: Starting ack_recv thread (from drbd_r_node-80- [9039]) May 1 10:52:33 storage-80-101 kernel: block drbd13: drbd_sync_handshake: May 1 10:52:33 storage-80-101 kernel: block drbd13: self C38A5875B169C400:0000000000000000:0691DCE7CD1B204A:0690DCE7CD1B204B bits:0 flags:0 May 1 10:52:33 storage-80-101 kernel: block drbd13: peer 598C5E0CEB5F8A4D:C38A5875B169C401:0691DCE7CD1B204B:0690DCE7CD1B204B bits:44218 flags:0 May 1 10:52:33 storage-80-101 kernel: block drbd13: uuid_compare()=-1 by rule 50 May 1 10:52:33 storage-80-101 kernel: block drbd13: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 ) CONFIGURATION DETAILS global_common.conf (for both small and big servers) global { usage-count yes; } common { disk { resync-rate 200M; no-disk-barrier; al-extents 3389; disk-barrier no; } syncer { rate 200M; } net { protocol A; sndbuf-size 4000k; max-buffers 8000; max-epoch-size 8000; on-congestion pull-ahead; congestion-fill 3600k; congestion-extents 3000; } } An example resource file: resource node-80-13 { on small_server { device /dev/drbd13; disk /dev/sda1; address 10.80.1.13:8013; meta-disk /dev/sdb2; } on big_server { device /dev/drbd13; disk /dev/sda13; address 10.80.1.101:8013; meta-disk /dev/meta/node-80-13; } } LAST COMMENTS We have experimented this problem for about 3 years, we have installed all DRBD releases from the 8.4.X branch during that time, but the problem remains. We have reduced the frequency of the problem by only running the backup process when Load Average, CPU usage and i/o are in low levels in the small servers. So our backup script checks all those parameters via SNMP before executing the sync process. If you need any additional information, please let me know. Thanks a lot for your time and help. Best regards, Miguel Castellanos -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20160502/c63485cf/attachment.htm>