[DRBD-user] DRBD device hang issue

박기혁 korea.oops at gmail.com
Mon Nov 23 06:15:42 CET 2020


Hello, Community

My system is using Pacemaker + DRBD + MySQL DB.
I am seeing something unusual on my system.

kernel version: 3.10.0-693.el7.x86_64
drbd version: drbd90-utils-9.0.0-1.el7.elrepo.x86_64
kmod-drbd90-9.0.9-1.el7_4.elrepo.x86_64
DB version: mariadb-10.3.22

Issue Time: October 15, 2020, 18:13:47 to 18:13:55

   - When monitoring with iostat, the /dev/drbd0 device shows 100%
   utilization even though it is not completing any I/O.
   command: iostat -td 1 -x

10/15/2020 06:13:47 PM
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 6.00 0.00 24.00 8.00 0.00 0.17 0.00 0.17 0.17 0.10
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 6.00 0.00 24.00 8.00 0.00 0.17 0.00 0.17 0.17 0.10
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 3.00 0.00 1.50 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
drbd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 100.00 <<----- **

10/15/2020 06:13:52 PM
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
drbd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 100.00

10/15/2020 06:13:53 PM
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
drbd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 100.00

10/15/2020 06:13:54 PM
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 6.00 0.00 24.00 8.00 0.00 0.67 0.00 0.67 0.67 0.40
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 6.00 0.00 24.00 8.00 0.00 0.67 0.00 0.67 0.67 0.40
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 3.00 0.00 1.50 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
drbd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 100.00
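
The pattern in the samples above (%util at 100, avgqu-sz at 1.00, but r/s and w/s at zero) can be checked mechanically. The following is a minimal Python sketch, not a tested monitoring tool: the function names and the 99% threshold are hypothetical, and the column order is simply copied from the iostat header shown above.

```python
# Column order copied from the iostat -x header in the samples above.
COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rkB/s", "wkB/s",
           "avgrq-sz", "avgqu-sz", "await", "r_await", "w_await",
           "svctm", "%util"]


def parse_row(line):
    """Split one iostat -x device row into (device, {column: float})."""
    parts = line.split()
    device, values = parts[0], parts[1:1 + len(COLUMNS)]
    if len(values) != len(COLUMNS):
        raise ValueError("unexpected column count: %r" % line)
    return device, dict(zip(COLUMNS, map(float, values)))


def looks_stalled(stats, util_threshold=99.0):
    """Hypothetical heuristic: busy device that completes no I/O.

    True when %util is near 100, something is queued, and no reads
    or writes completed during the interval.
    """
    return (stats["%util"] >= util_threshold
            and stats["avgqu-sz"] > 0
            and stats["r/s"] == 0
            and stats["w/s"] == 0)


# Two rows taken from the 06:13:47 PM sample above.
sample = """\
sda 0.00 0.00 0.00 6.00 0.00 24.00 8.00 0.00 0.17 0.00 0.17 0.17 0.10
drbd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 100.00
"""

stalled = [dev for dev, stats in map(parse_row, sample.splitlines())
           if looks_stalled(stats)]
print(stalled)
```

With the two sample rows, only drbd0 matches: one request is outstanding for the whole interval and nothing completes.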

   - When monitoring the DRBD status at the same time, the device reports
   upper-pending:1, i.e. one request is pending.
   exists resource name:drbd01 role:Primary suspended:no write-ordering:flush
   exists connection name:drbd01 peer-node-id:2 conn-name:node2 connection:Connected role:Secondary congested:no
   exists device name:drbd01 volume:0 minor:0 disk:UpToDate client:no size:1610559452 read:17730265 written:69192955 al-writes:16100 bm-writes:0 upper-pending:1 lower-pending:0 al-suspended:no blocked:no
   exists peer-device name:drbd01 peer-node-id:2 conn-name:node2 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no received:8483 sent:69184366 out-of-sync:0 *pending:1* unacked:0
   exists -


   - upper-pending (application pending) : Number of block I/O requests
   forwarded to DRBD, but not yet answered by DRBD
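
To pick the non-zero pending counters out of status output in the key:value form shown above, something like the following Python sketch could be used. It assumes one "exists ..." record per line (the wrapped lines joined back together); the function names and the list of counters to watch are hypothetical choices, not part of any DRBD tool.

```python
# Counters to watch; the names are taken from the status output above.
COUNTERS = ("upper-pending", "lower-pending", "pending", "unacked")


def parse_record(line):
    """Parse one 'exists <kind> key:value ...' record into (kind, dict).

    Returns None for lines that are not 'exists' records. The '*'
    characters are emphasis markers from the mail and are stripped.
    """
    tokens = line.replace("*", "").split()
    if len(tokens) < 2 or tokens[0] != "exists":
        return None
    fields = dict(tok.split(":", 1) for tok in tokens[2:] if ":" in tok)
    return tokens[1], fields


def nonzero_counters(text):
    """Return (kind, counter, value) for every watched counter != 0."""
    found = []
    for line in text.splitlines():
        record = parse_record(line)
        if record is None:
            continue
        kind, fields = record
        for name in COUNTERS:
            value = int(fields.get(name, 0))
            if value != 0:
                found.append((kind, name, value))
    return found


# Shortened copies of the device and peer-device records above.
sample = """\
exists device name:drbd01 volume:0 minor:0 disk:UpToDate upper-pending:1 lower-pending:0 al-suspended:no blocked:no
exists peer-device name:drbd01 peer-node-id:2 conn-name:node2 volume:0 out-of-sync:0 *pending:1* unacked:0
exists -
"""

for kind, name, value in nonzero_counters(sample):
    print(kind, name, value)
```

For the sample this reports upper-pending on the device and pending on the peer-device, while lower-pending and unacked stay at zero. One possible reading is that the outstanding request is waiting on the peer rather than on the local disk, but the output alone does not prove that.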


   - The MySQL slow query log shows that the response to the query below
   arrived about 9 seconds after the request, once the I/O hang had ended.
   User at Host: nodeapp[nodeapp] @ [100.100.100.142]
   Thread_id: 7879 Schema: MYMQDB QC_hit: No
   Query_time: 9.492522 Lock_time: 0.000058 Rows_sent: 0 Rows_examined: 1
   Rows_affected: 1 Bytes_sent: 52
   use MYMQDB;
   SET timestamp=1602753235;
   UPDATE ACTIVEMQ_LOCK SET BROKER_NAME='node2', TIME=1602753250881 WHERE
   BROKER_NAME='node2' AND ID = 1;
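
The timing in the slow log entry above can be lined up with the iostat window by simple arithmetic. The Python sketch below does this under two loudly stated assumptions: the server clock is taken to be Korea Standard Time (UTC+9), and it is not assumed whether "SET timestamp" marks the start or the end of the statement on this server version, so both candidate windows are computed.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server runs on Korea Standard Time (UTC+9).
KST = timezone(timedelta(hours=9))

set_timestamp = 1602753235   # "SET timestamp=..." from the slow log
query_time = 9.492522        # "Query_time" from the slow log, in seconds

logged = datetime.fromtimestamp(set_timestamp, tz=KST)
duration = timedelta(seconds=query_time)

# Two candidate execution windows, depending on what the timestamp means.
if_start = (logged, logged + duration)   # timestamp = statement start
if_end = (logged - duration, logged)     # timestamp = statement end

# Hang window reported from iostat: 18:13:47 to 18:13:55.
hang = (datetime(2020, 10, 15, 18, 13, 47, tzinfo=KST),
        datetime(2020, 10, 15, 18, 13, 55, tzinfo=KST))


def overlap_seconds(a, b):
    """Length in seconds of the overlap between two (start, end) windows."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return max((end - start).total_seconds(), 0.0)


print("SET timestamp in KST:", logged.strftime("%Y-%m-%d %H:%M:%S"))
print("overlap if timestamp is the start:", overlap_seconds(if_start, hang))
print("overlap if timestamp is the end:  ", overlap_seconds(if_end, hang))
```

Under the KST assumption the timestamp converts to 18:13:55, the end of the reported hang. If it marks the end of the statement, the query ran from about 18:13:45.5 to 18:13:55 and covers the whole 8-second hang window.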
   - DRBD configuration:
   disk {
   on-io-error detach;
   no-disk-flushes ;
   no-disk-barrier;
   c-plan-ahead 0;
   c-fill-target 24M;
   c-min-rate 80M;
   c-max-rate 720M;
   }
   net {
   max-buffers 36k;
   sndbuf-size 1024k ;
   rcvbuf-size 2048k;
   }

In conclusion, %util on the DRBD device is at 100% although no reads or
writes are completing at that time, and the duration of the MySQL slow query
matches the duration of the 100% utilization.

Does anyone know of a similar case, or a solution to this problem?

The hang does not occur when DRBD is run as a single node, without the peer.