[DRBD-user] Problem with disk errors

Lars Ellenberg lars.ellenberg at linbit.com
Mon Feb 2 09:42:39 CET 2009


On Fri, Jan 30, 2009 at 10:02:30AM -0600, Terry Hull wrote:
> I have had a strange problem with DRBD 8.3 twice.  The server running as a
> secondary had a disk problem and went diskless.  The primary then saw the
> secondary was diskless and showed the transition for the secondary from
> UpToDate to Diskless.  However, the primary still had problems with
> timeouts. My question is what do I need to do to allow the primary to run
> after the secondary has a disk problem?
> 
> Here is a small part of /var/log/messages on the primary.  Also, please note
> the Concurrent local write message:

what is writing to those DRBD devices?
what is the complete io stack, up to the application?
what is the usage pattern?

> Jan 29 11:10:32 bg-host-m2 iscsi_trgt: Logical Unit Reset (05) issued on
> tid:3 l
> un:2 by sid:1127000341282880 (Function Complete)
> Jan 29 11:10:32 bg-host-m2 drbd0: istiod3[22036] Concurrent local write
> detected
> ! [DISCARD L] new: 1471221576s +2048; pending: 1471221576s +2048
> Jan 29 11:10:32 bg-host-m2 drbd0: istiod3[22036] Concurrent local write
> detected

please avoid line wraps when pasting log information.

> ! [DISCARD L] new: 3606015618s +19968; pending: 3606015618s +19968
> Jan 29 11:10:57 bg-host-m2 iscsi_trgt: Logical Unit Reset (05) issued on
> tid:3 l
> un:2 by sid:1127000341282880 (Function Complete)
> Jan 29 11:10:59 bg-host-m2 ntpd[6722]: kernel time sync status change 4001
> Jan 29 11:11:06 bg-host-m2 drbd0: Got NegAck packet. Peer is in troubles?
> Jan 29 11:11:06 bg-host-m2 drbd0: Got NegAck packet. Peer is in troubles?
> Jan 29 11:11:06 bg-host-m2 drbd0: pdsk( UpToDate -> Diskless )
> Jan 29 11:11:06 bg-host-m2 drbd0: Creating new current UUID
> Jan 29 11:11:06 bg-host-m2 drbd0: Got NegAck packet. Peer is in troubles?
> 
> Later on the primary:
> Jan 29 11:11:06 bg-host-m2 drbd0: istiod3[22035] Concurrent local write
> detected
> ! [DISCARD L] new: 3617118144s +3584; pending: 3617118144s +3584
> Jan 29 11:12:05 bg-host-m2 nrpe[18563]: Could not read request from client,
> bail
> ing out...
> Jan 29 11:12:18 bg-host-m2 INFO: task istiod5:22060 blocked for more than
> 120 seconds.

that's not my problem, unless you can show where it blocks, and that it
actually blocks within drbd.

> Jan 29 11:12:18 bg-host-m2 "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jan 29 11:12:18 bg-host-m2 istiod5       D 0000000000000000     0 22060

is this the bleeding edge of IET (iscsi_trgt)?
or does it carry some home-grown patches?

> 2
> Jan 29 11:12:18 bg-host-m2 ffff88006a523d30 0000000000000046 0000000000000806 ffffffffa001764e
> Jan 29 11:12:18 bg-host-m2 ffff88006a51abb0 ffff88006a51b140 ffff88006a51ade0 00000001a0018206
> Jan 29 11:12:18 bg-host-m2 0000000000000246 ffff88012e5f9c80 ffff88012e379800 ffff88012c0237f0
> Jan 29 11:12:18 bg-host-m2 Call Trace:
> Jan 29 11:12:18 bg-host-m2 [<ffffffffa001764e>] megasas_make_sgl64+0x46/0x59
> [megaraid_sas]


but, apparently, it blocks in megasas_make_sgl64 (the megaraid_sas driver),
so there is nothing we can do about that in drbd.
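
To see at a glance where each blocked task sits, the first call-trace frame
of every hung-task report can be pulled out of the log. The following is only
a minimal Python sketch, not anything shipped with drbd: the function name
top_frames and both regular expressions are assumptions, modelled on the log
lines quoted in this thread, and they expect each frame on one unwrapped line.
The first frame is a hint about where a task is stuck, not proof.

```python
import re

# Assumed formats, modelled on the log excerpts quoted in this thread.
HUNG_RE = re.compile(r"INFO: task (?P<task>\S+):(?P<pid>\d+) blocked")
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"      # address, optional "? " marker
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"    # symbol+offset/size
    r"(?:\s+\[(?P<mod>[\w-]+)\])?"                 # optional [module]
)


def top_frames(lines):
    """Yield (task, pid, symbol, module) for each hung-task report.

    Only the first frame after "Call Trace:" is reported; module is None
    when the frame carries no [module] suffix on the same line.
    """
    task = None
    in_trace = False
    for line in lines:
        hung = HUNG_RE.search(line)
        if hung:
            # A new report starts; forget any unfinished one.
            task = (hung.group("task"), int(hung.group("pid")))
            in_trace = False
            continue
        if task and "Call Trace:" in line:
            in_trace = True
            continue
        if task and in_trace:
            frame = FRAME_RE.search(line)
            if frame:
                yield task + (frame.group("sym"), frame.group("mod"))
                task, in_trace = None, False


# Hypothetical usage with unwrapped lines taken from the excerpt above.
sample = [
    "Jan 29 11:12:18 bg-host-m2 INFO: task istiod5:22060 blocked for more than 120 seconds.",
    "Jan 29 11:12:18 bg-host-m2 Call Trace:",
    "Jan 29 11:12:18 bg-host-m2 [<ffffffffa001764e>] megasas_make_sgl64+0x46/0x59 [megaraid_sas]",
]
for name, pid, sym, mod in top_frames(sample):
    print(f"{name}[{pid}] blocked in {sym} [{mod}]")
```

Feeding it the complete /var/log/messages from both nodes would show whether
any blocked task has a drbd function as its first frame.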

> Here is the secondary:
> Jan 29 11:11:06 bg-host-m1 sd 4:0:0:0: [sdb] Device not ready: ASC=0x4
> ASCQ=0x0
> Jan 29 11:11:06 bg-host-m1 end_request: I/O error, dev sdb, sector
> 1216507571
> Jan 29 11:11:06 bg-host-m1 drbd0: disk( UpToDate -> Failed )
> Jan 29 11:11:06 bg-host-m1 drbd0: Local IO failed. Detaching...
> Jan 29 11:11:06 bg-host-m1 drbd0: disk( Failed -> Diskless )
> Jan 29 11:11:06 bg-host-m1 drbd0: Notified peer that my disk is broken.
> 
> Then later on the secondary:
> Jan 29 11:13:29 bg-host-m1 INFO: task drbd0_worker:32651 blocked for more
> than 120 seconds.
> Jan 29 11:13:29 bg-host-m1 "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jan 29 11:13:29 bg-host-m1 drbd0_worker  D 000000000000000a     0 32651
> 2
> Jan 29 11:13:29 bg-host-m1 ffff88008f589e10 0000000000000046
> ffff8801088f0000 0000000000000000


and where does this one block?
probably also in your megasas thingy?


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
