On Fri, Jan 30, 2009 at 10:02:30AM -0600, Terry Hull wrote:
> I have had a strange problem with DRBD 8.3 twice. The server running as a
> secondary had a disk problem and went diskless. The primary then saw the
> secondary was diskless and showed the transition for the secondary from
> UpToDate to Diskless. However, the primary still had problems with
> timeouts. My question is what do I need to do to allow the primary to run
> after the secondary has a disk problem?
>
> Here is a small part of /var/log/messages on the primary. Also, please note
> the Concurrent local write message:

what is writing to those DRBD?
complete io stack up to application?
usage pattern?

> Jan 29 11:10:32 bg-host-m2 iscsi_trgt: Logical Unit Reset (05) issued on
> tid:3 l
> un:2 by sid:1127000341282880 (Function Complete)
> Jan 29 11:10:32 bg-host-m2 drbd0: istiod3[22036] Concurrent local write
> detected
> ! [DISCARD L] new: 1471221576s +2048; pending: 1471221576s +2048
> Jan 29 11:10:32 bg-host-m2 drbd0: istiod3[22036] Concurrent local write
> detected

please avoid line wraps when pasting log information.

> ! [DISCARD L] new: 3606015618s +19968; pending: 3606015618s +19968
> Jan 29 11:10:57 bg-host-m2 iscsi_trgt: Logical Unit Reset (05) issued on
> tid:3 l
> un:2 by sid:1127000341282880 (Function Complete)
> Jan 29 11:10:59 bg-host-m2 ntpd[6722]: kernel time sync status change
> 4001Jan 29 11:11:06 bg-host-m2 drbd0: Got NegAck packet. Peer is in
> troubles?
> Jan 29 11:11:06 bg-host-m2 drbd0: Got NegAck packet. Peer is in troubles?Jan
> 29 11:11:06 bg-host-m2 drbd0: pdsk( UpToDate -> Diskless )
> Jan 29 11:11:06 bg-host-m2 drbd0: Creating new current UUIDJan 29 11:11:06
> bg-host-m2 drbd0: Got NegAck packet. Peer is in troubles?
>
> Later on the primary:
> an 29 11:11:06 bg-host-m2 drbd0: istiod3[22035] Concurrent local write
> detected
> ! [DISCARD L] new: 3617118144s +3584; pending: 3617118144s +3584
> Jan 29 11:12:05 bg-host-m2 nrpe[18563]: Could not read request from client,
> bail
> ing out...
> Jan 29 11:12:18 bg-host-m2 INFO: task istiod5:22060 blocked for more than
> 120 seconds.

that's not my problem, unless you prove where it blocks,
and that it actually blocks within drbd.

> Jan 29 11:12:18 bg-host-m2 "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jan 29 11:12:18 bg-host-m2 istiod5 D 0000000000000000 0 22060

bleeding edge of iet?
or some home grown patches?

> 2Jan 29 11:12:18 bg-host-m2 ffff88006a523d30 0000000000000046
> 0000000000000806 ff
> ffffffa001764eJan 29 11:12:18 bg-host-m2 ffff88006a51abb0 ffff88006a51b140
> ffff88006a51ade0 00
> 000001a0018206
> Jan 29 11:12:18 bg-host-m2 0000000000000246 ffff88012e5f9c80
> ffff88012e379800 ffff88012c0237f0
> Jan 29 11:12:18 bg-host-m2 Call Trace:
> Jan 29 11:12:18 bg-host-m2 [<ffffffffa001764e>] megasas_make_sgl64+0x46/0x59
> [megaraid_sas]

but, apparently, it blocks in the megasas_make_sgl64,
so nothing we can do about that in drbd.

> Here is the secondary:
> an 29 11:11:06 bg-host-m1 sd 4:0:0:0: [sdb] Device not ready: ASC=0x4
> ASCQ=0x0
> Jan 29 11:11:06 bg-host-m1 end_request: I/O error, dev sdb, sector
> 1216507571
> Jan 29 11:11:06 bg-host-m1 drbd0: disk( UpToDate -> Failed )
> Jan 29 11:11:06 bg-host-m1 drbd0: Local IO failed. Detaching...
> Jan 29 11:11:06 bg-host-m1 drbd0: disk( Failed -> Diskless )
> Jan 29 11:11:06 bg-host-m1 drbd0: Notified peer that my disk is broken.
>
> Then later on the secondary:
> Jan 29 11:13:29 bg-host-m1 INFO: task drbd0_worker:32651 blocked for more
> than 120 seconds.
> Jan 29 11:13:29 bg-host-m1 "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jan 29 11:13:29 bg-host-m1 drbd0_worker D 000000000000000a 0 32651
> 2
> Jan 29 11:13:29 bg-host-m1 ffff88008f589e10 0000000000000046
> ffff8801088f0000 0000000000000000

and where does this one block?
probably also in your megasas thingy?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
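
For anyone collecting the same evidence from their own logs, here is a minimal sketch (not part of the original message) of one way to pull the DRBD state transitions and the kernel hung-task warnings out of /var/log/messages. The regular expressions are assumptions based only on the line shapes quoted in this thread, and the function name `scan` is hypothetical. It expects one log entry per line, so wrapped pastes like the ones above would need to be rejoined first.

```python
import re

# Assumed shape of a DRBD state-change line, taken from the quoted log:
#   "drbd0: pdsk( UpToDate -> Diskless )"
STATE_RE = re.compile(r"(drbd\d+): (\w+)\( (\w+) -> (\w+) \)")

# Assumed shape of the kernel hung-task warning, taken from the quoted log:
#   "INFO: task drbd0_worker:32651 blocked for more than 120 seconds."
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than")


def scan(lines):
    """Return (state_changes, hung_tasks) found in an iterable of log lines.

    state_changes: list of (device, field, old_state, new_state) tuples
    hung_tasks:    list of (task_name, pid) tuples
    """
    state_changes = []
    hung_tasks = []
    for line in lines:
        m = STATE_RE.search(line)
        if m:
            state_changes.append(m.groups())
        m = HUNG_RE.search(line)
        if m:
            hung_tasks.append((m.group(1), int(m.group(2))))
    return state_changes, hung_tasks


if __name__ == "__main__":
    with open("/var/log/messages", errors="replace") as log:
        changes, hung = scan(log)
    for device, field, old, new in changes:
        print(f"{device} {field}: {old} -> {new}")
    for task, pid in hung:
        print(f"hung task: {task} (pid {pid})")
```

This only lists which tasks were reported as blocked and when the disk state changed; the call trace that follows each hung-task warning still has to be read by hand to see where the task actually blocks.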