[DRBD-user] disk-timeout actually in deciseconds?

Lars Ellenberg lars.ellenberg at linbit.com
Wed Nov 26 14:27:09 CET 2014



On Wed, Nov 26, 2014 at 11:26:13AM +0100, Felix Zachlod wrote:
> I am currently investigating a problem with one drbd peer.
> 
> We are running 8.4.5 on this and had configured disk-timeout 300 in
> global-properties.
> 
> On Monday I observed a failing primary but could not see any hints
> for the reason.
> 
> Today I saw this in the log file:
> 
> Nov 26 07:56:33 philippus-arabs kernel: [118541.912505] block drbd10: Local backing device failed to meet the disk-timeout
> Nov 26 07:56:33 philippus-arabs kernel: [118541.912516] block drbd10: disk( UpToDate -> Failed )
> Nov 26 07:56:33 philippus-arabs kernel: [118541.912524] block drbd10: Local IO failed in request_timer_fn. Detaching...
> Nov 26 07:56:33 philippus-arabs kernel: [118541.912597] block drbd10: local WRITE IO error sector 635915776+5 on sdb1
> Nov 26 07:56:33 philippus-arabs kernel: [118541.913113] block drbd10: bitmap WRITE of 0 pages took 0 jiffies
> Nov 26 07:56:33 philippus-arabs kernel: [118541.913121] block drbd10: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Nov 26 07:56:33 philippus-arabs kernel: [118541.913131] block drbd10: disk( Failed -> Diskless )
> Nov 26 07:56:33 philippus-arabs kernel: [118541.913819] block drbd10: receiver updated UUIDs to effective data uuid: C43F4D29D7F6A132
> Nov 26 07:56:33 philippus-arabs kernel: [118541.915874] block drbd10: Should have called drbd_al_complete_io(, 635915776, 2560),
> 
> I suspect this might also have happened on Monday, followed by a
> kernel panic; I did not see the screen, and the machine was reset
> after this.
> 
> But I cannot find anything suspicious in the RAID controller log
> (no failing disk, no command timeout, no bus reset), and I have
> never seen an I/O access time on this disk subsystem even close to
> 30s. So I wondered whether I really configured a disk-timeout of 30s
> or rather 3s (which I could imagine occurring in normal operation
> under high I/O load, as this is a hard-disk-backed device that is
> sometimes under heavy random I/O pressure).
> 
> Unfortunately I have no other suspicious information from syslog or
> dmesg, especially not for the first crash, and nothing was logged by
> DRBD either.
> 
> So my questions are:
> 
> - is it conceivable that DRBD detaches a device and is not able to
> log this because a kernel panic happens too quickly afterwards, or
> would DRBD always do it in that order (log, detach, kernel panic)?
> (The syslog is on another backing device, attached to another disk
> controller.)

DRBD "logging" is simply a printk.
Whether that makes it to stable storage via some syslog channel
is no longer under DRBD's control.
Especially if the storage in fact *did* have problems, I think it is
very unlikely that any logging would have made it to disk on that box...

Also: the disk-timeout option is *dangerous* and *may lead to kernel
panic*.  So don't use it (unless you are *very* certain that you know
what you are doing, and have a very good reason to do it).

More details in the man pages of drbd.conf and drbdsetup.

> - is it possible that the documentation is wrong about this param
> being specified in deciseconds?

No, the documentation is *right* and documents correctly that the unit
of this parameter is 0.1 seconds, 1/10 of a second, or 100ms (all the
same).  At least everywhere I looked.
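To make the unit concrete for the setting in question: a minimal sketch, assuming only the 0.1 s unit stated above, of converting a disk-timeout value to seconds, plus a deliberately simplified, hypothetical model of the timeout check. The helper names `disk_timeout_seconds` and `request_timed_out` are illustrative only and are not DRBD code; the assumption that 0 disables the check should be verified against the drbd.conf man page.

```python
# Hypothetical helpers illustrating the disk-timeout unit (1 unit = 0.1 s).
# This is NOT DRBD's implementation, just arithmetic on the documented unit.
DECISECONDS_PER_SECOND = 10


def disk_timeout_seconds(value: int) -> float:
    """Convert a disk-timeout setting (in units of 0.1 s) to seconds."""
    return value / DECISECONDS_PER_SECOND


def request_timed_out(elapsed_s: float, disk_timeout: int) -> bool:
    """Simplified model: True if a local request took longer than disk-timeout.

    Assumption: a value of 0 disables the check entirely.
    """
    if disk_timeout == 0:
        return False
    return elapsed_s > disk_timeout_seconds(disk_timeout)


print(disk_timeout_seconds(300))    # 30.0 -> "disk-timeout 300" means 30 seconds
print(disk_timeout_seconds(30))     # 3.0
print(request_timed_out(5.0, 300))  # False: 5 s is within a 30 s timeout
print(request_timed_out(5.0, 30))   # True: 5 s exceeds a 3 s timeout
```

So with `disk-timeout 300` configured, a local request would have to stall for more than 30 seconds before the timeout fires, not 3.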

Of course there may be bugs in our code, so if you should be able to
reproduce "misbehaviour", let us know.

> After re-attaching the device, the sync went through without
> problems, and the node is up again and serving normally.

-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA  and  Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
