Hello,

Thank you for taking the time to review my (rather lengthy) description.

> From your config below:
>> local-io-error "/usr/lib/drbd/notify-io-error.sh; drbdadm detach $DRBD_RESOURCE";
> VERY bad idea.
> Synchronously calling drbdadm from inside a synchronous handler will block
> (until that drbdadm will eventually timeout, 121 seconds later).
> And it is absolutely useless: DRBD will detach after local io error all by itself.

This is not really obvious from the documentation:

    on-io-error handler
        handler is taken, if the lower level device reports io-errors to the upper layers.
        handler may be pass_on, call-local-io-error *or* detach.

If on-io-error is set to call-local-io-error, DRBD will also detach?

On a similar note, I've also been abusing the out-of-sync handler pretty much the same way: to issue a disconnect and reconnect on out of sync. How would DRBD need to be configured to automatically do a disconnect/reconnect after verify has found out-of-sync blocks?

I don't really remember where I got the idea to use the handlers, but a quick Google shows this:

"The reason of having out-of-sync handler is exactly tp provide possibility of automation."

Perhaps it should be noted stronger in the documentation not to change DRBD states in the handlers?

> I would expect that you have a few
> "INFO: task drbd_w_... blocked for more than 120 seconds."
> and then call traces in the kernel log as well?

Not really, but I just found out my hung_task_warnings is set to 0. No idea why, as I can't remember setting this, nor does a search in /etc find anything related to it. I'll need to investigate more.

> There.
> This is actually something we may need to fix.
> But that's in fact a multiple error scenario including misconfiguration we did not think of yet.

There always is a multiple error scenario ;-)

> Because we did not anticipate that you would block the worker,
> and did call the handler before we notify the peer.
> *again* your blocking drbdadm detach from the local-io-error handler is the trigger.
> Don't do that. You also should even background the "notify", if you really insist on using it.

A local-io-error handler in the documentation example even has 'halt -f' in the sequence. How does that run before notifying the peer? However, no documentation examples show handlers being backgrounded.

> really? for 7 megabyte you want to pull ahead already?
> you basically won't have a consistent secondary, ever.

Another problem for me with DRBD. Without proxy, DRBD even in protocol A blocks when the network buffer is full. If the buffer is set to 10MB, pull-ahead needs to happen before the buffer is full, or DRBD blocks. Our write load is low enough not to trigger a resync too often, but when it does I want it to pull ahead, not slow down to WAN link speed. Somehow I don't feel comfortable with the idea of having hundreds of MBs of network buffer in the kernel. In this case, the secondary is more a standby to try to have the last possible data, so I don't mind it being a bit out of sync from time to time.

Regards,
Saso Slavicic
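
For reference, the settings discussed above could be sketched roughly as follows. This is only an illustration under assumptions: it uses DRBD 8.4-style syntax, the resource name r0 is hypothetical, and the sizes are simply the ones mentioned in this thread (a 10MB send buffer, pull-ahead at 7MB), not recommendations. The option names should be checked against the drbd.conf man page for the installed version.

```
# Sketch only: hypothetical resource name, DRBD 8.4 option names assumed.
resource r0 {
    net {
        protocol A;
        sndbuf-size 10M;           # the 10MB network buffer mentioned above
        on-congestion pull-ahead;  # go ahead of the peer instead of blocking writes
        congestion-fill 7M;        # has to stay below sndbuf-size when no proxy is used
    }
    disk {
        on-io-error detach;        # DRBD detaches by itself on a local I/O error
    }
}
```

With on-io-error set to detach there is no drbdadm call inside any handler, so nothing in this fragment can block the worker in the way described in the quoted reply.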