On Tue, Dec 02, 2008 at 04:49:18PM +0000, NM wrote:
> (I apologize for the dupe, I just posted this on the devel list but I
> intended to post this here)
>
>
> A weird problem hit me today: I changed the MTU on the standby node in an
> active/passive PG cluster based on drbd; this caused a freeze of exactly
> 15 min on the drbd device, during which all postgres threads couldn't
> commit.
>
> Any idea why the timeout was so long?
nope.
> Note that the two nodes are in separate locations, linked by a (currently
> mostly idle) 100Mbps bridge.
>
>
> This is the last message from postgres in /var/log/messages:
>
> Dec 2 10:20:18 alice postgres[28729]: [38440-1] 2008-12-02 10:20:18 GMT
> radiusdb 192.168.0.51 28729 48fff5bc.7039 SELECTLOG: duration: 0.183 ms
>
> Nothing happens for 15 mins, until this shows up:
>
> Dec 2 10:35:12 alice kernel: drbd1: sock_recvmsg returned -110
-110 is -ETIMEDOUT.
Hm.
That should have timed out within 6 seconds, as that is our default timeout.
Some strangeness in the TCP stack, I guess.
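
For reference, a rough sketch of where that timeout lives in drbd.conf.
The values shown are what I understand to be the 8.x defaults (please
check the drbd.conf man page for your version); none of them are set in
the configuration quoted further down, so the defaults would apply there:

```
net {
    timeout      60;   # unit is 0.1 s, so 60 = 6 seconds without an expected reply
    connect-int  10;   # seconds between connection attempts
    ping-int     10;   # seconds of idle time before a keep-alive ping is sent
    ko-count      0;   # 0 disables expelling a peer that stalls write requests
}
```

Note that timeout is expected to be lower than both connect-int and
ping-int.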
> Dec 2 10:35:12 alice kernel: drbd1: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
> Dec 2 10:35:12 alice kernel: drbd1: short read expecting header on sock: r=-110
> Dec 2 10:35:12 alice kernel: drbd1: asender terminated
> Dec 2 10:35:12 alice kernel: drbd1: Terminating asender thread
> Dec 2 10:35:12 alice kernel: drbd1: Creating new current UUID
> Dec 2 10:35:12 alice kernel: drbd1: Writing meta data super block now.
> Dec 2 10:35:12 alice kernel: drbd1: tl_clear()
> Dec 2 10:35:12 alice kernel: drbd1: Connection closed
> Dec 2 10:35:12 alice kernel: drbd1: conn( BrokenPipe -> Unconnected )
> Dec 2 10:35:12 alice kernel: drbd1: receiver terminated
> Dec 2 10:35:12 alice kernel: drbd1: receiver (re)started
>
> This is:
>
> # uname -a
> Linux alkaid 2.6.18-92.1.13.el5 #1 SMP Thu Sep 4 03:51:21 EDT 2008 x86_64
> x86_64 x86_64 GNU/Linux
> # rpm -qa |grep drbd
> drbd-km-2.6.18_92.1.13.el5-8.2.6-3
> drbd-8.2.6-3
>
>
> Here is my drbd.conf:
>
> common {
>   protocol C;
>   startup {
>     wfc-timeout 10;
>     degr-wfc-timeout 10;
>   }
>
>   disk {
>     on-io-error detach;
>   }
>
>   net {
>     cram-hmac-alg "sha1";
>     shared-secret "xxxxxxx";
>   }
>
>   syncer {
>     rate 10M;
>     verify-alg md5;
>   }
> }
>
> resource rb {
>
>   on alice {
>     device /dev/drbd1;
>     disk /dev/System/data_share;
>     address 192.168.5.21:7789;
>     meta-disk internal;
>   }
>   on bob {
>     device /dev/drbd1;
>     disk /dev/System/data_share;
>     address 192.168.5.22:7789;
>     meta-disk internal;
>   }
> }
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed