On Tue, Dec 02, 2008 at 04:49:18PM +0000, NM wrote:
> (I apologize for the dupe, I just posted this on the devel list but I
> intended to post this here)
>
>
> A weird problem hit me today: I changed the MTU on the standby node in an
> active/passive PG cluster based on drbd; this caused a freeze of exactly
> 15 min on the drbd device, during which all postgres threads couldn't
> commit.
>
> Any idea why the timeout was so long?

nope.

> Note that the two nodes are in separate locations, linked by a (currently
> mostly idle) 100Mbps bridge.
>
>
> This is the last message from postgres in /var/log/messages:
>
> Dec 2 10:20:18 alice postgres[28729]: [38440-1] 2008-12-02 10:20:18 GMT
> radiusdb 192.168.0.51 28729 48fff5bc.7039 SELECTLOG: duration: 0.183 ms
>
> Nothing happens for 15 mins, until this shows up:
>
> Dec 2 10:35:12 alice kernel: drbd1: sock_recvmsg returned -110

-ETIMEDOUT

hm. should have timed out within 6 seconds,
as that is our default timeout.
strangeness in the tcp stack, I guess.

> Dec 2 10:35:12 alice kernel: drbd1: peer( Secondary -> Unknown ) conn
> ( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
> Dec 2 10:35:12 alice kernel: drbd1: short read expecting header on sock:
> r=-110
> Dec 2 10:35:12 alice kernel: drbd1: asender terminated
> Dec 2 10:35:12 alice kernel: drbd1: Terminating asender thread
> Dec 2 10:35:12 alice kernel: drbd1: Creating new current UUID
> Dec 2 10:35:12 alice kernel: drbd1: Writing meta data super block now.
> Dec 2 10:35:12 alice kernel: drbd1: tl_clear()
> Dec 2 10:35:12 alice kernel: drbd1: Connection closed
> Dec 2 10:35:12 alice kernel: drbd1: conn( BrokenPipe -> Unconnected )
> Dec 2 10:35:12 alice kernel: drbd1: receiver terminated
> Dec 2 10:35:12 alice kernel: drbd1: receiver (re)started
>
> This is:
>
> # uname -a
> Linux alkaid 2.6.18-92.1.13.el5 #1 SMP Thu Sep 4 03:51:21 EDT 2008 x86_64
> x86_64 x86_64 GNU/Linux
> # rpm -qa |grep drbd
> drbd-km-2.6.18_92.1.13.el5-8.2.6-3
> drbd-8.2.6-3
>
>
> Here is my drbd.conf:
>
> common {
>     protocol C;
>     startup {
>         wfc-timeout 10;
>         degr-wfc-timeout 10;
>     }
>
>     disk {
>         on-io-error detach;
>     }
>
>     net {
>         cram-hmac-alg "sha1";
>         shared-secret "xxxxxxx";
>     }
>
>     syncer {
>         rate 10M;
>         verify-alg md5;
>     }
> }
>
> resource rb {
>
>     on alice {
>         device /dev/drbd1;
>         disk /dev/System/data_share;
>         address 192.168.5.21:7789;
>         meta-disk internal;
>     }
>     on bob {
>         device /dev/drbd1;
>         disk /dev/System/data_share;
>         address 192.168.5.22:7789;
>         meta-disk internal;
>     }
> }

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
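
For reference, the 6-second default mentioned in the reply corresponds to the `timeout` option of the net section, which is expressed in tenths of a second. The posted drbd.conf does not set it, so the default would apply. A minimal sketch of the same net section with the timing options written out explicitly follows; the three added values are assumed to be the DRBD 8 defaults and are not taken from the poster's configuration, so check them against the drbd.conf man page for the installed version.

```
net {
    cram-hmac-alg "sha1";
    shared-secret "xxxxxxx";

    # Added for illustration only (assumed defaults, not in the original config):
    timeout     60;   # unit is 0.1 s, so 60 means 6 seconds
    connect-int 10;   # seconds between connection attempts
    ping-int    10;   # seconds of idle time before a keep-alive ping is sent
}
```

Writing the defaults out does not change behaviour; it only makes visible which timeout the reply expected to fire long before the 15 minutes seen in the log.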