[DRBD-user] Crash - Centos 5.3, drbd83 RPM [SOLVED]

David Buechler dbuechler at cablemo.com
Tue Sep 22 07:11:29 CEST 2009

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Mark,

    You were correct.  DRBD was not the source of the problem.  Postdrop
appears to consume significant amounts of memory when unable to write to
the maildrop directory in /var/spool/postfix.  Guess what's on the DRBD
resource? ;-)  I've worked out a solution that should solve the problem
until the mail service here is overhauled early next year.

    Thanks for pointing me in the right direction!

-- 
Regards,
David A. Buechler
System Administrator,
CableAmerica Missouri


Mark Watts wrote:
> On Thu, 2009-09-17 at 15:29 -0700, Dave Buechler wrote:
>   
>> Memo to Self:  Self, try Reply-All if you want more than one person to 
>> look at your problem. ;-)
>>
>> Original message:
>>
>>
>> Ok.  I have six hours of log data, but it all says pretty much the same
>> thing.  Here's the first 15 minutes or so.  If you want more, I'll send
>> more.
>>
>> Thanks for looking at this.
>>
>>     
>
> Looks like you ran out of ram (and swap) and the Out Of Memory (OOM)
> killer started taking a hatchet to your system.
>
> I wouldn't have thought DRBD itself was responsible for this; DRBD says
> its sync'd OK its unlikely that there would be any outstanding writes
> being buffered, so I suspect something else was eating all your ram.
>
> Mark.
>
>   
>> -- 
>> Regards,
>> David A. Buechler
>> System Administrator,
>> CableAmerica Missouri
>>
>>
>>
>> /var/log/messages:
>> Sep 15 04:17:38 smtp2 heartbeat: [2717]: WARN: Gmain_timeout_dispatch:
>> Dispatch function for send local status took too long to execute: 150 ms
>> (> 50 ms) (GSource: 0x1aacc9f8)
>> Sep 15 04:19:38 smtp2 snmpd[2489]: Connection from UDP:
>> [206.16.46.12]:46136
>> Sep 15 04:19:38 smtp2 snmpd[2489]: Received SNMP packet(s) from UDP:
>> [206.16.46.12]:46136
>> Sep 15 04:19:42 smtp2 snmpd[2489]: Connection from UDP:
>> [206.16.46.12]:46136
>> Sep 15 04:19:49 smtp2 last message repeated 35 times
>> Sep 15 04:31:41 smtp2 kernel: block drbd0: sock was shut down by peer
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: peer( Primary -> Unknown )
>> conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: short read expecting header
>> on sock: r=0
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: md_sync_timer expired! Worker
>> calls drbd_md_sync().
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: msock_sendmsg returned -32
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: short sent PingAck size=8 sent=0
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: asender terminated
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: Terminating asender thread
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: Connection closed
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( BrokenPipe ->
>> Unconnected )
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: receiver terminated
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: Restarting receiver thread
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: receiver (re)started
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( Unconnected ->
>> WFConnection )
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: Handshake successful: Agreed
>> network protocol version 90
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( WFConnection ->
>> WFReportParams )
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: Starting asender thread (from
>> drbd0_receiver [2581])
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: data-integrity-alg: <not-used>
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: drbd_sync_handshake:
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: self
>> E6B7DB61310BE3A6:0000000000000000:B939FB24AD00D552:7E8DF909A8E37EDF
>> bits:0 flags:0
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: peer
>> E720051EDAD32591:E6B7DB61310BE3A7:B939FB24AD00D552:7E8DF909A8E37EDF
>> bits:1090 flags:0
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: uuid_compare()=-1 by rule 5
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: peer( Unknown -> Primary )
>> conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: helper command: /sbin/drbdadm
>> before-resync-target minor-0
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: md_sync_timer expired! Worker
>> calls drbd_md_sync().
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: helper command: /sbin/drbdadm
>> before-resync-target minor-0 exit code 0 (0x0)
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( WFSyncUUID ->
>> SyncTarget ) disk( UpToDate -> Inconsistent )
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: Began resync as SyncTarget
>> (will sync 4360 KB [1090 bits set]).
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: Resync done (total 3 sec;
>> paused 0 sec; 1452 K/sec)
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( SyncTarget -> Connected
>> ) disk( Inconsistent -> UpToDate )
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: helper command: /sbin/drbdadm
>> after-resync-target minor-0
>> Sep 15 04:31:42 smtp2 kernel: drbdadm invoked oom-killer:
>> gfp_mask=0x84d0, order=0, oomkilladj=0
>> Sep 15 04:31:42 smtp2 kernel:
>> Sep 15 04:31:42 smtp2 kernel: Call Trace:
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff800c3bda>] out_of_memory+0x8e/0x2f3
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8000f2ea>]
>> __alloc_pages+0x245/0x2ce
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8002309f>]
>> alloc_page_interleave+0x3d/0x74
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8002b5e5>]
>> get_zeroed_page+0x21/0x82
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8002d80c>] __pmd_alloc+0x14/0x8c
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff80008127>]
>> copy_page_range+0x1f3/0x73e
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff800d6f0e>]
>> alternate_node_alloc+0x70/0x8c
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8001f847>]
>> copy_process+0xd30/0x15b8
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff80030d3a>] do_fork+0x69/0x1c1
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8005d427>]
>> ptregscall_common+0x67/0xac
>> Sep 15 04:31:42 smtp2 kernel:
>> Sep 15 04:31:42 smtp2 kernel: Mem-info:
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA per-cpu:
>> Sep 15 04:31:42 smtp2 kernel: cpu 0 hot: high 0, batch 1 used:0
>> Sep 15 04:31:42 smtp2 kernel: cpu 0 cold: high 0, batch 1 used:0
>> Sep 15 04:31:42 smtp2 kernel: cpu 1 hot: high 0, batch 1 used:0
>> Sep 15 04:31:42 smtp2 kernel: cpu 1 cold: high 0, batch 1 used:0
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA32 per-cpu:
>> Sep 15 04:31:42 smtp2 kernel: cpu 0 hot: high 186, batch 31 used:48
>> Sep 15 04:31:42 smtp2 kernel: cpu 0 cold: high 62, batch 15 used:40
>> Sep 15 04:31:42 smtp2 kernel: cpu 1 hot: high 186, batch 31 used:156
>> Sep 15 04:31:42 smtp2 kernel: cpu 1 cold: high 62, batch 15 used:60
>> Sep 15 04:31:42 smtp2 kernel: Node 0 Normal per-cpu: empty
>> Sep 15 04:31:42 smtp2 kernel: Node 0 HighMem per-cpu: empty
>> Sep 15 04:31:42 smtp2 kernel: Free pages:        7860kB (0kB HighMem)
>> Sep 15 04:31:42 smtp2 kernel: Active:222971 inactive:131062 dirty:0
>> writeback:0 unstable:0 free:1965 slab:32477 mapped-file:2705
>> mapped-anon:353176 pagetables:97307
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA free:2152kB min:28kB low:32kB
>> high:40kB active:0kB inactive:0kB present:10716kB pages_scanned:0
>> all_unreclaimable? yes
>> Sep 15 04:31:42 smtp2 kernel: lowmem_reserve[]: 0 2003 2003 2003
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA32 free:5708kB min:5708kB
>> low:7132kB high:8560kB active:891884kB inactive:524248kB
>> present:2051184kB pages_scanned:2988074 all_unreclaimable? yes
>> Sep 15 04:31:42 smtp2 kernel: lowmem_reserve[]: 0 0 0 0
>> Sep 15 04:31:42 smtp2 kernel: Node 0 Normal free:0kB min:0kB low:0kB
>> high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0
>> all_unreclaimable? no
>> Sep 15 04:31:42 smtp2 kernel: lowmem_reserve[]: 0 0 0 0
>> Sep 15 04:31:42 smtp2 kernel: Node 0 HighMem free:0kB min:128kB
>> low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0
>> all_unreclaimable? no
>> Sep 15 04:31:42 smtp2 kernel: lowmem_reserve[]: 0 0 0 0
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA: 2*4kB 4*8kB 2*16kB 5*32kB
>> 4*64kB 1*128kB 2*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2152kB
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA32: 129*4kB 1*8kB 2*16kB 13*32kB
>> 4*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 5708kB
>> Sep 15 04:31:42 smtp2 kernel: Node 0 Normal: empty
>> Sep 15 04:31:42 smtp2 kernel: Node 0 HighMem: empty
>> Sep 15 04:31:42 smtp2 kernel: 6436 pagecache pages
>> Sep 15 04:31:42 smtp2 kernel: Swap cache: add 613470, delete 609770,
>> find 99957/110522, race 3+8
>> Sep 15 04:31:42 smtp2 kernel: Free swap  = 0kB
>> Sep 15 04:31:42 smtp2 kernel: Total swap = 2096472kB
>> Sep 15 04:31:42 smtp2 kernel: Free swap:            0kB
>> Sep 15 04:31:42 smtp2 kernel: 524000 pages of RAM
>> Sep 15 04:31:42 smtp2 kernel: 9347 reserved pages
>> Sep 15 04:31:42 smtp2 kernel: 1057738 pages shared
>> Sep 15 04:31:42 smtp2 kernel: 3700 pages swap cached
>> Sep 15 04:31:42 smtp2 kernel: Out of memory: Killed process 3250 (crond).
>> Sep 15 04:31:42 smtp2 kernel: postdrop invoked oom-killer:
>> gfp_mask=0x201d2, order=0, oomkilladj=0
>> Sep 15 04:31:42 smtp2 kernel:
>> Sep 15 04:31:42 smtp2 kernel: Call Trace:
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff800c3bda>] out_of_memory+0x8e/0x2f3
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8000f2ea>]
>> __alloc_pages+0x245/0x2ce
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff80012a77>]
>> __do_page_cache_readahead+0xa2/0x1e6
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff80054ab8>] sock_sendmsg+0xf3/0x110
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8000e87a>] link_path_walk+0xd3/0xe5
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff80013406>]
>> filemap_nopage+0x148/0x322
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8000886a>]
>> __handle_mm_fault+0x1f8/0xe5c
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff80066b9f>]
>> do_page_fault+0x4cb/0x830
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff80059fdf>] hrtimer_cancel+0xc/0x16
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff80063db7>] do_nanosleep+0x47/0x70
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff80059ecc>]
>> hrtimer_nanosleep+0x58/0x118
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8005dde9>] error_exit+0x0/0x84
>> Sep 15 04:31:42 smtp2 kernel:
>> Sep 15 04:31:42 smtp2 kernel: Mem-info:
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA per-cpu:
>> Sep 15 04:31:42 smtp2 kernel: cpu 0 hot: high 0, batch 1 used:0
>> Sep 15 04:31:42 smtp2 kernel: cpu 0 cold: high 0, batch 1 used:0
>> Sep 15 04:31:42 smtp2 kernel: cpu 1 hot: high 0, batch 1 used:0
>> Sep 15 04:31:42 smtp2 kernel: cpu 1 cold: high 0, batch 1 used:0
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA32 per-cpu:
>> Sep 15 04:31:42 smtp2 kernel: cpu 0 hot: high 186, batch 31 used:57
>> Sep 15 04:31:42 smtp2 kernel: cpu 0 cold: high 62, batch 15 used:37
>> Sep 15 04:31:42 smtp2 kernel: cpu 1 hot: high 186, batch 31 used:166
>> Sep 15 04:31:42 smtp2 kernel: cpu 1 cold: high 62, batch 15 used:14
>> Sep 15 04:31:42 smtp2 kernel: Node 0 Normal per-cpu: empty
>> Sep 15 04:31:42 smtp2 kernel: Node 0 HighMem per-cpu: empty
>> Sep 15 04:31:42 smtp2 kernel: Free pages:        7984kB (0kB HighMem)
>> Sep 15 04:31:42 smtp2 kernel: Active:182184 inactive:162368 dirty:0
>> writeback:0 unstable:0 free:1996 slab:32439 mapped-file:2705
>> mapped-anon:353176 pagetables:97307
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA free:2152kB min:28kB low:32kB
>> high:40kB active:0kB inactive:0kB present:10716kB pages_scanned:0
>> all_unreclaimable? yes
>> Sep 15 04:31:42 smtp2 kernel: lowmem_reserve[]: 0 2003 2003 2003
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA32 free:5832kB min:5708kB
>> low:7132kB high:8560kB active:728576kB inactive:649760kB
>> present:2051184kB pages_scanned:422478 all_unreclaimable? no
>> Sep 15 04:31:42 smtp2 kernel: lowmem_reserve[]: 0 0 0 0
>> Sep 15 04:31:42 smtp2 kernel: Node 0 Normal free:0kB min:0kB low:0kB
>> high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0
>> all_unreclaimable? no
>> Sep 15 04:31:42 smtp2 kernel: lowmem_reserve[]: 0 0 0 0
>> Sep 15 04:31:42 smtp2 kernel: Node 0 HighMem free:0kB min:128kB
>> low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0
>> all_unreclaimable? no
>> Sep 15 04:31:42 smtp2 kernel: lowmem_reserve[]: 0 0 0 0
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA: 2*4kB 4*8kB 2*16kB 5*32kB
>> 4*64kB 1*128kB 2*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2152kB
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA32: 130*4kB 4*8kB 2*16kB 14*32kB
>> 5*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 5832kB
>> Sep 15 04:31:43 smtp2 kernel: Node 0 Normal: empty
>> Sep 15 04:31:43 smtp2 kernel: Node 0 HighMem: empty
>> Sep 15 04:31:43 smtp2 kernel: 6467 pagecache pages
>> Sep 15 04:31:43 smtp2 kernel: Swap cache: add 613470, delete 609770,
>> find 99957/110522, race 3+8
>> Sep 15 04:31:43 smtp2 kernel: Free swap  = 0kB
>> Sep 15 04:31:43 smtp2 kernel: Total swap = 2096472kB
>> Sep 15 04:31:43 smtp2 kernel: Free swap:            0kB
>> Sep 15 04:31:43 smtp2 kernel: 524000 pages of RAM
>> Sep 15 04:31:43 smtp2 kernel: 9347 reserved pages
>> Sep 15 04:31:43 smtp2 kernel: 1067643 pages shared
>> Sep 15 04:31:43 smtp2 kernel: 3700 pages swap cached

<Cut for length>

>>
>>
>>
>>
>> Andrea Dell'Amico wrote:
>>     
>>> On Wed, 2009-09-16 at 03:23 -0700, David Buechler wrote:
>>>   
>>>       
>>>> Hi Andrea!
>>>>
>>>> Heh, yeah, I should have specified that.  No, I'm not using Xen on this 
>>>> particular system.  I'm using Xen on other systems under DRBD 8.2 and 
>>>> they're pretty stable.
>>>>     
>>>>         
>>> Ugh. A new kind of crash, not reassuring :-).
>>> I think you should post the kernel Oops on the list, then.
>>>
>>>   
>>>       
>>>> -- 
>>>> Regards,
>>>> David A. Buechler
>>>> System Administrator,
>>>> CableAmerica Missouri
>>>>     
>>>>         
>>> ciao
>>> andrea
>>>
>>>   
>>>       
>>
>>
>>     




-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.




More information about the drbd-user mailing list