[DRBD-user] Crash - Centos 5.3, drbd83 RPM

Dave Buechler dbuechler at cablemo.com
Sat Sep 19 00:17:12 CEST 2009


Mark Watts wrote:
> On Thu, 2009-09-17 at 15:29 -0700, Dave Buechler wrote:
>   
>> Memo to Self:  Self, try Reply-All if you want more than one person to 
>> look at your problem. ;-)
>>
>> Original message:
>>
>>
>> Ok.  I have six hours of log data, but it all says pretty much the same
>> thing.  Here's the first 15 minutes or so.  If you want more, I'll send
>> more.
>>
>> Thanks for looking at this.
>>
>>     
>
> Looks like you ran out of RAM (and swap) and the Out Of Memory (OOM)
> killer started taking a hatchet to your system.
>
> I wouldn't have thought DRBD itself was responsible for this; DRBD says
> it's sync'd OK, so it's unlikely that there would be any outstanding
> writes being buffered. I suspect something else was eating all your RAM.
>
> Mark.
>
>   
That's a fairly accurate description, considering there wasn't much left 
of it when I finally got to the console. ;-)

I'm not entirely certain HOW the system ran out of memory, considering 
it has 2 GB of RAM and was the secondary node, meaning that apart from 
DRBD the system is supposed to be idle.  It looks like I'm going to 
have to put this server back into production and keep a close eye on it.
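
As a rough sanity check on where the memory went, the page counts in the
Mem-info line of the quoted log can be converted to megabytes.  The sketch
below is only illustrative: the function name is made up, and the 4 kB page
size is an assumption, although it agrees with the log itself (free:1965
pages against "Free pages: 7860kB").

```python
import re

# Assumption: 4 kB pages. This matches the log, where free:1965 pages
# is reported alongside "Free pages: 7860kB" (1965 * 4 = 7860).
PAGE_KB = 4

# The Mem-info counters from the quoted log, with the wrapped lines rejoined.
REPORT = (
    "Active:222971 inactive:131062 dirty:0 writeback:0 unstable:0 "
    "free:1965 slab:32477 mapped-file:2705 mapped-anon:353176 "
    "pagetables:97307"
)


def pages_to_mib(report, page_kb=PAGE_KB):
    """Return {counter_name: size_in_MiB} for every name:count pair."""
    pairs = re.findall(r"([A-Za-z-]+):(\d+)", report)
    return {name.lower(): int(count) * page_kb / 1024 for name, count in pairs}


if __name__ == "__main__":
    sizes = pages_to_mib(REPORT)
    # Largest consumers first.
    for name, mib in sorted(sizes.items(), key=lambda kv: -kv[1]):
        print(f"{name:12s} {mib:8.1f} MiB")
```

By this arithmetic, anonymous memory (mapped-anon) is roughly 1380 MiB and
page tables alone are roughly 380 MiB.  Both belong to userspace processes,
so the figures point toward something in userspace rather than DRBD's own
buffers, though they do not say which processes were responsible.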

One of the things that concerned me was that it appeared to lose network 
connectivity before it crashed.  Is it possible that I'm looking at a 
hardware fault?
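
One way to check the ordering of the network drop and the OOM kill is to
pull just those events out of /var/log/messages.  The following is a minimal
sketch, not a finished tool: the function name and event labels are invented
for illustration, and the sample lines are taken from the log quoted below
with their wrapped continuations rejoined.

```python
import re

# Sample lines from the quoted log, unwrapped onto single lines.
SAMPLE = """\
Sep 15 04:31:41 smtp2 kernel: block drbd0: sock was shut down by peer
Sep 15 04:31:42 smtp2 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Sep 15 04:31:42 smtp2 kernel: drbdadm invoked oom-killer: gfp_mask=0x84d0, order=0, oomkilladj=0
"""

# Classic syslog timestamp, e.g. "Sep 15 04:31:41".
STAMP = re.compile(r"^(\w{3}\s+\d+\s+[\d:]{8})\s")
# DRBD connection state change, e.g. "conn( Connected -> BrokenPipe )".
CONN = re.compile(r"conn\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def timeline(log_text):
    """Return (timestamp, event) tuples in the order they were logged."""
    events = []
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if not match:
            continue  # skip continuation lines and blanks
        stamp = match.group(1)
        if "sock was shut down by peer" in line:
            events.append((stamp, "peer closed socket"))
        for old, new in CONN.findall(line):
            events.append((stamp, f"conn {old} -> {new}"))
        if "invoked oom-killer" in line:
            events.append((stamp, "oom-killer invoked"))
    return events


if __name__ == "__main__":
    for stamp, event in timeline(SAMPLE):
        print(stamp, event)
```

In the excerpt the socket drop is logged at 04:31:41 and the first OOM kill
at 04:31:42, but swap is already at 0 kB free by then, so the memory
shortage had presumably been building for some time.  The lost connection
may therefore be a symptom of the box grinding to a halt rather than
evidence of a hardware fault, although the log alone cannot rule one out.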

-- 
Regards,
David A. Buechler
System Administrator,
CableAmerica Missouri


>>
>>
>> /var/log/messages:
>> Sep 15 04:17:38 smtp2 heartbeat: [2717]: WARN: Gmain_timeout_dispatch:
>> Dispatch function for send local status took too long to execute: 150 ms
>> (> 50 ms) (GSource: 0x1aacc9f8)
>> Sep 15 04:19:38 smtp2 snmpd[2489]: Connection from UDP:
>> [206.16.46.12]:46136
>> Sep 15 04:19:38 smtp2 snmpd[2489]: Received SNMP packet(s) from UDP:
>> [206.16.46.12]:46136
>> Sep 15 04:19:42 smtp2 snmpd[2489]: Connection from UDP:
>> [206.16.46.12]:46136
>> Sep 15 04:19:49 smtp2 last message repeated 35 times
>> Sep 15 04:31:41 smtp2 kernel: block drbd0: sock was shut down by peer
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: peer( Primary -> Unknown )
>> conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: short read expecting header
>> on sock: r=0
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: md_sync_timer expired! Worker
>> calls drbd_md_sync().
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: msock_sendmsg returned -32
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: short sent PingAck size=8 sent=0
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: asender terminated
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: Terminating asender thread
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: Connection closed
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( BrokenPipe ->
>> Unconnected )
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: receiver terminated
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: Restarting receiver thread
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: receiver (re)started
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( Unconnected ->
>> WFConnection )
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: Handshake successful: Agreed
>> network protocol version 90
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( WFConnection ->
>> WFReportParams )
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: Starting asender thread (from
>> drbd0_receiver [2581])
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: data-integrity-alg: <not-used>
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: drbd_sync_handshake:
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: self
>> E6B7DB61310BE3A6:0000000000000000:B939FB24AD00D552:7E8DF909A8E37EDF
>> bits:0 flags:0
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: peer
>> E720051EDAD32591:E6B7DB61310BE3A7:B939FB24AD00D552:7E8DF909A8E37EDF
>> bits:1090 flags:0
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: uuid_compare()=-1 by rule 5
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: peer( Unknown -> Primary )
>> conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( WFBitMapT -> WFSyncUUID )
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: helper command: /sbin/drbdadm
>> before-resync-target minor-0
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: md_sync_timer expired! Worker
>> calls drbd_md_sync().
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: helper command: /sbin/drbdadm
>> before-resync-target minor-0 exit code 0 (0x0)
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( WFSyncUUID ->
>> SyncTarget ) disk( UpToDate -> Inconsistent )
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: Began resync as SyncTarget
>> (will sync 4360 KB [1090 bits set]).
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: Resync done (total 3 sec;
>> paused 0 sec; 1452 K/sec)
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: conn( SyncTarget -> Connected
>> ) disk( Inconsistent -> UpToDate )
>> Sep 15 04:31:42 smtp2 kernel: block drbd0: helper command: /sbin/drbdadm
>> after-resync-target minor-0
>> Sep 15 04:31:42 smtp2 kernel: drbdadm invoked oom-killer:
>> gfp_mask=0x84d0, order=0, oomkilladj=0
>> Sep 15 04:31:42 smtp2 kernel:
>> Sep 15 04:31:42 smtp2 kernel: Call Trace:
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff800c3bda>] out_of_memory+0x8e/0x2f3
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8000f2ea>]
>> __alloc_pages+0x245/0x2ce
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8002309f>]
>> alloc_page_interleave+0x3d/0x74
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8002b5e5>]
>> get_zeroed_page+0x21/0x82
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8002d80c>] __pmd_alloc+0x14/0x8c
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff80008127>]
>> copy_page_range+0x1f3/0x73e
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff800d6f0e>]
>> alternate_node_alloc+0x70/0x8c
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8001f847>]
>> copy_process+0xd30/0x15b8
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff80030d3a>] do_fork+0x69/0x1c1
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>> Sep 15 04:31:42 smtp2 kernel:  [<ffffffff8005d427>]
>> ptregscall_common+0x67/0xac
>> Sep 15 04:31:42 smtp2 kernel:
>> Sep 15 04:31:42 smtp2 kernel: Mem-info:
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA per-cpu:
>> Sep 15 04:31:42 smtp2 kernel: cpu 0 hot: high 0, batch 1 used:0
>> Sep 15 04:31:42 smtp2 kernel: cpu 0 cold: high 0, batch 1 used:0
>> Sep 15 04:31:42 smtp2 kernel: cpu 1 hot: high 0, batch 1 used:0
>> Sep 15 04:31:42 smtp2 kernel: cpu 1 cold: high 0, batch 1 used:0
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA32 per-cpu:
>> Sep 15 04:31:42 smtp2 kernel: cpu 0 hot: high 186, batch 31 used:48
>> Sep 15 04:31:42 smtp2 kernel: cpu 0 cold: high 62, batch 15 used:40
>> Sep 15 04:31:42 smtp2 kernel: cpu 1 hot: high 186, batch 31 used:156
>> Sep 15 04:31:42 smtp2 kernel: cpu 1 cold: high 62, batch 15 used:60
>> Sep 15 04:31:42 smtp2 kernel: Node 0 Normal per-cpu: empty
>> Sep 15 04:31:42 smtp2 kernel: Node 0 HighMem per-cpu: empty
>> Sep 15 04:31:42 smtp2 kernel: Free pages:        7860kB (0kB HighMem)
>> Sep 15 04:31:42 smtp2 kernel: Active:222971 inactive:131062 dirty:0
>> writeback:0 unstable:0 free:1965 slab:32477 mapped-file:2705
>> mapped-anon:353176 pagetables:97307
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA free:2152kB min:28kB low:32kB
>> high:40kB active:0kB inactive:0kB present:10716kB pages_scanned:0
>> all_unreclaimable? yes
>> Sep 15 04:31:42 smtp2 kernel: lowmem_reserve[]: 0 2003 2003 2003
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA32 free:5708kB min:5708kB
>> low:7132kB high:8560kB active:891884kB inactive:524248kB
>> present:2051184kB pages_scanned:2988074 all_unreclaimable? yes
>> Sep 15 04:31:42 smtp2 kernel: lowmem_reserve[]: 0 0 0 0
>> Sep 15 04:31:42 smtp2 kernel: Node 0 Normal free:0kB min:0kB low:0kB
>> high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0
>> all_unreclaimable? no
>> Sep 15 04:31:42 smtp2 kernel: lowmem_reserve[]: 0 0 0 0
>> Sep 15 04:31:42 smtp2 kernel: Node 0 HighMem free:0kB min:128kB
>> low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0
>> all_unreclaimable? no
>> Sep 15 04:31:42 smtp2 kernel: lowmem_reserve[]: 0 0 0 0
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA: 2*4kB 4*8kB 2*16kB 5*32kB
>> 4*64kB 1*128kB 2*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2152kB
>> Sep 15 04:31:42 smtp2 kernel: Node 0 DMA32: 129*4kB 1*8kB 2*16kB 13*32kB
>> 4*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 5708kB
>> Sep 15 04:31:42 smtp2 kernel: Node 0 Normal: empty
>> Sep 15 04:31:42 smtp2 kernel: Node 0 HighMem: empty
>> Sep 15 04:31:42 smtp2 kernel: 6436 pagecache pages
>> Sep 15 04:31:42 smtp2 kernel: Swap cache: add 613470, delete 609770,
>> find 99957/110522, race 3+8
>> Sep 15 04:31:42 smtp2 kernel: Free swap  = 0kB
>> Sep 15 04:31:42 smtp2 kernel: Total swap = 2096472kB
>> Sep 15 04:31:42 smtp2 kernel: Free swap:            0kB
>> Sep 15 04:31:42 smtp2 kernel: 524000 pages of RAM
>> Sep 15 04:31:42 smtp2 kernel: 9347 reserved pages
>> Sep 15 04:31:42 smtp2 kernel: 1057738 pages shared
>> Sep 15 04:31:42 smtp2 kernel: 3700 pages swap cached
>> Sep 15 04:31:42 smtp2 kernel: Out of memory: Killed process 3250 (crond).
>>
>> <Cut for length - DAB>
>>
>>
>> Andrea Dell'Amico wrote:
>>     
>>> On Wed, 2009-09-16 at 03:23 -0700, David Buechler wrote:
>>>   
>>>       
>>>> Hi Andrea!
>>>>
>>>> Heh, yeah, I should have specified that.  No, I'm not using Xen on this 
>>>> particular system.  I'm using Xen on other systems under DRBD 8.2 and 
>>>> they're pretty stable.
>>>>     
>>>>         
>>> Ugh. A new kind of crash, not reassuring :-).
>>> I think you should post the kernel Oops on the list, then.
>>>
>>>   
>>>       
>>>> -- 
>>>> Regards,
>>>> David A. Buechler
>>>> System Administrator,
>>>> CableAmerica Missouri
>>>>     
>>>>         
>>> ciao
>>> andrea
>>>
>>>   
>>>       
>>
>>
>>     


