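The Android smartphone market is red hot and hardware is being upgraded at a rapid pace; an ARM Cortex-A9 with 1GB of DDR already seems to be falling behind mainstream configurations. Hardware may be king, but one cannot help wondering whether such powerful hardware is actually being used to the full. For that reason, my kernel analysis from now on will target the ARM platform.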
Main text
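In Linux memory management, two resources matter most: virtual addresses and physical addresses. That may sound trite, but memory management really does revolve around these two concepts. If you are not yet familiar with how the Linux kernel manages virtual and physical addresses, I suggest browsing reference [2], an excellent and concise book. Reference [1] covers more of the implementation details.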
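The main goal of this article is to give an overview of how the kernel's 1GB virtual address space is mapped, including:
1. What exactly is the 1GB of kernel virtual address space used for?
2. How it maps to actual physical addresses.
3. Some board-specific macro definitions. To make them easy to look up later, I have collected them here. With these macros you can easily draw the kernel virtual address map of your own platform.
First, a caveat: the mapping plan in this example is not necessarily optimal, but it is a real-world example. In fact, I personally think quite a few of its choices are open to debate.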
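As the figure below shows, the pink region, 0xbf80 0000 ~ 0xc000 0000, is reserved for modules and pkmap. From the board-level macro definitions below we can see that modules live here because they must stay within 32MB of the kernel code segment. I am still not sure why pkmap is placed in this range; it is used when mapping highmem.
The orange region, 0xc000 0000 ~ 0xe000 0000, maps lowmem (low memory, i.e. zone[Normal]). This is a one-to-one flat mapping: once the kernel has set it up during initialization, the page tables never change. This avoids repeatedly modifying page tables and flushing the TLB. (The TLB can be thought of as a hardware cache of the page tables: if the translation for a virtual address is in this cache, the CPU does not need to go to DDR to walk the page tables, which improves memory access efficiency.) This address range is clearly precious because the mapping is so efficient. The figure shows that, of the 512MB mapped, 128MB is reserved for PMEM (Android's mechanism for managing contiguous physical memory) and 16MB for the CP (the modem's runtime memory). Only about 360MB of lowmem is actually usable.
The blue region, 0xe000 0000 ~ 0xf000 0000, maps highmem (high memory, i.e. zone[HighMem]). Because the example has 1GB of DDR, part of the physical address space has to be mapped as high memory. This range ends at 0xf0000000, the VMALLOC_END value listed in the macro table below.
The green region, 0xf000 0000 ~ 0xffc0 0000, is the IO mapping area. In kernel space, for example when writing a driver, we need to access chip registers (IO space). Some IO mappings are requested dynamically from the VMALLOC area via ioremap, while others are mapped statically at system initialization via iotable_init. The figure shows roughly 200MB of the static IO mapping area going unused. Isn't that a waste?
The purple region is nothing special; it is simply the ARM default layout.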
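To make the sizes of these regions concrete, the following C sketch tabulates the ranges quoted above and computes each size in MB. The struct and function names are hypothetical and exist only for illustration; the addresses are the ones given in the text, and the 0xe0000000 ~ 0xf0000000 range is labeled with both names because it ends at VMALLOC_END.

```c
#include <stddef.h>
#include <stdint.h>

/* One named range of kernel virtual address space: [start, end). */
struct vregion {
    const char *name;
    uint32_t start;
    uint32_t end;
};

/* Ranges as described in the text for this example board. */
static const struct vregion example_layout[] = {
    { "modules + pkmap",   0xbf800000u, 0xc0000000u },
    { "lowmem",            0xc0000000u, 0xe0000000u },
    { "vmalloc / highmem", 0xe0000000u, 0xf0000000u },
    { "static IO",         0xf0000000u, 0xffc00000u },
};

/* Size of a region in MB (1MB = 2^20 bytes). */
uint32_t vregion_size_mb(const struct vregion *r)
{
    return (r->end - r->start) >> 20;
}

/* Look up a region by index; returns NULL when the index is out of range. */
const struct vregion *example_region(size_t i)
{
    size_t n = sizeof(example_layout) / sizeof(example_layout[0]);
    return i < n ? &example_layout[i] : NULL;
}
```

With these figures the four regions come to 8MB, 512MB, 256MB and 252MB respectively, so the static IO area is almost as large as the vmalloc range.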
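The figure below shows how the kernel virtual address space maps to actual physical addresses.
Now for the fun part: let's see what is wrong with this mapping.
I actually hit a bug on this platform: during stress testing with the monkey test, vmalloc would start failing after the system had been running for a long time. A vmalloc call failing while plenty of physical memory was still available. Strange, isn't it?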
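[Error log] The system's graphics module failed when requesting 1MB of memory with vmalloc.
[Analysis]
1. First, look at the basic memory information at that point. /proc/meminfo shows 156MB of physical memory still available, so memory was not exhausted. The VMALLOC virtual address range used by vmalloc still had 22MB free, which should also have been enough. By design, vmalloc calls alloc_page() to take individual pages from the buddy system one at a time (that is, from the order-0 free list). There were plenty of pages, so why did the allocation fail? vmalloc requires contiguous virtual addresses; could it be that VMALLOC no longer had 1MB of contiguous virtual address space?
2. With that question in mind, let's go on to analyze /proc/vmallocinfo.
From /proc/vmallocinfo we can see that VMALLOC has been used up to 0xefeff000, so the largest contiguous free range is 0xf0000000 - 0xefeff000 = 0x101000. Remember the size we were trying to allocate? That's right, 0x1a0000 (about 1.6MB). Wow, this is the first time I have seen kernel virtual addresses run out. Then why does meminfo still report 22MB of free VMALLOC virtual address space? Evidently this virtual address range has become heavily fragmented as well.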
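The arithmetic above can be sketched in a few lines of C. This is only an illustration: the function and macro names are hypothetical, 4KB pages are assumed, and the guard page added to each request reflects my understanding of how vmalloc reserves address space rather than anything shown in the logs. Like the text, it only considers the gap at the top of the range.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_VMALLOC_END 0xf0000000u  /* VMALLOC_END on this board */
#define EX_PAGE_SIZE   0x1000u      /* assumption: 4KB pages */

/* Bytes left between the end of the highest used area and VMALLOC_END. */
uint32_t vmalloc_tail_gap(uint32_t highest_used_end)
{
    return EX_VMALLOC_END - highest_used_end;
}

/*
 * Assumption: each vmalloc area is followed by one guard page, so a
 * request needs size + one page of contiguous virtual address space.
 */
bool vmalloc_request_fits(uint32_t gap, uint32_t size)
{
    return size + EX_PAGE_SIZE <= gap;
}
```

Calling vmalloc_tail_gap(0xefeff000) gives 0x101000, and vmalloc_request_fits(0x101000, 0x1a0000) is false, which matches the failure seen on the board.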
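So the virtual address space is exhausted, and it looks like a dead end. Still, in the spirit of investigation, we should ask why so much of the VMALLOC range is in use; after all, we planned 256MB for it. With so much physical memory left, why not call kmalloc or get_free_pages directly?
3. Next, look at how physical memory is distributed at this point.
/proc/buddyinfo shows the overall allocation state of the buddy system, along with more information about fragmentation management.
A quick look at pagetypeinfo: the kernel divides physical memory into zones, and my platform has zone[Normal] and zone[HighMem]. Migrate types were designed to avoid memory fragmentation; see reference [1] if they are unfamiliar. /proc/pagetypeinfo shows that the largest contiguous block we can get is 2^7 pages, i.e. 512KB. That is evidently not enough for graphics, which further explains why graphics relies on vmalloc so heavily.
/proc/buddyinfo output.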
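As a rough illustration of how to read these files, the sketch below finds the highest order that still has a free block in one /proc/buddyinfo line and converts an order into bytes. The function names are hypothetical, the parser assumes the usual "Node N, zone NAME count0 count1 ..." layout, and no real output from the board is reproduced here.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return the highest buddy order with at least one free block in one
 * /proc/buddyinfo line, or -1 if no free block (or no "zone" field) is found.
 */
int highest_free_order(const char *line)
{
    const char *p = strstr(line, "zone");
    int order = 0, highest = -1;

    if (!p)
        return -1;
    p += strlen("zone");
    /* Skip whitespace, then the zone name itself. */
    while (*p == ' ' || *p == '\t')
        p++;
    while (*p && *p != ' ' && *p != '\t')
        p++;

    /* Each remaining column is the free-block count for order 0, 1, 2, ... */
    for (;;) {
        char *end;
        unsigned long count = strtoul(p, &end, 10);

        if (end == p)
            break;          /* no more numbers on the line */
        if (count > 0)
            highest = order;
        order++;
        p = end;
    }
    return highest;
}

/* Bytes in one free block of the given order, for the given page size. */
unsigned long order_to_bytes(int order, unsigned long page_size)
{
    return page_size << order;
}
```

With 4KB pages, order_to_bytes(7, 4096) is 512KB, the largest block available in the failing state described above.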
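4. Conclusion
Based on the analysis above: graphics requests contiguous memory from the kernel's buddy system via get_free_pages(). After a while the buddy system becomes heavily fragmented and graphics can no longer obtain contiguous physical memory, so it turns to vmalloc to get non-contiguous memory from the buddy system. Unfortunately the VMALLOC virtual address space is exhausted, so vmalloc fails even though plenty of physical memory remains.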
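5. Re-examining the memory map
One problem here is that too little space was planned for lowmem. vmalloc allocates from zone[HighMem] by default, which easily fragments highmem. Remember the kernel virtual mapping figure at the beginning? Didn't we have 200MB of virtual address space sitting unused? How nice it would be to map it to lowmem.
Below is my revised mapping. The biggest change is that lowmem grows from 512MB to 720MB, so the 200MB of unused virtual address space is put to good use.
After the change, look at the buddy information again: the largest contiguous block that can be requested is now 2^15 pages = 128MB. This plan also improves memory utilization.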
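The table below lists board-specific macro definitions that determine how the kernel virtual address space is laid out. These days few of us get the chance to bring up a new chip from scratch, so you may not pay much attention to them. When studying the memory plan, however, they are very important, and I have collected them here for easy reference. You can also try filling in these macros for your own board, and the whole picture of the kernel address space mapping will emerge.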
Board specific macro definition
Refer to [Documentation/arm/Porting]
Decompressor Symbols
Macro name
description
example
ZTEXTADDR
[arch/arm/boot/compressed/Makefile]
Start address of decompressor. There's no point in talking about virtual or physical addresses here, since the MMU will be off at the time when you call the decompressor code. You normally call the kernel at this address to start it booting. This doesn't have to be located in RAM, it can be in flash or other read-only or read-write addressable medium.
0x0
ZTEXTADDR := $(CONFIG_ZBOOT_ROM_TEXT)
CONFIG_ZBOOT_ROM_TEXT=0x0
ZBSSADDR
[arch/arm/boot/compressed/Makefile]
Start address of zero-initialised work area for the decompressor. This must be pointing at RAM. The decompressor will zero initialize this for you. Again, the MMU will be off.
0x0
ZBSSADDR := $(CONFIG_ZBOOT_ROM_BSS)
CONFIG_ZBOOT_ROM_BSS=0x0
ZRELADDR
[arch/arm/boot/Makefile]
This is the address where the decompressed kernel will be written, and eventually executed. The following constraint must be valid:
__virt_to_phys(TEXTADDR) == ZRELADDR
The initial part of the kernel is carefully coded to be position independent.
Note: the following conditions must always be true:
ZRELADDR == virt_to_phys(PAGE_OFFSET + TEXT_OFFSET)
0x81088000
ZRELADDR := $(zreladdr-y)
zreladdr-y := $(__ZRELADDR)
__ZRELADDR = TEXT_OFFSET + 0x80000000
[arch/arm/mach-pxa/Makefile.boot]
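The ZRELADDR constraint can be checked with a small C sketch of the linear lowmem translation. The helper names are hypothetical; the constants are the example values listed in this table (PHYS_OFFSET 0x80000000, PAGE_OFFSET 0xC0000000, ZRELADDR 0x81088000), and TEXT_OFFSET is derived from __ZRELADDR = TEXT_OFFSET + 0x80000000 rather than read from a header.

```c
#include <stdint.h>

#define EX_PHYS_OFFSET 0x80000000u  /* PLAT_PHYS_OFFSET in this table */
#define EX_PAGE_OFFSET 0xC0000000u  /* CONFIG_PAGE_OFFSET in this table */
#define EX_ZRELADDR    0x81088000u  /* example ZRELADDR in this table */

/* TEXT_OFFSET implied by __ZRELADDR = TEXT_OFFSET + 0x80000000. */
#define EX_TEXT_OFFSET (EX_ZRELADDR - 0x80000000u)

/* Linear lowmem translation: one fixed offset applies to every address. */
uint32_t ex_virt_to_phys(uint32_t virt)
{
    return virt - EX_PAGE_OFFSET + EX_PHYS_OFFSET;
}

/* Inverse of ex_virt_to_phys() for addresses inside lowmem. */
uint32_t ex_phys_to_virt(uint32_t phys)
{
    return phys - EX_PHYS_OFFSET + EX_PAGE_OFFSET;
}
```

Under these values TEXT_OFFSET works out to 0x01088000, the kernel text starts at virtual address 0xC1088000, and ex_virt_to_phys(0xC1088000) equals ZRELADDR, so the constraint holds.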
INITRD_PHYS
Physical address to place the initial RAM disk. Only relevant if you are using the bootpImage stuff (which only works on the old struct param_struct).
INITRD_PHYS must be in RAM
Not defined
INITRD_VIRT
Virtual address of the initial RAM disk. The following constraint must be valid:
__virt_to_phys(INITRD_VIRT) == INITRD_PHYS
Not defined
PARAMS_PHYS
Physical address of the struct param_struct or tag list, giving the kernel various parameters about its execution environment.
PARAMS_PHYS must be within 4MB of ZRELADDR
Not defined
Kernel Symbols
PHYS_OFFSET
[arch/arm/include/asm/memory.h]
Physical start address of the first bank of RAM.
#define PHYS_OFFSET PLAT_PHYS_OFFSET
#define PLAT_PHYS_OFFSET UL(0x80000000)
[arch/arm/mach-pxa/include/mach/memory.h]
PAGE_OFFSET
[arch/arm/include/asm/memory.h]
Virtual start address of the first bank of RAM. During the kernel boot phase, virtual address PAGE_OFFSET will be mapped to physical address PHYS_OFFSET, along with any other mappings you supply. This should be the same value as TASK_SIZE.
CONFIG_PAGE_OFFSET = 0xC0000000
TASK_SIZE
[arch/arm/include/asm/memory.h]
The maximum size of a user process in bytes. Since user space always starts at zero, this is the maximum address that a user process can access+1. The user space stack grows down from this address.
Any virtual address below TASK_SIZE is deemed to be user process area, and therefore managed dynamically on a process by process basis by the kernel. I'll call this the user segment.
Anything above TASK_SIZE is common to all processes. I'll call this the kernel segment.
(In other words, you can't put IO mappings below TASK_SIZE, and hence PAGE_OFFSET).
CONFIG_PAGE_OFFSET - 0x01000000 = 0xBF000000
TASK_UNMAPPED_BASE
[arch/arm/include/asm/memory.h]
the lower boundary of the mmap VM area
CONFIG_PAGE_OFFSET/3 = 0x40000000
MODULES_VADDR
[arch/arm/include/asm/memory.h]
The module space lives between the addresses given by TASK_SIZE and PAGE_OFFSET - it must be within 32MB of the kernel text.
TEXT_OFFSET does not allow to use 16MB modules area as ARM32 branches to kernel may go out of range taking into account the kernel .text size
PAGE_OFFSET - 8*1024*1024 = 0xBF800000
MODULES_END
[arch/arm/include/asm/memory.h]
The highmem pkmap virtual space shares the end of the module area.
0XBFE00000
#ifdef CONFIG_HIGHMEM
#define MODULES_END (PAGE_OFFSET - PMD_SIZE)
#else
#define MODULES_END (PAGE_OFFSET)
#endif
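The example values for TASK_SIZE, TASK_UNMAPPED_BASE, MODULES_VADDR, MODULES_END and PKMAP_BASE all follow from PAGE_OFFSET. The C sketch below reproduces that arithmetic with hypothetical helper functions; it assumes PMD_SIZE is 2MB, which is the value consistent with the 0xBFE00000 shown for MODULES_END.

```c
#include <stdint.h>

#define EX_SZ_1M    0x00100000u
#define EX_PMD_SIZE (2u * EX_SZ_1M)  /* assumption: 2MB, consistent with 0xBFE00000 */

/* TASK_SIZE = PAGE_OFFSET - 16MB in this example. */
uint32_t ex_task_size(uint32_t page_offset)
{
    return page_offset - 16u * EX_SZ_1M;
}

/* TASK_UNMAPPED_BASE = PAGE_OFFSET / 3 in this example. */
uint32_t ex_task_unmapped_base(uint32_t page_offset)
{
    return page_offset / 3u;
}

/* MODULES_VADDR = PAGE_OFFSET - 8MB on this board. */
uint32_t ex_modules_vaddr(uint32_t page_offset)
{
    return page_offset - 8u * EX_SZ_1M;
}

/* MODULES_END gives up the last PMD to pkmap when highmem is enabled. */
uint32_t ex_modules_end(uint32_t page_offset, int highmem)
{
    return highmem ? page_offset - EX_PMD_SIZE : page_offset;
}

/* PKMAP_BASE = PAGE_OFFSET - PMD_SIZE. */
uint32_t ex_pkmap_base(uint32_t page_offset)
{
    return page_offset - EX_PMD_SIZE;
}
```

With PAGE_OFFSET = 0xC0000000 and highmem enabled, modules get 0xBF800000 ~ 0xBFE00000 (6MB) and pkmap gets the remaining 2MB up to PAGE_OFFSET.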
TEXTADDR
Virtual start address of kernel, normally PAGE_OFFSET + 0x8000.
This is where the kernel image ends up. With the latest kernels, it must be located at 32768 bytes into a 128MB region. Previous kernels placed a restriction of 256MB here.
DATAADDR
Virtual address for the kernel data segment. Must not be defined when using the decompressor.
VMALLOC_START
VMALLOC_END
[arch/arm/mach-pxa/include/mach/vmalloc.h]
Virtual addresses bounding the vmalloc() area. There must not be any static mappings in this area; vmalloc will overwrite them. The addresses must also be in the kernel segment (see above). Normally, the vmalloc() area starts VMALLOC_OFFSET bytes above the last virtual RAM address (found using variable high_memory).
#define VMALLOC_END (0xf0000000UL)
The default vmalloc size is 128MB.
vmalloc_min = (VMALLOC_END - SZ_128M);
[defined in arch/arm/mm/mmu.c]
If a vmalloc size is configured on the kernel command line passed in by the OSL, the default is overridden.
early_param("vmalloc", early_vmalloc);
[defined in arch/arm/mm/mmu.c]
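How a vmalloc size turns into vmalloc_min can be approximated in user space as follows. This is a simplified sketch rather than the kernel's code: the function name is hypothetical, and the 16MB floor and the "leave at least 32MB above PAGE_OFFSET" ceiling are assumptions about the clamping, not values taken from this table.

```c
#include <stdint.h>

#define EX_VMALLOC_END 0xf0000000u  /* VMALLOC_END in this table */
#define EX_PAGE_OFFSET 0xC0000000u  /* CONFIG_PAGE_OFFSET in this table */
#define EX_SZ_1M       0x00100000u

/*
 * Approximate the lower bound of the vmalloc area for a requested size.
 * The clamping limits below are assumptions made for this sketch.
 */
uint32_t ex_vmalloc_min(uint32_t reserve_bytes)
{
    uint32_t max_reserve = EX_VMALLOC_END - (EX_PAGE_OFFSET + 32u * EX_SZ_1M);

    if (reserve_bytes < 16u * EX_SZ_1M)
        reserve_bytes = 16u * EX_SZ_1M;   /* assumed floor */
    if (reserve_bytes > max_reserve)
        reserve_bytes = max_reserve;      /* assumed ceiling, keeps room for lowmem */

    return EX_VMALLOC_END - reserve_bytes;
}
```

The 128MB default gives 0xE8000000, while a 256MB request gives 0xE0000000, which is where the lowmem range ends in the layout described at the start of the article.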
VMALLOC_OFFSET
[arch/arm/include/asm/pgtable.h]
Offset normally incorporated into VMALLOC_START to provide a hole between virtual RAM and the vmalloc area. We do this to allow out of bounds memory accesses (eg, something writing off the end of the mapped memory map) to be caught. Normally set to 8MB.
#define VMALLOC_OFFSET (8*1024*1024)
CONSISTENT_DMA_SIZE
CONSISTENT_BASE
CONSISTENT_END
[arch/arm/include/asm/memory.h]
Size of DMA-consistent memory region. Must be multiple of 2M, between 2MB and 14MB inclusive.
CONSISTENT_DMA_SIZE = 2MB
CONSISTENT_BASE = 0XFFC00000
CONSISTENT_END = 0XFFE00000
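The size rule and the base address can be expressed as a short C check. The helper names are hypothetical; the sketch simply encodes the rule quoted in this entry and assumes the region ends at CONSISTENT_END and extends downward by its size, which is what the three example values imply.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_SZ_2M          0x00200000u
#define EX_CONSISTENT_END 0xFFE00000u  /* CONSISTENT_END in this table */

/* Rule from this entry: a multiple of 2MB, between 2MB and 14MB inclusive. */
bool ex_consistent_size_ok(uint32_t size)
{
    return size >= EX_SZ_2M && size <= 7u * EX_SZ_2M && (size % EX_SZ_2M) == 0;
}

/* Assumption: the region ends at CONSISTENT_END and extends downward. */
uint32_t ex_consistent_base(uint32_t size)
{
    return EX_CONSISTENT_END - size;
}
```

For the 2MB size used here, ex_consistent_base(0x00200000) is 0xFFC00000, which is also where the static IO region ends.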
FIXADDR_START
FIXADDR_TOP
FIXADDR_SIZE
[arch/arm/include/asm/fixmap.h]
fixed virtual addresses
#define FIXADDR_START 0xfff00000UL
#define FIXADDR_TOP 0xfffe0000UL
#define FIXADDR_SIZE (FIXADDR_TOP - FIXADDR_START)
PKMAP_BASE
[arch/arm/include/asm/highmem.h]
0XBFE00000
#define PKMAP_BASE (PAGE_OFFSET - PMD_SIZE)