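Translator's note
While developing eBPF programs in a Linux virtual machine on a MacBook M2, I ran into LSM-type eBPF programs that would not run. Even on Ubuntu Server with a 5.15 kernel they still failed. Clearly the aarch64 and x86_64 kernels differ in functionality. While trying to pin down those differences I found this article, which gives a more direct picture of how LSM eBPF support differs between kernels for the two CPU architectures.
Original article
This blog post is a summary of our internal research on `BPF LSM` support on `aarch64` in Linux. If you are not familiar with the kernel code base, starting to read the kernel source is very hard, so we decided to publish this post and show our approach, since it may help anyone who wants to explore kernel internals.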
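Introduction
On x86_64 we already use BPF LSM, while on aarch64 we rely on kprobes, so we wanted to know which kernel features are missing for BPF LSM to work on aarch64.
We have dug into the kernel source many times, but usually we search for something that already exists in order to understand how it works. In this case we were looking for something that does not exist: we were chasing the code paths that return an error because a feature is not implemented.
Recalling Steven Rostedt's talk on how to start learning the Linux kernel, we started from ftrace (and the tools built on the tracing infrastructure) to understand what happens when we load an unsupported BPF program into the kernel.
Problem
This is the output of our software pulsar[2] when we try to load a BPF LSM program on an aarch64 Linux 5.15 kernel: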
    root@pine64-1:/home/exein# ./pulsar-enterprise-exec pulsard
    [2023-02-16T1445Z INFO pulsar::daemon] Starting module process-monitor
    [2023-02-16T1445Z INFO pulsar::daemon] Starting module file-system-monitor
    [2023-02-16T1446Z INFO pulsar::daemon] Starting module network-monitor
    [2023-02-16T1446Z INFO pulsar::daemon] Starting module logger
    [2023-02-16T1446Z INFO pulsar::daemon] Starting module rules-engine
    [2023-02-16T1446Z INFO pulsar::daemon] Starting module desktop-notifier
    [2023-02-16T1446Z ERROR pulsar::module_manager] Module error in file-system-monitor: failed program attach lsm path_mknod
    Caused by:
        0: `bpf_raw_tracepoint_open` failed
        1: No error information (os error 524)
    [2023-02-16T1446Z INFO pulsar::daemon] Starting module anomaly-detection
    [2023-02-16T1446Z INFO pulsar::daemon] Starting module malware-detection
    [2023-02-16T1446Z ERROR pulsar::module_manager] Module error in malware-detection: /var/lib/pulsar/malware_detection/models/parameters.json not found
    [2023-02-16T1446Z INFO pulsar::daemon] Starting module platform-connector
    [2023-02-16T1446Z INFO platform_connector::client] Connected to https://platform-dev-instance.exein.io:8001/
    [2023-02-16T1446Z INFO pulsar::daemon] Starting module threat-response
    [2023-02-16T1446Z ERROR pulsar::module_manager] Module error in network-monitor: failed program attach lsm socket_bind
    Caused by:
        0: `bpf_raw_tracepoint_open` failed
        1: No error information (os error 524)
When pulsar tries to load the BPF program attached to the `path_mknod` LSM hook, it fails with error 524, that is, `ENOTSUPP`. Let's dig into the problem.
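The "No error information" text is a hint in itself: 524 is `ENOTSUPP`, an errno value defined only inside the kernel (`include/linux/errno.h`), so the C library has no description for it. The sketch below is an illustration and not part of pulsar; `describe_errno` and `KERNEL_ENOTSUPP` are hypothetical names.

```c
#include <string.h>

/* ENOTSUPP lives in the kernel's include/linux/errno.h and is not part of
 * the userspace <errno.h>, so it is redefined here for illustration only. */
#define KERNEL_ENOTSUPP 524

/* Hypothetical helper: name the kernel-internal code explicitly and defer
 * to strerror() for every ordinary errno value. */
const char *describe_errno(int err)
{
    if (err < 0)
        err = -err; /* accept both 524 and the kernel-style -524 */
    if (err == KERNEL_ENOTSUPP)
        return "ENOTSUPP (kernel-internal): operation is not supported";
    return strerror(err);
}
```

A loader could call `describe_errno` on the raw OS error code before logging it, so that the log names the kernel-internal code instead of a generic fallback message.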
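Note: at the time of this research we could not find a prebuilt aarch64 kernel with BPF and BTF enabled, so we had to compile a custom kernel. We also enabled the tracing options and the function_graph plugin in order to use the tools below.
All experiments were run on a Pine A64 with a custom Armbian[3] image.
These images combine a custom kernel with a standard Ubuntu 22.04 LTS Jammy userspace.
Tools
To investigate the problem we used the following tools:
bpftrace[4]: a BPF-based tool that attaches probes dynamically using a custom C-like language.
trace-cmd[5]: a wrapper around the tracefs file system that interacts with the ftrace infrastructure.
To use these tools you need some options enabled in the Linux kernel; see the official documentation for the full requirements.
Note: other tools can do the same job, for example funcgraph and kprobe from perf-tools[6].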
Linux 5.15
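Now let's use these tools to see what happens on kernel 5.15 when we try to load our BPF program.
From this point to the end of the article we will use the probe binary instead of pulsar, because it is simpler. As a brief summary of how it works, here is its command-line help: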
    exein@pine64-1:~$ ./probe
    Test runner for eBPF programs
    Usage: probe [OPTIONS]
    Commands:
      file-system-monitor  Watch file creations
      process-monitor      Watch process events (fork/exec/exit)
      network-monitor      Watch network events
      help                 Print this message or the help of the given subcommand(s)
    Options:
      -v, --verbose
      -h, --help     Print help
      -V, --version  Print version
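In these examples we will try to load the file-system-monitor probe.
Running the following commands shows the function graph of the calls made under `__sys_bpf`, the entry point of the BPF system call:

    trace-cmd record -p function_graph -g __sys_bpf ./probe file-system-monitor
    trace-cmd report

The output is a huge function graph, too large to paste here. Since we hit an error, we are interested in the last functions executed before the program stopped. Here are the last lines of the `trace-cmd report` output: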
... tokio-runtime-w-1666[003]1318.058019:funcgraph_entry:|bpf_trampoline_link_prog(){ tokio-runtime-w-1666[003]1318.058020:funcgraph_entry:2.292us|bpf_attach_type_to_tramp(); tokio-runtime-w-1666[003]1318.058024:funcgraph_entry:1.250us|mutex_lock(); tokio-runtime-w-1666[003]1318.058028:funcgraph_entry:|bpf_trampoline_update(){ tokio-runtime-w-1666[003]1318.058030:funcgraph_entry:|kmem_cache_alloc_trace(){ tokio-runtime-w-1666[003]1318.058031:funcgraph_entry:1.167us|should_failslab(); tokio-runtime-w-1666[003]1318.058036:funcgraph_exit:6.792us|} tokio-runtime-w-1666[003]1318.058039:funcgraph_entry:|kmem_cache_alloc_trace(){ tokio-runtime-w-1666[003]1318.058042:funcgraph_entry:2.750us|should_failslab(); tokio-runtime-w-1666[003]1318.058046:funcgraph_exit:6.417us|} tokio-runtime-w-1666[003]1318.058048:funcgraph_entry:2.708us|bpf_jit_charge_modmem(); tokio-runtime-w-1666[003]1318.058053:funcgraph_entry:|bpf_jit_alloc_exec_page(){ tokio-runtime-w-1666[003]1318.058055:funcgraph_entry:|bpf_jit_alloc_exec(){ tokio-runtime-w-1666[003]1318.058057:funcgraph_entry:|vmalloc(){ tokio-runtime-w-1666[003]1318.058059:funcgraph_entry:|__vmalloc_node(){ tokio-runtime-w-1666[003]1318.058061:funcgraph_entry:|__vmalloc_node_range(){ tokio-runtime-w-1666[003]1318.058064:funcgraph_entry:|__get_vm_area_node.constprop.64(){ tokio-runtime-w-1666[003]1318.058067:funcgraph_entry:|kmem_cache_alloc_node_trace(){ tokio-runtime-w-1666[003]1318.058069:funcgraph_entry:1.459us|should_failslab(); tokio-runtime-w-1666[003]1318.058073:funcgraph_exit:6.292us|} tokio-runtime-w-1666[003]1318.058075:funcgraph_entry:|alloc_vmap_area(){ tokio-runtime-w-1666[003]1318.058077:funcgraph_entry:|kmem_cache_alloc_node(){ tokio-runtime-w-1666[003]1318.058079:funcgraph_entry:1.167us|should_failslab(); tokio-runtime-w-1666[003]1318.058085:funcgraph_exit:7.625us|} tokio-runtime-w-1666[003]1318.058088:funcgraph_entry:|kmem_cache_alloc_node(){ tokio-runtime-w-1666[003]1318.058089:funcgraph_entry:1.208us|should_failslab(); 
tokio-runtime-w-1666[003]1318.058092:funcgraph_exit:4.584us|} tokio-runtime-w-1666[003]1318.058104:funcgraph_entry:|kmem_cache_free(){ tokio-runtime-w-1666[003]1318.058107:funcgraph_entry:2.084us|__slab_free(); tokio-runtime-w-1666[003]1318.058110:funcgraph_exit:5.667us|} tokio-runtime-w-1666[003]1318.058112:funcgraph_entry:6.375us|insert_vmap_area.constprop.74(); tokio-runtime-w-1666[003]1318.058119:funcgraph_exit:+44.667us|} tokio-runtime-w-1666[003]1318.058122:funcgraph_exit:+58.250us|} tokio-runtime-w-1666[003]1318.058124:funcgraph_entry:|__kmalloc_node(){ tokio-runtime-w-1666[003]1318.058125:funcgraph_entry:1.625us|kmalloc_slab(); tokio-runtime-w-1666[003]1318.058128:funcgraph_entry:1.167us|should_failslab(); tokio-runtime-w-1666[003]1318.058131:funcgraph_exit:7.208us|} tokio-runtime-w-1666[003]1318.058133:funcgraph_entry:|alloc_pages(){ tokio-runtime-w-1666[003]1318.058135:funcgraph_entry:1.583us|get_task_policy.part.48(); tokio-runtime-w-1666[003]1318.058138:funcgraph_entry:1.500us|policy_node(); tokio-runtime-w-1666[003]1318.058141:funcgraph_entry:1.209us|policy_nodemask(); tokio-runtime-w-1666[003]1318.058143:funcgraph_entry:|__alloc_pages(){ tokio-runtime-w-1666[003]1318.058145:funcgraph_entry:1.458us|should_fail_alloc_page(); tokio-runtime-w-1666[003]1318.058147:funcgraph_entry:|get_page_from_freelist(){ tokio-runtime-w-1666[003]1318.058150:funcgraph_entry:1.583us|prep_new_page(); tokio-runtime-w-1666[003]1318.058153:funcgraph_exit:5.459us|} tokio-runtime-w-1666[003]1318.058154:funcgraph_exit:+10.542us|} tokio-runtime-w-1666[003]1318.058155:funcgraph_exit:+22.083us|} tokio-runtime-w-1666[003]1318.058157:funcgraph_entry:|__cond_resched(){ tokio-runtime-w-1666[003]1318.058158:funcgraph_entry:1.833us|rcu_all_qs(); tokio-runtime-w-1666[003]1318.058161:funcgraph_exit:4.167us|} tokio-runtime-w-1666[003]1318.058166:funcgraph_entry:5.542us|vmap_pages_range_noflush(); tokio-runtime-w-1666[003]1318.058173:funcgraph_exit:!112.375us|} 
tokio-runtime-w-1666[003]1318.058175:funcgraph_exit:!116.000us|} tokio-runtime-w-1666[003]1318.058176:funcgraph_exit:!119.292us|} tokio-runtime-w-1666[003]1318.058177:funcgraph_exit:!122.542us|} tokio-runtime-w-1666[003]1318.058179:funcgraph_entry:|find_vm_area(){ tokio-runtime-w-1666[003]1318.058180:funcgraph_entry:1.375us|find_vmap_area(); tokio-runtime-w-1666[003]1318.058183:funcgraph_exit:4.333us|} tokio-runtime-w-1666[003]1318.058185:funcgraph_entry:|set_memory_x(){ tokio-runtime-w-1666[003]1318.058186:funcgraph_entry:|change_memory_common(){ tokio-runtime-w-1666[003]1318.058188:funcgraph_entry:|find_vm_area(){ tokio-runtime-w-1666[003]1318.058189:funcgraph_entry:1.333us|find_vmap_area(); tokio-runtime-w-1666[003]1318.058192:funcgraph_exit:3.875us|} tokio-runtime-w-1666[003]1318.058193:funcgraph_entry:|vm_unmap_aliases(){ tokio-runtime-w-1666[003]1318.058194:funcgraph_entry:|_vm_unmap_aliases.part.58(){ tokio-runtime-w-1666[003]1318.058196:funcgraph_entry:1.542us|rcu_read_unlock_strict(); tokio-runtime-w-1666[003]1318.058199:funcgraph_entry:1.208us|rcu_read_unlock_strict(); tokio-runtime-w-1666[003]1318.058202:funcgraph_entry:1.166us|rcu_read_unlock_strict(); tokio-runtime-w-1666[003]1318.058205:funcgraph_entry:1.208us|rcu_read_unlock_strict(); tokio-runtime-w-1666[003]1318.058207:funcgraph_entry:1.208us|mutex_lock(); tokio-runtime-w-1666[003]1318.058210:funcgraph_entry:|purge_fragmented_blocks_allcpus(){ tokio-runtime-w-1666[003]1318.058212:funcgraph_entry:1.500us|rcu_read_unlock_strict(); tokio-runtime-w-1666[003]1318.058214:funcgraph_entry:1.500us|rcu_read_unlock_strict(); tokio-runtime-w-1666[003]1318.058217:funcgraph_entry:1.500us|rcu_read_unlock_strict(); tokio-runtime-w-1666[003]1318.058220:funcgraph_entry:1.167us|rcu_read_unlock_strict(); tokio-runtime-w-1666[003]1318.058222:funcgraph_exit:+11.917us|} tokio-runtime-w-1666[003]1318.058224:funcgraph_entry:|__purge_vmap_area_lazy(){ tokio-runtime-w-1666[003]1318.058232:funcgraph_entry:|kmem_cache_free(){ 
tokio-runtime-w-1666[003]1318.058234:funcgraph_entry:1.250us|__slab_free(); tokio-runtime-w-1666[003]1318.058237:funcgraph_exit:4.791us|} tokio-runtime-w-1666[003]1318.058241:funcgraph_entry:1.209us|__cond_resched_lock(); tokio-runtime-w-1666[003]1318.058244:funcgraph_exit:+19.625us|} tokio-runtime-w-1666[003]1318.058245:funcgraph_entry:1.167us|mutex_unlock(); tokio-runtime-w-1666[003]1318.058247:funcgraph_exit:+53.042us|} tokio-runtime-w-1666[003]1318.058248:funcgraph_exit:+55.625us|} tokio-runtime-w-1666[003]1318.058250:funcgraph_entry:|__change_memory_common(){ tokio-runtime-w-1666[003]1318.058251:funcgraph_entry:|apply_to_page_range(){ tokio-runtime-w-1666[003]1318.058253:funcgraph_entry:|__apply_to_page_range(){ tokio-runtime-w-1666[003]1318.058255:funcgraph_entry:1.250us|pud_huge(); tokio-runtime-w-1666[003]1318.058258:funcgraph_entry:1.166us|pmd_huge(); tokio-runtime-w-1666[003]1318.058260:funcgraph_entry:1.208us|change_page_range(); tokio-runtime-w-1666[003]1318.058263:funcgraph_exit:9.834us|} tokio-runtime-w-1666[003]1318.058264:funcgraph_exit:+12.709us|} tokio-runtime-w-1666[003]1318.058266:funcgraph_exit:+15.459us|} tokio-runtime-w-1666[003]1318.058268:funcgraph_exit:+80.791us|} tokio-runtime-w-1666[003]1318.058270:funcgraph_exit:+84.834us|} tokio-runtime-w-1666[003]1318.058272:funcgraph_exit:!218.500us|} tokio-runtime-w-1666[003]1318.058274:funcgraph_entry:|__alloc_percpu_gfp(){ tokio-runtime-w-1666[003]1318.058276:funcgraph_entry:|pcpu_alloc(){ tokio-runtime-w-1666[003]1318.058281:funcgraph_entry:2.250us|mutex_lock_killable(); tokio-runtime-w-1666[003]1318.058290:funcgraph_entry:|pcpu_find_block_fit(){ tokio-runtime-w-1666[003]1318.058293:funcgraph_entry:2.833us|pcpu_next_fit_region.constprop.38(); tokio-runtime-w-1666[003]1318.058299:funcgraph_exit:9.084us|} tokio-runtime-w-1666[003]1318.058301:funcgraph_entry:|pcpu_alloc_area(){ tokio-runtime-w-1666[003]1318.058315:funcgraph_entry:4.000us|pcpu_block_update_hint_alloc(); 
tokio-runtime-w-1666[003]1318.058320:funcgraph_entry:2.208us|pcpu_chunk_relocate(); tokio-runtime-w-1666[003]1318.058324:funcgraph_exit:+22.625us|} tokio-runtime-w-1666[003]1318.058327:funcgraph_entry:1.208us|mutex_unlock(); tokio-runtime-w-1666[003]1318.058332:funcgraph_entry:1.584us|pcpu_memcg_post_alloc_hook(); tokio-runtime-w-1666[003]1318.058335:funcgraph_exit:+58.833us|} tokio-runtime-w-1666[003]1318.058336:funcgraph_exit:+61.834us|} tokio-runtime-w-1666[003]1318.058338:funcgraph_entry:|kmem_cache_alloc_trace(){ tokio-runtime-w-1666[003]1318.058339:funcgraph_entry:1.167us|should_failslab(); tokio-runtime-w-1666[003]1318.058342:funcgraph_exit:4.458us|} tokio-runtime-w-1666[003]1318.058359:funcgraph_entry:|bpf_image_ksym_add(){ tokio-runtime-w-1666[003]1318.058360:funcgraph_entry:|bpf_ksym_add(){ tokio-runtime-w-1666[003]1318.058363:funcgraph_entry:1.583us|__local_bh_enable_ip(); tokio-runtime-w-1666[003]1318.058366:funcgraph_exit:5.750us|} tokio-runtime-w-1666[003]1318.058369:funcgraph_exit:9.834us|} tokio-runtime-w-1666[003]1318.058371:funcgraph_entry:1.250us|arch_prepare_bpf_trampoline(); tokio-runtime-w-1666[003]1318.058373:funcgraph_entry:2.292us|kfree(); tokio-runtime-w-1666[003]1318.058377:funcgraph_exit:!348.625us|} tokio-runtime-w-1666[003]1318.058379:funcgraph_entry:1.250us|mutex_unlock(); tokio-runtime-w-1666[003]1318.058382:funcgraph_exit:!363.167us|} tokio-runtime-w-1666[003]1318.058384:funcgraph_entry:|bpf_link_cleanup(){ tokio-runtime-w-1666[003]1318.058386:funcgraph_entry:|bpf_link_free_id.part.30(){ tokio-runtime-w-1666[003]1318.058392:funcgraph_entry:|call_rcu(){ tokio-runtime-w-1666[003]1318.058396:funcgraph_entry:1.834us|rcu_segcblist_enqueue(); tokio-runtime-w-1666[003]1318.058401:funcgraph_exit:9.333us|} tokio-runtime-w-1666[003]1318.058403:funcgraph_entry:1.542us|__local_bh_enable_ip(); tokio-runtime-w-1666[003]1318.058406:funcgraph_exit:+19.542us|} tokio-runtime-w-1666[003]1318.058408:funcgraph_entry:|fput(){ 
```
tokio-runtime-w-1666 [003] 1318.058409: funcgraph_entry:              |            fput_many() {
tokio-runtime-w-1666 [003] 1318.058411: funcgraph_entry:              |              task_work_add() {
tokio-runtime-w-1666 [003] 1318.058414: funcgraph_entry:   1.625 us   |                kick_process();
tokio-runtime-w-1666 [003] 1318.058418: funcgraph_exit:    6.750 us   |              }
tokio-runtime-w-1666 [003] 1318.058419: funcgraph_exit:  + 10.333 us  |            }
tokio-runtime-w-1666 [003] 1318.058420: funcgraph_exit:  + 12.708 us  |          }
tokio-runtime-w-1666 [003] 1318.058422: funcgraph_entry:   2.250 us   |          put_unused_fd();
tokio-runtime-w-1666 [003] 1318.058426: funcgraph_exit:  + 41.416 us  |        }
tokio-runtime-w-1666 [003] 1318.058428: funcgraph_entry:   1.292 us   |        mutex_unlock();
tokio-runtime-w-1666 [003] 1318.058430: funcgraph_entry:   1.250 us   |        kfree();
tokio-runtime-w-1666 [003] 1318.058433: funcgraph_exit:  ! 567.458 us |      }
tokio-runtime-w-1666 [003] 1318.058435: funcgraph_entry:   2.125 us   |      __bpf_prog_put.isra.47();
tokio-runtime-w-1666 [003] 1318.058438: funcgraph_exit:  ! 602.291 us |    }
tokio-runtime-w-1666 [003] 1318.058439: funcgraph_exit:  ! 631.791 us |  }
```

This is the source code in `kernel/bpf/trampoline.c` for `bpf_trampoline_update`, the last function of interest in the trace:

```c
static int bpf_trampoline_update(struct bpf_trampoline *tr)
{
    struct bpf_tramp_image *im;
    struct bpf_tramp_progs *tprogs;
    u32 flags = BPF_TRAMP_F_RESTORE_REGS;
    bool ip_arg = false;
    int err, total;

    tprogs = bpf_trampoline_get_progs(tr, &total, &ip_arg);
    if (IS_ERR(tprogs))
        return PTR_ERR(tprogs);

    if (total == 0) {
        err = unregister_fentry(tr, tr->cur_image->image);
        bpf_tramp_image_put(tr->cur_image);
        tr->cur_image = NULL;
        tr->selector = 0;
        goto out;
    }

    im = bpf_tramp_image_alloc(tr->key, tr->selector);
    if (IS_ERR(im)) {
        err = PTR_ERR(im);
        goto out;
    }

    if (tprogs[BPF_TRAMP_FEXIT].nr_progs ||
        tprogs[BPF_TRAMP_MODIFY_RETURN].nr_progs)
        flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;

    if (ip_arg)
        flags |= BPF_TRAMP_F_IP_ARG;

    err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE,
                                      &tr->func.model, flags, tprogs,
                                      tr->func.addr);
    if (err < 0)
        goto out;

    WARN_ON(tr->cur_image && tr->selector == 0);
    WARN_ON(!tr->cur_image && tr->selector);
    if (tr->cur_image)
        /* progs already running at this address */
        err = modify_fentry(tr, tr->cur_image->image, im->image);
    else
        /* first time registering */
        err = register_fentry(tr, im->image);
    if (err)
        goto out;
    if (tr->cur_image)
        bpf_tramp_image_put(tr->cur_image);
    tr->cur_image = im;
    tr->selector++;
out:
    kfree(tprogs);
    return err;
}
```
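From the earlier output we can see:

    tokio-runtime-w-1666 [003] 1318.058371: funcgraph_entry:   1.250 us   |  arch_prepare_bpf_trampoline();
    tokio-runtime-w-1666 [003] 1318.058373: funcgraph_entry:   2.292 us   |  kfree();

There is no other function call between `arch_prepare_bpf_trampoline` and `kfree`, so most likely the first function returned an error code in the `err` variable. Let's verify it!
By starting bpftrace in a shell as follows, we can capture the return value of `arch_prepare_bpf_trampoline` and print it to the console:

    bpftrace -e 'kretprobe:arch_prepare_bpf_trampoline { printf("retval link: %d\n", retval); }'

After starting probe in another terminal, we got the following output from bpftrace:

    root@pine64-1:/home/exein# bpftrace -e 'kretprobe:arch_prepare_bpf_trampoline { printf("retval link: %d\n", retval); }'
    Attaching 1 probe...
    retval link: -524

This happens because kernel 5.15 has no aarch64 implementation of `arch_prepare_bpf_trampoline` and falls back to the default placeholder implementation: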
    int __weak
    arch_prepare_bpf_trampoline(struct bpf_tramp_image *tr, void *image, void *image_end,
                                const struct btf_func_model *m, u32 flags,
                                struct bpf_tramp_links *tlinks,
                                void *orig_call)
    {
        return -ENOTSUPP;
    }
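The `__weak` marker is what makes this a fallback: the linker keeps a weak definition only when no other object file provides a regular (strong) definition of the same symbol. Below is a minimal userspace sketch of the mechanism, assuming GCC or Clang. The `demo_*` names and `DEMO_ENOTSUPP` are hypothetical, and this is not kernel code.

```c
/* Kernel-internal errno value, redefined here for illustration only. */
#define DEMO_ENOTSUPP 524

/* Generic fallback, analogous to the __weak arch_prepare_bpf_trampoline().
 * An architecture with trampoline support would provide a non-weak
 * definition of this symbol in another object file, and the linker would
 * pick that one instead. */
__attribute__((weak)) int demo_arch_prepare_trampoline(void)
{
    return -DEMO_ENOTSUPP;
}

/* Caller that mirrors the "if (err < 0) goto out;" check made by
 * bpf_trampoline_update(). */
int demo_trampoline_update(void)
{
    int err = demo_arch_prepare_trampoline();

    if (err < 0)
        return err; /* only the weak fallback was linked in */
    return 0;
}
```

Linking in a second file that defines `demo_arch_prepare_trampoline` without the weak attribute would replace the fallback, which is in effect what an architecture-specific implementation does for the kernel function.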
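So this feature is not supported on this kernel version. The good news is that, thanks to this patch[7], it is implemented in 6.x kernels.
Let's move on to a 6.x kernel.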
Linux 6.1
If we try to run probe on kernel 6.1, we get the following output:

    root@pine64:/home/exein# ./probe file-system-monitor
    thread 'main' panicked at 'initialization failed: ProgramAttachError { program: "lsm path_mknod", program_error: SyscallError { call: "bpf_raw_tracepoint_open", io_error: Os { code: 524, kind: Uncategorized, message: "No error information" } } }', src/bin/probe.rs43
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

On kernel 6.1 we still get the same error as on 5.15! Let's find out why.
Running bpftrace on `arch_prepare_bpf_trampoline` this time gives the following output:

    root@pine64:/home/exein# bpftrace -e 'kretprobe:arch_prepare_bpf_trampoline { printf("retval tplink: %d\n", retval); }'
    Attaching 1 probe...
    retval tplink: 284

So the problem is not here: the return value is positive, so this function no longer returns an error. Let's go back to the function call graph.
This time we start trace-cmd and skip some functions to get a cleaner output:

    trace-cmd record \
        -p function_graph \
        -g bpf_trampoline_link_prog \
        -n bpf_jit_alloc_exec \
        -n kmalloc_trace \
        -n arch_prepare_bpf_trampoline \
        -n generic_handle_domain_irq \
        -n do_interrupt_handler \
        -n irq_exit_rcu \
        ./probe file-system-monitor

We get the following output from `trace-cmd report`:
root@pine64:/home/exein#trace-cmdreport CPU0isempty CPU1isempty CPU3isempty cpus=4 tokio-runtime-w-11886[002]193385.056283:funcgraph_entry:|bpf_trampoline_link_prog(){ tokio-runtime-w-11886[002]193385.056321:funcgraph_entry:+15.042us|mutex_lock(); tokio-runtime-w-11886[002]193385.056373:funcgraph_entry:|__bpf_trampoline_link_prog(){ tokio-runtime-w-11886[002]193385.056395:funcgraph_entry:+14.833us|bpf_attach_type_to_tramp(); tokio-runtime-w-11886[002]193385.056428:funcgraph_entry:|bpf_trampoline_update.isra.23(){ tokio-runtime-w-11886[002]193385.056459:funcgraph_entry:2.917us|bpf_jit_charge_modmem(); tokio-runtime-w-11886[002]193385.056531:funcgraph_entry:|find_vm_area(){ tokio-runtime-w-11886[002]193385.056540:funcgraph_entry:3.000us|find_vmap_area(); tokio-runtime-w-11886[002]193385.056547:funcgraph_exit:+16.208us|} tokio-runtime-w-11886[002]193385.056554:funcgraph_entry:|__alloc_percpu_gfp(){ tokio-runtime-w-11886[002]193385.056563:funcgraph_entry:|pcpu_alloc(){ tokio-runtime-w-11886[002]193385.056568:funcgraph_entry:4.875us|mutex_lock_killable(); tokio-runtime-w-11886[002]193385.056591:funcgraph_entry:|pcpu_find_block_fit(){ tokio-runtime-w-11886[002]193385.056599:funcgraph_entry:8.625us|pcpu_next_fit_region.constprop.38(); tokio-runtime-w-11886[002]193385.056608:funcgraph_exit:+17.166us|} tokio-runtime-w-11886[002]193385.056610:funcgraph_entry:|pcpu_alloc_area(){ tokio-runtime-w-11886[002]193385.056639:funcgraph_entry:9.167us|pcpu_block_update(); tokio-runtime-w-11886[002]193385.056656:funcgraph_entry:7.667us|pcpu_block_update_hint_alloc(); tokio-runtime-w-11886[002]193385.056671:funcgraph_entry:7.750us|pcpu_chunk_relocate(); tokio-runtime-w-11886[002]193385.056679:funcgraph_exit:+69.667us|} tokio-runtime-w-11886[002]193385.056682:funcgraph_entry:7.042us|mutex_unlock(); tokio-runtime-w-11886[002]193385.056703:funcgraph_entry:2.792us|pcpu_memcg_post_alloc_hook(); tokio-runtime-w-11886[002]193385.056712:funcgraph_exit:!148.709us|} 
tokio-runtime-w-11886[002]193385.056719:funcgraph_exit:!165.250us|} tokio-runtime-w-11886[002]193385.056866:funcgraph_entry:|bpf_image_ksym_add(){ tokio-runtime-w-11886[002]193385.056873:funcgraph_entry:|bpf_ksym_add(){ tokio-runtime-w-11886[002]193385.056882:funcgraph_entry:2.750us|__local_bh_disable_ip(); tokio-runtime-w-11886[002]193385.056897:funcgraph_entry:4.625us|__local_bh_enable_ip(); tokio-runtime-w-11886[002]193385.056905:funcgraph_exit:+32.459us|} tokio-runtime-w-11886[002]193385.056922:funcgraph_entry:7.584us|perf_event_ksymbol(); tokio-runtime-w-11886[002]193385.056944:funcgraph_exit:+78.417us|} tokio-runtime-w-11886[002]193385.057492:funcgraph_entry:|set_memory_ro(){ tokio-runtime-w-11886[002]193385.057501:funcgraph_entry:|change_memory_common(){ tokio-runtime-w-11886[002]193385.057504:funcgraph_entry:|find_vm_area(){ tokio-runtime-w-11886[002]193385.057506:funcgraph_entry:8.875us|find_vmap_area(); tokio-runtime-w-11886[002]193385.057518:funcgraph_exit:+14.250us|} tokio-runtime-w-11886[002]193385.057522:funcgraph_entry:|__change_memory_common(){ tokio-runtime-w-11886[002]193385.057531:funcgraph_entry:|apply_to_page_range(){ tokio-runtime-w-11886[002]193385.057538:funcgraph_entry:|__apply_to_page_range(){ tokio-runtime-w-11886[002]193385.057544:funcgraph_entry:+12.791us|pud_huge(); tokio-runtime-w-11886[002]193385.057559:funcgraph_entry:2.708us|pmd_huge(); tokio-runtime-w-11886[002]193385.057574:funcgraph_entry:+15.125us|change_page_range(); tokio-runtime-w-11886[002]193385.057591:funcgraph_exit:+53.792us|} tokio-runtime-w-11886[002]193385.057597:funcgraph_exit:+66.083us|} tokio-runtime-w-11886[002]193385.057610:funcgraph_exit:+88.125us|} tokio-runtime-w-11886[002]193385.057619:funcgraph_entry:|vm_unmap_aliases(){ tokio-runtime-w-11886[002]193385.057622:funcgraph_entry:|_vm_unmap_aliases.part.77(){ tokio-runtime-w-11886[002]193385.057625:funcgraph_entry:9.125us|mutex_lock(); 
tokio-runtime-w-11886[002]193385.057637:funcgraph_entry:3.084us|purge_fragmented_blocks_allcpus(); tokio-runtime-w-11886[002]193385.057643:funcgraph_entry:|__purge_vmap_area_lazy(){ tokio-runtime-w-11886[002]193385.057687:funcgraph_entry:|kmem_cache_free(){ tokio-runtime-w-11886[002]193385.057693:funcgraph_entry:+13.250us|__slab_free(); tokio-runtime-w-11886[002]193385.057705:funcgraph_exit:+18.750us|} tokio-runtime-w-11886[002]193385.057718:funcgraph_entry:7.416us|__cond_resched_lock(); tokio-runtime-w-11886[002]193385.057733:funcgraph_exit:+90.042us|} tokio-runtime-w-11886[002]193385.057741:funcgraph_entry:2.792us|mutex_unlock(); tokio-runtime-w-11886[002]193385.057747:funcgraph_exit:!124.666us|} tokio-runtime-w-11886[002]193385.057749:funcgraph_exit:!130.291us|} tokio-runtime-w-11886[002]193385.057756:funcgraph_entry:|__change_memory_common(){ tokio-runtime-w-11886[002]193385.057759:funcgraph_entry:|apply_to_page_range(){ tokio-runtime-w-11886[002]193385.057765:funcgraph_entry:|__apply_to_page_range(){ tokio-runtime-w-11886[002]193385.057768:funcgraph_entry:4.125us|pud_huge(); tokio-runtime-w-11886[002]193385.057778:funcgraph_entry:8.750us|pmd_huge(); tokio-runtime-w-11886[002]193385.057790:funcgraph_entry:4.625us|change_page_range(); tokio-runtime-w-11886[002]193385.057797:funcgraph_exit:+31.958us|} tokio-runtime-w-11886[002]193385.057803:funcgraph_exit:+44.375us|} tokio-runtime-w-11886[002]193385.057817:funcgraph_exit:+61.208us|} tokio-runtime-w-11886[002]193385.057820:funcgraph_exit:!319.292us|} tokio-runtime-w-11886[002]193385.057826:funcgraph_exit:!333.667us|} tokio-runtime-w-11886[002]193385.057840:funcgraph_entry:|set_memory_x(){ tokio-runtime-w-11886[002]193385.057847:funcgraph_entry:|change_memory_common(){ tokio-runtime-w-11886[002]193385.057855:funcgraph_entry:|find_vm_area(){ tokio-runtime-w-11886[002]193385.057858:funcgraph_entry:2.917us|find_vmap_area(); tokio-runtime-w-11886[002]193385.057870:funcgraph_exit:+14.375us|} 
tokio-runtime-w-11886[002]193385.057876:funcgraph_entry:|vm_unmap_aliases(){ tokio-runtime-w-11886[002]193385.057879:funcgraph_entry:|_vm_unmap_aliases.part.77(){ tokio-runtime-w-11886[002]193385.057882:funcgraph_entry:3.959us|mutex_lock(); tokio-runtime-w-11886[002]193385.057893:funcgraph_entry:3.000us|purge_fragmented_blocks_allcpus(); tokio-runtime-w-11886[002]193385.057900:funcgraph_entry:2.791us|__purge_vmap_area_lazy(); tokio-runtime-w-11886[002]193385.057907:funcgraph_entry:2.709us|mutex_unlock(); tokio-runtime-w-11886[002]193385.057913:funcgraph_exit:+33.708us|} tokio-runtime-w-11886[002]193385.057915:funcgraph_exit:+43.000us|} tokio-runtime-w-11886[002]193385.057922:funcgraph_entry:|__change_memory_common(){ tokio-runtime-w-11886[002]193385.057925:funcgraph_entry:|apply_to_page_range(){ tokio-runtime-w-11886[002]193385.057930:funcgraph_entry:|__apply_to_page_range(){ tokio-runtime-w-11886[002]193385.057933:funcgraph_entry:4.292us|pud_huge(); tokio-runtime-w-11886[002]193385.057945:funcgraph_entry:8.750us|pmd_huge(); tokio-runtime-w-11886[002]193385.057956:funcgraph_entry:3.958us|change_page_range(); tokio-runtime-w-11886[002]193385.058037:funcgraph_exit:+32.083us|} tokio-runtime-w-11886[002]193385.058089:funcgraph_entry:7.667us|irq_enter_rcu(); tokio-runtime-w-11886[002]193385.058233:funcgraph_exit:!308.041us|} tokio-runtime-w-11886[002]193385.058239:funcgraph_exit:!316.709us|} tokio-runtime-w-11886[002]193385.058247:funcgraph_exit:!400.417us|} tokio-runtime-w-11886[002]193385.058255:funcgraph_exit:!415.000us|} tokio-runtime-w-11886[002]193385.058555:funcgraph_entry:8.250us|irq_enter_rcu(); tokio-runtime-w-11886[002]193385.058958:funcgraph_entry:|kallsyms_lookup_size_offset(){ tokio-runtime-w-11886[002]193385.058974:funcgraph_entry:+36.333us|get_symbol_pos(); tokio-runtime-w-11886[002]193385.059017:funcgraph_exit:+59.750us|} tokio-runtime-w-11886[002]193385.059043:funcgraph_entry:|kfree(){ 
tokio-runtime-w-11886[002]193385.059057:funcgraph_entry:3.000us|__kmem_cache_free(); tokio-runtime-w-11886[002]193385.059065:funcgraph_exit:+22.833us|} tokio-runtime-w-11886[002]193385.059073:funcgraph_exit:#2644.708us|} tokio-runtime-w-11886[002]193385.059079:funcgraph_exit:#2706.292us|} tokio-runtime-w-11886[002]193385.059095:funcgraph_entry:2.792us|mutex_unlock(); tokio-runtime-w-11886[002]193385.059101:funcgraph_exit:#2870.416us|}
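This time the program got past `arch_prepare_bpf_trampoline`, `set_memory_ro` and `set_memory_x`, and the last function we see is `kallsyms_lookup_size_offset`.
As we can see in the `bpf_trampoline_update` function in `kernel/bpf/trampoline.c`, there is no explicit call to `kallsyms_lookup_size_offset` here: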
```c
static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mutex)
{
    // ... OTHER CODE ...

#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
again:
    if ((tr->flags & BPF_TRAMP_F_SHARE_IPMODIFY) &&
        (tr->flags & BPF_TRAMP_F_CALL_ORIG))
        tr->flags |= BPF_TRAMP_F_ORIG_STACK;
#endif

    err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE,
                                      &tr->func.model, tr->flags, tlinks,
                                      tr->func.addr);
    if (err < 0)
        goto out;

    set_memory_ro((long)im->image, 1);
    set_memory_x((long)im->image, 1);

    WARN_ON(tr->cur_image && tr->selector == 0);
    WARN_ON(!tr->cur_image && tr->selector);
    if (tr->cur_image)
        /* progs already running at this address */
        err = modify_fentry(tr, tr->cur_image->image, im->image, lock_direct_mutex);
    else
        /* first time registering */
        err = register_fentry(tr, im->image);

#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
    if (err == -EAGAIN) {
        /* -EAGAIN from bpf_tramp_ftrace_ops_func. Now
         * BPF_TRAMP_F_SHARE_IPMODIFY is set, we can generate the
         * trampoline again, and retry register.
         */
        /* reset fops->func and fops->trampoline for re-register */
        tr->fops->func = NULL;
        tr->fops->trampoline = 0;

        /* reset im->image memory attr for arch_prepare_bpf_trampoline */
        set_memory_nx((long)im->image, 1);
        set_memory_rw((long)im->image, 1);
        goto again;
    }
#endif
    if (err)
        goto out;

    if (tr->cur_image)
        bpf_tramp_image_put(tr->cur_image);
    tr->cur_image = im;
    tr->selector++;
out:
    /* If any error happens, restore previous flags */
    if (err)
        tr->flags = orig_flags;
    kfree(tlinks);
    return err;
}
```

> **Note:** the implementation of `bpf_trampoline_update` differs slightly from the one in kernel 5.15.

The call to `kallsyms_lookup_size_offset` is hidden inside another function, which does not show up in the function graph because the compiler inlined it.
It looks like `kallsyms_lookup_size_offset` is called by `ftrace_location`:

```c
unsigned long ftrace_location(unsigned long ip)
{
    struct dyn_ftrace *rec;
    unsigned long offset;
    unsigned long size;

    rec = lookup_rec(ip, ip);
    if (!rec) {
        if (!kallsyms_lookup_size_offset(ip, &size, &offset))
            goto out;

        /* map sym+0 to __fentry__ */
        if (!offset)
            rec = lookup_rec(ip, ip + size - 1);
    }

    if (rec)
        return rec->ip;

out:
    return 0;
}
```
`ftrace_location` is called by `register_fentry`, which, right after that call, checks the `fops` field of `struct bpf_trampoline *tr`:

    /* first time registering */
    static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
    {
        void *ip = tr->func.addr;
        unsigned long faddr;
        int ret;

        faddr = ftrace_location((unsigned long)ip);
        if (faddr) {
            if (!tr->fops)
                return -ENOTSUPP;
            tr->func.ftrace_managed = true;
        }

        if (bpf_trampoline_module_get(tr))
            return -ENOENT;

        if (tr->func.ftrace_managed) {
            ftrace_set_filter_ip(tr->fops, (unsigned long)ip, 0, 1);
            ret = register_ftrace_direct_multi(tr->fops, (long)new_addr);
        } else {
            ret = bpf_arch_text_poke(ip, BPF_MOD_CALL, NULL, new_addr);
        }

        if (ret)
            bpf_trampoline_module_put(tr);
        return ret;
    }

Indeed, when `ftrace_location` returns a non-zero address and `tr->fops` is NULL, the function returns `-ENOTSUPP`.
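The decision that matters can be isolated from the rest of the function. What follows is a simplified userspace model of that check, written only to make the possible outcomes explicit. `demo_tramp`, `demo_register_fentry` and `DEMO_ENOTSUPP` are hypothetical names, and `faddr` stands in for the value returned by `ftrace_location`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal errno value, redefined here for illustration only. */
#define DEMO_ENOTSUPP 524

struct demo_tramp {
    void *fops;          /* stands in for tr->fops (struct ftrace_ops *) */
    bool ftrace_managed; /* stands in for tr->func.ftrace_managed */
};

/* Model of the first branch of register_fentry(): a target that ftrace
 * knows about (faddr != 0) can only be attached through ftrace_ops, so a
 * missing fops means the attach is rejected. */
int demo_register_fentry(struct demo_tramp *tr, unsigned long faddr)
{
    if (faddr) {
        if (!tr->fops)
            return -DEMO_ENOTSUPP;
        tr->ftrace_managed = true;
    }
    /* The real function continues by patching the call site; the model
     * simply reports that the check passed. */
    return 0;
}
```

With a non-zero `faddr` and a NULL `fops` the model returns -524, the same value the probe reports on kernel 6.1. The other two combinations pass the check.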
Let's find out where `tr->fops` is initialized.
If we are right, the trampoline should be created inside the `bpf_trampoline_lookup` function:

    static struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
    {
        struct bpf_trampoline *tr;
        struct hlist_head *head;
        int i;

        mutex_lock(&trampoline_mutex);
        head = &trampoline_table[hash_64(key, TRAMPOLINE_HASH_BITS)];
        hlist_for_each_entry(tr, head, hlist) {
            if (tr->key == key) {
                refcount_inc(&tr->refcnt);
                goto out;
            }
        }
        tr = kzalloc(sizeof(*tr), GFP_KERNEL);
        if (!tr)
            goto out;
    #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
        tr->fops = kzalloc(sizeof(struct ftrace_ops), GFP_KERNEL);
        if (!tr->fops) {
            kfree(tr);
            tr = NULL;
            goto out;
        }
        tr->fops->private = tr;
        tr->fops->ops_func = bpf_tramp_ftrace_ops_func;
    #endif

        tr->key = key;
        INIT_HLIST_NODE(&tr->hlist);
        hlist_add_head(&tr->hlist, head);
        refcount_set(&tr->refcnt, 1);
        mutex_init(&tr->mutex);
        for (i = 0; i < BPF_TRAMP_MAX; i++)
            INIT_HLIST_HEAD(&tr->progs_hlist[i]);
    out:
        mutex_unlock(&trampoline_mutex);
        return tr;
    }

After the allocation, the trampoline's `fops` field is populated only when the `CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS` option is set. That option depends on `HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS`, which aarch64 does not provide in kernel 6.1.
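The effect of the `#ifdef` can be reproduced in a few lines. In this sketch, which uses hypothetical names (`DEMO_DIRECT_CALLS`, `cfg_trampoline`, `cfg_trampoline_alloc`) and `calloc` in place of `kzalloc`, the `fops` pointer stays NULL unless the macro is defined at compile time:

```c
#include <stdlib.h>

struct cfg_ftrace_ops {
    void *private_data; /* stands in for ftrace_ops.private */
};

struct cfg_trampoline {
    struct cfg_ftrace_ops *fops;
};

/* Allocate a zeroed trampoline. fops is only set up when the build-time
 * option is enabled, mirroring the #ifdef in bpf_trampoline_lookup(). */
struct cfg_trampoline *cfg_trampoline_alloc(void)
{
    struct cfg_trampoline *tr = calloc(1, sizeof(*tr));

    if (!tr)
        return NULL;
#ifdef DEMO_DIRECT_CALLS
    tr->fops = calloc(1, sizeof(*tr->fops));
    if (!tr->fops) {
        free(tr);
        return NULL;
    }
    tr->fops->private_data = tr;
#endif
    return tr;
}

/* Release whatever cfg_trampoline_alloc() handed out. */
void cfg_trampoline_free(struct cfg_trampoline *tr)
{
    if (!tr)
        return;
    free(tr->fops); /* free(NULL) is a no-op */
    free(tr);
}
```

Built without `-DDEMO_DIRECT_CALLS`, every trampoline comes back with `fops == NULL`, which corresponds to the aarch64 6.1 situation described above. Building with the flag fills the field in.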
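Conclusion
As of kernel 6.1, BPF LSM cannot be used on aarch64 because the _ftrace direct calls_ feature is missing. Fortunately, the current mainline branch has already merged a patch[8] that enables LSM programs (among other things) on aarch64.
These changes are expected to ship in the next Linux kernel release, 6.4.
Reviewing editor: 湯梓紅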
Original title: Exploring BPF LSM with ftrace on the aarch64 architecture
Source: WeChat ID LinuxDev, WeChat official account Linux閱碼場. Please credit the source when reposting.