一、死機(jī)系統(tǒng)簡(jiǎn)圖
當(dāng)用戶對(duì)手機(jī)進(jìn)行操作時(shí), 對(duì)應(yīng)的數(shù)據(jù)流將是下面一個(gè)概括的流程圖.
HW 如傳感器, 觸摸屏(TP), 物理按鍵(KP)等感知到用戶操作后,觸發(fā)相關(guān)的中斷(ISR) 傳遞給Kernel, Kernel 相關(guān)的driver 對(duì)這些ISR 進(jìn)行處理后,轉(zhuǎn)化成標(biāo)準(zhǔn)的InputEvent.
User Space 的System Server 中的Input System 則持續(xù)監(jiān)聽(tīng)Kernel 傳遞上來(lái)的原始InputEvent, 對(duì)其進(jìn)行進(jìn)一步的處理后, 變成上層APP 可直接處理的Input Event, 如button 點(diǎn)擊, 長(zhǎng)按, 滑動(dòng)等等.
APP 對(duì)相關(guān)的事件進(jìn)行處理后,請(qǐng)求更新相關(guān)的邏輯界面,而這個(gè)則由System Server 中的WMS 等來(lái)負(fù)責(zé).
相關(guān)的邏輯界面更新后(Z-Window), 則會(huì)請(qǐng)求SurfaceFlinger 來(lái)產(chǎn)生FrameBuffer 數(shù)據(jù), SurfaceFlinger 則會(huì)利用GPU 等來(lái)計(jì)算生成.
Display System/Driver 則會(huì)將FrameBuffer 中的數(shù)據(jù)更新顯示出來(lái), 這樣用戶才能感知到他的操作行為.
二、死機(jī)的可能原因
原則上上面流程中,每一步出現(xiàn)問(wèn)題,都可能引發(fā)死機(jī)問(wèn)題. 大的方面講,可以分成 硬件HW 和 軟件SW 兩個(gè)層次, 硬件HW 不在我們的講訴之內(nèi).
1.軟件SW 上,死機(jī)的原因可以分成兩種:
邏輯行為異常
邏輯判斷錯(cuò)誤
邏輯設(shè)計(jì)錯(cuò)誤
邏輯卡頓(block)
死循環(huán) (Deadloop)
死鎖 (Deadlock)
2.從具體的原因上將,可以進(jìn)一步分成:
(1). Input Driver
無(wú)法接收HW 的中ISR,產(chǎn)生原始的InputEvent, 或者產(chǎn)生的InputEvent 異常.
(2). Input System
無(wú)法監(jiān)聽(tīng)Kernel 傳遞上來(lái)的原始InputEvent, 或者轉(zhuǎn)換與傳遞異常.
(3). System Logic
無(wú)法正常響應(yīng)Input System 傳遞過(guò)來(lái)的InputEvent, 或者響應(yīng)出錯(cuò).
(4). WMS/Surfaceflinger 行為異常
WMS/SF 無(wú)法正確的對(duì)Z-Window 進(jìn)行疊加轉(zhuǎn)換
(5). Display System3
無(wú)法更新Framebuffer 數(shù)據(jù),或者填充的數(shù)據(jù)錯(cuò)誤
(6). LCM Driver
無(wú)法將Framebuffer 數(shù)據(jù)顯示在LCM 上
(7). CPU Scheduler/File System/Memory Manager 等Kernel 支撐性模塊.
系統(tǒng)支撐性模塊無(wú)法正常, 導(dǎo)致整個(gè)流程都沒(méi)法正常進(jìn)行, 如CPU Scheduler 無(wú)法正常執(zhí)行,導(dǎo)致thread 都無(wú)法被調(diào)度, memory 耗盡導(dǎo)致系統(tǒng)無(wú)法快速申請(qǐng)到memory, file system 卡住導(dǎo)致無(wú)法進(jìn)行讀寫操作等.
3. 對(duì)應(yīng)硬件HW hang, 經(jīng)常見(jiàn)得的情況有:
Power 異常
Clock/26M/32K 失效
CPU Core 無(wú)法啟動(dòng), 執(zhí)行異常等.
Memory & Memory Controller.
Fail IC.
三、死機(jī)問(wèn)題需要分析哪些數(shù)據(jù)
1. 死機(jī)分析數(shù)據(jù)
俗話說(shuō), 巧婦難為無(wú)米之炊, 死機(jī)分析, 同樣需要獲取第一手的資料, 方可分析問(wèn)題. 那么哪些數(shù)據(jù)可以用來(lái)分析死機(jī)呢。
大概的講,可以分成空間數(shù)據(jù)和時(shí)間數(shù)據(jù)。空間數(shù)據(jù),即當(dāng)時(shí)現(xiàn)場(chǎng)環(huán)境,如有哪些process 在運(yùn)行,CPU 的執(zhí)行情況,memory 的利用情況,以及具體的process 的memory 數(shù)據(jù)等。 時(shí)間數(shù)據(jù),即行為上的連續(xù)數(shù)據(jù),比如某個(gè)Process 在一段時(shí)間內(nèi)執(zhí)行了哪些操作,某段時(shí)間內(nèi)CPU 利用率的變化等。通常時(shí)空都是交融的,對(duì)應(yīng)我們抓取資訊時(shí)往往也是。
2. 哪些數(shù)據(jù)可以用來(lái)分析.
Backtrace
Backtrace 又分成Java backtrace, Native Backtrace, Kernel Backtrace. 它是分析死機(jī)的非常重要的手段,我們可以快速的知道,對(duì)應(yīng)的process/thread 在當(dāng)時(shí)正在執(zhí)行哪些動(dòng)作,卡住哪里等。可以非常直觀的分析死機(jī)現(xiàn)場(chǎng)。
另外還有一類trace 為ftrace/systrace, 除非特別開(kāi)啟, 以及在特定的流程點(diǎn)上添加, 這類trace 往往不是很詳盡, 但也具有比較好的參考作用.
系統(tǒng)運(yùn)行環(huán)境
客觀的反應(yīng)系統(tǒng)的執(zhí)行環(huán)境,通常包括如CPU 利用率,Memory 使用情況, Storage 剩余情況等。這些資料也非常重要,比如可以快速的知道,當(dāng)時(shí)是否有Process 在瘋狂的執(zhí)行,當(dāng)時(shí)是不是處于嚴(yán)重的low memory 情況, Storage 是否有耗盡的情況發(fā)生等。
程序執(zhí)行環(huán)境
客觀的反應(yīng)當(dāng)時(shí)某個(gè)程序(Kernel 也可以看成一個(gè)程序)的執(zhí)行現(xiàn)場(chǎng), 此類資訊通常包括如process 的coredump, java heap prof, kernel 的memory dump 等。完整的執(zhí)行環(huán)境,我們可以快速的知道當(dāng)時(shí)具體的變量的值,寄存器值等,可以精細(xì)的分析問(wèn)題。
其他的一些資訊
這些資訊相對(duì)來(lái)說(shuō),比較零散了,如通常的LOG, 一些debug 命令的結(jié)果, JTAG & CVD 的數(shù)據(jù)等。
四 、Backtrace 分析
1. Java Backtrace
從Java Backtrace, 我們可以知道當(dāng)時(shí)Process 的虛擬機(jī)執(zhí)行狀態(tài). Java Backtrace 依靠SignalCatcher 來(lái)抓取.
Google default: SignalCatcher catchs SIGQUIT(3), and then print the java backtrace to /data/anr/trace.txt
MTK Enhance: SignalCatcher catchs SIGSTKFLT(16), and then print the java backtrace to /data/anr/mtktrace.txt( 僅僅 Android ICS 4.0 <-> Android M 6.0版本)
You can update system properties dalvik.vm.stack-trace-file to Change the address, default is /data/anr/traces.txt
1.1 抓取的方式
在ENG Build 中
adb remount adb shell chmod 0777 data/anr adb shell kill -3 pid adb pull /data/anr
在User Build 中
沒(méi)有root 權(quán)限的情況下,只能直接pull 已經(jīng)存在的backtrace.
adb pull /data/anr
你可以嘗試直接使用下面的腳本一次性抓取
adb remount adb shell chmod 0777 data/anr adb shell ps @echo off set processid= set /p processid=Please Input process id: @echo on adb shell kill -3 %processid% @echo off ping -n 8 127.0.0.1>nul @echo on adb pull data/anr/traces.txt trace-%processid%.txt pause
1.2 JavaBacktrace 解析
Android 比較新的版本的java backtrace, 除了直接的thread backtrace 之外, 同時(shí)也會(huì)把ART 的一些基本狀態(tài)也打印出來(lái), 比較方便觀察ART 的基本狀態(tài), 比如:
----- pid 1051 at 2018-09-29 23:23:52 ----- Cmd line: system_server Build fingerprint: 'XXXXX/A73/A73:8.1.0/O11019/1537977601:user/release-keys' ABI: 'arm64' Build type: optimized Zygote loaded classes=5413 post zygote classes=6049 Intern table: 61702 strong; 12632 weak JNI: CheckJNI is off; globals=10050 (plus 1015 weak) Libraries: /system/lib64/libandroid.so /system/lib64/libandroid_servers.so /system/lib64/libcompiler_rt.so /system/lib64/libdcfdecoderjni.so /system/lib64/libjavacrypto.so /system/lib64/libjnigraphics.so /system/lib64/libmedia_jni.so /system/lib64/libmediatek_exceptionlog.so /system/lib64/libperfframeinfo_jni.so /system/lib64/librutils.so /system/lib64/libsoundpool.so /system/lib64/libwebviewchromium_loader.so /system/lib64/libwifi-service.so /vendor/lib64/libnativecheck-jni.so libjavacore.so libopenjdk.so (17) Heap: 9% free, 80MB/89MB; 1893251 objects Dumping cumulative Gc timings Start Dumping histograms for 12601 iterations for concurrent copying ProcessMarkStack: Sum: 3760.617s 99% C.I. 35.990ms-1218.784ms Avg: 298.438ms Max: 14617.568ms ScanImmuneSpaces: Sum: 216.241s 99% C.I. 2.719ms-110.340ms Avg: 17.160ms Max: 1644.161ms FlipOtherThreads: Sum: 136.969s 99% C.I. 0.517ms-174.411ms Avg: 10.869ms Max: 5585.050ms VisitConcurrentRoots: Sum: 119.937s 99% C.I. 1.928ms-50.934ms Avg: 9.518ms Max: 1356.509ms ClearFromSpace: Sum: 66.775s 99% C.I. 0.168ms-21.473ms Avg: 5.299ms Max: 118.068ms SweepSystemWeaks: Sum: 58.215s 99% C.I. 0.196ms-19.387ms Avg: 4.619ms Max: 342.553ms GrayAllDirtyImmuneObjects: Sum: 57.570s 99% C.I. 0.133ms-123.041ms Avg: 4.568ms Max: 2010.547ms ForwardSoftReferences: Sum: 37.474s 99% C.I. 0.037ms-13.834ms Avg: 2.973ms Max: 118.213ms EnqueueFinalizerReferences: Sum: 30.905s 99% C.I. 0.094ms-22.239ms Avg: 2.452ms Max: 248.969ms MarkingPhase: Sum: 24.277s 99% C.I. 0.257ms-51.155ms Avg: 1.926ms Max: 3743.053ms VisitNonThreadRoots: Sum: 23.507s 99% C.I. 0.065ms-15.260ms Avg: 1.865ms Max: 190.694ms ProcessReferences: Sum: 17.221s 99% C.I. 9.019us-4689.544us Avg: 683.321us Max: 94215us EmptyRBMarkBitStack: Sum: 15.343s 99% C.I. 0.026ms-13.119ms Avg: 1.217ms Max: 145.672ms ThreadListFlip: Sum: 10.730s 99% C.I. 128.387us-25549.046us Avg: 851.524us Max: 1555533us InitializePhase: Sum: 7.689s 99% C.I. 125us-9536.627us Avg: 610.256us Max: 434902us FlipThreadRoots: Sum: 4.346s 99% C.I. 16.387us-13773.217us Avg: 344.930us Max: 241943us RecordFree: Sum: 4.106s 99% C.I. 95us-2474.875us Avg: 325.895us Max: 31656us SweepAllocSpace: Sum: 2.746s 99% C.I. 32.035us-6375.082us Avg: 217.946us Max: 397272us SweepLargeObjects: Sum: 2.611s 99% C.I. 10us-1976.352us Avg: 207.248us Max: 36205us ResumeOtherThreads: Sum: 2.453s 99% C.I. 8.115us-8079.599us Avg: 194.712us Max: 86018us ReclaimPhase: Sum: 2.086s 99% C.I. 9us-4266.285us Avg: 165.567us Max: 155093us ResumeRunnableThreads: Sum: 1.759s 99% C.I. 8.063us-2676.218us Avg: 139.607us Max: 87029us MarkStackAsLive: Sum: 782.752ms 99% C.I. 5us-799.362us Avg: 62.118us Max: 41297us MarkZygoteLargeObjects: Sum: 741.028ms 99% C.I. 12us-899.899us Avg: 58.807us Max: 24679us (Paused)GrayAllNewlyDirtyImmuneObjects: Sum: 651.153ms 99% C.I. 12us-373.754us Avg: 51.674us Max: 14328us ClearRegionSpaceCards: Sum: 412.842ms 99% C.I. 8us-199.633us Avg: 32.762us Max: 16297us (Paused)SetFromSpace: Sum: 357.553ms 99% C.I. 3us-286.340us Avg: 28.374us Max: 14500us SwapBitmaps: Sum: 151.902ms 99% C.I. 4us-99.729us Avg: 12.054us Max: 5447us Sweep: Sum: 110.635ms 99% C.I. 2us-49.924us Avg: 8.779us Max: 1087us (Paused)FlipCallback: Sum: 105.455ms 99% C.I. 2us-99.800us Avg: 8.368us Max: 8433us (Paused)ClearCards: Sum: 89.377ms 99% C.I. 1000ns-199016ns Avg: 295ns Max: 17545000ns UnBindBitmaps: Sum: 17.838ms 99% C.I. 0.250us-49.765us Avg: 1.415us Max: 1516us Done Dumping histograms concurrent copying paused: Sum: 10.999s 99% C.I. 320.432us-63766.026us Avg: 872.888us Max: 1214342us concurrent copying total time: 4607.007s mean time: 365.606ms concurrent copying freed: 4075114399 objects with total size 152GB concurrent copying throughput: 884547/s / 33MB/s Cumulative bytes moved 115136403208 Cumulative objects moved 2160415247 Total time spent in GC: 4607.007s Mean GC size throughput: 32MB/s Mean GC object throughput: 884504 objects/s Total number of allocations 4076810306 Total bytes allocated 145GB Total bytes freed 145GB Free memory 8MB Free memory until GC 8MB Free memory until OOME 431MB Total memory 89MB Max memory 512MB Zygote space size 652KB Total mutator paused time: 10.999s Total time waiting for GC to complete: 71.458s Total GC count: 12601 Total GC time: 4607.007s Total blocking GC count: 3305 Total blocking GC time: 1302.402s Histogram of GC count per 10000 ms: 0:11034,1:7128,2:1807,3:417,4:87,5:21,6:6,7:3,8:2,9:2,11:2,12:2,17:1 Histogram of blocking GC count per 10000 ms: 0:17706,1:2336,2:443,3:26,4:1 Registered native bytes allocated: 57354982 /system/framework/oat/arm64/com.android.location.provider.odex: speed /system/priv-app/FusedLocation/oat/arm64/FusedLocation.odex: speed /system/priv-app/Telecom/oat/arm64/Telecom.odex: speed /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /system/framework/oat/arm64/com.coloros.statistics.odex: quicken /system/priv-app/SettingsProvider/oat/arm64/SettingsProvider.odex: speed /system/framework/oat/arm64/services.odex: speed /system/framework/oat/arm64/ethernet-service.odex: speed /system/framework/oat/arm64/wifi-service.odex: speed /system/framework/oat/arm64/com.android.location.provider.odex: speed /system/framework/oat/arm64/mediatek-services.odex: quicken /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /data/dalvik-cache/arm64/vendor@app@LPPeService@LPPeService.apk@classes.dex: quicken /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /data/dalvik-cache/arm64/vendor@app@NlpService@NlpService.apk@classes.dex: quicken Running non JIT
對(duì)于死機(jī)情況來(lái)說(shuō), 我們比較關(guān)注java heap 的memory 使用與GC 情況:
Heap: 9% free, 80MB/89MB; 1893251 objects Free memory 8MB Free memory until GC 8MB Free memory until OOME 431MB Total memory 89MB Max memory 512MB
==》 可以簡(jiǎn)單知道是否有java 層的object leaks, 以及觸發(fā)GC 的情況
除去這些之后, 我們?cè)賮?lái)看java backtrace 的具體資訊.
下面是一小段system server 的java backtrace 的開(kāi)始
----- pid 682 at 2014-07-30 1853 ----- Cmd line: system_server JNI: CheckJNI is off; workarounds are off; pins=4; globals=1484 (plus 50 weak) DALVIK THREADS: (mutexes: tll=0 tsl=0 tscl=0 ghl=0) <== 只有老版本有. "main" prio=5 tid=1 NATIVE | group="main" sCount=1 dsCount=0 obj=0x4193fde0 self=0x418538f8 | sysTid=682 nice=-2 sched=0/0 cgrp=apps handle=1074835940 | state=S schedstat=( 47858718206 26265263191 44902 ) utm=4074 stm=711 core=0 at android.os.MessageQueue.nativePollOnce(Native Method) at android.os.MessageQueue.next(MessageQueue.java:138) at android.os.Looper.loop(Looper.java:150) at com.android.server.ServerThread.initAndLoop(SystemServer.java:1468) at com.android.server.SystemServer.main(SystemServer.java:1563) at java.lang.reflect.Method.invokeNative(Native Method) at java.lang.reflect.Method.invoke(Method.java:515) at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:829) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:645) at dalvik.system.NativeStart.main(Native Method)
我們一行一行來(lái)解析.
0# 最開(kāi)始是 -----PID at Time 然后接著 Cmd line: process name
1# the backtrace header: dvm thread :“DALVIK THREADS:”
新版本中有包括 thread 量, 如: "DALVIK THREADS (214):" 可以觀察到是否有java thread leaks.
2# Global DVM mutex value: if 0 unlock, else lock
tll: threadListLock,
tsl: threadSuspendLock,
tscl: threadSuspendCountLock
ghl: gcHeapLock
(mutexes: tll=0 tsl=0 tscl=0 ghl=0)
3# thread name, java thread Priority, [daemon], DVM thread id, DVM thread status.
"main" -> main thread -> activity thread
prio -> java thread priority default is 5, (nice =0, linux thread priority 120), domain is [1,10]
DVM thread id, NOT linux thread id
DVM thread Status:
ZOMBIE, RUNNABLE, TIMED_WAIT, MONITOR, WAIT, INITALIZING,STARTING, NATIVE, VMWAIT, SUSPENDED,UNKNOWN
"main" prio=5 tid=1 NATIVE
4# DVM thread status
group: default is “main”
Compiler,JDWP,Signal Catcher,GC,FinalizerWatchdogDaemon,FinalizerDaemon,ReferenceQueueDaemon are system group
sCount: thread suspend count
dsCount: thread dbg suspend count
obj: thread obj address
Sef: thread point address
group="main" sCount=1 dsCount=0 obj=0x4193fde0 self=0x418538f8
5 Linux thread status
sysTId: linux thread tid
Nice: linux thread nice value
sched: cgroup policy/gourp id
cgrp: c group
handle: handle address
sysTid=682 nice=-2 sched=0/0 cgrp=apps handle=1074835940
6 CPU Sched stat
Schedstat (Run CPU Clock/ns, Wait CPU Clock/ns, Slice times)
utm: utime, user space time(jiffies)
stm: stime, kernel space time(jiffies)
Core now running in cpu.
state=S schedstat=( 47858718206 26265263191 44902 ) utm=4074 stm=711 core=0
五、常見(jiàn) Java backtrace 舉例
1.ActivityThread 正常狀態(tài)/ActivityThread Normal Case
Message Queue is empty, and thread wait for next message. "main" prio=5 tid=1 NATIVE | group="main" sCount=1 dsCount=0 obj=0x4193fde0 self=0x418538f8 | sysTid=11559 nice=0 sched=0/0 cgrp=apps/bg_non_interactive handle=1074835940 | state=S schedstat=( 2397315020 9177261498 7975 ) utm=100 stm=139 core=1 at android.os.MessageQueue.nativePollOnce(Native Method) at android.os.MessageQueue.next(MessageQueue.java:138) at android.os.Looper.loop(Looper.java:150) at android.app.ActivityThread.main(ActivityThread.java:5299) at java.lang.reflect.Method.invokeNative(Native Method) at java.lang.reflect.Method.invoke(Method.java:515) at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:829) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:645) at dalvik.system.NativeStart.main(Native Method)
2.Java Backtrace Monitor case
Synchronized Lock: 等待同步鎖時(shí)的backtrace. "AnrMonitorThread" prio=5 tid=24 MONITOR | group="main" sCount=1 dsCount=0 obj=0x41fd80c8 self=0x551ac808 | sysTid=711 nice=0 sched=0/0 cgrp=apps handle=1356369328 | state=S schedstat=( 8265377638 4744771625 6892 ) utm=160 stm=666 core=0 at com.android.server.am.ANRManager$AnrDumpMgr.dumpAnrDebugInfoLocked(SourceFile:~832) - waiting to lock <0x42838968> (a com.android.server.am.ANRManager$AnrDumpRecord) held by tid=20 (ActivityManager) at com.android.server.am.ANRManager$AnrDumpMgr.dumpAnrDebugInfo(SourceFile:824) at com.android.server.am.ANRManager$AnrMonitorHandler.handleMessage(SourceFile:220) at android.os.Handler.dispatchMessage(Handler.java:110) at android.os.Looper.loop(Looper.java:193) at android.os.HandlerThread.run(HandlerThread.java:61)
3.執(zhí)行JNI code 未返回,狀態(tài)是native 的情況
"WifiP2pService" prio=5 tid=37 NATIVE | group="main" sCount=1 dsCount=0 obj=0x427a9910 self=0x55f088d8 | sysTid=734 nice=0 sched=0/0 cgrp=apps handle=1443230288 | state=S schedstat=( 91121772 135245305 170 ) utm=7 stm=2 core=1 #00 pc 00032700 /system/lib/libc.so (epoll_wait+12) #01 pc 000105e3 /system/lib/libutils.so (android::pollInner(int)+94) #02 pc 00010811 /system/lib/libutils.so (android::pollOnce(int, int*, int*, void**)+92) #03 pc 0006c96d /system/lib/libandroid_runtime.so (android::pollOnce(_JNIEnv*, int)+22) #04 pc 0001eacc /system/lib/libdvm.so (dvmPlatformInvoke+112) #05 pc 0004fed9 /system/lib/libdvm.so (dvmCallJNIMethod(unsigned int const*, JValue*, Method const*, Thread*)+484) #06 pc 00027ea8 /system/lib/libdvm.so #07 pc 0002f4b0 /system/lib/libdvm.so (dvmMterpStd(Thread*)+76) #08 pc 0002c994 /system/lib/libdvm.so (dvmInterpret(Thread*, Method const*, JValue*)+188) #09 pc 000632a5 /system/lib/libdvm.so (dvmCallMethodV(Thread*, Method const*, Object*, bool, JValue*, std::__va_list)+340) #10 pc 000632c9 /system/lib/libdvm.so (dvmCallMethod(Thread*, Method const*, Object*, JValue*, ...)+20) #11 pc 00057961 /system/lib/libdvm.so #12 pc 0000dd40 /system/lib/libc.so (__thread_entry+72) at android.os.MessageQueue.nativePollOnce(Native Method) at android.os.MessageQueue.next(MessageQueue.java:138) at android.os.Looper.loop(Looper.java:150) at android.os.HandlerThread.run(HandlerThread.java:61)
4. 執(zhí)行object.wait 等待狀態(tài)
"AsyncTask #1" prio=5 tid=33 WAIT | group="main" sCount=1 dsCount=0 obj=0x427a8480 self=0x56036b40 | sysTid=733 nice=10 sched=0/0 cgrp=apps/bg_non_interactive handle=1443076000 | state=S schedstat=( 1941480839 10140523154 4229 ) utm=119 stm=75 core=0 at java.lang.Object.wait(Native Method) - waiting on <0x427a8618> (a java.lang.VMThread) held by tid=33 (AsyncTask #1) at java.lang.Thread.parkFor(Thread.java:1212) at sun.misc.Unsafe.park(Unsafe.java:325) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:157) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2017) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:410) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1035) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1097) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) at java.lang.Thread.run(Thread.java:848)
5. Suspend 狀態(tài),通常表明是抓取backtrace 時(shí),當(dāng)時(shí)還正在執(zhí)行java code, 被強(qiáng)制suspend 的情況
"FileObserver" prio=5 tid=23 SUSPENDED | group="main" sCount=1 dsCount=0 obj=0x41fd1dc8 self=0x551abda0 | sysTid=710 nice=0 sched=0/0 cgrp=apps handle=1427817920 | state=S schedstat=( 130152222 399783851 383 ) utm=9 stm=4 core=0 #00 pc 000329f8 /system/lib/libc.so (__futex_syscall3+8) #01 pc 000108cc /system/lib/libc.so (__pthread_cond_timedwait_relative+48) #02 pc 0001092c /system/lib/libc.so (__pthread_cond_timedwait+64) #03 pc 00055a93 /system/lib/libdvm.so #04 pc 0005614d /system/lib/libdvm.so (dvmChangeStatus(Thread*, ThreadStatus)+34) #05 pc 0004ae7f /system/lib/libdvm.so #06 pc 0004e353 /system/lib/libdvm.so #07 pc 000518d5 /system/lib/libandroid_runtime.so #08 pc 0008af9f /system/lib/libandroid_runtime.so #09 pc 0001eacc /system/lib/libdvm.so (dvmPlatformInvoke+112) #10 pc 0004fed9 /system/lib/libdvm.so (dvmCallJNIMethod(unsigned int const*, JValue*, Method const*, Thread*)+484) #11 pc 00027ea8 /system/lib/libdvm.so #12 pc 0002f4b0 /system/lib/libdvm.so (dvmMterpStd(Thread*)+76) #13 pc 0002c994 /system/lib/libdvm.so (dvmInterpret(Thread*, Method const*, JValue*)+188) #14 pc 000632a5 /system/lib/libdvm.so (dvmCallMethodV(Thread*, Method const*, Object*, bool, JValue*, std::__va_list)+340) #15 pc 000632c9 /system/lib/libdvm.so (dvmCallMethod(Thread*, Method const*, Object*, JValue*, ...)+20) #16 pc 00057961 /system/lib/libdvm.so #17 pc 0000dd40 /system/lib/libc.so (__thread_entry+72) at android.os.FileObserver$ObserverThread.observe(Native Method) at android.os.FileObserver$ObserverThread.run(FileObserver.java:88)
六、Native Backtrace
1. Native Backtrace 抓取方式
1.利用debuggerd 抓取
MTK 已經(jīng)制作了一個(gè)利用debuggerd 抓取Native backtrace 的tool RTT(Runtime Trace), 對(duì)應(yīng)的執(zhí)行命令是:
rtt built timestamp (Apr 18 2014 1521) USAGE : rtt [-h] -f function -p pid [-t tid] -f funcion : current support functions: bt (Backtrace function) -p pid : pid to trace -t tid : tid to trace -n name : process name to trace -h : help menu
這個(gè)rtt 目前只有在eng/userdebug 中可以使用; 另外大家可以使用 google 默認(rèn)的debuggerd 來(lái)抓取:
adb shell debuggerd -b pid
注意的是這些都是要依賴debuggerd 的流程正常, 以及對(duì)應(yīng)抓取process 的debuggerd_signal_handler 行為正常, 否則抓取不到.
2. 添加代碼直接抓取
Google 默認(rèn)提供了CallStack API, 請(qǐng)參考
system/core/include/libutils/CallStack.h system/core/libutils/CallStack.cpp
可快速打印單個(gè)線程的backtrace.
2.解析Native Backtrace
你可以使用GDB, 或者addr2line 等 tool 來(lái)解析抓回的Native Backtrace, 從而知道當(dāng)時(shí)正在執(zhí)行的native 代碼.
如addr2line 執(zhí)行
arm-linux-androideabi-addr2line -f -C -e symbols address
七、Kernel Backtrace
1.Kernel Backtrace 抓取方式
1 運(yùn)行時(shí)抓取
AEE/RTT 工具
Proc System
cat proc/pid/task/tid/stack
Sysrq-trigger
adb shell cat proc/kmsg > kmsg.txt adb shell "echo 8 > proc/sys/kernel/printk“ //修改printk loglevel adb shell "echo t > /proc/sysrq-trigger“ //打印所有的backtrace adb shell "echo w > /proc/sysrq-trigger“//打印'-D' status 'D' 的 process
KDB
Long press volume UP and DOWN more then 10s btpDisplay stack for process bta [DRSTCZEUIMA] Display stack all processes btc Backtrace current process on each cpu btt Backtrace process given its struct task add
2.添加代碼直接抓取
#include當(dāng)前thread: dump_stack(); 其他thread: show_stack(task, NULL);
3. Process/Thread 狀態(tài)
"R (running)", /* 0/
"S (sleeping)", /1/
"D (disk sleep)", /2/
"T (stopped)", /4/
"t (tracing stop)", /8/
"Z (zombie)", /16/
"X (dead)", /32/
"x (dead)", /64/
"K (wakekill)", /128/
"W (waking)", /256 */
通常一般的Process 處于的狀態(tài)都是S (sleeping), 而如果一旦發(fā)現(xiàn)處于如D (disk sleep), T (stopped), Z (zombie) 等就要認(rèn)真審查.
八、幾種典型的異常情況
1. Deadlock
下面這個(gè)case 可以看到PowerManagerService, ActivityManager, WindowManager 相互之間發(fā)生deadlock.
"PowerManagerService" prio=5 tid=25 MONITOR | group="main" sCount=1 dsCount=0 obj=0x42bae270 self=0x6525d5c0 | sysTid=913 nice=0 sched=0/0 cgrp=apps handle=1696964440 | state=S schedstat=( 5088939411 10237027338 34016 ) utm=232 stm=276 core=2 at com.android.server.am.ActivityManagerService.bindService(ActivityManagerService.java:~14079) - waiting to lock <0x42aa0f78> (a com.android.server.am.ActivityManagerService) held by tid=16 (ActivityManager) at android.app.ContextImpl.bindServiceCommon(ContextImpl.java:1665) at android.app.ContextImpl.bindService(ContextImpl.java:1648) at com.android.server.power.PowerManagerService.bindSmartStandByService(PowerManagerService.java:4090) at com.android.server.power.PowerManagerService.handleSmartStandBySettingChangedLocked(PowerManagerService.java:4144) at com.android.server.power.PowerManagerService.access$5600(PowerManagerService.java:102) at com.android.server.power.PowerManagerService$SmartStandBySettingObserver.onChange(PowerManagerService.java:4132) at android.database.ContentObserver$NotificationRunnable.run(ContentObserver.java:181) at android.os.Handler.handleCallback(Handler.java:809) at android.os.Handler.dispatchMessage(Handler.java:102) at android.os.Looper.loop(Looper.java:139) at android.os.HandlerThread.run(HandlerThread.java:58) "ActivityManager" prio=5 tid=16 MONITOR | group="main" sCount=1 dsCount=0 obj=0x42aa0d08 self=0x649166b0 | sysTid=902 nice=-2 sched=0/0 cgrp=apps handle=1687251744 | state=S schedstat=( 39360881460 25703061063 69675 ) utm=1544 stm=2392 core=2 at com.android.server.wm.WindowManagerService.setAppVisibility(WindowManagerService.java:~4783) - waiting to lock <0x42d17590> (a java.util.HashMap) held by tid=12 (WindowManager) at com.android.server.am.ActivityStack.stopActivityLocked(ActivityStack.java:2432) at com.android.server.am.ActivityStackSupervisor.activityIdleInternalLocked(ActivityStackSupervisor.java:2103) at com.android.server.am.ActivityStackSupervisor$ActivityStackSupervisorHandler.activityIdleInternal(ActivityStackSupervisor.java:2914) at com.android.server.am.ActivityStackSupervisor$ActivityStackSupervisorHandler.handleMessage(ActivityStackSupervisor.java:2921) at android.os.Handler.dispatchMessage(Handler.java:110) at android.os.Looper.loop(Looper.java:147) at com.android.server.am.ActivityManagerService$AThread.run(ActivityManagerService.java:2112) "WindowManager" prio=5 tid=12 MONITOR | group="main" sCount=1 dsCount=0 obj=0x42a92550 self=0x6491dd48 | sysTid=898 nice=-4 sched=0/0 cgrp=apps handle=1687201104 | state=S schedstat=( 60734070955 41987172579 219755 ) utm=4659 stm=1414 core=1 at com.android.server.power.PowerManagerService.setScreenBrightnessOverrideFromWindowManagerInternal(PowerManagerService.java:~3207) - waiting to lock <0x42a95140> (a java.lang.Object) held by tid=25 (PowerManagerService) at com.android.server.power.PowerManagerService.setScreenBrightnessOverrideFromWindowManager(PowerManagerService.java:3196) at com.android.server.wm.WindowManagerService.performLayoutAndPlaceSurfacesLockedInner(WindowManagerService.java:9686) at com.android.server.wm.WindowManagerService.performLayoutAndPlaceSurfacesLockedLoop(WindowManagerService.java:8923) at com.android.server.wm.WindowManagerService.performLayoutAndPlaceSurfacesLocked(WindowManagerService.java:8879) at com.android.server.wm.WindowManagerService.access$500(WindowManagerService.java:170) at com.android.server.wm.WindowManagerService$H.handleMessage(WindowManagerService.java:7795) at android.os.Handler.dispatchMessage(Handler.java:110) at android.os.Looper.loop(Looper.java:147) at android.os.HandlerThread.run(HandlerThread.java:58)
2. 執(zhí)行JNI native code 后一去不復(fù)返
"main" prio=5 tid=1 NATIVE | group="main" sCount=1 dsCount=0 obj=0x41bb3d98 self=0x41ba2878 | sysTid=814 nice=-2 sched=0/0 cgrp=apps handle=1074389380 | state=D schedstat=( 22048890928 19526803112 32612 ) utm=1670 stm=534 core=0 (native backtrace unavailable) at android.hardware.SystemSensorManager$BaseEventQueue.nativeDisableSensor(Native Method) at android.hardware.SystemSensorManager$BaseEventQueue.disableSensor(SystemSensorManager.java:399) at android.hardware.SystemSensorManager$BaseEventQueue.removeAllSensors(SystemSensorManager.java:325) at android.hardware.SystemSensorManager.unregisterListenerImpl(SystemSensorManager.java:194) at android.hardware.SensorManager.unregisterListener(SensorManager.java:560) at com.android.internal.policy.impl.WindowOrientationListener.disable(WindowOrientationListener.java:139) at com.android.internal.policy.impl.PhoneWindowManager.updateOrientationListenerLp(PhoneWindowManager.java:774) at com.android.internal.policy.impl.PhoneWindowManager.screenTurnedOff(PhoneWindowManager.java:4897) at com.android.server.power.Notifier.sendGoToSleepBroadcast(Notifier.java:518) at com.android.server.power.Notifier.sendNextBroadcast(Notifier.java:434) at com.android.server.power.Notifier.access$500(Notifier.java:63) at com.android.server.power.Notifier$NotifierHandler.handleMessage(Notifier.java:584) at android.os.Handler.dispatchMessage(Handler.java:110) at android.os.Looper.loop(Looper.java:193) at com.android.server.ServerThread.initAndLoop(SystemServer.java:1436) at com.android.server.SystemServer.main(SystemServer.java:1531) at java.lang.reflect.Method.invokeNative(Native Method) at java.lang.reflect.Method.invoke(Method.java:515) at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:824) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:640) at dalvik.system.NativeStart.main(Native Method)
===>
KERNEL SPACE BACKTRACE, sysTid=814
[] 0xffffffff from [ ] __schedule+0x3fc/0x950 [ ] __schedule+0xc/0x950 from [ ] schedule+0x40/0x80 [ ] schedule+0xc/0x80 from [ ] schedule_preempt_disabled+0x20/0x2c [ ] schedule_preempt_disabled+0xc/0x2c from [ ] mutex_lock_nested+0x1b8/0x560 [ ] mutex_lock_nested+0xc/0x560 from [ ] gsensor_operate+0x1bc/0x2c0 [ ] gsensor_operate+0xc/0x2c0 from [ ] hwmsen_enable+0xa8/0x30c [ ] hwmsen_enable+0xc/0x30c from [ ] hwmsen_unlocked_ioctl+0x2fc/0x528 [ ] hwmsen_unlocked_ioctl+0xc/0x528 from [ ] do_vfs_ioctl+0x94/0x5bc [ ] do_vfs_ioctl+0xc/0x5bc from [ ] sys_ioctl+0x7c/0x8c [ ] sys_ioctl+0xc/0x8c from [ ] ret_fast_syscall+0x0/0x40 [ ] from [ ] 編輯:黃飛
-
傳感器
+關(guān)注
關(guān)注
2548文章
50664瀏覽量
751945 -
cpu
+關(guān)注
關(guān)注
68文章
10824瀏覽量
211138 -
觸摸屏
+關(guān)注
關(guān)注
42文章
2290瀏覽量
115991 -
JAVA
+關(guān)注
關(guān)注
19文章
2957瀏覽量
104544 -
死機(jī)
+關(guān)注
關(guān)注
0文章
17瀏覽量
8592
原文標(biāo)題:八、幾種典型的異常情況
文章出處:【微信號(hào):哆啦安全,微信公眾號(hào):哆啦安全】歡迎添加關(guān)注!文章轉(zhuǎn)載請(qǐng)注明出處。
發(fā)布評(píng)論請(qǐng)先 登錄
相關(guān)推薦
評(píng)論