TinyML: Detecting Baby Cries with ChatGPT and Synthetic Data

2023-07-13

Description

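TinyML is a field of machine learning focused on bringing the power of artificial intelligence to low-power devices. The technology is especially useful for applications that need real-time processing. Finding and collecting datasets is currently a challenge in machine learning. Synthetic data, however, makes it possible to train ML models in a way that is both cost-effective and adaptable, reducing the need for large amounts of real-world data.

In this project, I will show you how to build a baby cry detection system by training a model on the Edge Impulse platform and deploying it to an edge device such as the Arduino Nicla Voice. By training the machine learning model on synthetic data, we can distinguish a baby's cry from background noise.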

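Here is a sneak peek of what is coming:

Collect a dataset for Edge Impulse, generated with AudioLDM text-to-audio and ChatGPT, to train the model.
Train the model with Edge Impulse.
Export the model for analysis in netron.app.
Deploy the model to the Arduino Nicla Voice.
Use the Arduino IDE for real-time data evaluation and testing.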

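Baby cry system deployment pipeline

The diagram shows the components and steps involved in deploying a machine learning model that detects two conditions, baby crying and background noise, using text prompts generated by ChatGPT.

Here is a step-by-step breakdown of the components in the pipeline diagram and how they interact: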

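ChatGPT: ChatGPT is the starting point of the pipeline. It generates text prompts for the two conditions: baby crying and background noise.
Text-to-audio conversion: once the text prompts are generated, we send them to a module that converts text to audio. The module creates audio files that correspond to the prompts for both conditions.
Model training: the generated audio files are uploaded to the Edge Impulse SaaS platform, a cloud-based platform that provides tools for developing, training, and deploying machine learning models for edge devices such as microcontrollers.
Model deployment: after training, the machine learning model is deployed to the Arduino Nicla Voice development board, which is designed for building smart voice devices that process audio and run machine learning tasks.
Inference: once deployed, the model processes real-time audio input from the microphone and detects whether the input represents a baby crying or background noise.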

The model's output could then be used to trigger an action, such as turning on a light or sending a notification to a smartphone.
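One way to turn per-inference scores into an action is to require several consecutive confident detections before firing, so that a single noisy frame does not switch on a light. The sketch below is a minimal illustration of that idea, not code from this project: the class name `CryTrigger`, the 0.8 threshold, and the three-hit requirement are all hypothetical choices.

```python
from collections import deque


class CryTrigger:
    """Fire an action only after several consecutive confident detections."""

    def __init__(self, threshold=0.8, required_hits=3):
        # Both values are hypothetical and would need tuning on real data.
        self.threshold = threshold
        self.required_hits = required_hits
        self._recent = deque(maxlen=required_hits)

    def update(self, cry_score):
        """Feed one inference score; return True when the action should fire."""
        self._recent.append(cry_score >= self.threshold)
        fired = len(self._recent) == self.required_hits and all(self._recent)
        if fired:
            self._recent.clear()  # avoid re-firing on every following frame
        return fired


trigger = CryTrigger()
scores = [0.2, 0.9, 0.95, 0.4, 0.85, 0.9, 0.97, 0.99]
events = [i for i, score in enumerate(scores) if trigger.update(score)]
print(events)  # [6]
```

With these example scores the action fires once, at index 6, the first point where three scores in a row reach the threshold.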

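Arduino Nicla Voice development board overview

The Arduino Nicla Voice is a development board created in collaboration with Syntiant. Using Syntiant's ultra-low-power deep learning processor, the board can deliver always-on speech, gesture, and motion recognition at the edge.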

Arduino Nicla Voice development board

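Thanks to its compact size, the Nicla Voice can be built into wearable devices, enabling AI integration with minimal energy consumption. With the Nicla Voice you can develop custom speech recognition models and run them on the board, so that it recognizes specific words or phrases by analyzing your voice.

Let's get started!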

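Generating text prompts with ChatGPT

Using ChatGPT to generate varied prompts simplifies the work of writing prompts for my machine learning model, which has two classes: baby crying and background noise. It saves the time and effort I would otherwise spend brainstorming and writing prompts by hand. It also yields a wider, more diverse range of prompts, which can improve the accuracy and effectiveness of the model.

Here are my text prompts for the baby crying scenario, generated with ChatGPT.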

prompts = [
"Baby Crying",
"Baby crying in bedroom",
"Baby crying loudly",
"Infant crying",
"Newborn crying",
"Crying baby",
"Upset baby",
"Distressed baby",
"Fussy baby",
"Weeping infant",
"Sobbing baby",
"Whimpering baby",
"Wailing baby",
"Bawling baby",
"Crying newborn",
"Tearful baby",
"Bawling infant",
"Mourning baby",
"Bellowing baby",
"Screaming baby",
"Howling baby",
"Squalling baby",
"Yowling baby",
"Crying baby in nursery",
"Wailing infant in bedroom",
"Whimpering baby in crib",
"Sobbing baby in bassinet",
"Crying baby in the dark",
"Upset baby in bed",
"Distressed baby in room",
"Fussy baby in cradle",
"Weeping infant in playpen",
"Sobbing baby in the corner",
"Whimpering baby in the closet",
"Wailing baby in the crib",
"Bawling baby in the nursery",
"Crying newborn in the bedroom",
"Tearful baby in the playroom",
"Bawling infant in the den",
"Mourning baby in the living room",
"Bellowing baby in the kitchen",
"Screaming baby in the bathroom",
"Howling baby in the hallway",
"Squalling baby in the dining room",
"Yowling baby in the family room",
"Crying baby in the middle of the night",
"Wailing infant in the early morning",
"Whimpering baby during naptime",
"Sobbing baby during mealtime",
"Crying baby during bathtime",
"Upset baby during diaper change",
"Distressed baby during playtime",
"Fussy baby during bedtime",
"Weeping infant during storytime",
"Sobbing baby during teething",
"Whimpering baby during vaccination",
"Wailing baby during check-up",
"Bawling baby during colic",
"Crying newborn during feeding",
"Tearful baby during immunization",
"Bawling infant during growth spurt",
"Mourning baby during illness",
"Bellowing baby during teething",
"Screaming baby during reflux",
"Howling baby during ear infection",
"Squalling baby during constipation",
"Yowling baby during sleep regression",
"Crying baby during travel",
"Wailing infant during car ride",
"Whimpering baby during flight",
"Sobbing baby during road trip",
"Crying baby during vacation",
"Upset baby during change of environment",
"Distressed baby during new experiences",
"Fussy baby during unfamiliar situations",
"Weeping infant during loud noises",
"Sobbing baby during separation anxiety",
"Whimpering baby during stranger danger",
"Wailing baby during socialization",
"Bawling baby during weaning",
"Crying newborn during swaddling",
"Tearful baby during bath",
"Bawling infant during burping",
"Mourning baby during pacifier weaning",
"Bellowing baby during crawling",
"Screaming baby during walking",
]

Using a language model like ChatGPT also helps me come up with creative prompts I might not have thought of myself.

These are the background noise prompts.

prompts = [
"A hammer is hitting a wooden surface",
"A noise of nature",
"The sound of waves crashing on the shore",
"A thunderstorm in the distance",
"Traffic noise on a busy street",
"The hum of an air conditioning unit",
"Birds chirping in the morning",
"The sound of a train passing by",
"A group of people talking in a crowded room",
"The sound of raindrops hitting a tin roof",
"The buzz of a fluorescent light",
"The sound of footsteps on a wooden floor",
"The crackling of a campfire",
"The whirring of a ceiling fan",
"The sound of a basketball bouncing on concrete",
"A dog barking in the distance",
"The rustling of leaves in the wind",
"The buzzing of a bee or other insect",
"The sound of a church bell ringing",
"The roar of a waterfall",
"The tapping of a keyboard",
"The hiss of a steam engine",
"The clanging of pots and pans in a kitchen",
"The sound of a roaring fire in a fireplace",
"The hum of an electric generator",
"The sound of a lawnmower in the distance",
"The whistling of wind through a window crack",
"The clatter of dishes in a busy restaurant",
"The sound of a helicopter flying overhead",
"The tapping of rain on a metal roof",
"The gentle rustling of a book's pages turning",
"The creaking of a wooden chair",
"The sound of a pencil scratching on paper",
"The chirping of crickets at night",
"The crackling of a vinyl record playing",
"The hissing of an old radio",
"The sound of a pencil sharpener grinding",
"The gurgling of a coffee maker",
"The sound of a ticking clock",
"The roar of an airplane engine",
"The bubbling of a fish tank filter",
"The clanking of dishes being washed in a sink",
"The sound of a typewriter clacking",
"The roar of a lion in the wild",
"The whirring of a drone flying overhead",
"The beeping of a car horn in traffic",
"The sound of a door creaking open",
"The buzzing of a mosquito in the room",
"The sound of a blender mixing ingredients",
"The rumbling of a thunderstorm overhead",
"The tapping of a woodpecker on a tree trunk",
"The rustling of paper being shuffled",
"The sound of a busy office with people talking on the phone and typing on their keyboards",
"The sound of a construction site with heavy machinery and drilling",
"The sound of a dishwasher running in the kitchen",
"The chirping of birds in a forest",
"The sound of a police siren in the distance",
"The whistling of wind through tall grass",
"The sound of a cash register in a busy store",
"The buzzing of a fly or bee flying around",
"The sound of a bicycle bell ringing",
"The crackling of a fire in a fireplace"
]

That is all for generating the prompts for the dataset!
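Both lists above are named prompts, so it can help to keep them in one structure keyed by class label before generating audio. The sketch below is a hypothetical helper, not part of the original project: the label names `baby_cry` and `background_noise`, the file-name pattern, and the functions `dedupe` and `build_jobs` are assumptions for illustration. It removes case-insensitive duplicates and assigns each remaining prompt an output file name.

```python
def dedupe(prompts):
    """Drop case-insensitive duplicates while keeping the original order."""
    seen = set()
    unique = []
    for prompt in prompts:
        key = " ".join(prompt.lower().split())  # normalize case and spacing
        if key not in seen:
            seen.add(key)
            unique.append(prompt)
    return unique


def build_jobs(prompts_by_label):
    """Return (label, file_name, prompt) tuples, one per unique prompt."""
    jobs = []
    for label, prompts in prompts_by_label.items():
        for index, prompt in enumerate(dedupe(prompts)):
            jobs.append((label, f"{label}.{index:03d}.wav", prompt))
    return jobs


# A few prompts from the lists above; the label names are hypothetical.
prompts_by_label = {
    "baby_cry": ["Baby Crying", "Infant crying", "baby crying"],
    "background_noise": ["A noise of nature", "The sound of a ticking clock"],
}
jobs = build_jobs(prompts_by_label)
for job in jobs:
    print(job)
```

Here "baby crying" is dropped as a duplicate of "Baby Crying", leaving four jobs in total.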

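Installing AudioLDM text-to-audio for dataset generation

To generate audio files from text, the next step uses a text-to-audio generation tool called AudioLDM, developed by researchers at the University of Surrey and Imperial College London in the UK. The tool uses a latent diffusion model to generate high-quality audio from text. To run AudioLDM you need a standalone computer with a powerful CPU. A dedicated GPU is recommended but not mandatory. To try out AudioLDM's capabilities, you can test it online through Hugging Face.

First we will set up our Python environment. To manage virtual environments we will use virtualenv, which can be installed as follows: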

sudo pip3 install virtualenv virtualenvwrapper

For virtualenvwrapper to work, we need to edit the ~/.bashrc file:

nano ~/.bashrc

and add the following lines:

# virtualenv and virtualenvwrapper
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
source /usr/local/bin/virtualenvwrapper.sh

To apply the changes, run the following command:

source ~/.bashrc

Now we can create a virtual environment with the mkvirtualenv command.

mkvirtualenv audioldm -p python

Install PyTorch with pip.

pip3 install torch==2.0.0

Then install the audioldm package.

pip3 install audioldm

Then run the following command to generate audio files from the ChatGPT-generated text prompts. The script can be found in the GitHub code section below.

python3 generate.py

You should see the following output:

genereated: A hammer is hitting a wooden surface
genereated: A noise of nature
genereated: The sound of waves crashing on the shore
genereated: A thunderstorm in the distance
genereated: Traffic noise on a busy street
genereated: The hum of an air conditioning unit
genereated: Birds chirping in the morning
genereated: The sound of a train passing by
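The contents of generate.py are not reproduced in this article, so the following is only a rough sketch of what such a script could look like, not the author's code. It calls the audioldm command-line tool once per prompt. The flags -t (text prompt) and -s (save path) are assumptions here and should be checked against `audioldm -h`. The function names, the output folder, and the injectable `runner` argument are hypothetical; the runner is swapped for a stub below so the sketch runs without the model installed.

```python
import subprocess


def build_command(prompt, out_dir):
    """Build the CLI call for one prompt.

    Assumed flags: -t for the text prompt, -s for the save path.
    Verify them with `audioldm -h` before relying on this.
    """
    return ["audioldm", "-t", prompt, "-s", out_dir]


def generate_all(prompts, out_dir, runner=subprocess.run):
    """Run one generation per prompt and report progress."""
    done = []
    for prompt in prompts:
        runner(build_command(prompt, out_dir), check=True)
        print(f"generated: {prompt}")
        done.append(prompt)
    return done


# Dry run with a stub runner that records commands instead of executing them.
recorded = []


def fake_runner(command, check):
    recorded.append(command)


finished = generate_all(["A noise of nature"], "output", runner=fake_runner)
print(recorded)
```

Calling `generate_all(prompts, "output")` without the stub would invoke the real tool. Starting a new process per prompt reloads the model each time, so a script that loads the model once in-process is likely to be faster.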

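Once the wav audio samples have been collected, they can be fed to a neural network to begin training it to detect automatically whether a baby is crying or only background noise is present.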
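Before uploading, it can be worth checking that every generated clip has the same sample rate and channel count. The sketch below uses only Python's standard wave module and assumes the files are PCM WAV. The 16 kHz mono target is an assumption, not a value stated in this article, and the helper names are hypothetical. Silent files written to a temporary folder stand in for generated clips.

```python
import os
import tempfile
import wave

EXPECTED_RATE_HZ = 16000  # assumption: target sample rate for the training data


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnchannels(), wav.getnframes() / rate


def find_mismatched(paths, expected_rate=EXPECTED_RATE_HZ):
    """Return the files that are not mono at the expected sample rate."""
    return [p for p in paths if wav_info(p)[:2] != (expected_rate, 1)]


def write_silence(path, rate, seconds):
    """Write a silent 16-bit mono WAV file (stand-in for a generated clip)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.wav")
    bad = os.path.join(tmp, "bad.wav")
    write_silence(good, 16000, 2.0)
    write_silence(bad, 44100, 1.0)
    good_info = wav_info(good)
    mismatched = [os.path.basename(p) for p in find_mismatched([good, bad])]

print(good_info, mismatched)
```

Files reported as mismatched would need resampling or conversion before they are mixed with the rest of the dataset.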

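Model training with the Edge Impulse platform

Edge Impulse is a web-based tool that helps us quickly and easily create AI models for a wide range of projects. A machine learning model can be created in a few simple steps, and a custom classifier can be built with nothing more than a web browser.

Go to the Arduino Cloud platform, enter your credentials at login (or create an account), and start a new project.

Download the Google Speech Commands Dataset to obtain data for the background noise class. The dataset can be downloaded as follows.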

wget http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz

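Upload the synthetic wav audio files together with the background noise class from the Google Speech Commands Dataset. In my case, I uploaded about 500 wav files. If needed, you can add more files later by labeling them, uploading them under Data acquisition, and retraining the model.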
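The article does not say which files from the dataset were used or how they were prepared. If the background recordings are long, one option is to cut them into short fixed-length clips before uploading. The sketch below is a minimal illustration under that assumption, using only the standard wave module on PCM WAV input. The one-second clip length, the `background_noise` label, and the file-name pattern are hypothetical, and a silent file stands in for a real recording.

```python
import os
import tempfile
import wave


def split_wav(src_path, out_dir, clip_seconds=1.0, label="background_noise"):
    """Cut a PCM WAV file into equal clips; a shorter trailing rest is dropped."""
    out_paths = []
    with wave.open(src_path, "rb") as src:
        channels = src.getnchannels()
        width = src.getsampwidth()
        rate = src.getframerate()
        frames_per_clip = int(rate * clip_seconds)
        bytes_per_clip = frames_per_clip * width * channels
        index = 0
        while True:
            frames = src.readframes(frames_per_clip)
            if len(frames) < bytes_per_clip:
                break  # end of file or incomplete final clip
            out_path = os.path.join(out_dir, f"{label}.{index:04d}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setnchannels(channels)
                dst.setsampwidth(width)
                dst.setframerate(rate)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "long_noise.wav")
    with wave.open(source, "wb") as wav:  # 2.5 s of silence at 16 kHz
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * 40000)
    clips = split_wav(source, tmp)
    clip_names = [os.path.basename(c) for c in clips]
    with wave.open(clips[0], "rb") as first:
        first_clip_frames = first.getnframes()

print(clip_names, first_clip_frames)
```

The 2.5-second stand-in yields two one-second clips, and the remaining half second is discarded.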

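Once you have set up all the classes and are happy with your dataset, it is time to train the model. Navigate to Create impulse in the left navigation menu.

Select Add a processing block and add Audio (Syntiant), which is well suited to boards based on the Syntiant NDP120. This block converts the audio into features based on its time and frequency characteristics, which help with classification. Then select Add a learning block and add Classification with two output classes.

Then navigate to Syntiant. Under Syntiant, we will keep the default parameters. Click Save parameters.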
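The number of features such a block produces follows from simple framing arithmetic: the audio window is cut into overlapping frames, and each frame yields one value per filter. The sketch below works through that calculation. The parameter values (968 ms window, 32 ms frames, 24 ms stride, 40 filters) are assumptions that are not stated in this article, so check them against the block's settings in your own project. They are chosen because they are consistent with the [1, 1600] input shape reported later in this article.

```python
def frame_count(window_ms, frame_length_ms, frame_stride_ms):
    """Number of full frames that fit in the window."""
    return (window_ms - frame_length_ms) // frame_stride_ms + 1


# Assumed values, not taken from the article: check the Syntiant block settings.
WINDOW_MS = 968
FRAME_LENGTH_MS = 32
FRAME_STRIDE_MS = 24
NUM_FILTERS = 40

frames = frame_count(WINDOW_MS, FRAME_LENGTH_MS, FRAME_STRIDE_MS)
feature_count = frames * NUM_FILTERS
print(frames, feature_count)  # 40 1600
```

Under these assumptions, 40 frames of 40 filter values each give 1600 features per window.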

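Finally, click the Generate features button. You should get a response like the one shown below.

Press the Start training button to train the model. This can take around 5-10 minutes, depending on the size of your dataset. If everything goes well, you should see the following in Edge Impulse: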

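We got a validation accuracy of 90.7%. You should not expect 100% accuracy on your training dataset, since that would suggest the model is overfitting. Anything above 70% can be considered good model performance. Increasing the number of training epochs may improve this accuracy score.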
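A single accuracy figure can hide an imbalance between the two classes, so it helps to look at per-class recall as well. The sketch below shows how both are computed from a confusion matrix. The counts are invented purely for illustration and are not results from this project, and the function names are hypothetical.

```python
def accuracy(matrix):
    """Share of all samples on the diagonal (rows = true class)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall_per_class(matrix):
    """For each true class, the share of its samples predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


# Illustrative counts only. Rows: true baby cry, true background noise.
confusion = [
    [45, 5],
    [3, 47],
]
overall = accuracy(confusion)
recalls = recall_per_class(confusion)
print(overall, recalls)  # 0.92 [0.9, 0.94]
```

In this made-up example the overall accuracy is 92%, while 10% of the true baby cries are missed, which is the error that matters most for a detector of this kind.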

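The .tflite file is our model. The final quantized model file (int8) is about 5 KB in size, with an accuracy close to 90%.

It is always interesting to look at a model's architecture and the format and shape of its inputs and outputs. You can use a program such as Netron to view the neural network.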

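Click serving_default_x:0: we can see that the input is of type int8 with shape [1, 1600]. Now let's look at the output: we have 2 classes, so the output shape is [1, 2]. Quantization can reduce the model's accuracy, because going from a 32-bit floating-point to an 8-bit integer representation means a loss of precision.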
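The precision loss can be seen in a small round trip through the usual affine int8 mapping, where a real value x is stored as q = round(x / scale) + zero_point and recovered as (q - zero_point) * scale. The sketch below is a generic illustration of that scheme, not values read from this model: the scale of 0.05 and zero point of -10 are hypothetical.

```python
def quantize(x, scale, zero_point):
    """Map a float to int8, clamping to the representable range."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to an approximate float."""
    return (q - zero_point) * scale


# Hypothetical quantization parameters for illustration.
SCALE = 0.05
ZERO_POINT = -10

values = [-1.0, -0.33, 0.0, 0.42, 1.0]
quantized = [quantize(v, SCALE, ZERO_POINT) for v in values]
restored = [dequantize(q, SCALE, ZERO_POINT) for q in quantized]
errors = [abs(v - r) for v, r in zip(values, restored)]
print(quantized)
print(max(errors))
```

For values inside the representable range the rounding error stays within half a step (scale / 2), while values outside it are clamped to -128 or 127 and lose more.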

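Once the model is built, go to the Deployment section and deploy it to one of the supported edge devices. ML model deployment is the process of putting a trained and tested ML model into a production environment, such as an edge device, where it can be used for its intended purpose.

Go to the Deployment tab in Edge Impulse and click the firmware type for your edge device. Here, it is the Arduino Nicla Voice.

You may see log messages like the following: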

Total Parameter Memory: 1.375 KB out of 640.0 KB on the NDP120_B0 device.
Estimated Model Energy/Inference at 0.9V: 5.55404 (uJ)
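To put the memory figure in perspective, the share of parameter memory in use can be computed directly from the log message. The sketch below parses the two numbers with a regular expression. The pattern is written for the message text shown above and is an assumption about its format, not a documented log specification.

```python
import re

LOG_LINE = "Total Parameter Memory: 1.375 KB out of 640.0 KB on the NDP120_B0 device."

# Pattern assumed from the message above: "<used> KB out of <total> KB".
match = re.search(r"([\d.]+) KB out of ([\d.]+) KB", LOG_LINE)
used_kb, total_kb = (float(value) for value in match.groups())
usage_percent = 100 * used_kb / total_kb
print(f"{usage_percent:.2f}% of parameter memory used")
```

The model's parameters take up roughly 0.21% of the available parameter memory.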