Introduction
Chapter 1 Signal Processing
1.1 Digital and Analog Frequency
1.2 Discrete Fourier Transform
1.2.1 Real DFT
1.2.2 Complex DFT
1.2.3 Negative-Frequency Components
1.2.4 DFT Properties
1.3 FFT
1.3.1 FFT Result Examples
1.3.2 FFT of Real Signals
1.3.3 Short-Time Fourier Transform
1.3.4 Choosing a Speech Window Function for the STFT
1.4 Overlap-Add and Overlap-Save Methods
1.4.1 OLA
1.4.2 OLS
1.5 Weighted Overlap-Add Method
1.5.1 WOLA Computation Procedure
1.5.2 WOLA Window Function Selection
1.6 Filter Banks
1.7 Speech Pre-emphasis
1.8 Gaussian Distribution
1.8.1 Univariate Gaussian Distribution
1.8.2 Multivariate Gaussian Distribution
1.9 HMM
1.10 Kalman Filtering
Chapter Summary
References
Chapter 2 Speech Production Mechanisms and Devices
2.1 Speech Production and Reception
2.1.1 Speech Production Mechanism
2.1.2 Sound Production Model
2.1.3 Units of Pronunciation
2.1.4 Classification of Speech Sounds
2.1.5 Sound Reception
2.1.6 Sound Propagation
2.2 Loudspeakers
2.2.1 Electrical Performance
2.2.2 Acoustic Performance
2.2.3 Noise Floor
2.2.4 Frequency Response Characteristics
2.2.5 THD+N POUT
2.2.6 Voltage (Power) and Distortion
2.3 Microphones
2.3.1 Microphone Performance Metrics
2.3.2 Microphone Selection
2.4 Structural Design
2.4.1 Loudspeaker Acoustic Cavity Design
2.4.2 Microphones and Loudspeakers
2.5 Audio Equipment
2.5.1 Listening Equipment
2.5.2 Sound Field Expressiveness
2.5.3 Sound-Producing Equipment
2.5.4 Anechoic Chamber Testing
2.6 Acoustic Testing
2.6.1 Acoustic Volume
2.6.2 Distortion (THD)
2.6.3 Frequency Response Aliasing
2.6.4 Microphone Array Consistency
2.6.5 AEC Reference Path
2.6.6 Loudspeaker Image Frequency
2.6.7 THD at Maximum Loudspeaker Amplitude
Chapter Summary
References
Chapter 3 Speech Endpoint Detection
3.1 Feature Selection
3.2 Decision Criteria
3.2.1 Threshold
3.2.2 Statistical Model Method
3.2.3 Machine Learning Method
3.3 VAD Example
3.3.1 Gaussian Distribution
3.3.2 Algorithm Flow
3.3.3 Computation Flow
3.4 Initial Parameters for Speech/Non-Speech Frames
3.4.1 Model Parameter Computation
3.4.2 Gaussian Mixture Model
3.4.3 EM Algorithm
Chapter Summary
References
Chapter 4 Single-Channel Noise Reduction
4.1 Spectral Subtraction
4.1.1 Principle of Spectral Subtraction
4.1.2 Implementation of Spectral Subtraction
4.1.3 Musical Noise Control
4.1.4 Filtering Method
4.2 Wiener Filtering
4.3 Subspace Noise Reduction
4.4 WebRTC Single-Channel Noise Reduction Implementation
4.4.1 Algorithm Principle
4.4.2 Algorithm Initialization
4.4.3 SNR Computation: ComputeSnr
4.4.4 Speech/Noise Probability Computation
4.4.5 Feature Selection
4.4.6 Flatness Computation
4.4.7 Noise Estimate Update Function: UpdateNoiseEstimate
4.4.8 Noise Removal
4.4.9 Signal Synthesis
4.4.10 Simulation Results
4.5 Deep Learning Noise Reduction
Chapter Summary
References
Chapter 5 Acoustic Echo Cancellation
5.1 Principle of Echo Cancellation
5.2 Adaptive Filters
5.2.1 Wiener Filter
5.2.2 LMS Algorithm
5.2.3 NLMS Algorithm
5.2.4 PBFDAF Algorithm
5.3 WebRTC Echo Cancellation Algorithm
5.3.1 Delay Estimation
5.3.2 Adaptive Filtering
5.3.3 Nonlinear Processing (NLP)
5.3.4 MATLAB Code Walkthrough
5.3.5 Simulation Experiments
5.4 Speex Echo Cancellation Algorithm
5.4.1 Variable Step-Size Computation
5.4.2 Bilinear Filter and Preprocessing
5.4.3 MATLAB Code Walkthrough
5.4.4 Algorithm Flow Diagram
5.4.5 Simulation Experiments
Chapter Summary
References
Chapter 6 Sound Source Localization
6.1 GCC Algorithm
6.2 SRP-PHAT Algorithm
6.3 MUSIC Algorithm
6.4 TOPS Algorithm
6.5 FRIDA Algorithm
6.6 Post-Processing for Noise Robustness
6.6.1 Statistical Method
6.6.2 Kalman Method
6.6.3 Sound Source Localization Modeling
6.6.4 Particle Filter Method
Chapter Summary
References
Chapter 7 Beamforming Techniques
7.1 Microphone Arrays
7.1.1 Number and Spacing of Microphones
7.1.2 Spatial Aliasing
7.1.3 Beamforming Metrics
7.1.4 Noise Fields
7.1.5 Sound Radiation
7.2 Common Beamforming Methods
7.2.1 Delay-and-Sum Beamforming
7.2.2 Filter-and-Sum Beamforming
7.2.3 Constant-Beamwidth Beamforming
7.2.4 Super-Resolution Beamforming
7.2.5 Generalized Sidelobe Canceller Beamforming
7.2.6 Minimum Variance Distortionless Response Beamforming
7.3 WebRTC Beamforming Example
7.3.1 Compiling the Test Files
7.3.2 Test File Processing Flow
7.3.3 Test Commands
7.3.4 Basic Idea of the Algorithm
7.3.5 Test Source Code
7.3.6 Algorithm Processing Flow
7.3.7 Weight Computation Function
7.3.8 Weight Multiplication Operation
7.4 Post-Filtering
7.4.1 MMSE Post-Filter
7.4.2 Zelinski Post-Filter
7.4.3 McCowan Post-Filter
7.4.4 STSA Post-Filter
Chapter Summary
References
Chapter 8 Blind Source Separation
8.1 Basic Concepts and Mathematical Preliminaries
8.1.1 Basic Concepts of ICA
8.1.2 Gradients and Optimization Methods
8.2 Preprocessing for Blind Speech Separation: PCA
8.3 Frequency-Domain Independent Component Analysis: FDICA
8.3.1 Frequency-Domain ICA
8.3.2 Decorrelation Estimation Method
8.3.3 Indeterminacy Problem
8.4 Post-Filtering
8.4.1 Noise Estimation
8.4.2 Attenuation Factor Computation
8.5 Joint Estimation with GSC and ICA
8.5.1 Kurtosis
8.5.2 Classical GSC
8.5.3 Dynamic Weight Vector Estimation
Chapter Summary
References
Chapter 9 Audio Effects Processing
9.1 Classification of Audio Channels
9.1.1 Mono
9.1.2 Two-Channel
9.1.3 Stereo
9.1.4 Multichannel
9.1.5 Panoramic Sound
9.2 Back-End Audio Effects Processing
Chapter Summary
References
Chapter 10 Speech Encoding/Decoding
10.1 LPC Coding
10.2 SILK Encoding/Decoding
10.2.1 Encoding Parameters
10.2.2 Encoder
10.2.3 Decoder
10.3 Opus Encoding/Decoding Overview
10.3.1 Opus Decoding
10.3.2 Opus Encoding
10.3.3 Opus Speech/Music Detection
10.4 Speech Quality Assessment
10.4.1 Subjective Testing
10.4.2 Objective Testing
10.4.3 No-Reference Quality Assessment
Chapter Summary
References
Chapter 11 Speech Network Transmission
11.1 Congestion Control
11.1.1 GoogleCC Congestion Control
11.1.2 PCC-Based Congestion Control
11.1.3 BBR-Based Congestion Control
11.2 NetEQ
11.2.1 NetEQ Principles
11.2.2 Jitter and Packet Reception
11.2.3 NetEQ Code Framework
11.2.4 Delay Computation
11.2.5 DSP Processing
11.2.6 Changing Speed Without Changing Pitch
Chapter Summary
References
Chapter 12 Voice Wake-Up
12.1 Introduction to Voice Wake-Up Technology
12.2 Feature Extraction
12.2.1 FBank
12.2.2 MFCC
12.2.3 PCEN
12.3 Model Architectures
12.3.1 DNN
12.3.2 CNN
12.3.3 CRNN
12.3.4 DSCNN
12.3.5 Subband CNN
12.3.6 Attention
12.4 Computation Acceleration
12.4.1 Hardware Resource Assessment
12.4.2 Acceleration Approaches
Chapter Summary
References
Chapter 13 Speech Recognition
13.1 Speech Feature Extraction
13.1.1 MFCC Features
13.1.2 PLP Features
13.1.3 Normalization
13.2 Acoustic Models
13.2.1 Gaussian Mixture Model
13.2.2 Parameter Estimation
13.2.3 Hidden Markov Model
13.2.4 Baum-Welch Method
13.2.5 HMM Recognizer
13.3 Language Models
13.3.1 N-gram Language Model
13.3.2 Weighted Finite-State Transducer
13.4 YES and NO Recognition Example
13.4.1 Data Preparation
13.4.2 Data Preprocessing
13.4.3 Vocabulary and Pronunciation Lexicon
13.4.4 Linguistic Model
13.4.5 Feature Extraction
13.4.6 Acoustic Model Training
13.4.7 Decoding and Testing
13.5 Chinese Speech Recognition with Kaldi
13.5.1 Dataset Preparation
13.5.2 Acoustic Model Training
13.5.3 Installing portaudio
13.5.4 Online Recognition
13.6 DeepSpeech Speech Recognition
13.6.1 Recognition Modeling
13.6.2 Network Composition
13.6.3 Model Training and Deployment
Chapter Summary
References
Appendix A Technical Terms Used in This Book