song (2022-10-31 12:02):
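#paper Conditional Diffusion Probabilistic Model for Speech Enhancement, https://arxiv.org/abs/2202.05256#
Standard diffusion models do not perform well on speech tasks. They assume the noise is Gaussian, but in speech tasks only a small share of the noise is Gaussian (white noise). Most of it is stationary or non-stationary noise of various kinds. This paper addresses the problem by conditioning both the diffusion and the reverse processes on the noisy speech y, in addition to the output of the previous step. Each step no longer adds only Gaussian noise. It also shifts the sample toward the noisy speech by a weighted difference between the noisy speech and the speech signal, on top of the Gaussian noise. In this way the model learns the characteristics of the noisy speech, that is, of the non-Gaussian noise, which lets diffusion models handle data whose noise is not Gaussian. Speech enhancement is a special case, however. Its datasets already contain paired clean and noisy speech, so the task suits this method well. Other speech tasks do not necessarily provide clean speech. Voice conversion, for example, lacks large amounts of target speech to serve as the clean reference. Extending the approach to such tasks is a possible direction for further research.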
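The forward (diffusion) step described above can be sketched in Python. This is a minimal sketch, assuming a formulation in which the mean of x_t interpolates between the clean speech x0 and the noisy speech y with a weight m_t that grows from 0 to 1, plus Gaussian noise. The linear beta schedule, the linear m_t, the variance 1 - alpha_bar_t, and all function names are illustrative assumptions, not the paper's exact choices.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.05):
    """Cumulative products alpha_bar_t of a linear beta schedule (illustrative values)."""
    alpha_bar, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def interpolation_weight(t, num_steps):
    # Hypothetical linear schedule: m = 0 at the first step (mean built from clean speech only)
    # and m = 1 at the last step (mean built from noisy speech only).
    return t / (num_steps - 1)


def conditional_forward_mean(x0, y, t, alpha_bar, num_steps):
    """Mean of x_t: sqrt(alpha_bar_t) * ((1 - m_t) * x0 + m_t * y).

    Equivalently sqrt(alpha_bar_t) * (x0 + m_t * (y - x0)): the weighted difference
    between noisy and clean speech is what carries the non-Gaussian noise into x_t.
    """
    m_t = interpolation_weight(t, num_steps)
    scale = math.sqrt(alpha_bar[t])
    return [scale * ((1.0 - m_t) * xi + m_t * yi) for xi, yi in zip(x0, y)]


def conditional_forward_sample(x0, y, t, alpha_bar, num_steps, rng):
    """Draw x_t by adding Gaussian noise with assumed variance 1 - alpha_bar_t to the mean."""
    mean = conditional_forward_mean(x0, y, t, alpha_bar, num_steps)
    std = math.sqrt(1.0 - alpha_bar[t])
    return [mu + std * rng.gauss(0.0, 1.0) for mu in mean]


# Usage: a toy clean signal and a noisy copy (uniform noise stands in for non-Gaussian noise).
rng = random.Random(0)
T = 50
alpha_bar = linear_alpha_bar(T)
clean = [math.sin(0.1 * n) for n in range(8)]
noisy = [c + 0.3 * rng.uniform(-1.0, 1.0) for c in clean]
x_mid = conditional_forward_sample(clean, noisy, T // 2, alpha_bar, T, rng)
```

In a standard DDPM the mean would be sqrt(alpha_bar_t) * x0 alone, so the only structural change in this sketch is the m_t * (y - x0) term. The reverse process, which the paper also conditions on y, is not shown here.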
arXiv, 2022.
Conditional Diffusion Probabilistic Model for Speech Enhancement
Abstract:
Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are still lagging behind in speech enhancement. This work leverages recent advances in diffusion probabilistic models, and proposes a novel speech enhancement algorithm that incorporates characteristics of the observed noisy speech signal into the diffusion and reverse processes. More specifically, we propose a generalized formulation of the diffusion probabilistic model named conditional diffusion probabilistic model that, in its reverse process, can adapt to non-Gaussian real noises in the estimated speech signal. In our experiments, we demonstrate strong performance of the proposed approach compared to representative generative models, and investigate the generalization capability of our models to other datasets with noise characteristics unseen during training.