This paper proposes a simple and effective data-adaptive smoothing approach to suppress the pitch- and environment-induced mismatches in keyword spotting (KWS) systems. In the proposed method, the magnitude spectra are smoothed by a data-adaptive single-pole filter (DA-SPF) before the Mel frequency cepstral coefficients (MFCCs) are computed. The smoothing filters out the high-frequency components of the spectrum, which are mainly due to pitch periodicity.

Formants are the spectral maxima that result from acoustic resonances of the human vocal tract, and their accurate estimation is among the most fundamental speech processing problems. Recent work has shown that those frequencies can be estimated accurately using deep learning techniques. However, when presented with speech from a domain different from the one on which they were trained, these methods exhibit a decline in performance, limiting their usage as generic tools. The contribution of this paper is a new network architecture that performs well on a variety of speaker and speech domains. Our proposed model is composed of a shared encoder that takes a spectrogram as input and outputs a domain-invariant representation. Then, multiple decoders further process this representation, each responsible for predicting a different formant while considering the lower formant predictions. An advantage of our model is that it is based on heatmaps that generate a probability distribution over formant predictions. Results suggest that our proposed model better represents the signal over various domains and leads to better formant frequency tracking and estimation.
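The spectral smoothing step described for the keyword spotting method can be illustrated with a minimal sketch. The text does not say how the filter adapts to the data, so the code below uses a fixed pole `alpha` (a hypothetical parameter) and should not be read as the DA-SPF itself. The forward-backward pass, the function names, and the synthetic spectrum are likewise assumptions made only for illustration.

```python
import numpy as np

def single_pole_smooth(mag, alpha=0.7):
    """One-pole recursion along frequency: y[k] = (1 - alpha) * x[k] + alpha * y[k - 1].

    `alpha` is a fixed, hypothetical pole; a data-adaptive filter would
    choose it from the signal, which is not modelled here.
    """
    mag = np.asarray(mag, dtype=float)
    out = np.empty_like(mag)
    acc = mag[0]  # start the recursion from the first bin
    for k in range(mag.size):
        acc = (1.0 - alpha) * mag[k] + alpha * acc
        out[k] = acc
    return out

def smooth_zero_phase(mag, alpha=0.7):
    """Run the filter forward and then backward so spectral peaks are not shifted (an assumption)."""
    forward = single_pole_smooth(mag, alpha)
    return single_pole_smooth(forward[::-1], alpha)[::-1]

# Synthetic magnitude spectrum: a slow envelope plus a fast ripple
# standing in for pitch harmonics.
bins = np.arange(257)
envelope = np.exp(-((bins - 60) / 40.0) ** 2) + 0.5
ripple = 0.3 * np.cos(2 * np.pi * bins / 8.0)
spectrum = envelope + ripple

smoothed = smooth_zero_phase(spectrum, alpha=0.7)
```

In this sketch the smoothed spectrum would then be passed to the Mel filterbank and cepstral transform in place of the raw magnitude spectrum. A larger `alpha` removes more of the ripple but also blurs the envelope.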