Bandwidth Extension Techniques of Bone Conducted Speech

Because bone-conducted (BC) speech is transmitted through bone and tissue, its high-frequency content is limited, and its quality must be enhanced by artificially extending its bandwidth into the higher frequencies (2-4 kHz). Many bandwidth extension (BWE) techniques have been proposed for BC speech; Shin et al. (2012) present a survey. In general, there are three main approaches to the enhancement of BC speech: equalization, analysis and synthesis, and probabilistic. The main contributions to each approach are discussed in the following three sections.

Speech Bandwidth Extension using Equalization Approach 

The equalization approach is a simple way of extending the bandwidth of BC speech. First presented by Shimamura and Tamiya (2005), this approach obtains the long-term spectra of both the BC and air conduction (AC) speech and derives an equalization filter from the ratio of the two long-term spectra. The BC speech is then filtered with this reconstruction filter and enhanced using a reinforced spectral subtraction technique (Shimamura and Tamiya, 2005; Ogata and Shimamura, 2001). The results showed an overall improvement of the BC speech; however, the improvement was not consistent and varied with the speaker as well as the filter length. Kondo et al. (2006) built on this technique by using a speaker-dependent short-term Fast Fourier Transform (FFT) for equalization, which required extensive training and smoothing. Although the results improved on those of Shimamura and Tamiya (2005), the low-energy and silent regions of the enhanced BC speech were overly emphasized, which affected the perceived quality of the speech (Kondo et al., 2006). Finally, Shimamura et al. (2006) improved this approach by proposing a speaker-dependent, neural-network-based method involving a normalized least-mean-square adaptive filter. In summary, equalization approaches are simple; however, they are not robust to leakage noise in the BC microphone resulting from flanking pathways, and their speaker-dependent nature makes them impractical.
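
The core idea of equalizing by the ratio of long-term spectra can be illustrated with a minimal sketch. It is not the implementation of Shimamura and Tamiya (2005): the frame length, the overlap, the Hann window, the function names, and the synthetic test signals are all assumptions made for illustration, and the spectral subtraction stage is omitted.

```python
import numpy as np

FRAME = 512        # frame length in samples (assumed value)
HOP = FRAME // 2   # 50 % overlap (assumed value)
WINDOW = np.hanning(FRAME + 1)[:-1]  # periodic Hann, sums to 1 at 50 % overlap


def _frames(x):
    """Slice x into overlapping, windowed frames (one frame per row)."""
    n = 1 + (len(x) - FRAME) // HOP
    return np.stack([x[i * HOP:i * HOP + FRAME] * WINDOW for i in range(n)])


def long_term_spectrum(x):
    """Magnitude spectrum averaged over all frames of x."""
    return np.abs(np.fft.rfft(_frames(x), axis=1)).mean(axis=0)


def equalization_gain(ac, bc, eps=1e-8):
    """Per-bin gain: ratio of the AC to the BC long-term spectrum."""
    return long_term_spectrum(ac) / (long_term_spectrum(bc) + eps)


def equalize(bc, gain):
    """Apply the gain to every BC frame and rebuild the signal by overlap-add."""
    spectra = np.fft.rfft(_frames(bc), axis=1) * gain
    out = np.zeros(len(bc))
    for i, frame in enumerate(np.fft.irfft(spectra, n=FRAME, axis=1)):
        out[i * HOP:i * HOP + FRAME] += frame
    return out


# Synthetic stand-ins (hypothetical data): white noise plays the role of AC
# speech, and a smooth low-pass filtered copy plays the role of BC speech.
rng = np.random.default_rng(0)
ac = rng.standard_normal(16000)
bc = np.convolve(ac, 0.2 * 0.8 ** np.arange(64))[:len(ac)]

gain = equalization_gain(ac, bc)
bc_eq = equalize(bc, gain)
```

Because the gain is learned from paired AC and BC recordings, a sketch of this kind inherits the limitation noted above: the filter is only as general as the recordings it was derived from.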

Speech Bandwidth Extension using Analysis and Synthesis Approach 

In this approach, an inverse transfer function between the AC and BC speech is obtained, using either both the AC and BC speech (Yu et al., 2005; Tat Vu et al., 2008) or only the BC speech (Rahman and Shimamura, 2011), and is used to reconstruct the BC speech. Yu et al. (2005) first introduced this approach by using linear predictive coding (LPC) coefficients to filter and extend the bandwidth of BC speech. However, LPC coefficients are susceptible to quantization noise and were thus replaced with line spectral frequencies (LSF) by Tat Vu et al. (2008). Up to this point, all BWE techniques for improving the quality of BC speech also relied on an AC speech source. Rahman and Shimamura (2011) introduced a blind restoration technique that depends only on the BC speech. Nonetheless, this technique introduces speech distortion when the LPC filter is designed from mismatched LSF coefficients caused by BC channel noise or physiological noise such as teeth clacking. In general, the analysis and synthesis approach is of limited practical use because it is not robust to BC channel noise or physiological noise (Shin et al., 2012).
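
The LPC analysis on which these methods build can be sketched as follows. This is the textbook autocorrelation method with the Levinson-Durbin recursion, not the specific procedure of Yu et al. (2005) or Tat Vu et al. (2008); the function name `lpc`, the model order, and the synthetic autoregressive test signal are assumptions made for illustration, and the conversion to LSF is not shown.

```python
import numpy as np


def lpc(x, order):
    """Estimate LPC coefficients with the autocorrelation method.

    Returns (a, err), where a = [1, a_1, ..., a_p] defines the prediction
    error filter A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and err is the
    residual prediction error energy.
    """
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        # Levinson-Durbin update of the lower-order coefficients
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# Hypothetical test signal: a stable second-order autoregressive process,
# x[n] = 1.3 x[n-1] - 0.6 x[n-2] + e[n], standing in for a speech frame.
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for n in range(2, len(x)):
    x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + e[n]

a, err = lpc(x, order=2)  # a should be close to [1, -1.3, 0.6]
```

The all-pole filter 1/A(z) models the spectral envelope of the analyzed signal, which is the quantity an analysis and synthesis method would modify before resynthesis.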

Speech Bandwidth Extension using Probabilistic Approach

To address the issues caused by noise, probabilistic approaches were introduced. These approaches estimate the transfer function between the BC and AC speech by maximum likelihood estimation (Liu et al., 2004). Liu et al. (2005) enhanced this technique by modeling the BC leakage noise, the background noise, the AC speech, the BC speech, and any physiological noise as Gaussian distributions. The enhanced speech was a weighted sum of the AC speech and the noise-reduced BC speech. This approach is advantageous because it requires no pre-training and is speaker independent. However, it still requires an AC microphone and does not use any speech model. Subramanya et al. (2008) presented a new probabilistic approach that uses Gaussian Mixture Models (GMM) to model the speech. Although this technique improved on earlier techniques, it requires access to an AC microphone, training data from multiple speakers, and large databases.
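
As a generic illustration of maximum likelihood estimation of a transfer function, and not the formulation of Liu et al. (2004), assume that each BC spectral frame equals the AC frame multiplied by an unknown per-bin transfer function plus independent complex Gaussian noise. Under that assumption the maximum likelihood estimate coincides with the per-bin least-squares solution. The direction of the mapping (AC to BC), the function name, and the synthetic data below are all assumptions.

```python
import numpy as np


def ml_transfer_function(ac_spec, bc_spec, eps=1e-12):
    """Per-bin estimate of H in the assumed model bc = H * ac + noise.

    ac_spec, bc_spec: complex arrays of shape (frames, bins).
    With Gaussian noise, the maximum likelihood estimate of H is the
    least-squares solution computed independently for each bin.
    """
    num = np.sum(np.conj(ac_spec) * bc_spec, axis=0)
    den = np.sum(np.abs(ac_spec) ** 2, axis=0) + eps
    return num / den


# Hypothetical data: random complex spectra and a smooth low-pass H.
rng = np.random.default_rng(0)
frames, bins = 200, 64
ac_spec = rng.standard_normal((frames, bins)) + 1j * rng.standard_normal((frames, bins))
h_true = 1.0 / (1.0 + 1j * np.linspace(0.0, 4.0, bins))
noise = 0.05 * (rng.standard_normal((frames, bins)) + 1j * rng.standard_normal((frames, bins)))
bc_spec = h_true * ac_spec + noise

h_hat = ml_transfer_function(ac_spec, bc_spec)
```

The estimate needs paired AC and BC frames, which mirrors the limitation noted above: the probabilistic approaches surveyed here still depend on an AC microphone.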

More recent techniques have been proposed with promising results. Huang et al. (2014) used functional link artificial neural networks (FLANN) to denoise and extend the bandwidth of BC speech. However, this technique requires training the neural network with clean AC speech data. Li et al. (2014) proposed a technique that uses geometric harmonics along with a Laplacian pyramid to denoise and enhance the BC speech. This technique introduces distortion and is computationally complex, making it unsuitable for real-time processing on embedded hardware with limited resources.

Current BWE techniques for BC speech either require a large amount of training, require an AC microphone, are speaker dependent, or are computationally expensive. A simple, speaker-independent technique that requires neither an AC microphone nor training would be practical and applicable in a real-life setting.

Vocal Effort

Talkers adjust their speech level in the presence of noise (Lane and Tranel, 1971), with varying talker-to-listener distance (Fux et al., 2011), and to express emotion (Schröder, 2001). This work focuses on changes in vocal effort as a function of noise and talker-to-listener distance. These changes in vocal effort are governed by talkers’ perception of their own voice (Tufts and Frank, 2003), which reaches the ear through three paths:

(a) Direct air conduction: sound travels from the talker’s mouth to the ear through propagation in the open air.
(b) Bone conduction: sound is transmitted through bone and tissue inside the skull. The cochlea can be stimulated directly, when skull vibrations set the cochlear fluid in motion, or indirectly, when the vibrations excite the air trapped in the ear canal, which in turn vibrates the eardrum.
(c) Indirect air conduction: sound travels from the talker’s mouth, reflects off surfaces around the talker, and travels back to the talker’s ear.

This feedback mechanism is referred to as the audio-phonation loop (Garnier et al., 2010).

Table of contents

INTRODUCTION
0.1 Context
0.2 Problem Statement
0.3 Background
0.4 Objectives
0.5 Structure
CHAPTER 1 LITERATURE REVIEW
1.1 Hearing Protection Devices
1.2 Communication in Noise: existing tools and techniques
1.3 Bandwidth Extension Techniques of Bone Conducted Speech
1.3.1 Speech Bandwidth Extension using Equalization Approach
1.3.2 Speech Bandwidth Extension using Analysis and Synthesis Approach
1.3.3 Speech Bandwidth Extension using Probabilistic Approach
1.4 Vocal Effort
1.4.1 Open Ear
1.4.2 Occluded Ear
1.4.3 Vocal Effort in Noise
1.4.4 In Noise, Open Ears
1.4.5 In Noise, Occluded Ears
1.4.6 Vocal Effort With Varying Distance
CHAPTER 2 IMPROVING THE QUALITY OF IN-EAR MICROPHONE SPEECH VIA ADAPTIVE FILTERING AND ARTIFICIAL BANDWIDTH EXTENSION
2.1 Introduction
2.2 Methods and Materials
2.2.1 Speech Corpus
2.2.2 Predicted Quality
2.2.3 IEM Noise Reduction
2.2.4 IEM Bandwidth Extension
2.2.5 Performance Evaluation
2.3 Results
2.3.1 Pre-Enhancement Objective Quality Assessment
2.3.2 IEM Speech Enhancement
2.3.3 Performance Evaluation
2.4 Discussion
2.5 Conclusions
CHAPTER 3 VARIATIONS IN VOICE LEVEL AND FUNDAMENTAL FREQUENCY WITH CHANGING BACKGROUND NOISE LEVEL AND TALKER-TO-LISTENER DISTANCE WHILE WEARING HEARING PROTECTORS: A PILOT STUDY
3.1 Introduction
3.2 Method
3.2.1 Apparatus
3.2.2 Participants
3.2.3 Task
3.2.4 Conditions
3.2.5 Procedure
3.2.5.1 Measurement of individual earplug transfer function
3.2.5.2 Assessment of well-fitted earplug
3.2.5.3 Adjustment of the background noise level
3.2.5.4 Analysis
3.3 Results
3.4 Discussion
3.5 Conclusions
CHAPTER 4 MODELING SPEECH LEVEL AS A FUNCTION OF BACKGROUND NOISE LEVEL AND TALKER-TO-LISTENER DISTANCE FOR TALKERS WEARING HEARING PROTECTION DEVICES
4.1 Introduction
4.2 Methods and Materials
4.2.1 Experimental Setup
4.2.2 Model Fitting
4.3 Results
4.4 Discussions
4.5 Conclusions
CHAPTER 5 CONCLUSION
