International Journal of Electrical and Computer Engineering (IJECE)
Vol. No. August 2017, pp. 1941–1951
ISSN: 2088-8708, DOI: 10.11591/ijece.

Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing

Amart Sulong1, Teddy Surya Gunawan2, Othman O. Khalifa3, Mira Kartiwi4, Hassan Dao5
1,2,3 Department of Electrical and Computer Engineering, International Islamic University Malaysia, Malaysia
4 Department of Information Systems, International Islamic University Malaysia, Malaysia
5 Institute of Information Technology, University Kuala Lumpur, Malaysia

ABSTRACT
Speech enhancement algorithms are used to overcome multiple limiting factors in recent applications such as mobile phones and communication channels. The challenge in enhancing corrupted speech lies in the trade-off between noise reduction and signal distortion. We used a modified Wiener filter and compressive sensing (CS) to investigate and evaluate the improvement of speech quality. The new method adapts the noise estimation and the Wiener filter gain function to increase the weight of the amplitude spectrum and to reduce the mismatch of the signals of interest. CS is then applied using the gradient projection for sparse reconstruction (GPSR) technique to empirically investigate the interactive effects of the corrupting noise and to obtain a better perceptual improvement that reduces listener fatigue. In objective assessment tests, the proposed algorithm outperforms other conventional algorithms under various noise types at 0, 5, 10, and 15 dB SNR. The proposed algorithm therefore achieves a significant improvement in speech quality and better noise reduction than other conventional algorithms.
Article history: Received Jan 4, 2017; Revised May 31, 2017; Accepted Jun 14, 2017
Keywords: Compressive sensing; PESQ; PESQ improvement; SNR; Speech enhancement; Wiener filter
Copyright © 2017 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author: Teddy Surya Gunawan, Department of Electrical and Computer Engineering, International Islamic University Malaysia, Jalan Gombak, 53100 Kuala Lumpur, Malaysia. Email: tsgunawan@iium.
Journal homepage: http://iaesjournal.com/online/index.php/IJECE

INTRODUCTION
In the modern era, advances in technology have brought great benefits to people's daily lives. Among today's technologies, signal processing is one of the most powerful tools of modern engineering, enabling a wide range of applications to move from theory to real implementation. In speech enhancement there is always a trade-off between noise reduction and signal distortion: most studies have found that more noise reduction is accompanied by more signal distortion […]. The main challenge of speech enhancement is therefore to design effective algorithms that suppress the noise without introducing perceptible distortion into the speech signal […]. Research on the speech enhancement problem has grown rapidly and covers a broad spectrum of constraints, applications, and issues. Enhancing speech that was degraded by noise and recorded with a single microphone remains challenging and widely open for investigation […]. This problem, known as single-channel speech enhancement, is considered the most difficult case […], because speech and noise are observed in the same channel and no reference noise signal is available. Improving the speech signal-to-noise ratio (SNR) is the target of most techniques.
Most speech enhancement techniques have concentrated on additive noise that is statistically uncorrelated with, and independent of, the speech […]. However, effective algorithms that combat additive noise while producing high-quality, improved speech remain limited. Studying additive noise in various applications, and how it behaves, is therefore a crucial endeavor. Most of the literature focuses on how noise sources differ in their temporal and spectral characteristics and on the range of noise levels that may be encountered in real life […]. Many existing studies on speech enhancement have relied on relatively few samples of speech quality measurements, which has made satisfactory studies impossible. A larger amount of noisy speech data at various SNRs may give a better understanding of the related characteristics […].

Concerns have been expressed about existing speech enhancement approaches, yet only a few studies so far have sought a solution based on the compressive sensing (CS) technique. The question therefore remains whether CS can achieve a high improvement in both performance and quality, and it may be useful to investigate and analyze this new approach to data acquisition […]. CS theory asserts that certain signals can be recovered from far fewer samples or measurements than conventional methods based on the well-known Shannon/Nyquist sampling theorem require […]. In turn, this new type of sampling theory can reconstruct sparse signals from what was previously believed to be incomplete information […]. The method also provides efficient algorithms that can be used for perfect recovery of the sparse signal […].
Most research on CS techniques has been in image processing, where CS provides a compressed version of the original image with negligible distortion […]. The technique relies mainly on the empirical observation that many signals can be well approximated by a sparse expansion in a suitable basis […].

LITERATURE SURVEY OF SPEECH ENHANCEMENT
Many studies […] have reported a widely used single-channel speech enhancement approach based on the short-time spectral magnitude (STSM). These techniques follow a simple principle: an estimate of the clean speech spectrum is obtained by subtracting an estimate of the noise spectrum from the noisy speech spectrum. In general, speech […] is contaminated and degraded by additive background noise that is uncorrelated with the speech. The resulting noisy speech and its spectrum can be expressed as

$$y(n) = s(n) + d(n) \qquad \text{and} \qquad Y(\omega,k) = S(\omega,k) + D(\omega,k)$$

where $y(n)$, $s(n)$, and $d(n)$ are the noisy speech, clean speech, and additive noise, respectively, and $n$ is the sample index of the discrete-time signal. Because speech is non-stationary, processing is carried out frame by frame: the noisy speech is transformed with the short-time Fourier transform (STFT), giving the noisy speech spectrum $Y(\omega,k)$, clean speech spectrum $S(\omega,k)$, and noise spectrum $D(\omega,k)$, where $\omega$ denotes frequency and $k$ the frame number. For simplicity, the frame index $k$ is omitted below whenever a single frame is considered. Since speech and noise are uncorrelated, the cross terms vanish on average, and the noisy speech power spectrum can be approximated as

$$|Y(\omega)|^2 \approx |S(\omega)|^2 + |D(\omega)|^2$$
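The additive model and the power-spectrum approximation can be checked numerically. The sketch below is illustrative only: a 440 Hz sinusoid stands in for clean speech, white Gaussian noise for $d(n)$, and the 160-sample frames, 50% overlap, and periodic Hann window are assumptions chosen to match the 8 kHz, 20 ms framing used later in the paper. It confirms that the STFT preserves $Y = S + D$ exactly and that the cross term $2\,\mathrm{Re}\{S D^*\}$ is small once averaged over frames.

```python
import numpy as np

def stft_frames(x, frame_len=160, hop=80):
    """Periodic-Hann-windowed frames of x and their one-sided FFTs (row k = frame k)."""
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[k * hop:k * hop + frame_len] * window for k in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
s = np.sin(2 * np.pi * 440 * t)        # toy stand-in for clean speech s(n)
d = 0.3 * rng.standard_normal(fs)      # additive white noise d(n)
y = s + d                              # noisy speech y(n) = s(n) + d(n)

S, D, Y = stft_frames(s), stft_frames(d), stft_frames(y)
additive = np.allclose(Y, S + D)       # the STFT is linear, so Y(w,k) = S(w,k) + D(w,k)

# |Y|^2 = |S|^2 + |D|^2 + 2 Re(S D*); averaged over frames the cross term is small
cross = (2 * np.real(S * np.conj(D))).mean(axis=0)
power = (np.abs(S) ** 2 + np.abs(D) ** 2).mean(axis=0)
cross_ratio = np.abs(cross).sum() / power.sum()
print(additive, round(cross_ratio, 3))
```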
The enhanced short-time magnitude $|\hat{S}(\omega)|$ can be obtained by subtracting a noise estimate made during speech pauses:

$$|\hat{S}(\omega)| = \begin{cases} |Y(\omega)| - |\hat{D}(\omega)|, & \text{if } |Y(\omega)| > |\hat{D}(\omega)| \\ 0, & \text{otherwise} \end{cases}$$

The noise spectrum estimate $|\hat{D}(\omega)|$ is calculated by averaging the most recent speech-pause frames:

$$|\hat{D}(\omega)| = \frac{1}{M}\sum_{j=0}^{M-1} |Y_{SP_j}(\omega)|$$

where $M$ is the number of consecutive speech-pause frames and $Y_{SP_j}(\omega)$ is the spectrum of the $j$-th pause frame. This average converges to an optimal estimate of the noise spectrum only when the background noise is stationary. The subtraction can also be viewed as a filter applied to the noisy speech spectrum:

$$\hat{S}(\omega) = H(\omega)\,Y(\omega), \qquad H(\omega) = \max\!\left(0,\; 1 - \frac{|\hat{D}(\omega)|}{|Y(\omega)|}\right)$$

where $H(\omega)$ is the gain function of spectral subtraction. Because $H(\omega)$ is real-valued with $0 \le H(\omega) \le 1$, it is a zero-phase filter. To synthesize the result, the enhanced speech must be reconstructed: the noisy phase is used as the phase of the clean speech estimate, since the human auditory system is relatively insensitive to phase […]. Each enhanced frame is therefore synthesized as

$$\hat{s}(n) = \mathrm{IFFT}\left\{ |\hat{S}(\omega)|\, e^{j\angle Y(\omega)} \right\}$$

and the enhanced waveform is recovered by the inverse fast Fourier transform $\mathrm{IFFT}\{\cdot\}$ with the overlap-add method. More generally, subtractive-type algorithms can be expressed as a filter that depends on the characteristics of the noisy speech and on the noise spectrum estimate, $\hat{S}(\omega) = H(\omega)\,Y(\omega)$, where the gain function $H(\omega)$ determines the noise reduction of the method […]. An extensive study […] reported that the gain can be improved by introducing the parameters $\alpha$, $\beta$, $\gamma_1$, and $\gamma_2$.
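The subtraction, noise averaging, and noisy-phase overlap-add synthesis described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the first `n_pause_frames` frames are speech pauses (a stand-in for a VAD), and the frame length, hop, and periodic Hann window are assumed values chosen so that plain overlap-add reconstructs the signal exactly when nothing is subtracted.

```python
import numpy as np

def spectral_subtraction(y, n_pause_frames=10, frame_len=160, hop=80):
    """Magnitude spectral subtraction with noisy phase and overlap-add synthesis.

    Assumes the first n_pause_frames frames contain no speech (stand-in for a VAD).
    """
    # Periodic Hann window: at 50% overlap the shifted windows sum exactly to 1
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)
    n_frames = 1 + (len(y) - frame_len) // hop
    frames = np.stack([y[k * hop:k * hop + frame_len] * window for k in range(n_frames)])
    Y = np.fft.rfft(frames, axis=1)

    D_hat = np.abs(Y[:n_pause_frames]).mean(axis=0)  # |D^(w)|: average over M pause frames
    S_mag = np.maximum(np.abs(Y) - D_hat, 0.0)        # |S^(w)| = |Y| - |D^|, floored at 0
    S_hat = S_mag * np.exp(1j * np.angle(Y))          # reuse the noisy phase
    out_frames = np.fft.irfft(S_hat, n=frame_len, axis=1)

    out = np.zeros(len(y))
    for k in range(n_frames):                         # overlap-add of the enhanced frames
        out[k * hop:k * hop + frame_len] += out_frames[k]
    return out

rng = np.random.default_rng(0)
noise = rng.standard_normal(8000)                     # noise-only input, 1 s at 8 kHz
enhanced = spectral_subtraction(noise)
print(np.sum(enhanced ** 2) / np.sum(noise ** 2))     # fraction of energy left (residual noise)
```

What survives in the noise-only example is exactly the residual, randomly placed spectral peaks that the following paragraphs call musical noise.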
The generalized gain function is

$$H(\omega) = \begin{cases} \left(1 - \alpha\left[\dfrac{|\hat{D}(\omega)|}{|Y(\omega)|}\right]^{\gamma_1}\right)^{\gamma_2}, & \text{if } \left[\dfrac{|\hat{D}(\omega)|}{|Y(\omega)|}\right]^{\gamma_1} < \dfrac{1}{\alpha+\beta} \\[2ex] \left(\beta\left[\dfrac{|\hat{D}(\omega)|}{|Y(\omega)|}\right]^{\gamma_1}\right)^{\gamma_2}, & \text{otherwise} \end{cases}$$

These parameters are the design variables for handling the trade-off among noise reduction, residual noise, and speech distortion. They are free parameters, described as follows:
(a) Over-subtraction factor $\alpha$ ($\alpha \ge 1$): subtracts more than the estimated noise spectrum, which reduces residual noise peaks at the cost of increased speech distortion.
(b) Spectral flooring $\beta$ ($0 \le \beta \ll 1$): prevents the enhanced spectrum from falling below a minimum value; a small amount of background noise is retained, which masks the residual musical noise.
(c) Exponents $\gamma_1$ and $\gamma_2$: determine the sharpness of the transition from $H(\omega) = 1$ (spectral component not modified) to $H(\omega) = 0$ (spectral component suppressed).

Varying the exponents yields familiar algorithms: magnitude subtraction ($\gamma_1 = 1$, $\gamma_2 = 1$), power spectral subtraction ($\gamma_1 = 2$, $\gamma_2 = 0.5$), and the Wiener filter ($\gamma_1 = 2$, $\gamma_2 = 1$).

According to […], the advantages of spectral subtraction algorithms are: (1) they are simple and require only an estimate of the noise spectrum, and (2) the subtraction parameters can be varied with high flexibility. Normally, these algorithms employ voice activity detection (VAD) to gather statistics from silence regions. However, VAD performance degrades significantly at low SNR, and difficulties emerge when the background noise is nonstationary. Their main shortcoming is a perceptually noticeable residue of unnatural spectral artifacts at random frequencies, known as musical noise.
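The generalized gain function, and the way the exponent choices reproduce the three classical algorithms, can be checked with a short NumPy sketch. The function name `generalized_gain` and the default parameter values are illustrative assumptions. Both branches are clipped so that the vectorized `np.where` never raises a negative base to a fractional power.

```python
import numpy as np

def generalized_gain(Y_mag, D_mag, alpha=1.0, beta=0.0, gamma1=1.0, gamma2=1.0):
    """Generalized spectral-subtraction gain H(w) with parameters alpha, beta, gamma1, gamma2."""
    ratio = (D_mag / np.maximum(Y_mag, 1e-12)) ** gamma1          # (|D^|/|Y|)^gamma1
    subtract = np.clip(1.0 - alpha * ratio, 0.0, None) ** gamma2  # first branch
    floor = (beta * ratio) ** gamma2                              # spectral-floor branch
    return np.where(ratio < 1.0 / (alpha + beta), subtract, floor)

Y_mag, D_mag = np.array([2.0]), np.array([1.0])
print(generalized_gain(Y_mag, D_mag))                         # magnitude subtraction
print(generalized_gain(Y_mag, D_mag, gamma1=2, gamma2=0.5))   # power spectral subtraction
print(generalized_gain(Y_mag, D_mag, gamma1=2, gamma2=1))     # Wiener-type gain
```

The switching threshold $1/(\alpha+\beta)$ makes the two branches meet, since both equal $\left(\beta/(\alpha+\beta)\right)^{\gamma_2}$ there, so the gain is continuous in $|\hat{D}|/|Y|$.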
The method also depends critically on the precision of the noise estimate, which is limited by the performance of speech/pause detectors. Spectral over-subtraction improves the algorithm by minimizing the inevitable residual noise and distortion […]. The algorithm sets $\alpha \ge 1$ and $0 < \beta \ll 1$ to control how much of the noise power spectrum is subtracted from the noisy speech power spectrum in each frame […]. The spectral floor keeps the spectrum from falling below a preset minimum level rather than setting it to zero:

$$|\hat{S}(\omega)|^2 = \begin{cases} |Y(\omega)|^2 - \alpha|\hat{D}(\omega)|^2, & \text{if } |Y(\omega)|^2 > (\alpha+\beta)|\hat{D}(\omega)|^2 \\ \beta|\hat{D}(\omega)|^2, & \text{otherwise} \end{cases}$$

The over-subtraction factor depends on the a posteriori segmental SNR of each frame:

$$\alpha = \begin{cases} \alpha_{max}, & SNR < SNR_{min} \\ \alpha_0 - \dfrac{\alpha_0 - \alpha_{min}}{SNR_{max}}\, SNR, & SNR_{min} \le SNR \le SNR_{max} \\ \alpha_{min}, & SNR > SNR_{max} \end{cases}$$

where $\alpha_{min} = 1$, $\alpha_{max} = 5$, $SNR_{min} = -5$ dB, $SNR_{max} = 20$ dB, and $\alpha_0$ is the value of $\alpha$ at 0 dB SNR. This technique assumes that the noise affects the speech spectrum uniformly, and the subtraction factor determines how much of an over-estimate of the noise spectrum is subtracted from the noisy spectrum. Speech distortion and residual musical noise are balanced through the combination of the over-subtraction factor $\alpha$ and the spectral floor $\beta$, which trades the amount of residual noise against the level of perceived musical noise. If $\beta$ is large, the residual background noise becomes audible but very little musical noise remains; if $\beta$ is very small, the residual noise is greatly reduced but the speech is annoyingly corrupted by musical noise. Thus, $\alpha$ is set according to the SNR-dependent rule above, and $\beta$ is set to 0.… As a result, the algorithm reduces the level of perceived musical noise, while the remaining background noise is still present and distorts the enhanced speech signal.
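The SNR-dependent over-subtraction rule and the floored power subtraction can be sketched as below. The text does not state $\alpha_0$ or $\beta$ explicitly, so `alpha0=4.0` and `beta=0.01` are assumed defaults for illustration; the segmental SNR is computed here as $10\log_{10}\left(\sum|Y|^2 / \sum|\hat{D}|^2\right)$ over one frame, which is also an assumption about how the a posteriori SNR is measured.

```python
import numpy as np

def over_subtraction_factor(snr_db, alpha0=4.0, alpha_min=1.0, alpha_max=5.0,
                            snr_min=-5.0, snr_max=20.0):
    """SNR-dependent over-subtraction factor; alpha0 = 4.0 is an assumed default."""
    if snr_db < snr_min:
        return alpha_max
    if snr_db > snr_max:
        return alpha_min
    return alpha0 - (alpha0 - alpha_min) / snr_max * snr_db

def over_subtraction(Y_pow, D_pow, beta=0.01):
    """Power spectral over-subtraction with a spectral floor (beta = 0.01 is hypothetical)."""
    snr_db = 10 * np.log10(np.sum(Y_pow) / np.sum(D_pow))  # a posteriori segmental SNR of the frame
    alpha = over_subtraction_factor(snr_db)
    return np.where(Y_pow > (alpha + beta) * D_pow, Y_pow - alpha * D_pow, beta * D_pow)

for snr in (-10, -5, 0, 10, 20, 30):
    print(snr, over_subtraction_factor(snr))
print(over_subtraction(np.array([10.0, 1.0]), np.array([1.0, 1.0])))
```

In the last line the weak second bin falls below $(\alpha+\beta)|\hat{D}|^2$ and is replaced by the floor $\beta|\hat{D}|^2$ instead of zero, which is what keeps isolated musical-noise peaks from standing out against silence.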
Many studies have also used other domains, e.g., the signal subspace approach […]. It differs from spectral subtraction by decomposing the noisy speech with the Karhunen-Loève transform (KLT), instead of the FFT used in spectral subtraction, into a subspace occupied primarily by the clean speech signal and a subspace occupied by the noise. The signal of interest is then estimated from the signal subspace of the noisy Euclidean space […].

According to […], there are several types of algorithms in the spectral subtraction family. They estimate the speech either by subtracting the noise estimate from the noisy speech or by multiplying the noisy spectrum by a gain function, and then combine the result with the phase of the noisy speech. Examples include spectral over-subtraction, spectral subtraction based on perceptual properties, iterative spectral subtraction, multi-band spectral subtraction, and Wiener filtering. These spectral subtraction types are essentially based on intuitive and heuristic principles. In Wiener filter type algorithms, the general idea is to minimize the mean square error in order to obtain the optimal filter […]. The noncausal Wiener filter has the frequency response […]

$$H_{Wiener}(\omega) = \frac{E\left[|S(\omega)|^2\right]}{E\left[|S(\omega)|^2\right] + E\left[|D(\omega)|^2\right]} = \frac{P_s(\omega)}{P_s(\omega) + P_d(\omega)}$$

and its parametric form is

$$H_{Wiener}(\omega) = \left(\frac{P_s(\omega)}{P_s(\omega) + \alpha P_d(\omega)}\right)^{\beta}$$

where $E[\cdot]$ denotes expectation, $P_s(\omega)$ and $P_d(\omega)$ are the power spectral densities of the speech and the noise, and $\alpha$ and $\beta$ are constants. Filters of this form are referred to as parametric Wiener filters, whose characteristics can be tuned for speech enhancement. Setting $\alpha = \beta = 1$ gives the standard Wiener filter. The enhanced speech estimate therefore depends largely on the gain.
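A short sketch of the parametric Wiener gain shows how $\alpha$ and $\beta$ change the attenuation and that, with $\alpha = \beta = 1$, the gain equals $SNR/(1+SNR)$ with $SNR = P_s/P_d$, which is the form used later in the proposed algorithm. The speech and noise PSD values are toy numbers, and the function name is illustrative.

```python
import numpy as np

def parametric_wiener_gain(Ps, Pd, alpha=1.0, beta=1.0):
    """(Ps / (Ps + alpha * Pd)) ** beta; alpha = beta = 1 gives the standard Wiener filter."""
    return (Ps / (Ps + alpha * Pd)) ** beta

Ps = np.array([0.1, 1.0, 3.0, 10.0])   # toy speech PSD per frequency bin
Pd = np.ones_like(Ps)                   # flat noise PSD
standard = parametric_wiener_gain(Ps, Pd)
snr = Ps / Pd
print(np.allclose(standard, snr / (1 + snr)))   # same gain written in terms of the SNR
print(parametric_wiener_gain(Ps, Pd, alpha=2.0))  # stronger attenuation of low-SNR bins
print(parametric_wiener_gain(Ps, Pd, beta=0.5))   # gentler (square-root) attenuation
```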
The enhanced speech estimate and its gain function are

$$\hat{S}(\omega) = H_{Wiener}(\omega)\,Y(\omega), \qquad H_{Wiener}(\omega) = \frac{E\left[|S(\omega)|^2\right]}{E\left[|S(\omega)|^2\right] + E\left[|D(\omega)|^2\right]}$$

This gain depends largely on the noise power spectral density at each frequency, attenuating each frequency component accordingly. Statistical model-based algorithms were reviewed in […]. These methods are motivated by the fact that the statistics of speech and noise are not available and that the best distortion measure in the perceptual sense is unknown; one modification uses Hidden Markov Model (HMM) based enhancement […]. In general, this method adopts a composite source model consisting of a finite set of statistically independent Gaussian subsources, selected by a switch controlled by a Markov chain. HMM-based enhancement systems separate speech and noise, and introducing prior information about speech together with a model of the noise leads to improvements over classical methods, especially at low SNRs and for speech corrupted by nonstationary noise. The limitation of HMM-based systems is that they require a training phase to obtain the speech and noise models, which increases the computational cost. Clean speech can also be estimated using Maximum A Posteriori (MAP) estimation […] or Minimum Mean-Square Error (MMSE) estimation […], the latter also known as Ephraim and Malah's estimator. This method […] focused on producing colorless residual noise by expressing the gain function as a function of the a posteriori SNR and the a priori SNR. Later, […] proposed a modified a priori SNR estimator that leads to the best subjective results and achieves a trade-off between noise reduction and low computational load for real-time operation. Moreover, […]
adopted a non-causal estimator for the a priori SNR and a corresponding non-causal speech enhancement algorithm. This technique produced a higher improvement in segmental SNR, lower log-spectral distortion, and better Perceptual Evaluation of Speech Quality (PESQ) scores based on the ITU-T P.862 standard […]. Other speech enhancement techniques have also been reported […]. In […], a boosting technique adapted to the temporal masking threshold of the human auditory system was proposed. Masking thresholds of the human auditory system are typically used in speech and audio coding to lower the bit-rate requirement. The gain function depended on the global forward masking threshold and on the forward masking threshold in each subband […], and it acted as a time-domain filter for evaluating the effect of the noise on the speech signal in each subband.

PROPOSED SPEECH ENHANCEMENT ALGORITHM
Figure 1 shows the block diagram of the proposed algorithm, which is based on the Wiener filter and compressive sensing (CS).

Noisy Spectrum and Update of Noise Estimate
As shown in Figure 1, the speech signal is contaminated by noise, giving the noisy speech. The noisy speech is divided into frames of 20 ms, which corresponds to 160 samples per frame at a sampling rate of 8 kHz. The input noisy speech $y(n)$ in the time domain consists of the clean speech $s(n)$ and additive noise $d(n)$ from independent sources. The equations are restated and simplified here for clarity. Starting from the additive noise model, the noise estimate is updated frame by frame according to two hypotheses.
The smoothing factor $\alpha_d$ lies in the range $0 < \alpha_d < 1$. $H_0'(\omega)$ and $H_1'(\omega)$ denote the speech-absent and speech-present hypotheses, respectively, and the noise variance is $\sigma_D^2(\omega) = E\left[|D(\omega)|^2\right]$:

$$H_0'(\omega):\quad \hat{\sigma}_D^2(\omega,k+1) = \alpha_d\,\hat{\sigma}_D^2(\omega,k) + (1-\alpha_d)\,|Y(\omega,k)|^2$$

$$H_1'(\omega):\quad \hat{\sigma}_D^2(\omega,k+1) = \hat{\sigma}_D^2(\omega,k)$$

Hence, the noise estimate is obtained as

$$\hat{\sigma}_D^2(\omega,k+1) = \tilde{\alpha}_s(\omega,k)\,\hat{\sigma}_D^2(\omega,k) + \left[1 - \tilde{\alpha}_s(\omega,k)\right]|Y(\omega,k)|^2, \qquad \tilde{\alpha}_s(\omega,k) \triangleq \alpha_d + (1-\alpha_d)\,p'(\omega,k)$$

where $p'(\omega,k) \triangleq P\left(H_1'(\omega,k)\,|\,Y(\omega,k)\right)$ is the speech presence probability. Weighting the update by $p'$ lets the noise estimate keep tracking the noise during speech activity, which matters in highly nonstationary noise environments.

SNR Estimator and Wiener Filter
The SNR estimator uses the local a posteriori SNR and the a priori SNR. It is adapted from […] to produce colorless residual noise and to improve the gain function of the Wiener filter:

$$SNR_{post}(\omega) = \frac{|Y(\omega)|^2}{\hat{\sigma}_D^2(\omega)}, \qquad SNR_{prio}(\omega) = \eta\,\frac{|\hat{S}_{n-1}(\omega)|^2}{\hat{\sigma}_D^2(\omega)} + (1-\eta)\,P\left[SNR_{post}(\omega) - 1\right]$$

where the a priori SNR is defined as $E\left[|S(\omega)|^2\right]/\sigma_D^2(\omega)$ and the expression above is its decision-directed estimate, $|\hat{S}_{n-1}(\omega)|^2$ is the enhanced speech of the previous frame, $SNR_{post} - 1$ is interpreted as the instantaneous SNR ($SNR_{inst}$), $\eta = 0.98$, and $P[y] = y$ if $y \ge 0$ and $P[y] = 0$ otherwise. The Wiener technique was modified based on […] to obtain a higher amplitude-spectrum weight estimate by applying this SNR estimate to the nonlinear optimal gain function, which produces the enhanced speech signal:

$$H_{Wiener}(\omega) = \frac{SNR_{prio}(\omega)}{1 + SNR_{prio}(\omega)}, \qquad \hat{y}(n) = \mathrm{IFFT}\left\{H_{Wiener}(\omega)\,|Y(\omega)|\,e^{j\angle Y(\omega)}\right\}$$

This modification reduces the mismatch in the weight of the signal of interest, and the enhanced frame is then synthesized by the inverse FFT. The a priori SNR is the key parameter controlling the trade-off between noise reduction and speech distortion, and the decision-directed method has a low computational load suitable for real-time operation.

Figure 1. The proposed algorithm based on Wiener filter and compressive sensing technique (blocks: Noisy Speech, Noisy Spectrum, Noise Estimate, SNR Estimator, Wiener Filter, Compressive Sensing (CS) Modification, Enhanced Speech, PESQ Measure against Clean Speech, PESQ Score).

Compressive Sensing Modification
The compressive sensing (CS) technique is also modified. CS is fundamentally different from the well-known Shannon sampling theorem […]: it selects the signal of interest from a reduced set of measurements and recovers it with almost exact reconstruction from noiseless observations […]. The major advantage of CS is that it can recover signals from incomplete measurements […], which has led to its use in various applications. CS relies on the empirical observation that many signals are well approximated by a sparse expansion in a suitable basis, with only a small number of nonzero coefficients […].

The CS stage uses gradient projection for sparse reconstruction (GPSR) to experimentally investigate the interactive effects of the corrupting noise and to obtain a better perceived improvement for the listener […]. It is applied to the weight adaptation $\hat{y}(n) = Ax + w$ of the inverse FFT output above, to achieve high-quality noise reduction and produce the enhanced speech $\hat{s}(n) = Ax$, where $A \in \mathbb{R}^{m \times n}$ is the measurement matrix, $x \in \mathbb{R}^n$ is the coefficient vector to be estimated, and $w \in \mathbb{R}^m$ is the model mismatch, under the assumption $m < n$. Because this recovery problem is ill-posed, a sufficiently sparse $x$ is sought by solving an unconstrained problem with the GPSR technique […], in which the spurious components $w$ are suppressed.
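The noise update, the decision-directed SNR estimator with its Wiener gain, and the $\ell_1$-regularized recovery $\min_x \tfrac{1}{2}\|y - Ax\|_2^2 + \tau\|x\|_1$ that GPSR solves can be sketched together in NumPy. This is a minimal sketch under stated assumptions, not the authors' code: `alpha_d = 0.95` is an assumed smoothing factor, $\eta = 0.98$ is taken from the text, and the speech presence probability is passed in rather than estimated. For the sparse stage, plain iterative soft thresholding (ISTA) is swapped in for GPSR because it minimizes the same objective with a few lines of code; the random Gaussian matrix `A` and the 5-sparse test vector are hypothetical, since the text does not specify how `A` is built from the IFFT output.

```python
import numpy as np

def update_noise_psd(noise_psd, Y_pow, p_speech, alpha_d=0.95):
    """Speech-presence-weighted recursive noise update (alpha_d = 0.95 is assumed)."""
    alpha_s = alpha_d + (1.0 - alpha_d) * p_speech  # p' = 0 -> alpha_d, p' = 1 -> 1 (frozen)
    return alpha_s * noise_psd + (1.0 - alpha_s) * Y_pow

def dd_wiener_gain(Y_pow, noise_psd, prev_S_pow, eta=0.98):
    """Decision-directed a priori SNR followed by the Wiener gain SNR / (1 + SNR)."""
    snr_post = Y_pow / noise_psd
    snr_prio = eta * prev_S_pow / noise_psd + (1.0 - eta) * np.maximum(snr_post - 1.0, 0.0)
    return snr_prio / (1.0 + snr_prio)

def ista(A, y, tau, n_iter=2000):
    """Minimize 0.5 * ||y - A x||_2^2 + tau * ||x||_1 (ISTA, a stand-in for GPSR)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / L, L = largest eigenvalue of A^T A
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))                        # gradient step on the quadratic
        x = np.sign(z) * np.maximum(np.abs(z) - step * tau, 0.0)  # soft threshold (l1 prox)
    return x

# Wiener stage on toy per-bin values: flat noise PSD, no previous-frame speech estimate
gain = dd_wiener_gain(np.array([0.5, 1.0, 4.0, 100.0]), np.ones(4), np.zeros(4))
print(gain)

# Sparse-recovery stage on a toy problem y = A x + w with m < n
rng = np.random.default_rng(1)
m, n = 40, 100
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, 5, replace=False)] = [1.0, -0.8, 0.6, -1.2, 0.9]
y = A @ x_true + 0.01 * rng.standard_normal(m)
x_hat = ista(A, y, tau=0.01)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))  # relative recovery error
```

GPSR reaches the same minimizer by recasting the $\ell_1$ term as a bound-constrained quadratic program and is usually faster; ISTA is used here only because it is short and easy to verify.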
The GPSR technique solves

$$\min_{x}\ \frac{1}{2}\,\|y - Ax\|_2^2 + \tau\,\|x\|_1$$

where $y$ is the weighted input signal that determines the elements of the weight adaptation $\hat{y}(n)$, and $\tau > 0$ is the regularization parameter. Solving this sparse recovery problem regularizes the estimated coefficients, giving the predicted signal $\hat{x}$ of $x$ and improving speech quality through noise reduction. Like CS in general, this modification relies on $x$ having a good sparse approximation with only a small number of nonzero coefficients […].

EXPERIMENTAL RESULTS AND DISCUSSIONS
The PESQ objective assessment test and its percentage improvement $\Delta PESQ$ were used to evaluate the enhanced speech against the clean speech signal […]. PESQ scores have a 93.5% correlation with subjective assessment tests, whereas other objective measures, namely the Itakura-Saito distortion, articulation index, segmental SNR, and SNR, have correlations of 59%, 67%, 77%, and 24%, respectively […]. The percentage PESQ improvement introduced in […] is

$$\Delta PESQ = \frac{PESQ_{proc} - PESQ_{ref}}{PESQ_{ref}} \times 100\%$$

where $PESQ_{proc}$ is the PESQ score of the enhanced speech measured against the clean speech, and $PESQ_{ref}$ is the PESQ score of the unprocessed noisy speech measured against the clean speech. Four different real-world noises (airport, babble, car, and exhibition) were artificially added to form the noisy speech corpus (NOIZEUS), based on the IEEE standard 1996 […]. The data set uses American English sentences originally sampled at 25 kHz and down-sampled to 8 kHz. The traditional algorithms compared are Spsub […], Ssrdc […], Pklt […], WnrWt […], Mmask […], and mmse […].
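The percentage improvement is simple arithmetic on two PESQ scores; the sketch below computes it and averages it over several conditions, as done for the overall figures reported below. The scores are hypothetical numbers for illustration only, not results from this paper, and obtaining the PESQ scores themselves requires an ITU-T P.862 implementation that is outside this sketch.

```python
def pesq_improvement(pesq_proc, pesq_ref):
    """Percentage PESQ improvement of enhanced speech over the unprocessed noisy speech."""
    return (pesq_proc - pesq_ref) / pesq_ref * 100.0

# Hypothetical (PESQ_proc, PESQ_ref) pairs per input SNR in dB, for illustration only
scores = {0: (1.8, 1.5), 5: (2.3, 2.0)}
per_snr = {snr: pesq_improvement(p, r) for snr, (p, r) in scores.items()}
avg = sum(per_snr.values()) / len(per_snr)
print(round(pesq_improvement(2.4, 2.0), 2), round(avg, 2))
```

Because the score is normalized by $PESQ_{ref}$, the same absolute PESQ gain counts for more at low input SNR, where the noisy reference score is small; a negative value means the processing degraded the speech.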
The PESQ assessment test was used for the main analysis of the differences between the proposed SpEnCS algorithm and the other algorithms under various noise types and SNRs. Figure 2 illustrates the improvement achieved by the proposed algorithm in the waveforms and spectrograms compared with the traditional algorithms and the noisy speech. Figure 3 shows that the proposed SpEnCS algorithm achieves higher overall PESQ scores than the other algorithms for all noise types at 0, 5, 10, and 15 dB SNR.

Figure 2. Comparison of the speech waveforms and spectrograms of the proposed SpEnCS algorithm with those of other algorithms for airport noise ("sp12.wav") at 0 dB SNR. Panels: waveforms of clean speech, noisy speech, the proposed SpEnCS, the Pklt algorithm, and the mmse algorithm; spectrograms of clean speech, noisy speech, the proposed SpEnCS, the Pklt algorithm, and the mmse algorithm.

Figure 3. Comparison of the PESQ assessment test of the proposed SpEnCS algorithm with other conventional algorithms at 0 dB, 5 dB, 10 dB, and 15 dB SNR. Panels: airport noise, babble noise, car noise, and exhibition noise.

Table 1. The PESQ improvement in percentage (%) of the proposed SpEnCS compared with other algorithms

Table 1 shows that the worst case occurs at 0 dB for all noise types.
Most of the PESQ percentage improvements of the traditional algorithms were below 10%, and their improvement remained limited. Only the mmse algorithm produced results comparable with those of the proposed SpEnCS. The overall average improvement of the proposed SpEnCS is around 20% across all noisy test conditions, whereas the other algorithms produced smaller improvements.

CONCLUSIONS AND FUTURE WORKS
A new speech enhancement approach using the Wiener filter and compressive sensing was proposed for enhancing speech degraded by additive noise. The noise estimation is adapted to track and update the noise estimate continuously. The a priori SNR estimator is modified to produce colorless residual noise before the Wiener filter gain is calculated. The Wiener filter then produces the optimal gain, increasing the amplitude-spectrum weight estimate and reducing the mismatch of the signal estimate. Compressive sensing is then applied to predict the signals of interest from incomplete measurements […] and to recover them with almost exact reconstruction from noiseless observations. Our investigation and evaluation show that the proposed algorithm outperforms the other conventional algorithms under various noise types.

ACKNOWLEDGEMENTS
This research has been supported by the International Islamic University Malaysia Research Grant RIGS16-336-0500.

REFERENCES