Eva Cheng is the Deputy Director of Women in Engineering and Information Technology, and Senior Lecturer in the School of Electrical and Data Engineering. Before joining UTS in September 2017, she was a Lecturer in the School of Engineering at RMIT University.
Eva actively collaborates on social justice and community engagement in STEM diversity and humanitarian engineering, including working with Tech Girls are Superheroes (which aims to engage 10,000 girls in STEM entrepreneurship by 2020), Girl Geek Academy, Laika Academy and Engineers Without Borders Australia. She is also engaged in cross-cultural and inclusive teaching and research, and developing global mobility experiential learning programs for students.
Her research is in multimedia signal processing, including speech and audio signal processing and computer vision. Collaborative research projects have included working with musicians, visual artists, architects, biologists and non-profit organisations. She is a member of the UTS Perceptual Imaging Lab (πLab) and Centre for Audio, Acoustics and Vibration.
- Member, Institute of Electrical and Electronics Engineers (IEEE)
- Member, Audio Engineering Society (AES)
- Member, Acoustical Society of America (ASA)
- Member, Engineers Without Borders Australia
Can supervise: YES
- Speech and audio signal processing
- Multimedia signal processing
- Quality of multimedia experience
- Engineering and STEM education
- Speech and audio signal processing
- Multimedia signal processing
- Humanitarian engineering
Zhao, S, Cheng, E, Qiu, X, Burnett, I & Chia-Chun Liu, J 2018, 'Spatial decorrelation of wind noise with porous microphone windscreens', Journal of the Acoustical Society of America, vol. 143, no. 1, pp. 330-330.
This paper explores the wind noise reduction mechanism of porous microphone windscreens by investigating the spatial correlation of wind noise. First, the spatial structure of the wind noise signal is studied by simulating the magnitude squared coherence of the pressure measured with two microphones at various separation distances, and it is found that the coherence of the two signals decreases with the separation distance and the wind noise is spatially correlated only within a certain distance less than the turbulence wavelength. Then, the wind noise reduction of the porous microphone windscreen is investigated, and the porous windscreen is found to be the most effective in attenuating wind noise in a certain frequency range, where the windscreen diameter is approximately 2 to 4 times the turbulence wavelengths (2 < D0/ξ < 4), regardless of the wind speed and windscreen diameter. The spatial coherence between the wind noise outside and inside a porous microphone windscreen is compared with that without the windscreen, and the coherence is found to decrease significantly when the windscreen diameter is approximately 2 to 4 times the turbulence wavelengths, corresponding to the most effective wind noise reduction frequency range of the windscreen. Experimental results with a fan are presented to support the simulations. It is concluded that the wind noise reduction mechanism of porous microphone windscreens is related to the spatial decorrelation effect on the wind noise signals provided by the porous material and structure.
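The magnitude squared coherence used in this study, C_xy(f) = |S_xy(f)|^2 / (S_xx(f) S_yy(f)), can be estimated by averaging cross- and auto-spectra over signal segments. The sketch below is a minimal, standard-library-only illustration under stated assumptions: Hann-windowed, non-overlapping 32-sample segments, and synthetic Gaussian signals standing in for a closely spaced and a widely spaced microphone pair. It is not the simulation used in the paper, and the function names are hypothetical.

```python
import cmath
import math
import random


def segment_spectra(x, nfft):
    """Hann-windowed DFTs (bins 0..nfft/2) of consecutive, non-overlapping segments of x."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / nfft) for n in range(nfft)]
    spectra = []
    for start in range(0, len(x) - nfft + 1, nfft):
        seg = [x[start + n] * window[n] for n in range(nfft)]
        spectra.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / nfft) for n in range(nfft))
            for k in range(nfft // 2 + 1)
        ])
    return spectra


def magnitude_squared_coherence(x, y, nfft=32):
    """Segment-averaged estimate of C_xy(f) = |S_xy(f)|^2 / (S_xx(f) S_yy(f))."""
    X, Y = segment_spectra(x, nfft), segment_spectra(y, nfft)
    coherence = []
    for k in range(nfft // 2 + 1):
        sxy = sum(a[k] * b[k].conjugate() for a, b in zip(X, Y))
        sxx = sum(abs(a[k]) ** 2 for a in X)
        syy = sum(abs(b[k]) ** 2 for b in Y)
        coherence.append(abs(sxy) ** 2 / (sxx * syy) if sxx > 0 and syy > 0 else 0.0)
    return coherence


# Synthetic stand-ins for two microphone pairs (not wind noise data from the study).
rng = random.Random(0)
reference = [rng.gauss(0, 1) for _ in range(2048)]
close_mic = [r + 0.1 * rng.gauss(0, 1) for r in reference]  # mostly shared fluctuations
distant_mic = [rng.gauss(0, 1) for _ in range(2048)]         # fully decorrelated
near = magnitude_squared_coherence(reference, close_mic)
far = magnitude_squared_coherence(reference, distant_mic)
print(sum(near) / len(near), sum(far) / len(far))
```

With these inputs the close pair gives a mean coherence near 1 and the independent pair a value near 0, which only qualitatively mirrors the decrease of coherence with separation described above. A practical implementation would more likely use an FFT-based routine such as `scipy.signal.coherence`.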
Zhao, S, Dabin, M, Cheng, E, Qiu, X, Burnett, I & Liu, JC-C 2018, 'Mitigating wind noise with a spherical microphone array', The Journal of the Acoustical Society of America, vol. 144, no. 6, pp. 3211-3211.
This paper utilizes a rigid spherical microphone array to reduce wind noise. In the experiments conducted, a loudspeaker is used to reproduce the desired sound signal and an axial fan is employed to generate wind noise in an anechoic chamber. The sound signal and wind noise are measured separately with the spherical microphone array and analyzed in the spherical harmonic domain. The wind noise is found to be irregularly distributed in the spherical harmonic domain, distinct from the sound signal which is concentrated in the first few spherical harmonic modes. This difference is utilized to reduce wind noise without degrading the desired sound pressure level (SPL) by use of a low pass filter method in the spherical harmonic domain. Experimental results with both single-tonal and multi-tonal sound signals demonstrate that the proposed method can reduce wind noise by more than 10 dB in the frequency range below 500 Hz. The SPL of the desired sound signal can be extracted from wind noise with an error within 1.0 dB, even when the sound level is 8 dB lower than wind noise.
Hieu, MB, Lech, M, Cheng, E, Neville, K & Burnett, IS 2017, 'Object Recognition Using Deep Convolutional Features Transformed by a Recursive Network Structure', IEEE Access, vol. 4, pp. 10059-10066.
Deep neural networks (DNNs) trained on large data sets have been shown to capture high-quality features describing image data. Numerous studies have proposed ways to transfer DNN structures trained on large data sets to classification tasks represented by relatively small data sets. Because of the limitations of these proposals, it remains unclear how to adapt the pre-trained model effectively to the new task. Typically, the transfer process combines fine-tuning with the training of adaptation layers; however, both are susceptible to data shortage and high computational complexity. This paper proposes an improvement to the well-known AlexNet feature extraction technique. The proposed approach applies a recursive neural network structure to features extracted by a deep convolutional neural network pre-trained on a large data set. Object recognition experiments conducted on the Washington RGBD image data set have shown that the proposed method combines structural simplicity with higher recognition accuracy at a low computational cost compared with other relevant methods. The new approach requires no training at the feature extraction phase and can be performed very efficiently, as the output features are compact and highly discriminative and can be used with a simple classifier in object recognition settings.
Wang, X, Cheng, E, Burnett, IS, Huang, Y & Wlodkowic, D 2017, 'Automatic multiple zebrafish larvae tracking in unconstrained microscopic video conditions', Scientific Reports, vol. 7, no. 1, pp. 1-8.
The accurate tracking of zebrafish larvae movement is fundamental to research in many biomedical, pharmaceutical, and behavioral science applications. However, the locomotive characteristics of zebrafish larvae differ significantly from those of adult zebrafish, so existing adult zebrafish tracking systems cannot reliably track zebrafish larvae. Further, because larvae are far smaller relative to their container, water impurities are inevitably detected as well, which either degrades the tracking of zebrafish larvae or requires very strict video imaging conditions, and such approaches typically give unreliable tracking results under realistic experimental conditions. This paper investigates the adaptation of advanced computer vision segmentation techniques and multiple object tracking algorithms to develop an accurate, efficient and reliable multiple zebrafish larvae tracking system. The proposed system has been tested on a set of single and multiple adult and larvae zebrafish videos in a wide variety of (complex) video conditions, including shadowing, labels, water bubbles and background artifacts. Compared with existing state-of-the-art and commercial multiple organism tracking systems, the proposed system improves the tracking accuracy by up to 31.57% in unconstrained video imaging conditions. To facilitate the evaluation of zebrafish segmentation and tracking research, a dataset with annotated ground truth is also presented. The software is also publicly accessible.
Zhao, S, Cheng, E, Qiu, X, Burnett, I & Liu, JC-C 2017, 'Wind noise spectra in small Reynolds number turbulent flows', Journal of the Acoustical Society of America, vol. 142, no. 5, pp. 3227-3227.
Wind noise spectra caused by wind from fans in indoor environments have been found to differ from those measured in outdoor atmospheric conditions. Although many models have been developed to predict outdoor wind noise spectra under the assumption of large Reynolds number [Zhao, Cheng, Qiu, Burnett, and Liu (2016). J. Acoust. Soc. Am. 140, 4178-4182, and the references therein], they cannot be applied directly to indoor situations because the Reynolds number of wind from fans in indoor environments is usually much smaller than that of atmospheric turbulence. This paper proposes a pressure structure function model that combines the energy-containing and dissipation ranges so that the pressure spectrum for small Reynolds number turbulent flows can be calculated. The proposed pressure structure function model is validated with experimental results from the literature, and the resulting pressure spectrum is then verified against numerical simulation and experimental results. It is demonstrated that the pressure spectrum obtained from the proposed pressure structure function model can be used to estimate wind noise spectra caused by turbulent flows with small Reynolds numbers.
Zhao, S, Dabin, M, Cheng, E, Qiu, X, Burnett, I & Liu, JC-C 2017, 'On the wind noise reduction mechanism of porous microphone windscreens', Journal of the Acoustical Society of America, vol. 142, no. 4, pp. 2454-2454.
This paper investigates the wind noise reduction mechanism of porous microphone windscreens. The pressure fluctuations inside porous windscreens with various viscous and inertial coefficients are studied with numerical simulations. The viscous and inertial coefficients represent, respectively, the viscous forces resulting from the fluid-solid interaction along the surface of the pores and the inertial forces imposed on the fluid flow by the solid structure of the porous medium. Simulation results indicate that, as both the viscous and inertial coefficients increase, the wind noise reduction first increases, reaches a maximum and then decreases. Experiments conducted on five porous microphone windscreens with porosities from 20 to 60 pores per inch (PPI) show that the 40 PPI windscreen has the highest wind noise reduction performance, which supports the simulation results. The existence of optimal values for the viscous and inertial coefficients is explained qualitatively, and it is shown that the design of porous microphone windscreens should take into account both the turbulence suppression inside and the wake generation behind the windscreen to achieve optimal performance.
Zhao, S, Cheng, E, Qiu, X, Burnett, I & Liu, JC-C 2016, 'Pressure spectra in turbulent flows in the inertial and the dissipation ranges', Journal of the Acoustical Society of America, vol. 140, no. 6, pp. 4178-4182.
Zhao, S, Qiu, X, Cheng, E, Burnett, I, Williams, N, Burry, J & Burry, M 2015, 'Sound quality inside small meeting rooms with different room shape and fine structures', Applied Acoustics, vol. 93, pp. 65-74.
Noise cancellation: disrupting audio perception is an interactive sound and visual art installation that explores the creation of a new technology, open-air active signal cancellation, and how it can be incorporated into interactive art installations. As an ongoing collaborative project between artist and engineer, noise cancellation engages signal processing research issues in a creative application space. This paper describes the collaborative installation work in progress and discusses why this active signal cancellation technology is important in relation to changing modes of listening and hearing, altering spatial perception and encouraging audiences to fully interact with art installations within art galleries (and related spaces). Copyright © 2013 Inderscience Enterprises Ltd.
Bui, HM, Lech, M, Cheng, E, Neville, K, Wilkinson, R & Burnett, IS 2018, 'Randomized dimensionality reduction of deep network features for image object recognition', Proceedings - 2018 2nd International Conference on Recent Advances in Signal Processing, Telecommunications and Computing, SIGTELCOM 2018, International Conference on Recent Advances in Signal Processing, Telecommunications & Computing, Ho Chi Minh City, Vietnam, pp. 136-141.
© 2018 IEEE. This study investigates data dimensionality reduction for image object recognition. The dimensionality reduction was applied to features extracted from an existing pre-trained Deep Neural Network (DNN) structure, AlexNet. An analysis of the neurons in different layers of AlexNet revealed that the pairwise orthogonality between the weight vectors of neurons within each layer increases progressively towards the higher-level layers. This observation motivated the current study to evaluate randomized dimensionality reduction that mimics the orthogonality property observed in the high-level layers, applied to the activations of the low-level layers of AlexNet. Image object classification experiments have shown that the proposed random orthogonal projection method performed well in multiple tests, consistently outperforming the well-known statistics-based sparse random projection. Apart from being data independent, the proposed approach achieved performance comparable with state-of-the-art techniques, but with lower computational requirements.
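The two projection families compared in this study can be contrasted with a short sketch. It assumes that a random orthogonal projection is built by orthonormalising Gaussian random vectors (Gram-Schmidt) and that the sparse random projection uses Achlioptas-style entries of ±√(1/density) or 0. The paper's exact construction, the AlexNet layer used and the target dimension are not given here; the feature vector is a synthetic stand-in for an activation vector, and neither projection is rescaled.

```python
import math
import random


def random_orthogonal_projection(d_in, d_out, seed=0):
    """Return d_out orthonormal rows of length d_in (Gaussian draws + Gram-Schmidt)."""
    if d_out > d_in:
        raise ValueError("d_out must not exceed d_in")
    rng = random.Random(seed)
    rows = []
    while len(rows) < d_out:
        v = [rng.gauss(0, 1) for _ in range(d_in)]
        for r in rows:  # remove the component along each row accepted so far
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-10:  # skip (extremely unlikely) degenerate draws
            rows.append([a / norm for a in v])
    return rows


def sparse_random_projection(d_in, d_out, density=1 / 3, seed=0):
    """Achlioptas-style matrix: +/-sqrt(1/density) with probability density/2 each, else 0."""
    rng = random.Random(seed)
    scale = math.sqrt(1 / density)

    def entry():
        u = rng.random()
        if u < density / 2:
            return scale
        if u < density:
            return -scale
        return 0.0

    return [[entry() for _ in range(d_in)] for _ in range(d_out)]


def project(matrix, x):
    """Multiply a (d_out x d_in) matrix, given as a list of rows, by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


features = [math.sin(0.1 * i) for i in range(256)]  # stand-in for a flattened activation vector
P = random_orthogonal_projection(256, 32, seed=1)
reduced = project(P, features)
```

Both constructions are data independent, as the abstract notes; the orthogonal version additionally guarantees that the projected vector is never longer than the original.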
Wang, X, Cheng, E, Burnett, IS, Wilkinson, R & Lech, M 2018, 'Automatic tracking of multiple zebrafish larvae with resilience against segmentation errors', 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), International Symposium on Biomedical Imaging, IEEE, Washington, DC, USA, pp. 1157-1160.
© 2018 IEEE. The accurate tracking of zebrafish larvae movement is essential to many biomedical and neural science applications. This paper develops an accurate and reliable multiple zebrafish larvae tracking system that is resilient to detection and segmentation errors caused by object misdetection and occlusion. The proposed system can therefore be applied to microscopic videos in unconstrained, realistic imaging conditions. The system was evaluated on a set of single and multiple adult and larvae zebrafish videos covering a wide variety of (complex) video conditions, including shadowing, labels, water bubbles and background artefacts. The proposed system reduces the overall MOTP error by up to 44.49 pixels compared with the commercial LoliTrack system, and increases MOTA accuracy by 31.57% compared with the state-of-the-art idTracker approach. The results also demonstrate the additional advantages of improved position detection, increased accuracy and unique identification compared with current techniques.
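MOTA and MOTP are the CLEAR MOT tracking metrics. A minimal sketch of how they are computed from per-frame tallies follows. It assumes that tracker output has already been matched to ground truth (so the counts and distances are given), and the frame data are made up for illustration. MOTP is taken here in its distance-based form (mean position error over matched pairs, lower is better), which is consistent with an error reported in pixels but is an assumption about the paper's exact definition.

```python
from dataclasses import dataclass


@dataclass
class FrameCounts:
    """Per-frame tallies, assumed to come from an earlier tracker-to-ground-truth matching step."""
    ground_truth: int      # ground-truth objects present in the frame
    misses: int            # false negatives
    false_positives: int   # tracker hypotheses with no ground-truth match
    id_switches: int       # matched objects whose track identity changed
    match_distances: list  # position error (e.g. pixels) of each matched pair


def mota(frames):
    """MOTA = 1 - (misses + false positives + ID switches) / ground-truth objects, over all frames."""
    errors = sum(f.misses + f.false_positives + f.id_switches for f in frames)
    return 1.0 - errors / sum(f.ground_truth for f in frames)


def motp(frames):
    """Distance-based MOTP: mean position error over all matched pairs."""
    distances = [d for f in frames for d in f.match_distances]
    return sum(distances) / len(distances)


frames = [FrameCounts(3, 0, 1, 0, [1.0, 2.0, 3.0]), FrameCounts(3, 1, 0, 1, [2.0, 2.0])]
print(mota(frames), motp(frames))  # 0.5 2.0
```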
Chiem, QT, Wilkinson, RH, Lech, M & Cheng, E 2017, 'Investigating Keypoint Repeatability for 3D Correspondence Estimation in Cluttered Scenes', Proceedings of DICTA 2017 - 2017 International Conference on Digital Image Computing: Techniques and Applications, International Conference on Digital Image Computing: Techniques and Applications, IEEE, Sydney, NSW, Australia, pp. 1-7.
© 2017 IEEE. In 3D object recognition, local feature-based recognition is known to be robust against occlusion and clutter. Local feature-based recognition requires feature correspondences, which are estimated through feature extraction and matching. Feature extraction is normally a two-stage process that estimates keypoints and keypoint descriptors, and existing studies show repeatability to be a good indicator of keypoint feature detector robustness. However, the impact of keypoint repeatability on feature correspondence estimation and overall feature matching accuracy has not yet been studied. In this paper, local features are extracted at both regular and repeatable 3D keypoints using leading keypoint detectors combined with the SHOT descriptor to estimate a set of correspondences. Experimental results show that using a keypoint detector with high repeatability improves feature matching accuracy and reduces the computational requirements of feature description and matching, and of the overall correspondence estimation process.
Vu, H, Cheng, E, Wilkinson, R & Lech, M 2017, 'On the use of convolutional neural networks for graphical model-based human pose estimation', Proceedings - 2017 International Conference on Recent Advances in Signal Processing, Telecommunications and Computing, SigTelCom 2016, International Conference on Recent Advances in Signal Processing, Telecommunications & Computing, IEEE, Da Nang, Vietnam, pp. 88-93.
© 2017 IEEE. The recent application of Convolutional Neural Networks (CNNs) to Human Pose Estimation (HPE) from static images has improved estimation accuracy compared with traditional HPE approaches. In particular, a recent HPE approach combines a traditional graphical model with CNNs to achieve state-of-the-art HPE accuracy, improving on the accuracy of either approach used alone. However, the accuracy of the CNN used in this hybrid model has not yet been explored. This paper evaluates the use of CNNs in the hybrid model by investigating different network configurations and fine-tuning the network using pre-trained weights obtained from a large labeled dataset. The proposed CNN configurations not only improve the accuracy of the existing network by up to 2% but also use fewer parameters, resulting in higher HPE accuracy and a simpler network structure.
Vu, HT, Wilkinson, RH, Lech, M & Cheng, E 2017, 'A Comparison between Anatomy-Based and Data-Driven Tree Models for Human Pose Estimation', Proceedings of DICTA 2017 - 2017 International Conference on Digital Image Computing: Techniques and Applications, International Conference on Digital Image Computing: Techniques and Applications, IEEE, Sydney, NSW, Australia, pp. 1-7.
© 2017 IEEE. Tree structures are commonly used to model relationships between body parts for articulated Human Pose Estimation (HPE). In a structured learning framework using Convolutional Neural Networks (CNNs), tree structures can model relationships among the feature maps of joints. This paper proposes new data-driven tree models for HPE. The data-driven tree structures were obtained using the Chow-Liu Recursive Grouping (CLRG) algorithm to represent the joint distribution of human body joints, and were tested on the Leeds Sports Pose (LSP) dataset. The paper analyzes how varying the number of nodes affects HPE accuracy. Experimental results showed that the data-driven tree model obtained 1% higher HPE accuracy than the traditional anatomy-based model. A further improvement of 0.5% was obtained by optimizing the number of nodes in the traditional anatomy-based model.
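The Chow-Liu algorithm at the core of CLRG builds a maximum-weight spanning tree over variables, using pairwise mutual information as the edge weights. The sketch below shows only this classic Chow-Liu step on toy binary variables, using Kruskal's algorithm. It does not implement the recursive grouping of CLRG, which additionally introduces latent nodes, and the variable names and data are hypothetical rather than LSP joint annotations.

```python
import math
import random
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def chow_liu_tree(samples):
    """Maximum-weight spanning tree over variables, weighted by pairwise MI (Kruskal).

    samples: list of tuples, one value per variable. Returns a list of edges (i, j).
    """
    n_vars = len(samples[0])
    columns = [[s[i] for s in samples] for i in range(n_vars)]
    weighted = sorted(
        ((mutual_information(columns[i], columns[j]), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True,
    )
    parent = list(range(n_vars))  # union-find forest

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges


# Hypothetical binary "joint" variables with a chain-like dependence.
rng = random.Random(0)
hip = [rng.randint(0, 1) for _ in range(500)]
knee = list(hip)                                             # copy of hip
ankle = [k if rng.random() > 0.1 else 1 - k for k in knee]   # noisy copy of knee
wrist = [rng.randint(0, 1) for _ in range(500)]              # independent
samples = list(zip(hip, knee, ankle, wrist))
print(chow_liu_tree(samples))
```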
Wang, X, Cheng, E & Burnett, IS 2017, 'Improved cell segmentation with adaptive bi-Gaussian mixture models for image contrast enhancement pre-processing', 2017 IEEE Life Sciences Conference, LSC 2017, IEEE Life Sciences Conference, IEEE, Sydney, NSW, Australia, pp. 87-90.
© 2017 IEEE. The accurate detection and segmentation of cells from time-lapse microscopic video sequences provides a critical foundation for understanding dynamic cell behaviours and cell characteristics when using automatic cell tracking systems. However, general object segmentation methods in computer vision are susceptible to errors due to the severe microscopic imaging conditions in time-lapse cell videos. To address the low image intensity contrast typical in cell images, this paper investigates the use of an adaptive, shifted bi-Gaussian mixture model to enhance the contrast prior to cell segmentation. Rather than using a model with fixed parameters across an entire video sequence as in existing approaches, this paper proposes the adaptive derivation of the mixture model parameters to match the intensity histogram for each video frame to adaptively address changes in the video background. Experimental results across a cell database show improved segmentation accuracy compared with existing image contrast enhancement methods. The pre-processed cell image exhibits greater differentiation between the cell foreground and background, whilst also maintaining the original intensity histogram features.
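A two-component (bi-)Gaussian mixture can be fitted to each frame's intensity values with expectation-maximisation, so that the model parameters adapt from frame to frame rather than staying fixed across the video. The sketch below assumes a plain 1-D mixture fitted to raw pixel values; it does not reproduce the shifted model from the paper, and the `enhance` mapping (posterior probability of the brighter component, scaled to 0-255) is a hypothetical contrast transform chosen only for illustration. The frame data are synthetic.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_bi_gaussian(intensities, iterations=50):
    """Fit a two-component 1-D Gaussian mixture to pixel intensities with EM."""
    lo, hi = min(intensities), max(intensities)
    # Initialise the components at the 1/4 and 3/4 points of the intensity range.
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    variances = [((hi - lo) / 4) ** 2 + 1e-6] * 2
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each pixel.
        resp = []
        for x in intensities:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / len(intensities)
            means[k] = sum(r[k] * x for r, x in zip(resp, intensities)) / nk
            variances[k] = sum(r[k] * (x - means[k]) ** 2
                               for r, x in zip(resp, intensities)) / nk + 1e-6
    return weights, means, variances


def enhance(intensities, model):
    """Hypothetical mapping: posterior of the brighter component, scaled to 0..255."""
    weights, means, variances = model
    bright = 0 if means[0] > means[1] else 1
    out = []
    for x in intensities:
        p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        out.append(round(255 * p[bright] / (sum(p) or 1e-300)))
    return out


# Synthetic frame: dim background plus a smaller set of brighter "cell" pixels.
rng = random.Random(0)
frame = [rng.gauss(50, 5) for _ in range(300)] + [rng.gauss(120, 8) for _ in range(100)]
model = fit_bi_gaussian(frame)  # refit on every frame to follow background changes
enhanced = enhance(frame, model)
```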
Wang, X, Cheng, E, Burnett, IS, Huang, Y & Wlodkowic, D 2017, 'Crowdsourced generation of annotated video datasets: A zebrafish larvae dataset for video segmentation and tracking evaluation', 2017 IEEE Life Sciences Conference, LSC 2017, IEEE Life Sciences Conference, Sydney, NSW, Australia, pp. 274-277.
© 2017 IEEE. Video segmentation research has emerged over the last decade for biomedical image and video processing, especially in biological organism tracking. However, because video segmentation ground truth is difficult to generate, the general lack of segmentation datasets with annotated ground truth severely limits the evaluation of segmentation algorithms. This paper proposes an efficient and scalable crowdsourced approach to generating video segmentation ground truth, to facilitate database generation for evaluating general biological organism segmentation and tracking algorithms. To illustrate the proposed approach, an annotated zebrafish larvae video segmentation dataset has been generated and made freely available online. To enable the evaluation of algorithms against the ground truth, a set of segmentation evaluation metrics is also presented. The segmentation performance of five leading segmentation algorithms is then evaluated with these metrics on the generated zebrafish video segmentation dataset.
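The specific evaluation metrics proposed in the paper are not listed here. As an assumption, two widely used region-overlap measures, the Jaccard index (intersection over union) and the Dice coefficient, are sketched below, with each segmentation mask represented as a set of foreground pixel coordinates.

```python
def jaccard(pred, truth):
    """Intersection over union of two sets of foreground pixel coordinates."""
    union = pred | truth
    return len(pred & truth) / len(union) if union else 1.0


def dice(pred, truth):
    """Dice coefficient: 2 |A and B| / (|A| + |B|)."""
    total = len(pred) + len(truth)
    return 2 * len(pred & truth) / total if total else 1.0


predicted = {(0, 0), (0, 1), (1, 0)}     # toy segmentation output
ground_truth = {(0, 1), (1, 0), (1, 1)}  # toy annotated mask
print(jaccard(predicted, ground_truth), dice(predicted, ground_truth))  # 0.5 0.6666666666666666
```

Both scores lie between 0 (no overlap) and 1 (identical masks); two empty masks are treated as a perfect match by convention in this sketch.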
Zhao, S, Cheng, E, Qiu, X, Burnett, I & Liu, CC 2017, 'Simulations on the wind noise reduction by spherical shell windscreens', INTER-NOISE 2017 - 46th International Congress and Exposition on Noise Control Engineering: Taming Noise and Moving Quiet.
© 2017 Institute of Noise Control Engineering. All rights reserved. Various windscreens are widely used in outdoor acoustic measurements to reduce the effect of wind-induced pressure fluctuations at the microphones and to improve measurement accuracy. However, the physical mechanism of wind noise reduction by windscreens remains unclear. In this paper, the wind noise reduction performance of spherical shell windscreens is investigated with numerical simulations based on turbulence modeling in porous media. The effects of both the diameter and the thickness of the spherical shell windscreen on the wind noise reduction performance are investigated for both uniform and turbulent incoming flows. For uniform incoming flow, it is found that spherical shell windscreens reduce the wind noise at the microphone only when the shell thickness is smaller than 0.1 cm, and that the wind noise reduction increases with the diameter but approaches a constant when the windscreen is larger than 20 cm. For turbulent incoming flow, the wind noise reduction performance of the spherical shell windscreen increases with the diameter and is best when the shell thickness is around 1.0 cm.
Zhao, S, Cheng, E, Qiu, X, Lacey, J & Maisch, S 2017, 'A method of configuring fixed coefficient active noise controllers for traffic noise reduction', INTER-NOISE and NOISE-CON Congress and Conference Proceedings, Hong Kong, China.
© 2017 Institute of Noise Control Engineering. All Rights Reserved. In practical applications of Active Noise Control (ANC) systems on traffic noise reduction, the noise sources to be controlled are usually far away from the system and are continuously moving, hence there are no fixed noise sources for configuring the controller. This paper proposes a method to configure fixed coefficient ANC systems for the scenario of traffic noise reduction. In the tuning process, a pseudo noise source is proposed to be placed near a single channel ANC system to adjust the controller. After the optimal coefficients of the controller are obtained for this situation, the coefficients are fixed and the ANC system is utilized to cancel the actual noise source in the far-field. Simulation results showed that when the noise source is a point source located very far away from the ANC system, moving the pseudo noise source farther away from the single-channel ANC system can effectively increase the noise reduction. However, if the noise source is closer to the single-channel ANC system than the pseudo noise source, the performance deteriorates quickly. When the primary noise originates from a line array of incoherent point sources far from the ANC system, moving the pseudo noise source farther away from the system can effectively increase the noise reduction; however, the performance of the single channel ANC system decreases with frequency and deteriorates when there are many noise sources present simultaneously as in the traffic noise scenario. Experiments were conducted in a laboratory environment for one noise source and three noise sources, and the results are consistent with the simulations.
Arndt, S, Brunnström, K, Cheng, E, Engelke, U, Moller, S & Antons, JN 2016, 'Review on using physiology in quality of experience', Human Vision and Electronic Imaging 2016, HVEI 2016, pp. 197-205.
In the area of Quality of Experience (QoE), one challenge is to design test methodologies to evaluate the perceived quality of multimedia content delivered through technical systems. Traditionally, this evaluation is done using subjective opinion tests. However, it is sometimes difficult for observers to communicate the experienced quality through the given scale. Furthermore, those tests do not give insight into how the user reacts at an internal physiological level. To overcome these issues, one approach is to use physiological measures to derive a direct non-verbal response from the recipient. In this paper, we review studies that have been performed in the domain of QoE using physiological measures and look into current activities in standardization bodies. We present the challenges this research faces and give an overview of what researchers should be aware of when they want to start working in this research area.
Bui, HM, Lech, M, Cheng, E, Neville, K & Burnett, IS 2016, 'Using Grayscale Images for Object Recognition with Convolutional-Recursive Neural Network', 2016 IEEE Sixth International Conference on Communications and Electronics (ICCE), International Conference on Communications and Electronics (HUT-ICCE), IEEE, Vietnam, pp. 321-325.
Rajapaksha, T, Qiu, X, Cheng, E & Burnett, I 2016, 'Geometrical room geometry estimation from room impulse responses', ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, Shanghai, China, pp. 331-335.
© 2016 IEEE. Room geometry estimation from corresponding Room Impulse Responses (RIRs) has attracted much attention in recent years, and a key challenge is to find the first-order image source locations from the RIRs under different environments. Unlike existing approaches, which require a priori knowledge of the room or some ideal conditions, this paper proposes an intuitive geometrical method based on the acoustical image source model. The proposed approach does not need any a priori knowledge of the room, only the RIRs from one arbitrary source location to five arbitrary receiving locations. The first-order image sources of the walls in a room are identified first, and the room geometry is then estimated from the wall locations using a geometrical approach. Simulations with 2D and 3D convex polyhedral rooms demonstrate the feasibility of the proposed approach, and its precision is discussed.
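In the image source model, each wall corresponds to a first-order image source obtained by mirroring the real source across the wall plane. The sketch below shows the forward reflection, the resulting reflection delay, and the inverse step of recovering a wall as the perpendicular bisector plane of the source and an image source. It assumes the image source positions have already been located (the paper does this from RIRs at five receivers, which is not shown) and a speed of sound of 343 m/s; the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (roughly air at 20 degrees C)


def mirror_source(source, wall_point, wall_normal):
    """First-order image source: reflect the source across the wall plane."""
    norm = math.sqrt(sum(c * c for c in wall_normal))
    n = [c / norm for c in wall_normal]
    d = sum((s - p) * nc for s, p, nc in zip(source, wall_point, n))  # signed distance to wall
    return [s - 2 * d * nc for s, nc in zip(source, n)]


def reflection_delay(image, receiver, c=SPEED_OF_SOUND):
    """Arrival time (s) of the first-order reflection: |image - receiver| / c."""
    return math.dist(image, receiver) / c


def wall_from_image(source, image):
    """Inverse step: the wall is the perpendicular bisector plane of source and image."""
    midpoint = [(s + i) / 2 for s, i in zip(source, image)]
    length = math.dist(source, image)
    normal = [(i - s) / length for s, i in zip(source, image)]
    return midpoint, normal


source = [1.0, 2.0, 1.5]
image = mirror_source(source, [0.0, 0.0, 0.0], [1.0, 0.0, 0.0])  # wall in the plane x = 0
print(image, reflection_delay(image, [3.0, 2.0, 1.5]))
print(wall_from_image(source, image))
```

Intersecting the planes recovered for all walls would then give the room outline; that step is omitted here.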
Zhao, S, Cheng, E, Qiu, X, Alambeigi, P, Burry, J & Burry, M 2016, 'A preliminary investigation on the sound field properties in the Sagrada Familia Basilica', 2nd Australasian Acoustical Societies Conference, ACOUSTICS 2016, Conference of the Australian Acoustical Society, AAS, Brisbane, Australia, pp. 797-806.
This paper reports on a preliminary investigation of the sound field properties inside a large Roman Catholic church in Barcelona, the Sagrada Familia Basilica, which is a world heritage site although its construction has not been completed. Impulse responses were measured for five sound source positions combined with 14 measurement locations inside the Sagrada Familia Basilica, and the Impulse response to Noise Ratio (INR) was examined to assess the reliability of the measured impulse responses. The room acoustic parameters were calculated, and the following five sound field properties in the Sagrada Familia Basilica were analysed: reverberation, spaciousness, loudness, warmth and clarity. The reverberation time (T20) and the Early Decay Time (EDT) were compared with the existing optimal values for small-volume churches, whereas the mid-frequency strength of sound (Gmid), the low-frequency strength of sound (G125), the clarity (C80) and the binaural quality index (1 - IACCE) were compared with the optimal values for concert halls. The sound field properties of churches, especially one with a volume as large as that of the Sagrada Familia Basilica, are still an open topic, and much further research is needed for a more thorough understanding.
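T20 is conventionally derived from the Schroeder backward-integrated energy decay curve by fitting a straight line between -5 dB and -25 dB and extrapolating to a 60 dB decay (as in ISO 3382-1); EDT is obtained the same way from the 0 to -10 dB range. The sketch below assumes a single broadband impulse response with no octave-band filtering or background-noise compensation, and uses a synthetic exponential decay rather than measured data.

```python
import math


def schroeder_decay_db(ir):
    """Energy decay curve (dB) by Schroeder backward integration of the squared IR."""
    remaining, total = [], 0.0
    for h in reversed(ir):
        total += h * h
        remaining.append(total)
    remaining.reverse()
    return [10 * math.log10(r / remaining[0]) if r > 0 else -math.inf for r in remaining]


def t20(ir, fs):
    """Reverberation time from the -5 to -25 dB span of the decay curve, extrapolated to 60 dB."""
    edc = schroeder_decay_db(ir)
    pts = [(n / fs, level) for n, level in enumerate(edc) if -25.0 <= level <= -5.0]
    if len(pts) < 2:
        raise ValueError("decay curve does not span -5 to -25 dB")
    # Least-squares slope (dB per second) of level against time.
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_l = sum(l for _, l in pts) / len(pts)
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))
    return -60.0 / slope


# Synthetic IR whose energy falls by 60 dB every 0.5 s, sampled at 1 kHz for 1 s.
fs, rt = 1000, 0.5
ir = [10 ** (-3 * (n / fs) / rt) for n in range(fs)]
print(t20(ir, fs))
```

For this noise-free exponential decay the estimate should be very close to the assumed 0.5 s; measured responses would also need the INR check mentioned above before the fit is trusted.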
Zhao, S, Cheng, E, Qiu, X, Burnett, I & Liu, JCC 2016, 'Estimation of the frequency boundaries of the inertial range for wind noise spectra in anechoic wind tunnels', Proceedings - 2nd Australasian Acoustical Societies Conference, ACOUSTICS 2016, Conference of the Australian Acoustical Society, AAS, Brisbane, Australia, pp. 1187-1196.
Wind noise generated by the intrinsic turbulence in the flow can affect outdoor noise measurements. Various attempts have been made to investigate the wind noise generation mechanism. Wind noise spectra in anechoic wind tunnels can be divided into three frequency regions. In the low-frequency region, known as the energy-containing range, the wind noise spectrum does not change significantly with frequency. In the middle-frequency region (or inertial range), the wind noise spectrum decays following the -7/3 power law, whereas in the high-frequency region (or dissipation range) it decays faster than the -7/3 power law. The boundaries of the -7/3 power law frequency range depend on the Reynolds number; however, no exact values are known in the current literature. This paper proposes a method for predicting the boundary values based on energy cascade theory. Large eddy simulations of a free jet were performed to validate the proposed method, and the results were found to be in reasonable agreement with existing experimental measurements obtained in an anechoic wind tunnel. Additional simulations were also conducted with different inflow entrance sizes to further verify the predictions of the proposed method.
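Whether a spectrum follows the -7/3 power law over a given band can be checked by fitting a straight line to log10(PSD) against log10(frequency); on such a plot a -7/3 slope corresponds to a fall of about 23.3 dB per decade. The sketch below uses synthetic spectra, one pure power law and one with an extra exponential roll-off standing in for dissipation-range behaviour. It does not implement the paper's boundary-prediction method.

```python
import math


def loglog_slope(freqs, psd):
    """Least-squares slope of log10(PSD) versus log10(frequency)."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in psd]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


def db_drop_per_decade(slope):
    """A spectrum proportional to f**slope falls by -10 * slope dB per decade."""
    return -10.0 * slope


freqs = [10 * 1.1 ** k for k in range(30)]                            # about 10 Hz to 160 Hz
inertial = [f ** (-7 / 3) for f in freqs]                             # pure -7/3 law
dissipative = [f ** (-7 / 3) * math.exp(-f / 100) for f in freqs]     # faster decay
print(loglog_slope(freqs, inertial), db_drop_per_decade(-7 / 3))
print(loglog_slope(freqs, dissipative))
```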
Choy, S-M, Chiu, K-H, Cheng, E & Burnett, I 2015, '3D Fatigue from Stereoscopic 3D Video Displays: Comparing Objective and Subjective Tests using Electroencephalography', Proceedings of TENCON 2015 - 2015 IEEE Region 10 Conference, IEEE Tencon (IEEE Region 10 Conference), IEEE, Macao, pp. 1-4.
The use of stereoscopic displays has increased in recent times, with a growing range of applications using 3D video for visual entertainment, data visualization and medical applications. However, stereoscopic 3D video can lead to adverse reactions among some viewers, including visual fatigue, headache and nausea; such reactions can further lead to Visually Induced Motion Sickness (VIMS). Whilst motion sickness symptoms can also arise from other types of visual display, this paper investigates the rapid adjustment of human pupils as a potential cause of 3D fatigue due to VIMS from stereoscopic 3D displays. Using electroencephalogram (EEG) biosignals and eye-blink tools to measure 3D fatigue, a series of objective and subjective experiments were conducted to investigate the effect of stereoscopic 3D across a series of video sequences.
Hamilton, M, Salim, F, Cheng, E & Choy, SL 2015, 'Transafe: A crowdsourced mobile platform for crime and safety perception management', International Symposium on Technology and Society, Proceedings.View/Download from: Publisher's site
© 2011 IEEE. This paper describes a proposed mobile platform, Transafe, that captures and analyses public perceptions of safety to deliver 'crowdsourced' collective intelligence about places in the City of Melbourne, Australia, and their affective states at various times of the day. Public perceptions of crime on public transport in Melbourne are often mismatched with actual crime statistics, and such perceptions can therefore act as social barriers to visitors and locals travelling within and through the city. Using interactive mobile applications and social media, the visualization of this crowdsourced safety perception information will increase commuters' awareness of various situations in the City of Melbourne. In addition, through social behavioral analysis and ethnographic research, the collective public intelligence will also help inform the city's stakeholders in future policy-making and policing strategies for safety perception management. At the centre of the proposed platform is the design and development of a mobile phone application that can help people feel safer by supporting users to report crimes and misdemeanors that they witness, and by providing information about transportation and emergency services near the user's location. The proposed application can also act as a crime deterrent through a feature that enables the user to be tracked by up to three nominated friends if the user opts to activate tracking when feeling unsafe while roaming the city.
Qiu, X, Cheng, E, Burnett, I, Williams, N, Burry, J & Burry, M 2015, 'Preliminary study on the speech privacy performance of the Fabpod', Acoustics 2015 Hunter Valley, Conference of the Australian Acoustical Society, Australian Acoustical Society, Sydney, Australia.
This paper reports preliminary measurement results characterising the speech privacy performance of an open-ceiling meeting room called the Fabpod at RMIT University, where the Speech Privacy Class standardized in ASTM E2638 was adopted to rate the speech privacy performance. The background sound pressure levels inside and outside the Fabpod, and the sound pressure level differences between different locations inside and outside the Fabpod for different sound source locations, were measured in one-third octave bands from 50 Hz to 10000 Hz. Based on the measurement results, the Speech Privacy Class of the Fabpod was calculated. The conclusion is that the Fabpod cannot meet normal speech privacy criteria, and meetings inside the Fabpod can easily be overheard outside. Speech privacy is affected by many factors, including the speech attenuation from the sound source to the receiver and the level of the background noise. The speech attenuation from source to receiver depends on the height of the wall or barrier, the sound absorption coefficient of the ceiling and the distance between the sound source and the receiver. To achieve acceptable speech privacy for the Fabpod, all design parameters have to be tuned to near-optimum values. Measures that can be used to increase the speech privacy of the Fabpod are discussed.
Sharma, S, Cheng, E & Burnett, IS 2015, 'A Simple Objective Method for Automatic Error Detection in Stereoscopic 3D Video', Proceedings for the Big Data Visual Analytics (BDVA), 2015, Big Data Visual Analytics, IEEE, Hobart, TAS, pp. 119-121.View/Download from: Publisher's site
Despite the increased popularity of 3D video online and through consumer and cinema media, few techniques exist for the automatic detection of stereoscopic errors in 3D video. Further, techniques based on disparity estimation are imprecise and computationally complex. This paper proposes a simple objective method to detect common errors inherent to stereoscopic 3D content: discrepant objects between the left and right views of the image pairs, stereoscopic window violations, and undesirably high binocular disparity that causes viewing discomfort. The proposed technique identifies stereoscopic errors by computing only the edge disparity, which is computationally less expensive, and uses simplified methods that may be optimised for real-time computation. The technique is evaluated on a series of stereoscopic 3D videos containing common errors, where regions that contain a range of different errors are successfully and clearly identified.
Wang, X, Cheng, E & Burnett, IS 2015, 'Improved (STEM) cell segmentation with histogram matching image contrast enhancement', 2015 IEEE China Summit and International Conference on Signal and Information Processing, ChinaSIP 2015 - Proceedings, IEEE China Summit and International Conference on Signal and Information Processing, IEEE, Chengdu, China, pp. 816-820.View/Download from: Publisher's site
© 2015 IEEE. The tracking of moving biological cells in time-lapse video sequences is fundamental to further understanding biological processes. Automatic cell tracking techniques require accurate cell image segmentation; however, current segmentation techniques are susceptible to errors due to non-ideal but realistic cell image conditions, including the low contrast typical of microscopic cell images. This paper proposes a novel image pre-processing technique that enhances low grayscale image contrast for improved cell image segmentation accuracy. A shifted bi-Gaussian model is matched to the original cell image intensity histogram for greater differentiation between the cell foreground and the image background, whilst maintaining the original intensity histogram shape. Experiments conducted on a stem cell time-lapse image database show up to 33% improvement in segmentation accuracy, in some frames (partially or completely) detecting cells that manual ground truth and/or existing segmentation approaches fail to identify.
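The contrast enhancement described above is based on histogram matching (histogram specification). In this technique, each grey level is remapped so that the output cumulative histogram follows the cumulative histogram of a target distribution. The sketch below is a minimal illustration of that idea. It assumes 8-bit grey levels and uses a hypothetical symmetric two-Gaussian target whose means, spreads and weight are illustrative values only. It does not reproduce the paper's shifted bi-Gaussian model or the way that model is fitted.

```python
import math


def two_gaussian_pdf(levels, mu1, sd1, mu2, sd2, w1=0.5):
    """Hypothetical two-Gaussian target histogram over integer grey levels."""
    pdf = []
    for g in range(levels):
        p1 = math.exp(-0.5 * ((g - mu1) / sd1) ** 2) / sd1
        p2 = math.exp(-0.5 * ((g - mu2) / sd2) ** 2) / sd2
        pdf.append(w1 * p1 + (1.0 - w1) * p2)
    total = sum(pdf)
    return [p / total for p in pdf]             # normalise to sum to 1


def cumulative(pdf):
    cdf, acc = [], 0.0
    for p in pdf:
        acc += p
        cdf.append(acc)
    return cdf


def match_histogram(pixels, target_pdf):
    """Remap grey levels so the output histogram approximates target_pdf."""
    levels = len(target_pdf)
    counts = [0] * levels
    for v in pixels:
        counts[v] += 1
    src_cdf = cumulative([c / len(pixels) for c in counts])
    tgt_cdf = cumulative(target_pdf)
    lut, j = [], 0
    for g in range(levels):
        # smallest target level whose CDF reaches the source CDF at level g
        while j < levels - 1 and tgt_cdf[j] < src_cdf[g]:
            j += 1
        lut.append(j)
    return [lut[v] for v in pixels]
```

The lookup table is non-decreasing, so pixels keep their relative grey-level order. A narrow, low-contrast band of intensities is therefore stretched across the target range rather than scrambled.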
Wu, L, Qiu, X, Burnett, IS, Cheng, E & Guo, Y 2014, 'A decoupled hybrid structure for active noise control with uncorrelated narrowband disturbances', INTERNOISE 2014 - 43rd International Congress on Noise Control Engineering: Improving the World Through Noise Control, INTERNOISE 2014 - 43rd International Congress on Noise Control Engineering: Improving the World Through Noise Control.View/Download from: UTS OPUS
In real active noise control (ANC) applications, two situations frequently occur: disturbances may be present only at the error sensor and have low correlation with the reference signal, or there may not be enough space or a suitable position for the reference sensor to satisfy the causality condition. In these situations, the residual noise after feedforward control can be regarded as uncorrelated narrowband disturbances, and a hybrid adaptive feedforward and feedback structure is often used to address this problem. Much effort has been devoted to improving the performance of hybrid ANC systems; however, little attention has been paid to how the feedforward and feedback structures are combined. After investigating the conventional combination method for hybrid feedforward and feedback systems, this paper introduces an alternative combination method for hybrid ANC systems that avoids coupling between the feedforward and feedback structures, with both structures concatenated to attenuate the ambient noise. Simulations are carried out to validate the effectiveness of the introduced method for ANC with uncorrelated narrowband disturbances.
Ling, L, Cheng, E & Burnett, IS 2013, 'An Iterated Extended Kalman Filter for 3D mapping via Kinect camera', ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, Vancouver, BC, Canada, pp. 1773-1777.View/Download from: UTS OPUS or Publisher's site
This paper proposes the use of the Iterated Extended Kalman Filter (IEKF) in a real-time 3D mapping framework applied to Microsoft Kinect RGB-D data. Standard EKF techniques typically used for 3D mapping are susceptible to errors introduced during the state prediction linearization and measurement prediction. When models are highly nonlinear due to measurement errors (e.g., outliers, occlusions and feature initialization errors), the errors propagate and directly result in divergence and estimation inconsistencies. To prevent linearized error propagation, this paper proposes repeated linearization of the nonlinear measurement model to provide a running estimate of camera motion. The effects of the IEKF are evaluated in simulation with synthetic map and landmark data on a range and bearing camera model. It was shown that the IEKF measurement update outperforms the EKF update when the state causes nonlinearities in the measurement function. In a real indoor 3D mapping experiment, the IEKF demonstrated more robust convergence behavior, whilst the EKF updates failed to converge. © 2013 IEEE.
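The repeated linearization described above can be illustrated with the iterated measurement update for a single scalar state. The sketch below is minimal. It assumes a hypothetical range-like measurement model $h(x)=\sqrt{x^2+d^2}$ rather than the paper's camera model, and it relinearizes about each new iterate in Gauss-Newton style. The function name `iekf_update` is illustrative. With one iteration, the update reduces to the standard EKF update.

```python
import math


def iekf_update(x_prior, p_prior, z, r, h, h_jac, iterations=10, tol=1e-10):
    """Iterated EKF measurement update for a scalar state (illustrative sketch).

    h(x) is the measurement model, h_jac(x) its derivative and r the
    measurement noise variance. iterations=1 gives the ordinary EKF update.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    x_i = x_prior
    for _ in range(iterations):
        H = h_jac(x_i)                              # relinearize at current iterate
        k = p_prior * H / (H * p_prior * H + r)     # Kalman gain for this linearization
        # update referenced to the prior mean, corrected for the new linearization point
        x_next = x_prior + k * (z - h(x_i) - H * (x_prior - x_i))
        step = abs(x_next - x_i)
        x_i = x_next
        if step < tol:
            break
    p_post = (1.0 - k * H) * p_prior                # variance from the last gain used
    return x_i, p_post
```

Consider a prior mean of 1 (variance 4), d = 1, and a noise-free range measurement of a state at 3. A single EKF step overshoots to about 3.46, whereas iterating brings the estimate close to 3.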
Cheng, E, Burton, P, Burton, J, Joseski, A & Burnett, I 2012, 'RMIT3DV: Pre-announcement of a creative commons uncompressed HD 3D video database', 2012 4th International Workshop on Quality of Multimedia Experience, QoMEX 2012, International Workshop on Quality of Multimedia Experience (QoMEX), IEEE, Yarra Valley, VIC, Australia, pp. 212-217.View/Download from: UTS OPUS or Publisher's site
There has been much recent interest, from both industry and research communities, in 3D video technologies and processing techniques. However, although the standardisation of 3D video coding is well underway and researchers are studying 3D multimedia delivery and users' quality of multimedia experience in 3D video environments, few publicly available databases of 3D video content exist. Further, there are even fewer sources of uncompressed 3D video content for flexible use in a number of research studies and applications. This paper thus presents a preliminary version of RMIT3DV: an uncompressed HD 3D video database currently composed of 31 video sequences that encompass a range of environments, lighting conditions, textures, motion, etc. The database was natively filmed on a professional HD 3D camera, and this paper describes the 3D film production workflow in addition to the database distribution and potential future applications of the content. The database is freely available online under a Creative Commons license, and researchers are encouraged to contribute 3D content to grow the resource for the (HD) 3D video research community. © 2012 IEEE.
Davis, S, Cheng, E, Ritz, C & Burnett, I 2012, 'Ensuring Quality of Experience for markerless image recognition applied to print media content', 2012 4th International Workshop on Quality of Multimedia Experience, QoMEX 2012, International Workshop on Quality of Multimedia Experience (QoMEX), IEEE, Yarra Valley, VIC, Australia, pp. 158-163.View/Download from: UTS OPUS or Publisher's site
This paper investigates how minimal user interaction paradigms and markerless image recognition technologies can be applied to matching print media content to online digital proofs. By linking print material to online content, users can enhance their experience of traditional forms of print media with updated online content, videos and interactive online features. The proposed approach is based on extracting features from images/text from mobile device camera images to form fingerprints that are used to find matching images/text within a limited test set. An important criterion for these applications is to ensure that the user Quality of Experience (QoE), particularly in terms of matching accuracy and time, is robust to a variety of conditions typically encountered in practical scenarios. In this paper, the performance of a number of computer vision techniques that extract the image features and form the fingerprints is analysed and compared. Both computer simulation tests and mobile device experiments in realistic user conditions are conducted to study the effectiveness of the techniques when considering scale, rotation, blur and lighting variations typically encountered by a user. © 2012 IEEE.
Ling, L, Burnett, IS & Cheng, E 2012, 'A dense 3D reconstruction approach from uncalibrated video sequences', Proceedings of the 2012 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2012, IEEE International Conference on Multimedia and Expo, IEEE, Melbourne, VIC, Australia, pp. 587-592.View/Download from: UTS OPUS or Publisher's site
Current approaches for 3D reconstruction from image feature points are classed as sparse or dense techniques. However, sparse approaches are insufficient for surface reconstruction since only sparsely distributed feature points are obtained. Further, existing dense reconstruction approaches require pre-calibrated camera orientation, which limits their applicability and flexibility. This paper proposes a one-stop 3D reconstruction solution that reconstructs a highly dense surface from an uncalibrated video sequence; the camera orientations and the surface are computed simultaneously from new dense point features using an approach motivated by Structure from Motion (SfM) techniques. Further, this paper presents a flexible automatic method with the simple interface of 'videos to 3D model'. These improvements are essential to practical applications in 3D modeling and visualization. The reliability of the proposed algorithm has been tested on various data sets, and its accuracy and performance are compared with both sparse and dense reconstruction benchmark algorithms. © 2012 IEEE.
Rainer, B, Waltl, M, Cheng, E, Shujau, M, Timmerer, C, Davis, S, Burnett, I, Ritz, C & Hellwagner, H 2012, 'Investigating the impact of sensory effects on the Quality of Experience and emotional response in web videos', 2012 4th International Workshop on Quality of Multimedia Experience, QoMEX 2012, International Workshop on Quality of Multimedia Experience (QoMEX), IEEE, Yarra Valley, VIC, Australia, pp. 278-283.View/Download from: UTS OPUS or Publisher's site
Multimedia is ubiquitously available online, with large amounts of video increasingly consumed through Web sites such as YouTube or Google Video. However, online multimedia typically limits users to visual/auditory stimuli, with onscreen visual media accompanied by audio. The recently introduced MPEG-V standard proposes multi-sensory user experiences in multimedia environments, such as enriching video content with so-called sensory effects like wind, vibration and light. In MPEG-V, these sensory effects are represented as Sensory Effect Metadata (SEM), which is additionally associated with the multimedia content. This paper presents three user studies that utilize the sensory effects framework of MPEG-V, investigating users' emotional responses and the enhancement of Quality of Experience (QoE) for Web video sequences from a range of genres, with and without sensory effects. In particular, the user studies were conducted in Austria and Australia to investigate whether geographical and cultural differences affect users' elicited emotional responses and QoE. © 2012 IEEE.
Cheng, E & Burnett, IS 2011, 'On the effect of AMR and AMR-WB GSM compression on overlapped speech for forensic analysis', 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Proceedings, IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, Prague, Czech Republic, pp. 1872-1875.View/Download from: UTS OPUS or Publisher's site
The recent ubiquity of mobile telephony has posed the challenge of forensic speech analysis on compressed speech content. Whilst existing research studies have investigated the effect of mobile speech compression on speaker and speech parameters, this paper addresses the effect of speech compression on parameters when an interfering background speaker is present in clean and noisy conditions. Preliminary evaluations presented in this paper study the effect of the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) speech coders on the Linear Prediction (LP) speech spectrum, Line Spectral Frequencies (LSFs), and Mel Frequency Cepstral Coefficients (MFCCs). Results indicate that due caution should be employed for the forensic analysis of mobile telephony speech: speech coder parameters are significantly degraded when an interfering speaker or noise is present, compared to parameters obtained from the main speaker alone. Moreover, at high SNR the speech parameters exhibit values that gradually transition from those ideally and independently obtained from the main speaker to those of the background speaker as the amplitude of the background interfering speaker increases. © 2011 IEEE.
Cheng, E, Davis, S, Burnett, I & Ritz, C 2011, 'An ambient multimedia user experience feedback framework based on user tagging and EEG biosignals', Proceedings of the 4th International Workshop on Semantic Ambient Media Experience, SAME 2011, in Conjunction with the 5th International Convergence on Communities and Technologies, pp. 61-66.
Multimedia is increasingly accessed online and within social networks; however, users are typically limited to visual/auditory stimuli through media presented onscreen with accompanying audio over speakers. Whilst recent research studying additional ambient sensory multimedia effects recorded numerical scores of perceptual quality, it did not consider users' time-varying emotional responses to the ambient sensory feedback. This paper thus introduces a framework to evaluate users' ambient quality of multimedia experience and discover their time-varying emotional responses through explicit user tagging and implicit EEG biosignal analysis. In the proposed framework, users interact with the media via discrete tagging activities, whilst their EEG biosignal emotional feedback is continuously monitored in between user tagging events, with emotional states correlated with media content and tags. Copyright © (2011) by International Ambient Media Association (iAMEA).
Davis, S, Cheng, E, Burnett, I & Ritz, C 2011, 'Multimedia user feedback based on augmenting user tags with EEG emotional states', 2011 3rd International Workshop on Quality of Multimedia Experience, QoMEX 2011, International Workshop on Quality of Multimedia Experience (QoMEX), IEEE, Mechelen, Belgium, pp. 143-148.View/Download from: UTS OPUS or Publisher's site
Efficient content-based access to large multimedia collections requires annotations that are human-meaningful, and user tagging of media is one means of obtaining such semantic metadata. Tags can also act as user feedback essential for quality of multimedia experience assessment; however, tags can lack user context and become ambiguous between different users. Further, user tagging is a deliberate and discrete event, and a user's response to the media can vary significantly in between tagging events. This paper extends the authors' social multimedia adaptation framework to explore the use of EEG biosignals obtained from consumer EEG headsets to form context around explicit tagging activities and to provide user emotional feedback in between tagging events. Preliminary user studies investigating grouped participant responses indicate that the most indicative measures are the short-term excitement, engagement and frustration emotional states, together with gyroscope information. © 2011 IEEE.
Ling, L, Burnett, IS & Cheng, E 2011, 'A flexible markerless registration method for video augmented reality', MMSP 2011 - IEEE International Workshop on Multimedia Signal Processing, IEEE International Workshop on Multimedia Signal Processing, IEEE, Hangzhou, China.View/Download from: UTS OPUS or Publisher's site
This paper proposes a flexible, markerless registration method that addresses the problem of realistic virtual object placement at any position in a video sequence. The registration consists of two steps. First, four points are specified by the user to build the world coordinate system in which the virtual object is rendered. Then, a proposed self-calibration camera tracking algorithm recovers the camera viewpoint frame by frame, so that the virtual object can be dynamically and correctly rendered according to camera movement. The proposed registration method needs no reference fiducials and no knowledge of the camera parameters or the user environment, so the virtual object can be placed in any environment, even one without distinct features. Experimental evaluations of the self-calibration algorithm demonstrate low errors for several camera rotations around the X and Y axes. Finally, virtual object rendering applications in different user environments are evaluated. © 2011 IEEE.
Ling, L, Cheng, E & Burnett, IS 2011, 'Eight solutions of the essential matrix for continuous camera motion tracking in video augmented reality', Proceedings - 2011 IEEE International Conference on Multimedia and Expo (ICME), IEEE International Conference on Multimedia and Expo, IEEE, LaSalle, Ramon Llull University Barcelona, Spain.View/Download from: UTS OPUS or Publisher's site
This paper considers a self-calibration approach to the estimation of motion parameters for an unknown camera used for video-based augmented reality. Whilst existing systems derive four SVD solutions of the essential matrix, which encodes the epipolar geometry between two camera views, this paper presents eight possible solutions derived from mathematical computation and geometrical analysis. The eight solutions reflect not only the position and orientation of the camera in static displacement but also the dynamic, relative orientation between the camera and an object in continuous motion. This paper details a novel algorithm that introduces three geometric constraints to determine the rotation and translation matrix from the eight possible essential matrix solutions. An OpenGL camera motion simulator is used to demonstrate and evaluate the reliability of the proposed algorithms; this directly visualizes the abstract computer vision parameters in real 3D. © 2011 IEEE.
Cheng, E, Davis, S, Burnett, I & Ritz, C 2010, 'The Role of Experts in Social Media - Are the Tertiary Educated Engaged?', Proceedings of the 2010 IEEE International Symposium on Technology and Society: Social Implications of Emerging Technologies, International Symposium on Technology-and-Society - Social Implications of Emerging Technologies, IEEE, University of Wollongong, Wollongong, Australia, pp. 205-212.
Davis, SJ, Cheng, EC, Burnett, IS & Ritz, CH 2010, 'Multimedia adaptation based on semantics from social network users interacting with media', 2010 2nd International Workshop on Quality of Multimedia Experience, QoMEX 2010 - Proceedings, pp. 170-175.View/Download from: Publisher's site
A key goal of adaptive multimedia delivery is to provide users with content that maximizes their quality of experience. To achieve this goal, adaptive multimedia systems require descriptions of the content and user preference information, moving beyond traditional criteria such as quality of service requirements or perceptual quality based on traditional metrics. Media is increasingly consumed within online social networks, and multimedia sharing websites can also add a wealth of metadata. In this paper, mechanisms are proposed for gathering semantics that relate to user preferences when interacting with media content in social networks. Subjective results indicate that the proposed mechanisms can successfully provide information about user and social group media preferences that can be used for adapting multimedia for improved user quality of experience. ©2010 IEEE.
Schiemer, G, Deleflie, E & Cheng, E 2010, 'Pocket Gamelan: Realizations of a Microtonal Composition on a Linux Phone Using Open Source Music Synthesis Software', Cultural Computing, 2nd IFIP TC 14 Entertainment Computing Symposium, ECS 2010, Springer-Verlag Berlin, Australian Computer Society, Brisbane, Australia, pp. 101-+.
Simpson, C-A & Cheng, E 2010, 'Noise Cancellation: Disrupting Audio Perception', 2010 IEEE International Conference on Multimedia and Expo (ICME 2010), International Conference on Multimedia and Expo, IEEE, Singapore, Singapore, pp. 1612-1617.View/Download from: Publisher's site
Smith, D, Cheng, E & Burnett, IS 2010, 'Musical onset detection using MPEG-7 audio descriptors', 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society, pp. 4036-4042.
An onset detection system that exploits MPEG-7 audio descriptors is proposed in this paper, and the feasibility of MPEG-7-based onset detection is investigated across a diverse database of music. Detection functions were developed from both individual MPEG-7 descriptors and combinations of descriptors (joint detection functions). The results indicated that individual descriptors could achieve respectable detection performance (maximum F-measure of 0.753) with basic waveform features. However, average detection performance could be improved by up to 11.2% when joint detection functions comprised diverse combinations of MPEG-7 descriptors. This may be attributed to the greater ability of detection functions composed of different spectral and temporal features to capture the variation in onset characteristics across different musical styles. It is thus concluded that the proposed onset detection system could plausibly be integrated into an existing MPEG-7 audio analysis system with minimal computational overhead.
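The F-measure quoted above combines the precision and recall of detected onsets. The sketch below is a minimal illustration of that scoring. It assumes a ±50 ms matching window and greedy one-to-one matching. Both are common choices but are assumptions here, since the abstract does not state the paper's evaluation protocol. The function name `onset_f_measure` is hypothetical.

```python
def onset_f_measure(detected, reference, tolerance=0.05):
    """Precision, recall and F-measure for onset times given in seconds.

    Each reference onset can be matched by at most one detection lying within
    +/- tolerance seconds; matching is greedy, in time order (an assumption).
    """
    det = sorted(detected)
    ref = sorted(reference)
    used = [False] * len(ref)
    hits = 0
    for t in det:
        for i, r in enumerate(ref):
            if not used[i] and abs(t - r) <= tolerance:
                used[i] = True          # consume this reference onset
                hits += 1
                break
    precision = hits / len(det) if det else 0.0
    recall = hits / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)   # harmonic mean
    return precision, recall, f
```

For example, five detections at 0.52, 1.03, 1.7, 2.01 and 2.5 s scored against reference onsets at 0.5, 1.0, 1.5 and 2.0 s give three hits. This yields a precision of 0.6, a recall of 0.75 and an F-measure of 2/3. Greedy matching is simpler than an optimal one-to-one assignment and can undercount hits when detections cluster near several reference onsets.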
Cheng, E, Burnett, IS & Ritz, C 2009, 'The effect of microphone directivity patterns on spatial cues for reverberant multichannel meeting speech analysis', European Signal Processing Conference, pp. 2181-2185.
Multiparty meetings common to many business environments often have participants who are generally stationary. Hence, active speakers can be disambiguated by location, and meeting analysis research groups have proposed the use of speaker location information (spatial cues) for meeting segmentation and higher level analysis. As the cues are estimated from multi-microphone recordings, this paper studies the effect of varying microphone directivity patterns on the spatial cue accuracy and reliability. Results from theoretical simulations and recordings from a real reverberant environment suggest that different spatial cues (based on inter-microphone signal time delays or amplitude level differences) optimally respond to different microphone directivity patterns, where time delay accuracy was found to be independent of the relative microphone configuration. © EURASIP, 2009.
Cheng, E, Burnett, IS & Ritz, C 2007, 'Time Delay Estimation of Reverberant Meeting Speech: On the Use of Multichannel Linear Prediction', SITIS 2007: Proceedings of the International Conference on Signal Image Technologies & Internet Based Systems, IEEE International Conference on Signal Image Technology and Internet Based Systems, IEEE Computer Society, Shanghai, China, pp. 531-537.
Cheng, E, Burnett, IS & Ritz, CH 2008, 'Multivariate autoregressive modelling of multichannel reverberant speech', Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008, pp. 945-949.View/Download from: Publisher's site
Recent research in speech localization and dereverberation introduced processing of the multichannel linear prediction (LP) residual of speech recorded with multiple microphones. This paper investigates the novel use of intra- and inter-channel speech prediction by proposing the use of a multichannel LP model derived from multivariate autoregression (MVAR), where current LP approaches are based on univariate autoregression (AR). Experiments were conducted on simulated anechoic and reverberant synthetic speech vowels and real speech sentences; results show that, especially at low reverberation times, the MVAR model exhibits greater prediction gains from the residual signal, compared to residuals obtained from univariate AR models for individually or jointly modelled speech channels. In addition, the MVAR model more accurately models the speech signal when compared to univariate LP of a similar prediction order and when a smaller number of microphones are deployed. © 2008 IEEE.
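The abstract compares the MVAR model with univariate AR linear prediction. For reference, the sketch below implements only the univariate baseline. It assumes the autocorrelation method solved by the Levinson-Durbin recursion. It also assumes that prediction gain is defined as signal power over residual power, which is a common definition but not necessarily the paper's exact measure. The multichannel MVAR model, which uses matrix-valued coefficients, is not implemented here, and the function names are hypothetical.

```python
import math


def autocorrelation(x, max_lag):
    """Biased autocorrelation estimate r[0..max_lag] of a real sequence."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Univariate LP coefficients via the Levinson-Durbin recursion.

    Returns (a, err) for the predictor x_hat[n] = sum_{k=1..order} a[k-1] * x[n-k],
    where err is the residual (prediction error) power.
    """
    a = [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - 1 - k] for k in range(m - 1))
        refl = acc / err                      # reflection coefficient of stage m
        new_a = a[:]
        new_a[m - 1] = refl
        for k in range(m - 1):
            new_a[k] = a[k] - refl * a[m - 2 - k]
        a = new_a
        err *= 1.0 - refl * refl              # error power shrinks at each stage
    return a, err


def prediction_gain_db(r, order):
    """Signal power over residual power, in dB."""
    _, err = levinson_durbin(r, order)
    return 10.0 * math.log10(r[0] / err)
```

For an autocorrelation of [1, 0.5, 0.25] (an AR(1)-like sequence), a second-order predictor returns coefficients [0.5, 0]. The residual power is 0.75, which gives a prediction gain of about 1.25 dB.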
Cheng, E, Cheng, B, Ritz, C & Burnett, IS 2008, 'Spatialized Teleconferencing: Recording and 'Squeezed' Rendering of Multiple Distributed Sites', ATNAC: 2008 Australasian Telecommunication Networks and Applications Conference, Australasian Networks and Applications Conference 2008, IEEE, Adelaide, Australia, pp. 411-+.View/Download from: Publisher's site
Cheng, E, Burnett, I & Ritz, C 2007, 'Using spatial audio cues from speech excitation for meeting speech segmentation', International Conference on Signal Processing Proceedings, ICSP.View/Download from: Publisher's site
Multiparty meetings generally involve stationary participants. Participant location information can thus be used to segment the recorded meeting speech into each speaker's 'turn' for meeting 'browsing'. For representing speaker location information from speech, previous research has shown that the most reliable time delay estimates are extracted from the Hilbert envelope of the Linear Prediction residual signal. The authors' past work has proposed the use of spatial audio cues to represent speaker location information. This paper proposes extracting spatial audio cues from the Hilbert envelope of the speech residual to indicate changes in speaker location for meeting speech segmentation. Experiments conducted on recordings of a real acoustic environment show that spatial cues from the Hilbert envelope are more consistent across frequency subbands and can clearly distinguish between spatially distributed speakers, compared to spatial cues estimated from the recorded speech or residual signal. © 2006 IEEE.
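The Hilbert envelope referred to above is the magnitude of the analytic signal. The sketch below is a minimal, dependency-free illustration. It builds the analytic signal by zeroing negative-frequency DFT bins, and it uses a naive O(N²) DFT so that the example stays self-contained. Computing the LP residual to which the paper applies the envelope is not shown, and the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (kept dependency-free)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT matching dft() above."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def hilbert_envelope(x):
    """Magnitude of the analytic signal of a real sequence x.

    The analytic signal keeps DC (and Nyquist for even N), doubles the
    positive-frequency bins and zeroes the negative-frequency bins.
    """
    n = len(x)
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0
    spectrum = dft(x)
    analytic = idft([spectrum[k] * weights[k] for k in range(n)])
    return [abs(v) for v in analytic]
```

For a pure tone on an exact DFT bin, the envelope is flat at the tone's amplitude. For an amplitude-modulated tone, it recovers the modulating waveform. For long signals, an FFT-based routine such as `scipy.signal.hilbert`, which returns the analytic signal, would replace the naive transforms.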
As multiparty meetings involve participants that are generally stationary when actively speaking, participant location information can be used to segment the recorded meeting audio into speaker 'turns.' In this paper, speaker location information derived from 'spatial cues' generated by spatial audio coding techniques is investigated. The validity of using spatial cues for meeting audio segmentation is explored through investigating multiple microphone meeting audio recording techniques and different spatial audio coders. Experimental results show that the statistical relationship between speaker location and interchannel level and phase-based spatial cues strongly depends on the microphone pattern. Results also indicate that interchannel correlation-based spatial cues represent location information that is ambiguous for meeting audio segmentation.
Cheng, E, Burnett, I & Ritz, C 2006, 'Varying microphone patterns for meeting speech segmentation using spatial audio cues', Advances in Multimedia Information Processing - PCM 2006, Proceedings, 7th Pacific Rim Conference on Multimedia, Springer-Verlag Berlin, Zhejiang University, Hangzhou, China, pp. 221-+.
Cheng, E, Davis, S, Burnett, I & Lukasiak, J 2006, 'Efficient delivery of hierarchically structured meeting audio metadata with a bi-directional XML protocol', 2006 International Conference on Computing and Informatics, ICOCI '06.View/Download from: Publisher's site
This paper explores user-centered metadata delivery through the example of hierarchically organized meeting audio metadata. Audio annotations that describe meeting scenarios can vary from low-level signal-based descriptors to high-level semantics. Users of meeting metadata also have widely varying requirements and hence want metadata at varying levels of detail. Thus, for efficient metadata access, it is vital to provide customization or choice of the metadata to be delivered, using, e.g., regions of interest and annotation detail specification. As well as proposing a user-centered metadata organization strategy, this paper introduces the use of a bi-directional XML protocol for metadata delivery. The combination provides advantages in terms of bandwidth efficiency when an example meeting metadata browser application is examined with practical user interfaces. ©2006 IEEE.
Cheng, E, Lukasiak, J, Burnett, IS & Stirling, D 2005, 'Using spatial cues for meeting speech segmentation', 2005 IEEE International Conference on Multimedia and Expo (ICME), Vols 1 and 2, IEEE International Conference on Multimedia and Expo (ICME), IEEE, Toronto, Canada, pp. 350-353.View/Download from: Publisher's site
Lukasiak, J, McElroy, C & Cheng, E 2005, 'Compression transparent low-level description of audio signals', 2005 IEEE International Conference on Multimedia and Expo (ICME), Vols 1 and 2, IEEE International Conference on Multimedia and Expo (ICME), IEEE, Amsterdam, Netherlands, pp. 422-425.View/Download from: Publisher's site