
Dr Richard Xu

Biography

Dr Richard Yi Da Xu is a Senior Lecturer and Deputy Head of School at the School of Computing and Communications.

He has been research-active in computer vision, image processing and pattern recognition since 2002. You may find a list of video demos of his previous work at:

http://www-staff.it.uts.edu.au/~ydxu/research.htm

Recently, his focus has been on the underlying machine learning algorithms, particularly in the field of non-parametric Bayes and its Monte Carlo inference methods. He has written numerous tutorial papers in this area for PhD student training. You may find them at:

http://www-staff.it.uts.edu.au/~ydxu/statistics.htm

Dr Richard Xu is constantly seeking high-quality PhD/Master's research students who have a passion for mathematical/statistical research.

He also offers UTS-approved consultancy for industries interested in statistical modelling of big data.

Image of Richard Xu
Deputy Head of School, School of Computing and Communications
Associate Member, Advanced Analytics Institute
Core Member, Centre for Innovation in IT Services Applications
PhD
 
Phone: +61 2 9514 4587
Room: CB11.08.113

Research Interests

His research spans computer vision, image processing and pattern recognition, together with the underlying machine learning algorithms, in particular non-parametric Bayesian models and their Monte Carlo inference methods. Video demos and tutorial papers are linked from the Biography above.

Can supervise: Yes

  • Network Security
  • Operating Systems in Network Security
  • Image Processing and Pattern Recognition

Conference Papers

Bargi, A., Xu, Y. & Piccardi, M. 2012, 'An online HDP-HMM for joint action segmentation and classification in motion capture data', 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Providence RI, USA, June 2012 in 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), ed Ava B, IEEE Computer Society, Providence RI, USA, pp. 1-7.
View/Download from: Publisher's site
Since its inception, action recognition research has mainly focused on recognizing actions from closed, predefined sets of classes. Conversely, the problem of recognizing actions from open, possibly incremental sets of classes is still largely unexplored. In this paper, we propose a novel online method based on the 'sticky' hierarchical Dirichlet process and the hidden Markov model [11, 5]. This approach, labelled as the online HDP-HMM, provides joint segmentation and classification of actions while a) processing the data in an online, recursive manner, b) discovering new classes as they occur, and c) adjusting its parameters over the streaming data. In a set of experiments, we have applied the online HDP-HMM to recognize actions from motion capture data from the TUM kitchen dataset, a challenging dataset of manipulation actions in a kitchen [12]. The results show significant accuracy in action classification, time segmentation and determination of the number of action classes.
Zare Borzeshi, E., Piccardi, M. & Xu, Y. 2011, 'A Discriminative Prototype Selection Approach for Graph Embedding in Human Action Recognition', IEEE International Conference on Computer Vision Workshops, Barcelona Spain, November 2011 in 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshop), ed M Piccardo, IEEE Computer Society, Barcelona Spain, pp. 1295-1301.
View/Download from: OPUS
This paper proposes a novel graph-based method for representing a human's shape during the performance of an action. Despite their strong representational power, graphs are computationally cumbersome for pattern analysis. One way of circumventing this problem is that of transforming the graphs into a vector space by means of graph embedding. Such an embedding can be conveniently obtained by way of a set of 'prototype' graphs and a dissimilarity measure: yet, the critical step in this approach is the selection of a suitable set of prototypes which can capture both the salient structure within each action class as well as the intra-class variation. This paper proposes a new discriminative approach for the selection of prototypes which maximizes a function of the inter- and intra-class distances. Experiments on an action recognition dataset reported in the paper show that such a discriminative approach outperforms well-established prototype selection methods such as center, border and random prototype selection.
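As an illustration of the kind of criterion described above, the sketch below scores each candidate prototype by the gap between its mean inter-class and mean intra-class dissimilarities and keeps the top scorers. This is a hedged simplification, not the paper's exact objective; the precomputed dissimilarity matrix `dist` and the label vector `labels` are assumed inputs, and each class is assumed to have at least two examples.

```python
# Greedy discriminative prototype selection (a sketch, not the paper's method):
# a candidate scores highly when it is far from other classes and close to its own.
import numpy as np

def select_prototypes(dist, labels, n_prototypes):
    """dist: (n, n) pairwise graph dissimilarities; labels: (n,) class labels."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)          # exclude self-distances
    diff = ~same
    np.fill_diagonal(diff, False)
    # Score = mean inter-class distance minus mean intra-class distance.
    scores = np.array([dist[i][diff[i]].mean() - dist[i][same[i]].mean()
                       for i in range(len(labels))])
    return np.argsort(scores)[::-1][:n_prototypes]
```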
Concha, O.P., Xu, Y., Piccardi, M. & Moghaddam, Z. 2011, 'HMM-MIO: An Enhanced Hidden Markov Model for Action Recognition', IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshop, Colorado Springs, CO, June 2011 in 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshop, ed M Piccardi, IEEE Computer Society, Piscataway, USA, pp. 62-69.
View/Download from: OPUS | Publisher's site
Generative models can be flexibly employed in a variety of tasks such as classification, detection and segmentation thanks to their explicit modelling of likelihood functions. However, likelihood functions are hard to model accurately in many real cases. In this paper, we present an enhanced hidden Markov model capable of dealing with the noisy, high-dimensional and sparse measurements typical of action feature sets. The modified model, named hidden Markov model with multiple, independent observations (HMM-MIO), joins: a) robustness to observation outliers, b) dimensionality reduction, and c) processing of sparse observations. In the paper, a set of experimental results over the Weizmann and KTH datasets shows that this model can be tuned to achieve classification accuracy comparable to that of discriminative classifiers. While discriminative approaches remain the natural choice for classification tasks, our results prove that likelihoods, too, can be modelled to a high level of accuracy. In the near future, we plan extension of HMM-MIO along the lines of infinite Markov models and its integration into a switching model for continuous human action recognition.
Zare Borzeshi, E., Xu, Y. & Piccardi, M. 2011, 'Automatic Human Action Recognition in Video by Graph Embedding', Image Analysis and Processing - ICIAP 2011, Ravenna, Italy, September 2011 in Lecture Notes in Computer Science. Image Analysis and Processing - ICIAP 2011. 16th International Conference, Part II, ed Giuseppe Maino, Gian Luca Foresti, Springer-Verlag, Springer Heidelberg Dordrecht London New York, pp. 19-28.
View/Download from: OPUS
The problem of human action recognition has received increasing attention in recent years for its importance in many applications. Yet, the main limitation of current approaches is that they do not capture well the spatial relationships in the subject performing the action. This paper presents an initial study which uses graphs to represent the actor's shape and graph embedding to then convert the graph into a suitable feature vector. In this way, we can benefit from the wide range of statistical classifiers while retaining the strong representational power of graphs. The paper shows that, although the proposed method does not yet achieve accuracy comparable to that of the best existing approaches, the embedded graphs are capable of describing the deformable human shape and its evolution over time. This confirms the interesting rationale of the approach and its potential for future performance.
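For readers unfamiliar with prototype-based graph embedding, the following is a minimal dissimilarity-space sketch (not the paper's code): each graph becomes a fixed-length vector of its distances to a set of prototype graphs. `graph_distance` is a hypothetical dissimilarity function (e.g. an approximate graph edit distance) assumed to be available.

```python
# Dissimilarity-space embedding: graphs -> fixed-length vectors via distances
# to prototypes, so that standard vector-space classifiers can be applied.
import numpy as np

def embed(graphs, prototypes, graph_distance):
    """Return an (n_graphs, n_prototypes) matrix of dissimilarities."""
    return np.array([[graph_distance(g, p) for p in prototypes]
                     for g in graphs])

# The embedded vectors can then be fed to any classifier, for example:
#   from sklearn.svm import SVC
#   clf = SVC().fit(embed(train_graphs, prototypes, graph_distance), train_labels)
```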
Concha, O.P., Xu, Y. & Piccardi, M. 2010, 'Robust Dimensionality Reduction for Human Action Recognition', Digital Image Computing: Techniques and Applications, Sydney, Australia, December 2010 in Proceedings. 2010 Digital Image Computing: Techniques and Applications (DICTA 2010), ed Jian Zhang, Chunhua Shen, Glenn Geers, Qiang Wu, IEEE Computer Society, Sydney, Australia, pp. 349-356.
View/Download from: OPUS | Publisher's site
Human action recognition can be approached by combining an action-discriminative feature set with a classifier. However, the dimensionality of typical feature sets joint with that of the time dimension often leads to a curse-of-dimensionality situation. Moreover, the measurement of the feature set is subject to sometimes severe errors. This paper presents an approach to human action recognition based on robust dimensionality reduction. The observation probabilities of hidden Markov models (HMM) are modelled by mixtures of probabilistic principal components analyzers and mixtures of t-distribution sub-spaces, and compared with conventional Gaussian mixture models. Experimental results on two datasets show that dimensionality reduction helps improve the classification accuracy and that the heavier-tailed t-distribution can help reduce the impact of outliers generated by segmentation errors.
Concha, O.P., Xu, Y. & Piccardi, M. 2010, 'Compressive Sensing of Time Series for Human Action Recognition', Digital Image Computing: Techniques and Applications, Sydney, Australia, December 2010 in Proceedings. 2010 Digital Image Computing: Techniques and Applications (DICTA 2010), ed Jian Zhang, Chunhua Shen, Glenn Geers, Qiang Wu, IEEE Computer Society, Sydney, Australia, pp. 454-461.
View/Download from: OPUS | Publisher's site
Compressive Sensing (CS) is an emerging signal processing technique where a sparse signal is reconstructed from a small set of random projections. In the recent literature, CS techniques have demonstrated promising results for signal compression and reconstruction [9, 8, 1]. However, their potential as dimensionality reduction techniques for time series has not been significantly explored to date. To this aim, this work investigates the suitability of compressive-sensed time series in an application of human action recognition. In the paper, results from several experiments are presented: (1) in a first set of experiments, the time series are transformed into the CS domain and fed into a hidden Markov model (HMM) for action recognition; (2) in a second set of experiments, the time series are explicitly reconstructed after CS compression and then used for recognition; (3) in the third set of experiments, the time series are compressed by a hybrid CS-Haar basis prior to input into HMM; (4) in the fourth set, the time series are reconstructed from the hybrid CS-Haar basis and used for recognition. We further compare these approaches with alternative techniques such as sub-sampling and filtering. Results from our experiments show unequivocally that the application of CS does not degrade the recognition accuracy; rather, it often increases it. This proves that CS can provide a desirable form of dimensionality reduction in pattern recognition over time series.
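The sketch below illustrates the generic compressive-sensing idea referred to above: random projections of a signal that is sparse in some basis, followed by sparse reconstruction. It is a minimal illustration under assumed signal sizes and a DCT sparsity basis, not the paper's CS-Haar pipeline or its HMM experiments.

```python
# Minimal compressive-sensing sketch: a time series that is sparse in the DCT
# domain is measured with a random Gaussian matrix and recovered with OMP.
import numpy as np
from scipy.fft import idct
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n, m, k = 256, 64, 8                       # signal length, measurements, sparsity

# Build a signal that is k-sparse in the DCT domain (assumed toy data).
coeffs = np.zeros(n)
coeffs[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
psi = idct(np.eye(n), norm='ortho', axis=0)    # columns = inverse-DCT basis vectors
x = psi @ coeffs                               # time-domain signal

# Compressive measurements y = Phi x with a random Gaussian Phi (m << n).
phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = phi @ x

# Sparse reconstruction: recover the DCT coefficients from y = (Phi Psi) c.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
omp.fit(phi @ psi, y)
x_hat = psi @ omp.coef_

print("relative reconstruction error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```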
Xu, Y. & Jin, J. 2005, 'Latency insensitive task scheduling for real-time video processing and streaming', Belgium, September 2005 in Advanced Concepts For Intelligent Vision Systems, Proceedings, ed NA, Springer-Verlag Berlin, Berlin, Germany, pp. 387-394.
View/Download from: OPUS | Publisher's site
In recent times, computer vision and pattern recognition (CVPR) technologies have made automatic feature extraction and event detection possible in real-time, on-the-fly video processing and streaming systems. However, these multiple and computationally expensive ...
Allen, J.K., Xu, Y. & Jin, J. 2005, 'Mean shift object tracking for a SIMD computer', International Conference on Information Technology and Applications, Sydney, Australia, July 2005 in Proceedings of Third International Conference On Information Technology And Applications, Vol 1, ed He, X; Hintz, T; Piccardi, M; Wu, Q; Huang, M; Tien, D, IEEE, Los Alamitos, USA, pp. 692-697.
View/Download from: OPUS | Publisher's site
We use SIMD instructions to implement a popular video object-tracking algorithm in an attempt to achieve the best possible performance on the available hardware. We start with an implementation of the well-known mean shift algorithm with adaptive scale ...
Xu, Y., Jin, J. & Allen, J.K. 2005, 'Stream-based interactive video language authoring using correlated audiovisual watermarking', International Conference on Information Technology and Applications, Sydney, Australia, July 2005 in Proceedings of Third International Conference On Information Technology And Applications, Vol 2, ed He, X; Hintz, T; Piccardi, M; Wu, Q; Huang, M; Tien, D, IEEE, Piscataway, USA, pp. 377-380.
View/Download from: OPUS | Publisher's site
We propose a novel framework that employs correlated digital video and audio watermarking where the watermarking sequence contains video interaction information and media features as a basis towards constructing a secure, self-contained, format-independent ...
Xu, Y. & Jin, J. 2005, 'Scheduling latency insensitive computer vision tasks', IEEE International Symposium on Parallel and Distributed Processing with Applications, Nanjing, China, November 2005 in Parallel and Distributed Processing and Applications - Third International Symposium, ISPA 2005 - Lecture Notes In Computer Science, ed Pan, Y; Chen, D; Guo, M; Cao, JN; Dongarra, J, Springer, Berlin, Germany, pp. 1089-1100.
View/Download from: OPUS | Publisher's site
In recent times, there are increasing numbers of computer vision and pattern recognition (CVPR) technologies being applied to real-time video processing using single-processor PCs. However, these multiple computationally expensive tasks are generating bottlenecks ...
Xu, Y., Allen, J.K. & Jin, J. 2004, 'Robust Mean-Shift Tracking with Extended Fast Colour Thresholding', International Symposium on Intelligent Multimedia, Video and Speech Processing, Hong Kong, China, October 2004 in Proceedings of 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, ed Lam,K; Yan, H., IEEE, Hong Kong, China, pp. 542-545.
View/Download from: OPUS | Publisher's site
We propose a novel adaptive approach for object tracking using fast colour thresholding and region merging. It proves to be an effective measure against large variations between consecutive frames during a mean-shift process. The approach retains mean-shift's property of efficiency and improves mean-shift's drawback of robustness. It can track non-rigid objects with significant occlusion
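For context, the following is a plain colour-histogram mean-shift tracker built on standard OpenCV calls. It is a baseline sketch only, not the paper's extended fast colour thresholding and region-merging variant; the video file name and the initial target window are assumptions for illustration.

```python
# Minimal colour-histogram mean-shift tracker with OpenCV (baseline sketch).
import cv2
import numpy as np

cap = cv2.VideoCapture('input.mp4')        # hypothetical input video
ok, frame = cap.read()
x, y, w, h = 200, 150, 80, 120             # assumed initial target window

# Hue histogram of the target region serves as the tracking model.
roi = frame[y:y + h, x:x + w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
window = (x, y, w, h)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    # Mean shift moves the window towards the mode of the back-projection.
    _, window = cv2.meanShift(back_proj, window, term)
    x, y, w, h = window
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow('tracking', frame)
    if cv2.waitKey(30) & 0xFF == 27:       # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```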

Journal Articles

Zare Borzeshi, E., Concha, O.P., Xu, Y. & Piccardi, M. 2013, 'Joint Action Segmentation and Classification by an Extended Hidden Markov Model', IEEE Signal Processing Letters, vol. 20, no. 12, pp. 1207-1210.
View/Download from: OPUS | Publisher's site
Hidden Markov models (HMMs) provide joint segmentation and classification of sequential data by efficient inference algorithms and have therefore been employed in fields as diverse as speech recognition, document processing, and genomics. However, conventional ...
Xu, Y. & Kemp, M. 2010, 'Fitting Multiple Connected Ellipses To An Image Silhouette Hierarchically', IEEE Transactions On Image Processing, vol. 19, no. 7, pp. 1673-1682.
View/Download from: OPUS | Publisher's site
In this paper, we seek to fit a model, specified in terms of connected ellipses, to an image silhouette. Some algorithms that have attempted this problem are sensitive to initial guesses and also may converge to a wrong solution when they attempt to minimize ...
Xu, Y. & Kemp, M. 2010, 'An Iterative Approach for Fitting Multiple Connected Ellipse Structure to Silhouette', Pattern Recognition Letters, vol. 31, no. 13, pp. 1860-1867.
View/Download from: OPUS | Publisher's site
In many image processing applications, the structures conveyed in the image contour can often be described by a set of connected ellipses. Previous fitting methods to align the connected ellipse structure with a contour, in general, lack a continuous solution space. In addition, the solution obtained often satisfies only some of the ellipses, leaving others with poor fits. In this paper, we address these two problems by presenting an iterative framework for fitting a 2D silhouette contour to a pre-specified connected ellipse structure from a very coarse initial guess. Under the proposed framework, we first improve the initial guess by modelling the silhouette region as a set of disconnected ellipses using a mixture of Gaussian densities or heuristic approaches. Then, an iterative method is applied in a similar fashion to the Iterative Closest Point (ICP) algorithm (Alshawa, 2007; Li and Griffiths, 2000; Besl and McKay, 1992). Each iteration contains two parts: the first part assigns all the contour points to the individual unconnected ellipses, which we refer to as the segmentation step; the second part is a non-linear least squares step that minimizes both the sum of squared distances between the contour points and each ellipse's edge and the ellipse's vertex-pair distance(s), which we refer to as the minimization step. We illustrate the effectiveness of our method through experimental results on several images as well as by applying the algorithm to a mini database of human upper-body images.
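A much simplified sketch of the segmentation/minimization loop described above is given below: contour points are assigned to their nearest ellipse, and each ellipse is then refitted to its assigned points by least squares via OpenCV's fitEllipse. The paper's vertex-pair connectivity term and non-linear formulation are omitted; `contour` and `init_ellipses` are assumed inputs.

```python
# Simplified two-step iteration inspired by the segmentation/minimisation loop
# described above (connectivity constraints between ellipses are omitted here).
import numpy as np
import cv2

def fit_connected_ellipses(contour, init_ellipses, n_iters=10):
    """contour: (N, 2) float32 silhouette points; init_ellipses: coarse guesses,
    each in OpenCV form ((cx, cy), (major, minor), angle)."""
    ellipses = list(init_ellipses)
    for _ in range(n_iters):
        # Segmentation step: assign every contour point to the nearest centre.
        centres = np.array([e[0] for e in ellipses], dtype=np.float32)
        d = np.linalg.norm(contour[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Minimisation step: refit each ellipse to its assigned points.
        for k in range(len(ellipses)):
            pts = contour[labels == k]
            if len(pts) >= 5:              # cv2.fitEllipse needs at least 5 points
                ellipses[k] = cv2.fitEllipse(pts.astype(np.float32))
    return ellipses
```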
Xu, Y. & Jin, J.S. 2007, 'Camera Control and Multimedia Interaction using Individual Object Recognition', Journal of Multimedia, vol. 2, no. 3, pp. 77-85.
View/Download from: OPUS
Currently, most of the automated, computer-vision-assisted camera control policies are based on human events, such as the speaker's gestures and position changes. In addition to these events, in this paper we introduce a set of natural camera control and multimedia synchronization schemes based on individual object interaction. We describe in detail our unique method, in which head-pose estimation is used to compute the region of interest (ROI) for recognizing the hand-held object. We explain, from our results, how our approach achieves robustness, efficiency and unambiguous object interaction during real-time video shooting.
Xu, Y., Jin, J. & Allen, J.K. 2005, 'IVDA: Intelligent Real-time Video Detection Agent for Virtual Classroom Presentation', Advanced Technology for Learning, vol. 2, no. 2, pp. 77-86.
View/Download from: OPUS
Audiovisual streaming has been extensively used in synchronous virtual classroom applications. Until recently, content-based processing has rarely been used in real-time streaming. We present, in this paper, an intelligent system that uses state-of-the-art video processing and computer vision technologies that can automatically respond to various video events defined by a set of preprogrammed rules. This intelligent system performs object acquisition, automatic video editing, and student multimedia presentation synchronization that can leverage both the capabilities and efficiencies in multimedia streaming for a real-time synchronous virtual classroom. We present detailed discussions of the four major advantages of the system, namely, inexpensive hardware, automation, environment adaptabilities as well as natural teaching flow. We describe the system in detail, illustrating the main cutting-edge video-processing algorithms being incorporated as well as our own research findings in an effort to enhance performance over the existing algorithms used in virtual classrooms. We also show the implementation of the current prototype system as well as explore its potential in future e-learning applications.