
Dr Richard Xu


Dr Richard Yi Da Xu is an academic working at the Global Big Data Technologies Centre and the School of Computing and Communications.

He has been a first-line researcher in machine learning, data analytics and computer vision since 2002. He sincerely believes that, whatever the stage of one's academic career, it is important to keep conducting first-line research, at least in some portion; that is, he believes research knowledge is far more indicative of one's ability than one's publication list.

For this reason, Dr Xu not only writes his own research code but also publishes a series of Statistics, Probability and Machine Learning courses for PhD students around the world, which is constantly being updated. He enjoys sharing his knowledge with other researchers and industry practitioners.

The notes (in English) are found at:


The video links (in Mandarin currently, with an English version on its way):


If you are a keen research student who truly enjoys mathematics and programming, please consider joining Dr Xu's wonderful team!
Deputy Head of School, School of Computing and Communications
Core Member, INEXT - Innovation in IT Services and Applications
Core Member, Global Big Data Technologies
Associate Member, AAI - Advanced Analytics Institute
+61 2 9514 4587

Research Interests

Machine Learning, Data Analytics and Computer Vision
Can supervise: Yes


Li, M., Xu, Y.D. & He, X.J. 2015, 'Face hallucination based on Nonparametric Bayesian learning', Proceedings of IEEE International Conference on Image Processing, IEEE International Conference on Image Processing, IEEE, Quebec City, Canada, pp. 986-990.
In this paper, we propose a novel example-based face hallucination method through nonparametric Bayesian learning, based on the assumption that human faces have similar local pixel structure. We cluster the low-resolution (LR) face image patches with a nonparametric method, the distance-dependent Chinese restaurant process (ddCRP), and calculate the centres of the clusters (i.e., subspaces). Then, we learn the mapping coefficients from the LR patches to high-resolution (HR) patches in each subspace. Finally, the HR patches of the input low-resolution face image can be efficiently generated by a simple linear regression. The spatial distance constraint is employed to aid the learning of subspace centres so that every subspace will better reflect the detailed information of image patches. Experimental results show our method is efficient and promising for face hallucination.
Bargi, A., Xu, R.Y.D. & Piccardi, M. 2014, 'An Infinite Adaptive Online Learning Model for Segmentation and Classification of Streaming Data', 22nd International Conference on Pattern Recognition (ICPR), 2014, IEEE Computer Society, Stockholm, Sweden.
In recent years, the desire and need to understand streaming data has been increasing. Along with the constant flow of data, it is critical to classify and segment the observations on-the-fly without being limited to a rigid number of classes. In other words, the system needs to be adaptive to the streaming data and capable of updating its parameters to comply with natural changes. This interesting problem, however, is poorly addressed in the literature, as many of the common studies focus on offline classification over a pre-defined class set. In this paper, we propose a novel adaptive online system based on Markov switching models with hierarchical Dirichlet process priors. This infinite adaptive online approach is capable of segmenting and classifying the streaming data over infinite classes, while meeting the memory and delay constraints of streaming contexts. The model is further enhanced by a 'predictive batching' mechanism, that is able to divide the flowing data into batches of variable size, imitating the ground-truth segments. Experiments on two video datasets show significant performance of the proposed approach in frame-level accuracy, segmentation recall and precision, while determining the accurate number of classes in acceptable computational time.
Bargi, A., Xu, R.Y.D., Ghahramani, Z. & Piccardi, M. 2014, 'A Non-parametric Conditional Factor Regression Model for Multi-Dimensional Input and Response', Seventeenth International Conference on Artificial Intelligence and Statistics, 2014, JMLR, Reykjavik, Iceland, pp. 77-85.
Bargi, A., Xu, R. & Piccardi, M. 2012, 'An online HDP-HMM for joint action segmentation and classification in motion capture data', 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE Computer Society, Providence RI, USA, pp. 1-7.
Since its inception, action recognition research has mainly focused on recognizing actions from closed, predefined sets of classes. Conversely, the problem of recognizing actions from open, possibly incremental sets of classes is still largely unexplored. In this paper, we propose a novel online method based on the 'sticky' hierarchical Dirichlet process and the hidden Markov model [11, 5]. This approach, labelled as the online HDP-HMM, provides joint segmentation and classification of actions while a) processing the data in an online, recursive manner, b) discovering new classes as they occur, and c) adjusting its parameters over the streaming data. In a set of experiments, we have applied the online HDP-HMM to recognize actions from motion capture data from the TUM kitchen dataset, a challenging dataset of manipulation actions in a kitchen [12]. The results show significant accuracy in action classification, time segmentation and determination of the number of action classes.
Zare Borzeshi, E., Piccardi, M. & Xu, R. 2011, 'A Discriminative Prototype Selection Approach for Graph Embedding in Human Action Recognition', 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshop), IEEE International Conference on Computer Vision Workshops, IEEE Computer Society, Barcelona, Spain, pp. 1295-1301.
This paper proposes a novel graph-based method for representing a human's shape during the performance of an action. Despite their strong representational power, graphs are computationally cumbersome for pattern analysis. One way of circumventing this problem is that of transforming the graphs into a vector space by means of graph embedding. Such an embedding can be conveniently obtained by way of a set of 'prototype' graphs and a dissimilarity measure: yet, the critical step in this approach is the selection of a suitable set of prototypes which can capture both the salient structure within each action class as well as the intra-class variation. This paper proposes a new discriminative approach for the selection of prototypes which maximizes a function of the inter- and intra-class distances. Experiments on an action recognition dataset reported in the paper show that such a discriminative approach outperforms well-established prototype selection methods such as center, border and random prototype selection.
Concha, O.P., Xu, R., Piccardi, M. & Moghaddam, Z. 2011, 'HMM-MIO: An Enhanced Hidden Markov Model for Action Recognition', 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshop, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshop, IEEE Computer Society, Colorado Springs, CO, pp. 62-69.
Generative models can be flexibly employed in a variety of tasks such as classification, detection and segmentation thanks to their explicit modelling of likelihood functions. However, likelihood functions are hard to model accurately in many real cases. In this paper, we present an enhanced hidden Markov model capable of dealing with the noisy, high-dimensional and sparse measurements typical of action feature sets. The modified model, named hidden Markov model with multiple, independent observations (HMM-MIO), joins: a) robustness to observation outliers, b) dimensionality reduction, and c) processing of sparse observations. In the paper, a set of experimental results over the Weizmann and KTH datasets shows that this model can be tuned to achieve classification accuracy comparable to that of discriminative classifiers. While discriminative approaches remain the natural choice for classification tasks, our results prove that likelihoods, too, can be modelled to a high level of accuracy. In the near future, we plan extension of HMM-MIO along the lines of infinite Markov models and its integration into a switching model for continuous human action recognition.
Zare Borzeshi, E., Xu, R. & Piccardi, M. 2011, 'Automatic Human Action Recognition in Video by Graph Embedding', Lecture Notes in Computer Science. Image Analysis and Processing - ICIAP 2011. 16th International Conference, Part II, Image Analysis and Processing - ICIAP 2011, Springer-Verlag, Ravenna, Italy, pp. 19-28.
The problem of human action recognition has received increasing attention in recent years for its importance in many applications. Yet, the main limitation of current approaches is that they do not capture well the spatial relationships in the subject performing the action. This paper presents an initial study which uses graphs to represent the actor's shape and graph embedding to then convert the graph into a suitable feature vector. In this way, we can benefit from the wide range of statistical classifiers while retaining the strong representational power of graphs. The paper shows that, although the proposed method does not yet achieve accuracy comparable to that of the best existing approaches, the embedded graphs are capable of describing the deformable human shape and its evolution over time. This confirms the interesting rationale of the approach and its potential for future performance.
Concha, O.P., Xu, R. & Piccardi, M. 2010, 'Robust Dimensionality Reduction for Human Action Recognition', Proceedings. 2010 Digital Image Computing: Techniques and Applications (DICTA 2010), Digital Image Computing: Techniques and Applications, IEEE Computer Society, Sydney, Australia, pp. 349-356.
Human action recognition can be approached by combining an action-discriminative feature set with a classifier. However, the dimensionality of typical feature sets, combined with that of the time dimension, often leads to a curse-of-dimensionality situation. Moreover, the measurement of the feature set is subject to sometimes severe errors. This paper presents an approach to human action recognition based on robust dimensionality reduction. The observation probabilities of hidden Markov models (HMM) are modelled by mixtures of probabilistic principal components analyzers and mixtures of t-distribution sub-spaces, and compared with conventional Gaussian mixture models. Experimental results on two datasets show that dimensionality reduction helps improve the classification accuracy and that the heavier-tailed t-distribution can help reduce the impact of outliers generated by segmentation errors.
Concha, O.P., Xu, R. & Piccardi, M. 2010, 'Compressive Sensing of Time Series for Human Action Recognition', Proceedings. 2010 Digital Image Computing: Techniques and Applications (DICTA 2010), Digital Image Computing: Techniques and Applications, IEEE Computer Society, Sydney, Australia, pp. 454-461.
Compressive Sensing (CS) is an emerging signal processing technique where a sparse signal is reconstructed from a small set of random projections. In the recent literature, CS techniques have demonstrated promising results for signal compression and reconstruction [9, 8, 1]. However, their potential as dimensionality reduction techniques for time series has not been significantly explored to date. To this aim, this work investigates the suitability of compressive-sensed time series in an application of human action recognition. In the paper, results from several experiments are presented: (1) in a first set of experiments, the time series are transformed into the CS domain and fed into a hidden Markov model (HMM) for action recognition; (2) in a second set of experiments, the time series are explicitly reconstructed after CS compression and then used for recognition; (3) in the third set of experiments, the time series are compressed by a hybrid CS-Haar basis prior to input into HMM; (4) in the fourth set, the time series are reconstructed from the hybrid CS-Haar basis and used for recognition. We further compare these approaches with alternative techniques such as sub-sampling and filtering. Results from our experiments show unequivocally that the application of CS does not degrade the recognition accuracy; rather, it often increases it. This proves that CS can provide a desirable form of dimensionality reduction in pattern recognition over time series.
Benter, A., Xu, R., Moore, W., Antolovich, M. & Gao, J. 2009, 'Fragment size detection within homogeneous material using ground penetrating radar', 2009 International Radar Conference "Surveillance for a Safer World", RADAR 2009.
Ground Penetrating Radar (GPR) offers the ability to observe the internal structure of a pile of rocks. Large fragments within the pile may not be visible on the surface. Determining these large fragment sizes before collection can improve mine productivity. This research has examined the potential to identify objects where the background media and the object exhibit the same dielectric properties. Preliminary results are presented which show identification is possible using standard GPR equipment.
Xu, R.Y.D. & Kemp, M. 2009, 'Multiple curvature based approach to human upper body parts detection with connected ellipse model fine-tuning', Proceedings - International Conference on Image Processing, ICIP, pp. 2577-2580.
In this paper, we discuss an effective method for detecting human upper body parts from a 2D image silhouette using curvature analysis and ellipse fitting. First we smooth the silhouette so that we can determine just the global features: the head, hands and armpits. Next we reduce the smoothing to detect the local features of the neck and elbows. We model the human upper body by multiple connected ellipses. Thus we segment the body by the extracted features. Ellipses are fitted to each segment. Lastly, we apply a non-linear least squares method to minimize the differences between the connected ellipse model and the edge of the silhouette. © 2009 IEEE.
Xu, R.Y.D., Brown, J.M., Traish, J.M. & Dezwa, D. 2008, 'A computer vision based camera pedestal's vertical motion control', Proceedings - International Conference on Pattern Recognition.
Traditional camera pedestals are manually operated. Our long-term goal is to construct a fully autonomous pedestal system which can respond to changes in a scene and mimic a human camera operator. In this paper, we discuss our experiments to control the vertical motion of a pedestal by leveling its position with a human head or a tracked hand-held object. We describe a set of computer vision methods used in these experiments, including head position tracking using a Gaussian Mixture Model (GMM) of the foreground blob and hand-held object tracking using Continuously Adaptive Mean shift (CAM-shift) with motion initialization. We also discuss the application of a Kalman filter and show its effect in reducing the number of jittering pedestal motions. © 2008 IEEE.
Xu, R.Y.D. 2008, 'A computer vision based whiteboard capture system', 2008 IEEE Workshop on Applications of Computer Vision, WACV.
Conventional whiteboard video capture using a static camera usually results in poor quality. In this paper, we present an autonomous whiteboard scan and capture prototype system, which consists of a pair of static and Pan-Tilt-Zoom (PTZ) cameras. The PTZ camera is used to scan the newly updated whiteboard regions without interrupting the instructor. We illustrate several computer vision techniques used in our system. Firstly, we present our unique camera calibration method using rough hand-drawn gridlines. Secondly, we present the image processing methods used to determine the location of the newly updated whiteboard region to be scanned. Our method also accounts for occlusion of the whiteboard region by the instructor.
Gao, J. & Xu, R.Y. 2007, 'Mixture of the robust L1 distributions and its applications', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 26-35.
Recently a robust probabilistic L1-PCA model was introduced in [1] by replacing the conventional Gaussian noise model with the Laplacian L1 model. Due to the heavy tail characteristics of the L1 distribution, the proposed model is more robust against data outliers. In this paper, we generalize the L1-PCA into a mixture of L1-distributions so that the model can be used for possible multiclustering data. For the model learning we use the property that the L1 density can be expanded as a superposition of an infinite number of Gaussian densities to include a tractable Bayesian learning and inference based on the variational EM-type algorithm. © Springer-Verlag Berlin Heidelberg 2007.
Xu, R.Y.D. & Jin, J.S. 2006, 'Individual object interaction for camera control and multimedia synchronization', ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. V481-V484.
In recent times, most computer-vision-assisted automatic camera control policies have been based on human events, such as speaker position changes. In addition to these events, in this paper, we introduce a set of natural camera control and multimedia synchronization schemes based on individual object interaction. We present our methods in detail, including head-pose calculation and laser pointer guidance, which are used to estimate the region of interest (ROI) for both hand-held objects and objects at a distance. We explain, from our results, how this set of approaches has achieved robustness, efficiency and unambiguous object interaction during real-time video shooting. © 2006 IEEE.

Journal articles

Peng, F., Lu, J., Wang, Y., Xu, R.Y.D., Ma, C. & Yang, J. 2016, 'N-dimensional Markov random field prior for cold-start recommendation', Neurocomputing.
© 2016 Elsevier B.V. A recommender system is a commonly used technique to improve user experience in e-commerce applications. One of the popular recommender methods is Matrix Factorization (MF), which learns the latent profile of both users and items. However, if the historical ratings are not available, the latent profile will be drawn from a zero-mean Gaussian prior, resulting in uninformative recommendations. To deal with this issue, we propose using an n-dimensional Markov random field as the prior of matrix factorization (called mrf-MF). In the Markov random field, the attribute (such as age, occupation of users and genre, release year of items) is considered as the site and the latent profile, the random variable. Through the prior, new users or items will be recommended according to their neighbors. The proposed model is suitable for three types of cold-start recommendation: (1) recommend new items to existing users; (2) recommend new users for existing items; (3) recommend new items to new users. The proposed model is assessed on two movie datasets, Movielens 100K and Movielens 1M. Experimental results show that it can effectively solve each of the three cold-start problems and outperforms several matrix factorization based methods.
Kemp, M. & Xu, R.Y.D. 2015, 'Geometrically-constrained balloon fitting for multiple connected ellipses', Pattern Recognition, vol. 48, no. 7, pp. 2198-2208.
Copyright © 2015 Published by Elsevier Ltd. All rights reserved. This paper presents a framework to fit data to a model consisting of multiple connected ellipses. For each iteration of the fitting algorithm, the representation of the multiple ellipses is mapped to a Gaussian mixture model (GMM) and the connections are mapped to geometric constraints for the GMM. The fitting is a modified constrained expectation maximisation (EM) method on the GMM (maximising with respect to the ellipse parameters rather than Gaussian parameters). A key modification is that the precision of the chosen GMM is increased at each iteration. This is similar to slowly inflating a bunch of connected balloons and so this is called balloon fitting. Extensions of the framework to other constraints and possible pre-processing are also discussed. The superiority of balloon fitting is demonstrated through experiments on several silhouettes with noisy edges which compare other existing methods with balloon fitting and some of the extensions.
Qiao, M., Bian, W., Xu, R.Y.D. & Tao, D. 2015, 'Diversified Hidden Markov Models for Sequential Labeling', IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 11, pp. 2947-2960.
© 2015 IEEE. Labeling of sequential data is a prevalent meta-problem for a wide range of real-world applications. While first-order Hidden Markov Models (HMMs) provide a fundamental approach for unsupervised sequential labeling, the basic model does not show satisfying performance when it is directly applied to real-world problems, such as part-of-speech tagging (PoS tagging) and optical character recognition (OCR). Aiming at improving performance, important extensions of HMM have been proposed in the literature. One of the common key features in these extensions is the incorporation of proper prior information. In this paper, we propose a new extension of HMM, termed diversified Hidden Markov Models (dHMM), which utilizes a diversity-encouraging prior over the state-transition probabilities and thus facilitates more dynamic sequential labelings. Specifically, the diversity is modeled by a continuous determinantal point process prior, which we apply to both unsupervised and supervised scenarios. Learning and inference algorithms for dHMM are derived. Empirical evaluations on benchmark datasets for unsupervised PoS tagging and supervised OCR confirmed the effectiveness of dHMM, with competitive performance to the state-of-the-art.
Fan, X., Cao, L. & Xu, R.Y.D. 2014, 'Dynamic Infinite Mixed-Membership Stochastic Blockmodel', IEEE Transactions on Neural Networks and Learning Systems.
Directional and pairwise measurements are often used to model interactions in a social network setting. The mixed-membership stochastic blockmodel (MMSB) was a seminal work in this area, and its ability has been extended. However, models such as MMSB face particular challenges in modeling dynamic networks, for example, with the unknown number of communities. Accordingly, this paper proposes a dynamic infinite mixed-membership stochastic blockmodel, a generalized framework that extends the existing work to potentially infinite communities inside a network in dynamic settings (i.e., networks are observed over time). Additional model parameters are introduced to reflect the degree of persistence among one's memberships at consecutive time stamps. Under this framework, two specific models, namely mixture time variant and mixture time invariant models, are proposed to depict two different time correlation structures. Two effective posterior sampling strategies and their results are presented, respectively, using synthetic and real-world data.
Zare Borzeshi, E., Concha, O.P., Xu, R. & Piccardi, M. 2013, 'Joint Action Segmentation and Classification by an Extended Hidden Markov Model', IEEE Signal Processing Letters, vol. 20, no. 12, pp. 1207-1210.
Hidden Markov models (HMMs) provide joint segmentation and classification of sequential data by efficient inference algorithms and have therefore been employed in fields as diverse as speech recognition, document processing, and genomics. However, conven
Xu, R. & Kemp, M. 2010, 'Fitting Multiple Connected Ellipses To An Image Silhouette Hierarchically', IEEE Transactions On Image Processing, vol. 19, no. 7, pp. 1673-1682.
In this paper, we seek to fit a model, specified in terms of connected ellipses, to an image silhouette. Some algorithms that have attempted this problem are sensitive to initial guesses and also may converge to a wrong solution when they attempt to mini
Xu, R. & Kemp, M. 2010, 'An Iterative Approach for Fitting Multiple Connected Ellipse Structure to Silhouette', Pattern Recognition Letters, vol. 31, no. 13, pp. 1860-1867.
In many image processing applications, the structures conveyed in the image contour can often be described by a set of connected ellipses. Previous fitting methods to align the connected ellipse structure with a contour, in general, lack a continuous solution space. In addition, the solution obtained often satisfies only some of the ellipses, leaving others with poor fits. In this paper, we address these two problems by presenting an iterative framework for fitting a 2D silhouette contour to a pre-specified connected ellipse structure with a very coarse initial guess. Under the proposed framework, we first improve the initial guess by modelling the silhouette region as a set of disconnected ellipses using a mixture of Gaussian densities or heuristic approaches. Then, an iterative method is applied in a similar fashion to the Iterative Closest Point (ICP) (Alshawa, 2007; Li and Griffiths, 2000; Besl and McKay, 1992) algorithm. Each iteration contains two parts: the first part assigns all the contour points to the individual unconnected ellipses, which we refer to as the segmentation step, and the second part is a non-linear least squares approach that minimizes both the sum of the squared distances between the contour points and the ellipses' edges and the distances between the ellipses' vertex pair(s), which we refer to as the minimization step. We illustrate the effectiveness of our methods through experimental results on several images as well as by applying the algorithm to a mini database of human upper-body images.