Dr Richard Xu

Biography

Dr Richard Yi Da Xu is a Senior Lecturer and Deputy Head of School at the School of Computing and Communications.

He has been an active researcher in computer vision, image processing and pattern recognition since 2002. You may find a list of video demos of his previous work at:

http://www-staff.it.uts.edu.au/~ydxu/research.htm

Recently, his focus has been on the underlying machine learning algorithms, particularly in the field of non-parametric Bayes and its Monte Carlo inference methods. He has written numerous tutorial papers in this area for PhD student training. You may find them at:

http://www-staff.it.uts.edu.au/~ydxu/statistics.htm

Dr Richard Xu is constantly seeking high-quality PhD/Master research students who have a passion for mathematical/statistical research.

He also offers UTS-approved consultancy for industries interested in statistical modelling of big data.

Deputy Head of School, School of Computing and Communications
Associate Member, Advanced Analytics Institute
Core Member, Centre for Innovation in IT Services Applications
Core Member, Global Big Data Technologies Centre
PhD
 
Phone
+61 2 9514 4587

Research Interests

He has been an active researcher in computer vision, image processing and pattern recognition since 2002. You may find a list of video demos of his previous work at:

http://www-staff.it.uts.edu.au/~ydxu/research.htm

Recently, his focus has been on the underlying machine learning algorithms, particularly in the field of non-parametric Bayes and its Monte Carlo inference methods. He has written numerous tutorial papers in this area for PhD student training. You may find them at:

http://www-staff.it.uts.edu.au/~ydxu/statistics.htm

Can supervise: Yes

  • Network Security
  • Operating Systems in Network Security
  • Image Processing and Pattern Recognition

Conferences

Zhang, F., Li, J., Li, F., Xu, M., Xu, R. & He, X. 2015, 'Community detection based on links and node features in social networks', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 418-429.
© Springer International Publishing Switzerland 2015. Community detection is a significant but challenging task in the field of social network analysis. Many effective methods have been proposed to solve this problem. However, most of them are mainly based on the topological structure or node attributes. In this paper, based on SPAEM [1], we propose a joint probabilistic model for community detection that combines node attributes and topological structure. In our model, we create a novel feature-based weighted network, within which each edge weight is represented by the node feature similarity between the two nodes at the ends of the edge. Then we fuse the original network and the created network with a parameter and employ the expectation-maximization (EM) algorithm to identify a community. Experiments on a diverse set of data, collected from Facebook and Twitter, demonstrate that our algorithm has achieved promising results compared with other algorithms.
Bargi, A., Da Xu, R.Y. & Piccardi, M. 2014, 'An infinite adaptive online learning model for segmentation and classification of streaming data', Proceedings - International Conference on Pattern Recognition, pp. 3440-3445.
© 2014 IEEE. In recent years, the desire and need to understand streaming data has been increasing. Along with the constant flow of data, it is critical to classify and segment the observations on-the-fly without being limited to a rigid number of classes. In other words, the system needs to be adaptive to the streaming data and capable of updating its parameters to comply with natural changes. This interesting problem, however, is poorly addressed in the literature, as many of the common studies focus on offline classification over a pre-defined class set. In this paper, we propose a novel adaptive online system based on Markov switching models with hierarchical Dirichlet process priors. This infinite adaptive online approach is capable of segmenting and classifying the streaming data over infinite classes, while meeting the memory and delay constraints of streaming contexts. The model is further enhanced by a 'predictive batching' mechanism, that is able to divide the flowing data into batches of variable size, imitating the ground-truth segments. Experiments on two video datasets show significant performance of the proposed approach in frame-level accuracy, segmentation recall and precision, while determining the accurate number of classes in acceptable computational time.
Bargi, A., Xu, R.Y.D., Ghahramani, Z. & Piccardi, M. 2014, 'A Non-parametric Conditional Factor Regression Model for Multi-Dimensional Input and Response', Seventeenth International Conference on Artificial Intelligence and Statistics, 2014, JMLR, Reykjavik, Iceland, pp. 77-85.
Bargi, A., Da Xu, R.Y. & Piccardi, M. 2012, 'An online HDP-HMM for joint action segmentation and classification in motion capture data', IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 1-7.
Since its inception, action recognition research has mainly focused on recognizing actions from closed, predefined sets of classes. Conversely, the problem of recognizing actions from open, possibly incremental sets of classes is still largely unexplored. In this paper, we propose a novel online method based on the "sticky" hierarchical Dirichlet process and the hidden Markov model [11, 5]. This approach, labelled as the online HDP-HMM, provides joint segmentation and classification of actions while a) processing the data in an online, recursive manner, b) discovering new classes as they occur, and c) adjusting its parameters over the streaming data. In a set of experiments, we have applied the online HDP-HMM to recognize actions from motion capture data from the TUM kitchen dataset, a challenging dataset of manipulation actions in a kitchen [12]. The results show significant accuracy in action classification, time segmentation and determination of the number of action classes. © 2012 IEEE.
Zare Borzeshi, E., Piccardi, M. & Xu, R. 2011, 'A Discriminative Prototype Selection Approach for Graph Embedding in Human Action Recognition', 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshop), IEEE International Conference on Computer Vision Workshops, IEEE Computer Society, Barcelona, Spain, pp. 1295-1301.
This paper proposes a novel graph-based method for representing a human's shape during the performance of an action. Despite their strong representational power, graphs are computationally cumbersome for pattern analysis. One way of circumventing this problem is that of transforming the graphs into a vector space by means of graph embedding. Such an embedding can be conveniently obtained by way of a set of 'prototype' graphs and a dissimilarity measure: yet, the critical step in this approach is the selection of a suitable set of prototypes which can capture both the salient structure within each action class as well as the intra-class variation. This paper proposes a new discriminative approach for the selection of prototypes which maximizes a function of the inter- and intra-class distances. Experiments on an action recognition dataset reported in the paper show that such a discriminative approach outperforms well-established prototype selection methods such as center, border and random prototype selection.
Concha, O.P., Da Xu, R.Y., Moghaddam, Z. & Piccardi, M. 2011, 'HMM-MIO: An enhanced hidden Markov model for action recognition', IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.
Generative models can be flexibly employed in a variety of tasks such as classification, detection and segmentation thanks to their explicit modelling of likelihood functions. However, likelihood functions are hard to model accurately in many real cases. In this paper, we present an enhanced hidden Markov model capable of dealing with the noisy, high-dimensional and sparse measurements typical of action feature sets. The modified model, named hidden Markov model with multiple, independent observations (HMM-MIO), joins: a) robustness to observation outliers, b) dimensionality reduction, and c) processing of sparse observations. In the paper, a set of experimental results over the Weizmann and KTH datasets shows that this model can be tuned to achieve classification accuracy comparable to that of discriminative classifiers. While discriminative approaches remain the natural choice for classification tasks, our results prove that likelihoods, too, can be modelled to a high level of accuracy. In the near future, we plan extension of HMM-MIO along the lines of infinite Markov models and its integration into a switching model for continuous human action recognition. © 2011 IEEE.
Borzeshi, E.Z., Xu, R. & Piccardi, M. 2011, 'Automatic human action recognition in videos by graph embedding', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 19-28.
The problem of human action recognition has received increasing attention in recent years for its importance in many applications. Yet, the main limitation of current approaches is that they do not capture well the spatial relationships in the subject performing the action. This paper presents an initial study which uses graphs to represent the actor's shape and graph embedding to then convert the graph into a suitable feature vector. In this way, we can benefit from the wide range of statistical classifiers while retaining the strong representational power of graphs. The paper shows that, although the proposed method does not yet achieve accuracy comparable to that of the best existing approaches, the embedded graphs are capable of describing the deformable human shape and its evolution over time. This confirms the interesting rationale of the approach and its potential for future performance. © 2011 Springer-Verlag.
Concha, O.P., Da Xu, R.Y. & Piccardi, M. 2010, 'Robust dimensionality reduction for human action recognition', Proceedings - 2010 Digital Image Computing: Techniques and Applications, DICTA 2010, pp. 349-356.
Human action recognition can be approached by combining an action-discriminative feature set with a classifier. However, the dimensionality of typical feature sets, combined with that of the time dimension, often leads to a curse-of-dimensionality situation. Moreover, the measurement of the feature set is subject to sometimes severe errors. This paper presents an approach to human action recognition based on robust dimensionality reduction. The observation probabilities of hidden Markov models (HMM) are modelled by mixtures of probabilistic principal components analyzers and mixtures of t-distribution sub-spaces, and compared with conventional Gaussian mixture models. Experimental results on two datasets show that dimensionality reduction helps improve the classification accuracy and that the heavier-tailed t-distribution can help reduce the impact of outliers generated by segmentation errors. © 2010 Crown Copyright.
Concha, O.P., Da Xu, R.Y. & Piccardi, M. 2010, 'Compressive Sensing of time series for human action recognition', Proceedings - 2010 Digital Image Computing: Techniques and Applications, DICTA 2010, pp. 454-461.
Compressive Sensing (CS) is an emerging signal processing technique where a sparse signal is reconstructed from a small set of random projections. In the recent literature, CS techniques have demonstrated promising results for signal compression and reconstruction [9, 8, 1]. However, their potential as dimensionality reduction techniques for time series has not been significantly explored to date. To this aim, this work investigates the suitability of compressive-sensed time series in an application of human action recognition. In the paper, results from several experiments are presented: (1) in a first set of experiments, the time series are transformed into the CS domain and fed into a hidden Markov model (HMM) for action recognition; (2) in a second set of experiments, the time series are explicitly reconstructed after CS compression and then used for recognition; (3) in the third set of experiments, the time series are compressed by a hybrid CS-Haar basis prior to input into HMM; (4) in the fourth set, the time series are reconstructed from the hybrid CS-Haar basis and used for recognition. We further compare these approaches with alternative techniques such as sub-sampling and filtering. Results from our experiments show unequivocally that the application of CS does not degrade the recognition accuracy; rather, it often increases it. This proves that CS can provide a desirable form of dimensionality reduction in pattern recognition over time series. © 2010 Crown Copyright.
Allen, J., Xu, R. & Jin, J. 2005, 'Mean shift object tracking for a SIMD computer', Proceedings of Third International Conference On Information Technology And Applications, Vol 1, International Conference on Information Technology and Applications, IEEE, Sydney, Australia, pp. 692-697.
We use SIMD instructions to implement a popular video object-tracking algorithm in an attempt to achieve the best possible performance on the available hardware. We start with an implementation of the well-known mean shift algorithm with adaptive scale a
Xu, R. & Jin, J. 2005, 'Latency insensitive task scheduling for real-time video processing and streaming', Advanced Concepts For Intelligent Vision Systems, Proceedings, Springer-Verlag Berlin, Belgium, pp. 387-394.
In recent times, computer vision and pattern recognition (CVPR) technologies have made automatic feature extraction and event detection possible in real-time, on-the-fly video processing and streaming systems. However, these multiple and computationally expensive
Xu, R., Jin, J. & Allen, J. 2005, 'Stream-based interactive video language authoring using correlated audiovisual watermarking', Proceedings of Third International Conference On Information Technology And Applications, Vol 2, International Conference on Information Technology and Applications, IEEE, Sydney, Australia, pp. 377-380.
We propose a novel framework that employs correlated digital video and audio watermarking where the watermarking sequence contains video interaction information and media features as a basis towards constructing a secure, self-contained, format independe
Xu, R. & Jin, J. 2005, 'Scheduling latency insensitive computer vision tasks', Parallel and Distributed Processing and Applications - Third International Symposium, ISPA 2005 - Lecture Notes In Computer Science, IEEE International Symposium on Parallel and Distributed Processing with Applications, Springer, Nanjing, China, pp. 1089-1100.
In recent times, increasing numbers of computer vision and pattern recognition (CVPR) technologies have been applied to real-time video processing using single-processor PCs. However, these multiple computationally expensive tasks are generating bott
Xu, R., Allen, J. & Jin, J. 2004, 'Robust Mean-Shift Tracking with Extended Fast Colour Thresholding', Proceedings of 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, International Symposium on Intelligent Multimedia, Video and Speech Processing, IEEE, Hong Kong, China, pp. 542-545.
We propose a novel adaptive approach for object tracking using fast colour thresholding and region merging. It proves to be an effective measure against large variations between consecutive frames during a mean-shift process. The approach retains the efficiency of mean-shift while improving on its limited robustness. It can track non-rigid objects with significant occlusion

Journal articles

Fan, X., Cao, L. & Da Xu, R.Y. 2015, 'Dynamic Infinite Mixed-Membership Stochastic Blockmodel', IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 9, pp. 2072-2085.
Zare Borzeshi, E., Concha, O.P., Xu, R. & Piccardi, M. 2013, 'Joint Action Segmentation and Classification by an Extended Hidden Markov Model', IEEE Signal Processing Letters, vol. 20, no. 12, pp. 1207-1210.
Hidden Markov models (HMMs) provide joint segmentation and classification of sequential data by efficient inference algorithms and have therefore been employed in fields as diverse as speech recognition, document processing, and genomics. However, conven
Da Xu, R.Y. & Kemp, M. 2010, 'Fitting Multiple Connected Ellipses to an Image Silhouette Hierarchically', IEEE Transactions on Image Processing, vol. 19, no. 7, pp. 1673-1682.
Xu, R.Y.D. & Kemp, M. 2010, 'An iterative approach for fitting multiple connected ellipse structure to silhouette', Pattern Recognition Letters, vol. 31, no. 13, pp. 1860-1867.
Xu, R. & Jin, J.S. 2007, 'Camera Control and Multimedia Interaction using Individual Object Recognition', Journal of Multimedia, vol. 2, no. 3, pp. 77-85.
Currently, most automated, computer-vision-assisted camera control policies are based on human events, such as the speaker's gestures and position changes. In addition to these events, in this paper we introduce a set of natural camera control and multimedia synchronization schemes based on individual object interaction. We describe in detail our method, in which head-pose estimation is used to compute the region of interest (ROI) for recognizing the hand-held object. We explain, from our results, how our approach has achieved robustness, efficiency and unambiguous object interaction during real-time video shooting.
Xu, R., Jin, J. & Allen, J. 2005, 'IVDA: Intelligent Real-time Video Detection Agent for Virtual Classroom Presentation', Advanced Technology for Learning, vol. 2, no. 2, pp. 77-86.
Audiovisual streaming has been extensively used in synchronous virtual classroom applications. Until recently, content-based processing has rarely been used in real-time streaming. We present, in this paper, an intelligent system that uses state-of-the-art video processing and computer vision technologies that can automatically respond to various video events defined by a set of preprogrammed rules. This intelligent system performs object acquisition, automatic video editing, and student multimedia presentation synchronization that can leverage both the capabilities and efficiencies in multimedia streaming for a real-time synchronous virtual classroom. We present detailed discussions of the four major advantages of the system, namely, inexpensive hardware, automation, environment adaptabilities as well as natural teaching flow. We describe the system in detail, illustrating the main cutting-edge video-processing algorithms being incorporated as well as our own research findings in an effort to enhance performance over the existing algorithms used in virtual classrooms. We also show the implementation of the current prototype system as well as explore its potential in future e-learning applications.