
Associate Professor Richard Xu


As Director of Industry Analytics and Visualisation, and creator of the DataLounge initiative in the Faculty of Engineering & IT at UTS, I constantly balance the demands of conducting world-renowned machine learning research with finding practical machine learning applications in real industry scenarios.

This sees me managing a team of 25 academics, postdoctoral researchers, PhD students and engineers, who apply their minds and talents to an array of industry projects for government and businesses.

Aside from my duties as Industry Director, I also publish a series of Statistics, Probability and Machine Learning (including Deep Learning) courses for PhD students and ML practitioners around the world.
I recorded the world's most popular Mandarin-language machine learning research MOOCs on Youku and YouTube.
I have published many scholarly papers, some of which were co-authored with some of the world's top ten most influential machine learning researchers.


I am the co-founder of the Deep Learning Sydney meetup, which has over 1,600 members, mostly industry data scientists.
Associate Professor, School of Computing and Communications
Core Member, INEXT - Innovation in IT Services and Applications
Core Member, Global Big Data Technologies
Associate Member, AAI - Advanced Analytics Institute
+61 2 9514 4587

Research Interests

Machine Learning, Data Analytics and Computer Vision
Can supervise: Yes

Conference papers

Qiao, M., Bian, W., Xu, R.Y.D. & Tao, D. 2016, 'Diversified hidden Markov models for sequential labeling', 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016, pp. 1512-1513.
Labeling of sequential data is a prevalent metaproblem in a wide range of real-world applications. A first-order hidden Markov model (HMM) provides a fundamental approach for sequential labeling. However, it does not show satisfactory performance on real-world problems such as optical character recognition (OCR). To address this problem, important extensions of the HMM have been proposed in the literature. A key feature common to these extensions is the incorporation of proper prior information. In this paper, we propose a new extension of the HMM, termed diversified hidden Markov models (dHMM), which incorporates a diversity-encouraging prior. The prior is placed over the state-transition probabilities and thus facilitates more dynamic sequential labeling. Specifically, the diversity is modeled with a continuous determinantal point process. An EM framework for parameter learning and MAP inference is derived, and empirical evaluation on an OCR dataset verifies its effectiveness.
Li, Q., Bian, W., Xu, Y., You, J. & Tao, D. 2016, 'Random Mixed Field Model for Mixed-Attribute Data Restoration', Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, AAAI, Phoenix, Arizona, USA, pp. 1244-1250.
Noisy and incomplete data restoration is a critical preprocessing step in developing effective learning algorithms; it aims to reduce the effect of noise and missing values in data. By utilizing attribute correlations and/or instance similarities, various techniques have been developed for data denoising and imputation. However, existing data restoration methods are either designed for a specific task or incapable of dealing with mixed-attribute data. In this paper, we develop a new probabilistic model that provides a general and principled method for restoring mixed-attribute data. The main contributions of this study are twofold: a) a unified generative model, utilizing a generic random mixed field (RMF) prior, is designed to exploit mixed-attribute correlations; and b) a structured mean-field variational approach is proposed to solve the challenging inference problem of simultaneous denoising and imputation. We evaluate our method through classification experiments on both synthetic data and real benchmark datasets. Experiments demonstrate that our approach effectively improves the classification accuracy of noisy and incomplete data compared with other data restoration methods.
Peng, F., Lu, X., Lu, J., Xu, R.Y.D., Luo, C., Ma, C. & Yang, J. 2016, 'MetricRec: Metric learning for cold-start recommendations', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 445-458.
Making recommendations for new users is a challenging cold-start task because of the absence of historical ratings. When user attributes such as age, occupation and gender are available, new users' preferences can be inferred. Inspired by user-based collaborative filtering in the warm-start scenario, we propose using attribute similarity to make recommendations for new users. Two basic similarity metrics, cosine and Jaccard, are evaluated for the cold-start setting. We also propose a novel recommendation model, MetricRec, that learns an interest-derived metric such that users with similar interests are close to each other in the attribute space. As MetricRec's feasible region is conic, we propose an efficient Interior-point Stochastic Gradient Descent (ISGD) method to optimize it. Throughout optimization, the metric is guaranteed to remain in the feasible region. Owing to its stochastic strategy, ISGD is scalable. Finally, the proposed models are assessed on two movie datasets, Movielens-100K and Movielens-1M. Experimental results demonstrate that MetricRec effectively learns an interest-derived metric that is superior to cosine and Jaccard similarity, and effectively solves the cold-start problem.
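The abstract evaluates cosine and Jaccard similarity on user attributes as cold-start baselines. A minimal sketch of that attribute-similarity idea follows, assuming each user is described by a set of categorical attribute tokens (a hypothetical encoding) and that a new user's score for an item is the similarity-weighted mean rating of the most similar existing users. This is a generic user-based baseline, not MetricRec's learned metric or its ISGD optimizer; `predict_cold_start` and the toy data are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two numeric attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def jaccard(a, b):
    """Jaccard similarity between two sets of categorical attribute tokens."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def predict_cold_start(new_attrs, users, ratings, item, sim=jaccard, k=2):
    """Hypothetical helper: score `item` for a new user as the similarity-weighted
    mean rating of the k most attribute-similar existing users who rated it."""
    rated = [u for u in users if item in ratings[u]]
    top = sorted(rated, key=lambda u: sim(new_attrs, users[u]), reverse=True)[:k]
    weights = [sim(new_attrs, users[u]) for u in top]
    if not top or sum(weights) == 0:
        return None  # no informative neighbours
    return sum(w * ratings[u][item] for w, u in zip(weights, top)) / sum(weights)

# Toy data (hypothetical): attributes encoded as "field:value" tokens.
users = {
    "u1": {"age:25-34", "occ:engineer", "gender:F"},
    "u2": {"age:25-34", "occ:student", "gender:M"},
    "u3": {"age:45-54", "occ:writer", "gender:F"},
}
ratings = {"u1": {"m1": 5}, "u2": {"m1": 3}, "u3": {"m1": 1}}
new_user = {"age:25-34", "occ:engineer", "gender:M"}
score = predict_cold_start(new_user, users, ratings, "m1")
```

Here `u1` and `u2` each share two of four distinct tokens with the new user (Jaccard 0.5) while `u3` shares none, so the prediction averages the ratings of `u1` and `u2`.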
Fan, X., Xu, R.Y.D. & Cao, L. 2016, 'Copula mixed-membership stochastic block model', IJCAI International Joint Conference on Artificial Intelligence, pp. 1462-1468.
The Mixed-Membership Stochastic Blockmodel (MMSB) is a popular framework for modelling social relationships by fully exploiting each individual node's participation (or membership) in a social network. Despite its powerful representations, MMSB assumes that the membership indicators of each pair of nodes (i.e., people) are distributed independently. However, this assumption often does not hold in real-life social networks, in which certain known groups of people may be correlated in terms of factors such as their membership categories. To extend MMSB's ability to model such dependent relationships, this paper introduces a new framework, the Copula Mixed-Membership Stochastic Blockmodel, for modelling intra-group correlations: an individual Copula function jointly models the membership pairs of the nodes within the group of interest. This framework allows various Copula functions to be used on demand, while preserving the marginal distribution of each membership indicator, which is needed for modelling membership indicators with nodes outside the group of interest. Sampling algorithms for both finite and infinite numbers of groups are also detailed. Our experimental results show superior performance in capturing group interactions compared with the baseline models on both synthetic and real-world datasets.
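The copula construction hinges on coupling a pair of membership indicators while leaving each indicator's marginal distribution untouched. A minimal sketch, assuming a Gaussian copula (the paper allows various copula functions) and hypothetical membership weights: correlated normals are mapped to uniforms through the normal CDF, then to categories by inverse-CDF lookup. The full MMSB likelihood and the paper's sampling algorithms are not shown.

```python
import math
import random

def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gaussian_copula_pair(rho, rng):
    """Draw (u1, u2) with uniform marginals, coupled by a Gaussian copula with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return std_normal_cdf(z1), std_normal_cdf(z2)

def membership_from_uniform(u, pi):
    """Inverse-CDF map from a uniform u to a categorical index under weights pi."""
    cum = 0.0
    for k, p in enumerate(pi):
        cum += p
        if u < cum:
            return k
    return len(pi) - 1

rng = random.Random(0)
pi_a = [0.5, 0.3, 0.2]  # hypothetical membership distribution of node a
pi_b = [0.5, 0.3, 0.2]  # hypothetical membership distribution of node b
pairs = [gaussian_copula_pair(0.9, rng) for _ in range(20000)]
draws = [(membership_from_uniform(u1, pi_a), membership_from_uniform(u2, pi_b))
         for u1, u2 in pairs]

# Each node's marginal frequencies should still match its own pi,
# while the pair agrees far more often than independent draws would (0.38 here).
freq_a0 = sum(1 for a, _ in draws if a == 0) / len(draws)
agreement = sum(1 for a, b in draws if a == b) / len(draws)
```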
Li, M., Xu, Y.D. & He, X.J. 2015, 'Face hallucination based on Nonparametric Bayesian learning', Proceedings of IEEE International Conference on Image Processing, IEEE International Conference on Image Processing, IEEE, Quebec City, Canada, pp. 986-990.
In this paper, we propose a novel example-based face hallucination method using nonparametric Bayesian learning, based on the assumption that human faces have similar local pixel structures. We cluster the low-resolution (LR) face image patches with the nonparametric distance-dependent Chinese restaurant process (ddCRP) and calculate the centres of the clusters (i.e., subspaces). Then, we learn the mapping coefficients from the LR patches to the high-resolution (HR) patches in each subspace. Finally, the HR patches of an input low-resolution face image can be efficiently generated by simple linear regression. A spatial distance constraint is employed to aid the learning of subspace centres so that every subspace better reflects the detailed information of image patches. Experimental results show that our method is efficient and promising for face hallucination.
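To illustrate the ddCRP prior used for clustering patches, here is a sketch of its generative step: each item links to another item with probability decreasing in their distance, or to itself with weight `alpha`, and clusters are the connected components of the resulting link graph. The exponential decay, the 1-D toy positions and the helper names are assumptions for illustration; the likelihood over patches, posterior inference and the hallucination regression are not reproduced.

```python
import math
import random

def ddcrp_links(dist, alpha, decay, rng):
    """Sample customer links from the ddCRP prior:
    p(c_i = j) is proportional to decay(d_ij) for j != i, and to alpha for j == i."""
    n = len(dist)
    links = []
    for i in range(n):
        w = [alpha if j == i else decay(dist[i][j]) for j in range(n)]
        links.append(rng.choices(range(n), weights=w, k=1)[0])
    return links

def clusters_from_links(links):
    """Clusters are the connected components of the (undirected) link graph,
    found here with a small union-find."""
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    return [find(i) for i in range(len(links))]

# Hypothetical 1-D "patch" positions forming three loose groups.
positions = [0.0, 0.1, 0.2, 5.0, 5.1, 9.8, 10.0]
dist = [[abs(a - b) for b in positions] for a in positions]
rng = random.Random(3)
links = ddcrp_links(dist, alpha=0.5, decay=lambda d: math.exp(-d), rng=rng)
labels = clusters_from_links(links)
```

Because nearby items are much more likely to link to each other, the sampled components tend to follow the spatial groups, and the number of clusters is not fixed in advance.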
Xuan, J., Lu, J., Zhang, G., Xu, R.Y.D. & Luo, X. 2015, 'Infinite author topic model based on mixed gamma-negative binomial process', Proceedings - IEEE International Conference on Data Mining, ICDM, 2015 IEEE International Conference on Data Mining, IEEE, Atlantic City, USA, pp. 489-498.
Incorporating side information of a text corpus, e.g., authors, time stamps and emotional tags, into traditional text mining models has gained significant interest in information retrieval, statistical natural language processing and machine learning. One branch of this work is the so-called Author Topic Model (ATM), which incorporates the authors' interests as side information into the classical topic model. However, the existing ATM needs to predefine the number of topics, which is difficult and inappropriate in many real-world settings. In this paper, we propose an Infinite Author Topic (IAT) model to resolve this issue. Instead of assigning a discrete probability over a fixed number of topics, we use a stochastic process to determine the number of topics from the data itself. Specifically, we extend a gamma-negative binomial process to three levels in order to capture the author-document-keyword hierarchical structure. Furthermore, each document is assigned a mixed gamma process that accounts for the contributions of its multiple authors. An efficient Gibbs sampling inference algorithm, with each conditional distribution in closed form, is developed for the IAT model. Experiments on several real-world datasets show the capability of our IAT model to learn the hidden topics, the authors' interests in these topics and the number of topics simultaneously.
Bargi, A., Xu, R.Y.D. & Piccardi, M. 2014, 'An Infinite Adaptive Online Learning Model for Segmentation and Classification of Streaming Data', 22nd International Conference on Pattern Recognition (ICPR), 2014, IEEE Computer Society, Stockholm, Sweden.
In recent years, the desire and need to understand streaming data have been increasing. Along with the constant flow of data, it is critical to classify and segment the observations on the fly without being limited to a rigid number of classes. In other words, the system needs to adapt to the streaming data and be capable of updating its parameters to follow natural changes. This problem, however, is poorly addressed in the literature, as many common studies focus on offline classification over a predefined class set. In this paper, we propose a novel adaptive online system based on Markov switching models with hierarchical Dirichlet process priors. This infinite adaptive online approach is capable of segmenting and classifying streaming data over infinite classes, while meeting the memory and delay constraints of streaming contexts. The model is further enhanced by a 'predictive batching' mechanism, which divides the flowing data into batches of variable size, imitating the ground-truth segments. Experiments on two video datasets show strong performance of the proposed approach in frame-level accuracy, segmentation recall and precision, while determining the correct number of classes in acceptable computational time.
Bargi, A., Xu, R.Y.D., Ghahramani, Z. & Piccardi, M. 2014, 'A Non-parametric Conditional Factor Regression Model for Multi-Dimensional Input and Response', Seventeenth International Conference on Artificial Intelligence and Statistics, 2014, JMLR, Reykjavik, Iceland, pp. 77-85.
Bargi, A., Xu, R. & Piccardi, M. 2012, 'An online HDP-HMM for joint action segmentation and classification in motion capture data', 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE Computer Society, Providence RI, USA, pp. 1-7.
Since its inception, action recognition research has mainly focused on recognizing actions from closed, predefined sets of classes. Conversely, the problem of recognizing actions from open, possibly incremental sets of classes is still largely unexplored. In this paper, we propose a novel online method based on the 'sticky' hierarchical Dirichlet process and the hidden Markov model [11, 5]. This approach, labelled the online HDP-HMM, provides joint segmentation and classification of actions while a) processing the data in an online, recursive manner, b) discovering new classes as they occur, and c) adjusting its parameters over the streaming data. In a set of experiments, we applied the online HDP-HMM to recognize actions from motion capture data in the TUM kitchen dataset, a challenging dataset of manipulation actions in a kitchen [12]. The results show high accuracy in action classification, time segmentation and determination of the number of action classes.
Zare Borzeshi, E., Piccardi, M. & Xu, R. 2011, 'A Discriminative Prototype Selection Approach for Graph Embedding in Human Action Recognition', 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshop), IEEE International Conference on Computer Vision Workshops, IEEE Computer Society, Barcelona, Spain, pp. 1295-1301.
This paper proposes a novel graph-based method for representing a human's shape during the performance of an action. Despite their strong representational power, graphs are computationally cumbersome for pattern analysis. One way of circumventing this problem is to transform the graphs into a vector space by means of graph embedding. Such an embedding can be conveniently obtained by way of a set of 'prototype' graphs and a dissimilarity measure; however, the critical step in this approach is the selection of a suitable set of prototypes that captures both the salient structure within each action class and the intra-class variation. This paper proposes a new discriminative approach for the selection of prototypes which maximizes a function of the inter- and intra-class distances. Experiments on an action recognition dataset reported in the paper show that such a discriminative approach outperforms well-established prototype selection methods such as center, border and random prototype selection.
Concha, O.P., Xu, R., Piccardi, M. & Moghaddam, Z. 2011, 'HMM-MIO: An Enhanced Hidden Markov Model for Action Recognition', 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshop, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshop, IEEE Computer Society, Colorado Springs, CO, pp. 62-69.
Generative models can be flexibly employed in a variety of tasks such as classification, detection and segmentation thanks to their explicit modelling of likelihood functions. However, likelihood functions are hard to model accurately in many real cases. In this paper, we present an enhanced hidden Markov model capable of dealing with the noisy, high-dimensional and sparse measurements typical of action feature sets. The modified model, named hidden Markov model with multiple, independent observations (HMM-MIO), combines: a) robustness to observation outliers, b) dimensionality reduction, and c) processing of sparse observations. In the paper, a set of experimental results over the Weizmann and KTH datasets shows that this model can be tuned to achieve classification accuracy comparable to that of discriminative classifiers. While discriminative approaches remain the natural choice for classification tasks, our results show that likelihoods, too, can be modelled to a high level of accuracy. In the near future, we plan to extend HMM-MIO along the lines of infinite Markov models and to integrate it into a switching model for continuous human action recognition.
Zare Borzeshi, E., Xu, R. & Piccardi, M. 2011, 'Automatic Human Action Recognition in Video by Graph Embedding', Lecture Notes in Computer Science. Image Analysis and Processing - ICIAP 2011. 16th International Conference Part II, Image Analysis and Processing – ICIAP 2011, Springer-Verlag, Ravenna, Italy, pp. 19-28.
The problem of human action recognition has received increasing attention in recent years for its importance in many applications. Yet the main limitation of current approaches is that they do not capture well the spatial relationships in the subject performing the action. This paper presents an initial study that uses graphs to represent the actor's shape and graph embedding to convert the graph into a suitable feature vector. In this way, we can benefit from the wide range of statistical classifiers while retaining the strong representational power of graphs. The paper shows that, although the proposed method does not yet achieve accuracy comparable to that of the best existing approaches, the embedded graphs are capable of describing the deformable human shape and its evolution over time. This supports the rationale of the approach and its potential for improved performance in future.
Concha, O.P., Xu, R. & Piccardi, M. 2010, 'Robust Dimensionality Reduction for Human Action Recognition', Proceedings. 2010 Digital Image Computing: Techniques and Applications (DICTA 2010), Digital Image Computing: Techniques and Applications, IEEE Computer Society, Sydney, Australia, pp. 349-356.
Human action recognition can be approached by combining an action-discriminative feature set with a classifier. However, the dimensionality of typical feature sets, combined with that of the time dimension, often leads to a curse-of-dimensionality situation. Moreover, the measurement of the feature set is subject to sometimes severe errors. This paper presents an approach to human action recognition based on robust dimensionality reduction. The observation probabilities of hidden Markov models (HMM) are modelled by mixtures of probabilistic principal component analyzers and mixtures of t-distribution subspaces, and compared with conventional Gaussian mixture models. Experimental results on two datasets show that dimensionality reduction helps improve the classification accuracy and that the heavier-tailed t-distribution can help reduce the impact of outliers generated by segmentation errors.
Concha, O.P., Xu, R. & Piccardi, M. 2010, 'Compressive Sensing of Time Series for Human Action Recognition', Proceedings. 2010 Digital Image Computing: Techniques and Applications (DICTA 2010), Digital Image Computing: Techniques and Applications, IEEE Computer Society, Sydney, Australia, pp. 454-461.
Compressive Sensing (CS) is an emerging signal processing technique in which a sparse signal is reconstructed from a small set of random projections. In the recent literature, CS techniques have demonstrated promising results for signal compression and reconstruction [9, 8, 1]. However, their potential as dimensionality reduction techniques for time series has not been significantly explored to date. To this end, this work investigates the suitability of compressive-sensed time series for human action recognition. In the paper, results from several experiments are presented: (1) in the first set of experiments, the time series are transformed into the CS domain and fed into a hidden Markov model (HMM) for action recognition; (2) in the second set, the time series are explicitly reconstructed after CS compression and then used for recognition; (3) in the third set, the time series are compressed by a hybrid CS-Haar basis prior to input into the HMM; (4) in the fourth set, the time series are reconstructed from the hybrid CS-Haar basis and used for recognition. We further compare these approaches with alternative techniques such as sub-sampling and filtering. Results from our experiments show unequivocally that the application of CS does not degrade the recognition accuracy; rather, it often increases it. This shows that CS can provide a desirable form of dimensionality reduction in pattern recognition over time series.
Benter, A., Xu, R., Moore, W., Antolovich, M. & Gao, J. 2009, 'Fragment size detection within homogeneous material using ground penetrating radar', 2009 International Radar Conference "Surveillance for a Safer World", RADAR 2009.
Ground Penetrating Radar (GPR) offers the ability to observe the internal structure of a pile of rocks. Large fragments within the pile may not be visible on the surface. Determining these large fragment sizes before collection can improve mine productivity. This research has examined the potential to identify objects where the background media and the object exhibit the same dielectric properties. Preliminary results are presented which show identification is possible using standard GPR equipment.
Xu, R.Y.D. & Kemp, M. 2009, 'Multiple curvature based approach to human upper body parts detection with connected ellipse model fine-tuning', Proceedings - International Conference on Image Processing, ICIP, pp. 2577-2580.
In this paper, we discuss an effective method for detecting human upper body parts from a 2D image silhouette using curvature analysis and ellipse fitting. First, we smooth the silhouette so that only the global features are determined: the head, hands and armpits. Next, we reduce the smoothing to detect the local features of the neck and elbows. We model the human upper body with multiple connected ellipses, segment the body using the extracted features, and fit an ellipse to each segment. Lastly, we apply a non-linear least-squares method to minimize the differences between the connected ellipse model and the edge of the silhouette.
Xu, R.Y.D., Brown, J.M., Traish, J.M. & Dezwa, D. 2008, 'A computer vision based camera pedestal's vertical motion control', Proceedings - International Conference on Pattern Recognition.
Traditional camera pedestals are manually operated. Our long-term goal is to construct a fully autonomous pedestal system that can respond to changes in a scene, mimicking a human camera operator. In this paper, we discuss our experiments in controlling the vertical motion of a pedestal by levelling its position with a human head or a tracked hand-held object. We describe a set of computer vision methods used in these experiments, including head position tracking using a Gaussian Mixture Model (GMM) of the foreground blob and hand-held object tracking using Continuously Adaptive Mean Shift (CAM-shift) with motion initialization. We also discuss the application of a Kalman filter and show its effect in reducing the number of jittery pedestal motions.
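The Kalman filter's role in damping jitter can be sketched in one dimension. This is a minimal sketch assuming a scalar random-walk (constant-position) model for the tracked head height and hypothetical noise variances `q` and `r`; the state model actually used for the pedestal is not specified in the abstract.

```python
import random

def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk (constant-position) model.
    q: process-noise variance (assumed), r: measurement-noise variance (assumed)."""
    x, p = x0, p0
    out = []
    for z in measurements:
        # Predict: state carried forward unchanged, uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement via the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        out.append(x)
    return out

def jitter(xs):
    """Total absolute frame-to-frame change, a simple proxy for pedestal jitter."""
    return sum(abs(b - a) for a, b in zip(xs, xs[1:]))

rng = random.Random(1)
true_height = 120.0  # hypothetical head height in pixels
raw = [true_height + rng.gauss(0.0, 5.0) for _ in range(200)]
smoothed = kalman_1d(raw, q=0.01, r=25.0, x0=raw[0])
```

A small `q` relative to `r` makes the filter trust its prediction more than each noisy detection, which suppresses frame-to-frame jitter at the cost of slower response to genuine movement.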
Xu, R.Y.D. 2008, 'A computer vision based whiteboard capture system', 2008 IEEE Workshop on Applications of Computer Vision, WACV.
Conventional whiteboard video capture using a static camera usually results in poor quality. In this paper, we present an autonomous whiteboard scan-and-capture prototype system, which consists of a pair of static and Pan-Tilt-Zoom (PTZ) cameras. The PTZ camera is used to scan newly updated whiteboard regions without interrupting the instructor. We illustrate several computer vision techniques used in our system. Firstly, we present our unique camera calibration method using rough hand-drawn gridlines. Secondly, we present the image processing methods used to determine which newly updated whiteboard region should be scanned. Our method also accounts for occlusion of the whiteboard region by the instructor.
Gao, J. & Xu, R.Y. 2007, 'Mixture of the robust L1 distributions and its applications', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 26-35.
Recently, a robust probabilistic L1-PCA model was introduced in [1] by replacing the conventional Gaussian noise model with the Laplacian L1 model. Owing to the heavy-tailed characteristics of the L1 distribution, the proposed model is more robust against data outliers. In this paper, we generalize L1-PCA into a mixture of L1 distributions so that the model can be used for data that may contain multiple clusters. For model learning, we use the property that the L1 density can be expanded as a superposition of an infinite number of Gaussian densities, which yields tractable Bayesian learning and inference based on a variational EM-type algorithm.
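The property the abstract relies on, that the L1 (Laplace) density is an infinite superposition of Gaussians, can be checked numerically. For Laplace(0, b), drawing a variance v from an exponential distribution with mean 2b^2 and then x from N(0, v) reproduces the Laplace distribution. The sketch below samples this way and compares moments with the known values E|x| = b and Var[x] = 2b^2; it illustrates the identity only, not the paper's variational EM for the mixture model.

```python
import math
import random

def sample_laplace_via_mixture(b, rng):
    """Draw from Laplace(0, b) as an infinite scale mixture of Gaussians:
    v ~ Exponential(mean 2 b^2), then x | v ~ Normal(0, v)."""
    v = rng.expovariate(1.0 / (2.0 * b * b))  # expovariate takes the rate, i.e. 1 / mean
    return rng.gauss(0.0, math.sqrt(v))       # gauss takes the standard deviation

rng = random.Random(42)
b = 1.5
xs = [sample_laplace_via_mixture(b, rng) for _ in range(100000)]
mean_abs = sum(abs(x) for x in xs) / len(xs)  # Laplace: E|x| = b = 1.5
var = sum(x * x for x in xs) / len(xs)        # Laplace: Var[x] = 2 b^2 = 4.5
```

Conditioning on the latent variance v is what makes each step Gaussian, which is why variational EM-type updates become tractable.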
Xu, R.Y.D. & Jin, J.S. 2006, 'Individual object interaction for camera control and multimedia synchronization', ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
In recent times, most computer-vision-assisted automatic camera control policies have been based on human events, such as changes in speaker position. In addition to these events, in this paper we introduce a set of natural camera control and multimedia synchronization schemes based on individual object interaction. We present our methods in detail, including head-pose calculation and laser pointer guidance, which are used to estimate the region of interest (ROI) for both hand-held objects and objects at a distance. From our results, we explain how this set of approaches achieves robustness, efficiency and unambiguous object interaction during real-time video shooting.

Journal articles

Fan, X., Xu, R.Y.D., Cao, L. & Song, Y. 2017, 'Learning Nonparametric Relational Models by Conjugately Incorporating Node Information in a Network', IEEE Transactions on Cybernetics, vol. 47, no. 3, pp. 589-599.
Relational model learning is useful for numerous practical applications. Many algorithms have been proposed in recent years to tackle this important yet challenging problem. Existing algorithms utilize only binary directional link data to recover hidden network structures. However, far richer and more meaningful information exists in other parts of a network, which one can (and should) exploit. The attributes associated with each node, for instance, contain crucial information to help practitioners understand the underlying relationships in a network. For this reason, in this paper, we propose two models and their solutions, namely the node-information involved mixed-membership model and the node-information involved latent-feature model, in an effort to systematically incorporate additional node information. To achieve this aim effectively, node information is used to generate individual sticks of a stick-breaking process. In this way, not only can we avoid the need to prespecify the number of communities, but the algorithm also gives nodes exhibiting similar information a higher chance of being assigned the same community membership. Substantial efforts have been made toward achieving the appropriateness and efficiency of these models, including the use of conjugate priors. We evaluate our framework and its inference algorithms using real-world data sets, which show the generality and effectiveness of our models in capturing implicit network structures.
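The stick-breaking process is what lets these models avoid fixing the number of communities in advance. A sketch of the plain, truncated construction follows, where each stick fraction is drawn from Beta(1, alpha); how the paper uses node information to generate individual sticks is not reproduced, and the truncation level `k_max` is an assumption for illustration.

```python
import random

def stick_breaking(alpha, k_max, rng):
    """Truncated stick-breaking (GEM) weights:
    beta_k ~ Beta(1, alpha), w_k = beta_k * prod_{l<k} (1 - beta_l)."""
    weights = []
    remaining = 1.0  # length of stick not yet broken off
    for _ in range(k_max):
        beta = rng.betavariate(1.0, alpha)
        weights.append(remaining * beta)
        remaining *= 1.0 - beta
    return weights, remaining  # 'remaining' is the mass beyond the truncation

weights, leftover = stick_breaking(alpha=2.0, k_max=50, rng=random.Random(7))
```

Larger `alpha` breaks off smaller pieces on average, spreading mass over more communities; the leftover mass shrinks quickly as `k_max` grows.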
Feng, X., Wan, W., Xu, R.Y.D., Chen, H., Li, P. & Sánchez, J.A. 2017, 'A perceptual quality metric for 3D triangle meshes based on spatial pooling', Frontiers of Computer Science.
Lu, J., Xuan, J., Zhang, G., Xu, Y.D. & Luo, X. 2017, 'Bayesian Nonparametric Relational Topic Model through Dependent Gamma Processes', IEEE Transactions on Knowledge and Data Engineering, pp. 1-14.
Peng, F., Lu, J., Wang, Y., Xu, R.Y.D., Ma, C. & Yang, J. 2016, 'N-dimensional Markov random field prior for cold-start recommendation', Neurocomputing, vol. 191, pp. 187-199.
A recommender system is a commonly used technique for improving user experience in e-commerce applications. One popular recommender method is Matrix Factorization (MF), which learns latent profiles of both users and items. However, if historical ratings are not available, the latent profile is drawn from a zero-mean Gaussian prior, resulting in uninformative recommendations. To deal with this issue, we propose using an n-dimensional Markov random field as the prior of matrix factorization (called mrf-MF). In the Markov random field, the attribute (such as the age and occupation of users, and the genre and release year of items) is treated as the site and the latent profile as the random variable. Through the prior, new users or items are recommended according to their neighbors. The proposed model is suitable for three types of cold-start recommendation: (1) recommending new items to existing users; (2) recommending new users for existing items; (3) recommending new items to new users. The proposed model is assessed on two movie datasets, Movielens 100K and Movielens 1M. Experimental results show that it can effectively solve each of the three cold-start problems and outperforms several matrix-factorization-based methods.
Li, J., Zhao, B., Deng, C. & Xu, R.Y.D. 2016, 'Time Varying Metric Learning for visual tracking', Pattern Recognition Letters, vol. 80, pp. 157-164.
Traditional tracking-by-detection methods treat the separation of target and background as a binary classification problem. This two-class classification approach suffers from two main drawbacks. Firstly, the learning result may be unreliable when the number of training samples is not large enough. Secondly, the binary classifier tends to drift because of complex background conditions during tracking. In this paper, we propose a new model called Time Varying Metric Learning (TVML) for visual tracking. We adopt the Wishart process to model the time-varying metrics for target features, and apply the Recursive Bayesian Estimation (RBE) framework to learn the metric from the data under side-information constraints. Metric learning with side information avoids clustering the negative samples, which is preferable in complex background tracking scenarios. The recursive Bayesian model ensures that the learned metric is accurate with limited training samples. The experimental results demonstrate that the TVML tracker performs comparably to state-of-the-art methods, especially when there is background clutter.
Qiao, M., Xu, R.Y.D., Bian, W. & Tao, D. 2016, 'Fast sampling for time-varying Determinantal Point Processes', ACM Transactions on Knowledge Discovery from Data, vol. 11, no. 1.
Determinantal Point Processes (DPPs) are stochastic models that assign each subset of a base dataset a probability proportional to the subset's degree of diversity. DPPs have been shown to be particularly appropriate for data subset selection and summarization (e.g., news display, video summarization). DPPs favor diverse subsets, a property that other conventional models cannot offer. However, DPP inference algorithms have polynomial time complexity, which makes it difficult to handle large and time-varying datasets, especially when real-time processing is required. To address this limitation, we developed a fast sampling algorithm for DPPs that takes advantage of the nature of some time-varying data (e.g., updating news corpora, evolving communication networks), where the changes between time stamps are relatively small. The proposed algorithm is built upon the simplification of marginal density functions over successive time stamps and the sequential Monte Carlo (SMC) sampling technique. Evaluations on both a real-world news dataset and the Enron Corpus confirm the efficiency of the proposed algorithm.
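A DPP in its common L-ensemble form assigns a subset Y probability det(L_Y) / det(L + I), where L_Y is the kernel restricted to Y, so subsets of mutually similar items receive low probability. The brute-force sketch below, with a hypothetical 3-item kernel, makes that concrete; it is exponential in the number of items and is not the paper's fast SMC sampler for time-varying data.

```python
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion (adequate for tiny matrices)."""
    n = len(m)
    if n == 0:
        return 1.0  # det of the empty matrix, used for the empty subset
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def dpp_prob(L, subset):
    """P(Y = subset) = det(L_Y) / det(L + I) for an L-ensemble DPP."""
    n = len(L)
    sub = [[L[i][j] for j in subset] for i in subset]
    l_plus_i = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return det(sub) / det(l_plus_i)

# Hypothetical similarity kernel: items 0 and 1 are near-duplicates, item 2 differs.
L = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.1],
     [0.1, 0.1, 1.0]]
all_subsets = [s for r in range(len(L) + 1) for s in combinations(range(len(L)), r)]
```

The diverse pair (0, 2) has det 0.99 while the near-duplicate pair (0, 1) has det 0.19, so the DPP strongly prefers the former; summing over all subsets gives exactly 1 because the subset determinants add up to det(L + I).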
Kemp, M. & Xu, R.Y.D. 2015, 'Geometrically-constrained balloon fitting for multiple connected ellipses', Pattern Recognition, vol. 48, no. 7, pp. 2198-2208.
Copyright © 2015 Published by Elsevier Ltd. All rights reserved. This paper presents a framework for fitting data to a model consisting of multiple connected ellipses. At each iteration of the fitting algorithm, the representation of the multiple ellipses is mapped to a Gaussian mixture model (GMM), and the connections are mapped to geometric constraints on the GMM. The fitting is a modified constrained expectation maximisation (EM) method on the GMM (maximising with respect to the ellipse parameters rather than the Gaussian parameters). A key modification is that the precision of the chosen GMM is increased at each iteration. This is similar to slowly inflating a set of connected balloons, hence the name balloon fitting. Extensions of the framework to other constraints and possible pre-processing are also discussed. The superiority of balloon fitting is demonstrated through experiments on several silhouettes with noisy edges, comparing balloon fitting and some of its extensions with other existing methods.
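The mapping from ellipses to a GMM, together with a precision that grows between iterations, can be sketched as follows. This is an assumption-based illustration, not the paper's algorithm: the covariance mapping `R diag(a^2, b^2) R^T / k` and the precision factor `k` are hypothetical choices (k = 4 gives the covariance of a uniform density over the ellipse interior), mixture weights are taken as equal, and only the E-step is shown. The constrained M-step over ellipse parameters is omitted.

```python
import numpy as np

def ellipse_to_gaussian(cx, cy, a, b, theta, k):
    """Map an ellipse (centre, semi-axes a and b, rotation theta) to a 2-D Gaussian.

    k is a hypothetical precision factor: covariance = R diag(a^2, b^2) R^T / k.
    """
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    cov = R @ np.diag([a * a, b * b]) @ R.T / k
    return np.array([cx, cy]), cov

def log_gaussian(points, mean, cov):
    diff = points - mean
    mahal = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return -0.5 * mahal - np.log(2.0 * np.pi) - 0.5 * np.log(np.linalg.det(cov))

def responsibilities(points, ellipses, k):
    """E-step: soft assignment of each point to each ellipse-derived Gaussian (equal weights)."""
    logd = np.column_stack([log_gaussian(points, *ellipse_to_gaussian(*e, k)) for e in ellipses])
    logd -= logd.max(axis=1, keepdims=True)  # log-sum-exp stabilisation
    w = np.exp(logd)
    return w / w.sum(axis=1, keepdims=True)

# Usage: two touching ellipses; the middle point is ambiguous at low precision.
ellipses = [(0.0, 0.0, 2.0, 1.0, 0.0), (3.0, 0.0, 1.0, 1.0, 0.0)]
pts = np.array([[0.0, 0.0], [1.5, 0.0], [3.0, 0.5]])
r4 = responsibilities(pts, ellipses, 4.0)
r16 = responsibilities(pts, ellipses, 16.0)  # higher precision: sharper assignments
```

Raising `k` shrinks every Gaussian, so each point's assignment becomes more decisive from one iteration to the next.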
Qiao, M., Bian, W., Xu, R.Y.D. & Tao, D. 2015, 'Diversified Hidden Markov Models for Sequential Labeling', IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 11, pp. 2947-2960.
© 2015 IEEE. Labeling of sequential data is a prevalent meta-problem for a wide range of real-world applications. While the first-order Hidden Markov Model (HMM) provides a fundamental approach for unsupervised sequential labeling, the basic model does not perform satisfactorily when it is directly applied to real-world problems such as part-of-speech tagging (PoS tagging) and optical character recognition (OCR). Aiming at improving performance, important extensions of HMM have been proposed in the literature. One of the common key features in these extensions is the incorporation of proper prior information. In this paper, we propose a new extension of HMM, termed diversified Hidden Markov Models (dHMM), which uses a diversity-encouraging prior over the state-transition probabilities and thus facilitates more dynamic sequential labeling. Specifically, the diversity is modeled by a continuous determinantal point process prior, which we apply to both unsupervised and supervised scenarios. Learning and inference algorithms for dHMM are derived. Empirical evaluations on benchmark datasets for unsupervised PoS tagging and supervised OCR confirm the effectiveness of dHMM, with performance competitive with the state of the art.
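A diversity-encouraging prior over transition rows can be sketched as the log-determinant of a kernel matrix between the rows of the transition matrix, in the spirit of a DPP. The kernel used below, the Bhattacharyya coefficient K_ij = sum_k sqrt(A_ik A_jk), is an assumption chosen for illustration because it gives K_ii = 1 for rows that sum to one and approaches 1 when two rows coincide; it is not necessarily the kernel used in dHMM. The function name and the small jitter `eps` are hypothetical.

```python
import numpy as np

def diversity_log_prior(A, eps=1e-12):
    """Log-determinant of a kernel between rows of a transition matrix A.

    Assumed kernel: K_ij = sum_k sqrt(A_ik * A_jk) (Bhattacharyya coefficient).
    Similar rows make K nearly singular, so the log-determinant drops sharply.
    """
    S = np.sqrt(A)
    K = S @ S.T  # positive semidefinite: inner products of square-rooted rows
    _, logdet = np.linalg.slogdet(K + eps * np.eye(A.shape[0]))
    return logdet

# Usage: strongly self-persistent, distinct rows vs. three nearly identical rows.
A_diverse = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
A_similar = np.array([[0.4, 0.3, 0.3], [0.35, 0.35, 0.3], [0.3, 0.35, 0.35]])
lp_diverse = diversity_log_prior(A_diverse)
lp_similar = diversity_log_prior(A_similar)
```

In a MAP-EM scheme, a term such as `lam * diversity_log_prior(A)`, with a hypothetical weight `lam`, would be added to the expected complete-data log-likelihood when updating A, pushing states toward distinct transition behavior.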
Fan, X., Cao, L. & Xu, R.Y.D. 2014, 'Dynamic Infinite Mixed-Membership Stochastic Blockmodel', IEEE Transactions on Neural Networks and Learning Systems.
Directional and pairwise measurements are often used to model interactions in a social network setting. The mixed-membership stochastic blockmodel (MMSB) was a seminal work in this area, and its capabilities have since been extended. However, models such as MMSB face particular challenges in modeling dynamic networks, for example when the number of communities is unknown. Accordingly, this paper proposes a dynamic infinite mixed-membership stochastic blockmodel, a generalized framework that extends the existing work to potentially infinite communities inside a network in dynamic settings (i.e., networks observed over time). Additional model parameters are introduced to reflect the degree of persistence in each node's memberships at consecutive time stamps. Under this framework, two specific models, namely the mixture time variant and mixture time invariant models, are proposed to depict two different time correlation structures. Two effective posterior sampling strategies are presented, and their results are reported on both synthetic and real-world data.
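The generative process of a mixed-membership stochastic blockmodel, with community weights drawn by truncated stick-breaking to approximate a potentially unbounded number of communities, can be sketched as follows. This is a static, finite-truncation illustration under stated assumptions: the truncation level `K`, the concentration values, the block matrix `B` and the use of `alpha * beta` as the Dirichlet parameter are hypothetical choices, and neither the time dynamics nor the persistence parameters of the proposed model are included.

```python
import numpy as np

def stick_breaking(gamma, K, rng):
    """Truncated GEM(gamma) weights over K communities (approximating infinitely many)."""
    v = rng.beta(1.0, gamma, size=K)
    v[-1] = 1.0  # close the stick so the truncated weights sum to 1
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining

def sample_mmsb(n_nodes, beta, alpha, B, rng):
    """Sample a directed adjacency matrix from a mixed-membership stochastic blockmodel."""
    K = len(beta)
    pi = rng.dirichlet(alpha * beta, size=n_nodes)  # per-node membership vectors
    E = np.zeros((n_nodes, n_nodes), dtype=int)
    for i in range(n_nodes):
        for j in range(n_nodes):
            if i == j:
                continue  # no self-interactions
            s = rng.choice(K, p=pi[i])  # community the sender acts from, for this pair
            r = rng.choice(K, p=pi[j])  # community the receiver acts from, for this pair
            E[i, j] = rng.binomial(1, B[s, r])
    return pi, E

# Usage: 20 nodes, truncation at 10 communities, assortative block matrix.
rng = np.random.default_rng(1)
beta = stick_breaking(1.0, 10, rng)
B = 0.05 + 0.85 * np.eye(10)  # dense within communities, sparse between them
pi, E = sample_mmsb(20, beta, 5.0, B, rng)
```

Because each pair draws fresh community indicators, a node can interact through different communities with different partners, which is what "mixed membership" refers to.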
Zare Borzeshi, E., Concha, O.P., Xu, R. & Piccardi, M. 2013, 'Joint Action Segmentation and Classification by an Extended Hidden Markov Model', IEEE Signal Processing Letters, vol. 20, no. 12, pp. 1207-1210.
Hidden Markov models (HMMs) provide joint segmentation and classification of sequential data by efficient inference algorithms and have therefore been employed in fields as diverse as speech recognition, document processing, and genomics. However, conven…
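The joint segmentation and classification mentioned above comes from decoding a single most-likely state path. The sketch below is the standard Viterbi algorithm for a plain first-order HMM in log space, plus a hypothetical helper that turns the frame-level path into labelled segments. It is not the extended HMM proposed in the letter, whose details are not given in this truncated abstract, and the toy parameters are made up.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most likely state path for a first-order HMM, computed in log space.

    log_pi: (S,) initial log-probabilities
    log_A:  (S, S) transition log-probabilities, log_A[i, j] = log P(j | i)
    log_B:  (T, S) per-frame observation log-likelihoods
    """
    T, S = log_B.shape
    delta = log_pi + log_B[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A  # scores[i, j]: best path ending in i, then i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):  # follow back-pointers from the last frame
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def segments(path):
    """Group a frame-level label path into (label, start, end) runs, end inclusive."""
    out, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            out.append((path[start], start, t - 1))
            start = t
    return out

# Usage: two "sticky" classes; evidence favours class 0, then class 1.
log_pi = np.log([0.5, 0.5])
log_A = np.log([[0.9, 0.1], [0.1, 0.9]])
log_B = np.log([[0.8, 0.2]] * 3 + [[0.2, 0.8]] * 3)
path = viterbi(log_pi, log_A, log_B)
segs = segments(path)
```

One decoding pass yields both the segment boundaries (where the label changes) and the class of each segment. Here `path` is `[0, 0, 0, 1, 1, 1]` and `segs` is `[(0, 0, 2), (1, 3, 5)]`.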
Xu, R. & Kemp, M. 2010, 'Fitting Multiple Connected Ellipses To An Image Silhouette Hierarchically', IEEE Transactions on Image Processing, vol. 19, no. 7, pp. 1673-1682.
In this paper, we seek to fit a model, specified in terms of connected ellipses, to an image silhouette. Some algorithms that have attempted this problem are sensitive to initial guesses and also may converge to a wrong solution when they attempt to mini…
Xu, R. & Kemp, M. 2010, 'An Iterative Approach for Fitting Multiple Connected Ellipse Structure to Silhouette', Pattern Recognition Letters, vol. 31, no. 13, pp. 1860-1867.
In many image processing applications, the structures conveyed in an image contour can be described by a set of connected ellipses. Previous methods for aligning a connected-ellipse structure with a contour generally lack a continuous solution space. In addition, the solution obtained often fits only some of the ellipses well, leaving the others poorly fitted. In this paper, we address these two problems by presenting an iterative framework for fitting a pre-specified connected-ellipse structure to a 2D silhouette contour, starting from a very coarse initial guess. Under the proposed framework, we first improve the initial guess by modelling the silhouette region as a set of disconnected ellipses, using either a mixture of Gaussian densities or heuristic approaches. Then, an iterative method is applied in a similar fashion to the Iterative Closest Point (ICP) algorithm (Alshawa, 2007; Li and Griffiths, 2000; Besl and McKay, 1992). Each iteration has two parts. The first part, which we refer to as the segmentation step, assigns every contour point to one of the individual unconnected ellipses. The second part, which we refer to as the minimization step, is a non-linear least-squares procedure that minimizes both the sum of squared distances between the contour points and the ellipses' edges and the distances between the ellipses' vertex pair(s). We illustrate the effectiveness of our method through experimental results on several images and by applying the algorithm to a small database of human upper-body images.
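The two steps of each iteration can be sketched as follows. Several simplifications are assumptions made for illustration: the point-to-ellipse distance is measured along the ray from the ellipse centre (an approximation to the true orthogonal distance), the connected vertices are taken to be ends of the major (a) axis, and the `joints` list format and `weight` parameter are hypothetical. The residual vector returned by `residuals` could be handed, after flattening the ellipse parameters into one vector, to a general non-linear least-squares solver such as `scipy.optimize.least_squares`.

```python
import numpy as np

def radial_distance(points, ellipse):
    """Distance from each point to the ellipse edge along the ray from the centre.

    ellipse = (cx, cy, a, b, theta). This radial distance approximates the
    orthogonal point-to-ellipse distance and is used here for simplicity.
    """
    cx, cy, a, b, theta = ellipse
    c, s = np.cos(theta), np.sin(theta)
    dx, dy = points[:, 0] - cx, points[:, 1] - cy
    u, v = c * dx + s * dy, -s * dx + c * dy  # rotate into the ellipse frame
    phi = np.arctan2(v, u)
    boundary = 1.0 / np.hypot(np.cos(phi) / a, np.sin(phi) / b)  # edge radius along phi
    return np.abs(np.hypot(u, v) - boundary)

def vertex(ellipse, sign=1.0):
    """End point of the a-axis: centre + sign * a * (cos theta, sin theta)."""
    cx, cy, a, b, theta = ellipse
    return np.array([cx + sign * a * np.cos(theta), cy + sign * a * np.sin(theta)])

def segment(points, ellipses):
    """Segmentation step: label each contour point with its closest ellipse."""
    d = np.column_stack([radial_distance(points, e) for e in ellipses])
    return d.argmin(axis=1)

def residuals(ellipses, points, labels, joints, weight=1.0):
    """Residuals for the minimization step; their squared sum is the objective.

    joints: hypothetical list of (i, sign_i, j, sign_j) vertex pairs that should coincide.
    """
    res = [radial_distance(points[labels == k], e) for k, e in enumerate(ellipses)]
    for i, si, j, sj in joints:
        res.append(np.sqrt(weight) * (vertex(ellipses[i], si) - vertex(ellipses[j], sj)))
    return np.concatenate(res)

# Usage: two ellipses joined end to end at (2, 0); contour points lie exactly on them.
ellipses = [(0.0, 0.0, 2.0, 1.0, 0.0), (4.0, 0.0, 2.0, 1.0, 0.0)]
pts = np.array([[0.0, 1.0], [-2.0, 0.0], [6.0, 0.0], [4.0, 1.0]])
labels = segment(pts, ellipses)
res = residuals(ellipses, pts, labels, joints=[(0, 1.0, 1, -1.0)])
```

With a perfect fit, every residual is zero; in practice a solver would adjust the ellipse parameters to shrink both the contour distances and the vertex gap, after which the segmentation step is repeated.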