As an Associate Professor and Director of Industry Analytics and Visualisation, and the creator of the Faculty of Engineering & IT's DataLounge initiative, I constantly navigate between the demands of developing world-renowned machine learning research and finding practical applications for it in real industry scenarios.
In this role I manage a team of 25 academics, postdoctoral researchers, PhD students and engineers, who apply their minds and talents to an array of industry projects for government and business.
Aside from my duties as Industry Director, I also publish a series of Statistics, Probability and Machine Learning (including Deep Learning) courses for PhD students and ML practitioners around the world.
I have published many scholarly papers, some of them co-authored with researchers ranked among the world's ten most influential in machine learning.
I am the co-founder of the Deep Learning Sydney meetup, which has more than 2,500 members, mostly industry data scientists.
Can supervise: YES
Machine Learning, Deep Learning, Data Analytics and Computer Vision
- Deep Learning
- Machine Learning
Feng, X, Wan, W, Xu, RYD, Chen, H, Li, P & Sánchez, JA 2018, 'A perceptual quality metric for 3D triangle meshes based on spatial pooling', Frontiers of Computer Science.
Bargi, A, Xu, YD & Piccardi, M 2018, 'AdOn HDP-HMM: An Adaptive Online Model for Segmentation and Classification of Sequential Data', IEEE Transactions on Neural Networks and Learning Systems, pp. 3953-3968.
Yang, W, Li, J, Zheng, H & Xu, RYD 2018, 'A Nuclear Norm Based Matrix Regression Based Projections Method for Feature Extraction', IEEE Access, vol. 6, pp. 7445-7451.
In the traditional graph embedding framework, the graph is usually built by k-NN or r-ball. Since it is difficult to manually set the parameters k and r in a high-dimensional space, sparse representation-based methods are usually introduced to build the graphs automatically. In recent years, nuclear norm-based matrix regression (NMR) has been proposed for face recognition, using low-rank structural information (i.e., an image matrix-based error model). Inspired by NMR, we propose an NMR-based projections (NMRP) method for feature extraction and recognition. Experiments on the FERET and extended Yale B face databases show that NMR can be used to build the graph and that NMRP is an effective feature extraction method.
Feng, X, Wan, W, Xu, RYD, Perry, S, Zhu, S & Liu, Z 2018, 'A new mesh visual quality metric using saliency weighting-based pooling strategy', Graphical Models, vol. 99, pp. 1-12.
Several metrics have been proposed over the last decade to assess the visual quality of 3D triangular meshes. In this paper, we propose a mesh visual quality metric that integrates mesh saliency into mesh visual quality assessment. We use the Tensor-based Perceptual Distance Measure metric to estimate the local distortions of the mesh, and pool these local distortions into a quality score using a saliency weighting-based pooling strategy. Three well-known mesh saliency detection methods are used to demonstrate the superiority and effectiveness of our metric. Experimental results show that our metric, with any of the three saliency maps, performs better than state-of-the-art metrics on the LIRIS/EPFL general-purpose database. We also generate a synthetic saliency map by assembling salient regions from the individual saliency maps. Experimental results reveal that the synthetic saliency map achieves better performance than the individual saliency maps, and that the performance gain is closely correlated with the similarity between the individual saliency maps.
Lu, J, Xuan, J, Zhang, G, Xu, YD & Luo, X 2017, 'Bayesian Nonparametric Relational Topic Model through Dependent Gamma Processes', IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 7, pp. 1357-1369.
Traditional relational topic models provide a successful way to discover the hidden topics in a document network. Many theoretical and practical tasks, such as dimensionality reduction, document clustering, and link prediction, can benefit from this revealed knowledge. However, existing relational topic models assume that the number of hidden topics is known a priori, which is impractical in many real-world applications. To relax this assumption, in this paper we propose a nonparametric relational topic model that uses stochastic processes instead of fixed-dimensional probability distributions. Specifically, each document is assigned a Gamma process, which represents the topic interest of that document. Although this method provides an elegant solution, it brings additional challenges in mathematically modeling the inherent network structure of a typical document network, i.e., that two spatially closer documents tend to have more similar topics. Furthermore, we require that the topics be shared by all documents. To resolve these challenges, we use a subsampling strategy to assign each document a different Gamma process derived from the global Gamma process, and the subsampling probabilities of the documents are assigned a Markov Random Field constraint that inherits the document network structure. Through the designed posterior inference algorithm, we can discover the hidden topics and their number simultaneously. Experimental results on both synthetic and real-world network datasets demonstrate the model's capability to learn the hidden topics and, more importantly, the number of topics.
Existing active contour methods suffer from initialization sensitivity, slow convergence, and poor performance in the presence of image noise and inhomogeneity. To address these problems, this paper proposes a region-scalable active contour model with global constraint (RSGC). The energy function is formulated by incorporating local and global constraints. The local constraint is a region-scalable fitting term that draws on local region information at controllable scales. The global constraint is constructed by estimating the global intensity distribution of the image content. Specifically, the global intensity distribution is approximated with a Gaussian Mixture Model (GMM) and estimated by the Expectation Maximization (EM) algorithm as a prior. Segmentation is performed by optimizing the improved energy function. Compared with two other representative models, i.e., the region-scalable fitting model (RSF) and the active contour model without edges (CV), the proposed RSGC model achieves more efficient, stable and precise results on most test images under the joint action of the local and global constraints.
Li, J, Deng, C, Xu, RYD, Tao, D & Zhao, B 2017, 'Robust Object Tracking with Discrete Graph-Based Multiple Experts', IEEE Transactions on Image Processing, vol. 26, no. 6, pp. 2736-2750.
Variations in target appearance due to illumination changes, heavy occlusions, and target deformations are the major causes of tracking drift. In this paper, we show that tracking drift can be effectively corrected by exploiting the relationship between the current tracker and its historical tracker snapshots. A multi-expert framework is formed by the current tracker and its historically trained snapshots. The proposed scheme is formulated as a unified discrete graph optimization problem whose nodes are modeled by the hypotheses of the multiple experts. Furthermore, the discrete graph has an exact solution that gives the object state estimate at each time step. With the unary and binary compatibility graph scores properly defined, the proposed framework corrects tracker drift by selecting the best expert hypothesis, which implicitly analyzes the recent performance of the multiple experts by evaluating graph scores only at the current frame. Three base trackers are integrated into the proposed framework to validate its effectiveness. We first integrate the online SVM on a budget algorithm into the framework, with significant improvement. Then, regression correlation filters with hand-crafted features and with deep convolutional neural network features, respectively, are introduced to further boost tracking performance. The three proposed trackers are extensively evaluated on three data sets: TB-50, TB-100, and VOT2015. The experimental results demonstrate the excellent performance of the proposed approaches against state-of-the-art methods.
Liu, W, Luo, X, Zhang, J, Xue, R & Xu, RYD 2017, 'Semantic summary automatic generation in news event', Concurrency and Computation: Practice and Experience, vol. 29, no. 24, pp. 1-5.
Generating summaries with novel and rich semantics is a challenging issue in multi-document automatic summarization. In this paper, a core semantics extraction model (CSEM) is proposed to improve the novelty and richness of the semantics in multi-document summaries. First, to enrich the semantics, semantic units, which are groups of association relations between keywords, are used to express the semantics of texts. Second, to improve novelty, an attenuation function is introduced to adjust the importance of semantic units according to the number of times they appear in the candidate summary sentences. Third, to maximize the novelty and richness of the summary, summary generation is converted into an optimization problem: finding a set of sentences with higher importance. Finally, CSEM extracts the smallest number of sentences that covers the most core semantics in the corpus as the summary. Experimental results on the DUC 2004 benchmark show that our model outperforms state-of-the-art approaches (e.g., OCCAMS_V, JS-Gen-2) under the official metric. In particular, the ROUGE-1 recall of our model is 40.684%, higher than that of other approaches (e.g., OCCAMS_V 38.497% and JS-Gen-2 36.739%).
Xuan, J, Lu, J, Zhang, G, Xu, RYD & Luo, X 2017, 'A Bayesian nonparametric model for multi-label learning', Machine Learning, vol. 106, no. 11, pp. 1787-1815.
Multi-label learning has become a significant learning paradigm in the past few years due to its broad application scenarios and the ever-increasing number of techniques developed in this area. Among existing state-of-the-art works, generative statistical models are characterized by good generalization ability and robustness on large numbers of labels, achieved by learning a low-dimensional label embedding. However, one issue with this branch of models is that the number of dimensions must be fixed in advance, which is difficult and inappropriate in many real-world settings. In this paper, we propose a Bayesian nonparametric model to resolve this issue. More specifically, we extend a Gamma-negative binomial process to three levels in order to capture the label-instance-feature structure. Furthermore, a mixing strategy for Gamma processes is designed to account for the multiple labels of an instance. Because the mixed process makes model inference difficult, an efficient Gibbs sampling inference algorithm is developed. Experiments on several real-world datasets show the performance of the proposed model on multi-label learning tasks, compared with three state-of-the-art models from the literature.
Fan, X, Xu, RYD, Cao, L & Song, Y 2017, 'Learning Nonparametric Relational Models by Conjugately Incorporating Node Information in a Network', IEEE Transactions on Cybernetics, vol. 47, no. 3, pp. 589-599.
Relational model learning is useful for numerous practical applications, and many algorithms have been proposed in recent years to tackle this important yet challenging problem. Existing algorithms use only binary directional link data to recover hidden network structures. However, other parts of a network contain far richer and more meaningful information that one can (and should) exploit. The attributes associated with each node, for instance, contain crucial information that helps practitioners understand the underlying relationships in a network. For this reason, in this paper we propose two models and their solutions, namely the node-information involved mixed-membership model and the node-information involved latent-feature model, to systematically incorporate additional node information. To achieve this, node information is used to generate individual sticks of a stick-breaking process. In this way, not only is there no need to prespecify the number of communities, but nodes exhibiting similar information also have a higher chance of being assigned the same community membership. Substantial effort has gone into making these models appropriate and efficient, including the use of conjugate priors. We evaluate our framework and its inference algorithms on real-world data sets, and the results show the generality and effectiveness of our models in capturing implicit network structures.
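The stick-breaking process mentioned above can be illustrated with its standard GEM(α) form, in which each stick proportion is drawn from Beta(1, α) and breaks off a fraction of whatever stick length remains. This is a minimal sketch only: the models in this paper generate each node's sticks from its attribute information, which is not reproduced here, and the function name `stick_breaking`, the truncation level and the value of α are illustrative assumptions.

```python
import random


def stick_breaking(alpha, num_sticks, rng=None):
    """Truncated GEM(alpha) weights: pi_k = v_k * prod_{j<k} (1 - v_j), v_k ~ Beta(1, alpha).

    Generic construction only; it does not condition the sticks on node attributes.
    """
    rng = rng or random.Random()
    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick to break off
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


weights = stick_breaking(alpha=2.0, num_sticks=20, rng=random.Random(0))
# First few community weights, and the mass left over for communities beyond the truncation
print([round(w, 3) for w in weights[:5]], 1.0 - sum(weights))
```

Larger values of α leave more mass for later sticks, so more communities receive non-negligible weight; the leftover mass `1 - sum(weights)` is what an untruncated process would spread over the remaining, unboundedly many communities, which is why the number of communities need not be fixed in advance.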
Traditional tracking-by-detection methods treat separating the target from the background as a binary classification problem. This two-class approach suffers from two main drawbacks. First, the learning result may be unreliable when the number of training samples is not large enough. Second, the binary classifier tends to drift under complex background conditions. In this paper, we propose a new model called Time Varying Metric Learning (TVML) for visual tracking. We adopt the Wishart process to model the time-varying metrics for target features, and apply the Recursive Bayesian Estimation (RBE) framework to learn the metric from the data with a 'side information constraint'. Metric learning with side information avoids clustering the negative samples, which is preferable in complex background tracking scenarios. The recursive Bayesian model ensures that the learned metric is accurate with limited training samples. Experimental results show that the TVML tracker performs comparably to state-of-the-art methods, especially in the presence of background clutter.
Qiao, M, Xu, RYD, Bian, W & Tao, D 2016, 'Fast sampling for time-varying Determinantal Point Processes', ACM Transactions on Knowledge Discovery from Data, vol. 11, no. 1.
Determinantal Point Processes (DPPs) are stochastic models that assign each subset of a base dataset a probability proportional to the subset's degree of diversity. DPPs have been shown to be particularly appropriate for data subset selection and summarization (e.g., news display, video summarization), because they favor diverse subsets in a way that other conventional models cannot. However, DPP inference algorithms have polynomial time complexity, which makes it difficult to handle large and time-varying datasets, especially when real-time processing is required. To address this limitation, we developed a fast sampling algorithm for DPPs that takes advantage of the nature of some time-varying data (e.g., updating news corpora, evolving communication networks), where the changes between time stamps are relatively small. The proposed algorithm is built on simplifying the marginal density functions over successive time stamps and on the sequential Monte Carlo (SMC) sampling technique. Evaluations on both a real-world news dataset and the Enron Corpus confirm the efficiency of the proposed algorithm.
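To make "probability proportional to the subset's degree of diversity" concrete, here is a sketch of the standard L-ensemble formulation, where P(S) = det(L_S) / det(L + I) and L_S is the kernel restricted to the items in S. It enumerates every subset exactly, so it is only usable for a handful of items and is not the fast SMC sampler described above; the 3×3 kernel values are made up for illustration.

```python
from itertools import combinations


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting (fine for tiny matrices)."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0  # the empty (0 x 0) matrix has determinant 1
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def dpp_subset_probabilities(L):
    """Exact L-ensemble DPP: P(S) = det(L_S) / det(L + I), by brute-force enumeration."""
    n = len(L)
    L_plus_I = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    normalizer = det(L_plus_I)
    probs = {}
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            L_S = [[L[i][j] for j in subset] for i in subset]
            probs[subset] = det(L_S) / normalizer
    return probs


# Hypothetical similarity kernel: items 0 and 1 are near-duplicates, item 2 is different
L = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
probs = dpp_subset_probabilities(L)
print(probs[(0, 1)], probs[(0, 2)])  # the diverse pair (0, 2) is far more likely
```

The pair of near-duplicates has det(L_S) = 1 - 0.81 = 0.19, while the diverse pair has 1 - 0.01 = 0.99, which is the diversity preference in action. The exponential number of subsets here is why practical DPP samplers work with eigendecompositions or, as in this paper, with approximations that exploit small changes between time stamps.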
Peng, F, Lu, J, Wang, Y, Xu, RYD, Ma, C & Yang, J 2016, 'N-dimensional Markov random field prior for cold-start recommendation', Neurocomputing, vol. 191, pp. 187-199.
A recommender system is a commonly used technique for improving user experience in e-commerce applications. One popular recommender method is Matrix Factorization (MF), which learns latent profiles of both users and items. However, if historical ratings are not available, the latent profile is drawn from a zero-mean Gaussian prior, resulting in uninformative recommendations. To deal with this issue, we propose using an n-dimensional Markov random field as the prior of matrix factorization (called mrf-MF). In the Markov random field, the attributes (such as the age and occupation of users, and the genre and release year of items) are treated as the sites, and the latent profiles as the random variables. Through this prior, recommendations for new users or items are made according to their neighbors. The proposed model is suitable for three types of cold-start recommendation: (1) recommending new items to existing users; (2) recommending new users for existing items; (3) recommending new items to new users. The proposed model is assessed on two movie datasets, Movielens 100K and Movielens 1M. Experimental results show that it effectively solves each of the three cold-start problems and outperforms several matrix factorization based methods.
Fan, X, Cao, L & Xu, RYD 2015, 'Dynamic Infinite Mixed-Membership Stochastic Blockmodel', IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 9, pp. 2072-2085.
Directional and pairwise measurements are often used to model interactions in a social network setting. The mixed-membership stochastic blockmodel (MMSB) was a seminal work in this area, and its capabilities have since been extended. However, models such as MMSB face particular challenges in modeling dynamic networks, for example when the number of communities is unknown. Accordingly, this paper proposes a dynamic infinite mixed-membership stochastic blockmodel, a generalized framework that extends existing work to potentially infinite communities inside a network in dynamic settings (i.e., networks observed over time). Additional model parameters are introduced to reflect the degree of persistence of a node's memberships across consecutive time stamps. Under this framework, two specific models, namely the mixture time variant and mixture time invariant models, are proposed to depict two different time correlation structures. Two effective posterior sampling strategies are presented, together with their results on synthetic and real-world data.
Kemp, M & Xu, RYD 2015, 'Geometrically-constrained balloon fitting for multiple connected ellipses', Pattern Recognition, vol. 48, no. 7, pp. 2198-2208.
This paper presents a framework for fitting data to a model consisting of multiple connected ellipses. In each iteration of the fitting algorithm, the representation of the multiple ellipses is mapped to a Gaussian mixture model (GMM), and the connections are mapped to geometric constraints on the GMM. The fitting is a modified constrained expectation maximisation (EM) method on the GMM, maximising with respect to the ellipse parameters rather than the Gaussian parameters. A key modification is that the precision of the chosen GMM is increased at each iteration. This is similar to slowly inflating a bunch of connected balloons, hence the name balloon fitting. Extensions of the framework to other constraints, and possible pre-processing, are also discussed. Experiments on several silhouettes with noisy edges, comparing balloon fitting and some of its extensions with other existing methods, demonstrate the superiority of balloon fitting.
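For readers unfamiliar with the EM step that balloon fitting modifies, the sketch below shows ordinary, unconstrained EM for a one-dimensional Gaussian mixture: the E-step computes each component's responsibility for each point, and the M-step re-estimates weights, means and variances from those responsibilities. It deliberately omits everything specific to the paper (ellipse parameterisation, geometric constraints, the increasing-precision schedule); the data, initial values and the variance floor `min_var` are illustrative assumptions.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, n_iter=50, min_var=1e-6):
    """Plain (unconstrained) EM for a one-dimensional Gaussian mixture."""
    means, variances, weights = list(means), list(variances), list(weights)
    k, n = len(means), len(data)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixing weights, means and variances
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor to avoid a collapsing component
    return means, variances, weights


data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_gmm_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
print(means, variances, weights)  # means close to 1.0 and 5.0, variances close to 0.02
```

In the paper's setting the M-step would instead maximise over ellipse parameters subject to the connection constraints, and the component precision would be raised between iterations rather than re-estimated freely.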
Qiao, M, Bian, W, Xu, RYD & Tao, D 2015, 'Diversified Hidden Markov Models for Sequential Labeling', IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 11, pp. 2947-2960.
Labeling of sequential data is a prevalent meta-problem for a wide range of real-world applications. While the first-order hidden Markov model (HMM) provides a fundamental approach to unsupervised sequential labeling, the basic model does not perform satisfactorily when applied directly to real-world problems such as part-of-speech (PoS) tagging and optical character recognition (OCR). To improve performance, important extensions of the HMM have been proposed in the literature, and a common key feature of these extensions is the incorporation of proper prior information. In this paper, we propose a new extension of the HMM, termed diversified hidden Markov models (dHMM), which uses a diversity-encouraging prior over the state-transition probabilities and thus facilitates more dynamic sequential labellings. Specifically, the diversity is modeled by a continuous determinantal point process prior, which we apply to both unsupervised and supervised scenarios. Learning and inference algorithms for dHMM are derived. Empirical evaluations on benchmark datasets for unsupervised PoS tagging and supervised OCR confirm the effectiveness of dHMM, with performance competitive with the state of the art.
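As background for the labeling task above, here is the textbook Viterbi decoder for a plain first-order HMM, which returns the most probable state (label) sequence for an observation sequence. It does not include the dHMM's determinantal point process prior on the transition probabilities; the two-state model and all its numbers are made-up toy values.

```python
def viterbi(obs, start, trans, emit):
    """Most likely state sequence for a first-order HMM (plain Viterbi, no priors)."""
    n_states = len(start)
    # delta[s]: probability of the best state path that ends in state s at the current step
    delta = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    backpointers = []
    for o in obs[1:]:
        new_delta, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda r: delta[r] * trans[r][s])
            pointers.append(best_prev)
            new_delta.append(delta[best_prev] * trans[best_prev][s] * emit[s][o])
        delta = new_delta
        backpointers.append(pointers)
    # Trace the best path backwards from the most probable final state
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]


# Toy two-state model with two observation symbols (all numbers are made up)
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
path, prob = viterbi([0, 0, 1, 1], start, trans, emit)
print(path, prob)  # [0, 0, 1, 1] and roughly 0.0392
```

For realistic sequence lengths the products should be replaced by sums of log-probabilities to avoid numerical underflow; the decoding logic is otherwise unchanged.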
Zare Borzeshi, E, Concha, OP, Xu, R & Piccardi, M 2013, 'Joint Action Segmentation and Classification by an Extended Hidden Markov Model', IEEE Signal Processing Letters, vol. 20, no. 12, pp. 1207-1210.
Hidden Markov models (HMMs) provide joint segmentation and classification of sequential data by efficient inference algorithms and have therefore been employed in fields as diverse as speech recognition, document processing, and genomics. However, conven
Xu, R & Kemp, M 2010, 'Fitting Multiple Connected Ellipses To An Image Silhouette Hierarchically', IEEE Transactions on Image Processing, vol. 19, no. 7, pp. 1673-1682.
In this paper, we seek to fit a model, specified in terms of connected ellipses, to an image silhouette. Some algorithms that have attempted this problem are sensitive to initial guesses and also may converge to a wrong solution when they attempt to mini
Xu, R & Kemp, M 2010, 'An Iterative Approach for Fitting Multiple Connected Ellipse Structure to Silhouette', Pattern Recognition Letters, vol. 31, no. 13, pp. 1860-1867.
In many image processing applications, the structures conveyed in an image contour can be described by a set of connected ellipses. Previous methods for aligning a connected ellipse structure with a contour generally lack a continuous solution space. In addition, the solution obtained often fits only some of the ellipses, leaving others with poor fits. In this paper, we address these two problems by presenting an iterative framework for fitting a 2D silhouette contour to a pre-specified connected ellipse structure from a very coarse initial guess. Under the proposed framework, we first improve the initial guess by modelling the silhouette region as a set of disconnected ellipses, using a mixture of Gaussian densities or heuristic approaches. Then, an iterative method is applied in a similar fashion to the Iterative Closest Point (ICP) algorithm (Alshawa, 2007; Li and Griffiths, 2000; Besl and McKay, 1992). Each iteration contains two parts: the first assigns all contour points to the individual unconnected ellipses, which we refer to as the segmentation step; the second is a non-linear least-squares approach that minimizes both the sum of squared distances between the contour points and the ellipses' edges and the distances between the ellipses' vertex pair(s), which we refer to as the minimization step. We illustrate the effectiveness of our methods through experimental results on several images, and by applying the algorithm to a mini database of human upper-body images.
Qiao, M, Bian, W, Xu, RYD & Tao, D 2016, 'Diversified hidden Markov models for sequential labeling', 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016, pp. 1512-1513.
Labeling of sequential data is a prevalent meta-problem in a wide range of real-world applications. A first-order hidden Markov model (HMM) provides a fundamental approach to sequential labeling. However, it does not perform satisfactorily on real-world problems such as optical character recognition (OCR). To address this problem, important extensions of the HMM have been proposed in the literature, and a common key feature of these extensions is the incorporation of proper prior information. In this paper, we propose a new extension of the HMM, termed diversified hidden Markov models (dHMM), which incorporates a diversity-encouraging prior. The prior is placed over the state-transition probabilities and thus facilitates more dynamic sequential labelling. Specifically, the diversity is modeled with a continuous determinantal point process. An EM framework for parameter learning and MAP inference is derived, and an empirical evaluation on an OCR dataset verifies its effectiveness.
Li, Q, Bian, W, Xu, Y, You, J & Tao, D 2016, 'Random Mixed Field Model for Mixed-Attribute Data Restoration', Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, AAAI, Phoenix, Arizona, USA, pp. 1244-1250.
Noisy and incomplete data restoration is a critical preprocessing step in developing effective learning algorithms; it aims to reduce the effect of noise and missing values in data. By exploiting attribute correlations and/or instance similarities, various techniques have been developed for data denoising and imputation. However, existing data restoration methods are either designed for a particular task or incapable of dealing with mixed-attribute data. In this paper, we develop a new probabilistic model that provides a general and principled method for restoring mixed-attribute data. The main contributions of this study are twofold: a) a unified generative model, using a generic random mixed field (RMF) prior, is designed to exploit mixed-attribute correlations; and b) a structured mean-field variational approach is proposed to solve the challenging inference problem of simultaneous denoising and imputation. We evaluate our method by classification experiments on both synthetic data and real benchmark datasets. The experiments demonstrate that our approach can effectively improve the classification accuracy of noisy and incomplete data compared with other data restoration methods.
Peng, F, Lu, X, Lu, J, Xu, RYD, Luo, C, Ma, C & Yang, J 2016, 'MetricRec: Metric learning for cold-start recommendations', Advanced Data Mining and Applications (LNAI), International Conference on Advanced Data Mining and Applications, Springer, Gold Coast, QLD, Australia, pp. 445-458.
Making recommendations for new users is a challenging cold-start task because of the absence of historical ratings. When user attributes such as age, occupation and gender are available, new users' preferences can be inferred. Inspired by user-based collaborative filtering in the warm-start scenario, we propose using attribute similarity to make recommendations for new users. Two basic similarity metrics, cosine and Jaccard, are evaluated for cold-start. We also propose a novel recommendation model, MetricRec, that learns an interest-derived metric such that users with similar interests are close to each other in the attribute space. Because MetricRec's feasible region is conic, we propose an efficient Interior-point Stochastic Gradient Descent (ISGD) method to optimize it. During optimization, the metric is guaranteed to remain in the feasible region, and owing to its stochastic strategy, ISGD is scalable. Finally, the proposed models are assessed on two movie datasets, Movielens-100K and Movielens-1M. Experimental results demonstrate that MetricRec can effectively learn an interest-derived metric that is superior to cosine and Jaccard, and can solve the cold-start problem effectively.
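The attribute-similarity baseline described above can be sketched as user-based collaborative filtering in which the usual rating-based similarity is replaced by a similarity between user attributes. In this sketch, attributes are encoded as sets of hypothetical `"field:value"` tokens for Jaccard, or as numeric vectors for cosine; the users, ratings and the helper `predict_cold_start` are illustrative assumptions, and MetricRec's learned metric and ISGD optimizer are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two numeric attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard similarity between two sets of attribute tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_cold_start(new_user, users, item, sim=jaccard):
    """Similarity-weighted average of existing users' ratings for `item` (hypothetical helper)."""
    num = den = 0.0
    for attrs, ratings in users:
        if item in ratings:
            w = sim(new_user, attrs)
            num += w * ratings[item]
            den += w
    return num / den if den else None


# Hypothetical users: attribute tokens plus their known ratings
users = [
    ({"age:18-24", "occ:student", "gender:F"}, {"movie_a": 5, "movie_b": 2}),
    ({"age:45-49", "occ:engineer", "gender:M"}, {"movie_a": 1, "movie_b": 4}),
]
new_user = {"age:18-24", "occ:student", "gender:M"}
print(predict_cold_start(new_user, users, "movie_a"))  # about 3.86, pulled toward the similar student
```

The new user shares two of four distinct tokens with the first user (Jaccard 0.5) and one of five with the second (Jaccard 0.2), so the prediction leans toward the first user's rating. A learned metric such as MetricRec's would replace these fixed similarity functions with one fitted to users' observed interests.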
Fan, X, Xu, RYD & Cao, L 2016, 'Copula mixed-membership stochastic block model', IJCAI International Joint Conference on Artificial Intelligence, International Joint Conference on Artificial Intelligence, AAAI Press / International Joint Conferences on Artificial Intelligence, New York City, New York, United States, pp. 1462-1468.
The Mixed-Membership Stochastic Blockmodel (MMSB) is a popular framework for modelling social relationships by fully exploiting each individual node's participation (or membership) in a social network. Despite its powerful representations, MMSB assumes that the membership indicators of each pair of nodes (i.e., people) are distributed independently. However, this assumption often does not hold in real-life social networks, in which certain known groups of people may be correlated in terms of factors such as their membership categories. To extend MMSB's ability to model such dependent relationships, this paper introduces a new framework, the Copula Mixed-Membership Stochastic Blockmodel, for modelling intra-group correlations, in which an individual Copula function jointly models the membership pairs of the nodes within the group of interest. This framework allows various Copula functions to be used on demand, while maintaining the marginal distribution of the membership indicators needed for modelling memberships with nodes outside the group of interest. Sampling algorithms for both finite and infinite numbers of groups are also detailed. Our experimental results show superior performance in capturing group interactions compared with the baseline models on both synthetic and real-world datasets.
Li, M, Xu, YD & He, XJ 2015, 'Face hallucination based on Nonparametric Bayesian learning', Proceedings of IEEE International Conference on Image Processing, IEEE International Conference on Image Processing, IEEE, Quebec City, Canada, pp. 986-990.
In this paper, we propose a novel example-based face hallucination method using nonparametric Bayesian learning, based on the assumption that human faces have similar local pixel structures. We cluster the low-resolution (LR) face image patches with the nonparametric distance-dependent Chinese restaurant process (ddCRP) and calculate the cluster centres (i.e., subspaces). Then, we learn the mapping coefficients from the LR patches to high-resolution (HR) patches in each subspace. Finally, the HR patches of an input low-resolution face image can be generated efficiently by simple linear regression. A spatial distance constraint is employed to aid the learning of the subspace centres, so that each subspace better reflects the detailed information of the image patches. Experimental results show that our method is efficient and promising for face hallucination.
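The ddCRP used above clusters items through "customer links": each item links to another item with probability proportional to a decreasing function of their distance, or to itself with weight α, and clusters are the connected components of the resulting link graph. The sketch below samples from that prior only. It assumes 2D toy points rather than image patches and an exponential decay f(d) = exp(-d / decay), which is one common choice; the function names are hypothetical, and no posterior inference or LR-to-HR regression is included.

```python
import math
import random


def link_probabilities(i, positions, alpha, decay):
    """ddCRP prior for customer i: link to j with weight f(d_ij), to itself with weight alpha."""
    weights = []
    for j, p_j in enumerate(positions):
        if j == i:
            weights.append(alpha)  # self-link: i may start its own cluster
        else:
            d = math.dist(positions[i], p_j)
            weights.append(math.exp(-d / decay))  # assumed exponential decay function
    total = sum(weights)
    return [w / total for w in weights]


def sample_links(positions, alpha, decay, rng):
    n = len(positions)
    return [
        rng.choices(range(n), weights=link_probabilities(i, positions, alpha, decay))[0]
        for i in range(n)
    ]


def clusters_from_links(links):
    """Clusters are the connected components of the customer-link graph (union-find)."""
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    return [find(i) for i in range(len(links))]


positions = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
links = sample_links(positions, alpha=1.0, decay=1.0, rng=random.Random(1))
print(links, clusters_from_links(links))  # nearby points tend to end up in the same cluster
```

Unlike the ordinary Chinese restaurant process, the number of clusters is not fixed in advance and the distance function directly shapes which items group together, which is what allows a spatial constraint to influence the learned subspaces.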
Xuan, J, Lu, J, Zhang, G, Xu, RYD & Luo, X 2015, 'Infinite author topic model based on mixed gamma-negative binomial process', Proceedings - IEEE International Conference on Data Mining, ICDM, IEEE International Conference on Data Mining, IEEE, Atlantic City, USA, pp. 489-498.
Incorporating side information of a text corpus, i.e., authors, time stamps, and emotional tags, into traditional text mining models has gained significant interest in information retrieval, statistical natural language processing, and machine learning. One branch of this work is the so-called Author Topic Model (ATM), which incorporates the authors' interests as side information into the classical topic model. However, the existing ATM needs to predefine the number of topics, which is difficult and inappropriate in many real-world settings. In this paper, we propose an Infinite Author Topic (IAT) model to resolve this issue. Instead of assigning a discrete probability to a fixed number of topics, we use a stochastic process to determine the number of topics from the data itself. Specifically, we extend a gamma-negative binomial process to three levels in order to capture the author-document-keyword hierarchical structure. Furthermore, each document is assigned a mixed gamma process that accounts for the contributions of multiple authors to the document. An efficient Gibbs sampling inference algorithm, with each conditional distribution in closed form, is developed for the IAT model. Experiments on several real-world datasets show the capability of our IAT model to learn the hidden topics, the authors' interests in these topics, and the number of topics simultaneously.
Bargi, A, Xu, RYD & Piccardi, M 2014, 'An Infinite Adaptive Online Learning Model for Segmentation and Classification of Streaming Data', Proceedings - International Conference on Pattern Recognition, International Conference on Pattern Recognition, IEEE Computer Society, Stockholm, Sweden.
In recent years, the desire and need to understand streaming data have been increasing. Along with the constant flow of data, it is critical to classify and segment the observations on the fly without being limited to a rigid number of classes. In other words, the system needs to adapt to the streaming data and be capable of updating its parameters to comply with natural changes. This problem, however, is poorly addressed in the literature, as many common studies focus on offline classification over a predefined class set. In this paper, we propose a novel adaptive online system based on Markov switching models with hierarchical Dirichlet process priors. This infinite adaptive online approach is capable of segmenting and classifying streaming data over infinite classes, while meeting the memory and delay constraints of streaming contexts. The model is further enhanced by a 'predictive batching' mechanism that divides the flowing data into batches of variable size, imitating the ground-truth segments. Experiments on two video datasets show strong performance of the proposed approach in frame-level accuracy and in segmentation recall and precision, while determining the correct number of classes in acceptable computational time.
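The on-the-fly requirement means each new observation must update the model's beliefs recursively, with constant memory, rather than re-processing the whole stream. As a minimal sketch of that recursive idea only, the code below runs a standard HMM forward filter with a fixed, known number of states and Gaussian emissions; it is not the paper's HDP-based model (which also grows the number of classes), and all names and parameter values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Univariate normal density."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def forward_step(belief, trans, likelihoods):
    """One recursive filtering step: predict with the transition matrix,
    weight by the current observation likelihoods, then renormalise."""
    k = len(belief)
    predicted = [sum(belief[i] * trans[i][j] for i in range(k)) for j in range(k)]
    unnormalised = [predicted[j] * likelihoods[j] for j in range(k)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def online_segment(stream, means, std, trans, init):
    """Label each observation as it arrives with its most probable state.
    Only the current belief vector is kept, so memory does not grow with the stream."""
    belief = list(init)
    labels = []
    for x in stream:
        likelihoods = [gaussian_pdf(x, m, std) for m in means]
        belief = forward_step(belief, trans, likelihoods)
        labels.append(max(range(len(belief)), key=belief.__getitem__))
    return labels


trans = [[0.9, 0.1], [0.1, 0.9]]
print(online_segment([0.1, -0.2, 0.0, 5.1, 4.9, 5.2], means=[0.0, 5.0], std=1.0,
                     trans=trans, init=[0.5, 0.5]))  # [0, 0, 0, 1, 1, 1]
```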
Bargi, A, Xu, RYD, Ghahramani, Z & Piccardi, M 2014, 'A Non-parametric Conditional Factor Regression Model for Multi-Dimensional Input and Response', Journal of Machine Learning Research, International Conference on Artificial Intelligence and Statistics, JMLR, Reykjavik, Iceland, pp. 77-85.
Bargi, A, Xu, R & Piccardi, M 2012, 'An online HDP-HMM for joint action segmentation and classification in motion capture data', 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, Providence, RI, USA, pp. 1-7.
Since its inception, action recognition research has mainly focused on recognizing actions from closed, predefined sets of classes. Conversely, the problem of recognizing actions from open, possibly incremental sets of classes is still largely unexplored. In this paper, we propose a novel online method based on the 'sticky' hierarchical Dirichlet process and the hidden Markov model [11, 5]. This approach, labelled the online HDP-HMM, provides joint segmentation and classification of actions while a) processing the data in an online, recursive manner, b) discovering new classes as they occur, and c) adjusting its parameters over the streaming data. In a set of experiments, we have applied the online HDP-HMM to recognize actions from motion capture data in the TUM kitchen dataset, a challenging dataset of manipulation actions in a kitchen. The results show high accuracy in action classification, time segmentation and determination of the number of action classes.
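The "sticky" variant adds extra prior mass to self-transitions so that the model prefers persistent states over rapid switching. A minimal sketch, assuming the common finite weak-limit approximation with L states (an assumption; the paper's online inference is not shown), draws global weights beta ~ Dir(gamma/L, ..., gamma/L) and each transition row pi_j ~ Dir(alpha * beta + kappa * e_j), where kappa is the self-transition bias. The function names are hypothetical.

```python
import random


def dirichlet(params, rng):
    """Sample a Dirichlet vector by normalising independent Gamma(a_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]


def sticky_transition_prior(num_states, alpha, gamma, kappa, rng):
    """Weak-limit sticky HDP-HMM prior:
    beta ~ Dir(gamma/L, ..., gamma/L); row j ~ Dir(alpha * beta + kappa * e_j)."""
    beta = dirichlet([gamma / num_states] * num_states, rng)
    rows = []
    for j in range(num_states):
        # max(..., 1e-12) guards against a beta_k that underflowed to 0.0,
        # because gammavariate requires a strictly positive shape.
        params = [max(alpha * b, 1e-12) + (kappa if k == j else 0.0)
                  for k, b in enumerate(beta)]
        rows.append(dirichlet(params, rng))
    return beta, rows


rng = random.Random(1)
beta, rows = sticky_transition_prior(5, alpha=5.0, gamma=5.0, kappa=50.0, rng=rng)
print([round(rows[j][j], 2) for j in range(5)])  # self-transition probabilities
```

Setting kappa to 0 recovers the ordinary (non-sticky) weak-limit HDP-HMM prior, in which each row's expected self-transition probability is just beta_j.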
Zare Borzeshi, E, Piccardi, M & Xu, R 2011, 'A Discriminative Prototype Selection Approach for Graph Embedding in Human Action Recognition', Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshop), IEEE International Conference on Computer Vision, IEEE Computer Society, Barcelona, Spain, pp. 1295-1301.
This paper proposes a novel graph-based method for representing a human's shape during the performance of an action. Despite their strong representational power, graphs are computationally cumbersome for pattern analysis. One way of circumventing this problem is to transform the graphs into a vector space by means of graph embedding. Such an embedding can be conveniently obtained by way of a set of 'prototype' graphs and a dissimilarity measure; however, the critical step in this approach is the selection of a suitable set of prototypes that captures both the salient structure within each action class and the intra-class variation. This paper proposes a new discriminative approach for the selection of prototypes which maximizes a function of the inter- and intra-class distances. Experiments on an action recognition dataset show that this discriminative approach outperforms well-established prototype selection methods such as center, border and random prototype selection.
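The embedding maps each graph to the vector of its dissimilarities to the chosen prototypes. The sketch below shows that map and a greedy selection that maximises mean inter-class distance minus mean intra-class distance in the embedded space. This particular objective, the greedy search, and the use of 1-D numbers with absolute difference as a stand-in for graphs and a graph edit distance are all illustrative assumptions, not the paper's exact criterion.

```python
import itertools
import math


def embed(item, prototypes, dissim):
    """Dissimilarity-space embedding: one coordinate per prototype."""
    return [dissim(item, p) for p in prototypes]


def separation_score(items, labels, prototypes, dissim):
    """Mean inter-class distance minus mean intra-class distance in the embedded space.
    Assumes at least two classes and at least one class with two or more items."""
    vectors = [embed(x, prototypes, dissim) for x in items]
    inter, intra = [], []
    for i, j in itertools.combinations(range(len(items)), 2):
        d = math.dist(vectors[i], vectors[j])
        (intra if labels[i] == labels[j] else inter).append(d)
    return sum(inter) / len(inter) - sum(intra) / len(intra)


def greedy_select(items, labels, dissim, m):
    """Greedily add the candidate prototype that most improves the separation score."""
    chosen = []
    for _ in range(m):
        remaining = [c for c in range(len(items)) if c not in chosen]
        best = max(remaining, key=lambda c: separation_score(
            items, labels, [items[k] for k in chosen + [c]], dissim))
        chosen.append(best)
    return chosen


def abs_diff(a, b):
    """Toy dissimilarity standing in for a graph edit distance."""
    return abs(a - b)


items = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
labels = [0, 0, 0, 1, 1, 1]
protos = greedy_select(items, labels, abs_diff, m=2)
print(protos, [embed(x, [items[k] for k in protos], abs_diff) for x in items])
```

Once prototypes are fixed, any vector-space classifier can be trained on the embedded vectors, which is the practical benefit the abstract describes.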
Concha, OP, Xu, R, Piccardi, M & Moghaddam, Z 2011, 'HMM-MIO: An Enhanced Hidden Markov Model for Action Recognition', 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshop, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshop, IEEE Computer Society, Colorado Springs, CO, pp. 62-69.
Generative models can be flexibly employed in a variety of tasks such as classification, detection and segmentation thanks to their explicit modelling of likelihood functions. However, likelihood functions are hard to model accurately in many real cases. In this paper, we present an enhanced hidden Markov model capable of dealing with the noisy, high-dimensional and sparse measurements typical of action feature sets. The modified model, named hidden Markov model with multiple, independent observations (HMM-MIO), combines: a) robustness to observation outliers, b) dimensionality reduction, and c) processing of sparse observations. In the paper, a set of experimental results over the Weizmann and KTH datasets shows that this model can be tuned to achieve classification accuracy comparable to that of discriminative classifiers. While discriminative approaches remain the natural choice for classification tasks, our results show that likelihoods, too, can be modelled to a high level of accuracy. In the near future, we plan to extend HMM-MIO along the lines of infinite Markov models and to integrate it into a switching model for continuous human action recognition.
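One reading of "multiple, independent observations" is that the emission likelihood factorises over separate observation components, so a component that is absent at a given frame can simply be dropped from the product. The sketch below shows that factorisation with per-component Gaussian densities and `None` marking a missing (sparse) measurement; the Gaussian choice and the function names are hypothetical, and the paper's robust, dimensionality-reduced densities are not reproduced.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log-density of a univariate normal."""
    z = (x - mean) / std
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * z * z


def emission_loglik(observations, means, stds):
    """log p(o_t | state) as a sum over independent components.
    Components that are None (not observed at this frame) contribute nothing."""
    total = 0.0
    for x, m, s in zip(observations, means, stds):
        if x is None:
            continue
        total += gaussian_logpdf(x, m, s)
    return total


state_means, state_stds = [0.0, 1.0, 2.0], [1.0, 1.0, 1.0]
print(emission_loglik([0.1, None, 2.2], state_means, state_stds))
```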
Zare Borzeshi, E, Xu, R & Piccardi, M 2011, 'Automatic Human Action Recognition in Video by Graph Embedding', Lecture Notes in Computer Science. Image Analysis and Processing - ICIAP 2011. 16th International Conference, Part II, International Conference on Image Analysis and Processing, Springer-Verlag, Ravenna, Italy, pp. 19-28.
The problem of human action recognition has received increasing attention in recent years for its importance in many applications. Yet, the main limitation of current approaches is that they do not capture the spatial relationships in the subject performing the action well. This paper presents an initial study that uses graphs to represent the actor's shape and graph embedding to convert the graph into a suitable feature vector. In this way, we can benefit from the wide range of statistical classifiers while retaining the strong representational power of graphs. The paper shows that, although the proposed method does not yet achieve accuracy comparable to that of the best existing approaches, the embedded graphs are capable of describing the deformable human shape and its evolution over time. This supports the rationale of the approach and its potential for improved performance in future work.
Concha, OP, Xu, R & Piccardi, M 2010, 'Robust Dimensionality Reduction for Human Action Recognition', Proceedings. 2010 Digital Image Computing: Techniques and Applications (DICTA 2010), Digital Image Computing: Techniques and Applications, IEEE Computer Society, Sydney, Australia, pp. 349-356.
Human action recognition can be approached by combining an action-discriminative feature set with a classifier. However, the dimensionality of typical feature sets, combined with that of the time dimension, often leads to a curse-of-dimensionality situation. Moreover, the measurement of the feature set is subject to sometimes severe errors. This paper presents an approach to human action recognition based on robust dimensionality reduction. The observation probabilities of hidden Markov models (HMMs) are modelled by mixtures of probabilistic principal component analyzers and mixtures of t-distribution subspaces, and compared with conventional Gaussian mixture models. Experimental results on two datasets show that dimensionality reduction helps improve classification accuracy and that the heavier-tailed t-distribution can help reduce the impact of outliers generated by segmentation errors.
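The robustness argument rests on tail behaviour: an outlier far from the mean is penalised quadratically (in log-density) by a Gaussian but only logarithmically by a Student-t. A small numeric sketch of the two univariate log-densities makes the gap concrete; the degrees-of-freedom value nu = 3 is an arbitrary illustrative choice, not a value from the paper.

```python
import math


def gaussian_logpdf(x, mean, scale):
    """Log-density of a univariate normal."""
    z = (x - mean) / scale
    return -0.5 * math.log(2.0 * math.pi) - math.log(scale) - 0.5 * z * z


def student_t_logpdf(x, mean, scale, nu):
    """Log-density of a location-scale Student-t with nu degrees of freedom."""
    z = (x - mean) / scale
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(scale)
            - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))


for x in (0.0, 3.0, 10.0):
    print(x, round(gaussian_logpdf(x, 0.0, 1.0), 2),
          round(student_t_logpdf(x, 0.0, 1.0, 3.0), 2))
```

At x = 10 the Gaussian log-density is roughly 40 nats lower than the t log-density, so a single segmentation outlier can dominate a Gaussian likelihood but not a t likelihood.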
Concha, OP, Xu, R & Piccardi, M 2010, 'Compressive Sensing of Time Series for Human Action Recognition', Proceedings. 2010 Digital Image Computing: Techniques and Applications (DICTA 2010), Digital Image Computing: Techniques and Applications, IEEE Computer Society, Sydney, Australia, pp. 454-461.
Compressive Sensing (CS) is an emerging signal processing technique in which a sparse signal is reconstructed from a small set of random projections. In the recent literature, CS techniques have demonstrated promising results for signal compression and reconstruction [9, 8, 1]. However, their potential as dimensionality reduction techniques for time series has not been significantly explored to date. To this end, this work investigates the suitability of compressive-sensed time series in an application of human action recognition. In the paper, results from several experiments are presented: (1) in a first set of experiments, the time series are transformed into the CS domain and fed into a hidden Markov model (HMM) for action recognition; (2) in a second set, the time series are explicitly reconstructed after CS compression and then used for recognition; (3) in a third set, the time series are compressed by a hybrid CS-Haar basis prior to input into the HMM; (4) in a fourth set, the time series are reconstructed from the hybrid CS-Haar basis and used for recognition. We further compare these approaches with alternative techniques such as sub-sampling and filtering. Results from our experiments show unequivocally that the application of CS does not degrade recognition accuracy; rather, it often increases it. This shows that CS can provide a desirable form of dimensionality reduction in pattern recognition over time series.
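In the compressive-sensing setting, a length-n signal x is summarised by m < n linear measurements y = Phi x, where Phi is a random matrix. A minimal sketch, assuming i.i.d. Gaussian entries scaled by 1/sqrt(m) (a common choice, not necessarily the paper's) and applying the projection to a single toy time series; sparse reconstruction and the hybrid CS-Haar basis are not shown, and the function names are hypothetical.

```python
import math
import random


def gaussian_measurement_matrix(m, n, seed=0):
    """m x n matrix with i.i.d. N(0, 1/m) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def compress(phi, x):
    """Compute the m random projections y = phi @ x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


n, m = 64, 16
series = [math.sin(2.0 * math.pi * t / 16.0) for t in range(n)]  # toy time series
phi = gaussian_measurement_matrix(m, n)
y = compress(phi, series)
print(len(series), "->", len(y))  # 64 -> 16
```

In the first experimental setting described above, a vector like `y` (rather than the raw series) would be what the HMM consumes; because the same fixed `phi` is used for every sequence, the projection is a linear, data-independent dimensionality reduction.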
Benter, A, Xu, R, Moore, W, Antolovich, M & Gao, J 2009, 'Fragment size detection within homogeneous material using ground penetrating radar', 2009 International Radar Conference "Surveillance for a Safer World", RADAR 2009.
Ground Penetrating Radar (GPR) offers the ability to observe the internal structure of a pile of rocks. Large fragments within the pile may not be visible on the surface. Determining these large fragment sizes before collection can improve mine productivity. This research has examined the potential to identify objects where the background media and the object exhibit the same dielectric properties. Preliminary results are presented which show identification is possible using standard GPR equipment.
Xu, RYD & Kemp, M 2009, 'Multiple curvature based approach to human upper body parts detection with connected ellipse model fine-tuning', Proceedings - International Conference on Image Processing, ICIP, pp. 2577-2580.
In this paper, we discuss an effective method for detecting human upper body parts from a 2D image silhouette using curvature analysis and ellipse fitting. First, we smooth the silhouette so that only the global features are detected: the head, hands and armpits. Next, we reduce the smoothing to detect the local features of the neck and elbows. We model the human upper body by multiple connected ellipses: we segment the body using the extracted features and fit an ellipse to each segment. Lastly, we apply a non-linear least-squares method to minimize the differences between the connected ellipse model and the edge of the silhouette. ©2009 IEEE.
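The multi-scale curvature idea can be sketched with a discrete curvature proxy: smooth the closed silhouette contour with a circular moving average (wider windows leave only coarse features), measure the signed turning angle at each contour point, and keep local maxima above a threshold as candidate feature points. The moving-average smoother, the turning-angle measure and the threshold are illustrative assumptions, not the paper's exact curvature computation.

```python
import math


def smooth_closed(points, half_window):
    """Circular moving average of a closed contour; larger windows keep only coarse features."""
    n = len(points)
    width = 2 * half_window + 1
    smoothed = []
    for i in range(n):
        xs = [points[(i + k) % n][0] for k in range(-half_window, half_window + 1)]
        ys = [points[(i + k) % n][1] for k in range(-half_window, half_window + 1)]
        smoothed.append((sum(xs) / width, sum(ys) / width))
    return smoothed


def turning_angles(points):
    """Signed turning angle at each vertex of a closed polygon, wrapped to [-pi, pi)."""
    n = len(points)
    angles = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[(i + 1) % n]
        heading_in = math.atan2(y1 - y0, x1 - x0)
        heading_out = math.atan2(y2 - y1, x2 - x1)
        angles.append((heading_out - heading_in + math.pi) % (2.0 * math.pi) - math.pi)
    return angles


def curvature_peaks(angles, threshold):
    """Indices whose |turning angle| is a local maximum and at least threshold."""
    n = len(angles)
    mags = [abs(a) for a in angles]
    return [i for i in range(n)
            if mags[i] >= threshold and mags[i] >= mags[i - 1] and mags[i] >= mags[(i + 1) % n]]


# A square traced counter-clockwise: the four corners are the curvature peaks.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
print(curvature_peaks(turning_angles(square), threshold=1.0))  # [0, 2, 4, 6]
```

Running `curvature_peaks` on `smooth_closed(contour, w)` for a large `w` and then a small `w` mirrors the coarse-then-fine detection order described in the abstract.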
Da Xu, RY, Gao, J & Antolovich, M 2008, 'Novel methods for high-resolution facial image capture using calibrated PTZ and static cameras', 2008 IEEE International Conference on Multimedia and Expo, ICME 2008 - Proceedings, pp. 45-48.
In many machine vision applications, a set of static and Pan-Tilt-Zoom (PTZ) cameras is used to capture a sequence of high-resolution facial images of a moving person. In this paper, we present our implementation of such a system. We emphasise two novelties in our work: the first is an efficient PTZ camera calibration technique using hand-drawn gridlines; the second is a head position estimation technique using a Gaussian Mixture Model (GMM) and variance analysis of the foreground blob regions. © 2008 IEEE.
Xu, RYD, Brown, JM, Traish, JM & Dezwa, D 2008, 'A computer vision based camera pedestal's vertical motion control', Proceedings - International Conference on Pattern Recognition.
Traditional camera pedestals are manually operated. Our long-term goal is to construct a fully autonomous pedestal system that can respond to changes in a scene, mimicking a human camera operator. In this paper, we discuss our experiments to control the vertical motion of a pedestal by leveling its position with a human head or a tracked hand-held object. We describe a set of computer vision methods used in these experiments, including head position tracking using a Gaussian Mixture Model (GMM) of the foreground blob and hand-held object tracking using Continuously Adaptive Mean Shift (CAM-shift) with motion initialization. We also discuss the application of a Kalman filter and show its effect in reducing the number of jittery pedestal motions. © 2008 IEEE.
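A Kalman filter damps jitter by blending each new measurement with a prediction from the previous estimate, weighting the two by their uncertainties. A minimal one-dimensional sketch, assuming a random-walk (constant-position) model for the tracked head height and hand-picked noise variances, neither of which is stated in the abstract:

```python
def kalman_1d(measurements, process_var, meas_var, x0, p0):
    """Scalar Kalman filter with a random-walk state model x_t = x_{t-1} + w_t."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + process_var          # predict: mean unchanged, uncertainty grows
        k = p / (p + meas_var)       # Kalman gain
        x = x + k * (z - x)          # update with the measurement residual
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


jittery = [4.0, 6.0] * 10            # head-height measurements alternating around 5
smoothed = kalman_1d(jittery, process_var=0.01, meas_var=1.0, x0=5.0, p0=1.0)
print([round(v, 2) for v in smoothed[-4:]])
```

A small `process_var` relative to `meas_var` makes the estimate move slowly, which suppresses jitter at the cost of lagging behind genuine height changes; the pedestal would only be driven when the filtered estimate moves.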
Conventional whiteboard video capture using a static camera usually results in poor quality. In this paper, we present an autonomous whiteboard scan and capture prototype system, which consists of a pair of static and Pan-Tilt-Zoom (PTZ) cameras. The PTZ camera is used to scan the newly updated whiteboard regions without interrupting the instructor. We illustrate several computer vision techniques used in our system. First, we present our unique camera calibration method using rough hand-drawn gridlines. Second, we present the image processing methods used to determine which newly updated whiteboard region should be scanned. Our method also accounts for occlusion of the whiteboard region by the instructor.
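Deciding which whiteboard region to rescan can be illustrated with simple frame differencing on a grid of cells, ignoring pixels masked as occluded by the instructor. This is a hypothetical sketch (the grid size, thresholds and occlusion mask are assumed inputs), not the method described in the paper:

```python
def changed_cells(prev, curr, occluded, cell, diff_thresh, frac_thresh):
    """Return (row, col) indices of grid cells that appear newly written.

    prev, curr: equal-sized 2-D lists of grey levels (earlier and current board images).
    occluded: 2-D list of booleans, True where the instructor hides the board.
    A cell is flagged when the fraction of its visible pixels whose grey level
    changed by more than diff_thresh exceeds frac_thresh."""
    height, width = len(curr), len(curr[0])
    flagged = []
    for r0 in range(0, height, cell):
        for c0 in range(0, width, cell):
            visible = changed = 0
            for r in range(r0, min(r0 + cell, height)):
                for c in range(c0, min(c0 + cell, width)):
                    if occluded[r][c]:
                        continue  # hidden by the instructor: no evidence either way
                    visible += 1
                    if abs(curr[r][c] - prev[r][c]) > diff_thresh:
                        changed += 1
            if visible and changed / visible > frac_thresh:
                flagged.append((r0 // cell, c0 // cell))
    return flagged


prev = [[255] * 4 for _ in range(4)]
curr = [row[:] for row in prev]
for r in range(2):
    for c in range(2):
        curr[r][c] = 0  # new ink in the top-left cell
no_mask = [[False] * 4 for _ in range(4)]
print(changed_cells(prev, curr, no_mask, cell=2, diff_thresh=50, frac_thresh=0.1))  # [(0, 0)]
```

Cells that are fully occluded are never flagged, so they would be revisited once the instructor moves away.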
Gao, J & Xu, RY 2007, 'Mixture of the robust L1 distributions and its applications', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 26-35.
Recently, a robust probabilistic L1-PCA model was introduced in  by replacing the conventional Gaussian noise model with the Laplacian L1 model. Due to the heavy-tailed characteristics of the L1 distribution, the proposed model is more robust against data outliers. In this paper, we generalize the L1-PCA into a mixture of L1 distributions so that the model can be used for data with multiple clusters. For model learning, we use the property that the L1 density can be expanded as a superposition of an infinite number of Gaussian densities, which enables tractable Bayesian learning and inference based on a variational EM-type algorithm. © Springer-Verlag Berlin Heidelberg 2007.
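The superposition property is the Gaussian scale-mixture representation of the Laplace density: if v ~ Exponential(rate lam**2 / 2) and x | v ~ N(0, v), then x has density (lam / 2) * exp(-lam * |x|). The sketch below checks this numerically, once by midpoint-rule quadrature over v and once by sampling; the quadrature range and step count are illustrative choices suited to lam = 1, and the function names are hypothetical.

```python
import math
import random


def laplace_pdf(x, lam):
    """Laplace (L1) density (lam / 2) * exp(-lam * |x|)."""
    return 0.5 * lam * math.exp(-lam * abs(x))


def gaussian_scale_mixture_pdf(x, lam, v_max=60.0, steps=60000):
    """Integrate N(x | 0, v) * Exponential(v | rate = lam**2 / 2) over v (midpoint rule).
    v_max and steps are adequate for lam near 1 and x != 0."""
    rate = 0.5 * lam * lam
    h = v_max / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        normal = math.exp(-x * x / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
        total += normal * rate * math.exp(-rate * v) * h
    return total


def sample_laplace(lam, rng):
    """Draw a latent variance v, then a zero-mean Gaussian with that variance."""
    v = rng.expovariate(0.5 * lam * lam)
    return rng.gauss(0.0, math.sqrt(v))


print(laplace_pdf(1.0, 1.0), gaussian_scale_mixture_pdf(1.0, 1.0))
```

The latent variance v is what makes variational EM tractable: conditioned on v, every term is Gaussian, so standard Gaussian updates apply.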
Xu, RYD & Jin, JS 2006, 'Individual object interaction for camera control and multimedia synchronization', ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
In recent times, most computer-vision-assisted automatic camera control policies have been based on human events, such as changes in speaker position. In addition to these events, in this paper we introduce a set of natural camera control and multimedia synchronization schemes based on individual object interaction. We present our methods in detail, including head-pose calculation and laser pointer guidance, which are used to estimate the region of interest (ROI) for both hand-held objects and objects at a distance. Our results show how this set of approaches achieves robustness, efficiency and unambiguous object interaction during real-time video shooting. © 2006 IEEE.
Lim, CC, Da Xu, RY, Yu, H & Jin, JJ 2005, 'Streaming web-lecturing and synchronized web browsing', Proceedings of the IASTED International Conference on Web-Based Education, WBE 2005, pp. 18-21.
Developments in e-learning technologies are having a great impact on education services, helping to overcome geographical distance and improve the collaborative group work environment. Previous research on e-learning systems has introduced many concepts that provide fundamental groundwork, such as tools for managing course materials, students' records and assessment information. However, the potential of e-learning technologies for web lecturing and collaborative group work, in maximizing interaction among users, can still be explored further. This paper proposes a web-based multimedia system for e-learning management with an advanced streaming web-lecturing module and additional communication tools developed according to the concept of Computer Supported Cooperative Work (CSCW). With the ability to track multiple objects simultaneously, the proposed streaming web lecturing can meet the requirements of a virtual classroom. The efficiency of collaboration and consultation is increased by providing an online chat session and a simultaneous group web-browsing environment in which users are aware of the presence of other participants.