Visual tracking is a critical task in many computer vision applications such as surveillance and robotics. However, although robustness to local corruptions has improved, prevailing trackers are still sensitive to large-scale corruptions, such as occlusions and illumination variations. In this paper, we propose a novel robust object tracking technique based on a subspace-learning appearance model. Our contributions are twofold. First, mask templates produced by frame difference are introduced into our template dictionary. Since the mask templates contain abundant structural information about corruptions, the model can encode the corruptions on the object more efficiently. Meanwhile, the robustness of the tracker is further enhanced by incorporating system dynamics, which account for the moving tendency of the object. Second, we provide a theoretical guarantee that, by adapting the modulated template dictionary system, our new sparse model can be solved by the accelerated proximal gradient algorithm as efficiently as in traditional sparse tracking methods. Extensive experimental evaluations demonstrate that our method significantly outperforms 21 other cutting-edge algorithms in both speed and tracking accuracy, especially under challenges such as pose variation, occlusion, and illumination changes.
Li, J., Mei, X., Prokhorov, D. & Tao, D. 2017, 'Deep Neural Network for Structural Prediction and Lane Detection in Traffic Scene', IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 3, pp. 690-703.
Hierarchical neural networks have been shown to be effective in learning representative image features and recognizing object classes. However, most existing networks combine the low/middle level cues for classification without accounting for any spatial structures. For applications such as understanding a scene, how the visual cues are spatially distributed in an image becomes essential for successful analysis. This paper extends the framework of deep neural networks by accounting for the structural cues in the visual signals. In particular, two kinds of neural networks have been proposed. First, we develop a multitask deep convolutional network, which simultaneously detects the presence of the target and the geometric attributes (location and orientation) of the target with respect to the region of interest. Second, a recurrent neuron layer is adopted for structured visual detection. The recurrent neurons can deal with the spatial distribution of visible cues belonging to an object whose shape or structure is difficult to explicitly define. Both networks are demonstrated on the practical task of detecting lane boundaries in traffic scenes. The multitask convolutional neural network provides auxiliary geometric information to help the subsequent modeling of the given lane structures. The recurrent neural network automatically detects lane boundaries, including those areas containing no marks, without any explicit prior knowledge or secondary modeling.
Li, J., Lin, X., Rui, X., Rui, Y. & Tao, D. 2015, 'A Distributed Approach Toward Discriminative Distance Metric Learning', IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 9, pp. 2111-2122.
Distance metric learning (DML) is successful in discovering intrinsic relations in data. However, most algorithms are computationally demanding when the problem size becomes large. In this paper, we propose a discriminative metric learning algorithm, develop a distributed scheme learning metrics on moderate-sized subsets of data, and aggregate the results into a global solution. The technique leverages the power of parallel computation. The algorithm of the aggregated DML (ADML) scales well with the data size and can be controlled by the partition. We theoretically analyze and provide bounds for the error induced by the distributed treatment. We have conducted experimental evaluation of the ADML, both on specially designed tests and on practical image annotation tasks. Those tests have shown that the ADML achieves the state-of-the-art performance at only a fraction of the cost incurred by most existing methods.
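The partition-then-aggregate idea can be sketched in a few lines. This is only an illustrative Python sketch, not the ADML algorithm itself: the per-subset learner here is a simple regularized inverse within-class scatter metric, and the aggregation is a plain average, both of which are stand-ins for the paper's actual procedure.

```python
import numpy as np

def local_metric(X, y, reg=1e-3):
    """Learn a simple Mahalanobis metric on one data subset:
    the regularized inverse of the within-class scatter matrix
    (a stand-in for the paper's discriminative learner)."""
    d = X.shape[1]
    S = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c] - X[y == c].mean(axis=0)
        S += Xc.T @ Xc
    return np.linalg.inv(S / len(X) + reg * np.eye(d))

def aggregated_metric(X, y, n_parts=4, seed=0):
    """Partition the data, learn a metric on each subset in isolation
    (these fits could run in parallel), and aggregate the results."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(X)), n_parts)
    metrics = [local_metric(X[p], y[p]) for p in parts]
    return sum(metrics) / n_parts

# Toy data: two classes separated along every axis.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 3)), rng.normal(3, 1, (200, 3))])
y = np.array([0] * 200 + [1] * 200)
M = aggregated_metric(X, y)
assert M.shape == (3, 3)
assert np.allclose(M, M.T)  # a valid metric matrix is symmetric
```

Because each subset fit touches only a fraction of the data, the per-worker cost is controlled by the partition size, which is the scalability property the abstract emphasizes.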
Li, J. & Tao, D. 2013, 'A Bayesian Hierarchical Factorization Model for Vector Fields', IEEE Transactions on Image Processing, vol. 22, no. 11, pp. 4510-4521.
Factorization-based techniques explain arrays of observations using a relatively small number of factors and provide an essential arsenal for multi-dimensional data analysis. Most factorization models are, however, developed on general arrays of scalar values.
Li, J. & Tao, D. 2013, 'Exponential Family Factors for Bayesian Factor Analysis', IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 6, pp. 964-976.
Expressing data as linear functions of a small number of unknown variables is a useful approach employed by several classical data analysis methods, e.g., factor analysis, principal component analysis, or latent semantic indexing. These models represent the data using the product of two factors. In practice, one important concern is how to link the learned factors to relevant quantities in the context of the application. To this end, various specialized forms of the factors have been proposed to improve interpretability. Toward developing a unified view and clarifying the statistical significance of the specialized factors, we propose a Bayesian model family. We employ exponential family distributions to specify various types of factors, which provide a unified probabilistic formulation. A Gibbs sampling procedure is constructed as a general computation routine. We verify the model by experiments, in which the proposed model is shown to be effective in both emulating existing models and motivating new model designs for particular problem settings.
Principal component analysis (PCA) is a widely used model for dimensionality reduction. In this paper, we address the problem of determining the intrinsic dimensionality of a general type data population by selecting the number of principal components for a generalized PCA model. In particular, we propose a generalized Bayesian PCA model, which deals with general type data by employing exponential family distributions. Model selection is realized by empirical Bayesian inference of the model. We name the model simple exponential family PCA (SePCA), since it embraces both the principle of using a simple model for data representation and the practice of using a simplified computational procedure for the inference. Our analysis shows that the empirical Bayesian inference in SePCA formally realizes an intuitive criterion for PCA model selection - a preserved principal component must sufficiently correlate to data variance that is uncorrelated to the other principal components. Experiments on synthetic and real data sets demonstrate the effectiveness of SePCA and exemplify its characteristics for model selection.
Li, J., Bian, W., Tao, D. & Zhang, C. 2013, 'Learning Colours From Textures By Sparse Manifold Embedding', Signal Processing, vol. 93, no. 6, pp. 1485-1495.
The capability of inferring colours from the texture (grayscale contents) of an image is useful in many application areas, when the imaging device/environment is limited. Traditional manual or limited automatic colour assignment involves intensive human effort.
Li, J. & Tao, D. 2012, 'On Preserving Original Variables in Bayesian PCA with Application to Image Analysis', IEEE Transactions on Image Processing, vol. 21, no. 12, pp. 4830-4843.
Principal component analysis (PCA) computes a succinct data representation by converting the data to a few new variables while retaining maximum variation. However, the new variables are difficult to interpret, because each one is combined with all of the original input variables and has obscure semantics. Under the umbrella of Bayesian data analysis, this paper presents a new prior to explicitly regularize combinations of input variables. In particular, the prior penalizes pair-wise products of the coefficients of PCA and encourages a sparse model. Compared to the commonly used L1-regularizer, the proposed prior encourages the sparsity pattern in the resultant coefficients to be consistent with the intrinsic groups in the original input variables. Moreover, the proposed prior can be explained as recovering a robust estimation of the covariance matrix for PCA. The proposed model is suited for analyzing visual data, where it encourages the output variables to correspond to meaningful parts in the data. We demonstrate the characteristics and effectiveness of the proposed technique through experiments on both synthetic and real data.
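The contrast between an L1 penalty and a pairwise-product penalty can be made concrete with a small numeric example. This is a hedged sketch of the penalty terms only, not the paper's Bayesian prior or its inference procedure: two coefficient vectors with identical L1 norms are scored differently by the pairwise-product term.

```python
import numpy as np

def l1_penalty(w):
    """Standard L1 penalty: sum of absolute coefficient values."""
    return np.abs(w).sum()

def pairwise_product_penalty(w):
    """Penalize |w_i * w_j| over all pairs i != j.  The penalty is
    small when the mass is concentrated on few coefficients, so it
    pushes the loadings toward sparse, group-consistent patterns."""
    a = np.abs(w)
    return a.sum() ** 2 - (a ** 2).sum()  # = sum_{i != j} |w_i||w_j|

w_spread = np.array([0.5, 0.5, 0.5, 0.5])
w_sparse = np.array([1.0, 1.0, 0.0, 0.0])
# Both vectors have the same L1 norm ...
assert np.isclose(l1_penalty(w_spread), l1_penalty(w_sparse))
# ... but the pairwise-product penalty prefers the sparser one.
assert pairwise_product_penalty(w_sparse) < pairwise_product_penalty(w_spread)
```

This illustrates why the pairwise-product form can distinguish sparsity patterns that the L1 norm treats as equivalent.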
Li, J., Tao, D. & Li, X. 2012, 'A probabilistic model for image representation via multiple patterns', Pattern Recognition, vol. 45, no. 11, pp. 4044-4053.
For image analysis, an important extension to principal component analysis (PCA) is to treat an image as multiple samples, which helps alleviate the small sample size problem. Various schemes of transforming an image to multiple samples have been proposed. Although having been shown effective in practice, the schemes are mainly based on heuristics and experience. In this paper, we propose a probabilistic PCA model, in which we explicitly represent the transformation scheme and incorporate the scheme as a stochastic component of the model. Therefore, fitting the model automatically learns the transformation. Moreover, the learned model allows us to distinguish regions that can be well described by the PCA model from those that need further treatment. Experiments on synthetic images and face data sets demonstrate the properties and utility of the proposed model.
Zhang, Z., Cheng, J., Li, J., Bian, W. & Tao, D. 2012, 'Segment-Based Features for Time Series Classification', Computer Journal, vol. 55, no. 9, pp. 1088-1102.
In this paper, we propose an approach termed segment-based features (SBFs) to classify time series. The approach is inspired by the success of the component- or part-based methods of object recognition in computer vision, in which a visual object is described as a number of characteristic parts and the relations among the parts. Utilizing this idea in the problem of time series classification, a time series is represented as a set of segments and the corresponding temporal relations. First, a number of interest segments are extracted by interest point detection with automatic scale selection. Then, a number of feature prototypes are collected by random sampling from the segment set, where each feature prototype may include a single segment or multiple ordered segments. Subsequently, each time series is transformed to a standard feature vector, i.e. SBF, where each entry in the SBF is calculated as the maximum response (maximum similarity) of the corresponding feature prototype to the segment set of the time series.
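The final transformation step, mapping a series to one maximum-response entry per prototype, can be sketched directly. This is an illustrative Python sketch under simplifying assumptions: the segments are taken as given (the paper extracts them via interest point detection with scale selection) and the similarity used here is simply negative Euclidean distance between equal-length segments.

```python
import numpy as np

def sbf(series_segments, prototypes, sim):
    """Map a time series (given as its list of segments) to its SBF:
    one entry per prototype, holding that prototype's maximum
    similarity over all segments of the series."""
    return np.array([max(sim(p, s) for s in series_segments)
                     for p in prototypes])

# Toy similarity: negative Euclidean distance between segments.
sim = lambda a, b: -np.linalg.norm(np.asarray(a) - np.asarray(b))

segments = [[0, 1, 2], [5, 5, 5], [2, 1, 0]]
prototypes = [[0, 1, 2], [9, 9, 9]]
f = sbf(segments, prototypes, sim)
assert f.shape == (2,)
assert f[0] == 0.0  # the first prototype matches a segment exactly
```

Every series, regardless of length or segment count, lands in the same fixed-dimensional space, so any standard classifier can be trained on the SBF vectors.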
Kang, G., Li, J. & Tao, D. 2016, 'Shakeout: A New Regularized Deep Neural Network Training Scheme', Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), AAAI Conference on Artificial Intelligence, AAAI, Phoenix, USA, pp. 1751-1757.
Recent years have witnessed the success of deep neural networks in dealing with a variety of practical problems. The invention of effective training techniques largely contributes to this success. The so-called "Dropout" training scheme is one of the most powerful tools to reduce over-fitting. From the statistical point of view, Dropout works by implicitly imposing an L2 regularizer on the weights. In this paper, we present a new training scheme: Shakeout. Instead of randomly discarding units as Dropout does at the training stage, our method randomly chooses to enhance or reverse the contributions of each unit to the next layer. We show that our scheme leads to a combination of L1 regularization and L2 regularization imposed on the weights, which has been proved effective by the Elastic Net models in practice. We have empirically evaluated the Shakeout scheme and demonstrated that sparse network weights are obtained via Shakeout training. Our classification experiments on the real-life image datasets MNIST and CIFAR-10 show that Shakeout deals with over-fitting effectively.
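The contrast between Dropout and a Shakeout-style perturbation can be sketched on raw activations. To be clear about assumptions: this follows the spirit of the scheme (randomly enhancing kept units and reversing picked ones) rather than the exact published operator, and the parameter names `p` and `c` are illustrative; with `c = 0` the sketch reduces to standard inverted dropout.

```python
import numpy as np

def dropout(x, p, rng):
    """Standard inverted dropout: zero each unit with probability p."""
    mask = rng.random(x.shape) >= p
    return x * mask / (1 - p)

def shakeout(x, p, c, rng):
    """Shakeout-style noise (illustrative): picked units have their
    contribution reversed by a signed offset instead of being zeroed,
    while kept units are enhanced.  `p` is the pick probability and
    `c` controls the L1-like push toward sparse weights."""
    picked = rng.random(x.shape) < p
    enhanced = (x + c * np.sign(x)) / (1 - p)  # kept units: enhanced
    reversed_ = -c * np.sign(x)                # picked units: reversed
    return np.where(picked, reversed_, enhanced)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
y = shakeout(x, p=0.5, c=0.1, rng=rng)
assert y.shape == x.shape
# With c = 0, each entry is either dropped (0) or rescaled (x / (1-p)),
# i.e. the sketch degenerates to inverted dropout.
y0 = shakeout(x, 0.5, 0.0, np.random.default_rng(1))
assert np.all(np.isclose(y0, 0) | np.isclose(y0, x / 0.5))
```

The signed offset `c * sign(x)` is what introduces the L1-like component on top of Dropout's implicit L2 effect, matching the Elastic Net analogy drawn in the abstract.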
Zhang, Z., Huang, K., Tan, T., Yang, P. & Li, J. 2016, 'ReD-SFA: Relation Discovery Based Slow Feature Analysis for Trajectory Clustering', Proceedings for the Conference on Computer Vision and Pattern Recognition, IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Piscataway, USA, pp. 752-760.
For spectral embedding/clustering, it is still an open problem how to construct a relation graph that reflects the intrinsic structures in data. In this paper, we propose an approach, named Relation Discovery based Slow Feature Analysis (ReD-SFA), for feature learning and graph construction simultaneously. Given an initial graph with only a few nearest but most reliable pairwise relations, new reliable relations are discovered under an assumption of reliability preservation, i.e., the reliable relations will preserve their reliabilities in the learnt projection subspace. We formulate the idea as a cross entropy (CE) minimization problem that reduces the discrepancy between two Bernoulli distributions parameterized by the updated distances and the existing relation graph, respectively. Furthermore, to overcome the imbalanced distribution of samples, a Boosting-like strategy is proposed to balance the discovered relations over all clusters. To evaluate the proposed method, extensive experiments are performed on various trajectory clustering tasks, including motion segmentation, time series clustering and crowd detection. The results demonstrate that ReD-SFA can discover reliable intra-cluster relations with high precision, and that competitive clustering performance can be achieved in comparison with state-of-the-art methods.
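The cross-entropy objective between the two Bernoulli fields can be written out numerically. This is a hedged sketch: the mapping from distances to probabilities used here (`q_ij = exp(-d_ij)`) is an assumed illustrative choice, and the paper's exact parameterization may differ.

```python
import numpy as np

def bernoulli_cross_entropy(p_graph, q_dist, eps=1e-12):
    """Cross entropy between two Bernoulli fields: p_graph holds the
    existing relation probabilities, q_dist the probabilities implied
    by pairwise distances in the learnt subspace.  Minimizing this
    pulls the embedding toward configurations consistent with the
    reliable relations."""
    p = np.clip(p_graph, eps, 1 - eps)
    q = np.clip(q_dist, eps, 1 - eps)
    return float(-(p * np.log(q) + (1 - p) * np.log(1 - q)).sum())

# Probabilities implied by pairwise distances (illustrative choice).
d = np.array([[0.0, 0.2], [2.0, 0.0]])
q = np.exp(-d)
p = np.array([[1.0, 1.0], [0.0, 1.0]])  # current relation graph
ce = bernoulli_cross_entropy(p, q)
assert ce > 0
```

Pairs the graph marks as related (p near 1) are cheap only when their subspace distance is small, which is exactly the reliability-preservation assumption stated in the abstract.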
Li, J. & Tao, D. 2013, 'A Bayesian factorised covariance model for image analysis', International Joint Conference on Artificial Intelligence, AAAI, Beijing, China, pp. 1465-1471.
Li, J. & Tao, D. 2012, 'Sampling Normal Distribution Restricted on Multiple Regions', International Conference on Neural Information Processing, Springer-Verlag, Doha, Qatar, pp. 492-500.
We develop an accept-reject sampler for probability densities that have a form similar to a normal density function but are supported on restricted regions. Compared to existing techniques, the proposed method deals with multiple disjoint regions, truncated on one or both sides. For the original problem of sampling from one region, the efficiency is enhanced as well. We verify the desirable attributes of the proposed algorithm by both theoretical analysis and simulation studies.
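The target distribution can be illustrated with the naive baseline the paper improves upon: draw from the unrestricted normal and reject draws outside the allowed regions. This sketch only shows the sampling target; the paper's sampler is far more efficient when the allowed regions carry little probability mass.

```python
import random

def sample_restricted_normal(mu, sigma, regions, rng, max_tries=100000):
    """Naive accept-reject sketch: draw from N(mu, sigma^2) and accept
    the draw if it falls in any of the allowed disjoint intervals."""
    for _ in range(max_tries):
        x = rng.gauss(mu, sigma)
        if any(lo <= x <= hi for lo, hi in regions):
            return x
    raise RuntimeError("acceptance rate too low for naive rejection")

rng = random.Random(0)
regions = [(-2.0, -1.0), (1.0, 2.0)]  # two disjoint truncation regions
xs = [sample_restricted_normal(0.0, 1.0, regions, rng) for _ in range(200)]
assert all(any(lo <= x <= hi for lo, hi in regions) for x in xs)
```

Every accepted draw lies in one of the disjoint intervals, but the acceptance rate collapses as the regions move into the tails, which motivates the dedicated sampler.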
Li, J. & Tao, D. 2011, 'A Probabilistic Model for Discovering High Level Brain Activities from fMRI', Lecture Notes in Computer Science, International Conference on Neural Information Processing, Springer-Verlag, Shanghai, China, pp. 329-336.
Functional magnetic resonance imaging (fMRI) has provided an invaluable method of investigating real-time neuron activities. Statistical tools have been developed to recognise the mental state from a batch of fMRI observations over a period. However, an interesting question is whether it is possible to estimate the real-time mental states at each moment during the fMRI observation. In this paper, we address this problem by building a probabilistic model of the brain activity. We model the tempo-spatial relations among the hidden high-level mental states and observable low-level neuron activities. We verify our model by experiments on practical fMRI data. The model also implies interesting clues about the task-responsible regions in the brain.
Li, J. & Tao, D. 2011, 'Wisdom of Crowds: Single Image Super-resolution from the Web', Workshop on Large Scale Visual Analytics with the IEEE International Conference on Data Mining, IEEE Computer Society, Vancouver, Canada, pp. 812-816.
This paper addresses the problem of learning-based single image super-resolution. Previous research on this problem employs a human user to provide a set of images that are similar to the target image as a reference. The super-resolution algorithm can then learn from the provided reference images to predict the high-resolution details for the target image. We propose a fully automatic scheme, which leverages the knowledge of the entire visual world to query relevant references from the Internet. The proposed scheme is free of human supervision, and the performance compromise is small. We conduct experiments to show the effectiveness of the method.
Li, J., Bian, W., Tao, D. & Zhang, C. 2011, 'Learning Colours from Textures by Sparse Manifold Embedding', AI 2011: Advances in Artificial Intelligence (Lecture Notes in Artificial Intelligence), 24th Australasian Joint Conference on Artificial Intelligence, Springer-Verlag Berlin/Heidelberg, Perth, Australia, pp. 600-608.
The capability of inferring colours from the texture (grayscale contents) of an image is useful in many application areas when the imaging device/environment is limited. Traditional colour assignment involves intensive human effort. Automatic methods have been proposed to establish relations between image textures and the corresponding colours. Existing research mainly focuses on linear relations. In this paper, we employ sparse constraints in the model of the texture-colour relationship. The technique is developed on a locally linear model, which assumes that the image data are distributed on a manifold. Given the texture of an image patch, the learned model transfers colours to the patch by combining the known colours of similar texture patches. The sparse constraint checks the contributing factors in the model and helps improve the stability of the colour transfer. Experiments show that our method gives superior results to those of the previous work.
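The locally linear transfer step can be sketched as a nearest-neighbour blend. This is an illustrative simplification: the paper solves for the combination weights under a sparsity constraint, whereas the sketch below uses plain inverse-distance weights over the k most similar reference patches.

```python
import numpy as np

def transfer_colour(texture, ref_textures, ref_colours, k=3):
    """Locally linear sketch: pick the k reference patches whose
    texture is closest to the query and blend their colours with
    inverse-distance weights (a stand-in for the paper's sparse
    combination weights)."""
    d = np.linalg.norm(ref_textures - texture, axis=1)
    nn = np.argsort(d)[:k]                 # k nearest texture patches
    w = 1.0 / (d[nn] + 1e-8)
    w = w / w.sum()                        # normalized blend weights
    return w @ ref_colours[nn]

# Toy reference set: 1-D texture descriptors with known RGB colours.
ref_textures = np.array([[0.0], [1.0], [10.0]])
ref_colours = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0]])
colour = transfer_colour(np.array([0.0]), ref_textures, ref_colours, k=1)
assert np.allclose(colour, [1.0, 0, 0])  # exact texture match wins
```

Restricting the blend to few contributing neighbours is the intuition behind the sparse constraint: fewer, more relevant patches make the transferred colour more stable.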
Li, J. & Tao, D. 2010, 'An Exponential Family Extension to Principal Component Analysis', International Conference on Neural Information Processing, Springer, Sydney, Australia, pp. 1-9.
In this paper, we present a unified probabilistic model for constrained factorisation models, which employs exponential family distributions to represent the constrained factors. Our main objective is to provide a versatile framework, on which prototype models with various constraints can be implemented effortlessly. For learning the proposed stochastic model, Gibbs sampling is employed for model inference. We also demonstrate the utility and versatility of the model by experiments.
Li, J. & Tao, D. 2010, 'Boosted Dynamic Cognitive Activity Recognition from Brain Images', Proceedings - The 9th International Conference on Machine Learning and Applications, ICMLA 2010, International Conference on Machine Learning and Applications, IEEE, Washington, D.C., USA, pp. 361-366.
Functional Magnetic Resonance Imaging (fMRI) has become an important diagnostic tool for measuring brain haemodynamics. Previous research on analysing fMRI data mainly focuses on detecting low-level neuron activation from the ensued haemodynamic activities. An important recent advance is to show that the high-level cognitive status is recognisable from a period of fMRI records. Nevertheless, it would also be helpful to reveal the dynamics of cognitive activities during the period. In this paper, we tackle the problem of discovering the dynamic cognitive activities by proposing an algorithm of boosted structure learning. We employ a statistical model of random fields to represent the dynamics of the brain. To exploit the rich fMRI observations with reasonable model complexity, we build multiple models, where each model links the cognitive activities to only a fraction of the fMRI observations. We combine the simple models by using an altered AdaBoost scheme for multi-class structure learning and show theoretical justification of the proposed scheme. Empirical tests show that the method effectively links the physiological and the psychological activities of the brain.
Li, J. & Tao, D. 2010, 'Simple exponential family PCA', Journal of Machine Learning Research, pp. 453-460.
Bayesian principal component analysis (BPCA), a probabilistic reformulation of PCA with Bayesian model selection, is a systematic approach to determining the number of essential principal components (PCs) for data representation. However, it assumes that data are Gaussian distributed and thus it cannot handle all types of practical observations, e.g. integers and binary values. In this paper, we propose simple exponential family PCA (SePCA), a generalised family of probabilistic principal component analysers. SePCA employs exponential family distributions to handle general types of observations. By using Bayesian inference, SePCA also automatically discovers the number of essential PCs. We discuss techniques for fitting the model, develop the corresponding mixture model, and show the effectiveness of the model based on experiments.