Professor Dacheng Tao

Biography

Dacheng Tao is Professor of Computer Science with the Centre for Quantum Computation and Intelligent Systems (QCIS) and the Faculty of Engineering and Information Technology (FEIT) at the University of Technology Sydney (UTS). He holds or has held visiting professorships at many top universities and research institutes, e.g. Birkbeck - University of London, Shanghai Jiaotong University, Huazhong University of Science & Technology, Wuhan University, Northwestern Polytechnic University, Chinese Academy of Sciences, and Xidian University. Previously, he worked as a Nanyang Assistant Professor at the Nanyang Technological University and as an Assistant Professor at the Hong Kong Polytechnic University. He received his BEng degree from the University of Science and Technology of China (USTC), his MPhil degree from the Chinese University of Hong Kong (CUHK), and his PhD from the University of London (London).

He mainly applies statistics and mathematics to data analytics problems, and his research interests span computer vision, computational neuroscience, data science, geoinformatics, image processing, machine learning, medical informatics, multimedia, neural networks and video surveillance. His research results have been expounded in one monograph and 400+ publications in prestigious journals and at prominent conferences, such as IEEE T-PAMI, T-NNLS, T-IP, T-SP, T-MI, T-KDE, T-CYB, JMLR, IJCV, NIPS, ICML, CVPR, ICCV, ECCV, AISTATS, ICDM, SDM; ACM SIGKDD and Multimedia, with several best paper awards, such as the best theory/algorithm paper runner up award in IEEE ICDM’07, the best student paper award in IEEE ICDM’13, and the 2014 ICDM 10 Year Highest-Impact Paper Award.

He has made notable contributions to universities by providing excellent research student supervision. His PhD students (including co-supervised PhD students) have won the Chancellor’s Award for the most outstanding PhD thesis across the university in 2012 and 2015, the UTS Chancellor Postdoctoral Fellowship in 2012, the Extraordinary Potential Prize of the 2011 Chinese Government Award for Outstanding Self-Financed Students Abroad, the Microsoft Fellowship Award, the Baidu Fellowship, the Beihang “Zhuoyue” Program, the PLA Best PhD Dissertation Award, the Chinese Computer Federation (CCF) Outstanding Dissertation Award, the Award for the Excellent Doctoral Dissertation of Shanghai, the Award for the Excellent Doctoral Dissertation of Beijing, and the Excellent PhD Dissertation Award from the National University of Defense Technology.

He is or has been a guest editor of 10+ special issues and an editor of 10+ journals, including IEEE Trans. on Big Data (T-BD), IEEE Trans. on Neural Networks and Learning Systems (T-NNLS), IEEE Trans. on Image Processing (T-IP), IEEE Trans. on Cybernetics (T-CYB), IEEE Trans. on Systems, Man and Cybernetics: Part B (T-SMCB), IEEE Trans. on Circuits and Systems for Video Technology (T-CSVT), IEEE Trans. on Knowledge and Data Engineering (T-KDE), Pattern Recognition (Elsevier), Information Sciences (Elsevier), Signal Processing (Elsevier), and Computational Statistics & Data Analysis (Elsevier). He has edited five books on several topics of optical pattern recognition and its applications. He has chaired conferences, special sessions, invited sessions, workshops, and panels 60+ times. He has served nearly 200 major conferences, including CVPR, ICCV, ECCV, AAAI, IJCAI, NIPS, ICDM, AISTATS, ACM SIGKDD and Multimedia, and nearly 100 prestigious international journals, including T-PAMI, IJCV, JMLR, AIJ, and MLJ.

He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), a Fellow of the Optical Society of America (OSA), a Fellow of the International Association for Pattern Recognition (IAPR), a Fellow of the International Society for Optical Engineering (SPIE), an Elected Member of the International Statistical Institute (ISI), a Fellow of the British Computer Society (BCS), and a Fellow of the Institution of Engineering and Technology (IET/IEE). He is an elected member of the Global Young Academy (GYA). He chairs the IEEE SMC Technical Committee on Cognitive Computing and the IEEE SMC New South Wales Section.

Professional

Fellow, Institute of Electrical and Electronics Engineers (FIEEE)

Fellow, Optical Society of America (FOSA)

Fellow, International Association for Pattern Recognition (FIAPR)

Fellow, International Society for Optical Engineering (FSPIE)

Fellow, Institution of Engineering and Technology (FIET)

Fellow, British Computer Society (FBCS)

Elected Member, International Statistical Institute (ISI)

Elected Member of the Global Young Academy (GYA)

Future Fellow and Professor, A/DRsch Ctr Quantum Computat'n & Intelligent Systs
Core Member, Joint Research Centre in Intelligent Systems Membership
Core Member, QCIS - Quantum Computation and Intelligent Systems
Core Member, AAI - Advanced Analytics Institute
BEng (USTC), MPhil (CUHK), PhD (London)

Research Interests

statistics and mathematics for data analysis problems in machine learning, data mining & engineering, computer vision, image processing, multimedia, video surveillance and neuroscience

Can supervise: Yes

Image and Video Analysis; Computer Vision; Pattern Recognition; Machine Learning; and Discrete Mathematics

Books

Yu, J. & Tao, D. 2013, Modern machine learning techniques and their applications in cartoon animation research, First edition, Wiley-IEEE Press, Hoboken, New Jersey.
The integration of machine learning techniques and cartoon animation research is fast becoming a hot topic. This book helps readers learn the latest machine learning techniques, including patch alignment framework; spectral clustering, graph cuts, and convex relaxation; ensemble manifold learning; multiple kernel learning; multiview subspace learning; and multiview distance metric learning. It then presents the applications of these modern machine learning techniques in cartoon animation research. With these techniques, users can efficiently utilize the cartoon materials to generate animations in areas such as virtual reality, video games, animation films, and sport simulations.
Lin, W., Tao, D., Kacprzyk, J., Li, Z., Izquierdo, E. & Wang, H. 2011, Studies in Computational Intelligence: Preface.
Tao, D., Xu, D. & Li, X. 2009, Semantic mining technologies for multimedia databases.
Multimedia searching and management have become popular due to demanding applications and competition among companies. Despite the increase in interest, there is no existing book covering basic knowledge on state-of-the-art techniques within the field. Semantic Mining Technologies for Multimedia Databases provides an introduction to the most recent techniques in multimedia semantic mining necessary to researchers new to the field. This book serves as an important reference in multimedia for academicians, multimedia technologists and researchers, and academic libraries. © 2009 by IGI Global. All rights reserved.

Chapters

He, X., Luo, S., Tao, D., Xu, C., Yang, J. & Abul Hasan, M. 2015, 'Preface' in MultiMedia Modeling, Springer, Germany, pp. V-VI.
He, X., Xu, C., Tao, D., Luo, S., Yang, J. & Hasan, M.A. 2015, 'Preface' in MultiMedia Modeling (LNCS), Springer, Germany, pp. V-VI.
Luo, Y., Tao, D. & Xu, C. 2013, 'Patch Alignment for Graph Embedding' in Fu, Y. & Ma, Y. (eds), Graph Embedding for Pattern Analysis, Springer New York, New York, NY, USA, pp. 73-118.
Dozens of manifold learning-based dimensionality reduction algorithms have been proposed in the literature. The most representative ones are locally linear embedding (LLE) [65], ISOMAP [76], Laplacian eigenmaps (LE) [4], Hessian eigenmaps (HLLE) [20], and local tangent space alignment (LTSA) [102]. LLE uses linear coefficients, which reconstruct a given example by its neighbors, to represent the local geometry, and then seeks a low-dimensional embedding, in which these coefficients are still suitable for reconstruction. ISOMAP preserves global geodesic distances of all the pairs of examples.
Gao, X., Wang, B., Tao, D. & Li, X. 2011, 'A Unified Tensor Level Set Method for Image Segmentation. Multimedia Analysis, Processing and Communications' in Lin, W., Tao, D., Kacprzyk, J., Li, Z., Izquierdo, E. & Wang, H. (eds), Studies in Computational Intelligence vol 346, Springer-Verlag Berlin, Berlin, pp. 217-238.
This paper presents a new unified level set model for multiple regional image segmentation. This model builds a unified tensor representation for comprehensively depicting each pixel in the image to be segmented, by which the image aligns itself with a t
Xiao, B., Gao, X., Tao, D. & Li, X. 2011, 'Recognition of Sketches in Photos' in Lin, W., Tao, D., Kacprzyk, J., Li, Z., Izquierdo, E. & Wang, H. (eds), Studies in Computational Intelligence vol 346. Multimedia Analysis, Processing and Communications, Springer-Verlag Berlin / Heidelberg, Berlin/Heidelberg, pp. 239-262.
Summary. Face recognition by sketches in photos makes an important complement to face photo recognition. It is challenging because sketches and photos have geometrical deformations and texture differences. Aiming to achieve better performance in mixture pattern recognition, we reduce the difference between sketches and photos by synthesizing sketches from photos, and vice versa, and then transform the sketch-photo recognition to photo-photo/sketch-sketch recognition. Pseudo-sketch/pseudo-photo patches are synthesized with an embedded hidden Markov model and integrated to derive the pseudo-sketch/pseudo-photo. Experiments are carried out to demonstrate that the proposed methods are effective in producing pseudo-sketches/pseudo-photos of high quality and achieve promising recognition results.
Deng, C., Gao, X., Li, X. & Tao, D. 2011, 'Robust Image Watermarking Based on Feature Regions' in Lin, W., Tao, D., Kacprzyk, J., Li, Z., Izquierdo, E. & Wang, H. (eds), Studies in Computational Intelligence vol 346. Multimedia Analysis, Processing and Communications, Springer-Verlag Berlin / Heidelberg, Berlin/Heidelberg, pp. 111-137.
Abstract. In image watermarking, binding the watermark synchronization with the local features has been widely used to provide robustness against geometric distortions as well as common image processing operations. However, in the existing schemes, the problems with random bending attack, nonisotropic scaling, general affine transformation, and combined attacks still remain difficult. In this chapter, we present and discuss the framework of the extraction and selection of the scale-space feature points. We then propose two robust image watermarking algorithms through synchronizing watermarking with the invariant local feature regions centered at feature points. The first algorithm conducts watermark embedding and detection in the affine covariant regions (ACRs). The second algorithm combines the local circular regions (LCRs) with Tchebichef moments, and local Tchebichef moments (LTMs) are used to embed and detect the watermark. These proposed algorithms are evaluated theoretically and experimentally, and are compared with two representative schemes. Experiments are carried out on a set of standard test images, and the preliminary results demonstrate that the developed algorithms improve the performance over these two representative image watermarking schemes in terms of robustness. Towards the overall robustness against geometric distortions and common image processing operations, the LTMs-based method has an advantage over the ACRs-based method.
Bian, W. & Tao, D. 2011, 'Face Subspace Learning' in Li, S.Z. & Jain, A.K. (eds), Handbook of Face Recognition, Springer-Verlag London Limited, London UK, pp. 51-77.
Gao, X., Xiao, B., Tao, D. & Li, X. 2009, 'A Comparative Study of Three Graph Edit Distance Algorithms' in Abraham, A., Hassanien, A.E. & Snasel, V. (eds), Studies in Computational Intelligence Vol 205: Foundations of Computational Intelligence vol 5, Springer, Berlin, pp. 223-242.
Graph edit distance (GED) is widely applied to similarity measurement of graphs in inexact graph matching. Due to the difficulty of defining cost functions reasonably, we do research on two GED algorithms without cost function definition: the first is c

Conferences

Yu, D. & Tao, D.C. 2015, 'Frontier of Business Management Challenge in the Dynamics of Big Data', 5th Organizations, Artifacts and Practices (OAP) Workshop, Sydney.
Song, D., Meyer, D.A. & Tao, D. 2016, 'Top-k link recommendation in social networks', Proceedings - IEEE International Conference on Data Mining, ICDM, pp. 389-398.
© 2015 IEEE. Inferring potential links is a fundamental problem in social networks. In the link recommendation problem, the aim is to suggest a list of potential people to each user, ordered by the preferences of the user. Although various approaches have been developed to solve this problem, the difficulty of producing a ranking list with high precision at the top - the most important consideration for real world applications - remains largely an open problem. In this work, we propose two top-k link recommendation algorithms which focus on optimizing the top ranked links. For this purpose, we define a cost-sensitive ranking loss which penalizes the mistakes at the top of a ranked list more than the mistakes at the bottom. In particular, we propose a log loss, derive its surrogate, and formulate a top-k link recommendation model by optimizing this surrogate loss function based upon latent features. Moreover, we extend this top-k link recommendation model by incorporating both the latent features and explicit features of the network. Finally, an efficient learning scheme to learn the model parameters is provided. We conduct empirical studies based upon four real world datasets, i.e., Wikipedia, CondMat, Epinions, and MovieLens 1M, of which the largest network contains more than 70 thousand nodes and over one million links. Our experiments demonstrate that the proposed algorithms outperform several state-of-the-art methods.
Zhang, Q., Zhang, L., Du, B., Zheng, W., Bian, W. & Tao, D. 2016, 'MMFE: Multitask multiview feature embedding', Proceedings - IEEE International Conference on Data Mining, ICDM, pp. 1105-1110.
© 2015 IEEE. In data mining and pattern recognition, the learned objects are often represented by multiple features from various views. How can we learn an efficient and effective feature embedding for the subsequent learning tasks? In this paper, we address this issue by providing a novel multi-task multiview feature embedding (MMFE) framework. The MMFE algorithm is based on the idea of low-rank approximation, which suggests that the observed multiview feature matrix is approximately represented by the low-dimensional feature embedding multiplied by a projection matrix. In order to fully consider the particular role of each view in the multiview feature embedding, we simultaneously incorporate the multitask learning scheme and ensemble manifold regularization into the MMFE algorithm to seek the optimal projection. Since the objective function of MMFE is multi-variable and non-convex, we further provide an iterative optimization procedure to find an available solution. Two real world experiments show that the proposed method outperforms single-task-based as well as state-of-the-art multiview feature embedding methods for the classification problem.
Xiong, W., Du, B., Zhang, L., Hu, R., Bian, W., Shen, J. & Tao, D. 2016, 'R2FP: Rich and robust feature pooling for mining visual data', Proceedings - IEEE International Conference on Data Mining, ICDM, pp. 469-478.
© 2015 IEEE. The human visual system proves smart in extracting both global and local features. Can we design a similar way for unsupervised feature learning? In this paper, we propose a novel pooling method within an unsupervised feature learning framework, named Rich and Robust Feature Pooling (R2FP), to better explore rich and robust representation from sparse feature maps of the input data. Both local and global pooling strategies are further considered to instantiate such a method and intensively studied. The former selects the most conductive features in the sub-region and summarizes the joint distribution of the selected features, while the latter is utilized to extract multiple resolutions of features and fuse the features with a feature balancing kernel for rich representation. Extensive experiments on several image recognition tasks demonstrate the superiority of the proposed techniques.
Kristan, M., Matas, J., Leonardis, A., Felsberg, M., Čehovin, L., Fernández, G., Vojír, T., Häger, G., Nebehay, G., Pflugfelder, R., Gupta, A., Bibi, A., Lukežič, A., Garcia-Martin, A., Saffari, A., Petrosino, A., Montero, A.S., Varfolomieiev, A., Baskurt, A., Zhao, B., Ghanem, B., Martinez, B., Lee, B., Han, B., Wang, C., Garcia, C., Zhang, C., Schmid, C., Tao, D., Kim, D., Huang, D., Prokhorov, D., Du, D., Yeung, D.Y., Ribeiro, E., Khan, F.S., Porikli, F., Bunyak, F., Zhu, G., Seetharaman, G., Kieritz, H., Yau, H.T., Li, H., Qi, H., Bischof, H., Possegger, H., Lee, H., Nam, H., Bogun, I., Jeong, J.C., Cho, J.I., Lee, J.Y., Zhu, J., Shi, J., Li, J., Jia, J., Feng, J., Gao, J., Choi, J.Y., Kim, J.W., Lang, J., Martinez, J.M., Choi, J., Xing, J., Xue, K., Palaniappan, K., Lebeda, K., Alahari, K., Gao, K., Yun, K., Wong, K.H., Luo, L., Ma, L., Ke, L., Wen, L., Bertinetto, L., Pootschi, M., Maresca, M., Danelljan, M., Wen, M., Zhang, M., Arens, M., Valstar, M., Tang, M., Chang, M.C., Khan, M.H., Fan, N., Wang, N., Miksik, O. & Torr, P.H.S. 2016, 'The Visual Object Tracking VOT2015 Challenge Results', Proceedings of the IEEE International Conference on Computer Vision, pp. 564-586.
© 2015 IEEE. The Visual Object Tracking challenge 2015, VOT2015, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 62 trackers are presented. The number of tested trackers makes VOT 2015 the largest benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the appendix. Features of the VOT2015 challenge that go beyond its VOT2014 predecessor are: (i) a new VOT2015 dataset twice as large as in VOT2014 with full annotation of targets by rotated bounding boxes and per-frame attribute, (ii) extensions of the VOT2014 evaluation methodology by introduction of a new performance measure. The dataset, the evaluation kit as well as the results are publicly available at the challenge website.
Zhang, B., Wang, Z., Tao, D., Hua, X.S. & Feng, D.D. 2016, 'Automatic Preview Frame Selection for Online Videos', 2015 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2015.
© 2015 IEEE. The preview frame of an online video plays a critical role for a user to quickly decide whether to watch the video. However, the preview frames of most online videos such as those shared on social media platforms are either selected heuristically (e.g., the first or middle frame of a video) or manually by users or experienced editors. In this paper, we investigate the challenging automatic preview frame selection task and formulate it as a classification problem. To our best knowledge, this is one of the first attempts on this topic, since most existing key frame selection methods do not explicitly aim for selecting the best representative one only. Considering that a preview frame for an entire video should be informative in the context of the video story, attention catching, and of high visual quality, we propose three types of features to characterize each video frame: informativeness, attention, and aesthetics. Due to the imbalanced nature of training data (i.e., one preview frame only vs thousands of non-preview frames in a video), we utilize random forests to learn the features of preview frames and to classify each frame into preview frame or non-preview frame. In addition, we also increase the number of positive training samples by identifying frames which are visually similar to the preview frame. We evaluated our proposed method both quantitatively and qualitatively with a set of 180 news videos manually collected from the BBC news website. Experimental results indicate that our method is promising. We also investigated the contribution of each visual feature to guide future studies.
Xu, Z., Huang, S., Zhang, Y. & Tao, D. 2016, 'Augmenting strong supervision using web data for fine-grained categorization', Proceedings of the IEEE International Conference on Computer Vision, pp. 2524-2532.
© 2015 IEEE. We propose a new method for fine-grained object recognition that employs part-level annotations and deep convolutional neural networks (CNNs) in a unified framework. Although both schemes have been widely used to boost recognition performance, due to the difficulty in acquiring detailed part annotations, strongly supervised fine-grained datasets are usually too small to keep pace with the rapid evolution of CNN architectures. In this paper, we solve this problem by exploiting inexhaustible web data. The proposed method improves classification accuracy in two ways: more discriminative CNN feature representations are generated using a training set augmented by collecting a large number of part patches from weakly supervised web images, and more robust object classifiers are learned using a multi-instance learning algorithm jointly on the strong and weak datasets. Despite its simplicity, the proposed method delivers a remarkable performance improvement on the CUB200-2011 dataset compared to baseline part-based R-CNN methods, and achieves the highest accuracy on this dataset even in the absence of test image annotations.
Qiao, M., Bian, W., Xu, R.Y.D. & Tao, D. 2016, 'Diversified hidden Markov models for sequential labeling', 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016, pp. 1512-1513.
© 2016 IEEE. Labeling of sequential data is a prevalent metaproblem in a wide range of real world applications. A first-order hidden Markov model (HMM) provides a fundamental approach for sequential labeling. However, it does not show satisfactory performance for real world problems, such as optical character recognition (OCR). Aiming at addressing this problem, important extensions of HMM have been proposed in the literature. One of the common key features in these extensions is the incorporation of proper prior information. In this paper, we propose a new extension of HMM, termed diversified hidden Markov models (dHMM), incorporating a diversity-encouraging prior. The prior is added over the state-transition probabilities and thus facilitates more dynamic sequential labelling. Specifically, the diversity is modeled with a continuous determinantal point process. An EM framework for parameter learning and MAP inference is derived, and empirical evaluation on an OCR dataset verifies its effectiveness.
Luo, Y., Tao, D., Ramamohanarao, K., Xu, C. & Wen, Y. 2016, 'Tensor canonical correlation analysis for multi-view dimension reduction', 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016, pp. 1460-1461.
© 2016 IEEE. Canonical correlation analysis (CCA) has proven an effective tool for two-view dimension reduction due to its profound theoretical foundation and success in practical applications. In respect of multi-view learning, however, it is limited by its capability of only handling data represented by two-view features, while in many real-world applications, the number of views is frequently many more. Although the ad hoc way of simultaneously exploring all possible pairs of features can numerically deal with multi-view data, it ignores the high order statistics (correlation information) which can only be discovered by simultaneously exploring all features. Therefore, in this work, we develop tensor CCA (TCCA) which straightforwardly yet naturally generalizes CCA to handle the data of an arbitrary number of views by analyzing the covariance tensor of the different views. TCCA aims to directly maximize the canonical correlation of multiple (more than two) views. Crucially, we prove that the main problem of multiview canonical correlation maximization is equivalent to finding the best rank-1 approximation of the data covariance tensor, which can be solved efficiently using the well-known alternating least squares (ALS) algorithm. As a consequence, the high order correlation information contained in the different views is explored and thus a more reliable common subspace shared by all features can be obtained.
Zhang, F., Li, J., Li, F., Xu, M., Xu, Y. & He, X. 2015, 'Community detection based on links and node features in social networks', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 21st International Conference on Multimedia Modelling, MMM 2015, Springer, Sydney, Australia, pp. 418-429.
© Springer International Publishing Switzerland 2015. Community detection is a significant but challenging task in the field of social network analysis. Many effective methods have been proposed to solve this problem. However, most of them are mainly based on the topological structure or node attributes. In this paper, based on SPAEM [1], we propose a joint probabilistic model to detect community which combines node attributes and topological structure. In our model, we create a novel feature-based weighted network, within which each edge weight is represented by the node feature similarity between two nodes at the end of the edge. Then we fuse the original network and the created network with a parameter and employ expectation-maximization algorithm (EM) to identify a community. Experiments on a diverse set of data, collected from Facebook and Twitter, demonstrate that our algorithm has achieved promising results compared with other algorithms.
Al-Dmour, H., Ali, N. & Al-Ani, A. 2015, 'An Efficient Hybrid Steganography Method Based on Edge Adaptive and Tree Based Parity Check', MultiMedia Modeling (LNCS), 21st International Conference on MultiMedia Modelling, MMM 2015, Springer, Sydney, Australia, pp. 1-12.
A major requirement for any steganography method is to minimize the changes that are introduced to the cover image by the data embedding process. Since the Human Visual System (HVS) is less sensitive to changes in sharp regions compared to smooth regions, edge adaptive has been proposed to discover edge regions and enhance the quality of the stego image as well as improve the embedding capacity. However, edge adaptive does not apply any coding scheme, and hence its embedding efficiency may not be optimal. In this paper, we propose a method that enhances edge adaptive by incorporating the Tree-Based Parity Check (TBPC) algorithm, which is a well-established coding-based steganography method. This combination not only enables the identification of potential pixels for embedding, but also enhances the embedding efficiency through an efficient coding mechanism. More specifically, the method identifies the embedding locations according to the difference value between every two adjacent pixels that form a block in the cover image, and the number of embedding bits in each block is determined based on the difference between its two pixels. The incorporation of TBPC minimizes the modifications of the cover image, as it changes no more than two bits out of seven pixel bits when embedding four secret bits. Experimental results show that the proposed scheme can achieve both large embedding payload and high embedding efficiency.
Li, Y., Tian, X., Liu, T. & Tao, D. 2015, 'Multi-Task Model and Feature Joint Learning', http://ijcai.org/papers15/contents.php, International Joint Conference on Artificial Intelligence, AAAI Press / International Joint Conferences on Artificial Intelligence, Buenos Aires, Argentina, pp. 3643-3649.
Liu, H., Liu, T., Tao, D., Wu, J. & Yun, F. 2015, 'Spectral Ensemble Clustering', Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Hilton, Sydney, pp. 715-724.
Ensemble clustering, also known as consensus clustering, is emerging as a promising solution for multi-source and/or heterogeneous data clustering. The co-association matrix based method, which redefines the ensemble clustering problem as a classical graph partition problem, is a landmark method in this area. Nevertheless, the relatively high time and space complexity preclude it from real-life large-scale data clustering. We therefore propose SEC, an efficient Spectral Ensemble Clustering method based on co-association matrix. We show that SEC has theoretical equivalence to weighted K-means clustering and results in vastly reduced algorithmic complexity. We then derive the latent consensus function of SEC, which to our best knowledge is among the first to bridge co-association matrix based method to the methods with explicit object functions. The robustness and generalizability of SEC are then investigated to prove the superiority of SEC in theory. We finally extend SEC to meet the challenge rising from incomplete basic partitions, based on which a scheme for big data clustering can be formed. Experimental results on various real-world data sets demonstrate that SEC is an effective and efficient competitor to some state-of-the-art ensemble clustering methods and is also suitable for big data clustering.
Shen, J., Mei, T., Tao, D., Li, X. & Rui, Y. 2015, 'EMIF: Towards a Scalable and Effective Indexing Framework for Large Scale Music Retrieval', Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, ACM on International Conference on Multimedia Retrieval, ACM, Shanghai, China, pp. 543-546.
In this article, we present a novel indexing technique called EMIF (Effective Music Indexing Framework) to facilitate scalable and accurate content based music retrieval. It is designed based on a "classification-and-indexing" principle and consists of two main functionality layers: 1) a novel semantic-sensitive classification to identify input music's category and 2) multiple indexing structures - one local indexing structure corresponds to one semantic category. EMIF's layered architecture not only enables superior search accuracy but also reduces query response time significantly. To evaluate the system, a set of comprehensive experimental studies have been carried out using large test collection and EMIF demonstrates promising performance over state-of-the-art approaches.
Beveridge, J.R., Zhang, H., Draper, B.A., Flynn, P.J., Feng, Z., Huber, P., Kittler, J., Huang, Z., Li, S., Li, Y., Kan, M., Wang, R., Shan, S., Chen, X., Li, H., Hua, G., Struc, V., Krizaj, J., Ding, C., Tao, D. & Phillips, P.J. 2015, 'Report on the FG 2015 Video Person Recognition Evaluation', Proceedings of the 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2015, IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), IEEE, Ljubljana, Slovenia, pp. 1-8.
© 2015 IEEE. This report presents results from the Video Person Recognition Evaluation held in conjunction with the 11th IEEE International Conference on Automatic Face and Gesture Recognition. Two experiments required algorithms to recognize people in videos from the Point-and-Shoot Face Recognition Challenge Problem (PaSC). The first consisted of videos from a tripod mounted high quality video camera. The second contained videos acquired from 5 different handheld video cameras. There were 1401 videos in each experiment of 265 subjects. The subjects, the scenes, and the actions carried out by the people are the same in both experiments. Five groups from around the world participated in the evaluation. The video handheld experiment was included in the International Joint Conference on Biometrics (IJCB) 2014 Handheld Video Face and Person Recognition Competition. The top verification rate from this evaluation is double that of the top performer in the IJCB competition. Analysis shows that the factor most affecting algorithm performance is the combination of location and action: where the video was acquired and what the person was doing.
Wang, Z., Du, B., Zhang, L., Hu, W., Tao, D. & Zhang, L. 2015, 'Batch mode active learning for geographical image classification', Web Technologies and Applications (LNCS), 17th Asia-Pacific Web Conference, Springer, Guangzhou, China, pp. 744-755.
© Springer International Publishing Switzerland 2015. In this paper, we propose a batch mode active learning method for hyperspectral image classification with support vector machines that combines discriminative and representative information. Existing batch mode active learning methods mainly rely on query functions built on two criteria: uncertainty and diversity. These two criteria are generally treated independently of each other, and they cannot ensure that the queried samples are independent and identically distributed. The proposed method focuses on the diversity criterion. First, we derive a novel upper bound on the true risk in the active learning setting and minimize this bound to measure the discriminative information, which is connected with uncertainty. Second, for the representative information, the maximum mean discrepancy (MMD), which captures the structure of the data, is adopted to match the distributions of the labeled samples and the query samples, so that the queried samples have a distribution similar to that of the labeled samples and are guaranteed to be diversified. Meanwhile, the number of newly queried samples is adaptive and depends on the distribution of the labeled samples. In the experiments, we employ two benchmark remote sensing images, Indian Pines and Washington DC. The experimental results demonstrate the effectiveness of the proposed method compared with state-of-the-art active learning (AL) methods.
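As a rough illustration of the distribution-matching idea in the abstract above, the following sketch computes a biased estimate of the squared maximum mean discrepancy between two sample sets. The Gaussian kernel, the bandwidth `gamma`, the function names and the synthetic data are all assumptions made for illustration; none of them is taken from the paper, and this is not the authors' query function.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def mmd_squared(x, y, gamma=1.0):
    """Biased estimate of MMD^2 between sample sets x and y (rows are samples)."""
    return (
        rbf_kernel(x, x, gamma).mean()
        + rbf_kernel(y, y, gamma).mean()
        - 2.0 * rbf_kernel(x, y, gamma).mean()
    )

# Hypothetical data: one candidate batch drawn like the labeled set, one shifted.
rng = np.random.default_rng(0)
labeled = rng.normal(0.0, 1.0, size=(200, 2))
similar_batch = rng.normal(0.0, 1.0, size=(200, 2))
shifted_batch = rng.normal(3.0, 1.0, size=(200, 2))

mmd_similar = mmd_squared(labeled, similar_batch)
mmd_shifted = mmd_squared(labeled, shifted_batch)
print(mmd_similar, mmd_shifted)
```

A batch whose distribution resembles the labeled set yields a smaller value than a shifted batch, which is the property a distribution-matching criterion relies on.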
Song, D., Meyer, D.A. & Tao, D. 2015, 'Efficient latent link recommendation in signed networks', Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1105-1114.
He, X., Luo, S., Tao, D., Xu, C., Yang, J. & Abul Hasan, M. 2015, 'MultiMedia Modeling: 21st International Conference, MMM 2015 Sydney, NSW, Australia, January 5-7, 2015 Proceedings, Part II', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).
Xu, C., Tao, D. & Xu, C. 2015, 'Multi-view self-paced learning for clustering', Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI International Joint Conference on Artificial Intelligence, ACM, Buenos Aires, Argentina, pp. 3974-3980.
Exploiting the information from multiple views can improve clustering accuracy. However, most existing multi-view clustering algorithms are non-convex and are thus prone to becoming stuck in bad local minima, especially when there are outliers and missing data. To overcome this problem, we present a new multi-view self-paced learning (MSPL) algorithm for clustering that learns the multi-view model by progressing not only from 'easy' to 'complex' examples, but also from 'easy' to 'complex' views. Instead of separating the examples or views into 'easy' and 'complex' in a binary fashion, we design a novel probabilistic smoothed weighting scheme. Employing multiple views for clustering and defining complexity across both examples and views are shown theoretically to be beneficial to optimal clustering. Experimental results on toy and real-world data demonstrate the efficacy of the proposed algorithm.
Deng, C., Lv, Z., Liu, W., Huang, J., Tao, D. & Gao, X. 2015, 'Multi-view matrix decomposition: A new scheme for exploring discriminative information', Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI International Joint Conference on Artificial Intelligence, AAAI Press, Buenos Aires, Argentina, pp. 3438-3444.
Recent studies have demonstrated the advantages of fusing information from multiple views for various machine learning applications. However, most existing approaches assumed the shared component common to all views and ignored the private components of individual views, which thereby restricts the learning performance. In this paper, we propose a new multi-view, low-rank, and sparse matrix decomposition scheme to seamlessly integrate diverse yet complementary information stemming from multiple views. Unlike previous approaches, our approach decomposes an input data matrix concatenated from multiple views as the sum of low-rank, sparse, and noisy parts. Then a unified optimization framework is established, where the low-rankness and group-structured sparsity constraints are imposed to simultaneously capture the shared and private components in both instance and view levels. A proven optimization algorithm is developed to solve the optimization, yielding the learned augmented representation which is used as features for classification tasks. Extensive experiments conducted on six benchmark image datasets show that our approach enjoys superior performance over the state-of-the-art approaches.
Song, D., Liu, W., Meyer, D.A., Tao, D. & Ji, R. 2015, 'Rank Preserving Hashing for Rapid Image Search', Data Compression Conference Proceedings, 2015 Data Compression Conference (DCC), IEEE, Snowbird, USA, pp. 353-362.
© 2015 IEEE. In recent years, hashing techniques have become overwhelmingly popular for their high efficiency in handling large-scale computer vision applications. It has been shown that hashing techniques which leverage supervised information can significantly enhance performance, and thus greatly benefit visual search tasks. Typically, a modern hashing method uses a set of hash functions to compress data samples into compact binary codes. However, few methods have developed hash functions to optimize the precision at the top of a ranking list based upon Hamming distances. In this paper, we propose a novel supervised hashing approach, namely Rank Preserving Hashing (RPH), to explicitly optimize the precision of Hamming distance ranking towards preserving the supervised rank information. The core idea is to train disciplined hash functions in which the mistakes at the top of a Hamming-distance ranking list are penalized more than those at the bottom. To find such hash functions, we relax the original discrete optimization objective to a continuous surrogate, and then design an online learning algorithm to efficiently optimize the surrogate objective. Empirical studies based upon two benchmark image datasets demonstrate that the proposed hashing approach achieves superior image search accuracy over the state-of-the-art approaches.
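The quantity this abstract targets, precision at the top of a Hamming-distance ranking list, can be illustrated with a minimal sketch. The binary codes, relevance labels and function names below are hypothetical; the sketch only shows how a ranking is produced from compact codes and scored, not how RPH learns its hash functions.

```python
import numpy as np

def hamming_rank(query_code, database_codes):
    """Rank database items by Hamming distance to the query (closest first)."""
    distances = np.count_nonzero(database_codes != query_code, axis=1)
    # A stable sort keeps the original order among items at equal distance.
    order = np.argsort(distances, kind="stable")
    return order, distances

def precision_at_k(order, relevant, k):
    """Fraction of the top-k ranked items that are marked relevant."""
    return float(np.mean(relevant[order[:k]]))

# Hypothetical 8-bit codes for five database images.
database_codes = np.array([
    [0, 1, 1, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0, 0, 0, 0],
], dtype=np.uint8)
query_code = np.array([0, 1, 1, 0, 1, 0, 0, 1], dtype=np.uint8)
relevant = np.array([True, True, False, True, False])

order, distances = hamming_rank(query_code, database_codes)
print(order, distances, precision_at_k(order, relevant, k=3))
```

Because many items share the same small integer distance, ties near the top of the list are common, which is one reason the errors at the top deserve heavier penalties than those at the bottom.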
Tan, H., Zhang, X., Guan, N., Tao, D., Huang, X. & Luo, Z. 2015, 'Two-dimensional Euler PCA for face recognition', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), International Conference on MultiMedia Modeling, Springer, Sydney, Australia, pp. 548-559.
© Springer International Publishing Switzerland 2015. Principal component analysis (PCA) projects data onto the directions of maximal variance. Since PCA is quite effective in dimension reduction, it has been widely used in computer vision. However, conventional PCA suffers from the following deficiencies: 1) it incurs a high computational cost when handling high-dimensional data, and 2) it cannot reveal the nonlinear relationships among different features of the data. To overcome these deficiencies, this paper proposes an efficient two-dimensional Euler PCA (2D-ePCA) algorithm. In particular, 2D-ePCA learns the projection matrix on the 2D pixel matrix of each image without reshaping it into a long 1D vector, and uncovers nonlinear relationships among features by mapping the data onto a complex representation. Since such a 2D complex representation induces a much smaller kernel matrix and smaller principal subspaces, 2D-ePCA incurs much less computational overhead than Euler PCA on large-scale datasets. Experimental results on popular face datasets show that 2D-ePCA outperforms the representative algorithms in terms of accuracy, computational overhead, and robustness.
Wu, S., Zhang, X., Guan, N., Tao, D., Huang, X. & Luo, Z. 2015, 'Non-negative low-rank and group-sparse matrix factorization', MultiMedia Modeling (LNCS), 21st International Conference on Multimedia Modelling, Springer, Sydney, Australia, pp. 536-547.
© Springer International Publishing Switzerland 2015. Non-negative matrix factorization (NMF) has been a popular data analysis tool and has been widely applied in computer vision. However, conventional NMF methods cannot adaptively learn grouping structure from a dataset. This paper proposes a non-negative low-rank and group-sparse matrix factorization (NLRGS) method to overcome this deficiency. In particular, NLRGS captures the relationships among examples by constraining the rank of the coefficients, while identifying the grouping structure via group sparsity regularization. With both constraints, NLRGS boosts NMF in both classification and clustering. However, NLRGS is difficult to optimize because it must deal with the low-rank constraint. To relax this hard constraint, we approximate the low-rank constraint with the nuclear norm and then develop an optimization algorithm for NLRGS within the framework of the augmented Lagrangian method (ALM). Experimental results for both face recognition and clustering on four popular face datasets quantitatively demonstrate the effectiveness of NLRGS.
Guan, N., Tao, D., Lan, L., Luo, Z. & Yang, X. 2015, 'Activity recognition in still images with transductive non-negative matrix factorization', Computer Vision - ECCV 2014 Workshops (LNCS), 13th European Conference on Computer Vision Workshops, Springer, Zurich, Switzerland, pp. 802-817.
© Springer International Publishing Switzerland 2015. Still-image-based activity recognition is a challenging problem due to changes in the appearance of persons, articulation in poses, cluttered backgrounds, and the absence of temporal features. In this paper, we propose a novel method to recognize activities from still images based on transductive non-negative matrix factorization (TNMF). TNMF clusters the visual descriptors of each human action in the training images into a fixed number of groups, while learning to represent the visual descriptor of the test image on the concatenated bases. Since TNMF learns these bases on both the training images and the test image simultaneously, it learns a more discriminative representation than standard NMF-based methods. We develop a multiplicative update rule to solve TNMF and prove its convergence. Experimental results on both laboratory and real-world datasets demonstrate that TNMF consistently outperforms NMF.
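For readers unfamiliar with multiplicative update rules, the sketch below shows the classical Lee-Seung updates for plain NMF under the Frobenius-norm objective. It is the standard baseline, not the TNMF rule developed in the paper; the function name, the small constant `eps` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def nmf_multiplicative(v, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix v as w @ h using multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = v.shape
    w = rng.random((n_rows, rank)) + eps
    h = rng.random((rank, n_cols)) + eps
    errors = []
    for _ in range(n_iter):
        # Each factor is rescaled element-wise by a non-negative ratio,
        # so non-negativity is preserved without any projection step.
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
        errors.append(np.linalg.norm(v - w @ h))
    return w, h, errors

# Hypothetical non-negative data with an exact rank-4 structure.
rng = np.random.default_rng(1)
v = rng.random((20, 4)) @ rng.random((4, 30))
w, h, errors = nmf_multiplicative(v, rank=4)
print(errors[0], errors[-1])
```

The appeal of this family of rules is that there is no step size to tune and the factors can never become negative.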
Xu, C., Tao, D. & Xu, C. 2015, 'Large-margin multi-label causal feature learning', Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, AAAI, Austin, USA, pp. 1924-1930.
© Copyright 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. In multi-label learning, an example is represented by a descriptive feature associated with several labels. Simply considering labels as independent or correlated is crude; it would be beneficial to define and exploit the causality between multiple labels. For example, an image label 'lake' implies the label 'water', but not vice versa. Since the original features are a disorderly mixture of the properties originating from different labels, it is intuitive to factorize these raw features to clearly represent each individual label and its causality relationship. Following the large-margin principle, we propose an effective approach to discover the causal features of multiple labels, thus revealing the causality between labels from the perspective of features. We show theoretically that the proposed approach is a tight approximation of the empirical multi-label classification error, and the causality revealed strengthens the consistency of the algorithm. Extensive experiments using synthetic and real-world data demonstrate that the proposed algorithm effectively discovers label causality, generates causal features, and improves multi-label learning.
Yin, G., An, L., Gao, X. & Tao, D. 2015, 'Feature regions based on graph optimization for robust reversible watermarking', Proceedings - International Conference on Image Processing, ICIP, 2015 IEEE International Conference on Image Processing (ICIP), IEEE, Piscataway, USA, pp. 4758-4762.
© 2015 IEEE. Recently, robust reversible watermarking (RRW) has attracted increasing interest, and researchers are seeking more stable image features with which to design watermark embedding and extraction models for protecting local image regions. Previous studies show that constructing local feature regions (FRs) for RRW is a promising way to handle this problem. However, selecting non-overlapping FRs and evaluating FR stability for RRW remain challenging. To address this issue, we first construct an undirected weighted graph based on the FR distribution pattern, then formulate non-overlapping FR selection as a weighted maximal clique problem and develop a maximum gain-cost ratio algorithm to find an approximately optimal solution. Furthermore, we design reasonable metrics to evaluate FR stability in terms of FR locations and local image content. Extensive experiments demonstrate the effectiveness and efficiency of our work.
Zhang, W., Guan, N., Tao, D., Mao, B., Huang, X. & Luo, Z. 2015, 'Correntropy supervised non-negative matrix factorization', Proceedings of the International Joint Conference on Neural Networks, 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, Piscataway, USA, pp. 1-8.
© 2015 IEEE. Non-negative matrix factorization (NMF) is a powerful dimension reduction method and has been widely used in many pattern recognition and computer vision problems. However, conventional NMF methods are neither robust, because their loss functions are sensitive to outliers, nor discriminative, because they completely ignore the labels in a dataset. In this paper, we propose a correntropy supervised NMF (CSNMF) to overcome both deficiencies simultaneously. In particular, CSNMF maximizes the correntropy between the data matrix and its reconstruction in the low-dimensional space to inhibit outliers while learning the subspace, and minimizes the distances between the coefficients of any two samples with the same class label to enhance the subsequent classification performance. To solve CSNMF, we develop a multiplicative update rule and theoretically prove its convergence. Experimental results on popular face image datasets verify the effectiveness of CSNMF compared with NMF, its supervised variants, and its robustified variants.
Liu, M., Luo, Y., Tao, D., Xu, C. & Wen, Y. 2015, 'Low-rank multi-view learning in matrix completion for multi-label image classification', Proceedings of the National Conference on Artificial Intelligence, Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI, Austin, USA, pp. 2778-2784.
Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. Multi-label image classification is of significant interest due to its major role in real-world web image analysis applications such as large-scale image retrieval and browsing. Recently, matrix completion (MC) has been developed to deal with multi-label classification tasks. MC has distinct advantages, such as robustness to missing entries in the feature and label spaces and a natural ability to handle multi-label problems. However, current MC-based multi-label image classification methods only consider data represented by a single-view feature and therefore do not precisely characterize images that contain several semantic concepts. An intuitive way to utilize multiple features taken from different views is to concatenate the different features into a long vector; however, this concatenation is prone to over-fitting and leads to high time complexity in MC-based image classification. Therefore, we present a novel multi-view learning model for MC-based image classification, called low-rank multi-view matrix completion (lrMMC), which first seeks a low-dimensional common representation of all views by utilizing the proposed low-rank multi-view learning (lrMVL) algorithm. In lrMVL, the common subspace is constrained to be low rank so that it is suitable for MC. In addition, combination weights are learned to explore complementarity between different views. An efficient solver based on fixed-point continuation (FPC) is developed for optimization, and the learned low-rank representation is then incorporated into MC-based image classification. Extensive experimentation on the challenging PASCAL VOC'07 dataset demonstrates the superiority of lrMMC compared to other multi-label image classification approaches.
Fang, M. & Tao, D. 2015, 'Active multi-task learning via bandits', SIAM International Conference on Data Mining 2015, SDM 2015, 2015 SIAM International Conference on Data Mining, SIAM, Vancouver, Canada, pp. 505-513.
Liu, H., Wu, J., Tao, D., Zhang, Y. & Fu, Y. 2015, 'DIAS: A disassemble-assemble framework for highly sparse text clustering', Proceedings of the 2015 SIAM International Conference on Data Mining, 2015 SIAM International Conference on Data Mining, SIAM, Vancouver, Canada, pp. 766-774.
Copyright © SIAM. Despite extensive study, text clustering remains a critical challenge in the data mining community. Although various techniques have been proposed to overcome some of these challenges, problems still exist when dealing with weakly related or even noisy features. In response, we propose a DIsassemble-ASsemble (DIAS) framework for text clustering. DIAS employs simple random feature sampling to disassemble high-dimensional text data and gain diverse structural knowledge. This also helps to avoid the bulk of the noisy features. The multi-view knowledge is then assembled by weighted Information-theoretic Consensus Clustering (ICC) in order to gain a high-quality consensus partitioning. Extensive experiments on eight real-world text data sets demonstrate the advantages of DIAS over other widely used methods. In particular, DIAS shows strengths in learning from very weak basic partitionings. In addition, its natural suitability for distributed computing makes DIAS a promising candidate for big text clustering.
Hong, Z., Chen, Z., Wang, C., Mei, X., Prokhorov, D. & Tao, D. 2015, 'MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking', Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Boston, Massachusetts, USA, pp. 749-758.
© 2015 IEEE. Variations in the appearance of a tracked object, such as changes in geometry/photometry, camera viewpoint, illumination, or partial occlusion, pose a major challenge to object tracking. Here, we adopt cognitive psychology principles to design a flexible representation that can adapt to changes in object appearance during tracking. Inspired by the well-known Atkinson-Shiffrin Memory Model, we propose MUlti-Store Tracker (MUSTer), a dual-component approach consisting of short- and long-term memory stores to process target appearance memories. A powerful and efficient Integrated Correlation Filter (ICF) is employed in the short-term store for short-term tracking. The integrated long-term component, which is based on keypoint matching-tracking and RANSAC estimation, can interact with the long-term memory and provide additional information for output control. MUSTer was extensively evaluated on the CVPR2013 Online Object Tracking Benchmark (OOTB) and ALOV++ datasets. The experimental results demonstrated the superior performance of MUSTer in comparison with other state-of-the-art trackers.
Gong, D., Li, Z., Tao, D., Liu, J. & Li, X. 2015, 'A maximum entropy feature descriptor for age invariant face recognition', Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 5289-5297.
© 2015 IEEE. In this paper, we propose a new approach to overcome the representation and matching problems in age invariant face recognition. First, a new maximum entropy feature descriptor (MEFD) is developed that encodes the microstructure of facial images into a set of discrete codes in terms of maximum entropy. By densely sampling the encoded face image, sufficient discriminatory and expressive information can be extracted for further analysis. A new matching method is also developed, called identity factor analysis (IFA), to estimate the probability that two faces have the same underlying identity. The effectiveness of the framework is confirmed by extensive experimentation on two face aging datasets, MORPH (the largest public-domain face aging dataset) and FGNET. We also conduct experiments on the famous LFW dataset to demonstrate the excellent generalizability of our new approach.
Gong, C., Tao, D., Liu, W., Maybank, S.J., Fang, M., Fu, K. & Yang, J. 2015, 'Saliency propagation from simple to difficult', Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2531-2539.
© 2015 IEEE. Saliency propagation has been widely adopted for identifying the most attractive object in an image. The propagation sequence generated by existing saliency detection methods is governed by the spatial relationships of image regions, i.e., the saliency value is transmitted between two adjacent regions. However, for inhomogeneous and difficult adjacent regions, such a sequence may incur wrong propagations. In this paper, we attempt to manipulate the propagation sequence to optimize the propagation quality. Intuitively, we postpone the propagations to difficult regions and meanwhile advance the propagations to less ambiguous simple regions. Inspired by theoretical results in educational psychology, a novel propagation algorithm employing the teaching-to-learn and learning-to-teach strategies is proposed to explicitly improve the propagation quality. In the teaching-to-learn step, a teacher is designed to arrange the regions from simple to difficult and then assign the simplest regions to the learner. In the learning-to-teach step, the learner delivers its learning confidence to the teacher to help the teacher choose the subsequent simple regions. Due to the interactions between the teacher and learner, the uncertainty of the originally difficult regions is gradually reduced, yielding manifest salient objects with optimized background suppression. Extensive experimental results on benchmark saliency datasets demonstrate the superiority of the proposed algorithm over twelve representative saliency detectors.
Xiao, S., Li, W., Xu, D. & Tao, D. 2015, 'FaLRR: A fast low rank representation solver', Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 4612-4620.
© 2015 IEEE. Low rank representation (LRR) has shown promising performance for various computer vision applications such as face clustering. Existing algorithms for solving LRR usually depend on its two-variable formulation which contains the original data matrix. In this paper, we develop a fast LRR solver called FaLRR, by reformulating LRR as a new optimization problem with regard to factorized data (which is obtained by skinny SVD of the original data matrix). The new formulation benefits the corresponding optimization and theoretical analysis. Specifically, to solve the resultant optimization problem, we propose a new algorithm which is not only efficient but also theoretically guaranteed to obtain a globally optimal solution. Regarding the theoretical analysis, the new formulation is helpful for deriving some interesting properties of LRR. Last but not least, the proposed algorithm can be readily incorporated into an existing distributed framework of LRR for further acceleration. Extensive experiments on synthetic and real-world datasets demonstrate that our FaLRR achieves order-of-magnitude speedup over existing LRR solvers, and the efficiency can be further improved by incorporating our algorithm into the distributed framework of LRR.
Gong, M., Zhang, K., Schölkopf, B., Tao, D. & Geiger, P. 2015, 'Discovering temporal causal relations from subsampled data', Proceedings of The 32nd International Conference on Machine Learning, International Conference on Machine Learning, IMLS, Lille Grand Palais, pp. 1898-1906.
© Copyright 2015 by International Machine Learning Society (IMLS). All rights reserved. Granger causal analysis has been an important tool for causal analysis for time series in various fields, including neuroscience and economics, and recently it has been extended to include instantaneous effects between the time series to explain the contemporaneous dependence in the residuals. In this paper, we assume that the time series at the true causal frequency follow the vector autoregressive model. We show that when the data resolution becomes lower due to subsampling, neither the original Granger causal analysis nor the extended one is able to discover the underlying causal relations. We then aim to answer the following question: can we estimate the temporal causal relations at the right causal frequency from the subsampled data? Traditionally this suffers from the identifiability problems: under the Gaussianity assumption of the data, the solutions are generally not unique. We prove that, however, if the noise terms are non-Gaussian, the underlying model for the high-frequency data is identifiable from subsampled data under mild conditions. We then propose an Expectation-Maximization (EM) approach and a variational inference approach to recover temporal causal relations from such subsampled data. Experimental results on both simulated and real data are reported to illustrate the performance of the proposed approaches.
Liu, T. & Tao, D. 2014, 'On the Robustness and Generalization of Cauchy Regression', 2014 4th IEEE International Conference on Information Science and Technology (ICIST), IEEE, Shenzhen, China, pp. 101-106.
It was recently highlighted in a special issue of Nature [1] that the value of big data has yet to be effectively exploited for innovation, competition and productivity. To realize the full potential of big data, big learning algorithms need to be developed to keep pace with the continuous creation, storage and sharing of data. Least squares (LS) and least absolute deviation (LAD) have been successful regression tools used in business, government and society over the past few decades. However, these existing technologies are severely limited by noisy data because their breakdown points are both zero, i.e., they do not tolerate outliers. By appropriately setting the tuning constant of Cauchy regression (CR), the maximum possible value (50%) of the breakdown point can be attained. CR therefore has the capability to learn a robust model from noisy big data. Although the breakdown point of CR has already been comprehensively analyzed theoretically, we propose a new approach that interprets the optimization of an objective function as a sample-weighted procedure. This clearly shows the differences in robustness among LS, LAD and CR. We also study the statistical performance of CR. This study derives the generalization error bounds for CR by analyzing the covering number and Rademacher complexity of the hypothesis class, and shows how the scale parameter affects its performance.
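The sample-weighted reading of robust regression mentioned in the abstract can be sketched with iteratively reweighted least squares (IRLS). This is a generic textbook scheme and not necessarily the analysis used in the paper: the Cauchy loss (c²/2)·log(1 + (r/c)²) gives each sample the weight 1/(1 + (r/c)²), so large residuals are progressively ignored, whereas LS weights every sample equally. The function name, the constant `c`, the assumption of unit residual scale and the synthetic data are all illustrative.

```python
import numpy as np

def cauchy_irls(x, y, c=2.385, n_iter=50):
    """Fit y ~ slope * x + intercept under the Cauchy loss via IRLS.

    c is an assumed tuning constant; residuals are taken to have unit scale.
    """
    design = np.column_stack([x, np.ones_like(x)])
    # Start from the ordinary least-squares solution.
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    weights = np.ones_like(y)
    for _ in range(n_iter):
        residuals = y - design @ coef
        # Cauchy weights: close to 1 for small residuals, close to 0 for outliers.
        weights = 1.0 / (1.0 + (residuals / c) ** 2)
        root_w = np.sqrt(weights)
        coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return coef, weights

# Hypothetical data: a clean line y = 2x + 1 with every fifth point corrupted.
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0
y[::5] += 50.0

ls_coef, *_ = np.linalg.lstsq(np.column_stack([x, np.ones_like(x)]), y, rcond=None)
cr_coef, cr_weights = cauchy_irls(x, y)
print("LS (slope, intercept):", ls_coef)
print("CR (slope, intercept):", cr_coef)
```

Inspecting `cr_weights` makes the robustness visible: the corrupted samples end up with weights near zero, while the clean samples keep weights near one.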
Hong, Z., Wang, C., Mei, X., Prokhorov, D. & Tao, D. 2014, 'Tracking Using Multilevel Quantizations', Computer Vision – ECCV 2014, European Conference on Computer Vision, Springer, Switzerland, pp. 155-171.
Most object tracking methods only exploit a single quantization of an image space: pixels, superpixels, or bounding boxes, each of which has advantages and disadvantages. It is highly unlikely that a common optimal quantization level, suitable for tracking all objects in all environments, exists. We therefore propose a hierarchical appearance representation model for tracking, based on a graphical model that exploits shared information across multiple quantization levels. The tracker aims to find the most probable position of the target by jointly classifying the pixels and superpixels and obtaining the best configuration across all levels. The motion of the bounding box is taken into consideration, while Online Random Forests are used to provide pixel- and superpixel-level quantizations and are progressively updated on-the-fly. By appropriately considering the multilevel quantizations, our tracker exhibits not only excellent performance in handling non-rigid object deformation, but also robustness to occlusions. A quantitative evaluation is conducted on two benchmark datasets: a non-rigid object tracking dataset (11 sequences) and the CVPR2013 tracking benchmark (50 sequences). Experimental results show that our tracker overcomes various tracking challenges and is superior to a number of other popular tracking methods.
Fang, M. & Tao, D. 2014, 'Networked bandits with disjoint linear payoffs', Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1106-1115.
In this paper, we study 'networked bandits', a new bandit problem where a set of interrelated arms varies over time and, given the contextual information that selects one arm, invokes other correlated arms. This problem remains under-investigated, in spite of its applicability to many practical problems. For instance, in social networks, an arm can obtain payoffs from both the selected user and its relations since they often share the content through the network. We examine whether it is possible to obtain multiple payoffs from several correlated arms based on the relationships. In particular, we formalize the networked bandit problem and propose an algorithm that considers not only the selected arm, but also the relationships between arms. Our algorithm follows the 'optimism in the face of uncertainty' principle, in that it selects an arm depending on integrated confidence sets constructed from historical data. We analyze the performance in simulation experiments and on two real-world offline datasets. The experimental results demonstrate our algorithm's effectiveness in the networked bandit setting. © 2014 ACM.
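As a rough illustration of the 'optimism in the face of uncertainty' idea with disjoint linear payoffs, the sketch below implements a plain LinUCB-style learner with one ridge-regression model per arm. It is a swapped-in, simplified technique: the relationships between arms that define the networked setting are not modelled, and the class name `DisjointLinUCB`, the parameter `alpha` and the toy rewards are assumptions made for this example.

```python
import numpy as np

class DisjointLinUCB:
    """Per-arm ridge regression with an upper-confidence bonus (sketch)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha                                  # exploration strength
        self.A = [np.eye(dim) for _ in range(n_arms)]       # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]     # per-arm reward sums

    def scores(self, x):
        out = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                               # ridge estimate
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)     # confidence width
            out.append(theta @ x + bonus)                   # optimistic payoff
        return np.array(out)

    def select(self, x):
        return int(np.argmax(self.scores(x)))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy run (assumed rewards): arm 1 pays 0.9, arm 0 pays 0.2, constant context.
bandit = DisjointLinUCB(n_arms=2, dim=1, alpha=1.0)
true_reward = [0.2, 0.9]
context = np.array([1.0])
pulls = np.zeros(2, dtype=int)
for _ in range(200):
    arm = bandit.select(context)
    bandit.update(arm, context, true_reward[arm])
    pulls[arm] += 1
```

The confidence bonus shrinks as an arm is pulled, so the learner tries each arm early and then concentrates on the one with the higher estimated payoff.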
Guan, N., Lan, L., Tao, D., Luo, Z. & Yang, X. 2014, 'Transductive nonnegative matrix factorization for semi-supervised high-performance speech separation', ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 2534-2538.
Owing to the non-negativity of the magnitude spectrogram of speech signals, nonnegative matrix factorization (NMF) has obtained promising performance for speech separation by independently learning a dictionary on the speech signals of each known speaker. However, traditional NMF fails to represent the mixture signals accurately because the dictionaries for speakers are learned in the absence of mixture signals. In this paper, we propose a new transductive NMF algorithm (TNMF) to jointly learn a dictionary on both the speech signals of each speaker and the mixture signals to be separated. Since TNMF, by encoding the mixture signals, learns a more descriptive dictionary than that learned by NMF, it significantly boosts the separation performance. Experimental results on the popular TIMIT dataset show that the proposed TNMF-based methods outperform traditional NMF-based methods for separating the monophonic mixtures of speech signals of known speakers. © 2014 IEEE.
Zhou, T. & Tao, D. 2014, 'Multi-task copula by sparse graph regression', Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 771-780.
This paper proposes multi-task copula (MTC), which can handle a much wider class of tasks than the mean regression with Gaussian noise assumed in most earlier multi-task learning (MTL). While earlier MTL emphasizes shared structure among models, MTC aims at joint prediction to exploit inter-output correlation. Given the input, the outputs of MTC are allowed to follow an arbitrary joint continuous distribution. MTC captures the joint likelihood of multiple outputs by first learning the marginal of each output and then a sparse and smooth output dependency graph function. While the former can be achieved by classical MTL, learning graphs that vary dynamically with the input is quite a challenge. We address this issue by developing sparse graph regression (SpaGraphR), a non-parametric estimator incorporating kernel smoothing, maximum likelihood, and sparse graph structure to obtain a fast learning algorithm. It starts from a few seed graphs on a few input points, and then updates the graphs on other input points by a fast operator via coarse-to-fine propagation. Due to the power of copula in modeling semi-parametric distributions, SpaGraphR can model a rich class of dynamic non-Gaussian correlations. We show that MTC can address more flexible and difficult tasks that do not fit the assumptions of earlier MTL well, and can fully exploit their relatedness. Experiments on robotic control and stock price prediction justify its appealing performance in challenging MTL problems. © 2014 ACM.
Zhang, X., Guan, N., Lan, L., Tao, D. & Luo, Z. 2014, 'Box-constrained projective nonnegative matrix factorization via augmented Lagrangian method', Proceedings of the International Joint Conference on Neural Networks, pp. 1900-1906.
© 2014 IEEE. Projective non-negative matrix factorization (PNMF) projects a set of examples onto a subspace spanned by a non-negative basis whose transpose is regarded as the projection matrix. Since PNMF learns a natural parts-based representation, it has been successfully used in text mining and pattern recognition. However, it is non-trivial to analyze the convergence of the optimization algorithms for PNMF because its objective function is non-convex. In this paper, we propose a Box-constrained PNMF (BPNMF) method to overcome this deficiency of PNMF. In particular, BPNMF introduces an auxiliary variable, i.e., the coefficients of examples, and incorporates the following two types of constraints: 1) each entry of the basis is non-negative and upper-bounded, i.e., box-constrained, and 2) the coefficients equal the projected points of the examples. The box constraint keeps the basis bounded and the equality constraint keeps BPNMF equivalent to PNMF. Similar to PNMF, BPNMF is difficult because the objective function is non-convex. To solve BPNMF, we developed an efficient algorithm in the framework of the augmented Lagrangian multiplier (ALM) method and proved that the ALM-based algorithm converges to local minima. Experimental results on two face image datasets demonstrate the effectiveness of BPNMF compared with the representative methods.
Lan, L., Guan, N., Zhang, X., Tao, D. & Luo, Z. 2014, 'Soft-constrained nonnegative matrix factorization via normalization', Proceedings of the International Joint Conference on Neural Networks, pp. 3025-3030.
© 2014 IEEE. Semi-supervised clustering aims at boosting the clustering performance on unlabeled samples by using labels from a few labeled samples. Constrained NMF (CNMF) is one of the most significant semi-supervised clustering methods, and it factorizes the whole dataset by NMF and constrains those labeled samples from the same class to have identical encodings. In this paper, we propose a novel soft-constrained NMF (SCNMF) method by softening the hard constraint in CNMF. In particular, SCNMF factorizes the whole dataset into two lower-dimensional factor matrices by using the multiplicative update rule (MUR). To utilize the labels of labeled samples, SCNMF iteratively normalizes both factor matrices after updating them with MURs to make encodings of labeled samples close to their label vectors. It is therefore reasonable to believe that encodings of unlabeled samples are also close to their corresponding label vectors. This strategy significantly boosts the clustering performance even when the labeled samples are rather limited, e.g., each class owns only a single labeled sample. Since the normalization procedure never increases the computational complexity of MUR, SCNMF is quite efficient and effective in practice. Experimental results on face image datasets illustrate both efficiency and effectiveness of SCNMF compared with both NMF and CNMF.
Mu, Y., Ding, W., Lo, H.Z. & Tao, D. 2014, 'Face recognition from multiple images per subject', MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia, pp. 889-892.
For face recognition, we show that knowing that each subject corresponds to multiple face images can improve classification performance. For domains such as video surveillance, it is easy to deduce which group of images belong to the same subject; in domains such as family album identification, we lose group membership information but there is still a group of images for each subject. We define these two types of problems as multiple faces per subject. In this paper, we propose a Bipart framework to take advantage of this group information in the testing set as well as in the training set. From these two sources of information, two models are learned independently and combined to form a unified discriminative distance space. Furthermore, this framework is generalized to allow both subspace learning and distance metric learning methods to take advantage of this group information. Bipart is evaluated on the multiple faces per subject problem using several benchmark datasets, including video and static image data, subjects of various ages, various lighting conditions, and many facial expressions. Comparisons against state-of-the-art distance and subspace learning methods demonstrate much better performance when utilizing group information with the Bipart framework.
Zuo, Z., Luo, Y., Tao, D. & Xu, C. 2014, 'Multi-view multi-task feature extraction for web image classification', MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia, pp. 1137-1140.
The features used in many multimedia analysis-based applications are frequently of very high dimension. Feature extraction offers several advantages in high-dimensional cases, and many recent studies have used multi-task feature extraction approaches, which often outperform single-task feature extraction approaches. However, most of these methods are limited in that they only consider data represented by a single type of feature, even though features usually represent images from multiple views. We therefore propose a novel multi-view multi-task feature extraction (MVMTFE) framework for handling multi-view features for image classification. In particular, MVMTFE simultaneously learns the feature extraction matrix for each view and the view combination coefficients. In this way, MVMTFE not only handles correlated and noisy features, but also utilizes the complementarity of different views to further help reduce feature redundancy in each view. An alternating algorithm is developed for problem optimization and each sub-problem can be efficiently solved. Experiments on a real-world web image dataset demonstrate the effectiveness and superiority of the proposed method.
Mao, B., Guan, N., Tao, D., Huang, X. & Luo, Z. 2014, 'Correntropy induced metric based graph regularized non-negative matrix factorization', Proceedings 2014 IEEE International Conference on Security, Pattern Analysis, and Cybernetics, SPAC 2014, pp. 163-168.
© 2014 IEEE. Non-negative matrix factorization (NMF) is an efficient dimension reduction method and plays an important role in many pattern recognition and computer vision tasks. However, conventional NMF methods are not robust since the objective functions are sensitive to outliers and do not consider the geometric structure in datasets. In this paper, we propose a correntropy graph regularized NMF (CGNMF) to overcome the aforementioned problems. CGNMF maximizes the correntropy between the data matrix and its reconstruction to filter out noise of large magnitude, and expects the coefficients to preserve the intrinsic geometric structure of the data. We also propose a modified version of CGNMF which constructs the adjacency graph by using sparse representation to enhance its reliability. Experimental results on popular image datasets confirm the effectiveness of CGNMF.
Xu, Z., Tao, D., Zhang, Y., Wu, J. & Tsoi, A.C. 2014, 'Architectural Style Classification Using Multinomial Latent Logistic Regression', 13th European Conference on Computer Vision, Proceedings, Part I, 13th European Conference on Computer Vision, Springer, Zurich, Switzerland, pp. 600-615.
Architectural style classification differs from standard classification tasks due to the rich inter-class relationships between different styles, such as re-interpretation, revival, and territoriality. In this paper, we adopt Deformable Part-based Models (DPM) to capture the morphological characteristics of basic architectural components and propose Multinomial Latent Logistic Regression (MLLR) that introduces the probabilistic analysis and tackles the multi-class problem in latent variable models. Due to the lack of publicly available datasets, we release a new large-scale architectural style dataset containing twenty-five classes. Experimentation on this dataset shows that MLLR in combination with standard global image features, obtains the best classification results. We also present interpretable probabilistic explanations for the results, such as the styles of individual buildings and a style relationship network, to illustrate inter-class relationships.
Cheng, J., Duan, L., Wong, D.W.K., Tao, D., Akiba, M. & Liu, J. 2014, 'Speckle Reduction in Optical Coherence Tomography by Image Registration and Matrix Completion', Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014, MICCAI 2014, Springer International Publishing, Boston, MA, USA, pp. 162-169.
Speckle noise is problematic in optical coherence tomography (OCT). With its fast scan rate, swept source OCT rapidly scans the same position in the retina multiple times and computes an average image from the multiple scans for speckle reduction. However, eye movement poses some challenges. In this paper, we propose a new method for speckle reduction from multiply-scanned OCT slices. The proposed method applies a preliminary speckle reduction on the OCT slices and then registers them using a global alignment followed by a local alignment based on fast iterative diamond search. After that, low rank matrix completion using bilateral random projection is utilized to iteratively estimate the noise and recover the underlying clean image. Experimental results show that the proposed method achieves an average contrast to noise ratio of 15.65, better than the 13.78 achieved by the baseline method currently used in swept source OCT devices. The technology can be embedded into current OCT machines to enhance the image quality for subsequent analysis.
Liu, X., Tao, D., Song, M., Ruan, Y., Chen, C. & Bu, J. 2014, 'Weakly supervised multiclass video segmentation', 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Columbus, OH, pp. 57-64.
The desire to enable computers to learn semantic concepts from large quantities of Internet videos has motivated increasing interest in semantic video understanding, while video segmentation is important yet challenging for understanding videos. The main difficulty of video segmentation arises from the burden of labeling training samples, making the problem largely unsolved. In this paper, we present a novel nearest neighbor-based label transfer scheme for weakly supervised video segmentation. Whereas previous weakly supervised video segmentation methods have been limited to the two-class case, our proposed scheme focuses on more challenging multiclass video segmentation, which finds a semantically meaningful label for every pixel in a video. Our scheme enjoys several favorable properties when compared with conventional methods. First, a weakly supervised hashing procedure is carried out to handle both metric and semantic similarity. Second, the proposed nearest neighbor-based label transfer algorithm effectively avoids overfitting caused by weakly supervised data. Third, a multi-video graph model is built to encourage smoothness between regions that are spatiotemporally adjacent and similar in appearance. We demonstrate the effectiveness of the proposed scheme by comparing it with several other state-of-the-art weakly supervised segmentation methods on one new Wild8 dataset and two other publicly available datasets.
Liu, X., Song, M., Tao, D., Zhou, X., Chen, C. & Bu, J. 2014, 'Semi-Supervised Coupled Dictionary Learning for Person Re-identification', 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Columbus, OH, pp. 3550-3557.
The desirability of being able to search for specific persons in surveillance videos captured by different cameras has increasingly motivated interest in the problem of person re-identification, which is a critical yet under-addressed challenge in multi-camera tracking systems. The main difficulty of person re-identification arises from the variations in human appearances from different camera views. In this paper, to bridge the human appearance variations across cameras, two coupled dictionaries that relate to the gallery and probe cameras are jointly learned in the training phase from both labeled and unlabeled images. The labeled training images carry the relationship between features from different cameras, and the abundant unlabeled training images are introduced to exploit the geometry of the marginal distribution for obtaining robust sparse representation. In the testing phase, the feature of each target image from the probe camera is first encoded by the sparse representation and then recovered in the feature space spanned by the images from the gallery camera. The features of the same person from different cameras are similar following the above transformation. Experimental results on publicly available datasets demonstrate the superiority of our method.
Xu, J., Deng, C., Gao, X., Tao, D. & Li, X. 2014, 'Image super-resolution using multi-layer support vector regression', 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, Florence, pp. 5799-5803.
Existing support vector regression (SVR) based image super-resolution (SR) methods always use a single-layer SVR model to reconstruct the source image, which cannot restore details and reduces the reconstruction quality. In this paper, we present a novel image SR approach, where a multi-layer SVR model is adopted to describe the relationship between the low resolution (LR) image patches and the corresponding high resolution (HR) ones. In addition, considering the diverse content in the image, we introduce pixel-wise classification to divide pixels into different classes, such as horizontal edges, vertical edges and smooth areas, which is more conducive to highlighting the local characteristics of the image. Moreover, the input elements to each SVR model are weighted according to the spatial positions of their corresponding output pixels in the HR image. Experimental results show that, compared with several other learning-based SR algorithms, our method achieves high-quality performance.
Wang, S., Wang, N., Tao, D., Zhang, L. & Du, B. 2014, 'A K-L divergence constrained sparse NMF for hyperspectral signal unmixing', 2014 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), International Conference on Security, Pattern Analysis, and Cybernetics, IEEE, Wuhan.
Hyperspectral unmixing is a hot topic in signal and image processing. High-dimensional data can be decomposed into two non-negative low-dimensional matrices by non-negative matrix factorization (NMF). However, the algorithm has many local solutions because of the non-convexity of the objective function. Some algorithms address this problem by adding auxiliary constraints, such as sparsity. Sparse NMF performs well, but its result is unstable and sensitive to noise. Using structural information in the unmixing approach can make the decomposition stable. Previous work used clustering based on Euclidean distance to guide the decomposition and obtained good performance. However, Euclidean distance only measures the straight-line distance between two points, whereas ground objects usually obey a certain statistical distribution, and it is difficult to measure the difference between statistical distributions comprehensively by Euclidean distance. KL divergence is a better measure. In this paper, we propose a new approach named KL divergence constrained NMF, which measures the statistical distribution difference using KL divergence instead of Euclidean distance. Using KL divergence in the algorithm improves the accuracy of the structured information. Experimental results based on synthetic and real hyperspectral data show the superiority of the proposed algorithm with respect to other state-of-the-art algorithms.
Xu, C., Tao, D., Xu, C. & Rui, Y. 2014, 'Large-margin weakly supervised dimensionality reduction', Proceedings of the 31st International Conference on Machine Learning (ICML-14), International Conference on Machine Learning, Beijing; China, pp. 865-873.
Gong, C., Tao, D., Fu, K. & Yang, J. 2014, 'ReLISH: Reliable Label Inference via Smoothness Hypothesis', Proceedings of the National Conference on Artificial Intelligence, 28th AAAI Conference on Artificial Intelligence, Quebec City; Canada, pp. 1840-1846.
Fang, M., Yin, J. & Tao, D. 2014, 'Active learning for crowdsourcing using knowledge transfer', Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, Quebec City; Canada, pp. 1809-1815.
Gong, C., Tao, D., Yang, J. & Fu, K. 2014, 'Signed Laplacian embedding for supervised dimension reduction', Proceedings of the National Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, Quebec City; Canada, pp. 1847-1853.
Shao, M., Li, S., Liu, T., Tao, D., Huang, T.S. & Fu, Y. 2014, 'Learning relative features through adaptive pooling for image classification', Proceedings - IEEE International Conference on Multimedia and Expo.
© 2014 IEEE. Bag-of-Feature (BoF) representations and spatial constraints have been popular in image classification research. One of the most successful methods uses sparse coding and spatial pooling to build discriminative features. However, minimizing the reconstruction error by sparse coding only considers the similarity between the input and codebooks. In contrast, this paper describes a novel feature learning approach for image classification by considering the dissimilarity between inputs and prototype images, or what we call the reference basis (RB). First, we learn the feature representation by a max-margin criterion between the input and the RB. The learned hyperplane is stored as the relative feature. Second, we propose an adaptive pooling technique to assemble multiple relative features generated by different RBs under the SVM framework, where the classifier and the pooling weights are jointly learned. Experiments on three challenging datasets, Caltech-101, Scene 15 and Willow-Actions, demonstrate the effectiveness and generality of our framework.
Xu, C., Tao, D., Xu, C. & Rui, Y. 2014, 'Large-margin weakly supervised dimensionality reduction', 31st International Conference on Machine Learning, ICML 2014, pp. 2472-2482.
Copyright © (2014) by the International Machine Learning Society (IMLS). All rights reserved. This paper studies dimensionality reduction in a weakly supervised setting, in which the preference relationship between examples is indicated by weak cues. A novel framework is proposed that integrates two aspects of the large margin principle (angle and distance), which simultaneously encourage angle consistency between preference pairs and maximize the distance between examples in preference pairs. Two specific algorithms are developed: an alternating direction method to learn a linear transformation matrix and a gradient boosting technique to optimize a non-linear transformation directly in the function space. Theoretical analysis demonstrates that the proposed large margin optimization criteria can strengthen and improve the robustness and generalization performance of preference learning algorithms on the obtained low-dimensional subspace. Experimental results on real-world datasets demonstrate the significance of studying dimensionality reduction in the weakly supervised setting and the effectiveness of the proposed framework.
Zhu, Z., You, X., Chen, C.L.P., Tao, D., Jiang, X., You, F. & Zou, J. 2014, 'A noise-robust adaptive hybrid pattern for texture classification', Proceedings - International Conference on Pattern Recognition, pp. 1633-1638.
© 2014 IEEE. In this paper, we focus on developing a novel noise-robust LBP-based texture feature extraction scheme for texture classification. Specifically, two solutions are proposed to overcome the two primary reasons that make the local binary pattern sensitive to noise. First, a hybrid model is proposed for noise-robust texture description. In this new model, the local primitive micro features are encoded with the texture's global spatial structure to reduce the noise sensitivity. Second, we design an adaptive quantization algorithm, in which quantization thresholds are chosen adaptively on the basis of the texture's content. Higher noise tolerance and discriminative power can be obtained in the quantization process. Based on the proposed hybrid texture description model and adaptive quantization algorithm, we develop an adaptive hybrid pattern scheme for noise-robust texture feature extraction. Compared with several state-of-the-art feature extraction schemes, our scheme leads to significant improvement in noisy texture classification.
Hong, Z., Mei, X., Prokhorov, D. & Tao, D. 2013, 'Tracking via Robust Multi-task Multi-view Joint Sparse Representation', Proceedings of IEEE International Conference on Computer Vision, IEEE International Conference on Computer Vision, IEEE, Sydney, pp. 649-656.
Combining multiple observation views has proven beneficial for tracking. In this paper, we cast tracking as a novel multi-task multi-view sparse learning problem and exploit the cues from multiple views including various types of visual features, such as intensity, color, and edge, where each feature observation can be sparsely represented by a linear combination of atoms from an adaptive feature dictionary. The proposed method is integrated in a particle filter framework where every view in each particle is regarded as an individual task. We jointly consider the underlying relationship between tasks across different views and different particles, and tackle it in a unified robust multi-task formulation. In addition, to capture the frequently emerging outlier tasks, we decompose the representation matrix to two collaborative components which enable a more robust and accurate approximation. We show that the proposed formulation can be efficiently solved using the Accelerated Proximal Gradient method with a small number of closed-form updates. The presented tracker is implemented using four types of features and is tested on numerous benchmark video sequences. Both the qualitative and quantitative results demonstrate the superior performance of the proposed approach compared to several state-of-the-art trackers.
Peng, H., Deng, C., An, L., Gao, X. & Tao, D. 2013, 'Learning to multimodal hash for robust video copy detection', IEEE International Conference on Image Processing, ICIP 2013, IEEE International Conference on Image Processing, ICIP 2013, IEEE, Melbourne, Australia, pp. 4482-4486.
Content-based video copy detection (CBVCD) has attracted increasing attention in recent years. However, video content description and search efficiency are still two challenges in this domain. To cope with these two problems, this paper proposes a novel CBVCD approach with similarity preserving multimodal hash learning (SPM2H). The pre-processed video keyframes are represented as multiple features from different perspectives. SPM2H integrates the multimodal feature fusion and the hashing function learning into a joint framework. Mapping video keyframes into hash codes enables fast similarity search in the Hamming space. The experiments show that our approach achieves good performance in accuracy as well as efficiency.
Zhang, T., Ji, R., Liu, W., Tao, D. & Hua, G. 2013, 'Semi-supervised learning with manifold fitted graphs', International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence, AAAI, Beijing, China, pp. 1896-1902.
In this paper, we propose a locality-constrained and sparsity-encouraged manifold fitting approach, aiming at capturing the locally sparse manifold structure into neighborhood graph construction by exploiting a principled optimization model. The proposed model formulates neighborhood graph construction as a sparse coding problem with the locality constraint, therefore achieving simultaneous neighbor selection and edge weight optimization. The core idea underlying our model is to perform a sparse manifold fitting task for each data point so that close-by points lying on the same local manifold are automatically chosen to connect and meanwhile the connection weights are acquired by simple geometric reconstruction. We term the novel neighborhood graph generated by our proposed optimization model M-Fitted Graph since such a graph stems from sparse manifold fitting. To evaluate the robustness and effectiveness of M-fitted graphs, we leverage graph-based semi-supervised learning as the testbed. Extensive experiments carried out on six benchmark datasets validate that the proposed M-fitted graph is superior to state-of-the-art neighborhood graphs in terms of classification accuracy using popular graph-based semi-supervised learning methods.
Zhou, T. & Tao, D. 2013, 'Shifted Subspaces Tracking on Sparse Outlier for Motion Segmentation', International Joint Conference on Artificial Intelligence, 2013 International Joint Conferences on Artificial Intelligence, AAAI, Beijing, China, pp. 1946-1952.
In low-rank and sparse matrix decomposition, the entries of the sparse part are often assumed to be i.i.d. samples from a random distribution. But the structure of the sparse part, as the central interest of many problems, has rarely been studied. One motivating problem is tracking multiple sparse object flows (motions) in video. We introduce "shifted subspaces tracking (SST)" to segment the motions and recover their trajectories by exploring the low-rank property of the background and the shifted subspace property of each motion. SST is composed of two steps, background modeling and flow tracking. In step 1, we propose "semi-soft GoDec" to separate all the motions from the low-rank background L as a sparse outlier S. Its soft-thresholding in updating S significantly speeds up GoDec and facilitates the parameter tuning. In step 2, we set X to the S obtained in step 1 and develop the "SST algorithm", which further decomposes X as $X = \sum_{i=1}^{k} L^{(i)} \circ \tau^{(i)} + S + G$, wherein $L^{(i)}$ is a low-rank matrix storing the $i$th flow after transformation $\tau^{(i)}$. The SST algorithm solves k sub-problems in sequence by alternating minimization, each of which recovers one $L^{(i)}$ and its $\tau^{(i)}$ by a randomized method. Sparsity of $L^{(i)}$ and between-frame affinity are leveraged to save computations. We justify the effectiveness of SST on surveillance video sequences.
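The soft-thresholding step mentioned in this abstract can be illustrated with the standard elementwise operator, sketched below. How semi-soft GoDec applies it when updating S is not specified here, so the function name `soft_threshold` and the threshold `lam` are assumptions for illustration only.

```python
import numpy as np

def soft_threshold(x, lam):
    """Elementwise soft-thresholding: shrink each entry towards zero by lam.

    Entries with magnitude at most lam become zero, which is what makes the
    result sparse.
    """
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Toy residual (assumed values): small entries vanish, large ones shrink.
residual = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
sparse_part = soft_threshold(residual, lam=1.0)
```

Because the operator has a closed form, a sparse update based on it costs a single pass over the matrix entries.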
Zhou, T. & Tao, D. 2013, 'k-bit Hamming compressed sensing', IEEE International Symposium on Information Theory, 2013 IEEE International Symposium on Information Theory, IEEE, Istanbul, Turkey, pp. 679-683.
Wei, L., Guan, N., Zhang, X., Luo, Z. & Tao, D. 2013, 'Orthogonal Nonnegative Locally Linear Embedding', 2013 IEEE International Conference on Systems, Man, and Cybernetics, 2013 IEEE International Conference on Systems, Man, and Cybernetics (SMC), IEEE, Manchester, UK, pp. 2134-2139.
Nonnegative matrix factorization (NMF) decomposes a nonnegative dataset X into two low-rank nonnegative factor matrices, i.e., W and H, by minimizing either the Kullback-Leibler (KL) divergence or the Euclidean distance between X and WH. NMF has been widely used in pattern recognition, data mining and computer vision because the non-negativity constraints on both W and H usually yield an intuitive parts-based representation. However, NMF suffers from two problems: 1) it ignores the geometric structure of the dataset, and 2) it does not explicitly guarantee a parts-based representation on every dataset. In this paper, we propose an orthogonal nonnegative locally linear embedding (ONLLE) method to overcome the aforementioned problems. ONLLE assumes that each example embeds in its nearest neighbors and keeps such relationship in the learned subspace to preserve the geometric structure of a dataset. For the purpose of learning a parts-based representation, ONLLE explicitly incorporates an orthogonality constraint on the learned basis to keep its spatial locality. To optimize ONLLE, we applied an efficient fast gradient descent (FGD) method on the Stiefel manifold which accelerates the popular multiplicative update rule (MUR). The experimental results on real-world datasets show that FGD converges much faster than MUR. To evaluate the effectiveness of ONLLE, we conduct both face recognition and image clustering on real-world datasets and compare it with representative NMF methods.
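For readers unfamiliar with the baseline, the plain NMF factorization X ≈ WH under the Euclidean distance, solved with the classical multiplicative update rule (MUR), might be sketched as follows. This is the standard baseline only, not ONLLE or its FGD solver; the function name `nmf_mur`, the small constant `eps` and the synthetic data are assumptions for the example.

```python
import numpy as np

def nmf_mur(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Euclidean NMF by multiplicative updates (baseline sketch)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps            # nonnegative initial basis
    H = rng.random((rank, m)) + eps            # nonnegative initial coefficients
    errors = []
    for _ in range(n_iter):
        # Ratios of nonnegative terms keep W and H nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H))   # Frobenius reconstruction error
    return W, H, errors

# Synthetic nonnegative data of rank 3 (assumed for illustration).
data_rng = np.random.default_rng(1)
X = data_rng.random((20, 3)) @ data_rng.random((3, 30))
W, H, errors = nmf_mur(X, rank=3)
```

Each update multiplies the current factor by a nonnegative ratio, so no explicit projection onto the nonnegative orthant is needed.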
Luo, Y., Tao, D., Xu, C., Li, D. & Xu, C. 2013, 'Vector-valued multi-view semi-supervised learning for multi-label image classification', Twenty-Seventh AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, AAAI Press, Bellevue, Washington, USA, pp. 647-653.
Images are usually associated with multiple labels and comprised of multiple views, due to each image containing several objects (e.g. a pedestrian, bicycle and tree) and multiple visual features (e.g. color, texture and shape). Currently available tools tend to use either labels or features for classification, but both are necessary to describe the image properly. There have been recent successes in using vector-valued functions, which construct matrix-valued kernels, to explore the multi-label structure in the output space. This has motivated us to develop multi-view vector-valued manifold regularization (MV$^3$MR) in order to integrate multiple features. MV$^3$MR exploits the complementary properties of different features, and discovers the intrinsic local geometry of the compact support shared by different features, under the theme of manifold regularization. We validate the effectiveness of the proposed MV$^3$MR methodology for image classification by conducting extensive experiments on two challenging datasets, PASCAL VOC'07 and MIR Flickr.
Wu, F., Tan, X., Yang, Y., Tao, D., Tang, S. & Zhuang, Y. 2013, 'Supervised Nonnegative Tensor Factorization with Maximum-Margin Constraint', Twenty-Seventh AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, AAAI Press, Bellevue, Washington, USA, pp. 962-968.
Non-negative tensor factorization (NTF) has attracted great attention in the machine learning community. In this paper, we extend traditional non-negative tensor factorization into a supervised discriminative decomposition, referred to as Supervised Non-negative Tensor Factorization with Maximum-Margin Constraint (SNTFM2). SNTFM2 formulates the optimal discriminative factorization of non-negative tensorial data as a coupled least-squares optimization problem via a maximum-margin method. As a result, SNTFM2 not only faithfully approximates the tensorial data by additive combinations of the basis, but also obtains strong generalization power for discriminative analysis (in particular for classification in this paper). The experimental results show the superiority of our proposed model over state-of-the-art techniques on both toy and real-world data sets.
Zhou, T. & Tao, D. 2013, 'Greedy Bilateral Sketch, Completion & Smoothing', Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, International Conference on Artificial Intelligence and Statistics, JMLR.org, Scottsdale, AZ, USA, pp. 650-658.
Recovering a large low-rank matrix from highly corrupted, incomplete or sparse-outlier-overwhelmed observations is the crux of various intriguing statistical problems. We explore the power of the "greedy bilateral (GreB)" paradigm in reducing both time and sample complexities for solving these problems. GreB models a low-rank variable as a bilateral factorization, and updates the left and right factors in a mutually adaptive and greedy incremental manner. We detail how to model and solve low-rank approximation, matrix completion and robust PCA in GreB's paradigm. With their MATLAB implementations, approximating a noisy 10000x10000 matrix of rank 500 with SVD accuracy takes 6s; the MovieLens10M matrix of size 69878x10677 can be completed in 10s from 30% of $10^7$ ratings with RMSE 0.86 on the remaining 70%; the low-rank background and sparse moving outliers in a 120x160 video of 500 frames are accurately separated in 1s. This brings 30 to 100 times acceleration in solving these popular statistical problems.
Liu, X., Song, M., Tao, D., Liu, Z., Zhang, L., Chen, C. & Bu, J. 2013, 'Semi-supervised node splitting for random forest construction', 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Portland, Oregon, pp. 492-499.
Node splitting is an important issue in Random Forest but robust splitting requires a large number of training samples. Existing solutions fail to properly partition the feature space if there are insufficient training data. In this paper, we present semi-supervised splitting to overcome this limitation by splitting nodes with the guidance of both labeled and unlabeled data. In particular, we derive a nonparametric algorithm to obtain an accurate quality measure of splitting by incorporating abundant unlabeled data. To avoid the curse of dimensionality, we project the data points from the original high-dimensional feature space onto a low-dimensional subspace before estimation. A unified optimization framework is proposed to select a coupled pair of subspace and separating hyper plane such that the smoothness of the subspace and the quality of the splitting are guaranteed simultaneously. The proposed algorithm is compared with state-of-the-art supervised and semi-supervised algorithms for typical computer vision applications such as object categorization and image segmentation. Experimental results on publicly available datasets demonstrate the superiority of our method.
Deng, C., Ji, R., Liu, W., Tao, D. & Gao, X. 2013, 'Visual Reranking through Weakly Supervised Multi-Graph Learning', IEEE International Conference on Computer Vision, ICCV 2013, IEEE International Conference on Computer Vision, ICCV 2013, IEEE, Sydney, Australia, pp. 2600-2607.
Visual reranking has been widely deployed to refine the quality of conventional content-based image retrieval engines. The current trend lies in employing a crowd of retrieved results stemming from multiple feature modalities to boost the overall performance of visual reranking. However, a major challenge pertaining to current reranking methods is how to take full advantage of the complementary property of distinct feature modalities. Given a query image and one feature modality, a regular visual reranking framework treats the top-ranked images as pseudo positive instances which are inevitably noisy, difficult to reveal this complementary property, and thus lead to inferior ranking performance. This paper proposes a novel image reranking approach by introducing a Co-Regularized Multi-Graph Learning (Co-RMGL) framework, in which the intra-graph and inter-graph constraints are simultaneously imposed to encode affinities in a single graph and consistency across different graphs. Moreover, weakly supervised learning driven by image attributes is performed to denoise the pseudo-labeled instances, thereby highlighting the unique strength of individual feature modality. Meanwhile, such learning can yield a few anchors in graphs that vitally enable the alignment and fusion of multiple graphs. As a result, an edge weight matrix learned from the fused graph automatically gives the ordering to the initially retrieved results. We evaluate our approach on four benchmark image retrieval datasets, demonstrating a significant performance gain over the state of the art.
Zhou, T., Bian, W. & Tao, D. 2013, 'Divide-and-Conquer Anchoring for Near-Separable Nonnegative Matrix Factorization and Completion in High Dimensions', IEEE 13th International Conference on Data Mining, IEEE 13th International Conference on Data Mining, IEEE, Dallas, TX, USA, pp. 917-926.
Nonnegative matrix factorization (NMF) becomes tractable in polynomial time with a unique solution under the separability assumption, which postulates that all the data points are contained in the conical hull of a few anchor data points. Recently developed linear programming and greedy pursuit methods can pick out the anchors from noisy data and result in a near-separable NMF. But their efficiency could be seriously weakened in high dimensions. In this paper, we show that the anchors can be precisely located from the low-dimensional geometry of the data points even when their high-dimensional features suffer from serious incompleteness. Our framework, entitled divide-and-conquer anchoring (DCA), divides the high-dimensional anchoring problem into a few cheaper sub-problems seeking anchors of data projections in low-dimensional random spaces, which can be solved in parallel by any near-separable NMF, and combines all the detected low-dimensional anchors via a fast hypothesis testing to identify the original anchors. We further develop two non-iterative anchoring algorithms in 1D and 2D spaces for data in a convex hull and a conical hull, respectively. These two rapid algorithms in the ultra-low dimensions suffice to generate a robust and efficient near-separable NMF for high-dimensional or incomplete data via DCA. Compared to existing methods, two vital advantages of DCA are its scalability for big data, and its capability of handling incomplete and high-dimensional noisy data. A rigorous analysis proves that DCA is able to find the correct anchors of a rank-k matrix by solving O(k log k) sub-problems. Finally, we show DCA outperforms state-of-the-art methods on various datasets and tasks.
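The idea of locating anchors from 1D random projections of data in a convex hull can be sketched as follows. This is a simplified illustration, not the DCA algorithm itself: it relies only on the fact that the extremes of a linear projection of a convex hull are attained at its vertices, and it replaces the paper's hypothesis testing with a plain vote count. The function names, the number of projections, and the toy data are assumptions.

```python
import random


def project(point, direction):
    """Inner product of a data point with a projection direction."""
    return sum(p * d for p, d in zip(point, direction))


def anchors_by_random_projection(points, num_projections=50, seed=0):
    """Count how often each point is an extreme of a random 1D projection."""
    rng = random.Random(seed)
    dim = len(points[0])
    votes = [0] * len(points)
    for _ in range(num_projections):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        values = [project(p, direction) for p in points]
        votes[values.index(min(values))] += 1
        votes[values.index(max(values))] += 1
    # Points that were extreme at least once are kept as anchor candidates.
    return sorted(i for i, v in enumerate(votes) if v > 0)


# Three anchors (triangle vertices) followed by strict convex combinations.
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
mixtures = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (0.5, 3.0)]
found = anchors_by_random_projection(anchors + mixtures)
```

Each projection is an independent sub-problem, which is what makes this style of anchoring easy to run in parallel.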
Zhang, K., Gao, X., Tao, D. & Li, X. 2013, 'Image super-resolution via non-local steering kernel regression regularization', IEEE International Conference on Image Processing, ICIP 2013, IEEE International Conference on Image Processing, ICIP 2013, IEEE, Melbourne, Australia, pp. 943-946.
In this paper, we employ the non-local steering kernel regression to construct an effective regularization term for the single image super-resolution problem. The proposed method seamlessly integrates the properties of local structural regularity and non-local self-similarity existing in natural images, and solves a least squares minimization problem for obtaining the desired high-resolution image. Extensive experimental results on both simulated and real low-resolution images demonstrate that the proposed method can restore compelling results with sharp edges and fine textures.
Zhao, H., Cheng, J., Jiang, J. & Tao, D. 2013, 'Multiple instance learning via distance metric optimization', IEEE International Conference on Image Processing, ICIP 2013, IEEE International Conference on Image Processing, ICIP 2013, IEEE, Melbourne, Australia, pp. 2617-2621.
Multiple Instance Learning (MIL) has been widely applied in practice, such as in drug activity prediction and content-based image retrieval. In MIL, a sample, comprised of a set of instances, is called a bag. Labels are assigned to bags instead of instances. The uncertainty of labels on instances makes MIL different from conventional supervised single instance learning (SIL) tasks. Therefore, it is critical to learn an effective mapping to convert an MIL task to an SIL task. In this paper, we present OptMILES by learning the optimal transformation on the bag-to-instance similarity measure, exploring the optimal distance metric between instances, by an alternating minimization training procedure. We thoroughly evaluate the proposed method on both a synthetic dataset and real-world datasets by comparing with representative MIL algorithms. The experimental results suggest the effectiveness of OptMILES.
Li, J. & Tao, D. 2013, 'A Bayesian factorised covariance model for image analysis', International Joint Conference on Artificial Intelligence, 2013 International Joint Conferences on Artificial Intelligence, AAAI, Beijing, China, pp. 1465-1471.
Mu, Y., Ding, W., Zhou, T. & Tao, D. 2013, 'Constrained stochastic gradient descent for large-scale least squares problem', ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, Chicago, IL, USA, pp. 883-891.
The least squares problem is one of the most important regression problems in statistics, machine learning and data mining. In this paper, we present the Constrained Stochastic Gradient Descent (CSGD) algorithm to solve the large-scale least squares problem. CSGD improves the Stochastic Gradient Descent (SGD) by imposing a provable constraint that the linear regression line passes through the mean point of all the data points. It results in the best regret bound O(log T), and fastest convergence speed among all first order approaches. Empirical studies justify the effectiveness of CSGD by comparing it with SGD and other state-of-the-art approaches. An example is also given to show how to use CSGD to optimize SGD based least squares problems to achieve a better performance.
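The mean-point constraint can be illustrated with a minimal projected-SGD sketch. It assumes a linear model y = w . x with the bias folded in as a constant feature, and it enforces the constraint w . x_mean = y_mean by a Euclidean projection after every step. The abstract does not give the exact update or step-size schedule, so the function names, the fixed learning rate, and the toy data are all assumptions.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project_onto_mean_constraint(w, x_mean, y_mean):
    """Euclidean projection of w onto the hyperplane {w : w . x_mean = y_mean}."""
    scale = (dot(w, x_mean) - y_mean) / dot(x_mean, x_mean)
    return [wi - scale * xi for wi, xi in zip(w, x_mean)]


def constrained_sgd(xs, ys, steps=2000, lr=0.01, seed=0):
    """SGD on squared error, projecting onto the mean-point constraint each step."""
    rng = random.Random(seed)
    n, dim = len(xs), len(xs[0])
    x_mean = [sum(x[j] for x in xs) / n for j in range(dim)]
    y_mean = sum(ys) / n
    w = project_onto_mean_constraint([0.0] * dim, x_mean, y_mean)
    for _ in range(steps):
        i = rng.randrange(n)
        err = dot(w, xs[i]) - ys[i]
        w = [wi - lr * err * xi for wi, xi in zip(w, xs[i])]  # plain SGD step
        w = project_onto_mean_constraint(w, x_mean, y_mean)   # enforce constraint
    return w, x_mean, y_mean


# Toy data: y = 3*x - 1, with the bias handled by a constant feature of 1.
xs = [[float(x), 1.0] for x in range(5)]
ys = [3.0 * x[0] - 1.0 for x in xs]
w, x_mean, y_mean = constrained_sgd(xs, ys)
```

Because every iterate lies on the constraint hyperplane, the search is confined to a lower-dimensional set that still contains the least squares solution.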
Cheng, J., Liu, J., Tao, D., Yin, F., Wong, D.W., Xu, Y. & Wong, T.Y. 2013, 'Superpixel Classification Based Optic Cup Segmentation', Lecture Notes in Computer Science, Springer Berlin Heidelberg, Nagoya, Japan, pp. 421-428.
In this paper, we propose a superpixel classification based optic cup segmentation for glaucoma detection. In the proposed method, each optic disc image is first over-segmented into superpixels. Then mean intensities, center surround statistics and the location features are extracted from each superpixel to classify it as cup or non-cup. The proposed method has been evaluated in one database of 650 images with manual optic cup boundaries marked by trained professionals and one database of 1676 images with diagnostic outcome. Experimental results show average overlapping error around 26.0% compared with manual cup region and area under curve of the receiver operating characteristic curve in glaucoma detection at 0.811 and 0.813 in the two databases, much better than other methods. The method could be used for glaucoma screening.
Xu, C., Tao, D., Li, Y. & Xu, C. 2013, 'Large-margin multi-view Gaussian process for image classification', Proceedings of the Fifth International Conference on Internet Multimedia Computing and Service, International Conference on Internet Multimedia Computing and Service, ACM, Huangshan, China, pp. 7-12.
In image classification, the goal is to decide whether an image belongs to a certain category or not. Multiple features are usually employed to describe the contents of images more fully and thereby improve classification accuracy. However, this also raises new problems: how to effectively combine multiple features, and how to handle the high-dimensional features from multiple views given a small training set. In this paper, we present a large-margin Gaussian process approach to discover the latent space shared by multiple features. Multiple features can therefore complement each other in this low-dimensional latent space, which derives a strong discriminative ability from the large-margin principle, so that the subsequent classification task can be effectively accomplished. The resulting objective function can be efficiently solved using gradient descent techniques. Finally, we demonstrate the advantages of the proposed algorithm on real-world image datasets for discovering a discriminative latent space and improving the classification performance.
Cheng, X., Chin, A., Ling, C.X., Wang, F., Chen, E., Chen, G., Cui, P., King, I., Tian, J., Wang, J., Ho, J., Ishikawa, Y., Jin, X., Kim, D., Kim, J., Tang, J., Tao, D., Wang, X., Yu, Z., Zhang, D., Zhang, J., Zheng, V. & Zhou, J. 2013, 'Preface to mining and understanding from big data', Proceedings - IEEE 13th International Conference on Data Mining Workshops, ICDMW 2013, pp. xlvi-xlvii.
Wang, B., Gao, X., Li, J., Li, X. & Tao, D. 2013, 'A level set with shape priors using moment-based alignment and locality preserving projections', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 697-704.
A novel level set method (LSM) with shape priors is proposed to implement a shape-driven image segmentation. By using image moments, we remove the position, scale and angle information from the shape priors and thus obtain aligned shape priors. Considering that the shape priors are sparsely distributed in the observation space, we utilize locality preserving projections (LPP) to map them into a low-dimensional subspace in which the probability distribution is predicted by using kernel density estimation. Finally, a new energy functional with shape priors is developed by combining the negative log-probability of shape priors with other data-driven energy items. We assess the proposed LSM on synthetic, medical and natural images. The experimental results show that it is superior to the pure data-driven LSMs and the representative LSM with shape priors. © 2013 Springer-Verlag Berlin Heidelberg.
Zhang, L., Zhang, L., Tao, D., Huang, X. & Du, B. 2013, 'Nonnegative discriminative manifold learning for hyperspectral data dimension reduction', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 351-358.
Manifold learning algorithms have been demonstrated to be effective for hyperspectral data dimension reduction (DR). However, the low-dimensional feature representation produced by traditional manifold learning algorithms cannot preserve the nonnegative property of the hyperspectral data, which leads to inconsistency with the psychological intuition of "combining parts to form a whole". In this paper, we introduce a nonnegative discriminative manifold learning (NDML) algorithm for hyperspectral data DR, which yields a discriminative and low-dimensional feature representation, with psychological and physical evidence in the human brain. Our method benefits from both the nonnegative matrix factorization (NMF) algorithm and the discriminative manifold learning (DML) algorithm. We apply the NDML algorithm to hyperspectral remote sensing image classification on the HYDICE dataset. Experimental results confirm the efficiency of the proposed NDML algorithm, compared with some existing manifold learning based DR methods. © 2013 Springer-Verlag Berlin Heidelberg.
Du, B., Wang, N., Zhang, L. & Tao, D. 2013, 'Hyperspectral medical images unmixing for cancer screening based on rotational independent component analysis', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 336-343.
Hyperspectral images have shown promising performance in many applications, especially extracting information from remotely sensed geometric images. One obvious advantage is their ability to reflect physical meaning from a spectral point of view, since even two very similar materials present an obvious difference in a hyperspectral imaging system. Recent work has made great progress on hyperspectral fluorescence imaging techniques, which makes elaborate spectral observation of cancer areas possible. Cancer cells become distinguishable from normal ones when the living body is injected with fluorescence, which helps organs inside the living body emit light, and the signals can then be obtained by a passive imaging sensor. This paper discusses the ability to screen for cancers by means of hyperspectral bioluminescence images. A rotational independent component analysis method is proposed to solve the problem. Experiments show the superior performance of the proposed ICA-based method over other blind source separation methods: 1) the ICA-based methods perform well in detecting the cancer areas inside the living body; 2) the proposed method presents more accurate cancer areas than other state-of-the-art algorithms. © 2013 Springer-Verlag Berlin Heidelberg.
Gunther, M., Costa-Pazo, A., Ding, C., Boutellaa, E., Chiachia, G., Zhang, H., De Assis Angeloni, M., Struc, V., Khoury, E., Vazquez-Fernandez, E., Tao, D., Bengherabi, M., Cox, D., Kiranyaz, S., De Freitas Pereira, T., Zganec-Gros, J., Argones-Rua, E., Pinto, N., Gabbouj, M., Simoes, F., Dobrisek, S., Gonzalez-Jimenez, D., Rocha, A., Neto, M.U., Pavesic, N., Falcao, A., Violato, R. & Marcel, S. 2013, 'The 2013 face recognition evaluation in mobile environment', Proceedings - 2013 International Conference on Biometrics, ICB 2013.
Automatic face recognition in unconstrained environments is a challenging task. To test current trends in face recognition algorithms, we organized an evaluation on face recognition in mobile environment. This paper presents the results of 8 different participants using two verification metrics. Most submitted algorithms rely on one or more of three types of features: local binary patterns, Gabor wavelet responses including Gabor phases, and color information. The best results are obtained from UNILJ-ALP, which fused several image representations and feature types, and UC-HU, which learns optimal features with a convolutional neural network. Additionally, we assess the usability of the algorithms in mobile devices with limited resources. © 2013 IEEE.
Li, J. & Tao, D. 2012, 'Sampling Normal Distribution Restricted on Multiple Regions', International Conference on Neural Information Processing, Springer-Verlag, Doha, Qatar, pp. 492-500.
We develop an accept-reject sampler for probability densities that have the similar form of a normal density function, but supported on restricted regions. Compared to existing techniques, the proposed method deals with multiple disjoint regions, truncated on one or both sides. For the original problem of sampling from one region, the efficiency is enhanced as well. We verify the desirable attributes of the proposed algorithm by both theoretical analysis and simulation studies.
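For orientation, the naive accept-reject baseline for a normal density restricted to several disjoint regions can be sketched as below. This is not the sampler proposed in the paper, which is designed to be more efficient; it simply proposes from the unrestricted normal and rejects draws that fall outside every region. The function name, the `max_tries` guard, and the example regions are assumptions.

```python
import random


def sample_restricted_normal(regions, mu=0.0, sigma=1.0, rng=None, max_tries=100000):
    """Draw one sample from N(mu, sigma^2) restricted to a union of intervals.

    `regions` is a list of (low, high) pairs; use float('-inf') or float('inf')
    for one-sided truncation. Proposals outside every region are rejected.
    """
    rng = rng or random.Random()
    for _ in range(max_tries):
        x = rng.gauss(mu, sigma)
        if any(low <= x <= high for low, high in regions):
            return x
    raise RuntimeError("acceptance rate too low for naive rejection sampling")


rng = random.Random(0)
# One two-sided region and one one-sided region, disjoint from each other.
regions = [(-2.0, -1.0), (0.5, float("inf"))]
samples = [sample_restricted_normal(regions, rng=rng) for _ in range(1000)]
```

The acceptance rate of this baseline equals the probability mass of the regions, so it degrades quickly when the regions sit far in the tails, which is the situation a more careful sampler is meant to handle.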
Wu, Z., Wu, J., Cao, J. & Tao, D. 2012, 'HySAD: a semi-supervised hybrid shilling attack detector for trustworthy product recommendation', KDD '12: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Beijing, China, pp. 985-993.
Shilling attackers apply biased rating profiles to recommender systems for manipulating online product recommendations. Although many studies have been devoted to shilling attack detection, few of them can handle the hybrid shilling attacks that usually happen in practice, and the studies for real-life applications are rarely seen. Moreover, little attention has yet been paid to modeling both labeled and unlabeled user profiles, although there are often a few labeled but numerous unlabeled users available in practice. This paper presents a Hybrid Shilling Attack Detector, or HySAD for short, to tackle these problems. In particular, HySAD introduces MC-Relief to select effective detection metrics, and Semi-supervised Naive Bayes (SNB_lambda) to precisely separate Random-Filler model attackers and Average-Filler model attackers from normal users. Thorough experiments on MovieLens and Netflix datasets demonstrate the effectiveness of HySAD in detecting hybrid shilling attacks, and its robustness for various obfuscated strategies. A real-life case study on product reviews of Amazon.cn is also provided, which further demonstrates that HySAD can effectively improve the accuracy of a collaborative-filtering based recommender system, and provide interesting opportunities for in-depth analysis of attacker behaviors. These, in turn, justify the value of HySAD for real-world applications.
Zhou, T. & Tao, D. 2012, 'Labelset anchored subspace ensemble (LASE) for multi-label annotation', Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ACM International Conference on Multimedia Retrieval, ACM, Hong Kong, pp. 1-8.
In multimedia retrieval, multi-label annotation for image, text and video is challenging and has attracted rapidly growing interest in past decades. The main crux of multi-label annotation lies in 1) how to reduce the model complexity when the label space expands exponentially with the increase of the number of labels; and 2) how to leverage the label correlations, which are broadly believed to be useful for boosting annotation performance. In this paper, we propose "labelsets anchored subspace ensemble (LASE)" to solve both problems in an efficient scheme, whose training is a regularized matrix decomposition and whose prediction is an inference of group sparse representations. In order to shrink the label space, we first introduce "label distilling", which extracts the frequent labelsets to replace the original labels. In the training stage, the data matrix is decomposed as the sum of several low-rank matrices and a sparse residual via a randomized optimization, where each low-rank part defines a feature subspace mapped by a labelset. A manifold regularization is applied to map the labelset geometry to the geometry of the obtained subspaces. In the prediction stage, the group sparse representation of a new sample on the subspace ensemble is estimated by group lasso. The selected subspaces indicate the labelsets that the sample should be annotated with. Experiments on several benchmark datasets of texts, images, web data and videos validate the appealing performance of LASE in multi-label annotation.
Liu, X., Song, M., Zhang, L., Tao, D., Bu, J. & Chen, C. 2012, 'Pedestrian detection using a mixture mask model', 2012 9th IEEE International Conference on Networking, Sensing and Control (ICNSC), IEEE International Conference on Networking, Sensing and Control (ICNSC), IEEE, Beijing, China, pp. 271-276.
Pedestrian detection is one of the fundamental tasks of an intelligent transportation system. Differences in illumination, posture and point of view make pedestrian detection very challenging. In this paper, we focus on the main defect in the existing methods: the interference of the non-person area. First, we use mapping vectors to map the original feature matrix to the different mask spaces; then, using a part-based structure, we implicitly formulate the model as a multiple-instance problem; and finally we use a MIL-SVM to solve the problem. Based on the model, we design a system which can find pedestrians in pictures. We give a detailed description of the model and the system in this paper. The experimental results on public data sets show that our method decreases the miss rate greatly.
Shi, M., Sun, X., Tao, D. & Xu, C. 2012, 'Exploiting visual word co-occurrence for image retrieval', Proceedings of the 20th ACM international Conference on Multimedia, ACM international Conference on Multimedia, ACM, Nara, Japan, pp. 69-78.
Bag-of-visual-words (BOVW) based image representation has received intense attention in recent years and has improved content based image retrieval (CBIR) significantly. BOVW does not consider the spatial correlation between visual words in natural images and thus biases the generated visual words towards noise when the corresponding visual features are not stable. In this paper, we construct a visual word co-occurrence table by exploring visual word co-occurrence extracted from small affine-invariant regions in a large collection of natural images. Based on this visual word co-occurrence table, we first present a novel high-order predictor to accelerate the generation of neighboring visual words. A co-occurrence matrix is introduced to refine the similarity measure for image ranking. Like the inverse document frequency (idf), it down-weights the contribution of the words that are less discriminative because of frequent co-occurrence. We conduct experiments on Oxford and Paris Building datasets, in which the ImageNet dataset is used to implement a large scale evaluation. Thorough experimental results suggest that our method outperforms the state-of-the-art, especially when the vocabulary size is comparatively small. In addition, our method is not much more costly than the BOVW model.
Wang, S., Zhao, Q., Song, M., Bu, J., Chen, C. & Tao, D. 2012, 'Learning Visual Saliency based on Object's Relative Relationship', The 19th International Conference on Neural Information Processing, Springer, Doha, Qatar, pp. 318-327.
As a challenging issue in both computer vision and psychological research, visual attention has aroused a wide range of discussions and studies in recent years. However, conventional computational models mainly focus on low-level information, while high-level information and their interrelationship are ignored. In this paper, we stress the issue of the relative relationship between high-level information, and a saliency model based on low-level and high-level analysis is also proposed. First, more than 50 categories of objects are selected from nearly 800 images in the MIT data set [1], and a concrete quantitative relationship is learned based on detailed analysis and computation. Second, using least squares regression with constraints, we derive an optimal saliency model to produce saliency maps. Experimental results indicate that our model outperforms several state-of-the-art methods and produces a better match to human eye-tracking data.
Chen, D., Cheng, J. & Tao, D. 2012, 'Clustering-based Discriminative Locality Alignment for Face Gender Recognition', 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, Vilamoura, Portugal, pp. 4156-4161.
To facilitate human-robot interactions, human gender information is very important. Motivated by the success of manifold learning for visual recognition, we present a novel clustering-based discriminative locality alignment (CDLA) algorithm to discover the low-dimensional intrinsic submanifold from the embedding high-dimensional ambient space for improving the face gender recognition performance. In particular, CDLA exploits the global geometry through k-means clustering, extracts the discriminative information through margin maximization and explores the local geometry through intra-cluster sample concentration. These three properties uniquely characterize CDLA for face gender recognition. The experimental results obtained from the FERET data sets suggest the superiority of the proposed method in terms of recognition speed and accuracy by comparing with several representative methods.
Zhou, T. & Tao, D. 2012, '1-bit Hamming compressed sensing', IEEE International Symposium on Information Theory - Proceedings, IEEE International Symposium on Information Theory, IEEE, Cambridge, USA, pp. 1862-1866.
Compressed sensing (CS) and 1-bit CS cannot directly recover quantized signals preferred in digital systems and require time-consuming recovery. In this paper, we introduce 1-bit Hamming compressed sensing (HCS) that directly recovers a k-bit quantized signal of dimension n from its 1-bit measurements via invoking n times of Kullback-Leibler divergence based nearest neighbor search. Compared to CS and 1-bit CS, 1-bit HCS allows the signal to be dense, takes considerably less (linear and non-iterative) recovery time and requires substantially fewer measurements. Moreover, 1-bit HCS can accelerate 1-bit CS recovery. We study a quantized recovery error bound of 1-bit HCS for general signals. Extensive numerical simulations verify the appealing accuracy, robustness, efficiency and consistency of 1-bit HCS.
Zhou, T. & Tao, D. 2012, 'Bilateral random projections', 2012 IEEE International Symposium on Information Theory Proceedings (ISIT), IEEE International Symposium on Information Theory Proceedings (ISIT), IEEE, Cambridge, USA, pp. 1286-1290.
Low-rank structures have been profoundly studied in data mining and machine learning. In this paper, we show that a dense matrix X's low-rank approximation can be rapidly built from its left and right random projections $Y_1 = XA_1$ and $Y_2 = X^T A_2$, or bilateral random projection (BRP). We then show that a power scheme can further improve the precision. The deterministic, average and deviation bounds of the proposed method and its power scheme modification are proved theoretically. The effectiveness and the efficiency of BRP based low-rank approximation are empirically verified on both artificial and real datasets.
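A minimal NumPy sketch of building a low-rank approximation from the two random projections follows. The abstract does not state the reconstruction formula; the closed form used here, L = Y1 (A2^T Y1)^{-1} Y2^T, is the one commonly associated with BRP and should be read as an assumption, as should the Gaussian projection matrices, the function name, and the toy sizes. The power scheme is not included.

```python
import numpy as np


def brp_low_rank(X, r, rng):
    """Rank-r approximation of X from bilateral random projections."""
    m, n = X.shape
    A1 = rng.standard_normal((n, r))   # right random projection matrix
    A2 = rng.standard_normal((m, r))   # left random projection matrix
    Y1 = X @ A1                        # m x r
    Y2 = X.T @ A2                      # n x r
    # L = Y1 (A2^T Y1)^{-1} Y2^T, computed with a linear solve instead of an inverse.
    return Y1 @ np.linalg.solve(A2.T @ Y1, Y2.T)


rng = np.random.default_rng(0)
U = rng.standard_normal((60, 3))
V = rng.standard_normal((40, 3))
X = U @ V.T                            # exactly rank 3
L = brp_low_rank(X, r=3, rng=rng)
rel_err = np.linalg.norm(X - L) / np.linalg.norm(X)
```

For a matrix of exact rank r the reconstruction is exact up to floating-point error; only two matrix products with X and one small r x r solve are needed, which is where the speed comes from.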
He, L., Tao, D., Li, X. & Gao, X. 2012, 'Sparse representation for blind image quality assessment', 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Providence, USA, pp. 1146-1153.
Blind image quality assessment (BIQA) is an important yet difficult task in image processing related applications. Existing algorithms for universal BIQA learn a mapping from features of an image to the corresponding subjective quality or divide the image into different distortions before mapping. Although these algorithms are promising, they face the following problems: (1) they require a large number of samples (pairs of distorted image and its subjective quality) to train a robust mapping; (2) they are sensitive to different datasets; and (3) they have to be retrained when new training samples are available. In this paper, we introduce a simple yet effective algorithm based upon the sparse representation of natural scene statistics (NSS) feature. It consists of three key steps: extracting NSS features in the wavelet domain, representing features via sparse coding, and weighting differential mean opinion scores by the sparse coding coefficients to obtain the final visual quality values. Thorough experiments on standard databases show that the proposed algorithm outperforms representative BIQA algorithms and some full-reference metrics.
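The third step described above, weighting differential mean opinion scores (DMOS) by sparse coding coefficients, can be illustrated with a short sketch. The feature extraction and sparse coding steps are not shown. The use of absolute coefficient values, the normalisation by their sum, and the function name `quality_from_sparse_code` are assumptions made for illustration only.

```python
def quality_from_sparse_code(coeffs, dmos):
    """Weighted average of training DMOS values, weighted by |coefficient|.

    coeffs: sparse code of the test image over the training dictionary.
    dmos:   DMOS of the training image behind each dictionary atom.
    """
    if len(coeffs) != len(dmos):
        raise ValueError("coeffs and dmos must have the same length")
    weights = [abs(c) for c in coeffs]      # assumption: magnitude as weight
    total = sum(weights)
    if total == 0:
        raise ValueError("sparse code is all zeros")
    return sum(w * d for w, d in zip(weights, dmos)) / total

# Usage: only two atoms are active, so only their DMOS values contribute.
score = quality_from_sparse_code([0.0, 0.75, 0.25], [10.0, 40.0, 80.0])
```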
Zhang, K., Gao, X., Tao, D. & Li, X. 2012, 'Multi-scale dictionary for single image super-resolution', 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Providence, USA, pp. 1114-1121.
Reconstruction- and example-based super-resolution (SR) methods are promising for restoring a high-resolution (HR) image from low-resolution (LR) image(s). Under large magnification, reconstruction-based methods usually fail to hallucinate visual details while example-based methods sometimes introduce unexpected details. Given a generic LR image, to reconstruct a photo-realistic SR image and to suppress artifacts in the reconstructed SR image, we introduce a multi-scale dictionary to a novel SR method that simultaneously integrates local and non-local priors. The local prior suppresses artifacts by using steering kernel regression to predict the target pixel from a small local area. The non-local prior enriches visual details by taking a weighted average of a large neighborhood as an estimate of the target pixel. Essentially, these two priors are complementary to each other. Experimental results demonstrate that the proposed method can produce high quality SR recovery both quantitatively and perceptually.
Zhang, L., Song, M., Sun, L., Liu, X., Wang, Y., Tao, D., Bu, J. & Chen, C. 2012, 'Spatial graphlet matching kernel for recognizing aerial image categories', Proceedings - International Conference on Pattern Recognition, pp. 2813-2816.
This paper presents a method for recognizing aerial image categories based on matching graphlets (i.e., small connected subgraphs) extracted from aerial images. By constructing a Region Adjacency Graph (RAG) to encode the geometric property and the color distribution of each aerial image, we cast aerial image category recognition as RAG-to-RAG matching. Based on graph theory, RAG-to-RAG matching is conducted by matching all their respective graphlets. Towards an effective graphlet matching process, we develop a manifold embedding algorithm to transfer different-sized graphlets into equal-length feature vectors and further integrate these feature vectors into a kernel. This kernel is used to train an SVM [8] classifier for aerial image category recognition. Experimental results demonstrate that our method outperforms several state-of-the-art object/scene recognition models. © 2012 ICPR Org Committee.
Liu, X., Song, M., Zhang, L., Wang, S., Bu, J., Chen, C. & Tao, D. 2012, 'Joint shot boundary detection and key frame extraction', Proceedings - International Conference on Pattern Recognition, pp. 2565-2568.
Representing a video by a set of key frames is useful for efficient video browsing and retrieval. However, key frame extraction remains a challenge in the computer vision field. In this paper, we propose a joint framework to integrate both shot boundary detection and key frame extraction, wherein three probabilistic components are taken into account, i.e., the prior of the key frames, the conditional probability of shot boundaries and the conditional probability of each video frame. Key frame extraction is thus treated as a maximum a posteriori (MAP) problem, which can be solved by adopting an alternating strategy. Experimental results show that the proposed method preserves the scene-level structure and extracts key frames that are representative and discriminative. © 2012 ICPR Org Committee.
Zhou, T. & Tao, D. 2012, 'Multi-label subspace ensemble', Journal of Machine Learning Research, pp. 1444-1452.
© Copyright 2012 by the authors. A challenging problem of multi-label learning is that both the label space and the model complexity grow rapidly as the number of labels increases, which makes the available training samples insufficient for training a proper model. In this paper, we eliminate this problem by learning a mapping of each label in the feature space as a robust subspace, and formulating the prediction as finding the group sparse representation of a given instance on the subspace ensemble. We term this approach "multi-label subspace ensemble (MSE)". In the training stage, the data matrix is decomposed as the sum of several low-rank matrices and a sparse residual via a randomized optimization, where each low-rank part defines a subspace mapped by a label. In the prediction stage, the group sparse representation on the subspace ensemble is estimated by group lasso. Experiments on several benchmark datasets demonstrate the appealing performance of MSE.
Du, B., Zhang, L., Tao, D., Wang, N. & Chen, T. 2012, 'A spectral dissimilarity constrained nonnegative matrix factorization based cancer screening algorithm from hyperspectral fluorescence images', ICCH 2012 Proceedings - International Conference on Computerized Healthcare, pp. 112-119.
Bioluminescence from a living body can help screen for cancers without penetrating the body. Hyperspectral imaging is a novel way to obtain physically meaningful signatures with very fine spectral resolution; these signatures can be used to distinguish different kinds of materials and have been widely used in remote sensing. Fluorescence imaging has proved effective in monitoring probable cancer cells. Recent work has made great progress on hyperspectral fluorescence imaging techniques, which makes detailed spectral observation of cancer areas possible. Developing proper hyperspectral image processing methods for hyperspectral medical images is therefore of practical importance. Cancer cells become distinguishable from normal ones when the living body is injected with a fluorescent agent, which makes organs inside the body emit light, and the signals can then be captured by a passive imaging sensor. The spectral unmixing technique from hyperspectral remote sensing has been introduced to detect probable cancer areas. However, because the cancer areas are small and neither the normal areas nor the cancer areas may consist of pure pixels, predefined endmembers may not be available. In this case, classic blind signal separation methods are applicable. Considering the spectral dissimilarity between cancer and normal cells, a novel spectral dissimilarity constrained NMF method is proposed in this paper for cancer screening from fluorescence hyperspectral images. Experiments evaluating various NMF-based methods and our proposed spectral dissimilarity based NMF method show that: 1) the NMF methods perform well in detecting the cancer areas inside the living body; 2) the spectral dissimilarity constrained NMF yields more accurate cancer areas; 3) the spectral dissimilarity constraint gives better performance under different SNRs and different purities of the mixing endmembers. © 2012 IEEE.
Tao, D. & Reformat, M. 2012, 'Preface', Proceedings - 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012, pp. xvi-xvii.
Han, B., Li, X., Gao, X. & Tao, D. 2012, 'A biological inspired features based saliency map', 2012 International Conference on Computing, Networking and Communications, ICNC'12, pp. 371-375.
A visual attention mechanism is believed to be responsible for selecting the most informative spots in complex scenes. We propose a novel biologically inspired attention model based on cortex-like mechanisms and sparse representation. The biologically inspired HMAX model is a feature extraction method motivated by a quantitative model of the visual cortex. These biologically inspired features are used to build the Saliency Criteria, which measure the perspective fields. The Saliency Criteria are obtained from Shannon's information entropy and sparse representation. We demonstrate that the proposed model achieves superior accuracy compared with the classical approach in static saliency map generation on real data of natural scenes and psychological stimulus patterns. © 2012 IEEE.
Gao, Y., Wang, M., Luan, H., Shen, J., Yan, S. & Tao, D. 2011, 'Tag-Based Social Image Search with Visual-Text Joint Hypergraph Learning', Proceedings of the 2011 ACM Multimedia Conference & Co-Located Workshops, ACM Multimedia, Association for Computing Machinery, Inc. (ACM)., Scottsdale, Arizona, USA, pp. 1517-1520.
Tag-based social image search has attracted great interest and how to order the search results based on relevance level is a research problem. Visual content of images and tags have both been investigated. However, existing methods usually employ tags and visual content separately or sequentially to learn the image relevance. This paper proposes a tag-based image search with visual-text joint hypergraph learning. We simultaneously investigate the bag-of-words and bag-of-visual-words representations of images and accomplish the relevance estimation with a hypergraph learning approach. Each textual or visual word generates a hyperedge in the constructed hypergraph. We conduct experiments with a real-world data set and experimental results demonstrate the effectiveness of our approach.
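The construction step mentioned above, in which each textual or visual word generates a hyperedge, can be sketched as an incidence matrix. This sketch covers only the hypergraph construction, not the relevance learning. The word labels (`t:` for tags, `v:` for visual words) and the function name `build_incidence` are hypothetical.

```python
def build_incidence(image_words):
    """Return (vocabulary, H), where H[i][j] = 1 if image i contains word j.

    Each column of H is one hyperedge: the set of images sharing that word.
    """
    vocab = sorted({w for words in image_words for w in words})
    column = {w: j for j, w in enumerate(vocab)}
    H = [[0] * len(vocab) for _ in image_words]
    for i, words in enumerate(image_words):
        for w in words:
            H[i][column[w]] = 1
    return vocab, H

# Usage: three images described by a mix of tags and visual words.
images = [{"t:beach", "v:17"}, {"t:beach", "v:42"}, {"t:city", "v:42"}]
vocab, H = build_incidence(images)
```

Here the hyperedge for `t:beach` connects images 0 and 1, and the hyperedge for `v:42` connects images 1 and 2.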
Li, J. & Tao, D. 2011, 'A Probabilistic Model for Discovering High Level Brain Activities from fMRI', Lecture Notes in Computer Science, International Conference on Neural Information Processing, Springer-Verlag, Shanghai, China, pp. 329-336.
Functional magnetic resonance imaging (fMRI) has provided an invaluable method of investigating real-time neuron activities. Statistical tools have been developed to recognise the mental state from a batch of fMRI observations over a period. However, an interesting question is whether it is possible to estimate the real-time mental state at each moment during the fMRI observation. In this paper, we address this problem by building a probabilistic model of the brain activity. We model the tempo-spatial relations among the hidden high-level mental states and observable low-level neuron activities. We verify our model by experiments on practical fMRI data. The model also offers interesting clues about the task-responsible regions in the brain.
Li, J., Bian, W., Tao, D. & Zhang, C. 2011, 'Learning Colours from Textures by Sparse Manifold Embedding', Lecture Notes in Artificial Intelligence. AI 2011: Advances in Artificial Intelligence. 24th Australasian Joint Conference, AI 2011: Advances in Artificial Intelligence. 24th Australasian Joint Conference, Springer-Verlag Berlin / Heidelberg, Perth, Australia, pp. 600-608.
The capability of inferring colours from the texture (grayscale contents) of an image is useful in many application areas when the imaging device or environment is limited. Traditional colour assignment involves intensive human effort. Automatic methods have been proposed to establish relations between image textures and the corresponding colours. Existing research mainly focuses on linear relations. In this paper, we employ sparse constraints in the model of the texture-colour relationship. The technique is developed on a locally linear model, which makes a manifold assumption about the distribution of the image data. Given the texture of an image patch, the learned model transfers colours to the texture patch by combining known colours of similar texture patches. The sparse constraint limits the contributing factors in the model and helps improve the stability of the colour transfer. Experiments show that our method gives superior results to those of previous work.
Li, J. & Tao, D. 2011, 'Wisdom of Crowds: Single Image Super-resolution from the Web', Workshop on Large Scale Visual Analytics with the IEEE International Conference on Data Mining, IEEE Computer Society, Vancouver, Canada, pp. 812-816.
This paper addresses the problem of learning-based single image super-resolution. Previous research on this problem relies on a human user to provide a set of images that are similar to the target image as a reference. The super-resolution algorithm can then learn from the provided reference images to predict the high-resolution details for the target image. We propose a fully automatic scheme, which leverages the knowledge of the entire visual world by querying relevant references from the Internet. The proposed scheme is free of human supervision, and the performance compromise is small. We conduct experiments to show the effectiveness of the method.
Luo, Y., Tao, D., Geng, B., Xu, C. & Maybank, S. 2011, 'Shared Feature Extraction for Semi-supervised Image Classification', Proceedings of ACM Multimedia 2011 and the co-located Workshops, Association for Computing Machinery, Inc. (ACM), Scottsdale, AZ, USA, pp. 1165-1168.
Multi-task learning (MTL) plays an important role in image analysis applications, e.g. image classification, face recognition and image annotation. That is because MTL can estimate the latent shared subspace to represent the common features given a set of images from different tasks. However, the geometry of the data probability distribution is always supported on an intrinsic image sub-manifold that is embedded in a high dimensional Euclidean space. Therefore, it is improper to directly apply MTL to multiclass image classification. In this paper, we propose a manifold regularized MTL (MRMTL) algorithm to discover the latent shared subspace by treating the high-dimensional image space as a submanifold embedded in an ambient space. We conduct experiments on the PASCAL VOC'07 dataset with 20 classes and the MIR dataset with 38 classes by comparing MRMTL with conventional MTL and several representative image classification algorithms. The results suggest that MRMTL can properly extract the common features for image representation and thus improve the generalization performance of the image classification models.
Wang, S., Song, M., Tao, D., Zhang, L., Bu, J. & Chen, C. 2011, 'Opponent and Feedback: Visual Attention Captured', Lecture Notes in Computer Science. Neural Information Processing. 18th International Conference, ICONIP 2011, International Conference on Neural Information Processing, Springer-Verlag Berlin / Heidelberg, Shanghai, China, pp. 667-675.
Visual attention has been an important issue in the computer vision field for decades, and many approaches, mainly based on bottom-up or top-down computing models, have been put forward to address it. In this paper, we propose a new and effective saliency model which considers the inner opponent relationship of the image information. Inspired by the opponent and feedback mechanism in human perceptive learning, we first propose some opponent models based on the analysis of the original color image information. Secondly, as both positive and negative feedbacks can be learned from the opponent models, we construct the saliency map according to the optimal combination of these feedbacks by using constrained least squares regression. Experimental results indicate that our model achieves better performance in both simple and complex natural scenes.
Zheng, S., Xie, B., Huang, K. & Tao, D. 2011, 'Multi-view Pedestrian Recognition Using Shared Dictionary Learning with Group Sparsity', Lecture Notes in Computer Science. Neural Information Processing. 18th International Conference, ICONIP 2011, International Conference on Neural Information Processing, Springer-Verlag Berlin / Heidelberg, Shanghai, China, pp. 629-638.
Pedestrian tracking across multiple cameras is an important task in intelligent visual surveillance systems, but it suffers from the problem of large appearance variations of the same person under different cameras. Inspired by the success of existing view transformation models in multi-view gait recognition, we present a novel view transformation model based approach, named shared dictionary learning with group sparsity, to address the problem. It projects the pedestrian appearance feature descriptor in the probe view into the gallery view before feature descriptor matching. In this case, L1,∞ regularization over the latent embedding ensures lower reconstruction error and more stable feature descriptor generation compared with the existing Singular Value Decomposition. Although the overall optimization function is not globally convex, Nesterov's optimal gradient scheme ensures efficiency and reliability. Experiments on the VIPeR dataset show that our approach reaches state-of-the-art performance.
Mu, Y., Ding, W., Tao, D. & Stepinski, T.T. 2011, 'Biologically Inspired Model for Crater Detection', International Joint Conference on Neural Networks, IEEE International Joint Conference on Neural Networks, IEEE, San Jose, pp. 2487-2495.
Crater detection from panchromatic images has its unique challenges compared to traditional object detection tasks. Craters are numerous, have a large range of sizes and textures, and continuously merge into image backgrounds. Traditional feature construction methods cannot adequately capture the diversified characteristics of craters. On the other hand, we are gradually revealing the secret of object recognition in the primate's visual cortex. Biologically inspired features, designed to mimic the human cortex, have achieved great performance on object detection problems. Therefore, it is time to reconsider crater detection by using biologically inspired features. In this paper, we represent crater images by utilizing the C1 units, which correspond to complex cells in the visual cortex and pool over the S1 units by using a maximum operation to retain only the maximum response of each local area of the S1 units. The features generated from the C1 units have the hallmarks of size invariance and location invariance. We further extract a set of improved Haar features, which contain gradient texture information, on each C1 map. We apply this biologically inspired Haar feature to crater detection. Because the feature construction process requires a set of biologically inspired transformations, these features are embedded in a high-dimensional space. We apply a subspace learning algorithm to find the intrinsic discriminative subspace for accurate classification. Experiments on a Mars impact crater dataset show the superiority of the proposed method.
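The C1 pooling step described above, which keeps only the maximum S1 response in each local area, can be sketched as plain max pooling. This is a simplification: it pools over non-overlapping spatial windows of a single S1 map only, and the window layout and the function name `c1_max_pool` are assumptions rather than details from the paper.

```python
def c1_max_pool(s1, size):
    """Max-pool an S1 response map over non-overlapping size x size windows."""
    rows, cols = len(s1), len(s1[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        row = []
        for c in range(0, cols - size + 1, size):
            # Keep only the strongest response inside this local window.
            row.append(max(s1[r + i][c + j]
                           for i in range(size) for j in range(size)))
        pooled.append(row)
    return pooled

# Usage: a 4 x 4 S1 map pooled with 2 x 2 windows gives a 2 x 2 C1 map.
s1_map = [[1, 2, 0, 1],
          [3, 4, 1, 0],
          [0, 0, 5, 6],
          [1, 0, 7, 8]]
c1_map = c1_max_pool(s1_map, 2)
```

Because only the maximum in each window survives, small shifts of a response within its window leave the C1 output unchanged, which is the source of the location invariance mentioned in the abstract.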
Zhang, L., Bian, W., Song, M., Tao, D. & Liu, X. 2011, 'Integrating Local Features into Discriminative Graphlets for Scene Classification', Lecture Notes in Computer Science. Neural Information Processing. 18th International Conference, ICONIP 2011, International Conference, ICONIP, Springer-Verlag Berlin / Heidelberg, Shanghai, China, pp. 657-666.
Scene classification plays an important role in multimedia information retrieval. Since local features are robust to image transformation, they have been used extensively for scene classification. However, it is difficult to encode the spatial relations of local features in the classification process. To solve this problem, Geometric Local Features Integration (GLFI) is proposed. By segmenting a scene image into a set of regions, a so-called Region Adjacency Graph (RAG) is constructed to model their spatial relations. To measure the similarity of two RAGs, we select a few discriminative templates and then use them to extract the corresponding discriminative graphlets (connected subgraphs of an RAG). These discriminative graphlets are further integrated by a boosting strategy for scene classification. Experiments on five datasets validate the effectiveness of our GLFI.
Zhou, T. & Tao, D. 2011, 'GoDec: Randomized Low-rank & Sparse Matrix Decomposition in Noisy Case', Proceedings of the 28th International Conference on Machine Learning, International Conference on Machine Learning, Omnipress, Bellevue,Washington, USA, pp. 33-40.
Low-rank and sparse structures have been profoundly studied in matrix completion and compressed sensing. In this paper, we develop 'Go Decomposition' (GoDec) to efficiently and robustly estimate the low-rank part L and the sparse part S of a matrix X = L + S + G with noise G. GoDec alternately assigns the low-rank approximation of X - S to L and the sparse approximation of X - L to S. The algorithm can be significantly accelerated by bilateral random projections (BRP). We also propose GoDec for matrix completion as an important variant. We prove that the objective value ||X - L - S||_F^2 converges to a local minimum, while L and S linearly converge to local optimums. Theoretically, we analyze the influence of L, S and G on the asymptotic/convergence speeds in order to discover the robustness of GoDec. Empirical studies suggest the efficiency, robustness and effectiveness of GoDec compared with representative matrix decomposition and completion tools, e.g., Robust PCA and OptSpace.
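The alternating scheme described in this abstract can be sketched in NumPy as follows. This is an illustrative version under stated assumptions: the low-rank step uses a plain truncated SVD rather than the BRP acceleration, the sparse step keeps the `card` largest-magnitude entries, and the fixed iteration count and the names `godec`, `rank` and `card` are choices made here, not taken from the paper.

```python
import numpy as np

def godec(X, rank, card, n_iter=50):
    """Alternate a rank-constrained L and a cardinality-constrained S (sketch)."""
    X = np.asarray(X, dtype=float)
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    for _ in range(n_iter):
        # Low-rank approximation of X - S via truncated SVD.
        U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Sparse approximation of X - L: keep the card largest entries.
        R = X - L
        S = np.zeros_like(X)
        if card > 0:
            idx = np.argpartition(np.abs(R), -card, axis=None)[-card:]
            S.flat[idx] = R.flat[idx]
    return L, S

# Usage: a rank-2 matrix corrupted by five large sparse entries and small noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
X.flat[rng.choice(X.size, size=5, replace=False)] += 10.0
X += 0.01 * rng.standard_normal(X.shape)
L, S = godec(X, rank=2, card=5)
residual = np.linalg.norm(X - L - S)
```

Each half-step is the best approximation under its own constraint with the other part held fixed, so the residual norm cannot increase from one iteration to the next.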
Zhuo, Z., Bu, J., Tao, D., Zhang, L., Song, M. & Chen, C. 2011, 'Describing Human Identity Using Attributes', Lecture Notes In Computer Science, Neural Information Processing, 18th International Conference, ICONIP 2011, Proceedings, Part II, International Conference on Neural Information Processing, Springer-Verlag, Shanghai, China, pp. 86-94.
Smart surveillance of wide areas requires a system of multiple cameras to keep tracking people by their identities. In such multiview systems, the captured body figures and appearances of humans, as well as their orientation and the backgrounds, usually differ from camera to camera, which poses challenges for a view-invariant representation of humans for correct identification. To tackle this problem, we introduce an attribute-based description of human identity in this paper. Firstly, two groups of attributes responsible for figure and appearance are obtained respectively. Secondly, Predict-Taken and Predict-Not-Taken schemes are defined to overcome the attribute-loss problem caused by the different views of multiple cameras, and the attribute representation of humans is obtained consequently. Thirdly, human identification based on a voter-candidate scheme is carried out, taking into account humans outside the training data. Experimental results show that our method is robust to view changes, attribute loss and different backgrounds.
Cheng, J., Tao, D., Liu, J., Wong, D.W., Lee, B.H., Baskaran, M., Wong, T.Y. & Aung, T. 2011, 'Focal Biologically Inspired Feature for Glaucoma Type Classification', Lecture Notes in Computer Science. 14th International Conference. Medical Image Computing and Computer-Assisted Intervention MICCAI 2011, Medical Image Computing and Computer-Assisted Intervention - MICCAI, Springer-Verlag Berlin / Heidelberg, Toronto, Canada, pp. 91-98.
Glaucoma is an optic nerve disease resulting in loss of vision. There are two common types of glaucoma: open angle glaucoma and angle closure glaucoma. Glaucoma type classification is important in glaucoma diagnosis. Ophthalmologists examine the iridocorneal angle between iris and cornea to determine the glaucoma type. However, manual classification/grading of the iridocorneal angle images is subjective and time consuming. To save workload and facilitate large-scale clinical use, it is essential to determine glaucoma type automatically. In this paper, we propose to use focal biologically inspired feature for the classification. The iris surface is located to determine the focal region. The association between focal biologically inspired feature and angle grades is built. The experimental results show that the proposed method can correctly classify 85.2% images from open angle glaucoma and 84.3% images from angle closure glaucoma. The accuracy could be improved close to 90% with more images included in the training. The results show that the focal biologically inspired feature is effective for automatic glaucoma type classification. It can be used to reduce workload of ophthalmologists and diagnosis cost.
Li, Y., Geng, B., Zha, Z., Tao, D., Yang, L. & Xu, C. 2011, 'Difficulty Guided Image Retrieval using Linear Multiview Embedding', Proceedings of the 2011 ACM Multimedia Conference, ACM, Scottsdale, Arizona, USA, pp. 1169-1172.
Existing image retrieval systems suffer from a radical performance variance for different queries. The bad initial search results for "difficult" queries may greatly degrade the performance of their subsequent refinements, especially the refinement that utilizes the information mined from the search results, e.g., pseudo relevance feedback based reranking. In this paper, we tackle this problem by proposing a query difficulty guided image retrieval system, which selectively performs reranking according to the estimated query difficulty. To improve the performance of both reranking and difficulty estimation, we apply multiview embedding (ME) to images represented by multiple different features for integrating a joint subspace by preserving the neighborhood information in each feature space. However, existing ME approaches suffer from both out of sample and huge computational cost problems, and cannot be applied to online reranking or offline large-scale data processing for practical image retrieval systems. Therefore, we propose a linear multiview embedding (LME) algorithm which learns a linear transformation from a small set of data and can effectively infer the subspace features of new data. Empirical evaluations on both Oxford and 500K ImageNet datasets suggest the effectiveness of the proposed difficulty guided retrieval system with LME.
Mu, Y., Ding, W., Morabito, M. & Tao, D. 2011, 'Empirical Discriminative Tensor Analysis for Crime Forecasting', Proceedings Knowledge Science, Engineering and Management 5th International Conference, KSEM 2011, International Conference on Knowledge Science, Engineering and Management, Springer, Irvine, USA, pp. 293-304.
Police agencies have been collecting an increasing amount of information to better understand patterns in criminal activity. Recently, there has been a new trend of using the collected data to predict where and when crime will occur. Crime prediction is greatly beneficial because, if it is done accurately, police practitioners would be able to allocate resources to the geographic areas most at risk of criminal activity and ultimately make communities safer. In this paper, we discuss a new fourth-order tensor representation for crime data. The tensor encodes the longitude, latitude, time, and other relevant incidents. Using the tensor data structure, we propose the Empirical Discriminative Tensor Analysis (EDTA) algorithm to obtain sufficient discriminative information while simultaneously minimizing empirical risk. We examine the algorithm on the crime data collected in one Northeastern city. EDTA demonstrates promising results compared to other existing methods in real-world scenarios.
Zhang, C. & Tao, D. 2011, 'Generalization Bound for Infinitely Divisible Empirical Process', Journal of Machine Learning Research Workshop and Conference Proceedings, Fourteenth International Conference on Artificial Intelligence and Statistics, MIT Press, Ft. Lauderdale, FL, USA, pp. 864-872.
In this paper, we study the generalization bound for an empirical process of samples independently drawn from an infinitely divisible (ID) distribution, which is termed the ID empirical process. In particular, based on a martingale method, we develop deviation inequalities for the sequence of random variables of an ID distribution. By applying the obtained deviation inequalities, we then show the generalization bound for the ID empirical process based on the annealed Vapnik-Chervonenkis (VC) entropy. Afterward, according to Sauer's lemma, we get the generalization bound for the ID empirical process based on the VC dimension. Finally, by using the resulting bound, we analyze the asymptotic convergence of the ID empirical process and show that its convergence rate is faster than that of the generic i.i.d. empirical process (Vapnik, 1999).
Li, Y., Geng, B., Zha, Z., Li, Y., Tao, D. & Xu, C. 2011, 'Query Expansion by Spatial Co-occurrence for Image Retrieval', Proceedings of the 2011 ACM Multimedia Conference & Co-Located Workshops, ACM Multimedia, Association for Computing Machinery, Inc. (ACM), Scottsdale, Arizona, USA, pp. 1177-1180.
The well-known bag-of-features (BoF) model is widely utilized for large-scale image retrieval. However, the BoF model lacks the spatial information of visual words, which is informative for local features to build up meaningful visual patches. To compensate for the spatial information loss, in this paper, we propose a novel query expansion method called Spatial Co-occurrence Query Expansion (SCQE), which utilizes the spatial co-occurrence information of visual words mined from the database images to boost the retrieval performance. In the offline phase, for each visual word in the vocabulary, we treat the visual words that frequently co-occur with it in the database images as neighbors, based on which a spatial co-occurrence graph is built. In the online phase, a query image can be expanded with some spatially co-occurring but unseen visual words according to the spatial co-occurrence graph, and the retrieval performance can be improved by expanding these visual words appropriately. Experimental results demonstrate that SCQE achieves promising improvements over the typical BoF baseline on two datasets comprising 5K and 505K images respectively.
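The offline and online phases described above can be sketched as follows. This is a deliberately simplified illustration: co-occurrence is counted at the image level rather than within a spatial neighbourhood, and the threshold `min_count` and both function names are assumptions rather than details from the paper.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence_graph(db_images, min_count=2):
    """Offline phase: link visual words that co-occur in >= min_count images."""
    counts = Counter()
    for words in db_images:
        for a, b in combinations(sorted(set(words)), 2):
            counts[(a, b)] += 1
    graph = {}
    for (a, b), c in counts.items():
        if c >= min_count:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph

def expand_query(query_words, graph):
    """Online phase: return neighbouring words not already in the query."""
    query = set(query_words)
    extra = set()
    for w in query:
        extra |= graph.get(w, set())
    return extra - query

# Usage: visual words are integer ids; words 1-2 and 2-3 co-occur twice.
db = [[1, 2, 3], [1, 2], [2, 3], [4, 5]]
graph = build_cooccurrence_graph(db, min_count=2)
added = expand_query([2], graph)
```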
Bian, W. & Tao, D. 2011, 'Learning a Distance Metric by Empirical Loss Minimization', Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, International Joint Conference on Artificial Intelligence, AAAI Press/International Joint Conferences on Artificial Intelligence, Barcelona, Catalonia, Spain, pp. 1186-1191.
In this paper, we study the problem of learning a metric and propose a loss function based metric learning framework, in which the metric is estimated by minimizing an empirical risk over a training set. With mild conditions on the instance distribution and the used loss function, we prove that the empirical risk converges to its expected counterpart at rate of root-n. In addition, with the assumption that the best metric that minimizes the expected risk is bounded, we prove that the learned metric is consistent. Two example algorithms are presented by using the proposed loss function based metric learning framework, each of which uses a log loss function and a smoothed hinge loss function, respectively. Experimental results suggest the effectiveness of the proposed algorithms.
Zhang, L., Song, M., Bian, W., Tao, D., Liu, X., Bu, J. & Chen, C. 2011, 'Feature Relationships Hypergraph for Multimodal Recognition', Lecture Notes in Computer Science. Neural Information Processing. 18th International Conference, ICONIP 2011, International Conference on Neural Information Processing, Springer-Verlag Berlin / Heidelberg, Shanghai, China, pp. 589-598.
Utilizing multimodal features to describe multimedia data is a natural way for accurate pattern recognition. However, how to deal with the complex relationships caused by the tremendous multimodal features and the curse of dimensionality are still two crucial challenges. To solve the two problems, a new multimodal features integration method is proposed. Firstly, a so-called Feature Relationships Hypergraph (FRH) is proposed to model the high-order correlations among the multimodal features. Then, based on FRH, the multimodal features are clustered into a set of low-dimensional partitions. Two types of matrices, the inter-partition matrix and the intra-partition matrix, are computed to quantify the inter- and intra-partition relationships. Finally, a multi-class boosting strategy is developed to obtain a strong classifier by combining the weak classifiers learned from the intra-partition matrices. The experimental results on different datasets validate the effectiveness of our approach.
Gao, F., Gao, X., Tao, D., Li, X., He, L. & Lu, W. 2011, 'Universal no reference image quality assessment metrics based on local dependency', 1st Asian Conference on Pattern Recognition, ACPR 2011, pp. 298-302.
No reference image quality assessment (NR-IQA) aims to evaluate image quality blindly without the ground truth. Most of the emerging NR-IQA algorithms are only effective for some specific distortion. Universal metrics that can work for various categories of distortions have hardly been explored, and the algorithms available are not fully adequate in performance. In this paper, we study the local dependency (LD) characteristic of natural images, and propose two universal NR-IQA metrics: LD global scheme (LD-GS) and LD two-step scheme (LD-TS). We claim that the local dependency characteristic among wavelet coefficients is disturbed by various distortion processes, and the disturbances are strongly correlated to image qualities. Experimental results on LIVE database II demonstrate that both proposed metrics are highly consistent with human perception and outpace the state-of-the-art NR-IQA indexes and some full reference quality indicators for diverse distortions and across the entire database. © 2011 IEEE.
Xie, B., Bian, W., Tao, D. & Chordia, P. 2011, 'Music tagging with regularized logistic regression', Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011, pp. 711-716.
In this paper, we present a set of simple and efficient regularized logistic regression algorithms to predict tags of music. We first vector-quantize the delta MFCC features using k-means and construct a "bag-of-words" representation for each song. We then learn the parameters of these logistic regression algorithms from the "bag-of-words" vectors and ground truth labels in the training set. At test time, the prediction confidence of the linear classifiers can be used to rank the songs for music annotation and retrieval tasks. Thanks to the convex property of the objective functions, we adopt an efficient and scalable generalized gradient method to learn the parameters, with global optimum guaranteed. We show that these efficient algorithms achieve state-of-the-art performance in annotation and retrieval tasks evaluated on CAL-500. © 2011 International Society for Music Information Retrieval.
Tao, D., Li, Z., Li, J., Katsaggelos, A., Bian, W., Chen, Y., Fan, J., Hu, Y., Izquierdo, E., Ji, S., Jiang, X., Kwok, J., Li, Q., Liu, J., Loog, M., Lu, H., Lu, Y.L., Maybank, S.J., Pau, D., Ro, Y.M., Shan, C., Shao, L., Smeraldi, F., Song, Y., Wang, F., Xu, Y., Yang, L., Ye, J., Yu, J., Zhang, D., Zhang, J., Zhao, X., Huang, K., Ying, Y. & Zhou, C. 2011, 'Preface', Proceedings - IEEE International Conference on Data Mining, ICDM, pp. xliii-xliv.
Zhang, J., Wang, N., Gao, X., Tao, D. & Li, X. 2011, 'Face sketch-photo synthesis based on support vector regression', Proceedings - International Conference on Image Processing, ICIP, pp. 1125-1128.
The existing face sketch-photo synthesis methods tend to lose some vital details. In this paper, we propose a novel sketch-photo synthesis approach based on support vector regression (SVR) to handle this difficulty. First, we utilize an existing method to acquire the initial estimate of the synthesized image. Then, the final synthesized image is obtained by combining the initial estimate and the SVR based high frequency information to further enhance the quality of the synthesized image. Experimental results on the benchmark database and our newly constructed database demonstrate that the proposed method can achieve significant improvement in perceptual quality. Moreover, the synthesized face images can obtain a higher recognition rate when used in a retrieval system. © 2011 IEEE.
Mu, G., Gao, X., Zhang, K., Li, X. & Tao, D. 2011, 'Single image super resolution with high resolution dictionary', Proceedings - International Conference on Image Processing, ICIP, pp. 1141-1144.
Image super resolution (SR) is a technique to estimate or synthesize a high resolution (HR) image from one or several low resolution (LR) images. This paper proposes a novel framework for single image super resolution based on sparse representation with a high resolution dictionary. Unlike the previous methods, the training set is constructed from the HR images instead of HR-LR image pairs. Due to this property, there is no need to retrain a new dictionary when the zooming factor changes. Given a testing LR image, the patch-based representation coefficients and the desired image are estimated alternately through the use of dynamic group sparsity, the fidelity term and the non-local means regularization. Experimental results demonstrate the effectiveness of the proposed algorithm. © 2011 IEEE.
Li, Y., Luo, Y., Tao, D. & Xu, C. 2011, 'Query difficulty guided image retrieval system', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 479-482.
Query difficulty estimation is a useful tool for content-based image retrieval. It predicts the performance of the search result of a given query, and thus it can guide the pseudo relevance feedback to rerank the image search results, and can be used to re-write the given query by suggesting "easy" alternatives. This paper presents a query difficulty estimation guided image retrieval system. The system initially estimates the difficulty of a given query image by analyzing both the query image and the retrieved top ranked images. Different search strategies are correspondingly applied to improve the retrieval performance. © 2011 Springer-Verlag Berlin Heidelberg.
Cheng, J., Tao, D., Liu, J., Wong, D.W.K., Lee, B.H., Baskaran, M., Wong, T.Y. & Aung, T. 2011, 'Focal biologically inspired feature for glaucoma type classification', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 91-98.
Glaucoma is an optic nerve disease resulting in loss of vision. There are two common types of glaucoma: open angle glaucoma and angle closure glaucoma. Glaucoma type classification is important in glaucoma diagnosis. Ophthalmologists examine the iridocorneal angle between iris and cornea to determine the glaucoma type. However, manual classification/grading of the iridocorneal angle images is subjective and time consuming. To save workload and facilitate large-scale clinical use, it is essential to determine glaucoma type automatically. In this paper, we propose to use focal biologically inspired feature for the classification. The iris surface is located to determine the focal region. The association between focal biologically inspired feature and angle grades is built. The experimental results show that the proposed method can correctly classify 85.2% of images from open angle glaucoma and 84.3% of images from angle closure glaucoma. The accuracy could be improved to close to 90% with more images included in the training. The results show that the focal biologically inspired feature is effective for automatic glaucoma type classification. It can be used to reduce the workload of ophthalmologists and the cost of diagnosis. © 2011 Springer-Verlag.
Zhou, Z., Song, M., Zhang, L., Tao, D., Bu, J. & Chen, C. 2011, 'KPose: A new representation for action recognition', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 436-447.
Human action recognition is an important problem in computer vision. Most existing techniques use all the video frames for action representation, which leads to high computational cost. Different from these techniques, we present a novel action recognition approach by describing the action with a few frames of representative poses, namely kPose. Firstly, a set of pose templates corresponding to different pose classes are learned based on a newly proposed Pose-Weighted Distribution Model (PWDM). Then, a local set of kPoses describing an action are extracted by clustering the poses belonging to the action. Thirdly, a further kPose selection is carried out to remove the redundant poses among the different local sets, which leads to a global set of kPoses with the least redundancy. Finally, a sequence of kPoses is obtained to describe the action by searching the nearest kPose in the global set. The proposed action classification is carried out by comparing the obtained pose sequence with each local set of kPose. The experimental results validate the proposed method with remarkable recognition accuracy. © 2011 Springer-Verlag Berlin Heidelberg.
Wang, N., Gao, X., Tao, D. & Li, X. 2011, 'Face sketch-photo synthesis under multi-dictionary sparse representation framework', Proceedings - 6th International Conference on Image and Graphics, ICIG 2011, pp. 82-87.
Sketch-photo synthesis is one of the important research issues of heterogeneous image transformation. Some available popular synthesis methods, like locally linear embedding (LLE), usually generate sketches or photos with lower definition and blurred details, which reduces the visual quality and the recognition rate across the heterogeneous images. In order to improve the quality of the synthesized images, a multi-dictionary sparse representation based face sketch-photo synthesis model is constructed. In the proposed model, LLE is used to estimate an initial sketch or photo, while the multi-dictionary sparse representation model is applied to generate the high frequency and detail information. Finally, by linear superimposing, the enhanced face sketch or photo can be obtained. Experimental results show that sketches and photos synthesized by the proposed method have higher definition and much richer detail information resulting in a higher face recognition rate between sketches and photos. © 2011 IEEE.
Zhang, C. & Tao, D. 2011, 'Risk bounds for infinitely divisible distribution', Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, UAI 2011, pp. 796-803.
In this paper, we study the risk bounds for samples independently drawn from an infinitely divisible (ID) distribution. In particular, based on a martingale method, we develop two deviation inequalities for a sequence of random variables of an ID distribution with zero Gaussian component. By applying the deviation inequalities, we obtain the risk bounds based on the covering number for the ID distribution. Finally, we analyze the asymptotic convergence of the risk bound derived from one of the two deviation inequalities and show that the convergence rate of the bound is faster than the result for the generic i.i.d. empirical process (Mendelson, 2003).
Li, J. & Tao, D. 2010, 'Boosted Dynamic Cognitive Activity Recognition from Brain Images', Proceedings - The 9th International Conference on Machine Learning and Applications, ICMLA 2010, International Conference on Machine Learning and Applications, IEEE, Washington, D.C., USA, pp. 361-366.
Functional Magnetic Resonance Imaging (fMRI) has become an important diagnostic tool for measuring brain haemodynamics. Previous research on analysing fMRI data mainly focuses on detecting low-level neuron activation from the ensued haemodynamic activities. An important recent advance is to show that the high-level cognitive status is recognisable from a period of fMRI records. Nevertheless, it would also be helpful to reveal dynamics of cognitive activities during the period. In this paper, we tackle the problem of discovering the dynamic cognitive activities by proposing an algorithm of boosted structure learning. We employ a statistical model of random fields to represent the dynamics of the brain. To exploit the rich fMRI observations with reasonable model complexity, we build multiple models, where one model links the cognitive activities to only a fraction of the fMRI observations. We combine the simple models by using an altered AdaBoost scheme for multi-class structure learning and show theoretical justification of the proposed scheme. Empirical tests show the method effectively links the physiological and the psychological activities of the brain.
Zhou, T., Tao, D. & Wu, X. 2010, 'NESVM: a Fast Gradient Method for Support Vector Machines', IEEE International Conference on Data Mining, IEEE International Conference on Data Mining, IEEE, Sydney, Australia, pp. 679-688.
Support vector machines (SVMs) are invaluable tools for many practical applications in artificial intelligence, e.g., classification and event recognition. However, popular SVM solvers are not sufficiently efficient for applications with a large number of samples as well as a large number of features. In this paper, we therefore present NESVM, a fast gradient SVM solver that can optimize various SVM models, e.g., classical SVM, linear programming SVM and least square SVM. Compared against SVM-Perf (whose convergence rate in solving the dual SVM is upper bounded by O(1/√k), where k is the number of iterations) and Pegasos (an online SVM that converges at rate O(1/k) for the primal SVM), NESVM achieves the optimal convergence rate of O(1/k^2) and a linear time complexity. In particular, NESVM smooths the non-differentiable hinge loss and ℓ1-norm in the primal SVM. Then the optimal gradient method without any line search is adopted to solve the optimization. In each iteration round, the current gradient and historical gradients are combined to determine the descent direction, while the Lipschitz constant determines the step size. Only two matrix-vector multiplications are required in each iteration round. Therefore, NESVM is more efficient than existing SVM solvers. In addition, NESVM is available for both linear and nonlinear kernels. We also propose "homotopy NESVM" to accelerate NESVM by dynamically decreasing the smooth parameter and using the continuation method. Our experiments on census income categorization, indoor/outdoor scene classification, event recognition and scene recognition suggest the efficiency and the effectiveness of NESVM. The MATLAB code of NESVM will be available on our website for further assessment.
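The two ingredients named in this abstract, a smoothed hinge loss and an accelerated gradient method whose step size comes from a Lipschitz constant rather than a line search, can be sketched as follows. This is an illustrative approximation and not the NESVM code: it uses a FISTA-style momentum update as a stand-in for the paper's combination of current and historical gradients, it has no bias term, and the function names and the default values of `lam`, `mu` and `iters` are assumptions.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def smoothed_hinge(z, mu):
    """Hinge loss max(0, 1 - z) smoothed on the interval (1 - mu, 1)."""
    if z >= 1.0:
        return 0.0
    if z <= 1.0 - mu:
        return 1.0 - z - mu / 2.0
    return (1.0 - z) ** 2 / (2.0 * mu)


def smoothed_hinge_grad(z, mu):
    """Derivative of smoothed_hinge with respect to the margin z."""
    if z >= 1.0:
        return 0.0
    if z <= 1.0 - mu:
        return -1.0
    return -(1.0 - z) / mu


def objective(w, X, y, lam, mu):
    """L2-regularized average smoothed hinge loss of a linear classifier."""
    reg = 0.5 * lam * dot(w, w)
    loss = sum(smoothed_hinge(yi * dot(w, xi), mu) for xi, yi in zip(X, y))
    return reg + loss / len(X)


def gradient(w, X, y, lam, mu):
    n = len(X)
    g = [lam * wj for wj in w]
    for xi, yi in zip(X, y):
        c = smoothed_hinge_grad(yi * dot(w, xi), mu) * yi / n
        for j, xij in enumerate(xi):
            g[j] += c * xij
    return g


def train(X, y, lam=0.01, mu=0.1, iters=200):
    """Accelerated gradient descent with fixed step 1/L (no line search)."""
    n, d = len(X), len(X[0])
    # Upper bound on the Lipschitz constant of the gradient:
    # lam + trace(X^T X) / (n * mu).
    L = lam + sum(dot(x, x) for x in X) / (n * mu)
    w = [0.0] * d
    v = list(w)  # extrapolated point at which the gradient is evaluated
    t = 1.0
    for _ in range(iters):
        g = gradient(v, X, y, lam, mu)
        w_next = [vj - gj / L for vj, gj in zip(v, g)]
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_next
        v = [wn + beta * (wn - wo) for wn, wo in zip(w_next, w)]
        w, t = w_next, t_next
    return w
```

A smaller `mu` makes the smoothed loss closer to the true hinge loss but increases the Lipschitz bound and therefore shrinks the step size, which is the trade-off a continuation scheme over the smoothing parameter would address.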
Xie, B., Song, M., Mu, Y. & Tao, D. 2010, 'Random Projection Tree and Multiview Embedding for Large-scale Image Retrieval', The 17th International Conference on Neural Information Processing: Models and Applications (ICONIP 2010), International Conference on Neural Information Processing, Springer, Sydney, Australia, pp. 641-649.
Image retrieval on large-scale datasets is challenging. Current indexing schemes, such as k-d tree, suffer from the "curse of dimensionality". In addition, there is no principled approach to integrate various features that measure multiple views of images, such as color histogram and edge directional histogram. We propose a novel retrieval system that tackles these two problems simultaneously. First, we use random projection trees to index data whose complexity only depends on the low intrinsic dimension of a dataset. Second, we apply a probabilistic multiview embedding algorithm to unify different features. Experiments on MSRA large-scale dataset demonstrate the efficiency and effectiveness of the proposed approach.
Li, J. & Tao, D. 2010, 'An Exponential Family Extension to Principal Component Analysis', International Conference on Neural Information Processing 2011, International Conference on Neural Information Processing, Springer, Sydney, Australia, pp. 1-9.
In this paper, we present a unified probabilistic model for constrained factorisation models, which employs exponential family distributions to represent the constrained factors. Our main objective is to provide a versatile framework, on which prototype models with various constraints can be implemented effortlessly. For learning the proposed stochastic model, Gibbs sampling is employed for model inference. We also demonstrate the utility and versatility of the model by experiments.
Zhou, T. & Tao, D. 2010, 'Backward-Forward Least Angle Shrinkage for Sparse Quadratic Optimization', Proceedings, Part I of the 17th International Conference on Neural Information Processing: Theory and Algorithms (ICONIP 2010), International Conference on Neural Information Processing, Springer, Sydney, Australia, pp. 388-396.
In the compressed sensing and statistics communities, dozens of algorithms have been developed to solve ℓ1 penalized least square regression, but constrained sparse quadratic optimization (SQO) is still an open problem. In this paper, we propose backward-forward least angle shrinkage (BF-LAS), which provides a scheme to solve general SQO including sparse eigenvalue minimization. BF-LAS starts from the dense solution, iteratively shrinks unimportant variables' magnitudes to zero in the backward step for minimizing the ℓ1 norm, decreases important variables' gradients in the forward step for optimizing the objective, and projects the solution on the feasible set defined by the constraints. The importance of a variable is measured by its correlation w.r.t. the objective and is updated via least angle shrinkage (LAS). We show promising performance of BF-LAS on sparse dimension reduction.
Xie, B., Mu, Y. & Tao, D. 2010, 'm-SNE: Multiview Stochastic Neighbor Embedding', Lecture Notes in Computer Science - Vol 6443 - Proceedings of the 17th International Conference on Neural Information Processing, International Conference on Neural Information Processing, Springer, Sydney, Australia, pp. 338-346.
In many real world applications, different features (or multiview data) can be obtained and how to duly utilize them in dimension reduction is a challenge. Simply concatenating them into a long vector is not appropriate because each view has its specific statistical property and physical interpretation. In this paper, we propose a multiview stochastic neighbor embedding (m-SNE) that systematically integrates heterogeneous features into a unified representation for subsequent processing based on a probabilistic framework. Compared with conventional strategies, our approach can automatically learn a combination coefficient for each view adapted to its contribution to the data embedding. Also, our algorithm for learning the combination coefficient converges at a rate of O(1/k^2), which is the optimal rate for smooth problems. Experiments on synthetic and real datasets suggest the effectiveness and robustness of m-SNE for data visualization and image retrieval.
Bian, W., Li, J. & Tao, D. 2010, 'Feature Extraction For FMRI-based Human Brain Activity Recognition', Machine Learning In Medical Imaging, International Workshop on Machine Learning in Medical Imaging, Springer-Verlag Berlin, Beijing, China, pp. 148-156.
Mitchell et al. [9] demonstrated that support vector machines (SVM) are effective to classify the cognitive state of a human subject based on fMRI images observed over a single time interval. However, the direct use of classifiers on active voxels veils
Huang, Y., Huang, K., Tan, T. & Tao, D. 2009, 'A Novel Visual Organization Based On Topological Perception', Computer Vision - ACCV 2009, Pt I, Asian Conference on Computer Vision, Springer-Verlag, Xian, China, pp. 180-189.
What are the primitives of visual perception? The early feature-analysis theory insists on it being a local-to-global process which has acted as the foundation of most computer vision applications for the past 30 years. The early holistic registration th
Gao, X., Liul, N., Lui, W., Tao, D. & Li, X. 2010, 'Spatio-temporal Salience Based Video Quality Assessment', IEEE International Conference On Systems, Man And Cybernetics (SMC 2010), IEEE International Conference on Systems, Man and Cybernetics, IEEE, Istanbul, Turkey, pp. 1501-1505.
It is important to design an effective and efficient objective metric of the video quality in video processing areas. The most reliable way is subjective evaluation, thus the most reasonable objective metric should adequately consider characteristics of
Li, X., He, L., Lu, W., Gao, X. & Tao, D. 2010, 'A Novel Image Quality Metric Based On Morphological Component Analysis', IEEE International Conference On Systems, Man And Cybernetics (SMC 2010), IEEE International Conference on Systems, Man and Cybernetics, IEEE, Istanbul, Turkey, pp. 1449-1454.
Because the human eye has different perceptual characteristics for different morphological components, a novel image quality metric is proposed by incorporating morphological component analysis (MCA) and human visual system (HVS), which is capable of
Yan, J., Tao, D., Tian, C., Gao, X. & Li, X. 2010, 'Chinese Text Detection And Location For Images In Multimedia Messaging Service', IEEE International Conference On Systems, Man And Cybernetics (SMC 2010), IEEE International Conference on Systems, Man and Cybernetics, IEEE, Istanbul, Turkey, pp. 3896-3901.
Text detection and recognition for images in multimedia messaging service is a very important task. Since Chinese characters are composed of four kinds of strokes, i.e., horizontal line, top-down vertical line, left-downward slope line and short pausing
Liu, W., Ma, S., Tao, D., Liu, J. & Liu, P. 2010, 'Semi-Supervised Sparse Metric Learning Using Alternating Linearization Optimization', Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data, ACM SIGKDD International Conference on Knowledge Discovery and Data, Association for Computing Machinery, Inc. (ACM), Washington, DC, USA, pp. 1139-1147.
In plenty of scenarios, data can be represented as vectors and then mathematically abstracted as points in a Euclidean space. Because a great number of machine learning and data mining applications need proximity measures over data, a simple and universal distance metric is desirable, and metric learning methods have been explored to produce sensible distance measures consistent with data relationship. However, most existing methods suffer from limited labeled data and expensive training. In this paper, we address these two issues through employing abundant unlabeled data and pursuing sparsity of metrics, resulting in a novel metric learning approach called semi-supervised sparse metric learning. Two important contributions of our approach are: 1) it propagates scarce prior affinities between data to the global scope and incorporates the full affinities into the metric learning; and 2) it uses an efficient alternating linearization method to directly optimize the sparse metric. Compared with conventional methods, ours can effectively take advantage of semi-supervision and automatically discover the sparse metric structure underlying input data patterns. We demonstrate the efficacy of the proposed approach with extensive experiments carried out on six datasets, obtaining clear performance gains over the state of the art.
Liu, W., Tian, X., Tao, D. & Liu, J. 2010, 'Constrained Metric Learning via Distance Gap Maximization', Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10), AAAI Conference on Artificial Intelligenc, AAAI Press, Atlanta Georgia, pp. 518-524.
Vectored data frequently occur in a variety of fields and are easy to handle since they can be mathematically abstracted as points residing in a Euclidean space. An appropriate distance metric in the data space is in high demand for a great number of applications. In this paper, we pose robust and tractable metric learning under pairwise constraints that are expressed as similarity judgements between data pairs. The major features of our approach include: 1) it maximizes the gap between the average squared distance among dissimilar pairs and the average squared distance among similar pairs; 2) it is capable of propagating similar constraints to all data pairs; and 3) it is easy to implement in contrast to the existing approaches using expensive optimization such as semidefinite programming. Our constrained metric learning approach has widespread applicability without being limited to particular backgrounds. Quantitative experiments are performed for classification and retrieval tasks, demonstrating the effectiveness of the proposed approach.
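The quantity in feature 1) of this abstract, the gap between the average squared distance over dissimilar pairs and over similar pairs, can be written down directly. The sketch below only evaluates that gap for a given metric matrix `M`; it does not perform the constrained optimization or the constraint propagation described in the paper, and the function names and the pair-list representation are assumptions for illustration.

```python
def squared_distance(x, y, M):
    """Squared distance (x - y)^T M (x - y) under the metric matrix M."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def distance_gap(points, similar_pairs, dissimilar_pairs, M):
    """Average squared distance of dissimilar pairs minus that of similar pairs."""
    def average(pairs):
        total = sum(squared_distance(points[i], points[j], M) for i, j in pairs)
        return total / len(pairs)

    return average(dissimilar_pairs) - average(similar_pairs)


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (3, 0), (3, 1)]
    similar = [(0, 1), (2, 3)]      # pairs judged similar (index pairs)
    dissimilar = [(0, 2), (1, 3)]   # pairs judged dissimilar
    identity = [[1, 0], [0, 1]]
    first_axis_only = [[1, 0], [0, 0]]
    print(distance_gap(points, similar, dissimilar, identity))
    print(distance_gap(points, similar, dissimilar, first_axis_only))
```

In this toy data the second coordinate only separates similar points, so a metric that ignores it yields a larger gap than the Euclidean metric.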
Xie, B., Song, M. & Tao, D. 2010, 'Large-scale dictionary learning for local coordinate coding', British Machine Vision Conference, BMVC 2010 - Proceedings.
Local coordinate coding has recently been introduced to learning visual feature dictionary and achieved top level performance for object recognition. However, the computational complexity scales linearly with the number of samples, so it does not scale up well for large-scale databases. In this paper, we propose an online learning algorithm which, at every iteration round, only processes one or a mini-batch of random samples (e.g., two hundred samples). Our algorithm theoretically ensures the convergence to the expected objective at infinity. Experiments on object recognition demonstrate the advantage over the original local coordinate coding method in terms of efficiency with comparable performance. © 2010. The copyright of this document resides with its authors.
Li, J. & Tao, D. 2010, 'Simple exponential family PCA', Journal of Machine Learning Research, pp. 453-460.
Bayesian principal component analysis (BPCA), a probabilistic reformulation of PCA with Bayesian model selection, is a systematic approach to determining the number of essential principal components (PCs) for data representation. However, it assumes that data are Gaussian distributed and thus it cannot handle all types of practical observations, e.g. integers and binary values. In this paper, we propose simple exponential family PCA (SePCA), a generalised family of probabilistic principal component analysers. SePCA employs exponential family distributions to handle general types of observations. By using Bayesian inference, SePCA also automatically discovers the number of essential PCs. We discuss techniques for fitting the model, develop the corresponding mixture model, and show the effectiveness of the model based on experiments.
Zhang, C. & Tao, D. 2010, 'Risk bounds for Lévy processes in the PAC-learning framework', Journal of Machine Learning Research, pp. 948-955.
Lévy processes play an important role in the stochastic process theory. However, since samples are non-i.i.d., statistical learning results based on the i.i.d. scenarios cannot be utilized to study the risk bounds for Lévy processes. In this paper, we present risk bounds for non-i.i.d. samples drawn from Lévy processes in the PAC-learning framework. In particular, by using a concentration inequality for infinitely divisible distributions, we first prove that the function of risk error is Lipschitz continuous with a high probability, and then by using a specific concentration inequality for Lévy processes, we obtain the risk bounds for non-i.i.d. samples drawn from Lévy processes without Gaussian components. Based on the resulting risk bounds, we analyze the factors that affect the convergence of the risk bounds and then prove the convergence. Copyright 2010 by the authors.
Lu, W., Li, J., Tao, D., Gao, X. & Li, X. 2010, 'A new quality metric for compressed images based on DDCT', Proceedings of SPIE - The International Society for Optical Engineering.
As the performance indicator of image processing algorithms or systems, image quality assessment (IQA) has attracted the attention of many researchers. Targeting the widely used compression standards, JPEG and JPEG2000, we propose a new no reference (NR) metric for compressed images to do IQA. This metric exploits the causes of distortion by JPEG and JPEG2000, employs the directional discrete cosine transform (DDCT) to obtain the detail and direction information of the images, and incorporates visual perception to obtain the image quality index. Experimental results show that the proposed metric not only has outstanding performance on JPEG and JPEG2000 images, but is also applicable to other types of artifacts. © 2010 SPIE.
Gao, F., Gao, X., Lu, W., Tao, D. & Li, X. 2010, 'An image quality assessment metric with no reference using hidden Markov tree model', Proceedings of SPIE - The International Society for Optical Engineering.
No reference (NR) method is the most difficult issue of image quality assessment (IQA), which does not need the original image or its features as reference and only depends on the statistical law of the natural images. So, the NR-IQA is a high-level evaluation for image quality and simulates the complicated subjective process of human beings. This paper presents an NR-IQA metric based on the Hidden Markov Tree (HMT) model. First, the HMT is utilized to model natural images, and the statistical properties of the model parameters are analyzed to mimic variation of image degradation. Then, by estimating the deviation degree of the parameters from the statistical law, the distortion metric is constructed. Experimental results show that the proposed image quality assessment model is well consistent with the subjective evaluation results, and outperforms the existing models on different distortions. © 2010 SPIE.
Deng, C. & Tao, D. 2010, 'Color image quality assessment with biologically inspired feature and machine learning', Proceedings of SPIE - The International Society for Optical Engineering.
In this paper, we present a new no-reference quality assessment metric for color images by using biologically inspired features (BIFs) and machine learning. In this metric, we first adopt a biologically inspired model to mimic the visual cortex and represent a color image based on BIFs, which unify color units, intensity units and C1 units. Then, in order to reduce the complexity and benefit the classification, the high dimensional features are projected to a low dimensional representation with manifold learning. Finally, a multiclass classification process is performed on this new low dimensional representation of the image and the quality assessment is based on the learned classification result so as to agree with the judgment of human observers. Instead of computing a final score, our method classifies the quality according to the quality scale recommended by the ITU. The preliminary results show that the developed metric can achieve good quality evaluation performance. © 2010 SPIE.
Gao, X., Lu, W., Tao, D. & Li, X. 2010, 'Image quality assessment and human visual system', Proceedings of SPIE - The International Society for Optical Engineering.
This paper summarizes the state of the art of image quality assessment (IQA) and human visual system (HVS). IQA provides an objective index or real value to measure the quality of the specified image. Since human beings are the ultimate receivers of visual information in practical applications, the most reliable IQA is to build a computational model to mimic the HVS. According to the properties and cognitive mechanism of the HVS, the available HVS-based IQA methods can be divided into two categories, i.e., bionics methods and engineering methods. This paper briefly introduces the basic theories and development histories of the above two kinds of HVS-based IQA methods. Finally, some promising research issues are pointed out. © 2010 SPIE.
Si, S., Tao, D. & Chan, K.P. 2010, 'Discriminative Hessian Eigenmaps for face recognition', ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 5586-5589.
Dimension reduction algorithms have attracted a lot of attention in face recognition because they can select a subset of effective and efficient discriminative features in the face images. Most dimension reduction algorithms cannot model both the intra-class geometry and the interclass discrimination well at the same time. In this paper, we introduce the Discriminative Hessian Eigenmaps (DHE), a novel dimension reduction algorithm to address this problem. DHE encodes the geometric and discriminative information in a local patch by improved Hessian Eigenmaps and margin maximization, respectively. Empirical studies on a public face database demonstrate that DHE is superior to popular algorithms for dimension reduction, e.g., FLDA, LPP, MFA and DLA. © 2010 IEEE.
Wen, J., Gao, X., Li, X. & Tao, D. 2009, 'Incremental Learning Of Weighted Tensor Subspace For Visual Tracking', 2009 IEEE International Conference On Systems, Man And Cybernetics (SMC 2009), IEEE International Conference on Systems, Man and Cybernetics, IEEE, San Antonio, TX, pp. 3688-3693.
Tensor analysis has been widely utilized in image-related machine learning applications, which has preferable performance over the vector-based approaches for its capability of holding the spatial structure information in some research field. The traditi
Wang, B., Gao, X., Tao, D., Li, X. & Li, J. 2009, 'The Gabor-based Tensor Level Set Method For Multiregional Image Segmentation', Computer Analysis Of Images And Patterns, Proceedings, International Conference on Computer Analysis of Images and Patterns, Springer-Verlag Berlin, Munster, Germany, pp. 987-994.
This paper represents a new level set method for multiregional image segmentation. It employs the Gabor filter bank to extract local geometrical features and builds the pixel tensor representation whose dimensionality is reduced by using the offline tens
Bian, W., Cheng, J.L. & Tao, D. 2009, 'Biased Isomap Projections For Interactive Reranking', ICME: 2009 IEEE International Conference On Multimedia And Expo, Vols 1-3, IEEE International Conference on Multimedia and Expo, IEEE, New York, NY, pp. 1632-1635.
Image search has recently gained more and more attention for various applications. To capture users' intentions and to bridge the gap between the low level visual features and the high level semantics, a dozen interactive reranking (IR) or relevance f
Song, D. & Tao, D. 2009, 'Discriminative Geometry Preserving Projections', 2009 16th IEEE International Conference On Image Processing, Vols 1-6, IEEE International Conference on Image Processing, IEEE, Cairo, Egypt, pp. 2429-2432.
Dimension reduction algorithms have attracted a lot of attention in face recognition and human gait recognition because they can select a subset of effective and efficient discriminative features. In this paper, we apply the Discriminative Geometry Pres
Bian, W. & Tao, D. 2009, 'Dirichlet Mixture Allocation For Multiclass Document Collections Modeling', 2009 9th IEEE International Conference On Data Mining, IEEE International Conference on Data Mining, IEEE, Miami Beach, FL, pp. 711-715.
Topic model, Latent Dirichlet Allocation (LDA), is an effective tool for statistical analysis of large collections of documents. In LDA, each document is modeled as a mixture of topics and the topic proportions are generated from the unimodal Dirichlet d
Su, Y., Tao, D., Li, X. & Gao, X. 2009, 'Texture Representation In AAM Using Gabor Wavelet And Local Binary Patterns', 2009 IEEE International Conference On Systems, Man And Cybernetics (SMC 2009), Vols 1-9, IEEE International Conference on Systems, Man and Cybernetics, IEEE, San Antonio, TX, pp. 3274-3279.
Active appearance model (AAM) has been widely used for modeling the shape and the texture of deformable objects and matching new ones effectively. The traditional AAM consists of two parts, shape model and texture model. In the texture model, for the sak
Zhou, T. & Tao, D. 2009, 'Manifold Elastic Net For Sparse Learning', 2009 IEEE International Conference On Systems, Man And Cybernetics (SMC 2009), Vols 1-9, IEEE International Conference on Systems, Man and Cybernetics, IEEE, San Antonio, TX, pp. 3699-3704.
In this paper, we present the manifold elastic net (MEN) for sparse variable selection. MEN combines merits of the manifold regularization and the elastic net regularization, so it considers both the nonlinear manifold structure of a dataset and the spar
Wang, Y., Gao, X., Li, X., Tao, D. & Wang, B. 2009, 'Embedded Geometric Active Contour With Shape Constraint For Mass Segmentation', Computer Analysis Of Images And Patterns, Proceedings, International Conference on Computer Analysis of Images and Patterns, Springer-Verlag Berlin, Munster, Germany, pp. 995-1002.
Mass boundary segmentation plays an important role in computer aided diagnosis (CAD) system. Since the shape and boundary are crucial discriminant features in CAD, the active contour methods are more competitive in mass segmentation. However, the general
Yang, Y., Zhuang, Y., Xu, D., Pan, Y., Tao, D. & Maybank, S. 2009, 'Retrieval Based Interactive Cartoon Synthesis via Unsupervised Bi-Distance Metric Learning', 2009 ACM International Conference on Multimedia Compilation E-Proceedings (with co-located workshops & symposiums), ACM international conference on Multimedia, Association for Computing Machinery, Inc. (ACM), Beijing, China, pp. 311-320.
Cartoons play important roles in many areas, but it requires a lot of labor to produce new cartoon clips. In this paper, we propose a gesture recognition method for cartoon character images with two applications, namely content-based cartoon image retrieval and cartoon clip synthesis. We first define Edge Features (EF) and Motion Direction Features (MDF) for cartoon character images. The features are classified into two different groups, namely intra-features and inter-features. An Unsupervised Bi-Distance Metric Learning (UBDML) algorithm is proposed to recognize the gestures of cartoon character images. Different from the previous research efforts on distance metric learning, UBDML learns the optimal distance metric from the heterogeneous distance metrics derived from intra-features and inter-features. Content-based cartoon character image retrieval and cartoon clip synthesis can be carried out based on the distance metric learned by UBDML. Experiments show that the cartoon character image retrieval has a high precision and that the cartoon clip synthesis can be carried out efficiently.
Bian, W. & Tao, D. 2009, 'Manifold Regularization for SIR with Rate Root-n Convergence', Proceedings of the 2009 Conference ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 22, Annual Conference on Neural Information Processing Systems, Curran Associates, Inc, Vancouver, British Columbia, Canada, pp. 1-9.
In this paper, we study the manifold regularization for the Sliced Inverse Regression (SIR). The manifold regularization improves the standard SIR in two aspects: 1) it encodes the local geometry for SIR and 2) it enables SIR to deal with transductive and semi-supervised learning problems. We prove that the proposed graph Laplacian based regularization is convergent at rate root-n. The projection directions of the regularized SIR are optimized by using a conjugate gradient method on the Grassmann manifold. Experimental results support our theory.
Geng, B., Tao, D., Xu, C., Yang, L. & Hua, X. 2009, 'Ensemble Manifold Regularization', IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, Miami USA, pp. 2396-2402.
We propose an automatic approximation of the intrinsic manifold for general semi-supervised learning problems. Unfortunately, it is not trivial to define an optimization function to obtain optimal hyperparameters. Usually, pure cross-validation is considered but it does not necessarily scale up. A second problem derives from the suboptimality incurred by discrete grid search and overfitting problems. As a consequence, we developed an ensemble manifold regularization (EMR) framework to approximate the intrinsic manifold by combining several initial guesses. Algorithmically, we designed EMR very carefully so that it (a) learns both the composite manifold and the semi-supervised classifier jointly; (b) is fully automatic for learning the intrinsic manifold hyperparameters implicitly; (c) is conditionally optimal for intrinsic manifold approximation under a mild and reasonable assumption; and (d) is scalable for a large number of candidate manifold hyperparameters, from both time and space perspectives. Extensive experiments over both synthetic and real datasets show the effectiveness of the proposed framework.
Huang, Q., Jin, L. & Tao, D. 2009, 'An unsupervised feature ranking scheme by discovering biclusters', Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics, pp. 4970-4975.
In this paper, we aim to propose an unsupervised feature ranking algorithm for evaluating features using discovered biclusters which are local patterns extracted from a data matrix. The biclusters can be expressed as sub-matrices which are used for scoring relevant features from two aspects, i.e. the interdependence of features and the separability of instances. The features are thereby ranked with respect to their accumulated scores from the total discovered biclusters before the pattern classification. Experimental results show that this proposed algorithm can yield comparable or even better performance in comparison with the well-known Fisher Score, Laplacian Score and Variance Score using several UCI data sets. © 2009 IEEE.
Li, Q. & Tao, D. 2009, 'Detecting image points of general imbalance', Proceedings - International Conference on Image Processing, ICIP, pp. 337-340.
The imbalance-oriented selection scheme was recently proposed to detect stable image points in weakly or sparsely textured images. The scheme chooses as candidates those image points whose one-pixel-wide directional intensity variations can be clustered into two imbalanced classes. In this paper, we propose general imbalance, which is decided by multi-pixel-wide directional intensity variations. We present a case study of general imbalanced points in road sign images, which demonstrates the good potential of general imbalanced points. © 2009 IEEE.
Bian, W. & Tao, D. 2009, 'Manifold regularization for SIR with rate root-n convergence', Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference, pp. 117-125.
In this paper, we study the manifold regularization for the Sliced Inverse Regression (SIR). The manifold regularization improves the standard SIR in two aspects: 1) it encodes the local geometry for SIR and 2) it enables SIR to deal with transductive and semi-supervised learning problems. We prove that the proposed graph Laplacian based regularization is convergent at rate root-n. The projection directions of the regularized SIR are optimized by using a conjugate gradient method on the Grassmann manifold. Experimental results support our theory.
Shen, J., Tao, D. & Li, X. 2009, 'Robust semantic concept detection in large video collections', Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics, pp. 635-638.
With explosive amounts of video data emerging from the Internet, automatic video concept detection is becoming very important and has received great attention. However, reported approaches mainly suffer from low identification accuracy and poor robustness over different concepts. One of the main reasons is that the existing approaches typically isolate the video signature generation from the process of classifier training. Also, very few approaches consider the effects of multiple video features. The paper describes a novel approach fusing different information from diverse knowledge sources to facilitate effective video concept detection. The system is designed based on the CM*F scheme [7], [5] and its basic architecture contains two core components: 1) a CM*F based video signature generation scheme and 2) a CM*F based video concept detector. To evaluate the proposed approach, an extensive experimental study on two large video databases has been carried out. The results demonstrate the superiority of the method in terms of effectiveness and robustness. © 2009 IEEE.
Shen, J., Pang, H., Tao, D. & Li, X. 2009, 'Dual phase learning for large scale video gait recognition', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 500-510.
Accurate gait recognition from video is a complex process involving heterogeneous features, and is still being developed actively. This article introduces a novel framework, called GC2F, for effective and efficient gait recognition and classification. Adopting a "refinement-and-classification" principle, the framework comprises two components: 1) a classifier to generate advanced probabilistic features from low level gait parameters; and 2) a hidden classifier layer (based on a multilayer perceptron neural network) to model the statistical properties of different subject classes. To validate our framework, we have conducted comprehensive experiments with a large test collection, and observed significant improvements in identification accuracy relative to other state-of-the-art approaches. © 2010 Springer-Verlag Berlin Heidelberg.
Si, S., Tao, D. & Chan, K.P. 2009, 'Transfer discriminative logmaps', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 131-143.
In recent years, transfer learning has attracted much attention in multimedia. In this paper, we propose an efficient transfer dimensionality reduction algorithm called transfer discriminative Logmaps (TDL). TDL finds a common feature so that 1) the quadratic distance between the distribution of the training set and that of the testing set is minimized and 2) specific knowledge of the training samples can be conveniently delivered to or shared with the testing samples. Drawing on this common feature in the representation space, our objective is to develop a linear subspace in which discriminative and geometric information can be exploited. TDL adopts margin maximization to identify discriminative information between different classes, while Logmaps is used to preserve the local-global geodesic distance as well as the direction information. Experiments carried out on both synthetic and real-world image datasets show the effectiveness of TDL for cross-domain face recognition and web image annotation. © 2009 Springer-Verlag Berlin Heidelberg.
Chen, W., Huang, K., Tan, T. & Tao, D. 2009, 'A convergent solution to two dimensional linear discriminant analysis', Proceedings - International Conference on Image Processing, ICIP, pp. 4133-4136.
The matrix based data representation has been recognized to be effective for face recognition because it can deal with the undersampled problem. One of the most popular algorithms, the two dimensional linear discriminant analysis (2DLDA), has been identified to be effective to encode the discriminative information for training matrix represented samples. However, 2DLDA does not converge in the training stage. This paper presents an evolutionary computation based solution, referred to as E-2DLDA, to provide a convergent training stage for 2DLDA. In E-2DLDA, all randomly generated candidate projection matrices are first normalized. The evolutionary computation method then optimizes the projection matrices to best separate different classes. Experimental results show that E-2DLDA is convergent and outperforms 2DLDA. © 2009 IEEE.
Li, Z., Wright, S., Fu, Y., Zhai, F. & Tao, D. 2009, 'Message from the MCC'09 Chairs', Proceedings - International Conference on Computer Communications and Networks, ICCCN.
Geng, B., Tao, D., Xu, C., Yang, L. & Hua, X.S. 2009, 'Ensemble manifold regularization', 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2009, pp. 2396-2402.
We propose an automatic approximation of the intrinsic manifold for general semi-supervised learning problems. Unfortunately, it is not trivial to define an optimization function to obtain optimal hyperparameters. Usually, pure cross-validation is considered but it does not necessarily scale up. A second problem derives from the suboptimality incurred by discrete grid search and overfitting problems. As a consequence, we developed an ensemble manifold regularization (EMR) framework to approximate the intrinsic manifold by combining several initial guesses. Algorithmically, we designed EMR very carefully so that it (a) learns both the composite manifold and the semi-supervised classifier jointly; (b) is fully automatic for learning the intrinsic manifold hyperparameters implicitly; (c) is conditionally optimal for intrinsic manifold approximation under a mild and reasonable assumption; and (d) is scalable for a large number of candidate manifold hyperparameters, from both time and space perspectives. Extensive experiments over both synthetic and real datasets show the effectiveness of the proposed framework. © 2009 IEEE.
Li, Q., Xia, Z. & Tao, D. 2009, 'A global-to-local scheme for imbalanced point matching', Proceedings - International Conference on Image Processing, ICIP, pp. 2117-2120.
Imbalanced points are image points whose first-order intensity can be clustered into two imbalanced classes. An important property of imbalanced points is that they can be contiguous to each other. The property helps improve the localization accuracy of imbalanced points across imaging variations. Based on this local geometric coherency property, we propose a global-to-local scheme for imbalanced point matching. The proposed matching scheme first builds correspondence between components of coherent imbalanced points and then refines point correspondence within corresponding components. We test the global-to-local matching scheme, compared with several other well-known methods, on a set of ground-truth stereo images. Furthermore, we present a case study of the proposed scheme in face liveness detection. Our results show the promise of the global-to-local matching scheme. © 2009 IEEE.
Zhou, H., Tao, D., Yuan, Y. & Li, X. 2009, 'Object trajectory clustering via tensor analysis', Proceedings - International Conference on Image Processing, ICIP, pp. 1945-1948.
In this paper we present a new video object trajectory clustering algorithm, which allows us to model and analyse the patterns of object behaviors based on the extracted features using tensor analysis. The proposed algorithm consists of three steps as follows: extraction of trajectory features by tensor analysis, non-parametric probabilistic mean shift clustering and clustering correction. The performance of the proposed algorithm is evaluated on standard data-sets and compared with classical techniques. © 2009 IEEE.
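The mean shift clustering step named in the abstract above can be illustrated with a minimal sketch. This is generic Gaussian-kernel mean shift on scalar values, not the authors' algorithm: the function names `mean_shift_1d` and `group_modes`, the bandwidth, and the sample values (hypothetical stand-ins for trajectory features) are all assumptions made for illustration.

```python
import math


def mean_shift_1d(points, bandwidth, iterations=50):
    """Move every point towards a local density mode using a Gaussian kernel."""
    modes = list(points)
    for _ in range(iterations):
        shifted = []
        for x in modes:
            # Kernel weight of every original point as seen from position x.
            weights = [math.exp(-((x - p) ** 2) / (2 * bandwidth ** 2)) for p in points]
            # The new position is the kernel-weighted mean of the original points.
            shifted.append(sum(w * p for w, p in zip(weights, points)) / sum(weights))
        modes = shifted
    return modes


def group_modes(modes, tolerance):
    """Give the same cluster label to points whose modes lie within `tolerance`."""
    labels, centres = [], []
    for mode in modes:
        for index, centre in enumerate(centres):
            if abs(mode - centre) < tolerance:
                labels.append(index)
                break
        else:
            centres.append(mode)
            labels.append(len(centres) - 1)
    return labels


# Hypothetical scalar features standing in for extracted trajectory features.
features = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]
labels = group_modes(mean_shift_1d(features, bandwidth=1.0), tolerance=0.5)
```

No number of clusters is supplied; it emerges from the density modes, which is what makes the method non-parametric.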
Pan, J., Pang, Y., Li, X., Yuan, Y. & Tao, D. 2009, 'A fast feature extraction method', ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 1797-1800.
A fast subspace analysis and feature extraction algorithm is proposed which is based on the fast Haar transform and the integral vector. In rapid object detection and conventional binary subspace learning, Haar-like functions have been frequently used but true Haar functions are seldom employed. In this paper we show that true Haar functions can be successfully used to accelerate subspace analysis and feature extraction. Both the training and the testing speeds of the proposed method are higher than those of conventional algorithms. Experimental results on a face database demonstrate its effectiveness. © 2009 IEEE.
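For readers unfamiliar with the fast Haar transform mentioned in the abstract above, the following is a minimal sketch of the standard averaging-and-differencing scheme. The function name `haar_transform` and the unnormalized convention are assumptions for illustration; the paper's integral-vector acceleration is not reproduced here.

```python
def haar_transform(signal):
    """Unnormalized fast Haar transform of a sequence whose length is a power of two.

    Each pass replaces adjacent pairs with their average and half-difference,
    then repeats on the averages only, so the total work is linear in the length.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    output = [float(value) for value in signal]
    length = n
    while length > 1:
        half = length // 2
        averages = [(output[2 * i] + output[2 * i + 1]) / 2 for i in range(half)]
        details = [(output[2 * i] - output[2 * i + 1]) / 2 for i in range(half)]
        # Coarse averages go first, followed by the detail coefficients.
        output[:length] = averages + details
        length = half
    return output


coefficients = haar_transform([4, 6, 10, 12])  # [8.0, -3.0, -1.0, -1.0]
```

The first coefficient is the overall mean of the input, and the remaining ones are detail coefficients at successively finer scales.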
Si, S., Tao, D. & Chan, K.P. 2009, 'Cross-domain web image annotation', ICDM Workshops 2009 - IEEE International Conference on Data Mining, pp. 184-189.
In recent years, cross-domain learning algorithms have attracted much attention as a way to address the problem of insufficient labeled data. However, these cross-domain learning algorithms cannot be applied to subspace learning, which plays a key role in multimedia, e.g., web image annotation. This paper envisions cross-domain discriminative subspace learning and provides an effective solution to it. In particular, we propose the cross-domain discriminative Hessian Eigenmaps, or CDHE for short. CDHE connects the training and the testing samples by minimizing the quadratic distance between the distribution of the training samples and that of the testing samples. Therefore, a common subspace for data representation can be preserved. We expect that the discriminative information that separates the concepts in the training set can be shared to separate the concepts in the testing set as well, which gives us a chance to address the above cross-domain problem. Margin maximization is adopted in CDHE so that the discriminative information for separating different classes can be well preserved. Finally, CDHE encodes the local geometry of each training class in the local tangent space, which is locally isometric to the data manifold, and thus preserves the intra-class local geometry. Experimental evidence on real world image datasets demonstrates the effectiveness of CDHE for cross-domain web image annotation. © 2009 IEEE.
Su, Y., Gao, X., Tao, D. & Li, X. 2008, 'Gabor-based Texture Representation In AAMs', 2008 IEEE International Conference On Systems, Man And Cybernetics (SMC), Vols 1-6, IEEE International Conference on Systems, Man and Cybernetics, IEEE, Singapore, Singapore, pp. 2235-2239.
Active Appearance Models (AAMs) are generative models which can describe deformable objects. However, the texture in basic AAMs is represented using intensity values. Despite its simplicity, this representation does not contain enough information for ima
Liu, W., Tao, D. & Liu, J. 2008, 'Transductive Component Analysis', ICDM 2008: Eighth IEEE International Conference On Data Mining, Proceedings, IEEE International Conference on Data Mining, IEEE Computer Soc, Pisa, Italy, pp. 433-442.
In this paper, we study semi-supervised linear dimensionality reduction. Beyond conventional supervised methods which merely consider labeled instances, the semi-supervised scheme allows to leverage abundant and ample unlabeled instances into learning so
Niu, Z., Gao, X., Tao, D. & Li, X. 2008, 'Semantic Video Shot Segmentation Based On Color Ratio Feature And SVM', Proceedings Of The 2008 International Conference On Cyberworlds, International Conference on Cyberworlds, IEEE Computer Soc, Hangzhou, China, pp. 157-162.
With the fast development of video semantic analysis, there has been increasing attention to the typical issue of the semantic analysis of soccer program. Based on the color feature analysis, this paper focuses on the video shot segmentation problem from
Deng, C., Gao, X., Li, X. & Tao, D. 2008, 'Invariant Image Watermarking Based On Local Feature Regions', Proceedings Of The 2008 International Conference On Cyberworlds, International Conference on Cyberworlds, IEEE Computer Soc, Hangzhou, China, pp. 6-10.
In this paper, a robust image watermarking approach is presented based on image local invariant features. The affine invariant point detector is used to extract feature regions of the given host image. Image normalization and dominant gradient orientatio
Lu, W., Gao, X., Li, X. & Tao, D. 2008, 'An Image Quality Assessment Metric Based Contourlet', 2008 15th IEEE International Conference On Image Processing, Vols 1-5, IEEE International Conference on Image Processing, IEEE, San Diego, CA, pp. 1172-1175.
In reduced-reference (RR) image quality assessment (IQA), the visual quality of distorted images is evaluated with only partial information extracted from original images. In this paper, by considering the information of textures and directions during im
Deng, C., Gao, X., Tao, D. & Li, X. 2008, 'Geometrically Invariant Watermarking Using Affine Covariant Regions', 2008 15th IEEE International Conference On Image Processing, Vols 1-5, IEEE International Conference on Image Processing, IEEE, San Diego, CA, pp. 413-416.
In this paper, we present a robust approach to digital watermarking embedding and retrieval for digital images. The affine-invariant point detector is used to extract feature regions of the given host image. Image normalization and dominant gradient orie
Huang, Y., Huang, K., Wang, L., Tao, D., Tan, T. & Li, X. 2008, 'Enhanced Biologically Inspired Model', 2008 IEEE Conference On Computer Vision And Pattern Recognition, Vols 1-12, IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Anchorage, AK, pp. 2000-2007.
It has been demonstrated by Serre et al. that the biologically inspired model (BIM) is effective for object recognition. It outperforms many state-of-the-art methods in challenging databases. However, BIM has the following three problems: a very heavy co
Jia, W., Deng, C., Tao, D. & Zhang, D. 2008, 'Palmprint Identification Based On Directional Representation', 2008 IEEE International Conference On Systems, Man And Cybernetics (SMC), Vols 1-6, IEEE International Conference on Systems, Man and Cybernetics, IEEE, Singapore, Singapore, pp. 1561-1566.
In this paper, we propose a novel approach for palmprint identification, which contains two interesting components. Firstly, we propose the directional representation for appearance based approaches. The new representation is robust to drastic illuminati
Tao, D., Sun, J., Wu, X., Li, X., Shen, J., Maybank, S. & Faloutsos, C. 2007, 'Probabilistic Tensor Analysis with Akaike and Bayesian Information Criteria', Neural Information Processing. 14th International Conference, ICONIP 2007, International Conference on Neural Information Processing, Springer-Verlag Berlin / Heidelberg, Kitakyushu, Japan, pp. 791-801.
From data mining to computer vision, from visual surveillance to biometrics research, from biomedical imaging to bioinformatics, and from multimedia retrieval to information management, a large amount of data is naturally represented by multidimensional arrays, i.e., tensors. However, conventional probabilistic graphical models with probabilistic inference only model data in vector format, although they are very important in many statistical problems, e.g., model selection. Is it possible to construct multilinear probabilistic graphical models for tensor format data to conduct probabilistic inference, e.g., model selection? This paper provides a positive answer based on the proposed decoupled probabilistic model by developing the probabilistic tensor analysis (PTA), which selects a suitable model for tensor format data modeling based on the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Empirical studies demonstrate that PTA associated with AIC and BIC selects the correct number of models.
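As background to the model selection criteria named above, here is a small sketch of the standard AIC and BIC formulas applied to a set of candidate models. The candidate names, parameter counts, log-likelihoods and sample size are hypothetical values chosen for illustration and do not come from the paper.

```python
import math


def aic(log_likelihood, num_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood, num_params, num_samples):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return num_params * math.log(num_samples) - 2 * log_likelihood


# Hypothetical candidates: name -> (number of free parameters, maximized log-likelihood).
candidates = {
    "rank-1": (10, -520.0),
    "rank-2": (20, -480.0),
    "rank-3": (30, -478.0),
}
num_samples = 200  # hypothetical sample size

best_by_aic = min(candidates, key=lambda name: aic(candidates[name][1], candidates[name][0]))
best_by_bic = min(
    candidates,
    key=lambda name: bic(candidates[name][1], candidates[name][0], num_samples),
)
```

Both criteria trade goodness of fit against model size; BIC penalizes each extra parameter by ln n rather than 2, so it tends to favour smaller models once n exceeds about 7.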
Zhang, T., Tao, D. & Yang, J. 2008, 'Discriminative Locality Alignment', Computer Vision ECCV 2008, Proceedings Part I, European Conference on Computer Vision, Springer, Marseille, France, pp. 725-738.
Fisher's linear discriminant analysis (LDA), one of the most popular dimensionality reduction algorithms for classification, has three particular problems: it fails to find the nonlinear structure hidden in the high dimensional data; it assumes all samples contribute equivalently to reduce dimension for classification; and it suffers from the matrix singularity problem. In this paper, we propose a new algorithm, termed Discriminative Locality Alignment (DLA), to deal with these problems. The algorithm operates in the following three stages: first, in part optimization, discriminative information is imposed over patches, each of which is associated with one sample and its neighbors; then, in sample weighting, each part optimization is weighted by the margin degree, a measure of the importance of a given sample; and finally, in whole alignment, the alignment trick is used to align all weighted part optimizations to the whole optimization. Furthermore, DLA is extended to the semi-supervised case, i.e., semi-supervised DLA (SDLA), which utilizes unlabeled samples to improve the classification performance. Thorough empirical studies on face recognition demonstrate the effectiveness of both DLA and SDLA.
Tao, D., Sun, J., Shen, J., Wu, X., Li, X., Maybank, S.J. & Faloutsos, C. 2008, 'Bayesian tensor analysis', Proceedings of the International Joint Conference on Neural Networks, pp. 1402-1409.
Vector data are normally used for probabilistic graphical models with Bayesian inference. However, tensor data, i.e., multidimensional arrays, are actually natural representations of a large amount of real data, in data mining, computer vision, and many other applications. Aiming at breaking the huge gap between vectors and tensors in conventional statistical tasks, e.g., automatic model selection, this paper proposes a decoupled probabilistic algorithm, named Bayesian tensor analysis (BTA). BTA automatically selects a suitable model for tensor data, as demonstrated by empirical studies. © 2008 IEEE.
Song, D. & Tao, D. 2008, 'C1 units for scene classification', Proceedings - International Conference on Pattern Recognition.
In this paper, we unify C1 units and the locality preserving projections (LPP) into the conventional gist model for scene classification. For the improved gist model, we first utilize the C1 units, intensity channel and color channel of a color image to represent the image with a high dimensional feature, then we project the high dimensional samples to a low dimensional subspace via LPP to preserve both the local geometry and the discriminative information, and finally, we apply the nearest neighbour rule with the Euclidean distance for classification. Experimental results based on the USC scene database demonstrate that the proposed gist model not only improves the classification accuracy by around 7% but also reduces the testing cost by a factor of around 50 in comparison with the original gist model proposed by Siagian and Itti in TPAMI 2007. © 2008 IEEE.
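The final classification step described above, the nearest neighbour rule with the Euclidean distance, can be sketched as follows. The function name `nearest_neighbour_label`, the two-dimensional feature vectors and the scene labels are hypothetical; in the paper the inputs would be the LPP-projected features.

```python
import math


def nearest_neighbour_label(query, samples, labels):
    """Return the label of the training sample closest to `query` in Euclidean distance."""
    distances = [math.dist(query, sample) for sample in samples]
    return labels[distances.index(min(distances))]


# Hypothetical low dimensional features and scene labels.
train_features = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
train_labels = ["corridor", "corridor", "courtyard"]

predicted = nearest_neighbour_label((4.0, 4.0), train_features, train_labels)
```

Because the rule compares the query with every training sample, its testing cost grows with the feature dimension, which is one reason a low dimensional projection is applied first.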
Li, X., Lu, W., Tao, D. & Gao, X. 2008, 'Frequency structure analysis for IQA', Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics, pp. 2246-2251.
Over the past years, research and applications of image quality assessment have attracted increasing attention. Basically, naked human eyes are the final receivers of an image, and the human visual system is able to extract structural information from the viewing field with high adaptation. Different frequency components of an image play different roles in image semantic contents, and the extraction of the visual information is also crucial in computerized image quality assessment systems. In this paper, an image quality assessment metric is proposed based on wavelet structure and human perception. With the proposed metric, structural similarity is expanded from the pixel domain to the frequency domain. Experimental results illustrate that the proposed metric gives good consistency with subjective assessment results of naked human eyes, i.e., it fits the perception of the human visual system well. © 2008 IEEE.
Bian, W. & Tao, D. 2008, 'Harmonic mean for subspace selection', Proceedings - International Conference on Pattern Recognition.
Under the homoscedastic Gaussian assumption, it has been shown that Fisher's linear discriminant analysis (FLDA) suffers from the class separation problem when the dimensionality of the subspace selected by FLDA is strictly less than the class number minus 1, i.e., the projection to a subspace tends to merge close class pairs. A recent result shows that maximizing the geometric mean of Kullback-Leibler (KL) divergences of class pairs can significantly reduce this problem. In this paper, to further reduce the class separation problem, the harmonic mean is applied to replace the geometric mean for subspace selection. The new method is termed maximization of the harmonic mean of all pairs of symmetric KL divergences (MHMD). As MHMD is invariant to rotational transformations, an efficient optimization procedure can be conducted on the Grassmann manifold. Thorough empirical studies demonstrate the effectiveness of the harmonic mean in dealing with the class separation problem. © 2008 IEEE.
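The intuition behind replacing the geometric mean with the harmonic mean can be seen in a toy calculation. This sketch is not the MHMD optimization: it assumes one-dimensional projected class means with a shared unit variance (hypothetical values), for which the KL divergence between two classes reduces to the squared mean difference divided by twice the variance.

```python
from itertools import combinations
from statistics import geometric_mean, harmonic_mean


def pairwise_kl(means, variance=1.0):
    """KL divergence for every pair of 1-D Gaussians that share one variance."""
    return [(a - b) ** 2 / (2 * variance) for a, b in combinations(means, 2)]


# Hypothetical projected class means: well separated vs. two classes nearly merged.
separated = pairwise_kl([0.0, 2.0, 4.0])
merged = pairwise_kl([0.0, 0.2, 4.0])

# How much of the "separated" score each mean retains once a class pair merges.
geometric_ratio = geometric_mean(merged) / geometric_mean(separated)
harmonic_ratio = harmonic_mean(merged) / harmonic_mean(separated)
```

In this toy setting the harmonic mean retains a much smaller fraction of its score than the geometric mean when one class pair nearly merges, so a criterion that maximizes it is pushed harder to keep close classes apart.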
Zhang, T., Tao, D. & Yang, J. 2008, 'Discriminative locality alignment', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 725-738.
Fisher's linear discriminant analysis (LDA), one of the most popular dimensionality reduction algorithms for classification, has three particular problems: it fails to find the nonlinear structure hidden in high dimensional data; it assumes all samples contribute equally to dimensionality reduction for classification; and it suffers from the matrix singularity problem. In this paper, we propose a new algorithm, termed Discriminative Locality Alignment (DLA), to deal with these problems. The algorithm operates in the following three stages: first, in part optimization, discriminative information is imposed over patches, each of which is associated with one sample and its neighbors; then, in sample weighting, each part optimization is weighted by the margin degree, a measure of the importance of a given sample; and finally, in whole alignment, the alignment trick is used to align all weighted part optimizations to the whole optimization. Furthermore, DLA is extended to the semi-supervised case, i.e., semi-supervised DLA (SDLA), which utilizes unlabeled samples to improve the classification performance. Thorough empirical studies on face recognition demonstrate the effectiveness of both DLA and SDLA. © 2008 Springer Berlin Heidelberg.
Zhang, T., Tao, D., Li, X. & Yang, J. 2008, 'A unifying framework for spectral analysis based dimensionality reduction', Proceedings of the International Joint Conference on Neural Networks, pp. 1670-1677.
Over the past decades, numerous spectral analysis based algorithms have been proposed for dimensionality reduction, which plays an important role in machine learning and artificial intelligence. However, most of these existing algorithms are developed intuitively and pragmatically, i.e., on the basis of the experience and knowledge of experts for their own purposes. Therefore, it will be more informative to provide a systematic framework for understanding the common properties and intrinsic differences in the algorithms. In this paper, we propose such a framework, i.e., "patch alignment", which consists of two stages: part optimization and whole alignment. With the proposed framework, various algorithms, including the conventional linear algorithms and the manifold learning algorithms, are reformulated into a unified form, which gives us a new understanding of these algorithms. © 2008 IEEE.
Shen, J., Tao, D. & Li, X. 2008, 'Effective video event detection via subspace projection', Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008, pp. 22-27.
This paper describes a new video event detection framework based on a subspace selection technique. With the approach, feature vectors representing different kinds of video information can be easily projected from different modalities onto a unified subspace, on which the recognition process can be performed. The approach is capable of discriminating different classes and preserving the intra-modal geometry of samples within an identical class. Distinguished from the existing multimodal detection methods, the new system works well when some modalities are not available. Experimental results based on soccer video and TRECVID news video collections demonstrate the effectiveness, efficiency and robustness of the proposed method for individual recognition tasks in comparison to the existing approaches. © 2008 IEEE.
Zhang, T., Li, X., Tao, D. & Yang, J. 2008, 'Local coordinates alignment and its linearization', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 643-652.
Manifold learning has been demonstrated to be an effective way to discover the intrinsic geometrical structure of a number of samples. In this paper, a new manifold learning algorithm, Local Coordinates Alignment (LCA), is developed based on the alignment technique. LCA first obtains the local coordinates as representations of a local neighborhood by preserving the proximity relations on the patch, which is Euclidean; the extracted local coordinates are then aligned to yield the global embeddings. To solve the out of sample problem, the linearization of LCA (LLCA) is also proposed. Empirical studies on both synthetic data and face images show the effectiveness of LCA and LLCA in comparison with existing manifold learning algorithms and linear subspace methods. © 2008 Springer-Verlag Berlin Heidelberg.
Deng, C., Gao, X., Tao, D. & Li, X. 2007, 'Digital Watermarking In Image Affine Co-variant Regions', Proceedings Of 2007 International Conference On Machine Learning And Cybernetics, Vols 1-7, International Conference on Machine Learning and Cybernetics, IEEE, Hong Kong, China, pp. 2125-2130.
In this paper, we present a robust approach of digital watermarking embedding and retrieval for digital images. This new approach works in special domain and it has two major steps: (1) to extract affine co-variant regions, and (2) to embed watermarks wi
Tao, D., Li, X., Wu, X. & Maybank, S. 2007, 'General Averaged Divergence Analysis', Proceedings of the Seventh IEEE International Conference on Data Mining, IEEE International Conference on Data Mining, IEEE Computer Society, Omaha, Nebraska, pp. 302-311.
Subspace selection is a powerful tool in data mining. An important subspace method is the Fisher-Rao linear discriminant analysis (LDA), which has been successfully applied in many fields such as biometrics, bioinformatics, and multimedia retrieval. However, LDA has a critical drawback: the projection to a subspace tends to merge those classes that are close together in the original feature space. If the separated classes are sampled from Gaussian distributions, all with identical covariance matrices, then LDA maximizes the mean value of the Kullback-Leibler (KL) divergences between the different classes. We generalize this point of view to obtain a framework for choosing a subspace by 1) generalizing the KL divergence to the Bregman divergence and 2) generalizing the arithmetic mean to a general mean. The framework is named the general averaged divergence analysis (GADA). Under this GADA framework, a geometric mean divergence analysis (GMDA) method based on the geometric mean is studied. A large number of experiments based on synthetic data show that our method significantly outperforms LDA and several representative LDA extensions.
Li, X., Maybank, S. & Tao, D. 2007, 'Gender recognition based on local body motions', Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics, pp. 3881-3886.
Human body motions, including gait information, are a promising biometrics resource. In this paper, the human silhouette is segmented into seven components for visual surveillance applications, namely, head, arm, body, thigh, front-leg, back-leg, and feet. The legs are classified as front-leg or back-leg because of the bipedal walking style: during walking, the left-leg and the right-leg are in front or at the back in turn. The motions of the individual components and of a number of combinations of components are then studied for gender recognition. For HumanID recognition under different cases, the performances of and underlying links amongst the seven human gait components are analyzed. © 2007 IEEE.
Tao, D., Li, X., Wu, X. & Maybank, S. 2006, 'Human Carrying Status in Visual Surveillance', Proceedings 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, New York, NY, USA, pp. 1670-1677.
A person's gait changes when he or she is carrying an object such as a bag, suitcase or rucksack. As a result, human identification and tracking are made more difficult because the averaged gait image is too simple to represent the carrying status. Therefore, in this paper we first introduce a set of Gabor based human gait appearance models, because Gabor functions are similar to the receptive field profiles in the mammalian cortical simple cells. The very high dimensionality of the feature space makes training difficult. In order to solve this problem we propose a general tensor discriminant analysis (GTDA), which seamlessly incorporates the object (Gabor based human gait appearance model) structure information as a natural constraint. GTDA differs from the previous tensor based discriminant analysis methods in that the training converges. Existing methods fail to converge in the training stage. This makes them unsuitable for practical tasks. Experiments are carried out on the USF baseline data set to recognize a human's ID from the gait silhouette. The proposed Gabor gait incorporated with GTDA is demonstrated to significantly outperform the existing appearance-based methods.
Sun, J., Tao, D. & Faloutsos, C. 2006, 'Beyond Streams and Graphs Dynamic Tensor Analysis', Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, International Conference on Knowledge Discovery and Data Mining, ACM Press, Philadelphia PA USA, pp. 374-383.
How do we find patterns in author-keyword associations, evolving over time? Or in DataCubes, with product-branch-customer sales information? Matrix decompositions, like principal component analysis (PCA) and variants, are invaluable tools for mining, dimensionality reduction, feature selection, rule identification in numerous settings like streaming data, text, graphs, social networks and many more. However, they have only two orders, like author and keyword, in the above example. We propose to envision such higher order data as tensors, and tap the vast literature on the topic. However, these methods do not necessarily scale up, let alone operate on semi-infinite streams. Thus, we introduce the dynamic tensor analysis (DTA) method, and its variants. DTA provides a compact summary for high-order and high-dimensional data, and it also reveals the hidden correlations. Algorithmically, we designed DTA very carefully so that it is (a) scalable, (b) space efficient (it does not need to store the past) and (c) fully automatic with no need for user defined parameters. Moreover, we propose STA, a streaming tensor analysis method, which provides a fast, streaming approximation to DTA. We implemented all our methods, and applied them in two real settings, namely, anomaly detection and multi-way latent semantic indexing. We used two real, large datasets, one on network flow data (100GB over 1 month) and one from DBLP (200MB over 25 years). Our experiments show that our methods are fast, accurate and that they find interesting patterns and outliers on the real datasets.
Tao, D., Li, X., Wu, X. & Maybank, S. 2006, 'Elapsed time in human gait recognition: A new approach', ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. II177-II180.
Human gait is an effective biometric source for human identification and visual surveillance; therefore, human gait recognition has become a hot topic in recent research. However, the elapsed time problem, whose study is in its infancy, still suffers from poor performance. In this paper, we introduce a novel discriminant analysis method to improve the performance. The new model inherits the merits of tensor rank one analysis, which handles the small sample size problem naturally, and of linear discriminant analysis, which is optimal for classification. Although 2DLDA and DATR also benefit from these two methods, they cannot converge during the training procedure, which means they can hardly be used in practical applications. Based on extensive experiments on the elapsed time problem in human gait recognition, the new method is demonstrated to significantly outperform the existing appearance-based methods, such as principal component analysis, linear discriminant analysis, and tensor rank one analysis. © 2006 IEEE.
Tao, D., Maybank, S., Hu, W. & Li, X. 2005, 'Stable Third-order Tensor Representation For Colour Image Classification', 2005 IEEE/WIC/ACM International Conference On Web Intelligence, Proceedings, IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Soc, Compiegne, France, pp. 641-644.
General tensors can represent colour images more naturally than conventional features: however the general tensors' stability properties are not reported and remain to be a key problem. In this paper, we use the tensor minimax probability (TMPM) to prove
Tao, D.C., Li, X.L., Hu, W.M., Maybank, S. & Wu, X.D. 2005, 'Supervised tensor learning', Fifth IEEE International Conference on Data Mining, Proceedings, pp. 450-457.
Li, J., Tao, D., Hu, W. & Li, X. 2005, 'Kernel principle component analysis in pixels clustering', Proceedings - 2005 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2005, pp. 786-789.
We propose two new methods in the nonlinear kernel feature space for pixel clustering based on the traditional KMeans and Gaussian Mixture Model (GMM). Unlike previous work on kernel machines, we offer a new perspective on the newly developed kernel machines: kernel principal component analysis (KPCA) combined with KMeans and GMM yields kernel KMeans (KKMeans) and kernel GMM (KGMM), respectively. In this paper, we prove this perspective for KKMeans and give a clear statement for KGMM as well. Based on this new perspective, we can implement KKMeans and KGMM conveniently. At the end of the paper, we apply these new algorithms to the problem of colour image segmentation. Based on a series of experimental results on Corel Colour Images, we find that KKMeans and KGMM consistently outperform the traditional KMeans and GMM, respectively. © 2005 IEEE.
Tao, D., Liu, J. & Tang, X. 2004, 'Learning User's Perception Using Region-based Svm For Content-based Image Retrieval', Cisst '04: Proceedings Of The International Conference On Imaging Science, Systems, And Technology, International Conference on Imaging Science, Systems and Technology, C S R E A Press, Las Vegas, NV, pp. 462-468.
Relevance feedback is often a critical component for content-based image retrieval to capture the user's perception. Previous methods for image retrieval with relevance feedback are image-based. In this paper, we propose a novel region-based retrieval me
Tao, D. & Tang, X. 2004, 'Nonparametric Discriminant Analysis In Relevance Feedback For Content-based Image Retrieval', Proceedings Of The 17th International Conference On Pattern Recognition, Vol 2, International Conference on Pattern Recognition, IEEE Computer Soc, Cambridge, ENGLAND, pp. 1013-1016.
Relevance feedback (RF) has been widely used to improve the performance of content-based image retrieval (CBIR). How to select a subset of features from a large-scale feature pool and to construct a suitable dissimilarity measure are key steps in RE Bia
Tao, D. & Tang, X. 2004, 'Random sampling based SVM for relevance feedback image retrieval', Proceedings Of The 2004 IEEE Computer Society Conference On Computer Vision And Pattern Recognition, Vol 2, Conference on Computer Vision and Pattern Recognition, IEEE Computer Soc, Washington, DC, pp. 647-652.
Relevance feedback (RF) schemes based on support vector machine (SVM) have been widely used in content-based image retrieval. However, the performance of SVM based RF is often poor when the number of labeled positive feedback samples is small. This is ma
Tao, D. & Tang, X. 2004, 'Multi-class Discriminant Learning For Image Retrieval', Cisst '04: Proceedings Of The International Conference On Imaging Science, Systems, And Technology, International Conference on Imaging Science, Systems and Technology, C S R E A Press, Las Vegas, NV, pp. 452-455.
For image retrieval, relevance feedback (RF) can effectively reduce the gap between the low-level visual feature and the high-level human perception. In this paper, we propose a multi-class label scheme and use the discriminant analysis technique on the
Tang, X., Tao, D. & Antonio, G. 2004, 'Texture Classification Of Sars Infected Region In Radiographic Image', Icip: 2004 International Conference On Image Processing, Vols 1-5, IEEE International Conference on Image Processing, IEEE, Singapore, SINGAPORE, pp. 2941-2944.
In this paper, we conduct the first study on SARS radiographic image processing. In order to distinguish SARS infected regions from normal lung regions using texture features, we propose several improvements to the traditional gray-level co-occurrence te
Tao, D. & Tang, X. 2004, 'Orthogonal Complement Component Analysis For Positive Samples In Svm Based Relevance Feedback Image Retrieval', Proceedings Of The 2004 IEEE Computer Society Conference On Computer Vision And Pattern Recognition, Vol 2, Conference on Computer Vision and Pattern Recognition, IEEE Computer Soc, Washington, DC, pp. 586-591.
Relevance feedback (RF) is an important tool to improve the performance of content-based image retrieval systems. Support vector machine (SVM) based RF is popular because it can generalize better than most other classifiers. However, directly using SVM in
Tao, D. & Tang, X. 2004, 'A Direct Method To Solve The Biased Discriminant Analysis In Kernel Feature Space For Content Based Image Retrieval', 2004 IEEE International Conference On Acoustics, Speech, And Signal Processing, Vol Iii, Proceedings: Image And Multidimensional Signal Processing Special Sessions, IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, Montreal, CANADA, pp. 441-444.
In recent years, relevance feedback has been widely used to improve the performance of content-based image retrieval. How to select a subset of features from a large-scale feature pool and to construct a suitable dissimilarity measure are key steps in a
Tao, D. & Tang, X. 2004, 'Kernel Full-space Biased Discriminant Analysis', 2004 IEEE International Conference On Multimedia And Expo (ICME), Vols 1-3, IEEE International Conference on Multimedia and Expo, IEEE, Taipei, TAIWAN, pp. 1287-1290.
Recently, relevance feedback has been widely used to improve the performance of content-based image retrieval. How to select a subset of features from a large-scale feature pool and to construct a suitable dissimilarity measure are key steps in a relevan
Tao, D. & Tang, X. 2004, 'Svm-based Relevance Feedback Using Random Subspace Method', 2004 IEEE International Conference On Multimedia And Expo (ICME), Vols 1-3, IEEE International Conference on Multimedia and Expo, IEEE, Taipei, TAIWAN, pp. 269-272.
Relevance feedback (RF) schemes based on Support Vector Machine (SVM) have been widely used in content-based image retrieval (CBIR). However, the performance of SVM based RF is often poor when the number of labeled feedback samples is small. In order to
Tao, D., Liu, H. & Tang, X. 2004, 'K-BOX: A query-by-singing based music retrieval system', ACM Multimedia 2004 - proceedings of the 12th ACM International Conference on Multimedia, pp. 464-467.
In this paper, we present an efficient query-by-singing based musical retrieval system. We first combine multiple Support Vector Machines by classifier committee learning to segment the sentences from a song automatically. Many new methods of manipulating the Mel-Frequency Cepstral Coefficient (MFCC) matrix are studied and compared for optimal feature selection. Experiments show that the 3rd coefficient is the most relevant to music comparison out of 13 coefficients and that the proposed simplified MFCC feature is able to achieve a reasonable trade-off between accuracy and efficiency. To improve system efficiency, we re-organize the database by a new two-stage clustering scheme in both time space and feature space. We combine the K-means algorithm and a dynamic time warping similarity measurement for feature space clustering. We also propose a new method for model selection in the K-means algorithm. Experiments show that the proposed approach can achieve more than a 30 percent increase in accuracy while speeding up the average query time by more than 16 times.
Tam, K., Yu, L.C., Tao, D., Liu, H., Luo, B. & Tang, X. 2004, 'Content-based SMIL retrieval', Proceedings - Third International Conference on Image and Graphics, pp. 146-149.
The Synchronised Multimedia Integration Language (SMIL) fulfills the needs of integration, synchronization, and efficient online delivery of different media types such as text, music, speech, image, and video. In this paper, we represent these multimedia elements in a synchronized manner under a unified feature space. An efficient SMIL retrieval scheme based on textual features and content features is proposed. Pilot experiments on our SMIL database show that the proposed method can work well on SMIL retrieval. © 2004 IEEE.
Yuan, Y., Yu, N., Li, X., Tao, D. & Liu, Z. 2002, 'Supervised Clustering Algorithm Based Visual Information Features Classification', Second International Conference On Image And Graphics, Pts 1 And 2, International Conference on Image and Graphics, Spie-int Soc Optical Engineering, HEFEI, PEOPLES R CHINA, pp. 614-618.
An intelligent image-indexing algorithm is proposed in this paper. It is based on knowledge extracted from some simple single low-level image features. Two independent large image databases are built with more than 12000 images for training and test, and th
Tao, D., Li, X., Yuan, Y., Yu, N., Liu, Z. & Tang, X. 2002, 'A Set Of Novel Textural Features Based On 3d Co-occurrence Matrix For Content-based Image Retrieval', Proceedings Of The Fifth International Conference On Information Fusion, Vol Ii, International Conference on Information Fusion, Int Soc Information Fusion, ANNAPOLIS, MD, pp. 1403-1407.
This paper presents a set of novel texture features for content-based image retrieval (CBIR). CBIR requires new algorithms for the automated extraction and indexing of salient image features, while texture features provide one important cue for the visual

Journal articles

Wang, R. & Tao, D. 2016, 'Recent Progress in Image Deblurring'.
This paper comprehensively reviews the recent development of image deblurring, including non-blind/blind, spatially invariant/variant deblurring techniques. Indeed, these techniques share the same objective of inferring a latent sharp image from one or several corresponding blurry images, while the blind deblurring techniques are also required to derive an accurate blur kernel. Considering the critical role of image restoration in modern imaging systems to provide high-quality images under complex environments such as motion, undesirable lighting conditions, and imperfect system components, image deblurring has attracted growing attention in recent years. From the viewpoint of how to handle the ill-posedness which is a crucial issue in deblurring tasks, existing methods can be grouped into five categories: Bayesian inference framework, variational methods, sparse representation-based methods, homography-based modeling, and region-based methods. In spite of achieving a certain level of development, image deblurring, especially the blind case, is limited in its success by complex application conditions which make the blur kernel hard to obtain and be spatially variant. We provide a holistic understanding and deep insight into image deblurring in this review. An analysis of the empirical evidence for representative methods, practical issues, as well as a discussion of promising future directions are also presented.
Xu, C., Liu, T., Tao, D. & Xu, C. 2016, 'Local Rademacher Complexity for Multi-label Learning', IEEE Transactions on Image Processing, vol. 25, no. 3, pp. 1495-1507.
We analyze the local Rademacher complexity of empirical risk minimization (ERM)-based multi-label learning algorithms, and in doing so propose a new algorithm for multi-label learning. Rather than using the trace norm to regularize the multi-label predictor, we instead minimize the tail sum of the singular values of the predictor in multi-label learning. Benefiting from the use of the local Rademacher complexity, our algorithm, therefore, has a sharper generalization error bound and a faster convergence rate. Compared to methods that minimize over all singular values, concentrating on the tail singular values results in better recovery of the low-rank structure of the multi-label predictor, which plays an important role in exploiting label correlations. We propose a new conditional singular value thresholding algorithm to solve the resulting objective function. Empirical studies on real-world datasets validate our theoretical results and demonstrate the effectiveness of the proposed algorithm.
Liu, T. & Tao, D. 2016, 'Classification with Noisy Labels by Importance Reweighting', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 3, pp. 447-461.
© 1979-2012 IEEE. In this paper, we study a classification problem in which sample labels are randomly corrupted. In this scenario, there is an unobservable sample with noise-free labels. However, before being observed, the true labels are independently flipped with a probability in [0, 0.5), and the random label noise can be class-conditional. Here, we address two fundamental problems raised by this scenario. The first is how to best use the abundant surrogate loss functions designed for the traditional classification problem when there is label noise. We prove that any surrogate loss function can be used for classification with noisy labels by using importance reweighting, with consistency assurance that the label noise does not ultimately hinder the search for the optimal classifier of the noise-free sample. The other is the open problem of how to obtain the noise rate. We show that the rate is upper bounded by the conditional probability P(Y|X) of the noisy sample. Consequently, the rate can be estimated, because the upper bound can be easily reached in classification problems. Experimental results on synthetic and real datasets confirm the efficiency of our methods.
Chua, T.S., He, X., Liu, W., Piccardi, M., Wen, Y. & Tao, D. 2016, 'Big data meets multimedia analytics', Signal Processing, vol. 124, pp. 1-4.
Deng, J., Liu, Q., Yang, J. & Tao, D. 2016, 'M3 CSR: Multi-view, multi-scale and multi-component cascade shape regression', Image and Vision Computing.
© 2015 Elsevier B.V. Automatic face alignment is a fundamental step in facial image analysis. However, this problem continues to be challenging due to the large variability of expression, illumination, occlusion, pose, and detection drift in real-world face images. In this paper, we present a multi-view, multi-scale and multi-component cascade shape regression (M3CSR) model for robust face alignment. Firstly, face view is estimated according to the deformable facial parts for learning view specified CSR, which can decrease the shape variance, alleviate the drift of face detection and accelerate shape convergence. Secondly, multi-scale HoG features are used as the shape-index features to incorporate local structure information implicitly, and a multi-scale optimization strategy is adopted to avoid trapping in local optimum. Finally, a component-based shape refinement process is developed to further improve the performance of face alignment. Extensive experiments on the IBUG dataset and the 300-W challenge dataset demonstrate the superiority of the proposed method over the state-of-the-art methods.
Zheng, H., Geng, X., Tao, D. & Jin, Z. 2016, 'A multi-task model for simultaneous face identification and facial expression recognition', Neurocomputing, vol. 171, pp. 515-523.
© 2015 Elsevier B.V. Regarded as two independent tasks, both face identification and facial expression recognition perform poorly given small size training sets. To address this problem, we propose a multi-task facial inference model (MT-FIM) for simultaneous face identification and facial expression recognition. In particular, face identification and facial expression recognition are learnt simultaneously by extracting and utilizing appropriate shared information across them in the framework of multi-task learning, in which the shared information refers to the parameter controlling the sparsity. MT-FIM simultaneously minimizes the within-class scatter and maximizes the distance between different classes to enable the robust performance of each individual task. We conduct comprehensive experiments on three face image databases. The experimental results show that our algorithm outperforms the state-of-the-art algorithms.
Peng, C., Gao, X., Wang, N., Tao, D., Li, X. & Li, J. 2016, 'Multiple Representations-Based Face Sketch-Photo Synthesis', IEEE Transactions on Neural Networks and Learning Systems.
Face sketch-photo synthesis plays an important role in law enforcement and digital entertainment. Most of the existing methods only use pixel intensities as the feature. Since face images can be described using features from multiple aspects, this paper presents a novel multiple representations-based face sketch-photo-synthesis method that adaptively combines multiple representations to represent an image patch. In particular, it combines multiple features from face images processed using multiple filters and deploys Markov networks to exploit the interacting relationships between the neighboring image patches. The proposed framework could be solved using an alternating optimization strategy and it normally converges in only five outer iterations in the experiments. Our experimental results on the Chinese University of Hong Kong (CUHK) face sketch database, celebrity photos, CUHK Face Sketch FERET Database, IIIT-D Viewed Sketch Database, and forensic sketches demonstrate the effectiveness of our method for face sketch-photo synthesis. In addition, cross-database and database-dependent style-synthesis evaluations demonstrate the generalizability of this novel method and suggest promising solutions for face identification in forensic science.
Deng, C., Xu, J., Zhang, K., Tao, D., Gao, X. & Li, X. 2016, 'Similarity Constraints-Based Structured Output Regression Machine: An Approach to Image Super-Resolution', IEEE Transactions on Neural Networks and Learning Systems.
For the regression-based single-image super-resolution (SR) problem, the key is to establish a mapping relation between high-resolution (HR) and low-resolution (LR) image patches for obtaining a visually pleasing quality image. Most existing approaches typically solve it by dividing the model into several single-output regression problems, which ignores the fact that a pixel within an HR patch affects other spatially adjacent pixels during the training process, and thus tends to generate serious ringing artifacts in the resultant HR image as well as increase the computational burden. To alleviate these problems, we propose to use a structured output regression machine (SORM) to simultaneously model the inherent spatial relations between the HR and LR patches, which is conducive to preserving sharp edges. In addition, to further improve the quality of reconstructed HR images, a nonlocal (NL) self-similarity prior in natural images is introduced and formulated as a regularization term to enhance the SORM-based SR results. To offer a computationally efficient SORM method, we use a relatively small number of nonsupport vector samples to establish an accurate regression model and an accelerating algorithm for NL self-similarity calculation. Extensive SR experiments on various images indicate that the proposed method can achieve more promising performance than the other state-of-the-art SR methods in terms of both visual quality and computational cost.
Gui, J., Liu, T., Tao, D., Sun, Z. & Tan, T. 2016, 'Representative Vector Machines: A Unified Framework for Classical Classifiers', IEEE Transactions on Cybernetics.
Classifier design is a fundamental problem in pattern recognition. A variety of pattern classification methods such as the nearest neighbor (NN) classifier, support vector machine (SVM), and sparse representation-based classification (SRC) have been proposed in the literature. These typical and widely used classifiers were originally developed from different theory or application motivations and they are conventionally treated as independent and specific solutions for pattern classification. This paper proposes a novel pattern classification framework, namely, representative vector machines (or RVMs for short). The basic idea of RVMs is to assign the class label of a test example according to its nearest representative vector. The contributions of RVMs are twofold. On one hand, the proposed RVMs establish a unified framework of classical classifiers because NN, SVM, and SRC can be interpreted as the special cases of RVMs with different definitions of representative vectors. Thus, the underlying relationship among a number of classical classifiers is revealed for better understanding of pattern classification. On the other hand, novel and advanced classifiers are inspired in the framework of RVMs. For example, a robust pattern classification method called discriminant vector machine (DVM) is motivated from RVMs. Given a test example, DVM first finds its k-NNs and then performs classification based on the robust M-estimator and manifold regularization. Extensive experimental evaluations on a variety of visual recognition tasks such as face recognition (Yale and face recognition grand challenge databases), object categorization (Caltech-101 dataset), and action recognition (Action Similarity LAbeliNg) demonstrate the advantages of DVM over other classifiers.
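As a rough illustration of the basic idea stated in the abstract above (label a test example by its nearest representative vector), the following minimal Python sketch covers only the nearest neighbor special case. It is not the authors' RVM or DVM implementation; the function name `nearest_representative`, the toy data, and the choice of treating every training example as its own representative vector are assumptions made for illustration.

```python
import math

def nearest_representative(x, representatives):
    """Return the label of the representative vector closest to x.

    representatives: list of (vector, label) pairs (hypothetical layout).
    """
    best_label, best_dist = None, math.inf
    for vec, label in representatives:
        d = math.dist(x, vec)  # Euclidean distance
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Assumed NN special case: each training example is its own representative.
train = [((0.0, 0.0), "a"), ((1.0, 1.0), "b"), ((0.2, 0.1), "a")]
print(nearest_representative((0.9, 0.8), train))  # prints "b"
```

Other classifiers would, under this view, differ only in how the representative vectors are constructed; those constructions are not shown here.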
Cao, X., Wu, B., Tao, D. & Jiao, L. 2016, 'Automatic Band Selection Using Spatial-Structure Information and Classifier-Based Clustering', IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
Band selection plays an important role in hyperspectral image processing because it reduces subsequent computation and storage requirements. Two problems are rarely investigated in band selection. First, some low-discriminating bands need to be manually removed by experts, which is time consuming and expensive; second, how to automatically determine the number of selected bands is not well investigated, though this is an indispensable step in practical applications. In this paper, we propose an automatic band selection (ABS) method to solve these problems. First, we exploit spatial structure to determine the discriminating power of each band, and bands with little structure information are discarded; then, a powerful classifier is used for clustering, which can automatically find the underlying number of clusters. Experiments based on three real hyperspectral datasets demonstrate the effectiveness of our method.
Zeng, K., Yu, J., Wang, R., Li, C. & Tao, D. 2016, 'Coupled Deep Autoencoder for Single Image Super-Resolution', IEEE Transactions on Cybernetics.
Sparse coding has been widely applied to learning-based single image super-resolution (SR) and has obtained promising performance by jointly learning effective representations for low-resolution (LR) and high-resolution (HR) image patch pairs. However, the resulting HR images often suffer from ringing, jaggy, and blurring artifacts due to the strong yet ad hoc assumptions that the LR image patch representation is equal to, is linear with, lies on a manifold similar to, or has the same support set as the corresponding HR image patch representation. Motivated by the success of deep learning, we develop a data-driven model, the coupled deep autoencoder (CDA), for single image SR. CDA is based on a new deep architecture and has high representational capability. CDA simultaneously learns the intrinsic representations of LR and HR image patches and a big-data-driven function that precisely maps these LR representations to their corresponding HR representations. Extensive experimentation demonstrates the superior effectiveness and efficiency of CDA for single image SR compared to other state-of-the-art methods on the Set5 and Set14 datasets.
Nie, L., Hong, R., Zhang, L., Xia, Y., Tao, D. & Sebe, N. 2016, 'Perceptual Attributes Optimization for Multivideo Summarization', IEEE Transactions on Cybernetics.
Nowadays, many consumer videos are captured by portable devices such as the iPhone. Unlike constrained videos produced by professionals, e.g., those for broadcast, multiple handheld videos of the same scene are challenging to summarize. This is because: 1) these videos have dramatic semantic and style variances, making it difficult to extract the representative key frames; 2) the handheld videos exhibit different degrees of shakiness, but existing summarization techniques cannot alleviate this problem adaptively; and 3) it is difficult to develop a quality model that evaluates a video summary, due to the subjectiveness of video quality assessment. To solve these problems, we propose perceptual multiattribute optimization, which jointly refines multiple perceptual attributes (i.e., video aesthetics, coherence, and stability) in a multivideo summarization process. In particular, a weakly supervised learning framework is designed to discover the semantically important regions in each frame. Then, a few key frames are selected based on their contributions to covering the multivideo semantics. Thereafter, a probabilistic model is proposed to dynamically fit the key frames into an aesthetically pleasing video summary, wherein the frames are stabilized adaptively. Experiments on consumer videos taken from sceneries throughout the world demonstrate the descriptiveness, aesthetics, coherence, and stability of the generated summary.
Du, B., Wang, Z., Zhang, L., Zhang, L., Liu, W., Shen, J. & Tao, D. 2016, 'Exploring Representativeness and Informativeness for Active Learning', IEEE Transactions on Cybernetics.
How can we find a general way to choose the most suitable samples for training a classifier, even with very limited prior information? Active learning, which can be regarded as an iterative optimization procedure, plays a key role in constructing a refined training set to improve classification performance in a variety of applications, such as text analysis, image recognition, and social network modeling. Although combining representativeness and informativeness of samples has been proven promising for active sampling, state-of-the-art methods perform well only under certain data structures. Can we then find a way to fuse the two active sampling criteria without any assumption on the data? This paper proposes a general active learning framework that effectively fuses the two criteria. Inspired by a two-sample discrepancy problem, triple measures are elaborately designed to guarantee that the query samples not only possess the representativeness of the unlabeled data but also reveal the diversity of the labeled data. Any appropriate similarity measure can be employed to construct the triple measures. Meanwhile, an uncertain measure is leveraged to generate the informativeness criterion, which can be carried out in different ways. Rooted in this framework, a practical active learning algorithm is proposed, which exploits a radial basis function together with the estimated probabilities to construct the triple measures and a modified best-versus-second-best strategy to construct the uncertain measure. Experimental results on benchmark datasets demonstrate that our algorithm consistently achieves superior performance over state-of-the-art active learning algorithms.
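The abstract above mentions a modified best-versus-second-best strategy for the informativeness criterion. The following is a minimal Python sketch of the standard, unmodified best-versus-second-best margin only: it queries the unlabeled sample whose top two class probabilities are closest. The authors' modification and the triple measures are not reproduced; `bvsb_margin`, `select_query`, and the example probabilities are hypothetical names and values chosen for illustration.

```python
def bvsb_margin(probs):
    """Gap between the largest and second-largest class probability."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def select_query(prob_table):
    """Index of the unlabeled sample with the smallest margin (most uncertain)."""
    return min(range(len(prob_table)), key=lambda i: bvsb_margin(prob_table[i]))

# Hypothetical estimated class probabilities for three unlabeled samples.
pool = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.6, 0.3, 0.1]]
print(select_query(pool))  # prints 1
```

A small margin means the classifier can barely separate its two best guesses, so labeling that sample is expected to be informative.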
Ding, C., Choi, J., Tao, D. & Davis, L.S. 2016, 'Multi-Directional Multi-Level Dual-Cross Patterns for Robust Face Recognition', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 3, pp. 518-531.
To perform unconstrained face recognition robust to variations in illumination, pose and expression, this paper presents a new scheme to extract 'Multi-Directional Multi-Level Dual-Cross Patterns' (MDML-DCPs) from face images. Specifically, the MDML-DCPs scheme exploits the first derivative of Gaussian operator to reduce the impact of differences in illumination and then computes the DCP feature at both the holistic and component levels. DCP is a novel face image descriptor inspired by the unique textural structure of human faces. It is computationally efficient and only doubles the cost of computing local binary patterns, yet is extremely robust to pose and expression variations. MDML-DCPs comprehensively yet efficiently encodes the invariant characteristics of a face image from multiple levels into patterns that are highly discriminative of inter-personal differences but robust to intra-personal variations. Experimental results on the FERET, CAS-PEAL-R1, FRGC 2.0, and LFW databases indicate that DCP outperforms the state-of-the-art local descriptors (e.g., LBP, LTP, LPQ, POEM, tLBP, and LGXP) for both face identification and face verification tasks. More impressively, the best performance is achieved on the challenging LFW and FRGC 2.0 databases by deploying MDML-DCPs in a simple recognition scheme.
Ding, C. & Tao, D. 2016, 'A comprehensive survey on Pose-Invariant Face Recognition', ACM Transactions on Intelligent Systems and Technology, vol. 7, no. 3.
The capacity to recognize faces under varied poses is a fundamental human ability that presents a unique challenge for computer vision systems. Compared to frontal face recognition, which has been intensively studied and has gradually matured in the past few decades, Pose-Invariant Face Recognition (PIFR) remains a largely unsolved problem. However, PIFR is crucial to realizing the full potential of face recognition for real-world applications, since face recognition is intrinsically a passive biometric technology for recognizing uncooperative subjects. In this article, we discuss the inherent difficulties in PIFR and present a comprehensive review of established techniques. Existing PIFR methods can be grouped into four categories, that is, pose-robust feature extraction approaches, multiview subspace learning approaches, face synthesis approaches, and hybrid approaches. The motivations, strategies, pros/cons, and performance of representative approaches are described and compared. Moreover, promising directions for future research are discussed.
Li, Z., Gong, D., Li, Q., Tao, D. & Li, X. 2016, 'Mutual component analysis for heterogeneous face recognition', ACM Transactions on Intelligent Systems and Technology, vol. 7, no. 3.
Heterogeneous face recognition, also known as cross-modality face recognition or intermodality face recognition, refers to matching two face images from alternative image modalities. Since face images from different image modalities of the same person are associated with the same face object, there should be mutual components that reflect those intrinsic face characteristics that are invariant to the image modalities. Motivated by this rationale, we propose a novel approach called Mutual Component Analysis (MCA) to infer the mutual components for robust heterogeneous face recognition. In the MCA approach, a generative model is first proposed to model the process of generating face images in different modalities, and then an Expectation Maximization (EM) algorithm is designed to iteratively learn the model parameters. The learned generative model is able to infer the mutual components (which we call the hidden factor, where hidden means the factor is unreachable and invisible, and can only be inferred from observations) that are associated with the person's identity, thus enabling fast and effective matching for cross-modality face recognition. To enhance recognition performance, we propose an MCA-based multiclassifier framework using multiple local features. Experimental results show that our new approach significantly outperforms the state-of-the-art results in two typical application scenarios: sketch-to-photo and infrared-to-visible face recognition.
Li, X., Liu, T., Deng, J. & Tao, D. 2016, 'Video face editing using temporal-spatial-smooth warping', ACM Transactions on Intelligent Systems and Technology, vol. 7, no. 3.
Editing faces in videos is a popular yet challenging task in computer vision and graphics that encompasses various applications, including facial attractiveness enhancement, makeup transfer, face replacement, and expression manipulation. Directly applying the existing warping methods to video face editing has the major problem of temporal incoherence in the synthesized videos, which cannot be addressed by simply employing face tracking techniques or manual interventions, as it is difficult to eliminate the subtle temporal incoherence of the facial feature point localizations in a video sequence. In this article, we propose a temporal-spatial-smooth warping (TSSW) method to achieve a high temporal coherence for video face editing. TSSW is based on two observations: (1) the control lattices are critical for generating warping surfaces and achieving the temporal coherence between consecutive video frames, and (2) the temporal coherence and spatial smoothness of the control lattices can be simultaneously and effectively preserved. Based upon these observations, we impose the temporal coherence constraint on the control lattices on two consecutive frames, as well as the spatial smoothness constraint on the control lattice on the current frame. TSSW calculates the control lattice (in either the horizontal or vertical direction) by updating the control lattice (in the corresponding direction) on its preceding frame, i.e., minimizing a novel energy function that unifies a data-driven term, a smoothness term, and feature point constraints.
The contributions of this article are twofold: (1) we develop TSSW, which is robust to the subtle temporal incoherence of the facial feature point localizations and effectively preserves the temporal coherence and spatial smoothness of the control lattices for editing faces in videos, and (2) we present a new unified video face editing framework that is capable of improving the performances...
Ren, W., Huang, K., Tao, D. & Tan, T. 2016, 'Weakly Supervised Large Scale Object Localization with Multiple Instance Learning and Bag Splitting', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 405-416.
Localizing objects of interest in images when provided with only image-level labels is a challenging visual recognition task. Previous efforts have required carefully designed features and have difficulty in handling images with cluttered backgrounds. Up-scaling to large datasets also poses a challenge to applying these methods to real applications. In this paper, we propose an efficient and effective learning framework called MILinear, which is able to learn an object localization model from large-scale data without using bounding box annotations. We integrate rich general prior knowledge into a learning model using a large pre-trained convolutional network. Moreover, to reduce ambiguity in positive images, we present a bag-splitting algorithm that iteratively generates new negative bags from positive ones. We evaluate the proposed approach on the challenging Pascal VOC 2007 dataset, and our method outperforms other state-of-the-art methods by a large margin; some results are even comparable to fully supervised models trained with bounding box annotations. To further demonstrate scalability, we also present detection results on the ILSVRC 2013 detection dataset, and our method outperforms the supervised deformable part-based model without using box annotations.
Yang, X., Gao, X., Tao, D., Li, X., Han, B. & Li, J. 2016, 'Shape-Constrained Sparse and Low-Rank Decomposition for Auroral Substorm Detection', IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 1, pp. 32-46.
An auroral substorm is an important geophysical phenomenon that reflects the interaction between the solar wind and the Earth's magnetosphere. Detecting substorms is of practical significance in order to prevent disruption to communication and global positioning systems. However, existing detection methods can be inaccurate or require time-consuming manual analysis and are therefore impractical for large-scale data sets. In this paper, we propose an automatic auroral substorm detection method based on a shape-constrained sparse and low-rank decomposition (SCSLD) framework. Our method automatically detects real substorm onsets in large-scale aurora sequences, which overcomes the limitations of manual detection. To reduce noise interference inherent in current SLD methods, we introduce a shape constraint to force the noise to be assigned to the low-rank part (stationary background), thus ensuring the accuracy of the sparse part (moving object) and improving the performance. Experiments conducted on aurora sequences in solar cycle 23 (1996-2008) show that the proposed SCSLD method achieves good performance for motion analysis of aurora sequences. Moreover, the obtained results are highly consistent with manual analysis, suggesting that the proposed automatic method is useful and effective in practice.
Liu, J., Su, H., Hu, W., Zhang, L. & Tao, D. 2016, 'A minimal Munsell value error based laser printer model', Neurocomputing, vol. 204, pp. 231-239.
An image printed by a laser printer may be nonlinearly distorted by dot gain and dot loss. In this case, a printer model is usually built to suppress this nonlinear distortion and to make sure the printout matches the input image. The parameters of the printer model, which directly affect the printout result, are key. In this paper, the chroma or density values of the printout result are converted into Munsell values, and the optimal parameters of the printer model are calculated by minimizing the error between the Munsell value and the input gray value. The minimal-Munsell-value-error-based laser printer model (MMVEBLPM) is then established and applied in the green noise halftone method. Experimental results showed that the optimal parameters can be calculated quickly and that the nonlinear distortion of the laser printer is suppressed significantly with the proposed model.
Hong, R., Zhang, L. & Tao, D. 2016, 'Unified Photo Enhancement by Discovering Aesthetic Communities from Flickr', IEEE Transactions on Image Processing, vol. 25, no. 3, pp. 1124-1135.
Photo enhancement refers to the process of increasing the aesthetic appeal of a photo, such as changing the photo aspect ratio and spatial recomposition. It is a widely used technique in the printing industry, graphic design, and cinematography. In this paper, we propose a unified and socially aware photo enhancement framework which can leverage the experience of photographers with various aesthetic topics (e.g., portrait and landscape). We focus on photos from the image hosting site Flickr, which has 87 million users and to which more than 3.5 million photos are uploaded daily. First, a tagwise regularized topic model is proposed to describe the aesthetic topic of each Flickr user, and coherent and interpretable topics are discovered by leveraging both the visual features and tags of photos. Next, a graph is constructed to describe the similarities in aesthetic topics between the users. Noticeably, densely connected users have similar aesthetic topics, which are categorized into different communities by a dense subgraph mining algorithm. Finally, a probabilistic model is exploited to enhance the aesthetic attractiveness of a test photo by leveraging the photographic experiences of Flickr users from the corresponding communities of that photo. Paired-comparison-based user studies show that our method performs competitively on photo retargeting and recomposition. Moreover, our approach accurately detects aesthetic communities in a photo set crawled from nearly 100,000 Flickr users.
Xu, Z., Hong, Z., Zhang, Y., Wu, J., Tsoi, A.C. & Tao, D. 2016, 'Multinomial Latent Logistic Regression for Image Understanding', IEEE Transactions on Image Processing, vol. 25, no. 2, pp. 973-987.
In this paper, we present multinomial latent logistic regression (MLLR), a new learning paradigm that introduces latent variables to logistic regression. By inheriting the advantages of logistic regression, MLLR is efficiently optimized using the second-order derivatives and provides effective probabilistic analysis on output predictions. MLLR is particularly effective in weakly supervised settings where the latent variable has an exponential number of possible values. The effectiveness of MLLR is demonstrated on four different image understanding applications, including a new challenging architectural style classification task. Furthermore, we show that MLLR can be generalized to general structured output prediction, and in doing so, we provide a thorough investigation of the connections and differences between MLLR and existing related algorithms, including latent structural SVMs and hidden conditional random fields.
Philip Chen, C.L., Tao, D. & You, X. 2016, 'Big learning in social media analytics', Neurocomputing.
Li, Z., Gong, D., Li, X. & Tao, D. 2016, 'Aging Face Recognition: A Hierarchical Learning Model Based on Local Patterns Selection', IEEE Transactions on Image Processing, vol. 25, no. 5, pp. 2146-2154.
Aging face recognition refers to matching the same person's faces across different ages, e.g., matching a person's older face to his (or her) younger one, which has many important practical applications, such as finding missing children. The major challenge of this task is that facial appearance is subject to significant change during the aging process. In this paper, we propose to solve the problem with a hierarchical model based on two-level learning. At the first level, effective features are learned from low-level microstructures, based on our new feature descriptor called local pattern selection (LPS). The proposed LPS descriptor greedily selects low-level discriminant patterns in such a way that intra-user dissimilarity is minimized. At the second level, higher level visual information is further refined based on the output from the first level. To evaluate the performance of our new method, we conduct extensive experiments on the MORPH data set (the largest face aging data set available in the public domain), which show a significant improvement in accuracy over the state-of-the-art methods.
Liu, C.L., Lovell, B., Tao, D. & Tistarelli, M. 2016, 'Pattern Recognition, Part 1 [Guest editors' introduction]', IEEE Intelligent Systems, vol. 31, no. 2, pp. 6-8.
This special issue reports the advances in pattern recognition theory and applications, particularly, the basic issues of pattern classification and image analysis and their applications. The selected articles address image feature representation, contextual pattern analysis, compact low-rank sparse representation for abnormal event detection in video, customer churn prediction using ensemble learning, online text-independent writer identification using deep convolutional neural network, image depth ordering reasoning, and brain MR image tumor segmentation, respectively.
Liu, F., Xu, X., Qiu, S., Qing, C. & Tao, D. 2016, 'Simple to Complex Transfer Learning for Action Recognition', IEEE Transactions on Image Processing, vol. 25, no. 2, pp. 949-960. View/Download from: Publisher's site &copy; 1992-2012 IEEE. Recognizing complex human actions is very challenging, since training a robust learning model requires a large amount of labeled data, which is difficult to acquire. Considering that each complex action is composed of a sequence of simple actions which can be easily obtained from existing data sets, this paper presents a simple to complex action transfer learning model (SCA-TLM) for complex human action recognition. SCA-TLM improves the performance of complex action recognition by leveraging the abundant labeled simple actions. In particular, it optimizes the weight parameters, enabling the complex actions to be learned to be reconstructed by simple actions. The optimal reconstruct coefficients are acquired by minimizing the objective function, and the target weight parameters are then represented as a combination of source weight parameters. The main advantage of the proposed SCA-TLM compared with existing approaches is that we exploit simple actions to recognize complex actions instead of only using complex actions as training samples. To validate the proposed SCA-TLM, we conduct extensive experiments on two well-known complex action data sets: 1) Olympic Sports data set and 2) UCF50 data set. The results show the effectiveness of the proposed SCA-TLM for complex action recognition. Tian, D. & Tao, D. 2016, 'Coupled Learning for Facial Deblur', IEEE Transactions on Image Processing, vol. 25, no. 2, pp. 961-972. View/Download from: Publisher's site &copy; 1992-2012 IEEE. Blur in facial images significantly impedes the efficiency of recognition approaches. 
However, most existing blind deconvolution methods cannot generate satisfactory results due to their dependence on strong edges, which are sufficient in natural images but not in facial images. In this paper, we represent point spread functions (PSFs) by the linear combination of a set of pre-defined orthogonal PSFs, and similarly, an estimated intrinsic (EI) sharp face image is represented by the linear combination of a set of pre-defined orthogonal face images. In doing so, PSF and EI estimation is simplified to discovering two sets of linear combination coefficients, which are simultaneously found by our proposed coupled learning algorithm. To make our method robust to different types of blurry face images, we generate several candidate PSFs and EIs for a test image, and then, a non-blind deconvolution method is adopted to generate more EIs by those candidate PSFs. Finally, we deploy a blind image quality assessment metric to automatically select the optimal EI. Thorough experiments on the facial recognition technology database, extended Yale face database B, CMU pose, illumination, and expression (PIE) database, and face recognition grand challenge database version 2.0 demonstrate that the proposed approach effectively restores intrinsic sharp face images and, consequently, improves the performance of face recognition. Wang, R. & Tao, D. 2016, 'Non-Local Auto-Encoder with Collaborative Stabilization for Image Restoration', IEEE Transactions on Image Processing, vol. 25, no. 5, pp. 2117-2129. View/Download from: Publisher's site &copy; 2016 IEEE. Deep neural networks have been applied to image restoration to achieve the top-level performance. From a neuroscience perspective, the layerwise abstraction of knowledge in a deep neural network can, to some extent, reveal the mechanisms of how visual cues are processed in human brain. A pivotal property of human brain is that similar visual cues can stimulate the same neuron to induce similar neurological signals. 
However, conventional neural networks do not consider this property, and the resulting models are, as a result, unstable regarding their internal propagation. In this paper, we develop the (stacked) non-local auto-encoder, which exploits self-similar information in natural images for stability. We propose that similar inputs should induce similar network propagation. This is achieved by constraining the difference between the hidden representations of non-local similar image blocks during training. By applying the proposed model to image restoration, we then develop a collaborative stabilization step to further rectify forward propagation. To obtain a reliable deep model, we employ several strategies to simplify training and improve testing. Extensive image restoration experiments, including image denoising and super-resolution, demonstrate the effectiveness of the proposed method. Gong, C., Tao, D., Liu, W., Liu, L. & Yang, J. 2016, 'Label Propagation via Teaching-to-Learn and Learning-to-Teach.', IEEE transactions on neural networks and learning systems. How to propagate label information from labeled examples to unlabeled examples over a graph has been intensively studied for a long time. Existing graph-based propagation algorithms usually treat unlabeled examples equally, and transmit seed labels to the unlabeled examples that are connected to the labeled examples in a neighborhood graph. However, such a popular propagation scheme is very likely to yield inaccurate propagation, because it falls short of tackling ambiguous but critical data points (e.g., outliers). To this end, this paper treats the unlabeled examples in different levels of difficulties by assessing their reliability and discriminability, and explicitly optimizes the propagation quality by manipulating the propagation sequence to move from simple to difficult examples. 
In particular, we propose a novel iterative label propagation algorithm in which each propagation alternates between two paradigms, teaching-to-learn and learning-to-teach (TLLT). In the teaching-to-learn step, the learner conducts the propagation on the simplest unlabeled examples designated by the teacher. In the learning-to-teach step, the teacher incorporates the learner's feedback to adjust the choice of the subsequent simplest examples. The proposed TLLT strategy critically improves the accuracy of label propagation, making our algorithm substantially robust to the values of tuning parameters, such as the Gaussian kernel width used in graph construction. The merits of our algorithm are theoretically justified and empirically demonstrated through experiments performed on both synthetic and real-world data sets. Liu, X., Deng, C., Lang, B., Tao, D. & Li, X. 2016, 'Query-Adaptive Reciprocal Hash Tables for Nearest Neighbor Search', IEEE Transactions on Image Processing, vol. 25, no. 2, pp. 907-919. View/Download from: Publisher's site &copy; 1992-2012 IEEE. Recent years have witnessed the success of binary hashing techniques in approximate nearest neighbor search. In practice, multiple hash tables are usually built using hashing to cover more desired results in the hit buckets of each table. However, rare work studies the unified approach to constructing multiple informative hash tables using any type of hashing algorithms. Meanwhile, for multiple table search, it also lacks of a generic query-adaptive and fine-grained ranking scheme that can alleviate the binary quantization loss suffered in the standard hashing techniques. To solve the above problems, in this paper, we first regard the table construction as a selection problem over a set of candidate hash functions. 
With the graph representation of the function set, we propose an efficient solution that sequentially applies normalized dominant set to finding the most informative and independent hash functions for each table. To further reduce the redundancy between tables, we explore the reciprocal hash tables in a boosting manner, where the hash function graph is updated with high weights emphasized on the misclassified neighbor pairs of previous hash tables. To refine the ranking of the retrieved buckets within a certain Hamming radius from the query, we propose a query-adaptive bitwise weighting scheme to enable fine-grained bucket ranking in each hash table, exploiting the discriminative power of its hash functions and their complement for nearest neighbor search. Moreover, we integrate such scheme into the multiple table search using a fast, yet reciprocal table lookup algorithm within the adaptive weighted Hamming radius. In this paper, both the construction method and the query-adaptive search method are general and compatible with different types of hashing algorithms using different feature spaces and/or parameter settings. Our extensive experiments on several large-scale benchmarks demonstrate that the proposed techniques can significan... Cai, B., Xu, X., Xing, X., Jia, K., Miao, J. & Tao, D. 2016, 'BIT: Biologically Inspired Tracker', IEEE Transactions on Image Processing, vol. 25, no. 3, pp. 1327-1339. View/Download from: Publisher's site &copy; 1992-2012 IEEE. Visual tracking is challenging due to image variations caused by various factors, such as object deformation, scale change, illumination change, and occlusion. Given the superior tracking performance of human visual system (HVS), an ideal design of biologically inspired model is expected to improve computer visual tracking. This is, however, a difficult task due to the incomplete understanding of neurons' working mechanism in the HVS. 
This paper aims to address this challenge based on an analysis of the visual cognitive mechanism of the ventral stream in the visual cortex, which simulates shallow neurons (S1 units and C1 units) to extract low-level biologically inspired features for the target appearance and imitates an advanced learning mechanism (S2 units and C2 units) to combine generative and discriminative models for target location. In addition, fast Gabor approximation and fast Fourier transform are adopted for real-time learning and detection in this framework. Extensive experiments on large-scale benchmark data sets show that the proposed biologically inspired tracker performs favorably against the state-of-the-art methods in terms of efficiency, accuracy, and robustness. The acceleration technique in particular ensures that the biologically inspired tracker maintains a speed of approximately 45 frames/s.

Du, B., Xiong, W., Wu, J., Zhang, L., Zhang, L. & Tao, D. 2016, 'Stacked Convolutional Denoising Auto-Encoders for Feature Representation', IEEE Transactions on Cybernetics.
Deep networks have achieved excellent performance in learning representations from visual data. However, supervised deep models such as convolutional neural networks require large quantities of labeled data, which are very expensive to obtain. To solve this problem, this paper proposes an unsupervised deep network, called the stacked convolutional denoising auto-encoders, which can map images to hierarchical representations without any label information. The network, optimized by layer-wise training, is constructed by stacking layers of denoising auto-encoders in a convolutional way. In each layer, high dimensional feature maps are generated by convolving features of the lower layer with kernels learned by a denoising auto-encoder. The auto-encoder is trained on patches extracted from feature maps in the lower layer to learn robust feature detectors.
To better train the large network, a layer-wise whitening technique is introduced into the model. Before each convolutional layer, a whitening layer is embedded to sphere the input data. Through successive layers of mapping, raw images are transformed into high-level feature representations, which boost the performance of the subsequent support vector machine classifier. The proposed algorithm is evaluated through extensive experiments and demonstrates superior classification performance to state-of-the-art unsupervised networks.

Zhang, K., Tao, D., Gao, X., Li, X. & Li, J. 2016, 'Coarse-to-Fine Learning for Single-Image Super-Resolution', IEEE Transactions on Neural Networks and Learning Systems.
This paper develops a coarse-to-fine framework for single-image super-resolution (SR) reconstruction. The coarse-to-fine approach achieves high-quality SR recovery based on the complementary properties of both example learning- and reconstruction-based algorithms: example learning-based SR approaches are useful for generating plausible details from external exemplars but poor at suppressing aliasing artifacts, while reconstruction-based SR methods are well suited to preserving sharp edges yet fail to generate fine details. In the coarse stage of the method, we use a set of simple yet effective mapping functions, learned via correlative neighbor regression of grouped low-resolution (LR) to high-resolution (HR) dictionary atoms, to synthesize an initial SR estimate with particularly low computational cost. In the fine stage, we devise an effective regularization term that seamlessly integrates the properties of local structural regularity, nonlocal self-similarity, and collaborative representation over relevant atoms in a learned HR dictionary, to further improve the visual quality of the initial SR estimation obtained in the coarse stage.
The experimental results indicate that our method outperforms other state-of-the-art methods for producing high-quality images, even though both the initial SR estimation and the subsequent enhancement are cheap to implement.

Du, B., Wang, S., Wang, N., Zhang, L., Tao, D. & Zhang, L. 2016, 'Hyperspectral signal unmixing based on constrained non-negative matrix factorization approach', Neurocomputing, vol. 204, pp. 153-161.
© 2016 Elsevier B.V. Hyperspectral unmixing is a hot topic in signal and image processing. A set of high-dimensional data matrices can be decomposed into two sets of non-negative low-dimensional matrices by non-negative matrix factorization (NMF). However, the algorithm has many local solutions because of the non-convexity of the objective function. Some algorithms address this problem by adding auxiliary constraints, such as sparsity. Sparse NMF performs well, but the result is unstable and sensitive to noise. Using structural information in unmixing approaches can make the decomposition stable. Previous work used clustering based on Euclidean distance to guide the decomposition and obtained good performance. The Euclidean distance, however, only measures the straight-line distance between two points, whereas ground objects usually obey a certain statistical distribution. It is difficult to measure the difference between statistical distributions comprehensively by Euclidean distance. Kullback-Leibler divergence (KL divergence) is a better measure for this purpose. In this paper, we propose a new approach named KL divergence constrained NMF, which measures the statistical distribution difference using KL divergence instead of the Euclidean distance. Using the KL divergence in the algorithm can improve the accuracy of the structural information. Experimental results based on synthetic and real hyperspectral data show the superiority of the proposed algorithm with respect to other state-of-the-art algorithms.
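
The contrast drawn in the abstract above between Euclidean distance and KL divergence can be illustrated with a small sketch. The three distributions below are hypothetical toy values, not data from the paper, and the helper names `euclidean` and `kl_divergence` are illustrative only; how the paper embeds the divergence into the NMF constraint is not stated here and is not reproduced.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


# Hypothetical reference distribution and two perturbed versions.
# Both perturbations move the same amount of probability mass (0.05),
# so their Euclidean distances to p are equal.
p = [0.50, 0.49, 0.01]
a = [0.50, 0.44, 0.06]  # mass moved into the rare third bin
b = [0.55, 0.44, 0.01]  # mass moved into the common first bin

print("Euclidean:", euclidean(p, a), euclidean(p, b))
print("KL:       ", kl_divergence(p, a), kl_divergence(p, b))
```

In this toy case the two Euclidean distances coincide while the KL divergences differ, because KL weighs a change relative to the probability of the bin it occurs in.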

Gong, C., Tao, D., Maybank, S., Liu, W., Kang, G. & Yang, J. 2016, 'Multi-modal Curriculum Learning for Semi-supervised Image Classification', IEEE Transactions on Image Processing.
Semi-supervised image classification aims to classify a large quantity of unlabeled images by harnessing typically scarce labeled images. Existing semi-supervised methods often suffer from inadequate classification accuracy when encountering difficult yet critical images such as outliers, because they treat all unlabeled images equally and conduct classifications in an imperfectly ordered sequence. In this paper, we employ the curriculum learning methodology by investigating the difficulty of classifying every unlabeled image. The reliability and discriminability of these unlabeled images are particularly investigated for evaluating their difficulty. As a result, an optimized image sequence is generated during the iterative propagations, and the unlabeled images are logically classified from simple to difficult. Furthermore, since images are usually characterized by multiple visual feature descriptors, we associate each kind of feature with a "teacher", and design a Multi-Modal Curriculum Learning (MMCL) strategy to integrate the information from different feature modalities. In each propagation, each teacher analyzes the difficulties of the currently unlabeled images from its own modality viewpoint. A consensus is subsequently reached among all the teachers, determining the currently simplest images (i.e., a curriculum) which are to be reliably classified by the multi-modal "learner". This well-organized propagation process leveraging multiple teachers and one learner enables our MMCL to outperform five state-of-the-art methods on eight popular image datasets.

Liu, C.L., Lovell, B., Tao, D. & Tistarelli, M. 2016, 'Pattern Recognition, Part 2', IEEE Intelligent Systems, vol. 31, no. 3, pp. 3-5.
© 2001-2011 IEEE. This second part of the special issue on pattern recognition reports the advances in pattern recognition for visual data. The selected articles address visual categorization by cross-domain dictionary learning, facial expression recognition, face sketch-photo matching, heartbeat rate measurement from facial video, driver gaze estimation, nighttime vehicle detection, and overlaid arrow detection in biomedical images, respectively.

Yang, W., Jin, L., Tao, D., Xie, Z. & Feng, Z. 2016, 'DropSample: A new training method to enhance deep convolutional neural networks for large-scale unconstrained handwritten Chinese character recognition', Pattern Recognition, vol. 58, pp. 190-203.
© 2016 Elsevier Ltd. Inspired by the theory of Leitner's learning box from the field of psychology, we propose DropSample, a new method for training deep convolutional neural networks (DCNNs), and apply it to large-scale online handwritten Chinese character recognition (HCCR). According to the principle of DropSample, each training sample is associated with a quota function that is dynamically adjusted on the basis of the classification confidence given by the DCNN softmax output. After a learning iteration, samples with low confidence will have a higher frequency of being selected as training data; in contrast, well-trained and well-recognized samples with very high confidence will have a lower frequency of being involved in the ongoing training and can be gradually eliminated. As a result, the learning process becomes more efficient as it progresses. Furthermore, we investigate the use of domain-specific knowledge to enhance the performance of DCNN by adding a domain knowledge layer before the traditional CNN. By adopting DropSample together with different types of domain-specific knowledge, the accuracy of HCCR can be improved efficiently.
Experiments on the CASIA-OLHWDB 1.0, CASIA-OLHWDB 1.1, and ICDAR 2013 online HCCR competition datasets yield outstanding recognition rates of 97.33%, 97.06%, and 97.51% respectively, all of which are significantly better than the previous best results reported in the literature.

Xiong, H., Liu, T., Tao, D. & Shen, H.T. 2016, 'Dual Diversified Dynamical Gaussian Process Latent Variable Model for Video Repairing', IEEE Transactions on Image Processing, vol. 25, no. 8, pp. 3626-3637.
In this paper, we propose a dual diversified dynamical Gaussian process latent variable model ([Formula: see text]GPLVM) to tackle the video repairing issue. For preservation purposes, videos have to be conserved on media. However, storage on media, such as films and hard disks, can suffer unexpected data loss, for instance through physical damage. Repairing missing or damaged pixels is therefore essential for better video maintenance. Most methods seek to fill in missing holes by synthesizing similar textures from local patches (the neighboring pixels), consecutive frames, or the whole video. However, these can introduce incorrect contexts, especially when the missing hole or number of damaged frames is large. Furthermore, simple texture synthesis can introduce artifacts in undamaged and recovered areas. To address the aforementioned problems, we introduce two diversity-encouraging priors on both the inducing points and the latent variables to account for the variety in existing videos. In [Formula: see text]GPLVM, the inducing points constitute a smaller subset of observed data, while latent variables are a low-dimensional representation of observed data. Since they have a strong correlation with the observed data, it is essential that both of them can capture distinct aspects of and fully represent the observed data.
The dual diversity-encouraging priors ensure that the trained inducing points and latent variables are more diverse and robust for context-aware and artifact-free video repairing. The defined objective function in our proposed model is initially not analytically tractable and must be solved by variational inference. Finally, experimental testing results illustrate the robustness and effectiveness of our method for damaged video repairing.

Liu, T., Gong, M. & Tao, D. 2016, 'Large-Cone Nonnegative Matrix Factorization', IEEE Transactions on Neural Networks and Learning Systems.
Nonnegative matrix factorization (NMF) has been greatly popularized by its parts-based interpretation and the effective multiplicative updating rule for searching local solutions. In this paper, we study the problem of how to obtain an attractive local solution for NMF, which not only fits the given training data well but also generalizes well on the unseen test data. Based on the geometric interpretation of NMF, we introduce two large-cone penalties for NMF and propose large-cone NMF (LCNMF) algorithms. Compared with NMF, LCNMF will obtain bases comprising a larger simplicial cone, and therefore has three advantages: (1) the empirical reconstruction error of LCNMF could mostly be smaller; (2) the generalization ability of the proposed algorithm is much more powerful; and (3) the obtained bases of LCNMF have a low-overlapping property, which enables the bases to be sparse and makes the proposed algorithms very robust. Experiments on synthetic and real-world data sets confirm the efficiency of LCNMF.

Wang, M., Fu, W., Hao, S., Tao, D. & Wu, X. 2016, 'Scalable Semi-Supervised Learning by Efficient Anchor Graph Regularization', IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 7, pp. 1864-1877.
© 1989-2012 IEEE. Many graph-based semi-supervised learning methods for large datasets have been proposed to cope with the rapidly increasing size of data, such as Anchor Graph Regularization (AGR). This model builds a regularization framework by exploring the underlying structure of the whole dataset with both datapoints and anchors. Nevertheless, AGR still has limitations in its two components: (1) in anchor graph construction, the estimation of the local weights between each datapoint and its neighboring anchors could be biased and relatively slow; and (2) in anchor graph regularization, the adjacency matrix that estimates the relationship between datapoints is not sufficiently effective. In this paper, we develop an Efficient Anchor Graph Regularization (EAGR) by tackling these issues. First, we propose a fast local anchor embedding method, which reformulates the optimization of local weights and obtains an analytical solution. We show that this method better reconstructs datapoints with anchors and speeds up the optimizing process. Second, we propose a new adjacency matrix among anchors by considering the commonly linked datapoints, which leads to a more effective normalized graph Laplacian over anchors. We show that, with the novel local weight estimation and normalized graph Laplacian, EAGR is able to achieve better classification accuracy with much lower computational cost. Experimental results on several publicly available datasets demonstrate the effectiveness of our approach.

He, Z., Li, X., You, X., Tao, D. & Tang, Y.Y. 2016, 'Connected Component Model for Multi-Object Tracking', IEEE Transactions on Image Processing, vol. 25, no. 8, pp. 3698-3711.
© 1992-2012 IEEE. In multi-object tracking, it is critical to explore the data associations by exploiting the temporal information from a sequence of frames rather than the information from the adjacent two frames.
Since straightforwardly obtaining data associations from multi-frames is an NP-hard multi-dimensional assignment (MDA) problem, most existing methods solve this MDA problem by either developing complicated approximate algorithms, or simplifying MDA as a 2D assignment problem based upon the information extracted only from adjacent frames. In this paper, we show that the relation between associations of two observations is an equivalence relation in the data association problem, based on the spatial-temporal constraint that the trajectories of different objects must be disjoint. Therefore, the MDA problem can be equivalently divided into independent subproblems by equivalence partitioning. In contrast to existing works for solving the MDA problem, we develop a connected component model (CCM) by exploiting the constraints of the data association and the equivalence relation on the constraints. Based upon CCM, we can efficiently obtain the global solution of the MDA problem for multi-object tracking by optimizing a sequence of independent data association subproblems. Experiments on challenging public data sets demonstrate that our algorithm outperforms the state-of-the-art approaches.

Li, Q., Xie, B., You, J., Bian, W. & Tao, D. 2016, 'Correlated Logistic Model with Elastic Net Regularization for Multilabel Image Classification', IEEE Transactions on Image Processing, vol. 25, no. 8, pp. 3801-3813.
© 1992-2012 IEEE. In this paper, we present the correlated logistic (CorrLog) model for multilabel image classification. CorrLog extends the conventional logistic regression model to multilabel cases by explicitly modeling the pairwise correlation between labels. In addition, we propose to learn the model parameters of CorrLog with elastic net regularization, which helps exploit the sparsity in feature selection and label correlations and thus further boosts the performance of multilabel classification.
CorrLog can be efficiently learned, though approximately, by regularized maximum pseudo likelihood estimation, and it enjoys a satisfying generalization bound that is independent of the number of labels. CorrLog performs competitively for multilabel image classification on benchmark data sets MULAN scene, MIT outdoor scene, PASCAL VOC 2007, and PASCAL VOC 2012, compared with the state-of-the-art multilabel classification algorithms.

Zhang, S., Lan, X., Yao, H., Zhou, H., Tao, D. & Li, X. 2016, 'A Biologically Inspired Appearance Model for Robust Visual Tracking', IEEE Transactions on Neural Networks and Learning Systems.
In this paper, we propose a biologically inspired appearance model for robust visual tracking. Motivated in part by the success of the hierarchical organization of the primary visual cortex (area V1), we establish an architecture consisting of five layers: whitening, rectification, normalization, coding, and pooling. The first three layers stem from the models developed for object recognition. In this paper, our attention focuses on the coding and pooling layers. In particular, we use a discriminative sparse coding method in the coding layer along with spatial pyramid representation in the pooling layer, which makes it easier to distinguish the target to be tracked from its background in the presence of appearance variations. An extensive experimental study shows that the proposed method has higher tracking accuracy than several state-of-the-art trackers.

Hu, Y., Wang, N., Tao, D., Gao, X. & Li, X. 2016, 'SERF: A Simple, Effective, Robust, and Fast Image Super-Resolver from Cascaded Linear Regression', IEEE Transactions on Image Processing, vol. 25, no. 9, pp. 4091-4102.
© 2016 IEEE. Example learning-based image super-resolution techniques estimate a high-resolution image from a low-resolution input image by relying on high- and low-resolution image pairs.
An important issue for these techniques is how to model the relationship between high- and low-resolution image patches: most existing complex models either generalize poorly to diverse natural images or require a lot of time for model training, while simple models have limited representation capability. In this paper, we propose a simple, effective, robust, and fast (SERF) image super-resolver for image super-resolution. The proposed super-resolver is based on a series of linear least squares functions, namely, cascaded linear regression. It has few parameters to control the model and is thus able to robustly adapt to different image data sets and experimental settings. The linear least squares functions lead to closed form solutions and therefore achieve computationally efficient implementations. To effectively decrease the gap of high-frequency detail between the estimated high-resolution image patch and the ground truth image patch, we group image patches into clusters via the k-means algorithm and learn a linear regressor for each cluster at each iteration. The cascaded learning process gradually decreases this gap and simultaneously obtains the linear regression parameters. Experimental results show that the proposed method achieves superior performance with lower time consumption than the state-of-the-art methods.

Liu, W., Zha, Z.J., Wang, Y., Lu, K. & Tao, D. 2016, 'P-Laplacian Regularized Sparse Coding for Human Activity Recognition', IEEE Transactions on Industrial Electronics, vol. 63, no. 8, pp. 5120-5129.
© 2016 IEEE. Human activity analysis in videos has increasingly attracted attention in computer vision research with the massive number of videos now accessible online. Although many recognition algorithms have been reported recently, activity representation remains challenging.
Recently, manifold regularized sparse coding has obtained promising performance in action recognition, because it simultaneously learns the sparse representation and preserves the manifold structure. In this paper, we propose a generalized version of Laplacian regularized sparse coding for human activity recognition called p-Laplacian regularized sparse coding (pLSC). The proposed method exploits p-Laplacian regularization to preserve the local geometry. The p-Laplacian is a nonlinear generalization of the standard graph Laplacian and has a tighter isoperimetric inequality. As a result, with a proper p, pLSC has stronger theoretical support than standard Laplacian regularized sparse coding. We also provide a fast iterative shrinkage-thresholding algorithm for the optimization of pLSC. Finally, we input the sparse codes learned by the pLSC algorithm into support vector machines and conduct extensive experiments on the unstructured social activity attribute dataset and human motion database (HMDB51) for human activity recognition. The experimental results demonstrate that, with a proper p, the proposed pLSC algorithm outperforms manifold regularized sparse coding algorithms, including the standard Laplacian regularized sparse coding algorithm.

Qiao, M., Xu, R.Y.D., Bian, W. & Tao, D. 2016, 'Fast sampling for time-varying Determinantal Point Processes', ACM Transactions on Knowledge Discovery from Data, vol. 11, no. 1.
© 2016 ACM. Determinantal Point Processes (DPPs) are stochastic models which assign each subset of a base dataset a probability proportional to the subset's degree of diversity. It has been shown that DPPs are particularly appropriate in data subset selection and summarization (e.g., news display, video summarizations). DPPs prefer diverse subsets, a property that other conventional models cannot offer.
However, DPP inference algorithms have a polynomial time complexity, which makes it difficult to handle large and time-varying datasets, especially when real-time processing is required. To address this limitation, we developed a fast sampling algorithm for DPPs which takes advantage of the nature of some time-varying data (e.g., news corpora updating, communication network evolving), where the data changes between time stamps are relatively small. The proposed algorithm is built upon the simplification of marginal density functions over successive time stamps and the sequential Monte Carlo (SMC) sampling technique. Evaluations on both a real-world news dataset and the Enron Corpus confirm the efficiency of the proposed algorithm.

Fang, M., Yin, J., Hall, L.O. & Tao, D. 2016, 'Active Multitask Learning With Trace Norm Regularization Based on Excess Risk', IEEE Transactions on Cybernetics.
This paper addresses the problem of active learning on multiple tasks, where labeled data are expensive to obtain for each individual task but the learning problems share some commonalities across multiple related tasks. To leverage the benefits of jointly learning from multiple related tasks and making active queries, we propose a novel active multitask learning approach based on trace norm regularized least squares. The basic idea is to induce an optimal classifier that has the lowest risk and, at the same time, is closest to the true hypothesis. Toward this aim, we devise a new active selection criterion that takes into account not only the risk but also the excess risk, which measures the distance to the true hypothesis. Based on this criterion, our proposed algorithm actively selects the instance to query for its label based on the combination of the two risks.
Experiments on both synthetic and real-world datasets show that our proposed algorithm provides superior performance as compared to other state-of-the-art active learning methods.

Xu, C., Tao, D., Li, Y. & Xu, C. 2015, 'Large-margin multi-view Gaussian process', Multimedia Systems, vol. 21, no. 2, pp. 147-157.
In image classification, the goal is to decide whether an image belongs to a certain category or not. Multiple features are usually employed to comprehend the contents of images substantially for the improvement of classification accuracy. However, this also raises new problems: how to effectively combine multiple features and how to handle the high-dimensional features from multiple views given a small training set. In this paper, we integrate the large-margin idea into the Gaussian process to discover the latent subspace shared by multiple features. Therefore, our approach inherits all the advantages of the Gaussian process and the large-margin principle. A probabilistic explanation is provided by the Gaussian process to embed multiple features into the shared low-dimensional subspace, which derives a strong discriminative ability from the large-margin principle, and thus, the subsequent classification task can be effectively accomplished. Finally, we demonstrate the advantages of the proposed algorithm on real-world image datasets for discovering discriminative latent subspace and improving the classification performance. © 2014 Springer-Verlag Berlin Heidelberg.

Zhao, L., Gao, X., Tao, D. & Li, X. 2015, 'A deep structure for human pose estimation', Signal Processing, vol. 108, pp. 36-45.
© 2014 Elsevier B.V. Articulated human pose estimation in unconstrained conditions is a great challenge.
We propose a deep structure that represents a human body at different granularities, from coarse to fine, to better detect parts and describe spatial constraints between different parts. Typical approaches to this problem utilize only a single-level structure, which struggles to capture various body appearances and to model high-order part dependencies. In this paper, we build a three-layer Markov network to model the body structure that separates the whole body into poselets (combined parts) and then into parts representing joints. Parts at different levels are connected through a parent-child relationship to represent high-order spatial relationships. Unlike other multi-layer models, our approach explores more reasonable granularity for part detection and carefully designs part connections to model body configurations more effectively. Moreover, each part in our model contains different types so as to capture a wide range of pose modes. Our model is also a tree structure, which can be trained jointly and favors exact inference. Extensive experimental results on two challenging datasets show that our model improves on or is on par with state-of-the-art approaches.

Zhang, L., Zhang, L., Tao, D., Huang, X. & Du, B. 2015, 'Compression of hyperspectral remote sensing images by tensor approach', Neurocomputing, vol. 147, pp. 358-363.
Whereas transform coding algorithms have been proved to be efficient and practical for grey-level and color image compression, they cannot directly deal with hyperspectral images (HSI) by simultaneously considering both the spatial and spectral domains of the data cube. The aim of this paper is to present an HSI compression and reconstruction method based on the multi-dimensional or tensor data processing approach.
By representing the observed hyperspectral image cube as a third-order tensor, we introduce a tensor decomposition technique to approximately decompose the original tensor data into a core tensor multiplied by a factor matrix along each mode. Thus, the HSI is compressed to the core tensor and can be reconstructed by the multi-linear projection via the factor matrices. Experimental results on particular applications of hyperspectral remote sensing images such as unmixing and detection suggest that the data reconstructed by the proposed approach significantly preserves the HSI's data quality in several aspects. © 2014 Elsevier B.V.

Hou, W., Gao, X., Tao, D. & Li, X. 2015, 'Blind image quality assessment via deep learning', IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 6, pp. 1275-1286.
This paper investigates how to blindly evaluate the visual quality of an image by learning rules from linguistic descriptions. Extensive psychological evidence shows that humans prefer to conduct evaluations qualitatively rather than numerically. The qualitative evaluations are then converted into numerical scores to fairly benchmark objective image quality assessment (IQA) metrics. Recently, many learning-based IQA models have been proposed by analyzing the mapping from the images to numerical ratings. However, the learnt mapping can hardly be accurate enough because some information has been lost in such an irreversible conversion from the linguistic descriptions to numerical scores. In this paper, we propose a blind IQA model, which learns qualitative evaluations directly and outputs numerical scores for general utilization and fair comparison. Images are represented by natural scene statistics features. A discriminative deep model is trained to classify the features into five grades, corresponding to five explicit mental concepts, i.e., excellent, good, fair, poor, and bad.
A newly designed quality pooling is then applied to convert the qualitative labels into scores. The classification framework is not only much more natural than the regression-based models, but also robust to the small sample size problem. Thorough experiments are conducted on popular databases to verify the model's effectiveness, efficiency, and robustness.

Yang, X., Gao, X., Tao, D., Li, X. & Li, J. 2015, 'An efficient MRF embedded level set method for image segmentation', IEEE Transactions on Image Processing, vol. 24, no. 1, pp. 9-21.
This paper presents a fast and robust level set method for image segmentation. To enhance the robustness against noise, we embed a Markov random field (MRF) energy function into the conventional level set energy function. This MRF energy function builds the correlation of a pixel with its neighbors and encourages them to fall into the same region. To obtain a fast implementation of the MRF embedded level set model, we explore algebraic multigrid (AMG) and sparse field method (SFM) to increase the time step and decrease the computation domain, respectively. Both AMG and SFM can be conducted in a parallel fashion, which facilitates the processing of our method for big image databases. By comparing the proposed fast and robust level set method with the standard level set method and its popular variants on noisy synthetic images, synthetic aperture radar (SAR) images, medical images, and natural images, we comprehensively demonstrate that the new method is robust against various kinds of noise. In particular, the new level set method can segment an image of size 500 × 500 within 3 s on MATLAB R2010b installed in a computer with 3.30-GHz CPU and 4-GB memory.

Gong, C., Tao, D., Fu, K. & Yang, J. 2015, 'Fick's Law Assisted Propagation for Semisupervised Learning', IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 9, pp. 2148-2162.
Gong, C., Liu, T., Tao, D., Fu, K., Tu, E. & Yang, J. 2015, 'Deformed Graph Laplacian for Semisupervised Learning', IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 10, pp. 2261-2274.
Luo, Y., Tao, D., Ramamohanarao, K., Xu, C. & Wen, Y. 2015, 'Tensor Canonical Correlation Analysis for Multi-View Dimension Reduction', IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 11, pp. 3111-3124.
© 2015 IEEE. Canonical correlation analysis (CCA) has proven an effective tool for two-view dimension reduction due to its profound theoretical foundation and success in practical applications. With respect to multi-view learning, however, it is limited to handling data represented by two-view features, while in many real-world applications the number of views is often much larger. Although the ad hoc way of simultaneously exploring all possible pairs of features can numerically deal with multi-view data, it ignores the high order statistics (correlation information) which can only be discovered by simultaneously exploring all features. Therefore, in this work, we develop tensor CCA (TCCA) which straightforwardly yet naturally generalizes CCA to handle the data of an arbitrary number of views by analyzing the covariance tensor of the different views. TCCA aims to directly maximize the canonical correlation of multiple (more than two) views. Crucially, we prove that the main problem of multi-view canonical correlation maximization is equivalent to finding the best rank-1 approximation of the data covariance tensor, which can be solved efficiently using the well-known alternating least squares (ALS) algorithm. As a consequence, the high order correlation information contained in the different views is explored and thus a more reliable common subspace shared by all features can be obtained.
In addition, a non-linear extension of TCCA is presented. Experiments on various challenging tasks, including large scale biometric structure prediction, internet advertisement classification, and web image annotation, demonstrate the effectiveness of the proposed method.
Zhang, L., Zhang, Q., Zhang, L., Tao, D., Huang, X. & Du, B. 2015, 'Ensemble manifold regularized sparse low-rank approximation for multiview feature embedding', Pattern Recognition, vol. 48, no. 10, pp. 3102-3112.
© 2015. In computer vision and pattern recognition research, the studied objects are often characterized by multiple feature representations with high dimensionality; thus it is essential to encode these multiview features into a unified and discriminative embedding that is optimal for a given task. To address this challenge, this paper proposes an ensemble manifold regularized sparse low-rank approximation (EMR-SLRA) algorithm for multiview feature embedding. The EMR-SLRA algorithm is based on the framework of least-squares component analysis; in particular, the low dimensional feature representation and the projection matrix are obtained by the low-rank approximation of the concatenated multiview feature matrix. By considering the complementary property among multiple features, EMR-SLRA simultaneously enforces the ensemble manifold regularization on the output feature embedding. In order to further enhance its robustness against noise, group sparsity is introduced into the objective formulation to impose direct noise reduction on the input multiview feature matrix. Since there is no closed-form solution for EMR-SLRA, this paper provides an efficient EMR-SLRA optimization procedure to obtain the output feature embedding. Experiments on pattern recognition applications confirm the effectiveness of the EMR-SLRA algorithm compared with some other multiview feature dimensionality reduction approaches.
Zhu, Z., You, X., Chen, C.L.P., Tao, D., Ou, W., Jiang, X. & Zou, J. 2015, 'An adaptive hybrid pattern for noise-robust texture analysis', Pattern Recognition, vol. 48, no. 8, pp. 2592-2608.
Local binary patterns (LBP) achieve great success in texture analysis; however, they are not robust to noise. The two reasons for this disadvantage of LBP schemes are (1) they encode the texture spatial structure based only on local information which is sensitive to noise and (2) they use exact values as the quantization thresholds, which makes the extracted features sensitive to small changes in the input image. In this paper, we propose a noise-robust adaptive hybrid pattern (AHP) for noisy texture analysis. In our scheme, two solutions from the perspective of texture description model and quantization algorithm have been developed to reduce the feature's noise sensitivity. First, a hybrid texture description model is proposed. In this model, the global texture spatial structure which is depicted by a global description model is encoded with the primitive microfeature for texture description. Second, we develop an adaptive quantization algorithm in which equal probability quantization is utilized to achieve the maximum partition entropy. Higher noise tolerance can be obtained with the minimum loss of information in the quantization process. The experimental results of texture classification on two texture databases with three different types of noise show that our approach leads to significant improvement in noisy texture analysis. Furthermore, our scheme achieves state-of-the-art performance in noisy face recognition.
Yang, X., Wang, M. & Tao, D. 2015, 'Robust visual tracking via multi-graph ranking', Neurocomputing, vol. 159, pp. 35-43.
© 2015 Elsevier B.V. Object tracking is a fundamental problem in computer vision.
Although much progress has been made, object tracking is still a challenging problem as it entails learning an effective model to account for appearance change caused by intrinsic and extrinsic factors. To improve the reliability and effectiveness, this paper presents an approach that explores the combination of graph-based ranking and multiple feature representations for tracking. We construct multiple graph matrices with various types of visual features, and integrate the multiple graphs into a regularization framework to learn a ranking vector. In particular, the approach exploits temporal consistency by adding a regularization term to constrain the difference between two weight vectors at adjacent frames. An effective iterative optimization scheme is also proposed in this paper. Experimental results on a variety of challenging video sequences show that the proposed algorithm performs favorably against the state-of-the-art visual tracking methods.
Li, Y., Tian, X., Song, M. & Tao, D. 2015, 'Multi-task proximal support vector machine', Pattern Recognition, vol. 48, no. 10, pp. 3249-3257.
With the explosive growth of the use of imagery, visual recognition plays an important role in many applications and attracts increasing research attention. Given several related tasks, single-task learning learns each task separately and ignores the relationships among these tasks. Different from single-task learning, multi-task learning can explore more information to learn all tasks jointly by using relationships among these tasks. In this paper, we propose a novel multi-task learning model based on the proximal support vector machine. The proximal support vector machine uses the large-margin idea as do standard support vector machines but with looser constraints and much lower computational cost.
Our multi-task proximal support vector machine inherits the merits of the proximal support vector machine and achieves better performance compared with other popular multi-task learning models. Experiments are conducted on several multi-task learning datasets, including two classification datasets and one regression dataset. All results demonstrate the effectiveness and efficiency of our proposed multi-task proximal support vector machine.
Weng, D., Wang, Y., Gong, M., Tao, D., Wei, H. & Huang, D. 2015, 'DERF: Distinctive efficient robust features from the biological modeling of the P ganglion cells', IEEE Transactions on Image Processing, vol. 24, no. 8, pp. 2287-2302.
© 2015 IEEE. Studies in neuroscience and biological vision have shown that the human retina has strong computational power, and its information representation supports vision tasks on both ventral and dorsal pathways. In this paper, a new local image descriptor, termed distinctive efficient robust features (DERF), is derived by modeling the response and distribution properties of the parvocellular-projecting ganglion cells in the primate retina. DERF features exponential scale distribution, exponential grid structure, and circularly symmetric function difference of Gaussian (DoG) used as a convolution kernel, all of which are consistent with the characteristics of the ganglion cell array found in neurophysiology, anatomy, and biophysics. In addition, a new explanation for local descriptor design is presented from the perspective of wavelet tight frames. DoG is naturally a wavelet, and the structure of the grid points array in our descriptor is closely related to the spatial sampling of wavelets. The DoG wavelet itself forms a frame, and when we modulate the parameters of our descriptor to make the frame tighter, the performance of the DERF descriptor improves accordingly.
This is verified by designing a tight frame DoG, which leads to much better performance. Extensive experiments conducted in the image matching task on the multiview stereo correspondence data set demonstrate that DERF outperforms state-of-the-art methods for both hand-crafted and learned descriptors, while remaining robust and being much faster to compute.
Li, J., Lin, X., Rui, X., Rui, Y. & Tao, D. 2015, 'A Distributed Approach Toward Discriminative Distance Metric Learning', IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 9, pp. 2111-2122.
Distance metric learning (DML) is successful in discovering intrinsic relations in data. However, most algorithms are computationally demanding when the problem size becomes large. In this paper, we propose a discriminative metric learning algorithm, develop a distributed scheme learning metrics on moderate-sized subsets of data, and aggregate the results into a global solution. The technique leverages the power of parallel computation. The algorithm of the aggregated DML (ADML) scales well with the data size and can be controlled by the partition. We theoretically analyze and provide bounds for the error induced by the distributed treatment. We have conducted experimental evaluation of the ADML, both on specially designed tests and on practical image annotation tasks. Those tests have shown that the ADML achieves state-of-the-art performance at only a fraction of the cost incurred by most existing methods.
Zhang, X., Guan, N., Tao, D., Qiu, X. & Luo, Z. 2015, 'Online multi-modal robust non-negative dictionary learning for visual tracking', PLoS ONE, vol. 10, no. 5, p. e0124685.
Dictionary learning is a method of acquiring a collection of atoms for subsequent signal representation. Due to its excellent representation ability, dictionary learning has been widely applied in multimedia and computer vision.
However, conventional dictionary learning algorithms fail to deal with multi-modal datasets. In this paper, we propose an online multi-modal robust non-negative dictionary learning (OMRNDL) algorithm to overcome this deficiency. Notably, OMRNDL casts visual tracking as a dictionary learning problem under the particle filter framework and captures the intrinsic knowledge about the target from multiple visual modalities, e.g., pixel intensity and texture information. To this end, OMRNDL adaptively learns an individual dictionary, i.e., template, for each modality from available frames, and then represents new particles over all the learned dictionaries by minimizing the fitting loss of data based on M-estimation. The resultant representation coefficient can be viewed as the common semantic representation of particles across multiple modalities, and can be utilized to track the target. OMRNDL incrementally learns the dictionary and the coefficient of each particle by using multiplicative update rules to respectively guarantee their non-negativity constraints. Experimental results on a popular challenging video benchmark validate the effectiveness of OMRNDL for visual tracking in both quantity and quality.
Ding, C., Xu, C. & Tao, D. 2015, 'Multi-task pose-invariant face recognition', IEEE Transactions on Image Processing, vol. 24, no. 3, pp. 980-993.
Face images captured in unconstrained environments usually contain significant pose variation, which dramatically degrades the performance of algorithms designed to recognize frontal faces. This paper proposes a novel face identification framework capable of handling the full range of pose variations within ±90° of yaw. The proposed framework first transforms the original pose-invariant face recognition problem into a partial frontal face recognition problem. A robust patch-based face representation scheme is then developed to represent the synthesized partial frontal faces.
For each patch, a transformation dictionary is learnt under the proposed multitask learning scheme. The transformation dictionary transforms the features of different poses into a discriminative subspace. Finally, face matching is performed at patch level rather than at the holistic level. Extensive and systematic experimentation on FERET, CMU-PIE, and Multi-PIE databases shows that the proposed method consistently outperforms single-task-based baselines as well as state-of-the-art methods for the pose problem. We further extend the proposed algorithm for the unconstrained face verification problem and achieve top-level performance on the challenging LFW data set.
Zhao, L., Gao, X., Tao, D. & Li, X. 2015, 'Tracking Human Pose Using Max-Margin Markov Models', IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5274-5287.
© 2015 IEEE. We present a new method for tracking human pose by employing max-margin Markov models. Representing a human body by part-based models, such as pictorial structure, the problem of pose tracking can be modeled by a discrete Markov random field. Considering max-margin Markov networks provide an efficient way to deal with both structured data and strong generalization guarantees, it is thus natural to learn the model parameters using the max-margin technique. Since tracking human pose needs to couple limbs in adjacent frames, the model will introduce loops and will be intractable for learning and inference. Previous work has resorted to pose estimation methods, which discard temporal information by parsing frames individually. Alternatively, approximate inference strategies have been used, which can overfit to statistics of a particular data set. Thus, the performance and generalization of these methods are limited.
In this paper, we approximate the full model by introducing an ensemble of two tree-structured sub-models, Markov networks for spatial parsing and Markov chains for temporal parsing. Both models can be trained jointly using the max-margin technique, and an iterative parsing process is proposed to achieve the ensemble inference. We apply our model on three challenging data sets, which contain highly varied and articulated poses. Comprehensive experimental results demonstrate the superior performance of our method over the state-of-the-art approaches.
Zeng, X., Bian, W., Liu, W., Shen, J. & Tao, D. 2015, 'Dictionary Pair Learning on Grassmann Manifolds for Image Denoising', IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 4556-4569.
© 2015 IEEE. Image denoising is a fundamental problem in computer vision and image processing that holds considerable practical importance for real-world applications. The traditional patch-based and sparse coding-driven image denoising methods convert 2D image patches into 1D vectors for further processing. Thus, these methods inevitably break down the inherent 2D geometric structure of natural images. To overcome this limitation pertaining to the previous image denoising methods, we propose a 2D image denoising model, namely, the dictionary pair learning (DPL) model, and we design a corresponding algorithm called the DPL on the Grassmann-manifold (DPLG) algorithm. The DPLG algorithm first learns an initial dictionary pair (i.e., the left and right dictionaries) by employing a subspace partition technique on the Grassmann manifold, wherein the refined dictionary pair is obtained through a sub-dictionary pair merging. The DPLG obtains a sparse representation by encoding each image patch only with the selected sub-dictionary pair. The non-zero elements of the sparse representation are further smoothed by the graph Laplacian operator to remove the noise.
Consequently, the DPLG algorithm not only preserves the inherent 2D geometric structure of natural images but also performs manifold smoothing in the 2D sparse coding space. Experimental evaluations on the benchmark images and the Berkeley segmentation data sets demonstrate that the DPLG algorithm also improves the structural similarity (SSIM) values, a measure of perceptual visual quality, of the denoised images. Moreover, the DPLG also produces peak signal-to-noise ratio values competitive with those of popular image denoising algorithms.
Huang, K., Wang, C. & Tao, D. 2015, 'High-Order Topology Modeling of Visual Words for Image Classification', IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3598-3608.
© 1992-2012 IEEE. Modeling the relationship between visual words in feature encoding is important in image classification. Recent methods consider this relationship in either image or feature space, and most of them incorporate only pairwise relationships (between visual words). However, in situations involving large variability in images, one cannot capture intrinsic invariance of intra-class images using low-order pairwise relationships. The result is not robust to larger variations in images. In addition, as the number of potential pairings grows exponentially with the number of visual words, the task of learning becomes computationally expensive. To overcome these two limitations, we propose an efficient classification framework that exploits high-order topology of visual words in the feature space, as follows. First, we propose a search algorithm that seeks dependence between the visual words. This dependence is used to construct higher order topology in the feature space. Then, the local features are encoded according to this higher order topology to improve the image classification.
Experiments involving four common data sets, namely PASCAL VOC 2007, 15 Scenes, Caltech 101, and UIUC Sport Event, demonstrate that the dependence search significantly improves the efficiency of higher order topological construction, and consequently improves image classification performance on all these data sets.
Gao, F., Tao, D., Gao, X. & Li, X. 2015, 'Learning to Rank for Blind Image Quality Assessment', IEEE Transactions on Neural Networks and Learning Systems.
Blind image quality assessment (BIQA) aims to predict perceptual image quality scores without access to reference images. State-of-the-art BIQA methods typically require subjects to score a large number of images to train a robust model. However, subjective quality scores are imprecise, biased, and inconsistent, and it is challenging to obtain a large-scale database, or to extend existing databases, because of the inconvenience of collecting images, training the subjects, conducting subjective experiments, and realigning human quality evaluations. To combat these limitations, this paper explores and exploits preference image pairs (PIPs), such as "the quality of image Ia is better than that of image Ib", for training a robust BIQA model. The preference label, representing the relative quality of two images, is generally precise and consistent, and is not sensitive to image content, distortion type, or subject identity; such PIPs can be generated at a very low cost. The proposed BIQA method is one of learning to rank. We first formulate the problem of learning the mapping from the image features to the preference label as one of classification. In particular, we investigate the utilization of a multiple kernel learning algorithm based on group lasso to provide a solution. A simple but effective strategy to estimate perceptual image quality scores is then presented.
Experiments show that the proposed BIQA method is highly effective and achieves a performance comparable with that of state-of-the-art BIQA algorithms. Moreover, the proposed method can be easily extended to new distortion categories.
Liu, X., Song, M., Tao, D., Bu, J. & Chen, C. 2015, 'Random Geometric Prior Forest for Multiclass Object Segmentation', IEEE Transactions on Image Processing, vol. 24, no. 10, pp. 3060-3070.
© 1992-2012 IEEE. Recent advances in object detection have led to the development of segmentation by detection approaches that integrate top-down geometric priors for multiclass object segmentation. A key yet under-addressed issue in utilizing top-down cues for the problem of multiclass object segmentation by detection is efficiently generating robust and accurate geometric priors. In this paper, we propose a random geometric prior forest scheme to obtain object-adaptive geometric priors efficiently and robustly. In the scheme, a testing object first searches for training neighbors with similar geometries using the random geometric prior forest, and then the geometry of the testing object is reconstructed by linearly combining the geometries of its neighbors. Our scheme enjoys several favorable properties when compared with conventional methods. First, it is robust and very fast because its inference does not suffer from bad initializations, poor local minima or complex optimization. Second, the figure/ground geometries of training samples are utilized in a multitask manner. Third, our scheme is object-adaptive but does not require the labeling of parts or poselets, and thus, it is quite easy to implement. To demonstrate the effectiveness of the proposed scheme, we integrate the obtained top-down geometric priors with conventional bottom-up color cues in the graph cut framework.
The proposed random geometric prior forest achieves the best segmentation results of all of the methods tested on VOC2010/2012 and is 90 times faster than the current state-of-the-art method.
Song, D., Liu, W., Zhou, T., Tao, D. & Meyer, D.A. 2015, 'Efficient Robust Conditional Random Fields', IEEE Transactions on Image Processing, vol. 24, no. 10, pp. 3124-3136.
© 1992-2012 IEEE. Conditional random fields (CRFs) are a flexible yet powerful probabilistic approach and have shown advantages for popular applications in various areas, including text analysis, bioinformatics, and computer vision. Traditional CRF models, however, are incapable of selecting relevant features as well as suppressing noise from noisy original features. Moreover, conventional optimization methods often converge slowly in solving the training procedure of CRFs, and will degrade significantly for tasks with a large number of samples and features. In this paper, we propose robust CRFs (RCRFs) to simultaneously select relevant features. An optimal gradient method (OGM) is further designed to train RCRFs efficiently. Specifically, the proposed RCRFs employ the ℓ1 norm of the model parameters to regularize the objective used by traditional CRFs, therefore enabling discovery of the relevant unary features and pairwise features of CRFs. In each iteration of OGM, the gradient direction is determined jointly by the current gradient together with the historical gradients, and the Lipschitz constant is leveraged to specify the proper step size. We show that an OGM can tackle the RCRF model training very efficiently, achieving the optimal convergence rate O(1/k²) (where k is the number of iterations). This convergence rate is theoretically superior to the convergence rate O(1/k) of previous first-order optimization methods.
Extensive experiments performed on three practical image segmentation tasks demonstrate the efficacy of OGM in training our proposed RCRFs.
Luo, Y., Liu, T., Tao, D. & Xu, C. 2015, 'Multiview matrix completion for multilabel image classification', IEEE Transactions on Image Processing, vol. 24, no. 8, pp. 2355-2368.
© 2015 IEEE. There is growing interest in multilabel image classification due to its critical role in web-based image analytics-based applications, such as large-scale image retrieval and browsing. Matrix completion (MC) has recently been introduced as a method for transductive (semisupervised) multilabel classification, and has several distinct advantages, including robustness to missing data and background noise in both feature and label space. However, it is limited by only considering data represented by a single-view feature, which cannot precisely characterize images containing several semantic concepts. To utilize multiple features taken from different views, we have to concatenate the different features as a long vector. However, this concatenation is prone to over-fitting and often leads to very high time complexity in MC-based image classification. Therefore, we propose to combine the MC outputs of different views with weights, and present the multiview MC (MVMC) framework for transductive multilabel image classification. To learn the view combination weights effectively, we apply a cross-validation strategy on the labeled set. In particular, MVMC splits the labeled set into two parts, and predicts the labels of one part using the known labels of the other part. The predicted labels are then used to learn the view combination coefficients. In the learning process, we adopt the average precision (AP) loss, which is particularly suitable for multilabel image classification, since the ranking-based criteria are critical for evaluating a multilabel classification system.
A least squares loss formulation is also presented for the sake of efficiency, and the robustness of the algorithm based on the AP loss compared with the other losses is investigated. Experimental evaluation on two real-world data sets (PASCAL VOC'07 and MIR Flickr) demonstrates the effectiveness of MVMC for transductive (semisupervised) multilabel image classification, and shows that MVMC can e...
Jiang, X., You, X., Yu, S., Tao, D., Chen, C.L.P. & Cheung, Y.M. 2015, 'Variance constrained partial least squares', Chemometrics and Intelligent Laboratory Systems, vol. 145, pp. 60-71.
© 2015 Elsevier B.V. Partial least squares (PLS) regression has achieved desirable performance for modeling the relationship between a set of dependent (response) variables and another set of independent (predictor) variables, especially when the sample size is small relative to the dimension of these variables. In each iteration, PLS finds two latent variables from a set of dependent and independent variables via maximizing the product of three factors: the variances of the two latent variables and the square of the correlation between these two latent variables. In this paper, we derive the mathematical formulation of the relationship between mean square error (MSE) and these three factors. We find that MSE is not monotonic in the product of the three factors. However, the corresponding optimization problem is difficult to solve if we extract the optimal latent variables directly based on this relationship. To address these problems, a novel multilinear regression model, variance constrained partial least squares (VCPLS), is proposed.
In the proposed VCPLS, we find the latent variables via maximizing the product of the variance of the latent variable from dependent variables and the square of the correlation between the two latent variables, while constraining the variance of the latent variable from independent variables to be larger than a predetermined threshold. The corresponding optimization problem can be solved efficiently, and the latent variables extracted by VCPLS are near-optimal. Compared with classical PLS and its variants, VCPLS can achieve lower prediction error in the sense of MSE. The experiments are conducted on three near-infrared spectroscopy (NIR) data sets. To demonstrate the applicability of our proposed VCPLS, we also conducted experiments on another data set, which has different characteristics from NIR data. Experimental results verified the superiority of our proposed VCPLS.
Hong, C., Yu, J., Tao, D. & Wang, M. 2015, 'Image-based three-dimensional human pose recovery by multiview locality-sensitive sparse retrieval', IEEE Transactions on Industrial Electronics, vol. 62, no. 6, pp. 3742-3751.
© 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. Image-based 3-D human pose recovery is usually conducted by retrieving relevant poses with image features. However, it suffers from the high dimensionality of image features and the low efficiency of the retrieving process. Particularly for multiview data, the integration of different types of features is difficult. In this paper, a novel approach is proposed to recover 3-D human poses from silhouettes. This approach improves traditional methods by adopting multiview locality-sensitive sparse coding in the retrieving process. First, it incorporates a local similarity preserving term into the objective of sparse coding, which groups similar silhouettes to alleviate the instability of sparse codes.
Second, the objective function of sparse coding is improved by integrating multiview data. The experimental results show that the retrieval error has been reduced by 20% to 50%, which demonstrates the effectiveness of the proposed method.
Liu, X., Tao, D., Song, M., Zhang, L., Bu, J. & Chen, C. 2015, 'Learning to track multiple targets', IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 5, pp. 1060-1073.
Monocular multiple-object tracking is a fundamental yet under-addressed computer vision problem. In this paper, we propose a novel learning framework for tracking multiple objects by detection. First, instead of heuristically defining a tracking algorithm, we learn from labeled video data a discriminative structure prediction model that captures the interdependence of multiple influence factors. Given the joint targets state from the last time step and the observation at the current frame, the joint targets state at the current time step can then be inferred by maximizing the joint probability score. Second, our detection results benefit from tracking cues. Traditional detection algorithms need a nonmaximal suppression postprocessing step to select a subset of the total detection responses as the final output, and a large number of selection mistakes are induced, especially in congested scenes. Our method integrates both detection and tracking cues. This integration helps to decrease the risk of postprocessing mistakes and to improve performance in tracking. Finally, we formulate the entire model training as a convex optimization problem and estimate its parameters using cutting plane optimization. Experiments show that our method performs effectively in a large variety of scenarios, including pedestrian tracking in crowd scenes and vehicle tracking in congested traffic.
Zhang, K., Tao, D., Gao, X., Li, X. & Xiong, Z. 2015, 'Learning multiple linear mappings for efficient single image super-resolution', IEEE Transactions on Image Processing, vol. 24, no. 3, pp. 846-861.
© 1992-2012 IEEE. Example learning-based superresolution (SR) algorithms show promise for restoring a high-resolution (HR) image from a single low-resolution (LR) input. The most popular approaches, however, are either time- or space-intensive, which limits their practical applications in many resource-limited settings. In this paper, we propose a novel computationally efficient single image SR method that learns multiple linear mappings (MLM) to directly transform LR feature subspaces into HR subspaces. In particular, we first partition the large nonlinear feature space of LR images into a cluster of linear subspaces. Multiple LR subdictionaries are then learned, followed by inferring the corresponding HR subdictionaries based on the assumption that the LR-HR features share the same representation coefficients. We establish MLM from the input LR features to the desired HR outputs in order to achieve fast yet stable SR recovery. Furthermore, in order to suppress displeasing artifacts generated by the MLM-based method, we apply a fast nonlocal means algorithm to construct a simple yet effective similarity-based regularization term for SR enhancement. Experimental results indicate that our approach is both quantitatively and qualitatively superior to other application-oriented SR methods, while maintaining relatively low time and space complexity.
Gao, Y., Shi, M., Tao, D. & Xu, C. 2015, 'Database saliency for fast image retrieval', IEEE Transactions on Multimedia, vol. 17, no. 3, pp. 359-369.
© 2014 IEEE. The bag-of-visual-words (BoW) model is effective for representing images and videos in many computer vision problems, and achieves promising performance in image retrieval.
Nevertheless, the level of retrieval efficiency in a large-scale database is not acceptable for practical usage. Considering that the relevant images in the database of a given query are more likely to be distinctive than ambiguous, this paper defines "database saliency" as the distinctiveness score calculated for every image to measure its overall "saliency" in the database. By taking advantage of database saliency, we propose a saliency-inspired fast image retrieval scheme, S-sim, which significantly improves efficiency while retains state-of-the-art accuracy in image retrieval. There are two stages in S-sim: the bottom-up saliency mechanism computes the database saliency value of each image by hierarchically decomposing a posterior probability into local patches and visual words, the concurrent information of visual words is then bottom-up propagated to estimate the distinctiveness, and the top-down saliency mechanism discriminatively expands the query via a very low-dimensional linear SVM trained on the top-ranked images after initial search, ranking images are then sorted on their distances to the decision boundary as well as the database saliency values. We comprehensively evaluate S-sim on common retrieval benchmarks, e.g., Oxford and Paris datasets. Thorough experiments suggest that, because of the offline database saliency computation and online low-dimensional SVM, our approach significantly speeds up online retrieval and outperforms the state-of-the-art BoW-based image retrieval schemes. Ding, C. & Tao, D. 2015, 'Robust Face Recognition via Multimodal Deep Face Representation', IEEE Transactions on Multimedia, vol. 17, no. 11, pp. 2049-2058. View/Download from: UTS OPUS or Publisher's site &copy; 2015 IEEE. 
Face images appearing in multimedia applications, e.g., social networks and digital entertainment, usually exhibit dramatic pose, illumination, and expression variations, resulting in considerable performance degradation for traditional face recognition algorithms. This paper proposes a comprehensive deep learning framework to jointly learn face representation using multimodal information. The proposed deep learning structure is composed of a set of elaborately designed convolutional neural networks (CNNs) and a three-layer stacked auto-encoder (SAE). The set of CNNs extracts complementary facial features from multimodal data. Then, the extracted features are concatenated to form a high-dimensional feature vector, whose dimension is compressed by SAE. All of the CNNs are trained using a subset of 9,000 subjects from the publicly available CASIA-WebFace database, which ensures the reproducibility of this work. Using the proposed single CNN architecture and limited training data, 98.43% verification rate is achieved on the LFW database. Benefitting from the complementary information contained in multimodal data, our small ensemble system achieves higher than 99.0% recognition rate on LFW using publicly available training set. He, X., Luo, S., Tao, D., Xu, C. & Yang, J. 2015, 'The 21st International Conference on MultiMedia Modeling', IEEE Multimedia, vol. 22, no. 2, pp. 86-88. View/Download from: UTS OPUS or Publisher's site &copy; 2015 IEEE. This report on The 21st International Conference on MultiMedia Modeling provides an overview of the best papers and keynote presentations. It also reviews the special sessions on Personal (Big) Data Modeling for Information Access and Retrieval; Social Geo-Media Analytics and Retrieval; and Image or Video Processing, Semantic Analysis, and Understanding. Dong, Y., Tao, D. & Li, X. 2015, 'Nonnegative multiresolution representation-based texture image classification', ACM Transactions on Intelligent Systems and Technology, vol. 
7, no. 1. View/Download from: Publisher's site Copyright &copy; 2015 ACM. Effective representation of image texture is important for an image-classification task. Statistical modelling in wavelet domains has been widely used to image texture representation. However, due to the intraclass complexity and interclass diversity of textures, it is hard to use a predefined probability distribution function to fit adaptively all wavelet subband coefficients of different textures. In this article, we propose a novel modelling approach, Heterogeneous and Incrementally Generated Histogram (HIGH), to indirectly model the wavelet coefficients by use of four local features in wavelet subbands. By concatenating all the HIGHs in allwavelet subbands of a texture, we can construct a nonnegative multiresolution vector (NMV) to represent a texture image. Considering the NMV's high dimensionality and nonnegativity, we further propose a Hessian regularized discriminative nonnegative matrix factorization to compute a low-dimensional basis of the linear subspace of NMVs. Finally, we present a texture classification approach by projecting NMVs on the lowdimensional basis. Experimental results show that our proposed texture classification method outperforms seven representative approaches. Wang, Y., Song, M., Tao, D., Rui, Y., Bu, J., Tsoi, A.C., Zhuo, S. & Tan, P. 2015, 'Where2Stand: A human position recommendation system for souvenir photography', ACM Transactions on Intelligent Systems and Technology, vol. 7, no. 1. View/Download from: Publisher's site &copy; 2015 ACM 2157-6904/2015/09-ART915.00. People often take photographs at tourist sites and these pictures usually have two main elements: a person in the foreground and scenery in the background. This type of "souvenir photo" is one of the most common photos clicked by tourists. Although algorithms that aid a user-photographer in taking a well-composed picture of a scene exist [Ni et al. 
2013], few studies have addressed the issue of properly positioning human subjects in photographs. In photography, the common guidelines of composing portrait images exist. However, these rules usually do not consider the background scene. Therefore, in this article, we investigate human-scenery positional relationships and construct a photographic assistance system to optimize the position of human subjects in a given background scene, thereby assisting the user in capturing high-quality souvenir photos. We collect thousands of well-composed portrait photographs to learn human-scenery aesthetic composition rules. In addition, we define a set of negative rules to exclude undesirable compositions. Recommendation results are achieved by combining the first learned positive rule with our proposed negative rules. We implement the proposed system on an Android platform in a smartphone. The system demonstrates its efficacy by producing well-composed souvenir photos.
Mei, X., Hong, Z., Prokhorov, D. & Tao, D. 2015, 'Robust Multitask Multiview Tracking in Videos', IEEE Transactions on Neural Networks and Learning Systems.
Various sparse-representation-based methods have been proposed to solve tracking problems, and most of them employ least squares (LS) criteria to learn the sparse representation. In many tracking scenarios, traditional LS-based methods may not perform well owing to the presence of heavy-tailed noise. In this paper, we present a tracking approach using an approximate least absolute deviation (LAD)-based multitask multiview sparse learning method to gain the robustness of LAD and take advantage of multiple types of visual features, such as intensity, color, and texture. The proposed method is integrated in a particle filter framework, where learning the sparse representation for each view of a single particle is regarded as an individual task. The underlying relationship between tasks across different views and different particles is jointly exploited in a unified robust multitask formulation based on LAD. In addition, to capture the frequently emerging outlier tasks, we decompose the representation matrix into two collaborative components that enable a more robust and accurate approximation. We show that the proposed formulation can be effectively approximated by Nesterov's smoothing method and efficiently solved using the accelerated proximal gradient method. The presented tracker is implemented using four types of features and is tested on numerous synthetic sequences and real-world video sequences, including the CVPR2013 tracking benchmark and the ALOV++ data set. Both the qualitative and quantitative results demonstrate the superior performance of the proposed approach compared with several state-of-the-art trackers.
Hussain, A., Tao, D., Wu, J. & Zhao, D. 2015, 'Computational Intelligence for Changing Environments [Guest Editorial]', IEEE Computational Intelligence Magazine, vol. 10, no. 4, pp. 10-11.
Wang, B., Gao, X., Li, J., Li, X. & Tao, D. 2015, 'A level set method with shape priors by using locality preserving projections', Neurocomputing, vol. 170, pp. 188-200.
© 2015 Elsevier B.V. A novel level set method (LSM) with the constraint of shape priors is proposed to implement selective image segmentation. Firstly, the shape priors are aligned by using image moments to remove spatially related information. Secondly, the aligned shape priors are projected into the subspace spanned by locality preserving projections to measure the similarity between the shapes. Finally, a new energy functional is built by combining data-driven and shape-driven energy terms to implement a selective image segmentation method. We assess the proposed method and some representative LSMs on synthetic, medical and natural images; the results suggest that the proposed method is superior to the pure data-driven LSMs and the representative LSMs with shape priors.
Liu, T. & Tao, D. 2015, 'On the Performance of Manhattan Nonnegative Matrix Factorization', IEEE Transactions on Neural Networks and Learning Systems.
Extracting low-rank and sparse structures from matrices has been extensively studied in machine learning, compressed sensing, and conventional signal processing, and has been widely applied to recommendation systems, image reconstruction, visual analytics, and brain signal processing. Manhattan nonnegative matrix factorization (MahNMF) is an extension of the conventional NMF, which models the heavy-tailed Laplacian noise by minimizing the Manhattan distance between a nonnegative matrix X and the product of two nonnegative low-rank factor matrices. Fast algorithms have been developed to restore the low-rank and sparse structures of X in the MahNMF. In this paper, we study the statistical performance of the MahNMF in the frame of the statistical learning theory. We decompose the expected reconstruction error of the MahNMF into the estimation error and the approximation error. The estimation error is bounded by the generalization error bounds of the MahNMF, while the approximation error is analyzed using the asymptotic results of the minimum distortion of vector quantization. The generalization error bound is valuable for determining the size of the training sample needed to guarantee a desirable upper bound for the defect between the expected and empirical reconstruction errors. Statistical performance analysis shows how the reduced dimensionality affects the estimation and approximation errors. Our framework can also be used for analyzing the performance of the NMF.
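The contrast between the Manhattan objective and the conventional squared-error NMF objective described above can be illustrated with a minimal sketch. The function names and the toy matrices below are assumptions made for illustration only; this is not the authors' algorithm, just the two reconstruction errors evaluated on a matrix with one corrupted entry.

```python
import numpy as np

def manhattan_error(X, W, H):
    """Entrywise L1 (Manhattan) distance between X and the product W @ H."""
    return np.abs(X - W @ H).sum()

def squared_error(X, W, H):
    """Squared Frobenius distance, the loss of conventional NMF."""
    return np.square(X - W @ H).sum()

# Hypothetical nonnegative factors and an exactly low-rank matrix built from them.
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
H = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
X = W @ H

# Corrupt a single entry with a large outlier.
X_noisy = X.copy()
X_noisy[0, 0] += 10.0

l1 = manhattan_error(X_noisy, W, H)   # grows linearly with the outlier size
l2 = squared_error(X_noisy, W, H)     # grows quadratically with the outlier size
print(l1, l2)
```

Because the Manhattan error grows only linearly with the size of an outlier, a single corrupted entry influences the fit less than it would under the squared error, which is the usual motivation for using it with heavy-tailed noise.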
Li, Z., Gong, D., Li, X. & Tao, D. 2015, 'Learning compact feature descriptor and adaptive matching framework for face recognition', IEEE Transactions on Image Processing, vol. 24, no. 9, pp. 2736-2745.
© 2015 IEEE. Dense feature extraction is becoming increasingly popular in face recognition tasks. Systems based on this approach have demonstrated impressive performance in a range of challenging scenarios. However, improvements in discriminative power come at a computational cost and with a risk of over-fitting. In this paper, we propose a new approach to dense feature extraction for face recognition, which consists of two steps. First, an encoding scheme is devised that compresses high-dimensional dense features into a compact representation by maximizing the intrauser correlation. Second, we develop an adaptive feature matching algorithm for effective classification. This matching method, in contrast to the previous methods, constructs and chooses a small subset of training samples for adaptive matching, resulting in further performance gains. Experiments using several challenging face databases, including the Labeled Faces in the Wild data set, Morph Album 2, CUHK optical-infrared, and FERET, demonstrate that the proposed approach consistently outperforms the current state of the art.
Cheng, J., Yin, F., Wong, D.W., Tao, D. & Liu, J. 2015, 'Sparse dissimilarity-constrained coding for glaucoma screening', IEEE Transactions on Biomedical Engineering, vol. 62, no. 5, pp. 1395-1403.
Glaucoma is an irreversible chronic eye disease that leads to vision loss. As it can be slowed down through treatment, detecting the disease in time is important. However, many patients are unaware of the disease because it progresses slowly without easily noticeable symptoms. Currently, there is no effective method for low-cost population-based glaucoma detection or screening. Recent studies have shown that automated optic nerve head assessment from 2-D retinal fundus images is promising for low-cost glaucoma screening. In this paper, we propose a method for cup to disc ratio (CDR) assessment using 2-D retinal fundus images. In the proposed method, the optic disc is first segmented and reconstructed using a novel sparse dissimilarity-constrained coding (SDC) approach which considers both the dissimilarity constraint and the sparsity constraint from a set of reference discs with known CDRs. Subsequently, the reconstruction coefficients from the SDC are used to compute the CDR for the testing disc. The proposed method has been tested for CDR assessment in a database of 650 images with CDRs previously measured manually by trained professionals. Experimental results show an average CDR error of 0.064 and a correlation coefficient of 0.67 compared with the manual CDRs, better than the state-of-the-art methods. Our proposed method has also been tested for glaucoma screening. The method achieves areas under curve of 0.83 and 0.88 on datasets of 650 and 1676 images, respectively, outperforming other methods. The proposed method achieves good accuracy for glaucoma detection. The method has great potential to be used for large-scale population-based glaucoma screening.
Liu, L. & Tao, D. 2015, 'Review on recent method of solving lasso problem', Shuju Caiji Yu Chuli/Journal of Data Acquisition and Processing, vol. 30, no. 1, pp. 35-46.
© 2015 by Journal of Data Acquisition and Processing. With the growth of big data, solving the Lasso problem has become a major research topic. Earlier methods cannot meet the time and efficiency requirements of big-data settings. To deal with the computational and storage difficulties caused by large-scale, high-dimensional data, this paper analyzes recent Lasso algorithms from three aspects: first-order methods, randomization, and parallel and distributed computation, all of which play important roles in handling large-scale data. Based on these three aspects, this paper introduces and analyzes three families of novel algorithms: the gradient descent method, the alternating direction method of multipliers (ADMM), and the coordinate descent method. The gradient descent method combines first-order methods with Nesterov's acceleration and smoothing techniques; recent ADMM research incorporates randomized algorithms; and coordinate descent exploits the coordinate structure of the problem, combining first-order methods, randomization, and parallel and distributed computation. Moreover, this paper provides an in-depth analysis from the perspective of the primal and dual objective functions.
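As an illustration of the coordinate descent family mentioned in this review, the following is a minimal sketch of plain cyclic coordinate descent for the Lasso objective 0.5*||A w - b||^2 + lam*||w||_1. It is a textbook variant rather than any specific algorithm from the paper; the function names, the iteration count, and the toy data are assumptions, and the randomized and parallel variants the review discusses are not shown.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(A, b, lam, n_iters=200):
    """Minimize 0.5*||A w - b||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, d = A.shape
    w = np.zeros(d)
    col_sq = (A ** 2).sum(axis=0)      # squared norm of each column
    r = b - A @ w                      # current residual
    for _ in range(n_iters):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue               # an all-zero column carries no information
            # Correlation of column j with the residual that excludes w[j]'s own contribution.
            rho = A[:, j] @ r + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            r += A[:, j] * (w[j] - w_new)   # keep the residual in sync with w
            w[j] = w_new
    return w

# With an identity design matrix the solution is the soft-thresholded target.
w_hat = lasso_coordinate_descent(np.eye(3), np.array([3.0, -0.5, 1.5]), lam=1.0)
print(w_hat)
```

Each coordinate update has a closed form, so only one column of the design matrix is touched at a time; this is the property that makes the approach attractive for large, high-dimensional problems.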
Wang, S., Tao, D. & Yang, J. 2015, 'Relative Attribute SVM+ Learning for Age Estimation', IEEE Transactions on Cybernetics.
When estimating age, human experts can provide privileged information that encodes the facial attributes of aging, such as smoothness, face shape, face acne, wrinkles, and bags under the eyes. In automatic age estimation, privileged information is unavailable for test images. To overcome this problem, we hypothesize that asymmetric information can be explored and exploited to improve the generalizability of the trained model. Using the learning using privileged information (LUPI) framework, we tested this hypothesis by carefully defining relative attributes for support vector machine (SVM+) to improve the performance of age estimation. We term this specific setting relative attribute SVM+ (raSVM+), in which the privileged information enables separation of outliers from inliers at the training stage and effectively manipulates slack variables and age determination errors during model training, and thus guides the trained predictor toward a generalizable solution. Experimentally, the superiority of raSVM+ was confirmed by comparing it with state-of-the-art algorithms on the face and gesture recognition research network (FG-NET) and craniofacial longitudinal morphological face aging databases. raSVM+ is a promising development that improves age estimation, with the mean absolute error reaching 4.07 on FG-NET.
Yu, J., Tao, D., Wang, M. & Rui, Y. 2015, 'Learning to Rank Using User Clicks and Visual Features for Image Retrieval', IEEE Transactions on Cybernetics, vol. 45, no. 4, pp. 767-779.
© 2013 IEEE. The inconsistency between textual features and visual contents can cause poor image search results. To solve this problem, click features, which are more reliable than textual information in justifying the relevance between a query and clicked images, are adopted in the image ranking model. However, the existing ranking model cannot integrate visual features, which are efficient in refining the click-based search results. In this paper, we propose a novel ranking model based on the learning to rank framework. Visual features and click features are simultaneously utilized to obtain the ranking model. Specifically, the proposed approach is based on large margin structured output learning and the visual consistency is integrated with the click features through a hypergraph regularizer term. In accordance with the fast alternating linearization method, we design a novel algorithm to optimize the objective function. This algorithm alternately minimizes two different approximations of the original objective function by keeping one function unchanged and linearizing the other. We conduct experiments on a large-scale dataset collected from the Microsoft Bing image search engine, and the results demonstrate that the proposed learning to rank model based on visual features and user clicks outperforms state-of-the-art algorithms.
Shi, M., Sun, X., Tao, D., Xu, C., Baciu, G. & Liu, H. 2015, 'Exploring spatial correlation for visual object retrieval', ACM Transactions on Intelligent Systems and Technology, vol. 6, no. 2.
© 2015 ACM. Bag-of-visual-words (BOVW)-based image representation has received intense attention in recent years and has improved content-based image retrieval (CBIR) significantly. BOVW does not consider the spatial correlation between visual words in natural images and thus biases the generated visual words toward noise when the corresponding visual features are not stable. This article outlines the construction of a visual word co-occurrence matrix by exploring visual word co-occurrence extracted from small affine-invariant regions in a large collection of natural images. Based on this co-occurrence matrix, we first present a novel high-order predictor to accelerate the generation of spatially correlated visual words and a penalty tree (PTree) to continue generating the words after the prediction. Subsequently, we propose two methods of co-occurrence weighting similarity measure for image ranking: Co-Cosine and Co-TFIDF. These two new schemes downweight the contributions of the words that are less discriminative because of frequent co-occurrences with other words. We conduct experiments on Oxford and Paris Building datasets, in which the ImageNet dataset is used to implement a large-scale evaluation. Cross-dataset evaluations between the Oxford and Paris datasets and Oxford and Holidays datasets are also provided. Thorough experimental results suggest that our method outperforms the state of the art without adding much additional cost to the BOVW model.
Tian, X., Yang, L., Lu, Y., Tian, Q. & Tao, D. 2015, 'Image Search Reranking With Hierarchical Topic Awareness', IEEE Transactions on Cybernetics, vol. 45, no. 10, pp. 2177-2189.
With much attention from both academia and industrial communities, visual search reranking has recently been proposed to refine image search results obtained from text-based image search engines. Most of the traditional reranking methods either cannot capture both relevance and diversity of the search results at the same time, or ignore the hierarchical topic structure of the search results, treating each topic equally and independently. However, in real applications, images returned for certain queries are naturally in hierarchical organization, rather than simple parallel relation. In this paper, a new reranking method, topic-aware reranking (TARerank), is proposed. TARerank describes the hierarchical topic structure of search results in one model, and seamlessly captures both relevance and diversity of the image search results simultaneously. Through a structured learning framework, relevance and diversity are modeled in TARerank by a set of carefully designed features, and then the model is learned from human-labeled training samples. The learned model is expected to predict reranking results with high relevance and diversity for testing queries. To verify the effectiveness of the proposed method, we collect an image search dataset and conduct comparison experiments on it. The experimental results demonstrate that the proposed TARerank outperforms the existing relevance-based and diversified reranking methods.
Liu, X., Song, M., Tao, D., Liu, Z., Zhang, L., Chen, C. & Bu, J. 2015, 'Random forest construction with robust semisupervised node splitting', IEEE Transactions on Image Processing, vol. 24, no. 1, pp. 471-483.
Random forest (RF) is a very important classifier with applications in various machine learning tasks, but its promising performance heavily relies on the size of labeled training data. In this paper, we investigate the construction of RFs with a small amount of labeled data and find that the performance bottleneck is located in the node splitting procedures; hence, existing solutions fail to properly partition the feature space if there are insufficient training data. To achieve robust node splitting with insufficient data, we present semisupervised splitting, which overcomes this limitation by splitting nodes with the guidance of both labeled and abundant unlabeled data. In particular, an accurate quality measure of node splitting is obtained by carrying out kernel-based density estimation, whereby a multiclass version of the asymptotic mean integrated squared error criterion is proposed to adaptively select the optimal bandwidth of the kernel. To avoid the curse of dimensionality, we project the data points from the original high-dimensional feature space onto a low-dimensional subspace before estimation. A unified optimization framework is proposed to select a coupled pair of subspace and separating hyperplane such that the smoothness of the subspace and the quality of the splitting are guaranteed simultaneously. Our algorithm efficiently avoids overfitting caused by bad initialization and local maxima when compared with conventional margin maximization-based semisupervised methods. We demonstrate the effectiveness of the proposed algorithm by comparing it with state-of-the-art supervised and semisupervised algorithms for typical computer vision applications, such as object categorization, face recognition, and image segmentation, on publicly available data sets.
Huang, Q., Wang, T., Tao, D. & Li, X. 2015, 'Biclustering Learning of Trading Rules', IEEE Transactions on Cybernetics, vol. 19, no. 5, pp. 644-658.
Zhang, L., Zhang, L., Tao, D. & Du, B. 2015, 'A sparse and discriminative tensor to vector projection for human gait feature representation', Signal Processing, vol. 106, pp. 245-252.
In this paper, we introduce an efficient tensor to vector projection algorithm for human gait feature representation and recognition. The proposed approach is based on the multi-dimensional or tensor signal processing technology, which finds a low-dimensional tensor subspace of original input gait sequence tensors while most of the data variation has been well captured. In order to further enhance the class separability and avoid the potential overfitting, we adopt a discriminative locality preserving projection with sparse regularization to transform the refined tensor data to the final vector feature representation for subsequent recognition. Numerous experiments are carried out to evaluate the effectiveness of the proposed sparse and discriminative tensor to vector projection algorithm, and the proposed method achieves good performance for human gait recognition using the sequences from the University of South Florida (USF) HumanID Database. © 2014 Elsevier B.V.
Xu, C., Tao, D. & Xu, C. 2015, 'Multi-View Learning with Incomplete Views', IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5812-5825.
© 2015 IEEE. One underlying assumption of the conventional multi-view learning algorithms is that all examples can be successfully observed on all the views. However, due to various failures or faults in collecting and pre-processing the data on different views, we are more likely to be faced with an incomplete-view setting, where an example could be missing its representation on one view (i.e., missing view) or could be only partially observed on that view (i.e., missing variables). Low-rank assumption used to be effective for recovering the random missing variables of features, but it is disabled by concentrated missing variables and has no effect on missing views. This paper suggests that the key to handling the incomplete-view problem is to exploit the connections between multiple views, enabling the incomplete views to be restored with the help of the complete views. We propose an effective algorithm to accomplish multi-view learning with incomplete views by assuming that different views are generated from a shared subspace. To handle the large-scale problem and obtain fast convergence, we investigate a successive over-relaxation method to solve the objective function. Convergence of the optimization technique is theoretically analyzed. The experimental results on toy data and real-world data sets suggest that studying the incomplete-view problem in multi-view learning is significant and that the proposed algorithm can effectively handle the incomplete views in different applications.
Miao, J., Xu, X., Qiu, S., Qing, C. & Tao, D. 2015, 'Temporal Variance Analysis for Action Recognition', IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5904-5915.
© 2015 IEEE. Slow feature analysis (SFA) extracts slowly varying signals from input data and has been used to model complex cells in the primary visual cortex (V1). It transmits information to both ventral and dorsal pathways to process appearance and motion information, respectively. However, SFA only uses slowly varying features for local feature extraction, because they represent appearance information more effectively than motion information. To better utilize temporal information, we propose temporal variance analysis (TVA) as a generalization of SFA. TVA learns a linear transformation matrix that projects multidimensional temporal data to temporal components with temporal variance. Inspired by the function of V1, we learn receptive fields by TVA and apply convolution and pooling to extract local features. Embedded in the improved dense trajectory framework, TVA for action recognition is proposed to: 1) extract appearance and motion features from gray using slow and fast filters, respectively; 2) extract additional motion features using slow filters from horizontal and vertical optical flows; and 3) separately encode extracted local features with different temporal variances and concatenate all the encoded features as final features. We evaluate the proposed TVA features on several challenging data sets and show that both slow and fast features are useful in the low-level feature extraction. Experimental results show that the proposed TVA features outperform the conventional histogram-based features, and excellent results can be achieved by combining all TVA features.
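The idea of ordering linear components by how quickly they vary over time can be sketched with standard linear SFA. The sketch below is not the authors' TVA implementation: the function name, the mixing matrix, and the toy signals are assumptions. It whitens the data and then diagonalizes the covariance of the temporal differences, so the returned components run from slowest to fastest.

```python
import numpy as np

def temporal_components(X):
    """Linear SFA on a (T, d) time series.

    Returns a (d, d) projection whose columns are ordered from the slowest to
    the fastest component, and the temporal variance of each component.
    """
    Xc = X - X.mean(axis=0)
    # Whiten so that every projected component has unit variance.
    cov = Xc.T @ Xc / len(Xc)
    evals, evecs = np.linalg.eigh(cov)
    whitener = evecs / np.sqrt(evals)
    Z = Xc @ whitener
    # Temporal variance is the variance of the frame-to-frame differences.
    dZ = np.diff(Z, axis=0)
    dcov = dZ.T @ dZ / len(dZ)
    tvar, rot = np.linalg.eigh(dcov)   # eigenvalues in ascending order: slow first
    return whitener @ rot, tvar

# Toy data: a slow and a fast sinusoid, observed only through linear mixtures.
t = np.linspace(0.0, 2.0 * np.pi, 500)
slow, fast = np.sin(t), np.sin(25.0 * t)
X = np.column_stack([slow + 0.5 * fast, 0.3 * slow + fast])

P, tvar = temporal_components(X)
Y = (X - X.mean(axis=0)) @ P           # column 0 is the slowest, the last column the fastest
```

Classical SFA would keep only the first columns of `Y`; keeping the last columns as well corresponds loosely to the abstract's point that both slow and fast components can be useful.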
Qiao, M., Bian, W., Xu, R.Y.D. & Tao, D. 2015, 'Diversified Hidden Markov Models for Sequential Labeling', IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 11, pp. 2947-2960.
© 2015 IEEE. Labeling of sequential data is a prevalent meta-problem for a wide range of real world applications. While the first-order Hidden Markov Model (HMM) provides a fundamental approach for unsupervised sequential labeling, the basic model does not show satisfying performance when it is directly applied to real world problems, such as part-of-speech tagging (PoS tagging) and optical character recognition (OCR). Aiming at improving performance, important extensions of HMM have been proposed in the literature. One of the common key features in these extensions is the incorporation of proper prior information. In this paper, we propose a new extension of HMM, termed diversified Hidden Markov Models (dHMM), which utilizes a diversity-encouraging prior over the state-transition probabilities and thus facilitates more dynamic sequential labelings. Specifically, the diversity is modeled by a continuous determinantal point process prior, which we apply to both unsupervised and supervised scenarios. Learning and inference algorithms for dHMM are derived. Empirical evaluations on benchmark datasets for unsupervised PoS tagging and supervised OCR confirmed the effectiveness of dHMM, with performance competitive with the state of the art.
Zhao, L., Gao, X., Tao, D. & Li, X. 2015, 'Learning a tracking and estimation integrated graphical model for human pose tracking', IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 12, pp. 3176-3186.
© 2015 IEEE. We investigate the tracking of 2-D human poses in a video stream to determine the spatial configuration of body parts in each frame. This is not a trivial task, because people may wear different kinds of clothing and may move very quickly and unpredictably. The technology of pose estimation is typically applied, but it ignores the temporal context and cannot provide smooth, reliable tracking results. Therefore, we develop a tracking and estimation integrated model (TEIM) to fully exploit temporal information by integrating pose estimation with visual tracking. However, joint parsing of multiple articulated parts over time is difficult, because a full model with edges capturing all pairwise relationships within and between frames is loopy and intractable. Previous models usually resorted to approximate inference, which cannot guarantee good results and has a large computational cost. We overcome these problems by exploring the idea of divide and conquer, which decomposes the full model into two much simpler tractable submodels. In addition, a novel two-step iteration strategy is proposed to efficiently conquer the joint parsing problem. Algorithmically, we design TEIM very carefully so that: 1) it enables pose estimation and visual tracking to compensate for each other to achieve desirable tracking results; 2) it is able to deal with the problem of tracking loss; and 3) it only needs past information and is capable of tracking online. Experiments are conducted on two public data sets in the wild with ground truth layout annotations, and the experimental results indicate the effectiveness of the proposed TEIM framework.
Xu, C., Tao, D. & Xu, C. 2015, 'Multi-View Intact Space Learning', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 12, pp. 2531-2544.
© 2015 IEEE. It is practical to assume that an individual view is unlikely to be sufficient for effective multi-view learning. Therefore, integration of multi-view information is both valuable and necessary. In this paper, we propose the Multi-view Intact Space Learning (MISL) algorithm, which integrates the encoded complementary information in multiple views to discover a latent intact representation of the data. Even though each view on its own is insufficient, we show theoretically that by combining multiple views we can obtain abundant information for latent intact space learning. Employing the Cauchy loss (a technique used in statistical learning) as the error measurement strengthens robustness to outliers. We propose a new definition of multi-view stability and then derive the generalization error bound based on multi-view stability and Rademacher complexity, and show that the complementarity between multiple views is beneficial for the stability and generalization. MISL is efficiently optimized using a novel Iteratively Reweight Residuals (IRR) technique, whose convergence is theoretically analyzed. Experiments on synthetic data and real-world datasets demonstrate that MISL is an effective and promising algorithm for practical applications.
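To illustrate why a Cauchy loss is robust to outliers, here is a small sketch that estimates the center of a 1-D sample by generic iteratively reweighted averaging. It is an assumption-laden toy, not the MISL objective or the authors' IRR procedure: the names `cauchy_loss` and `cauchy_location`, the scale `c`, and the data are all hypothetical.

```python
import numpy as np

def cauchy_loss(r, c=1.0):
    """Cauchy loss log(1 + (r/c)^2); it grows only logarithmically for large residuals."""
    return np.log1p((r / c) ** 2)

def cauchy_location(x, c=1.0, n_iter=50):
    """Estimate a robust center of x by iteratively reweighted averaging."""
    mu = np.median(x)                       # robust starting point
    for _ in range(n_iter):
        r = x - mu
        w = 1.0 / (1.0 + (r / c) ** 2)      # weight implied by the Cauchy loss gradient
        mu = np.sum(w * x) / np.sum(w)      # weighted least-squares update
    return mu

# Toy data (assumed): inliers spread around 0 plus two gross outliers.
data = np.concatenate([np.linspace(-1.0, 1.0, 21), [100.0, 120.0]])
print(data.mean(), cauchy_location(data))   # the plain mean is pulled far away from 0
```

Each pass down-weights points with large residuals, so the two outliers contribute almost nothing to the final estimate.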
Hong, C., Yu, J., Wan, J., Tao, D. & Wang, M. 2015, 'Multimodal Deep Autoencoder for Human Pose Recovery', IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5659-5670.
© 2015 IEEE. Video-based human pose recovery is usually conducted by retrieving relevant poses using image features. In the retrieving process, the mapping between 2D images and 3D poses is assumed to be linear in most of the traditional methods. However, their relationships are inherently non-linear, which limits the recovery performance of these methods. In this paper, we propose a novel pose recovery method using non-linear mapping with a multi-layered deep neural network. It is based on feature extraction with multimodal fusion and back-propagation deep learning. In multimodal fusion, we construct a hypergraph Laplacian with low-rank representation. In this way, we obtain a unified feature description by standard eigen-decomposition of the hypergraph Laplacian matrix. In back-propagation deep learning, we learn a non-linear mapping from 2D images to 3D poses with parameter fine-tuning. The experimental results on three data sets show that the recovery error has been reduced by 20%-25%, which demonstrates the effectiveness of the proposed method.
Lu, Y., Xie, F., Liu, T., Jiang, Z. & Tao, D. 2015, 'No Reference Quality Assessment for Multiply-Distorted Images Based on an Improved Bag-of-Words Model', IEEE Signal Processing Letters, vol. 22, no. 10, pp. 1811-1815.
© 2015 IEEE. Multiple distortion assessment is a big challenge in image quality assessment (IQA). In this letter, a no-reference IQA model for multiply-distorted images is proposed. The features, which are sensitive to each distortion type even in the presence of other distortions, are first selected from three kinds of NSS features. An improved Bag-of-Words (BoW) model is then applied to encode the selected features. Lastly, a simple yet effective linear combination is used to map the image features to the quality score. The combination weights are obtained through lasso regression. A series of experiments shows that the feature selection strategy and the improved BoW model are effective in improving the accuracy of quality prediction for multiple distortion IQA. Compared with other algorithms, the proposed method delivers the best result for multiple distortion IQA.
Li, X., He, H., Wang, R. & Tao, D. 2015, 'Single Image Superresolution via Directional Group Sparsity and Directional Features', IEEE Transactions on Image Processing, vol. 24, no. 9, pp. 2874-2888.
© 2015 IEEE. Single image superresolution (SR) aims to construct a high-resolution version from a single low-resolution (LR) image. The SR reconstruction is challenging because of the missing details in the given LR image. Thus, it is critical to explore and exploit effective prior knowledge for boosting the reconstruction performance. In this paper, we propose a novel SR method by exploiting both the directional group sparsity of the image gradients and the directional features in similarity weight estimation. The proposed SR approach is based on two observations: 1) most of the sharp edges are oriented in a limited number of directions and 2) an image pixel can be estimated by the weighted averaging of its neighbors. In consideration of these observations, we apply the curvelet transform to extract directional features which are then used for region selection and weight estimation. A combined total variation regularizer is presented which assumes that the gradients in natural images have a straightforward group sparsity structure. In addition, a directional nonlocal means regularization term takes pixel values and directional information into account to suppress unwanted artifacts. By assembling the designed regularization terms, we solve the SR problem of an energy function with minimal reconstruction error by applying a framework of templates for first-order conic solvers. The thorough quantitative and qualitative results in terms of peak signal-to-noise ratio, structural similarity, information fidelity criterion, and preference matrix demonstrate that the proposed approach achieves higher quality SR reconstruction than the state-of-the-art algorithms.
Ou, W., You, X., Tao, D., Zhang, P., Tang, Y. & Zhu, Z. 2014, 'Robust face recognition via occlusion dictionary learning', Pattern Recognition, vol. 47, no. 4, pp. 1559-1572.
Sparse representation based classification (SRC) has recently been proposed for robust face recognition. To deal with occlusion, SRC introduces an identity matrix as an occlusion dictionary on the assumption that the occlusion has sparse representation in this dictionary. However, the results show that SRC's use of this occlusion dictionary is not nearly as robust to large occlusion as it is to random pixel corruption. In addition, the identity matrix renders the expanded dictionary large, which results in expensive computation. In this paper, we present a novel method, namely structured sparse representation based classification (SSRC), for face recognition with occlusion. A novel structured dictionary learning method is proposed to learn an occlusion dictionary from the data instead of an identity matrix. Specifically, a mutual incoherence of dictionaries regularization term is incorporated into the dictionary learning objective function, which encourages the occlusion dictionary to be as independent as possible of the training sample dictionary, so that the occlusion can then be sparsely represented by the linear combination of the atoms from the learned occlusion dictionary and effectively separated from the occluded face image. The classification can thus be efficiently carried out on the recovered non-occluded face images, and the size of the expanded dictionary is also much smaller than that used in SRC. The extensive experiments demonstrate that the proposed method achieves better results than the existing sparse representation based face recognition methods, especially in dealing with large region contiguous occlusion and severe illumination variation, while the computational cost is much lower.
Yu, J., Gao, X., Tao, D., Li, X. & Zhang, K. 2014, 'A Unified Learning Framework for Single Image Super-Resolution', IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 4, pp. 780-792.
It has been widely acknowledged that learning- and reconstruction-based super-resolution (SR) methods are effective to generate a high-resolution (HR) image from a single low-resolution (LR) input. However, learning-based methods are prone to introduce unexpected details into resultant HR images. Although reconstruction-based methods do not generate obvious artifacts, they tend to blur fine details and end up with unnatural results. In this paper, we propose a new SR framework that seamlessly integrates learning- and reconstruction-based methods for single image SR to: 1) avoid unexpected artifacts introduced by learning-based SR and 2) restore the missing high-frequency details smoothed by reconstruction-based SR. This integrated framework learns a single dictionary from the LR input instead of from external images to hallucinate details, embeds nonlocal means filter in the reconstruction-based SR to enhance edges and suppress artifacts, and gradually magnifies the LR input to the desired high-quality SR result. We demonstrate both visually and quantitatively that the proposed framework produces better results than previous methods from the literature.
Deng, C., Ji, R., Tao, D., Gao, X. & Li, X. 2014, 'Weakly Supervised Multi-Graph Learning for Robust Image Reranking', IEEE Transactions On Multimedia, vol. 16, no. 3, pp. 785-795.
Visual reranking has been widely deployed to refine the traditional text-based image retrieval. Its current trend is to combine the retrieval results from various visual features to boost reranking precision and scalability. Its prominent challenge is how to effectively exploit the complementary property of different features. Another significant issue arises from the noisy instances, from manual or automatic labels, which make the exploration of such complementary property difficult. This paper proposes a novel image reranking method by introducing a new Co-Regularized Multi-Graph Learning (Co-RMGL) framework, in which intra-graph and inter-graph constraints are integrated to simultaneously encode the similarity in a single graph and the consistency across multiple graphs. To deal with the noisy instances, weakly supervised learning via co-occurring visual attributes is utilized to select a set of graph anchors to guide multiple graphs alignment and fusion, and to filter out those pseudo labeling instances to highlight the strength of individual features. After that, a learned edge weighting matrix from a fused graph is used to reorder the retrieval results. We evaluate our approach on four popular image retrieval datasets and demonstrate a significant improvement over state-of-the-art methods.
Ji, R., Gao, Y., Hong, R., Liu, Q., Tao, D. & Li, X. 2014, 'Spectral-Spatial Constraint Hyperspectral Image Classification', IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 3, pp. 1811-1824.
Hyperspectral image classification has attracted extensive research efforts in the recent decade. The main difficulty lies in the few labeled samples versus the high dimensional features. To this end, it is a fundamental step to explore the relationship among different pixels in hyperspectral image classification, toward jointly handling both the lack of labels and the high dimensionality problems. In hyperspectral images, the classification task can benefit from the spatial layout information. In this paper, we propose a hyperspectral image classification method to address both the pixel spectral and spatial constraints, in which the relationship among pixels is formulated in a hypergraph structure. In the constructed hypergraph, each vertex denotes a pixel in the hyperspectral image, and the hyperedges are constructed from both the distance between pixels in the feature space and the spatial locations of pixels. More specifically, first, a feature-based hyperedge is generated by using the distance among pixels, where each pixel is connected with its K nearest neighbors in the feature space. Second, a spatial-based hyperedge is generated to model the layout among pixels, where each pixel is linked with its spatial local neighbors. Learning on the combinational hypergraph is conducted by jointly investigating the image feature and the spatial layout of pixels to seek their joint optimal partitions. Experiments on four data sets are performed to evaluate the effectiveness and efficiency of the proposed method. Comparisons to the state-of-the-art methods demonstrate the superiority of the proposed method in hyperspectral image classification.
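The two kinds of hyperedge described above can be sketched as a vertex-by-hyperedge incidence matrix. This is only an illustration under stated assumptions: the function name `build_incidence`, the choice of 4-connected neighbors as the "spatial local neighbors", the row-major pixel ordering, and the random stand-in spectra are hypothetical and not taken from the paper.

```python
import numpy as np

def build_incidence(features, height, width, k):
    """Return an (n, 2n) 0/1 incidence matrix for n = height * width pixels.

    Column j is the feature hyperedge of pixel j (itself plus its k nearest
    neighbors in feature space); column n + j is its spatial hyperedge
    (itself plus its 4-connected neighbors on the image grid).
    """
    n = height * width
    inc = np.zeros((n, 2 * n), dtype=np.uint8)

    # Pairwise squared Euclidean distances in feature space (small images only).
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist, -1.0)            # make sure each pixel ranks first in its own row
    nearest = np.argsort(dist, axis=1)[:, :k + 1]
    for j in range(n):
        inc[nearest[j], j] = 1

    # Spatial hyperedges from grid adjacency; pixels are indexed row-major.
    for j in range(n):
        row, col = divmod(j, width)
        inc[j, n + j] = 1
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            r, c = row + dr, col + dc
            if 0 <= r < height and 0 <= c < width:
                inc[r * width + c, n + j] = 1
    return inc

rng = np.random.default_rng(0)
h, w, bands, k = 4, 5, 8, 3
feats = rng.normal(size=(h * w, bands))     # stand-in for per-pixel spectra
H = build_incidence(feats, h, w, k)
print(H.shape, H[:, :h * w].sum(axis=0)[:3], H[:, h * w].sum())
```

A hypergraph Laplacian for the subsequent learning step would be derived from this incidence matrix; that step is omitted here because the abstract does not specify it.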
Liu, W., Tao, D., Cheng, J. & Tang, Y. 2014, 'Multiview Hessian discriminative sparse coding for image annotation', Computer Vision and Image Understanding, vol. 118, pp. 50-60.
Sparse coding represents a signal sparsely by using an overcomplete dictionary, and obtains promising performance in practical computer vision applications, especially for signal restoration tasks such as image denoising and image inpainting. In recent years, many discriminative sparse coding algorithms have been developed for classification problems, but they cannot naturally handle visual data represented by multiview features. In addition, existing sparse coding algorithms use graph Laplacian to model the local geometry of the data distribution. It has been identified that Laplacian regularization biases the solution towards a constant function which possibly leads to poor extrapolating power. In this paper, we present multiview Hessian discriminative sparse coding (mHDSC) which seamlessly integrates Hessian regularization with discriminative sparse coding for multiview learning problems. In particular, mHDSC exploits Hessian regularization to steer the solution which varies smoothly along geodesics in the manifold, and treats the label information as an additional view of feature for incorporating the discriminative power for image annotation. We conduct extensive experiments on PASCAL VOC07 dataset and demonstrate the effectiveness of mHDSC for image annotation.
Shen, H., Li, X., Zhang, L., Tao, D. & Zeng, C. 2014, 'Compressed Sensing-Based Inpainting of Aqua Moderate Resolution Imaging Spectroradiometer Band 6 Using Adaptive Spectrum-Weighted Sparse Bayesian Dictionary Learning', IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 2, pp. 894-906.
Because of malfunction or noise in 15 out of the 20 detectors, band 6 (1.628-1.652 µm) of the Moderate Resolution Imaging Spectroradiometer (MODIS) sensor aboard the Aqua satellite contains large areas of dead pixel stripes. Therefore, the corresponding high-level products of MODIS are corrupted by this periodic phenomenon. This paper proposes an improved Bayesian dictionary learning algorithm based on the burgeoning compressed sensing theory to solve this problem. Compared with other state-of-the-art methods, the proposed method can adaptively exploit the spectral relations of band 6 and other spectra. The performance of the proposed method is demonstrated by experiments on both simulated Terra and real Aqua images.
Zhang, L., Zhang, L., Tao, D. & Huang, X. 2014, 'Sparse Transfer Manifold Embedding for Hyperspectral Target Detection', IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 2, pp. 1030-1043.
Target detection is one of the most important applications in hyperspectral remote sensing image analysis. However, the state-of-the-art machine-learning-based algorithms for hyperspectral target detection cannot perform well when the training samples, especially for the target samples, are limited in number. This is because the training data and test data are drawn from different distributions in practice and given a small-size training set in a high-dimensional space, traditional learning models without the sparse constraint face the over-fitting problem. Therefore, in this paper, we introduce a novel feature extraction algorithm named sparse transfer manifold embedding (STME), which can effectively and efficiently encode the discriminative information from limited training data and the sample distribution information from unlimited test data to find a low-dimensional feature embedding by a sparse transformation. Technically speaking, STME is particularly designed for hyperspectral target detection by introducing sparse and transfer constraints. As a result of this, it can avoid over-fitting when only very few training samples are provided. The proposed feature extraction algorithm was applied to extensive experiments to detect targets of interest, and STME showed the outstanding detection performance on most of the hyperspectral datasets.
Zhang, L., Zhang, L., Tao, D., Huang, X. & Du, B. 2014, 'Hyperspectral Remote Sensing Image Subpixel Target Detection Based on Supervised Metric Learning', IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 8, pp. 4955-4965.
The detection and identification of target pixels such as certain minerals and man-made objects from hyperspectral remote sensing images is of great interest for both civilian and military applications. However, due to the restriction in the spatial resolution of most airborne or satellite hyperspectral sensors, the targets often appear as subpixels in the hyperspectral image (HSI). The observed spectral feature of the desired target pixel (positive sample) is therefore a mixed signature of the reference target spectrum and the background pixels spectra (negative samples), which belong to various land cover classes. In this paper, we propose a novel supervised metric learning (SML) algorithm, which can effectively learn a distance metric for hyperspectral target detection, by which target pixels are easily detected in positive space while the background pixels are pushed into negative space as far as possible. The proposed SML algorithm first maximizes the distance between the positive and negative samples by an objective function of the supervised distance maximization. Then, by considering the variety of the background spectral features, we put a similarity propagation constraint into the SML to simultaneously link the target pixels with positive samples, as well as the background pixels with negative samples, which helps to reject false alarms in the target detection. Finally, a manifold smoothness regularization is imposed on the positive samples to preserve their local geometry in the obtained metric. Based on the public data sets of mineral detection in an Airborne Visible/Infrared Imaging Spectrometer image and fabric and vehicle detection in a Hyperspectral Mapper image, quantitative comparisons of several HSI target detection methods, as well as some state-of-the-art metric learning algorithms, were performed. All the experimental results demonstrate the effectiveness of the proposed SML algorithm for hyperspectral target detection.
Guan, N., Wei, L., Luo, Z. & Tao, D. 2014, 'Limited-Memory Fast Gradient Descent Method for Graph Regularized Nonnegative Matrix Factorization', PLoS One, vol. 9, no. 1.
Graph regularized nonnegative matrix factorization (GNMF) decomposes a nonnegative data matrix X into the product of two lower-rank nonnegative factor matrices, i.e., W and H, and aims to preserve the local geometric structure of the dataset by minimizing squared Euclidean distance or Kullback-Leibler (KL) divergence between X and WH. The multiplicative update rule (MUR) is usually applied to optimize GNMF, but it suffers from slow convergence because it intrinsically advances one step along the rescaled negative gradient direction with a non-optimal step size. Recently, a multiple step-sizes fast gradient descent (MFGD) method has been proposed for optimizing NMF which accelerates MUR by searching the optimal step-size along the rescaled negative gradient direction with Newton's method. However, the computational cost of MFGD is high because 1) the high-dimensional Hessian matrix is dense and costs too much memory; and 2) the Hessian inverse operator and its multiplication with gradient cost too much time. To overcome these deficiencies of MFGD, we propose an efficient limited-memory FGD (L-FGD) method for optimizing GNMF. In particular, we apply the limited-memory BFGS (L-BFGS) method to directly approximate the multiplication of the inverse Hessian and the gradient for searching the optimal step size in MFGD. The preliminary results on real-world datasets show that L-FGD is more efficient than both MFGD and MUR. To evaluate the effectiveness of L-FGD, we validate its clustering performance for optimizing KL-divergence based GNMF on two popular face image datasets including ORL and PIE and two text corpora including Reuters and TDT2. The experimental results confirm the effectiveness of L-FGD by comparing it with the representative GNMF solvers.
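As background for the baseline mentioned above, the following is a minimal sketch of the multiplicative update rule for GNMF with the squared Euclidean objective. It is the MUR baseline only, not the L-FGD method the paper proposes, and it does not cover the KL-divergence variant. The function name `gnmf_mur`, the regularization weight `lam`, the ring-graph affinity, and the random data are assumptions; the affinity matrix is called `A` to avoid clashing with the factor W.

```python
import numpy as np

def gnmf_mur(X, A, rank, lam=1.0, n_iter=200, eps=1e-10, seed=0):
    """Multiplicative updates for min ||X - WH||_F^2 + lam * tr(H L H^T), L = D - A.

    X: (m, n) nonnegative data, one sample per column.
    A: (n, n) symmetric nonnegative affinity matrix over the samples.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    D = np.diag(A.sum(axis=1))              # degree matrix of the sample graph
    L = D - A                               # graph Laplacian
    history = []
    for _ in range(n_iter):
        # Each factor is rescaled elementwise by (negative part) / (positive part) of its gradient.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X + lam * H @ A) / (W.T @ W @ H + lam * H @ D + eps)
        history.append(np.linalg.norm(X - W @ H) ** 2 + lam * np.trace(H @ L @ H.T))
    return W, H, history

# Toy problem (assumed): random nonnegative data with a ring graph over the samples.
rng = np.random.default_rng(1)
X = rng.random((20, 30))
idx = np.arange(30)
A = np.zeros((30, 30))
A[idx, (idx + 1) % 30] = 1.0
A = np.maximum(A, A.T)

W, H, history = gnmf_mur(X, A, rank=4)
print(history[0], history[-1])              # the objective should decrease over the iterations
```

Because every update is an elementwise rescaling with a fixed implicit step size, many iterations are typically needed, which is the slow-convergence drawback the abstract attributes to MUR.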
Qiao, M., Cheng, J., Bian, W. & Tao, D. 2014, 'Biview Learning for Human Posture Segmentation from 3D Points Cloud', PLoS One, vol. 9, no. 1, p. e85811.
Posture segmentation plays an essential role in human motion analysis. The state-of-the-art method extracts sufficiently high-dimensional features from 3D depth images for each 3D point and learns an efficient body part classifier. However, high-dimensional features are memory-consuming and difficult to handle on large-scale training dataset. In this paper, we propose an efficient two-stage dimension reduction scheme, termed biview learning, to encode two independent views which are depth-difference features (DDF) and relative position features (RPF). Biview learning explores the complementary property of DDF and RPF, and uses two stages to learn a compact yet comprehensive low-dimensional feature space for posture segmentation. In the first stage, discriminative locality alignment (DLA) is applied to the high-dimensional DDF to learn a discriminative low-dimensional representation. In the second stage, canonical correlation analysis (CCA) is used to explore the complementary property of RPF and the dimensionality reduced DDF. Finally, we train a support vector machine (SVM) over the output of CCA. We carefully validate the effectiveness of DLA and CCA utilized in the two-stage scheme on our 3D human points cloud dataset. Experimental results show that the proposed biview learning scheme significantly outperforms the state-of-the-art method for human posture segmentation.
Wang, N., Tao, D., Gao, X., Li, X. & Li, J. 2014, 'A Comprehensive Survey to Face Hallucination', International Journal of Computer Vision, vol. 106, no. 1, pp. 9-30.
This paper comprehensively surveys the development of face hallucination (FH), including both face super-resolution and face sketch-photo synthesis techniques. Indeed, these two techniques share the same objective of inferring a target face image (e.g. high-resolution face image, face sketch and face photo) from a corresponding source input (e.g. low-resolution face image, face photo and face sketch). Considering the critical role of image interpretation in modern intelligent systems for authentication, surveillance, law enforcement, security control, and entertainment, FH has attracted growing attention in recent years. Existing FH methods can be grouped into four categories: Bayesian inference approaches, subspace learning approaches, a combination of Bayesian inference and subspace learning approaches, and sparse representation-based approaches. In spite of achieving a certain level of development, FH is limited in its success by complex application conditions such as variant illuminations, poses, or views. This paper provides a holistic understanding and deep insight into FH, and presents a comparative analysis of representative methods and promising future directions.
Li, Z., Gong, D., Qiao, Y. & Tao, D. 2014, 'Common Feature Discriminant Analysis for Matching Infrared Face Images to Optical Face Images', IEEE Transactions On Image Processing, vol. 23, no. 6, pp. 2436-2445.
In biometrics research and industry, it is critical yet a challenge to match infrared face images to optical face images. The major difficulty lies in the fact that a great discrepancy exists between the infrared face image and corresponding optical face image because they are captured by different devices (optical imaging device and infrared imaging device). This paper presents a new approach called common feature discriminant analysis to reduce this great discrepancy and improve optical-infrared face recognition performance. In this approach, a new learning-based face descriptor is first proposed to extract the common features from heterogeneous face images (infrared face images and optical face images), and an effective matching method is then applied to the resulting features to obtain the final decision. Extensive experiments are conducted on two large and challenging optical-infrared face data sets to show the superiority of our approach over the state-of-the-art.
Yu, J., Rui, Y. & Tao, D. 2014, 'Click Prediction for Web Image Reranking Using Multimodal Sparse Coding', IEEE Transactions On Image Processing, vol. 23, no. 5, pp. 2019-2032.
Image reranking is effective for improving the performance of a text-based image search. However, existing reranking algorithms are limited for two main reasons: 1) the textual meta-data associated with images is often mismatched with their actual visual content and 2) the extracted visual features do not accurately describe the semantic similarities between images. Recently, user click information has been used in image reranking, because clicks have been shown to more accurately describe the relevance of retrieved images to search queries. However, a critical problem for click-based methods is the lack of click data, since only a small number of web images have actually been clicked on by users. Therefore, we aim to solve this problem by predicting image clicks. We propose a multimodal hypergraph learning-based sparse coding method for image click prediction, and apply the obtained click data to the reranking of images. We adopt a hypergraph to build a group of manifolds, which explore the complementarity of different features through a group of weights. Unlike a graph that has an edge between two vertices, a hyperedge in a hypergraph connects a set of vertices, and helps preserve the local smoothness of the constructed sparse codes. An alternating optimization procedure is then performed, and the weights of different modalities and the sparse codes are simultaneously obtained. Finally, a voting strategy is used to describe the predicted click as a binary event (click or no click), from the images' corresponding sparse codes. Thorough empirical studies on a large-scale database including nearly 330 K images demonstrate the effectiveness of our approach for click prediction when compared with several other methods. Additional image reranking experiments on real-world data show the use of click prediction is beneficial to improving the performance of prominent graph-based image reranking algorithms.
You, X., Wang, R. & Tao, D. 2014, 'Diverse Expected Gradient Active Learning for Relative Attributes', IEEE Transactions On Image Processing, vol. 23, no. 7, pp. 3203-3217.
Lou, Y., Liu, T. & Tao, D. 2014, 'Decomposition-Based Transfer Distance Metric Learning for Image Classification', IEEE Transactions On Image Processing, vol. 23, no. 9, pp. 3789-3801.
Song, M., Tao, D., Sun, S., Chen, C. & Maybank, S.J. 2014, 'Robust 3D face landmark localization based on local coordinate coding', IEEE Transactions on Image Processing, vol. 23, no. 12, pp. 5108-5122.
In the 3D facial animation and synthesis community, input faces are usually required to be labeled by a set of landmarks for parameterization. Because of the variations in pose, expression and resolution, automatic 3D face landmark localization remains a challenge. In this paper, a novel landmark localization approach is presented. The approach is based on local coordinate coding (LCC) and consists of two stages. In the first stage, we perform nose detection, relying on the fact that the nose shape is usually invariant under the variations in the pose, expression, and resolution. Then, we use the iterative closest points algorithm to find a 3D affine transformation that aligns the input face to a reference face. In the second stage, we perform resampling to build correspondences between the input 3D face and the training faces. Then, an LCC-based localization algorithm is proposed to obtain the positions of the landmarks in the input face. Experimental results show that the proposed method is comparable to state of the art methods in terms of its robustness, flexibility, and accuracy.
Shao, L., Zhen, X., Tao, D. & Li, X. 2014, 'Spatio-temporal Laplacian pyramid coding for action recognition', IEEE Transactions on Cybernetics, vol. 44, no. 6, pp. 817-827.
We present a novel descriptor, called spatio-temporal Laplacian pyramid coding (STLPC), for holistic representation of human actions. In contrast to sparse representations based on detected local interest points, STLPC regards a video sequence as a whole with spatio-temporal features directly extracted from it, which prevents the loss of information in sparse representations. Through decomposing each sequence into a set of band-pass-filtered components, the proposed pyramid model localizes features residing at different scales, and therefore is able to effectively encode the motion information of actions. To make features further invariant and resistant to distortions as well as noise, a bank of 3-D Gabor filters is applied to each level of the Laplacian pyramid, followed by max pooling within filter bands and over spatio-temporal neighborhoods. Since the convolving and pooling are performed spatio-temporally, the coding model can capture structural and motion information simultaneously and provide an informative representation of actions. The proposed method achieves superb recognition rates on the KTH, the multiview IXMAS, the challenging UCF Sports, and the newly released HMDB51 datasets. It outperforms state of the art methods showing its great potential on action recognition.
Zhao, J., Wu, J., Liu, G., Tao, D., Xu, K. & Liu, C.Y. 2014, 'Being Rational or Aggressive? A Revisit to Dunbar's Number in Online Social Networks', Neurocomputing, vol. 142, pp. 343-353.
Recent years have witnessed the explosion of online social networks (OSNs). They provide powerful IT innovations for online social activities such as organizing contacts, publishing content, and sharing interests between friends who may never have met before. As more and more people become active users of online social networks, one may ponder questions such as: (1) Do OSNs indeed improve our sociability? (2) To what extent can we expand our offline social spectrum in OSNs? (3) Can we identify some interesting user behaviors in OSNs? This paper aims to answer these questions. To this end, we revisit the well-known Dunbar's number in online social networks. Our main research contributions are as follows. First, to the best of our knowledge, our work is the first to systematically validate the existence of the online Dunbar's number in the range of [200,300]. To do so, we combine local-structure analysis and user-interaction analysis on extensive real-world OSNs. Second, we divide OSN users into two categories, rational and aggressive, and find that rational users intend to develop close and reciprocated relationships, whereas aggressive users have no consistent behaviors. Third, we build a simple model to capture the constraints of time and cognition that affect the evolution of online social networks. Finally, we show the potential use of our findings in viral marketing and privacy management in online social networks.
Han, B., Zhao, X., Tao, D., Li, X., Hu, Z. & Hu, H. 2014, 'Dayside aurora classification via BIFs-based sparse representation using manifold learning', International Journal of Computer Mathematics, vol. 91, no. 11, pp. 2415-2426.
© 2013, Taylor & Francis. Aurora is the typical ionosphere track generated by the interaction of solar wind and magnetosphere, whose modality and variation are significant to the study of space weather activity. A new aurora classification algorithm based on biologically inspired features (BIFs) and discriminative locality alignment (DLA) is proposed in this paper. First, an aurora image is represented by the BIFs, which combine the C1 units from the hierarchical model of object recognition in cortex and the gist features from the saliency map; then, the manifold learning method called DLA is used to obtain the effective sparse representation for auroras based on BIFs; finally, classification results using support vector machine and nearest neighbour with three sets of features (the C1 unit features, the gist features and the BIFs) illustrate the effectiveness and robustness of our method on the real aurora image database from the Chinese Arctic Yellow River Station.
Liu, W., Li, Y., Lin, X., Tao, D. & Wang, Y. 2014, 'Hessian-regularized co-training for social activity recognition', PLoS ONE, vol. 9, no. 9.
© 2014 Liu et al. Co-training is a major multi-view learning paradigm that alternately trains two classifiers on two distinct views and maximizes the mutual agreement on the two-view unlabeled data. Traditional co-training algorithms usually train a learner on each view separately and then force the learners to be consistent across views. Although many co-training algorithms have been developed, it is quite possible that a learner will receive erroneous labels for unlabeled data when the other learner has only mediocre accuracy. This usually happens in the first rounds of co-training, when there are only a few labeled examples. As a result, co-training algorithms often have unstable performance. In this paper, Hessian-regularized co-training is proposed to overcome these limitations. Specifically, each Hessian is obtained from a particular view of examples; Hessian regularization is then integrated into the learner training process of each view by penalizing the regression function along the potential manifold. The Hessian can properly exploit the local structure of the underlying data manifold. Hessian regularization significantly boosts the generalizability of a classifier, especially when there are a small number of labeled examples and a large number of unlabeled examples. To evaluate the proposed method, extensive experiments were conducted on the unstructured social activity attribute (USAA) dataset for social activity recognition. Our results demonstrate that the proposed method outperforms baseline methods, including the traditional co-training and LapCo algorithms.
You, X., Li, Q., Tao, D., Ou, W. & Gong, M. 2014, 'Local metric learning for exemplar-based object detection', IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 8, pp. 1265-1276.
Object detection has been widely studied in the computer vision community and it has many real applications, despite its variations, such as scale, pose, lighting, and background. Most classical object detection methods heavily rely on category-based training to handle intra-class variations. In contrast to classical methods that use a rigid category-based representation, exemplar-based methods try to model variations among positives by learning from specific positive samples. However, existing exemplar-based methods either fail to use any training information or suffer from a significant performance drop when few exemplars are available. In this paper, we design a novel local metric learning approach to handle the exemplar-based object detection task. The main contributions are twofold: 1) a novel local metric learning algorithm called exemplar metric learning (EML) is designed and 2) an exemplar-based object detection algorithm based on EML is implemented. We evaluate our method on two generic object detection data sets: UIUC-Car and UMass FDDB. Experiments show that, compared with other exemplar-based methods, our approach can effectively enhance object detection performance when few exemplars are available. © 2014 IEEE.
Xu, C., Tao, D. & Xu, C. 2014, 'Large-margin multi-view information bottleneck', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 8, pp. 1559-1572.
In this paper, we extend the theory of the information bottleneck (IB) to learning from examples represented by multi-view features. We formulate the problem as one of encoding a communication system with multiple senders, each of which represents one view of the data. Based on the precise components filtered out from multiple information sources through a 'bottleneck', a margin maximization approach is then used to strengthen the discrimination of the encoder by improving the code distance within the frame of coding theory. The resulting algorithm therefore inherits all the merits of the IB principle and coding theory. It has two distinct advantages over existing algorithms, namely, that our method finds a tradeoff between the accuracy and complexity of the multi-view model, and that the encoded multi-view data retains sufficient discrimination for classification. We also derive the robustness and generalization error bound of the proposed algorithm, and reveal the specific properties of multi-view learning. First, the complementarity of multi-view features guarantees the robustness of the algorithm. Second, the consensus of multi-view features reduces the empirical Rademacher complexity of the objective function, enhances the accuracy of the solution, and improves the generalization error bound of the algorithm. The resulting objective function is solved efficiently using the alternating direction method. Experimental results on annotation, classification and recognition tasks demonstrate that the proposed algorithm is promising for practical applications. © 1979-2012 IEEE.
Bian, W. & Tao, D. 2014, 'Asymptotic generalization bound of fisher's linear discriminant analysis', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 12, pp. 2325-2337.
© 1979-2012 IEEE. Fisher's linear discriminant analysis (FLDA) is an important dimension reduction method in statistical pattern recognition. It has been shown that FLDA is asymptotically Bayes optimal under the homoscedastic Gaussian assumption. However, this classical result has the following two major limitations: 1) it holds only for a fixed dimensionality D, and thus does not apply when D and the training sample size N are proportionally large; 2) it does not provide a quantitative description on how the generalization ability of FLDA is affected by D and N. In this paper, we present an asymptotic generalization analysis of FLDA based on random matrix theory, in a setting where both D and N increase and D/N → γ ∈ [0,1). The obtained lower bound of the generalization discrimination power overcomes both limitations of the classical result, i.e., it is applicable when D and N are proportionally large and provides a quantitative description of the generalization ability of FLDA in terms of the ratio γ = D/N and the population discrimination power. Besides, the discrimination power bound also leads to an upper bound on the generalization error of binary-classification with FLDA.
Tang, J., Tao, D., Qi, G.-J. & Huet, B. 2014, 'Social media mining and knowledge discovery', Multimedia Systems, vol. 20, no. 6, pp. 633-634.
Wang, M., Tao, D. & Huet, B. 2014, 'Multimedia modeling', Information Sciences, vol. 281, pp. 521-522.
Wang, B., Gao, X., Tao, D.C. & Li, X.L. 2014, 'A Nonlinear Adaptive Level Set for Image Segmentation', IEEE Transactions on Cybernetics, vol. 44, no. 3, pp. 418-428.
Hong, R.C., Wang, M., Gao, Y., Tao, D.C., Li, X.L. & Wu, X.D. 2014, 'Image Annotation by Multiple-Instance Learning With Discriminative Feature Mapping and Selection', IEEE Transactions on Cybernetics, vol. 44, no. 5, pp. 669-680.
Multiple-instance learning (MIL) has been widely investigated in image annotation for its capability of exploring region-level visual information of images. Recent studies show that, by performing feature mapping, MIL can be cast to a single-instance learning problem and, thus, can be solved by traditional supervised learning methods. However, the approaches for feature mapping usually overlook the discriminative ability and the noises of the generated features. In this paper, we propose an MIL method with discriminative feature mapping and feature selection, aiming at solving this problem. Our method is able to explore both the positive and negative concept correlations. It can also select the effective features from a large and diverse set of low-level features for each concept under MIL settings. Experimental results and comparison with other methods demonstrate the effectiveness of our approach.
Yu, J., Rui, Y., Tang, Y.Y. & Tao, D. 2014, 'High-Order Distance-Based Multiview Stochastic Learning in Image Classification', IEEE Transactions on Cybernetics, vol. 44, no. 12, pp. 2431-2442.
How do we find all images in a larger set of images which have a specific content? Or estimate the position of a specific object relative to the camera? Image classification methods, like support vector machine (supervised) and transductive support vector machine (semi-supervised), are invaluable tools for the applications of content-based image retrieval, pose estimation, and optical character recognition. However, these methods can only handle images represented by a single feature. In many cases, different features (or multiview data) can be obtained, and how to efficiently utilize them is a challenge. The traditional concatenation scheme, which links features of different views into one long vector, is inappropriate because each view has its own statistical property and physical interpretation. In this paper, we propose a high-order distance-based multiview stochastic learning (HD-MSL) method for image classification. HD-MSL effectively combines varied features into a unified representation and integrates the labeling information based on a probabilistic framework. In comparison with the existing strategies, our approach adopts the high-order distance obtained from the hypergraph to replace pairwise distance in estimating the probability matrix of data distribution. In addition, the proposed approach can automatically learn a combination coefficient for each view, which plays an important role in utilizing the complementary information of multiview data. An alternating optimization is designed to solve the objective functions of HD-MSL and obtain the view combination coefficients and classification scores simultaneously. Experiments on two real world datasets demonstrate the effectiveness of HD-MSL in image classification.
Yang, X., Gao, X.B., Tao, D.C. & Li, X.L. 2014, 'Improving Level Set Method for Fast Auroral Oval Segmentation', IEEE Transactions on Image Processing, vol. 23, no. 7, pp. 2854-2865.
Auroral oval segmentation from ultraviolet imager images is of significance in the field of spatial physics. Compared with various existing image segmentation methods, level set is a promising auroral oval segmentation method with satisfactory precision. However, the traditional level set methods are time consuming, which makes them unsuitable for processing large aurora image databases. To address this, an improved level set method is proposed for fast auroral oval segmentation. The proposed algorithm combines four strategies to solve the four problems leading to the high time complexity. The first two strategies, our shape knowledge-based initial evolving curve and neighbor embedded level set formulation, not only accelerate the segmentation process but also improve the segmentation accuracy. The latter two strategies, the universal lattice Boltzmann method and the sparse field method, further reduce the time cost with an unlimited time step and narrow band computation. Experimental results illustrate that the proposed algorithm achieves satisfactory performance for auroral oval segmentation within a very short processing time.
Gui, J., Tao, D.C., Sun, Z.N., Luo, Y., You, X.G. & Tang, Y.Y. 2014, 'Group Sparse Multiview Patch Alignment Framework With View Consistency for Image Classification', IEEE Transactions on Image Processing, vol. 23, no. 7, pp. 3126-3137.
No single feature can satisfactorily characterize the semantic concepts of an image. Multiview learning aims to unify different kinds of features to produce a consensual and efficient representation. This paper redefines part optimization in the patch alignment framework (PAF) and develops a group sparse multiview patch alignment framework (GSM-PAF). The new part optimization considers not only the complementary properties of different views, but also view consistency. In particular, view consistency models the correlations between all possible combinations of any two kinds of view. In contrast to conventional dimensionality reduction algorithms that perform feature extraction and feature selection independently, GSM-PAF enjoys joint feature extraction and feature selection by exploiting the ℓ2,1-norm on the projection matrix to achieve row sparsity, which leads to the simultaneous selection of relevant features and learning transformation, and thus makes the algorithm more discriminative. Experiments on two real-world image data sets demonstrate the effectiveness of GSM-PAF for image classification.
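As a reading aid for the row-sparsity claim above, the following is a minimal sketch of the ℓ2,1-norm itself, not of GSM-PAF. It assumes the standard definition (the sum of the Euclidean norms of the rows of a matrix); the function names `l21_norm` and `selected_rows` and the tolerance `tol` are illustrative choices, not taken from the paper.

```python
import math


def row_norm(row):
    """Euclidean (l2) norm of one row of the projection matrix."""
    return math.sqrt(sum(x * x for x in row))


def l21_norm(matrix):
    """l2,1-norm: sum over rows of each row's l2 norm (standard definition)."""
    return sum(row_norm(row) for row in matrix)


def selected_rows(matrix, tol=1e-12):
    """Indices of rows that are not (numerically) zero.

    With one row per input feature, an all-zero row means that feature
    is discarded, which is why row sparsity acts as feature selection.
    """
    return [i for i, row in enumerate(matrix) if row_norm(row) > tol]


# Hypothetical 3-feature, 2-dimensional projection matrix.
W = [[3.0, 4.0],
     [0.0, 0.0],
     [0.0, 5.0]]
print(l21_norm(W))        # sum of the three row norms
print(selected_rows(W))   # features whose rows survive
```

Penalizing this quantity pushes whole rows toward zero rather than individual entries, which is the mechanism the abstract refers to as joint feature extraction and selection.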
Ma, L.Y., Yang, X.K. & Tao, D.C. 2014, 'Person Re-Identification Over Camera Networks Using Multi-Task Distance Metric Learning', IEEE Transactions on Image Processing, vol. 23, no. 8, pp. 3656-3670.
Person reidentification in a camera network is a valuable yet challenging problem to solve. Existing methods learn a common Mahalanobis distance metric by using the data collected from different cameras and then exploit the learned metric for identifying people in the images. However, the cameras in a camera network have different settings and the recorded images are seriously affected by variability in illumination conditions, camera viewing angles, and background clutter. Using a common metric to conduct person reidentification tasks on different camera pairs overlooks the differences in camera settings; however, it is very time-consuming to label people manually in images from surveillance videos. For example, in most existing person reidentification data sets, only one image of a person is collected from each of only two cameras; therefore, directly learning a unique Mahalanobis distance metric for each camera pair is susceptible to over-fitting because the labeled data are insufficient. In this paper, we reformulate person reidentification in a camera network as a multitask distance metric learning problem. The proposed method designs multiple Mahalanobis distance metrics to cope with the complicated conditions that exist in typical camera networks. These Mahalanobis distance metrics are different but related, and they are learned with a joint regularization that alleviates over-fitting. Furthermore, by extending maximally collapsing metric learning (MCML), we present a novel multitask maximally collapsing metric learning (MtMCML) model for person reidentification in a camera network. Experimental results demonstrate that formulating person reidentification over camera networks as a multitask distance metric learning problem can improve performance, and our proposed MtMCML works substantially better than other current state-of-the-art person reidentification methods.
Sun, Y., Liu, Q., Tang, J. & Tao, D. 2014, 'Learning Discriminative Dictionary for Group Sparse Representation', IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 3816-3828.
In recent years, sparse representation has been widely used in object recognition applications. How to learn the dictionary is a key issue in sparse representation. A popular method is to use the ℓ1 norm as the sparsity measurement of representation coefficients for dictionary learning. However, the ℓ1 norm treats each atom in the dictionary independently, so the learned dictionary cannot capture the multi-subspace structural information of the data well. In addition, the learned subdictionary for each class usually shares some common atoms, which weakens the discriminative ability of the reconstruction error of each subdictionary. This paper presents a new dictionary learning model to improve sparse representation for image classification, which aims at learning a class-specific subdictionary for each class and a common subdictionary shared by all classes. The model is composed of a discriminative fidelity, a weighted group sparse constraint, and a subdictionary incoherence term. The discriminative fidelity encourages each class-specific subdictionary to sparsely represent the samples in the corresponding class. The weighted group sparse constraint term aims at capturing the structural information of the data. The subdictionary incoherence term makes all subdictionaries as independent as possible. Because the common subdictionary represents features shared by all classes, we only use the reconstruction error of each class-specific subdictionary for classification. Extensive experiments are conducted on several public image databases, and the experimental results demonstrate the power of the proposed method, compared with the state of the art.
Wang, Y., Tao, D., Li, X., Song, M., Bu, J. & Tan, P. 2014, 'Video Tonal Stabilization via Color States Smoothing', IEEE Transactions on Image Processing, vol. 23, no. 11, pp. 4838-4849.
We address the problem of removing video color tone jitter that is common in amateur videos recorded with hand-held devices. To achieve this, we introduce color state to represent the exposure and white balance state of a frame. The color state of each frame can be computed by accumulating the color transformations of neighboring frame pairs. Then, the tonal changes of the video can be represented by a time-varying trajectory in color state space. To remove the tone jitter, we smooth the original color state trajectory by solving an L1 optimization problem with PCA dimensionality reduction. In addition, we propose a novel selective strategy to remove small tone jitter while retaining extreme exposure and white balance changes to avoid serious artifacts. Quantitative evaluation and visual comparison with previous work demonstrate the effectiveness of our tonal stabilization method. This system can also be used as a preprocessing tool for other video editing methods.
Bian, W., Zhou, T., Martinez, A.M., Baciu, G. & Tao, D. 2014, 'Minimizing Nearest Neighbor Classification Error for Nonparametric Dimension Reduction', IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 8, pp. 1588-1594.
In this brief, we show that minimizing nearest neighbor classification error (MNNE) is a favorable criterion for supervised linear dimension reduction (SLDR). We prove that MNNE is better than maximizing mutual information in the sense of being a proxy of the Bayes optimal criterion. Based on kernel density estimation, we derive a nonparametric algorithm for MNNE. Experiments on benchmark data sets show the superiority of MNNE over existing nonparametric SLDR methods.
Hou, C., Nie, F., Yi, D. & Tao, D. 2014, 'Discriminative Embedded Clustering: A Framework for Grouping High-Dimensional Data', IEEE Transactions on Neural Networks and Learning Systems.
In many real applications of machine learning and data mining, we are often confronted with high-dimensional data. How to cluster high-dimensional data is still a challenging problem due to the curse of dimensionality. In this paper, we try to address this problem using joint dimensionality reduction and clustering. Different from traditional approaches that conduct dimensionality reduction and clustering in sequence, we propose a novel framework referred to as discriminative embedded clustering which alternates them iteratively. Within this framework, we are able not only to view several traditional approaches and reveal their intrinsic relationships, but also to be stimulated to develop a new method. We also propose an effective approach for solving the formulated nonconvex optimization problem. Comprehensive analyses, including convergence behavior, parameter determination, and computational complexity, together with the relationship to other related approaches, are also presented. Plenty of experimental results on benchmark data sets illustrate that the proposed method outperforms related state-of-the-art clustering approaches and existing joint dimensionality reduction and clustering methods.
Dong, Y., Tao, D., Li, X., Ma, J. & Pu, J. 2014, 'Texture Classification and Retrieval Using Shearlets and Linear Regression', IEEE Transactions on Cybernetics.
Statistical modeling of wavelet subbands has frequently been used for image recognition and retrieval. However, traditional wavelets are unsuitable for use with images containing distributed discontinuities, such as edges. Shearlets are a newly developed extension of wavelets that are better suited to image characterization. Here, we propose novel texture classification and retrieval methods that model adjacent shearlet subband dependences using linear regression. For texture classification, we use two energy features to represent each shearlet subband in order to overcome the limitation that subband coefficients are complex numbers. Linear regression is used to model the features of adjacent subbands; the regression residuals are then used to define the distance from a test texture to a texture class. Texture retrieval consists of two processes: the first is based on statistics in contourlet domains, while the second is performed using a pseudo-feedback mechanism based on linear regression modeling of shearlet subband dependences. Comprehensive validation experiments performed on five large texture datasets reveal that the proposed classification and retrieval methods outperform the current state-of-the-art.
Hou, W., Gao, X., Tao, D. & Li, X. 2013, 'Visual Saliency Detection Using Information Divergence', Pattern Recognition, vol. 46, no. 10, pp. 2658-2669.
The technique of visual saliency detection supports video surveillance systems by reducing redundant information and highlighting the critical, visually important regions. It follows that information about the image might be of great importance in depict
Cheng, J., Liu, J., Xu, Y., Yin, F., Wong, D., Tan, N., Tao, D., Cheng, C., Aung, T. & Wong, T. 2013, 'Superpixel Classification Based Optic Disc And Optic Cup Segmentation For Glaucoma Screening', IEEE Transactions on Medical Imaging, vol. 32, no. 6, pp. 1019-1032.
Glaucoma is a chronic eye disease that leads to vision loss. As it cannot be cured, detecting the disease in time is important. Current tests using intraocular pressure (IOP) are not sensitive enough for population based glaucoma screening. Optic nerve head assessment in retinal fundus images is both more promising and superior. This paper proposes optic disc and optic cup segmentation using superpixel classification for glaucoma screening. In optic disc segmentation, histograms, and center surround statistics are used to classify each superpixel as disc or non-disc. A self-assessment reliability score is computed to evaluate the quality of the automated optic disc segmentation. For optic cup segmentation, in addition to the histograms and center surround statistics, the location information is also included into the feature space to boost the performance. The proposed segmentation methods have been evaluated in a database of 650 images with optic disc and optic cup boundaries manually marked by trained professionals. Experimental results show an average overlapping error of 9.5% and 24.1% in optic disc and optic cup segmentation, respectively. The results also show an increase in overlapping error as the reliability score is reduced, which justifies the effectiveness of the self-assessment. The segmented optic disc and optic cup are then used to compute the cup to disc ratio for glaucoma screening. Our proposed method achieves areas under curve of 0.800 and 0.822 in two data sets, which is higher than other methods. The methods can be used for segmentation and glaucoma screening. The self-assessment will be used as an indicator of cases with large errors and enhance the clinical deployment of the automatic segmentation and screening.
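The two quantities reported above can be illustrated with a small sketch. It is not the authors' evaluation code and it makes two assumptions that the abstract does not spell out: that the overlapping error is one minus the ratio of intersection area to union area between the segmented and the manually marked region, and that the cup to disc ratio is the ratio of vertical diameters. Regions are represented as hypothetical sets of (row, column) pixel coordinates.

```python
def overlap_error(segmented, ground_truth):
    """Assumed definition: 1 - |S intersect G| / |S union G| on pixel sets."""
    s, g = set(segmented), set(ground_truth)
    union = s | g
    if not union:
        return 0.0  # both regions empty: treat as perfect agreement
    return 1.0 - len(s & g) / len(union)


def vertical_diameter(pixels):
    """Height in pixels of a non-empty region given as (row, col) pairs."""
    rows = [r for r, _ in pixels]
    return max(rows) - min(rows) + 1


def cup_to_disc_ratio(cup_pixels, disc_pixels):
    """Assumed definition: vertical cup diameter over vertical disc diameter."""
    return vertical_diameter(cup_pixels) / vertical_diameter(disc_pixels)


# Toy regions: a one-column disc spanning rows 0..5 and a cup spanning rows 2..4.
disc = {(r, 0) for r in range(6)}
cup = {(r, 0) for r in range(2, 5)}
print(cup_to_disc_ratio(cup, disc))
print(overlap_error({(0, 0), (0, 1), (1, 0)}, {(0, 1), (1, 0), (1, 1)}))
```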
Zhang, C. & Tao, D. 2013, 'Structure Of Indicator Function Classes With Finite Vapnik-Chervonenkis Dimensions', IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 7, pp. 1156-1160.
The Vapnik-Chervonenkis (VC) dimension is used to measure the complexity of a function class and plays an important role in a variety of fields, including artificial neural networks and machine learning. One major concern is the relationship between the VC dimension and inherent characteristics of the corresponding function class. According to Sauer's lemma, if the VC dimension of an indicator function class F is equal to D, the cardinality of the set $F_{S_1^N}$ will not be larger than $\sum_{d=0}^{D} C_N^d$. Therefore, there naturally arises a question about the VC dimension of an indicator function class: what kinds of elements will be contained in the function class F if F has a finite VC dimension? In this brief, we answer the above question. First, we investigate the structure of the function class F when the cardinality of the set $F_{S_1^N}$ reaches the maximum value $\sum_{d=0}^{D} C_N^d$. Based on the derived result, we then figure out what kinds of elements will be contained in F if F has a finite VC dimension.
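For a sense of scale, the bound in Sauer's lemma can be evaluated directly. The sketch below only computes the sum of binomial coefficients C(N, d) for d from 0 to D; the function name `sauer_bound` is illustrative and nothing here reproduces the structural results of the paper.

```python
from math import comb


def sauer_bound(n: int, d: int) -> int:
    """Sauer's lemma bound: sum of C(n, i) for i = 0..d.

    n is the number of sample points and d the VC dimension. Terms with
    i > n are zero, so the sum is truncated at min(n, d).
    """
    if n < 0 or d < 0:
        raise ValueError("n and d must be non-negative")
    return sum(comb(n, i) for i in range(min(n, d) + 1))


# With VC dimension 2, five points admit at most 1 + 5 + 10 labelings,
# far fewer than the 2**5 labelings an unrestricted class could produce.
print(sauer_bound(5, 2), 2 ** 5)
```

When the VC dimension is at least the number of points, the bound equals 2^N and is uninformative; for a fixed D it grows only polynomially in N.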
Li, J., Bian, W., Tao, D. & Zhang, C. 2013, 'Learning Colours From Textures By Sparse Manifold Embedding', Signal Processing, vol. 93, no. 6, pp. 1485-1495.
The capability of inferring colours from the texture (grayscale contents) of an image is useful in many application areas, when the imaging device/environment is limited. Traditional manual or limited automatic colour assignment involves intensive human
Song, M., Tao, D., Sun, S., Chen, C. & Bu, J. 2013, 'Joint sparse learning for 3-D facial expression generation', IEEE Transactions On Image Processing, vol. 22, no. 8, pp. 3283-3295.
3-D facial expression generation, including synthesis and retargeting, has received intensive attention in recent years, because it is important to produce realistic 3-D faces with specific expressions in modern film production and computer games. In this paper, we present joint sparse learning (JSL) to learn mapping functions and their respective inverses to model the relationship between the high-dimensional 3-D faces (of different expressions and identities) and their corresponding low-dimensional representations. Based on JSL, we can effectively and efficiently generate various expressions of a 3-D face by either synthesizing or retargeting. Furthermore, JSL is able to restore 3-D faces with holes by learning a mapping function between incomplete and intact data. Experimental results on a wide range of 3-D faces demonstrate the effectiveness of the proposed approach by comparing with representative ones in terms of quality, time cost, and robustness.
Zhang, L., Zhang, L., Tao, D. & Huang, X. 2013, 'A modified stochastic neighbor embedding for multi-feature dimension reduction of remote sensing images', ISPRS Journal of Photogrammetry and Remote Sensing, vol. 83, no. 1, pp. 30-39.
In automated remote sensing based image analysis, it is important to consider the multiple features of a certain pixel, such as the spectral signature, morphological property, and shape feature, in both the spatial and spectral domains, to improve the classification accuracy. Therefore, it is essential to consider the complementary properties of the different features and combine them in order to obtain an accurate classification rate. In this paper, we introduce a modified stochastic neighbor embedding (MSNE) algorithm for multiple features dimension reduction (DR) under a probability preserving projection framework. For each feature, a probability distribution is constructed based on t-distributed stochastic neighbor embedding (t-SNE), and we then alternately solve t-SNE and learn the optimal combination coefficients for different features in the proposed multiple features DR optimization. Compared with conventional remote sensing image DR strategies, the suggested algorithm utilizes both the spatial and spectral features of a pixel to achieve a physically meaningful low-dimensional feature representation for the subsequent classification, by automatically learning a combination coefficient for each feature. The classification results using hyperspectral remote sensing images (HSI) show that MSNE can effectively improve RS image classification performance
Liu, W. & Tao, D. 2013, 'Multiview hessian regularization for image annotation', IEEE Transactions On Image Processing, vol. 22, no. 7, pp. 2676-2687.
Wang, X., Bian, W. & Tao, D. 2013, 'Grassmannian regularized structured multi-view embedding for image classification', IEEE Transactions On Image Processing, vol. 22, no. 7, pp. 2646-2660.
Images are usually represented by features from multiple views, e.g., color and texture. In image classification, the goal is to fuse all the multi-view features in a reasonable manner and achieve satisfactory classification performance. However, the fea
Xiao, B., Gao, X., Tao, D. & Li, X. 2013, 'Biview Face Recognition In The Shape-texture Domain', Pattern Recognition, vol. 46, no. 7, pp. 1906-1919.
Face recognition is one of the biometric identification methods with the highest potential. Existing face recognition algorithms that rely on the texture information of face images are affected greatly by variations in expression, scale and illumination. Algorithms based on the shape topology weaken the influence of illumination to some extent, but the impact of expression, scale and illumination on face recognition is still unsolved. To this end, we propose a new method for face recognition that integrates texture information with shape information, called the biview face recognition algorithm. The texture models are constructed by using subspace learning methods and shape topologies are formed by building graphs for face images. The proposed biview face recognition method is compared with recognition algorithms based merely on texture or shape information. Experimental results of recognizing faces under variations in illumination, expression and scale demonstrate that the proposed biview face recognition method outperforms texture-based and shape-based algorithms.
Li, J. & Tao, D. 2013, 'Exponential Family Factors For Bayesian Factor Analysis', IEEE Transactions On Neural Networks And Learning Systems, vol. 24, no. 6, pp. 964-976.
Expressing data as linear functions of a small number of unknown variables is a useful approach employed by several classical data analysis methods, e.g., factor analysis, principal component analysis, or latent semantic indexing. These models represent the data using the product of two factors. In practice, one important concern is how to link the learned factors to relevant quantities in the context of the application. To this end, various specialized forms of the factors have been proposed to improve interpretability. Toward developing a unified view and clarifying the statistical significance of the specialized factors, we propose a Bayesian model family. We employ exponential family distributions to specify various types of factors, which provide a unified probabilistic formulation. A Gibbs sampling procedure is constructed as a general computation routine. We verify the model by experiments, in which the proposed model is shown to be effective in both emulating existing models and motivating new model designs for particular problem settings.
Pan, Z., You, X., Chen, H., Tao, D. & Pang, B. 2013, 'Generalization Performance Of Magnitude-preserving Semi-supervised Ranking With Graph-based Regularization', Information Sciences, vol. 221, no. 1, pp. 284-296.
Semi-supervised ranking is a relatively new and important learning problem inspired by many applications. We propose a novel graph-based regularized algorithm which learns the ranking function in the semi-supervised learning framework. It can exploit the geometry of the data while preserving the magnitude of the preferences. The least squares ranking loss is adopted and the optimal solution of our model has an explicit form. We establish an error analysis of our proposed algorithm and demonstrate the relationship between predictive performance and intrinsic properties of the graph. The experiments on three datasets for a recommendation task and two quantitative structure-activity relationship datasets show that our method is effective and comparable to some other state-of-the-art algorithms for ranking.
Wang, N., Li, J., Tao, D., Li, X. & Gao, X. 2013, 'Heterogeneous Image Transformation', Pattern Recognition Letters, vol. 34, no. 1, pp. 77-84.
Heterogeneous image transformation (HIT) plays an important role in both law enforcement and digital entertainment. Some popular transformation methods, such as those based on locally linear embedding, usually generate images with lower definition and blurred details, mainly due to two defects: (1) these approaches use a fixed number of nearest neighbors (NN) to model the transformation process, i.e., they are K-NN-based methods; (2) with overlapping areas averaged, the transformed image is approximately equivalent to being filtered by a low pass filter, which removes the high frequency or detail information. These drawbacks reduce the visual quality and the recognition rate across heterogeneous images. In order to overcome these two disadvantages, a two-step framework is constructed based on sparse feature selection (SFS) and support vector regression (SVR). In the proposed model, SFS selects nearest neighbors adaptively based on sparse representation to implement an initial transformation, and subsequently the SVR model is applied to estimate the lost high frequency or detail information. Finally, by linearly superimposing these two parts, the ultimate transformed image is obtained. Extensive experiments on both a sketch-photo database and a near infrared-visible image database illustrate the effectiveness of the proposed heterogeneous image transformation method.
Mu, Y., Ding, W. & Tao, D. 2013, 'Local Discriminative Distance Metrics Ensemble Learning', Pattern Recognition, vol. 46, no. 8, pp. 2337-2349.
The ultimate goal of distance metric learning is to incorporate abundant discriminative information to keep all data samples in the same class close and those from different classes separated. Local distance metric methods can preserve discriminative information by considering the neighborhood influence. In this paper, we propose a new local discriminative distance metrics (LDDM) algorithm to learn multiple distance metrics from each training sample (a focal sample) and in the vicinity of that focal sample (focal vicinity), to optimize local compactness and local separability. Those locally learned distance metrics are used to build local classifiers which are aligned in a probabilistic framework via ensemble learning. Theoretical analysis proves the convergence rate bound, the generalization bound of the local distance metrics and the final ensemble classifier. We extensively evaluate LDDM using synthetic datasets and large benchmark UCI datasets
Li, J. & Tao, D. 2013, 'Simple Exponential Family PCA', IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 3, pp. 485-497.
Principal component analysis (PCA) is a widely used model for dimensionality reduction. In this paper, we address the problem of determining the intrinsic dimensionality of a general type data population by selecting the number of principal components for a generalized PCA model. In particular, we propose a generalized Bayesian PCA model, which deals with general type data by employing exponential family distributions. Model selection is realized by empirical Bayesian inference of the model. We name the model simple exponential family PCA (SePCA), since it embraces both the principle of using a simple model for data representation and the practice of using a simplified computational procedure for the inference. Our analysis shows that the empirical Bayesian inference in SePCA formally realizes an intuitive criterion for PCA model selection - a preserved principal component must sufficiently correlate to data variance that is uncorrelated to the other principal components. Experiments on synthetic and real data sets demonstrate the effectiveness of SePCA and exemplify its characteristics for model selection.
Zhang, L., Zhang, L., Tao, D. & Huang, X. 2013, 'Tensor Discriminative Locality Alignment For Hyperspectral Image Spectral-spatial Feature Extraction', IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 1, pp. 242-256.
In this paper, we propose a method for the dimensionality reduction (DR) of spectral-spatial features in hyperspectral images (HSIs), under the umbrella of multilinear algebra, i.e., the algebra of tensors. The proposed approach is a tensor extension of conventional supervised manifold-learning-based DR. In particular, we define a tensor organization scheme for representing a pixel's spectral-spatial feature and develop tensor discriminative locality alignment (TDLA) for removing redundant information for subsequent classification. The optimal solution of TDLA is obtained by alternately optimizing each mode of the input tensors. The methods are tested on three public real HSI data sets collected by hyperspectral digital imagery collection experiment, reflective optics system imaging spectrometer, and airborne visible/infrared imaging spectrometer. The classification results show significant improvements in classification accuracies while using a small number of features.
Luo, Y., Tao, D., Xu, C., Xu, C., Liu, H. & Wen, Y. 2013, 'Multiview Vector-valued Manifold Regularization For Multilabel Image Classification', IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 5, pp. 709-722.
In computer vision, image datasets used for classification are naturally associated with multiple labels and comprised of multiple views, because each image may contain several objects (e.g., pedestrian, bicycle, and tree) and is properly characterized by multiple visual features (e.g., color, texture, and shape). Currently available tools ignore either the label relationship or the view complementarity. Motivated by the success of the vector-valued function that constructs matrix-valued kernels to explore the multilabel structure in the output space, we introduce multiview vector-valued manifold regularization (MV3MR) to integrate multiple features. MV3MR exploits the complementary property of different features and discovers the intrinsic local geometry of the compact support shared by different features under the theme of manifold regularization. We conduct extensive experiments on two challenging, but popular, datasets, PASCAL VOC'07 and MIR Flickr, and validate the effectiveness of the proposed MV3MR for image classification.
Luo, Y., Tao, D., Geng, B., Xu, C. & Maybank, S. 2013, 'Manifold Regularized Multi-task Learning for Semi-supervised Multi-label Image Classification', IEEE Transactions On Image Processing, vol. 22, no. 2, pp. 523-532.
It is a significant challenge to classify images with multiple labels by using only a small number of labeled samples. One option is to learn a binary classifier for each label and use manifold regularization to improve the classification performance by exploring the underlying geometric structure of the data distribution. However, such an approach does not perform well in practice when images from multiple concepts are represented by high-dimensional visual features. Thus, manifold regularization is insufficient to control the model complexity. In this paper, we propose a manifold regularized multitask learning (MRMTL) algorithm. MRMTL learns a discriminative subspace shared by multiple classification tasks by exploiting the common structure of these tasks. It effectively controls the model complexity because different tasks limit one another's search volume, and the manifold regularization ensures that the functions in the shared hypothesis space are smooth along the data manifold. We conduct extensive experiments, on the PASCAL VOC'07 dataset with 20 classes and the MIR dataset with 38 classes, by comparing MRMTL with popular image classification algorithms. The results suggest that MRMTL is effective for image classification.
Shi, M., Xu, R., Tao, D. & Xu, C. 2013, 'W-tree Indexing for Fast Visual Word Generation', IEEE Transactions On Image Processing, vol. 22, no. 3, pp. 1209-1222.
The bag-of-visual-words representation has been widely used in image retrieval and visual recognition. The most time-consuming step in obtaining this representation is the visual word generation, i.e., assigning visual words to the corresponding local features in a high-dimensional space. Recently, structures based on multibranch trees and forests have been adopted to reduce the time cost. However, these approaches cannot perform well without a large number of backtrackings. In this paper, by considering the spatial correlation of local features, we can significantly speed up the time-consuming visual word generation process while maintaining accuracy. In particular, visual words associated with certain structures frequently co-occur; hence, we can build a co-occurrence table for each visual word for a large-scale data set. By associating each visual word with a probability according to the corresponding co-occurrence table, we can assign a probabilistic weight to each node of a certain index structure (e.g., a KD-tree and a K-means tree), in order to re-direct the searching path to be close to its global optimum within a small number of backtrackings. We carefully study the proposed scheme by comparing it with the fast library for approximate nearest neighbors and the random KD-trees on the Oxford data set. Thorough experimental results suggest the efficiency and effectiveness of the new scheme.
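As a rough illustration of the co-occurrence table mentioned in this abstract, the sketch below counts how often pairs of visual words appear together and normalizes the counts into probabilities. It is a simplification under stated assumptions: co-occurrence is counted per image rather than per spatial structure, and the names `build_cooccurrence` and `cooccurrence_probs` are hypothetical, not taken from the paper.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(images):
    """Count, for each visual word, how often every other word shares an image with it.

    `images` is a list of lists of visual-word ids (assumption: image-level
    co-occurrence stands in for the spatial structures used in the paper).
    """
    table = defaultdict(Counter)
    for words in images:
        # Each ordered pair of distinct words in one image counts once.
        for a, b in permutations(set(words), 2):
            table[a][b] += 1
    return table


def cooccurrence_probs(table, word):
    """Normalize one row of the table into a probability for each co-occurring word."""
    counts = table[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


# Toy data: three images described by visual-word ids.
images = [[1, 2, 3], [1, 2], [2, 3]]
table = build_cooccurrence(images)
probs_for_word_1 = cooccurrence_probs(table, 1)
```

Probabilities of this kind could then be attached to the nodes of an index structure as weights; how the paper does that is not reproduced here.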
Yu, J., Tao, D., Rui, Y. & Cheng, J. 2013, 'Pairwise Constraints Based Multiview Features Fusion for Scene Classification', Pattern Recognition, vol. 46, no. 2, pp. 483-496.
Recently, we have witnessed a surge of interest in learning a low-dimensional subspace for scene classification. The existing methods do not perform well since they do not consider scenes' multiple features from different views in low-dimensional subspace construction. In this paper, we describe scene images by finding a group of features and explore their complementary characteristics. We consider the problem of multiview dimensionality reduction by learning a unified low-dimensional subspace to effectively fuse these features. The newly proposed method takes both intraclass and interclass geometries into consideration; as a result, the discriminability is effectively preserved because it takes into account neighboring samples which have different labels. Due to the semantic gap, the fusion of multiview features still cannot achieve excellent performance of scene classification in real applications. Therefore, a user labeling procedure is introduced in our approach. Initially, a query image is provided by the user, and a group of images is retrieved by a search engine. After that, users label some images in the retrieved set as relevant or irrelevant to the query. The must-links are constructed between the relevant images, and the cannot-links are built between the irrelevant images. Finally, an alternating optimization procedure is adopted to integrate the complementary nature of different views with the user labeling information, and develop a novel multiview dimensionality reduction method for scene classification. Experiments are conducted on the real-world datasets of natural scenes and indoor scenes, and the results demonstrate that the proposed method has the best performance in scene classification. In addition, the proposed method can be applied to other classification problems. The experimental results of shape classification on Caltech 256 suggest the effectiveness of our method.
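The user labeling step in this abstract can be sketched as follows. The code follows the abstract's wording literally (must-links between relevant images, cannot-links between irrelevant images); the function name `build_constraints` and the image identifiers are hypothetical, and the sketch does not cover the subsequent alternating optimization.

```python
from itertools import combinations


def build_constraints(feedback):
    """Turn relevance feedback into pairwise constraints.

    `feedback` maps an image id to True (relevant to the query) or False
    (irrelevant). Returns (must_links, cannot_links) as lists of id pairs.
    """
    relevant = sorted(i for i, ok in feedback.items() if ok)
    irrelevant = sorted(i for i, ok in feedback.items() if not ok)
    must_links = list(combinations(relevant, 2))
    cannot_links = list(combinations(irrelevant, 2))
    return must_links, cannot_links


# Toy feedback on five retrieved images (hypothetical ids).
feedback = {"img_a": True, "img_b": True, "img_c": False,
            "img_d": False, "img_e": True}
must_links, cannot_links = build_constraints(feedback)
```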
Shen, H., Tao, D. & Ma, D. 2013, 'Multiview Locally Linear Embedding for Effective Medical Image Retrieval', PLoS ONE, vol. 8, no. 12, pp. 1-21.
Content-based medical image retrieval continues to gain attention for its potential to assist radiological image interpretation and decision making. Many approaches have been proposed to improve the performance of medical image retrieval system, among wh
Liu, T., Sachdev, P., Lipnicki, D., Jiang, J., Geng, G., Zhu, W., Reppermund, S., Tao, D., Trollor, J., Brodaty, H. & Wen, W. 2013, 'Limited relationships between two-year changes in sulcal morphology and other common neuroimaging indices in the elderly', Neuroimage, vol. 83, no. 1, pp. 12-17.
Measuring the geometry or morphology of sulcal folds has recently become an important approach to investigating neuroanatomy. However, relationships between cortical sulci and other brain structures are poorly understood. The present study investigates h
Du, B., Zhang, L., Tao, D. & Zhang, D. 2013, 'Unsupervised transfer learning for target detection from hyperspectral images', Neurocomputing, vol. 120, no. 1, pp. 72-82.
Target detection has been of great interest in hyperspectral image analysis. Feature extraction from target samples and counterpart backgrounds constitutes the key to the problem. Traditional target detection methods depend on comparatively fixed feature for
Peng, B., Wu, J., Yuan, H., Guo, Q. & Tao, D. 2013, 'ANEEC: A Quasi-Automatic System for Massive Named Entity Extraction and Categorization', Computer Journal, vol. 56, no. 11, pp. 1328-1346.
Named entity recognition seeks to locate atomic elements in texts and classify them into predefined categories. It is essentially useful for many applications, including microblog analysis and query suggestion. In recent years, with the explosion of Web
Li, J. & Tao, D. 2013, 'A Bayesian Hierarchical Factorization Model for Vector Fields', IEEE Transactions On Image Processing, vol. 22, no. 11, pp. 4510-4521.
Factorization-based techniques explain arrays of observations using a relatively small number of factors and provide an essential arsenal for multi-dimensional data analysis. Most factorization models are, however, developed on general arrays of scalar v
Zhang, K., Gao, X., Tao, D. & Li, X. 2013, 'Single Image Super-Resolution With Multiscale Similarity Learning', IEEE Transactions On Neural Networks And Learning Systems, vol. 24, no. 10, pp. 1648-1659.
Example learning-based image super-resolution (SR) is recognized as an effective way to produce a high-resolution (HR) image with the help of an external training set. The effectiveness of learning-based SR methods, however, depends highly upon the consi
Wang, N., Tao, D., Gao, X., Li, X. & Li, J. 2013, 'Transductive Face Sketch-Photo Synthesis', IEEE Transactions On Neural Networks And Learning Systems, vol. 24, no. 9, pp. 1364-1376.
Face sketch-photo synthesis plays a critical role in many applications, such as law enforcement and digital entertainment. Recently, many face sketch-photo synthesis methods have been proposed under the framework of inductive learning, and these have obt
Zhen, X., Shao, L., Tao, D. & Li, X. 2013, 'Embedding Motion and Structure Features for Action Recognition', IEEE Transactions On Circuits And Systems For Video Technology, vol. 23, no. 7, pp. 1182-1190.
We propose a novel method to model human actions by explicitly coding motion and structure features that are separately extracted from video sequences. Firstly, the motion template (one feature map) is applied to encode the motion information and image p
Tao, D., Wang, D. & Murtagh, F. 2013, 'Machine learning in intelligent image processing', Signal Processing, vol. 93, no. 6, pp. 1399-1400.
Zhou, T. & Tao, D. 2013, 'Double Shrinking for Sparse Dimension Reduction', IEEE Transactions On Image Processing, vol. 22, no. 1, pp. 244-257.
Learning tasks such as classification and clustering usually perform better and cost less (time and space) on compressed representations than on the original data. Previous works mainly compress data via dimension reduction. In this paper, we propose double shrinking to compress image data on both dimensionality and cardinality via building either sparse low-dimensional representations or a sparse projection matrix for dimension reduction. We formulate a double shrinking model (DSM) as an $\ell_1$-regularized variance maximization with constraint $\|x\|_2 = 1$, and develop a double shrinking algorithm (DSA) to optimize DSM. DSA is a path-following algorithm that can build the whole solution path of locally optimal solutions of different sparse levels. Each solution on the path is a warm start for searching the next sparser one. In each iteration of DSA, the direction, the step size, and the Lagrangian multiplier are deduced from the Karush-Kuhn-Tucker conditions. The magnitudes of trivial variables are shrunk and the importance of critical variables is simultaneously augmented along the selected direction with the determined step length. Double shrinking can be applied to manifold learning and feature selection for better interpretation of features, and can be combined with classification and clustering to boost their performance. The experimental results suggest that double shrinking produces efficient and effective data compression.
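One way to write the model described in this abstract is given below. This is only a plausible reading of "an l1 regularized variance maximization with a unit-norm constraint": the covariance matrix $\Sigma$ and the trade-off parameter $\lambda \ge 0$ are notational assumptions, and the paper's exact objective may differ.

```latex
\max_{x} \; x^{\top} \Sigma x \;-\; \lambda \, \|x\|_{1}
\qquad \text{subject to} \qquad \|x\|_{2} = 1
```

Under this reading, increasing $\lambda$ favors sparser solutions $x$, which matches the abstract's description of a solution path over different sparsity levels.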
Gao, X., Gao, F., Tao, D. & Li, X. 2013, 'Universal Blind Image Quality Assessment Metrics Via Natural Scene Statistics and Multiple Kernel Learning', IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 12, pp. 2013-2026.
Universal blind image quality assessment (IQA) metrics that can work for various distortions are of great importance for image processing systems, because in practice neither are ground truths available nor are the distortion types known all the time. Existing state-of-the-art universal blind IQA algorithms are developed based on natural scene statistics (NSS). Although NSS-based metrics obtained promising performance, they have some limitations: 1) they use either the Gaussian scale mixture model or generalized Gaussian density to predict the non-Gaussian marginal distribution of wavelet, Gabor, or discrete cosine transform coefficients. The prediction error makes the extracted features unable to reflect the change in non-Gaussianity (NG) accurately. The existing algorithms use the joint statistical model and structural similarity to model the local dependency (LD). Although this LD essentially encodes the information redundancy in natural images, these models do not use information divergence to measure the LD. Although the exponential decay characteristic (EDC) represents the property of natural images that large/small wavelet coefficient magnitudes tend to be persistent across scales, which is highly correlated with image degradations, it has not been applied to the universal blind IQA metrics; and 2) all the universal blind IQA metrics use the same similarity measure for different features for learning the universal blind IQA metrics, though these features have different properties. To address the aforementioned problems, we propose to construct new universal blind quality indicators using all the three types of NSS, i.e., the NG, LD, and EDC, and incorporating the heterogeneous property of multiple kernel learning (MKL). By analyzing how different distortions affect these statistical properties, we present two universal blind quality assessment models, NSS global scheme and NSS two-step scheme. In the proposed metrics: 1) we exploit the NG of natural images u...
Cheng, J., Bian, W. & Tao, D. 2013, 'Locally regularized sliced inverse regression based 3D hand gesture recognition on a dance robot', Information Sciences, vol. 221, pp. 274-283.
Gesture recognition plays an important role in human machine interactions (HMIs) for multimedia entertainment. In this paper, we present a dimension reduction based approach for dynamic real-time hand gesture recognition. The hand gestures are recorded as acceleration signals by using a handheld device with a 3-axis accelerometer sensor installed, and represented by discrete cosine transform (DCT) coefficients. To recognize different hand gestures, we develop a new dimension reduction method, locally regularized sliced inverse regression (LR-SIR), to find an effective low dimensional subspace, in which different hand gestures are well separable, following which recognition can be performed by using simple and efficient classifiers, e.g., nearest mean, k-nearest-neighbor rule and support vector machine. LR-SIR is built upon the well-known sliced inverse regression (SIR), but overcomes its limitation that it ignores the local geometry of the data distribution. Besides, LR-SIR can be effectively and efficiently solved by eigen-decomposition. Finally, we apply the LR-SIR based gesture recognition to control our recently developed dance robot for multimedia entertainment. Thorough empirical studies on digits-gesture recognition suggest the effectiveness of the new gesture recognition scheme for HMI.
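The DCT representation of acceleration signals mentioned in this abstract might look roughly like the sketch below. It uses an unnormalized DCT-II written out directly; keeping only the first `keep` coefficients per axis and concatenating the three axes are assumptions made for illustration, and `gesture_features` is a hypothetical name.

```python
import math


def dct2(signal):
    """Unnormalized DCT-II: X[k] = sum_i x[i] * cos(pi / N * (i + 0.5) * k)."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]


def gesture_features(ax, ay, az, keep=4):
    """Concatenate the first `keep` DCT coefficients of each accelerometer axis.

    Truncating to low-frequency coefficients is an assumption, not a detail
    stated in the abstract.
    """
    features = []
    for axis in (ax, ay, az):
        features.extend(dct2(axis)[:keep])
    return features


# Toy 3-axis recording with eight samples per axis (made-up values).
ax = [0.0, 0.1, 0.4, 0.9, 0.9, 0.4, 0.1, 0.0]
ay = [1.0] * 8
az = [0.2, -0.2, 0.2, -0.2, 0.2, -0.2, 0.2, -0.2]
features = gesture_features(ax, ay, az, keep=4)
```

The resulting feature vector would then be the input to a dimension reduction method such as LR-SIR, which is not sketched here.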
Guan, N., Zhang, X., Luo, Z., Tao, D. & Yang, X. 2013, 'Discriminant projective non-negative matrix factorization', PLoS ONE, vol. 8, no. 12, p. e83291.
Projective non-negative matrix factorization (PNMF) projects high-dimensional non-negative examples $X$ onto a lower-dimensional subspace spanned by a non-negative basis $W$ and considers $W^{T}X$ as their coefficients, i.e., $X \approx WW^{T}X$. Since PNMF learns the natural parts-based representation $W$ of $X$, it has been widely used in many fields such as pattern recognition and computer vision. However, PNMF does not perform well in classification tasks because it completely ignores the label information of the dataset. This paper proposes a Discriminant PNMF method (DPNMF) to overcome this deficiency. In particular, DPNMF incorporates Fisher's criterion into PNMF to utilize the label information. Similar to PNMF, DPNMF learns a single non-negative basis matrix and has a lower computational burden than NMF. In contrast to PNMF, DPNMF maximizes the distance between the centers of any two classes of examples while minimizing the distance between any two examples of the same class in the lower-dimensional subspace, and thus has more discriminant power. We develop a multiplicative update rule to solve DPNMF and prove its convergence. Experimental results on four popular face image datasets confirm its effectiveness compared with the representative NMF and PNMF algorithms.
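To make the PNMF approximation in this abstract concrete, the sketch below reconstructs a data matrix from a given non-negative basis by computing the coefficients W^T X and then W (W^T X). It only evaluates the reconstruction for a fixed W; learning W (for example with the paper's multiplicative update rule) is not attempted. Helper names and toy values are hypothetical.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def pnmf_reconstruct(w, x):
    """Return W (W^T X): project the columns of X onto the basis W and map back."""
    coefficients = matmul(transpose(w), x)  # r x n coefficient matrix W^T X
    return matmul(w, coefficients)


def squared_error(x, x_hat):
    """Squared Frobenius norm of X - X_hat."""
    return sum((p - q) ** 2 for rx, rh in zip(x, x_hat) for p, q in zip(rx, rh))


# Toy data: 3 features, 2 examples stored as columns.
X = [[1.0, 2.0],
     [3.0, 4.0],
     [0.0, 0.0]]
# Non-negative basis whose columns span the first two coordinates.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]
X_hat = pnmf_reconstruct(W, X)
error = squared_error(X, X_hat)
```

Because the toy examples already lie in the span of W, the reconstruction is exact here; for real data the error measures how well the single basis W explains X.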
Shen, H., Tao, D. & Ma, D. 2013, 'Dual-force ISOMAP: a new relevance feedback method for medical image retrieval', PLoS ONE, vol. 8, no. 12, p. e84096.
With great potential for assisting radiological image interpretation and decision making, content-based image retrieval in the medical domain has become a hot topic in recent years. Many methods to enhance the performance of content-based medical image retrieval have been proposed, among which the relevance feedback (RF) scheme is one of the most promising. Given user feedback information, RF algorithms interactively learn a user's preferences to bridge the "semantic gap" between low-level computerized visual features and high-level human semantic perception and thus improve retrieval performance. However, most existing RF algorithms perform in the original high-dimensional feature space and ignore the manifold structure of the low-level visual features of images. In this paper, we propose a new method, termed dual-force ISOMAP (DFISOMAP), for content-based medical image retrieval. Under the assumption that medical images lie on a low-dimensional manifold embedded in a high-dimensional ambient space, DFISOMAP operates in the following three stages. First, the geometric structure of positive examples in the learned low-dimensional embedding is preserved according to the isometric feature mapping (ISOMAP) criterion. To precisely model the geometric structure, a reconstruction error constraint is also added. Second, the average distance between positive and negative examples is maximized to separate them; this margin maximization acts as a force that pushes negative examples far away from positive examples. Finally, the similarity propagation technique is utilized to provide negative examples with another force that will pull them back into the negative sample set. We evaluate the proposed method on a subset of the IRMA medical image dataset with a RF-based medical image retrieval framework. Experimental results show that DFISOMAP outperforms popular approaches for content-based medical image retrieval in terms of accuracy and stability.
Guan, N., Wei, L., Luo, Z. & Tao, D. 2013, 'Limited-memory fast gradient descent method for graph regularized nonnegative matrix factorization', PLoS ONE, vol. 8, no. 10, p. e77162.
Graph regularized nonnegative matrix factorization (GNMF) decomposes a nonnegative data matrix $X \in \mathbb{R}^{m \times n}$ into the product of two lower-rank nonnegative factor matrices, i.e., $W \in \mathbb{R}^{m \times r}$ and $H \in \mathbb{R}^{r \times n}$ ($r < \min\{m, n\}$), and aims to preserve the local geometric structure of the dataset by minimizing the squared Euclidean distance or Kullback-Leibler (KL) divergence between $X$ and $WH$. The multiplicative update rule (MUR) is usually applied to optimize GNMF, but it suffers from slow convergence because it intrinsically advances one step along the rescaled negative gradient direction with a non-optimal step size. Recently, a multiple step-sizes fast gradient descent (MFGD) method has been proposed for optimizing NMF, which accelerates MUR by searching for the optimal step size along the rescaled negative gradient direction with Newton's method. However, the computational cost of MFGD is high because 1) the high-dimensional Hessian matrix is dense and costs too much memory; and 2) the Hessian inverse operator and its multiplication with the gradient cost too much time. To overcome these deficiencies of MFGD, we propose an efficient limited-memory FGD (L-FGD) method for optimizing GNMF. In particular, we apply the limited-memory BFGS (L-BFGS) method to directly approximate the multiplication of the inverse Hessian and the gradient for searching the optimal step size in MFGD. The preliminary results on real-world datasets show that L-FGD is more efficient than both MFGD and MUR. To evaluate the effectiveness of L-FGD, we validate its clustering performance for optimizing KL-divergence based GNMF on two popular face image datasets, ORL and PIE, and two text corpora, Reuters and TDT2. The experimental results confirm the effectiveness of L-FGD by comparing it with the representative GNMF solvers.
Zhang, C. & Tao, D. 2013, 'Risk bounds of learning processes for Lévy processes', Journal of Machine Learning Research, vol. 14, no. 1, pp. 351-376.
Lévy processes refer to a class of stochastic processes, for example, Poisson processes and Brownian motions, and play an important role in stochastic processes and machine learning. Therefore, it is essential to study risk bounds of the learning process for time-dependent samples drawn from a Lévy process (or briefly called learning process for Lévy process). It is noteworthy that samples in this learning process are not independently and identically distributed (i.i.d.). Therefore, results in traditional statistical learning theory are not applicable (or at least cannot be applied directly), because they are obtained under the sample-i.i.d. assumption. In this paper, we study risk bounds of the learning process for time-dependent samples drawn from a Lévy process, and then analyze the asymptotical behavior of the learning process. In particular, we first develop the deviation inequalities and the symmetrization inequality for the learning process. By using the resultant inequalities, we then obtain the risk bounds based on the covering number. Finally, based on the resulting risk bounds, we study the asymptotic convergence and the rate of convergence of the learning process for Lévy process. Meanwhile, we also give a comparison to the related results under the sample-i.i.d. assumption. © 2013 Chao Zhang and Dacheng Tao.
Cheng, J., Xie, C., Bian, W. & Tao, D. 2012, 'Feature fusion for 3D hand gesture recognition by learning a shared hidden space', Pattern Recognition Letters, vol. 33, no. 4, pp. 476-484.
Hand gesture recognition has been intensively applied in various human-computer interaction (HCI) systems. Different hand gesture recognition methods were developed based on particular features, e.g., gesture trajectories and acceleration signals. However, it has been noticed that the limitations of either feature can lead to flaws in an HCI system. In this paper, to overcome the limitations while combining the merits of both features, we propose a novel feature fusion approach for 3D hand gesture recognition. In our approach, gesture trajectories are represented by the intersection numbers with randomly generated line segments on their 2D principal planes, and acceleration signals are represented by the coefficients of the discrete cosine transformation (DCT). Then, a hidden space shared by the two features is learned by using penalized maximum likelihood estimation (MLE). An iterative algorithm, composed of two steps per iteration, is derived for this penalized MLE, in which the first step is to solve a standard least squares problem and the second step is to solve a Sylvester equation. We tested our hand gesture recognition approach on different hand gesture sets. Results confirm the effectiveness of the feature fusion method.
Zhang, L., Zhang, L., Tao, D. & Huang, X. 2012, 'On Combining Multiple Features for Hyperspectral Remote Sensing Image Classification', IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 3, pp. 879-893.
In hyperspectral remote sensing image classification, multiple features, e.g., spectral, texture, and shape features, are employed to represent pixels from different perspectives. It has been widely acknowledged that properly combining multiple features always results in good classification performance. In this paper, we introduce the patch alignment framework to linearly combine multiple features in the optimal way and obtain a unified low-dimensional representation of these multiple features for subsequent classification. Each feature has its particular contribution to the unified representation determined by simultaneously optimizing the weights in the objective function. This scheme considers the specific statistical properties of each feature to achieve a physically meaningful unified low-dimensional representation of multiple features. Experiments on the classification of the hyperspectral digital imagery collection experiment and reflective optics system imaging spectrometer hyperspectral data sets suggest that this scheme is effective.
Song, M., Tao, D., Chen, C., Bu, J., Luo, J. & Zhang, C. 2012, 'Probabilistic Exposure Fusion', IEEE Transactions On Image Processing, vol. 21, no. 1, pp. 341-357.
The luminance of a natural scene is often of high dynamic range (HDR). In this paper, we propose a new scheme to handle HDR scenes by integrating locally adaptive scene detail capture and suppressing gradient reversals introduced by the local adaptation. The proposed scheme is novel for capturing an HDR scene by using a standard dynamic range (SDR) device and synthesizing an image suitable for SDR displays. In particular, we use an SDR capture device to record scene details (i.e., the visible contrasts and the scene gradients) in a series of SDR images with different exposure levels. Each SDR image responds to a fraction of the HDR and partially records scene details. With the captured SDR image series, we first calculate the image luminance levels, which maximize the visible contrasts, and then the scene gradients embedded in these images. Next, we synthesize an SDR image by using a probabilistic model that preserves the calculated image luminance levels and suppresses reversals in the image luminance gradients. The synthesized SDR image contains many more scene details than any of the captured SDR images. Moreover, the proposed scheme also functions as the tone mapping of an HDR image to the SDR image, and it is superior to both global and local tone mapping operators. This is because global operators fail to preserve visual details when the contrast ratio of a scene is large, whereas local operators often produce halos in the synthesized SDR image. The proposed scheme does not require any human interaction or parameter tuning for different scenes. Subjective evaluations have shown that it is preferred over a number of existing approaches.
Song, M., Tao, D., Huang, X., Chen, C. & Bu, J. 2012, 'Three-Dimensional Face Reconstruction From A Single Image By A Coupled RBF Network', IEEE Transactions on Image Processing, vol. 21, no. 5, pp. 2887-2897.
Reconstruction of a 3-D face model from a single 2-D face image is fundamentally important for face recognition and animation because the 3-D face model is invariant to changes of viewpoint, illumination, background clutter, and occlusions. Given a coupl
Huang, Q., Tao, D., Li, X. & Liew, A. 2012, 'Parallelized Evolutionary Learning for detection of Biclusters in Gene Expression Data', IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 2, pp. 560-570.
The analysis of gene expression data obtained from microarray experiments is important for discovering the biological process of genes. Biclustering algorithms have been proven to be able to group the genes with similar expression patterns under a number of experimental conditions. In this paper, we propose a new biclustering algorithm based on evolutionary learning. By converting the biclustering problem into a common clustering problem, the algorithm can be applied in a search space constructed by the conditions. To further reduce the size of the search space, we randomly separate the full conditions into a number of condition subsets (subspaces), each of which has a smaller number of conditions. The algorithm is applied to each subspace and is able to discover bicluster seeds within a limited computing time. Finally, an expanding and merging procedure is employed to combine the bicluster seeds into larger biclusters according to a homogeneity criterion. We test the performance of the proposed algorithm using synthetic and real microarray data sets. Compared with several previously developed biclustering algorithms, our algorithm demonstrates a significant improvement in discovering additive biclusters.
Liu, T.T., Lipnicki, D., Zhu, W., Tao, D., Zhang, C., Cui, Y., Jin, J., Sachdev, P. & Wen, W. 2012, 'Cortical Gyrification And Sulcal Spans In Early Stage Alzheimer's Disease', PLoS One, vol. 7, no. 2, pp. 1-5.
Alzheimer's disease (AD) is characterized by an insidious onset of progressive cerebral atrophy and cognitive decline. Previous research suggests that cortical folding and sulcal width are associated with cognitive function in elderly individuals, and th
Tang, J., Zha, Z., Tao, D. & Chua, T. 2012, 'Semantic-Gap-Oriented Active Learning For Multilabel Image Annotation', IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 2354-2360.
User interaction is an effective way to handle the semantic gap problem in image annotation. To minimize user effort in the interactions, many active learning methods were proposed. These methods treat the semantic concepts individually or correlatively.
Zheng, S., Huang, K., Tan, T. & Tao, D. 2012, 'A Cascade Fusion Scheme For Gait And Cumulative Foot Pressure Image Recognition', Pattern Recognition, vol. 45, no. 10, pp. 3603-3610.
Cumulative foot pressure images represent the 2D ground reaction force during one gait cycle. Biomedical and forensic studies show that humans can be distinguished by unique limb movement patterns and ground reaction force. Considering continuous gait po
Deng, X., Shen, Y., Song, M., Tao, D., Bu, J. & Chen, C. 2012, 'Video-Based Non-Uniform Object Motion Blur Estimation And Deblurring', Neurocomputing, vol. 86, no. 1, pp. 170-178.
Motion deblurring is a challenging problem in computer vision. Most previous blind deblurring approaches usually assume that the Point Spread Function (PSF) is spatially invariant. However, non-uniform motions exist ubiquitously and cannot be handled suc
Bian, W., Tao, D. & Rui, Y. 2012, 'Cross-Domain Human Action Recognition', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 2, pp. 298-307.
Conventional human action recognition algorithms cannot work well when the amount of training videos is insufficient. We solve this problem by proposing a transfer topic model (TTM), which utilizes information extracted from videos in the auxiliary domai
Yu, J.X., Bian, W., Song, M., Cheng, J.L. & Tao, D. 2012, 'Graph Based Transductive Learning For Cartoon Correspondence Construction', Neurocomputing, vol. 79, pp. 105-114.
Correspondence construction of characters in key frames is the prerequisite for cartoon animations' automatic inbetweening and coloring. Since each frame of an animation consists of multiple layers, characters are complicated in terms of shape and struct
Yu, J.X., Tao, D. & Wang, M. 2012, 'Adaptive Hypergraph Learning And Its Application In Image Classification', IEEE Transactions on Image Processing, vol. 21, no. 7, pp. 3262-3272.
Recent years have witnessed a surge of interest in graph-based transductive image classification. Existing simple graph-based transductive learning methods only model the pairwise relationship of images, however, and they are sensitive to the radius para
An, L., Gao, X., Yuan, Y., Tao, D., Deng, C. & Ji, F. 2012, 'Content-Adaptive Reliable Robust Lossless Data Embedding', Neurocomputing, vol. 79, no. 1, pp. 1-11.
It is well known that robust lossless data embedding (RLDE) methods can be used to protect copyright of digital images when the intactness of host images is highly demanded and the unintentional attacks may be encountered in data communication. However,
Gao, X., Zhang, K., Tao, D. & Li, X. 2012, 'Image Super-Resolution With Sparse Neighbor Embedding', IEEE Transactions on Image Processing, vol. 21, no. 7, pp. 3194-3205.
Until now, neighbor-embedding-based (NE) algorithms for super-resolution (SR) have carried out two independent processes to synthesize high-resolution (HR) image patches. In the first process, neighbor search is performed using the Euclidean distance met
Gao, X., Zhang, K., Tao, D. & Li, X. 2012, 'Joint Learning For Single-Image Super-Resolution Via A Coupled Constraint', IEEE Transactions On Image Processing, vol. 21, no. 2, pp. 469-480.
The neighbor-embedding (NE) algorithm for single-image super-resolution (SR) reconstruction assumes that the feature spaces of low-resolution (LR) and high-resolution (HR) patches are locally isometric. However, this is not true for SR because of one-to-
Guan, N., Tao, D., Luo, Z. & Yuan, B. 2012, 'Online Nonnegative Matrix Factorization With Robust Stochastic Approximation', IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 7, pp. 1087-1099.
Nonnegative matrix factorization (NMF) has become a popular dimension-reduction method and has been widely applied to image processing and pattern recognition problems. However, conventional NMF learning methods require the entire dataset to reside in th
Zhang, Z. & Tao, D. 2012, 'Slow Feature Analysis For Human Action Recognition', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 3, pp. 436-450.
Slow Feature Analysis (SFA) extracts slowly varying features from a quickly varying input signal [1]. It has been successfully applied to modeling the visual receptive fields of the cortical neurons. Sufficient experimental results in neuroscience sugges
Guan, N., Tao, D., Luo, Z. & Yuan, B. 2012, 'NeNMF: An Optimal Gradient Method For Nonnegative Matrix Factorization', IEEE Transactions on Signal Processing, vol. 60, no. 6, pp. 2882-2898.
Nonnegative matrix factorization (NMF) is a powerful matrix decomposition technique that approximates a nonnegative matrix by the product of two low-rank nonnegative matrix factors. It has been widely applied to signal processing, computer vision, and da
Zhou, T., Tao, D. & Wu, X. 2012, 'Compressed Labeling On Distilled Labelsets For Multi-Label Learning', Machine Learning, vol. 88, no. 1-2, pp. 69-126.
Directly applying single-label classification methods to the multi-label learning problems substantially limits both the performance and speed due to the imbalance, dependence and high dimensionality of the given label matrix. Existing methods either ign
Li, J., Tao, D. & Li, X. 2012, 'A probabilistic model for image representation via multiple patterns', Pattern Recognition, vol. 45, no. 11, pp. 4044-4053.
For image analysis, an important extension to principal component analysis (PCA) is to treat an image as multiple samples, which helps alleviate the small sample size problem. Various schemes of transforming an image to multiple samples have been proposed. Although they have been shown to be effective in practice, the schemes are mainly based on heuristics and experience. In this paper, we propose a probabilistic PCA model, in which we explicitly represent the transformation scheme and incorporate the scheme as a stochastic component of the model. Therefore, fitting the model automatically learns the transformation. Moreover, the learned model allows us to distinguish regions that can be well described by the PCA model from those that need further treatment. Experiments on synthetic images and face data sets demonstrate the properties and utility of the proposed model.
Li, J. & Tao, D. 2012, 'On Preserving Original Variables in Bayesian PCA with Application to Image Analysis', IEEE Transactions On Image Processing, vol. 21, no. 12, pp. 4830-4843.
Principal component analysis (PCA) computes a succinct data representation by converting the data to a few new variables while retaining maximum variation. However, the new variables are difficult to interpret, because each one is combined with all of the original input variables and has obscure semantics. Under the umbrella of Bayesian data analysis, this paper presents a new prior to explicitly regularize combinations of input variables. In particular, the prior penalizes pair-wise products of the coefficients of PCA and encourages a sparse model. Compared to the commonly used ℓ1-regularizer, the proposed prior encourages the sparsity pattern in the resultant coefficients to be consistent with the intrinsic groups in the original input variables. Moreover, the proposed prior can be explained as recovering a robust estimation of the covariance matrix for PCA. The proposed model is suited for analyzing visual data, where it encourages the output variables to correspond to meaningful parts in the data. We demonstrate the characteristics and effectiveness of the proposed technique through experiments on both synthetic and real data.
Jiang, J., Cheng, J. & Tao, D. 2012, 'Color Biological Features-based Solder Paste Defects Detection And Classification On Printed Circuit Boards', IEEE Transactions on Components, Packaging, and Manufacturing Technology, vol. 2, no. 9, pp. 1536-1544.
Deposited solder paste inspection plays a critical role in surface mounting processes. When detecting solder pastes defects on a printed circuit board, profile measurement-based methods suffer from large system size, high cost, and low speed for inspecti
Zhang, C. & Tao, D. 2012, 'Generalization Bounds Of ERM-based Learning Processes For Continuous-time Markov Chains', IEEE Transactions On Neural Networks And Learning Systems, vol. 23, no. 12, pp. 1872-1883.
Many existing results on statistical learning theory are based on the assumption that samples are independently and identically distributed (i.i.d.). However, the assumption of i.i.d. samples is not suitable for practical application to problems in which
Bian, W. & Tao, D. 2012, 'Constrained Empirical Risk Minimization Framework For Distance Metric Learning', IEEE Transactions On Neural Networks And Learning Systems, vol. 23, no. 8, pp. 1194-1205.
Distance metric learning (DML) has received increasing attention in recent years. In this paper, we propose a constrained empirical risk minimization framework for DML. This framework enriches the state-of-the-art studies on both theoretic and algorithmi
Tian, X., Tao, D. & Rui, Y. 2012, 'Sparse Transfer Learning For Interactive Video Search Reranking', ACM Transactions on Multimedia Computing Communications and Applications, vol. 8, no. 3, pp. 1-19.
Visual reranking is effective to improve the performance of the text-based video search. However, existing reranking algorithms can only achieve limited improvement because of the well-known semantic gap between low-level visual features and high-level s
Gao, X., Wang, N., Tao, D. & Li, X. 2012, 'Face Sketch-photo Synthesis And Retrieval Using Sparse Representation', IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 8, pp. 1213-1226.
Sketch-photo synthesis plays an important role in sketch-based face photo retrieval and photo-based face sketch retrieval systems. In this paper, we propose an automatic sketch-photo synthesis and retrieval algorithm based on sparse representation. The p
Zhang, C., Bian, W., Tao, D. & Weisi, L. 2012, 'Discretized-Vapnik-Chervonenkis Dimension For Analyzing Complexity Of Real Function Classes', IEEE Transactions On Neural Networks And Learning Systems, vol. 23, no. 9, pp. 1461-1472.
In this paper, we introduce the discretized-Vapnik-Chervonenkis (VC) dimension for studying the complexity of a real function class, and then analyze properties of real function classes and neural networks. We first prove that a countable traversal set i
An, L., Gao, X., Yuan, Y. & Tao, D. 2012, 'Robust Lossless Data Hiding Using Clustering And Statistical Quantity Histogram', Neurocomputing, vol. 77, no. 1, pp. 1-11.
Lossless data hiding methods usually fail to recover the hidden messages completely when the watermarked images are attacked. Therefore, the robust lossless data hiding (RLDH), or the robust reversible watermarking technique, is urgently needed to effect
Gao, Y., Wang, M., Tao, D., Ji, R. & Dai, Q. 2012, '3-D Object Retrieval And Recognition With Hypergraph Analysis', IEEE Transactions On Image Processing, vol. 21, no. 9, pp. 4290-4303.
View-based 3-D object retrieval and recognition has become popular in practice, e.g., in computer aided design. It is difficult to precisely estimate the distance between two objects represented by multiple views. Thus, current view-based 3-D object ret
Li, Y., Geng, B., Yang, L., Xu, C. & Bian, W. 2012, 'Query Difficulty Estimation For Image Retrieval', Neurocomputing, vol. 95, pp. 48-53.
Query difficulty estimation predicts the performance of the search result of the given query. It is a powerful tool for multimedia retrieval and receives increasing attention. It can guide the pseudo relevance feedback to rerank the image search results
Zhang, K., Mu, G., Yuan, Y., Gao, X. & Tao, D. 2012, 'Video Super-resolution With 3D Adaptive Normalized Convolution', Neurocomputing, vol. 94, pp. 140-151.
The classic multi-image-based super-resolution (SR) methods typically take global motion pattern to produce one or multiple high-resolution (HR) versions from a set of low-resolution (LR) images. However, due to the influence of aliasing and noise, it is
Han, Y., Wu, F., Tao, D., Shao, J., Zhuang, Y. & Jiang, J. 2012, 'Sparse Unsupervised Dimensionality Reduction For Multiple View Data', IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 10, pp. 1485-1496.
Different kinds of high-dimensional visual features can be extracted from a single image. Images can thus be treated as multiple view data when taking each type of extracted high-dimensional visual feature as a particular understanding of images. In this
An, L., Gao, X., Li, X., Tao, D., Deng, C. & Li, J. 2012, 'Robust Reversible Watermarking Via Clustering And Enhanced Pixel-wise Masking', IEEE Transactions On Image Processing, vol. 21, no. 8, pp. 3598-3611.
Robust reversible watermarking (RRW) methods are popular in multimedia for protecting copyright, while preserving intactness of host images and providing robustness against unintentional attacks. However, conventional RRW methods are not readily applicab
Li, Y., Geng, B., Tao, D., Zha, Z., Yang, L. & Xu, C. 2012, 'Difficulty Guided Image Retrieval Using Linear Multiple Feature Embedding', IEEE Transactions On Multimedia, vol. 14, no. 6, pp. 1618-1630.
Existing image retrieval systems suffer from a performance variance for different queries. Severe performance variance may greatly degrade the effectiveness of the subsequent query-dependent ranking optimization algorithms, especially those that utilize
Liu, X., Song, M., Zhao, Q., Tao, D., Chen, C. & Bu, J. 2012, 'Attribute-restricted Latent Topic Model For Person Re-identification', Pattern Recognition, vol. 45, no. 12, pp. 4204-4213.
Searching for specific persons from surveillance videos captured by different cameras, known as person re-identification, is a key yet under-addressed challenge. Difficulties arise from the large variations of human appearance in different poses, and fro
Cheng, J., Tao, D., Liu, J., Wong, D.W., Tan, N., Wong, T.Y. & Saw, S. 2012, 'Peripapillary Atrophy Detection By Sparse Biologically Inspired Feature Manifold', IEEE Transactions on Medical Imaging, vol. 31, no. 12, pp. 2355-2365.
Peripapillary atrophy (PPA) is an atrophy of pre-existing retina tissue. Because of its association with eye diseases such as myopia and glaucoma, PPA is an important indicator for diagnosis of these diseases. Experienced ophthalmologists are able to det
Geng, B., Li, Y., Tao, D., Wang, M., Zha, Z. & Xu, C. 2012, 'Parallel lasso for large-scale video concept detection', IEEE Transactions On Multimedia, vol. 14, no. 1, pp. 55-65.
Existing video concept detectors are generally built upon the kernel based machine learning techniques, e.g., support vector machines, regularized least squares, and logistic regression, just to name a few. However, in order to build robust detectors, the learning process suffers from the scalability issues including the high-dimensional multi-modality visual features and the large-scale keyframe examples. In this paper, we propose parallel lasso (Plasso) by introducing the parallel distributed computation to significantly improve the scalability of lasso (the l1-regularized least squares). We apply the parallel incomplete Cholesky factorization to approximate the covariance statistics in the preprocess step, and the parallel primal-dual interior-point method with the Sherman-Morrison-Woodbury formula to optimize the model parameters. For a dataset with n samples in a d-dimensional space, compared with lasso, Plasso significantly reduces complexities from the original O(d^3) for computational time and O(d^2) for storage space to O(h^2 d/m) and O(hd/m), respectively, if the system has m processors and the reduced dimension h is much smaller than the original dimension d.
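To give a feel for the complexity reduction quoted in the abstract above (time cubic in d and storage quadratic in d for lasso, versus terms that scale with h squared times d over m and h times d over m for Plasso), the sketch below evaluates the leading-order terms for made-up sizes. Constant factors are ignored, the function name `lasso_vs_plasso` is hypothetical, and the values of d, h, and m are illustrative assumptions, not figures from the paper.

```python
def lasso_vs_plasso(d, h, m):
    """Leading-order cost terms from the abstract, with constants ignored.

    d: original feature dimension
    h: reduced dimension (assumed much smaller than d)
    m: number of processors
    """
    return {
        "lasso_time": d ** 3,
        "lasso_space": d ** 2,
        "plasso_time": h ** 2 * d / m,
        "plasso_space": h * d / m,
    }


# Illustrative sizes only: 10,000 features, reduced dimension 100, 10 processors.
costs = lasso_vs_plasso(d=10_000, h=100, m=10)
time_ratio = costs["lasso_time"] / costs["plasso_time"]
space_ratio = costs["lasso_space"] / costs["plasso_space"]
```

With these assumed sizes the time term drops from 10^12 to 10^7 and the storage term from 10^8 to 10^5, which shows why the reduction matters only when h is much smaller than d.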
Yu, J., Cheng, J. & Tao, D. 2012, 'Interactive cartoon reusing by transfer learning', Signal Processing, vol. 92, no. 9, pp. 2147-2158.
Cartoon character retrieval is critical for cartoonists to effectively and efficiently make cartoons by reusing existing cartoon data. To successfully achieve these tasks, it is essential to extract visual features to comprehensively represent cartoon characters and accurately estimate dissimilarity between cartoon characters. In this paper, we define three visual features: Hausdorff contour feature (HCF), color histogram (CH) and motion feature (MF), to characterize the shape, color and motion structure information of a cartoon character. The HCF can be referred to as an intra-feature, and CH and MF can be regarded as inter-features. However, due to the semantic gap, cartoon retrieval using these visual features still cannot achieve excellent performance. Since labeling information has been proven effective in reducing the semantic gap, we introduce a labeling procedure called interactive cartoon labeling (ICL). The labeling information reflects the user's retrieval purpose. A new dimension reduction tool, termed sparse transfer learning (SPA-TL), is adopted to effectively and efficiently encode the user's search intention. In particular, SPA-TL exploits two pieces of knowledge data, i.e., the labeling knowledge contained in labeled data and the data distribution knowledge contained in all samples (labeled and unlabeled). The low-dimensional subspace is obtained by transferring the user feedback knowledge from labeled samples to unlabeled samples by preserving the sample distribution knowledge. Experimental evaluations in cartoon synthesis suggest the effectiveness of the visual features and SPA-TL.
He, L., Wang, D., Li, X., Tao, D., Gao, X. & Fei, G. 2012, 'Color fractal structure model for reduced-reference colorful image quality assessment', Lecture Notes in Computer Science, vol. 7664, pp. 401-408.
Developing reduced reference image quality assessment (RR-IQA) plays a vital role in dealing with the prediction of the visual quality of distorted images. However, most existing methods fail to take color information into consideration, although color distortion is significant given the increasing number of color images. To solve this problem, this paper proposes a novel IQA method that focuses on color distortion. In particular, we extract color features based on the model of color fractal structure. Then the color and structure features are mapped into visual quality using support vector regression. Experimental results on the LIVE II database demonstrate that the proposed method is consistent with human perception, especially on images with color distortion.
Yu, J., Liu, D., Tao, D. & Seah, H. 2012, 'On Combining Multiple Features for Cartoon Character Retrieval and Clip Synthesis', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 5, pp. 1413-1427.
How do we retrieve cartoon characters accurately? And how do we synthesize new cartoon clips smoothly and efficiently from the cartoon library? Both questions are important for animators and cartoon enthusiasts to design and create new cartoons by utilizing existing cartoon materials. The first key issue in answering those questions is to find a proper representation that describes the cartoon character effectively. In this paper, we consider multiple features from different views, i.e., color histogram, Hausdorff edge feature, and skeleton feature, to represent cartoon characters with different colors, shapes, and gestures. Each visual feature reflects a unique characteristic of a cartoon character, and they are complementary to each other for retrieval and synthesis. However, how to combine the three visual features is the second key issue of our application. Simply concatenating them into a long vector runs into the so-called curse of dimensionality, let alone the heterogeneity of features embedded in different visual feature spaces. Here, we introduce a semisupervised multiview subspace learning (semi-MSL) algorithm to encode different features in a unified space. Specifically, under the patch alignment framework, semi-MSL uses the discriminative information from labeled cartoon characters in the construction of local patches where the manifold structure revealed by unlabeled cartoon characters is utilized to capture the geometric distribution. The experimental evaluations based on both cartoon character retrieval and clip synthesis demonstrate the effectiveness of the proposed method for cartoon application. Moreover, additional results of content-based image retrieval on benchmark data suggest the generality of semi-MSL for other applications.
Hong, Z., Mei, X. & Tao, D. 2012, 'Dual-Force Metric Learning for Robust Distracter-Resistant Tracker', Lecture Notes in Computer Science, vol. 7572, pp. 513-527.
In this paper, we propose a robust distracter-resistant tracking approach by learning a discriminative metric that adaptively learns the importance of features on-the-fly.
Zhang, Z., Cheng, J., Li, J., Bian, W. & Tao, D. 2012, 'Segment-Based Features for Time Series Classification', Computer Journal, vol. 55, no. 9, pp. 1088-1102.
In this paper, we propose an approach termed segment-based features (SBFs) to classify time series. The approach is inspired by the success of the component- or part-based methods of object recognition in computer vision, in which a visual object is described as a number of characteristic parts and the relations among the parts. Utilizing this idea in the problem of time series classification, a time series is represented as a set of segments and the corresponding temporal relations. First, a number of interest segments are extracted by interest point detection with automatic scale selection. Then, a number of feature prototypes are collected by random sampling from the segment set, where each feature prototype may include a single segment or multiple ordered segments. Subsequently, each time series is transformed to a standard feature vector, i.e. SBF, where each entry in the SBF is calculated as the maximum response (maximum similarity) of the corresponding feature prototype to the segment set of the time series.
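The final step described in the abstract above, where each entry of the feature vector is the maximum similarity between one prototype and the segments of a series, can be sketched as follows. This is a simplified illustration, not the authors' implementation: it covers only single-segment prototypes of equal length, skips interest point detection and temporal relations, and uses a Gaussian similarity chosen for this example. The names `similarity` and `sbf_vector` are hypothetical.

```python
import math


def similarity(a, b):
    """Gaussian similarity between two equal-length segments (an assumed choice)."""
    dist_sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-dist_sq)


def sbf_vector(segments, prototypes):
    """One entry per prototype: its maximum response over the series' segment set."""
    return [max(similarity(p, s) for s in segments) for p in prototypes]


# Made-up segments extracted from one time series, and two made-up prototypes.
segments = [[0.0, 1.0], [2.0, 2.0]]
prototypes = [[0.0, 1.0], [5.0, 5.0]]
features = sbf_vector(segments, prototypes)
```

Because every series is mapped to a vector with one entry per prototype, series with different numbers of segments end up with vectors of the same length, which is what lets a standard classifier be applied afterwards.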
Yu, J., Wang, M. & Tao, D. 2012, 'Semisupervised multiview distance metric learning for cartoon synthesis', IEEE Transactions On Image Processing, vol. 21, no. 11, pp. 4636-4648.
In image processing, cartoon character classification, retrieval, and synthesis are critical, so that cartoonists can effectively and efficiently make cartoons by reusing existing cartoon data. To successfully achieve these tasks, it is essential to extract visual features that comprehensively represent cartoon characters and to construct an accurate distance metric to precisely measure the dissimilarities between cartoon characters. In this paper, we introduce three visual features, color histogram, shape context, and skeleton, to characterize the color, shape, and action, respectively, of a cartoon character. These three features are complementary to each other, and each feature set is regarded as a single view. However, it is improper to concatenate these three features into a long vector, because they have different physical properties, and simply concatenating them into a high-dimensional feature vector will suffer from the so-called curse of dimensionality. Hence, we propose a semisupervised multiview distance metric learning (SSM-DML). SSM-DML learns the multiview distance metrics from multiple feature sets and from the labels of unlabeled cartoon characters simultaneously, under the umbrella of graph-based semisupervised learning. SSM-DML discovers complementary characteristics of different feature sets through an alternating optimization-based iterative algorithm. Therefore, SSM-DML can simultaneously accomplish cartoon character classification and dissimilarity measurement. On the basis of SSM-DML, we develop a novel system that composes the modules of multiview cartoon character classification, multiview graph-based cartoon synthesis, and multiview retrieval-based cartoon synthesis. Experimental evaluations based on the three modules suggest the effectiveness of SSM-DML in cartoon applications.
Zhang, K., Gao, X., Tao, D. & Li, X. 2012, 'Single image super-resolution with non-local means and steering kernel regression', IEEE Transactions On Image Processing, vol. 21, no. 11, pp. 4544-4556.
Image super-resolution (SR) reconstruction is essentially an ill-posed problem, so it is important to design an effective prior. For this purpose, we propose a novel image SR method by learning both non-local and local regularization priors from a given low-resolution image. The non-local prior takes advantage of the redundancy of similar patches in natural images, while the local prior assumes that a target pixel can be estimated by a weighted average of its neighbors. Based on the above considerations, we utilize the non-local means filter to learn a non-local prior and the steering kernel regression to learn a local prior. By assembling the two complementary regularization terms, we propose a maximum a posteriori probability framework for SR recovery. Thorough experimental results suggest that the proposed SR method can reconstruct higher quality results both quantitatively and perceptually.
Wang, M., Li, H., Tao, D., Lu, K. & Wu, X. 2012, 'Multimodal graph-based reranking for web image search', IEEE Transactions On Image Processing, vol. 21, no. 11, pp. 4649-4661.
This paper introduces a web image search reranking approach that explores multiple modalities in a graph-based learning scheme. Different from the conventional methods that usually adopt a single modality or integrate multiple modalities into a long feature vector, our approach can effectively integrate the learning of relevance scores, weights of modalities, and the distance metric and its scaling for each modality into a unified scheme. In this way, the effects of different modalities can be adaptively modulated and better reranking performance can be achieved. We conduct experiments on a large dataset that contains more than 1000 queries and 1 million images to evaluate our approach. Experimental results demonstrate that the proposed reranking approach is more robust than using each individual modality, and it also performs better than many existing methods.
Geng, B., Tao, D., Xu, C., Yang, L. & Hua, X. 2012, 'Ensemble Manifold Regularization', IEEE Transactions On Pattern Analysis And Machine Intelligence, vol. 34, no. 6, pp. 1227-1233.
We propose an automatic approximation of the intrinsic manifold for general semi-supervised learning (SSL) problems. Unfortunately, it is not trivial to define an optimization function to obtain optimal hyperparameters. Usually, cross validation is applied, but it does not necessarily scale up. Other problems derive from the suboptimality incurred by discrete grid search and the overfitting. Therefore, we develop an ensemble manifold regularization (EMR) framework to approximate the intrinsic manifold by combining several initial guesses. Algorithmically, we designed EMR carefully so it 1) learns both the composite manifold and the semi-supervised learner jointly, 2) is fully automatic for learning the intrinsic manifold hyperparameters implicitly, 3) is conditionally optimal for intrinsic manifold approximation under a mild and reasonable assumption, and 4) is scalable for a large number of candidate manifold hyperparameters, from both time and space perspectives. Furthermore, we prove the convergence property of EMR to the deterministic matrix at rate root-n. Extensive experiments over both synthetic and real data sets demonstrate the effectiveness of the proposed framework.
Su, Y., Gao, X., Li, X. & Tao, D. 2012, 'Multivariate multilinear regression', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 6, pp. 1560-1573.
Conventional regression methods, such as multivariate linear regression (MLR) and its extension principal component regression (PCR), work well when the data take the form of low-dimensional vectors. When the dimension grows higher, it leads to the under sample problem (USP): the dimensionality of the feature space is much higher than the number of training samples. However, little attention has been paid to such a problem. This paper first conducts an in-depth investigation of the USP in PCR, which answers three questions: 1) Why is USP produced? 2) What is the condition for USP? and 3) How does USP influence regression? With the help of the above analysis, the principal components selection problem of PCR is presented. Subsequently, to address the problem of PCR, a multivariate multilinear regression (MMR) model is proposed which gives a substitutive solution to MLR, under the condition of multilinear objects. The basic idea of MMR is to transfer the multilinear structure of objects into the regression coefficients as a constraint. As a result, the regression problem is reduced to finding two low-dimensional coefficients so that the principal components selection problem is avoided. Moreover, the sample size needed for solving MMR is greatly reduced so that USP is alleviated. As there is no closed-form solution for MMR, an alternative projection procedure is designed to obtain the regression matrices. For the sake of completeness, the computational cost is analyzed and convergence is proved. Furthermore, MMR is applied to model the fitting procedure in the active appearance model (AAM). Experiments conducted on both a carefully designed synthetic data set and AAM fitting databases verify the theoretical analysis.
Yu, J., Cheng, J., Wang, J. & Tao, D. 2012, 'Transductive Cartoon Retrieval by Multiple Hypergraph Learning', Lecture Notes in Computer Science, vol. 7665, pp. 269-276.
Cartoon character retrieval frequently suffers from the distance estimation problem. In this paper, a multiple hypergraph fusion based approach is presented to solve this problem. We build multiple hypergraphs on cartoon characters based on their features. In these hypergraphs, each vertex is a character, and an edge links to multiple vertices. In this way, the distance estimation between characters is avoided and the high-order relationship among characters can be explored. Retrieval experiments are conducted on cartoon datasets, and the results demonstrate that the proposed approach can achieve better performance than state-of-the-art methods.
Fei, G., Tao, D., Li, X., Gao, X. & He, L. 2012, 'Local Structure Divergence Index for Image Quality Assessment', Lecture Notes in Computer Science, vol. 7667, pp. 337-344.
Image quality assessment (IQA) algorithms are important for image-processing systems, and structure information plays a significant role in the development of IQA metrics. In contrast to existing structure driven IQA algorithms that measure the structure information using the normalized image or gradient amplitudes, we present a new Local Structure Divergence (LSD) index based on the local structures contained in an image. In particular, we exploit the steering kernels to describe local structures. Afterward, we estimate the quality of a given image by calculating the symmetric Kullback-Leibler divergence (SKLD) between kernels of the reference image and the distorted image. Experimental results on the LIVE database II show that LSD performs consistently with human perception with a high confidence, and outperforms representative structure driven IQA metrics across various distortions.
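The divergence step in the abstract above can be illustrated with a minimal sketch. It assumes each kernel has been flattened and normalized into a discrete distribution, and it takes the symmetric divergence as the plain sum KL(p||q) + KL(q||p); the paper may use a different normalization or scaling. The function names and the toy kernel values are hypothetical and are not taken from the paper.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps guards against log(0); it is an assumption of this sketch.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Symmetric KL divergence, taken here as KL(p||q) + KL(q||p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

def normalize(weights):
    """Scale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

# Toy stand-ins for a reference kernel and a distorted kernel (hypothetical values).
ref_kernel = normalize([1.0, 2.0, 4.0, 2.0, 1.0])
dist_kernel = normalize([1.0, 1.0, 2.0, 3.0, 3.0])

score = symmetric_kl(ref_kernel, dist_kernel)
print(f"symmetric KL divergence: {score:.4f}")
```

A larger value indicates that the local structure of the distorted patch departs further from the reference; identical kernels give zero.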
Gao, Y., Wang, M., Tao, D., Ji, R. & Dai, Q. 2012, '3D Object Retrieval and Recognition With Hypergraph Analysis', IEEE Transactions On Image Processing, vol. 21, no. 9, pp. 4290-4303.
View-based 3-D object retrieval and recognition has become popular in practice, e.g., in computer aided design. It is difficult to precisely estimate the distance between two objects represented by multiple views. Thus, current view-based 3-D object retrieval and recognition methods may not perform well. In this paper, we propose a hypergraph analysis approach to address this problem by avoiding the estimation of the distance between objects. In particular, we construct multiple hypergraphs for a set of 3-D objects based on their 2-D views. In these hypergraphs, each vertex is an object, and each edge is a cluster of views. Therefore, an edge connects multiple vertices. We define the weight of each edge based on the similarities between any two views within the cluster. Retrieval and recognition are performed based on the hypergraphs. Therefore, our method can explore the higher order relationship among objects and does not use the distance between objects. We conduct experiments on the National Taiwan University 3-D model dataset and the ETH 3-D object collection. Experimental results demonstrate the effectiveness of the proposed method by comparing with the state-of-the-art methods.
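The hypergraph construction described above, where each vertex is an object and each edge is a cluster of views, can be sketched as an incidence matrix. This is only an illustration under stated assumptions: the view clusters are supplied by hand rather than computed, edge weights are omitted, and `build_incidence`, `view_owner`, and `view_clusters` are hypothetical names.

```python
def build_incidence(view_owner, view_clusters, num_objects):
    """Return H with H[v][e] = 1 if object v owns at least one view in cluster e."""
    H = [[0] * len(view_clusters) for _ in range(num_objects)]
    for e, cluster in enumerate(view_clusters):
        for view in cluster:
            H[view_owner[view]][e] = 1
    return H

# Six 2-D views belonging to three 3-D objects (view index -> object index).
view_owner = [0, 0, 1, 1, 2, 2]
# Hand-made view clusters; a real system would obtain these by clustering view features.
view_clusters = [[0, 2], [1, 3, 4], [5]]

H = build_incidence(view_owner, view_clusters, num_objects=3)
for row in H:
    print(row)
```

Because a cluster may contain views of several objects, a single edge can connect more than two vertices, which is what lets the hypergraph capture higher order relationships without an explicit object-to-object distance.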
Si, S., Tao, D., Wang, M. & Chan, K.P. 2012, 'Social image annotation via cross-domain subspace learning', Multimedia Tools and Applications, vol. 56, no. 1, pp. 91-108.
In recent years, cross-domain learning algorithms have attracted much attention as a way to address the problem of insufficient labeled data. However, these cross-domain learning algorithms cannot be applied for subspace learning, which plays a key role in multimedia processing. This paper envisions the cross-domain discriminative subspace learning and provides an effective solution to cross-domain subspace learning. In particular, we propose the cross-domain discriminative locally linear embedding or CDLLE for short. CDLLE connects the training and the testing samples by minimizing the quadratic distance between the distribution of the training samples and that of the testing samples. Therefore, a common subspace for data representation can be preserved. We expect that the discriminative information that separates the concepts in the training set can be shared to separate the concepts in the testing set as well, and thus we have a chance to address the above cross-domain problem. The margin maximization is adopted in CDLLE so the discriminative information for separating different classes can be well preserved. Finally, CDLLE encodes the local geometry of each training sample through a series of linear coefficients which can reconstruct a given sample by its intra-class neighbour samples and thus can locally preserve the intra-class local geometry. Experimental evidence on NUS-WIDE, a popular social image database collected from Flickr, and MSRA-MM, a popular real-world web image annotation database collected from the Internet by using Microsoft Live Search, demonstrates the effectiveness of CDLLE for real-world cross-domain applications. © 2010 Springer Science+Business Media, LLC.
Cheng, J.L., Qiao, M., Bian, W. & Tao, D. 2011, '3D Human Posture Segmentation By Spectral Clustering With Surface Normal Constraint', Signal Processing, vol. 91, no. 9, pp. 2204-2212.
In this paper, we propose a new algorithm for partitioning human posture represented by 3D point clouds sampled from the surface of human body. The algorithm is formed as a constrained extension of the recently developed segmentation method, spectral clu
Guan, N., Tao, D., Luo, Z. & Yuan, B. 2011, 'Manifold Regularized Discriminative Non-negative Matrix Factorization With Fast Gradient Descent', IEEE Transactions On Image Processing, vol. 20, no. 7, pp. 2030-2048.
Nonnegative matrix factorization (NMF) has become a popular data-representation method and has been widely used in image processing and pattern-recognition problems. This is because the learned bases can be interpreted as a natural parts-based representation of data and this interpretation is consistent with the psychological intuition of combining parts to form a whole. For practical classification tasks, however, NMF ignores both the local geometry of data and the discriminative information of different classes. In addition, existing research results show that the learned basis is unnecessarily parts-based because there is neither explicit nor implicit constraint to ensure the representation parts-based. In this paper, we introduce the manifold regularization and the margin maximization to NMF and obtain the manifold regularized discriminative NMF (MD-NMF) to overcome the aforementioned problems. The multiplicative update rule (MUR) can be applied to optimizing MD-NMF, but it converges slowly. In this paper, we propose a fast gradient descent (FGD) to optimize MD-NMF. FGD contains a Newton method that searches the optimal step length, and thus, FGD converges much faster than MUR. In addition, FGD includes MUR as a special case and can be applied to optimizing NMF and its variants. For a problem with 165 samples in .., FGD converges in 28 s, while MUR requires 282 s. We also apply FGD in a variant of MD-NMF and experimental results confirm its efficiency. Experimental results on several face image datasets suggest the effectiveness of MD-NMF.
Sha, T., Song, M., Bu, J., Chen, C. & Tao, D. 2011, 'Feature Level Analysis For 3D Facial Expression Recognition', Neurocomputing, vol. 74, no. 12-13, pp. 2135-2141.
3D facial expression recognition has great potential in human computer interaction and intelligent robot systems. In this paper, we propose a two-step approach which combines both the feature selection and the feature fusion techniques to choose more com
Bian, W. & Tao, D. 2011, 'Max-Min Distance Analysis By Using Sequential SDP Relaxation For Dimension Reduction', IEEE Transactions On Pattern Analysis And Machine Intelligence, vol. 33, no. 5, pp. 1037-1050.
We propose a new criterion for discriminative dimension reduction, max-min distance analysis (MMDA). Given a data set with C classes, represented by homoscedastic Gaussians, MMDA maximizes the minimum pairwise distance of these C classes in the selected low-dimensional subspace. Thus, unlike Fisher's linear discriminant analysis (FLDA) and other popular discriminative dimension reduction criteria, MMDA duly considers the separation of all class pairs. To deal with general case of data distribution, we also extend MMDA to kernel MMDA (KMMDA). Dimension reduction via MMDA/KMMDA leads to a nonsmooth max-min optimization problem with orthonormal constraints. We develop a sequential convex relaxation algorithm to solve it approximately. To evaluate the effectiveness of the proposed criterion and the corresponding algorithm, we conduct classification and data visualization experiments on both synthetic data and real data sets. Experimental results demonstrate the effectiveness of MMDA/KMMDA associated with the proposed optimization algorithm.
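The max-min criterion stated above can be illustrated by evaluating the minimum pairwise distance between projected class means for a few candidate subspaces. This sketch only scores given projections; it does not implement the sequential SDP relaxation used in the paper to search for the best one. It also assumes an identity within-class covariance, so plain Euclidean distance between means is used, and the names `project` and `min_pairwise_distance` are hypothetical.

```python
import math
from itertools import combinations

def project(vec, basis):
    """Project a vector onto the rows of an orthonormal basis."""
    return [sum(v * b for v, b in zip(vec, row)) for row in basis]

def min_pairwise_distance(class_means, basis):
    """Smallest Euclidean distance between any two projected class means."""
    projected = [project(m, basis) for m in class_means]
    return min(math.dist(a, b) for a, b in combinations(projected, 2))

# Three class means in 2-D (toy values) and three candidate 1-D subspaces.
class_means = [(0.0, 0.0), (4.0, 0.0), (0.0, 1.0)]
s = 1.0 / math.sqrt(2.0)
candidates = {
    "x-axis": [[1.0, 0.0]],
    "y-axis": [[0.0, 1.0]],
    "diagonal": [[s, s]],
}

for name, basis in candidates.items():
    print(name, round(min_pairwise_distance(class_means, basis), 4))
```

In this toy case the x-axis keeps the two most distant classes far apart but collapses the other pair onto the same point, whereas the diagonal keeps every pair separated, which is the behavior a max-min criterion favors.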
Xie, B., Wang, M. & Tao, D. 2011, 'Toward The Optimization Of Normalized Graph Laplacian', IEEE Transactions On Neural Networks, vol. 22, no. 4, pp. 660-666.
Normalized graph Laplacian has been widely used in many practical machine learning algorithms, e.g., spectral clustering and semisupervised learning. However, all of them use the Euclidean distance to construct the graph Laplacian, which does not necessarily reflect the inherent distribution of the data. In this brief, we propose a method to directly optimize the normalized graph Laplacian by using pairwise constraints. The learned graph is consistent with equivalence and nonequivalence pairwise relationships, and thus it can better represent similarity between samples. Meanwhile, our approach, unlike metric learning, automatically determines the scale factor during the optimization. The learned normalized Laplacian matrix can be directly applied in spectral clustering and semisupervised learning algorithms. Comprehensive experiments demonstrate the effectiveness of the proposed approach.
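For context, the conventional construction that the abstract contrasts itself with can be sketched as follows: a Gaussian affinity built from Euclidean distances, followed by the symmetric normalization L = I - D^{-1/2} W D^{-1/2}. This is the baseline, not the optimized Laplacian proposed in the paper. The hand-set scale `sigma` and the function name are assumptions of this sketch, and at least two distinct points are required so that every degree is positive.

```python
import math

def normalized_laplacian(points, sigma=1.0):
    """Return L = I - D^{-1/2} W D^{-1/2} for a Gaussian affinity graph."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    deg = [sum(row) for row in W]  # vertex degrees (diagonal of D)
    return [[(1.0 if i == j else 0.0) - W[i][j] / math.sqrt(deg[i] * deg[j])
             for j in range(n)] for i in range(n)]

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
L = normalized_laplacian(points, sigma=1.0)
for row in L:
    print([round(x, 4) for x in row])
```

Here the scale factor `sigma` must be chosen by hand, which is the kind of choice the abstract says the proposed approach determines automatically during optimization.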
Gao, X., Wang, B., Tao, D. & Li, X. 2011, 'A Relay Level Set Method For Automatic Image Segmentation', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 41, no. 2, pp. 518-525.
This paper presents a new image segmentation method that applies an edge-based level set method in a relay fashion. The proposed method segments an image in a series of nested subregions that are automatically created by shrinking the stabilized curves in their previous subregions. The final result is obtained by combining all boundaries detected in these subregions. The proposed method has the following three advantages: 1) It can be automatically executed without human-computer interactions; 2) it applies the edge-based level set method with relay fashion to detect all boundaries; and 3) it automatically obtains a full segmentation without specifying the number of relays in advance. The comparison experiments illustrate that the proposed method performs better than the representative level set methods, and it can obtain similar or better results compared with other popular segmentation algorithms.
Zhang, L., Zhang, L., Tao, D. & Huang, X. 2011, 'A Multifeature Tensor For Remote-Sensing Target Recognition', IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 2, pp. 374-378.
In remote-sensing image target recognition, the target or background object is usually transformed to a feature vector, such as a spectral feature vector. However, this kind of vector represents only one pixel of a remote-sensing image that considers the
Liu, D., Chen, Q., Yu, J., Gu, H., Tao, D. & Seah, H. 2011, 'Stroke Correspondence Construction Using Manifold Learning', Computer Graphics Forum, vol. 30, no. 8, pp. 2194-2207.
Stroke correspondence construction is a precondition for generating inbetween frames from a set of key frames. In our case, each stroke in a key frame is a vector represented as a Disk B-Spline Curve (DBSC) which is a flexible and compact vector format. However, it is not easy to construct correspondences between multiple DBSC strokes effectively because of the following points: (1) with the use of shape descriptors, the dimensionality of the feature space is high; (2) the number of strokes in different key frames is usually large and different from each other and (3) the length of corresponding strokes can be very different. The first point makes matching difficult. The other two points imply many to many and part to whole correspondences between strokes. To solve these problems, this paper presents a DBSC stroke correspondence construction approach, which introduces a manifold learning technique to the matching process. Moreover, in order to handle the mapping between unequal numbers of strokes with different lengths, a stroke reconstruction algorithm is developed to convert the many to many and part to whole stroke correspondences to one to one compound stroke correspondence.
Gao, X., Wang, X., Li, X. & Tao, D. 2011, 'Transfer latent variable model based on divergence analysis', Pattern Recognition, vol. 44, no. 10-11, pp. 2358-2366.
Latent variable models are powerful dimensionality reduction approaches in machine learning and pattern recognition. However, this kind of method only works well under a necessary and strict assumption that the training samples and testing samples are independent and identically distributed. When the samples come from different domains, the distribution of the testing dataset will not be identical to that of the training dataset. Therefore, the performance of latent variable models will be degraded because the parameters of the trained model do not suit the testing dataset. This case limits the generalization and application of the traditional latent variable models. To handle this issue, a transfer learning framework for latent variable models is proposed which can utilize the distance (or divergence) of the two datasets to modify the parameters of the obtained latent variable model. So we do not need to rebuild the model and only adjust the parameters according to the divergence, which adapts the model to different datasets. Experimental results on several real datasets demonstrate the advantages of the proposed framework. © 2010 Elsevier Ltd. All rights reserved.
Gao, X., Niu, Z., Tao, D. & Li, X. 2011, 'Non-Goal Scene Analysis for Soccer Video', Neurocomputing, vol. 74, no. 4, pp. 540-548.
Broadcast soccer video is usually recorded by one main camera, which constantly follows the part of the playfield where a highlight event is happening. The camera parameters and their variation are therefore closely related to the semantic information of soccer video, and camera calibration for soccer video has attracted much interest. Previous calibration methods either deal only with goal scenes or have strict calibration conditions and high complexity, so they do not properly handle non-goal scenes such as midfield or center-forward scenes. In this paper, based on a new soccer field model, a field symbol extraction algorithm is proposed to extract the calibration information. Then a two-stage calibration approach is developed which can calibrate the camera not only for goal scenes but also for non-goal scenes. The preliminary experimental results demonstrate its robustness and accuracy.
Wang, Y., Tao, D., Gao, X., Li, X. & Wang, B. 2011, 'Mammographic Mass Segmentation: Embedding Multiple Features In Vector-Valued Level Set In Ambiguous Regions', Pattern Recognition, vol. 44, no. 9, pp. 1903-1915.
Mammographic mass segmentation plays an important role in computer-aided diagnosis systems. It is very challenging because masses are always of low contrast with ambiguous margins, connected with the normal tissues, and of various scales and complex shap
Gao, X., Chen, J.F., Tao, D. & Li, X. 2011, 'Multi-Sensor Centralized Fusion Without Measurement Noise Covariance By Variational Bayesian Approximation', IEEE Transactions On Aerospace And Electronic Systems, vol. 47, no. 1, pp. 718-727.
The work presented here solves the multi-sensor centralized fusion problem in the linear Gaussian model without the measurement noise variance. We generalize the variational Bayesian approximation based adaptive Kalman filter (VB_AKF) from the single sen
Yu, J., Tao, D., Wang, M. & Cheng, J. 2011, 'Semi-automatic cartoon generation by motion planning', Multimedia Systems, vol. 17, no. 5, pp. 409-419.
To reduce tedious work in cartoon animation, some computer-assisted systems including automatic Inbetweening and cartoon reusing systems have been proposed. In existing automatic Inbetweening systems, accurate correspondence construction, which is a prer
Yu, J., Liu, D., Tao, D. & Seah, H. 2011, 'Complex Object Correspondence Construction in Two-Dimensional Animation', IEEE Transactions On Image Processing, vol. 20, no. 11, pp. 3257-3269.
Correspondence construction of objects in key frames is the precondition for inbetweening and coloring in 2-D computer-assisted animation production. Since each frame of an animation consists of multiple layers, objects are complex in terms of shape and
Gao, X., Fu, R., Li, X., Tao, D., Zhang, B. & Yang, H. 2011, 'Aurora image segmentation by combining patch and texture thresholding', Computer Vision and Image Understanding, vol. 115, no. 3, pp. 390-402.
The proportion of aurora to the field-of-view in temporal series of all-sky images is an important index to investigate the evolvement of aurora. To obtain such an index, a crucial phase is to segment the aurora from the background of sky. A new aurora s
Gao, X., Wang, X., Tao, D. & Li, X. 2011, 'Supervised Gaussian Process Latent Variable Model for Dimensionality Reduction', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 41, no. 2, pp. 425-434.
The Gaussian process latent variable model (GP-LVM) has been identified to be an effective probabilistic approach for dimensionality reduction because it can obtain a low-dimensional manifold of a data set in an unsupervised fashion. Consequently, the GP
Huang, K., Tao, D., Yuan, Y., Li, X. & Tan, T. 2011, 'Biologically Inspired Features for Scene Classification in Video Surveillance', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 41, no. 1, pp. 307-313.
Inspired by human visual cognition mechanism, this paper first presents a scene classification method based on an improved standard model feature. Compared with state-of-the-art efforts in scene classification, the newly proposed method is more robust, m
Zhou, T., Tao, D. & Wu, X. 2011, 'Manifold elastic net: a unified framework for sparse dimension reduction', Data Mining and Knowledge Discovery, vol. 22, no. 3, pp. 340-371.
It is difficult to find the optimal sparse solution of a manifold learning based dimensionality reduction algorithm. The lasso or the elastic net penalized manifold learning based dimensionality reduction is not directly a lasso penalized least square pr
He, L., Gao, X., Lu, W., Li, X. & Tao, D. 2011, 'Image quality assessment based on S-CIELAB model', Signal, Image and Video Processing, vol. 5, no. 3, pp. 283-290.
This paper proposes a new image quality assessment framework which is based on color perceptual model. By analyzing the shortages of the existing image quality assessment methods and combining the color perceptual model, the general framework of color im
Zhang, K., Gao, X., Li, X. & Tao, D. 2011, 'Partially Supervised Neighbor Embedding for Example-Based Image Super-Resolution', IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 2, pp. 230-239.
Neighbor embedding algorithm has been widely used in example-based super-resolution reconstruction from a single frame, which makes the assumption that neighbor patches embedded are contained in a single manifold. However, it is not always true for compl
Wang, X., Li, Z. & Tao, D. 2011, 'Subspaces Indexing Model On Grassmann Manifold For Image Search', IEEE Transactions On Image Processing, vol. 20, no. 9, pp. 2627-2635.
Conventional linear subspace learning methods like principal component analysis (PCA), linear discriminant analysis (LDA) derive subspaces from the whole data set. These approaches have limitations in the sense that they are linear while the data distrib
Huang, Q., Tao, D., Li, X., Jin, L. & Wei, G. 2011, 'Exploiting Local Coherent Patterns For Unsupervised Feature Ranking', IEEE Transactions On Systems Man And Cybernetics Part B-cybernetics, vol. 41, no. 6, pp. 1471-1482.
Prior to pattern recognition, feature selection is often used to identify relevant features and discard irrelevant ones for obtaining improved analysis results. In this paper, we aim to develop an unsupervised feature ranking algorithm that evaluates fea
Huang, Y., Huang, K., Tao, D., Tan, T. & Li, X. 2011, 'Enhanced Biologically Inspired Model For Object Recognition', IEEE Transactions On Systems Man And Cybernetics Part B-cybernetics, vol. 41, no. 6, pp. 1668-1680.
The biologically inspired model (BIM) proposed by Serre et al. presents a promising solution to object categorization. It emulates the process of object recognition in primates' visual cortex by constructing a set of scale- and position-tolerant features
Geng, B., Tao, D. & Xu, C. 2011, 'DAML: Domain Adaptation Metric Learning', IEEE Transactions On Image Processing, vol. 20, no. 10, pp. 2980-2989.
The state-of-the-art metric-learning algorithms cannot perform well for domain adaptation settings, such as cross-domain face recognition, image annotation, etc., because labeled data in the source domain and unlabeled ones in the target domain are drawn
He, L., Si, S., Gao, X., Tao, D. & Li, X. 2011, 'A Novel Metric Based On MCA For Image Quality', International Journal Of Wavelets Multiresolution And Information Processing, vol. 9, no. 5, pp. 743-757.
Considering that the Human Visual System (HVS) has different perceptual characteristics for different morphological components, a novel image quality metric is proposed by incorporating Morphological Component Analysis (MCA) and HVS, which is capable of assessing images with different kinds of distortion. Firstly, reference and distorted images are each decomposed into linearly combined texture and cartoon components by MCA. Then these components are turned into perceptual features by Just Noticeable Difference (JND), which integrates masking features, luminance adaptation and Contrast Sensitive Function (CSF). Finally, the discrimination between the perceptual features of the reference and distorted images is quantified using a pooling strategy before the final image quality is obtained. Experimental results demonstrate that the proposed metric outperforms some existing methods on LIVE database II.
Si, S., Liu, W., Tao, D. & Chan, K. 2011, 'Distribution Calibration In Riemannian Symmetric Space', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 41, no. 4, pp. 921-930.
Distribution calibration plays an important role in cross-domain learning. However, existing distribution distance metrics are not geodesic; therefore, they cannot measure the intrinsic distance between two distributions. In this paper, we calibrate two
Guan, N., Tao, D., Luo, Z. & Yuan, B. 2011, 'Non-negative Patch Alignment Framework', IEEE Transactions on Neural Networks, vol. 22, no. 8, pp. 1218-1230.
In this paper, we present a non-negative patch alignment framework (NPAF) to unify popular non-negative matrix factorization (NMF) related dimension reduction algorithms. It offers a new viewpoint to better understand the common property of different NMF
Gao, X., An, L., Yuan, Y., Tao, D. & Li, X. 2011, 'Lossless Data Embedding Using Generalized Statistical Quantity Histogram', IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 8, pp. 1061-1070.
Histogram-based lossless data embedding (LDE) has been recognized as an effective and efficient way for copyright protection of multimedia. Recently, a LDE method using the statistical quantity histogram has achieved good performance, which utilizes the
Zhang, L., Mei, T., Liu, Y., Tao, D. & Zhou, H. 2011, 'Visual Search Reranking Via Adaptive Particle Swarm Optimization', Pattern Recognition, vol. 44, no. 8, pp. 1811-1820.
Visual search reranking involves an optimization process that uses visual content to recover the 'genuine' ranking list from the helpful but noisy one generated by textual search. This paper presents an evolutionary approach, called Adaptive Particle Swa
Tian, X. & Tao, D. 2011, 'Visual Reranking: From Objectives To Strategies', IEEE MultiMedia Magazine, vol. 18, no. 3, pp. 12-20.
A study of the development of visual reranking methods can facilitate an understanding of the field, offer a clearer view of what has been achieved, and help overcome emerging obstacles in this area.
Gao, X., Wang, Q., Li, X., Tao, D. & Zhang, K. 2011, 'Zernike moment based image super resolution', IEEE Transactions On Image Processing, vol. 20, no. 10, pp. 2738-2747.
Multiframe super-resolution (SR) reconstruction aims to produce a high-resolution (HR) image using a set of low-resolution (LR) images. In the process of reconstruction, fuzzy registration usually plays a critical role. It mainly focuses on the correlation between pixels of the candidate and the reference images to reconstruct each pixel by averaging all its neighboring pixels. Therefore, the fuzzy-registration-based SR performs well and has been widely applied in practice. However, if some objects appear or disappear among LR images or different angle rotations exist among them, the correlation between corresponding pixels becomes weak. Thus, it will be difficult to use LR images effectively in the process of SR reconstruction. Moreover, if the LR images are noised, the reconstruction quality will be affected seriously. To address or at least reduce these problems, this paper presents a novel SR method based on the Zernike moment, to make the most of possible details in each LR image for high-quality SR reconstruction. Experimental results show that the proposed method outperforms existing methods in terms of robustness and visual effects.
Xie, B., Mu, Y., Tao, D. & Huang, K. 2011, 'm-SNE: Multiview Stochastic Neighbor Embedding', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 41, no. 4, pp. 1088-1096.
Dimension reduction has been widely used in real-world applications such as image retrieval and document classification. In many scenarios, different features (or multiview data) can be obtained, and how to duly utilize them is a challenge. It is not appropriate for the conventional concatenating strategy to arrange features of different views into a long vector. That is because each view has its specific statistical property and physical interpretation. Even worse, the performance of the concatenating strategy will deteriorate if some views are corrupted by noise. In this paper, we propose a multiview stochastic neighbor embedding (m-SNE) that systematically integrates heterogeneous features into a unified representation for subsequent processing based on a probabilistic framework. Compared with conventional strategies, our approach can automatically learn a combination coefficient for each view adapted to its contribution to the data embedding. This combination coefficient plays an important role in utilizing the complementary information in multiview data. Also, our algorithm for learning the combination coefficient converges at a rate of O(1/k^2), which is the optimal rate for smooth problems. Experiments on synthetic and real data sets suggest the effectiveness and robustness of m-SNE for data visualization, image retrieval, object categorization, and scene recognition.
Wang, X., Li, Z. & Tao, D. 2011, 'Erratum: Subspaces indexing model on grassmann manifold for image search (IEEE Transactions on Image Processing (2011) 20: 9 (2627-2635))', IEEE Transactions on Image Processing, vol. 20, no. 12, p. 3658.
Gao, X., Wang, B., Tao, D. & Li, X. 2011, 'A unified tensor level set method for image segmentation', Studies in Computational Intelligence, vol. 346, pp. 217-238.
This paper presents a new unified level set model for multiple regional image segmentation. This model builds a unified tensor representation for comprehensively depicting each pixel in the image to be segmented, by which the image aligns itself with a tensor field composed of elements in the form of high order tensors. Then the multi-phase level set functions are evolved in this tensor field by introducing a new weighted distance function. When the evolution converges, the tensor field is partitioned, and meanwhile the image is segmented. The proposed model has the following main advantages. Firstly, the unified tensor representation integrates the information from the Gaussian smoothed image, which makes the model robust against noise, especially salt and pepper noise. Secondly, the local geometric features included in the unified representation increase the weight of boundaries in the energy functional, which makes it easier for the model to detect the edges in the image and obtain better performance on non-homogeneous images. Thirdly, the model offers a general formula for the energy functional which can deal with data types varying from scalar to vector and then to tensor, and this formula also unifies single and multi-phase level set methods. We applied the proposed method to synthetic, medical and natural images respectively and obtained promising performance. © 2011 Springer-Verlag Berlin Heidelberg.
Yang, Y., Zhuang, Y., Tao, D., Xu, D., Yu, J. & Luo, J. 2010, 'Recognizing Cartoon Image Gestures for Retrieval and Interactive Cartoon Clip Synthesis', IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 12, pp. 1745-1756.
In this paper, we propose a new method to recognize gestures of cartoon images with two practical applications, i.e., content-based cartoon image retrieval and interactive cartoon clip synthesis. Upon analyzing the unique properties of four types of features including global color histogram, local color histogram (LCH), edge feature (EF), and motion direction feature (MDF), we propose to employ different features for different purposes and in various phases. We use EF to define a graph and then refine its local structure by LCH. Based on this graph, we adopt a transductive learning algorithm to construct local patches for each cartoon image. A spectral method is then proposed to optimize the local structure of each patch and then align these patches globally. MDF is fused with EF and LCH and a cartoon gesture space is constructed for cartoon image gesture recognition. We apply the proposed method to content-based cartoon image retrieval and interactive cartoon clip synthesis. The experiments demonstrate the effectiveness of our method.
Song, M., Tao, D., Chen, C., Li, X. & Chen, C. 2010, 'Color to Gray: Visual Cue Preservation', IEEE Transactions On Pattern Analysis And Machine Intelligence, vol. 32, no. 9, pp. 1537-1552.
Both commercial and scientific applications often need to transform color images into gray-scale images, e.g., to reduce the publication cost in printing color images or to help color blind people see visual cues of color images. However, conventional color to gray algorithms are not ready for practical applications because they encounter the following problems: 1) visual cues are not well defined, so it is unclear how to preserve important cues in the transformed gray-scale images; 2) some algorithms have extremely high time cost for computation; and 3) some require human-computer interactions to have a reasonable transformation. To solve or at least reduce these problems, we propose a new algorithm based on a probabilistic graphical model with the assumption that the image is defined over a Markov random field. Thus, the color-to-gray procedure can be regarded as a labeling process to preserve the newly well-defined visual cues of a color image in the transformed gray-scale image. Visual cues are measurements that can be extracted from a color image by a perceiver. They indicate the state of some properties of the image that the perceiver is interested in perceiving. Different people may perceive different cues from the same color image; three cues are defined in this paper, namely, color spatial consistency, image structure information, and color channel perception priority. We cast color to gray as a visual cue preservation procedure based on a probabilistic graphical model and optimize the model based on an integral minimization problem. We apply the new algorithm to both natural color images and artificial pictures, and demonstrate that the proposed approach outperforms representative conventional algorithms in terms of effectiveness and efficiency. In addition, it requires no human-computer interactions.
Deng, C., Gao, X., Li, X. & Tao, D. 2010, 'Local histogram based geometric invariant image watermarking', Signal Processing, vol. 90, no. 12, pp. 3256-3264.
Compared with other existing methods, feature point-based image watermarking schemes can resist global geometric attacks and local geometric attacks, especially cropping and random bending attacks (RBAs), by binding watermark synchronization with salient image characteristics. However, the watermark detection rate remains low in current feature point-based watermarking schemes. The main reason is that both feature point extraction and watermark embedding are more or less related to the pixel position, which is seriously distorted by the interpolation error and the shift problem during geometric attacks. In view of these facts, this paper proposes a geometrically robust image watermarking scheme based on local histograms. Our scheme mainly consists of three components: (1) feature point extraction and local circular region (LCR) construction are carried out using the Harris-Laplace detector; (2) a graph theoretical clustering-based feature selection mechanism is used to choose a set of non-overlapping LCRs, and geometrically invariant LCRs are then formed through dominant orientation normalization; and (3) the histogram and mean, which are statistically independent of the pixel position, are calculated over the selected LCRs and used to embed watermarks. Experimental results demonstrate that the proposed scheme can provide sufficient robustness against geometric attacks as well as common image processing operations.
Wang, X., Tao, D. & Li, Z. 2010, 'Entropy controlled Laplacian regularization for least square regression', Signal Processing, vol. 90, no. 6, pp. 2043-2049.
Least square regression (LSR) is popular in pattern classification. Compared against other matrix factorization based methods, it is simple yet efficient. However, LSR ignores unlabeled samples in the training stage, so the regression error could be large when the labeled samples are insufficient. To solve this problem, Laplacian regularization can be used to penalize LSR. Extensive theoretical and experimental results have confirmed the validity of Laplacian regularized least squares (LapRLS). However, multiple hyper-parameters have been introduced to estimate the intrinsic manifold induced by the regularization, and thus time-consuming cross-validation must be applied to tune these parameters. To alleviate this problem, we assume the intrinsic manifold is a linear combination of a given set of known manifolds. By further assuming the priors of the given manifolds are equivalent, we introduce the entropy maximization penalty to automatically learn the linear combination coefficients. The entropy maximization trades off smoothness against complexity. Therefore, the proposed model enjoys the following advantages: (1) it is able to incorporate both labeled and unlabeled data into the training process, (2) it is able to learn the manifold hyper-parameters automatically, and (3) it approximates the true probability distribution with respect to prescribed test data. To test the classification performance of our proposed model, we apply the model to three well-known human face datasets, i.e. FERET, ORL, and YALE. Experimental results on these three face datasets suggest the effectiveness and the efficiency of the new model compared against the traditional LSR and the Laplacian regularized least squares.
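To make the idea concrete, one plausible way to write such an objective is sketched below. This is an illustrative reading of the abstract, not the exact formulation of the paper. All symbols are assumptions: l is the number of labeled samples, f is the regressor and the bold f its values on all labeled and unlabeled samples, L_k are the graph Laplacians of the m candidate manifolds, mu_k are the combination coefficients, and gamma_A, gamma_I and lambda are trade-off weights.

```latex
\min_{f,\;\mu}\;
\frac{1}{l}\sum_{i=1}^{l}\bigl(y_i - f(x_i)\bigr)^2
+ \gamma_A \lVert f \rVert^2
+ \gamma_I\, \mathbf{f}^{\top}\Bigl(\sum_{k=1}^{m}\mu_k L_k\Bigr)\mathbf{f}
+ \lambda \sum_{k=1}^{m}\mu_k \log \mu_k
\quad\text{s.t.}\quad \mu_k \ge 0,\;\; \sum_{k=1}^{m}\mu_k = 1
```

The first three terms are the usual LapRLS ingredients (data fit, function complexity, manifold smoothness). The last term is the negative entropy of the coefficients, so minimizing it maximizes entropy and pulls the coefficients toward the uniform distribution, which matches the assumption of equal priors over the candidate manifolds.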
Li, X., Hu, Y., Gao, X., Tao, D. & Ning, B. 2010, 'A multi-frame image super-resolution method', Signal Processing, vol. 90, no. 2, pp. 405-414.
Multi-frame image super-resolution (SR) aims to utilize information from a set of low-resolution (LR) images to compose a high-resolution (HR) one. As it is desirable or essential in many real applications, recent years have witnessed a growing interest in the problem of multi-frame SR reconstruction. This set of algorithms commonly utilizes a linear observation model to relate the recorded LR images to the unknown reconstructed HR image estimate. Recently, regularization-based schemes have been demonstrated to be effective because SR reconstruction is actually an ill-posed problem. Working within this promising framework, this paper first proposes two new regularization terms, termed locally adaptive bilateral total variation and consistency of gradients, to keep edges sharp and flat regions smooth, both of which are implicitly described in the LR images. The combination of the proposed regularization terms is superior to existing regularization terms because it considers both edges and flat regions, while existing ones consider only edges. Thorough experimental results show the effectiveness of the new algorithm for SR reconstruction.
Wen, L., Gao, X., Li, X., Tao, D. & Li, J. 2010, 'Incremental pairwise discriminant analysis based visual tracking', Neurocomputing, vol. 74, no. 1-3, pp. 428-438.
The distinction between the object appearance and the background is a useful cue for visual tracking, in which discriminant analysis is widely applied. However, due to the diversity of background observations, there are not adequate negative samples from the background, which usually leads the discriminant method to tracking failure. Thus, a natural solution is to construct an object-background pair constrained by the spatial structure, which could not only reduce the number of negative samples but also make full use of the background information surrounding the object. However, this idea is threatened by variations in both the object appearance and the spatially constrained background observation, especially when the background shifts as the object moves. Thus, an incremental pairwise discriminant subspace is constructed in this paper to delineate the variation of the distinction. In order to maintain the ability to describe the subspace correctly, we enforce two novel constraints for the optimal adaptation: (1) a pairwise data discriminant constraint and (2) subspace smoothness. The experimental results demonstrate that the proposed approach can alleviate adaptation drift and achieve better visual tracking results for a large variety of nonstationary scenes.
Wang, X., Gao, X., Yuan, Y., Tao, D. & Li, J. 2010, 'Semi-supervised Gaussian process latent variable model with pairwise constraints', Neurocomputing, vol. 73, no. 10-12, pp. 2186-2195.
In machine learning, the Gaussian process latent variable model (GP-LVM) has been extensively applied in the field of unsupervised dimensionality reduction. When some supervised information, e.g., pairwise constraints or labels of the data, is available, the traditional GP-LVM cannot directly utilize such supervised information to improve the performance of dimensionality reduction. In this case, it is necessary to modify the traditional GP-LVM to make it capable of handling supervised or semi-supervised learning tasks. For this purpose, we propose a new semi-supervised GP-LVM framework under pairwise constraints. By transferring the pairwise constraints in the observed space to the latent space, constrained prior information on the latent variables can be obtained. Under this constrained prior, the latent variables are optimized by the maximum a posteriori (MAP) algorithm. The effectiveness of the proposed algorithm is demonstrated with experiments on a variety of data sets.
Wen, J., Gao, X., Yuan, Y., Tao, D. & Li, J. 2010, 'Incremental tensor biased discriminant analysis: A new color-based visual tracking method', Neurocomputing, vol. 73, no. 4-6, pp. 827-839.
Most existing color-based tracking algorithms utilize the statistical color information of the object as the tracking cue, without maintaining the spatial structure within a single chromatic image. Recently, research on multilinear algebra has made it possible to preserve the spatial structural relationship in a representation of image ensembles. In this paper, a third-order color tensor is constructed to represent the object to be tracked. Considering the influence of environmental changes on tracking, biased discriminant analysis (BDA) is extended to tensor biased discriminant analysis (TBDA) for distinguishing the object from the background. At the same time, an incremental scheme for TBDA is developed for online learning of the tensor biased discriminant subspace, which can be used to adapt to appearance variations of both the object and the background. The experimental results show that the proposed method can precisely track objects undergoing large pose, scale and lighting changes, as well as partial occlusion.
Xiao, B., Gao, X., Tao, D., Yuan, Y. & Li, J. 2010, 'Photo-sketch synthesis and recognition based on subspace learning', Neurocomputing, vol. 73, no. 4-6, pp. 840-852.
This paper aims to reduce the difference between sketches and photos by synthesizing sketches from photos, and vice versa, and then performing sketch-sketch/photo-photo recognition with subspace learning based methods. Pseudo-sketch/pseudo-photo patches are synthesized with an embedded hidden Markov model. Most local-strategy-based methods assemble these patches by averaging their overlapping areas, which blurs the resulting pseudo-sketch/pseudo-photo; we instead integrate the patches with image quilting. Experiments demonstrate that the proposed method produces high-quality pseudo-sketches/pseudo-photos and achieves promising recognition results.
Mu, Y. & Tao, D. 2010, 'Biologically inspired feature manifold for gait recognition', Neurocomputing, vol. 73, no. 4-6, pp. 895-902.
Using biometric resources to recognize a person has been a recent focus in computer vision. Previously, biometric research has focused on utilizing iris, fingerprint, palm print, and shoe print to authenticate and authorize a person. However, these conventional biometric resources suffer from some obvious limitations, such as strict distance requirements and the need for considerable user cooperation. In contrast to these difficulties, human gait can be easily acquired and utilized in many fields. A human's walk image can reflect the walker's physical characteristics and psychological state, and therefore the gait feature can be used to recognize a person. In order to achieve better gait recognition performance, we represent the gait image using C1 units, which correspond to the complex cells in the human visual cortex, and use a maximum mechanism to keep only the maximum response of each local area of S1 units. To enhance the gait recognition rate, we take the label information into account and utilize the discriminative locality alignment (DLA) method for classification, which is a top-level discriminative manifold learning based subspace learning algorithm. Experiments on the University of South Florida (USF) dataset show that (1) the proposed C1Gait+DLA algorithm can achieve better performance than state-of-the-art algorithms and (2) DLA can duly preserve both the local geometry and the discriminative information for recognition.
Si, S., Tao, D. & Geng, B. 2010, 'Bregman Divergence-Based Regularization for Transfer Subspace Learning', IEEE Transactions On Knowledge And Data Engineering, vol. 22, no. 7, pp. 929-942.
The regularization principles [31] lead approximation schemes to deal with various learning problems, e.g., the regularization of the norm in a reproducing kernel Hilbert space for the ill-posed problem. In this paper, we present a family of subspace le
Fu, R., Gao, X., Li, X., Tao, D., Jian, Y., Li, J., Hu, H. & Yang, H. 2010, 'An integrated aurora image retrieval system: AuroraEye', Journal Of Visual Communication And Image Representation, vol. 21, no. 8, pp. 787-797.
With the emergence of the digital all-sky imager (ASI) in aurora research, millions of images are captured annually. However, only a fraction of them can actually be used. To address the problem incurred by inefficient manual processing, an integrated image
Si, S., Tao, D. & Chan, K. 2010, 'Evolutionary Cross-Domain Discriminative Hessian Eigenmaps', IEEE Transactions On Image Processing, vol. 19, no. 4, pp. 1075-1086.
Is it possible to train a learning model to separate tigers from elks when we have 1) labeled samples of leopard and zebra and 2) unlabelled samples of tiger and elk at hand? Cross-domain learning algorithms can be used to solve the above problem. Howeve
Tian, X., Tao, D., Hua, X. & Wu, X. 2010, 'Active Reranking for Web Image Search', IEEE Transactions On Image Processing, vol. 19, no. 3, pp. 805-820.
Image search reranking methods usually fail to capture the user's intention when the query term is ambiguous. Therefore, reranking with user interactions, or active reranking, is highly demanded to effectively improve the search performance. The essentia
Bian, W. & Tao, D. 2010, 'Biased Discriminant Euclidean Embedding for Content-Based Image Retrieval', IEEE Transactions On Image Processing, vol. 19, no. 2, pp. 545-554.
With many potential multimedia applications, content-based image retrieval (CBIR) has recently gained more attention for image management and web search. A wide variety of relevance feedback (RF) algorithms have been developed in recent years to improve
Song, D. & Tao, D. 2010, 'Biologically Inspired Feature Manifold for Scene Classification', IEEE Transactions On Image Processing, vol. 19, no. 1, pp. 174-184.
Biologically inspired feature (BIF) and its variations have been demonstrated to be effective and efficient for scene classification. It is unreasonable to measure the dissimilarity between two BIFs based on their Euclidean distance. This is because BIFs
Xia, T., Tao, D., Mei, T. & Zhang, Y. 2010, 'Multiview Spectral Embedding', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 40, no. 6, pp. 1438-1446.
In computer vision and multimedia search, it is common to use multiple features from different views to represent an object. For example, to well characterize a natural scene image, it is essential to find a set of visual features to represent its color,
Song, M., Tao, D., Sun, Z. & Li, X. 2010, 'Visual-Context Boosting for Eye Detection', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 40, no. 6, pp. 1460-1467.
Eye detection plays an important role in many practical applications. This paper presents a novel two-step scheme for eye detection. The first step models an eye by a newly defined visual-context pattern (VCP), and the second step applies semisupervised
Song, M., Tao, D., Liu, Z., Li, X. & Zhou, M. 2010, 'Image Ratio Features for Facial Expression Recognition Application', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 40, no. 3, pp. 779-788.
Video-based facial expression recognition is a challenging problem in computer vision and human-computer interaction. To target this problem, texture features have been extracted and widely used, because they can capture image intensity changes raised by
Wang, B., Gao, X., Tao, D. & Li, X. 2010, 'A Unified Tensor Level Set for Image Segmentation', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 40, no. 3, pp. 857-867.
This paper presents a new region-based unified tensor level set model for image segmentation. This model introduces a three-order tensor to comprehensively depict features of pixels, e.g., gray value and the local geometrical features, such as orientatio
Zhang, T., Huang, K., Li, X., Yang, J. & Tao, D. 2010, 'Discriminative Orthogonal Neighborhood-Preserving Projections for Classification', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 40, no. 1, pp. 253-263.
Orthogonal neighborhood-preserving projection (ONPP) is a recently developed orthogonal linear algorithm for overcoming the out-of-sample problem existing in the well-known manifold learning algorithm, i.e., locally linear embedding. It has been shown th
Gao, X., Wang, Y., Li, X. & Tao, D. 2010, 'On Combining Morphological Component Analysis and Concentric Morphology Model for Mammographic Mass Detection', IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 2, pp. 266-273.
Mammographic mass detection is an important task for the early diagnosis of breast cancer. However, it is difficult to distinguish masses from normal regions because of their abundant morphological characteristics and ambiguous margins. To improve the ma
Gao, X., Deng, C., Li, X. & Tao, D. 2010, 'Geometric Distortion Insensitive Image Watermarking in Affine Covariant Regions', IEEE Transactions On Systems Man And Cybernetics Part C-Applications And Reviews, vol. 40, no. 3, pp. 278-286.
Feature-based image watermarking schemes, which aim to survive various geometric distortions, have attracted great attention in recent years. Existing schemes have shown robustness against rotation, scaling, and translation, but few are resistant to crop
Gao, X., Su, Y., Li, X. & Tao, D. 2010, 'A Review of Active Appearance Models', IEEE Transactions On Systems Man And Cybernetics Part C-Applications And Reviews, vol. 40, no. 2, pp. 145-158.
Active appearance model (AAM) is a powerful generative method for modeling deformable objects. The model decouples the shape and the texture variations of objects, which is followed by an efficient gradient-based model fitting method. Due to the flexible
Gao, X., Xiao, B., Tao, D. & Li, X. 2010, 'A survey of graph edit distance', Pattern Analysis and Applications, vol. 13, no. 1, pp. 113-129.
Inexact graph matching has been one of the significant research foci in the area of pattern analysis. As an important way to measure the similarity between pairwise graphs error-tolerantly, graph edit distance (GED) is the base of inexact graph matching.
Lu, W., Li, X., Gao, X., Tang, W., Li, J. & Tao, D. 2010, 'A Video Quality Assessment Metric Based on Human Visual System', Cognitive Computation, vol. 2, no. 2, pp. 120-131.
It is important for practical application to design an effective and efficient metric for video quality. The most reliable way is by subjective evaluation. Thus, to design an objective metric by simulating human visual system (HVS) is quite reasonable an
Gao, X., Deng, C., Li, X. & Tao, D. 2010, 'Local Feature Based Geometric-Resistant Image Information Hiding', Cognitive Computation, vol. 2, no. 2, pp. 68-77.
Watermarking aims to hide particular information into some carrier but does not change the visual cognition of the carrier itself. Local features are good candidates to address the watermark synchronization error caused by geometric distortions and have
Lu, W., Zeng, K., Tao, D., Yuan, Y. & Gao, X. 2010, 'No-reference Image Quality Assessment In Contourlet Domain', Neurocomputing, vol. 73, no. 4-6, pp. 784-794.
The target of no-reference (NR) image quality assessment (IQA) is to establish a computational model to predict the visual quality of an image. The existing prominent method is based on natural scene statistics (NSS). It uses the joint and marginal distr
Zhang, C. & Tao, D. 2010, 'Error Bounds for Real Function Classes Based on Discretized Vapnik-Chervonenkis Dimensions', Australian Journal of Intelligent Information Processing Systems, vol. 12, no. 3, pp. 1-5.
The Vapnik-Chervonenkis (VC) dimension plays an important role in statistical learning theory. In this paper, we propose the discretized VC dimension obtained by discretizing the range of a real function class. Then, we point out that Sauer's Lemma is valid for the discretized VC dimension. We group the real function classes having infinite VC dimension into four categories by using the discretized VC dimension. As a byproduct, we present the equidistantly discretized VC dimension by introducing an equidistant partition to segment the range of a real function class. Finally, we obtain the error bounds for real function classes based on the discretized VC dimensions in the PAC-learning framework.
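For reference, the classical form of Sauer's Lemma for a binary-valued class is recalled below. The notation is an assumption introduced here (a class H with VC dimension d and growth function Pi_H(m) on m points); the discretized version established in the paper is not reproduced.

```latex
\Pi_{\mathcal{H}}(m) \;\le\; \sum_{i=0}^{d} \binom{m}{i} \;\le\; \left(\frac{e\,m}{d}\right)^{d},
\qquad m \ge d \ge 1
```

The lemma says that once the sample size exceeds the VC dimension, the number of distinct labelings the class can realize grows only polynomially in m rather than as 2^m, which is what makes uniform error bounds of the PAC type possible.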
Li, X. & Tao, D. 2010, 'Subspace Learning', Neurocomputing, vol. 73, no. 10-12, pp. 1539-1540.
Tao, D., Li, X., Wu, X. & Maybank, S. 2009, 'Geometric Mean for Subspace Selection', IEEE Transactions On Pattern Analysis And Machine Intelligence, vol. 31, no. 2, pp. 260-274.
Subspace selection approaches are powerful tools in pattern classification and data visualization. One of the most important subspace approaches is the linear dimensionality reduction step in Fisher's linear discriminant analysis (FLDA), which has been successfully employed in many fields such as biometrics, bioinformatics, and multimedia information management. However, the linear dimensionality reduction step in FLDA has a critical drawback: for a classification task with c classes, if the dimension of the projected subspace is strictly lower than c - 1, the projection to a subspace tends to merge those classes that are close together in the original feature space. If separate classes are sampled from Gaussian distributions, all with identical covariance matrices, then the linear dimensionality reduction step in FLDA maximizes the mean value of the Kullback-Leibler (KL) divergences between different classes. Based on this viewpoint, the geometric mean for subspace selection is studied in this paper. Three criteria are analyzed: 1) maximization of the geometric mean of the KL divergences, 2) maximization of the geometric mean of the normalized KL divergences, and 3) the combination of 1 and 2. Preliminary experimental results based on synthetic data, the UCI Machine Learning Repository, and handwritten digits show that the third criterion is a potential discriminative subspace selection method, which significantly reduces the class separation problem compared with the linear dimensionality reduction step in FLDA and several of its representative extensions.
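The class-merging effect and the motivation for a geometric mean can be seen in a toy calculation. The sketch below is an illustration only, under strong simplifying assumptions that are not taken from the paper: three classes in two dimensions, identity covariance for every class, and a one-dimensional projection, so that the KL divergence between two projected classes reduces to half the squared distance between their projected means. The class means, directions and function names are made up for illustration.

```python
import math
from itertools import combinations

def pairwise_kl(means, direction):
    """KL divergences between classes after projecting onto one direction.

    Assumes every class is Gaussian with identity covariance, so the
    projected classes are N(w.m_i, 1) and KL(i || j) = 0.5 * (w.(m_i - m_j))**2.
    """
    norm = math.sqrt(sum(w * w for w in direction))
    unit = [w / norm for w in direction]
    proj = [sum(w * x for w, x in zip(unit, m)) for m in means]
    return [0.5 * (a - b) ** 2 for a, b in combinations(proj, 2)]

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # A single zero divergence (two merged classes) drives the mean to zero.
    if any(v <= 0.0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Two nearby classes and one distant class (toy data).
means = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0)]

axis_kls = pairwise_kl(means, (1.0, 0.0))  # separates only the distant class
tilt_kls = pairwise_kl(means, (0.8, 0.6))  # keeps all three classes apart

print("axis:", arithmetic_mean(axis_kls), geometric_mean(axis_kls))
print("tilt:", arithmetic_mean(tilt_kls), geometric_mean(tilt_kls))
```

In this toy setting the axis-aligned direction has the larger arithmetic mean even though it collapses the two nearby classes onto the same point, while the geometric mean is zero for that direction and positive for the tilted one. This is the behavior that makes a geometric-mean criterion attractive when the subspace dimension is below c - 1.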
Gao, X., An, L., Li, X. & Tao, D. 2009, 'Reversibility improved lossless data hiding', Signal Processing, vol. 89, no. 10, pp. 2053-2065.
Recently, lossless data hiding has attracted increasing interest. In a reversible watermarking scheme, the host media and hidden data should be recovered without distortion. A recent lossless data hiding technique based on image blocking and block classification has achieved good performance for image authentication. However, this method cannot always fully restore all the blocks of host images and watermarks. To address this, we propose an improved algorithm, which is characterized by two aspects. First, a block skipping scheme (BSS) is developed to select the host blocks in which the watermark is embedded; second, the embedding level is modified by a novel parameter model to guarantee that both the host blocks and the embedded data can be recovered without distortion. Extensive experiments conducted on standard grayscale images, medical images, and color images have demonstrated the effectiveness of the improved lossless data hiding scheme.
Deng, C., Gao, X., Li, X. & Tao, D. 2009, 'A local Tchebichef moments-based robust image watermarking', Signal Processing, vol. 89, no. 8, pp. 1531-1539.
Protection against geometric distortions and common image processing operations with blind detection is a very challenging task in image watermarking. To achieve this, in this paper we propose a content-based watermarking scheme that combines invariant feature extraction with watermark embedding by using Tchebichef moments. The Harris-Laplace detector is first adopted to extract feature points, and then non-overlapping disks centered at feature points are generated. These disks are invariant to scaling and translation distortions. For each disk, orientation alignment is then performed to achieve rotation invariance. Finally, the watermark is embedded in the magnitudes of the Tchebichef moments of each disk via dither modulation to achieve robustness to common image processing operations and blind detection. Thorough simulation results obtained by using the standard benchmark, Stirmark, demonstrate that the proposed method is robust against various geometric distortions as well as common image processing operations and outperforms representative image watermarking schemes.
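The dither modulation step can be illustrated with a minimal sketch of binary quantization-based embedding on a single scalar magnitude. This is a generic textbook-style form, not the implementation of the paper: the function names, the step size, and the dither values of plus or minus a quarter step are assumptions, and the computation of Tchebichef moments is omitted entirely.

```python
import math

def embed_bit(value, bit, step):
    """Quantize `value` onto the lattice associated with `bit`.

    Bit 0 uses the lattice shifted by -step/4, bit 1 the lattice shifted
    by +step/4, so the two lattices are interleaved step/2 apart.
    Magnitudes close to zero would need clamping; that is ignored here.
    """
    dither = step / 4.0 if bit else -step / 4.0
    return step * math.floor((value - dither) / step + 0.5) + dither

def detect_bit(value, step):
    """Blind detection: pick the bit whose lattice lies closest to `value`."""
    d0 = abs(value - embed_bit(value, 0, step))
    d1 = abs(value - embed_bit(value, 1, step))
    return 1 if d1 < d0 else 0

step = 2.0                        # larger step: more robust, more distortion
magnitudes = [3.2, 17.9, 8.05]    # stand-ins for moment magnitudes
bits = [1, 0, 1]

marked = [embed_bit(m, b, step) for m, b in zip(magnitudes, bits)]
# Perturb each marked value by less than step/4 to mimic mild processing.
recovered = [detect_bit(v + 0.3, step) for v in marked]
```

Detection needs only the step size, not the original image, which is why this kind of embedding suits blind detection. The bit survives any perturbation smaller than a quarter of the step, at the cost of an embedding distortion of up to half a step per magnitude.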
Xiao, B., Gao, X., Tao, D. & Li, X. 2009, 'A new approach for face recognition by sketches in photos', Signal Processing, vol. 89, no. 8, pp. 1576-1588.
Face recognition by sketches in photos remains a challenging task. Unlike existing sketch-photo recognition methods, which convert a photo into a sketch and then perform sketch-photo recognition through sketch-sketch recognition, this paper is devoted to synthesizing a photo from the sketch and transforming sketch-photo recognition into photo-photo recognition to achieve better performance in mixture pattern recognition. The contribution of this paper lies in two aspects: (1) given that there are few research findings on sketch-photo recognition based on pseudo-photo synthesis, and that existing methods require a large set of training samples, which is nearly impossible to obtain because of the high cost of sketch acquisition, we make use of the embedded hidden Markov model (EHMM), which can learn the nonlinearity of a sketch-photo pair with fewer training samples, to produce pseudo-photos from sketches; and (2) photos and sketches are divided into patches, and the pseudo-photo is generated by combining pseudo-photo patches, which makes the pseudo-photo more recognizable. Experimental results demonstrate that the newly proposed method is effective in identifying face sketches in a photo set.
Li, X., Tao, D., Gao, X. & Lu, W. 2009, 'A natural image quality evaluation metric', Signal Processing, vol. 89, no. 4, pp. 548-555.
Reduced-reference (RR) image quality assessment (IQA) metrics evaluate the quality of a distorted (or degraded) image by using some, not all, information of the original (reference) image. In this paper, we propose a novel RR IQA metric based on hybrid wavelets and directional filter banks (HWD). With HWD as a pre-processing stage, the newly proposed metric mainly focuses on subband coefficients of the distorted and original images. It performs well at low data rates, because only a threshold and several proportion values are recorded from the original images and transmitted. Experiments are carried out on well-recognized data sets and the results demonstrate the advantages of the metric compared with existing ones. Moreover, a separate set of experiments shows that the proposed metric has good consistency with human subjective perception.
Zhang, T., Tao, D., Li, X. & Yang, J. 2009, 'Patch Alignment for Dimensionality Reduction', IEEE Transactions On Knowledge And Data Engineering, vol. 21, no. 9, pp. 1299-1313.
Spectral analysis-based dimensionality reduction algorithms are important and have been popularly applied in data mining and computer vision applications. To date many algorithms have been developed, e.g., principal component analysis, locally linear em
Tao, D., Li, X., Lu, W. & Gao, X. 2009, 'Reduced-Reference IQA in Contourlet Domain', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 39, no. 6, pp. 1623-1627.
The human visual system (HVS) provides a suitable cue for image quality assessment (IQA). In this paper, we develop a novel reduced-reference (RR) IQA scheme by incorporating the merits from the contourlet transform, contrast sensitivity function (CSF),
Gao, X., Su, Y., Li, X. & Tao, D. 2009, 'Gabor texture in active appearance models', Neurocomputing, vol. 72, no. 13-15, pp. 3174-3181.
In computer vision applications, Active Appearance Models (AAMs) are usually used to model the shape and the gray-level appearance of an object of interest using statistical methods, such as PCA. However, the intensity values used in standard AAMs cannot provide enough information for image alignment. In this paper, we first propose to utilize Gabor filters to represent the image texture. The benefit of a Gabor-based representation is that it can express local structures of an image. As a result, this representation can lead to more accurate matching when conditions change. Given the excessive storage and computational complexity of the full Gabor representation, three different Gabor-based image representations are used in AAMs: (1) GaborD is the sum of Gabor filter responses over directions, (2) GaborS is the sum of Gabor filter responses over scales, and (3) GaborSD is the sum of Gabor filter responses over scales and directions. Through a large number of experiments, we show that the proposed Gabor representations lead to more accurate and robust matching between model and images.
Yuan, Y., Li, X., Pang, Y., Lu, X. & Tao, D. 2009, 'Binary Sparse Nonnegative Matrix Factorization', IEEE Transactions On Circuits And Systems For Video Technology, vol. 19, no. 5, pp. 772-777.
This paper presents a fast part-based subspace selection algorithm, termed the binary sparse nonnegative matrix factorization (B-SNMF). Both the training process and the testing process of B-SNMF are much faster than those of binary principal component a
Gao, X., Lu, W., Tao, D. & Li, X. 2009, 'Image Quality Assessment Based on Multiscale Geometric Analysis', IEEE Transactions On Image Processing, vol. 18, no. 7, pp. 1409-1423.
Reduced-reference (RR) image quality assessment (IQA) has been recognized as an effective and efficient way to predict the visual quality of distorted images. The current standard is the wavelet-domain natural image statistics model (WNISM), which applie
Gao, X., Yang, Y., Tao, D. & Li, X. 2009, 'Discriminative optical flow tensor for video semantic analysis', Computer Vision And Image Understanding, vol. 113, no. 3, pp. 372-383.
This paper presents a novel framework for effective video semantic analysis. This framework has two major components, namely, optical flow tensor (OFT) and hidden Markov models (HMMs). OFT and HMMs are employed because: (I) motion is one of the fundament
Huang, K., Tao, D., Yuan, Y., Li, X. & Tan, T. 2009, 'View-Independent Behavior Analysis', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 39, no. 4, pp. 1028-1035.
The motion analysis of the human body is an important topic of research in computer vision devoted to detecting, tracking, and understanding people's physical behavior. This strong interest is driven by a wide spectrum of applications in various areas su
Shen, J., Tao, D. & Li, X. 2009, 'QUC-Tree: Integrating Query Context Information for Efficient Music Retrieval', IEEE Transactions On Multimedia, vol. 11, no. 2, pp. 313-323.
In this paper, we introduce a novel indexing scheme, the QUery Context tree (QUC-tree), to facilitate efficient query-sensitive music search under different query contexts. Distinguished from the previous approaches, QUC-tree is a balanced multiway tree struct
Li, J., Zhang, L., Tao, D., Sun, H. & Zhao, Q. 2009, 'A Prior Neurophysiologic Knowledge Free Tensor-Based Scheme for Single Trial EEG Classification', IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 17, no. 2, pp. 107-115.
Single trial electroencephalogram (EEG) classification is essential in developing brain-computer interfaces (BCIs). However, popular classification algorithms, e.g., common spatial patterns (CSP), usually highly depend on the prior neurophysiologic knowl
Pang, Y., Li, X., Yuan, Y., Tao, D. & Pan, J. 2009, 'Fast Haar Transform Based Feature Extraction for Face Representation and Recognition', IEEE Transactions on Information Forensics and Security, vol. 4, no. 3, pp. 441-450.
Subspace learning is the process of finding a proper feature subspace and then projecting high-dimensional data onto the learned low-dimensional subspace. The projection operation requires many floating-point multiplications and additions, which makes th
Mu, Y., Tao, D., Li, X. & Murtagh, F. 2009, 'Biologically Inspired Tensor Features', Cognitive Computation, vol. 1, no. 4, pp. 327-341.
According to the research results reported in the past decades, it is well acknowledged that face recognition is not a trivial task. With the development of electronic devices, we are gradually revealing the secret of object recognition in the primate's
Gao, X., Li, X., Fen, J. & Tao, D. 2009, 'Shot-based video retrieval with optical flow tensor and HMMs', Pattern Recognition Letters, vol. 30, no. 2, pp. 140-147.
Video retrieval and indexing research aims to efficiently and effectively manage very large video databases, e.g., CCTV records, which is a key component in video-based object and event analysis. In this paper, for the purpose of video retrieval, we prop
Tao, D., Yuan, Y., Shen, J., Huang, K. & Li, X. 2009, 'Visual information analysis for security', Signal Processing, vol. 89, no. 12, pp. 2311-2312.
Tao, D., Li, X. & Tang, Y.Y. 2009, 'Learning semantics from multimedia content', Pattern Recognition, vol. 42, no. 2, p. 217.
Liu, Q., Li, X., Elgammal, A., Hua, X.S., Xu, D. & Tao, D. 2009, 'Introduction to computer vision and image understanding the special issue on video analysis', Computer Vision and Image Understanding, vol. 113, no. 3, pp. 317-318.
Shan, S., Liu, Q., Tao, D., Xu, D., Yan, S. & Li, X. 2009, 'Introduction to the special issue on Video-based Object and Event Analysis', Pattern Recognition Letters, vol. 30, no. 2, p. 87.
Lu, W., Gao, X., Tao, D. & Li, X. 2008, 'A wavelet-based image quality assessment method', International Journal of Wavelets, Multiresolution and Information Processing, vol. 6, no. 4, pp. 541-551.
Image quality is a key characteristic in image processing, image retrieval, and biometrics. In this paper, a novel reduced-reference image quality assessment method is proposed based on wavelet transform. By simulating the human visu
Gao, X., Lu, W., Li, X. & Tao, D. 2008, 'Wavelet-based contourlet in quality evaluation of digital images', Neurocomputing, vol. 72, no. 1-3, pp. 378-385.
Feature extraction is probably the most important stage in image quality evaluation: effective features reflect the quality of digital images well, and vice versa. As a non-redundant sparse representation, contourlet transform can effectively reflect v
Pang, Y., Tao, D., Yuan, Y. & Li, X. 2008, 'Binary two-dimensional PCA', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 38, no. 4, pp. 1176-1180.
Fast training and testing procedures are crucial in biometrics recognition research. Conventional algorithms, e.g., principal component analysis (PCA), fail to work efficiently on large-scale and high-resolution image data sets. By incorporating merits
Tao, D., Li, X., Wu, X. & Maybank, S. 2008, 'Tensor Rank One Discriminant Analysis - A convergent method for discriminative multilinear subspace selection', Neurocomputing, vol. 71, no. 10-12, pp. 1866-1882.
This paper proposes Tensor Rank One Discriminant Analysis (TR1DA) in which general tensors are input for pattern classification. TR1DA is based on Differential Scatter Discriminant Criterion (DSDC) and Tensor Rank One Analysis (TR1A). DSDC is a generaliz
Gao, X., Zhong, J., Tao, D. & Li, X. 2008, 'Local face sketch synthesis learning', Neurocomputing, vol. 71, no. 10-12, pp. 1921-1930.
Facial sketch synthesis (FSS) is crucial in sketch-based face recognition. This paper proposes an automatic FSS algorithm with local strategy based on embedded hidden Markov model (E-HMM) and selective ensemble (SE). By using E-HMM to model the nonlinear
Li, X., Tao, D., Maybank, S. & Yuan, Y. 2008, 'Visual music and musical vision', Neurocomputing, vol. 71, no. 10-12, pp. 2023-2028.
This paper aims to bridge human hearing and vision from the viewpoint of database search for images or music. The semantic content of an image can be illustrated with music or conversely images can be associated with a piece of music. The theoretical bas
Zhang, T., Li, X., Tao, D. & Yang, J. 2008, 'Local Coordinates Alignment (LCA): A novel manifold learning approach', International Journal of Pattern Recognition and Artificial Intelligence, vol. 22, no. 4, pp. 667-690.
Manifold learning has been demonstrated as an effective way to represent intrinsic geometrical structure of samples. In this paper, a new manifold learning approach, named Local Coordinates Alignment (LCA), is developed based on the alignment technique.
Zhang, T., Li, X., Tao, D. & Yang, J. 2008, 'Multimodal biometrics using geometry preserving projections', Pattern Recognition, vol. 41, no. 3, pp. 805-813.
A multimodal biometric system utilizes two or more individual modalities, e.g., face, gait, and fingerprint, to improve the recognition accuracy of conventional unimodal methods. However, existing multimodal biometric methods neglect interactions of differ
Li, X., Maybank, S., Yan, S., Tao, D. & Xu, D. 2008, 'Gait components and their application to gender recognition', IEEE Transactions On Systems Man And Cybernetics Part C-Applications And Reviews, vol. 38, no. 2, pp. 145-155.
Human gait is a promising biometric resource. In this paper, the information about gait is obtained from the motions of the different parts of the silhouette. The human silhouette is segmented into seven components, namely head, arm, trunk, thigh, fron
Tao, D., Tang, X. & Li, X. 2008, 'Which components are important for interactive image searching?', IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 1, pp. 3-11.
With many potential industrial applications, content-based image retrieval (CBIR) has recently gained more attention for image management and web searching. As an important tool to capture users' preferences and thus to improve the performance of CBIR sy
Shen, J., Tao, D. & Li, X. 2008, 'Modality Mixture Projections for Semantic Video Event Detection', IEEE Transactions On Circuits And Systems For Video Technology, vol. 18, no. 11, pp. 1587-1596.
Event detection is one of the most fundamental components for various kinds of domain applications of video information systems. In recent years, it has gained considerable interest from practitioners and academics in different areas. While detecting v
Tao, D., Song, M., Li, X., Shen, J., Sun, J., Wu, X., Faloutsos, C. & Maybank, S. 2008, 'Bayesian Tensor Approach for 3-D Face Modeling', IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 10, pp. 1397-1410.
Effectively modeling a collection of three-dimensional (3-D) faces is an important task in various applications, especially facial expression-driven ones, e.g., expression generation, retargeting, and synthesis. These 3-D faces naturally form a set of se
Li, N., Chen, C., Wang, Q., Song, M., Tao, D. & Li, X. 2008, 'Avatar motion control by natural body movement via camera', Neurocomputing, vol. 72, no. 1-3, pp. 648-652.
With the popularity of cameras and the rapid development of computer vision technology, vision-based HCI is attracting extensive interest. In this paper, we present a system for controlling avatars by natural body movement via a single web-camera. A pose da
Gao, X., Xiao, B., Tao, D. & Li, X. 2008, 'Image categorization: Graph edit distance + edge direction histogram', Pattern Recognition, vol. 41, no. 10, pp. 3179-3191.
This paper presents a novel algorithm for computing graph edit distance (GED) in image categorization. This algorithm is purely structural, i.e., it needs only connectivity structure of the graph and does not draw on node or edge attributes. There are tw
Li, J., Li, X. & Tao, D. 2008, 'KPCA for semantic object extraction in images', Pattern Recognition, vol. 41, no. 10, pp. 3244-3250.
In this paper, we kernelize conventional clustering algorithms from a novel point of view. Based on a full mathematical proof, we first demonstrate that kernel KMeans (KKMeans) is equivalent to kernel principal component analysis (KPCA) prior to the c
Xiao, B., Gao, X., Tao, D. & Li, X. 2008, 'HMM-based graph edit distance for image indexing', International Journal of Imaging Systems and Technology, vol. 18, no. 2-3, pp. 209-218.
Most of the existing graph edit distance (GED) algorithms require cost functions which are difficult to define exactly. In this article, we propose a cost function free algorithm for computing GED. It only depends on the distribution of nodes rather
Tao, D., Shen, J. & Li, X. 2008, 'Guest editorial: Special issue on multimedia information retrieval', International Journal of Imaging Systems and Technology, vol. 18, no. 2-3, p. 85.
Sun, J., Tao, D., Papadimitriou, S., Yu, P.S. & Faloutsos, C. 2008, 'Incremental tensor analysis: Theory and applications', ACM Transactions on Knowledge Discovery from Data, vol. 2, no. 3.
How do we find patterns in author-keyword associations, evolving over time? Or in data cubes (tensors), with product-branch-customer sales information? And more generally, how to summarize high-order data cubes (tensors)? How to incrementally update these patterns over time? Matrix decompositions, like principal component analysis (PCA) and variants, are invaluable tools for mining, dimensionality reduction, feature selection, and rule identification in numerous settings like streaming data, text, graphs, social networks, and many more. However, they have only two orders (i.e., matrices, like author and keyword in the previous example). We propose to envision such higher-order data as tensors, and tap the vast literature on the topic. However, these methods do not necessarily scale up, let alone operate on semi-infinite streams. Thus, we introduce a general framework, incremental tensor analysis (ITA), which efficiently computes a compact summary for high-order and high-dimensional data, and also reveals the hidden correlations. Three variants of ITA are presented: (1) dynamic tensor analysis (DTA); (2) streaming tensor analysis (STA); and (3) window-based tensor analysis (WTA). In particular, we explore several fundamental design trade-offs such as space efficiency, computational cost, approximation accuracy, time dependency, and model complexity. We implement all our methods and apply them in several real settings, such as network anomaly detection, multiway latent semantic indexing on citation networks, and correlation study on sensor measurements. Our empirical studies show that the proposed methods are fast and accurate and that they find interesting patterns and outliers on the real datasets. © 2008 ACM.
Tao, D. & Li, X. 2008, 'Neurocomputing for vision research', Neurocomputing, vol. 71, no. 10-12, pp. 1769-1770.
Xu, D., Yan, S., Tao, D., Lin, S. & Zhang, H. 2007, 'Marginal Fisher analysis and its variants for human gait recognition and content-based image retrieval', IEEE Transactions On Image Processing, vol. 16, no. 11, pp. 2811-2821.
Dimensionality reduction algorithms, which aim to select a small set of efficient and discriminant features, have attracted great attention for human gait recognition and content-based image retrieval (CBIR). In this paper, we present extensions of our r
Tao, D., Li, X., Wu, X. & Maybank, S. 2007, 'General tensor discriminant analysis and Gabor features for gait recognition', IEEE Transactions On Pattern Analysis And Machine Intelligence, vol. 29, no. 10, pp. 1700-1715.
Traditional image representations are not suited to conventional classification methods such as the linear discriminant analysis (LDA) because of the under sample problem (USP): the dimensionality of the feature space is much higher than the number of tr
Tao, D., Li, X., Wu, X., Hu, W. & Maybank, S. 2007, 'Supervised tensor learning', Knowledge And Information Systems, vol. 13, no. 1, pp. 1-42.
Tensor representation is helpful to reduce the small sample size problem in discriminative subspace selection. As pointed out in this paper, this is mainly because the structure information of objects in computer vision research is a reasonable constraint to
Tao, D., Li, X. & Maybank, S. 2007, 'Negative samples analysis in relevance feedback', IEEE Transactions On Knowledge And Data Engineering, vol. 19, no. 4, pp. 568-580.
Recently, relevance feedback (RF) in content-based image retrieval (CBIR) has been implemented as an online binary classifier to separate the positive samples from the negative samples, where both sets of samples are labeled by the user. In many applicat
Gao, X., Li, J., Tao, D. & Li, X. 2007, 'Fuzziness Measurement Of Fuzzy Sets And Its Application In Cluster Validity Analysis', International Journal of Fuzzy Systems, vol. 9, no. 4, pp. 188-197.
To measure the fuzziness of fuzzy sets, this paper introduces distance-based and fuzzy entropy-based measurements. Then these measurements are generalized to measure the fuzziness of fuzzy partition, namely partition fuzziness. According to the relat
Tao, D., Tang, X., Li, X. & Rui, Y. 2006, 'Direct kernel biased discriminant analysis: A new content-based image retrieval relevance feedback algorithm', IEEE Transactions On Multimedia, vol. 8, no. 4, pp. 716-727.
In recent years, a variety of relevance feedback (RF) schemes have been developed to improve the performance of content-based image retrieval (CBIR). Given user feedback information, the key to an RF scheme is how to select a subset of image features to c
Xu, D., Yan, S., Tao, D., Zhang, L., Li, X. & Zhang, H. 2006, 'Human gait recognition with matrix representation', IEEE Transactions On Circuits And Systems For Video Technology, vol. 16, no. 7, pp. 896-903.