
# Professor Dacheng Tao

### Biography

Dacheng Tao is Professor of Computer Science with the Centre for Quantum Computation and Intelligent Systems (QCIS) and the Faculty of Engineering and Information Technology (FEIT) at the University of Technology Sydney (UTS). He holds or has held visiting professorships at many top universities and research institutes, including Birkbeck - University of London, Shanghai Jiaotong University, Huazhong University of Science & Technology, Wuhan University, Northwestern Polytechnic University, Chinese Academy of Sciences, and Xidian University. Previously, he worked as a Nanyang Assistant Professor at Nanyang Technological University and as an Assistant Professor at the Hong Kong Polytechnic University. He received his BEng degree from the University of Science and Technology of China (USTC), his MPhil degree from the Chinese University of Hong Kong (CUHK), and his PhD from the University of London (London).

He mainly applies statistics and mathematics to data analytics problems, and his research interests span computer vision, computational neuroscience, data science, geoinformatics, image processing, machine learning, medical informatics, multimedia, neural networks and video surveillance. His research results have been published in one monograph and 400+ publications in prestigious journals and at prominent conferences, such as IEEE T-PAMI, T-NNLS, T-IP, T-SP, T-MI, T-KDE, T-CYB, JMLR, IJCV, NIPS, ICML, CVPR, ICCV, ECCV, AISTATS, ICDM, SDM, and ACM SIGKDD and Multimedia, with several best paper awards, such as the best theory/algorithm paper runner-up award at IEEE ICDM’07, the best student paper award at IEEE ICDM’13, and the 2014 ICDM 10 Year Highest-Impact Paper Award.

He has made notable contributions to universities by providing excellent research student supervision. His PhD students (including co-supervised PhD students) have won the Chancellor’s Award for the most outstanding PhD thesis across the university (in 2012 and 2015), the UTS Chancellor Postdoctoral Fellowship in 2012, the Extraordinary Potential Prize of the 2011 Chinese Government Award for Outstanding Self-Financed Students Abroad, the Microsoft Fellowship Award, the Baidu Fellowship, the Beihang “Zhuoyue” Program, the PLA Best PhD Dissertation Award, the Chinese Computer Federation (CCF) Outstanding Dissertation Award, the Award for the Excellent Doctoral Dissertation of Shanghai, the Award for the Excellent Doctoral Dissertation of Beijing, and the Excellent PhD Dissertation Award from the National University of Defense Technology.

He is or has been a guest editor of 10+ special issues and an editor of 10+ journals, including IEEE Trans. on Big Data (T-BD), IEEE Trans. on Neural Networks and Learning Systems (T-NNLS), IEEE Trans. on Image Processing (T-IP), IEEE Trans. on Cybernetics (T-CYB), IEEE Trans. on Systems, Man and Cybernetics: Part B (T-SMCB), IEEE Trans. on Circuits and Systems for Video Technology (T-CSVT), IEEE Trans. on Knowledge and Data Engineering (T-KDE), Pattern Recognition (Elsevier), Information Sciences (Elsevier), Signal Processing (Elsevier), and Computational Statistics & Data Analysis (Elsevier). He has edited five books on several topics of optical pattern recognition and its applications. He has chaired conferences, special sessions, invited sessions, workshops, and panels 60+ times. He has served nearly 200 major conferences, including CVPR, ICCV, ECCV, AAAI, IJCAI, NIPS, ICDM, AISTATS, ACM SIGKDD and Multimedia, and nearly 100 prestigious international journals, including T-PAMI, IJCV, JMLR, AIJ, and MLJ.

He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), a Fellow of the Optical Society of America (OSA), a Fellow of the International Association for Pattern Recognition (IAPR), a Fellow of the International Society for Optical Engineering (SPIE), an Elected Member of the International Statistical Institute (ISI), a Fellow of the British Computer Society (BCS), and a Fellow of the Institution of Engineering and Technology (IET/IEE). He is an elected member of the Global Young Academy (GYA). He chairs the IEEE SMC Technical Committee on Cognitive Computing and the IEEE SMC New South Wales Section.

### Professional

Fellow, Institute of Electrical and Electronics Engineers (FIEEE)

Fellow, Optical Society of America (FOSA)

Fellow, International Association for Pattern Recognition (FIAPR)

Fellow, International Society for Optical Engineering (FSPIE)

Fellow, Institution of Engineering and Technology (FIET)

Fellow, British Computer Society (FBCS)

Elected Member, International Statistical Institute (ISI)

Elected Member of the Global Young Academy (GYA)

Adjunct Professor, Faculty of Engineering & Information Technology
Core Member, QCIS - Quantum Computation and Intelligent Systems
Core Member, AAI - Advanced Analytics Institute
Core Member, Joint Research Centre in Intelligent Systems Membership
BEng (USTC), MPhil (CUHK), PhD (London)

### Research Interests

statistics and mathematics for data analysis problems in machine learning, data mining & engineering, computer vision, image processing, multimedia, video surveillance and neuroscience

Image and Video Analysis; Computer Vision; Pattern Recognition; Machine Learning; and Discrete Mathematics

## Chapters

He, X., Luo, S., Tao, D., Xu, C., Yang, J. & Abul Hasan, M. 2015, 'Preface' in MultiMedia Modeling, Springer, Germany, pp. V-VI.
He, X., Xu, C., Tao, D., Luo, S., Yang, J. & Hasan, M.A. 2015, 'Preface' in MultiMedia Modeling (LNCS), Springer, Germany, pp. V-VI.
Luo, Y., Tao, D. & Xu, C. 2013, 'Patch Alignment for Graph Embedding' in Fu, Y. & Ma, Y. (eds), Graph Embedding for Pattern Analysis, Springer New York, New York, NY, USA, pp. 73-118.
Dozens of manifold learning-based dimensionality reduction algorithms have been proposed in the literature. The most representative ones are locally linear embedding (LLE) [65], ISOMAP [76], Laplacian eigenmaps (LE) [4], Hessian eigenmaps (HLLE) [20], and local tangent space alignment (LTSA) [102]. LLE uses linear coefficients, which reconstruct a given example by its neighbors, to represent the local geometry, and then seeks a low-dimensional embedding, in which these coefficients are still suitable for reconstruction. ISOMAP preserves global geodesic distances of all the pairs of examples.
Gao, X., Wang, B., Tao, D. & Li, X. 2011, 'A Unified Tensor Level Set Method for Image Segmentation. Multimedia Analysis, Processing and Communications' in Lin, W., Tao, D., Kacprzyk, J., Li, Z., Izquierdo, E. & Wang, H. (eds), Studies in Computational Intelligence vol 346, Springer-Verlag Berlin, Berlin, pp. 217-238.
This paper presents a new unified level set model for multiple regional image segmentation. This model builds a unified tensor representation for comprehensively depicting each pixel in the image to be segmented, by which the image aligns itself with a t
Xiao, B., Gao, X., Tao, D. & Li, X. 2011, 'Recognition of Sketches in Photos' in Lin, W., Tao, D., Kacprzyk, J., Li, Z., Izquierdo, E. & Wang, H. (eds), Studies in Computational Intelligence vol 346. Multimedia Analysis, Processing and Communications, Springer-Verlag Berlin / Heidelberg, Berlin/Heidelberg, pp. 239-262.
Face recognition by sketches in photos makes an important complement to face photo recognition. It is challenging because sketches and photos have geometrical deformations and texture difference. Aiming to achieve better performance in mixture pattern recognition, we reduce difference between sketches and photos by synthesizing sketches from photos, and vice versa, and then transform the sketch-photo recognition to photo-photo/sketch-sketch recognition. Pseudo-sketch/pseudo-photo patches are synthesized with embedded hidden Markov model and integrated to derive pseudo-sketch/pseudo-photo. Experiments are carried out to demonstrate that the proposed methods are effective to produce pseudo-sketch/pseudo-photo with high quality and achieve promising recognition results.
Deng, C., Gao, X., Li, X. & Tao, D. 2011, 'Robust Image Watermarking Based on Feature Regions' in Lin, W., Tao, D., Kacprzyk, J., Li, Z., Izquierdo, E. & Wang, H. (eds), Studies in Computational Intelligence vol 346. Multimedia Analysis, Processing and Communications, Springer-Verlag Berlin / Heidelberg, Berlin/Heidelberg, pp. 111-137.
In image watermarking, binding the watermark synchronization with the local features has been widely used to provide robustness against geometric distortions as well as common image processing operations. However, in the existing schemes, the problems with random bending attack, nonisotropic scaling, general affine transformation, and combined attacks still remain difficult. In this chapter, we present and discuss the framework of the extraction and selection of the scale-space feature points. We then propose two robust image watermarking algorithms through synchronizing watermarking with the invariant local feature regions centered at feature points. The first algorithm conducts watermark embedding and detection in the affine covariant regions (ACRs). The second algorithm combines the local circular regions (LCRs) with Tchebichef moments, and local Tchebichef moments (LTMs) are used to embed and detect the watermark. These proposed algorithms are evaluated theoretically and experimentally, and are compared with two representative schemes. Experiments are carried out on a set of standard test images, and the preliminary results demonstrate that the developed algorithms improve the performance over these two representative image watermarking schemes in terms of robustness. In terms of overall robustness against geometric distortions and common image processing operations, the LTMs-based method has an advantage over the ACRs-based method.
Bian, W. & Tao, D. 2011, 'Face Subspace Learning' in Li, S.Z. & Jain, A.K. (eds), Handbook of Face Recognition, Springer-Verlag London Limited, London UK, pp. 51-77.
Gao, X., Xiao, B., Tao, D. & Li, X. 2009, 'A Comparative Study of Three Graph Edit Distance Algorithms' in Abraham, A., Hassanien, A.E. & Snasel, V. (eds), Studies in Computational Intelligence Vol 205: Foundations of Computational Intelligence vol 5, Springer, Berlin, pp. 223-242.
Graph edit distance (GED) is widely applied to similarity measurement of graphs in inexact graph matching. Due to the difficulty of defining cost functions reasonably, we do research on two GED algorithms without cost function definition: the first is c

## Conferences

Yu, D. & Tao, D.C. 2015, 'Frontier of Business Management Challenge in the Dynamics of Big Data', 5th Organizations, Artifacts and Practices (OAP) Workshop, Sydney.
Sun, H., Li, J., Du, B. & Tao, D. 2016, 'On combining compressed sensing and sparse representations for object tracking', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 32-43.
© Springer International Publishing AG 2016. The tracking algorithm of compressed sensing takes advantage of the objective's background information, but lacks the feedback mechanism towards the results. The $\ell_1$ sparse tracking algorithm adapts to the changes in the objectives' appearances but at the cost of losing their background information. To enhance the effectiveness and robustness of the algorithm in coping with such distractions as occlusion and illumination variation, this paper proposes a tracking framework with the $\ell_1$ sparse representation being the detector and compressed sensing algorithm the tracker, and establishes a complementary classifier model. A second-order model updating strategy has therefore been proposed to preserve the most representative templates in the $\ell_1$ sparse representations. It is concluded that this tracking algorithm is better than the prevalent 8 ones with a respective precision plot of 77.15%, 72.33% and 81.13% and a respective success plot of 77.67%, 74.01%, 81.51% in terms of the overall, occlusion and illumination variation.
Han, B., Zhang, L., Gao, X., Zhao, X. & Tao, D. 2016, 'Embedded locality discriminant GPLVM for dimensionality reduction', Proceedings of the International Joint Conference on Neural Networks, pp. 2431-2438.
© 2016 IEEE. The Gaussian process latent variable model (GPLVM) had been proved to be good at discovering low-dimension manifold from nonlinear high-dimensional data for small training sets. However, for labeled data, GPLVM cannot achieve a better result because it doesn't use the label information. It turned out to be an effective strategy to employ a discriminative prior over the latent space according to the label information. Existing methods for discriminative GPLVM roughly utilized label information data and ignored the natural structure of the data. In this paper, we embedded the locality discriminative information into the GPLVM which not only preserved the locality of data, but also use the label information of samples. Compared to the discriminative GPLVM, our Embedded Locality Discriminant GPLVM (ELD-GPLVM) introduces a local strategy to extract the discriminative information in local region. Experimental results on UCI datasets show that the proposed algorithm has a good performance on no matter small-scale data sets or a larger-scale dataset.
Cai, B., Xu, X. & Tao, D. 2016, 'Real-time video dehazing based on spatio-temporal MRF', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 306-314.
© Springer International Publishing AG 2016. Video dehazing has a wide range of real-time applications, but the challenges mainly come from spatio-temporal coherence and computational efficiency. In this paper, a spatio-temporal optimization framework for real-time video dehazing is proposed, which reduces blocking and flickering artifacts and achieves high-quality enhanced results. We build a Markov Random Field (MRF) with an Intensity Value Prior (IVP) to handle spatial consistency and temporal coherence. By maximizing the MRF likelihood function, the proposed framework estimates the haze concentration and preserves the information optimally. Moreover, to facilitate real-time applications, integral image technique is approximated to reduce the main computational burden. Experimental results demonstrate that the proposed framework effectively removes haze and flickering artifacts, and is sufficiently fast for real-time applications.
Zhang, F., Li, J., Li, F., Xu, M., Xu, Y. & He, X. 2015, 'Community detection based on links and node features in social networks', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 21st International Conference on Multimedia Modelling, MMM 2015, Springer, Sydney, Australia, pp. 418-429.
© Springer International Publishing Switzerland 2015. Community detection is a significant but challenging task in the field of social network analysis. Many effective methods have been proposed to solve this problem. However, most of them are mainly based on the topological structure or node attributes. In this paper, based on SPAEM [1], we propose a joint probabilistic model to detect community which combines node attributes and topological structure. In our model, we create a novel feature-based weighted network, within which each edge weight is represented by the node feature similarity between two nodes at the end of the edge. Then we fuse the original network and the created network with a parameter and employ expectation-maximization algorithm (EM) to identify a community. Experiments on a diverse set of data, collected from Facebook and Twitter, demonstrate that our algorithm has achieved promising results compared with other algorithms.
Li, Y., Tian, X., Liu, T. & Tao, D. 2015, 'Multi-Task Model and Feature Joint Learning', http://ijcai.org/papers15/contents.php, International Joint Conference on Artificial Intelligence, AAAI Press / International Joint Conferences on Artificial Intelligence, Buenos Aires, Argentina, pp. 3643-3649.
Beveridge, J.R., Zhang, H., Draper, B.A., Flynn, P.J., Feng, Z., Huber, P., Kittler, J., Huang, Z., Li, S., Li, Y., Kan, M., Wang, R., Shan, S., Chen, X., Li, H., Hua, G., Struc, V., Krizaj, J., Ding, C., Tao, D. & Phillips, P.J. 2015, 'Report on the FG 2015 Video Person Recognition Evaluation', Proceedings of the 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2015, IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), IEEE, Ljubljana, Slovenia, pp. 1-8.
© 2015 IEEE. This report presents results from the Video Person Recognition Evaluation held in conjunction with the 11th IEEE International Conference on Automatic Face and Gesture Recognition. Two experiments required algorithms to recognize people in videos from the Point-and-Shoot Face Recognition Challenge Problem (PaSC). The first consisted of videos from a tripod mounted high quality video camera. The second contained videos acquired from 5 different handheld video cameras. There were 1401 videos in each experiment of 265 subjects. The subjects, the scenes, and the actions carried out by the people are the same in both experiments. Five groups from around the world participated in the evaluation. The video handheld experiment was included in the International Joint Conference on Biometrics (IJCB) 2014 Handheld Video Face and Person Recognition Competition. The top verification rate from this evaluation is double that of the top performer in the IJCB competition. Analysis shows that the factor most affecting algorithm performance is the combination of location and action: where the video was acquired and what the person was doing.
He, X., Luo, S., Tao, D., Xu, C., Yang, J. & Abul Hasan, M. 2015, 'MultiMedia Modeling: 21st International Conference, MMM 2015 Sydney, NSW, Australia, January 5-7, 2015 Proceedings, Part II', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).
Wu, S., Zhang, X., Guan, N., Tao, D., Huang, X. & Luo, Z. 2015, 'Non-negative low-rank and group-sparse matrix factorization', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 536-547.
© Springer International Publishing Switzerland 2015. Non-negative matrix factorization (NMF) has been a popular data analysis tool and has been widely applied in computer vision. However, conventional NMF methods cannot adaptively learn grouping structure from a dataset. This paper proposes a non-negative low-rank and group-sparse matrix factorization (NLRGS) method to overcome this deficiency. Particularly, NLRGS captures the relationships among examples by constraining rank of the coefficients meanwhile identifies the grouping structure via group sparsity regularization. By both constraints, NLRGS boosts NMF in both classification and clustering. However, NLRGS is difficult to be optimized because it needs to deal with the low-rank constraint. To relax such hard constraint, we approximate the low-rank constraint with the nuclear norm and then develop an optimization algorithm for NLRGS in the frame of augmented Lagrangian method (ALM). Experimental results of both face recognition and clustering on four popular face datasets demonstrate the effectiveness of NLRGS in quantities.
Liu, T. & Tao, D. 2014, 'On the Robustness and Generalization of Cauchy Regression', 2014 4th IEEE International Conference on Information Science and Technology (ICIST), IEEE, Shenzhen, China, pp. 101-106.
It was recently highlighted in a special issue of Nature [1] that the value of big data has yet to be effectively exploited for innovation, competition and productivity. To realize the full potential of big data, big learning algorithms need to be developed to keep pace with the continuous creation, storage and sharing of data. Least squares (LS) and least absolute deviation (LAD) have been successful regression tools used in business, government and society over the past few decades. However, these existing technologies are severely limited by noisy data because their breakdown points are both zero, i.e., they do not tolerate outliers. By appropriately setting the turning constant of Cauchy regression (CR), the maximum possible value (50%) of the breakdown point can be attained. CR therefore has the capability to learn a robust model from noisy big data. Although the theoretical analysis of the breakdown point for CR has been comprehensively investigated, we propose a new approach by interpreting the optimization of an objective function as a sample-weighted procedure. We therefore clearly show the differences of the robustness between LS, LAD and CR. We also study the statistical performance of CR. This study derives the generalization error bounds for CR by analyzing the covering number and Rademacher complexity of the hypothesis class, as well as showing how the scale parameter affects its performance.
Hong, Z., Wang, C., Mei, X., Prokhorov, D. & Tao, D. 2014, 'Tracking Using Multilevel Quantizations', Computer Vision – ECCV 2014, European Conference on Computer Vision, Springer, Switzerland, pp. 155-171.
Most object tracking methods only exploit a single quantization of an image space: pixels, superpixels, or bounding boxes, each of which has advantages and disadvantages. It is highly unlikely that a common optimal quantization level, suitable for tracking all objects in all environments, exists. We therefore propose a hierarchical appearance representation model for tracking, based on a graphical model that exploits shared information across multiple quantization levels. The tracker aims to find the most possible position of the target by jointly classifying the pixels and superpixels and obtaining the best configuration across all levels. The motion of the bounding box is taken into consideration, while Online Random Forests are used to provide pixel- and superpixel-level quantizations and progressively updated on-the-fly. By appropriately considering the multilevel quantizations, our tracker exhibits not only excellent performance in non-rigid object deformation handling, but also its robustness to occlusions. A quantitative evaluation is conducted on two benchmark datasets: a non-rigid object tracking dataset (11 sequences) and the CVPR2013 tracking benchmark (50 sequences). Experimental results show that our tracker overcomes various tracking challenges and is superior to a number of other popular tracking methods.
Xu, Z., Tao, D., Zhang, Y., Wu, J. & Tsoi, A.C. 2014, 'Architectural Style Classification Using Multinomial Latent Logistic Regression', 13th European Conference on Computer Vision, Proceedings, Part I, 13th European Conference on Computer Vision, Springer, Zurich, Switzerland, pp. 600-615.
Architectural style classification differs from standard classification tasks due to the rich inter-class relationships between different styles, such as re-interpretation, revival, and territoriality. In this paper, we adopt Deformable Part-based Models (DPM) to capture the morphological characteristics of basic architectural components and propose Multinomial Latent Logistic Regression (MLLR) that introduces the probabilistic analysis and tackles the multi-class problem in latent variable models. Due to the lack of publicly available datasets, we release a new large-scale architectural style dataset containing twenty-five classes. Experimentation on this dataset shows that MLLR in combination with standard global image features, obtains the best classification results. We also present interpretable probabilistic explanations for the results, such as the styles of individual buildings and a style relationship network, to illustrate inter-class relationships.
Hong, Z., Mei, X., Prokhorov, D. & Tao, D. 2013, 'Tracking via Robust Multi-task Multi-view Joint Sparse Representation', Proceedings of IEEE International Conference on Computer Vision, IEEE International Conference on Computer Vision, IEEE, Sydney, pp. 649-656.
Combining multiple observation views has proven beneficial for tracking. In this paper, we cast tracking as a novel multi-task multi-view sparse learning problem and exploit the cues from multiple views including various types of visual features, such as intensity, color, and edge, where each feature observation can be sparsely represented by a linear combination of atoms from an adaptive feature dictionary. The proposed method is integrated in a particle filter framework where every view in each particle is regarded as an individual task. We jointly consider the underlying relationship between tasks across different views and different particles, and tackle it in a unified robust multi-task formulation. In addition, to capture the frequently emerging outlier tasks, we decompose the representation matrix to two collaborative components which enable a more robust and accurate approximation. We show that the proposed formulation can be efficiently solved using the Accelerated Proximal Gradient method with a small number of closed-form updates. The presented tracker is implemented using four types of features and is tested on numerous benchmark video sequences. Both the qualitative and quantitative results demonstrate the superior performance of the proposed approach compared to several state-of-the-art trackers.
Peng, H., Deng, C., An, L., Gao, X. & Tao, D. 2013, 'Learning to multimodal hash for robust video copy detection', IEEE International Conference on Image Processing, ICIP 2013, IEEE International Conference on Image Processing, ICIP 2013, IEEE, Melbourne, Australia, pp. 4482-4486.
Content-based video copy detection (CBVCD) has attracted increasing attention in recent years. However, video content description and search efficiency are still two challenges in this domain. To cope with these two problems, this paper proposes a novel CBVCD approach with similarity preserving multimodal hash learning (SPM2H). The pre-processed video keyframes are represented as multiple features from different perspectives. SPM2H integrates the multimodal feature fusion and the hashing function learning into a joint framework. Mapping video keyframes into hash codes enables fast similarity search in the Hamming space. The experiments show that our approach achieves good performance in accuracy as well as efficiency.
Zhang, T., Ji, R., Liu, W., Tao, D. & Hua, G. 2013, 'Semi-supervised learning with manifold fitted graphs', International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence, AAAI, Beijing, China, pp. 1896-1902.
In this paper, we propose a locality-constrained and sparsity-encouraged manifold fitting approach, aiming at capturing the locally sparse manifold structure into neighborhood graph construction by exploiting a principled optimization model. The proposed model formulates neighborhood graph construction as a sparse coding problem with the locality constraint, therefore achieving simultaneous neighbor selection and edge weight optimization. The core idea underlying our model is to perform a sparse manifold fitting task for each data point so that close-by points lying on the same local manifold are automatically chosen to connect and meanwhile the connection weights are acquired by simple geometric reconstruction. We term the novel neighborhood graph generated by our proposed optimization model M-Fitted Graph since such a graph stems from sparse manifold fitting. To evaluate the robustness and effectiveness of M-fitted graphs, we leverage graph-based semi-supervised learning as the testbed. Extensive experiments carried out on six benchmark datasets validate that the proposed M-fitted graph is superior to state-of-the-art neighborhood graphs in terms of classification accuracy using popular graph-based semi-supervised learning methods.
Zhou, T. & Tao, D. 2013, 'Shifted Subspaces Tracking on Sparse Outlier for Motion Segmentation', International Joint Conference on Artificial Intelligence, 2013 International Joint Conferences on Artificial Intelligence, AAAI, Beijing, China, pp. 1946-1952.
In low-rank & sparse matrix decomposition, the entries of the sparse part are often assumed to be i.i.d. sampled from a random distribution. But the structure of sparse part, as the central interest of many problems, has been rarely studied. One motivating problem is tracking multiple sparse object flows (motions) in video. We introduce "shifted subspaces tracking (SST)" to segment the motions and recover their trajectories by exploring the low-rank property of background and the shifted subspace property of each motion. SST is composed of two steps, background modeling and flow tracking. In step 1, we propose "semi-soft GoDec" to separate all the motions from the low-rank background L as a sparse outlier S. Its soft-thresholding in updating S significantly speeds up GoDec and facilitates the parameter tuning. In step 2, we update X as S obtained in step 1 and develop "SST algorithm" further decomposing X as $X = \sum_{i=1}^{k} L(i) \circ t(i) + S + G$, wherein L(i) is a low-rank matrix storing the ith flow after transformation t(i). SST algorithm solves k sub-problems in sequel by alternating minimization, each of which recovers one L(i) and its t(i) by randomized method. Sparsity of L(i) and between-frame affinity are leveraged to save computations. We justify the effectiveness of SST on surveillance video sequences.
Zhou, T. & Tao, D. 2013, 'k-bit Hamming compressed sensing', IEEE International Symposium on Information Theory, 2013 IEEE International Symposium on Information Theory, IEEE, Istanbul, Turkey, pp. 679-683.
Wei, L., Guan, N., Zhang, X., Luo, Z. & Tao, D. 2013, 'Orthogonal Nonnegative Locally Linear Embedding', 2013 IEEE International Conference on Systems, Man, and Cybernetics, 2013 IEEE International Conference on Systems, Man, and Cybernetics (SMC), IEEE, Manchester, UK, pp. 2134-2139.
Nonnegative matrix factorization (NMF) decomposes a nonnegative dataset X into two low-rank nonnegative factor matrices, i.e., W and H, by minimizing either Kullback-Leibler (KL) divergence or Euclidean distance between X and WH. NMF has been widely used in pattern recognition, data mining and computer vision because the non-negativity constraints on both W and H usually yield intuitive parts-based representation. However, NMF suffers from two problems: 1) it ignores geometric structure of dataset, and 2) it does not explicitly guarantee parts-based representation on any datasets. In this paper, we propose an orthogonal nonnegative locally linear embedding (ONLLE) method to overcome aforementioned problems. ONLLE assumes that each example embeds in its nearest neighbors and keeps such relationship in the learned subspace to preserve geometric structure of a dataset. For the purpose of learning parts-based representation, ONLLE explicitly incorporates an orthogonality constraint on the learned basis to keep its spatial locality. To optimize ONLLE, we applied an efficient fast gradient descent (FGD) method on Stiefel manifold which accelerates the popular multiplicative update rule (MUR). The experimental results on real-world datasets show that FGD converges much faster than MUR. To evaluate the effectiveness of ONLLE, we conduct both face recognition and image clustering on real-world datasets by comparing with the representative NMF methods.
Luo, Y., Tao, D., Xu, C., Li, D. & Xu, C. 2013, 'Vector-valued multi-view semi-supervised learning for multi-label image classification', Twenty-Seventh AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, AAAI Press, Bellevue, Washington, USA, pp. 647-653.
Images are usually associated with multiple labels and comprised of multiple views, due to each image containing several objects (e.g. a pedestrian, bicycle and tree) and multiple visual features (e.g. color, texture and shape). Currently available tools tend to use either labels or features for classification, but both are necessary to describe the image properly. There have been recent successes in using vector-valued functions, which construct matrix-valued kernels, to explore the multi-label structure in the output space. This has motivated us to develop multi-view vector-valued manifold regularization (MV$^3$MR) in order to integrate multiple features. MV$^3$MR exploits the complementary properties of different features, and discovers the intrinsic local geometry of the compact support shared by different features, under the theme of manifold regularization. We validate the effectiveness of the proposed MV$^3$MR methodology for image classification by conducting extensive experiments on two challenging datasets, PASCAL VOC'07 and MIR Flickr.
Wu, F., Tan, X., Yang, Y., Tao, D., Tang, S. & Zhuang, Y. 2013, 'Supervised Nonnegative Tensor Factorization with Maximum-Margin Constraint', Twenty-Seventh AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, AAAI Press, Bellevue, Washington, USA, pp. 962-968.
Non-negative tensor factorization (NTF) has attracted great attention in the machine learning community. In this paper, we extend traditional non-negative tensor factorization into a supervised discriminative decomposition, referred to as Supervised Non-negative Tensor Factorization with Maximum-Margin Constraint (SNTFM2). SNTFM2 formulates the optimal discriminative factorization of non-negative tensorial data as a coupled least-squares optimization problem via a maximum-margin method. As a result, SNTFM2 not only faithfully approximates the tensorial data by additive combinations of the basis, but also obtains strong generalization power for discriminative analysis (in particular for classification in this paper). The experimental results show the superiority of our proposed model over state-of-the-art techniques on both toy and real-world data sets.
Zhou, T. & Tao, D. 2013, 'Greedy Bilateral Sketch, Completion & Smoothing', Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, International Conference on Artificial Intelligence and Statistics, JMLR.org, Scottsdale, AZ, USA, pp. 650-658.
Recovering a large low-rank matrix from highly corrupted, incomplete or sparse outlier overwhelmed observations is the crux of various intriguing statistical problems. We explore the power of the "greedy bilateral (GreB)" paradigm in reducing both time and sample complexities for solving these problems. GreB models a low-rank variable as a bilateral factorization, and updates the left and right factors in a mutually adaptive and greedy incremental manner. We detail how to model and solve low-rank approximation, matrix completion and robust PCA in the GreB paradigm. On their MATLAB implementations, approximating a noisy 10000x10000 matrix of rank 500 with SVD accuracy takes 6s; the MovieLens10M matrix of size 69878x10677 can be completed in 10s from 30% of $10^7$ ratings with RMSE 0.86 on the remaining 70%; the low-rank background and sparse moving outliers in a 120x160 video of 500 frames are accurately separated in 1s. This brings 30 to 100 times acceleration in solving these popular statistical problems.
Liu, X., Song, M., Tao, D., Liu, Z., Zhang, L., Chen, C. & Bu, J. 2013, 'Semi-supervised node splitting for random forest construction', 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Portland, Oregon, pp. 492-499.
Node splitting is an important issue in Random Forest, but robust splitting requires a large number of training samples. Existing solutions fail to properly partition the feature space if there are insufficient training data. In this paper, we present semi-supervised splitting to overcome this limitation by splitting nodes with the guidance of both labeled and unlabeled data. In particular, we derive a nonparametric algorithm to obtain an accurate quality measure of splitting by incorporating abundant unlabeled data. To avoid the curse of dimensionality, we project the data points from the original high-dimensional feature space onto a low-dimensional subspace before estimation. A unified optimization framework is proposed to select a coupled pair of subspace and separating hyperplane such that the smoothness of the subspace and the quality of the splitting are guaranteed simultaneously. The proposed algorithm is compared with state-of-the-art supervised and semi-supervised algorithms for typical computer vision applications such as object categorization and image segmentation. Experimental results on publicly available datasets demonstrate the superiority of our method.
Deng, C., Ji, R., Liu, W., Tao, D. & Gao, X. 2013, 'Visual Reranking through Weakly Supervised Multi-Graph Learning', IEEE International Conference on Computer Vision, ICCV 2013, IEEE International Conference on Computer Vision, ICCV 2013, IEEE, Sydney, Australia, pp. 2600-2607.
Visual reranking has been widely deployed to refine the quality of conventional content-based image retrieval engines. The current trend lies in employing a crowd of retrieved results stemming from multiple feature modalities to boost the overall performance of visual reranking. However, a major challenge pertaining to current reranking methods is how to take full advantage of the complementary property of distinct feature modalities. Given a query image and one feature modality, a regular visual reranking framework treats the top-ranked images as pseudo positive instances, which are inevitably noisy, make it difficult to reveal this complementary property, and thus lead to inferior ranking performance. This paper proposes a novel image reranking approach by introducing a Co-Regularized Multi-Graph Learning (Co-RMGL) framework, in which the intra-graph and inter-graph constraints are simultaneously imposed to encode affinities in a single graph and consistency across different graphs. Moreover, weakly supervised learning driven by image attributes is performed to denoise the pseudo-labeled instances, thereby highlighting the unique strength of each individual feature modality. Meanwhile, such learning can yield a few anchors in graphs that vitally enable the alignment and fusion of multiple graphs. As a result, an edge weight matrix learned from the fused graph automatically gives the ordering to the initially retrieved results. We evaluate our approach on four benchmark image retrieval datasets, demonstrating a significant performance gain over the state of the art.
Zhou, T., Bian, W. & Tao, D. 2013, 'Divide-and-Conquer Anchoring for Near-Separable Nonnegative Matrix Factorization and Completion in High Dimensions', IEEE 13th International Conference on Data Mining, IEEE 13th International Conference on Data Mining, IEEE, Dallas, TX, USA, pp. 917-926.
Nonnegative matrix factorization (NMF) becomes tractable in polynomial time with a unique solution under the separability assumption, which postulates that all the data points are contained in the conical hull of a few anchor data points. Recently developed linear programming and greedy pursuit methods can pick out the anchors from noisy data and result in a near-separable NMF. But their efficiency could be seriously weakened in high dimensions. In this paper, we show that the anchors can be precisely located from the low-dimensional geometry of the data points even when their high-dimensional features suffer from serious incompleteness. Our framework, entitled divide-and-conquer anchoring (DCA), divides the high-dimensional anchoring problem into a few cheaper sub-problems seeking anchors of data projections in low-dimensional random spaces, which can be solved in parallel by any near-separable NMF, and combines all the detected low-dimensional anchors via a fast hypothesis testing to identify the original anchors. We further develop two non-iterative anchoring algorithms in 1D and 2D spaces for data in convex hull and conical hull, respectively. These two rapid algorithms in the ultra-low dimensions suffice to generate a robust and efficient near-separable NMF for high-dimensional or incomplete data via DCA. Compared to existing methods, two vital advantages of DCA are its scalability for big data and its capability of handling incomplete and high-dimensional noisy data. A rigorous analysis proves that DCA is able to find the correct anchors of a rank-k matrix by solving O(k log k) sub-problems. Finally, we show DCA outperforms state-of-the-art methods on various datasets and tasks.
Zhang, K., Gao, X., Tao, D. & Li, X. 2013, 'Image super-resolution via non-local steering kernel regression regularization', IEEE International Conference on Image Processing, ICIP 2013, IEEE International Conference on Image Processing, ICIP 2013, IEEE, Melbourne, Australia, pp. 943-946.
In this paper, we employ the non-local steering kernel regression to construct an effective regularization term for the single image super-resolution problem. The proposed method seamlessly integrates the properties of local structural regularity and non-local self-similarity existing in natural images, and solves a least squares minimization problem for obtaining the desired high-resolution image. Extensive experimental results on both simulated and real low-resolution images demonstrate that the proposed method can restore compelling results with sharp edges and fine textures.
Zhao, H., Cheng, J., Jiang, J. & Tao, D. 2013, 'Multiple instance learning via distance metric optimization', IEEE International Conference on Image Processing, ICIP 2013, IEEE International Conference on Image Processing, ICIP 2013, IEEE, Melbourne, Australia, pp. 2617-2621.
Multiple Instance Learning (MIL) has been widely applied in practice, for example in drug activity prediction and content-based image retrieval. In MIL, a sample, comprised of a set of instances, is called a bag. Labels are assigned to bags instead of instances. The uncertainty of labels on instances makes MIL different from conventional supervised single instance learning (SIL) tasks. Therefore, it is critical to learn an effective mapping to convert an MIL task to an SIL task. In this paper, we present OptMILES by learning the optimal transformation on the bag-to-instance similarity measure, exploring the optimal distance metric between instances, by an alternating minimization training procedure. We thoroughly evaluate the proposed method on both a synthetic dataset and real-world datasets by comparing with representative MIL algorithms. The experimental results suggest the effectiveness of OptMILES.
Li, J. & Tao, D. 2013, 'A Bayesian factorised covariance model for image analysis', International Joint Conference on Artificial Intelligence, 2013 International Joint Conferences on Artificial Intelligence, AAAI, Beijing, China, pp. 1465-1471.
Mu, Y., Ding, W., Zhou, T. & Tao, D. 2013, 'Constrained stochastic gradient descent for large-scale least squares problem', ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, Chicago, IL, USA, pp. 883-891.
The least squares problem is one of the most important regression problems in statistics, machine learning and data mining. In this paper, we present the Constrained Stochastic Gradient Descent (CSGD) algorithm to solve the large-scale least squares problem. CSGD improves the Stochastic Gradient Descent (SGD) by imposing a provable constraint that the linear regression line passes through the mean point of all the data points. It results in the best regret bound O(log T), and fastest convergence speed among all first-order approaches. Empirical studies justify the effectiveness of CSGD by comparing it with SGD and other state-of-the-art approaches. An example is also given to show how to use CSGD to optimize SGD-based least squares problems to achieve a better performance.
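The constraint mentioned in the abstract can be illustrated with a small sketch: after every stochastic gradient step, the parameter vector is projected back onto the hyperplane of models whose regression line passes through the data mean. This is an assumed, simplified reading of the idea, not the authors' CSGD algorithm; the function name `constrained_sgd`, the constant step size and the synthetic data are all hypothetical.

```python
import numpy as np

def constrained_sgd(X, y, lr=0.05, n_steps=2000, seed=0):
    """SGD for least squares with a projection enforcing theta . z_bar = y_bar."""
    rng = np.random.default_rng(seed)
    # Append a constant feature so the last entry of theta acts as the intercept.
    Z = np.hstack([X, np.ones((X.shape[0], 1))])
    z_bar, y_bar = Z.mean(axis=0), y.mean()
    theta = np.zeros(Z.shape[1])
    for _ in range(n_steps):
        i = rng.integers(len(y))
        # Plain stochastic gradient step on the squared error of sample i.
        theta -= lr * (Z[i] @ theta - y[i]) * Z[i]
        # Euclidean projection onto {theta : z_bar . theta = y_bar}, i.e. the
        # fitted line is forced through the mean point of the data.
        theta -= (z_bar @ theta - y_bar) / (z_bar @ z_bar) * z_bar
    return theta, z_bar, y_bar

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 + 0.01 * rng.normal(size=200)
theta, z_bar, y_bar = constrained_sgd(X, y)
```

Because the ordinary least squares solution with an intercept already satisfies this constraint, the projection never moves the iterate away from that solution.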
Cheng, J., Liu, J., Tao, D., Yin, F., Wong, D.W., Xu, Y. & Wong, T.Y. 2013, 'Superpixel Classification Based Optic Cup Segmentation', Lecture Notes in Computer Science, Springer Berlin Heidelberg, Nagoya, Japan, pp. 421-428.
In this paper, we propose a superpixel classification based optic cup segmentation for glaucoma detection. In the proposed method, each optic disc image is first over-segmented into superpixels. Then mean intensities, center surround statistics and the location features are extracted from each superpixel to classify it as cup or non-cup. The proposed method has been evaluated in one database of 650 images with manual optic cup boundaries marked by trained professionals and one database of 1676 images with diagnostic outcome. Experimental results show average overlapping error around 26.0% compared with manual cup region and area under curve of the receiver operating characteristic curve in glaucoma detection at 0.811 and 0.813 in the two databases, much better than other methods. The method could be used for glaucoma screening.
Xu, C., Tao, D., Li, Y. & Xu, C. 2013, 'Large-margin multi-view Gaussian process for image classification', Proceedings of the Fifth International Conference on Internet Multimedia Computing and Service, International Conference on Internet Multimedia Computing and Service, ACM, Huangshan, China, pp. 7-12.
In image classification, the goal is to decide whether an image belongs to a certain category or not. Multiple features are usually employed to describe the contents of images more fully and thereby improve classification accuracy. However, this also raises new problems: how to effectively combine multiple features, and how to handle the high-dimensional features from multiple views given a small training set. In this paper, we present a large-margin Gaussian process approach to discover the latent space shared by multiple features. Multiple features can therefore complement each other in this low-dimensional latent space, which derives a strong discriminative ability from the large-margin principle, so that the subsequent classification task can be effectively accomplished. The resulting objective function can be efficiently solved using gradient descent techniques. Finally, we demonstrate the advantages of the proposed algorithm on real-world image datasets for discovering a discriminative latent space and improving the classification performance.
Gunther, M., Costa-Pazo, A., Ding, C., Boutellaa, E., Chiachia, G., Zhang, H., De Assis Angeloni, M., Struc, V., Khoury, E., Vazquez-Fernandez, E., Tao, D., Bengherabi, M., Cox, D., Kiranyaz, S., De Freitas Pereira, T., Zganec-Gros, J., Argones-Rua, E., Pinto, N., Gabbouj, M., Simoes, F., Dobrisek, S., Gonzalez-Jimenez, D., Rocha, A., Neto, M.U., Pavesic, N., Falcao, A., Violato, R. & Marcel, S. 2013, 'The 2013 face recognition evaluation in mobile environment', Proceedings - 2013 International Conference on Biometrics, ICB 2013.
Automatic face recognition in unconstrained environments is a challenging task. To test current trends in face recognition algorithms, we organized an evaluation on face recognition in mobile environment. This paper presents the results of 8 different participants using two verification metrics. Most submitted algorithms rely on one or more of three types of features: local binary patterns, Gabor wavelet responses including Gabor phases, and color information. The best results are obtained from UNILJ-ALP, which fused several image representations and feature types, and UC-HU, which learns optimal features with a convolutional neural network. Additionally, we assess the usability of the algorithms in mobile devices with limited resources.
Li, J. & Tao, D. 2012, 'Sampling Normal Distribution Restricted on Multiple Regions', International Conference on Neural Information Processing, Springer-Verlag, Doha, Qatar, pp. 492-500.
We develop an accept-reject sampler for probability densities that have a form similar to a normal density function but are supported on restricted regions. Compared to existing techniques, the proposed method deals with multiple disjoint regions, truncated on one or both sides. For the original problem of sampling from one region, the efficiency is enhanced as well. We verify the desirable attributes of the proposed algorithm by both theoretical analysis and simulation studies.
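To make the sampling problem concrete, here is a naive accept-reject baseline: draw from the unrestricted normal and keep only the draws that land in one of the allowed regions. This is not the sampler proposed in the paper, which is designed to be more efficient; the function name `sample_restricted_normal` and the example regions are assumptions for illustration.

```python
import numpy as np

def sample_restricted_normal(regions, n_samples, mu=0.0, sigma=1.0, seed=0):
    """Naive rejection sampling from N(mu, sigma^2) restricted to a union of intervals."""
    rng = np.random.default_rng(seed)
    lows = np.array([lo for lo, _ in regions])
    highs = np.array([hi for _, hi in regions])
    accepted = []
    n_kept = 0
    while n_kept < n_samples:
        draws = rng.normal(mu, sigma, size=n_samples)
        # A draw is accepted if it falls inside at least one interval.
        inside = ((draws[:, None] >= lows) & (draws[:, None] <= highs)).any(axis=1)
        kept = draws[inside]
        accepted.append(kept)
        n_kept += kept.size
    return np.concatenate(accepted)[:n_samples]

# Three disjoint regions: two one-sided tails and one two-sided interval.
regions = [(-np.inf, -1.5), (0.0, 0.5), (2.0, np.inf)]
samples = sample_restricted_normal(regions, 1000)
```

The acceptance rate of this baseline equals the normal probability mass of the regions, so it degrades quickly when the regions lie far in the tails, which is the situation a dedicated sampler is meant to handle.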
Wu, Z., Wu, J., Cao, J. & Tao, D. 2012, 'HySAD: a semi-supervised hybrid shilling attack detector for trustworthy product recommendation', KDD '12: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Beijing, China, pp. 985-993.
Shilling attackers apply biased rating profiles to recommender systems for manipulating online product recommendations. Although many studies have been devoted to shilling attack detection, few of them can handle the hybrid shilling attacks that usually happen in practice, and the studies for real-life applications are rarely seen. Moreover, little attention has yet been paid to modeling both labeled and unlabeled user profiles, although there are often a few labeled but numerous unlabeled users available in practice. This paper presents a Hybrid Shilling Attack Detector, or HySAD for short, to tackle these problems. In particular, HySAD introduces MC-Relief to select effective detection metrics, and Semi-supervised Naive Bayes (SNB_lambda) to precisely separate Random-Filler model attackers and Average-Filler model attackers from normal users. Thorough experiments on MovieLens and Netflix datasets demonstrate the effectiveness of HySAD in detecting hybrid shilling attacks, and its robustness for various obfuscated strategies. A real-life case study on product reviews of Amazon.cn is also provided, which further demonstrates that HySAD can effectively improve the accuracy of a collaborative-filtering based recommender system, and provide interesting opportunities for in-depth analysis of attacker behaviors. These, in turn, justify the value of HySAD for real-world applications.
Zhou, T. & Tao, D. 2012, 'Labelset anchored subspace ensemble (LASE) for multi-label annotation', Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ACM International Conference on Multimedia Retrieval, ACM, Hong Kong, pp. 1-8.
In multimedia retrieval, multi-label annotation for image, text and video is challenging and has attracted rapidly growing interest in past decades. The crux of multi-label annotation lies in 1) how to reduce the model complexity when the label space expands exponentially with the increase of the number of labels; and 2) how to leverage the label correlations, which are broadly believed to be useful for boosting annotation performance. In this paper, we propose "labelsets anchored subspace ensemble (LASE)" to solve both problems in an efficient scheme, whose training is a regularized matrix decomposition and prediction is an inference of group sparse representations. In order to shrink the label space, we first introduce "label distilling", which extracts the frequent labelsets to replace the original labels. In the training stage, the data matrix is decomposed as the sum of several low-rank matrices and a sparse residual via a randomized optimization, where each low-rank part defines a feature subspace mapped by a labelset. A manifold regularization is applied to map the labelset geometry to the geometry of the obtained subspaces. In the prediction stage, the group sparse representation of a new sample on the subspace ensemble is estimated by group lasso. The selected subspaces indicate the labelsets that the sample should be annotated with. Experiments on several benchmark datasets of texts, images, web data and videos validate the appealing performance of LASE in multi-label annotation.
Liu, X., Song, M., Zhang, L., Tao, D., Bu, J. & Chen, C. 2012, 'Pedestrian detection using a mixture mask model', 2012 9th IEEE International Conference on Networking, Sensing and Control (ICNSC), IEEE International Conference on Networking, Sensing and Control (ICNSC), IEEE, Beijing, China, pp. 271-276.
Pedestrian detection is one of the fundamental tasks of an intelligent transportation system. Differences in illumination, posture and point of view make pedestrian detection very challenging. In this paper, we focus on the main defect in the existing methods: the interference of the non-person area. Firstly, we use mapping vectors to map the original feature matrix to the different mask spaces; then, using a part-based structure, we implicitly formulate the model into a multiple-instance problem; and finally we use a MIL-SVM to solve the problem. Based on the model, we design a system which can find pedestrians in pictures. We give a detailed description of the model and the system in this paper. The experimental results on public data sets show that our method decreases the miss rate greatly.
Shi, M., Sun, X., Tao, D. & Xu, C. 2012, 'Exploiting visual word co-occurrence for image retrieval', Proceedings of the 20th ACM international Conference on Multimedia, ACM international Conference on Multimedia, ACM, Nara, Japan, pp. 69-78.
Bag-of-visual-words (BOVW) based image representation has received intense attention in recent years and has improved content based image retrieval (CBIR) significantly. BOVW does not consider the spatial correlation between visual words in natural images and thus biases the generated visual words towards noise when the corresponding visual features are not stable. In this paper, we construct a visual word co-occurrence table by exploring visual word co-occurrence extracted from small affine-invariant regions in a large collection of natural images. Based on this visual word co-occurrence table, we first present a novel high-order predictor to accelerate the generation of neighboring visual words. A co-occurrence matrix is introduced to refine the similarity measure for image ranking. Like the inverse document frequency (idf), it down-weights the contribution of the words that are less discriminative because of frequent co-occurrence. We conduct experiments on Oxford and Paris Building datasets, in which the ImageNet dataset is used to implement a large scale evaluation. Thorough experimental results suggest that our method outperforms the state-of-the-art, especially when the vocabulary size is comparatively small. In addition, our method is not much more costly than the BOVW model.
Wang, S., Zhao, Q., Song, M., Bu, J., Chen, C. & Tao, D. 2012, 'Learning Visual Saliency based on Object's Relative Relationship', The 19th International Conference on Neural Information Processing, Springer, Doha, Qatar, pp. 318-327.
As a challenging issue in both computer vision and psychological research, visual attention has aroused a wide range of discussions and studies in recent years. However, conventional computational models mainly focus on low-level information, while high-level information and their interrelationship are ignored. In this paper, we stress the issue of the relative relationship between high-level information, and a saliency model based on low-level and high-level analysis is also proposed. Firstly, more than 50 categories of objects are selected from nearly 800 images in the MIT data set [1], and a concrete quantitative relationship is learned based on detailed analysis and computation. Secondly, using the least squares regression with constraints method, we demonstrate an optimal saliency model to produce saliency maps. Experimental results indicate that our model outperforms several state-of-the-art methods and produces a better match to human eye-tracking data.
Chen, D., Cheng, J. & Tao, D. 2012, 'Clustering-based Discriminative Locality Alignment for Face Gender Recognition', 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, Vilamoura, Portugal, pp. 4156-4161.
To facilitate human-robot interactions, human gender information is very important. Motivated by the success of manifold learning for visual recognition, we present a novel clustering-based discriminative locality alignment (CDLA) algorithm to discover the low-dimensional intrinsic submanifold from the embedding high-dimensional ambient space for improving the face gender recognition performance. In particular, CDLA exploits the global geometry through k-means clustering, extracts the discriminative information through margin maximization and explores the local geometry through intra-cluster sample concentration. These three properties uniquely characterize CDLA for face gender recognition. The experimental results obtained from the FERET data sets suggest the superiority of the proposed method in terms of recognition speed and accuracy in comparison with several representative methods.
Zhou, T. & Tao, D. 2012, '1-bit Hamming compressed sensing', IEEE International Symposium on Information Theory - Proceedings, IEEE International Symposium on Information Theory, IEEE, Cambridge, USA, pp. 1862-1866.
Compressed sensing (CS) and 1-bit CS cannot directly recover quantized signals preferred in digital systems and require time-consuming recovery. In this paper, we introduce 1-bit Hamming compressed sensing (HCS) that directly recovers a k-bit quantized signal of dimension n from its 1-bit measurements via invoking n times of Kullback-Leibler divergence based nearest neighbor search. Compared to CS and 1-bit CS, 1-bit HCS allows the signal to be dense, takes considerably less (linear and non-iterative) recovery time and requires substantially fewer measurements. Moreover, 1-bit HCS can accelerate 1-bit CS recovery. We study a quantized recovery error bound of 1-bit HCS for general signals. Extensive numerical simulations verify the appealing accuracy, robustness, efficiency and consistency of 1-bit HCS.
Zhou, T. & Tao, D. 2012, 'Bilateral random projections', 2012 IEEE International Symposium on Information Theory Proceedings (ISIT), IEEE International Symposium on Information Theory Proceedings (ISIT), IEEE, Cambridge, USA, pp. 1286-1290.
Low-rank structures have been profoundly studied in data mining and machine learning. In this paper, we show that a low-rank approximation of a dense matrix $X$ can be rapidly built from its left and right random projections $Y_1 = XA_1$ and $Y_2 = X^T A_2$, or bilateral random projection (BRP). We then show that a power scheme can further improve the precision. The deterministic, average and deviation bounds of the proposed method and its power scheme modification are proved theoretically. The effectiveness and the efficiency of BRP based low-rank approximation are empirically verified on both artificial and real datasets.
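A minimal sketch of how the two projections might be combined: with $Y_1 = XA_1$ and $Y_2 = X^T A_2$, the product $Y_1 (A_2^T Y_1)^{-1} Y_2^T$ reproduces $X$ exactly whenever $X$ has rank exactly $r$ and $A_2^T Y_1$ is invertible. The abstract does not state the combining formula, so this closed form, the Gaussian projection matrices and the function name `brp_low_rank` are assumptions here; the power scheme is not shown.

```python
import numpy as np

def brp_low_rank(X, rank, seed=0):
    """Rank-r approximation of X from left and right random projections."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    A1 = rng.normal(size=(n, rank))   # right projection matrix (assumed Gaussian)
    A2 = rng.normal(size=(m, rank))   # left projection matrix (assumed Gaussian)
    Y1 = X @ A1                       # m x r
    Y2 = X.T @ A2                     # n x r
    # Y1 (A2^T Y1)^{-1} Y2^T, computed with a linear solve instead of an explicit inverse.
    return Y1 @ np.linalg.solve(A2.T @ Y1, Y2.T)

# An exactly rank-5 matrix should be recovered up to floating-point error.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5)) @ rng.normal(size=(5, 50))
L = brp_low_rank(X, rank=5)
rel_err = np.linalg.norm(X - L) / np.linalg.norm(X)
```

Only two matrix products with $X$ and one small $r \times r$ solve are needed, which is why this kind of construction is cheap compared with a full SVD.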
He, L., Tao, D., Li, X. & Gao, X. 2012, 'Sparse representation for blind image quality assessment', 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Providence, USA, pp. 1146-1153.
Blind image quality assessment (BIQA) is an important yet difficult task in image processing related applications. Existing algorithms for universal BIQA learn a mapping from features of an image to the corresponding subjective quality or divide the image into different distortions before mapping. Although these algorithms are promising, they face the following problems: (1) they require a large number of samples (pairs of distorted image and its subjective quality) to train a robust mapping; (2) they are sensitive to different datasets; and (3) they have to be retrained when new training samples are available. In this paper, we introduce a simple yet effective algorithm based upon the sparse representation of natural scene statistics (NSS) feature. It consists of three key steps: extracting NSS features in the wavelet domain, representing features via sparse coding, and weighting differential mean opinion scores by the sparse coding coefficients to obtain the final visual quality values. Thorough experiments on standard databases show that the proposed algorithm outperforms representative BIQA algorithms and some full-reference metrics.
Zhang, K., Gao, X., Tao, D. & Li, X. 2012, 'Multi-scale dictionary for single image super-resolution', 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Providence, USA, pp. 1114-1121.
Reconstruction- and example-based super-resolution (SR) methods are promising for restoring a high-resolution (HR) image from low-resolution (LR) image(s). Under large magnification, reconstruction-based methods usually fail to hallucinate visual details while example-based methods sometimes introduce unexpected details. Given a generic LR image, to reconstruct a photo-realistic SR image and to suppress artifacts in the reconstructed SR image, we introduce a multi-scale dictionary to a novel SR method that simultaneously integrates local and non-local priors. The local prior suppresses artifacts by using steering kernel regression to predict the target pixel from a small local area. The non-local prior enriches visual details by taking a weighted average of a large neighborhood as an estimate of the target pixel. Essentially, these two priors are complementary to each other. Experimental results demonstrate that the proposed method can produce high quality SR recovery both quantitatively and perceptually.
Gao, Y., Wang, M., Luan, H., Shen, J., Yan, S. & Tao, D. 2011, 'Tag-Based Social Image Search with Visual-Text Joint Hypergraph Learning', Proceedings of the 2011 ACM Multimedia Conference & Co-Located Workshops, ACM Multimedia, Association for Computing Machinery, Inc. (ACM)., Scottsdale, Arizona, USA, pp. 1517-1520.
Tag-based social image search has attracted great interest and how to order the search results based on relevance level is a research problem. Visual content of images and tags have both been investigated. However, existing methods usually employ tags and visual content separately or sequentially to learn the image relevance. This paper proposes a tag-based image search with visual-text joint hypergraph learning. We simultaneously investigate the bag-of-words and bag-of-visual-words representations of images and accomplish the relevance estimation with a hypergraph learning approach. Each textual or visual word generates a hyperedge in the constructed hypergraph. We conduct experiments with a real-world data set and experimental results demonstrate the effectiveness of our approach.
Li, J. & Tao, D. 2011, 'A Probabilistic Model for Discovering High Level Brain Activities from fMRI', Lecture Notes in Computer Science, International Conference on Neural Information Processing, Springer-Verlag, Shanghai, China, pp. 329-336.
Functional magnetic resonance imaging (fMRI) has provided an invaluable method of investigating real-time neuron activities. Statistical tools have been developed to recognise the mental state from a batch of fMRI observations over a period. However, an interesting question is whether it is possible to estimate the real-time mental states at each moment during the fMRI observation. In this paper, we address this problem by building a probabilistic model of the brain activity. We model the tempo-spatial relations among the hidden high-level mental states and observable low-level neuron activities. We verify our model by experiments on practical fMRI data. The model also implies interesting clues on the task-responsible regions in the brain.
Li, J., Bian, W., Tao, D. & Zhang, C. 2011, 'Learning Colours from Textures by Sparse Manifold Embedding', Lecture Notes in Artificial Intelligence. AI 2011: Advances in Artificial Intelligence. 24th Australasian Joint Conference, AI 2011: Advances in Artificial Intelligence. 24th Australasian Joint Conference, Springer-Verlag Berlin / Heidelberg, Perth, Australia, pp. 600-608.
The capability of inferring colours from the texture (grayscale contents) of an image is useful in many application areas where the imaging device or environment is limited. Traditional colour assignment involves intensive human effort. Automatic methods have been proposed to establish relations between image textures and the corresponding colours. Existing research mainly focuses on linear relations. In this paper, we employ sparse constraints in the model of the texture-colour relationship. The technique is developed on a locally linear model, which makes a manifold assumption about the distribution of the image data. Given the texture of an image patch, the learned model transfers colours to the texture patch by combining known colours of similar texture patches. The sparse constraint checks the contributing factors in the model and helps improve the stability of the colour transfer. Experiments show that our method gives superior results to those of previous work.
Li, J. & Tao, D. 2011, 'Wisdom of Crowds: Single Image Super-resolution from the Web', Workshop on Large Scale Visual Analytics with the IEEE International Conference on Data Mining, IEEE Computer Society, Vancouver, Canada, pp. 812-816.
This paper addresses the problem of learning-based single image super-resolution. Previous research on this problem relies on a human user to provide a set of images that are similar to the target image as a reference. The super-resolution algorithm can then learn from the provided reference images to predict the high-resolution details for the target image. We propose a fully automatic scheme, which leverages knowledge of the entire visual world by querying relevant references from the Internet. The proposed scheme is free of human supervision, and the performance compromise is small. We conduct experiments to show the effectiveness of the method.
Luo, Y., Tao, D., Geng, B., Xu, C. & Maybank, S. 2011, 'Shared Feature Extraction for Semi-supervised Image Classification', Proceedings of ACM Multimedia 2011 and the co-located Workshops, Association for Computing Machinery, Inc. (ACM), Scottsdale, AZ, USA, pp. 1165-1168.
Multi-task learning (MTL) plays an important role in image analysis applications, e.g. image classification, face recognition and image annotation. That is because MTL can estimate the latent shared subspace to represent the common features given a set of images from different tasks. However, the geometry of the data probability distribution is always supported on an intrinsic image sub-manifold that is embedded in a high dimensional Euclidean space. Therefore, it is improper to directly apply MTL to multiclass image classification. In this paper, we propose a manifold regularized MTL (MRMTL) algorithm to discover the latent shared subspace by treating the high-dimensional image space as a submanifold embedded in an ambient space. We conduct experiments on the PASCAL VOC'07 dataset with 20 classes and the MIR dataset with 38 classes by comparing MRMTL with conventional MTL and several representative image classification algorithms. The results suggest that MRMTL can properly extract the common features for image representation and thus improve the generalization performance of the image classification models.
Wang, S., Song, M., Tao, D., Zhang, L., Bu, J. & Chen, C. 2011, 'Opponent and Feedback: Visual Attention Captured', Lecture Notes in Computer Science. Neural Information Processing. 18th International Conference, ICONIP 2011, International Conference on Neural Information Processing, Springer-Verlag Berlin / Heidelberg, Shanghai, China, pp. 667-675.
Visual attention has been an important issue in computer vision for decades, and many approaches, mainly based on bottom-up or top-down computing models, have been put forward to address it. In this paper, we propose a new and effective saliency model which considers the inner opponent relationship of the image information. Inspired by the opponent and feedback mechanism in human perceptive learning, we first propose several opponent models based on an analysis of the original color image information. Second, as both positive and negative feedbacks can be learned from the opponent models, we construct the saliency map according to the optimal combination of these feedbacks by using constrained least squares regression. Experimental results indicate that our model achieves better performance in both simple and complex natural scenes.
Zheng, S., Xie, B., Huang, K. & Tao, D. 2011, 'Multi-view Pedestrian Recognition Using Shared Dictionary Learning with Group Sparsity', Lecture Notes in Computer Science. Neural Information Processing. 18th International Conference, ICONIP 2011, International Conference on Neural Information Processing, Springer-Verlag Berlin / Heidelberg, Shanghai, China, pp. 629-638.
Pedestrian tracking across multiple cameras is an important task in intelligent visual surveillance systems, but it suffers from large appearance variations of the same person under different cameras. Inspired by the success of existing view transformation models in multi-view gait recognition, we present a novel view transformation model based approach, named shared dictionary learning with group sparsity, to address the problem. It projects the pedestrian appearance feature descriptor in the probe view into the gallery view before feature descriptor matching. In this case, L1,∞ regularization over the latent embedding ensures lower reconstruction error and more stable feature descriptor generation compared with the existing Singular Value Decomposition. Although the overall optimization function is not globally convex, Nesterov's optimal gradient scheme ensures efficiency and reliability. Experiments on the VIPeR dataset show that our approach reaches state-of-the-art performance.
Mu, Y., Ding, W., Tao, D. & Stepinski, T.T. 2011, 'Biologically Inspired Model for Crater Detection', International Joint Conference on Neural Networks, IEEE International Joint Conference on Neural Networks, IEEE, San Jose, pp. 2487-2495.
Crater detection from panchromatic images has unique challenges compared to traditional object detection tasks. Craters are numerous, have a large range of sizes and textures, and continuously merge into image backgrounds. Traditional feature construction methods cannot adequately capture the diversified characteristics of craters. On the other hand, we are gradually revealing the secret of object recognition in the primate's visual cortex. Biologically inspired features, designed to mimic the human cortex, have achieved great performance on object detection problems. Therefore, it is time to reconsider crater detection by using biologically inspired features. In this paper, we represent crater images by utilizing the C1 units, which correspond to complex cells in the visual cortex, and pool over the S1 units by using a maximum operation to retain only the maximum response of each local area of the S1 units. The features generated from the C1 units have the hallmarks of size invariance and location invariance. We further extract a set of improved Haar features on each C1 map which contain gradient texture information. We apply this biologically inspired Haar feature to crater detection. Because the feature construction process requires a set of biologically inspired transformations, these features are embedded in a high-dimensional space. We apply a subspace learning algorithm to find the intrinsic discriminative subspace for accurate classification. Experiments on a Mars impact crater dataset show the superiority of the proposed method.
Zhang, L., Bian, W., Song, M., Tao, D. & Liu, X. 2011, 'Integrating Local Features into Discriminative Graphlets for Scene Classification', Lecture Notes in Computer Science. Neural Information Processing. 18th International Conference, ICONIP 2011, International Conference, ICONIP, Springer-Verlag Berlin / Heidelberg, Shanghai, China, pp. 657-666.
Scene classification plays an important role in multimedia information retrieval. Since local features are robust to image transformation, they have been used extensively for scene classification. However, it is difficult to encode the spatial relations of local features in the classification process. To solve this problem, Geometric Local Features Integration (GLFI) is proposed. By segmenting a scene image into a set of regions, a so-called Region Adjacency Graph (RAG) is constructed to model their spatial relations. To measure the similarity of two RAGs, we select a few discriminative templates and then use them to extract the corresponding discriminative graphlets (connected subgraphs of an RAG). These discriminative graphlets are further integrated by a boosting strategy for scene classification. Experiments on five datasets validate the effectiveness of our GLFI.
Zhou, T. & Tao, D. 2011, 'GoDec: Randomized Low-rank & Sparse Matrix Decomposition in Noisy Case', Proceedings of the 28th International Conference on Machine Learning, International Conference on Machine Learning, Omnipress, Bellevue, Washington, USA, pp. 33-40.
Low-rank and sparse structures have been extensively studied in matrix completion and compressed sensing. In this paper, we develop 'Go Decomposition' (GoDec) to efficiently and robustly estimate the low-rank part L and the sparse part S of a matrix X = L + S + G with noise G. GoDec alternately assigns the low-rank approximation of X - S to L and the sparse approximation of X - L to S. The algorithm can be significantly accelerated by bilateral random projections (BRP). We also propose GoDec for matrix completion as an important variant. We prove that the objective value $\|X - L - S\|_F^2$ converges to a local minimum, while L and S linearly converge to local optima. Theoretically, we analyze the influence of L, S and G on the asymptotic/convergence speeds in order to discover the robustness of GoDec. Empirical studies suggest the efficiency, robustness and effectiveness of GoDec compared with representative matrix decomposition and completion tools, e.g., Robust PCA and OptSpace.
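The alternating scheme in this abstract can be illustrated with a minimal sketch. It assumes a plain truncated SVD for the low-rank step (the BRP acceleration is not included) and hard thresholding to the largest-magnitude entries for the sparse step; the function name `godec_naive`, its parameters and the synthetic data are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def godec_naive(X, rank, card, n_iter=50):
    """Alternate a rank-`rank` fit of X - S with a `card`-sparse fit of X - L.

    Illustrative only: uses a full SVD instead of bilateral random projections.
    """
    X = np.asarray(X, dtype=float)
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    errors = []
    for _ in range(n_iter):
        # Low-rank step: best rank-`rank` approximation of X - S (truncated SVD).
        U, sigma, Vt = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :rank] * sigma[:rank]) @ Vt[:rank]
        # Sparse step: keep only the `card` largest-magnitude entries of X - L.
        R = X - L
        S = np.zeros_like(R)
        if card > 0:
            idx = np.argpartition(np.abs(R), -card, axis=None)[-card:]
            S.flat[idx] = R.flat[idx]
        # Squared Frobenius norm of the residual, i.e. the objective value.
        errors.append(np.linalg.norm(X - L - S, "fro") ** 2)
    return L, S, errors


# Synthetic example: rank-3 matrix plus 60 large spikes plus small dense noise.
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))
sparse = np.zeros((40, 30))
spikes = rng.choice(sparse.size, size=60, replace=False)
sparse.flat[spikes] = 10.0 * rng.standard_normal(60)
X = low_rank + sparse + 0.01 * rng.standard_normal((40, 30))

L, S, errors = godec_naive(X, rank=3, card=60)
print("first and last objective values:", errors[0], errors[-1])
```

Each step is an exact minimizer over its own constraint set (rank at most `rank`, at most `card` nonzeros), so the recorded objective values cannot increase from one iteration to the next in this sketch.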
Zhuo, Z., Bu, J., Tao, D., Zhang, L., Song, M. & Chen, C. 2011, 'Describing Human Identity Using Attributes', Lecture Notes In Computer Science, Neural Information Processing, 18th International Conference, ICONIP 2011, Proceedings, Part II, International Conference on Neural Information Processing, Springer-Verlag, Shanghai, China, pp. 86-94.
Smart surveillance of wide areas requires a system of multiple cameras to keep tracking people by their identities. In such multiview systems, the captured body figures and appearances of people, their orientation and the backgrounds usually differ from camera to camera, which makes view-invariant representation of people for correct identification challenging. To tackle this problem, we introduce an attribute-based description of human identity in this paper. First, two groups of attributes, responsible for figure and appearance respectively, are obtained. Second, Predict-Taken and Predict-Not-Taken schemes are defined to overcome the attribute-loss problem caused by the different views of multiple cameras, and the attribute representation of a person is obtained accordingly. Third, human identification based on a voter-candidate scheme is carried out, taking into account people outside the training data. Experimental results show that our method is robust to view changes, attribute loss and different backgrounds.
Cheng, J., Tao, D., Liu, J., Wong, D.W., Lee, B.H., Baskaran, M., Wong, T.Y. & Aung, T. 2011, 'Focal Biologically Inspired Feature for Glaucoma Type Classification', Lecture Notes in Computer Science. 14th International Conference. Medical Image Computing and Computer-Assisted Intervention MICCAI 2011, Medical Image Computing and Computer-Assisted Intervention - MICCAI, Springer-Verlag Berlin / Heidelberg, Toronto, Canada, pp. 91-98.
Glaucoma is an optic nerve disease resulting in loss of vision. There are two common types of glaucoma: open angle glaucoma and angle closure glaucoma. Glaucoma type classification is important in glaucoma diagnosis. Ophthalmologists examine the iridocorneal angle between iris and cornea to determine the glaucoma type. However, manual classification/grading of the iridocorneal angle images is subjective and time consuming. To reduce workload and facilitate large-scale clinical use, it is essential to determine glaucoma type automatically. In this paper, we propose to use a focal biologically inspired feature for the classification. The iris surface is located to determine the focal region. The association between the focal biologically inspired feature and angle grades is built. The experimental results show that the proposed method can correctly classify 85.2% of the images from open angle glaucoma and 84.3% of the images from angle closure glaucoma. The accuracy could be improved to close to 90% with more images included in the training. The results show that the focal biologically inspired feature is effective for automatic glaucoma type classification. It can be used to reduce the workload of ophthalmologists and the cost of diagnosis.
Li, Y., Geng, B., Zha, Z., Tao, D., Yang, L. & Xu, C. 2011, 'Difficulty Guided Image Retrieval using Linear Multiview Embedding', Proceedings of the 2011 ACM Multimedia Conference, ACM, Scottsdale, Arizona, USA, pp. 1169-1172.
Existing image retrieval systems suffer from a radical performance variance for different queries. The bad initial search results for "difficult" queries may greatly degrade the performance of their subsequent refinements, especially the refinement that utilizes the information mined from the search results, e.g., pseudo relevance feedback based reranking. In this paper, we tackle this problem by proposing a query difficulty guided image retrieval system, which selectively performs reranking according to the estimated query difficulty. To improve the performance of both reranking and difficulty estimation, we apply multiview embedding (ME) to images represented by multiple different features for integrating a joint subspace by preserving the neighborhood information in each feature space. However, existing ME approaches suffer from both out of sample and huge computational cost problems, and cannot be applied to online reranking or offline large-scale data processing for practical image retrieval systems. Therefore, we propose a linear multiview embedding algorithm which learns a linear transformation from a small set of data and can effectively infer the subspace features of new data. Empirical evaluations on both Oxford and 500K ImageNet datasets suggest the effectiveness of the proposed difficulty guided retrieval system with LME.
Mu, Y., Ding, W., Morabito, M. & Tao, D. 2011, 'Empirical Discriminative Tensor Analysis for Crime Forecasting', Proceedings Knowledge Science, Engineering and Management 5th International Conference, KSEM 2011, International Conference on Knowledge Science, Engineering and Management, Springer, Irvine, USA, pp. 293-304.
Police agencies have been collecting an increasing amount of information to better understand patterns in criminal activity. Recently there has been a new trend of using the collected data to predict where and when crime will occur. Crime prediction is greatly beneficial because, if it is done accurately, police practitioners would be able to allocate resources to the geographic areas most at risk of criminal activity and ultimately make communities safer. In this paper, we discuss a new fourth-order tensor representation for crime data. The tensor encodes the longitude, latitude, time, and other relevant incidents. Using the tensor data structure, we propose the Empirical Discriminative Tensor Analysis (EDTA) algorithm to obtain sufficient discriminative information while simultaneously minimizing empirical risk. We examine the algorithm on crime data collected in one Northeastern city. EDTA demonstrates promising results compared to other existing methods in real-world scenarios.
Zhang, C. & Tao, D. 2011, 'Generalization Bound for Infinitely Divisible Empirical Process', Journal of Machine Learning Research Workshop and Conference Proceedings, Fourteenth International Conference on Artificial Intelligence and Statistics, MIT Press, Ft. Lauderdale, FL, USA, pp. 864-872.
In this paper, we study the generalization bound for an empirical process of samples independently drawn from an infinitely divisible (ID) distribution, which is termed the ID empirical process. In particular, based on a martingale method, we develop deviation inequalities for the sequence of random variables of an ID distribution. By applying the obtained deviation inequalities, we then show the generalization bound for the ID empirical process based on the annealed Vapnik-Chervonenkis (VC) entropy. Afterward, according to Sauer's lemma, we get the generalization bound for the ID empirical process based on the VC dimension. Finally, by using the resulting bound, we analyze the asymptotic convergence of the ID empirical process and show that its convergence rate is faster than the results for the generic i.i.d. empirical process (Vapnik, 1999).
Li, Y., Geng, B., Zha, Z., Li, Y., Tao, D. & Xu, C. 2011, 'Query Expansion by Spatial Co-occurrence for Image Retrieval', Proceedings of the 2011 ACM Multimedia Conference & Co-Located Workshops, ACM Multimedia, Association for Computing Machinery, Inc. (ACM), Scottsdale, Arizona, USA, pp. 1177-1180.
The well-known bag-of-features (BoF) model is widely utilized for large-scale image retrieval. However, the BoF model lacks the spatial information of visual words, which is informative for local features to build up meaningful visual patches. To compensate for the spatial information loss, in this paper we propose a novel query expansion method called Spatial Co-occurrence Query Expansion (SCQE), which utilizes the spatial co-occurrence information of visual words mined from the database images to boost the retrieval performance. In the offline phase, for each visual word in the vocabulary, we treat the visual words that frequently co-occur with it in the database images as neighbors, based on which a spatial co-occurrence graph is built. In the online phase, a query image can be expanded with some spatially co-occurring but unseen visual words according to the spatial co-occurrence graph, and the retrieval performance can be improved by expanding these visual words appropriately. Experimental results demonstrate that SCQE achieves promising improvements over the typical BoF baseline on two datasets comprising 5K and 505K images respectively.
Bian, W. & Tao, D. 2011, 'Learning a Distance Metric by Empirical Loss Minimization', Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, International Joint Conference on Artificial Intelligence, AAAI Press/International Joint Conferences on Artificial Intelligence, Barcelona, Catalonia, Spain, pp. 1186-1191.
In this paper, we study the problem of learning a metric and propose a loss function based metric learning framework, in which the metric is estimated by minimizing an empirical risk over a training set. With mild conditions on the instance distribution and the loss function used, we prove that the empirical risk converges to its expected counterpart at a root-n rate. In addition, with the assumption that the best metric that minimizes the expected risk is bounded, we prove that the learned metric is consistent. Two example algorithms are presented by using the proposed loss function based metric learning framework, which use a log loss function and a smoothed hinge loss function, respectively. Experimental results suggest the effectiveness of the proposed algorithms.
Zhang, L., Song, M., Bian, W., Tao, D., Liu, X., Bu, J. & Chen, C. 2011, 'Feature Relationships Hypergraph for Multimodal Recognition', Lecture Notes in Computer Science. Neural Information Processing. 18th International Conference, ICONIP 2011, International Conference on Neural Information Processing, Springer-Verlag Berlin / Heidelberg, Shanghai, China, pp. 589-598.
Utilizing multimodal features to describe multimedia data is a natural way to achieve accurate pattern recognition. However, dealing with the complex relationships among the large number of multimodal features and with the curse of dimensionality remain two crucial challenges. To solve these two problems, a new multimodal feature integration method is proposed. First, a so-called Feature Relationships Hypergraph (FRH) is proposed to model the high-order correlations among the multimodal features. Then, based on FRH, the multimodal features are clustered into a set of low-dimensional partitions. Two types of matrices, the inter-partition matrix and the intra-partition matrix, are computed to quantify the inter- and intra-partition relationships. Finally, a multi-class boosting strategy is developed to obtain a strong classifier by combining the weak classifiers learned from the intra-partition matrices. The experimental results on different datasets validate the effectiveness of our approach.
Tao, D., Li, Z., Li, J., Katsaggelos, A., Bian, W., Chen, Y., Fan, J., Hu, Y., Izquierdo, E., Ji, S., Jiang, X., Kwok, J., Li, Q., Liu, J., Loog, M., Lu, H., Lu, Y.L., Maybank, S.J., Pau, D., Ro, Y.M., Shan, C., Shao, L., Smeraldi, F., Song, Y., Wang, F., Xu, Y., Yang, L., Ye, J., Yu, J., Zhang, D., Zhang, J., Zhao, X., Huang, K., Ying, Y. & Zhou, C. 2011, 'Preface', Proceedings - IEEE International Conference on Data Mining, ICDM, pp. xliii-xliv.
Li, J. & Tao, D. 2010, 'Boosted Dynamic Cognitive Activity Recognition from Brain Images', Proceedings - The 9th International Conference on Machine Learning and Applications, ICMLA 2010, International Conference on Machine Learning and Applications, IEEE, Washington, D.C., USA, pp. 361-366.
Functional Magnetic Resonance Imaging (fMRI) has become an important diagnostic tool for measuring brain haemodynamics. Previous research on analysing fMRI data mainly focuses on detecting low-level neuron activation from the ensuing haemodynamic activities. An important recent advance is to show that the high-level cognitive status is recognisable from a period of fMRI records. Nevertheless, it would also be helpful to reveal the dynamics of cognitive activities during the period. In this paper, we tackle the problem of discovering the dynamic cognitive activities by proposing a boosted structure learning algorithm. We employ a statistical model of random fields to represent the dynamics of the brain. To exploit the rich fMRI observations with reasonable model complexity, we build multiple models, where each model links the cognitive activities to only a fraction of the fMRI observations. We combine the simple models by using an altered AdaBoost scheme for multi-class structure learning and give a theoretical justification of the proposed scheme. Empirical tests show that the method effectively links the physiological and the psychological activities of the brain.
Zhou, T., Tao, D. & Wu, X. 2010, 'NESVM: a Fast Gradient Method for Support Vector Machines', IEEE International Conference on Data Mining, IEEE International Conference on Data Mining, IEEE, Sydney, Australia, pp. 679-688.
Support vector machines (SVMs) are invaluable tools for many practical applications in artificial intelligence, e.g., classification and event recognition. However, popular SVM solvers are not sufficiently efficient for applications with a large number of samples as well as a large number of features. In this paper, we therefore present NESVM, a fast gradient SVM solver that can optimize various SVM models, e.g., classical SVM, linear programming SVM and least square SVM. Compared against SVM-Perf (whose convergence rate in solving the dual SVM is upper bounded by $\mathcal O(1/\sqrt{k})$ where $k$ is the number of iterations) and Pegasos (online SVM that converges at rate $\mathcal O(1/k)$ for the primal SVM), NESVM achieves the optimal convergence rate at $\mathcal O(1/k^{2})$ and a linear time complexity. In particular, NESVM smooths the non-differentiable hinge loss and $\ell_1$-norm in the primal SVM. Then the optimal gradient method without any line search is adopted to solve the optimization. In each iteration round, the current gradient and historical gradients are combined to determine the descent direction, while the Lipschitz constant determines the step size. Only two matrix-vector multiplications are required in each iteration round. Therefore, NESVM is more efficient than existing SVM solvers. In addition, NESVM is available for both linear and nonlinear kernels. We also propose "homotopy NESVM" to accelerate NESVM by dynamically decreasing the smooth parameter and using the continuation method. Our experiments on census income categorization, indoor/outdoor scene classification, event recognition and scene recognition suggest the efficiency and the effectiveness of NESVM. The MATLAB code of NESVM will be available on our website for further assessment.
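The smoothing step mentioned in this abstract can be illustrated with one standard construction: Nesterov smoothing of the hinge loss $\max(0, 1 - z)$ with a quadratic proximal term and smooth parameter $\mu$. This is a sketch under that assumption and may differ in detail from the formulation used in the paper; the function names are hypothetical.

```python
def hinge(z):
    """Plain hinge loss for a margin value z = y * f(x)."""
    return max(0.0, 1.0 - z)


def smoothed_hinge(z, mu):
    """Smoothed hinge: max over u in [0, 1] of u * (1 - z) - (mu / 2) * u ** 2."""
    t = 1.0 - z
    if t <= 0.0:
        return 0.0               # margin satisfied, no loss
    if t >= mu:
        return t - mu / 2.0      # linear region, shifted down by mu / 2
    return t * t / (2.0 * mu)    # quadratic region joining the two pieces


def smoothed_hinge_grad(z, mu):
    """Derivative with respect to z; it is Lipschitz continuous with constant 1 / mu."""
    u = min(1.0, max(0.0, (1.0 - z) / mu))  # the maximizing u, clipped to [0, 1]
    return -u


mu = 0.5
for z in (-1.0, 0.6, 0.9, 1.5):
    print(z, hinge(z), smoothed_hinge(z, mu), smoothed_hinge_grad(z, mu))
```

The smoothed loss never exceeds the hinge loss and lies at most `mu / 2` below it, so a smaller `mu` gives a closer approximation at the cost of a larger gradient Lipschitz constant `1 / mu`, which is the trade-off a continuation scheme that gradually decreases the smooth parameter works with.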
Xie, B., Song, M., Mu, Y. & Tao, D. 2010, 'Random Projection Tree and Multiview Embedding for Large-scale Image Retrieval', The 17th International Conference on Neural Information Processing: Models and Applications (ICONIP 2010), International Conference on Neural Information Processing, Springer, Sydney, Australia, pp. 641-649.
Image retrieval on large-scale datasets is challenging. Current indexing schemes, such as k-d tree, suffer from the "curse of dimensionality". In addition, there is no principled approach to integrate various features that measure multiple views of images, such as color histogram and edge directional histogram. We propose a novel retrieval system that tackles these two problems simultaneously. First, we use random projection trees to index data whose complexity only depends on the low intrinsic dimension of a dataset. Second, we apply a probabilistic multiview embedding algorithm to unify different features. Experiments on MSRA large-scale dataset demonstrate the efficiency and effectiveness of the proposed approach.
Li, J. & Tao, D. 2010, 'An Exponential Family Extension to Principal Component Analysis', International Conference on Neural Information Processing 2011, International Conference on Neural Information Processing, Springer, Sydney, Australia, pp. 1-9.
In this paper, we present a unified probabilistic model for constrained factorisation models, which employs exponential family distributions to represent the constrained factors. Our main objective is to provide a versatile framework, on which prototype models with various constraints can be implemented effortlessly. For learning the proposed stochastic model, Gibbs sampling is employed for model inference. We also demonstrate the utility and versatility of the model by experiments.
Zhou, T. & Tao, D. 2010, 'Backward-Forward Least Angle Shrinkage for Sparse Quadratic Optimization', Proceedings, Part I of the 17th International Conference on Neural Information Processing: Theory and Algorithms (ICONIP 2010), International Conference on Neural Information Processing, Springer, Sydney, Australia, pp. 388-396.
In the compressed sensing and statistics communities, dozens of algorithms have been developed to solve $\ell_1$ penalized least square regression, but constrained sparse quadratic optimization (SQO) is still an open problem. In this paper, we propose backward-forward least angle shrinkage (BF-LAS), which provides a scheme to solve general SQO including sparse eigenvalue minimization. BF-LAS starts from the dense solution, iteratively shrinks unimportant variables' magnitudes to zero in the backward step for minimizing the $\ell_1$ norm, decreases important variables' gradients in the forward step for optimizing the objective, and projects the solution onto the feasible set defined by the constraints. The importance of a variable is measured by its correlation w.r.t. the objective and is updated via least angle shrinkage (LAS). We show promising performance of BF-LAS on sparse dimension reduction.
Xie, B., Mu, Y. & Tao, D. 2010, 'm-SNE: Multiview Stochastic Neighbor Embedding', Lecture Notes in Computer Science - Vol 6443 - Proceedings of the 17th International Conference on Neural Information Processing, International Conference on Neural Information Processing, Springer, Sydney, Australia, pp. 338-346.
In many real world applications, different features (or multiview data) can be obtained and how to duly utilize them in dimension reduction is a challenge. Simply concatenating them into a long vector is not appropriate because each view has its specific statistical property and physical interpretation. In this paper, we propose a multiview stochastic neighbor embedding (m-SNE) that systematically integrates heterogeneous features into a unified representation for subsequent processing based on a probabilistic framework. Compared with conventional strategies, our approach can automatically learn a combination coefficient for each view adapted to its contribution to the data embedding. Also, our algorithm for learning the combination coefficient converges at a rate of $O(1/k^2)$, which is the optimal rate for smooth problems. Experiments on synthetic and real datasets suggest the effectiveness and robustness of m-SNE for data visualization and image retrieval.
Bian, W., Li, J. & Tao, D. 2010, 'Feature Extraction For FMRI-based Human Brain Activity Recognition', Machine Learning In Medical Imaging, International Workshop on Machine Learning in Medical Imaging, Springer-Verlag Berlin, Beijing, China, pp. 148-156.
Mitchell et al. [9] demonstrated that support vector machines (SVM) are effective to classify the cognitive state of a human subject based on fMRI images observed over a single time interval. However, the direct use of classifiers on active voxels veils
Huang, Y., Huang, K., Tan, T. & Tao, D. 2009, 'A Novel Visual Organization Based On Topological Perception', Computer Vision - ACCV 2009, Pt I, Asian Conference on Computer Vision, Springer-Verlag, Xian, China, pp. 180-189.
What are the primitives of visual perception? The early feature-analysis theory insists on it being a local-to-global process which has acted as the foundation of most computer vision applications for the past 30 years. The early holistic registration th
Gao, X., Liul, N., Lui, W., Tao, D. & Li, X. 2010, 'Spatio-temporal Salience Based Video Quality Assessment', IEEE International Conference On Systems, Man And Cybernetics (SMC 2010), IEEE International Conference on Systems, Man and Cybernetics, IEEE, Istanbul, Turkey, pp. 1501-1505.
It is important to design an effective and efficient objective metric of the video quality in video processing areas. The most reliable way is subjective evaluation, thus the most reasonable objective metric should adequately consider characteristics of
Li, X., He, L., Lu, W., Gao, X. & Tao, D. 2010, 'A Novel Image Quality Metric Based On Morphological Component Analysis', IEEE International Conference On Systems, Man And Cybernetics (SMC 2010), IEEE International Conference on Systems, Man and Cybernetics, IEEE, Istanbul, Turkey, pp. 1449-1454.
Due to that human eye has different perceptual characteristics for different morphological components, so a novel image quality metric is proposed by incorporating morphological component analysis (MCA) and human visual system (HVS), which is capable of
Yan, J., Tao, D., Tian, C., Gao, X. & Li, X. 2010, 'Chinese Text Detection And Location For Images In Multimedia Messaging Service', IEEE International Conference On Systems, Man And Cybernetics (SMC 2010), IEEE International Conference on Systems, Man and Cybernetics, IEEE, Istanbul, Turkey, pp. 3896-3901.
Text detection and recognition for images in multimedia messaging service is a very important task. Since Chinese characters are composed of four kinds of strokes, i.e., horizontal line, top-down vertical line, left-downward slope line and short pausing
Liu, W., Ma, S., Tao, D., Liu, J. & Liu, P. 2010, 'Semi-Supervised Sparse Metric Learning Using Alternating Linearization Optimization', Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data, ACM SIGKDD International Conference on Knowledge Discovery and Data, Association for Computing Machinery, Inc. (ACM), Washington, DC, USA, pp. 1139-1147.
In plenty of scenarios, data can be represented as vectors and then mathematically abstracted as points in a Euclidean space. Because a great number of machine learning and data mining applications need proximity measures over data, a simple and universal distance metric is desirable, and metric learning methods have been explored to produce sensible distance measures consistent with data relationships. However, most existing methods suffer from limited labeled data and expensive training. In this paper, we address these two issues through employing abundant unlabeled data and pursuing sparsity of metrics, resulting in a novel metric learning approach called semi-supervised sparse metric learning. Two important contributions of our approach are: 1) it propagates scarce prior affinities between data to the global scope and incorporates the full affinities into the metric learning; and 2) it uses an efficient alternating linearization method to directly optimize the sparse metric. Compared with conventional methods, ours can effectively take advantage of semi-supervision and automatically discover the sparse metric structure underlying input data patterns. We demonstrate the efficacy of the proposed approach with extensive experiments carried out on six datasets, obtaining clear performance gains over the state of the art.
Liu, W., Tian, X., Tao, D. & Liu, J. 2010, 'Constrained Metric Learning via Distance Gap Maximization', Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10), AAAI Conference on Artificial Intelligence, AAAI Press, Atlanta Georgia, pp. 518-524.
Vectored data frequently occur in a variety of fields, which are easy to handle since they can be mathematically abstracted as points residing in a Euclidean space. An appropriate distance metric in the data space is quite demanding for a great number of applications. In this paper, we pose robust and tractable metric learning under pairwise constraints that are expressed as similarity judgements between data pairs. The major features of our approach include: 1) it maximizes the gap between the average squared distance among dissimilar pairs and the average squared distance among similar pairs; 2) it is capable of propagating similar constraints to all data pairs; and 3) it is easy to implement in contrast to the existing approaches using expensive optimization such as semidefinite programming. Our constrained metric learning approach has widespread applicability without being limited to particular backgrounds. Quantitative experiments are performed for classification and retrieval tasks, uncovering the effectiveness of the proposed approach.
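The first feature listed in the abstract above, the gap between the average squared distance among dissimilar pairs and among similar pairs, can be illustrated with a minimal sketch. It only evaluates that gap for a fixed metric matrix; it does not reproduce the paper's optimization or constraint propagation, and the names `sq_dist` and `distance_gap` and the toy data are assumptions made for illustration.

```python
# Illustrative sketch only: evaluates the distance gap for a given metric
# matrix M. Function names and data are hypothetical, not from the paper.

def sq_dist(x, y, M):
    """Squared distance (x - y)^T M (x - y) under the metric matrix M."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return sum(d[i] * Md[i] for i in range(len(d)))

def distance_gap(points, similar, dissimilar, M):
    """Average squared distance over dissimilar pairs minus that over similar pairs."""
    avg_dis = sum(sq_dist(points[i], points[j], M) for i, j in dissimilar) / len(dissimilar)
    avg_sim = sum(sq_dist(points[i], points[j], M) for i, j in similar) / len(similar)
    return avg_dis - avg_sim

# Toy data: the two groups differ along the first axis only.
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
similar = [(0, 1), (2, 3)]      # pairs judged similar
dissimilar = [(0, 2), (1, 3)]   # pairs judged dissimilar

identity = [[1.0, 0.0], [0.0, 1.0]]         # plain Euclidean metric
first_axis_only = [[1.0, 0.0], [0.0, 0.0]]  # metric that ignores the second axis

gap_identity = distance_gap(points, similar, dissimilar, identity)
gap_axis = distance_gap(points, similar, dissimilar, first_axis_only)
print(gap_identity, gap_axis)
```

On this toy data the metric that ignores the uninformative second axis yields the larger gap, which is the kind of preference a gap-maximizing objective expresses.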
Wen, J., Gao, X., Li, X. & Tao, D. 2009, 'Incremental Learning Of Weighted Tensor Subspace For Visual Tracking', 2009 IEEE International Conference On Systems, Man And Cybernetics (SMC 2009), IEEE International Conference on Systems, Man and Cybernetics, IEEE, San Antonio, TX, pp. 3688-3693.
Tensor analysis has been widely utilized in image-related machine learning applications, which has preferable performance over the vector-based approaches for its capability of holding the spatial structure information in some research field. The traditi
Wang, B., Gao, X., Tao, D., Li, X. & Li, J. 2009, 'The Gabor-based Tensor Level Set Method For Multiregional Image Segmentation', Computer Analysis Of Images And Patterns, Proceedings, International Conference on Computer Analysis of Images and Patterns, Springer-Verlag Berlin, Munster, Germany, pp. 987-994.
This paper presents a new level set method for multiregional image segmentation. It employs the Gabor filter bank to extract local geometrical features and builds the pixel tensor representation whose dimensionality is reduced by using the offline tens
Bian, W., Cheng, J.L. & Tao, D. 2009, 'Biased Isomap Projections For Interactive Reranking', ICME: 2009 IEEE International Conference On Multimedia And Expo, Vols 1-3, IEEE International Conference on Multimedia and Expo, IEEE, New York, NY, pp. 1632-1635.
Image search has recently gained more and more attention for various applications. To capture users' intentions and to bridge the gap between the low level visual features and the high level semantics, a dozen interactive reranking (IR) or relevance f
Song, D. & Tao, D. 2009, 'Discriminative Geometry Preserving Projections', 2009 16th IEEE International Conference On Image Processing, Vols 1-6, IEEE International Conference on Image Processing, IEEE, Cairo, Egypt, pp. 2429-2432.
Dimension reduction algorithms have attracted a lot of attention in face recognition and human gait recognition because they can select a subset of effective and efficient discriminative features. In this paper, we apply the Discriminative Geometry Pres
Bian, W. & Tao, D. 2009, 'Dirichlet Mixture Allocation For Multiclass Document Collections Modeling', 2009 9th IEEE International Conference On Data Mining, IEEE International Conference on Data Mining, IEEE, Miami Beach, FL, pp. 711-715.
The topic model Latent Dirichlet Allocation (LDA) is an effective tool for statistical analysis of large collections of documents. In LDA, each document is modeled as a mixture of topics and the topic proportions are generated from the unimodal Dirichlet d
Su, Y., Tao, D., Li, X. & Gao, X. 2009, 'Texture Representation In AAM Using Gabor Wavelet And Local Binary Patterns', 2009 IEEE International Conference On Systems, Man And Cybernetics (SMC 2009), Vols 1-9, IEEE International Conference on Systems, Man and Cybernetics, IEEE, San Antonio, TX, pp. 3274-3279.
Active appearance model (AAM) has been widely used for modeling the shape and the texture of deformable objects and matching new ones effectively. The traditional AAM consists of two parts, shape model and texture model. In the texture model, for the sak
Zhou, T. & Tao, D. 2009, 'Manifold Elastic Net For Sparse Learning', 2009 IEEE International Conference On Systems, Man And Cybernetics (SMC 2009), Vols 1-9, IEEE International Conference on Systems, Man and Cybernetics, IEEE, San Antonio, TX, pp. 3699-3704.
In this paper, we present the manifold elastic net (MEN) for sparse variable selection. MEN combines merits of the manifold regularization and the elastic net regularization, so it considers both the nonlinear manifold structure of a dataset and the spar
Wang, Y., Gao, X., Li, X., Tao, D. & Wang, B. 2009, 'Embedded Geometric Active Contour With Shape Constraint For Mass Segmentation', Computer Analysis Of Images And Patterns, Proceedings, International Conference on Computer Analysis of Images and Patterns, Springer-Verlag Berlin, Munster, Germany, pp. 995-1002.
Mass boundary segmentation plays an important role in computer aided diagnosis (CAD) system. Since the shape and boundary are crucial discriminant features in CAD, the active contour methods are more competitive in mass segmentation. However, the general
Yang, Y., Zhuang, Y., Xu, D., Pan, Y., Tao, D. & Maybank, S. 2009, 'Retrieval Based Interactive Cartoon Synthesis via Unsupervised Bi-Distance Metric Learning', 2009 ACM International Conference on Multimedia Compilation E-Proceedings (with co-located workshops & symposiums), ACM international conference on Multimedia, Association for Computing Machinery, Inc. (ACM), Beijing, China, pp. 311-320.
Cartoons play important roles in many areas, but it requires a lot of labor to produce new cartoon clips. In this paper, we propose a gesture recognition method for cartoon character images with two applications, namely content-based cartoon image retrieval and cartoon clip synthesis. We first define Edge Features (EF) and Motion Direction Features (MDF) for cartoon character images. The features are classified into two different groups, namely intra-features and inter-features. An Unsupervised Bi-Distance Metric Learning (UBDML) algorithm is proposed to recognize the gestures of cartoon character images. Different from the previous research efforts on distance metric learning, UBDML learns the optimal distance metric from the heterogeneous distance metrics derived from intra-features and inter-features. Content-based cartoon character image retrieval and cartoon clip synthesis can be carried out based on the distance metric learned by UBDML. Experiments show that the cartoon character image retrieval has a high precision and that the cartoon clip synthesis can be carried out efficiently.
Bian, W. & Tao, D. 2009, 'Manifold Regularization for SIR with Rate Root-n Convergence', Proceedings of the 2009 Conference ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 22, Annual Conference on Neural Information Processing Systems, Curran Associates, Inc, Vancouver, British Columbia, Canada, pp. 1-9.
In this paper, we study the manifold regularization for the Sliced Inverse Regression (SIR). The manifold regularization improves the standard SIR in two aspects: 1) it encodes the local geometry for SIR and 2) it enables SIR to deal with transductive and semi-supervised learning problems. We prove that the proposed graph Laplacian based regularization is convergent at rate root-n. The projection directions of the regularized SIR are optimized by using a conjugate gradient method on the Grassmann manifold. Experimental results support our theory.
Geng, B., Tao, D., Xu, C., Yang, L. & Hua, X. 2009, 'Ensemble Manifold Regularization', IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, Miami USA, pp. 2396-2402.
We propose an automatic approximation of the intrinsic manifold for general semi-supervised learning problems. Unfortunately, it is not trivial to define an optimization function to obtain optimal hyperparameters. Usually, pure cross-validation is considered but it does not necessarily scale up. A second problem derives from the suboptimality incurred by discrete grid search and overfitting problems. As a consequence, we developed an ensemble manifold regularization (EMR) framework to approximate the intrinsic manifold by combining several initial guesses. Algorithmically, we designed EMR very carefully so that it (a) learns both the composite manifold and the semi-supervised classifier jointly; (b) is fully automatic for learning the intrinsic manifold hyperparameters implicitly; (c) is conditionally optimal for intrinsic manifold approximation under a mild and reasonable assumption; and (d) is scalable for a large number of candidate manifold hyperparameters, from both time and space perspectives. Extensive experiments over both synthetic and real datasets show the effectiveness of the proposed framework.
Su, Y., Gao, X., Tao, D. & Li, X. 2008, 'Gabor-based Texture Representation In AAMs', 2008 IEEE International Conference On Systems, Man And Cybernetics (SMC), Vols 1-6, IEEE International Conference on Systems, Man and Cybernetics, IEEE, Singapore, Singapore, pp. 2235-2239.
Active Appearance Models (AAMs) are generative models which can describe deformable objects. However, the texture in basic AAMs is represented using intensity values. Despite its simplicity, this representation does not contain enough information for ima
Liu, W., Tao, D. & Liu, J. 2008, 'Transductive Component Analysis', ICDM 2008: Eighth IEEE International Conference On Data Mining, Proceedings, IEEE International Conference on Data Mining, IEEE Computer Soc, Pisa, Italy, pp. 433-442.
In this paper, we study semi-supervised linear dimensionality reduction. Beyond conventional supervised methods which merely consider labeled instances, the semi-supervised scheme allows abundant unlabeled instances to be leveraged in learning so
Niu, Z., Gao, X., Tao, D. & Li, X. 2008, 'Semantic Video Shot Segmentation Based On Color Ratio Feature And SVM', Proceedings Of The 2008 International Conference On Cyberworlds, International Conference on Cyberworlds, IEEE Computer Soc, Hangzhou, China, pp. 157-162.
With the fast development of video semantic analysis, there has been increasing attention to the typical issue of the semantic analysis of soccer program. Based on the color feature analysis, this paper focuses on the video shot segmentation problem from
Deng, C., Gao, X., Li, X. & Tao, D. 2008, 'Invariant Image Watermarking Based On Local Feature Regions', Proceedings Of The 2008 International Conference On Cyberworlds, International Conference on Cyberworlds, IEEE Computer Soc, Hangzhou, China, pp. 6-10.
In this paper, a robust image watermarking approach is presented based on image local invariant features. The affine invariant point detector is used to extract feature regions of the given host image. Image normalization and dominant gradient orientatio
Lu, W., Gao, X., Li, X. & Tao, D. 2008, 'An Image Quality Assessment Metric Based Contourlet', 2008 15th IEEE International Conference On Image Processing, Vols 1-5, IEEE International Conference on Image Processing, IEEE, San Diego, CA, pp. 1172-1175.
In reduced-reference (RR) image quality assessment (IQA), the visual quality of distorted images is evaluated with only partial information extracted from original images. In this paper, by considering the information of textures and directions during im
Deng, C., Gao, X., Tao, D. & Li, X. 2008, 'Geometrically Invariant Watermarking Using Affine Covariant Regions', 2008 15th IEEE International Conference On Image Processing, Vols 1-5, IEEE International Conference on Image Processing, IEEE, San Diego, CA, pp. 413-416.
In this paper, we present a robust approach to digital watermarking embedding and retrieval for digital images. The affine-invariant point detector is used to extract feature regions of the given host image. Image normalization and dominant gradient orie
Huang, Y., Huang, K., Wang, L., Tao, D., Tan, T. & Li, X. 2008, 'Enhanced Biologically Inspired Model', 2008 IEEE Conference On Computer Vision And Pattern Recognition, Vols 1-12, IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Anchorage, AK, pp. 2000-2007.
It has been demonstrated by Serre et al. that the biologically inspired model (BIM) is effective for object recognition. It outperforms many state-of-the-art methods in challenging databases. However, BIM has the following three problems: a very heavy co
Jia, W., Deng, C., Tao, D. & Zhang, D. 2008, 'Palmprint Identification Based On Directional Representation', 2008 IEEE International Conference On Systems, Man And Cybernetics (SMC), Vols 1-6, IEEE International Conference on Systems, Man and Cybernetics, IEEE, Singapore, Singapore, pp. 1561-1566.
In this paper, we propose a novel approach for palmprint identification, which contains two interesting components. Firstly, we propose the directional representation for appearance based approaches. The new representation is robust to drastic illuminati
Tao, D., Sun, J., Wu, X., Li, X., Shen, J., Maybank, S. & Faloutsos, C. 2007, 'Probabilistic Tensor Analysis with Akaike and Bayesian Information Criteria', Neural Information Processing. 14th International Conference, ICONIP 2007, International Conference on Neural Information Processing, Springer-Verlag Berlin / Heidelberg, Kitakyushu, Japan, pp. 791-801.
From data mining to computer vision, from visual surveillance to biometrics research, from biomedical imaging to bioinformatics, and from multimedia retrieval to information management, a large amount of data is naturally represented by multidimensional arrays, i.e., tensors. However, conventional probabilistic graphical models with probabilistic inference only model data in vector format, although they are very important in many statistical problems, e.g., model selection. Is it possible to construct multilinear probabilistic graphical models for tensor format data to conduct probabilistic inference, e.g., model selection? This paper provides a positive answer based on the proposed decoupled probabilistic model by developing the probabilistic tensor analysis (PTA), which selects a suitable model for tensor format data modeling based on the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Empirical studies demonstrate that PTA associated with AIC and BIC selects the correct number of models.
Zhang, T., Tao, D. & Yang, J. 2008, 'Discriminative Locality Alignment', Computer Vision ECCV 2008, Proceedings Part I, European Conference on Computer Vision, Springer, Marseille, France, pp. 725-738.
Fisher's linear discriminant analysis (LDA), one of the most popular dimensionality reduction algorithms for classification, has three particular problems: it fails to find the nonlinear structure hidden in the high dimensional data; it assumes all samples contribute equivalently to reduce dimension for classification; and it suffers from the matrix singularity problem. In this paper, we propose a new algorithm, termed Discriminative Locality Alignment (DLA), to deal with these problems. The algorithm operates in the following three stages: first, in part optimization, discriminative information is imposed over patches, each of which is associated with one sample and its neighbors; then, in sample weighting, each part optimization is weighted by the margin degree, a measure of the importance of a given sample; and finally, in whole alignment, the alignment trick is used to align all weighted part optimizations to the whole optimization. Furthermore, DLA is extended to the semi-supervised case, i.e., semi-supervised DLA (SDLA), which utilizes unlabeled samples to improve the classification performance. Thorough empirical studies on the face recognition demonstrate the effectiveness of both DLA and SDLA.
Deng, C., Gao, X., Tao, D. & Li, X. 2007, 'Digital Watermarking In Image Affine Co-variant Regions', Proceedings Of 2007 International Conference On Machine Learning And Cybernetics, Vols 1-7, International Conference on Machine Learning and Cybernetics, IEEE, Hong Kong, China, pp. 2125-2130.
In this paper, we present a robust approach of digital watermarking embedding and retrieval for digital images. This new approach works in special domain and it has two major steps: (1) to extract affine co-variant regions, and (2) to embed watermarks wi
Tao, D., Li, X., Wu, X. & Maybank, S. 2007, 'General Averaged Divergence Analysis', Proceedings of the Seventh IEEE International Conference on Data Mining, IEEE International Conference on Data Mining, IEEE Computer Society, Omaha, Nebraska, pp. 302-311.
Subspace selection is a powerful tool in data mining. An important subspace method is the Fisher-Rao linear discriminant analysis (LDA), which has been successfully applied in many fields such as biometrics, bioinformatics, and multimedia retrieval. However, LDA has a critical drawback: the projection to a subspace tends to merge those classes that are close together in the original feature space. If the separated classes are sampled from Gaussian distributions, all with identical covariance matrices, then LDA maximizes the mean value of the Kullback-Leibler (KL) divergences between the different classes. We generalize this point of view to obtain a framework for choosing a subspace by 1) generalizing the KL divergence to the Bregman divergence and 2) generalizing the arithmetic mean to a general mean. The framework is named the general averaged divergence analysis (GADA). Under this GADA framework, a geometric mean divergence analysis (GMDA) method based on the geometric mean is studied. A large number of experiments based on synthetic data show that our method significantly outperforms LDA and several representative LDA extensions.
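The contrast between the arithmetic and geometric mean of pairwise KL divergences described above can be illustrated with a small sketch. It assumes one-dimensional Gaussian classes with a shared variance, for which the KL divergence between two classes is (m1 - m2)^2 / (2 sigma^2); the 1-D class means stand in for class positions after a hypothetical projection. The function names are assumptions, and this is not the GMDA algorithm itself.

```python
# Illustrative sketch only: compares arithmetic and geometric means of
# pairwise KL divergences for 1-D Gaussian classes with a shared variance.
# Function names and example values are hypothetical.
import math
from itertools import combinations

def kl_equal_variance(m1, m2, sigma):
    """KL divergence between N(m1, sigma^2) and N(m2, sigma^2)."""
    return (m1 - m2) ** 2 / (2.0 * sigma ** 2)

def mean_divergences(class_means, sigma=1.0):
    """Return (arithmetic mean, geometric mean) of all pairwise KL divergences.

    Class means must be distinct, otherwise a divergence is zero and its
    logarithm is undefined.
    """
    kls = [kl_equal_variance(a, b, sigma) for a, b in combinations(class_means, 2)]
    arithmetic = sum(kls) / len(kls)
    geometric = math.exp(sum(math.log(k) for k in kls) / len(kls))
    return arithmetic, geometric

# Three evenly spaced classes versus a layout where two classes nearly merge.
well_separated = mean_divergences([0.0, 4.0, 8.0])
two_merged = mean_divergences([0.0, 0.5, 8.5])
print(well_separated, two_merged)
```

In this example the arithmetic mean is higher for the layout in which two classes nearly merge, while the geometric mean is lower, which is consistent with the abstract's motivation for replacing the arithmetic mean with a general mean.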
Tao, D., Li, X., Wu, X. & Maybank, S. 2006, 'Human Carrying Status in Visual Surveillance', Proceedings 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, New York, NY, USA, pp. 1670-1677.
A person's gait changes when he or she is carrying an object such as a bag, suitcase or rucksack. As a result, human identification and tracking are made more difficult because the averaged gait image is too simple to represent the carrying status. Therefore, in this paper we first introduce a set of Gabor based human gait appearance models, because Gabor functions are similar to the receptive field profiles in the mammalian cortical simple cells. The very high dimensionality of the feature space makes training difficult. In order to solve this problem we propose a general tensor discriminant analysis (GTDA), which seamlessly incorporates the object (Gabor based human gait appearance model) structure information as a natural constraint. GTDA differs from the previous tensor based discriminant analysis methods in that the training converges. Existing methods fail to converge in the training stage. This makes them unsuitable for practical tasks. Experiments are carried out on the USF baseline data set to recognize a human's ID from the gait silhouette. The proposed Gabor gait incorporated with GTDA is demonstrated to significantly outperform the existing appearance-based methods.
Sun, J., Tao, D. & Faloutsos, C. 2006, 'Beyond Streams and Graphs Dynamic Tensor Analysis', Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, International Conference on Knowledge Discovery and Data Mining, ACM Press, Philadelphia PA USA, pp. 374-383.
How do we find patterns in author-keyword associations, evolving over time? Or in DataCubes, with product-branch-customer sales information? Matrix decompositions, like principal component analysis (PCA) and variants, are invaluable tools for mining, dimensionality reduction, feature selection, rule identification in numerous settings like streaming data, text, graphs, social networks and many more. However, they have only two orders, like author and keyword, in the above example. We propose to envision such higher order data as tensors, and tap the vast literature on the topic. However, these methods do not necessarily scale up, let alone operate on semi-infinite streams. Thus, we introduce the dynamic tensor analysis (DTA) method, and its variants. DTA provides a compact summary for high-order and high-dimensional data, and it also reveals the hidden correlations. Algorithmically, we designed DTA very carefully so that it is (a) scalable, (b) space efficient (it does not need to store the past) and (c) fully automatic with no need for user defined parameters. Moreover, we propose STA, a streaming tensor analysis method, which provides a fast, streaming approximation to DTA. We implemented all our methods, and applied them in two real settings, namely, anomaly detection and multi-way latent semantic indexing. We used two real, large datasets, one on network flow data (100GB over 1 month) and one from DBLP (200MB over 25 years). Our experiments show that our methods are fast, accurate and that they find interesting patterns and outliers on the real datasets.
Tao, D., Maybank, S., Hu, W. & Li, X. 2005, 'Stable Third-order Tensor Representation For Colour Image Classification', 2005 IEEE/WIC/ACM International Conference On Web Intelligence, Proceedings, IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Soc, Compiegne, France, pp. 641-644.
General tensors can represent colour images more naturally than conventional features; however, the general tensors' stability properties are not reported and remain a key problem. In this paper, we use the tensor minimax probability (TMPM) to prove
Tao, D. & Tang, X. 2004, 'Random sampling based SVM for relevance feedback image retrieval', Proceedings Of The 2004 IEEE Computer Society Conference On Computer Vision And Pattern Recognition, Vol 2, Conference on Computer Vision and Pattern Recognition, IEEE Computer Soc, Washington, DC, pp. 647-652.
Relevance feedback (RF) schemes based on support vector machine (SVM) have been widely used in content-based image retrieval. However, the performance of SVM based RF is often poor when the number of labeled positive feedback samples is small. This is ma
Tao, D. & Tang, X. 2004, 'Orthogonal Complement Component Analysis For Positive Samples In SVM Based Relevance Feedback Image Retrieval', Proceedings Of The 2004 IEEE Computer Society Conference On Computer Vision And Pattern Recognition, Vol 2, Conference on Computer Vision and Pattern Recognition, IEEE Computer Soc, Washington, DC, pp. 586-591.
Relevance feedback (RF) is an important tool to improve the performance of content-based image retrieval systems. Support vector machine (SVM) based RF is popular because it can generalize better than most other classifiers. However, directly using SVM in

## Journal articles

Qiao, M., Liu, L., Yu, J., Xu, C. & Tao, D. 2017, 'Diversified dictionaries for multi-instance learning', Pattern Recognition, vol. 64, pp. 407-416.
© 2016 Elsevier Ltd. Multiple-instance learning (MIL) has been a popular topic in the study of pattern recognition for years due to its usefulness for such tasks as drug activity prediction and image/text classification. In a typical MIL setting, a bag contains a bag-level label and more than one instance/pattern. How to bridge instance-level representations to bag-level labels is a key step to achieve satisfactory classification accuracy results. In this paper, we present a supervised learning method, diversified dictionaries MIL, to address this problem. Our approach, on the one hand, exploits bag-level label information for training class-specific dictionaries. On the other hand, it introduces a diversity regularizer into the class-specific dictionaries to avoid ambiguity between them. To the best of our knowledge, this is the first time that the diversity prior is introduced to solve the MIL problems. Experiments conducted on several benchmark (drug activity and image/text annotation) datasets show that the proposed method compares favorably to state-of-the-art methods.
Du, B., Zhang, M., Zhang, L., Hu, R. & Tao, D. 2017, 'PLTD: Patch-Based Low-Rank Tensor Decomposition for Hyperspectral Images', IEEE Transactions on Multimedia, vol. 19, no. 1, pp. 67-79.
© 1999-2012 IEEE. Recent years have witnessed growing interest in hyperspectral image (HSI) processing. In practice, however, HSIs always suffer from huge data size and a mass of redundant information, which hinder their application in many cases. HSI compression is a straightforward way of relieving these problems. However, most of the conventional image encoding algorithms mainly focus on the spatial dimensions, and they do not consider the redundancy in the spectral dimension. In this paper, we propose a novel HSI compression and reconstruction algorithm via patch-based low-rank tensor decomposition (PLTD). Instead of processing the HSI separately by spectral channel or by pixel, we represent each local patch of the HSI as a third-order tensor. Then, the similar tensor patches are grouped by clustering to form a fourth-order tensor per cluster. Since the grouped tensor is assumed to be redundant, each cluster can be approximately decomposed to a coefficient tensor and three dictionary matrices, which leads to a low-rank tensor representation of both the spatial and spectral modes. The reconstructed HSI can then be simply obtained by the product of the coefficient tensor and dictionary matrices per cluster. In this way, the proposed PLTD algorithm simultaneously removes the redundancy in both the spatial and spectral domains in a unified framework. The extensive experimental results on various public HSI datasets demonstrate that the proposed method outperforms the traditional image compression approaches and other tensor-based methods.
Wang, R. & Tao, D. 2016, 'Recent Progress in Image Deblurring'.
This paper comprehensively reviews the recent development of image deblurring, including non-blind/blind, spatially invariant/variant deblurring techniques. Indeed, these techniques share the same objective of inferring a latent sharp image from one or several corresponding blurry images, while the blind deblurring techniques are also required to derive an accurate blur kernel. Considering the critical role of image restoration in modern imaging systems to provide high-quality images under complex environments such as motion, undesirable lighting conditions, and imperfect system components, image deblurring has attracted growing attention in recent years. From the viewpoint of how to handle the ill-posedness which is a crucial issue in deblurring tasks, existing methods can be grouped into five categories: Bayesian inference framework, variational methods, sparse representation-based methods, homography-based modeling, and region-based methods. In spite of achieving a certain level of development, image deblurring, especially the blind case, is limited in its success by complex application conditions which make the blur kernel hard to obtain and be spatially variant. We provide a holistic understanding and deep insight into image deblurring in this review. An analysis of the empirical evidence for representative methods, practical issues, as well as a discussion of promising future directions are also presented.
Xu, C., Liu, T., Tao, D. & Xu, C. 2016, 'Local Rademacher Complexity for Multi-label Learning', IEEE Transactions on Image Processing, vol. 25, no. 3, pp. 1495-1507.
We analyze the local Rademacher complexity of empirical risk minimization (ERM)-based multi-label learning algorithms, and in doing so propose a new algorithm for multi-label learning. Rather than using the trace norm to regularize the multi-label predictor, we instead minimize the tail sum of the singular values of the predictor in multi-label learning. Benefiting from the use of the local Rademacher complexity, our algorithm, therefore, has a sharper generalization error bound and a faster convergence rate. Compared to methods that minimize over all singular values, concentrating on the tail singular values results in better recovery of the low-rank structure of the multi-label predictor, which plays an important role in exploiting label correlations. We propose a new conditional singular value thresholding algorithm to solve the resulting objective function. Empirical studies on real-world datasets validate our theoretical results and demonstrate the effectiveness of the proposed algorithm.
Liu, T. & Tao, D. 2016, 'Classification with Noisy Labels by Importance Reweighting', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 3, pp. 447-461.
© 1979-2012 IEEE. In this paper, we study a classification problem in which sample labels are randomly corrupted. In this scenario, there is an unobservable sample with noise-free labels. However, before being observed, the true labels are independently flipped with a probability in [0, 0.5), and the random label noise can be class-conditional. Here, we address two fundamental problems raised by this scenario. The first is how to best use the abundant surrogate loss functions designed for the traditional classification problem when there is label noise. We prove that any surrogate loss function can be used for classification with noisy labels by using importance reweighting, with consistency assurance that the label noise does not ultimately hinder the search for the optimal classifier of the noise-free sample. The other is the open problem of how to obtain the noise rate. We show that the rate is upper bounded by the conditional probability P(Y|X) of the noisy sample. Consequently, the rate can be estimated, because the upper bound can be easily reached in classification problems. Experimental results on synthetic and real datasets confirm the efficiency of our methods.
Chua, T.S., He, X., Liu, W., Piccardi, M., Wen, Y. & Tao, D. 2016, 'Big data meets multimedia analytics', Signal Processing, vol. 124, pp. 1-4.
Deng, J., Liu, Q., Yang, J. & Tao, D. 2016, 'M3 CSR: Multi-view, multi-scale and multi-component cascade shape regression', Image and Vision Computing, vol. 47, pp. 19-26.
© 2015 Elsevier B.V. Automatic face alignment is a fundamental step in facial image analysis. However, this problem continues to be challenging due to the large variability of expression, illumination, occlusion, pose, and detection drift in real-world face images. In this paper, we present a multi-view, multi-scale and multi-component cascade shape regression (M3CSR) model for robust face alignment. Firstly, the face view is estimated according to the deformable facial parts for learning view-specific CSR, which can decrease the shape variance, alleviate the drift of face detection and accelerate shape convergence. Secondly, multi-scale HoG features are used as the shape-index features to incorporate local structure information implicitly, and a multi-scale optimization strategy is adopted to avoid becoming trapped in local optima. Finally, a component-based shape refinement process is developed to further improve the performance of face alignment. Extensive experiments on the IBUG dataset and the 300-W challenge dataset demonstrate the superiority of the proposed method over the state-of-the-art methods.
Zheng, H., Geng, X., Tao, D. & Jin, Z. 2016, 'A multi-task model for simultaneous face identification and facial expression recognition', Neurocomputing, vol. 171, pp. 515-523.
© 2015 Elsevier B.V. Regarded as two independent tasks, both face identification and facial expression recognition perform poorly given small size training sets. To address this problem, we propose a multi-task facial inference model (MT-FIM) for simultaneous face identification and facial expression recognition. In particular, face identification and facial expression recognition are learnt simultaneously by extracting and utilizing appropriate shared information across them in the framework of multi-task learning, in which the shared information refers to the parameter controlling the sparsity. MT-FIM simultaneously minimizes the within-class scatter and maximizes the distance between different classes to enable the robust performance of each individual task. We conduct comprehensive experiments on three face image databases. The experimental results show that our algorithm outperforms the state-of-the-art algorithms.
Gui, J., Liu, T., Tao, D., Sun, Z. & Tan, T. 2016, 'Representative Vector Machines: A Unified Framework for Classical Classifiers', IEEE Transactions on Cybernetics, vol. 46, no. 8, pp. 1877-1888.
Classifier design is a fundamental problem in pattern recognition. A variety of pattern classification methods such as the nearest neighbor (NN) classifier, support vector machine (SVM), and sparse representation-based classification (SRC) have been proposed in the literature. These typical and widely used classifiers were originally developed from different theory or application motivations and they are conventionally treated as independent and specific solutions for pattern classification. This paper proposes a novel pattern classification framework, namely, representative vector machines (or RVMs for short). The basic idea of RVMs is to assign the class label of a test example according to its nearest representative vector. The contributions of RVMs are twofold. On one hand, the proposed RVMs establish a unified framework of classical classifiers because NN, SVM, and SRC can be interpreted as the special cases of RVMs with different definitions of representative vectors. Thus, the underlying relationship among a number of classical classifiers is revealed for better understanding of pattern classification. On the other hand, novel and advanced classifiers are inspired in the framework of RVMs. For example, a robust pattern classification method called discriminant vector machine (DVM) is motivated from RVMs. Given a test example, DVM first finds its k-NNs and then performs classification based on the robust M-estimator and manifold regularization. Extensive experimental evaluations on a variety of visual recognition tasks such as face recognition (Yale and face recognition grand challenge databases), object categorization (Caltech-101 dataset), and action recognition (Action Similarity LAbeliNg) demonstrate the advantages of DVM over other classifiers.
Liu, T. & Tao, D. 2016, 'On the Performance of Manhattan Nonnegative Matrix Factorization', IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 9, pp. 1851-1863.
Extracting low-rank and sparse structures from matrices has been extensively studied in machine learning, compressed sensing, and conventional signal processing, and has been widely applied to recommendation systems, image reconstruction, visual analytics, and brain signal processing. Manhattan nonnegative matrix factorization (MahNMF) is an extension of the conventional NMF, which models the heavy-tailed Laplacian noise by minimizing the Manhattan distance between a nonnegative matrix X and the product of two nonnegative low-rank factor matrices. Fast algorithms have been developed to restore the low-rank and sparse structures of X in the MahNMF. In this paper, we study the statistical performance of the MahNMF in the frame of the statistical learning theory. We decompose the expected reconstruction error of the MahNMF into the estimation error and the approximation error. The estimation error is bounded by the generalization error bounds of the MahNMF, while the approximation error is analyzed using the asymptotic results of the minimum distortion of vector quantization. The generalization error bound is valuable for determining the size of the training sample needed to guarantee a desirable upper bound for the defect between the expected and empirical reconstruction errors. Statistical performance analysis shows how the reduced dimensionality affects the estimation and approximation errors. Our framework can also be used for analyzing the performance of the NMF.
Wang, S., Tao, D. & Yang, J. 2016, 'Relative Attribute SVM+ Learning for Age Estimation', IEEE Transactions on Cybernetics, vol. 46, no. 3, pp. 827-839.
When estimating age, human experts can provide privileged information that encodes the facial attributes of aging, such as smoothness, face shape, face acne, wrinkles, and bags under-eyes. In automatic age estimation, privileged information is unavailable to test images. To overcome this problem, we hypothesize that asymmetric information can be explored and exploited to improve the generalizability of the trained model. Using the {learning using privileged information} (LUPI) framework, we tested this hypothesis by carefully defining relative attributes for support vector machine (SVM+) to improve the performance of age estimation. We term this specific setting as relative attribute SVM+ (raSVM+), in which the privileged information enables separation of outliers from inliers at the training stage and effectively manipulates slack variables and age determination errors during model training, and thus guides the trained predictor toward a generalizable solution. Experimentally, the superiority of raSVM+ was confirmed by comparing it with state-of-the-art algorithms on the face and gesture recognition research network (FG-NET) and craniofacial longitudinal morphological face aging databases. raSVM+ is a promising development that improves age estimation, with the mean absolute error reaching 4.07 on FG-NET.
Ding, C., Choi, J., Tao, D. & Davis, L.S. 2016, 'Multi-Directional Multi-Level Dual-Cross Patterns for Robust Face Recognition', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 3, pp. 518-531.
© 1979-2012 IEEE. To perform unconstrained face recognition robust to variations in illumination, pose and expression, this paper presents a new scheme to extract 'Multi-Directional Multi-Level Dual-Cross Patterns' (MDML-DCPs) from face images. Specifically, the MDML-DCPs scheme exploits the first derivative of Gaussian operator to reduce the impact of differences in illumination and then computes the DCP feature at both the holistic and component levels. DCP is a novel face image descriptor inspired by the unique textural structure of human faces. It is computationally efficient and only doubles the cost of computing local binary patterns, yet is extremely robust to pose and expression variations. MDML-DCPs comprehensively yet efficiently encodes the invariant characteristics of a face image from multiple levels into patterns that are highly discriminative of inter-personal differences but robust to intra-personal variations. Experimental results on the FERET, CAS-PEAL-R1, FRGC 2.0, and LFW databases indicate that DCP outperforms the state-of-the-art local descriptors (e.g., LBP, LTP, LPQ, POEM, tLBP, and LGXP) for both face identification and face verification tasks. More impressively, the best performance is achieved on the challenging LFW and FRGC 2.0 databases by deploying MDML-DCPs in a simple recognition scheme.
Ding, C. & Tao, D. 2016, 'A comprehensive survey on Pose-Invariant Face Recognition', ACM Transactions on Intelligent Systems and Technology, vol. 7, no. 3.
© 2016 ACM. The capacity to recognize faces under varied poses is a fundamental human ability that presents a unique challenge for computer vision systems. Compared to frontal face recognition, which has been intensively studied and has gradually matured in the past few decades, Pose-Invariant Face Recognition (PIFR) remains a largely unsolved problem. However, PIFR is crucial to realizing the full potential of face recognition for real-world applications, since face recognition is intrinsically a passive biometric technology for recognizing uncooperative subjects. In this article, we discuss the inherent difficulties in PIFR and present a comprehensive review of established techniques. Existing PIFR methods can be grouped into four categories, that is, pose-robust feature extraction approaches, multiview subspace learning approaches, face synthesis approaches, and hybrid approaches. The motivations, strategies, pros/cons, and performance of representative approaches are described and compared. Moreover, promising directions for future research are discussed.
Li, X., Liu, T., Deng, J. & Tao, D. 2016, 'Video face editing using temporal-spatial-smooth warping', ACM Transactions on Intelligent Systems and Technology, vol. 7, no. 3.
© 2016 ACM. Editing faces in videos is a popular yet challenging task in computer vision and graphics that encompasses various applications, including facial attractiveness enhancement, makeup transfer, face replacement, and expression manipulation. Directly applying the existing warping methods to video face editing has the major problem of temporal incoherence in the synthesized videos, which cannot be addressed by simply employing face tracking techniques or manual interventions, as it is difficult to eliminate the subtle temporal incoherence of the facial feature point localizations in a video sequence. In this article, we propose a temporal-spatial-smooth warping (TSSW) method to achieve a high temporal coherence for video face editing. TSSW is based on two observations: (1) the control lattices are critical for generating warping surfaces and achieving the temporal coherence between consecutive video frames, and (2) the temporal coherence and spatial smoothness of the control lattices can be simultaneously and effectively preserved. Based upon these observations, we impose the temporal coherence constraint on the control lattices on two consecutive frames, as well as the spatial smoothness constraint on the control lattice on the current frame. TSSW calculates the control lattice (in either the horizontal or vertical direction) by updating the control lattice (in the corresponding direction) on its preceding frame, i.e., minimizing a novel energy function that unifies a data-driven term, a smoothness term, and feature point constraints.
The contributions of this article are twofold: (1) we develop TSSW, which is robust to the subtle temporal incoherence of the facial feature point localizations and is effective in preserving the temporal coherence and spatial smoothness of the control lattices for editing faces in videos, and (2) we present a new unified video face editing framework that is capable of improving the performances...
Cai, B., Xu, X., Jia, K., Qing, C. & Tao, D. 2016, 'DehazeNet: An end-to-end system for single image haze removal', IEEE Transactions on Image Processing, vol. 25, no. 11, pp. 5187-5198.
© 1992-2012 IEEE. Single image haze removal is a challenging ill-posed problem. Existing methods use various constraints/priors to get plausible dehazing solutions. The key to achieving haze removal is to estimate a medium transmission map for an input hazy image. In this paper, we propose a trainable end-to-end system called DehazeNet for medium transmission estimation. DehazeNet takes a hazy image as input and outputs its medium transmission map, which is subsequently used to recover a haze-free image via the atmospheric scattering model. DehazeNet adopts a convolutional neural network-based deep architecture, whose layers are specially designed to embody the established assumptions/priors in image dehazing. Specifically, layers of Maxout units are used for feature extraction, which can generate almost all haze-relevant features. We also propose a novel nonlinear activation function in DehazeNet, called the bilateral rectified linear unit, which is able to improve the quality of the recovered haze-free image. We establish connections between the components of the proposed DehazeNet and those used in existing methods. Experiments on benchmark images show that DehazeNet achieves superior performance over existing methods, yet remains efficient and easy to use.
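The recovery step mentioned in this abstract can be sketched with the commonly used form of the atmospheric scattering model, I = J·t + A·(1 - t), where I is the hazy image, J the haze-free scene, t the medium transmission and A the global atmospheric light. The abstract does not spell out the formula, so this form, the clipping bounds of the bilateral activation, the transmission floor `t_floor`, and all function names are assumptions made for illustration; the network itself is not reproduced.

```python
import numpy as np

def brelu(x, t_min=0.0, t_max=1.0):
    """Two-sided clipping activation; the bounds here are assumed defaults."""
    return np.clip(x, t_min, t_max)

def recover_scene(hazy, transmission, airlight, t_floor=0.1):
    """Invert I = J * t + A * (1 - t) for J.

    `t_floor` is a hypothetical lower bound that avoids dividing by
    near-zero transmission values.
    """
    t = np.maximum(transmission, t_floor)
    return (hazy - airlight) / t + airlight

# Synthesize a hazy 2x2 "image" from a known scene, then invert the model.
J = np.array([[0.2, 0.6], [0.9, 0.4]])
t = np.array([[0.5, 0.8], [0.3, 1.0]])
A = 1.0
I = J * t + A * (1 - t)
restored = recover_scene(I, t, A)
```

In a full pipeline the transmission map would come from the trained network rather than being known in advance.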
Ren, W., Huang, K., Tao, D. & Tan, T. 2016, 'Weakly Supervised Large Scale Object Localization with Multiple Instance Learning and Bag Splitting', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 405-416.
Localizing objects of interest in images when provided with only image-level labels is a challenging visual recognition task. Previous efforts have required carefully designed features and have difficulty in handling images with cluttered backgrounds. Up-scaling to large datasets also poses a challenge to applying these methods to real applications. In this paper, we propose an efficient and effective learning framework called MILinear, which is able to learn an object localization model from large-scale data without using bounding box annotations. We integrate rich general prior knowledge into a learning model using a large pre-trained convolutional network. Moreover, to reduce ambiguity in positive images, we present a bag-splitting algorithm that iteratively generates new negative bags from positive ones. We evaluate the proposed approach on the challenging Pascal VOC 2007 dataset, and our method outperforms other state-of-the-art methods by a large margin; some results are even comparable to fully supervised models trained with bounding box annotations. To further demonstrate scalability, we also present detection results on the ILSVRC 2013 detection dataset, and our method outperforms supervised deformable part-based model without using box annotations.
Liu, J., Su, H., Hu, W., Zhang, L. & Tao, D. 2016, 'A minimal Munsell value error based laser printer model', Neurocomputing, vol. 204, pp. 231-239.
An image printed by a laser printer may be nonlinearly distorted by dot gain and dot loss. A printer model is therefore usually built to suppress this nonlinear distortion and to ensure that the printout matches the input image. The parameters of the printer model are key, because they directly affect the printout. In this paper, the chroma or density values of the printout are converted into Munsell values, and the optimal parameters of the printer model are obtained by minimizing the error between the Munsell value and the input gray value. The minimal-Munsell-value-error-based laser printer model (MMVEBLPM) is then established and applied in the green noise halftone method. Experimental results show that the optimal parameters can be calculated quickly and that the nonlinear distortion of the laser printer is suppressed significantly with the proposed model.
Gong, C., Tao, D., Liu, W., Liu, L. & Yang, J. 2016, 'Label Propagation via Teaching-to-Learn and Learning-to-Teach', IEEE Transactions on Neural Networks and Learning Systems.
How to propagate label information from labeled examples to unlabeled examples over a graph has been intensively studied for a long time. Existing graph-based propagation algorithms usually treat unlabeled examples equally, and transmit seed labels to the unlabeled examples that are connected to the labeled examples in a neighborhood graph. However, such a popular propagation scheme is very likely to yield inaccurate propagation, because it falls short of tackling ambiguous but critical data points (e.g., outliers). To this end, this paper treats the unlabeled examples in different levels of difficulties by assessing their reliability and discriminability, and explicitly optimizes the propagation quality by manipulating the propagation sequence to move from simple to difficult examples. In particular, we propose a novel iterative label propagation algorithm in which each propagation alternates between two paradigms, teaching-to-learn and learning-to-teach (TLLT). In the teaching-to-learn step, the learner conducts the propagation on the simplest unlabeled examples designated by the teacher. In the learning-to-teach step, the teacher incorporates the learner's feedback to adjust the choice of the subsequent simplest examples. The proposed TLLT strategy critically improves the accuracy of label propagation, making our algorithm substantially robust to the values of tuning parameters, such as the Gaussian kernel width used in graph construction. The merits of our algorithm are theoretically justified and empirically demonstrated through experiments performed on both synthetic and real-world data sets.
Liu, X., Deng, C., Lang, B., Tao, D. & Li, X. 2016, 'Query-Adaptive Reciprocal Hash Tables for Nearest Neighbor Search', IEEE Transactions on Image Processing, vol. 25, no. 2, pp. 907-919.
© 1992-2012 IEEE. Recent years have witnessed the success of binary hashing techniques in approximate nearest neighbor search. In practice, multiple hash tables are usually built using hashing to cover more desired results in the hit buckets of each table. However, few works study a unified approach to constructing multiple informative hash tables using any type of hashing algorithm. Meanwhile, multiple table search also lacks a generic query-adaptive and fine-grained ranking scheme that can alleviate the binary quantization loss suffered in the standard hashing techniques. To solve the above problems, in this paper, we first regard the table construction as a selection problem over a set of candidate hash functions. With the graph representation of the function set, we propose an efficient solution that sequentially applies normalized dominant set to find the most informative and independent hash functions for each table. To further reduce the redundancy between tables, we explore the reciprocal hash tables in a boosting manner, where the hash function graph is updated with high weights emphasized on the misclassified neighbor pairs of previous hash tables. To refine the ranking of the retrieved buckets within a certain Hamming radius from the query, we propose a query-adaptive bitwise weighting scheme to enable fine-grained bucket ranking in each hash table, exploiting the discriminative power of its hash functions and their complement for nearest neighbor search. Moreover, we integrate such a scheme into the multiple table search using a fast, yet reciprocal table lookup algorithm within the adaptive weighted Hamming radius. In this paper, both the construction method and the query-adaptive search method are general and compatible with different types of hashing algorithms using different feature spaces and/or parameter settings.
Our extensive experiments on several large-scale benchmarks demonstrate that the proposed techniques can significan...
Liu, T., Gong, M. & Tao, D. 2016, 'Large-Cone Nonnegative Matrix Factorization', IEEE Transactions on Neural Networks and Learning Systems.
Nonnegative matrix factorization (NMF) has been greatly popularized by its parts-based interpretation and the effective multiplicative updating rule for searching local solutions. In this paper, we study the problem of how to obtain an attractive local solution for NMF, which not only fits the given training data well but also generalizes well on the unseen test data. Based on the geometric interpretation of NMF, we introduce two large-cone penalties for NMF and propose large-cone NMF (LCNMF) algorithms. Compared with NMF, LCNMF will obtain bases comprising a larger simplicial cone, and therefore has three advantages: (1) the empirical reconstruction error of LCNMF could mostly be smaller; (2) the generalization ability of the proposed algorithm is much more powerful; and (3) the obtained bases of LCNMF have a low-overlapping property, which enables the bases to be sparse and makes the proposed algorithms very robust. Experiments on synthetic and real-world data sets confirm the efficiency of LCNMF.
You, X., Guo, W., Yu, S., Li, K., Principe, J.C. & Tao, D. 2016, 'Kernel Learning for Dynamic Texture Synthesis', IEEE Transactions on Image Processing, vol. 25, no. 10, pp. 4782-4795.
© 2016 IEEE. Dynamic textures (DTs) that represent moving scenes such as flames, smoke, and waves, exhibit fixed dynamics within a period of time and have been successfully modeled using linear dynamic systems (LDS). In this paper, we show that the widely used LDS model can be approximated using a principal component regression (PCR) model with the main advantage of simplicity. Furthermore, to capture the nonlinearity of training frames, we extend traditional PCR to its kernelized version and introduce kernel principal component regression (KPCR) to model and synthesize DTs. To ensure algorithm stability, we remove the standard state model and directly apply the quantized kernel least mean squares algorithm from signal processing domain to approximate the performance achieved with KPCR. We term this improvement kernel adaptive dynamic texture synthesis (KADTS), which also has the benefits of computational and memory efficiency. These advantages make KADTS ideally suited for real-world applications, since the majority of electronic devices, including cell phones and laptops, suffer from limited memory and real-time constraints. We demonstrate, via both theoretical and experimental analyses, the connections between DT synthesis using KPCR and KADTS with a regularization network theory. We also show the superiority of our proposed algorithms for DT synthesis compared with other dynamic system-based benchmarks. MATLAB code is available from our project homepage http://bmal.hust.edu.cn/project/dts.html.
Xie, L., Tao, D. & Wei, H. 2016, 'Joint structured sparsity regularized multiview dimension reduction for video-based facial expression recognition', ACM Transactions on Intelligent Systems and Technology, vol. 8, no. 2.
© 2016 ACM. Video-based facial expression recognition (FER) has recently received increased attention as a result of its widespread application. Using only one type of feature to describe facial expression in video sequences is often inadequate, because the information available is very complex. With the emergence of different features to represent different properties of facial expressions in videos, an appropriate combination of these features becomes an important, yet challenging, problem. Considering that the dimensionality of these features is usually high, we thus introduce multiview dimension reduction (MVDR) into video-based FER. In MVDR, it is critical to explore the relationships between and within different feature views. To achieve this goal, we propose a novel framework of MVDR by enforcing joint structured sparsity at both inter- and intraview levels. In this way, correlations on and between the feature spaces of different views tend to be well-exploited. In addition, a transformation matrix is learned for each view to discover the patterns contained in the original features, so that the different views are comparable in finding a common representation. The model can be not only performed in an unsupervised manner, but also easily extended to a semisupervised setting by incorporating some domain knowledge. An alternating algorithm is developed for problem optimization, and each subproblem can be efficiently solved. Experiments on two challenging video-based FER datasets demonstrate the effectiveness of the proposed framework.
Xie, Y., Qu, Y., Tao, D., Wu, W., Yuan, Q. & Zhang, W. 2016, 'Hyperspectral Image Restoration via Iteratively Regularized Weighted Schatten p-Norm Minimization', IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 8, pp. 4642-4659.
Cao, X., Han, J., Yang, S., Tao, D. & Jiao, L. 2016, 'Band selection and evaluation with spatial information', International Journal of Remote Sensing, vol. 37, no. 19, pp. 4501-4520.
Dong, Y., Du, B., Zhang, L., Zhang, L. & Tao, D. 2016, 'LAM3L: Locally adaptive maximum margin metric learning for visual data classification', Neurocomputing.
© 2016. Visual data classification, which is aimed at determining a unique label for each class, is an increasingly important issue in the machine learning community. In recent years, increasing attention has been paid to the application of metric learning for classification, which has been proven to be a good way to obtain a promising performance. However, as a result of the limited training samples and data with complex distributions, the vast majority of these algorithms usually fail to perform well. This has motivated us to develop a novel locally adaptive maximum margin metric learning (LAM3L) algorithm in order to maximally separate similar and dissimilar classes, based on the changes between the distances before and after the maximum margin metric learning. The experimental results on two widely used UCI datasets and a real hyperspectral dataset demonstrate that the proposed method outperforms the state-of-the-art metric learning methods.
Ding, C. & Tao, D. 2016, 'Pose-invariant face recognition with homography-based normalization', Pattern Recognition.
© 2016 Elsevier Ltd. Pose-invariant face recognition (PIFR) refers to the ability to recognize face images with arbitrary pose variations. Among existing PIFR algorithms, pose normalization has been proved to be an effective approach which preserves texture fidelity, but it usually depends on precise 3D face models or comes at a high computational cost. In this paper, we propose a highly efficient PIFR algorithm that effectively handles the main challenges caused by pose variation. First, a dense grid of 3D facial landmarks is projected to each 2D face image, which enables feature extraction in a pose-adaptive manner. Second, for the local patch around each landmark, an optimal warp is estimated based on homography to correct texture deformation caused by pose variations. The reconstructed frontal-view patches are then utilized for face recognition with traditional face descriptors. The homography-based normalization is highly efficient and the synthesized frontal face images are of high quality. Finally, we propose an effective approach for occlusion detection, which enables face recognition with visible patches only. Therefore, the proposed algorithm effectively handles the main challenges in PIFR. Experimental results on four popular face databases demonstrate that the proposed approach performs well in both constrained and unconstrained environments.
Zhao, L., Gao, X., Tao, D. & Li, X. 2015, 'A deep structure for human pose estimation', Signal Processing, vol. 108, pp. 36-45.
© 2014 Elsevier B.V. Articulated human pose estimation in unconstrained conditions is a great challenge. We propose a deep structure that represents a human body at different granularities from coarse to fine, for better detecting parts and describing spatial constraints between different parts. Typical approaches to this problem utilize only a single-level structure, which struggles to capture various body appearances and to model high-order part dependencies. In this paper, we build a three-layer Markov network to model the body structure, which separates the whole body into poselets (combined parts) and then into parts representing joints. Parts at different levels are connected through a parent-child relationship to represent high-order spatial relationships. Unlike other multi-layer models, our approach explores more reasonable granularity for part detection and carefully designs part connections to model body configurations more effectively. Moreover, each part in our model contains different types so as to capture a wide range of pose modes. Our model is also a tree structure, which can be trained jointly and favors exact inference. Extensive experimental results on two challenging datasets show that our model improves on or is on par with state-of-the-art approaches.
Zhang, L., Zhang, L., Tao, D., Huang, X. & Du, B. 2015, 'Compression of hyperspectral remote sensing images by tensor approach', Neurocomputing, vol. 147, pp. 358-363.
Whereas transform coding algorithms have been proved to be efficient and practical for grey-level and color image compression, they cannot directly deal with hyperspectral images (HSI) by simultaneously considering both the spatial and spectral domains of the data cube. The aim of this paper is to present an HSI compression and reconstruction method based on the multi-dimensional or tensor data processing approach. By representing the observed hyperspectral image cube as a third-order tensor, we introduce a tensor decomposition technique to approximately decompose the original tensor data into a core tensor multiplied by a factor matrix along each mode. Thus, the HSI is compressed to the core tensor and can be reconstructed by the multi-linear projection via the factor matrices. Experimental results on particular applications of hyperspectral remote sensing images such as unmixing and detection suggest that the data reconstructed by the proposed approach significantly preserves the HSI's data quality in several aspects. © 2014 Elsevier B.V.
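The idea of compressing a data cube into a core tensor with one factor matrix per mode can be sketched with a truncated higher-order SVD. This is one standard way to obtain such a decomposition and is used here as an assumption; the abstract does not say which algorithm the paper uses. The random cube, the chosen ranks and all function names are illustrative only.

```python
import numpy as np

def unfold(T, mode):
    """Matricize T so that axis `mode` becomes the rows."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along axis `mode`."""
    out = np.tensordot(M, T, axes=(1, mode))  # contracted axis reappears first
    return np.moveaxis(out, 0, mode)

def hosvd(T, ranks):
    """Truncated higher-order SVD: returns (core, factor matrices)."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])  # leading r left singular vectors
    core = T
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)  # project onto each subspace
    return core, factors

def reconstruct(core, factors):
    """Multi-linear projection of the core back to the original size."""
    T = core
    for mode, U in enumerate(factors):
        T = mode_product(T, U, mode)
    return T

rng = np.random.default_rng(0)
cube = rng.standard_normal((8, 8, 10))  # stand-in for rows x columns x bands
core, factors = hosvd(cube, ranks=(4, 4, 5))
approx = reconstruct(core, factors)
```

Storing `core` and `factors` instead of `cube` is what gives the compression; how much quality survives depends on the ranks and on how low-rank the real data is.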
Luo, Y., Tao, D., Ramamohanarao, K., Xu, C. & Wen, Y. 2015, 'Tensor Canonical Correlation Analysis for Multi-View Dimension Reduction', IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 11, pp. 3111-3124.
© 2015 IEEE. Canonical correlation analysis (CCA) has proven an effective tool for two-view dimension reduction due to its profound theoretical foundation and success in practical applications. In respect of multi-view learning, however, it is limited by its capability of only handling data represented by two-view features, while in many real-world applications, the number of views is often much larger. Although the ad hoc way of simultaneously exploring all possible pairs of features can numerically deal with multi-view data, it ignores the high order statistics (correlation information) which can only be discovered by simultaneously exploring all features. Therefore, in this work, we develop tensor CCA (TCCA) which straightforwardly yet naturally generalizes CCA to handle the data of an arbitrary number of views by analyzing the covariance tensor of the different views. TCCA aims to directly maximize the canonical correlation of multiple (more than two) views. Crucially, we prove that the main problem of multi-view canonical correlation maximization is equivalent to finding the best rank-1 approximation of the data covariance tensor, which can be solved efficiently using the well-known alternating least squares (ALS) algorithm. As a consequence, the high order correlation information contained in the different views is explored and thus a more reliable common subspace shared by all features can be obtained. In addition, a non-linear extension of TCCA is presented. Experiments on various challenging tasks, including large scale biometric structure prediction, internet advertisement classification, and web image annotation, demonstrate the effectiveness of the proposed method.
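The rank-1 approximation step named in this abstract can be sketched for a three-way tensor with a simple alternating update: each vector is refreshed by contracting the tensor with the other two and renormalizing. This covers only the rank-1 ALS idea on a given tensor; building and normalizing the covariance tensor from multi-view data, as TCCA would require, is omitted. The function name and its parameters are hypothetical.

```python
import numpy as np

def rank1_als(C, iters=100, seed=0):
    """Approximate a 3-way tensor C by lam * (u0 outer u1 outer u2)."""
    rng = np.random.default_rng(seed)
    u = [rng.standard_normal(n) for n in C.shape]
    u = [v / np.linalg.norm(v) for v in u]
    for _ in range(iters):
        # Update each unit vector with the other two held fixed.
        u[0] = np.einsum('ijk,j,k->i', C, u[1], u[2])
        u[0] /= np.linalg.norm(u[0])
        u[1] = np.einsum('ijk,i,k->j', C, u[0], u[2])
        u[1] /= np.linalg.norm(u[1])
        u[2] = np.einsum('ijk,i,j->k', C, u[0], u[1])
        u[2] /= np.linalg.norm(u[2])
    lam = float(np.einsum('ijk,i,j,k->', C, u[0], u[1], u[2]))
    return lam, u

# Build an exactly rank-1 tensor with weight 3 and recover it.
a = np.array([1.0, 2.0, 2.0]) / 3.0
b = np.array([3.0, 4.0]) / 5.0
c = np.array([0.0, 1.0, 0.0, 0.0])
C = 3.0 * np.einsum('i,j,k->ijk', a, b, c)
lam, (u0, u1, u2) = rank1_als(C)
C_hat = lam * np.einsum('i,j,k->ijk', u0, u1, u2)
```

For a general tensor the iteration reaches a local optimum that can depend on the random start, so several seeds may be worth trying.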
Ding, C., Xu, C. & Tao, D. 2015, 'Multi-task pose-invariant face recognition', IEEE Transactions on Image Processing, vol. 24, no. 3, pp. 980-993.
Face images captured in unconstrained environments usually contain significant pose variation, which dramatically degrades the performance of algorithms designed to recognize frontal faces. This paper proposes a novel face identification framework capable of handling the full range of pose variations within ±90° of yaw. The proposed framework first transforms the original pose-invariant face recognition problem into a partial frontal face recognition problem. A robust patch-based face representation scheme is then developed to represent the synthesized partial frontal faces. For each patch, a transformation dictionary is learnt under the proposed multitask learning scheme. The transformation dictionary transforms the features of different poses into a discriminative subspace. Finally, face matching is performed at patch level rather than at the holistic level. Extensive and systematic experimentation on FERET, CMU-PIE, and Multi-PIE databases shows that the proposed method consistently outperforms single-task-based baselines as well as state-of-the-art methods for the pose problem. We further extend the proposed algorithm for the unconstrained face verification problem and achieve top-level performance on the challenging LFW data set.
Zeng, X., Bian, W., Liu, W., Shen, J. & Tao, D. 2015, 'Dictionary Pair Learning on Grassmann Manifolds for Image Denoising', IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 4556-4569.
© 2015 IEEE. Image denoising is a fundamental problem in computer vision and image processing that holds considerable practical importance for real-world applications. The traditional patch-based and sparse coding-driven image denoising methods convert 2D image patches into 1D vectors for further processing. Thus, these methods inevitably break down the inherent 2D geometric structure of natural images. To overcome this limitation pertaining to the previous image denoising methods, we propose a 2D image denoising model, namely, the dictionary pair learning (DPL) model, and we design a corresponding algorithm called the DPL on the Grassmann-manifold (DPLG) algorithm. The DPLG algorithm first learns an initial dictionary pair (i.e., the left and right dictionaries) by employing a subspace partition technique on the Grassmann manifold, wherein the refined dictionary pair is obtained through a sub-dictionary pair merging. The DPLG obtains a sparse representation by encoding each image patch only with the selected sub-dictionary pair. The non-zero elements of the sparse representation are further smoothed by the graph Laplacian operator to remove the noise. Consequently, the DPLG algorithm not only preserves the inherent 2D geometric structure of natural images but also performs manifold smoothing in the 2D sparse coding space. Experimental evaluations on benchmark images and the Berkeley segmentation data sets demonstrate that the DPLG algorithm also improves the structural similarity (SSIM) values, a measure of perceptual visual quality, of denoised images. Moreover, the DPLG produces peak signal-to-noise ratio values that are competitive with those of popular image denoising algorithms.
Ding, C. & Tao, D. 2015, 'Robust Face Recognition via Multimodal Deep Face Representation', IEEE Transactions on Multimedia, vol. 17, no. 11, pp. 2049-2058.
© 2015 IEEE. Face images appearing in multimedia applications, e.g., social networks and digital entertainment, usually exhibit dramatic pose, illumination, and expression variations, resulting in considerable performance degradation for traditional face recognition algorithms. This paper proposes a comprehensive deep learning framework to jointly learn face representation using multimodal information. The proposed deep learning structure is composed of a set of elaborately designed convolutional neural networks (CNNs) and a three-layer stacked auto-encoder (SAE). The set of CNNs extracts complementary facial features from multimodal data. Then, the extracted features are concatenated to form a high-dimensional feature vector, whose dimension is compressed by SAE. All of the CNNs are trained using a subset of 9,000 subjects from the publicly available CASIA-WebFace database, which ensures the reproducibility of this work. Using the proposed single CNN architecture and limited training data, 98.43% verification rate is achieved on the LFW database. Benefitting from the complementary information contained in multimodal data, our small ensemble system achieves higher than 99.0% recognition rate on LFW using publicly available training set.
He, X., Luo, S., Tao, D., Xu, C. & Yang, J. 2015, 'The 21st International Conference on MultiMedia Modeling', IEEE Multimedia, vol. 22, no. 2, pp. 86-88.
© 2015 IEEE. This report on The 21st International Conference on MultiMedia Modeling provides an overview of the best papers and keynote presentations. It also reviews the special sessions on Personal (Big) Data Modeling for Information Access and Retrieval; Social Geo-Media Analytics and Retrieval; and Image or Video Processing, Semantic Analysis, and Understanding.
Ou, W., You, X., Tao, D., Zhang, P., Tang, Y. & Zhu, Z. 2014, 'Robust face recognition via occlusion dictionary learning', Pattern Recognition, vol. 47, no. 4, pp. 1559-1572.
Sparse representation based classification (SRC) has recently been proposed for robust face recognition. To deal with occlusion, SRC introduces an identity matrix as an occlusion dictionary on the assumption that the occlusion has sparse representation in this dictionary. However, the results show that SRC's use of this occlusion dictionary is not nearly as robust to large occlusion as it is to random pixel corruption. In addition, the identity matrix renders the expanded dictionary large, which results in expensive computation. In this paper, we present a novel method, namely structured sparse representation based classification (SSRC), for face recognition with occlusion. A novel structured dictionary learning method is proposed to learn an occlusion dictionary from the data instead of an identity matrix. Specifically, a mutual incoherence of dictionaries regularization term is incorporated into the dictionary learning objective function, which encourages the occlusion dictionary to be as independent as possible of the training sample dictionary. As a result, the occlusion can be sparsely represented by a linear combination of the atoms from the learned occlusion dictionary and effectively separated from the occluded face image. The classification can thus be efficiently carried out on the recovered non-occluded face images, and the size of the expanded dictionary is also much smaller than that used in SRC. Extensive experiments demonstrate that the proposed method achieves better results than the existing sparse representation based face recognition methods, especially in dealing with large region contiguous occlusion and severe illumination variation, while the computational cost is much lower.
Liu, W., Tao, D., Cheng, J. & Tang, Y. 2014, 'Multiview Hessian discriminative sparse coding for image annotation', Computer Vision and Image Understanding, vol. 118, pp. 50-60.
Sparse coding represents a signal sparsely by using an overcomplete dictionary, and obtains promising performance in practical computer vision applications, especially for signal restoration tasks such as image denoising and image inpainting. In recent years, many discriminative sparse coding algorithms have been developed for classification problems, but they cannot naturally handle visual data represented by multiview features. In addition, existing sparse coding algorithms use graph Laplacian to model the local geometry of the data distribution. It has been identified that Laplacian regularization biases the solution towards a constant function which possibly leads to poor extrapolating power. In this paper, we present multiview Hessian discriminative sparse coding (mHDSC) which seamlessly integrates Hessian regularization with discriminative sparse coding for multiview learning problems. In particular, mHDSC exploits Hessian regularization to steer the solution which varies smoothly along geodesics in the manifold, and treats the label information as an additional view of feature for incorporating the discriminative power for image annotation. We conduct extensive experiments on PASCAL VOC07 dataset and demonstrate the effectiveness of mHDSC for image annotation.
Qiao, M., Cheng, J., Bian, W. & Tao, D. 2014, 'Biview Learning for Human Posture Segmentation from 3D Points Cloud', PLoS One, vol. 9, no. 1, p. e85811.
Posture segmentation plays an essential role in human motion analysis. The state-of-the-art method extracts sufficiently high-dimensional features from 3D depth images for each 3D point and learns an efficient body part classifier. However, high-dimensional features are memory-consuming and difficult to handle on large-scale training dataset. In this paper, we propose an efficient two-stage dimension reduction scheme, termed biview learning, to encode two independent views which are depth-difference features (DDF) and relative position features (RPF). Biview learning explores the complementary property of DDF and RPF, and uses two stages to learn a compact yet comprehensive low-dimensional feature space for posture segmentation. In the first stage, discriminative locality alignment (DLA) is applied to the high-dimensional DDF to learn a discriminative low-dimensional representation. In the second stage, canonical correlation analysis (CCA) is used to explore the complementary property of RPF and the dimensionality reduced DDF. Finally, we train a support vector machine (SVM) over the output of CCA. We carefully validate the effectiveness of DLA and CCA utilized in the two-stage scheme on our 3D human points cloud dataset. Experimental results show that the proposed biview learning scheme significantly outperforms the state-of-the-art method for human posture segmentation.
You, X., Wang, R. & Tao, D. 2014, 'Diverse Expected Gradient Active Learning for Relative Attributes', IEEE Transactions On Image Processing, vol. 23, no. 7, pp. 3203-3217.
Lou, Y., Liu, T. & Tao, D. 2014, 'Decomposition-Based Transfer Distance Metric Learning for Image Classification', IEEE Transactions On Image Processing, vol. 23, no. 9, pp. 3789-3801.
Liu, W., Li, Y., Lin, X., Tao, D. & Wang, Y. 2014, 'Hessian-regularized co-training for social activity recognition', PLoS One, vol. 9, no. 9, p. e108474.
Co-training is a major multi-view learning paradigm that alternately trains two classifiers on two distinct views and maximizes the mutual agreement on the two-view unlabeled data. Traditional co-training algorithms usually train a learner on each view separately and then force the learners to be consistent across views. Although many co-trainings have been developed, it is quite possible that a learner will receive erroneous labels for unlabeled data when the other learner has only mediocre accuracy. This usually happens in the first rounds of co-training, when there are only a few labeled examples. As a result, co-training algorithms often have unstable performance. In this paper, Hessian-regularized co-training is proposed to overcome these limitations. Specifically, each Hessian is obtained from a particular view of examples; Hessian regularization is then integrated into the learner training process of each view by penalizing the regression function along the potential manifold. Hessian can properly exploit the local structure of the underlying data manifold. Hessian regularization significantly boosts the generalizability of a classifier, especially when there are a small number of labeled examples and a large number of unlabeled examples. To evaluate the proposed method, extensive experiments were conducted on the unstructured social activity attribute (USAA) dataset for social activity recognition. Our results demonstrate that the proposed method outperforms baseline methods, including the traditional co-training and LapCo algorithms.
Han, B., Zhao, X., Tao, D., Li, X., Hu, Z. & Hu, H. 2014, 'Dayside aurora classification via BIFs-based sparse representation using manifold learning', International Journal of Computer Mathematics, vol. 91, no. 11, pp. 2415-2426.
© 2013, Taylor & Francis. Aurora is the typical ionosphere track generated by the interaction of solar wind and magnetosphere, whose modality and variation are significant to the study of space weather activity. A new aurora classification algorithm based on biologically inspired features (BIFs) and discriminative locality alignment (DLA) is proposed in this paper. First, an aurora image is represented by the BIFs, which combine the C1 units from the hierarchical model of object recognition in cortex and the gist features from the saliency map; then, the manifold learning method called DLA is used to obtain the effective sparse representation for auroras based on BIFs; finally, classification results using support vector machine and nearest neighbour with three sets of features (the C1 unit features, the gist features and the BIFs) illustrate the effectiveness and robustness of our method on the real aurora image database from Chinese Arctic Yellow River Station.
You, X., Li, Q., Tao, D., Ou, W. & Gong, M. 2014, 'Local metric learning for exemplar-based object detection', IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 8, pp. 1265-1276.
Object detection has been widely studied in the computer vision community and it has many real applications, despite its variations, such as scale, pose, lighting, and background. Most classical object detection methods heavily rely on category-based training to handle intra-class variations. In contrast to classical methods that use a rigid category-based representation, exemplar-based methods try to model variations among positives by learning from specific positive samples. However, existing exemplar-based methods either fail to use any training information or suffer from a significant performance drop when few exemplars are available. In this paper, we design a novel local metric learning approach to handle the exemplar-based object detection task. The main contributions are twofold: 1) a novel local metric learning algorithm called exemplar metric learning (EML) is designed and 2) an exemplar-based object detection algorithm based on EML is implemented. We evaluate our method on two generic object detection data sets: UIUC-Car and UMass FDDB. Experiments show that compared with other exemplar-based methods, our approach can effectively enhance object detection performance when few exemplars are available. © 2014 IEEE.
Bian, W. & Tao, D. 2014, 'Asymptotic generalization bound of fisher's linear discriminant analysis', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 12, pp. 2325-2337.
© 1979-2012 IEEE. Fisher's linear discriminant analysis (FLDA) is an important dimension reduction method in statistical pattern recognition. It has been shown that FLDA is asymptotically Bayes optimal under the homoscedastic Gaussian assumption. However, this classical result has the following two major limitations: 1) it holds only for a fixed dimensionality $D$, and thus does not apply when $D$ and the training sample size $N$ are proportionally large; 2) it does not provide a quantitative description on how the generalization ability of FLDA is affected by $D$ and $N$. In this paper, we present an asymptotic generalization analysis of FLDA based on random matrix theory, in a setting where both $D$ and $N$ increase and $D/N \to \gamma \in [0,1)$. The obtained lower bound of the generalization discrimination power overcomes both limitations of the classical result, i.e., it is applicable when $D$ and $N$ are proportionally large and provides a quantitative description of the generalization ability of FLDA in terms of the ratio $\gamma = D/N$ and the population discrimination power. Besides, the discrimination power bound also leads to an upper bound on the generalization error of binary-classification with FLDA.
Hong, R.C., Wang, M., Gao, Y., Tao, D.C., Li, X.L. & Wu, X.D. 2014, 'Image Annotation by Multiple-Instance Learning With Discriminative Feature Mapping and Selection', IEEE Transactions on Cybernetics, vol. 44, no. 5, pp. 669-680.
Multiple-instance learning (MIL) has been widely investigated in image annotation for its capability of exploring region-level visual information of images. Recent studies show that, by performing feature mapping, MIL can be cast to a single-instance learning problem and, thus, can be solved by traditional supervised learning methods. However, the approaches for feature mapping usually overlook the discriminative ability and the noises of the generated features. In this paper, we propose an MIL method with discriminative feature mapping and feature selection, aiming at solving this problem. Our method is able to explore both the positive and negative concept correlations. It can also select the effective features from a large and diverse set of low-level features for each concept under MIL settings. Experimental results and comparison with other methods demonstrate the effectiveness of our approach.
Hou, W., Gao, X., Tao, D. & Li, X. 2013, 'Visual Saliency Detection Using Information Divergence', Pattern Recognition, vol. 46, no. 10, pp. 2658-2669.
The technique of visual saliency detection supports video surveillance systems by reducing redundant information and highlighting the critical, visually important regions. It follows that information about the image might be of great importance in depict
Cheng, J., Liu, J., Xu, Y., Yin, F., Wong, D., Tan, N., Tao, D., Cheng, C., Aung, T. & Wong, T. 2013, 'Superpixel Classification Based Optic Disc And Optic Cup Segmentation For Glaucoma Screening', IEEE Transactions on Medical Imaging, vol. 32, no. 6, pp. 1019-1032.
Glaucoma is a chronic eye disease that leads to vision loss. As it cannot be cured, detecting the disease in time is important. Current tests using intraocular pressure (IOP) are not sensitive enough for population based glaucoma screening. Optic nerve head assessment in retinal fundus images is both more promising and superior. This paper proposes optic disc and optic cup segmentation using superpixel classification for glaucoma screening. In optic disc segmentation, histograms, and center surround statistics are used to classify each superpixel as disc or non-disc. A self-assessment reliability score is computed to evaluate the quality of the automated optic disc segmentation. For optic cup segmentation, in addition to the histograms and center surround statistics, the location information is also included into the feature space to boost the performance. The proposed segmentation methods have been evaluated in a database of 650 images with optic disc and optic cup boundaries manually marked by trained professionals. Experimental results show an average overlapping error of 9.5% and 24.1% in optic disc and optic cup segmentation, respectively. The results also show an increase in overlapping error as the reliability score is reduced, which justifies the effectiveness of the self-assessment. The segmented optic disc and optic cup are then used to compute the cup to disc ratio for glaucoma screening. Our proposed method achieves areas under curve of 0.800 and 0.822 in two data sets, which is higher than other methods. The methods can be used for segmentation and glaucoma screening. The self-assessment will be used as an indicator of cases with large errors and enhance the clinical deployment of the automatic segmentation and screening.
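The two quantities reported in the abstract above can be made concrete with a small sketch. It assumes the commonly used definitions, which the abstract itself does not spell out: overlapping error as one minus the ratio of intersection area to union area between the automated and manual masks, and cup to disc ratio as vertical cup diameter divided by vertical disc diameter. The function names and the toy binary masks are hypothetical.

```python
def overlap_error(segmented, reference):
    """1 - |S intersect G| / |S union G| for two binary masks (lists of rows)."""
    inter = union = 0
    for row_s, row_r in zip(segmented, reference):
        for s, r in zip(row_s, row_r):
            inter += 1 if (s and r) else 0
            union += 1 if (s or r) else 0
    if union == 0:
        raise ValueError("both masks are empty")
    return 1.0 - inter / union

def vertical_diameter(mask):
    """Number of rows spanned by the foreground of a binary mask."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    return (rows[-1] - rows[0] + 1) if rows else 0

def cup_to_disc_ratio(cup_mask, disc_mask):
    """Vertical cup diameter divided by vertical disc diameter."""
    disc = vertical_diameter(disc_mask)
    if disc == 0:
        raise ValueError("disc mask is empty")
    return vertical_diameter(cup_mask) / disc

# Hypothetical 4 x 4 masks: the disc fills the image, the cup spans two rows.
disc = [[1, 1, 1, 1] for _ in range(4)]
cup = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
print("CDR:", cup_to_disc_ratio(cup, disc))
print("overlap error of cup vs disc:", overlap_error(cup, disc))
```

In a screening setting a larger cup to disc ratio is generally treated as more suspicious, so the ratio would be compared against a threshold chosen on validation data.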
Zhang, C. & Tao, D. 2013, 'Structure Of Indicator Function Classes With Finite Vapnik-chervonenkis Dimensions', IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 7, pp. 1156-1160.
The Vapnik-Chervonenkis (VC) dimension is used to measure the complexity of a function class and plays an important role in a variety of fields, including artificial neural networks and machine learning. One major concern is the relationship between the VC dimension and inherent characteristics of the corresponding function class. According to Sauer's lemma, if the VC dimension of an indicator function class $F$ is equal to $D$, the cardinality of the set $F_{S_1^N}$ will not be larger than $\sum_{d=0}^{D} C_N^d$. Therefore, there naturally arises a question about the VC dimension of an indicator function class: what kinds of elements will be contained in the function class $F$ if $F$ has a finite VC dimension? In this brief, we answer the above question. First, we investigate the structure of the function class $F$ when the cardinality of the set $F_{S_1^N}$ reaches the maximum value $\sum_{d=0}^{D} C_N^d$. Based on the derived result, we then figure out what kinds of elements will be contained in $F$ if $F$ has a finite VC dimension.
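The bound in Sauer's lemma is easy to evaluate numerically. The sketch below computes the sum of binomial coefficients and, as a sanity check, enumerates one simple class that attains it: all labelings of N points with at most D ones, which has VC dimension D when N is at least D. The function names and this example class are illustrative assumptions and are not taken from the brief.

```python
from itertools import combinations
from math import comb

def sauer_bound(n, d):
    """Sauer's bound: sum of C(n, k) for k = 0..d labelings of n points."""
    return sum(comb(n, k) for k in range(d + 1))

def small_subset_class(n, d):
    """All 0/1 labelings of n points with at most d ones (example class)."""
    labelings = set()
    for k in range(d + 1):
        for ones in combinations(range(n), k):
            labelings.add(tuple(1 if i in ones else 0 for i in range(n)))
    return labelings

n, d = 6, 2
print("bound:", sauer_bound(n, d))
print("labelings realized by the example class:", len(small_subset_class(n, d)))
```

For fixed D the bound grows polynomially in N rather than as 2^N, which is why a finite VC dimension constrains how rich the function class can be.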
Li, J., Bian, W., Tao, D. & Zhang, C. 2013, 'Learning Colours From Textures By Sparse Manifold Embedding', Signal Processing, vol. 93, no. 6, pp. 1485-1495.
The capability of inferring colours from the texture (grayscale contents) of an image is useful in many application areas, when the imaging device/environment is limited. Traditional manual or limited automatic colour assignment involves intensive human
Song, M., Tao, D., Sun, S., Chen, C. & Bu, J. 2013, 'Joint sparse learning for 3-D facial expression generation', IEEE Transactions On Image Processing, vol. 22, no. 8, pp. 3283-3295.
3-D facial expression generation, including synthesis and retargeting, has received intensive attention in recent years, because it is important to produce realistic 3-D faces with specific expressions in modern film production and computer games. In this paper, we present joint sparse learning (JSL) to learn mapping functions and their respective inverses to model the relationship between the high-dimensional 3-D faces (of different expressions and identities) and their corresponding low-dimensional representations. Based on JSL, we can effectively and efficiently generate various expressions of a 3-D face by either synthesizing or retargeting. Furthermore, JSL is able to restore 3-D faces with holes by learning a mapping function between incomplete and intact data. Experimental results on a wide range of 3-D faces demonstrate the effectiveness of the proposed approach by comparing with representative ones in terms of quality, time cost, and robustness.
Zhang, L., Zhang, L., Tao, D. & Huang, X. 2013, 'A modified stochastic neighbor embedding for multi-feature dimension reduction of remote sensing images', ISPRS Journal of Photogrammetry and Remote Sensing, vol. 83, no. 1, pp. 30-39.
In automated remote sensing based image analysis, it is important to consider the multiple features of a certain pixel, such as the spectral signature, morphological property, and shape feature, in both the spatial and spectral domains, to improve the classification accuracy. Therefore, it is essential to consider the complementary properties of the different features and combine them in order to obtain an accurate classification rate. In this paper, we introduce a modified stochastic neighbor embedding (MSNE) algorithm for multiple features dimension reduction (DR) under a probability preserving projection framework. For each feature, a probability distribution is constructed based on t-distributed stochastic neighbor embedding (t-SNE), and we then alternately solve t-SNE and learn the optimal combination coefficients for different features in the proposed multiple features DR optimization. Compared with conventional remote sensing image DR strategies, the suggested algorithm utilizes both the spatial and spectral features of a pixel to achieve a physically meaningful low-dimensional feature representation for the subsequent classification, by automatically learning a combination coefficient for each feature. The classification results using hyperspectral remote sensing images (HSI) show that MSNE can effectively improve RS image classification performance
Liu, W. & Tao, D. 2013, 'Multiview hessian regularization for image annotation', IEEE Transactions On Image Processing, vol. 22, no. 7, pp. 2676-2687.
Wang, X., Bian, W. & Tao, D. 2013, 'Grassmannian regularized structured multi-view embedding for image classification', IEEE Transactions On Image Processing, vol. 22, no. 7, pp. 2646-2660.
Images are usually represented by features from multiple views, e.g., color and texture. In image classification, the goal is to fuse all the multi-view features in a reasonable manner and achieve satisfactory classification performance. However, the fea
Xiao, B., Gao, X., Tao, D. & Li, X. 2013, 'Biview Face Recognition In The Shape-texture Domain', Pattern Recognition, vol. 46, no. 7, pp. 1906-1919.
Face recognition is one of the biometric identification methods with the highest potential. The existing face recognition algorithms relying on the texture information of face images are affected greatly by the variation of expression, scale and illumination. Algorithms based on the shape topology weaken the influence of illumination to some extent, but the impact of expression, scale and illumination on face recognition remains unsolved. To this end, we propose a new method for face recognition by integrating texture information with shape information, called the biview face recognition algorithm. The texture models are constructed by using subspace learning methods and shape topologies are formed by building graphs for face images. The proposed biview face recognition method is compared with recognition algorithms merely based on texture or shape information. Experimental results of recognizing faces under the variation of illumination, expression and scale demonstrate that the proposed biview face recognition method outperforms texture-based and shape-based algorithms.
Li, J. & Tao, D. 2013, 'Exponential Family Factors For Bayesian Factor Analysis', IEEE Transactions On Neural Networks And Learning Systems, vol. 24, no. 6, pp. 964-976.
Expressing data as linear functions of a small number of unknown variables is a useful approach employed by several classical data analysis methods, e.g., factor analysis, principal component analysis, or latent semantic indexing. These models represent the data using the product of two factors. In practice, one important concern is how to link the learned factors to relevant quantities in the context of the application. To this end, various specialized forms of the factors have been proposed to improve interpretability. Toward developing a unified view and clarifying the statistical significance of the specialized factors, we propose a Bayesian model family. We employ exponential family distributions to specify various types of factors, which provide a unified probabilistic formulation. A Gibbs sampling procedure is constructed as a general computation routine. We verify the model by experiments, in which the proposed model is shown to be effective in both emulating existing models and motivating new model designs for particular problem settings.
Pan, Z., You, X., Chen, H., Tao, D. & Pang, B. 2013, 'Generalization Performance Of Magnitude-preserving Semi-supervised Ranking With Graph-based Regularization', Information Sciences, vol. 221, no. 1, pp. 284-296.
Semi-supervised ranking is a relatively new and important learning problem inspired by many applications. We propose a novel graph-based regularized algorithm which learns the ranking function in the semi-supervised learning framework. It can exploit the geometry of the data while preserving the magnitude of the preferences. The least squares ranking loss is adopted and the optimal solution of our model has an explicit form. We establish an error analysis of our proposed algorithm and demonstrate the relationship between predictive performance and intrinsic properties of the graph. The experiments on three datasets for a recommendation task and two quantitative structure-activity relationship datasets show that our method is effective and comparable to some other state-of-the-art algorithms for ranking.
Wang, N., Li, J., Tao, D., Li, X. & Gao, X. 2013, 'Heterogeneous Image Transformation', Pattern Recognition Letters, vol. 34, no. 1, pp. 77-84.
Heterogeneous image transformation (HIT) plays an important role in both law enforcement and digital entertainment. Some popular transformation methods, such as those based on locally linear embedding, usually generate images with lower definition and blurred details, mainly due to two defects: (1) these approaches use a fixed number of nearest neighbors (NN) to model the transformation process, i.e., they are K-NN-based methods; (2) with overlapping areas averaged, the transformed image is approximately equivalent to one filtered by a low pass filter, which removes the high frequency or detail information. These drawbacks reduce the visual quality and the recognition rate across heterogeneous images. In order to overcome these two disadvantages, a two-step framework is constructed based on sparse feature selection (SFS) and support vector regression (SVR). In the proposed model, SFS selects nearest neighbors adaptively based on sparse representation to implement an initial transformation, and subsequently the SVR model is applied to estimate the lost high frequency or detail information. Finally, by linearly superimposing these two parts, the ultimate transformed image is obtained. Extensive experiments on both a sketch-photo database and a near infrared-visible image database illustrate the effectiveness of the proposed heterogeneous image transformation method.
Mu, Y., Ding, W. & Tao, D. 2013, 'Local Discriminative Distance Metrics Ensemble Learning', Pattern Recognition, vol. 46, no. 8, pp. 2337-2349.
The ultimate goal of distance metric learning is to incorporate abundant discriminative information to keep all data samples in the same class close and those from different classes separated. Local distance metric methods can preserve discriminative information by considering the neighborhood influence. In this paper, we propose a new local discriminative distance metrics (LDDM) algorithm to learn multiple distance metrics from each training sample (a focal sample) and in the vicinity of that focal sample (focal vicinity), to optimize local compactness and local separability. Those locally learned distance metrics are used to build local classifiers which are aligned in a probabilistic framework via ensemble learning. Theoretical analysis proves the convergence rate bound, the generalization bound of the local distance metrics and the final ensemble classifier. We extensively evaluate LDDM using synthetic datasets and large benchmark UCI datasets
Zhang, C. & Tao, D. 2013, 'Risk Bounds Of Learning Processes For Levy Processes', Journal of Machine Learning Research, vol. 14, pp. 351-376.
Levy processes refer to a class of stochastic processes, for example, Poisson processes and Brownian motions, and play an important role in stochastic processes and machine learning. Therefore, it is essential to study risk bounds of the learning process
Li, J. & Tao, D. 2013, 'Simple Exponential Family PCA', IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 3, pp. 485-497.
Principal component analysis (PCA) is a widely used model for dimensionality reduction. In this paper, we address the problem of determining the intrinsic dimensionality of a general type data population by selecting the number of principal components for a generalized PCA model. In particular, we propose a generalized Bayesian PCA model, which deals with general type data by employing exponential family distributions. Model selection is realized by empirical Bayesian inference of the model. We name the model simple exponential family PCA (SePCA), since it embraces both the principle of using a simple model for data representation and the practice of using a simplified computational procedure for the inference. Our analysis shows that the empirical Bayesian inference in SePCA formally realizes an intuitive criterion for PCA model selection - a preserved principal component must sufficiently correlate to data variance that is uncorrelated to the other principal components. Experiments on synthetic and real data sets demonstrate the effectiveness of SePCA and exemplify its characteristics for model selection.
Zhang, L., Zhang, L., Tao, D. & Huang, X. 2013, 'Tensor Discriminative Locality Alignment For Hyperspectral Image Spectral-spatial Feature Extraction', IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 1, pp. 242-256.
In this paper, we propose a method for the dimensionality reduction (DR) of spectral-spatial features in hyperspectral images (HSIs), under the umbrella of multilinear algebra, i.e., the algebra of tensors. The proposed approach is a tensor extension of conventional supervised manifold-learning-based DR. In particular, we define a tensor organization scheme for representing a pixel's spectral-spatial feature and develop tensor discriminative locality alignment (TDLA) for removing redundant information for subsequent classification. The optimal solution of TDLA is obtained by alternately optimizing each mode of the input tensors. The methods are tested on three public real HSI data sets collected by hyperspectral digital imagery collection experiment, reflective optics system imaging spectrometer, and airborne visible/infrared imaging spectrometer. The classification results show significant improvements in classification accuracies while using a small number of features.
Luo, Y., Tao, D., Xu, C., Xu, C., Liu, H. & Wen, Y. 2013, 'Multiview Vector-valued Manifold Regularization For Multilabel Image Classification', IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 5, pp. 709-722.
In computer vision, image datasets used for classification are naturally associated with multiple labels and comprised of multiple views, because each image may contain several objects (e.g., pedestrian, bicycle, and tree) and is properly characterized by multiple visual features (e.g., color, texture, and shape). Currently available tools ignore either the label relationship or the view complementarity. Motivated by the success of the vector-valued function that constructs matrix-valued kernels to explore the multilabel structure in the output space, we introduce multiview vector-valued manifold regularization (MV3MR) to integrate multiple features. MV3MR exploits the complementary property of different features and discovers the intrinsic local geometry of the compact support shared by different features under the theme of manifold regularization. We conduct extensive experiments on two challenging, but popular, datasets, PASCAL VOC'07 and MIR Flickr, and validate the effectiveness of the proposed MV3MR for image classification.
Luo, Y., Tao, D., Geng, B., Xu, C. & Maybank, S. 2013, 'Manifold Regularized Multi-task Learning for Semi-supervised Multi-label Image Classification', IEEE Transactions On Image Processing, vol. 22, no. 2, pp. 523-532.
It is a significant challenge to classify images with multiple labels by using only a small number of labeled samples. One option is to learn a binary classifier for each label and use manifold regularization to improve the classification performance by exploring the underlying geometric structure of the data distribution. However, such an approach does not perform well in practice when images from multiple concepts are represented by high-dimensional visual features. Thus, manifold regularization is insufficient to control the model complexity. In this paper, we propose a manifold regularized multitask learning (MRMTL) algorithm. MRMTL learns a discriminative subspace shared by multiple classification tasks by exploiting the common structure of these tasks. It effectively controls the model complexity because different tasks limit one another's search volume, and the manifold regularization ensures that the functions in the shared hypothesis space are smooth along the data manifold. We conduct extensive experiments, on the PASCAL VOC'07 dataset with 20 classes and the MIR dataset with 38 classes, by comparing MRMTL with popular image classification algorithms. The results suggest that MRMTL is effective for image classification.
Shi, M., Xu, R., Tao, D. & Xu, C. 2013, 'W-tree Indexing for Fast Visual Word Generation', IEEE Transactions On Image Processing, vol. 22, no. 3, pp. 1209-1222.
The bag-of-visual-words representation has been widely used in image retrieval and visual recognition. The most time-consuming step in obtaining this representation is the visual word generation, i.e., assigning visual words to the corresponding local features in a high-dimensional space. Recently, structures based on multibranch trees and forests have been adopted to reduce the time cost. However, these approaches cannot perform well without a large number of backtrackings. In this paper, by considering the spatial correlation of local features, we can significantly speed up the time-consuming visual word generation process while maintaining accuracy. In particular, visual words associated with certain structures frequently co-occur; hence, we can build a co-occurrence table for each visual word for a large-scale data set. By associating each visual word with a probability according to the corresponding co-occurrence table, we can assign a probabilistic weight to each node of a certain index structure (e.g., a KD-tree and a K-means tree), in order to re-direct the searching path to be close to its global optimum within a small number of backtrackings. We carefully study the proposed scheme by comparing it with the fast library for approximate nearest neighbors and the random KD-trees on the Oxford data set. Thorough experimental results suggest the efficiency and effectiveness of the new scheme.
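The co-occurrence table mentioned in this abstract can be illustrated with a minimal sketch. It assumes that two visual words "co-occur" when they appear in the same image and that each row is normalised into conditional probabilities; the abstract does not state the paper's actual definition (it refers to the spatial correlation of local features), so the function name `cooccurrence_probabilities` and the input layout are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence_probabilities(images):
    """Build a per-word co-occurrence table from lists of visual word ids.

    images: list of lists, one list of visual word ids per image.
    Returns {word: {other_word: estimated P(other_word | word)}}.
    Assumption: co-occurrence means "appears in the same image".
    """
    counts = defaultdict(Counter)
    for words in images:
        # Count each ordered pair of distinct words once per image.
        for a, b in permutations(set(words), 2):
            counts[a][b] += 1

    table = {}
    for a, row in counts.items():
        total = sum(row.values())
        table[a] = {b: c / total for b, c in row.items()}
    return table


# Hypothetical toy data: three images described by visual word ids.
images = [[1, 2, 3], [1, 2], [2, 3]]
table = cooccurrence_probabilities(images)
```

In a tree-based index, probabilities of this kind could be used to weight candidate nodes during search; how the paper does this is not described in the abstract.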
Yu, J., Tao, D., Rui, Y. & Cheng, J. 2013, 'Pairwise Constraints Based Multiview Features Fusion for Scene Classification', Pattern Recognition, vol. 46, no. 2, pp. 483-496.
Recently, we have witnessed a surge of interest in learning a low-dimensional subspace for scene classification. The existing methods do not perform well since they do not consider scenes' multiple features from different views in low-dimensional subspace construction. In this paper, we describe scene images by finding a group of features and explore their complementary characteristics. We consider the problem of multiview dimensionality reduction by learning a unified low-dimensional subspace to effectively fuse these features. The proposed method takes both intraclass and interclass geometries into consideration; as a result, the discriminability is effectively preserved because it takes into account neighboring samples which have different labels. Due to the semantic gap, the fusion of multiview features still cannot achieve excellent performance of scene classification in real applications. Therefore, a user labeling procedure is introduced in our approach. Initially, a query image is provided by the user, and a group of images are retrieved by a search engine. After that, users label some images in the retrieved set as relevant or irrelevant to the query. The must-links are constructed between the relevant images, and the cannot-links are built between the irrelevant images. Finally, an alternating optimization procedure is adopted to integrate the complementary nature of different views with the user labeling information, and develop a novel multiview dimensionality reduction method for scene classification. Experiments are conducted on the real-world datasets of natural scenes and indoor scenes, and the results demonstrate that the proposed method has the best performance in scene classification. In addition, the proposed method can be applied to other classification problems. The experimental results of shape classification on Caltech 256 suggest the effectiveness of our method.
Shen, H., Tao, D. & Ma, D. 2013, 'Multiview Locally Linear Embedding for Effective Medical Image Retrieval', PLoS One, vol. 8, no. 12, pp. 1-21.
Content-based medical image retrieval continues to gain attention for its potential to assist radiological image interpretation and decision making. Many approaches have been proposed to improve the performance of medical image retrieval system, among wh
Liu, T., Sachdev, P., Lipnicki, D., Jiang, J., Geng, G., Zhu, W., Reppermund, S., Tao, D., Trollor, J., Brodaty, H. & Wen, W. 2013, 'Limited relationships between two-year changes in sulcal morphology and other common neuroimaging indices in the elderly', Neuroimage, vol. 83, no. 1, pp. 12-17.
Measuring the geometry or morphology of sulcal folds has recently become an important approach to investigating neuroanatomy. However, relationships between cortical sulci and other brain structures are poorly understood. The present study investigates h
Du, B., Zhang, L., Tao, D. & Zhang, D. 2013, 'Unsupervised transfer learning for target detection from hyperspectral images', Neurocomputing, vol. 120, no. 1, pp. 72-82.
Target detection has been of great interest in hyperspectral image analysis. Feature extraction from target samples and counterpart backgrounds constitutes the key to the problem. Traditional target detection methods depend on comparatively fixed feature for
Peng, B., Wu, J., Yuan, H., Guo, Q. & Tao, D. 2013, 'ANEEC: A Quasi-Automatic System for Massive Named Entity Extraction and Categorization', Computer Journal, vol. 56, no. 11, pp. 1328-1346.
Named entity recognition seeks to locate atomic elements in texts and classify them into predefined categories. It is essentially useful for many applications, including microblog analysis and query suggestion. In recent years, with the explosion of Web
Li, J. & Tao, D. 2013, 'A Bayesian Hierarchical Factorization Model for Vector Fields', IEEE Transactions On Image Processing, vol. 22, no. 11, pp. 4510-4521.
Factorization-based techniques explain arrays of observations using a relatively small number of factors and provide an essential arsenal for multi-dimensional data analysis. Most factorization models are, however, developed on general arrays of scalar v
Zhang, K., Gao, X., Tao, D. & Li, X. 2013, 'Single Image Super-Resolution With Multiscale Similarity Learning', IEEE Transactions On Neural Networks And Learning Systems, vol. 24, no. 10, pp. 1648-1659.
Example learning-based image super-resolution (SR) is recognized as an effective way to produce a high-resolution (HR) image with the help of an external training set. The effectiveness of learning-based SR methods, however, depends highly upon the consi
Wang, N., Tao, D., Gao, X., Li, X. & Li, J. 2013, 'Transductive Face Sketch-Photo Synthesis', IEEE Transactions On Neural Networks And Learning Systems, vol. 24, no. 9, pp. 1364-1376.
Face sketch-photo synthesis plays a critical role in many applications, such as law enforcement and digital entertainment. Recently, many face sketch-photo synthesis methods have been proposed under the framework of inductive learning, and these have obt
Zhen, X., Shao, L., Tao, D. & Li, X. 2013, 'Embedding Motion and Structure Features for Action Recognition', IEEE Transactions On Circuits And Systems For Video Technology, vol. 23, no. 7, pp. 1182-1190.
We propose a novel method to model human actions by explicitly coding motion and structure features that are separately extracted from video sequences. Firstly, the motion template (one feature map) is applied to encode the motion information and image p
Tao, D., Wang, D. & Murtagh, F. 2013, 'Machine learning in intelligent image processing', Signal Processing, vol. 93, no. 6, pp. 1399-1400.
Zhou, T. & Tao, D. 2013, 'Double Shrinking for Sparse Dimension Reduction', IEEE Transactions On Image Processing, vol. 22, no. 1, pp. 244-257.
Learning tasks such as classification and clustering usually perform better and cost less (time and space) on compressed representations than on the original data. Previous works mainly compress data via dimension reduction. In this paper, we propose double shrinking to compress image data on both dimensionality and cardinality via building either sparse low-dimensional representations or a sparse projection matrix for dimension reduction. We formulate a double shrinking model (DSM) as an ℓ1-regularized variance maximization with constraint ||x||_2 = 1, and develop a double shrinking algorithm (DSA) to optimize DSM. DSA is a path-following algorithm that can build the whole solution path of locally optimal solutions of different sparse levels. Each solution on the path is a warm start for searching the next sparser one. In each iteration of DSA, the direction, the step size, and the Lagrangian multiplier are deduced from the Karush-Kuhn-Tucker conditions. The magnitudes of trivial variables are shrunk and the importance of critical variables is simultaneously augmented along the selected direction with the determined step length. Double shrinking can be applied to manifold learning and feature selection for better interpretation of features, and can be combined with classification and clustering to boost their performance. The experimental results suggest that double shrinking produces efficient and effective data compression.
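One plausible reading of the model described in this abstract is the standard penalised form below. The abstract gives only the verbal description, so the symbols are assumptions: $\Sigma$ denotes a sample covariance matrix and $\lambda \ge 0$ a regularisation weight controlling sparsity. The formulation in the paper itself may differ.

```latex
\max_{x \in \mathbb{R}^{d}} \; x^{\top} \Sigma x \;-\; \lambda \, \|x\|_{1}
\quad \text{subject to} \quad \|x\|_{2} = 1
```

Under this reading, increasing $\lambda$ yields progressively sparser solutions, which matches the abstract's description of a solution path over different sparsity levels.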
Gao, X., Gao, F., Tao, D. & Li, X. 2013, 'Universal Blind Image Quality Assessment Metrics Via Natural Scene Statistics and Multiple Kernel Learning', IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 12, pp. 2013-2026.
Universal blind image quality assessment (IQA) metrics that can work for various distortions are of great importance for image processing systems, because in practice ground truths are unavailable and the distortion types are not always known. Existing state-of-the-art universal blind IQA algorithms are developed based on natural scene statistics (NSS). Although NSS-based metrics obtained promising performance, they have some limitations: 1) they use either the Gaussian scale mixture model or generalized Gaussian density to predict the non-Gaussian marginal distribution of wavelet, Gabor, or discrete cosine transform coefficients. The prediction error makes the extracted features unable to reflect the change in non-Gaussianity (NG) accurately. The existing algorithms use the joint statistical model and structural similarity to model the local dependency (LD). Although this LD essentially encodes the information redundancy in natural images, these models do not use information divergence to measure the LD. Although the exponential decay characteristic (EDC) represents the property of natural images that large/small wavelet coefficient magnitudes tend to be persistent across scales, which is highly correlated with image degradations, it has not been applied to the universal blind IQA metrics; and 2) all the universal blind IQA metrics use the same similarity measure for different features for learning the universal blind IQA metrics, though these features have different properties. To address the aforementioned problems, we propose to construct new universal blind quality indicators using all the three types of NSS, i.e., the NG, LD, and EDC, and incorporating the heterogeneous property of multiple kernel learning (MKL). By analyzing how different distortions affect these statistical properties, we present two universal blind quality assessment models, NSS global scheme and NSS two-step scheme. In the proposed metrics: 1) we exploit the NG of natural images u...
Cheng, J., Bian, W. & Tao, D. 2013, 'Locally regularized sliced inverse regression based 3D hand gesture recognition on a dance robot', Information Sciences, vol. 221, pp. 274-283.
Gesture recognition plays an important role in human machine interactions (HMIs) for multimedia entertainment. In this paper, we present a dimension reduction based approach for dynamic real-time hand gesture recognition. The hand gestures are recorded as acceleration signals by using a handheld device with a 3-axis accelerometer sensor installed, and represented by discrete cosine transform (DCT) coefficients. To recognize different hand gestures, we develop a new dimension reduction method, locally regularized sliced inverse regression (LR-SIR), to find an effective low dimensional subspace, in which different hand gestures are well separable, following which recognition can be performed by using simple and efficient classifiers, e.g., nearest mean, k-nearest-neighbor rule and support vector machine. LR-SIR is built upon the well-known sliced inverse regression (SIR), but overcomes its limitation that it ignores the local geometry of the data distribution. Besides, LR-SIR can be effectively and efficiently solved by eigen-decomposition. Finally, we apply the LR-SIR based gesture recognition to control our recently developed dance robot for multimedia entertainment. Thorough empirical studies on digits-gesture recognition suggest the effectiveness of the new gesture recognition scheme for HMI.
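The DCT representation of an acceleration signal mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: it uses an unnormalised DCT-II and keeps the first few coefficients per axis, whereas the abstract does not say which DCT variant, normalisation, or number of coefficients the authors use. The names `dct2`, `gesture_features`, and `num_coeffs` are hypothetical, and the LR-SIR step itself is not shown.

```python
import math


def dct2(signal):
    """Unnormalised DCT-II of a 1-D sequence of numbers."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        for k in range(n)
    ]


def gesture_features(accel, num_coeffs=3):
    """Concatenate the first num_coeffs DCT coefficients of each axis.

    accel: list of (ax, ay, az) samples from a 3-axis accelerometer.
    num_coeffs: assumed truncation length, not taken from the paper.
    """
    features = []
    for axis in zip(*accel):  # iterate over the x, y and z channels
        features.extend(dct2(axis)[:num_coeffs])
    return features


# Hypothetical recording: eight samples of one gesture.
samples = [
    (0.0, 1.0, 9.8), (0.4, 1.1, 9.7), (0.9, 1.0, 9.6), (1.2, 0.8, 9.8),
    (0.9, 0.5, 9.9), (0.3, 0.2, 9.8), (-0.2, 0.1, 9.8), (0.0, 0.0, 9.8),
]
features = gesture_features(samples, num_coeffs=3)
```

Low-order coefficients summarise the slowly varying shape of the gesture, which gives every recording a fixed-length feature vector before any dimension reduction is applied.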
Cheng, J., Xie, C., Bian, W. & Tao, D. 2012, 'Feature fusion for 3D hand gesture recognition by learning a shared hidden space', Pattern Recognition Letters, vol. 33, no. 4, pp. 476-484.
Hand gesture recognition has been intensively applied in various human-computer interaction (HCI) systems. Different hand gesture recognition methods were developed based on particular features, e.g., gesture trajectories and acceleration signals. However, it has been noticed that the limitation of either feature can lead to flaws of an HCI system. In this paper, to overcome the limitations but combine the merits of both features, we propose a novel feature fusion approach for 3D hand gesture recognition. In our approach, gesture trajectories are represented by the intersection numbers with randomly generated line segments on their 2D principal planes; acceleration signals are represented by the coefficients of discrete cosine transformation (DCT). Then, a hidden space shared by the two features is learned by using penalized maximum likelihood estimation (MLE). An iterative algorithm, composed of two steps per iteration, is derived for this penalized MLE, in which the first step is to solve a standard least-squares problem and the second step is to solve a Sylvester equation. We tested our hand gesture recognition approach on different hand gesture sets. Results confirm the effectiveness of the feature fusion method.
Zhang, L., Zhang, L., Tao, D. & Huang, X. 2012, 'On Combining Multiple Features for Hyperspectral Remote Sensing Image Classification', IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 3, pp. 879-893.
In hyperspectral remote sensing image classification, multiple features, e.g., spectral, texture, and shape features, are employed to represent pixels from different perspectives. It has been widely acknowledged that properly combining multiple features always results in good classification performance. In this paper, we introduce the patch alignment framework to linearly combine multiple features in the optimal way and obtain a unified low-dimensional representation of these multiple features for subsequent classification. Each feature has its particular contribution to the unified representation determined by simultaneously optimizing the weights in the objective function. This scheme considers the specific statistical properties of each feature to achieve a physically meaningful unified low-dimensional representation of multiple features. Experiments on the classification of the hyperspectral digital imagery collection experiment and reflective optics system imaging spectrometer hyperspectral data sets suggest that this scheme is effective.
Song, M., Tao, D., Chen, C., Bu, J., Luo, J. & Zhang, C. 2012, 'Probabilistic Exposure Fusion', IEEE Transactions On Image Processing, vol. 21, no. 1, pp. 341-357.
The luminance of a natural scene is often of high dynamic range (HDR). In this paper, we propose a new scheme to handle HDR scenes by integrating locally adaptive scene detail capture and suppressing gradient reversals introduced by the local adaptation. The proposed scheme is novel for capturing an HDR scene by using a standard dynamic range (SDR) device and synthesizing an image suitable for SDR displays. In particular, we use an SDR capture device to record scene details (i.e., the visible contrasts and the scene gradients) in a series of SDR images with different exposure levels. Each SDR image responds to a fraction of the HDR and partially records scene details. With the captured SDR image series, we first calculate the image luminance levels, which maximize the visible contrasts, and then the scene gradients embedded in these images. Next, we synthesize an SDR image by using a probabilistic model that preserves the calculated image luminance levels and suppresses reversals in the image luminance gradients. The synthesized SDR image contains many more scene details than any of the captured SDR images. Moreover, the proposed scheme also functions as the tone mapping of an HDR image to the SDR image, and it is superior to both global and local tone mapping operators. This is because global operators fail to preserve visual details when the contrast ratio of a scene is large, whereas local operators often produce halos in the synthesized SDR image. The proposed scheme does not require any human interaction or parameter tuning for different scenes. Subjective evaluations have shown that it is preferred over a number of existing approaches.
Song, M., Tao, D., Huang, X., Chen, C. & Bu, J. 2012, 'Three-Dimensional Face Reconstruction From A Single Image By A Coupled RBF Network', IEEE Transactions On Image Processing, vol. 21, no. 5, pp. 2887-2897.
Reconstruction of a 3-D face model from a single 2-D face image is fundamentally important for face recognition and animation because the 3-D face model is invariant to changes of viewpoint, illumination, background clutter, and occlusions. Given a coupl
Huang, Q., Tao, D., Li, X. & Liew, A. 2012, 'Parallelized Evolutionary Learning for detection of Biclusters in Gene Expression Data', IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 2, pp. 560-570.
The analysis of gene expression data obtained from microarray experiments is important for discovering the biological process of genes. Biclustering algorithms have been proven to be able to group the genes with similar expression patterns under a number of experimental conditions. In this paper, we propose a new biclustering algorithm based on evolutionary learning. By converting the biclustering problem into a common clustering problem, the algorithm can be applied in a search space constructed by the conditions. To further reduce the size of the search space, we randomly separate the full conditions into a number of condition subsets (subspaces), each of which has a smaller number of conditions. The algorithm is applied to each subspace and is able to discover bicluster seeds within a limited computing time. Finally, an expanding and merging procedure is employed to combine the bicluster seeds into larger biclusters according to a homogeneity criterion. We test the performance of the proposed algorithm using synthetic and real microarray data sets. Compared with several previously developed biclustering algorithms, our algorithm demonstrates a significant improvement in discovering additive biclusters.
Liu, T.T., Lipnicki, D., Zhu, W., Tao, D., Zhang, C., Cui, Y., Jin, J., Sachdev, P. & Wen, W. 2012, 'Cortical Gyrification And Sulcal Spans In Early Stage Alzheimer's Disease', PLoS One, vol. 7, no. 2, pp. 1-5.
Alzheimer's disease (AD) is characterized by an insidious onset of progressive cerebral atrophy and cognitive decline. Previous research suggests that cortical folding and sulcal width are associated with cognitive function in elderly individuals, and th
Tang, J., Zha, Z., Tao, D. & Chua, T. 2012, 'Semantic-Gap-Oriented Active Learning For Multilabel Image Annotation', IEEE Transactions On Image Processing, vol. 21, no. 4, pp. 2354-2360.
User interaction is an effective way to handle the semantic gap problem in image annotation. To minimize user effort in the interactions, many active learning methods were proposed. These methods treat the semantic concepts individually or correlatively.
Zheng, S., Huang, K., Tan, T. & Tao, D. 2012, 'A Cascade Fusion Scheme For Gait And Cumulative Foot Pressure Image Recognition', Pattern Recognition, vol. 45, no. 10, pp. 3603-3610.
Cumulative foot pressure images represent the 2D ground reaction force during one gait cycle. Biomedical and forensic studies show that humans can be distinguished by unique limb movement patterns and ground reaction force. Considering continuous gait po
Deng, X., Shen, Y., Song, M., Tao, D., Bu, J. & Chen, C. 2012, 'Video-Based Non-Uniform Object Motion Blur Estimation And Deblurring', Neurocomputing, vol. 86, no. 1, pp. 170-178.
Motion deblurring is a challenging problem in computer vision. Most previous blind deblurring approaches usually assume that the Point Spread Function (PSF) is spatially invariant. However, non-uniform motions exist ubiquitously and cannot be handled suc
Bian, W., Tao, D. & Rui, Y. 2012, 'Cross-Domain Human Action Recognition', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 42, no. 2, pp. 298-307.
Conventional human action recognition algorithms cannot work well when the amount of training videos is insufficient. We solve this problem by proposing a transfer topic model (TTM), which utilizes information extracted from videos in the auxiliary domai
Yu, J.X., Bian, W., Song, M., Cheng, J.L. & Tao, D. 2012, 'Graph Based Transductive Learning For Cartoon Correspondence Construction', Neurocomputing, vol. 79, pp. 105-114.
Correspondence construction of characters in key frames is the prerequisite for cartoon animations' automatic inbetweening and coloring. Since each frame of an animation consists of multiple layers, characters are complicated in terms of shape and struct
Yu, J.X., Tao, D. & Wang, M. 2012, 'Adaptive Hypergraph Learning And Its Application In Image Classification', IEEE Transactions On Image Processing, vol. 21, no. 7, pp. 3262-3272.
Recent years have witnessed a surge of interest in graph-based transductive image classification. Existing simple graph-based transductive learning methods only model the pairwise relationship of images, however, and they are sensitive to the radius para
An, L., Gao, X., Yuan, Y., Tao, D., Deng, C. & Ji, F. 2012, 'Content-Adaptive Reliable Robust Lossless Data Embedding', Neurocomputing, vol. 79, no. 1, pp. 1-11.
It is well known that robust lossless data embedding (RLDE) methods can be used to protect copyright of digital images when the intactness of host images is highly demanded and the unintentional attacks may be encountered in data communication. However,
Gao, X., Zhang, K., Tao, D. & Li, X. 2012, 'Image Super-Resolution With Sparse Neighbor Embedding', IEEE Transactions On Image Processing, vol. 21, no. 7, pp. 3194-3205.
Until now, neighbor-embedding-based (NE) algorithms for super-resolution (SR) have carried out two independent processes to synthesize high-resolution (HR) image patches. In the first process, neighbor search is performed using the Euclidean distance met
Gao, X., Zhang, K., Tao, D. & Li, X. 2012, 'Joint Learning For Single-Image Super-Resolution Via A Coupled Constraint', IEEE Transactions On Image Processing, vol. 21, no. 2, pp. 469-480.
The neighbor-embedding (NE) algorithm for single-image super-resolution (SR) reconstruction assumes that the feature spaces of low-resolution (LR) and high-resolution (HR) patches are locally isometric. However, this is not true for SR because of one-to-
Guan, N., Tao, D., Luo, Z. & Yuan, B. 2012, 'Online Nonnegative Matrix Factorization With Robust Stochastic Approximation', IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 7, pp. 1087-1099.
Nonnegative matrix factorization (NMF) has become a popular dimension-reduction method and has been widely applied to image processing and pattern recognition problems. However, conventional NMF learning methods require the entire dataset to reside in th
Zhang, Z. & Tao, D. 2012, 'Slow Feature Analysis For Human Action Recognition', IEEE Transactions On Pattern Analysis And Machine Intelligence, vol. 34, no. 3, pp. 436-450.
Slow Feature Analysis (SFA) extracts slowly varying features from a quickly varying input signal [1]. It has been successfully applied to modeling the visual receptive fields of the cortical neurons. Sufficient experimental results in neuroscience sugges
Guan, N., Tao, D., Luo, Z. & Yuan, B. 2012, 'NeNMF: An Optimal Gradient Method For Nonnegative Matrix Factorization', IEEE Transactions On Signal Processing, vol. 60, no. 6, pp. 2882-2898.
Nonnegative matrix factorization (NMF) is a powerful matrix decomposition technique that approximates a nonnegative matrix by the product of two low-rank nonnegative matrix factors. It has been widely applied to signal processing, computer vision, and da
Zhou, T., Tao, D. & Wu, X. 2012, 'Compressed Labeling On Distilled Labelsets For Multi-Label Learning', Machine Learning, vol. 88, no. 1-2, pp. 69-126.
Directly applying single-label classification methods to the multi-label learning problems substantially limits both the performance and speed due to the imbalance, dependence and high dimensionality of the given label matrix. Existing methods either ign
Li, J., Tao, D. & Li, X. 2012, 'A probabilistic model for image representation via multiple patterns', Pattern Recognition, vol. 45, no. 11, pp. 4044-4053.
For image analysis, an important extension to principal component analysis (PCA) is to treat an image as multiple samples, which helps alleviate the small sample size problem. Various schemes of transforming an image to multiple samples have been proposed. Although they have been shown to be effective in practice, the schemes are mainly based on heuristics and experience. In this paper, we propose a probabilistic PCA model, in which we explicitly represent the transformation scheme and incorporate the scheme as a stochastic component of the model. Therefore, fitting the model automatically learns the transformation. Moreover, the learned model allows us to distinguish regions that can be well described by the PCA model from those that need further treatment. Experiments on synthetic images and face data sets demonstrate the properties and utility of the proposed model.
Li, J. & Tao, D. 2012, 'On Preserving Original Variables in Bayesian PCA with Application to Image Analysis', IEEE Transactions On Image Processing, vol. 21, no. 12, pp. 4830-4843.
Principal component analysis (PCA) computes a succinct data representation by converting the data to a few new variables while retaining maximum variation. However, the new variables are difficult to interpret, because each one is combined with all of the original input variables and has obscure semantics. Under the umbrella of Bayesian data analysis, this paper presents a new prior to explicitly regularize combinations of input variables. In particular, the prior penalizes pair-wise products of the coefficients of PCA and encourages a sparse model. Compared to the commonly used ℓ1-regularizer, the proposed prior encourages the sparsity pattern in the resultant coefficients to be consistent with the intrinsic groups in the original input variables. Moreover, the proposed prior can be explained as recovering a robust estimation of the covariance matrix for PCA. The proposed model is suited for analyzing visual data, where it encourages the output variables to correspond to meaningful parts in the data. We demonstrate the characteristics and effectiveness of the proposed technique through experiments on both synthetic and real data.
Jiang, J., Cheng, J. & Tao, D. 2012, 'Color Biological Features-based Solder Paste Defects Detection And Classification On Printed Circuit Boards', IEEE Transactions on Components, Packaging, and Manufacturing Technology, vol. 2, no. 9, pp. 1536-1544.
Deposited solder paste inspection plays a critical role in surface mounting processes. When detecting solder pastes defects on a printed circuit board, profile measurement-based methods suffer from large system size, high cost, and low speed for inspecti
Zhang, C. & Tao, D. 2012, 'Generalization Bounds Of Erm-based Learning Processes For Continuous-time Markov Chains', IEEE Transactions On Neural Networks And Learning Systems, vol. 23, no. 12, pp. 1872-1883.
Many existing results on statistical learning theory are based on the assumption that samples are independently and identically distributed (i.i.d.). However, the assumption of i.i.d. samples is not suitable for practical application to problems in which
Bian, W. & Tao, D. 2012, 'Constrained Empirical Risk Minimization Framework For Distance Metric Learning', IEEE Transactions On Neural Networks And Learning Systems, vol. 23, no. 8, pp. 1194-1205.
Distance metric learning (DML) has received increasing attention in recent years. In this paper, we propose a constrained empirical risk minimization framework for DML. This framework enriches the state-of-the-art studies on both theoretic and algorithmi
Tian, X., Tao, D. & Rui, Y. 2012, 'Sparse Transfer Learning For Interactive Video Search Reranking', ACM Transactions on Multimedia Computing Communications and Applications, vol. 8, no. 3, pp. 1-19.
Visual reranking is effective to improve the performance of the text-based video search. However, existing reranking algorithms can only achieve limited improvement because of the well-known semantic gap between low-level visual features and high-level s
Gao, X., Wang, N., Tao, D. & Li, X. 2012, 'Face Sketch-photo Synthesis And Retrieval Using Sparse Representation', IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 8, pp. 1213-1226.
Sketch-photo synthesis plays an important role in sketch-based face photo retrieval and photo-based face sketch retrieval systems. In this paper, we propose an automatic sketch-photo synthesis and retrieval algorithm based on sparse representation. The p
Zhang, C., Bian, W., Tao, D. & Lin, W. 2012, 'Discretized-Vapnik-Chervonenkis Dimension For Analyzing Complexity Of Real Function Classes', IEEE Transactions On Neural Networks And Learning Systems, vol. 23, no. 9, pp. 1461-1472.
In this paper, we introduce the discretized-Vapnik-Chervonenkis (VC) dimension for studying the complexity of a real function class, and then analyze properties of real function classes and neural networks. We first prove that a countable traversal set i
An, L., Gao, X., Yuan, Y. & Tao, D. 2012, 'Robust Lossless Data Hiding Using Clustering And Statistical Quantity Histogram', Neurocomputing, vol. 77, no. 1, pp. 1-11.
Lossless data hiding methods usually fail to recover the hidden messages completely when the watermarked images are attacked. Therefore, the robust lossless data hiding (RLDH), or the robust reversible watermarking technique, is urgently needed to effect
Li, Y., Geng, B., Yang, L., Xu, C. & Bian, W. 2012, 'Query Difficulty Estimation For Image Retrieval', Neurocomputing, vol. 95, pp. 48-53.
Query difficulty estimation predicts the performance of the search result of the given query. It is a powerful tool for multimedia retrieval and receives increasing attention. It can guide the pseudo relevance feedback to rerank the image search results
Zhang, K., Mu, G., Yuan, Y., Gao, X. & Tao, D. 2012, 'Video Super-resolution With 3D Adaptive Normalized Convolution', Neurocomputing, vol. 94, pp. 140-151.
The classic multi-image-based super-resolution (SR) methods typically take global motion pattern to produce one or multiple high-resolution (HR) versions from a set of low-resolution (LR) images. However, due to the influence of aliasing and noise, it is
Han, Y., Wu, F., Tao, D., Shao, J., Zhuang, Y. & Jiang, J. 2012, 'Sparse Unsupervised Dimensionality Reduction For Multiple View Data', IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 10, pp. 1485-1496.
Different kinds of high-dimensional visual features can be extracted from a single image. Images can thus be treated as multiple view data when taking each type of extracted high-dimensional visual feature as a particular understanding of images. In this
An, L., Gao, X., Li, X., Tao, D., Deng, C. & Li, J. 2012, 'Robust Reversible Watermarking Via Clustering And Enhanced Pixel-wise Masking', IEEE Transactions On Image Processing, vol. 21, no. 8, pp. 3598-3611.
Robust reversible watermarking (RRW) methods are popular in multimedia for protecting copyright, while preserving intactness of host images and providing robustness against unintentional attacks. However, conventional RRW methods are not readily applicab
Li, Y., Geng, B., Tao, D., Zha, Z., Yang, L. & Xu, C. 2012, 'Difficulty Guided Image Retrieval Using Linear Multiple Feature Embedding', IEEE Transactions On Multimedia, vol. 14, no. 6, pp. 1618-1630.
Existing image retrieval systems suffer from a performance variance for different queries. Severe performance variance may greatly degrade the effectiveness of the subsequent query-dependent ranking optimization algorithms, especially those that utilize
Liu, X., Song, M., Zhao, Q., Tao, D., Chen, C. & Bu, J. 2012, 'Attribute-restricted Latent Topic Model For Person Re-identification', Pattern Recognition, vol. 45, no. 12, pp. 4204-4213.
Searching for specific persons from surveillance videos captured by different cameras, known as person re-identification, is a key yet under-addressed challenge. Difficulties arise from the large variations of human appearance in different poses, and fro
Cheng, J., Tao, D., Liu, J., Wong, D.W., Tan, N., Wong, T.Y. & Saw, S. 2012, 'Peripapillary Atrophy Detection By Sparse Biologically Inspired Feature Manifold', IEEE Transactions on Medical Imaging, vol. 31, no. 12, pp. 2355-2365.
Peripapillary atrophy (PPA) is an atrophy of pre-existing retina tissue. Because of its association with eye diseases such as myopia and glaucoma, PPA is an important indicator for diagnosis of these diseases. Experienced ophthalmologists are able to det
Geng, B., Li, Y., Tao, D., Wang, M., Zha, Z. & Xu, C. 2012, 'Parallel lasso for large-scale video concept detection', IEEE Transactions On Multimedia, vol. 14, no. 1, pp. 55-65.
Existing video concept detectors are generally built upon kernel-based machine learning techniques, e.g., support vector machines, regularized least squares, and logistic regression, to name a few. However, in order to build robust detectors, the learning process suffers from scalability issues, including high-dimensional multi-modality visual features and large-scale keyframe examples. In this paper, we propose parallel lasso (Plasso) by introducing parallel distributed computation to significantly improve the scalability of lasso (the ℓ1-regularized least squares). We apply the parallel incomplete Cholesky factorization to approximate the covariance statistics in the preprocessing step, and the parallel primal-dual interior-point method with the Sherman-Morrison-Woodbury formula to optimize the model parameters. For a dataset with n samples in a d-dimensional space, compared with lasso, Plasso significantly reduces complexities from the original O(d^3) for computational time and O(d^2) for storage space to O(h^2 d/m) and O(hd/m), respectively, if the system has m processors and the reduced dimension h is much smaller than the original dimension d.
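As a reminder of the identity the abstract refers to, the Sherman-Morrison-Woodbury formula can be written as below. The symbols A, U, C, V, H and λ are illustrative notation, not taken from the paper, and the second line is only a sketch of how a rank-h factor could reduce a d × d inverse to an h × h one; the paper's exact formulation may differ.

```latex
% General identity (A and C invertible, dimensions compatible):
\[
(A + U C V)^{-1} = A^{-1} - A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1}
\]
% Special case assumed here: A = \lambda I_d, U = H, C = I_h, V = H^{\top},
% with H \in \mathbb{R}^{d \times h} and \lambda > 0:
\[
(\lambda I_d + H H^{\top})^{-1}
  = \frac{1}{\lambda} \left( I_d - H \left( \lambda I_h + H^{\top} H \right)^{-1} H^{\top} \right)
\]
```

Under these assumptions, forming H^T H takes O(h^2 d) operations and the remaining inverse is only h × h, which is in line with the O(h^2 d/m) time figure quoted above when the work is split across m processors.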
Yu, J., Cheng, J. & Tao, D. 2012, 'Interactive cartoon reusing by transfer learning', Signal Processing, vol. 92, no. 9, pp. 2147-2158.
Cartoon character retrieval is critical for cartoonists to effectively and efficiently make cartoons by reusing existing cartoon data. To successfully achieve these tasks, it is essential to extract visual features to comprehensively represent cartoon characters and accurately estimate dissimilarity between cartoon characters. In this paper, we define three visual features: Hausdorff contour feature (HCF), color histogram (CH) and motion feature (MF), to characterize the shape, color and motion structure information of a cartoon character. The HCF can be regarded as an intra-feature, and the CH and MF can be regarded as inter-features. However, due to the semantic gap, cartoon retrieval using these visual features still cannot achieve excellent performance. Since labeling information has been proven effective in reducing the semantic gap, we introduce a labeling procedure called interactive cartoon labeling (ICL). The labeling information reflects the user's retrieval purpose. A new dimension reduction tool, termed sparse transfer learning (SPA-TL), is adopted to effectively and efficiently encode the user's search intention. In particular, SPA-TL exploits two kinds of knowledge, i.e., the labeling knowledge contained in labeled data and the data distribution knowledge contained in all samples (labeled and unlabeled). The low-dimensional subspace is obtained by transferring the user feedback knowledge from labeled samples to unlabeled samples while preserving the sample distribution knowledge. Experimental evaluations in cartoon synthesis suggest the effectiveness of the visual features and SPA-TL.
He, L., Wang, D., Li, X., Tao, D., Gao, X. & Fei, G. 2012, 'Color fractal structure model for reduced-reference colorful image quality assessment', Lecture Notes in Computer Science, vol. 7664, pp. 401-408.
Developing reduced-reference image quality assessment (RR-IQA) plays a vital role in predicting the visual quality of distorted images. However, most existing methods fail to take color information into consideration, although color distortion is significant given the growing number of color images. To solve this problem, this paper proposes a novel IQA method which focuses on color distortion. In particular, we extract color features based on the model of color fractal structure. Then the color and structure features are mapped into visual quality using support vector regression. Experimental results on the LIVE II database demonstrate that the proposed method is consistent with human perception, especially on images with color distortion.
Yu, J., Liu, D., Tao, D. & Seah, H. 2012, 'On Combining Multiple Features for Cartoon Character Retrieval and Clip Synthesis', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 5, pp. 1413-1427.
How do we retrieve cartoon characters accurately? And how do we synthesize new cartoon clips smoothly and efficiently from the cartoon library? Both questions are important for animators and cartoon enthusiasts who design and create new cartoons by utilizing existing cartoon materials. The first key issue in answering these questions is to find a proper representation that describes the cartoon character effectively. In this paper, we consider multiple features from different views, i.e., color histogram, Hausdorff edge feature, and skeleton feature, to represent cartoon characters with different colors, shapes, and gestures. Each visual feature reflects a unique characteristic of a cartoon character, and they are complementary to each other for retrieval and synthesis. However, how to combine the three visual features is the second key issue of our application. Simply concatenating them into a long vector runs into the so-called curse of dimensionality, let alone the heterogeneity of the different visual feature spaces. Here, we introduce a semisupervised multiview subspace learning (semi-MSL) algorithm to encode different features in a unified space. Specifically, under the patch alignment framework, semi-MSL uses the discriminative information from labeled cartoon characters in the construction of local patches, where the manifold structure revealed by unlabeled cartoon characters is utilized to capture the geometric distribution. The experimental evaluations based on both cartoon character retrieval and clip synthesis demonstrate the effectiveness of the proposed method for cartoon applications. Moreover, additional results of content-based image retrieval on benchmark data suggest the generality of semi-MSL for other applications.
Hong, Z., Mei, X. & Tao, D. 2012, 'Dual-Force Metric Learning for Robust Distracter-Resistant Tracker', Lecture Notes in Computer Science, vol. 7572, pp. 513-527.
In this paper, we propose a robust distracter-resistant tracking approach by learning a discriminative metric that adaptively learns the importance of features on the fly.
Zhang, Z., Cheng, J., Li, J., Bian, W. & Tao, D. 2012, 'Segment-Based Features for Time Series Classification', Computer Journal, vol. 55, no. 9, pp. 1088-1102.
In this paper, we propose an approach termed segment-based features (SBFs) to classify time series. The approach is inspired by the success of the component- or part-based methods of object recognition in computer vision, in which a visual object is described as a number of characteristic parts and the relations among the parts. Utilizing this idea in the problem of time series classification, a time series is represented as a set of segments and the corresponding temporal relations. First, a number of interest segments are extracted by interest point detection with automatic scale selection. Then, a number of feature prototypes are collected by random sampling from the segment set, where each feature prototype may include single segment or multiple ordered segments. Subsequently, each time series is transformed to a standard feature vector, i.e. SBF, where each entry in the SBF is calculated as the maximum response (maximum similarity) of the corresponding feature prototype to the segment set of the time series.
Yu, J., Wang, M. & Tao, D. 2012, 'Semisupervised multiview distance metric learning for cartoon synthesis', IEEE Transactions On Image Processing, vol. 21, no. 11, pp. 4636-4648.
In image processing, cartoon character classification, retrieval, and synthesis are critical, so that cartoonists can effectively and efficiently make cartoons by reusing existing cartoon data. To successfully achieve these tasks, it is essential to extract visual features that comprehensively represent cartoon characters and to construct an accurate distance metric to precisely measure the dissimilarities between cartoon characters. In this paper, we introduce three visual features, color histogram, shape context, and skeleton, to characterize the color, shape, and action, respectively, of a cartoon character. These three features are complementary to each other, and each feature set is regarded as a single view. However, it is improper to concatenate these three features into a long vector, because they have different physical properties, and simply concatenating them into a high-dimensional feature vector will suffer from the so-called curse of dimensionality. Hence, we propose a semisupervised multiview distance metric learning (SSM-DML). SSM-DML learns the multiview distance metrics from multiple feature sets and from the labels of unlabeled cartoon characters simultaneously, under the umbrella of graph-based semisupervised learning. SSM-DML discovers complementary characteristics of different feature sets through an alternating optimization-based iterative algorithm. Therefore, SSM-DML can simultaneously accomplish cartoon character classification and dissimilarity measurement. On the basis of SSM-DML, we develop a novel system that composes the modules of multiview cartoon character classification, multiview graph-based cartoon synthesis, and multiview retrieval-based cartoon synthesis. Experimental evaluations based on the three modules suggest the effectiveness of SSM-DML in cartoon applications.
Zhang, K., Gao, X., Tao, D. & Li, X. 2012, 'Single image super-resolution with non-local means and steering kernel regression', IEEE Transactions On Image Processing, vol. 21, no. 11, pp. 4544-4556.
Image super-resolution (SR) reconstruction is essentially an ill-posed problem, so it is important to design an effective prior. For this purpose, we propose a novel image SR method by learning both non-local and local regularization priors from a given low-resolution image. The non-local prior takes advantage of the redundancy of similar patches in natural images, while the local prior assumes that a target pixel can be estimated by a weighted average of its neighbors. Based on the above considerations, we utilize the non-local means filter to learn a non-local prior and the steering kernel regression to learn a local prior. By assembling the two complementary regularization terms, we propose a maximum a posteriori probability framework for SR recovery. Thorough experimental results suggest that the proposed SR method can reconstruct higher quality results both quantitatively and perceptually.
Wang, M., Li, H., Tao, D., Lu, K. & Wu, X. 2012, 'Multimodal graph-based reranking for web image search', IEEE Transactions On Image Processing, vol. 21, no. 11, pp. 4649-4661.
This paper introduces a web image search reranking approach that explores multiple modalities in a graph-based learning scheme. Different from the conventional methods that usually adopt a single modality or integrate multiple modalities into a long feature vector, our approach can effectively integrate the learning of relevance scores, weights of modalities, and the distance metric and its scaling for each modality into a unified scheme. In this way, the effects of different modalities can be adaptively modulated and better reranking performance can be achieved. We conduct experiments on a large dataset that contains more than 1000 queries and 1 million images to evaluate our approach. Experimental results demonstrate that the proposed reranking approach is more robust than using each individual modality, and it also performs better than many existing methods.
Geng, B., Tao, D., Xu, C., Yang, L. & Hua, X. 2012, 'Ensemble Manifold Regularization', IEEE Transactions On Pattern Analysis And Machine Intelligence, vol. 34, no. 6, pp. 1227-1233.
We propose an automatic approximation of the intrinsic manifold for general semi-supervised learning (SSL) problems. Unfortunately, it is not trivial to define an optimization function to obtain optimal hyperparameters. Usually, cross validation is applied, but it does not necessarily scale up. Other problems derive from the suboptimality incurred by discrete grid search and the overfitting. Therefore, we develop an ensemble manifold regularization (EMR) framework to approximate the intrinsic manifold by combining several initial guesses. Algorithmically, we designed EMR carefully so it 1) learns both the composite manifold and the semi-supervised learner jointly, 2) is fully automatic for learning the intrinsic manifold hyperparameters implicitly, 3) is conditionally optimal for intrinsic manifold approximation under a mild and reasonable assumption, and 4) is scalable for a large number of candidate manifold hyperparameters, from both time and space perspectives. Furthermore, we prove the convergence property of EMR to the deterministic matrix at rate root-n. Extensive experiments over both synthetic and real data sets demonstrate the effectiveness of the proposed framework.
Su, Y., Gao, X., Li, X. & Tao, D. 2012, 'Multivariate multilinear regression', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 6, pp. 1560-1573.
Conventional regression methods, such as multivariate linear regression (MLR) and its extension principal component regression (PCR), deal well with situations in which the data take the form of low-dimensional vectors. When the dimension grows higher, this leads to the under sample problem (USP): the dimensionality of the feature space is much higher than the number of training samples. However, little attention has been paid to this problem. This paper first conducts an in-depth investigation of the USP in PCR, which answers three questions: 1) Why does the USP arise? 2) What is the condition for the USP? and 3) How does the USP influence regression? With the help of the above analysis, the principal components selection problem of PCR is presented. Subsequently, to address the problem of PCR, a multivariate multilinear regression (MMR) model is proposed, which gives a substitute solution to MLR under the condition of multilinear objects. The basic idea of MMR is to transfer the multilinear structure of objects into the regression coefficients as a constraint. As a result, the regression problem is reduced to finding two low-dimensional coefficients, so that the principal components selection problem is avoided. Moreover, the sample size needed for solving MMR is greatly reduced, so that the USP is alleviated. As there is no closed-form solution for MMR, an alternative projection procedure is designed to obtain the regression matrices. For the sake of completeness, the computational cost and the proof of convergence are also analyzed. Furthermore, MMR is applied to model the fitting procedure in the active appearance model (AAM). Experiments conducted on both a carefully designed synthetic data set and AAM fitting databases verify the theoretical analysis.
Yu, J., Cheng, J., Wang, J. & Tao, D. 2012, 'Transductive Cartoon Retrieval by Multiple Hypergraph Learning', Lecture Notes in Computer Science, vol. 7665, pp. 269-276.
Cartoon character retrieval frequently suffers from the distance estimation problem. In this paper, a multiple hypergraph fusion based approach is presented to solve this problem. We build multiple hypergraphs on cartoon characters based on their features. In these hypergraphs, each vertex is a character, and an edge links multiple vertices. In this way, the distance estimation between characters is avoided and the high-order relationship among characters can be explored. Retrieval experiments are conducted on cartoon datasets, and the results demonstrate that the proposed approach can achieve better performance than state-of-the-art methods.
Fei, G., Tao, D., Li, X., Gao, X. & He, L. 2012, 'Local Structure Divergence Index for Image Quality Assessment', Lecture Notes in Computer Science, vol. 7667, pp. 337-344.
Image quality assessment (IQA) algorithms are important for image-processing systems, and structure information plays a significant role in the development of IQA metrics. In contrast to existing structure-driven IQA algorithms that measure the structure information using the normalized image or gradient amplitudes, we present a new Local Structure Divergence (LSD) index based on the local structures contained in an image. In particular, we exploit the steering kernels to describe local structures. Afterward, we estimate the quality of a given image by calculating the symmetric Kullback-Leibler divergence (SKLD) between kernels of the reference image and the distorted image. Experimental results on the LIVE database II show that LSD performs consistently with human perception with high confidence, and outperforms representative structure-driven IQA metrics across various distortions.
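The symmetric Kullback-Leibler divergence mentioned above can be illustrated with a minimal sketch. This is not the paper's LSD index: the steering kernels are replaced by hypothetical hand-written weight lists, the function names are made up for illustration, and the SKLD is assumed to be the plain sum KL(p||q) + KL(q||p) (some authors use half of that sum).

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1 (a discrete distribution)."""
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps guards against division by zero; terms with p_i == 0 contribute 0.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def symmetric_kl(p, q):
    """Assumed SKLD convention: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical kernel weights for one local window in each image.
reference_kernel = normalize([1.0, 2.0, 1.0])
distorted_kernel = normalize([1.0, 1.0, 2.0])
score = symmetric_kl(reference_kernel, distorted_kernel)
```

A score of zero means the two kernels are identical; larger values indicate that the local structure has changed more.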
Gao, Y., Wang, M., Tao, D., Ji, R. & Dai, Q. 2012, '3D Object Retrieval and Recognition With Hypergraph Analysis', IEEE Transactions On Image Processing, vol. 21, no. 9, pp. 4290-4303.
View-based 3-D object retrieval and recognition has become popular in practice, e.g., in computer aided design. It is difficult to precisely estimate the distance between two objects represented by multiple views. Thus, current view-based 3-D object retrieval and recognition methods may not perform well. In this paper, we propose a hypergraph analysis approach to address this problem by avoiding the estimation of the distance between objects. In particular, we construct multiple hypergraphs for a set of 3-D objects based on their 2-D views. In these hypergraphs, each vertex is an object, and each edge is a cluster of views. Therefore, an edge connects multiple vertices. We define the weight of each edge based on the similarities between any two views within the cluster. Retrieval and recognition are performed based on the hypergraphs. Therefore, our method can explore the higher order relationship among objects and does not use the distance between objects. We conduct experiments on the National Taiwan University 3-D model dataset and the ETH 3-D object collection. Experimental results demonstrate the effectiveness of the proposed method by comparing with the state-of-the-art methods.
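The hypergraph construction described above (each vertex is an object, each edge is a cluster of views) can be sketched as an incidence matrix. This is a minimal illustration under assumptions: the view clustering is taken as given, the names `view_owner`, `clusters` and `build_incidence` are hypothetical, and the edge weights and the learning step from the paper are omitted.

```python
def build_incidence(view_owner, clusters):
    """Return (objects, H) where H[i][j] is 1 if object i has a view in cluster j.

    view_owner: dict mapping a view id to the object it belongs to.
    clusters:   list of view clusters, each a list of view ids (one hyperedge each).
    """
    objects = sorted(set(view_owner.values()))
    row = {obj: i for i, obj in enumerate(objects)}
    H = [[0] * len(clusters) for _ in objects]
    for j, cluster in enumerate(clusters):
        for view in cluster:
            H[row[view_owner[view]]][j] = 1
    return objects, H


# Toy data: four views of three objects, grouped into three view clusters.
view_owner = {0: "A", 1: "A", 2: "B", 3: "C"}
clusters = [[0, 2], [1, 3], [2, 3]]
objects, H = build_incidence(view_owner, clusters)

# Vertex degree: the number of hyperedges that touch each object.
degrees = [sum(r) for r in H]
```

Because a single cluster can contain views from several objects, one column of H may have more than two non-zero entries, which is what distinguishes a hyperedge from an ordinary graph edge.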
Cheng, J.L., Qiao, M., Bian, W. & Tao, D. 2011, '3D Human Posture Segmentation By Spectral Clustering With Surface Normal Constraint', Signal Processing, vol. 91, no. 9, pp. 2204-2212.
In this paper, we propose a new algorithm for partitioning human posture represented by 3D point clouds sampled from the surface of human body. The algorithm is formed as a constrained extension of the recently developed segmentation method, spectral clu
Guan, N., Tao, D., Luo, Z. & Yuan, B. 2011, 'Manifold Regularized Discriminative Non-negative Matrix Factorization With Fast Gradient Descent', IEEE Transactions On Image Processing, vol. 20, no. 7, pp. 2030-2048.
Nonnegative matrix factorization (NMF) has become a popular data-representation method and has been widely used in image processing and pattern-recognition problems. This is because the learned bases can be interpreted as a natural parts-based representation of data and this interpretation is consistent with the psychological intuition of combining parts to form a whole. For practical classification tasks, however, NMF ignores both the local geometry of data and the discriminative information of different classes. In addition, existing research results show that the learned basis is not necessarily parts-based because there is neither an explicit nor an implicit constraint to ensure that the representation is parts-based. In this paper, we introduce the manifold regularization and the margin maximization to NMF and obtain the manifold regularized discriminative NMF (MD-NMF) to overcome the aforementioned problems. The multiplicative update rule (MUR) can be applied to optimizing MD-NMF, but it converges slowly. In this paper, we propose a fast gradient descent (FGD) to optimize MD-NMF. FGD contains a Newton method that searches the optimal step length, and thus, FGD converges much faster than MUR. In addition, FGD includes MUR as a special case and can be applied to optimizing NMF and its variants. For a problem with 165 samples in .., FGD converges in 28 s, while MUR requires 282 s. We also apply FGD in a variant of MD-NMF and experimental results confirm its efficiency. Experimental results on several face image datasets suggest the effectiveness of MD-NMF.
Sha, T., Song, M., Bu, J., Chen, C. & Tao, D. 2011, 'Feature Level Analysis For 3D Facial Expression Recognition', Neurocomputing, vol. 74, no. 12-13, pp. 2135-2141.
3D facial expression recognition has great potential in human computer interaction and intelligent robot systems. In this paper, we propose a two-step approach which combines both the feature selection and the feature fusion techniques to choose more com
Bian, W. & Tao, D. 2011, 'Max-Min Distance Analysis By Using Sequential SDP Relaxation For Dimension Reduction', IEEE Transactions On Pattern Analysis And Machine Intelligence, vol. 33, no. 5, pp. 1037-1050.
We propose a new criterion for discriminative dimension reduction, max-min distance analysis (MMDA). Given a data set with C classes, represented by homoscedastic Gaussians, MMDA maximizes the minimum pairwise distance of these C classes in the selected low-dimensional subspace. Thus, unlike Fisher's linear discriminant analysis (FLDA) and other popular discriminative dimension reduction criteria, MMDA duly considers the separation of all class pairs. To deal with the general case of data distribution, we also extend MMDA to kernel MMDA (KMMDA). Dimension reduction via MMDA/KMMDA leads to a nonsmooth max-min optimization problem with orthonormal constraints. We develop a sequential convex relaxation algorithm to solve it approximately. To evaluate the effectiveness of the proposed criterion and the corresponding algorithm, we conduct classification and data visualization experiments on both synthetic data and real data sets. Experimental results demonstrate the effectiveness of MMDA/KMMDA associated with the proposed optimization algorithm.
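The max-min criterion described above can be sketched as the following optimization problem. The notation (W, μ_i, d, r) is assumed for illustration, and the squared Euclidean form presumes the data have been whitened by the shared class covariance; the paper's exact objective may be stated differently.

```latex
% W: projection onto an r-dimensional subspace of the d-dimensional input space
% \mu_i: mean of class i (after whitening), C: number of classes
\[
W^{\star} = \arg\max_{\substack{W \in \mathbb{R}^{d \times r} \\ W^{\top} W = I_r}}
  \; \min_{1 \le i < j \le C} \;
  \left\| W^{\top} (\mu_i - \mu_j) \right\|_2^2
\]
```

The inner minimum is what makes the problem nonsmooth: the objective follows whichever class pair is currently closest, so a few well-separated pairs cannot dominate the solution as they can under an average-distance criterion.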
Xie, B., Wang, M. & Tao, D. 2011, 'Toward The Optimization Of Normalized Graph Laplacian', IEEE Transactions On Neural Networks, vol. 22, no. 4, pp. 660-666.
Normalized graph Laplacian has been widely used in many practical machine learning algorithms, e.g., spectral clustering and semisupervised learning. However, all of them use the Euclidean distance to construct the graph Laplacian, which does not necessarily reflect the inherent distribution of the data. In this brief, we propose a method to directly optimize the normalized graph Laplacian by using pairwise constraints. The learned graph is consistent with equivalence and nonequivalence pairwise relationships, and thus it can better represent similarity between samples. Meanwhile, our approach, unlike metric learning, automatically determines the scale factor during the optimization. The learned normalized Laplacian matrix can be directly applied in spectral clustering and semisupervised learning algorithms. Comprehensive experiments demonstrate the effectiveness of the proposed approach.
Gao, X., Wang, B., Tao, D. & Li, X. 2011, 'A Relay Level Set Method For Automatic Image Segmentation', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 41, no. 2, pp. 518-525.
This paper presents a new image segmentation method that applies an edge-based level set method in a relay fashion. The proposed method segments an image in a series of nested subregions that are automatically created by shrinking the stabilized curves in their previous subregions. The final result is obtained by combining all boundaries detected in these subregions. The proposed method has the following three advantages: 1) It can be automatically executed without human-computer interactions; 2) it applies the edge-based level set method in a relay fashion to detect all boundaries; and 3) it automatically obtains a full segmentation without specifying the number of relays in advance. The comparison experiments illustrate that the proposed method performs better than the representative level set methods, and it can obtain similar or better results compared with other popular segmentation algorithms.
Zhang, L., Zhang, L., Tao, D. & Huang, X. 2011, 'A Multifeature Tensor For Remote-Sensing Target Recognition', IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 2, pp. 374-378.
In remote-sensing image target recognition, the target or background object is usually transformed to a feature vector, such as a spectral feature vector. However, this kind of vector represents only one pixel of a remote-sensing image that considers the
Liu, D., Chen, Q., Yu, J., Gu, H., Tao, D. & Seah, H. 2011, 'Stroke Correspondence Construction Using Manifold Learning', Computer Graphics Forum, vol. 30, no. 8, pp. 2194-2207.
Stroke correspondence construction is a precondition for generating inbetween frames from a set of key frames. In our case, each stroke in a key frame is a vector represented as a Disk B-Spline Curve (DBSC) which is a flexible and compact vector format. However, it is not easy to construct correspondences between multiple DBSC strokes effectively because of the following points: (1) with the use of shape descriptors, the dimensionality of the feature space is high; (2) the number of strokes in different key frames is usually large and different from each other and (3) the length of corresponding strokes can be very different. The first point makes matching difficult. The other two points imply many to many and part to whole correspondences between strokes. To solve these problems, this paper presents a DBSC stroke correspondence construction approach, which introduces a manifold learning technique to the matching process. Moreover, in order to handle the mapping between unequal numbers of strokes with different lengths, a stroke reconstruction algorithm is developed to convert the many to many and part to whole stroke correspondences to one to one compound stroke correspondence.
Gao, X., Wang, X., Li, X. & Tao, D. 2011, 'Transfer latent variable model based on divergence analysis', Pattern Recognition, vol. 44, no. 10-11, pp. 2358-2366.
Latent variable models are powerful dimensionality reduction approaches in machine learning and pattern recognition. However, methods of this kind work well only under the strict assumption that the training and testing samples are independent and identically distributed. When the samples come from different domains, the distribution of the testing dataset will not be identical to that of the training dataset. The performance of latent variable models therefore degrades, because the parameters of the trained model do not suit the testing dataset. This limits the generalization and application of traditional latent variable models. To handle this issue, a transfer learning framework for latent variable models is proposed, which uses the distance (or divergence) between the two datasets to modify the parameters of the obtained latent variable model. The model therefore does not need to be rebuilt; its parameters are only adjusted according to the divergence, so that it adapts to different datasets. Experimental results on several real datasets demonstrate the advantages of the proposed framework.
Gao, X., Niu, Z., Tao, D. & Li, X. 2011, 'Non-Goal Scene Analysis for Soccer Video', Neurocomputing, vol. 74, no. 4, pp. 540-548.
Broadcast soccer video is usually recorded by one main camera, which constantly follows the part of the playfield where a highlight event is happening. The camera parameters and their variation are therefore closely related to the semantic information of soccer video, and camera calibration for soccer video has attracted much interest. Previous calibration methods either deal only with the goal scene or have strict calibration conditions and high complexity, so they do not properly handle non-goal scenes such as midfield or center-forward scenes. In this paper, based on a new soccer field model, a field symbol extraction algorithm is proposed to extract the calibration information. A two-stage calibration approach is then developed which can calibrate the camera not only for goal scenes but also for non-goal scenes. The preliminary experimental results demonstrate its robustness and accuracy.
Wang, Y., Tao, D., Gao, X., Li, X. & Wang, B. 2011, 'Mammographic Mass Segmentation: Embedding Multiple Features In Vector-Valued Level Set In Ambiguous Regions', Pattern Recognition, vol. 44, no. 9, pp. 1903-1915.
Mammographic mass segmentation plays an important role in computer-aided diagnosis systems. It is very challenging because masses are always of low contrast with ambiguous margins, connected with the normal tissues, and of various scales and complex shap
Gao, X., Chen, J.F., Tao, D. & Li, X. 2011, 'Multi-Sensor Centralized Fusion Without Measurement Noise Covariance By Variational Bayesian Approximation', IEEE Transactions On Aerospace And Electronic Systems, vol. 47, no. 1, pp. 718-727.
The work presented here solves the multi-sensor centralized fusion problem in the linear Gaussian model without the measurement noise variance. We generalize the variational Bayesian approximation based adaptive Kalman filter (VB_AKF) from the single sen
Yu, J., Tao, D., Wang, M. & Cheng, J. 2011, 'Semi-automatic cartoon generation by motion planning', Multimedia Systems, vol. 17, no. 5, pp. 409-419.
To reduce tedious work in cartoon animation, some computer-assisted systems including automatic Inbetweening and cartoon reusing systems have been proposed. In existing automatic Inbetweening systems, accurate correspondence construction, which is a prer
Yu, J., Liu, D., Tao, D. & Seah, H. 2011, 'Complex Object Correspondence Construction in Two-Dimensional Animation', IEEE Transactions On Image Processing, vol. 20, no. 11, pp. 3257-3269.
Correspondence construction of objects in key frames is the precondition for inbetweening and coloring in 2-D computer-assisted animation production. Since each frame of an animation consists of multiple layers, objects are complex in terms of shape and
Gao, X., Fu, R., Li, X., Tao, D., Zhang, B. & Yang, H. 2011, 'Aurora image segmentation by combining patch and texture thresholding', Computer Vision and Image Understanding, vol. 115, no. 3, pp. 390-402.
The proportion of aurora to the field-of-view in temporal series of all-sky images is an important index to investigate the evolvement of aurora. To obtain such an index, a crucial phase is to segment the aurora from the background of sky. A new aurora s
Gao, X., Wang, X., Tao, D. & Li, X. 2011, 'Supervised Gaussian Process Latent Variable Model for Dimensionality Reduction', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 41, no. 2, pp. 425-434.
The Gaussian process latent variable model (GP-LVM) has been identified to be an effective probabilistic approach for dimensionality reduction because it can obtain a low-dimensional manifold of a data set in an unsupervised fashion. Consequently, the GP
Huang, K., Tao, D., Yuan, Y., Li, X. & Tan, T. 2011, 'Biologically Inspired Features for Scene Classification in Video Surveillance', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 41, no. 1, pp. 307-313.
Inspired by human visual cognition mechanism, this paper first presents a scene classification method based on an improved standard model feature. Compared with state-of-the-art efforts in scene classification, the newly proposed method is more robust, m
Zhou, T., Tao, D. & Wu, X. 2011, 'Manifold elastic net: a unified framework for sparse dimension reduction', Data Mining and Knowledge Discovery, vol. 22, no. 3, pp. 340-371.
It is difficult to find the optimal sparse solution of a manifold learning based dimensionality reduction algorithm. The lasso or the elastic net penalized manifold learning based dimensionality reduction is not directly a lasso penalized least square pr
He, L., Gao, X., Lu, W., Li, X. & Tao, D. 2011, 'Image quality assessment based on S-CIELAB model', Signal, Image and Video Processing, vol. 5, no. 3, pp. 283-290.
This paper proposes a new image quality assessment framework which is based on color perceptual model. By analyzing the shortages of the existing image quality assessment methods and combining the color perceptual model, the general framework of color im
Zhang, K., Gao, X., Li, X. & Tao, D. 2011, 'Partially Supervised Neighbor Embedding for Example-Based Image Super-Resolution', IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 2, pp. 230-239.
Neighbor embedding algorithm has been widely used in example-based super-resolution reconstruction from a single frame, which makes the assumption that neighbor patches embedded are contained in a single manifold. However, it is not always true for compl
Wang, X., Li, Z. & Tao, D. 2011, 'Subspaces Indexing Model On Grassmann Manifold For Image Search', IEEE Transactions On Image Processing, vol. 20, no. 9, pp. 2627-2635.
Conventional linear subspace learning methods like principal component analysis (PCA), linear discriminant analysis (LDA) derive subspaces from the whole data set. These approaches have limitations in the sense that they are linear while the data distrib
Huang, Q., Tao, D., Li, X., Jin, L. & Wei, G. 2011, 'Exploiting Local Coherent Patterns For Unsupervised Feature Ranking', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 41, no. 6, pp. 1471-1482.
Prior to pattern recognition, feature selection is often used to identify relevant features and discard irrelevant ones for obtaining improved analysis results. In this paper, we aim to develop an unsupervised feature ranking algorithm that evaluates fea
Huang, Y., Huang, K., Tao, D., Tan, T. & Li, X. 2011, 'Enhanced Biologically Inspired Model For Object Recognition', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 41, no. 6, pp. 1668-1680.
The biologically inspired model (BIM) proposed by Serre et al. presents a promising solution to object categorization. It emulates the process of object recognition in primates' visual cortex by constructing a set of scale- and position-tolerant features
Geng, B., Tao, D. & Xu, C. 2011, 'DAML: Domain Adaptation Metric Learning', IEEE Transactions On Image Processing, vol. 20, no. 10, pp. 2980-2989.
The state-of-the-art metric-learning algorithms cannot perform well for domain adaptation settings, such as cross-domain face recognition, image annotation, etc., because labeled data in the source domain and unlabeled ones in the target domain are drawn
He, L., Si, S., Gao, X., Tao, D. & Li, X. 2011, 'A Novel Metric Based On MCA For Image Quality', International Journal Of Wavelets Multiresolution And Information Processing, vol. 9, no. 5, pp. 743-757.
Considering that the Human Visual System (HVS) has different perceptual characteristics for different morphological components, a novel image quality metric is proposed by incorporating Morphological Component Analysis (MCA) and HVS, which is capable of assessing images with different kinds of distortion. First, the reference and distorted images are each decomposed by MCA into linearly combined texture and cartoon components. These components are then turned into perceptual features by Just Noticeable Difference (JND), which integrates masking features, luminance adaptation and the Contrast Sensitivity Function (CSF). Finally, the difference between the perceptual features of the reference and distorted images is quantified using a pooling strategy before the final image quality is obtained. Experimental results demonstrate that the proposed metric outperforms some existing methods on LIVE database II.
Si, S., Liu, W., Tao, D. & Chan, K. 2011, 'Distribution Calibration In Riemannian Symmetric Space', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 41, no. 4, pp. 921-930.
Distribution calibration plays an important role in cross-domain learning. However, existing distribution distance metrics are not geodesic; therefore, they cannot measure the intrinsic distance between two distributions. In this paper, we calibrate two
Guan, N., Tao, D., Luo, Z. & Yuan, B. 2011, 'Non-negative Patch Alignment Framework', IEEE Transactions on Neural Networks, vol. 22, no. 8, pp. 1218-1230.
In this paper, we present a non-negative patch alignment framework (NPAF) to unify popular non-negative matrix factorization (NMF) related dimension reduction algorithms. It offers a new viewpoint to better understand the common property of different NMF
Gao, X., An, L., Yuan, Y., Tao, D. & Li, X. 2011, 'Lossless Data Embedding Using Generalized Statistical Quantity Histogram', IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 8, pp. 1061-1070.
Histogram-based lossless data embedding (LDE) has been recognized as an effective and efficient way for copyright protection of multimedia. Recently, a LDE method using the statistical quantity histogram has achieved good performance, which utilizes the
Zhang, L., Mei, T., Liu, Y., Tao, D. & Zhou, H. 2011, 'Visual Search Reranking Via Adaptive Particle Swarm Optimization', Pattern Recognition, vol. 44, no. 8, pp. 1811-1820.
Visual search reranking involves an optimization process that uses visual content to recover the 'genuine' ranking list from the helpful but noisy one generated by textual search. This paper presents an evolutionary approach, called Adaptive Particle Swa
Tian, X. & Tao, D. 2011, 'Visual Reranking: From Objectives To Strategies', IEEE MultiMedia Magazine, vol. 18, no. 3, pp. 12-20.
A study of the development of visual reranking methods can facilitate an understanding of the field, offer a clearer view of what has been achieved, and help overcome emerging obstacles in this area.
Gao, X., Wang, Q., Li, X., Tao, D. & Zhang, K. 2011, 'Zernike moment based image super resolution', IEEE Transactions On Image Processing, vol. 20, no. 10, pp. 2738-2747.
Multiframe super-resolution (SR) reconstruction aims to produce a high-resolution (HR) image using a set of low-resolution (LR) images. In the process of reconstruction, fuzzy registration usually plays a critical role. It mainly focuses on the correlation between pixels of the candidate and the reference images to reconstruct each pixel by averaging all its neighboring pixels. Therefore, the fuzzy-registration-based SR performs well and has been widely applied in practice. However, if some objects appear or disappear among LR images or different angle rotations exist among them, the correlation between corresponding pixels becomes weak. Thus, it will be difficult to use LR images effectively in the process of SR reconstruction. Moreover, if the LR images are noisy, the reconstruction quality will be seriously affected. To address or at least reduce these problems, this paper presents a novel SR method based on the Zernike moment, to make the most of possible details in each LR image for high-quality SR reconstruction. Experimental results show that the proposed method outperforms existing methods in terms of robustness and visual effects.
Xie, B., Mu, Y., Tao, D. & Huang, K. 2011, 'm-SNE: Multiview Stochastic Neighbor Embedding', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 41, no. 4, pp. 1088-1096.
Dimension reduction has been widely used in real-world applications such as image retrieval and document classification. In many scenarios, different features (or multiview data) can be obtained, and how to duly utilize them is a challenge. It is not appropriate for the conventional concatenating strategy to arrange features of different views into a long vector. That is because each view has its specific statistical property and physical interpretation. Even worse, the performance of the concatenating strategy will deteriorate if some views are corrupted by noise. In this paper, we propose a multiview stochastic neighbor embedding (m-SNE) that systematically integrates heterogeneous features into a unified representation for subsequent processing based on a probabilistic framework. Compared with conventional strategies, our approach can automatically learn a combination coefficient for each view adapted to its contribution to the data embedding. This combination coefficient plays an important role in utilizing the complementary information in multiview data. Also, our algorithm for learning the combination coefficient converges at a rate of O(1/k^2), which is the optimal rate for smooth problems. Experiments on synthetic and real data sets suggest the effectiveness and robustness of m-SNE for data visualization, image retrieval, object categorization, and scene recognition.
Wang, X., Li, Z. & Tao, D. 2011, 'Erratum: Subspaces indexing model on grassmann manifold for image search (IEEE Transactions on Image Processing (2011) 20: 9 (2627-2635))', IEEE Transactions on Image Processing, vol. 20, no. 12, p. 3658.
Yang, Y., Zhuang, Y., Tao, D., Xu, D., Yu, J. & Luo, J. 2010, 'Recognizing Cartoon Image Gestures for Retrieval and Interactive Cartoon Clip Synthesis', IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 12, pp. 1745-1756.
In this paper, we propose a new method to recognize gestures of cartoon images with two practical applications, i.e., content-based cartoon image retrieval and interactive cartoon clip synthesis. Upon analyzing the unique properties of four types of features including global color histogram, local color histogram (LCH), edge feature (EF), and motion direction feature (MDF), we propose to employ different features for different purposes and in various phases. We use EF to define a graph and then refine its local structure by LCH. Based on this graph, we adopt a transductive learning algorithm to construct local patches for each cartoon image. A spectral method is then proposed to optimize the local structure of each patch and then align these patches globally. MDF is fused with EF and LCH and a cartoon gesture space is constructed for cartoon image gesture recognition. We apply the proposed method to content-based cartoon image retrieval and interactive cartoon clip synthesis. The experiments demonstrate the effectiveness of our method.
Song, M., Tao, D., Chen, C., Li, X. & Chen, C. 2010, 'Color to Gray: Visual Cue Preservation', IEEE Transactions On Pattern Analysis And Machine Intelligence, vol. 32, no. 9, pp. 1537-1552.
Both commercial and scientific applications often need to transform color images into gray-scale images, e.g., to reduce the publication cost in printing color images or to help color blind people see visual cues of color images. However, conventional color to gray algorithms are not ready for practical applications because they encounter the following problems: 1) visual cues are not well defined so it is unclear how to preserve important cues in the transformed gray-scale images; 2) some algorithms have extremely high time cost for computation; and 3) some require human-computer interactions to have a reasonable transformation. To solve or at least reduce these problems, we propose a new algorithm based on a probabilistic graphical model with the assumption that the image is defined over a Markov random field. Thus, the color to gray procedure can be regarded as a labeling process to preserve the newly well-defined visual cues of a color image in the transformed gray-scale image. Visual cues are measurements that can be extracted from a color image by a perceiver. They indicate the state of some properties of the image that the perceiver is interested in perceiving. Different people may perceive different cues from the same color image and three cues are defined in this paper, namely, color spatial consistency, image structure information, and color channel perception priority. We cast color to gray as a visual cue preservation procedure based on a probabilistic graphical model and optimize the model based on an integral minimization problem. We apply the new algorithm to both natural color images and artificial pictures, and demonstrate that the proposed approach outperforms representative conventional algorithms in terms of effectiveness and efficiency. In addition, it requires no human-computer interactions.
Deng, C., Gao, X., Li, X. & Tao, D. 2010, 'Local histogram based geometric invariant image watermarking', Signal Processing, vol. 90, no. 12, pp. 3256-3264.
Compared with other existing methods, feature point-based image watermarking schemes can resist global and local geometric attacks, especially cropping and random bending attacks (RBAs), by binding watermark synchronization with salient image characteristics. However, the watermark detection rate remains low in current feature point-based watermarking schemes. The main reason is that both feature point extraction and watermark embedding are more or less related to the pixel position, which is seriously distorted by the interpolation error and the shift problem during geometric attacks. In view of these facts, this paper proposes a geometrically robust image watermarking scheme based on local histograms. Our scheme mainly consists of three components: (1) feature point extraction and local circular region (LCR) construction are conducted by using the Harris-Laplace detector; (2) a graph theoretical clustering-based feature selection mechanism is used to choose a set of non-overlapping LCRs, then geometrically invariant LCRs are formed through dominant orientation normalization; and (3) the histogram and mean, which are statistically independent of the pixel position, are calculated over the selected LCRs and utilized to embed watermarks. Experimental results demonstrate that the proposed scheme can provide sufficient robustness against geometric attacks as well as common image processing operations.
Wang, X., Tao, D. & Li, Z. 2010, 'Entropy controlled Laplacian regularization for least square regression', Signal Processing, vol. 90, no. 6, pp. 2043-2049.
Least square regression (LSR) is popular in pattern classification. Compared against other matrix factorization based methods, it is simple yet efficient. However, LSR ignores unlabeled samples in the training stage, so the regression error could be large when the labeled samples are insufficient. To solve this problem, the Laplacian regularization can be used to penalize LSR. Extensive theoretical and experimental results have confirmed the validity of Laplacian regularized least square (LapRLS). However, multiple hyper-parameters have been introduced to estimate the intrinsic manifold induced by the regularization, and thus time-consuming cross-validation has to be applied to tune these parameters. To alleviate this problem, we assume the intrinsic manifold is a linear combination of a given set of known manifolds. By further assuming the priors of the given manifolds are equivalent, we introduce the entropy maximization penalty to automatically learn the linear combination coefficients. The entropy maximization trades smoothness off against complexity. Therefore, the proposed model enjoys the following advantages: (1) it is able to incorporate both labeled and unlabeled data into the training process, (2) it is able to learn the manifold hyper-parameters automatically, and (3) it approximates the true probability distribution with respect to prescribed test data. To test the classification performance of our proposed model, we apply the model on three well-known human face datasets, i.e. FERET, ORL, and YALE. Experimental results on these three face datasets suggest the effectiveness and the efficiency of the new model compared against the traditional LSR and the Laplacian regularized least squares.
Li, X., Hu, Y., Gao, X., Tao, D. & Ning, B. 2010, 'A multi-frame image super-resolution method', Signal Processing, vol. 90, no. 2, pp. 405-414.
Multi-frame image super-resolution (SR) aims to utilize information from a set of low-resolution (LR) images to compose a high-resolution (HR) one. As it is desirable or essential in many real applications, recent years have witnessed growing interest in the problem of multi-frame SR reconstruction. This set of algorithms commonly utilizes a linear observation model to construct the relationship between the recorded LR images and the unknown reconstructed HR image estimates. Recently, regularization-based schemes have been demonstrated to be effective because SR reconstruction is actually an ill-posed problem. Working within this promising framework, this paper first proposes two new regularization items, termed locally adaptive bilateral total variation and consistency of gradients, to keep edges and flat regions, which are implicitly described in LR images, sharp and smooth, respectively. The combination of the proposed regularization items is superior to existing regularization items because it considers both edges and flat regions while existing ones consider only edges. Thorough experimental results show the effectiveness of the new algorithm for SR reconstruction.
Wen, L., Gao, X., Li, X., Tao, D. & Li, J. 2010, 'Incremental pairwise discriminant analysis based visual tracking', Neurocomputing, vol. 74, no. 1-3, pp. 428-438.
The distinction between the object appearance and the background is a useful cue for visual tracking, in which discriminant analysis is widely applied. However, due to the diversity of the background observation, there are not adequate negative samples from the background, which usually leads the discriminant method to tracking failure. Thus, a natural solution is to construct an object-background pair constrained by the spatial structure, which could not only reduce the number of negative samples but also make full use of the background information surrounding the object. However, this idea is threatened by the variation of both the object appearance and the spatially constrained background observation, especially when the background shifts as the object moves. Thus, an incremental pairwise discriminant subspace is constructed in this paper to delineate the variation of the distinction. In order to maintain the ability to correctly describe the subspace, we enforce two novel constraints for the optimal adaptation: (1) a pairwise data discriminant constraint and (2) subspace smoothness. The experimental results demonstrate that the proposed approach can alleviate adaptation drift and achieve better visual tracking results for a large variety of nonstationary scenes.
Wang, X., Gao, X., Yuan, Y., Tao, D. & Li, J. 2010, 'Semi-supervised Gaussian process latent variable model with pairwise constraints', Neurocomputing, vol. 73, no. 10-12, pp. 2186-2195.
In machine learning, the Gaussian process latent variable model (GP-LVM) has been extensively applied in the field of unsupervised dimensionality reduction. When some supervised information, e.g., pairwise constraints or labels of the data, is available, the traditional GP-LVM cannot directly utilize such supervised information to improve the performance of dimensionality reduction. In this case, it is necessary to modify the traditional GP-LVM to make it capable of handling supervised or semi-supervised learning tasks. For this purpose, we propose a new semi-supervised GP-LVM framework under pairwise constraints. By transferring the pairwise constraints in the observed space to the latent space, the constrained prior information on the latent variables can be obtained. Under this constrained prior, the latent variables are optimized by the maximum a posteriori (MAP) algorithm. The effectiveness of the proposed algorithm is demonstrated with experiments on a variety of data sets.
Wen, J., Gao, X., Yuan, Y., Tao, D. & Li, J. 2010, 'Incremental tensor biased discriminant analysis: A new color-based visual tracking method', Neurocomputing, vol. 73, no. 4-6, pp. 827-839.
Most existing color-based tracking algorithms utilize the statistical color information of the object as the tracking cue, without maintaining the spatial structure within a single chromatic image. Recently, research on multilinear algebra has made it possible to preserve the spatial structural relationship in a representation of the image ensembles. In this paper, a third-order color tensor is constructed to represent the object to be tracked. Considering the influence of environmental changes on tracking, the biased discriminant analysis (BDA) is extended to the tensor biased discriminant analysis (TBDA) for distinguishing the object from the background. At the same time, an incremental scheme for the TBDA is developed for online learning of the tensor biased discriminant subspace, which can be used to adapt to appearance variations of both the object and the background. The experimental results show that the proposed method can precisely track objects undergoing large pose, scale and lighting changes, as well as partial occlusion.
Xiao, B., Gao, X., Tao, D., Yuan, Y. & Li, J. 2010, 'Photo-sketch synthesis and recognition based on subspace learning', Neurocomputing, vol. 73, no. 4-6, pp. 840-852.
This paper aims to reduce the difference between sketches and photos by synthesizing sketches from photos, and vice versa, and then performing sketch-sketch/photo-photo recognition with subspace learning based methods. Pseudo-sketch/pseudo-photo patches are synthesized with an embedded hidden Markov model. In most local strategy based methods, these patches are assembled by averaging their overlapping areas, which blurs the resulting pseudo-sketch/pseudo-photo; we instead integrate the patches with image quilting. Experiments demonstrate that the proposed method effectively produces high-quality pseudo-sketches/pseudo-photos and achieves promising recognition results.
Mu, Y. & Tao, D. 2010, 'Biologically inspired feature manifold for gait recognition', Neurocomputing, vol. 73, no. 4-6, pp. 895-902.
Using biometric resources to recognize a person has been a recent focus of computer vision. Previously, biometric research has focused on utilizing iris, fingerprint, palm print, and shoe print to authenticate and authorize a person. However, these conventional biometric resources suffer from some obvious limitations, such as strict distance requirements and the need for extensive user cooperation. In contrast to the difficulties of using conventional biometric resources, human gait can be easily acquired and utilized in many fields. A human's walk image can reflect the walker's physical characteristics and psychological state, and therefore, the gait feature can be used to recognize a person. In order to achieve better performance of gait recognition, we represent the gait image using C1 units, which correspond to the complex cells in human visual cortex, and use a maximum mechanism to keep only the maximum response of each local area of S1 units. To enhance the gait recognition rate, we take the label information into account and utilize the discriminative locality alignment (DLA) method to classify, which is a top-level discriminative manifold learning based subspace learning algorithm. Experiments on the University of South Florida (USF) dataset show that (1) the proposed C1Gait+DLA algorithm can achieve better performance than state-of-the-art algorithms and (2) DLA can duly preserve both the local geometry and the discriminative information for recognition.
Si, S., Tao, D. & Geng, B. 2010, 'Bregman Divergence-Based Regularization for Transfer Subspace Learning', IEEE Transactions On Knowledge And Data Engineering, vol. 22, no. 7, pp. 929-942.
The regularization principles [31] lead approximation schemes to deal with various learning problems, e.g., the regularization of the norm in a reproducing kernel Hilbert space for the ill-posed problem. In this paper, we present a family of subspace le
Fu, R., Gao, X., Li, X., Tao, D., Jian, Y., Li, J., Hu, H. & Yang, H. 2010, 'An integrated aurora image retrieval system: AuroraEye', Journal Of Visual Communication And Image Representation, vol. 21, no. 8, pp. 787-797.
With the emergence of the digital all-sky imager (ASI) in aurora research, millions of images are captured annually. However, only a fraction of them can actually be used. To address the problem incurred by inefficient manual processing, an integrated image
Si, S., Tao, D. & Chan, K. 2010, 'Evolutionary Cross-Domain Discriminative Hessian Eigenmaps', IEEE Transactions On Image Processing, vol. 19, no. 4, pp. 1075-1086.
Is it possible to train a learning model to separate tigers from elks when we have 1) labeled samples of leopard and zebra and 2) unlabelled samples of tiger and elk at hand? Cross-domain learning algorithms can be used to solve the above problem. Howeve
Tian, X., Tao, D., Hua, X. & Wu, X. 2010, 'Active Reranking for Web Image Search', IEEE Transactions On Image Processing, vol. 19, no. 3, pp. 805-820.
Image search reranking methods usually fail to capture the user's intention when the query term is ambiguous. Therefore, reranking with user interactions, or active reranking, is highly demanded to effectively improve the search performance. The essentia
Bian, W. & Tao, D. 2010, 'Biased Discriminant Euclidean Embedding for Content-Based Image Retrieval', IEEE Transactions On Image Processing, vol. 19, no. 2, pp. 545-554.
With many potential multimedia applications, content-based image retrieval (CBIR) has recently gained more attention for image management and web search. A wide variety of relevance feedback (RF) algorithms have been developed in recent years to improve
Song, D. & Tao, D. 2010, 'Biologically Inspired Feature Manifold for Scene Classification', IEEE Transactions On Image Processing, vol. 19, no. 1, pp. 174-184.
Biologically inspired feature (BIF) and its variations have been demonstrated to be effective and efficient for scene classification. It is unreasonable to measure the dissimilarity between two BIFs based on their Euclidean distance. This is because BIFs
Xia, T., Tao, D., Mei, T. & Zhang, Y. 2010, 'Multiview Spectral Embedding', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 40, no. 6, pp. 1438-1446.
In computer vision and multimedia search, it is common to use multiple features from different views to represent an object. For example, to well characterize a natural scene image, it is essential to find a set of visual features to represent its color,
Song, M., Tao, D., Sun, Z. & Li, X. 2010, 'Visual-Context Boosting for Eye Detection', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 40, no. 6, pp. 1460-1467.
Eye detection plays an important role in many practical applications. This paper presents a novel two-step scheme for eye detection. The first step models an eye by a newly defined visual-context pattern (VCP), and the second step applies semisupervised
Song, M., Tao, D., Liu, Z., Li, X. & Zhou, M. 2010, 'Image Ratio Features for Facial Expression Recognition Application', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 40, no. 3, pp. 779-788.
Video-based facial expression recognition is a challenging problem in computer vision and human-computer interaction. To target this problem, texture features have been extracted and widely used, because they can capture image intensity changes raised by
Wang, B., Gao, X., Tao, D. & Li, X. 2010, 'A Unified Tensor Level Set for Image Segmentation', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 40, no. 3, pp. 857-867.
This paper presents a new region-based unified tensor level set model for image segmentation. This model introduces a three-order tensor to comprehensively depict features of pixels, e.g., gray value and the local geometrical features, such as orientatio
Zhang, T., Huang, K., Li, X., Yang, J. & Tao, D. 2010, 'Discriminative Orthogonal Neighborhood-Preserving Projections for Classification', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 40, no. 1, pp. 253-263.
Orthogonal neighborhood-preserving projection (ONPP) is a recently developed orthogonal linear algorithm for overcoming the out-of-sample problem existing in the well-known manifold learning algorithm, i.e., locally linear embedding. It has been shown th
Gao, X., Wang, Y., Li, X. & Tao, D. 2010, 'On Combining Morphological Component Analysis and Concentric Morphology Model for Mammographic Mass Detection', IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 2, pp. 266-273.
Mammographic mass detection is an important task for the early diagnosis of breast cancer. However, it is difficult to distinguish masses from normal regions because of their abundant morphological characteristics and ambiguous margins. To improve the ma
Gao, X., Deng, C., Li, X. & Tao, D. 2010, 'Geometric Distortion Insensitive Image Watermarking in Affine Covariant Regions', IEEE Transactions On Systems Man And Cybernetics Part C-Applications And Reviews, vol. 40, no. 3, pp. 278-286.
Feature-based image watermarking schemes, which aim to survive various geometric distortions, have attracted great attention in recent years. Existing schemes have shown robustness against rotation, scaling, and translation, but few are resistant to crop
Gao, X., Su, Y., Li, X. & Tao, D. 2010, 'A Review of Active Appearance Models', IEEE Transactions On Systems Man And Cybernetics Part C-Applications And Reviews, vol. 40, no. 2, pp. 145-158.
Active appearance model (AAM) is a powerful generative method for modeling deformable objects. The model decouples the shape and the texture variations of objects, which is followed by an efficient gradient-based model fitting method. Due to the flexible
Gao, X., Xiao, B., Tao, D. & Li, X. 2010, 'A survey of graph edit distance', Pattern Analysis and Applications, vol. 13, no. 1, pp. 113-129.
Inexact graph matching has been one of the significant research foci in the area of pattern analysis. As an important way to measure the similarity between pairwise graphs error-tolerantly, graph edit distance (GED) is the base of inexact graph matching.
Lu, W., Li, X., Gao, X., Tang, W., Li, J. & Tao, D. 2010, 'A Video Quality Assessment Metric Based on Human Visual System', Cognitive Computation, vol. 2, no. 2, pp. 120-131.
It is important for practical application to design an effective and efficient metric for video quality. The most reliable way is by subjective evaluation. Thus, to design an objective metric by simulating human visual system (HVS) is quite reasonable an
Gao, X., Deng, C., Li, X. & Tao, D. 2010, 'Local Feature Based Geometric-Resistant Image Information Hiding', Cognitive Computation, vol. 2, no. 2, pp. 68-77.
Watermarking aims to hide particular information into some carrier but does not change the visual cognition of the carrier itself. Local features are good candidates to address the watermark synchronization error caused by geometric distortions and have
Lu, W., Zeng, K., Tao, D., Yuan, Y. & Gao, X. 2010, 'No-reference Image Quality Assessment In Contourlet Domain', Neurocomputing, vol. 73, no. 4-6, pp. 784-794.
The target of no-reference (NR) image quality assessment (IQA) is to establish a computational model to predict the visual quality of an image. The existing prominent method is based on natural scene statistics (NSS). It uses the joint and marginal distr
Zhang, C. & Tao, D. 2010, 'Error Bounds for Real Function Classes Based on Discretized Vapnik-Chervonenkis Dimensions', Australian Journal of Intelligent Information Processing Systems, vol. 12, no. 3, pp. 1-5.
The Vapnik-Chervonenkis (VC) dimension plays an important role in statistical learning theory. In this paper, we propose the discretized VC dimension obtained by discretizing the range of a real function class. Then, we point out that Sauer's Lemma is valid for the discretized VC dimension. We group the real function classes having the infinite VC dimension into four categories by using the discretized VC dimension. As a byproduct, we present the equidistantly discretized VC dimension by introducing an equidistant partition to segment the range of a real function class. Finally, we obtain the error bounds for real function classes based on the discretized VC dimensions in the PAC-learning framework.
Tao, D., Li, X., Wu, X. & Maybank, S. 2009, 'Geometric Mean for Subspace Selection', IEEE Transactions On Pattern Analysis And Machine Intelligence, vol. 31, no. 2, pp. 260-274.
Subspace selection approaches are powerful tools in pattern classification and data visualization. One of the most important subspace approaches is the linear dimensionality reduction step in the Fisher's linear discriminant analysis (FLDA), which has been successfully employed in many fields such as biometrics, bioinformatics, and multimedia information management. However, the linear dimensionality reduction step in FLDA has a critical drawback: for a classification task with c classes, if the dimension of the projected subspace is strictly lower than c - 1, the projection to a subspace tends to merge those classes that are close together in the original feature space. If separate classes are sampled from Gaussian distributions, all with identical covariance matrices, then the linear dimensionality reduction step in FLDA maximizes the mean value of the Kullback-Leibler (KL) divergences between different classes. Based on this viewpoint, the geometric mean for subspace selection is studied in this paper. Three criteria are analyzed: 1) maximization of the geometric mean of the KL divergences, 2) maximization of the geometric mean of the normalized KL divergences, and 3) the combination of 1 and 2. Preliminary experimental results based on synthetic data, the UCI Machine Learning Repository, and handwritten digits show that the third criterion is a potential discriminative subspace selection method, which significantly reduces the class separation problem compared with the linear dimensionality reduction step in FLDA and several of its representative extensions.
Gao, X., An, L., Li, X. & Tao, D. 2009, 'Reversibility improved lossless data hiding', Signal Processing, vol. 89, no. 10, pp. 2053-2065.
Recently, lossless data hiding has attracted increasing interest. In a reversible watermarking scheme, the host media and hidden data should be recovered without distortion. A recent lossless data hiding technique based on image blocking and block classification has achieved good performance for image authentication. However, this method cannot always fully restore all the blocks of host images and watermarks. To address this, we propose an improved algorithm, which is characterized by two aspects. First, a block skipping scheme (BSS) is developed to select the host blocks in which the watermark is embedded; second, the embedding level is modified by a novel parameter model to guarantee that the host blocks, as well as the embedded data, can be recovered without distortion. Extensive experiments conducted on standard grayscale images, medical images, and color images have demonstrated the effectiveness of the improved lossless data hiding scheme.
Deng, C., Gao, X., Li, X. & Tao, D. 2009, 'A local Tchebichef moments-based robust image watermarking', Signal Processing, vol. 89, no. 8, pp. 1531-1539.
Protection against geometric distortions and common image processing operations with blind detection has become a very challenging task in image watermarking. To achieve this, in this paper we propose a content-based watermarking scheme that combines invariant feature extraction with watermark embedding by using Tchebichef moments. The Harris-Laplace detector is first adopted to extract feature points, and then non-overlapping disks centered at the feature points are generated. These disks are invariant to scaling and translation distortions. For each disk, orientation alignment is then performed to achieve rotation invariance. Finally, the watermark is embedded in the magnitudes of the Tchebichef moments of each disk via dither modulation to achieve robustness to common image processing operations and blind detection. Thorough simulation results obtained by using the standard benchmark, Stirmark, demonstrate that the proposed method is robust against various geometric distortions as well as common image processing operations and outperforms representative image watermarking schemes.
Xiao, B., Gao, X., Tao, D. & Li, X. 2009, 'A new approach for face recognition by sketches in photos', Signal Processing, vol. 89, no. 8, pp. 1576-1588.
Face recognition by sketches in photos remains a challenging task. Unlike the existing sketch-photo recognition methods, which convert a photo into a sketch and then perform sketch-photo recognition through sketch-sketch recognition, this paper is devoted to synthesizing a photo from the sketch and transforming sketch-photo recognition into photo-photo recognition to achieve better performance in mixture pattern recognition. The contribution of this paper mainly focuses on two aspects: (1) given that there are not many research findings on sketch-photo recognition based on pseudo-photo synthesis, and that the existing methods require a large set of training samples, which is nearly impossible to obtain because of the high cost of sketch acquisition, we make use of the embedded hidden Markov model (EHMM), which can learn the nonlinearity of a sketch-photo pair with fewer training samples, to produce pseudo-photos from sketches; and (2) photos and sketches are divided into patches and the pseudo-photo is generated by combining pseudo-photo patches, which makes the pseudo-photo more recognizable. Experimental results demonstrate that the newly proposed method is effective in identifying face sketches in a photo set.
Li, X., Tao, D., Gao, X. & Lu, W. 2009, 'A natural image quality evaluation metric', Signal Processing, vol. 89, no. 4, pp. 548-555.
Reduced-reference (RR) image quality assessment (IQA) metrics evaluate the quality of a distorted (or degraded) image by using some, not all, information of the original (reference) image. In this paper, we propose a novel RR IQA metric based on hybrid wavelets and directional filter banks (HWD). With HWD as a pre-processing stage, the newly proposed metric mainly focuses on subband coefficients of the distorted and original images. It performs well under low data rates, because only a threshold and several proportion values are recorded from the original images and transmitted. Experiments are carried out on well-recognized data sets and the results demonstrate the advantages of the metric compared with existing ones. Moreover, a separate set of experiments shows that the proposed metric has good consistency with human subjective perception.
Zhang, T., Tao, D., Li, X. & Yang, J. 2009, 'Patch Alignment for Dimensionality Reduction', IEEE Transactions On Knowledge And Data Engineering, vol. 21, no. 9, pp. 1299-1313.
Spectral analysis-based dimensionality reduction algorithms are important and have been popularly applied in data mining and computer vision applications. To date many algorithms have been developed, e.g., principal component analysis, locally linear em
Tao, D., Li, X., Lu, W. & Gao, X. 2009, 'Reduced-Reference IQA in Contourlet Domain', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 39, no. 6, pp. 1623-1627.
The human visual system (HVS) provides a suitable cue for image quality assessment (IQA). In this paper, we develop a novel reduced-reference (RR) IQA scheme by incorporating the merits from the contourlet transform, contrast sensitivity function (CSF),
Gao, X., Su, Y., Li, X. & Tao, D. 2009, 'Gabor texture in active appearance models', Neurocomputing, vol. 72, no. 13-15, pp. 3174-3181.
In computer vision applications, Active Appearance Models (AAMs) are usually used to model the shape and the gray-level appearance of an object of interest using statistical methods, such as PCA. However, intensity values used in standard AAMs cannot provide enough information for image alignment. In this paper, we first propose to utilize Gabor filters to represent the image texture. The benefit of Gabor-based representation is that it can express local structures of an image. As a result, this representation can lead to more accurate matching when conditions change. Given the problem of the excessive storage and computational complexity of the Gabor, three different Gabor-based image representations are used in AAMs: (1) GaborD is the sum of Gabor filter responses over directions, (2) GaborS is the sum of Gabor filter responses over scales, and (3) GaborSD is the sum of Gabor filter responses over scales and directions. Through a large number of experiments, we show that the proposed Gabor representations lead to more accurate and robust matching between model and images.
Yuan, Y., Li, X., Pang, Y., Lu, X. & Tao, D. 2009, 'Binary Sparse Nonnegative Matrix Factorization', IEEE Transactions On Circuits And Systems For Video Technology, vol. 19, no. 5, pp. 772-777.
This paper presents a fast part-based subspace selection algorithm, termed the binary sparse nonnegative matrix factorization (B-SNMF). Both the training process and the testing process of B-SNMF are much faster than those of binary principal component a
Gao, X., Lu, W., Tao, D. & Li, X. 2009, 'Image Quality Assessment Based on Multiscale Geometric Analysis', IEEE Transactions On Image Processing, vol. 18, no. 7, pp. 1409-1423.
Reduced-reference (RR) image quality assessment (IQA) has been recognized as an effective and efficient way to predict the visual quality of distorted images. The current standard is the wavelet-domain natural image statistics model (WNISM), which applie
Gao, X., Yang, Y., Tao, D. & Li, X. 2009, 'Discriminative optical flow tensor for video semantic analysis', Computer Vision And Image Understanding, vol. 113, no. 3, pp. 372-383.
This paper presents a novel framework for effective video semantic analysis. This framework has two major components, namely, optical flow tensor (OFT) and hidden Markov models (HMMs). OFT and HMMs are employed because: (I) motion is one of the fundament
Huang, K., Tao, D., Yuan, Y., Li, X. & Tan, T. 2009, 'View-Independent Behavior Analysis', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 39, no. 4, pp. 1028-1035.
The motion analysis of the human body is an important topic of research in computer vision devoted to detecting, tracking, and understanding people's physical behavior. This strong interest is driven by a wide spectrum of applications in various areas su
Shen, J., Tao, D. & Li, X. 2009, 'QUC-Tree: Integrating Query Context Information for Efficient Music Retrieval', IEEE Transactions On Multimedia, vol. 11, no. 2, pp. 313-323.
In this paper, we introduce a novel indexing scheme, the QUery Context tree (QUC-tree), to facilitate efficient query sensitive music search under different query contexts. Distinguished from the previous approaches, QUC-tree is a balanced multiway tree struct
Li, J., Zhang, L., Tao, D., Sun, H. & Zhao, Q. 2009, 'A Prior Neurophysiologic Knowledge Free Tensor-Based Scheme for Single Trial EEG Classification', IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 17, no. 2, pp. 107-115.
Single trial electroencephalogram (EEG) classification is essential in developing brain-computer interfaces (BCIs). However, popular classification algorithms, e.g., common spatial patterns (CSP), usually highly depend on the prior neurophysiologic knowl
Pang, Y., Li, X., Yuan, Y., Tao, D. & Pan, J. 2009, 'Fast Haar Transform Based Feature Extraction for Face Representation and Recognition', IEEE Transactions on Information Forensics and Security, vol. 4, no. 3, pp. 441-450.
Subspace learning is the process of finding a proper feature subspace and then projecting high-dimensional data onto the learned low-dimensional subspace. The projection operation requires many floating-point multiplications and additions, which makes th
Mu, Y., Tao, D., Li, X. & Murtagh, F. 2009, 'Biologically Inspired Tensor Features', Cognitive Computation, vol. 1, no. 4, pp. 327-341.
According to the research results reported in the past decades, it is well acknowledged that face recognition is not a trivial task. With the development of electronic devices, we are gradually revealing the secret of object recognition in the primate's
Gao, X., Li, X., Fen, J. & Tao, D. 2009, 'Shot-based video retrieval with optical flow tensor and HMMs', Pattern Recognition Letters, vol. 30, no. 2, pp. 140-147.
Video retrieval and indexing research aims to efficiently and effectively manage very large video databases, e.g., CCTV records, which is a key component in video-based object and event analysis. In this paper, for the purpose of video retrieval, we prop
Lu, W., Gao, X., Tao, D. & Li, X. 2008, 'A wavelet-based image quality assessment method', International Journal of Wavelets, Multiresolution and Information Processing, vol. 6, no. 4, pp. 541-551.
Image quality is a key characteristic in image processing (10,11), image retrieval (12,13), and biometrics (14). In this paper, a novel reduced-reference image quality assessment method is proposed based on wavelet transform. By simulating the human visu
Gao, X., Lu, W., Li, X. & Tao, D. 2008, 'Wavelet-based contourlet in quality evaluation of digital images', Neurocomputing, vol. 72, no. 1-3, pp. 378-385.
Feature extraction is probably the most important stage in image quality evaluation: effective features can well reflect the quality of digital images and vice versa. As a non-redundant sparse representation, contourlet transform can effectively reflect v
Pang, Y., Tao, D., Yuan, Y. & Li, X. 2008, 'Binary two-dimensional PCA', IEEE Transactions On Systems Man And Cybernetics Part B-Cybernetics, vol. 38, no. 4, pp. 1176-1180.
Fast training and testing procedures are crucial in biometrics recognition research. Conventional algorithms, e.g., principal component analysis (PCA), fail to efficiently work on large-scale and high-resolution image data sets. By incorporating merits
Tao, D., Li, X., Wu, X. & Maybank, S. 2008, 'Tensor Rank One Discriminant Analysis - A convergent method for discriminative multilinear subspace selection', Neurocomputing, vol. 71, no. 10-12, pp. 1866-1882.
This paper proposes Tensor Rank One Discriminant Analysis (TR1DA) in which general tensors are input for pattern classification. TR1DA is based on Differential Scatter Discriminant Criterion (DSDC) and Tensor Rank One Analysis (TR1A). DSDC is a generaliz
Gao, X., Zhong, J., Tao, D. & Li, X. 2008, 'Local face sketch synthesis learning', Neurocomputing, vol. 71, no. 10-12, pp. 1921-1930.
Facial sketch synthesis (FSS) is crucial in sketch-based face recognition. This paper proposes an automatic FSS algorithm with local strategy based on embedded hidden Markov model (E-HMM) and selective ensemble (SE). By using E-HMM to model the nonlinear
Li, X., Tao, D., Maybank, S. & Yuan, Y. 2008, 'Visual music and musical vision', Neurocomputing, vol. 71, no. 10-12, pp. 2023-2028.
This paper aims to bridge human hearing and vision from the viewpoint of database search for images or music. The semantic content of an image can be illustrated with music or conversely images can be associated with a piece of music. The theoretical bas
Zhang, T., Li, X., Tao, D. & Yang, J. 2008, 'Local Coordinates Alignment (LCA): A novel manifold learning approach', International Journal of Pattern Recognition and Artificial Intelligence, vol. 22, no. 4, pp. 667-690.
Manifold learning has been demonstrated as an effective way to represent intrinsic geometrical structure of samples. In this paper, a new manifold learning approach, named Local Coordinates Alignment (LCA), is developed based on the alignment technique.
Zhang, T., Li, X., Tao, D. & Yang, J. 2008, 'Multimodal biometrics using geometry preserving projections', Pattern Recognition, vol. 41, no. 3, pp. 805-813.
Multimodal biometric system utilizes two or more individual modalities, e.g., face, gait, and fingerprint, to improve the recognition accuracy of conventional unimodal methods. However, existing multimodal biometric methods neglect interactions of differ
Li, X., Maybank, S., Yan, S., Tao, D. & Xu, D. 2008, 'Gait components and their application to gender recognition', IEEE Transactions On Systems Man And Cybernetics Part C-Applications And Reviews, vol. 38, no. 2, pp. 145-155.
Human gait is a promising biometrics resource. In this paper, the information about gait is obtained from the motions of the different parts of the silhouette. The human silhouette is segmented into seven components, namely head, arm, trunk, thigh, fron
Tao, D., Tang, X. & Li, X. 2008, 'Which components are important for interactive image searching?', IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 1, pp. 3-11.
With many potential industrial applications, content-based image retrieval (CBIR) has recently gained more attention for image management and web searching. As an important tool to capture users' preferences and thus to improve the performance of CBIR sy
Shen, J., Tao, D. & Li, X. 2008, 'Modality Mixture Projections for Semantic Video Event Detection', IEEE Transactions On Circuits And Systems For Video Technology, vol. 18, no. 11, pp. 1587-1596.
Event detection is one of the most fundamental components for various kinds of domain applications of video information systems. In recent years, it has gained considerable interest from practitioners and academics in different areas. While detecting v
Tao, D., Song, M., Li, X., Shen, J., Sun, J., Wu, X., Faloutsos, C. & Maybank, S. 2008, 'Bayesian Tensor Approach for 3-D Face Modeling', IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 10, pp. 1397-1410.
Effectively modeling a collection of three-dimensional (3-D) faces is an important task in various applications, especially facial expression-driven ones, e.g., expression generation, retargeting, and synthesis. These 3-D faces naturally form a set of se
Li, N., Chen, C., Wang, Q., Song, M., Tao, D. & Li, X. 2008, 'Avatar motion control by natural body movement via camera', Neurocomputing, vol. 72, no. 1-3, pp. 648-652.
With the popularity of cameras and the rapid development of computer vision technology, vision-based HCI is attracting extensive interest. In this paper, we present a system for controlling avatars by natural body movement via a single web-camera. A pose da
Gao, X., Xiao, B., Tao, D. & Li, X. 2008, 'Image categorization: Graph edit distance + edge direction histogram', Pattern Recognition, vol. 41, no. 10, pp. 3179-3191.
This paper presents a novel algorithm for computing graph edit distance (GED) in image categorization. This algorithm is purely structural, i.e., it needs only connectivity structure of the graph and does not draw on node or edge attributes. There are tw
Li, J., Li, X. & Tao, D. 2008, 'KPCA for semantic object extraction in images', Pattern Recognition, vol. 41, no. 10, pp. 3244-3250.
In this paper, we kernelize conventional clustering algorithms from a novel point of view. Based on the fully mathematical proof, we first demonstrate that kernel KMeans (KKMeans) is equivalent to kernel principal component analysis (KPCA) prior to the c
Xiao, B., Gao, X., Tao, D. & Li, X. 2008, 'HMM-based graph edit distance for image indexing', International Journal of Imaging Systems and Technology, vol. 18, no. 2-3, pp. 209-218.
Most of the existing graph edit distance (GED) algorithms require cost functions which are difficult to be defined exactly. In this article, we propose a cost function free algorithm for computing GED. It only depends on the distribution of nodes rather
Xu, D., Yan, S., Tao, D., Lin, S. & Zhang, H. 2007, 'Marginal Fisher analysis and its variants for human gait recognition and content-based image retrieval', IEEE Transactions On Image Processing, vol. 16, no. 11, pp. 2811-2821.
Dimensionality reduction algorithms, which aim to select a small set of efficient and discriminant features, have attracted great attention for human gait recognition and content-based image retrieval (CBIR). In this paper, we present extensions of our r
Tao, D., Li, X., Wu, X. & Maybank, S. 2007, 'General tensor discriminant analysis and Gabor features for gait recognition', IEEE Transactions On Pattern Analysis And Machine Intelligence, vol. 29, no. 10, pp. 1700-1715.
Traditional image representations are not suited to conventional classification methods such as the linear discriminant analysis (LDA) because of the under sample problem (USP): the dimensionality of the feature space is much higher than the number of tr
Tao, D., Li, X., Wu, X., Hu, W. & Maybank, S. 2007, 'Supervised tensor learning', Knowledge And Information Systems, vol. 13, no. 1, pp. 1-42.
Tensor representation is helpful to reduce the small sample size problem in discriminative subspace selection. As pointed out in this paper, this is mainly because the structure information of objects in computer vision research is a reasonable constraint to
Tao, D., Li, X. & Maybank, S. 2007, 'Negative samples analysis in relevance feedback', IEEE Transactions On Knowledge And Data Engineering, vol. 19, no. 4, pp. 568-580.
Recently, relevance feedback (RF) in content-based image retrieval (CBIR) has been implemented as an online binary classifier to separate the positive samples from the negative samples, where both sets of samples are labeled by the user. In many applicat
Gao, X., Li, J., Tao, D. & Li, X. 2007, 'Fuzziness Measurement Of Fuzzy Sets And Its Application In Cluster Validity Analysis', International Journal of Fuzzy Systems, vol. 9, no. 4, pp. 188-197.
To measure the fuzziness of fuzzy sets, this paper introduces distance-based and fuzzy entropy-based measurements. Then these measurements are generalized to measure the fuzziness of fuzzy partition, namely partition fuzziness. According to the relat
Tao, D., Tang, X., Li, X. & Rui, Y. 2006, 'Direct kernel biased discriminant analysis: A new content-based image retrieval relevance feedback algorithm', IEEE Transactions On Multimedia, vol. 8, no. 4, pp. 716-727.