
Associate Professor Jian Zhang

Biography

Dr. Jian Zhang is an associate professor in the Faculty of Engineering and IT and research leader of the Multimedia and Media Analytics Program in the UTS Advanced Analytics Institute (AAI). Dr. Zhang earned his PhD from the School of Information Technology and Electrical Engineering, UNSW@ADFA (Australian Defence Force Academy), at the University of New South Wales in 1999.

Key Research Projects at UTS:

1. Microsoft External Collaboration Project (Pilot funded project): Advanced 3D Deformable Surface Reconstruction and Tracking through RGB-D Cameras. The aim of this project is to develop novel computer vision technology for real time modelling and tracking of 3D dense and deformable surfaces using general RGB-D cameras. The expected outcomes of this project will add significant value to the current RGB-D camera platform when applied in the common scenario in which the RGB-D camera does not move but the deformable objects of interest are moving.

2. Nokia External Collaboration Project (Pilot funded project): Large Scale 3D Image Processing. This project aims to develop a novel algorithm for 3D image registration with different point clouds over 3D space. Our research outcome is a critical technology for Nokia’s mobile phone application.

PhD scholarships are available to fund high-profile PhD candidates in the following areas:
  • Image processing & pattern recognition
  • Multimedia information retrieval
  • Social multimedia signal processing
  • 2D/3D Computer vision
  • Surveillance video content analysis
  • Multimedia and new media Analytics 

From January 2004 to July 2011, Dr Zhang was a Principal Researcher with National ICT Australia (NICTA) and a Conjoint Associate Professor in the School of Computer Science and Engineering at the University of New South Wales, where he was research leader of Multimedia and Video Communications Research at the NICTA Sydney Lab on the UNSW Kensington campus. He led several NICTA research projects on traffic sensing and surveillance, video content analysis and management for surveillance, and robust automated video surveillance for maritime security. All of these projects are in the areas of computer vision, multimedia content analysis and management, and multimedia content indexing and query.

From June 1997 to December 2003, Dr Zhang was with the Visual Information Processing Lab, Motorola Labs in Sydney, as a senior research engineer, and later became a principal research engineer and foundation manager of the Visual Communications Research Team, Motorola Labs in Sydney, Australia.

Professional

Jian Zhang is an IEEE Senior Member and a member of the Multimedia Signal Processing Technical Committee of the IEEE Signal Processing Society. He has served as Technical Program Chair of the 2008 IEEE Multimedia Signal Processing Workshop, and as Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT) and of the EURASIP Journal on Image and Video Processing. Dr Zhang was Guest Editor of the T-CSVT Special Issue (March 2007) on the Convergence of Knowledge Engineering Semantics and Signal Processing in Audiovisual Information Retrieval. As General Co-Chair, he chaired the International Conference on Multimedia and Expo (ICME 2012) in Melbourne, Australia, in 2012.

Professional Activities

  • Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT)
  • Associate Editor of EURASIP Journal on Image and Video Processing (EURASIP JIVP)
  • Senior member of the IEEE and its Communications, Computer, and Signal Processing Societies.
  • Member of Multimedia Signal Processing Technical Committee, IEEE Signal Processing Society
  • Area Chair of 2011 IEEE International Conference on Image Processing (ICIP2011)
  • Special Session Chair of 2010 International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS2010)
  • General Co-Chair of 2010 Digital Image Computing: Techniques and Applications (DICTA2010)
  • Publicity Chairs of 2010 IEEE International Conference on Multimedia and Expo (ICME2010)
  • Asia-Pacific Liaison Chair of Visual Communications and Image Processing (VCIP2010)
  • Technical Co-Chairs of 2008 IEEE Multimedia Signal Processing Workshop (MMSP08)
  • General Co-chair of 2012 IEEE International Conference on Multimedia and Expo (ICME 2012)
  • Technical Program Co-chair of 2014 IEEE Visual Communications and Image Processing (VCIP 2014)
Associate Professor, A/DRsch Advanced Analytics Institute
Core Member, Advanced Analytics Institute
Mas. of Sci, Doc.of Philosophy
 
Phone
+61 2 9514 3829
Room
CB11.07.302

Research Interests

Jian Zhang's research interests include multimedia content management, video understanding, and video coding and communication. Multimedia content management provides advanced algorithms to manage, search, and retrieve rich multimedia content. Video understanding focuses on automatic or semi-automatic extraction of semantic information from video sequences. Video coding and communication targets efficient methods for video compression and robust transmission. His research output includes more than 100 papers, book chapters, patents and technical reports; he is co-author of more than ten patents filed in the US, UK, Japan and Australia, including five issued US patents.

Can supervise: Yes

UTS Short Course: Multimedia Analytics

Conference Papers

Wang, S., Zhang, J. & Miao, Z. 2013, 'A New Edge Feature for Head-Shoulder Detection', IEEE International Conference on Image Processing, Melbourne, Australia, September 2013 in 2013 IEEE International Conference on Image Processing, ed Brian Lovell, David Suter, David Taubman and Min Wu, Piscataway, NJ, USA, pp. 2822-2826.
In this work, we introduce a new edge feature to improve head-shoulder detection performance. Since head-shoulder detection is highly vulnerable to vague contours, our new edge feature is designed to extract and enhance the head-shoulder contour and suppress the other contours. The basic idea is that the head-shoulder contour can be predicted by filtering the edge image with edge patterns, which are generated from edge fragments through a learning process. This edge feature, known as En-Contour, can significantly enhance object contours such as the human head and shoulder. To evaluate the performance of the new En-Contour, we combine it with HOG+LBP [1] as HOG+LBP+En-Contour. HOG+LBP is the state-of-the-art feature in pedestrian detection. Because human head-shoulder detection is a special case of pedestrian detection, we also use it as our baseline. Our experiments indicate that this new feature significantly improves on HOG+LBP.
Xu, J., Wu, Q., Zhang, J., Shen, F. & Tang, Z. 2013, 'Training boosting-like algorithms with', Melbourne, Australia, September 2013 in 2013 IEEE International Conference on Image Processing, ed Brian Lovell and David Suter, IEEE, Melbourne, Australia, pp. 4302-4306.
Boosting algorithms have attracted great attention since the first real-time face detector by Viola & Jones through feature selection and strong classifier learning simultaneously. On the other hand, researchers have proposed to decouple such two procedures to improve the performance of Boosting algorithms. Motivated by this, we propose a boosting-like algorithm framework by embedding semi-supervised subspace learning methods. It selects weak classifiers based on class-separability. Combination weights of selected weak classifiers can be obtained by subspace learning. Three typical algorithms are proposed under this framework and evaluated on public data sets. As shown by our experimental results, the proposed methods obtain superior performances over their supervised counterparts and AdaBoost.
Kusakunniran, W., Satoh, S., Zhang, J. & Wu, Q. 2013, 'Attribute-based learning for large scale object classification', San Jose, California, USA, July 2013 in 2013 IEEE International Conference on Multimedia and Expo, ed Anup Basu, Nam Ling, Sethuraman (Panch) Panchanathan, IEEE, San Jose, California, USA, pp. 1-6.
Scalability to large numbers of classes is an important challenge for multi-class classification. It can often be computationally infeasible at test phase when class prediction is performed by using every possible classifier trained for each individual class. This paper proposes an attribute-based learning method to overcome this limitation. First is to define attributes and their associations with object classes automatically and simultaneously. Such associations are learned based on greedy strategy under certain conditions. Second is to learn a classifier for each attribute instead of each class. Then, these trained classifiers are used to predict classes based on their attribute representations. The proposed method also allows trade-off between test-time complexity (which grows linearly with the number of attributes) and accuracy. Experiments based on Animals-with-Attributes and ILSVRC2010 datasets have shown that the performance of our method is promising when compared with the state-of-the-art.
Wang, S., Miao, Z. & Zhang, J. 2013, 'Simultaneously detect and segment pedestrian', IEEE International Conference on Multimedia and Expo, San Jose, USA, July 2013 in 2013 IEEE International Conference on Multimedia and Expo, ed Anup Basu, Nam Ling, Sethuraman (Panch) Panchanathan, IEEE, USA, pp. 1-4.
We present a framework to simultaneously detect and segment pedestrians in images. Our work is based on a part-based method. We first segment the image into superpixels, then assemble superpixels into body part candidates by comparing the assembled shape with a pre-built template library. A structure-based shape matching algorithm is developed to measure the shape similarity. All the body part candidates are input into our modified AND/OR graph to generate the most reasonable combination. The graph describes the possible variations of body configuration and models the constraint relationships between body parts. We perform comparison experiments on the public database and the results show the effectiveness of our framework.
Shen, Y., Miao, Z. & Zhang, J. 2012, 'Unsupervised Online Learning Trajectory Analysis Based on Weighted Directed Graph', International Conference on Pattern Recognition, Tsukuba, Japan, November 2012 in 2012 21st International Conference on Pattern Recognition (ICPR), ed Jan-Olof Eklundh, Yuichi Ohta, Steven Tanimoto, IEEE, USA, pp. 1306-1309.
In this paper, we propose a novel unsupervised online learning trajectory analysis method based on weighted directed graph. Each trajectory can be represented as a sequence of key points. In the training stage, unsupervised expectation-maximization algorithm (EM) is applied for training data to cluster key points. Each class is a Gaussian distribution. It is considered as a node of the graph. According to the classification of key points, we can build a weighted directed graph to represent the trajectory network in the scene. Each path is a category of trajectories. In the test stage, we adopt online EM algorithm to classify trajectories and update the graph. In the experiments, we test our approach and obtain a good performance compared with state-of-the-art approaches.
Zhang, J., Lu, S., Mei, T., Wang, J., Wang, Z., Feng, D., Sun, J. & Li, S. 2012, 'Browse-to-search', 2012 ACM Multimedia Conference, Nara, Japan, October 2012 in Browse-to-search, ed Noboru Babaguchi, Kiyoharu Aizawa and John Smith, ACM, USA, pp. 1323-1324.
Zhang, J., Wu, Y., Lu, S., Mei, T. & Li, S. 2012, 'Local visual words coding for low bit rate mobile visual search', 2012 ACM Multimedia Conference, Nara, Japan, October 2012 in Local visual words coding for low bit rate mobile visual search, ed Noboru Babaguchi, Kiyoharu Aizawa and John Smith, ACM, USA, pp. 989-992.
Mobile visual search has attracted extensive attention for its huge potential for numerous applications. Research on this topic has been focused on two schemes: sending query images, and sending compact descriptors extracted on mobile phones. The first scheme requires about 30-40 KB of data to transmit, while the second can reduce the bit rate by 10 times. In this paper, we propose a third scheme for extremely low bit rate mobile visual search, which sends compressed visual words consisting of vocabulary tree histogram and descriptor orientations rather than descriptors. This scheme can further reduce the bit rate with few extra computational costs on the client. Specifically, we store a vocabulary tree and extract visual descriptors on the mobile client. A light-weight pre-retrieval is performed to obtain the visited leaf nodes in the vocabulary tree. The orientation of each local descriptor and the tree histogram are then encoded to be transmitted to the server. Our new scheme transmits less than 1 KB of data, which reduces the bit rate in the second scheme by 3 times, and obtains about 30% improvement in terms of search accuracy over the traditional Bag-of-Words baseline. The time cost is only 1.5 secs on the client and 240 msecs on the server.
Xu, J., Wu, Q., Zhang, J. & Tang, Z. 2012, 'Object Detection Based on Co-Occurrence GMuLBP Features', 2012 IEEE International Conference on Multimedia and Expo, July 2012 in 2012 IEEE International Conference on Multimedia and Expo, ed Jian Zhang, IEEE Computer Society, Melbourne Australia, pp. 943-948.
Image co-occurrence has shown great power in object classification because it captures the characteristics of individual features and the spatial relationship between them simultaneously. For example, Co-occurrence Histogram of Oriented Gradients (CoHOG) has achieved great success on the human detection task. However, the gradient orientation in CoHOG is sensitive to noise. In addition, CoHOG does not take gradient magnitude into account, which is a key component to reinforce the feature detection. In this paper, we propose a new LBP feature detector based on image co-occurrence. Building on uniform Local Binary Patterns, the new feature detector detects Co-occurrence Orientation through Gradient Magnitude calculation. It is known as CoGMuLBP. An extended version of CoGMuLBP is also presented. The experimental results on the UIUC car data set show that the proposed features outperform state-of-the-art methods.
Zhang, J. & Liu, X. 2011, 'Active Learning for Human Action Recognition with Gaussian Processes', IEEE International Conference on Image Processing, Brussels, Belgium, September 2011 in Proceedings of 2011 International Conference on Image Processing, ed Benoit Macq, Peter Schelkens, Inald Lagendijk, IEEE, USA, pp. 3253-3256.
This paper presents an active learning approach for recognizing human actions in videos based on a multiple kernel combined method. We design the classifier based on Multiple Kernel Learning (MKL) through Gaussian Processes (GP) regression. This classifier is then trained in an active learning approach. In each iteration, one optimal sample is selected to be interactively annotated and incorporated into the training set. The selection of the sample is based on the heuristic feedback of the GP classifier. To our knowledge, GP regression MKL based active learning methods have not yet been applied to human action recognition. We test this approach on standard benchmarks. This approach outperforms the state-of-the-art techniques in accuracy while requiring significantly fewer training samples.
Quek, A., Wang, Z., Zhang, J. & Feng, D. 2011, 'Structural Image Classification with Graph Neural Networks', Noosa, Queensland, Australia, February 2011 in Proceedings of 2011 International Conference on Digital Image Computing - Techniques and Applications, ed Paul Jackway, IEEE, USA, pp. 416-421.
Many approaches to image classification tend to transform an image into an unstructured set of numeric feature vectors obtained globally and/or locally, and as a result lose important relational information between regions. In order to encode the geometric relationships between image regions, we propose a variety of structural image representations that are not specialised for any particular image category. Besides the traditional grid-partitioning and global segmentation methods, we investigate the use of local scale-invariant region detectors. Regions are connected based not only upon nearest-neighbour heuristics, but also upon minimum spanning trees and Delaunay triangulation. In order to maintain the topological and spatial relationships between regions, and also to effectively process undirected connections represented as graphs, we utilise the recently-proposed graph neural network model. To the best of our knowledge, this is the first utilisation of the model to process graph structures based on local-sampling techniques, for the task of image classification. Our experimental results demonstrate great potential for further work in this domain.
Li, Z., Wu, Q., Zhang, J. & Geers, G. 2011, 'SKRWM based descriptor for pedestrian detection in thermal images', Hangzhou, China, October 2011 in 2011 IEEE 13th International Workshop on Multimedia Signal Processing (MMSP), ed Wen Gao; Anthony Vetro; Zhengyou Zhang, IEEE, USA, pp. 1-6.
Pedestrian detection in a thermal image is a difficult task due to intrinsic challenges: 1) low image resolution, 2) thermal noise, 3) polarity changes, 4) lack of color, texture or depth information. To address these challenges, we propose a novel mid-level feature descriptor for pedestrian detection in the thermal domain, which combines pixel-level Steering Kernel Regression Weights Matrix (SKRWM) with their corresponding covariances. SKRWM can properly capture the local structure of pixels, while the covariance computation can further provide the correlation of low-level features. This mid-level feature descriptor not only captures the pixel-level data difference and spatial differences of local structure, but also explores the correlations among low-level features. In the case of human detection, the proposed mid-level feature descriptor can discriminatively distinguish pedestrian from complexity. For testing the performance of the proposed feature descriptor, a popular classifier framework based on Principal Component Analysis (PCA) and Support Vector Machine (SVM) is also built. Overall, our experimental results show that the proposed approach overcomes the problems caused by background subtraction in [1] while attaining detection accuracy comparable to the state of the art.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2011, 'Speed-invariant gait recognition based on Procrustes Shape Analysis using higher-order shape configuration', Brussels, Belgium, September 2011 in 2011 18th IEEE International Conference on Image Processing (ICIP), ed Benoit Macq; Peter Schelkens, IEEE, USA, pp. 545-548.
Walking speed change is considered a typical challenge hindering reliable human gait recognition. This paper proposes a novel method to extract speed-invariant gait features based on Procrustes Shape Analysis (PSA). Two major components of PSA, i.e., Procrustes Mean Shape (PMS) and Procrustes Distance (PD), are adopted and adapted specifically for the purpose of speed-invariant gait recognition. One of our major contributions in this work is that, instead of using the conventional Centroid Shape Configuration (CSC), which is not suitable to describe individual gait when body shape changes particularly due to change of walking speed, we propose a new descriptor named Higher-order derivative Shape Configuration (HSC), which can generate robust speed-invariant gait features. From the first order to the higher order, derivative shape configuration contains gait shape information of different levels. Intuitively, the higher order of derivative is able to describe gait with shape change caused by the larger change of walking speed. Encouraging experimental results show that our proposed method is efficient for speed-invariant gait recognition and evidently outperforms other existing methods in the literature.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2011, 'Pairwise Shape configuration-based PSA for gait recognition under small viewing angle change', The 8th IEEE International Conference Advanced Video and Signal-Based Surveillance, Klagenfurt, Austria, August 2011 in 2011 8th IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS), ed Gian Luca Foresti; Bernhard Rinner, IEEE, USA, pp. 17-22.
Two main components of Procrustes Shape Analysis (PSA) are adopted and adapted specifically to address gait recognition under small viewing angle change: 1) Procrustes Mean Shape (PMS) for gait signature description; 2) Procrustes Distance (PD) for similarity measurement. Pairwise Shape Configuration (PSC) is proposed as a shape descriptor in place of the existing Centroid Shape Configuration (CSC) in conventional PSA. PSC can better tolerate shape change caused by viewing angle change than CSC. Small variation of viewing angle makes a large impact only on global gait appearance. Without major impact on local spatio-temporal motion, PSC, which effectively embeds local shape information, can generate robust view-invariant gait features. To enhance gait recognition performance, a novel boundary re-sampling process is proposed. It provides only the necessary re-sampled points to the PSC description. In the meantime, it efficiently solves problems of boundary point correspondence, boundary normalization and boundary smoothness. This re-sampling process adopts prior knowledge of body pose structure. A comprehensive experiment is carried out on the CASIA gait database. The proposed method is shown to significantly improve performance of gait recognition under small viewing angle change without the additional requirements of supervised learning, known viewing angle and multi-camera system, when compared with other methods in the literature.
Li, Z., Zhang, J., Wu, Q. & Geers, G.D. 2010, 'Feature Enhancement Using Gradient Salience on Thermal Image', Digital Image Computing: Techniques and Applications, Sydney, Australia, December 2010 in Proceedings. 2010 Digital Image Computing: Techniques and Applications (DICTA 2010), ed Jian Zhang, Chunhua Shen, Glenn Geers, Qiang Wu, IEEE Computer Society, Sydney, Australia, pp. 556-562.
Feature enhancement in an image reinforces some extracted features so that they can be used for object classification and detection. As thermal images lack texture and color information, the techniques for visual image feature enhancement are insufficient when applied to thermal images. In this paper, we propose a new gradient-based approach for feature enhancement in thermal images. We use the statistical properties of the gradient of foreground object profiles, and formulate object features with gradient saliency. Empirical evaluation of the proposed approach shows significant performance improvement on human contours, which can be used for detection and classification.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2010, 'Multi-view Gait Recognition Based on Motion Regression using Multilayer Perceptron', International Conference Pattern Recognition, Istanbul, Turkey, August 2010 in Proceedings: 2010 20th International Conference Pattern Recognition (ICPR 2010), ed Müjdat Çetin, Kim Boyer and Seong-Whan Lee - ICPR 2010 Technical Program Chairs, IEEE Computer Society, Istanbul, Turkey, pp. 2186-2189.
It has been shown that gait is an efficient biometric feature for identifying a person at a distance. However, it is a challenging problem to obtain reliable gait feature when viewing angle changes because the body appearance can be different under the various viewing angles. In this paper, the problem above is formulated as a regression problem where a novel View Transformation Model (VTM) is constructed by adopting Multilayer Perceptron (MLP) as regression tool. It smoothly estimates gait feature under an unknown viewing angle based on motion information in a well selected Region of Interest (ROI) under other existing viewing angles. Thus, this proposal can normalize gait features under various viewing angles into a common viewing angle before gait similarity measurement is carried out. Encouraging experimental results have been obtained based on widely adopted benchmark database.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2010, 'Support Vector Regression for Multi-view Gait Recognition Based on Local Motion Feature Selection', IEEE Conference on Computer Vision and Pattern Recognition, San Francisco CA, USA, June 2010 in 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), ed Trevor Darrell; David Hogg; David Jacobs, IEEE Computer Society, Piscataway, USA, pp. 974-981.
Gait is a well recognized biometric feature that is used to identify a human at a distance. However, in real environment, appearance changes of individuals due to viewing angle changes cause many difficulties for gait recognition. This paper re-formulates this problem as a regression problem. A novel solution is proposed to create a View Transformation Model (VTM) from the different point of view using Support Vector Regression (SVR). To facilitate the process of regression, a new method is proposed to seek local Region of Interest (ROI) under one viewing angle for predicting the corresponding motion information under another viewing angle. Thus, the well constructed VTM is able to transfer gait information under one viewing angle into another viewing angle. This proposal can achieve view-independent gait recognition. It normalizes gait features under various viewing angles into a common viewing angle before similarity measurement is carried out. The extensive experimental results based on widely adopted benchmark dataset demonstrate that the proposed algorithm can achieve significantly better performance than the existing methods in literature.
Kusakunniran, W., Wu, Q., Li, H. & Zhang, J. 2009, 'Automatic gait recognition using weighted binary pattern on video', Advanced Video and Signal Based Surveillance, Genoa, Italy, September 2009 in Proceedings of Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, ed Tubaro, S, IEEE Computer Society, USA, pp. 49-54.
Human identification by recognizing the spontaneous gait recorded in a real-world setting is a tough and not yet fully resolved problem in biometrics research. Several issues have contributed to the difficulties of this task. They include various poses, different clothes, moderate to large changes of normal walking manner due to carrying diverse goods when walking, and the uncertainty of the environments where the people are walking. In order to achieve better gait recognition, this paper proposes a new method based on Weighted Binary Pattern (WBP). WBP first constructs a binary pattern from a sequence of aligned silhouettes. Then, an adaptive weighting technique is applied to discriminate the significance of the bits in gait signatures. Compared with most existing methods in the literature, this method can better deal with gait frequency, local spatial-temporal human pose features, and global body shape statistics. The proposed method is validated on several well known benchmark databases. The extensive and encouraging experimental results show that the proposed algorithm achieves high accuracy with low complexity and computational time.
Kusakunniran, W., Wu, Q., Li, H. & Zhang, J. 2009, 'Multiple Views Gait Recognition using View Transformation Model Based on Optimized Gait Energy Image', IEEE International Conference on Computer Vision Workshops, Kyoto, Japan, September 2009 in Proceedings of 2009 IEEE 12th International Conference on Computer Vision Workshops, ed Cipolla, R, IEEE, USA, pp. 1058-1064.
Gait is one of the well recognized biometrics that have been widely used for human identification. However, current gait recognition might have difficulties when the viewing angle changes. This is because the viewing angle under which the gait signature database was generated may not be the same as the viewing angle when the probe data are obtained. This paper proposes a new multi-view gait recognition approach which tackles the problems mentioned above. Different from other approaches of the same category, this new method creates a so-called View Transformation Model (VTM) based on spatial-domain Gait Energy Image (GEI) by adopting the Singular Value Decomposition (SVD) technique. To further improve the performance of the proposed VTM, Linear Discriminant Analysis (LDA) is used to optimize the obtained GEI feature vectors. When implementing SVD there are a few practical problems such as large matrix size and over-fitting. In this paper, reduced SVD is introduced to alleviate the effects caused by these problems. Using the generated VTM, the viewing angles of gallery gait data and probe gait data can be transformed into the same direction. Thus, gait signatures can be measured without difficulties. The extensive experiments show that the proposed algorithm can significantly improve multiple-view gait recognition performance when compared to similar methods in the literature.

Journal Articles

Liu, X., Yin, J., Wang, L., Liu, L., Liu, J., Hou, C. & Zhang, J. 2013, 'An Adaptive Approach To Learning Optimal Neighborhood Kernels', IEEE Transactions on Cybernetics, vol. 43, no. 1, pp. 371-384.
Learning an optimal kernel plays a pivotal role in kernel-based methods. Recently, an approach called optimal neighborhood kernel learning (ONKL) has been proposed, showing promising classification performance. It assumes that the optimal kernel will reside in the neighborhood of a pre-specified kernel. Nevertheless, how to specify such a kernel in a principled way remains unclear. To solve this issue, this paper treats the pre-specified kernel as an extra variable and jointly learns it with the optimal neighborhood kernel and the structure parameters of support vector machines. To avoid trivial solutions, we constrain the pre-specified kernel with a parameterized model. We first discuss the characteristics of our approach and in particular highlight its adaptivity. After that, two instantiations are demonstrated by modeling the pre-specified kernel as a common Gaussian radial basis function kernel and a linear combination of a set of base kernels in the way of multiple kernel learning (MKL), respectively. We show that the optimization in our approach is a min-max problem and can be efficiently solved by employing the extended level method and Nesterov's method. Also, we give the probabilistic interpretation for our approach and apply it to explain the existing kernel learning methods, providing another perspective for their commonness and differences. Comprehensive experimental results on 13 UCI data sets and another two real-world data sets show that via the joint learning process, our approach not only adaptively identifies the pre-specified kernel, but also achieves superior classification performance to the original ONKL and the related MKL algorithms.
Liu, X., Wang, L., Yin, J., Zhu, E. & Zhang, J. 2013, 'An Efficient Approach To Integrating Radius Information Into Multiple Kernel Learning', IEEE Transactions on Cybernetics, vol. 43, no. 2, pp. 557-569.
Integrating radius information has been demonstrated by recent work on multiple kernel learning (MKL) as a promising way to improve kernel learning performance. Directly integrating the radius of the minimum enclosing ball (MEB) into MKL as it is, however, not only incurs significant computational overhead but also possibly adversely affects the kernel learning performance due to the notorious sensitivity of this radius to outliers. Inspired by the relationship between the radius of the MEB and the trace of total data scattering matrix, this paper proposes to incorporate the latter into MKL to improve the situation. In particular, in order to well justify the incorporation of radius information, we strictly comply with the radius-margin bound of support vector machines (SVMs) and thus focus on the l2-norm soft-margin SVM classifier. Detailed theoretical analysis is conducted to show how the proposed approach effectively preserves the merits of incorporating the radius of the MEB and how the resulting optimization is efficiently solved. Moreover, the proposed approach achieves the following advantages over its counterparts: 1) more robust in the presence of outliers or noisy training samples; 2) more computationally efficient by avoiding the quadratic optimization for computing the radius at each iteration; and 3) readily solvable by the existing off-the-shelf MKL packages. Comprehensive experiments are conducted on University of California, Irvine, protein subcellular localization, and Caltech-101 data sets, and the results well demonstrate the effectiveness and efficiency of our approach.
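The quantity that this abstract substitutes for the MEB radius, the trace of the total data scattering matrix, can be computed from a kernel matrix alone. The sketch below relies on the standard identity tr(S_t) = tr(K) - (1/n) 1'K1 for the unnormalized scatter matrix in feature space; the function name scatter_trace is hypothetical, and how the paper normalizes this term or places it in the MKL objective is not stated in the abstract.

```python
import numpy as np

def scatter_trace(K):
    """Trace of the total scatter matrix in feature space (illustrative).

    K is an (n, n) kernel (Gram) matrix. The result equals the sum of
    squared distances of the mapped samples to their mean, computed
    without an explicit feature map: tr(K) - sum(K) / n.
    """
    n = K.shape[0]
    return np.trace(K) - K.sum() / n
```

Because the expression is linear in K, its value for a weighted combination of base kernels is the same weighted combination of the per-kernel values, so the per-kernel traces can be computed once up front. This is consistent with the abstract's point that no quadratic optimization for the radius is needed at each iteration.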
Xin, J., Chen, K., Bai, L., Liu, D. & Zhang, J. 2013, 'Depth Adaptive Zooming Visual Servoing For A Robot With A Zooming Camera', International Journal of Advanced Robotic Systems, vol. 10, no. 1, pp. 1-11.
To solve the view visibility problem and keep the observed object in the field of view (FOV) during visual servoing, a depth adaptive zooming visual servoing strategy for a manipulator robot with a zooming camera is proposed. Firstly, a zoom control mechanism is introduced into the robot visual servoing system. It can dynamically adjust the camera's field of view to keep all the feature points on the object in the field of view of the camera and get high object local resolution at the end of visual servoing. Secondly, an invariant visual servoing method is employed to control the robot to the desired position under the changing intrinsic parameters of the camera. Finally, a nonlinear depth adaptive estimation scheme in the invariant space using Lyapunov stability theory is proposed to estimate adaptively the depth of the image features on the object. Three kinds of robot 4DOF visual positioning simulation experiments are conducted. The simulation experiment results show that the proposed approach has higher positioning precision.
Lu, S., Zhang, J., Wang, Z. & Feng, D. 2013, 'Fast Human Action Classification And VOI Localization With Enhanced Sparse Coding', Journal of Visual Communication, vol. 24, no. 2, pp. 127-136.
Sparse coding, which encodes the natural visual signal into a sparse space for visual codebook generation and feature quantization, has been successfully utilized for many image classification applications. However, it has seldom been explored for video analysis tasks. In particular, the increased complexity in characterizing the visual patterns of diverse human actions with both spatial and temporal variations imposes more challenges on the conventional sparse coding scheme. In this paper, we propose an enhanced sparse coding scheme through learning a discriminative dictionary and optimizing the local pooling strategy. Localizing when and where a specific action happens in realistic videos is another challenging task. By utilizing the sparse coding based representations of human actions, this paper further presents a novel coarse-to-fine framework to localize the Volumes of Interest (VOIs) for the actions. Firstly, local visual features are transformed into the sparse signal domain through our enhanced sparse coding scheme. Secondly, in order to avoid an exhaustive scan of entire videos for the VOI localization, we extend Spatial Pyramid Matching into the temporal domain, namely Spatial Temporal Pyramid Matching, to obtain the VOI candidates. Finally, a multi-level branch-and-bound approach is developed to refine the VOI candidates. The proposed framework is also able to avoid prohibitive computations in local similarity matching (e.g., nearest neighbors voting). Experimental results on two popular benchmark datasets (KTH and YouTube UCF) and the widely used localization dataset (MSR) demonstrate that our approach reduces computational cost significantly while maintaining comparable classification accuracy to that of the state-of-the-art methods.
Song, Y., Zhang, J., Cao, L. & Sangeux, M. 2013, 'On Discovering the Correlated Relationship between Static and Dynamic Data in Clinical Gait Analysis', Lecture Notes in Computer Science, vol. 8190, no. 1, pp. 563-578.
'Gait' is a person's manner of walking. Patients may have an abnormal gait due to a range of physical impairment or brain damage. Clinical gait analysis (CGA) is a technique for identifying the underlying impairments that affect a patient's gait pattern. The CGA is critical for treatment planning. Essentially, CGA tries to use patients' physical examination results, known as static data, to interpret the dynamic characteristics in an abnormal gait, known as dynamic data. This process is carried out by gait analysis experts, mainly based on their experience, which may lead to subjective diagnoses. To facilitate the automation of this process and form a relatively objective diagnosis, this paper proposes a new probabilistic correlated static-dynamic model (CSDM) to discover correlated relationships between the dynamic characteristics of gait and their root cause in the static data space. We propose an EM-based algorithm to learn the parameters of the CSDM. One of the main advantages of the CSDM is its ability to provide intuitive knowledge. For example, the CSDM can describe what kinds of static data will lead to what kinds of hidden gait patterns in the form of a decision tree, which helps us to infer dynamic characteristics based on static data. Our initial experiments indicate that the CSDM is promising for discovering the correlated relationship between physical examination (static) and gait (dynamic) data.
Zhang, J., Wu, Q., Kusakunniran, W., Ma, Y. & Li, H. 2013, 'A New View-Invariant Feature for Cross-View Gait Recognition', IEEE Transactions on Information Forensics and Security, vol. 8, no. 10, pp. 1642-1653.
Human gait is an important biometric feature which is able to identify a person remotely. However, change of view causes significant difficulties for recognizing gaits. This paper proposes a new framework to construct a new view-invariant feature for cross-view gait recognition. Our view-normalization process is performed in the input layer (i.e., on gait silhouettes) to normalize gaits from arbitrary views. That is, each sequence of gait silhouettes recorded from a certain view is transformed onto the common canonical view by using corresponding domain transformation obtained through invariant low-rank textures (TILTs). Then, an improved scheme of procrustes shape analysis (PSA) is proposed and applied on a sequence of the normalized gait silhouettes to extract a novel view-invariant gait feature based on procrustes mean shape (PMS) and consecutively measure a gait similarity based on procrustes distance (PD). Comprehensive experiments were carried out on widely adopted gait databases. It has been shown that the performance of the proposed method is promising when compared with other existing methods in the literature.
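The Procrustes mean shape (PMS) and Procrustes distance (PD) named in this abstract can be illustrated with the classical complex-vector form of Procrustes shape analysis. This is a sketch of the textbook formulation, not the paper's improved scheme: each silhouette boundary is assumed to be resampled to the same number of points and stored as complex numbers x + iy, and the function names are hypothetical.

```python
import numpy as np

def center(z):
    """Remove translation by subtracting the centroid of the boundary points."""
    z = np.asarray(z, dtype=complex)
    return z - z.mean()

def procrustes_mean_shape(shapes):
    """Dominant eigenvector of the sum of u u^H / (u^H u) over centered shapes."""
    S = sum(np.outer(u, u.conj()) / np.vdot(u, u).real
            for u in map(center, shapes))
    # S is Hermitian; eigh returns eigenvalues in ascending order.
    _, V = np.linalg.eigh(S)
    return V[:, -1]

def procrustes_distance(u, v):
    """1 - |u^H v|^2 / (|u|^2 |v|^2): zero when the shapes differ only by
    translation, rotation and scale, and at most one."""
    u, v = center(u), center(v)
    num = abs(np.vdot(u, v)) ** 2
    return 1.0 - num / (np.vdot(u, u).real * np.vdot(v, v).real)
```

In a gait setting, one mean shape would be computed per sequence of normalized silhouettes and two sequences compared through the distance between their mean shapes.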
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2012, 'Cross-view and multi-view gait recognitions based on view transformation model using multi-layer perceptron', Pattern Recognition Letters, vol. 33, pp. 882-889.
Gait has been shown to be an efficient biometric feature for human identification at a distance. However, performance of gait recognition can be affected by view variation. This leads to a consequent difficulty of cross-view gait recognition. A novel method is proposed to solve the above difficulty by using view transformation model (VTM). VTM is constructed based on regression processes by adopting multi-layer perceptron (MLP) as a regression tool. VTM estimates gait feature from one view using a well selected region of interest (ROI) on gait feature from another view. Thus, trained VTMs can normalize gait features from across views into the same view before gait similarity is measured. Moreover, this paper proposes a new multi-view gait recognition which estimates gait feature on one view using selected gait features from several other views. Extensive experimental results demonstrate that the proposed method significantly outperforms other baseline methods in literature for both cross-view and multi-view gait recognitions. In our experiments, particularly, average accuracies of 99%, 98% and 93% are achieved for multiple views gait recognition by using 5 cameras, 4 cameras and 3 cameras respectively.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2012, 'Gait Recognition Under Various Viewing Angles Based On Correlated Motion Regression', IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 6, pp. 966-980.
It is well recognized that gait is an important biometric feature to identify a person at a distance, e. g., in video surveillance application. However, in reality, change of viewing angle causes significant challenge for gait recognition. A novel approa
Thi, T., Cheng, L., Zhang, J., Wang, L. & Satoh, S. 2012, 'Integrating local action elements for action analysis', Computer Vision and Image Understanding, vol. 116, no. 3, pp. 378-395.
In this paper, we propose a framework for human action analysis from video footage. A video action sequence in our perspective is a dynamic structure of sparse local spatial-temporal patches termed action elements, so the problems of action analysis in video are carried out here based on the set of local characteristics as well as the global shape of a prescribed action. We first detect a set of action elements that are the most compact entities of an action, then we extend the idea of the Implicit Shape Model to space time, in order to properly integrate the spatial and temporal properties of these action elements. In particular, we consider two different recipes to construct action elements: one uses a Sparse Bayesian Feature Classifier to choose action elements from all detected Spatial Temporal Interest Points, and is termed discriminative action elements. The other detects affine invariant local features from the holistic Motion History Images, picks action elements according to their compactness scores, and is called generative action elements. Action elements detected in either way are then used to construct a voting space based on their local feature representations as well as their global configuration constraints. Our approach is evaluated in the two main contexts of current human action analysis challenges, action retrieval and action classification. Comprehensive experimental results show that our proposed framework marginally outperforms all existing state-of-the-art techniques on a range of different datasets.
Xu, J., Wu, Q., Zhang, J. & Tang, Z. 2012, 'Fast and Accurate Human Detection Using a Cascade of Boosted MS-LBP Features', IEEE Signal Processing Letters, vol. 19, no. 10, pp. 676-679.
In this letter, a new scheme for generating local binary patterns (LBP) is presented. This Modified Symmetric LBP (MS-LBP) feature takes advantage of LBP and gradient features. It is then applied in a boosted cascade framework for human detection. By combining MS-LBP with Haar-like features in the boosted framework, the performances of heterogeneous-feature-based detectors are evaluated for the best trade-off between accuracy and speed. Two feature training schemes, namely the Single AdaBoost Training Scheme (SATS) and the Dual AdaBoost Training Scheme (DATS), are proposed and compared. On top of AdaBoost, two multidimensional feature projection methods are described. A comprehensive experiment is presented. Apart from obtaining higher detection accuracy, the detection speed based on DATS is 17 times faster than the HOG method.
Thi, T., Cheng, L., Zhang, J., Wang, L. & Satoh, S. 2012, 'Structured learning of local features for human action classification and localization', Image & Vision Computing, vol. 30, no. 1, pp. 1-14.
Human action recognition is a promising yet non-trivial computer vision field with many potential applications. Current advances in bag-of-feature approaches have brought significant insights into recognizing human actions within complex context. It is, however, a common practice in literature to consider action as merely an orderless set of local salient features. This representation has been shown to be oversimplified, which inherently limits traditional approaches from robust deployment in real-life scenarios. In this work, we propose and show that, by taking into account global configuration of local features, we can greatly improve recognition performance. We first introduce a novel feature selection process called Sparse Hierarchical Bayes Filter to select only the most contributive features of each action type based on neighboring structure constraints. We then present the application of structured learning in human action analysis. That is, by representing human action as a complex set of local features, we can incorporate different spatial and temporal feature constraints into the learning tasks of human action classification and localization. In particular, we tackle the problem of action localization in video using structured learning with two alternatives: one is Dynamic Conditional Random Field from probabilistic perspective; the other is Structural Support Vector Machine from max-margin point of view. We evaluate our modular classification-localization framework on various testbeds, in which our proposed framework is proven to be highly effective and robust compared against bag-of-feature methods.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2012, 'Gait Recognition across Various Walking Speeds using Higher-order Shape Configuration based on Differential Composition Model', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 6, pp. 1654-1668.
Gait has been known as an effective biometric feature to identify a person at a distance. However, variation of walking speeds may lead to significant changes to human walking patterns. It causes many difficulties for gait recognition. A comprehensive analysis has been carried out in this paper to identify such effects. Based on the analysis, Procrustes shape analysis is adopted for gait signature description and relevant similarity measurement. To tackle the challenges raised by speed change, this paper proposes a higher order shape configuration for gait shape description, which deliberately conserves discriminative information in the gait signatures and is still able to tolerate the varying walking speed. Instead of simply measuring the similarity between two gaits by treating them as two unified objects, a differential composition model (DCM) is constructed. The DCM differentiates the different effects caused by walking speed changes on various human body parts. In the meantime, it also balances well the different discriminabilities of each body part on the overall gait similarity measurements. In this model, the Fisher discriminant ratio is adopted to calculate weights for each body part. Comprehensive experiments based on widely adopted gait databases demonstrate that our proposed method is efficient for cross-speed gait recognition and outperforms other state-of-the-art methods.
Zhang, J., Li, N., Yang, Q. & Hu, C. 2012, 'Self-adaptive Chaotic Differential Evolution Algorithm for Solving Constrained Circular Packing Problem', Journal of Computational Information Systems, vol. 8, no. 18, pp. 7747-7755.
Packing circles into a circular container with an equilibrium constraint is an NP-hard layout optimization problem. It has broad application in engineering. This paper studies a two-dimensional constrained packing problem. Classical differential evolution for solving this problem easily falls into local optima. An adaptive chaotic differential evolution algorithm is proposed in this paper to improve the performance. The weighting parameters are dynamically adjusted by chaotic mutation in the searching procedure. The penalty factors of the fitness function are modified during iteration. To keep the diversity of the population, we limit the population's concentration. To enhance the local search capability, we adopt adaptive mutation of the global optimal individual. The improved algorithm can maintain the basic algorithm's structure as well as extend the searching scales, and can hold the diversity of the population as well as increase the searching accuracy. Furthermore, our improved algorithm can escape premature convergence and speed up convergence. Numerical examples indicate the effectiveness and efficiency of the proposed algorithm.
Shen, C., Paisitkriangkrai, S. & Zhang, J. 2011, 'Efficiently Learning a Detection Cascade with Sparse Eigenvectors', IEEE Transactions On Image Processing, vol. 19, no. 7, pp. 22-35.
Real-time object detection has many computer vision applications. Since Viola and Jones proposed the first real-time AdaBoost based face detection system, much effort has been spent on improving the boosting method. In this work, we first show that feature selection methods other than boosting can also be used for training an efficient object detector. In particular, we introduce greedy sparse linear discriminant analysis (GSLDA) for its conceptual simplicity and computational efficiency; and slightly better detection performance is achieved compared with . Moreover, we propose a new technique, termed boosted greedy sparse linear discriminant analysis (BGSLDA), to efficiently train a detection cascade. BGSLDA exploits the sample reweighting property of boosting and the class-separability criterion of GSLDA. Experiments in the domain of highly skewed data distributions (e.g., face detection) demonstrate that classifiers trained with the proposed BGSLDA outperform AdaBoost and its variants. This finding provides a significant opportunity to argue that AdaBoost and similar approaches are not the only methods that can achieve high detection results for real-time object detection.
Paisitkriangkrai, S., Shen, C. & Zhang, J. 2011, 'Incremental Training of a Detector Using Online Sparse Eigendecomposition', IEEE Transactions On Image Processing, vol. 20, no. 1, pp. 213-226.
The ability to efficiently and accurately detect objects plays a very crucial role for many computer vision tasks. Recently, offline object detectors have shown tremendous success. However, one major drawback of offline techniques is that a complete set of training data has to be collected beforehand. In addition, once learned, an offline detector cannot make use of newly arriving data. To alleviate these drawbacks, online learning has been adopted with the following objectives: 1) the technique should be computationally and storage efficient; 2) the updated classifier must maintain its high classification accuracy. In this paper, we propose an effective and efficient framework for learning an adaptive online greedy sparse linear discriminant analysis model. Unlike many existing online boosting detectors, which usually apply exponential or logistic loss, our online algorithm makes use of linear discriminant analysis' learning criterion that not only aims to maximize the class-separation criterion but also incorporates the asymmetrical property of training data distributions. We provide a better alternative for online boosting algorithms in the context of training a visual object detector. We demonstrate the robustness and efficiency of our methods on handwritten digit and face data sets. Our results confirm that object detection tasks benefit significantly when trained in an online manner.

Microsoft Research: Microsoft Corp. One Microsoft Way, Redmond WA 98052-6399, USA

Dr. Zhengyou Zhang

Dr. Philip A. Chou

Dr. Zicheng Liu

Dr. Xian-Sheng Hua

The Collaborative Research Project with MSR US:

1. Microsoft External Collaboration Project (Pilot funded project): Advanced 3D Deformable Surface Reconstruction and Tracking through RGB-D Cameras. The aim of this project is to develop novel computer vision technology for real time modelling and tracking of 3D dense and deformable surfaces using general RGB-D cameras. The expected outcomes of this project will add significant value to the current RGB-D camera platform when applied in the common scenario in which the RGB-D camera does not move but the deformable objects of interest are moving.

-----------------------------------------------------------------------------------------------------

Nokia Research Centre in Finland

Dr. Lixin Fan

The Collaborative Research Project with Nokia Research Centre in Finland:

2. Nokia External Collaboration Project (Pilot funded project): Large Scale 3D Image Processing. This project is to develop a novel algorithm for 3D image registration with different point clouds over 3D space. Our research outcome is a critical technology for Nokia’s mobile phone application.