Associate Professor Jian Zhang

Biography

Dr. Jian Zhang is an Associate Professor in the Faculty of Engineering and IT and research leader of the Multimedia and Media Analytics Program in the UTS Advanced Analytics Institute (AAI). Dr. Zhang earned his PhD from the School of Information Technology and Electrical Engineering, UNSW@ADFA (Australian Defence Force Academy, University of New South Wales) in 1999. His research interests include multimedia content analysis, cross-domain media analysis, computer vision, pattern recognition, social multimedia recommendation, and 3D deformable motion analysis. He has co-authored more than 100 publications, including papers and book chapters, and holds 7 issued patents in the US, UK and China.

From January 2004 to July 2011, Dr. Zhang was a Principal Researcher with National ICT Australia (NICTA) and a Conjoint Associate Professor in the School of Computer Science and Engineering at the University of New South Wales, where he was research leader of Multimedia and Video Communications Research at the NICTA Sydney Lab on the UNSW Kensington campus. He led several NICTA research projects on traffic sensing and surveillance, video content analysis and management for surveillance, and robust automated video surveillance for maritime security.

From June 1997 to December 2003, Dr. Zhang was with the Visual Information Processing Lab, Motorola Labs, Sydney, as a senior research engineer; he later became a principal research engineer and founding manager of the Visual Communications Research Team at Motorola Labs, Sydney, Australia. He completed several technology transfers to Motorola product groups.

http://www.multimediauts.org/~jzhang

Professional

Jian Zhang is an IEEE Senior Member and a member of the Multimedia Signal Processing Technical Committee of the IEEE Signal Processing Society. He was Technical Program Chair of the 2008 IEEE Multimedia Signal Processing Workshop, Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), and Associate Editor of the EURASIP Journal on Image and Video Processing. Dr. Zhang was Guest Editor of the T-CSVT Special Issue (March 2007) on the Convergence of Knowledge Engineering, Semantics and Signal Processing in Audiovisual Information Retrieval. As General Co-Chair, he chaired the IEEE International Conference on Multimedia and Expo (ICME 2012) in Melbourne, Australia, in 2012.

Professional Activities

  • Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT)
  • Associate Editor of the EURASIP Journal on Image and Video Processing (EURASIP JIVP)
  • Senior member of the IEEE and its Communications, Computer, and Signal Processing Societies.
  • Member of Multimedia Signal Processing Technical Committee, IEEE Signal Processing Society
  • Area Chair of 2011 IEEE International Conference on Image Processing (ICIP2011)
  • Special Session Chair of 2010 International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS2010)
  • General Co-Chair of 2010 Digital Image Computing: Techniques and Applications (DICTA2010)
  • Publicity Chair of 2010 IEEE International Conference on Multimedia and Expo (ICME2010)
  • Asia-Pacific Liaison Chair of Visual Communications and Image Processing (VCIP2010)
  • Technical Co-Chair of 2008 IEEE Multimedia Signal Processing Workshop (MMSP08)
  • General Co-Chair of 2012 IEEE International Conference on Multimedia and Expo (ICME 2012)
  • Technical Program Co-chair of 2014 IEEE Visual Communications and Image Processing (VCIP 2014)
Associate Professor, Advanced Analytics Institute
Core Member, Advanced Analytics Institute
Master of Science, Doctor of Philosophy
 
Phone
+61 2 9514 3829
Room
CB11.07.302

Research Interests

PhD Scholarships:

Updated November 2014: I have two full PhD scholarships to fund high-profile PhD candidates in the following areas:

  • Image processing & pattern recognition
  • Multimedia information retrieval
  • Social multimedia signal processing
  • 2D/3D Computer vision
  • Surveillance video content analysis
  • Multimedia and new media Analytics

For international students, the scholarship includes a tuition fee waiver in addition to living expenses. Please contact me for details.

Funded Research Projects:

Funded projects for which I am the lead chief investigator:

“Robust Automated Video Surveillance & Monitoring in Dynamic Scenes”, National ICT Australia (NICTA) - Defence Science and Technology Organization (DSTO) joint project award, $59,000 (2008-2011)

“Visual information enhanced online video search engines”, Microsoft Research Asia (MSRA) funded project, $53,000, (2011)

“Advancing 3D deformable surface reconstruction and tracking through RGB-D cameras”, Microsoft Research funded project, $110,000 (2012-2014)

“Large Scale 3D Image Content Processing”, Nokia Research Centre in Finland funded project, $78,000 (2012-2014)

“Virtual Clothing fitting on Mobile”, Huawei Technologies contracted research project, $90,000 (2013-2014)

“Human detection in local residential area”, Huawei Technologies contracted research project, $70,000 (2012-2013)

Key Research Projects:

1. Microsoft External Collaboration Project (pilot funded project): Advanced 3D Deformable Surface Reconstruction and Tracking through RGB-D Cameras. The aim of this project is to develop novel computer vision technology for real-time modeling and tracking of 3D dense and deformable surfaces using general RGB-D cameras. The expected outcomes will add significant value to the current RGB-D camera platform in the common scenario in which the camera is static but the deformable objects of interest are moving.

2. Nokia External Collaboration Project (pilot funded project): Large Scale 3D Image Processing. This project develops a novel algorithm for 3D image registration across different point clouds in 3D space. Our research outcome is a critical technology for Nokia's mobile phone applications.

My Research Students (as the Principal Supervisor)

  1. Mr. Yucheng Wang – PhD Candidate (funded by Microsoft Research Project and UTS International Research Scholarship)
  2. Mr. Shangrong Huang – PhD Candidate (funded by Microsoft Research Project and UTS International Research Scholarship)
  3. Mr. Hao Cheng – PhD Candidate (funded by Huawei Technologies Research Project and UTS International Research Scholarship)
  4. Mr. Yazhou Yao – PhD Candidate (funded by the Chinese Scholarship Council and UTS International Research Scholarship)
  5. Dr. Worapan Kusakunniran – PhD Candidate (completed in 2013, funded by UNSW International Scholarship and a NICTA top-up scholarship; now a lecturer at Mahidol University, Thailand)
  6. Dr. Tuan Hue Thi – PhD Candidate (completed in 2012, funded by NICTA/UNSW International Scholarship; now a Research Scientist at Placemeter, a US start-up company)
  7. Dr. Shijun Lu – PhD Candidate (completed in 2012, funded by NICTA/USYD International Scholarship; now working in the Australian Defence Force)
  8. Dr. Sakrapee (Paul) Paisitkriangkrai – PhD Candidate (completed in 2012, funded by APA and a NICTA top-up scholarship; now a research fellow at the University of Adelaide)
  9. Mr. Cheng Lu – Research Master (completed in 2010, funded by a NICTA Scholarship)

UTS Short Course: Multimedia Analytics

Chapters

Zhang, J. 2006, 'Error Resilience for Video Coding Service' in Wu, H.R. & Rao, K.R. (eds), Digital Video Image Quality and Perceptual Coding, CRC, Taylor & Francis group, USA, pp. 503-527.
This is part of my thesis

Conferences

Wang, S., Zhang, J. & Miao, Z. 2013, 'A New Edge Feature for Head-Shoulder Detection', 2013 IEEE International Conference on Image Processing, Piscataway, NJ, USA, pp. 2822-2826.
View/Download from: Publisher's site
In this work, we introduce a new edge feature to improve head-shoulder detection performance. Since head-shoulder detection is highly vulnerable to vague contours, our new edge feature is designed to extract and enhance the head-shoulder contour and suppress other contours. The basic idea is that the head-shoulder contour can be predicted by filtering the edge image with edge patterns, which are generated from edge fragments through a learning process. This edge feature, known as En-Contour, can significantly enhance object contours such as the human head and shoulders. To evaluate the performance of En-Contour, we combine it with HOG+LBP [1] as HOG+LBP+En-Contour. HOG+LBP is the state-of-the-art feature in pedestrian detection; because human head-shoulder detection is a special case of pedestrian detection, we also use it as our baseline. Our experiments indicate that this new feature significantly improves on HOG+LBP.
Xu, J., Wu, Q., Zhang, J., Shen, F. & Tang, Z. 2013, 'Training boosting-like algorithms with semi-supervised subspace learning', 2013 IEEE International Conference on Image Processing, IEEE, Melbourne, Australia, pp. 4302-4306.
View/Download from: Publisher's site
Boosting algorithms have attracted great attention since the first real-time face detector by Viola & Jones, which performs feature selection and strong classifier learning simultaneously. On the other hand, researchers have proposed decoupling these two procedures to improve the performance of boosting algorithms. Motivated by this, we propose a boosting-like algorithm framework that embeds semi-supervised subspace learning methods. It selects weak classifiers based on class separability; the combination weights of the selected weak classifiers can be obtained by subspace learning. Three typical algorithms are proposed under this framework and evaluated on public data sets. As shown by our experimental results, the proposed methods obtain superior performance over their supervised counterparts and AdaBoost.
Kusakunniran, W., Satoh, S., Zhang, J. & Wu, Q. 2013, 'Attribute-based learning for large scale object classification', Proceedings - IEEE International Conference on Multimedia and Expo.
View/Download from: Publisher's site
Scalability to large numbers of classes is an important challenge for multi-class classification. It can often be computationally infeasible at test phase when class prediction is performed using every possible classifier trained for each individual class. This paper proposes an attribute-based learning method to overcome this limitation. First, attributes and their associations with object classes are defined automatically and simultaneously; such associations are learned with a greedy strategy under certain conditions. Second, a classifier is learned for each attribute instead of each class. These trained classifiers are then used to predict classes based on their attribute representations. The proposed method also allows a trade-off between test-time complexity (which grows linearly with the number of attributes) and accuracy. Experiments on the Animals-with-Attributes and ILSVRC2010 datasets show that the performance of our method is promising compared with the state of the art.
Wang, S., Miao, Z. & Zhang, J. 2013, 'Simultaneously detect and segment pedestrian', 2013 IEEE International Conference on Multimedia and Expo, IEEE, USA, pp. 1-4.
View/Download from: Publisher's site
We present a framework to simultaneously detect and segment pedestrians in images. Our work is based on a part-based method. We first segment the image into superpixels, then assemble superpixels into body part candidates by comparing the assembled shape with a pre-built template library. A structure-based shape matching algorithm is developed to measure shape similarity. All body part candidates are input into our modified AND/OR graph to generate the most reasonable combination. The graph describes possible variations of body configuration and models the constraint relationships between body parts. We perform comparison experiments on a public database, and the results show the effectiveness of our framework.
Shen, Y., Miao, Z. & Zhang, J. 2012, 'Unsupervised online learning trajectory analysis based on weighted directed graph', Proceedings - International Conference on Pattern Recognition, pp. 1306-1309.
In this paper, we propose a novel unsupervised online learning trajectory analysis method based on a weighted directed graph. Each trajectory can be represented as a sequence of key points. In the training stage, the unsupervised expectation-maximization (EM) algorithm is applied to the training data to cluster key points. Each class is a Gaussian distribution and is considered a node of the graph. According to the classification of key points, we can build a weighted directed graph to represent the trajectory network in the scene. Each path is a category of trajectories. In the test stage, we adopt an online EM algorithm to classify trajectories and update the graph. In the experiments, we test our approach and obtain good performance compared with state-of-the-art approaches.
Zhang, J., Lu, S., Mei, T., Wang, J., Wang, Z., Feng, D., Sun, J. & Li, S. 2012, 'Browse-to-search', ACM, USA, pp. 1323-1324.
Zhang, J., Wu, Y., Lu, S., Mei, T. & Li, S. 2012, 'Local visual words coding for low bit rate mobile visual search', ACM, USA, pp. 989-992.
Mobile visual search has attracted extensive attention for its huge potential in numerous applications. Research on this topic has focused on two schemes: sending query images, and sending compact descriptors extracted on mobile phones. The first scheme requires about 30-40KB of data to transmit, while the second can reduce the bit rate by 10 times. In this paper, we propose a third scheme for extremely low bit rate mobile visual search, which sends compressed visual words consisting of a vocabulary tree histogram and descriptor orientations rather than the descriptors themselves. This scheme can further reduce the bit rate with few extra computational costs on the client. Specifically, we store a vocabulary tree and extract visual descriptors on the mobile client. A light-weight pre-retrieval is performed to obtain the visited leaf nodes in the vocabulary tree. The orientation of each local descriptor and the tree histogram are then encoded and transmitted to the server. Our new scheme transmits less than 1KB of data, which reduces the bit rate of the second scheme by 3 times, and obtains about 30% improvement in search accuracy over the traditional Bag-of-Words baseline. The time cost is only 1.5 secs on the client and 240 msecs on the server.
Xu, J., Wu, Q., Zhang, J. & Tang, Z. 2012, 'Object Detection Based on Co-occurrence GMuLBP Features', 2012 IEEE International Conference on Multimedia and Expo, IEEE Computer Society, Melbourne, Australia, pp. 943-948.
View/Download from: Publisher's site
Image co-occurrence has shown great power on object classification because it captures the characteristics of individual features and the spatial relationships between them simultaneously. For example, the Co-occurrence Histogram of Oriented Gradients (CoHOG) has achieved great success on the human detection task. However, the gradient orientation in CoHOG is sensitive to noise. In addition, CoHOG does not take gradient magnitude into account, which is a key component for reinforcing feature detection. In this paper, we propose a new LBP feature detector based on image co-occurrence. Building on uniform Local Binary Patterns, the new feature detector detects co-occurrence orientation through gradient magnitude calculation; it is known as CoGMuLBP. An extended version of CoGMuLBP is also presented. Experimental results on the UIUC car data set show that the proposed features outperform state-of-the-art methods.
Quek, A., Wang, Z., Zhang, J. & Feng, D. 2011, 'Structural Image Classification with Graph Neural Networks', Proceedings of 2011 International Conference on Digital Image Computing - Techniques and Applications, IEEE, USA, pp. 416-421.
Many approaches to image classification tend to transform an image into an unstructured set of numeric feature vectors obtained globally and/or locally, and as a result lose important relational information between regions. In order to encode the geometric relationships between image regions, we propose a variety of structural image representations that are not specialised for any particular image category. Besides the traditional grid-partitioning and global segmentation methods, we investigate the use of local scale-invariant region detectors. Regions are connected based not only upon nearest-neighbour heuristics, but also upon minimum spanning trees and Delaunay triangulation. In order to maintain the topological and spatial relationships between regions, and also to effectively process undirected connections represented as graphs, we utilise the recently-proposed graph neural network model. To the best of our knowledge, this is the first utilisation of the model to process graph structures based on local-sampling techniques, for the task of image classification. Our experimental results demonstrate great potential for further work in this domain.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2011, 'Pairwise shape configuration-based PSA for gait recognition under small viewing angle change', 2011 8th IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2011, pp. 17-22.
View/Download from: Publisher's site
Two main components of Procrustes Shape Analysis (PSA) are adopted and adapted specifically to address gait recognition under small viewing angle change: 1) Procrustes Mean Shape (PMS) for gait signature description; 2) Procrustes Distance (PD) for similarity measurement. Pairwise Shape Configuration (PSC) is proposed as a shape descriptor in place of the existing Centroid Shape Configuration (CSC) in conventional PSA. PSC can better tolerate shape change caused by viewing angle change than CSC. A small variation of viewing angle makes a large impact only on global gait appearance; without a major impact on local spatio-temporal motion, PSC, which effectively embeds local shape information, can generate robust view-invariant gait features. To enhance gait recognition performance, a novel boundary re-sampling process is proposed. It provides only the necessary re-sampled points to the PSC description while efficiently solving the problems of boundary point correspondence, boundary normalization and boundary smoothness. This re-sampling process adopts prior knowledge of body pose structure. Comprehensive experiments are carried out on the CASIA gait database. The proposed method is shown to significantly improve the performance of gait recognition under small viewing angle change, without additional requirements of supervised learning, known viewing angle or a multi-camera system, when compared with other methods in the literature.
Zhang, J. & Liu, X. 2011, 'Active Learning for Human Action Recognition with Gaussian Processes', Proceedings of 2011 International Conference on Image Processing, IEEE, USA, pp. 3253-3256.
This paper presents an active learning approach for recognizing human actions in videos based on a multiple-kernel combined method. We design the classifier based on Multiple Kernel Learning (MKL) through Gaussian Process (GP) regression. This classifier is then trained in an active learning approach: in each iteration, one optimal sample is selected to be interactively annotated and incorporated into the training set. The selection of the sample is based on the heuristic feedback of the GP classifier. To our knowledge, GP-regression MKL-based active learning methods have not yet been applied to human action recognition. We test this approach on standard benchmarks; it outperforms state-of-the-art techniques in accuracy while requiring significantly fewer training samples.
Li, Z., Wu, Q., Zhang, J. & Geers, G. 2011, 'SKRWM based descriptor for pedestrian detection in thermal images', 2011 IEEE 13th International Workshop on Multimedia Signal Processing (MMSP), IEEE, USA, pp. 1-6.
Pedestrian detection in a thermal image is a difficult task due to intrinsic challenges: 1) low image resolution; 2) thermal noise; 3) polarity changes; 4) lack of color, texture or depth information. To address these challenges, we propose a novel mid-level feature descriptor for pedestrian detection in the thermal domain, which combines the pixel-level Steering Kernel Regression Weights Matrix (SKRWM) with the corresponding covariances. SKRWM can properly capture the local structure of pixels, while the covariance computation further provides the correlation of low-level features. This mid-level feature descriptor not only captures the pixel-level data differences and spatial differences of local structure, but also explores the correlations among low-level features. In the case of human detection, the proposed mid-level feature descriptor can discriminatively distinguish pedestrians from a complex background. To test the performance of the proposed feature descriptor, a popular classifier framework based on Principal Component Analysis (PCA) and a Support Vector Machine (SVM) is also built. Overall, our experimental results show that the proposed approach overcomes the problems caused by background subtraction in [1] while attaining detection accuracy comparable to the state of the art.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2011, 'Speed-invariant gait recognition based on procrustes shape analysis using higher-order shape configuration', Proceedings - International Conference on Image Processing, ICIP, pp. 545-548.
View/Download from: Publisher's site
Walking speed change is considered a typical challenge hindering reliable human gait recognition. This paper proposes a novel method to extract speed-invariant gait features based on Procrustes Shape Analysis (PSA). Two major components of PSA, i.e., Procrustes Mean Shape (PMS) and Procrustes Distance (PD), are adopted and adapted specifically for speed-invariant gait recognition. One of our major contributions in this work is that, instead of using the conventional Centroid Shape Configuration (CSC), which is not suitable for describing individual gait when body shape changes, particularly due to change of walking speed, we propose a new descriptor named Higher-order derivative Shape Configuration (HSC), which can generate robust speed-invariant gait features. From the first order to higher orders, the derivative shape configuration contains gait shape information at different levels. Intuitively, a higher order of derivative is able to describe gait with the shape change caused by a larger change of walking speed. Encouraging experimental results show that our proposed method is efficient for speed-invariant gait recognition and evidently outperforms other existing methods in the literature.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2010, 'Multi-view gait recognition based on motion regression using multilayer perceptron', Proceedings - International Conference on Pattern Recognition, pp. 2186-2189.
View/Download from: Publisher's site
It has been shown that gait is an efficient biometric feature for identifying a person at a distance. However, it is challenging to obtain reliable gait features when the viewing angle changes, because body appearance can differ under various viewing angles. In this paper, the problem above is formulated as a regression problem in which a novel View Transformation Model (VTM) is constructed by adopting a Multilayer Perceptron (MLP) as the regression tool. It smoothly estimates the gait feature under an unknown viewing angle based on motion information in a well-selected Region of Interest (ROI) under other existing viewing angles. This proposal can thus normalize gait features under various viewing angles into a common viewing angle before gait similarity measurement is carried out. Encouraging experimental results have been obtained on a widely adopted benchmark database.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2010, 'Support vector regression for multi-view gait recognition based on local motion feature selection', Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 974-981.
View/Download from: Publisher's site
Gait is a well-recognized biometric feature used to identify a human at a distance. However, in real environments, appearance changes of individuals due to viewing angle changes cause many difficulties for gait recognition. This paper re-formulates this problem as a regression problem. A novel solution is proposed to create a View Transformation Model (VTM) from different points of view using Support Vector Regression (SVR). To facilitate the regression process, a new method is proposed to seek a local Region of Interest (ROI) under one viewing angle for predicting the corresponding motion information under another viewing angle. The well-constructed VTM is thus able to transfer gait information from one viewing angle to another, achieving view-independent gait recognition. It normalizes gait features under various viewing angles into a common viewing angle before similarity measurement is carried out. Extensive experimental results on a widely adopted benchmark dataset demonstrate that the proposed algorithm achieves significantly better performance than the existing methods in the literature.
Li, Z., Zhang, J., Wu, Q. & Geers, G.D. 2010, 'Feature Enhancement Using Gradient Salience on Thermal Image', Proceedings. 2010 Digital Image Computing: Techniques and Applications (DICTA 2010), IEEE Computer Society, Sydney, Australia, pp. 556-562.
View/Download from: Publisher's site
Feature enhancement in an image reinforces extracted features so that they can be used for object classification and detection. As thermal images lack texture and color information, techniques for visual-image feature enhancement are insufficient for thermal images. In this paper, we propose a new gradient-based approach for feature enhancement in thermal images. We use the statistical properties of the gradients of foreground object profiles, and formulate object features with gradient saliency. Empirical evaluation of the proposed approach shows significantly improved performance on human contours, which can be used for detection and classification.
Saesue, W., Chou, C. & Zhang, J. 2010, 'Cross-layer QoS-optimized EDCA adaptation for wireless video streaming', Proceedings of 2010 IEEE 17th International Conference on Image Processing, IEEE, Piscataway, NJ, pp. 2925-2928.
View/Download from: Publisher's site
In this paper, we propose an adaptive cross-layer technique that optimally enhances the QoS of wireless video transmission in an IEEE 802.11e WLAN. The optimization takes into account the unequal error protection characteristics of video streaming, the IE
Paisitkriangkrai, S., Shen, C. & Zhang, J. 2010, 'Face Detection with Effective Feature Extraction'.
There is an abundant literature on face detection due to its important role in many vision applications. Since Viola and Jones proposed the first real-time AdaBoost-based face detector, Haar-like features have been adopted as the method of choice for frontal face detection. In this work, we show that simple features other than Haar-like features can also be applied to train an effective face detector. Since a single feature is not discriminative enough to separate faces from difficult non-faces, we further improve the generalization performance of our simple features by introducing feature co-occurrences. We demonstrate that our proposed features yield a performance improvement compared to Haar-like features. In addition, our findings indicate that features play a crucial role in the ability of the system to generalize.
Thi, T., Zhang, J., Cheng, L., Wang, L. & Satoh, S. 2010, 'Human action recognition and localization in video using structured learning of local space-time features', 2010 Seventh IEEE International Conference on Advanced Video and Signal Based Surveillance, IEEE, Piscataway, NJ, pp. 204-211.
View/Download from: Publisher's site
This paper presents a unified framework for human action classification and localization in video using structured learning of local space-time features. Each human action class is represented by a set of its own compact set of local patches. In our appr
Thi, T., Cheng, L., Zhang, J. & Wang, L. 2010, 'Implicit motion-shape model: A generic approach for action matching', Proceedings of 2010 IEEE 17th International Conference on Image Processing, IEEE, Piscataway, NJ, pp. 1477-1480.
View/Download from: Publisher's site
We develop a robust technique to find similar matches of human actions in video. Given a query video, Motion History Images (MHI) are constructed for consecutive keyframes. This is followed by dividing the MHI into local Motion-Shape regions, which allow
Wang, W., Zhang, J. & Shen, C. 2010, 'Improved human detection and classification in thermal images', Proceedings - International Conference on Image Processing, ICIP, IEEE, Piscataway, NJ, pp. 2313-2316.
View/Download from: Publisher's site
We present a new method for detecting pedestrians in thermal images, based on the Shape Context Descriptor (SCD) with the AdaBoost cascade classifier framework. Compared with standard optical images, thermal imaging cameras offer a clear advantage for night-time video surveillance, and the method is robust to lighting changes in the daytime. Experiments show that shape context features with boosting classification provide a significant improvement on human detection in thermal images. In this work, we have also compared our proposed method with rectangle features on a public dataset of thermal imagery; results show that shape context features are much better than conventional rectangular features on this task.
Paisitkriangkrai, S., Mei, T., Zhang, J. & Hua, X. 2010, 'Scalable clip-based near-duplicate video detection with ordinal measure', CIVR 2010 - 2010 ACM International Conference on Image and Video Retrieval, NA, pp. 121-128.
View/Download from: Publisher's site
Detection of duplicate or near-duplicate videos on large-scale database plays an important role in video search. In this paper, we analyze the problem of near-duplicates detection and propose a practical and effective solution for real-time large-scale v
Thi, T., Cheng, L., Zhang, J., Wang, L. & Satoh, S. 2010, 'Weakly supervised action recognition using implicit shape models', Proceedings - International Conference on Pattern Recognition, IEEE, Piscataway, NJ, pp. 3517-3520.
View/Download from: Publisher's site
In this paper, we present a robust framework for action recognition in video that performs competitively against state-of-the-art methods, yet does not rely on sophisticated background-subtraction preprocessing to remove background features. …
Kusakunniran, W., Wu, Q., Li, H. & Zhang, J. 2009, 'Automatic gait recognition using weighted binary pattern on video', 6th IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2009, pp. 49-54.
View/Download from: Publisher's site
Human identification by recognizing spontaneous gait recorded in real-world settings is a difficult and not yet fully resolved problem in biometrics research. Several issues contribute to the difficulty of this task, including varying poses, different clothing, moderate to large changes in normal walking manner due to carrying diverse goods, and the uncertainty of the environments in which people walk. To achieve better gait recognition, this paper proposes a new method based on the Weighted Binary Pattern (WBP). WBP first constructs a binary pattern from a sequence of aligned silhouettes; an adaptive weighting technique is then applied to discriminate the significance of the bits in the gait signatures. Compared with most existing methods in the literature, this method better captures gait frequency, local spatial-temporal human pose features, and global body shape statistics. The proposed method is validated on several well-known benchmark databases, and the extensive experimental results show that it achieves high accuracy with low complexity and computational cost. © 2009 IEEE.
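The binary-pattern construction can be sketched roughly as below; the adaptive weighting here (temporal variance of each pixel's bit) is an illustrative stand-in for the paper's weighting scheme, not a reproduction of it:

```python
import numpy as np

def weighted_binary_pattern(silhouettes):
    """Build a gait signature from aligned binary silhouettes.

    Each pixel's on/off sequence over the frames forms a binary pattern;
    a simple adaptive weight (here: the temporal variance of the bit,
    an assumption standing in for the paper's weighting) emphasises the
    more dynamic, and hence more discriminative, pixels.
    """
    seq = np.asarray(silhouettes, dtype=float)   # (T, H, W) of 0/1
    pattern = seq.mean(axis=0)                   # fraction of frames "on"
    weights = seq.var(axis=0)                    # higher variance = more dynamic
    return pattern * weights

sil = np.zeros((4, 2, 2))
sil[:, 0, 0] = 1            # always on  -> variance 0, weight 0
sil[::2, 0, 1] = 1          # alternates -> maximal variance
sig = weighted_binary_pattern(sil)
assert sig[0, 0] == 0.0 and sig[0, 1] > 0
```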
Kusakunniran, W., Wu, Q., Li, H. & Zhang, J. 2009, 'Multiple views gait recognition using view transformation model based on optimized gait energy image', 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops 2009, pp. 1058-1064.
View/Download from: Publisher's site
Gait is a well-recognized biometric that has been widely used for human identification. However, current gait recognition can have difficulty when the viewing angle changes, because the viewing angle under which the gait signature database was generated may differ from the viewing angle at which the probe data are obtained. This paper proposes a new multi-view gait recognition approach that tackles this problem. Unlike other approaches in the same category, the new method creates a so-called View Transformation Model (VTM) based on the spatial-domain Gait Energy Image (GEI) by adopting the Singular Value Decomposition (SVD) technique. To further improve the performance of the proposed VTM, Linear Discriminant Analysis (LDA) is used to optimize the obtained GEI feature vectors. When implementing SVD there are a few practical problems, such as large matrix size and over-fitting; reduced SVD is introduced to alleviate their effects. Using the generated VTM, the viewing angles of gallery and probe gait data can be transformed into the same direction, so gait signatures can be measured without difficulty. Extensive experiments show that the proposed algorithm significantly improves multiple-view gait recognition performance compared to similar methods in the literature. © 2009 IEEE.
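The GEI and reduced-SVD steps the abstract describes can be sketched as follows; the matrix sizes and the choice of k are illustrative, not the paper's settings:

```python
import numpy as np

def gait_energy_image(silhouettes):
    """GEI: the per-pixel average of an aligned binary silhouette sequence."""
    return np.asarray(silhouettes, dtype=float).mean(axis=0)

def reduced_svd(gei_matrix, k):
    """Reduced SVD of a matrix whose columns are flattened GEIs;
    keeping only the top-k singular vectors is the step the paper uses
    to curb matrix size and over-fitting."""
    U, s, Vt = np.linalg.svd(gei_matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k]

rng = np.random.default_rng(0)
seq = rng.integers(0, 2, size=(10, 8, 6))   # 10 toy binary silhouettes
gei = gait_energy_image(seq)
assert gei.shape == (8, 6) and 0.0 <= gei.min() and gei.max() <= 1.0

M = rng.standard_normal((48, 5))            # 5 flattened 8x6 GEIs as columns
U, s, Vt = reduced_svd(M, k=3)
assert U.shape == (48, 3) and s.shape == (3,) and Vt.shape == (3, 5)
```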
Kusakunniran, W., Li, H. & Zhang, J. 2009, 'A direct method to self-calibrate a surveillance camera by observing a walking pedestrian', 2009 Digital Image Computing: Techniques and Applications, IEEE, Piscataway, NJ, pp. 250-255.
View/Download from: Publisher's site
Recent efforts show that it is possible to calibrate a surveillance camera simply by observing a walking human. This procedure can be seen as a special application of the camera self-calibration technique, and several methods have been proposed along this line. …
Wang, W., Shen, C., Zhang, J. & Paisitkriangkrai, S. 2009, 'A two-layer night-time vehicle detector', 2009 Digital Image Computing: Techniques and Applications, IEEE, Piscataway, NJ, pp. 162-167.
View/Download from: Publisher's site
We present a two-layer night-time vehicle detector. At the first layer, vehicle headlight detection [1, 2, 3] is applied to find areas (bounding boxes) where possible pairs of headlights are located in the image. …
Paisitkriangkrai, S., Shen, C. & Zhang, J. 2009, 'Efficiently training a better visual detector with sparse eigenvectors', 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2009, IEEE, Piscataway, NJ, pp. 1129-1136.
View/Download from: Publisher's site
Face detection plays an important role in many vision applications. Since Viola and Jones [1] proposed the first real-time AdaBoost-based object detection system, much effort has been spent on improving the boosting method. …
Thi, T., Lu, S., Zhang, J., Cheng, L. & Wang, L. 2009, 'Human body articulation for action recognition in video sequences', 6th IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2009, IEEE, Piscataway, NJ, pp. 92-97.
View/Download from: Publisher's site
This paper presents a new technique for action recognition in video using a human body part-based approach, combining local feature descriptions of each body part with a global graphical model structure of the human action. …
Ong, C., Lu, S. & Zhang, J. 2008, 'An approach for enhancing the results of detecting foreground objects and their moving shadows in surveillance video', Digital Image Computing: Techniques and Applications, IEEE, Piscataway, NJ, pp. 242-249.
View/Download from: Publisher's site
Automated surveillance systems are becoming increasingly important, especially in the fields of computer vision and video processing. This paper describes a novel approach for improving the results of detecting foreground objects and their shadows in indoor environments. …
Paisitkriangkrai, S., Shen, C. & Zhang, J. 2008, 'An experimental study on pedestrian classification using local features', Proceedings - IEEE International Symposium on Circuits and Systems, IEEE, Piscataway, NJ, pp. 2741-2744.
View/Download from: Publisher's site
This paper presents an experimental study on pedestrian detection using state-of-the-art local feature extraction and support vector machine (SVM) classifiers. The performance of pedestrian detection using region covariance and histogram of oriented gradients (HOG) features is compared. …
Yu, J., Zhang, J., Sun, W., Yuan, L. & Peng, G. 2008, 'Crosstalk analysis of a smart sensor unit based on FBG and FOWLI', Proceedings of SPIE - The International Society for Optical Engineering, NA, pp. 0-0.
View/Download from: Publisher's site
The effective optical path method is proposed to analyze the measurement crosstalk of a smart fiber-optic sensor unit based on multiplexed fiber Bragg gratings (FBG) and fiber-optic white-light interferometry (FOWLI). …
Luo, C., Cai, X. & Zhang, J. 2008, 'GATE: A novel robust object tracking method using the particle filtering and level set method', Digital Image Computing: Techniques and Applications, IEEE, Piscataway, NJ, pp. 378-385.
View/Download from: Publisher's site
This paper presents a novel algorithm for robust object tracking based on the particle filtering method employed in recursive Bayesian estimation and image segmentation and optimisation techniques employed in active contour models and level set methods.
Saesue, W., Zhang, J. & Chun, T. 2008, 'Hybrid frame-recursive block-based distortion estimation model for wireless video transmission', Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008, IEEE, Piscataway, NJ, pp. 774-779.
View/Download from: Publisher's site
In wireless environments, video quality can be severely degraded by channel errors. Improving error robustness against the impact of packet loss in error-prone networks is therefore a critical concern in wireless video networking research. …
Luo, C., Cai, X. & Zhang, J. 2008, 'Robust object tracking using the particle filtering and level set methods: A comparative experiment', Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008, IEEE, Piscataway, NJ, pp. 359-364.
View/Download from: Publisher's site
Robust visual tracking has become an important topic of research in computer vision. A novel method for robust object tracking, GATE [11], improves object tracking in complex environments using particle filtering and level set-based active contour models. …
Thi, T., Lu, S. & Zhang, J. 2008, 'Self-calibration of traffic surveillance camera using motion tracking', Proceedings of the 11th International IEEE Conference on Intelligent Transportation Systems, IEEE, Piscataway, NJ, pp. 304-309.
View/Download from: Publisher's site
A statistical and computer vision approach using tracked moving vehicle shapes for auto-calibrating traffic surveillance cameras is presented. The vanishing point of the traffic direction is obtained by linear regression over all tracked vehicle points. …
Thi, T., Robert, K., Lu, S. & Zhang, J. 2008, 'Vehicle classification at nighttime using eigenspaces and support vector machine', 2008 Congress on Image and Signal Processing, IEEE, Piscataway, NJ, pp. 422-426.
View/Download from: Publisher's site
A robust framework for classifying vehicles in nighttime traffic using vehicle eigenspaces and a support vector machine is presented. In this paper, a systematic approach is proposed and implemented to classify vehicles from roadside-camera video sequences. …
Lu, S., Zhang, J. & Feng, D. 2007, 'An efficient method for detecting ghost and left objects in surveillance video', 2007 IEEE Conference on Advanced Video and Signal Based Surveillance, AVSS 2007 Proceedings, NA, pp. 540-545.
View/Download from: Publisher's site
This paper proposes an efficient method for detecting ghost and left objects in surveillance video, which, if not identified, may lead to errors or wasted computation in background modeling and object tracking in surveillance systems. …
Paisitkriangkrai, S., Shen, C. & Zhang, J. 2007, 'An experimental evaluation of local features for pedestrian classification', Proceedings - Digital Image Computing Techniques and Applications: 9th Biennial Conference of the Australian Pattern Recognition Society, DICTA 2007, NA, pp. 53-60.
View/Download from: Publisher's site
The ability to detect pedestrians is an important first step in many computer vision applications such as video surveillance. This paper presents an experimental study on pedestrian detection using state-of-the-art local feature extraction and support vector machine (SVM) classifiers. …
Lu, S., Zhang, J. & Feng, D. 2006, 'A knowledge-based approach for detecting unattended packages in surveillance video', Proceedings - IEEE International Conference on Video and Signal Based Surveillance 2006, AVSS 2006, NA, pp. 0-0.
View/Download from: Publisher's site
This paper describes a novel approach for detecting unattended packages in surveillance video. Unlike the traditional approach of simply detecting stationary objects in monitored scenes, our approach detects unattended packages based on accumulated knowledge. …
Chen, J., Shen, J., Zhang, J. & Wangsa, K. 2006, 'A novel multimedia database system for efficient image/video retrieval based on hybrid-tree structure', Proceedings of the 2006 International Conference on Machine Learning and Cybernetics, NA, pp. 4353-4358.
View/Download from: Publisher's site
With recent advances in computer vision, image processing and analysis, retrieval based on visual content has become a key component in achieving efficient image queries over large multimedia databases. In this paper, we propose and develop a novel multimedia database system based on a hybrid-tree structure. …
Mathew, R., Yu, Z. & Zhang, J. 2006, 'Detecting new stable objects in surveillance video', 2005 IEEE 7th Workshop on Multimedia Signal Processing, NA, pp. 0-0.
View/Download from: Publisher's site
We describe a novel method to detect new stable objects in video, i.e., objects that appear in a scene and remain stationary for a period of time. Examples include detecting a dropped bag or a parked car. …
Lu, S., Zhang, J. & Feng, D. 2005, 'Classification of moving humans using eigen-features and support vector machines', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), NA, pp. 522-529.
This paper describes a method for categorizing moving objects using eigen-features and support vector machines. Eigen-features, generally used in face recognition and static image classification, are applied here to classify moving objects detected from video. …

Journal articles

Kusakunniran, W., Wu, Q., Zhang, J., Li, H. & Wang, L. 2014, 'Recognizing gaits across views through correlated motion co-clustering', IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 696-709.
View/Download from: Publisher's site
Human gait is an important biometric feature which can be used to identify a person remotely. However, view change can cause significant difficulties for gait recognition because it substantially alters the visual features available for matching. Moreover, different parts of the gait are affected differently by view change. By exploring the relations between two gaits recorded from two different views, it is also observed that a part of the gait in one view is more strongly related to one particular part of the gait in another view than to any other part. The method proposed in this paper considers this variation in correlations between gaits across views, which is not explicitly analyzed by other existing methods. In our method, a novel motion co-clustering is carried out to partition the most related parts of gaits from different views into the same group. In this way, relationships between gaits from different views are described more precisely by multiple groups of the motion co-clustering instead of a single correlation descriptor. Inside each group, the linear correlation between gait information across views is further maximized through canonical correlation analysis (CCA). Consequently, gait information in one view can be projected onto another view through a linear approximation under the trained CCA subspaces, and a similarity between gaits originally recorded from different views can be measured under approximately the same view. Comprehensive experiments on widely adopted gait databases show that our method outperforms the state-of-the-art. © 2013 IEEE.
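The CCA step, maximizing a linear correlation between paired features from two views, can be sketched with a textbook SVD-based construction (this is generic CCA on toy data, not the paper's co-clustering pipeline):

```python
import numpy as np

def cca(X, Y, k=1, reg=1e-6):
    """Canonical correlation analysis via SVD of the whitened
    cross-covariance. Rows of X and Y are paired samples from two views."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = len(X)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n

    def inv_sqrt(C):  # inverse matrix square root of a PSD matrix
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    U, s, Vt = np.linalg.svd(inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy))
    Wx = inv_sqrt(Cxx) @ U[:, :k]   # projections for view 1
    Wy = inv_sqrt(Cyy) @ Vt[:k].T   # projections for view 2
    return Wx, Wy, s[:k]            # s holds the canonical correlations

rng = np.random.default_rng(1)
z = rng.standard_normal((200, 1))                   # shared latent motion
X = np.hstack([z, rng.standard_normal((200, 1))])   # view 1
Y = np.hstack([z, rng.standard_normal((200, 1))])   # view 2 shares z
Wx, Wy, corr = cca(X, Y)
assert corr[0] > 0.9  # the shared component is recovered
```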
Wu, Y., Ma, B., Yang, M., Zhang, J. & Jia, Y. 2014, 'Metric learning based structural appearance model for robust visual tracking', IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 5, pp. 865-877.
View/Download from: Publisher's site
Appearance modeling is a key issue for the success of a visual tracker. Sparse representation based appearance modeling has received an increasing amount of interest in recent years. However, most existing work uses reconstruction errors to compute the observation likelihood under the generative framework, which may give poor performance, especially under significant appearance variations. In this paper, we advocate an approach to visual tracking that seeks an appropriate metric in the feature space of sparse codes and propose a metric learning based structural appearance model for more accurate matching of different appearances. This structural representation is acquired by performing multiscale max pooling on the weighted local sparse codes of image patches. An online multiple instance metric learning algorithm is proposed that learns a discriminative and adaptive metric, thereby better distinguishing the visual object of interest from the background. The multiple instance setting alleviates the drift problem potentially caused by misaligned training examples. Tracking is then carried out within a Bayesian inference framework, in which the learned metric and the structural object representation are used to construct the observation model. Comprehensive experiments on challenging image sequences demonstrate qualitatively and quantitatively that the proposed algorithm outperforms the state-of-the-art methods. © 2013 IEEE.
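At the core of any learned-metric matcher is a Mahalanobis-style distance; a minimal sketch, with a hand-picked matrix M standing in for the metric the paper learns online:

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared distance under a metric M (positive semi-definite):
    d_M(x, y)^2 = (x - y)^T M (x - y). With M = I this is plain
    Euclidean distance; metric learning picks M so that patches of the
    same object come out closer than object/background pairs."""
    d = x - y
    return float(d @ M @ d)

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
assert mahalanobis_sq(x, y, np.eye(2)) == 2.0      # Euclidean baseline
M = np.array([[2.0, 0.0], [0.0, 0.5]])             # stretch axis 0, shrink axis 1
assert mahalanobis_sq(x, y, M) == 2.5
```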
Xu, J., Wu, Q., Zhang, J., Shen, F. & Tang, Z. 2014, 'Boosting Separability in Semisupervised Learning for Object Classification', IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 7, pp. 1197-1208.
View/Download from: Publisher's site
Liu, X., Yin, J., Wang, L., Liu, L., Liu, J., Hou, C. & Zhang, J. 2013, 'An Adaptive Approach To Learning Optimal Neighborhood Kernels', IEEE Transactions on Cybernetics, vol. 43, no. 1, pp. 371-384.
View/Download from: Publisher's site
Learning an optimal kernel plays a pivotal role in kernel-based methods. Recently, an approach called optimal neighborhood kernel learning (ONKL) has been proposed, showing promising classification performance. It assumes that the optimal kernel will reside in the neighborhood of a pre-specified kernel. Nevertheless, how to specify such a kernel in a principled way remains unclear. To solve this issue, this paper treats the pre-specified kernel as an extra variable and jointly learns it with the optimal neighborhood kernel and the structure parameters of support vector machines. To avoid trivial solutions, we constrain the pre-specified kernel with a parameterized model. We first discuss the characteristics of our approach and in particular highlight its adaptivity. After that, two instantiations are demonstrated by modeling the pre-specified kernel as a common Gaussian radial basis function kernel and a linear combination of a set of base kernels in the way of multiple kernel learning (MKL), respectively. We show that the optimization in our approach is a min-max problem and can be efficiently solved by employing the extended level method and Nesterov's method. Also, we give the probabilistic interpretation for our approach and apply it to explain the existing kernel learning methods, providing another perspective for their commonness and differences. Comprehensive experimental results on 13 UCI data sets and another two real-world data sets show that via the joint learning process, our approach not only adaptively identifies the pre-specified kernel, but also achieves superior classification performance to the original ONKL and the related MKL algorithms.
Liu, X., Wang, L., Yin, J., Zhu, E. & Zhang, J. 2013, 'An Efficient Approach To Integrating Radius Information Into Multiple Kernel Learning', IEEE Transactions on Cybernetics, vol. 43, no. 2, pp. 557-569.
View/Download from: Publisher's site
Integrating radius information has been demonstrated by recent work on multiple kernel learning (MKL) as a promising way to improve kernel learning performance. Directly integrating the radius of the minimum enclosing ball (MEB) into MKL as it is, however, not only incurs significant computational overhead but also possibly adversely affects the kernel learning performance due to the notorious sensitivity of this radius to outliers. Inspired by the relationship between the radius of the MEB and the trace of total data scattering matrix, this paper proposes to incorporate the latter into MKL to improve the situation. In particular, in order to well justify the incorporation of radius information, we strictly comply with the radius-margin bound of support vector machines (SVMs) and thus focus on the l2-norm soft-margin SVM classifier. Detailed theoretical analysis is conducted to show how the proposed approach effectively preserves the merits of incorporating the radius of the MEB and how the resulting optimization is efficiently solved. Moreover, the proposed approach achieves the following advantages over its counterparts: 1) more robust in the presence of outliers or noisy training samples; 2) more computationally efficient by avoiding the quadratic optimization for computing the radius at each iteration; and 3) readily solvable by the existing off-the-shelf MKL packages. Comprehensive experiments are conducted on University of California, Irvine, protein subcellular localization, and Caltech-101 data sets, and the results well demonstrate the effectiveness and efficiency of our approach.
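The trace of the total data scattering matrix, which the paper substitutes for the outlier-sensitive minimum-enclosing-ball radius, is cheap to compute; a minimal numpy sketch:

```python
import numpy as np

def trace_total_scatter(X):
    """Trace of the total data scattering matrix (rows of X are samples).

    Equals the average squared distance of the samples to their mean, a
    smooth proxy for how 'spread out' the data is, unlike the MEB
    radius, which a single outlier can inflate.
    """
    Xc = X - X.mean(axis=0)
    return float(np.trace(Xc.T @ Xc) / len(X))

rng = np.random.default_rng(2)
tight = rng.standard_normal((100, 3)) * 0.1
spread = rng.standard_normal((100, 3)) * 5.0
assert trace_total_scatter(tight) < trace_total_scatter(spread)
```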
Xin, J., Chen, K., Bai, L., Liu, D. & Zhang, J. 2013, 'Depth adaptive zooming visual servoing for a robot with a zooming camera', International Journal of Advanced Robotic Systems, vol. 10.
View/Download from: Publisher's site
To solve the view visibility problem and keep the observed object in the field of view (FOV) during visual servoing, a depth adaptive zooming visual servoing strategy for a manipulator robot with a zooming camera is proposed. Firstly, a zoom control mechanism is introduced into the robot visual servoing system. It dynamically adjusts the camera's field of view to keep all the feature points on the object within the FOV and to obtain high local resolution of the object at the end of visual servoing. Secondly, an invariant visual servoing method is employed to control the robot to the desired position under the changing intrinsic parameters of the camera. Finally, a nonlinear depth adaptive estimation scheme in the invariant space, based on Lyapunov stability theory, is proposed to adaptively estimate the depth of the image features on the object. Three kinds of 4-DOF robot visual positioning simulation experiments are conducted, and the results show that the proposed approach achieves higher positioning precision. © 2013 Xin et al.
Lu, S., Zhang, J., Wang, Z. & Feng, D. 2013, 'Fast Human Action Classification And VOI Localization With Enhanced Sparse Coding', Journal of Visual Communication and Image Representation, vol. 24, no. 2, pp. 127-136.
View/Download from: Publisher's site
Sparse coding, which encodes the natural visual signal into a sparse space for visual codebook generation and feature quantization, has been successfully utilized for many image classification applications. However, it has seldom been explored for video analysis tasks. In particular, the increased complexity of characterizing the visual patterns of diverse human actions, with both spatial and temporal variations, imposes more challenges on the conventional sparse coding scheme. In this paper, we propose an enhanced sparse coding scheme that learns a discriminative dictionary and optimizes the local pooling strategy. Localizing when and where a specific action happens in realistic videos is another challenging task. By utilizing the sparse coding based representations of human actions, this paper further presents a novel coarse-to-fine framework to localize the Volumes of Interest (VOIs) for the actions. Firstly, local visual features are transformed into the sparse signal domain through our enhanced sparse coding scheme. Secondly, to avoid exhaustively scanning entire videos for VOI localization, we extend Spatial Pyramid Matching into the temporal domain, namely Spatial Temporal Pyramid Matching, to obtain the VOI candidates. Finally, a multi-level branch-and-bound approach is developed to refine the VOI candidates. The proposed framework also avoids prohibitive computations in local similarity matching (e.g., nearest neighbors voting). Experimental results on two popular benchmark datasets (KTH and YouTube UCF) and the widely used localization dataset (MSR) demonstrate that our approach reduces computational cost significantly while maintaining classification accuracy comparable to that of the state-of-the-art methods.
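The Spatial Temporal Pyramid Matching pooling step can be sketched as follows; the two-level pyramid and plain max pooling are illustrative simplifications of the paper's scheme:

```python
import numpy as np

def stpm_pool(codes, positions, video_shape, levels=(1, 2)):
    """Max-pool sparse codes over a spatial-temporal pyramid.

    codes:       (N, D) sparse codes of N local features
    positions:   (N, 3) integer (t, y, x) location of each feature
    video_shape: (T, H, W) extent of the clip
    levels:      cells per axis at each pyramid level
    Returns the pooled vectors of all pyramid cells, concatenated.
    """
    T, H, W = video_shape
    pooled = []
    for L in levels:
        # map each feature to its pyramid cell at this level
        cell = np.minimum((positions / np.array([T, H, W]) * L).astype(int), L - 1)
        for t in range(L):
            for y in range(L):
                for x in range(L):
                    mask = np.all(cell == (t, y, x), axis=1)
                    pooled.append(codes[mask].max(axis=0) if mask.any()
                                  else np.zeros(codes.shape[1]))
    return np.concatenate(pooled)

rng = np.random.default_rng(3)
codes = rng.random((20, 4))
pos = rng.integers(0, 8, size=(20, 3))
vec = stpm_pool(codes, pos, video_shape=(8, 8, 8))
assert vec.shape == (4 * (1 + 8),)   # 1 cell at level 1, 2^3 cells at level 2
```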
Song, Y., Zhang, J., Cao, L. & Sangeux, M. 2013, 'On Discovering the Correlated Relationship between Static and Dynamic Data in Clinical Gait Analysis', Lecture Notes in Computer Science, vol. 8190, no. 1, pp. 563-578.
View/Download from: Publisher's site
'Gait' is a person's manner of walking. Patients may have an abnormal gait due to a range of physical impairments or brain damage. Clinical gait analysis (CGA) is a technique for identifying the underlying impairments that affect a patient's gait pattern, and it is critical for treatment planning. Essentially, CGA tries to use a patient's physical examination results, known as static data, to interpret the dynamic characteristics of an abnormal gait, known as dynamic data. This process is carried out by gait analysis experts, mainly based on their experience, which may lead to subjective diagnoses. To facilitate the automation of this process and form a relatively objective diagnosis, this paper proposes a new probabilistic correlated static-dynamic model (CSDM) to discover correlated relationships between the dynamic characteristics of gait and their root causes in the static data space. We propose an EM-based algorithm to learn the parameters of the CSDM. One of the main advantages of the CSDM is its ability to provide intuitive knowledge. For example, the CSDM can describe what kinds of static data will lead to what kinds of hidden gait patterns in the form of a decision tree, which helps us to infer dynamic characteristics based on static data. Our initial experiments indicate that the CSDM is promising for discovering the correlated relationship between physical examination (static) and gait (dynamic) data.
Zhang, J., Wu, Q., Kusakunniran, W., Ma, Y. & Li, H. 2013, 'A New View-Invariant Feature for Cross-View Gait Recognition', IEEE Transactions on Information Forensics and Security, vol. 8, no. 10, pp. 1642-1653.
View/Download from: Publisher's site
Human gait is an important biometric feature which is able to identify a person remotely. However, change of view causes significant difficulties for recognizing gaits. This paper proposes a new framework to construct a novel view-invariant feature for cross-view gait recognition. Our view-normalization process is performed in the input layer (i.e., on gait silhouettes) to normalize gaits from arbitrary views: each sequence of gait silhouettes recorded from a certain view is transformed onto a common canonical view using the corresponding domain transformation obtained through transform invariant low-rank textures (TILT). Then, an improved scheme of Procrustes shape analysis (PSA) is proposed and applied to a sequence of the normalized gait silhouettes to extract a novel view-invariant gait feature based on the Procrustes mean shape (PMS), and gait similarity is subsequently measured using the Procrustes distance (PD). Comprehensive experiments were carried out on widely adopted gait databases, showing that the performance of the proposed method is promising compared with other existing methods in the literature.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2012, 'Cross-view and multi-view gait recognitions based on view transformation model using multi-layer perceptron', Pattern Recognition Letters, vol. 33, no. 7, pp. 882-889.
View/Download from: Publisher's site
Gait has been shown to be an efficient biometric feature for human identification at a distance. However, the performance of gait recognition can be affected by view variation, which makes cross-view gait recognition difficult. A novel method is proposed to solve this difficulty by using a view transformation model (VTM). The VTM is constructed through regression processes, adopting a multi-layer perceptron (MLP) as the regression tool: it estimates the gait feature in one view from a well-selected region of interest (ROI) on the gait feature in another view. Trained VTMs can thus normalize gait features across views into the same view before gait similarity is measured. Moreover, this paper proposes a new multi-view gait recognition method which estimates the gait feature in one view using selected gait features from several other views. Extensive experimental results demonstrate that the proposed method significantly outperforms other baseline methods in the literature for both cross-view and multi-view gait recognition. In our experiments, average accuracies of 99%, 98% and 93% are achieved for multi-view gait recognition using 5, 4 and 3 cameras respectively. © 2011 Elsevier B.V. All rights reserved.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2012, 'Gait recognition under various viewing angles based on correlated motion regression', IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 6, pp. 966-980.
View/Download from: Publisher's site
It is well recognized that gait is an important biometric feature for identifying a person at a distance, e.g., in video surveillance applications. In reality, however, change of viewing angle poses a significant challenge for gait recognition. A novel approach using a regression-based view transformation model (VTM) is proposed to address this challenge. Gait features from across views can be normalized into a common view using learned VTM(s). In principle, a VTM transforms gait features from one viewing angle (source) into another viewing angle (target). It consists of multiple regression processes that explore correlated walking motions, encoded in gait features, between source and target views. In the learning process, sparse regression based on the elastic net is adopted as the regression function; it is free from the problem of overfitting and results in more stable regression models for VTM construction. On a widely adopted gait database, experimental results show that the proposed method significantly improves upon existing VTM-based methods and outperforms most other baseline methods reported in the literature. Several practical scenarios for applying the proposed method to gait recognition under various views are also discussed. © 2012 IEEE.
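The regression-based VTM idea, mapping paired gait features from a source view to a target view, can be sketched as below; plain ridge regression stands in for the paper's elastic-net sparse regression to keep the example short:

```python
import numpy as np

def fit_vtm_ridge(G_src, G_tgt, lam=1e-2):
    """Fit a linear view transformation model mapping gait features in a
    source view to a target view. Rows of G_src/G_tgt are paired gait
    features of the same subjects under the two views. (Ridge regression
    here is a simplification; the paper uses the elastic net.)"""
    d = G_src.shape[1]
    W = np.linalg.solve(G_src.T @ G_src + lam * np.eye(d), G_src.T @ G_tgt)
    return W  # target_feature ≈ source_feature @ W

rng = np.random.default_rng(4)
W_true = rng.standard_normal((6, 6))      # toy ground-truth view mapping
src = rng.standard_normal((100, 6))
tgt = src @ W_true
W = fit_vtm_ridge(src, tgt, lam=1e-8)
assert np.allclose(src @ W, tgt, atol=1e-3)
```

After fitting, probe features from the source view are projected through W so that matching happens in the target view.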
Thi, T., Cheng, L., Zhang, J., Wang, L. & Satoh, S. 2012, 'Integrating local action elements for action analysis', Computer Vision and Image Understanding, vol. 116, no. 3, pp. 378-395.
View/Download from: Publisher's site
In this paper, we propose a framework for human action analysis from video footage. A video action sequence in our perspective is a dynamic structure of sparse local spatial-temporal patches termed action elements, so the problems of action analysis in video are addressed here based on the set of local characteristics as well as the global shape of a prescribed action. We first detect a set of action elements, the most compact entities of an action, then extend the idea of the Implicit Shape Model to space-time in order to properly integrate the spatial and temporal properties of these action elements. In particular, we consider two different recipes for constructing action elements: one uses a Sparse Bayesian Feature Classifier to choose action elements from all detected Spatial Temporal Interest Points, and these are termed discriminative action elements; the other detects affine invariant local features from holistic Motion History Images and picks action elements according to their compactness scores, and these are called generative action elements. Action elements detected either way are then used to construct a voting space based on their local feature representations as well as their global configuration constraints. Our approach is evaluated in the two main contexts of current human action analysis challenges, action retrieval and action classification. Comprehensive experimental results show that our proposed framework marginally outperforms all existing state-of-the-art techniques on a range of different datasets.
Xu, J., Wu, Q., Zhang, J. & Tang, Z. 2012, 'Fast and Accurate Human Detection Using a Cascade of Boosted MS-LBP Features', IEEE Signal Processing Letters, vol. 19, no. 10, pp. 676-679.
View/Download from: Publisher's site
In this letter, a new scheme for generating local binary patterns (LBP) is presented. This Modified Symmetric LBP (MS-LBP) feature takes advantage of LBP and gradient features. It is then applied in a boosted cascade framework for human detection. By combining MS-LBP with Haar-like features in the boosted framework, the performance of heterogeneous-feature-based detectors is evaluated for the best trade-off between accuracy and speed. Two feature training schemes, namely the Single AdaBoost Training Scheme (SATS) and the Dual AdaBoost Training Scheme (DATS), are proposed and compared. On top of AdaBoost, two multidimensional feature projection methods are described. A comprehensive experiment is presented. Apart from obtaining higher detection accuracy, the detection speed based on DATS is 17 times faster than the HOG method.
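The abstract does not spell out the exact MS-LBP construction; as a rough point of reference, here is a minimal sketch of the standard 8-neighbour LBP code that the MS-LBP feature modifies (the 3×3 sampling and the function name are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def lbp_code(patch):
    """Standard 8-neighbour LBP code of a 3x3 patch: each neighbour
    contributes one bit, set when it is >= the centre pixel."""
    center = patch[1, 1]
    # Clockwise neighbour order starting at the top-left corner.
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for bit, n in enumerate(neighbours):
        if n >= center:
            code |= 1 << bit
    return code

patch = np.array([[9, 8, 1],
                  [7, 5, 2],
                  [6, 4, 3]])
print(lbp_code(patch))  # one of 256 possible codes
```

Each pixel's code indexes one of 256 histogram bins over a detection window; a boosted cascade then thresholds weighted combinations of such histogram features stage by stage.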
Thi, T., Cheng, L., Zhang, J., Wang, L. & Satoh, S. 2012, 'Structured learning of local features for human action classification and localization', Image & Vision Computing, vol. 30, no. 1, pp. 1-14.
View/Download from: Publisher's site
Human action recognition is a promising yet non-trivial computer vision field with many potential applications. Current advances in bag-of-feature approaches have brought significant insights into recognizing human actions within complex context. It is, however, a common practice in literature to consider action as merely an orderless set of local salient features. This representation has been shown to be oversimplified, which inherently limits traditional approaches from robust deployment in real-life scenarios. In this work, we propose and show that, by taking into account global configuration of local features, we can greatly improve recognition performance. We first introduce a novel feature selection process called Sparse Hierarchical Bayes Filter to select only the most contributive features of each action type based on neighboring structure constraints. We then present the application of structured learning in human action analysis. That is, by representing human action as a complex set of local features, we can incorporate different spatial and temporal feature constraints into the learning tasks of human action classification and localization. In particular, we tackle the problem of action localization in video using structured learning with two alternatives: one is Dynamic Conditional Random Field from probabilistic perspective; the other is Structural Support Vector Machine from max-margin point of view. We evaluate our modular classification-localization framework on various testbeds, in which our proposed framework is proven to be highly effective and robust compared against bag-of-feature methods.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2012, 'Gait recognition across various walking speeds using higher order shape configuration based on a differential composition model', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 6, pp. 1654-1668.
View/Download from: Publisher's site
Gait has been known as an effective biometric feature to identify a person at a distance. However, variation of walking speeds may lead to significant changes to human walking patterns. It causes many difficulties for gait recognition. A comprehensive analysis has been carried out in this paper to identify such effects. Based on the analysis, Procrustes shape analysis is adopted for gait signature description and relevant similarity measurement. To tackle the challenges raised by speed change, this paper proposes a higher order shape configuration for gait shape description, which deliberately conserves discriminative information in the gait signatures and is still able to tolerate the varying walking speed. Instead of simply measuring the similarity between two gaits by treating them as two unified objects, a differential composition model (DCM) is constructed. The DCM differentiates the different effects caused by walking speed changes on various human body parts. In the meantime, it also balances well the different discriminabilities of each body part on the overall gait similarity measurements. In this model, the Fisher discriminant ratio is adopted to calculate weights for each body part. Comprehensive experiments based on widely adopted gait databases demonstrate that our proposed method is efficient for cross-speed gait recognition and outperforms other state-of-the-art methods.
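The Fisher discriminant ratio used to weight body parts in the DCM can be sketched for two classes of per-part similarity scores (the toy scores and the normalisation step are assumptions; the paper applies the ratio per body part within its own Procrustes-based similarity measure):

```python
import numpy as np

def fisher_ratio(same_class, diff_class):
    """Fisher discriminant ratio of one body part's similarity scores:
    squared mean separation over the summed within-class variance."""
    m1, m2 = np.mean(same_class), np.mean(diff_class)
    return (m1 - m2) ** 2 / (np.var(same_class) + np.var(diff_class))

# Hypothetical similarity scores: the torso separates same-subject pairs
# from different-subject pairs far better than the arms do.
ratios = {
    "torso": fisher_ratio([0.90, 0.85, 0.92], [0.20, 0.25, 0.30]),
    "arms":  fisher_ratio([0.60, 0.50, 0.70], [0.45, 0.55, 0.50]),
}
total = sum(ratios.values())
weights = {part: r / total for part, r in ratios.items()}  # normalised weights
print(weights)
```

Parts whose similarity scores are disturbed more by speed change (e.g. swinging limbs) get smaller weights, so the overall gait similarity leans on the more stable parts.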
Zhang, J., Li, N., Yang, Q. & Hu, C. 2012, 'Self-adaptive chaotic differential evolution algorithm for solving constrained circular packing problem', Journal of Computational Information Systems, vol. 8, no. 18, pp. 7747-7755.
Packing circles into a circular container with an equilibrium constraint is an NP-hard layout optimization problem with broad applications in engineering. This paper studies a two-dimensional constrained packing problem. Classical differential evolution for solving this problem easily falls into local optima, so an adaptive chaotic differential evolution algorithm is proposed to improve performance. The weighting parameters are dynamically adjusted by chaotic mutation during the search, and the penalty factors of the fitness function are modified during iteration. To keep the diversity of the population, we limit the population's concentration; to enhance the local search capability, we adopt adaptive mutation of the global optimal individual. The improved algorithm maintains the basic algorithm's structure while extending the search scale, and holds the diversity of the population while increasing the search accuracy. Furthermore, it can escape from premature convergence and speed up convergence. Numerical examples indicate the effectiveness and efficiency of the proposed algorithm.
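The central idea, varying differential evolution's weighting parameter chaotically during the search, can be sketched with a logistic map driving the mutation factor F (the toy sphere objective, the bounds, and the map constants are assumptions; the paper's packing constraints, penalty-factor updates, and concentration limits are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):
    """Toy objective standing in for the constrained packing fitness."""
    return float(np.sum(x ** 2))

dim, pop_size, gens = 5, 20, 200
pop = rng.uniform(-5.0, 5.0, (pop_size, dim))
fit = np.array([sphere(x) for x in pop])
z = 0.7  # chaotic state of the logistic map (avoid its fixed points)

for _ in range(gens):
    z = 4.0 * z * (1.0 - z)      # logistic map iterate, chaotic in (0, 1)
    F = 0.1 + 0.8 * z            # mutation weight varied chaotically
    for i in range(pop_size):
        others = [j for j in range(pop_size) if j != i]
        a, b, c = pop[rng.choice(others, 3, replace=False)]
        mutant = a + F * (b - c)             # DE/rand/1 mutation
        cross = rng.random(dim) < 0.9        # binomial crossover, CR = 0.9
        trial = np.where(cross, mutant, pop[i])
        f = sphere(trial)
        if f < fit[i]:                       # greedy one-to-one selection
            pop[i], fit[i] = trial, f

print(fit.min())
```

Because F never settles, the search keeps jumping between coarse exploration (large F) and fine exploitation (small F), which is what helps the chaotic variant escape the local optima that trap classical DE.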
Paisitkriangkrai, S., Shen, C. & Zhang, J. 2010, 'Incremental Training of a Detector Using Online Sparse Eigen-decomposition'.
View/Download from: Publisher's site
The ability to efficiently and accurately detect objects plays a very crucial role for many computer vision tasks. Recently, offline object detectors have shown a tremendous success. However, one major drawback of offline techniques is that a complete set of training data has to be collected beforehand. In addition, once learned, an offline detector cannot make use of newly arriving data. To alleviate these drawbacks, online learning has been adopted with the following objectives: (1) the technique should be computationally and storage efficient; (2) the updated classifier must maintain its high classification accuracy. In this paper, we propose an effective and efficient framework for learning an adaptive online greedy sparse linear discriminant analysis (GSLDA) model. Unlike many existing online boosting detectors, which usually apply exponential or logistic loss, our online algorithm makes use of LDA's learning criterion that not only aims to maximize the class-separation criterion but also incorporates the asymmetrical property of training data distributions. We provide a better alternative for online boosting algorithms in the context of training a visual object detector. We demonstrate the robustness and efficiency of our methods on handwritten digit and face data sets. Our results confirm that object detection tasks benefit significantly when trained in an online manner.
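The LDA learning criterion that the online GSLDA model builds on can be illustrated with the closed-form two-class Fisher direction, w = Sw⁻¹(m₊ − m₋) (the toy data and the class imbalance are assumptions chosen to mirror the asymmetric detection setting; the paper's contribution is the greedy sparse, online update of this criterion, not this batch solve):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy, deliberately imbalanced two-class data (few positives, many
# negatives), with positives shifted along the first axis.
neg = rng.normal(0.0, 1.0, (200, 3))
pos = rng.normal(0.0, 1.0, (50, 3)) + np.array([3.0, 0.0, 0.0])

m_pos, m_neg = pos.mean(axis=0), neg.mean(axis=0)
# Pooled within-class scatter, lightly regularised for numerical stability.
Sw = np.cov(pos.T) * (len(pos) - 1) + np.cov(neg.T) * (len(neg) - 1)
Sw += 1e-6 * np.eye(3)
w = np.linalg.solve(Sw, m_pos - m_neg)   # Fisher discriminant direction

# Projections of the two classes should separate along w.
sep = float((pos @ w).mean() - (neg @ w).mean())
print(sep > 0.0)
```

Because the criterion uses class means and scatters rather than a per-sample exponential loss, it can absorb newly arriving examples by updating those statistics, which is the property the online algorithm exploits.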
Shen, C., Paisitkriangkrai, S. & Zhang, J. 2009, 'Efficiently Learning a Detection Cascade with Sparse Eigenvectors'.
In this work, we first show that feature selection methods other than boosting can also be used for training an efficient object detector. In particular, we introduce Greedy Sparse Linear Discriminant Analysis (GSLDA) (Moghaddam et al., 2007) for its conceptual simplicity and computational efficiency; slightly better detection performance is achieved compared with Viola and Jones (2004). Moreover, we propose a new technique, termed Boosted Greedy Sparse Linear Discriminant Analysis (BGSLDA), to efficiently train a detection cascade. BGSLDA exploits the sample re-weighting property of boosting and the class-separability criterion of GSLDA.
Paisitkriangkrai, S., Shen, C. & Zhang, J. 2008, 'Performance evaluation of local features in human classification and detection', IET Computer Vision, vol. 2, no. 4, pp. 236-246.
View/Download from: Publisher's site
Detecting pedestrians accurately is the first fundamental step for many computer vision applications such as video surveillance, smart vehicles, intersection traffic analysis and so on. The authors present an experimental study on pedestrian detection us…
Paisitkriangkrai, S., Shen, C. & Zhang, J. 2008, 'Fast pedestrian detection using a cascade of boosted covariance features', IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 8, pp. 1140-1151.
View/Download from: Publisher's site
Efficiently and accurately detecting pedestrians plays a very important role in many computer vision applications such as video surveillance and smart cars. In order to find the right feature for this task, we first present a comprehensive experimental s…
Lu, S., Zhang, J. & Feng, D. 2007, 'Detecting unattended packages through human activity recognition and object association', Pattern Recognition, vol. 40, no. 8, pp. 2173-2184.
View/Download from: Publisher's site
This paper provides a novel approach to detect unattended packages in public venues. Different from previous works on this topic which are mostly limited to detecting static objects where no human is nearby, we provide a solution which can detect an unat…
Zhang, J., Arnold, J. & Frater, M. 2000, 'A cell-loss concealment technique for MPEG-2 coded video', IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 4, pp. 659-665.
View/Download from: Publisher's site
Audio-visual and other multimedia services are seen as important sources of traffic for future telecommunication networks, including wireless networks. A major drawback with some wireless networks is that they introduce a significant number of transmissi…
Arnold, J., Frater, M. & Zhang, J. 1999, 'Error resilience in the MPEG-2 video coding standard for cell based networks - a review', Signal Processing: Image Communication, vol. 14, no. 6, pp. 607-633.
View/Download from: Publisher's site
The MPEG-2 video coding standard is being extensively used worldwide for the provision of digital video services. Many of these applications involve the transport of MPEG-2 video over cell-based (or packet) networks. Examples include the broadband integr…
Frater, M., Arnold, J. & Zhang, J. 1999, 'MPEG 2 video error resilience experiments: The importance of considering the impact of the systems layer', Signal Processing: Image Communication, vol. 14, no. 3, pp. 269-275.
View/Download from: Publisher's site
With increasing interest in the transport of video traffic over lossy networks, several techniques for improving the quality of video services in the presence of loss have been proposed, often using the MPEG 2 video coding algorithm as a basis. Many of t…
Zhang, J., Frater, M., Arnold, J. & Percival, T. 1997, 'MPEG 2 video services for wireless ATM networks', IEEE Journal on Selected Areas in Communications, vol. 15, no. 1, pp. 119-127.
View/Download from: Publisher's site
Audio-visual and other multimedia services are seen as an important source of traffic for future telecommunications networks, including wireless networks. In this paper, we examine the impact of the properties of a 50 Mb/s asynchronous transfer mode (ATM…

Microsoft Research: Microsoft Corp. One Microsoft Way, Redmond WA 98052-6399, USA

Dr. Zhengyou Zhang

Dr. Philip A. Chou

Dr. Zicheng Liu

Dr. Xian-Sheng Hua

The Collaborative Research Project with Microsoft Research (US):

1. Microsoft External Collaboration Project (Pilot funded project): Advanced 3D Deformable Surface Reconstruction and Tracking through RGB-D Cameras. The aim of this project is to develop novel computer vision technology for real-time modelling and tracking of 3D dense and deformable surfaces using general RGB-D cameras. The expected outcomes of this project will add significant value to the current RGB-D camera platform when applied in the common scenario in which the RGB-D camera does not move but the deformable objects of interest are moving.

-----------------------------------------------------------------------------------------------------

Nokia Research Centre in Finland

Dr. Lixin Fan

The Collaborative Research Project with Nokia Research Centre in Finland:

2. Nokia External Collaboration Project (Pilot funded project): Large Scale 3D Image Processing. This project develops a novel algorithm for 3D image registration across different point clouds in 3D space. The research outcome is a critical technology for Nokia's mobile phone applications.