
Associate Professor Jian Zhang

Biography

Dr Jian Zhang is an Associate Professor in the Faculty of Engineering and IT and research leader of the Multimedia and Media Analytics Program in the UTS Advanced Analytics Institute (AAI). He earned his PhD from the School of Information Technology and Electrical Engineering, UNSW@ADFA (Australian Defence Force Academy), University of New South Wales, in 1999.

Key Research Projects at UTS:

1. Microsoft External Collaboration Project (pilot funded): Advanced 3D Deformable Surface Reconstruction and Tracking through RGB-D Cameras. The aim of this project is to develop novel computer vision technology for real-time modelling and tracking of 3D dense, deformable surfaces using general RGB-D cameras. The expected outcomes will add significant value to the current RGB-D camera platform in the common scenario where the camera is static but the deformable objects of interest are moving.

2. Nokia External Collaboration Project (pilot funded): Large Scale 3D Image Processing. This project develops a novel algorithm for 3D image registration across different point clouds in 3D space. The research outcome is a critical technology for Nokia's mobile phone applications.

PhD scholarships are available to support outstanding PhD candidates in the following areas:
  • Image processing and pattern recognition
  • Multimedia information retrieval
  • Social multimedia signal processing
  • 2D/3D computer vision
  • Surveillance video content analysis
  • Multimedia and new media analytics

From January 2004 to July 2011, Dr Zhang was a Principal Researcher with National ICT Australia (NICTA) and a Conjoint Associate Professor in the School of Computer Science and Engineering at the University of New South Wales, where he was research leader of Multimedia and Video Communications Research at the NICTA Sydney Lab on the UNSW Kensington campus. He led several NICTA research projects on traffic sensing and surveillance, video content analysis and management for surveillance, and robust automated video surveillance for maritime security. All of these projects were in the areas of computer vision, multimedia content analysis and management, and multimedia content indexing and query.

From June 1997 to December 2003, Dr Zhang was with the Visual Information Processing Lab at Motorola Labs in Sydney as a senior research engineer, later becoming a principal research engineer and foundation manager of the Visual Communications Research Team at Motorola Labs, Sydney, Australia.

Professional

Jian Zhang is an IEEE Senior Member and a member of the Multimedia Signal Processing Technical Committee of the IEEE Signal Processing Society. He was Technical Program Chair of the 2008 IEEE Multimedia Signal Processing Workshop, Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), and Associate Editor of the EURASIP Journal on Image and Video Processing. He was Guest Editor of the T-CSVT Special Issue (March 2007) on the Convergence of Knowledge Engineering, Semantics and Signal Processing in Audiovisual Information Retrieval. As General Co-Chair, he chaired the IEEE International Conference on Multimedia and Expo (ICME 2012) in Melbourne, Australia.

Professional Activities

  • Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT)
  • Associate Editor of the EURASIP Journal on Image and Video Processing (EURASIP JIVP)
  • Senior member of the IEEE and its Communications, Computer, and Signal Processing Societies.
  • Member of Multimedia Signal Processing Technical Committee, IEEE Signal Processing Society
  • Area Chair of 2011 IEEE International Conference on Image Processing (ICIP2011)
  • Special Session Chair of 2010 International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS2010)
  • General Co-Chair of 2010 Digital Image Computing: Techniques and Applications (DICTA2010)
  • Publicity Chair of 2010 IEEE International Conference on Multimedia and Expo (ICME2010)
  • Asia-Pacific Liaison Chair of 2010 Visual Communications and Image Processing (VCIP2010)
  • Technical Co-Chair of 2008 IEEE Multimedia Signal Processing Workshop (MMSP08)
  • General Co-Chair of 2012 IEEE International Conference on Multimedia and Expo (ICME 2012)
  • Technical Program Co-chair of 2014 IEEE Visual Communications and Image Processing (VCIP 2014)
Associate Professor, Advanced Analytics Institute
Core Member, Advanced Analytics Institute
Master of Science, Doctor of Philosophy
 
Phone
+61 2 9514 3829
Room
CB11.07.302

Research Interests

Jian Zhang's research interests include multimedia content management, video understanding, and video coding and communication. Multimedia content management provides advanced algorithms to manage, search, and retrieve rich multimedia content. Video understanding focuses on automatic or semi-automatic extraction of semantic information from video sequences. Video coding and communication targets efficient methods for video compression and robust transmission. In addition to more than 100 publications, including papers, book chapters, and technical reports, he is co-author of more than ten patents filed in the US, UK, Japan, and Australia, including five issued US patents.

Can supervise: Yes

UTS Short Course: Multimedia Analytics

Book Chapters

Zhang, J. 2006, 'Error Resilience for Video Coding Service' in H.R. Wu, K.R. Rao (eds), Digital Video Image Quality and Perceptual Coding, CRC, Taylor & Francis group, USA, pp. 503-527.

Conference Papers

Wang, S., Zhang, J. & Miao, Z. 2013, 'A New Edge Feature for Head-Shoulder Detection', IEEE International Conference on Image Processing, Melbourne, Australia, September 2013 in 2013 IEEE International Conference on Image Processing, ed Brian Lovell, David Suter, David Taubman and Min Wu, IEEE, Piscataway, NJ, USA, pp. 2822-2826.
View/Download from: OPUS | Publisher's site
In this work, we introduce a new edge feature to improve head-shoulder detection performance. Since head-shoulder detection is vulnerable to vague contours, our new edge feature is designed to extract and enhance the head-shoulder contour and suppress other contours. The basic idea is that the head-shoulder contour can be predicted by filtering the edge image with edge patterns, which are generated from edge fragments through a learning process. This edge feature, known as En-Contour, can significantly enhance object contours such as the human head and shoulder. To evaluate the performance of En-Contour, we combine it with HOG+LBP [1] as HOG+LBP+En-Contour. HOG+LBP is the state-of-the-art feature in pedestrian detection, and because human head-shoulder detection is a special case of pedestrian detection, we also use it as our baseline. Our experiments indicate that this new feature significantly improves on HOG+LBP.
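The pattern-filtering idea in this abstract can be sketched in a few lines. Note that the real edge patterns are learned from edge fragments; the fixed horizontal-edge template below is purely illustrative, as is the binary edge map:

```python
# Toy sketch of the pattern-filtering idea behind En-Contour: correlating
# a binary edge map with a small edge pattern gives high responses where
# the local edge structure matches the pattern. The pattern here is a
# hand-picked stand-in for the learned edge patterns in the paper.

def correlate(edge_map, pattern):
    """Valid-mode 2-D cross-correlation of a binary edge map."""
    ph, pw = len(pattern), len(pattern[0])
    h, w = len(edge_map), len(edge_map[0])
    out = []
    for r in range(h - ph + 1):
        row = []
        for c in range(w - pw + 1):
            row.append(sum(edge_map[r + i][c + j] * pattern[i][j]
                           for i in range(ph) for j in range(pw)))
        out.append(row)
    return out

# A horizontal edge responds strongly to a horizontal 2-pixel pattern.
edge_map = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
response = correlate(edge_map, [[1, 1]])
```

In the paper's setting, responses from many learned patterns would be combined to enhance head-shoulder-like contours and suppress the rest.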
Xu, J., Wu, Q., Zhang, J., Shen, F. & Tang, Z. 2013, 'Training boosting-like algorithms with semi-supervised subspace learning', Melbourne, Australia, September 2013 in 2013 IEEE International Conference on Image Processing, ed Brian Lovell and David Suter, IEEE, Melbourne, Australia, pp. 4302-4306.
View/Download from: OPUS | Publisher's site
Boosting algorithms have attracted great attention since the first real-time face detector by Viola and Jones, which performs feature selection and strong classifier learning simultaneously. On the other hand, researchers have proposed decoupling these two procedures to improve the performance of boosting algorithms. Motivated by this, we propose a boosting-like algorithm framework that embeds semi-supervised subspace learning methods. It selects weak classifiers based on class separability, and the combination weights of the selected weak classifiers are obtained by subspace learning. Three typical algorithms are proposed under this framework and evaluated on public data sets. As shown by our experimental results, the proposed methods obtain superior performance over their supervised counterparts and AdaBoost.
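The "decoupled" idea above — select weak classifiers by class separability, then learn their combination weights separately — can be illustrated with a minimal sketch. This is not the paper's semi-supervised subspace method: here a Fisher-style score does the selection and stump accuracy stands in for the learned weights:

```python
# Illustrative "select-then-weight" boosting-like classifier: features are
# ranked by a Fisher-style class-separability score, a decision stump is
# built per selected feature, and stumps are combined with accuracy-based
# weights. A hand-rolled stand-in for the selection/weighting decoupling.

def fisher_score(values, labels):
    """Class separability of one feature: (mean gap)^2 / pooled variance."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == -1]
    mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
    var = (sum((v - mp) ** 2 for v in pos) +
           sum((v - mn) ** 2 for v in neg)) or 1e-9
    return (mp - mn) ** 2 / var

def make_stump(values, labels):
    """Best threshold/polarity stump for a single feature."""
    best = None
    for t in sorted(set(values)):
        for pol in (1, -1):
            preds = [pol if v >= t else -pol for v in values]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if best is None or acc > best[0]:
                best = (acc, t, pol)
    acc, t, pol = best
    return {"thr": t, "pol": pol, "acc": acc}

def train(X, y, k=2):
    """Select k features by separability, then weight stumps by accuracy."""
    n_feat = len(X[0])
    scores = [(fisher_score([x[j] for x in X], y), j) for j in range(n_feat)]
    chosen = [j for _, j in sorted(scores, reverse=True)[:k]]
    return [(j, make_stump([x[j] for x in X], y)) for j in chosen]

def predict(model, x):
    s = sum(st["acc"] * (st["pol"] if x[j] >= st["thr"] else -st["pol"])
            for j, st in model)
    return 1 if s >= 0 else -1
```

Like the framework in the abstract, selection (separability) and weighting (here, accuracy) are independent steps, so either can be swapped for a learned alternative.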
Kusakunniran, W., Satoh, S., Zhang, J. & Wu, Q. 2013, 'Attribute-based learning for large scale object classification', San Jose, California, USA, July 2013 in 2013 IEEE International Conference on Multimedia and Expo, ed Anup Basu, Nam Ling, Sethuraman (Panch) Panchanathan, IEEE, San Jose, California, USA, pp. 1-6.
View/Download from: OPUS | Publisher's site
Scalability to large numbers of classes is an important challenge for multi-class classification. It can often be computationally infeasible at the test phase when class prediction is performed by using every classifier trained for each individual class. This paper proposes an attribute-based learning method to overcome this limitation. The first step is to define attributes and their associations with object classes automatically and simultaneously; such associations are learned with a greedy strategy under certain conditions. The second step is to learn a classifier for each attribute instead of each class. These trained classifiers are then used to predict classes based on their attribute representations. The proposed method also allows a trade-off between test-time complexity (which grows linearly with the number of attributes) and accuracy. Experiments on the Animals-with-Attributes and ILSVRC2010 datasets show that the performance of our method is promising compared with the state-of-the-art.
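The test-time side of attribute-based classification is easy to sketch: run one classifier per attribute, then pick the class whose attribute signature best agrees with the scores. The class/attribute table below is a hypothetical toy, not from the paper, and the paper learns these associations rather than fixing them by hand:

```python
# Illustrative attribute-based class prediction: each class is described
# by a binary attribute vector, per-attribute classifiers output scores,
# and the predicted class is the one whose signature best matches them.
# Test-time cost grows with the number of attributes, not classes.

# Hypothetical class/attribute signatures for illustration only.
CLASS_ATTRS = {
    "zebra":   {"striped": 1, "hooves": 1, "flies": 0},
    "horse":   {"striped": 0, "hooves": 1, "flies": 0},
    "sparrow": {"striped": 0, "hooves": 0, "flies": 1},
}

def predict_class(attr_scores, class_attrs=CLASS_ATTRS):
    """attr_scores: attribute -> probability in [0, 1] from per-attribute
    classifiers. Score each class by agreement with its signature."""
    def agreement(sig):
        # p if the class has the attribute, (1 - p) otherwise
        return sum(p if sig[a] else 1.0 - p for a, p in attr_scores.items())
    return max(class_attrs, key=lambda c: agreement(class_attrs[c]))
```

With three attributes this mechanism distinguishes three classes; the point of the paper is that the same small set of attribute classifiers can cover many more classes.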
Wang, S., Miao, Z. & Zhang, J. 2013, 'Simultaneously detect and segment pedestrian', IEEE International Conference on Multimedia and Expo, San Jose, USA, July 2013 in 2013 IEEE International Conference on Multimedia and Expo, ed Anup Basu, Nam Ling, Sethuraman (Panch) Panchanathan, IEEE, USA, pp. 1-4.
View/Download from: OPUS | Publisher's site
We present a framework to simultaneously detect and segment pedestrians in images. Our work is based on a part-based method. We first segment the image into superpixels, then assemble superpixels into body part candidates by comparing the assembled shape with a pre-built template library. A 'structure-based' shape matching algorithm is developed to measure shape similarity. All the body part candidates are input into our modified AND/OR graph to generate the most reasonable combination. The graph describes the possible variation of body configuration and models the constraint relationships between body parts. We perform comparison experiments on a public database, and the results show the effectiveness of our framework.
Shen, Y., Miao, Z. & Zhang, J. 2012, 'Unsupervised Online Learning Trajectory Analysis Based on Weighted Directed Graph', International Conference on Pattern Recognition, Tsukuba, Japan, November 2012 in 2012 21st International Conference on Pattern Recognition (ICPR), ed Jan-Olof Eklundh, Yuichi Ohta, Steven Tanimoto, IEEE, USA, pp. 1306-1309.
View/Download from: OPUS
In this paper, we propose a novel unsupervised online learning trajectory analysis method based on a weighted directed graph. Each trajectory can be represented as a sequence of key points. In the training stage, the unsupervised expectation-maximization (EM) algorithm is applied to the training data to cluster key points. Each class is a Gaussian distribution and is considered a node of the graph. According to the classification of key points, we can build a weighted directed graph to represent the trajectory network in the scene; each path is a category of trajectories. In the test stage, we adopt an online EM algorithm to classify trajectories and update the graph. In our experiments, we test the approach and obtain good performance compared with state-of-the-art approaches.
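The weighted-directed-graph mechanism can be sketched once the clustering step is assumed done: trajectories become sequences of cluster (node) IDs, observed transitions increment edge weights, and a new trajectory is scored by how well the graph supports its transitions. The scoring rule below is a simple stand-in, not the paper's online EM update:

```python
# Minimal sketch of a trajectory network as a weighted directed graph.
# Nodes are key-point cluster IDs (EM clustering assumed already done);
# edge weights count observed transitions between clusters.
from collections import defaultdict

class TrajectoryGraph:
    def __init__(self):
        self.edges = defaultdict(int)      # (node_a, node_b) -> count
        self.out_total = defaultdict(int)  # node_a -> outgoing count

    def add_trajectory(self, nodes):
        """Online update: record each observed transition."""
        for a, b in zip(nodes, nodes[1:]):
            self.edges[(a, b)] += 1
            self.out_total[a] += 1

    def score(self, nodes):
        """Average transition probability under the current graph;
        low scores flag trajectories the network has not seen."""
        pairs = list(zip(nodes, nodes[1:]))
        if not pairs:
            return 0.0
        probs = [self.edges[(a, b)] / self.out_total[a]
                 if self.out_total[a] else 0.0
                 for a, b in pairs]
        return sum(probs) / len(probs)
```

Because updates are incremental counts, the graph can keep learning online as new trajectories arrive, mirroring the online spirit of the method.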
Zhang, J., Lu, S., Mei, T., Wang, J., Wang, Z., Feng, D., Sun, J. & Li, S. 2012, 'Browse-to-search', 2012 ACM Multimedia Conference, Nara, Japan, October 2012 in Proceedings of the 2012 ACM Multimedia Conference, ed Noboru Babaguchi, Kiyoharu Aizawa and John Smith, ACM, USA, pp. 1323-1324.
Zhang, J., Wu, Y., Lu, S., Mei, T. & Li, S. 2012, 'Local visual words coding for low bit rate mobile visual search', 2012 ACM Multimedia Conference, Nara, Japan, October 2012 in Proceedings of the 2012 ACM Multimedia Conference, ed Noboru Babaguchi, Kiyoharu Aizawa and John Smith, ACM, USA, pp. 989-992.
Mobile visual search has attracted extensive attention for its huge potential for numerous applications. Research on this topic has focused on two schemes: sending query images, and sending compact descriptors extracted on mobile phones. The first scheme requires about 30-40KB of data to transmit, while the second can reduce the bit rate by 10 times. In this paper, we propose a third scheme for extremely low bit rate mobile visual search, which sends compressed visual words consisting of a vocabulary tree histogram and descriptor orientations rather than descriptors. This scheme can further reduce the bit rate with few extra computational costs on the client. Specifically, we store a vocabulary tree and extract visual descriptors on the mobile client. A light-weight pre-retrieval is performed to obtain the visited leaf nodes in the vocabulary tree. The orientation of each local descriptor and the tree histogram are then encoded to be transmitted to the server. Our new scheme transmits less than 1KB of data, which reduces the bit rate of the second scheme by 3 times, and obtains about 30% improvement in search accuracy over the traditional Bag-of-Words baseline. The time cost is only 1.5 secs on the client and 240 msecs on the server.
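The core of the tree-histogram idea is easy to sketch: quantize each local descriptor to its nearest vocabulary leaf and send only the histogram of visited leaves. This toy flattens the vocabulary tree to a list of leaf centroids and omits the orientation coding and compression from the paper:

```python
# Sketch of vocabulary-tree quantization for low-bit-rate visual search:
# descriptors are mapped to nearest leaf centroids, and only the compact
# histogram of visited leaves (not the raw descriptors) is transmitted.

def nearest_leaf(desc, leaves):
    """Index of the leaf centroid closest to the descriptor (L2)."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(leaves)), key=lambda i: d2(desc, leaves[i]))

def tree_histogram(descriptors, leaves):
    """Histogram of visited leaves -- the compact signature to send."""
    hist = [0] * len(leaves)
    for desc in descriptors:
        hist[nearest_leaf(desc, leaves)] += 1
    return hist
```

A real vocabulary tree makes `nearest_leaf` logarithmic in the vocabulary size by descending level by level, which is what makes the pre-retrieval cheap on the client.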
Xu, J., Wu, Q., Zhang, J. & Tang, Z. 2012, 'Object Detection Based on Co-Occurrence GMuLBP Features', 2012 IEEE International Conference on Multimedia and Expo, Melbourne, Australia, July 2012 in 2012 IEEE International Conference on Multimedia and Expo, ed Jian Zhang, IEEE Computer Society, Melbourne, Australia, pp. 943-948.
View/Download from: Publisher's site
Image co-occurrence has shown great power in object classification because it captures the characteristics of individual features and the spatial relationships between them simultaneously. For example, the Co-occurrence Histogram of Oriented Gradients (CoHOG) has achieved great success on the human detection task. However, the gradient orientation in CoHOG is sensitive to noise. In addition, CoHOG does not take gradient magnitude into account, which is a key component in reinforcing feature detection. In this paper, we propose a new LBP feature detector based on image co-occurrence. Building on uniform Local Binary Patterns, the new feature detector detects co-occurrence orientation through gradient magnitude calculation and is known as CoGMuLBP. An extended version of CoGMuLBP is also presented. Experimental results on the UIUC car data set show that the proposed features outperform state-of-the-art methods.
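The uniform Local Binary Pattern that CoGMuLBP builds on is compact enough to show directly: threshold the 8 neighbours against the centre pixel, and call the 8-bit pattern "uniform" when it has at most two circular 0/1 transitions. This sketch stops at the basic LBP code; the co-occurrence and gradient-magnitude steps of the paper are not shown:

```python
# Minimal uniform LBP: 8 neighbour bits (1 where neighbour >= centre),
# with the standard uniformity test of at most two circular transitions.

# Clockwise 8-neighbourhood offsets starting from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_bits(img, r, c):
    """Neighbour bits around pixel (r, c) of a 2-D intensity grid."""
    centre = img[r][c]
    return [1 if img[r + dr][c + dc] >= centre else 0 for dr, dc in OFFSETS]

def is_uniform(bits):
    """Uniform pattern: at most two circular 0<->1 transitions."""
    transitions = sum(bits[i] != bits[(i + 1) % len(bits)]
                      for i in range(len(bits)))
    return transitions <= 2
```

Uniform patterns correspond to simple local structures such as edges and corners, which is why restricting to them keeps the descriptor short without losing much discriminative power.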
Zhang, J. & Liu, X. 2011, 'Active Learning for Human Action Recognition with Gaussian Processes', IEEE International Conference on Image Processing, Brussels, Belgium, September 2011 in Proceedings of 2011 International Conference on Image Processing, ed Benoit Macq, Peter Schelkens, Inald Lagendijk, IEEE, USA, pp. 3253-3256.
View/Download from: OPUS
This paper presents an active learning approach for recognizing human actions in videos based on a combined multiple-kernel method. We design the classifier based on Multiple Kernel Learning (MKL) through Gaussian Process (GP) regression. This classifier is then trained in an active learning manner: in each iteration, one optimal sample is selected to be interactively annotated and incorporated into the training set. The selection of the sample is based on the heuristic feedback of the GP classifier. To our knowledge, GP-regression MKL-based active learning methods have not yet been applied to human action recognition. We test this approach on standard benchmarks. It outperforms the state-of-the-art techniques in accuracy while requiring significantly fewer training samples.
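The outer active-learning loop is generic and can be sketched independently of the GP/MKL classifier. Here a simple "least confident" rule (score closest to the decision boundary) stands in for the paper's GP feedback heuristic; `scores_fn` and `oracle` are hypothetical stand-ins for the classifier and the human annotator:

```python
# Illustrative active-learning loop: each round, the unlabelled sample
# the current model is least sure about is sent for annotation. The
# "least confident" rule below is a generic stand-in for GP feedback.

def least_confident(scores_fn, unlabelled):
    """Pick the sample whose score is closest to the decision boundary 0."""
    return min(unlabelled, key=lambda x: abs(scores_fn(x)))

def active_learning(scores_fn, unlabelled, oracle, rounds=3):
    """Query the oracle for the most informative sample, one per round."""
    labelled = []
    pool = list(unlabelled)
    for _ in range(min(rounds, len(pool))):
        x = least_confident(scores_fn, pool)
        pool.remove(x)
        labelled.append((x, oracle(x)))   # interactive annotation step
    return labelled
```

In the full method the classifier would also be retrained after each annotation, so the selection criterion adapts as labels accumulate.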
Quek, A., Wang, Z., Zhang, J. & Feng, D. 2011, 'Structural Image Classification with Graph Neural Networks', Noosa, Queensland, Australia, February 2011 in Proceedings of 2011 International Conference on Digital Image Computing - Techniques and Applications, ed Paul Jackway, IEEE, USA, pp. 416-421.
View/Download from: OPUS
Many approaches to image classification tend to transform an image into an unstructured set of numeric feature vectors obtained globally and/or locally, and as a result lose important relational information between regions. In order to encode the geometric relationships between image regions, we propose a variety of structural image representations that are not specialised for any particular image category. Besides the traditional grid-partitioning and global segmentation methods, we investigate the use of local scale-invariant region detectors. Regions are connected based not only upon nearest-neighbour heuristics, but also upon minimum spanning trees and Delaunay triangulation. In order to maintain the topological and spatial relationships between regions, and also to effectively process undirected connections represented as graphs, we utilise the recently-proposed graph neural network model. To the best of our knowledge, this is the first utilisation of the model to process graph structures based on local-sampling techniques, for the task of image classification. Our experimental results demonstrate great potential for further work in this domain.
Li, Z., Wu, Q., Zhang, J. & Geers, G. 2011, 'SKRWM based descriptor for pedestrian detection in thermal images', Hangzhou, China, October 2011 in 2011 IEEE 13th International Workshop on Multimedia Signal Processing (MMSP), ed Wen Gao, Anthony Vetro, Zhengyou Zhang, IEEE, USA, pp. 1-6.
View/Download from: OPUS
Pedestrian detection in a thermal image is a difficult task due to intrinsic challenges: 1) low image resolution, 2) thermal noise, 3) polarity changes, and 4) lack of colour, texture or depth information. To address these challenges, we propose a novel mid-level feature descriptor for pedestrian detection in the thermal domain, which combines the pixel-level Steering Kernel Regression Weights Matrix (SKRWM) with the corresponding covariances. SKRWM can properly capture the local structure of pixels, while the covariance computation further provides the correlation of low-level features. This mid-level feature descriptor not only captures pixel-level data differences and spatial differences of local structure, but also explores the correlations among low-level features. In the case of human detection, the proposed mid-level feature descriptor can discriminatively distinguish pedestrians from complex backgrounds. To test the performance of the proposed feature descriptor, a popular classifier framework based on Principal Component Analysis (PCA) and Support Vector Machine (SVM) is also built. Overall, our experimental results show that the proposed approach overcomes the problems caused by background subtraction in [1] while attaining detection accuracy comparable to the state-of-the-art.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2011, 'Speed-invariant gait recognition based on Procrustes Shape Analysis using higher-order shape configuration', Brussels, Belgium, September 2011 in 2011 18th IEEE International Conference on Image Processing (ICIP), ed Benoit Macq; Peter Schelkens, IEEE, USA, pp. 545-548.
View/Download from: OPUS
Walking speed change is considered a typical challenge hindering reliable human gait recognition. This paper proposes a novel method to extract a speed-invariant gait feature based on Procrustes Shape Analysis (PSA). Two major components of PSA, i.e., Procrustes Mean Shape (PMS) and Procrustes Distance (PD), are adopted and adapted specifically for the purpose of speed-invariant gait recognition. One of our major contributions in this work is that, instead of using the conventional Centroid Shape Configuration (CSC), which is not suitable for describing individual gait when body shape changes, particularly due to changes of walking speed, we propose a new descriptor named Higher-order derivative Shape Configuration (HSC) which can generate robust speed-invariant gait features. From the first order to higher orders, derivative shape configurations contain gait shape information at different levels. Intuitively, a higher order of derivative is able to describe gait with shape change caused by larger changes of walking speed. Encouraging experimental results show that our proposed method is efficient for speed-invariant gait recognition and evidently outperforms existing methods in the literature.
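The Procrustes machinery behind both this and the following paper has a compact form when a 2-D boundary is written as complex numbers. The sketch below shows the conventional centroid configuration, a first-order "difference" configuration in the spirit of HSC (the paper's higher-order construction is more elaborate), and the full Procrustes distance, which is invariant to translation, scale, and rotation:

```python
# Shape configurations and Procrustes distance on complex 2-D boundaries.
# csc: Centroid Shape Configuration (centre on the mean).
# hsc: a first-order difference configuration, illustrating the idea
#      behind Higher-order derivative Shape Configuration.

def csc(points):
    """Centroid Shape Configuration: centre the boundary on its mean."""
    m = sum(points) / len(points)
    return [p - m for p in points]

def hsc(points):
    """First-order configuration: differences of successive points."""
    return [b - a for a, b in zip(points, points[1:])]

def procrustes_distance(u, v):
    """Full Procrustes distance: 0 for shapes identical up to
    translation, scale and rotation (complex inner product absorbs
    the optimal rotation/scale)."""
    inner = sum(a * b.conjugate() for a, b in zip(u, v))
    nu = sum(abs(a) ** 2 for a in u)
    nv = sum(abs(b) ** 2 for b in v)
    return 1.0 - abs(inner) ** 2 / (nu * nv)
```

A rotated, scaled, and translated copy of a shape has distance zero from the original, while genuinely different shapes score higher, which is exactly the similarity measure PSA-based gait recognition relies on.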
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2011, 'Pairwise Shape configuration-based PSA for gait recognition under small viewing angle change', The 8th IEEE International Conference Advanced Video and Signal-Based Surveillance, Klagenfurt, Austria, August 2011 in 2011 8th IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS), ed Gian Luca Foresti; Bernhard Rinner, IEEE, USA, pp. 17-22.
View/Download from: OPUS
Two main components of Procrustes Shape Analysis (PSA) are adopted and adapted specifically to address gait recognition under small viewing angle change: 1) Procrustes Mean Shape (PMS) for gait signature description; 2) Procrustes Distance (PD) for similarity measurement. Pairwise Shape Configuration (PSC) is proposed as a shape descriptor in place of the existing Centroid Shape Configuration (CSC) in conventional PSA. PSC can better tolerate shape change caused by viewing angle change than CSC. Small variation of viewing angle makes a large impact only on global gait appearance; without major impact on local spatio-temporal motion, PSC, which effectively embeds local shape information, can generate a robust view-invariant gait feature. To enhance gait recognition performance, a novel boundary re-sampling process is proposed. It provides only the necessary re-sampled points to the PSC description, and at the same time efficiently solves the problems of boundary point correspondence, boundary normalization and boundary smoothness. This re-sampling process adopts prior knowledge of body pose structure. Comprehensive experiments are carried out on the CASIA gait database. The proposed method is shown to significantly improve the performance of gait recognition under small viewing angle change, without the additional requirements of supervised learning, a known viewing angle or a multi-camera system, when compared with other methods in the literature.
Paisitkriangkrai, S., Shen, C. & Zhang, J. 2011, 'Face detection with effective feature extraction', 10th Asian Conference on Computer Vision, ACCV 2010, Queenstown, New Zealand, November 2010 in Computer Vision - ACCV 2010, Springer, Berlin, pp. 460-470.
View/Download from: OPUS | Publisher's site
There is an abundant literature on face detection due to its important role in many vision applications. Since Viola and Jones proposed the first real-time AdaBoost based face detector, Haar-like features have been adopted as the method of choice for frontal face detection. In this work, we show that simple features other than Haar-like features can also be applied for training an effective face detector. Since a single feature is not discriminative enough to separate faces from difficult non-faces, we further improve the generalization performance of our simple features by introducing feature co-occurrences. We demonstrate that our proposed features yield a performance improvement compared to Haar-like features. In addition, our findings indicate that features play a crucial role in the ability of the system to generalize.
Li, Z., Zhang, J., Wu, Q. & Geers, G.D. 2010, 'Feature Enhancement Using Gradient Salience on Thermal Image', Digital Image Computing: Techniques and Applications, Sydney, Australia, December 2010 in Proceedings. 2010 Digital Image Computing: Techniques and Applications (DICTA 2010), ed Jian Zhang, Chunhua Shen, Glenn Geers, Qiang Wu, IEEE Computer Society, Sydney, Australia, pp. 556-562.
View/Download from: OPUS | Publisher's site
Feature enhancement in an image reinforces certain extracted features so that they can be used for object classification and detection. As thermal images lack texture and colour information, techniques for visible-image feature enhancement are insufficient when applied to thermal images. In this paper, we propose a new gradient-based approach for feature enhancement in thermal images. We use the statistical properties of the gradients of foreground object profiles, and formulate object features with gradient saliency. Empirical evaluation of the proposed approach shows significantly improved performance on human contours, which can be used for detection and classification.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2010, 'Multi-view Gait Recognition Based on Motion Regression using Multilayer Perceptron', International Conference on Pattern Recognition, Istanbul, Turkey, August 2010 in Proceedings: 2010 20th International Conference on Pattern Recognition (ICPR 2010), ed Müjdat Çetin, Kim Boyer and Seong-Whan Lee, IEEE Computer Society, Istanbul, Turkey, pp. 2186-2189.
View/Download from: OPUS | Publisher's site
It has been shown that gait is an efficient biometric feature for identifying a person at a distance. However, it is a challenging problem to obtain reliable gait features when the viewing angle changes, because body appearance can differ under various viewing angles. In this paper, the above problem is formulated as a regression problem where a novel View Transformation Model (VTM) is constructed by adopting a Multilayer Perceptron (MLP) as the regression tool. It smoothly estimates gait features under an unknown viewing angle based on motion information in a well-selected Region of Interest (ROI) under other existing viewing angles. Thus, this proposal can normalize gait features under various viewing angles into a common viewing angle before gait similarity measurement is carried out. Encouraging experimental results have been obtained on a widely adopted benchmark database.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2010, 'Support Vector Regression for Multi-view Gait Recognition Based on Local Motion Feature Selection', IEEE Conference on Computer Vision and Pattern Recognition, San Francisco CA, USA, June 2010 in 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), ed Trevor Darrell, David Hogg, David Jacobs, IEEE Computer Society, Piscataway, USA, pp. 974-981.
View/Download from: OPUS | Publisher's site
Gait is a well recognized biometric feature that is used to identify a human at a distance. However, in real environments, appearance changes of individuals due to viewing angle changes cause many difficulties for gait recognition. This paper re-formulates this problem as a regression problem. A novel solution is proposed to create a View Transformation Model (VTM) from different points of view using Support Vector Regression (SVR). To facilitate the regression process, a new method is proposed to seek a local Region of Interest (ROI) under one viewing angle for predicting the corresponding motion information under another viewing angle. Thus, the well constructed VTM is able to transfer gait information under one viewing angle into another. This proposal achieves view-independent gait recognition: it normalizes gait features under various viewing angles into a common viewing angle before similarity measurement is carried out. Extensive experimental results based on a widely adopted benchmark dataset demonstrate that the proposed algorithm achieves significantly better performance than existing methods in the literature.
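The View Transformation Model idea in the two gait papers above — learn a mapping from features under one viewing angle to another, then normalize before matching — can be sketched with a toy regressor. The papers use an MLP and Support Vector Regression respectively; plain per-dimension least squares is used here only to show the normalization step:

```python
# Toy View Transformation Model: per-dimension least-squares regression
# mapping gait features under view A to view B, learned from paired
# training subjects. A stand-in for the papers' MLP/SVR regressors.

def fit_linear(xs, ys):
    """Least-squares fit of y = w*x + b for one feature dimension."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx if sxx else 0.0
    return w, my - w * mx

def fit_vtm(feats_a, feats_b):
    """One regressor per feature dimension, from view A to view B."""
    dims = len(feats_a[0])
    return [fit_linear([f[d] for f in feats_a], [f[d] for f in feats_b])
            for d in range(dims)]

def transform(vtm, feat_a):
    """Normalize a view-A probe feature into view B before matching."""
    return [w * x + b for (w, b), x in zip(vtm, feat_a)]
```

Once probe and gallery features live in the same view, any standard similarity measure can be applied, which is the view-normalization step both papers rely on.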
Saesue, W., Chou, C.T. & Zhang, J. 2010, 'Cross-layer QoS-optimized EDCA adaptation for wireless video streaming', IEEE International Conference on Image Processing, Hong Kong, September 2010 in Proceedings of 2010 IEEE 17th International Conference on Image Processing, ed N/A, IEEE, Piscataway, NJ, pp. 2925-2928.
View/Download from: OPUS | Publisher's site
In this paper, we propose an adaptive cross-layer technique that optimally enhances the QoS of wireless video transmission in an IEEE 802.11e WLAN. The optimization takes into account the unequal error protection characteristics of video streaming, the IE
Thi, T., Zhang, J., Cheng, L., Wang, L. & Satoh, S. 2010, 'Human action recognition and localization in video using structured learning of local space-time features', Advanced Video and Signal Based Surveillance, Boston, MA, August 2010 in 2010 Seventh IEEE International Conference on Advanced Video and Signal Based Surveillance, ed N/A, IEEE, Piscataway, NJ, pp. 204-211.
View/Download from: OPUS | Publisher's site
This paper presents a unified framework for human action classification and localization in video using structured learning of local space-time features. Each human action class is represented by its own compact set of local patches. In our appr
Thi, T., Cheng, L., Zhang, J. & Wang, L. 2010, 'Implicit motion-shape model: A generic approach for action matching', IEEE International Conference on Image Processing, Hong Kong, September 2010 in Proceedings of 2010 IEEE 17th International Conference on Image Processing, ed N/A, IEEE, Piscataway, NJ, pp. 1477-1480.
View/Download from: OPUS | Publisher's site
We develop a robust technique to find similar matches of human actions in video. Given a query video, Motion History Images (MHI) are constructed for consecutive keyframes. This is followed by dividing the MHI into local Motion-Shape regions, which allow
Wang, W., Zhang, J. & Shen, C. 2010, 'Improved human detection and classification in thermal images', IEEE International Conference on Image Processing, Hong Kong, September 2010 in Proceedings - International Conference on Image Processing, ICIP, ed N/A, IEEE, Piscataway, NJ, pp. 2313-2316.
View/Download from: OPUS | Publisher's site
We present a new method for detecting pedestrians in thermal images. The method is based on the Shape Context Descriptor (SCD) with the AdaBoost cascade classifier framework. Compared with standard optical images, thermal imaging cameras offer a clear advantage for night-time video surveillance and are robust to lighting changes in the daytime. Experiments show that shape context features with boosting classification provide a significant improvement on human detection in thermal images. In this work, we have also compared our proposed method with rectangle features on a public dataset of thermal imagery. Results show that shape context features are much better than conventional rectangular features on this task.
Paisitkriangkrai, S., Mei, T., Zhang, J. & Hua, X. 2010, 'Scalable clip-based near-duplicate video detection with ordinal measure', Conference on Image and Video Retrieval, Xi'an, China, July 2010 in CIVR '10 Proceedings of the ACM International Conference on Image and Video Retrieval, ed N/A, ACM, New York, pp. 121-128.
View/Download from: Publisher's site
Detection of duplicate or near-duplicate videos in large-scale databases plays an important role in video search. In this paper, we analyze the problem of near-duplicate detection and propose a practical and effective solution …
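The ordinal measure behind this kind of near-duplicate detection can be sketched as follows: each frame is reduced to the rank ordering of its block intensities, which is invariant to global brightness changes. The grid size and distance below are illustrative assumptions, not the paper's exact design:

```python
def ordinal_signature(frame, grid=3):
    """Partition the frame into grid x grid blocks, average each block,
    and replace the averages by their ranks (0 = darkest block)."""
    h, w = len(frame), len(frame[0])
    bh, bw = h // grid, w // grid
    means = []
    for gy in range(grid):
        for gx in range(grid):
            block = [frame[y][x]
                     for y in range(gy * bh, (gy + 1) * bh)
                     for x in range(gx * bw, (gx + 1) * bw)]
            means.append(sum(block) / len(block))
    order = sorted(range(len(means)), key=lambda i: means[i])
    ranks = [0] * len(means)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks

def ordinal_distance(sig_a, sig_b):
    """L1 distance between two ordinal signatures (lower = more similar)."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))
```

A uniformly brightened copy of a frame produces the identical signature, which is why the measure is attractive for duplicate detection across re-encoded clips.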
Thi, T., Cheng, L., Zhang, J., Wang, L. & Satoh, S. 2010, 'Weakly supervised action recognition using implicit shape models', International Conference on Pattern Recognition, Istanbul, August 2010 in 2010 20th International Conference on Pattern Recognition (ICPR), ed N/A, IEEE, Piscataway, NJ, pp. 3517-3520.
View/Download from: OPUS | Publisher's site
In this paper, we present a robust framework for action recognition in video that performs competitively against state-of-the-art methods, yet does not rely on sophisticated background-subtraction preprocessing to remove background features.
Saesue, W., Chou, C.T. & Zhang, J. 2010, 'Video Quality Prediction in the Presence of MAC Contention and Wireless Channel Error', IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks, Montreal, Canada, June 2010 in IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), ed N/A, IEEE, Piscataway, NJ, pp. 1-10.
View/Download from: OPUS
This paper proposes an integrated model to predict the quality of video, expressed in terms of the mean square error (MSE) of the received video frames, in an IEEE 802.11e wireless network. The proposed system takes into account contention at the MAC layer, wireless channel error, queueing at the MAC layer, the parameters of different 802.11e access categories (ACs), and the video characteristics of different H.264 data partitions (DPs). To the best of the authors' knowledge, this is the first system that takes these network and video characteristics into consideration to predict video quality in an IEEE 802.11e network. The proposed system consists of two components. The first component predicts the packet loss rate of each H.264 data partition by using a multi-dimensional discrete-time Markov chain (DTMC) coupled to an M/G/1 queue. The second component uses these packet loss rates and the video characteristics to predict the MSE of each received video frame. We verify the accuracy of the combined system using discrete event simulation and real H.264-coded video sequences.
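The queueing component builds on standard M/G/1 results; for instance, the mean waiting time of such a queue is given by the Pollaczek-Khinchine formula. This is a generic textbook sketch, not the paper's coupled DTMC model:

```python
def mg1_mean_wait(arrival_rate, service_mean, service_second_moment):
    """Pollaczek-Khinchine mean waiting time for an M/G/1 queue:
    W = lambda * E[S^2] / (2 * (1 - rho)), with rho = lambda * E[S]."""
    rho = arrival_rate * service_mean
    assert rho < 1.0, "queue must be stable (utilization below 1)"
    return arrival_rate * service_second_moment / (2.0 * (1.0 - rho))
```

With exponential service (the M/M/1 special case, where E[S^2] = 2/mu^2), the formula reduces to the familiar rho / (mu - lambda).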
Khan, A., Zhang, J. & Wang, Y. 2010, 'Appearance-based Re-identification of People in Video', Digital Image Computing Techniques and Applications, Sydney, NSW, December 2010 in 2010 International Conference on Digital Image Computing - Techniques and Applications (DICTA'10), ed N/A, IEEE, Piscataway, NJ, pp. 357-362.
View/Download from: OPUS
This paper introduces the topic of appearance-based re-identification of people in video, based on colour information from people's clothing. Most work described in the literature uses full-body histograms. This paper evaluates the histogram method, describes ways of including spatial colour information, and proposes a colour-based appearance descriptor called the Colour Context People Descriptor. All methods are evaluated extensively, and the reported experiments show that adding spatial colour information greatly improves re-identification results.
Wang, L., Cheng, L., Thi, T. & Zhang, J. 2010, 'Human Action Recognition from Boosted Pose Estimation', Digital Image Computing Techniques and Applications, Sydney, NSW, December 2010 in 2010 International Conference on Digital Image Computing: Techniques and Applications (DICTA), ed N/A, IEEE, Piscataway, NJ, pp. 308-313.
This paper presents a unified framework for recognizing human actions in video using human pose estimation. Due to the high variation of human appearance and noisy background context, accurate human pose analysis is hard to achieve and is rarely employed for action recognition. In our approach, we take advantage of the current success of human detection and the view invariance of local feature-based approaches to design a pose-based action recognition system. We begin with a frame-wise human detection step to initialize the search space for human local parts, then integrate the detected parts into a human kinematic structure using a tree-structured graphical model. The final human articulation configuration is then used to infer the action class being performed, based on each single part's behavior and the overall structure variation. We also show that even with imprecise pose estimation, accurate action recognition can still be achieved from informative clues in the overall pose part configuration. The promising results obtained on action recognition benchmarks show that our proposed framework is comparable to existing state-of-the-art action recognition algorithms.
Kusakunniran, W., Wu, Q., Li, H. & Zhang, J. 2009, 'Automatic gait recognition using weighted binary pattern on video', Advanced Video and Signal Based Surveillance, Genoa, Italy, September 2009 in Proceedings of Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, ed Tubaro, S, IEEE Computer Society, USA, pp. 49-54.
View/Download from: OPUS
Human identification by recognizing spontaneous gait recorded in real-world settings is a tough and not yet fully resolved problem in biometrics research. Several issues contribute to the difficulty of this task, including varied poses, different clothes, moderate to large changes in normal walking manner due to carrying diverse goods, and the uncertainty of the environments in which people are walking. To achieve better gait recognition, this paper proposes a new method based on the Weighted Binary Pattern (WBP). WBP first constructs a binary pattern from a sequence of aligned silhouettes. Then, an adaptive weighting technique is applied to discriminate the significance of the bits in gait signatures. Compared with most existing methods in the literature, this method can better deal with gait frequency, local spatio-temporal human pose features, and global body shape statistics. The proposed method is validated on several well-known benchmark databases. The extensive and encouraging experimental results show that the proposed algorithm achieves high accuracy with low complexity and computational cost.
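The binary-pattern construction can be sketched by packing each pixel's on/off value across the aligned silhouette frames into one integer per pixel, then comparing signatures with a weighted bitwise distance. Here the per-pixel weights are supplied directly for illustration; the paper learns them adaptively:

```python
def binary_pattern(silhouettes):
    """Pack each pixel's binary value across T aligned silhouette frames
    into a T-bit integer, giving one pattern value per pixel."""
    t = len(silhouettes)
    h, w = len(silhouettes[0]), len(silhouettes[0][0])
    pattern = []
    for y in range(h):
        for x in range(w):
            bits = 0
            for f in range(t):
                bits = (bits << 1) | (1 if silhouettes[f][y][x] else 0)
            pattern.append(bits)
    return pattern

def weighted_distance(pat_a, pat_b, weights):
    """Weighted Hamming-style distance: each pixel's bit-mismatch count
    is scaled by its per-pixel weight."""
    dist = 0.0
    for a, b, w in zip(pat_a, pat_b, weights):
        dist += w * bin(a ^ b).count("1")
    return dist
```

Weighting lets stable, discriminative pixels (e.g. around the legs during a gait cycle) dominate the match while noisy boundary pixels are down-weighted.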
Kusakunniran, W., Wu, Q., Li, H. & Zhang, J. 2009, 'Multiple Views Gait Recognition using View Transformation Model Based on Optimized Gait Energy Image', IEEE International Conference on Computer Vision Workshops, Kyoto, Japan, September 2009 in Proceedings of 2009 IEEE 12th International Conference on Computer Vision Workshops, ed Cipolla, R, IEEE, USA, pp. 1058-1064.
View/Download from: OPUS
Gait is a well-recognized biometric that has been widely used for human identification. However, current gait recognition can run into difficulty when the viewing angle changes, because the viewing angle under which the gait signature database was generated may not match the viewing angle at which the probe data are obtained. This paper proposes a new multi-view gait recognition approach that tackles this problem. Unlike other approaches in the same category, the new method creates a View Transformation Model (VTM) based on the spatial-domain Gait Energy Image (GEI) by adopting the Singular Value Decomposition (SVD) technique. To further improve the performance of the proposed VTM, Linear Discriminant Analysis (LDA) is used to optimize the obtained GEI feature vectors. When implementing SVD there are a few practical problems, such as large matrix size and over-fitting; in this paper, reduced SVD is introduced to alleviate their effects. Using the generated VTM, the viewing angles of gallery and probe gait data can be transformed into the same direction, so gait signatures can be compared without difficulty. Extensive experiments show that the proposed algorithm significantly improves multiple-view gait recognition performance compared to similar methods in the literature.
Kusakunniran, W., Li, H. & Zhang, J. 2009, 'A direct method to self-calibrate a surveillance camera by observing a walking pedestrian', Digital Image Computing Techniques and Applications, Melbourne, VIC, December 2009 in 2009 Digital Image Computing: Techniques and Applications, ed N/A, IEEE, Piscataway, NJ, pp. 250-255.
View/Download from: OPUS | Publisher's site
Recent efforts show that it is possible to calibrate a surveillance camera simply by observing a walking human. This procedure can be seen as a special application of the camera self-calibration technique. Several methods have been proposed …
Wang, W., Shen, C., Zhang, J. & Paisitkriangkrai, S. 2009, 'A two-layer night-time vehicle detector', Digital Image Computing Techniques and Applications, Melbourne, VIC, December 2009 in 2009 Digital Image Computing: Techniques and Applications, ed N/A, IEEE, Piscataway, NJ, pp. 162-167.
View/Download from: OPUS | Publisher's site
We present a two-layer night-time vehicle detector in this work. At the first layer, vehicle headlight detection [1, 2, 3] is applied to find areas (bounding boxes) where possible pairs of headlights are located in the image; the Haar feature-based AdaBoost …
Paisitkriangkrai, S., Shen, C. & Zhang, J. 2009, 'Efficiently training a better visual detector with sparse eigenvectors', IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, June 2009 in 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2009, ed N/A, IEEE, Piscataway, NJ, pp. 1129-1136.
View/Download from: OPUS | Publisher's site
Face detection plays an important role in many vision applications. Since Viola and Jones [1] proposed the first real-time AdaBoost-based object detection system, much effort has been spent on improving the boosting method. In this work, we first show …
Thi, T., Lu, S., Zhang, J., Cheng, L. & Wang, L. 2009, 'Human body articulation for action recognition in video sequences', IEEE International Conference on Video and Signal Based Surveillance (AVSS), Genoa, September 2009 in 6th IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2009, ed N/A, IEEE, Piscataway, NJ, pp. 92-97.
View/Download from: OPUS | Publisher's site
This paper presents a new technique for action recognition in video using a human body part-based approach, combining local feature descriptions of each body part with a global graphical model structure of the human action. The human body is divided into …
Ong, C., Lu, S. & Zhang, J. 2008, 'An approach for enhancing the results of detecting foreground objects and their moving shadows in surveillance video', Digital Image Computing Techniques and Applications, Canberra, ACT, December 2008 in Digital Image Computing: Techniques and Applications, ed N/A, IEEE, Piscataway, NJ, pp. 242-249.
View/Download from: OPUS | Publisher's site
Automated surveillance systems are becoming increasingly important, especially in the fields of computer vision and video processing. This paper describes a novel approach for improving the results of detecting foreground objects and their shadows in indoor …
Paisitkriangkrai, S., Shen, C. & Zhang, J. 2008, 'An experimental study on pedestrian classification using local features', IEEE International Symposium on Circuits and Systems, Seattle, WA, May 2008 in Proceedings - IEEE International Symposium on Circuits and Systems, ed N/A, IEEE, Piscataway, NJ, pp. 2741-2744.
View/Download from: OPUS | Publisher's site
This paper presents an experimental study on pedestrian detection using state-of-the-art local feature extraction and support vector machine (SVM) classifiers. The performance of pedestrian detection using region covariance and histograms of oriented gradients …
Luo, C., Cai, X. & Zhang, J. 2008, 'GATE: A novel robust object tracking method using the particle filtering and level set method', Digital Image Computing Techniques and Applications, Canberra, ACT, December 2008 in Digital Image Computing: Techniques and Applications, ed N/A, IEEE, Piscataway, NJ, pp. 378-385.
View/Download from: OPUS | Publisher's site
This paper presents a novel algorithm for robust object tracking based on the particle filtering method employed in recursive Bayesian estimation, together with the image segmentation and optimisation techniques employed in active contour models and level set methods.
Saesue, W., Zhang, J. & Chun, T. 2008, 'Hybrid frame-recursive block-based distortion estimation model for wireless video transmission', IEEE International Workshop on Multimedia Signal Processing, Cairns, QLD, October 2008 in Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008, ed N/A, IEEE, Piscataway, NJ, pp. 774-779.
View/Download from: OPUS | Publisher's site
In wireless environments, video quality can be severely degraded by channel errors. Improving error robustness against the impact of packet loss in error-prone networks is a critical concern in wireless video networking research. …
Luo, C., Cai, X. & Zhang, J. 2008, 'Robust object tracking using the particle filtering and level set methods: A comparative experiment', IEEE International Workshop on Multimedia Signal Processing, Cairns, QLD, October 2008 in Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008, ed N/A, IEEE, Piscataway, NJ, pp. 359-364.
View/Download from: OPUS | Publisher's site
Robust visual tracking has become an important topic of research in computer vision. A novel method for robust object tracking, GATE [11], improves object tracking in complex environments using the particle filtering and level set-based active contour …
Thi, T., Lu, S. & Zhang, J. 2008, 'Self-calibration of traffic surveillance camera using motion tracking', IEEE Conference on Intelligent Transportation Systems, Beijing, China, October 2008 in Proceedings of the 11th International IEEE Conference on Intelligent Transportation Systems, ed N/A, IEEE, Piscataway, NJ, pp. 304-309.
View/Download from: OPUS | Publisher's site
A statistical and computer vision approach that uses tracked moving-vehicle shapes to auto-calibrate traffic surveillance cameras is presented. The vanishing point of the traffic direction is obtained from linear regression over all tracked vehicle points. …
Thi, T., Robert, K., Lu, S. & Zhang, J. 2008, 'Vehicle classification at nighttime using eigenspaces and support vector machine', International Congress on Image and Signal Processing (CISP), Sanya, Hainan, May 2008 in 2008 Congress on Image and Signal Processing, ed N/A, IEEE, Piscataway, NJ, pp. 422-426.
View/Download from: OPUS | Publisher's site
A robust framework to classify vehicles in nighttime traffic using vehicle eigenspaces and a support vector machine is presented. In this paper, a systematic approach has been proposed and implemented to classify vehicles from roadside camera video sequences …
Shen, C., Paisitkriangkrai, S. & Zhang, J. 2008, 'Face detection from few training examples', IEEE International Conference on Image Processing, San Diego, CA, October 2008 in 15th IEEE International Conference on Image Processing, 2008. ICIP 2008, ed N/A, IEEE, Piscataway, NJ, pp. 2764-2767.
View/Download from: OPUS
Face detection in images is very important for many multimedia applications. Haar-like wavelet features have become dominant in face detection because of their tremendous success since Viola and Jones [1] proposed their AdaBoost-based detection system. While the simplicity of Haar features makes rapid computation possible, their discriminative power is limited. As a consequence, a large training dataset is required to train a classifier, which may hamper application in scenarios where a large labeled dataset is difficult to obtain. In this work, we address the problem of learning to detect faces from a small set of training examples. In particular, we propose to use covariance features. For better classification performance, a linear hyperplane classifier based on Fisher discriminant analysis (FDA) is preferred: compared with the decision stump, FDA is more discriminative and therefore fewer weak learners are needed. We show that the detection rate can be significantly improved with covariance features on a small dataset (a few hundred positive examples), compared to the Haar features used in most current face detection systems.
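A region covariance feature is the covariance matrix of per-pixel feature vectors computed over an image region. A minimal sketch follows; which channels make up the feature vectors (e.g. intensity and derivatives) is an assumption here, not taken from the paper:

```python
def region_covariance(features):
    """Sample covariance matrix of d-dimensional feature vectors
    gathered over an image region (one vector per pixel)."""
    n = len(features)
    d = len(features[0])
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (f[i] - mean[i]) * (f[j] - mean[j])
    for i in range(d):
        for j in range(d):
            cov[i][j] /= (n - 1)
    return cov
```

The resulting d x d matrix is compact and captures how the channels co-vary inside the region, which is what gives the feature its discriminative power over raw Haar responses.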
Paisitkriangkrai, S., Shen, C. & Zhang, J. 2008, 'Real-time Pedestrian Detection Using a Boosted Multi-layer Classifier', IEEE International Workshop on Visual Surveillance, Marseille, France, October 2008 in The Eighth International Workshop on Visual Surveillance, in conjunction with European Conference on Computer Vision (ECCV'08), 2008, ed N/A, Institute of Electrical and Electronics Engineers, United States.
Techniques for detecting pedestrians in still images have attracted considerable research interest due to their wide applications, such as video surveillance and intelligent transportation systems. In this paper, we propose a novel, simpler pedestrian detector using state-of-the-art locally extracted features, namely covariance features, originally proposed in [1, 2]. Unlike the work in [2], where feature selection and weak classifier training are performed on the Riemannian manifold, we select features and train weak classifiers in Euclidean space for faster computation. To this end, AdaBoost with weighted Fisher linear discriminant analysis-based weak classifiers is adopted. Multiple-layer boosting with heterogeneous features is constructed to exploit the efficiency of the Haar-like feature and the discriminative power of the covariance feature simultaneously. Extensive experiments show that by combining the Haar-like and covariance features, we speed up the original covariance feature detector [2] by up to an order of magnitude in processing time without compromising detection performance. For the first time, the proposed work enables covariance feature-based pedestrian detection to run in real time.
Lu, S., Zhang, J. & Feng, D. 2007, 'An efficient method for detecting ghost and left objects in surveillance video', IEEE Conference on Advanced Video and Signal Based Surveillance, AVSS 2007, London, September 2007 in 2007 IEEE Conference on Advanced Video and Signal Based Surveillance, AVSS 2007 Proceedings, ed N/A, IEEE, Piscataway, NJ, pp. 540-545.
View/Download from: Publisher's site
This paper proposes an efficient method for detecting ghost and left objects in surveillance video, which, if not identified, may lead to errors or wasted computation in background modeling and object tracking in surveillance systems. This method contains …
Paisitkriangkrai, S., Shen, C. & Zhang, J. 2007, 'An experimental evaluation of local features for pedestrian classification', Australian Pattern Recognition Society (APRS), Glenelg, SA, December 2007 in Proceedings - Digital Image Computing Techniques and Applications: 9th Biennial Conference of the Australian Pattern Recognition Society, DICTA 2007, ed N/A, IEEE, Piscataway, NJ, pp. 53-60.
View/Download from: Publisher's site
The ability to detect pedestrians is an important first step in many computer vision applications such as video surveillance. This paper presents an experimental study on pedestrian detection using state-of-the-art local feature extraction and support vector machine classifiers …
Xu, J., Ye, G. & Zhang, J. 2007, 'Long-Term Trajectory Extraction for Moving Vehicles', IEEE Workshop on Multimedia Signal Processing, Crete, October 2007 in IEEE 9th Workshop on Multimedia Signal Processing, 2007. MMSP 2007, ed N/A, IEEE, Piscataway, NJ, pp. 223-226.
In recent years, trajectory analysis of moving vehicles in video-based traffic monitoring systems has drawn the attention of many researchers. Trajectory extraction is a fundamental step required prior to trajectory analysis. Much previous work has focused on trajectory extraction via tracking; however, such methods often fail to achieve long-term consistent trajectories. In this paper, we propose a robust approach for extracting long-term trajectories of moving vehicles in traffic monitoring using the SIFT descriptor. Experimental results show that the proposed method outperforms tracking-based techniques.
Lu, S., Zhang, J. & Feng, D. 2006, 'A knowledge-based approach for detecting unattended packages in surveillance video', IEEE International Conference on Video and Signal Based Surveillance 2006, AVSS 2006, Sydney, NSW, November 2006 in Proceedings - IEEE International Conference on Video and Signal Based Surveillance 2006, AVSS 2006, ed N/A, IEEE, Piscataway, NJ.
View/Download from: Publisher's site
This paper describes a novel approach for detecting unattended packages in surveillance video. Unlike the traditional approach of just detecting stationary objects in monitored scenes, our approach detects unattended packages based on accumulated knowledge …
Chen, J., Shen, J., Zhang, J. & Wangsa, K. 2006, 'A novel multimedia database system for efficient image/video retrieval based on hybrid-tree structure', International Conference on Machine Learning and Cybernetics, Dalian, August 2006 in Proceedings of the 2006 International Conference on Machine Learning and Cybernetics, ed N/A, IEEE, Piscataway, NJ, pp. 4353-4358.
View/Download from: Publisher's site
With recent advances in computer vision, image processing and analysis, retrieval based on visual content has become a key component of efficient image queries over large multimedia databases. In this paper, we propose and develop …
Mathew, R., Yu, Z. & Zhang, J. 2006, 'Detecting new stable objects in surveillance video', IEEE Workshop on Multimedia Signal Processing, Shanghai, October 2005 in 2005 IEEE 7th Workshop on Multimedia Signal Processing, ed N/A, IEEE, Piscataway, NJ.
View/Download from: Publisher's site
We describe a novel method to detect new stable objects in video, i.e., new objects that appear in a scene and remain stationary for a period of time. Examples include detecting a dropped bag or a parked car. …
Lu, S., Zhang, J. & Feng, D. 2005, 'Classification of moving humans using eigen-features and support vector machines', International Conference on Computer Analysis of Images and Patterns, CAIP 2005, Versailles, France, September 2005 in Computer Analysis of Images and Patterns: 11th International Conference, CAIP 2005, Versailles, France, September 5-8, 2005. Proceedings, ed N/A, Springer Berlin Heidelberg, Berlin, pp. 522-529.
This paper describes a method for categorizing moving objects using eigen-features and support vector machines. Eigen-features, generally used in face recognition and static image classification, are applied to classify the moving objects detected …
Ye, G. & Zhang, J. 2005, 'High-Resolution Image Reconstruction Under Illumination Change', Asia-Pacific Workshop on Visual Information Processing, Hong Kong, China, December 2005 in 2005 Asia-Pacific Workshop on Visual Information Processing, ed N/A, -, -.
In this paper, we propose an approach to high-resolution image reconstruction under illumination change. It is based on the maximum a posteriori framework for performing joint image registration and high-resolution reconstruction. In this approach, an efficient multi-image registration is proposed to estimate the global motion and illumination change between the high-resolution image and low-resolution images. Considering different degradation from frame to frame and registration error, we then present a multichannel regularized HR reconstruction technique. Experimental results demonstrate the efficacy of the proposed approach.
Yu, Z. & Zhang, J. 2004, 'Video Deblocking with Fine-Grained Scalable Complexity for Embedded Mobile Computing', International Conference on Signal Processing, Beijing, China, August 2004 in 2004 7th International Conference on Signal Processing, 2004. Proceedings. ICSP '04, ed N/A, IEEE, Piscataway, NJ, pp. 1173-1178.
This paper addresses the need to reduce blocking artifacts after video decompression on embedded mobile computing devices such as mobile phones and PDAs with limited computational capability, where low bit-rate coding is usually employed and video deblocking is highly desirable. A novel video deblocking method has been developed which consists of two steps: deblocking mode decision and deblocking filtering. Blocking artifacts are detected by examining the values of several adjacent pixels. Depending on the degree of blocking artifacts, a filter mode and a corresponding filtering center are determined for a region of pixels. The deblocking filter is chosen from five candidate types, including variable-center filters and non-symmetric filters. Extensive experiments show that the proposed algorithm achieves both lower computational complexity and better visual quality compared to the MPEG-4 VM. Furthermore, targeting embedded mobile computing platforms, a scheme is developed to dynamically scale the complexity (and hence power consumption) of the deblocking algorithm with graceful visual quality degradation.
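The mode-decision step can be illustrated by comparing the intensity step across a block boundary with the activity just inside each block: a large step over a smooth interior indicates a coding artifact rather than a real edge. The threshold logic below is an assumption for illustration, not the paper's exact decision rule:

```python
def boundary_activity(left, right):
    """Blockiness measures at a block boundary: `left` and `right` are rows
    of pixels from the two adjacent blocks, meeting at left[-1] | right[0]."""
    step = abs(left[-1] - right[0])                     # jump across boundary
    inner = (abs(left[-1] - left[-2]) +                 # activity just inside
             abs(right[1] - right[0])) / 2.0
    return step, inner

def needs_filtering(left, right, threshold=4):
    """Flag a boundary for deblocking when the cross-boundary step is both
    large in absolute terms and large relative to the interior activity."""
    step, inner = boundary_activity(left, right)
    return step > threshold and step > 2 * inner
```

A real edge produces high interior activity as well, so it fails the relative test and is left unfiltered, which is the intuition behind mode decision.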
Lu, S., Zhang, J., Zhang, X. & Zhao, C. 2002, 'An implementation of 2D IDCT using AltiVec', Beijing, China, August 2002 in 2002 6th International Conference on Signal Processing, ed N/A, IEEE, Piscataway, NJ, pp. 17-20.
This paper explores the key functions of AltiVec through an implementation of the 2D IDCT algorithm. By providing a benchmark for video processing, the advantage of using the SIMD style of parallel processing has been demonstrated. Besides a variety of instructions available for parallel arithmetic computations, a solution for efficiently reorganising data for parallel processing is also provided by AltiVec. The implementation resulted in a speedup of 10 times compared to its scalar version.
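The 2-D IDCT that such SIMD code vectorises is separable: a 1-D IDCT over every row, then over every column. A scalar reference sketch, assuming the orthonormal DCT-III convention (the paper's fixed-point scaling is not reproduced here):

```python
import math

def idct_1d(coeffs):
    """Orthonormal 1-D inverse DCT (DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = 0.0
        for k, ck in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * ck * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(s)
    return out

def idct_2d(block):
    """Separable 2-D IDCT: 1-D IDCT on every row, then on every column."""
    rows = [idct_1d(r) for r in block]
    cols = [idct_1d([rows[i][j] for i in range(len(rows))])
            for j in range(len(rows[0]))]
    # cols[j][i] holds pixel (i, j); transpose back to row-major order.
    return [[cols[j][i] for j in range(len(cols))] for i in range(len(rows))]
```

The row/column separation is exactly what makes the algorithm SIMD-friendly: each 1-D pass processes independent lanes of data in parallel.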
Zhang, X. & Zhang, J. 2000, 'Unified design of symbol-matching-based document image compression and merging system', San Jose, CA, USA, December 2000 in SPIE Proceedings Document Recognition and Retrieval VIII, ed Paul B. Kantor; Daniel P. Lopresti; Jiangying Zhou, S P I E - International Society for Optical Engineering, United States.
This paper describes a document image compression and merging system, which provides capabilities for automatically indexing documents to form a document library and for merging partial images of a document page. Because of the nature of document images, the technique described in this paper is intended for bi-level text images. The key technology for image merging is correlation analysis. State-of-the-art techniques exist to merge gray-scale and color natural images; however, these techniques do not apply to document images containing large amounts of text and, despite their computational intensity, fail too often when used to merge them. The proposed system provides a reliable correlation analysis technique for document image merging where primarily only bi-level images are available.
Zhang, J., Arnold, J. & Frater, M. 1997, 'Improved Motion-compensated Concealment for MPEG-2 Coded Video', International Workshop on Audio-Visual Services over Packet Networks, Aberdeen, Scotland, UK, January 1997 in Proceedings of International Workshop on Audio-Visual Services over Packet Networks, ed N/A, -, -, pp. 11-16.
Zhang, J., Frater, M., Arnold, J. & Percival, T. 1996, 'Video Services and Wireless Local Area Networks', International Workshop on Packet Video, Brisbane, Australia, March 1996 in Proceedings of the 7th International Workshop on Packet Video, ed N/A, IEEE, USA, pp. 207-212.
Frater, M., Arnold, J. & Zhang, J. 1996, 'The importance of Systems Layer in MPEG-2 Error Resilience Experience', Australian Telecommunication Networks and Applications Conference, Melbourne, Australia, December 1996 in Proceedings of Australian Telecommunication Networks & Applications conference 1996, ed N/A, -, -, pp. 417-422.
Frater, M., Arnold, J., Zhang, J. & Canenor, M. 1996, 'Wireless Video: The Impact of the Multiplexing Layer on Error Resilience', International Workshop on Wireless Image/Video Communications, Loughborough, UK, September 1996 in Proceedings of the First International Workshop on Wireless Video, ed N/A, IEEE, Piscataway, NJ, pp. 20-25.
The interest that has been shown in the provision of mobile communications systems has led to the study of real-time video traffic over wireless and other error-prone networks. Many of the papers in this area employ the MPEG-2 video coding algorithm and provide experimental results of decoded image quality derived from considering only the effect of cell loss on the MPEG-2 video layer. In a practical transmission scheme, this video information is combined with the audio and synchronisation information in the MPEG-2 systems layer. Results presented in this paper show that omitting the effect of cell loss on the MPEG-2 systems layer may lead to a significant underestimation of the degradation in the quality of the decoded video.
Zhang, J., Arnold, J. & Frater, M. 1995, 'A New Combinational Video Coding Algorithm', Australian Telecommunication Networks & Applications Conference, Sydney, Australia, December 1995 in Proceedings of Australian Telecommunication Networks & Applications conference 1995, ed N/A, IEEE, USA, pp. 45-47.
Zhang, J. & Bergman, N. 1993, 'A New 8x8 Fast DCT Algorithm for Image Compression', IEEE Visual Signal Processing and Communications, Melbourne, Australia, September 1993 in Proceedings of IEEE Workshop on Visual Signal Processing and Communications 1993, ed King N. Ngan, IEEE, Piscataway, NJ, pp. 57-60.

Journal Articles

Kusakunniran, W., Wu, Q., Zhang, J., Li, H. & Wang, L. 2014, 'Recognizing Gaits across Views through Correlated Motion Co-clustering', IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 696-709.
View/Download from: Publisher's site
Wu, Y., Ma, B., Yang, M., Zhang, J. & Jia, Y. 2014, 'Metric Learning Based Structural Appearance Model for Robust Visual Tracking', IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 5, pp. 865-877.
View/Download from: Publisher's site
Xu, J., Wu, Q., Zhang, J., Shen, F. & Tang, Z. 2014, 'Boosting Separability in Semisupervised Learning for Object Classification', IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 7, pp. 1197-1208.
View/Download from: Publisher's site
Liu, X., Wang, L., Zhang, J., Yin, J. & Liu, H. 2014, 'Global and Local Structure Preservation for Feature Selection', IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 6, pp. 1083-1095.
View/Download from: Publisher's site
The recent literature indicates that preserving global pairwise sample similarity is of great importance for feature selection and that many existing selection criteria essentially work in this way. In this paper, we argue that besides global pairwise sample similarity, the local geometric structure of data is also critical and that these two factors play different roles in different learning scenarios. In order to show this, we propose a global and local structure preservation framework for feature selection (GLSPFS) which integrates both global pairwise sample similarity and local geometric data structure to conduct feature selection. To demonstrate the generality of our framework, we employ methods that are well known in the literature to model the local geometric data structure and develop three specific GLSPFS-based feature selection algorithms. Also, we develop an efficient optimization algorithm with proven global convergence to solve the resulting feature selection problem. A comprehensive experimental study is then conducted in order to compare our feature selection algorithms with many state-of-the-art ones in supervised, unsupervised, and semisupervised learning scenarios. The result indicates that: 1) our framework consistently achieves statistically significant improvement in selection performance when compared with the currently used algorithms; 2) in supervised and semisupervised learning scenarios, preserving global pairwise similarity is more important than preserving local geometric data structure; 3) in the unsupervised scenario, preserving local geometric data structure becomes clearly more important; and 4) the best feature selection performance is always obtained when the two factors are appropriately integrated. In summary, this paper not only validates the advantages of the proposed GLSPFS framework but also gains more insight into the information to be preserved in different feature selection tasks.
Xu, J., Wu, Q., Zhang, J. & Tang, Z. 2014, 'Exploiting Universum data in AdaBoost using gradient descent', Image & Vision Computing, vol. 32, no. 8, pp. 550-557.
View/Download from: Publisher's site
Recently, Universum data, which does not belong to any class of the training data, has been applied to train better classifiers. In this paper, we propose a novel boosting algorithm called UAdaBoost that can improve the classification performance of AdaBoost with Universum data. UAdaBoost chooses a function by minimizing the loss for labeled data and Universum data. The cost function is minimized by a greedy, stagewise, functional gradient procedure. Each training stage of UAdaBoost is fast and efficient. The standard AdaBoost weights labeled samples during training iterations, while UAdaBoost gives an explicit weighting scheme for Universum samples as well. In addition, this paper describes the practical conditions for the effectiveness of Universum learning. These conditions are based on the analysis of the distribution of ensemble predictions over training samples. Experiments on handwritten digit classification and gender classification problems are presented. As exhibited by our experimental results, the proposed method can obtain superior performance over the standard AdaBoost by selecting proper Universum data.
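The weighting idea in the abstract above can be sketched as a single round of weight updates (a minimal illustration assuming an exponential-loss form for both sample types; this is not the paper's exact scheme, and the function name is hypothetical):

```python
import math

def uadaboost_weights(margins_lab, margins_univ, alpha):
    # One boosting round of weight updates. Labeled samples are up-weighted
    # when misclassified (margin = y * h(x) is negative), while Universum
    # samples are up-weighted when the ensemble is *confident* on them
    # (|h(x)| large), since Universum points should stay near the boundary.
    w_lab = [math.exp(-alpha * m) for m in margins_lab]
    w_univ = [math.exp(alpha * abs(m)) for m in margins_univ]
    z = sum(w_lab) + sum(w_univ)          # normalise to a distribution
    return [w / z for w in w_lab], [w / z for w in w_univ]
```

The contrast with standard AdaBoost is that the second list exists at all: plain AdaBoost only reweights the labeled samples.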
Liu, X., Yin, J., Wang, L., Liu, L., Liu, J., Hou, C. & Zhang, J. 2013, 'An Adaptive Approach To Learning Optimal Neighborhood Kernels', IEEE Transactions on Cybernetics, vol. 43, no. 1, pp. 371-384.
View/Download from: OPUS | Publisher's site
Learning an optimal kernel plays a pivotal role in kernel-based methods. Recently, an approach called optimal neighborhood kernel learning (ONKL) has been proposed, showing promising classification performance. It assumes that the optimal kernel will reside in the neighborhood of a 'pre-specified' kernel. Nevertheless, how to specify such a kernel in a principled way remains unclear. To solve this issue, this paper treats the pre-specified kernel as an extra variable and jointly learns it with the optimal neighborhood kernel and the structure parameters of support vector machines. To avoid trivial solutions, we constrain the pre-specified kernel with a parameterized model. We first discuss the characteristics of our approach and in particular highlight its adaptivity. After that, two instantiations are demonstrated by modeling the pre-specified kernel as a common Gaussian radial basis function kernel and a linear combination of a set of base kernels in the way of multiple kernel learning (MKL), respectively. We show that the optimization in our approach is a min-max problem and can be efficiently solved by employing the extended level method and Nesterov's method. Also, we give the probabilistic interpretation for our approach and apply it to explain the existing kernel learning methods, providing another perspective for their commonness and differences. Comprehensive experimental results on 13 UCI data sets and another two real-world data sets show that via the joint learning process, our approach not only adaptively identifies the pre-specified kernel, but also achieves superior classification performance to the original ONKL and the related MKL algorithms.
Liu, X., Wang, L., Yin, J., Zhu, E. & Zhang, J. 2013, 'An Efficient Approach To Integrating Radius Information Into Multiple Kernel Learning', IEEE Transactions on Cybernetics, vol. 43, no. 2, pp. 557-569.
View/Download from: OPUS | Publisher's site
Integrating radius information has been demonstrated by recent work on multiple kernel learning (MKL) as a promising way to improve kernel learning performance. Directly integrating the radius of the minimum enclosing ball (MEB) into MKL as it is, however, not only incurs significant computational overhead but also possibly adversely affects the kernel learning performance due to the notorious sensitivity of this radius to outliers. Inspired by the relationship between the radius of the MEB and the trace of total data scattering matrix, this paper proposes to incorporate the latter into MKL to improve the situation. In particular, in order to well justify the incorporation of radius information, we strictly comply with the radius-margin bound of support vector machines (SVMs) and thus focus on the l2-norm soft-margin SVM classifier. Detailed theoretical analysis is conducted to show how the proposed approach effectively preserves the merits of incorporating the radius of the MEB and how the resulting optimization is efficiently solved. Moreover, the proposed approach achieves the following advantages over its counterparts: 1) more robust in the presence of outliers or noisy training samples; 2) more computationally efficient by avoiding the quadratic optimization for computing the radius at each iteration; and 3) readily solvable by the existing off-the-shelf MKL packages. Comprehensive experiments are conducted on University of California, Irvine, protein subcellular localization, and Caltech-101 data sets, and the results well demonstrate the effectiveness and efficiency of our approach.
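The quantity that the paper substitutes for the MEB radius, the trace of the total scattering matrix, is cheap to compute: it is just the sum of squared distances from the data mean. A minimal sketch (not the paper's implementation; the function name is hypothetical):

```python
def scatter_trace(points):
    # Trace of the total scattering matrix for a list of d-dimensional
    # points: sum over samples of the squared distance to the mean.
    # Unlike the MEB radius, this averages over all samples, so a single
    # outlier cannot dominate it.
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    return sum(sum((p[d] - mean[d]) ** 2 for d in range(dim))
               for p in points)
```

Computing this once per iteration avoids the quadratic optimization needed to recompute the MEB radius, which is the source of the efficiency gain claimed above.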
Xin, J., Chen, K., Bai, L., Liu, D. & Zhang, J. 2013, 'Depth Adaptive Zooming Visual Servoing For A Robot With A Zooming Camera', International Journal of Advanced Robotic Systems, vol. 10, no. 1, pp. 1-11.
View/Download from: OPUS
To solve the view visibility problem and keep the observed object in the field of view (FOV) during the visual servoing, a depth adaptive zooming visual servoing strategy for a manipulator robot with a zooming camera is proposed. Firstly, a zoom control mechanism is introduced into the robot visual servoing system. It can dynamically adjust the camera's field of view to keep all the feature points on the object in the field of view of the camera and get high object local resolution at the end of visual servoing. Secondly, an invariant visual servoing method is employed to control the robot to the desired position under the changing intrinsic parameters of the camera. Finally, a nonlinear depth adaptive estimation scheme in the invariant space using Lyapunov stability theory is proposed to estimate adaptively the depth of the image features on the object. Three kinds of robot 4DOF visual positioning simulation experiments are conducted. The simulation experiment results show that the proposed approach has higher positioning precision.
Lu, S., Zhang, J., Wang, Z. & Feng, D. 2013, 'Fast Human Action Classification And VOI Localization With Enhanced Sparse Coding', Journal of Visual Communication and Image Representation, vol. 24, no. 2, pp. 127-136.
View/Download from: OPUS | Publisher's site
Sparse coding, which encodes the natural visual signal into a sparse space for visual codebook generation and feature quantization, has been successfully utilized for many image classification applications. However, it has been seldom explored for many video analysis tasks. In particular, the increased complexity in characterizing the visual patterns of diverse human actions with both the spatial and temporal variations imposes more challenges to the conventional sparse coding scheme. In this paper, we propose an enhanced sparse coding scheme through learning discriminative dictionary and optimizing the local pooling strategy. Localizing when and where a specific action happens in realistic videos is another challenging task. By utilizing the sparse coding based representations of human actions, this paper further presents a novel coarse-to-fine framework to localize the Volumes of Interest (VOIs) for the actions. Firstly, local visual features are transformed into the sparse signal domain through our enhanced sparse coding scheme. Secondly, in order to avoid exhaustive scan of entire videos for the VOI localization, we extend the Spatial Pyramid Matching into temporal domain, namely Spatial Temporal Pyramid Matching, to obtain the VOI candidates. Finally, a multi-level branch-and-bound approach is developed to refine the VOI candidates. The proposed framework is also able to avoid prohibitive computations in local similarity matching (e.g., nearest neighbors voting). Experimental results on two popular benchmark datasets (KTH and YouTube UCF) and the widely used localization dataset (MSR) demonstrate that our approach reduces computational cost significantly while maintaining comparable classification accuracy to that of the state-of-the-art methods.
Song, Y., Zhang, J., Cao, L. & Sangeux, M. 2013, 'On Discovering the Correlated Relationship between Static and Dynamic Data in Clinical Gait Analysis', Lecture Notes in Computer Science, vol. 8190, no. 1, pp. 563-578.
View/Download from: OPUS | Publisher's site
'Gait' is a person's manner of walking. Patients may have an abnormal gait due to a range of physical impairments or brain damage. Clinical gait analysis (CGA) is a technique for identifying the underlying impairments that affect a patient's gait pattern. The CGA is critical for treatment planning. Essentially, CGA tries to use patients' physical examination results, known as static data, to interpret the dynamic characteristics in an abnormal gait, known as dynamic data. This process is carried out by gait analysis experts, mainly based on their experience, which may lead to subjective diagnoses. To facilitate the automation of this process and form a relatively objective diagnosis, this paper proposes a new probabilistic correlated static-dynamic model (CSDM) to discover correlated relationships between the dynamic characteristics of gait and their root cause in the static data space. We propose an EM-based algorithm to learn the parameters of the CSDM. One of the main advantages of the CSDM is its ability to provide intuitive knowledge. For example, the CSDM can describe what kinds of static data will lead to what kinds of hidden gait patterns in the form of a decision tree, which helps us to infer dynamic characteristics based on static data. Our initial experiments indicate that the CSDM is promising for discovering the correlated relationship between physical examination (static) and gait (dynamic) data.
Kusakunniran, W., Wu, Q., Zhang, J., Ma, Y. & Li, H. 2013, 'A New View-Invariant Feature for Cross-View Gait Recognition', IEEE Transactions on Information Forensics and Security, vol. 8, no. 10, pp. 1642-1653.
View/Download from: OPUS | Publisher's site
Human gait is an important biometric feature which is able to identify a person remotely. However, change of view causes significant difficulties for recognizing gaits. This paper proposes a new framework to construct a new view-invariant feature for cross-view gait recognition. Our view-normalization process is performed in the input layer (i.e., on gait silhouettes) to normalize gaits from arbitrary views. That is, each sequence of gait silhouettes recorded from a certain view is transformed onto the common canonical view by using the corresponding domain transformation obtained through transform invariant low-rank textures (TILT). Then, an improved scheme of Procrustes shape analysis (PSA) is proposed and applied on a sequence of the normalized gait silhouettes to extract a novel view-invariant gait feature based on the Procrustes mean shape (PMS) and consecutively measure gait similarity based on the Procrustes distance (PD). Comprehensive experiments were carried out on widely adopted gait databases. It has been shown that the performance of the proposed method is promising when compared with other existing methods in the literature.
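The Procrustes distance underlying the gait similarity above can be sketched in a few lines using the complex-vector representation of a 2-D shape (a minimal sketch of one common variant; it is not the paper's improved PSA scheme, and the function name is hypothetical):

```python
def procrustes_distance(shape1, shape2):
    # Shapes are lists of corresponding (x, y) boundary points of equal
    # length. Represent each as a complex vector, remove translation and
    # scale, then measure 1 - |<u, v>|: the residual after the optimal
    # rotation, so the distance is invariant to translation, scale and
    # rotation of either shape.
    u = [complex(x, y) for x, y in shape1]
    v = [complex(x, y) for x, y in shape2]
    def normalise(z):
        c = sum(z) / len(z)                      # remove translation
        z = [p - c for p in z]
        s = sum(abs(p) ** 2 for p in z) ** 0.5   # remove scale
        return [p / s for p in z]
    u, v = normalise(u), normalise(v)
    inner = sum(a.conjugate() * b for a, b in zip(u, v))
    return 1.0 - abs(inner)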
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2012, 'Cross-view and multi-view gait recognitions based on view transformation model using multi-layer perceptron', Pattern Recognition Letters, vol. 33, pp. 882-889.
View/Download from: OPUS | Publisher's site
Gait has been shown to be an efficient biometric feature for human identification at a distance. However, performance of gait recognition can be affected by view variation. This leads to a consequent difficulty of cross-view gait recognition. A novel method is proposed to solve the above difficulty by using a view transformation model (VTM). The VTM is constructed based on regression processes by adopting a multi-layer perceptron (MLP) as a regression tool. The VTM estimates the gait feature from one view using a well-selected region of interest (ROI) on the gait feature from another view. Thus, trained VTMs can normalize gait features across views into the same view before gait similarity is measured. Moreover, this paper proposes a new multi-view gait recognition method which estimates the gait feature on one view using selected gait features from several other views. Extensive experimental results demonstrate that the proposed method significantly outperforms other baseline methods in the literature for both cross-view and multi-view gait recognition. In our experiments, particularly, average accuracies of 99%, 98% and 93% are achieved for multi-view gait recognition using 5, 4 and 3 cameras, respectively.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2012, 'Gait Recognition Under Various Viewing Angles Based On Correlated Motion Regression', IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 6, pp. 966-980.
View/Download from: OPUS | Publisher's site
It is well recognized that gait is an important biometric feature to identify a person at a distance, e.g., in video surveillance applications. However, in reality, change of viewing angle causes significant challenges for gait recognition. A novel approa
Thi, T., Cheng, L., Zhang, J., Wang, L. & Satoh, S. 2012, 'Integrating local action elements for action analysis', Computer Vision and Image Understanding, vol. 116, no. 3, pp. 378-395.
View/Download from: OPUS | Publisher's site
In this paper, we propose a framework for human action analysis from video footage. A video action sequence in our perspective is a dynamic structure of sparse local spatial-temporal patches termed action elements, so the problems of action analysis in video are carried out here based on the set of local characteristics as well as the global shape of a prescribed action. We first detect a set of action elements that are the most compact entities of an action, then we extend the idea of the Implicit Shape Model to space-time, in order to properly integrate the spatial and temporal properties of these action elements. In particular, we consider two different recipes to construct action elements: one is to use a Sparse Bayesian Feature Classifier to choose action elements from all detected Spatial Temporal Interest Points, and is termed discriminative action elements. The other one detects affine invariant local features from the holistic Motion History Images, and picks up action elements according to their compactness scores, and is called generative action elements. Action elements detected either way are then used to construct a voting space based on their local feature representations as well as their global configuration constraints. Our approach is evaluated in the two main contexts of current human action analysis challenges, action retrieval and action classification. Comprehensive experimental results show that our proposed framework marginally outperforms all existing state-of-the-art techniques on a range of different datasets.
Xu, J., Wu, Q., Zhang, J. & Tang, Z. 2012, 'Fast and Accurate Human Detection Using a Cascade of Boosted MS-LBP Features', IEEE Signal Processing Letters, vol. 19, no. 10, pp. 676-679.
View/Download from: Publisher's site
In this letter, a new scheme for generating local binary patterns (LBP) is presented. This Modified Symmetric LBP (MS-LBP) feature takes advantage of LBP and gradient features. It is then applied in a boosted cascade framework for human detection. By combining MS-LBP with Haar-like features in the boosted framework, the performance of heterogeneous-feature-based detectors is evaluated for the best trade-off between accuracy and speed. Two feature training schemes, namely the Single AdaBoost Training Scheme (SATS) and the Dual AdaBoost Training Scheme (DATS), are proposed and compared. On top of AdaBoost, two multidimensional feature projection methods are described. A comprehensive experiment is presented. Apart from obtaining higher detection accuracy, the detection speed based on DATS is 17 times faster than the HOG method.
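For reference, the baseline 8-neighbour LBP operator that MS-LBP modifies can be sketched as follows (this is only the standard operator, not the paper's symmetric modification; the function name is hypothetical):

```python
def lbp_code(img, r, c):
    # Standard 8-neighbour local binary pattern at pixel (r, c): threshold
    # each neighbour against the centre pixel and pack the resulting bits,
    # starting at the top-left neighbour and going clockwise. img is a 2-D
    # list of grey values; (r, c) must not lie on the image border.
    centre = img[r][c]
    neigh = [img[r-1][c-1], img[r-1][c], img[r-1][c+1], img[r][c+1],
             img[r+1][c+1], img[r+1][c], img[r+1][c-1], img[r][c-1]]
    code = 0
    for bit, p in enumerate(neigh):
        if p >= centre:
            code |= 1 << bit
    return code
```

Histograms of such codes over detection windows feed the boosted cascade; MS-LBP replaces the centre-versus-neighbour test with symmetric comparisons that also capture gradient information.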
Thi, T., Cheng, L., Zhang, J., Wang, L. & Satoh, S. 2012, 'Structured learning of local features for human action classification and localization', Image & Vision Computing, vol. 30, no. 1, pp. 1-14.
View/Download from: Publisher's site
Human action recognition is a promising yet non-trivial computer vision field with many potential applications. Current advances in bag-of-feature approaches have brought significant insights into recognizing human actions within complex context. It is, however, a common practice in the literature to consider action as merely an orderless set of local salient features. This representation has been shown to be oversimplified, which inherently limits traditional approaches from robust deployment in real-life scenarios. In this work, we propose and show that, by taking into account the global configuration of local features, we can greatly improve recognition performance. We first introduce a novel feature selection process called the Sparse Hierarchical Bayes Filter to select only the most contributive features of each action type based on neighboring structure constraints. We then present the application of structured learning in human action analysis. That is, by representing human action as a complex set of local features, we can incorporate different spatial and temporal feature constraints into the learning tasks of human action classification and localization. In particular, we tackle the problem of action localization in video using structured learning with two alternatives: one is the Dynamic Conditional Random Field from a probabilistic perspective; the other is the Structural Support Vector Machine from a max-margin point of view. We evaluate our modular classification-localization framework on various testbeds, in which our proposed framework is proven to be highly effective and robust compared against bag-of-feature methods.
Kusakunniran, W., Wu, Q., Zhang, J. & Li, H. 2012, 'Gait Recognition across Various Walking Speeds using Higher-order Shape Configuration based on Differential Composition Model', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 6, pp. 1654-1668.
View/Download from: OPUS | Publisher's site
Gait has been known as an effective biometric feature to identify a person at a distance. However, variation of walking speeds may lead to significant changes to human walking patterns. It causes many difficulties for gait recognition. A comprehensive analysis has been carried out in this paper to identify such effects. Based on the analysis, Procrustes shape analysis is adopted for gait signature description and relevant similarity measurement. To tackle the challenges raised by speed change, this paper proposes a higher order shape configuration for gait shape description, which deliberately conserves discriminative information in the gait signatures and is still able to tolerate the varying walking speed. Instead of simply measuring the similarity between two gaits by treating them as two unified objects, a differential composition model (DCM) is constructed. The DCM differentiates the different effects caused by walking speed changes on various human body parts. In the meantime, it also balances well the different discriminabilities of each body part on the overall gait similarity measurements. In this model, the Fisher discriminant ratio is adopted to calculate weights for each body part. Comprehensive experiments based on widely adopted gait databases demonstrate that our proposed method is efficient for cross-speed gait recognition and outperforms other state-of-the-art methods.
Zhang, J., Li, N., Yang, Q. & Hu, C. 2012, 'Self-adaptive Chaotic Differential Evolution Algorithm for Solving Constrained Circular Packing Problem', Journal of Computational Information Systems, vol. 8, no. 18, pp. 7747-7755.
View/Download from: OPUS
Packing circles into a circular container with an equilibrium constraint is an NP-hard layout optimization problem. It has broad applications in engineering. This paper studies a two-dimensional constrained packing problem. Classical differential evolution for solving this problem easily falls into local optima. An adaptive chaotic differential evolution algorithm is proposed in this paper to improve the performance. The weighting parameters are dynamically adjusted by chaotic mutation in the searching procedure. The penalty factors of the fitness function are modified during iteration. To keep the diversity of the population, we limit the population's concentration. To enhance the local search capability, we adopt adaptive mutation of the global optimal individual. The improved algorithm can maintain the basic algorithm's structure as well as extend the searching scales, and can hold the diversity of the population as well as increase the searching accuracy. Furthermore, our improved algorithm can escape from premature convergence and speed up the convergence. Numerical examples indicate the effectiveness and efficiency of the proposed algorithm.
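The chaotic-parameter idea can be illustrated with a minimal DE/rand/1/bin loop whose scale factor follows a logistic map (an illustrative sketch on an unconstrained test function, not the paper's constrained packing algorithm; all parameter values and names here are assumptions):

```python
import random

def chaotic_de(obj, bounds, pop_size=20, iters=200, seed=1):
    # Differential evolution (DE/rand/1/bin) where the scale factor F is
    # driven by a logistic chaotic map each generation, so mutation
    # strength keeps varying instead of settling on one value.
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [obj(x) for x in pop]
    z = 0.37                        # chaotic state in (0, 1)
    for _ in range(iters):
        z = 4.0 * z * (1.0 - z)     # logistic map update
        F = 0.4 + 0.5 * z           # chaotic scale factor in [0.4, 0.9]
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            trial = []
            for d in range(dim):
                if rng.random() < 0.9:                      # crossover rate
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                else:
                    v = pop[i][d]
                lo, hi = bounds[d]
                trial.append(min(max(v, lo), hi))           # clamp to bounds
            f = obj(trial)
            if f < fit[i]:                                  # greedy selection
                pop[i], fit[i] = trial, f
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

The full algorithm additionally adapts penalty factors for the equilibrium constraint and mutates the global best individual, which this sketch omits.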
Shen, C., Paisitkriangkrai, S. & Zhang, J. 2011, 'Efficiently Learning a Detection Cascade with Sparse Eigenvectors', IEEE Transactions on Image Processing, vol. 19, no. 7, pp. 22-35.
View/Download from: OPUS | Publisher's site
Real-time object detection has many computer vision applications. Since Viola and Jones proposed the first real-time AdaBoost based face detection system, much effort has been spent on improving the boosting method. In this work, we first show that feature selection methods other than boosting can also be used for training an efficient object detector. In particular, we introduce greedy sparse linear discriminant analysis (GSLDA) for its conceptual simplicity and computational efficiency; slightly better detection performance is achieved compared with AdaBoost. Moreover, we propose a new technique, termed boosted greedy sparse linear discriminant analysis (BGSLDA), to efficiently train a detection cascade. BGSLDA exploits the sample reweighting property of boosting and the class-separability criterion of GSLDA. Experiments in the domain of highly skewed data distributions (e.g., face detection) demonstrate that classifiers trained with the proposed BGSLDA outperform AdaBoost and its variants. This finding provides a significant opportunity to argue that AdaBoost and similar approaches are not the only methods that can achieve high detection results for real-time object detection.
Paisitkriangkrai, S., Shen, C. & Zhang, J. 2011, 'Incremental Training of a Detector Using Online Sparse Eigendecomposition', IEEE Transactions on Image Processing, vol. 20, no. 1, pp. 213-226.
View/Download from: OPUS | Publisher's site
The ability to efficiently and accurately detect objects plays a very crucial role for many computer vision tasks. Recently, offline object detectors have shown a tremendous success. However, one major drawback of offline techniques is that a complete set of training data has to be collected beforehand. In addition, once learned, an offline detector cannot make use of newly arriving data. To alleviate these drawbacks, online learning has been adopted with the following objectives: 1) the technique should be computationally and storage efficient; 2) the updated classifier must maintain its high classification accuracy. In this paper, we propose an effective and efficient framework for learning an adaptive online greedy sparse linear discriminant analysis model. Unlike many existing online boosting detectors, which usually apply exponential or logistic loss, our online algorithm makes use of the linear discriminant analysis learning criterion, which not only aims to maximize the class-separation criterion but also incorporates the asymmetrical property of training data distributions. We provide a better alternative to online boosting algorithms in the context of training a visual object detector. We demonstrate the robustness and efficiency of our methods on handwritten digit and face data sets. Our results confirm that object detection tasks benefit significantly when trained in an online manner.
Lu, S., Zhang, J. & Feng, D. 2009, 'Detecting Ghost and Left Objects in Surveillance Video', International Journal of Pattern Recognition and Artificial Intelligence, vol. 23, no. 7, pp. 1503-1525.
View/Download from: Publisher's site
This paper proposes an efficient method for detecting ghost and left objects in surveillance video, which, if not identified, may lead to errors or wasted computational power in background modeling and object tracking in video surveillance systems. This method contains two main steps: the first one is to detect stationary objects, which narrows down the evaluation targets to a very small number of regions in the input image; the second step is to discriminate the candidates between ghost and left objects. For the first step, we introduce a novel stationary object detection method based on continuous object tracking and shape matching. For the second step, we propose a fast and robust inpainting method to differentiate between ghost and left objects by reconstructing the real background using the candidate's corresponding regions in the current input and background image. The effectiveness of our method has been validated by experiments over a variety of video sequences and comparisons with existing state-of-the-art methods.
Paisitkriangkrai, S., Shen, C. & Zhang, J. 2008, 'Performance evaluation of local features in human classification and detection', IET Computer Vision, vol. 2, no. 4, pp. 236-246.
View/Download from: OPUS | Publisher's site
Detecting pedestrians accurately is the first fundamental step for many computer vision applications such as video surveillance, smart vehicles, intersection traffic analysis and so on. The authors present an experimental study on pedestrian detection us
Paisitkriangkrai, S., Shen, C. & Zhang, J. 2008, 'Fast pedestrian detection using a cascade of boosted covariance features', IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 8, pp. 1140-1151.
View/Download from: OPUS | Publisher's site
Efficiently and accurately detecting pedestrians plays a very important role in many computer vision applications such as video surveillance and smart cars. In order to find the right feature for this task, we first present a comprehensive experimental s
Lu, S., Zhang, J. & Feng, D. 2007, 'Detecting unattended packages through human activity recognition and object association', Pattern Recognition, vol. 40, no. 8, pp. 2173-2184.
View/Download from: Publisher's site
This paper provides a novel approach to detect unattended packages in public venues. Different from previous works on this topic which are mostly limited to detecting static objects where no human is nearby, we provide a solution which can detect an unat
Zhao, C., Ngan, K., Zhang, J., Matthew, R. & Zhang, X. 2002, 'Using Inter Frame Dependence History to Select Intra Refresh Blocks', Electronics Letters, vol. 38, no. 22, pp. 1337-1338.
View/Download from: Publisher's site
To prevent error propagation in predictive video coding, intra-refresh methods are often employed. Here, a new algorithm for macroblock intra-refresh that is based on pixel level inter-frame dependence history is proposed. This new method keeps track of the coding mode of every pixel and refreshes those macroblocks that contain pixels with a long history record. Simulation results show that the new algorithm performs better than the existing methods while maintaining the same bit rate.
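The selection step of the algorithm above can be sketched as follows (a simplified illustration assuming a per-pixel age counter that is reset on intra coding; the function and variable names are hypothetical):

```python
def select_refresh_mbs(age, mb_size, n_refresh):
    # age[r][c] counts how many frames pixel (r, c) has been inter-coded
    # since it was last intra-coded. Score each macroblock by its oldest
    # pixel and return the n_refresh blocks (as (mb_row, mb_col) indices)
    # with the longest inter-frame dependence history.
    rows, cols = len(age), len(age[0])
    scores = {}
    for mr in range(0, rows, mb_size):
        for mc in range(0, cols, mb_size):
            scores[(mr // mb_size, mc // mb_size)] = max(
                age[r][c]
                for r in range(mr, min(mr + mb_size, rows))
                for c in range(mc, min(mc + mb_size, cols)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_refresh]
```

After each frame is coded, the encoder would increment the age of inter-coded pixels and zero the age of pixels inside the refreshed macroblocks, bounding how far a transmission error can propagate.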
Zhang, J., Arnold, J. & Frater, M. 2000, 'A cell-loss concealment technique for MPEG-2 coded video', IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 4, pp. 659-665.
View/Download from: Publisher's site
Audio-visual and other multimedia services are seen as important sources of traffic for future telecommunication networks, including wireless networks. A major drawback with some wireless networks is that they introduce a significant number of transmissi
Arnold, J., Frater, M. & Zhang, J. 1999, 'Error resilience in the MPEG-2 video coding standard for cell based networks - a review', Signal Processing: Image Communication, vol. 14, no. 6, pp. 607-633.
View/Download from: Publisher's site
The MPEG-2 video coding standard is being extensively used worldwide for the provision of digital video services. Many of these applications involve the transport of MPEG-2 video over cell-based (or packet) networks. Examples include the broadband integr
Frater, M., Arnold, J. & Zhang, J. 1999, 'MPEG 2 video error resilience experiments: The importance of considering the impact of the systems layer', Signal Processing: Image Communication, vol. 14, no. 3, pp. 269-275.
View/Download from: Publisher's site
With increasing interest in the transport of video traffic over lossy networks, several techniques for improving the quality of video services in the presence of loss have been proposed, often using the MPEG 2 video coding algorithm as a basis. Many of t
Zhang, J., Frater, M., Arnold, J. & Percival, T. 1997, 'MPEG 2 video services for wireless ATM networks', IEEE Journal on Selected Areas in Communications, vol. 15, no. 1, pp. 119-127.
View/Download from: Publisher's site
Audio-visual and other multimedia services are seen as an important source of traffic for future telecommunications networks, including wireless networks. In this paper, we examine the impact of the properties of a 50 Mb/s asynchronous transfer mode (ATM

Microsoft Research: Microsoft Corp. One Microsoft Way, Redmond WA 98052-6399, USA

Dr. Zhengyou Zhang,

Dr. Philip A. Chou

Dr. Zicheng Liu

Dr. Xian-Sheng Hua

The Collaborative Research Project with MSR US:

1. Microsoft External Collaboration Project (Pilot funded project): Advanced 3D Deformable Surface Reconstruction and Tracking through RGB-D Cameras. The aim of this project is to develop novel computer vision technology for real time modelling and tracking of 3D dense and deformable surfaces using general RGB-D cameras. The expected outcomes of this project will add significant value to the current RGB-D camera platform when applied in the common scenario in which the RGB-D camera does not move but the deformable objects of interest are moving.

-----------------------------------------------------------------------------------------------------

Nokia Research Centre in Finland

Dr. Lixin Fan

The Collaborative Research Project with Nokia Research Centre in Finland:

2. Nokia External Collaboration Project (Pilot funded project): Large Scale 3D Image Processing. This project aims to develop a novel algorithm for 3D image registration across different point clouds in 3D space. Our research outcome is a critical technology for Nokia's mobile phone applications.