Jian Zhang is an Associate Professor in the Faculty of Engineering and IT at the University of Technology, Sydney (UTS). A/Prof Zhang earned his PhD from the School of Information Technology and Electrical Engineering at the University of New South Wales in 1999. He is currently the Director of the Multimedia Data Analytics Lab (MDAL) in the Global Big Data Technologies Centre (GBDTC) at UTS. He has co-authored more than 180 papers in top journals and refereed conference proceedings, and has filed more than ten patents in the US, UK and Australia, including six US-issued patents. He has actively collaborated with industry labs, supervised PhD students, and developed new multimedia analytics courses at UTS. Since 2011, as a leading chief investigator in the Faculty of Engineering and IT, he has led more than 10 research projects with industry labs, with a total value exceeding A$3.5 million. These industry labs include Microsoft Research, the Nokia Research Centre in Finland, Toshiba R&D in Japan and Australia, and Canon. Detailed information, including scholarship and postdoc opportunities, is available on his Personal Webpage.
A/Prof Zhang’s current research interests include 2D- and 3D-based computer vision, pattern recognition & data analytics, large-scale image and video content analytics, retrieval and mining, and multimedia and social media signal processing. From 1997 to 2003, he was with the Visual Information Processing Laboratory at the Motorola Australian Research Centre (MARC) as a Principal Research Engineer and Research Manager of the Visual Communications team. While at MARC, he worked on a range of research projects including image processing, video coding and communication, image segmentation, and multimedia content adaptation. From 2004 to 2011, he was a Principal Researcher and Project Leader at Data61 (formerly NICTA), Australia, and a Conjoint Associate Professor at the School of Computer Science and Engineering, UNSW. He led three large NICTA research projects in computer vision, pattern recognition, and video surveillance content analytics.
Research Interest Areas:
· Image Processing & Computer Vision in Video Surveillance
· Pattern Recognition & Data Analytics
· Multimedia and Social Media Signal Processing
· Large Scale Image and Video Content Analysis
· Multimedia Information Retrieval
Prospective students can find information on scholarships funded by my research projects, the university, and the Australian Government here.
A/Prof Zhang is an IEEE Senior Member. He is a member of the Multimedia Systems & Applications Technical Committee (MSA-TC) and the Visual Signal Processing and Communication Technical Committee (VSPC-TC) of the IEEE Circuits and Systems Society (CAS), and was a member of the Multimedia Signal Processing Technical Committee (MMSP-TC) of the IEEE Signal Processing Society. He was the General Co-Chair and Technical Program Co-Chair of the International Conference on Multimedia and Expo (ICME) in 2012 and 2020 respectively, and the Technical Program Co-Chair and General Co-Chair of the IEEE Conference on Visual Communications and Image Processing in 2014 and 2019 respectively. He was an Associate Editor for the IEEE Transactions on Circuits and Systems for Video Technology (2006–2015). Currently, he is an Associate Editor for the IEEE Transactions on Multimedia and, since his election in 2019, a Member of the Technical Directions Board of the IEEE Signal Processing Society.
1. Chairs of International Conference and Professional Activities (selected key positions)
- Leading General Co-Chair of the 2012 IEEE International Conference on Multimedia and Expo (ICME12) in Melbourne
- Technical Program Co-Chair of the 2008 IEEE Multimedia Signal Processing Workshop (MMSP08)
- General Co-Chair of the 2010 Digital Image Computing: Techniques and Applications conference (DICTA2010)
- Technical Program Co-Chair of the 2014 IEEE Int. Conf. on Visual Communications and Image Processing (VCIP14)
- General Co-Chair of the 2019 IEEE Int. Conf. on Visual Communications and Image Processing (VCIP19)
- Technical Program Co-Chair of the 2020 IEEE International Conference on Multimedia and Expo (ICME20) in London
2. IEEE Journal editorial boards
- Associate Editor of IEEE Transactions on Multimedia (top 25% JCR Q1 rank), since 2017
- Associate Editor of IEEE Transactions on Circuits & Systems for Video Technology (top 25% JCR Q1 rank), 2006–2015
- Guest Editor of Computer Vision and Image Understanding for Special Issue (March 2016)
He also serves, and has served, on the Technical Committees of the IEEE SPS and IEEE CASS covering Multimedia Signal Processing, Multimedia Systems and Applications, and Visual Signal Processing & Communications.
Can supervise: YES
Current research focus (2020)
3D Image Content Processing
Research on 3D object segmentation and registration has recently made significant progress, driven by strong application demand and the emergence of new 3D sensors such as LiDAR. Most recently, our research has focused on 1) extracting distinctive representations of 3D point clouds with no or limited manual annotation, 2) learning-based registration that combines the strengths of conventional mathematical theory and recent deep learning to accurately align 3D data, and 3) exploiting the correlation among points or point blocks (also called “contexts”) to improve the performance of point cloud segmentation methods.
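As a concrete illustration of the “conventional mathematical theory” that learning-based registration can build on, the sketch below implements classical SVD-based rigid alignment (the Kabsch / orthogonal Procrustes solution) for two point clouds with known correspondences. This is a minimal, illustrative example, not the lab’s actual method; all names are placeholders.

```python
# Minimal sketch: SVD-based rigid alignment (Kabsch) between two point
# clouds with known correspondences. Illustrative only, not the lab's
# actual registration pipeline.
import numpy as np

def rigid_align(source: np.ndarray, target: np.ndarray):
    """Return rotation R and translation t minimising ||R @ source + t - target||."""
    src_c = source - source.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    # Cross-covariance between the centred clouds.
    H = src_c.T @ tgt_c
    U, _, Vt = np.linalg.svd(H)
    # Reflection correction keeps R a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = target.mean(axis=0) - R @ source.mean(axis=0)
    return R, t

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(100, 3))
    theta = 0.3
    R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                       [np.sin(theta),  np.cos(theta), 0.0],
                       [0.0, 0.0, 1.0]])
    moved = pts @ R_true.T + np.array([1.0, -2.0, 0.5])
    R, t = rigid_align(pts, moved)
    print(np.allclose(pts @ R.T + t, moved, atol=1e-8))  # prints True
```

In practice this closed-form step is typically embedded in an iterative loop (e.g., ICP) or combined with learned correspondences, which is where the deep learning component enters.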
Fine-Grained Categorisation for Large-Scale Image Datasets
Learning from large datasets for fine-grained recognition has attracted significant interest in the computer vision research community. Our research focuses on 1) web data supervised learning by building high-quality datasets, 2) feature alignment to reduce high intra-class variance, and 3) high-order feature extraction to enhance the discrimination of low inter-class variance in fine-grained object recognition.
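One widely used form of high-order feature extraction is second-order (bilinear) pooling of local descriptors, which captures pairwise feature interactions that help separate visually similar fine-grained classes. The sketch below shows the basic computation under illustrative shapes; it is a generic example, not the group’s specific model.

```python
# Minimal sketch: second-order (bilinear) pooling of local descriptors,
# one common form of "high-order feature extraction" for fine-grained
# recognition. Shapes and names are illustrative.
import numpy as np

def bilinear_pool(feat_map: np.ndarray) -> np.ndarray:
    """feat_map: (H*W, C) local descriptors -> (C*C,) pooled second-order feature."""
    n, c = feat_map.shape
    # Average of outer products of local descriptors: (C, C) second-order statistics.
    gram = feat_map.T @ feat_map / n
    vec = gram.reshape(-1)
    # Signed square-root and L2 normalisation, standard post-processing
    # for bilinear features.
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    return vec / (np.linalg.norm(vec) + 1e-12)
```

The resulting C×C vector is then fed to an ordinary classifier; its quadratic interactions provide the extra discrimination needed when inter-class variance is low.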
Multi-View Data Analytics
Multi-view data, which arise in many applications, have drawn much attention in recent research. Our research in this field focuses on 1) multiple kernel learning and multi-view latent space learning, 2) spatio-temporal data analysis, and 3) sparse high-dimensional data analysis. Corresponding matrix-factorisation models are developed for network-wide traffic prediction, and theoretical mathematical models are also developed for related predictive tasks.
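The matrix-factorisation idea used for network-wide traffic prediction can be sketched minimally as follows: approximate a (sensors × time) traffic matrix with missing entries by a low-rank product, trained by gradient descent on the observed entries only. All hyper-parameters and names are illustrative assumptions, not the models developed in the projects.

```python
# Minimal sketch: low-rank matrix factorisation for a (sensors x time)
# traffic matrix with missing entries. Gradient descent on observed
# entries only; all hyper-parameters are illustrative.
import numpy as np

def factorise(M, mask, rank=4, lr=0.01, reg=0.1, epochs=500, seed=0):
    """Return (U, V) with M ~= U @ V.T on the observed entries (mask == 1)."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(m, rank))
    for _ in range(epochs):
        E = mask * (U @ V.T - M)          # error on observed entries only
        U -= lr * (E @ V + reg * U)       # gradient step with L2 regularisation
        V -= lr * (E.T @ U + reg * V)
    return U, V
```

Missing or future entries are then read off from `U @ V.T`, which is what makes the factorised form usable for prediction rather than just compression.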
Social Multimedia Signal Processing
Combining multiple modalities (e.g., images and text) to accomplish various social multimedia analysis tasks is attracting growing attention, as sharing and collecting multimodal data on the Internet has become the norm. Our recent research on this topic focuses on 1) multimodal fusion via supervised deep learning and 2) multimodal neural autoregression models.
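The simplest form of supervised multimodal fusion is late fusion: normalise and concatenate per-modality features, then train a classifier on the joint vector. The sketch below is a toy stand-in for the deep fusion models described above; all dimensions and names are illustrative.

```python
# Minimal sketch: late fusion of image and text features followed by a
# plain logistic-regression classifier. A toy stand-in for supervised
# deep multimodal fusion; dimensions and names are illustrative.
import numpy as np

def fuse(img_feat: np.ndarray, txt_feat: np.ndarray) -> np.ndarray:
    """L2-normalise each modality, then concatenate into one joint vector."""
    img = img_feat / (np.linalg.norm(img_feat) + 1e-12)
    txt = txt_feat / (np.linalg.norm(txt_feat) + 1e-12)
    return np.concatenate([img, txt])

def train_logreg(X, y, lr=0.1, epochs=200):
    """Logistic regression on fused features (binary labels y in {0, 1})."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid predictions
        g = p - y                                 # gradient of the log-loss
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b
```

In deep fusion models the concatenation is replaced by learned joint layers, but the supervised objective on top of the fused representation is the same in spirit.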
I have two full PhD scholarships to fund high-calibre PhD candidates in the following areas:
- Image processing & pattern recognition
- 2D & 3D computer vision
- Video/image content analysis
- Data analytics and multimedia information retrieval
- Social multimedia signal processing
- Multimedia and new media analytics
For international students, in addition to a stipend that covers living expenses, the International Research Scholarship also waives tuition fees. Please contact me for more details.
1. “Automated, real-time monitoring of bird and flock movement and behaviour”, Industry CRC funded project, $188,000 (2020-2021) Role: Leading Chief Investigator.
2. "Deep Learning Based Fish Species Recognition, Size Estimation, and Freshness Measurement", Industry funded project, $150,000 (2019 - 2022) Role: Leading Chief Investigator.
3. "Trusted Fish Provenance and Quality Tracking System", Industry funded project, $660,000 (2019-2020) Role: One of the Chief Investigators
4. "Automated Sheep Counting in the Live Export Industry", Industry funded project, $270,000 (2018-2020) Role: Leading Chief Investigator
5. "Data Analytics and Video Analysis for Transport of NSW", Industry CRC funded project, $320,000 (2017-2020) Role: Leading Chief Investigator
6. “Motion re-identification”, Industry funded project, $270,000 (2018-2019) Role: One of the Chief Investigators
7. “Technology Development for 3D Human Body Scanning and Measurement”, Industry funded project, $240,000 (2017-2018) Role: Leading Chief Investigator
8. "Safety video surveillance in a mining environment"; Industry and Australian Government funded Project (Innovation Connection), $185,000 (2015-2017) Role: Leading Chief Investigator
9. “Video based human behaviour analysis in shopping area”, Industry funded project, $350,000 (2016-2018) Role: Leading Chief Investigator
10. "3D Image Content Processing", Industry Lab funded project, $70,000 (2013-2016) Role: Leading Chief Investigator
11. “Virtual Clothing fitting on Mobile”, Industry Lab funded project, $90,000 (2013-2014) Role: Leading Chief Investigator
12. “Human detection in local residential area”, Industry Lab funded research project, $70,000 (2012-2013) Role: Leading Chief Investigator
13. “Real Time 3D Non-rigid Surface Tracking Through Microsoft Kinect Platform”, Industry Lab funded research project, $110,000 (2012-2013) Role: Leading Chief Investigator
14. “Robust Automated Video Surveillance & Monitoring in Dynamic Scenes”, National ICT Australia (Data61/NICTA) - Defence Science and Technology Organization (DSTO) joint project award, $59,000, (2008-2011) Role: Leading Chief Investigator
48450 Real-time Operating Systems
31338 Network Servers
32520 Systems Administration
49238 Telecommunication Networks Management
Yao, L, Kusakunniran, W, Wu, Q, Zhang, J, Tang, Z & Yang, W 2020, 'Robust gait recognition using hybrid descriptors based on Skeleton Gait Energy Image', Pattern Recognition Letters.
Gait features have been widely applied in human identification. The commonly-used representations for gait recognition can be roughly classified into two categories: model-free features and model-based features. However, due to the view variances and clothes changes, model-free features are sensitive to the appearance changes. For model-based features, there is great difficulty in extracting the underlying models from gait sequences. Based on the confidence maps and the part affinity fields produced by a two-branch multi-stage CNN network, a new model-based representation, Skeleton Gait Energy Image (SGEI), has been proposed in this paper. Another contribution is that a hybrid representation has been produced, which uses SGEI to remedy the deficiency of model-free features, Gait Energy Image (GEI) for instance. The experimental performances indicate that our proposed methods are more robust to the cloth changes, and contribute to increasing the robustness of gait recognition in the unconstrained environments with view variances and clothes changes.
Yao, Y, Zhang, J, Shen, F, Liu, L, Zhu, F, Zhang, D & Shen, HT 2020, 'Towards automatic construction of diverse, high-quality image datasets', IEEE Transactions on Knowledge and Data Engineering, vol. 32, no. 6, pp. 1199-1211.
The availability of labeled image datasets has been shown critical for high-level image understanding, which continuously drives the progress of feature designing and models developing. However, constructing labeled image datasets is laborious and monotonous. To eliminate manual annotation, in this work, we propose a novel image dataset construction framework by employing multiple textual queries. We aim at collecting diverse and accurate images for given queries from the Web. Specifically, we formulate noisy textual queries removing and noisy images filtering as a multi-view and multi-instance learning problem separately. Our proposed approach not only improves the accuracy but also enhances the diversity of the selected images. To verify the effectiveness of our proposed approach, we construct an image dataset with 100 categories. The experiments show significant performance gains by using the generated data of our approach on several tasks, such as image classification, cross-dataset generalization, and object detection. The proposed method also consistently outperforms existing weakly supervised and web-supervised approaches.
Gao, G, Yu, Y, Xie, J, Yang, J, Yang, M & Zhang, J 2020, 'Constructing multilayer locality-constrained matrix regression framework for noise robust face super-resolution', Pattern Recognition.
Representation learning methods have attracted considerable attention for learning-based face super-resolution in recent years. Conventional methods perform local models learning on low-resolution (LR) manifold and face reconstruction on high-resolution (HR) manifold respectively, leading to unsatisfactory reconstruction performance when the acquired LR face images are severely degraded (e.g., noisy, blurred). To tackle this issue, this paper proposes an efficient multilayer locality-constrained matrix regression (MLCMR) framework to learn the representation of the input LR patch and meanwhile preserve the manifold of the original HR space. Particularly, MLCMR uses nuclear norm regularization to capture the structural characteristic of the representation residual and applies an adaptive neighborhood selection scheme to find the HR patches that are compatible with its neighbors. Also, MLCMR iteratively applies the manifold structure of the desired HR space to induce the representation weights learning in the LR space, aims at reducing the inconsistency gap between different manifolds. Experimental results on widely used FEI database and real-world faces have demonstrated that compared with several state-of-the-art face super-resolution approaches, our proposed approach has the capability of obtaining better results both in objective metrics and visual quality.
Yao, Y, Shen, F, Xie, G, Liu, L, Zhu, F, Zhang, J & Shen, HT 2020, 'Exploiting Web Images for Multi-Output Classification: From Category to Subcategories', IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 7, pp. 2348-2360.
Yao, Y, Shen, F, Zhang, J, Liu, L, Tang, Z & Shao, L 2019, 'Extracting Multiple Visual Senses for Web Learning', IEEE Transactions on Multimedia, vol. 21, no. 1, pp. 184-196.
Labeled image datasets have played a critical role in high-level image understanding. However, the process of manual labeling is both time consuming and labor intensive. To reduce the dependence on manually labeled data, there have been increasing research efforts on learning visual classifiers by directly exploiting web images. One issue that limits their performance is the problem of polysemy. Existing unsupervised approaches attempt to reduce the influence of visual polysemy by filtering out irrelevant images, but do not directly address polysemy. To this end, in this paper, we present a multimodal framework that solves the problem of polysemy by allowing sense-specific diversity in search results. Specifically, we first discover a list of possible semantic senses from untagged corpora to retrieve sense-specific images. Then, we merge visual similar semantic senses and prune noise by using the retrieved images. Finally, we train one visual classifier for each selected semantic sense and use the learned sense-specific classifiers to distinguish multiple visual senses. Extensive experiments on classifying images into sense-specific categories and reranking search results demonstrate the superiority of our proposed approach.
Shen, W, Wu, Y, Yuan, J, Duan, L, Zhang, J & Jia, Y 2019, 'Robust Distracter-Resistive Tracker via Learning a Multi-Component Discriminative Dictionary', IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 7, pp. 2012-2028.
Discriminative dictionary learning (DDL) provides an appealing paradigm for appearance modeling in visual tracking. However, most existing DDL based trackers cannot handle drastic appearance changes, especially for scenarios with background clutter and/or similar object interference. One reason is that they often suffer from the loss of subtle visual information which is critical to distinguish an object from distracters. In this paper, we explore the use of deep features extracted from Convolutional Neural Networks (CNNs) to improve the object representation and propose a robust distracter-resistive tracker via learning a multi-component discriminative dictionary. The proposed method exploits both the intra-class and the inter-class visual information to learn shared atoms and the class-specific atoms. By imposing several constraints into the objective function, the learned dictionary is reconstructive, compressive and discriminative, thus can better distinguish an object from the background. In addition, our convolutional features (deep features extracted from CNNs) have structural information for object localization and balance the discriminative power and semantic information of the object. Tracking is carried out within a Bayesian inference framework where a joint decision measure is used to construct the observation model. To alleviate the drift problem, the reliable tracking results obtained online are accumulated to update the dictionary. Both the qualitative and quantitative results on the CVPR2013 benchmark, the VOT2015 dataset and the SPOT dataset demonstrate that our tracker achieves better performance over the state-of-the-art approaches.
Cheng, H, Zhang, J, Wu, Q & An, P 2019, 'A computational model for stereoscopic visual saliency prediction', IEEE Transactions on Multimedia, vol. 21, no. 3, pp. 678-689.
Depth information plays an important role in human vision as it provides additional cues that distinguish objects from their backgrounds. This paper explores depth information for analyzing stereoscopic saliency and presents a computational model that predicts stereoscopic visual saliency based on three aspects of human vision: 1) the pop-out effect; 2) comfort zones; and 3) background effects. Through an analysis of these three phenomena, we find that most of the stereoscopic saliency region can be explained. Our model comprises three modules, each describing one aspect of saliency distribution, and a control function that can be used to adjust the three models independently. The relationship between the three models is not mutually exclusive. One, two, or three phenomena may appear in one image. Therefore, to accurately determine which phenomena the image conforms to, we have devised a selection strategy that chooses the appropriate combination of models based on the content of the image. Our approach is implemented within a framework based on the multifeature analysis. The framework considers surrounding regions, color/depth contrast, and points of interest. The selection strategy can improve the performance of the framework. A series of experiments on two recent eye-tracking datasets shows that our proposed method outperforms several state-of-the-art saliency models.
Yao, Y, Shen, F, Zhang, J, Liu, L, Tang, Z & Shao, L 2019, 'Extracting Privileged Information for Enhancing Classifier Learning', IEEE Transactions on Image Processing, vol. 28, no. 1, pp. 436-450.
The accuracy of data-driven learning approaches is often unsatisfactory when the training data is inadequate either in quantity or quality. Manually labeled privileged information (PI), e.g., attributes, tags or properties, is usually incorporated to improve classifier learning. However, the process of manually labeling is time-consuming and labor-intensive. Moreover, due to the limitations of personal knowledge, manually labeled PI may not be rich enough. To address these issues, we propose to enhance classifier learning by exploring PI from untagged corpora, which can effectively eliminate the dependency on manually labeled data and obtain much richer PI. In detail, we treat each selected PI as a subcategory and learn one classifier for each subcategory independently. The classifiers for all subcategories are integrated together to form a more powerful category classifier. Particularly, we propose a novel instance-level multi-instance learning model to simultaneously select a subset of training images from each subcategory and learn the optimal SVM classifiers based on the selected images. Extensive experiments on four benchmark data sets demonstrate the superiority of our proposed approach.
Huang, Y, Xu, J, Wu, Q, Zheng, Z, Zhang, Z & Zhang, J 2019, 'Multi-pseudo Regularized Label for Generated Data in Person Re-Identification', IEEE Transactions on Image Processing, vol. 28, no. 3, pp. 1391-1403.
Sufficient training data normally is required to train deeply learned models. However, due to the expensive manual process for labelling large number of images (i.e., annotation), the amount of available training data (i.e., real data) is always limited. To produce more data for training a deep network, Generative Adversarial Network (GAN) can be used to generate artificial sample data (i.e., generated data). However, the generated data usually does not have annotation labels. To solve this problem, in this paper, we propose a virtual label called Multi-pseudo Regularized Label (MpRL) and assign it to the generated data. With MpRL, the generated data will be used as the supplementary of real training data to train a deep neural network in a semi-supervised learning fashion. To build the corresponding relationship between the real data and generated data, MpRL assigns each generated data a proper virtual label which reflects the likelihood of the affiliation of the generated data to predefined training classes in the real data domain. Unlike the traditional label which usually is a single integral number, the virtual label proposed in this work is a set of weight-based values each individual of which is a number in (0,1] called multi-pseudo label and reflects the degree of relation between each generated data to every pre-defined class of real data. A comprehensive evaluation is carried out by adopting two state-of-the-art convolutional neural networks (CNNs) in our experiments to verify the effectiveness of MpRL. Experiments demonstrate that by assigning MpRL to generated data, we can further improve the person re-ID performance on five re-ID datasets, i.e., Market-1501, DukeMTMC-reID, CUHK03, VIPeR, and CUHK01. The proposed method obtains +6.29%, +6.30%, +5.58%, +5.84%, and +3.48% improvements in rank-1 accuracy over a strong CNN baseline on the five datasets respectively, and outperforms state-of-the-art methods.
Wang, Y, Shuai, Y, Zhu, Y, Zhang, J & An, P 2019, 'Jointly learning perceptually heterogeneous features for blind 3D video quality assessment', Neurocomputing, vol. 332, pp. 298-304.
3D video quality assessment (3D-VQA) is essential to various 3D video processing applications. However, it has not been well investigated on how to make use of perceptual multi-channel video information to improve 3D-VQA under different distortion categories and degrees, especially under asymmetrical distortions. In the paper, we propose a new blind 3D-VQA metric by jointly learning perceptually heterogeneous features. Firstly, a binocular spatio-temporal internal generative mechanism (BST-IGM) is proposed to decompose the views of 3D video into multi-channel videos. Then, we extract perceptually heterogeneous features by the proposed multi-channel natural video statistics (MNVS) model, which characterize 3D video information. Furthermore, a robust AdaBoosting Radial Basis Function (RBF) neural network is utilized to map the features to the overall quality of 3D video. On two benchmark databases, the extensive evaluations demonstrate that the proposed algorithm significantly outperforms several state-of-the-art quality metrics in terms of prediction accuracy and robustness.
Yang, D, Zou, YX, Zhang, J & Li, G 2019, 'C-RPNs: Promoting object detection in real world via a cascade structure of Region Proposal Networks', Neurocomputing, vol. 367, pp. 20-30.
Recently, significant progress has been made in object detection on common benchmarks (i.e., Pascal VOC). However, object detection in the real world is still challenging due to serious data imbalance. Images in the real world are dominated by easy samples, for example the wide range of background and some easily recognizable objects. Although two-stage detectors like Faster R-CNN achieved big successes in object detection due to the strategy of extracting region proposals by a Region Proposal Network, they adapt poorly to real-world object detection because they do not consider mining hard samples while extracting region proposals. To address this issue, we propose a Cascade framework of Region Proposal Networks, referred to as C-RPNs, which adopts multiple stages to mine hard samples while extracting region proposals and learn stronger classifiers. Meanwhile, a feature chain and a score chain are proposed to help learning more discriminative representations for proposals. Moreover, a loss function of cascade stages is designed to train cascade classifiers through backpropagation. Our proposed method has been evaluated on Pascal VOC and several challenging datasets like BSBDV 2017, CityPersons, etc. Our method achieves competitive results compared with the current state-of-the-arts and attains all-sided improvements in error analysis, validating its efficacy for detection in the real world.
Ding, G, Zhang, S, Khan, S, Tang, Z, Zhang, J & Porikli, F 2019, 'Feature Affinity-Based Pseudo Labeling for Semi-Supervised Person Re-Identification', IEEE Transactions on Multimedia, vol. 21, no. 11, pp. 2891-2902.
Vision-based person re-identification aims to match a person's identity across multiple images, which is a fundamental task in multimedia content analysis and retrieval. Deep neural networks have recently manifested great potential in this task. However, a major bottleneck of existing supervised deep networks is their reliance on a large amount of annotated training data. Manual labeling for person identities in large-scale surveillance camera systems is quite challenging and incurs significant costs. Some recent studies adopt generative model outputs as training data augmentation. To more effectively use these synthetic data for an improved feature learning and re-identification performance, this paper proposes a novel feature affinity-based pseudo labeling method with two possible label encodings. To the best of our knowledge, this is the first study that employs pseudo-labeling by measuring the affinity of unlabeled samples with the underlying clusters of labeled data samples using the intermediate feature representations from deep networks. We propose training the network with the joint supervision of cross-entropy loss together with a center regularization term, which not only ensures discriminative feature representation learning but also simultaneously predicts pseudo-labels for unlabeled data. We show that both label encodings can be learned in a unified manner and help improve the overall performance. Our extensive experiments on three person re-identification datasets: Market-1501, DukeMTMC-reID, and CUHK03, demonstrate significant performance boost over the state-of-the-art person re-identification approaches.
Zuo, Y, Wu, Q, Zhang, J & An, P 2018, 'Explicit Edge Inconsistency Evaluation Model for Color-guided Depth Map Enhancement', IEEE Transactions on Circuits and Systems for Video Technology.
Color-guided depth enhancement refines depth maps under the assumption that the depth edges and the color edges at the corresponding locations are consistent. Among methods for this low-level vision task, Markov Random Fields (MRF) and their variants are one of the major approaches and have dominated this area for several years. However, the assumption above is not always true. To tackle the problem, the state-of-the-art solutions adjust the weighting coefficient inside the smoothness term of the MRF model. These methods lack an explicit evaluation model to quantitatively measure the inconsistency between the depth edge map and the color edge map, so they cannot adaptively control the strength of the guidance from the color image for depth enhancement, leading to defects such as texture-copy artifacts and blurred depth edges. In this paper, we propose a quantitative measurement of such inconsistency and explicitly embed it into the smoothness term. The proposed method demonstrates promising experimental results when compared with benchmark and state-of-the-art methods on the Middlebury, ToF-Mark and NYU datasets.
Huang, X, Zhang, J, Wu, Q, Fan, L & Yuan, C 2018, 'A coarse-to-fine algorithm for matching and registration in 3D cross-source point clouds', IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 10, pp. 2965-2977.
We propose an efficient method to deal with the matching and registration problem found in cross-source point clouds captured by different types of sensors. This task is especially challenging due to the presence of density variation, scale difference, a large proportion of noise and outliers, missing data and viewpoint variation. The proposed method has two stages: in the coarse matching stage, we use the ESF descriptor to select potential K regions from the candidate point clouds for the target. In the fine stage, we propose a scale embedded generative GMM registration method to refine the results from the coarse matching stage. Following the fine stage, both the best region and accurate camera pose relationships between the candidates and target are found. We conduct experiments in which we apply the method to two applications: one is 3D object detection and localization in street-view outdoor (LiDAR/VSFM) cross-source point clouds, and the other is 3D scene matching and registration in indoor (KinectFusion/VSFM) cross-source point clouds. The experiment results show that the proposed method performs well when compared with the existing methods. It also shows that the proposed method is robust under various sensing techniques such as LiDAR, Kinect and RGB camera.
This paper proposes a novel image co-segmentation method, which aims to segment the common objects in a group of images. The proposed method takes advantage of the reliability of simple images and successfully improves the performance. The images are first ranked by their complexities based on their saliency maps. Then, the simple images, in which objects are common and easy to segment, are selected and processed to obtain their segmentation results; these segmentation results are taken as the samples of the targeted objects. Finally, the remaining complicated images are segmented with the guidance of the samples. The experiments on the iCoseg dataset demonstrate the superior performance and robustness of the proposed method.
Kusakunniran, W, Wu, Q, Ritthipravat, P & Zhang, J 2018, 'Hard exudates segmentation based on learned initial seeds and iterative graph cut', Computer Methods and Programs in Biomedicine, vol. 158, pp. 173-183.
Wang, Y, Zhang, J, Liu, Z, Wu, Q, Zhang, Z & Jia, Y 2018, 'Depth Super-Resolution on RGB-D Video Sequences With Large Displacement 3D Motion', IEEE Transactions on Image Processing, vol. 27, no. 7, pp. 3571-3585.
Zhang, J, Wu, Q, Shen, C, Zhang, J & Lu, J 2018, 'Multilabel Image Classification with Regional Latent Semantic Dependencies', IEEE Transactions on Multimedia, vol. 20, no. 10, pp. 2801-2813.View/Download from: Publisher's site
© 1999-2012 IEEE. Deep convolutional neural networks (CNNs) have demonstrated advanced performance on single-label image classification, and progress has also been made in applying CNN methods to multilabel image classification, which requires annotating objects, attributes, scene categories, etc., in a single shot. Recent state-of-the-art approaches to multilabel image classification exploit the label dependencies in an image at the global level, largely improving the labeling capacity. However, predicting small objects and visual concepts remains challenging due to the limited discrimination of global visual features. In this paper, we propose a regional latent semantic dependencies model (RLSD) to address this problem. The model includes a fully convolutional localization architecture to localize the regions that may contain multiple highly dependent labels. The localized regions are then fed to recurrent neural networks to characterize the latent semantic dependencies at the regional level. Experimental results on several benchmark datasets show that our proposed model achieves the best performance compared with state-of-the-art models, especially for predicting small objects occurring in the images. We also set up an upper-bound model (RLSD+ft-RPN) using bounding-box coordinates during training, and the experimental results show that RLSD can approach this upper bound without using bounding-box annotations, which is more realistic in real-world settings.
Zhao, J, Mao, X & Zhang, J 2018, 'Learning deep facial expression features from image and optical flow sequences using 3D CNN', Visual Computer, vol. 34, no. 10, pp. 1461-1475.
© 2018, Springer-Verlag GmbH Germany, part of Springer Nature. Facial expression is highly correlated with facial motion. According to whether the temporal information of facial motion is used, facial expression features can be classified as static or dynamic. The former, which mainly include geometric and appearance features, can be extracted by convolution or other learned filters; the latter, which aim to model the dynamic properties of facial motion, can be calculated through optical flow or other methods. When 3D convolutional neural networks (CNNs) are introduced, extracting both types of features becomes easy. In this paper, one 3D CNN architecture is presented to learn static and dynamic features from facial image sequences and to extract high-level dynamic features from optical flow sequences. Two types of dense optical flow, which contain the tracking information of facial muscle movement, are calculated according to different image-pair construction methods: the common optical flow, and an enhanced version called accumulative optical flow. Four components of each type of optical flow are used in the experiments. Three databases, two acted and one nearly realistic, are selected for the experiments. The experiments on the two acted databases achieve state-of-the-art accuracy and indicate that the vertical component of optical flow has an advantage over the other components in recognizing facial expression. The results on all three databases show that more discriminative features can be learned from image sequences than from optical flow or accumulative optical flow sequences, and that accumulative optical flow contains more motion information than optical flow when the frame distance of the image pairs used to calculate them is not too large.
Zuo, Y, Wu, Q, Zhang, J & An, P 2018, 'Minimum Spanning Forest With Embedded Edge Inconsistency Measurement Model for Guided Depth Map Enhancement', IEEE Transactions on Image Processing, vol. 27, no. 8, pp. 4145-4159.
Xiao, L, Zhang, Y, Zhang, J, Wang, Q & Li, Y 2018, 'Combining HWEBING and HOG-MLBP features for pedestrian detection', The Journal of Engineering, vol. 2018, no. 16, pp. 1421-1426.
© 2017. The goal of this work is to automatically collect a large number of highly relevant natural images from the Internet for given queries. A novel automatic image dataset construction framework is proposed by employing multiple query expansions. Specifically, the given queries are first expanded by searching the Google Books Ngrams Corpora to obtain richer semantic descriptions, from which the visually non-salient and less relevant expansions are then filtered. After retrieving images from the Internet with the filtered expansions, we further filter noisy images by clustering-based and progressive Convolutional Neural Network (CNN)-based methods. To evaluate the performance of our proposed method for image dataset construction, we build an image dataset with 10 categories. We then run object detection on our image dataset and on three other image datasets constructed by weakly supervised, web-supervised, and fully supervised learning; the experimental results indicate that our method is superior to the weakly supervised and web-supervised state-of-the-art methods. In addition, we perform cross-dataset classification to evaluate the performance of our dataset against two publicly available manually labelled datasets, STL-10 and CIFAR-10.
© 2017. State-of-the-art performance in human action recognition is achieved by using dense trajectories extracted with optical flow algorithms. However, optical flow algorithms are far from perfect in low-resolution (LR) videos. In addition, the spatial and temporal layout of features is a powerful cue for action discrimination, yet most existing methods encode the layout by first segmenting body parts, which is not feasible in LR videos. To address these problems, we adopt the Layered Elastic Motion Tracking (LEMT) method to extract a set of long-term motion trajectories and a long-term common shape from each video sequence, where the extracted trajectories are much denser than those of sparse interest points (SIPs); we then present a hybrid feature representation to integrate both shape and motion features; and finally we propose a Region-based Mixture Model (RMM) for action classification. The RMM encodes the spatial layout of features without any need for body-part segmentation. Experimental results show that the approach is effective and, more importantly, general for LR recognition tasks.
Yao, Y, Zhang, J, Shen, F, Hua, X, Xu, J & Tang, Z 2017, 'Exploiting Web Images for Dataset Construction: A Domain Robust Approach', IEEE Transactions on Multimedia, vol. 19, no. 8, pp. 1771-1784.
Huang, S, Zhang, J, Schonfeld, D, Wang, L & Hua, XS 2017, 'Two-Stage Friend Recommendation Based on Network Alignment and Series Expansion of Probabilistic Topic Model', IEEE Transactions on Multimedia, vol. 19, no. 6, pp. 1314-1326.
© 2017 IEEE. Precise friend recommendation is an important problem in social media. Although most social websites provide some form of automatic friend-search function, their accuracy is not satisfactory. In this paper, we propose a more precise friend recommendation method with two stages. In the first stage, by utilizing the relationships between texts and users as well as the friendship information between users, we align different social networks and choose some "possible friends." In the second stage, using the relationship between image features and users, we build a topic model to further refine the recommendation results. Because traditional methods such as variational inference and Gibbs sampling have limitations in dealing with our problem, we develop a novel method to solve the topic model based on series expansion. We conduct experiments on the Flickr dataset to show that the proposed algorithm recommends friends more precisely and faster than traditional methods.
Huang, X, Zhang, J, Fan, L, Wu, Q & Yuan, C 2017, 'A Systematic Approach for Cross-Source Point Cloud Registration by Preserving Macro and Micro Structures', IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3261-3276.
We propose a systematic approach for registering cross-source point clouds that come from different kinds of sensors. This task is especially challenging due to significant missing data, large variations in point density, scale difference, and a large proportion of noise and outliers. The robustness of the method is attributed to the extraction of macro and micro structures. The macro structure is the overall structure that maintains a similar geometric layout in cross-source point clouds; a micro structure is an element (e.g., a local segment) used to build the macro structure. We use a graph to organize these structures and convert registration into graph matching. With a novel descriptor, we conduct the graph matching in a discriminative feature space, and the matching problem is solved by an improved graph matching solution that considers global geometric constraints. Robust cross-source registration results are obtained by combining the graph matching outcome with RANSAC and ICP refinements. Compared with eight state-of-the-art registration algorithms, the proposed method consistently performs best on Pisa Cathedral and other challenging cases. For quantitative comparison, we propose two challenging cross-source datasets and conduct comparative experiments on more than 27 cases; the results show that we obtain much better performance than the other methods. The proposed method also shows high accuracy on same-source datasets.
Cheng, H, Zhang, J, Wu, Q, An, P & Liu, Z 2017, 'Stereoscopic visual saliency prediction based on stereo contrast and stereo focus', EURASIP Journal on Image and Video Processing.
In this paper, we exploit two characteristics of stereoscopic vision: the pop-out effect and the comfort zone. We propose a visual saliency prediction model for stereoscopic images based on stereo contrast and stereo focus models. The stereo contrast model measures stereo saliency based on color/depth contrast and the pop-out effect. The stereo focus model describes the degree of focus based on monocular focus and the comfort zone. After obtaining the values of the stereo contrast and stereo focus models in parallel, we perform a clustering-based enhancement on both, and then apply a multi-scale fusion to form the respective maps of the two models. Finally, we use a Bayesian integration scheme to integrate the two maps (the stereo contrast and stereo focus maps) into the stereo saliency map. Experimental results on two eye-tracking databases show that our proposed method outperforms state-of-the-art saliency models.
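The Bayesian integration step above can be sketched generically. The paper's exact scheme is not reproduced here; as a hedged illustration, if each map value is treated as an independent posterior probability that a pixel is salient, a naive-Bayes fusion of the two maps looks like this (the function name and list-of-lists map format are assumptions for the sketch):

```python
# Hedged sketch: naive-Bayes fusion of two per-pixel saliency maps in [0, 1].
# Each map value is treated as an independent posterior probability that the
# pixel is salient; the paper's actual integration scheme may differ.

def bayes_fuse(contrast_map, focus_map):
    """Fuse two same-size saliency maps (2D lists of floats in [0, 1])."""
    fused = []
    for row_c, row_f in zip(contrast_map, focus_map):
        fused_row = []
        for p, q in zip(row_c, row_f):
            num = p * q                        # joint evidence for "salient"
            den = num + (1.0 - p) * (1.0 - q)  # plus evidence for "not salient"
            fused_row.append(num / den if den > 0 else 0.5)
        fused.append(fused_row)
    return fused

contrast = [[0.9, 0.2], [0.5, 0.1]]
focus = [[0.8, 0.3], [0.5, 0.9]]
print(bayes_fuse(contrast, focus))
```

Note how agreement between the maps sharpens the fused value (two high inputs push the output above either), while disagreement (e.g., 0.1 vs. 0.9) pulls it back toward 0.5.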
Edwards, D, Cheng, M, Wong, A, Zhang, J & Wu, Q 2017, 'Ambassadors of Knowledge Sharing: Co-produced Travel Information Through Tourist-Local Social Media Exchange', International Journal of Contemporary Hospitality Management, vol. 29, no. 2, pp. 690-708.
Purpose: The aim of this study is to understand the knowledge sharing structure and co-production of trip-related knowledge through online travel forums.
Design/methodology/approach: The travel forum threads were collected from the TripAdvisor Sydney travel forum for the period from 2010 to 2014, comprising 115,847 threads from 8,346 conversations. The data analytical technique was based on a novel methodological approach: visual analytics, including semantic pattern generation and network analysis.
Findings: The findings indicate that the knowledge structure is created by community residents who camouflage themselves as local experts and serve as ambassadors of a destination. The knowledge structure represents collective intelligence co-produced by community residents and tourists. Further findings reveal how these community residents associate with each other and form a knowledge repertoire covering various travel domain areas.
Practical implications: The study offers valuable insights to help destination management organizations and tour operators identify existing and emerging tourism issues to achieve a competitive destination advantage.
Originality/value: This study highlights the process of social-media-mediated co-production of travel knowledge. It also reveals how community residents engage in reaching out to tourists by camouflaging themselves as ordinary users.
Guo, D, Xu, J, Zhang, J, Xu, M, Cui, Y & He, X 2017, 'User relationship strength modeling for friend recommendation on Instagram', Neurocomputing, vol. 239, pp. 9-18.
© 2017 Elsevier B.V. Social strength modeling in the social media community has attracted increasing research interest. Unlike Flickr, which has been explored by many researchers, Instagram is more popular among mobile users and conducive to likes and comments, but has seldom been investigated. On Instagram, a user can post photos/videos, follow other users, and comment on and like other users' posts. These actions generate diverse forms of data that result in multiple user relationship views. In this paper, we propose a new framework to discover the underlying social relationship strength. User relationship learning under multiple views and relationship strength modeling are coupled into a single framework. In addition, given the learned relationship strength, a coarse-to-fine method is proposed for friend recommendation. Experiments on friend recommendation for Instagram show the effectiveness and efficiency of the proposed framework: it obtains better performance than other related methods. Although the method is proposed for Instagram, it can easily be extended to other social media communities.
Wang, Y, Zhang, J, Liu, Z, Wu, Q, Chou, P, Zhang, Z & Jia, Y 2016, 'Handling Occlusion and Large Displacement through Improved RGB-D Scene Flow Estimation', IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 7, pp. 1265-1278.
The accuracy of scene flow is restricted by several challenges such as occlusion and large displacement motion. When occlusion happens, the positions inside the occluded regions lose their corresponding counterparts in preceding and succeeding frames. Large displacement motion increases the complexity of motion modeling and computation. Moreover, occlusion and large displacement motion are highly related problems in scene flow estimation; e.g., large displacement motion often leads to considerably occluded regions in the scene. An improved dense scene flow method based on red-green-blue-depth (RGB-D) data is proposed in this paper. To handle occlusion, we model the occlusion status for each point in our problem formulation and jointly estimate the scene flow and occluded regions. To deal with large displacement motion, we employ an over-parameterized scene flow representation to model both the rotation and translation components of the scene flow, since large displacement motion cannot be well approximated using translational motion only. Furthermore, we employ a two-stage optimization procedure for this over-parameterized scene flow representation. In the first stage, we propose a new RGB-D PatchMatch method, which is mainly applied in the RGB-D image space to reduce the computational complexity introduced by the large displacement motion. According to the quantitative evaluation based on the Middlebury dataset, our method outperforms other published methods. The improved performance is also comprehensively confirmed on real data acquired by a Kinect sensor.
Huang, S, Zhang, J, Wang, L & Hua, X-S 2016, 'Social Friend Recommendation Based on Multiple Network Correlation', IEEE Transactions on Multimedia, vol. 18, no. 2, pp. 287-299.
Ma, X, Liu, D, Zhang, J & Xin, J 2015, 'A fast affine-invariant features for image stitching under large viewpoint changes', Neurocomputing, vol. 151, pp. 1430-1438.
Zhou, T, Lu, Y, Lv, F, Di, H, Zhao, Q & Zhang, J 2015, 'Abrupt motion tracking via nearest neighbor field driven stochastic sampling', Neurocomputing, vol. 165, pp. 350-360.
Stochastic sampling based trackers have shown good performance for abrupt motion tracking and have thus gained popularity in recent years. However, conventional methods tend to use a two-stage sampling paradigm in which the search space must be uniformly explored with an inefficient preliminary sampling phase. In this paper, we propose a novel sampling-based method in the Bayesian filtering framework to address this problem. Within the framework, nearest neighbor field estimation is utilized to compute the importance proposal probabilities, which guide the Markov chain search towards promising regions and thus enhance sampling efficiency; given the motion priors, a smoothing stochastic sampling Monte Carlo algorithm is proposed to approximate the posterior distribution through a smoothing weight-updating scheme. Moreover, to track abrupt and smooth motions simultaneously, we develop an abrupt-motion detection scheme that can discover the presence of abrupt motions during online tracking. Extensive experiments on challenging image sequences demonstrate the effectiveness and robustness of our algorithm in handling abrupt motions.
Lu, S, Mei, T, Wang, J, Zhang, J, Wang, Z & Li, S 2015, 'Exploratory Product Image Search With Circle-to-Search Interaction', IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 7, pp. 1190-1202.
Wang, S, Zhang, J, Han, TX & Miao, Z 2015, 'Sketch-Based Image Retrieval Through Hypothesis-Driven Object Boundary Selection With HLR Descriptor', IEEE Transactions on Multimedia, vol. 17, no. 7, pp. 1045-1057.
The appearance gap between sketches and photo-realistic images is a fundamental challenge in sketch-based image retrieval (SBIR) systems. The existence of noisy edges on photo-realistic images is a key factor in enlarging the appearance gap and significantly degrades retrieval performance. To bridge the gap, we propose a framework consisting of a new line-segment-based descriptor named the histogram of line relationship (HLR) and a new noise-impact reduction algorithm known as object boundary selection. HLR treats sketches and the extracted edges of photo-realistic images as series of piecewise line segments and captures the relationships between them. Based on the HLR, the object boundary selection algorithm aims to reduce the impact of noisy edges by selecting the shaping edges that best correspond to the object boundaries. Multiple hypotheses are generated for descriptors by hypothetical edge selection. The selection algorithm is formulated to find the combination of hypotheses that maximizes the retrieval score; a fast method is also proposed. To reduce the distraction of false matches in the scoring process, two constraints, on spatial and coherence aspects, are introduced. We tested the HLR descriptor and the proposed framework on public datasets and on a new dataset of three million images, which we recently collected for SBIR evaluation purposes. We compared the proposed HLR with state-of-the-art descriptors (SHoG, GF-HOG). The experimental results show that our HLR descriptor outperforms them, and combined with the object boundary selection algorithm, our framework significantly improves SBIR performance.
Wu, Y, Jia, Y, Li, P, Zhang, J & Yuan, J 2015, 'Manifold Kernel Sparse Representation of Symmetric Positive-Definite Matrices and Its Applications', IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3729-3741.
Cui, Y, Zhang, J, Guo, D & Jin, Z 2015, 'Robust facial landmark localization using classified random ferns and pose-based initialization', Signal Processing, vol. 110, pp. 46-53.
BACKGROUND: Vision-based surveillance and monitoring is a potential alternative for early detection of respiratory disease outbreaks in urban areas, complementing molecular diagnostics and hospital and doctor visit-based alert systems. Visible actions representing typical flu-like symptoms include sneezing and coughing, which are associated with changing patterns of hand-to-head distance, among others. The technical difficulties lie in the high complexity and large variation of those actions, as well as numerous similar background actions such as scratching one's head, using a cell phone, eating, and drinking. RESULTS: In this paper, we make a first attempt at the challenging problem of recognizing flu-like symptoms from videos. Since no related dataset was available, we created a new public health dataset for action recognition that includes two major flu-like symptom related actions (sneeze and cough) and a number of background actions. We also developed a suitable novel algorithm by introducing two types of Action Matching Kernels, both of which aim to integrate two aspects of local features, namely the space-time layout and the Bag-of-Words representations. In particular, we show that the Pyramid Match Kernel and Spatial Pyramid Matching are both special cases of our proposed kernels. Besides experimenting on a standard testbed, the proposed algorithm is also evaluated on the new sneeze and cough set. Empirically, we observe that our approach achieves competitive performance compared with the state of the art, while recognition on the new public health dataset is shown to be a non-trivial task even with a simple single-person unobstructed view. CONCLUSIONS: Our sneeze and cough video dataset and newly developed action recognition algorithm are the first of their kind and aim to kick-start the field of recognizing flu-like symptoms from videos. It will be challenging but necessary in future developments to consider more complex real-life scenario of detecting ...
Kusakunniran, W, Wu, Q, Li, H, Zhang, J & Wang, L 2014, 'Recognizing Gaits across Views through Correlated Motion Co-clustering', IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 696-709.
Wang, D, Yuan, C, Sun, Y, Zhang, J & Jin, X 2014, 'A fast mode decision algorithm applied to Coarse-Grain quality Scalable Video Coding', Journal of Visual Communication and Image Representation, vol. 25, no. 7, pp. 1631-1639.
Liu, XW, Wang, L, Zhang, J, Yin, JP & Liu, H 2014, 'Global and Local Structure Preservation for Feature Selection', IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 6, pp. 1083-1095.
The recent literature indicates that preserving global pairwise sample similarity is of great importance for feature selection and that many existing selection criteria essentially work in this way. In this paper, we argue that besides global pairwise sample similarity, the local geometric structure of data is also critical, and that these two factors play different roles in different learning scenarios. To show this, we propose a global and local structure preservation framework for feature selection (GLSPFS) which integrates both global pairwise sample similarity and local geometric data structure to conduct feature selection. To demonstrate the generality of our framework, we employ methods that are well known in the literature to model the local geometric data structure and develop three specific GLSPFS-based feature selection algorithms. We also develop an efficient optimization algorithm with proven global convergence to solve the resulting feature selection problem. A comprehensive experimental study is then conducted to compare our feature selection algorithms with many state-of-the-art ones in supervised, unsupervised, and semisupervised learning scenarios. The results indicate that: 1) our framework consistently achieves statistically significant improvement in selection performance compared with the currently used algorithms; 2) in supervised and semisupervised learning scenarios, preserving global pairwise similarity is more important than preserving local geometric data structure; 3) in the unsupervised scenario, preserving local geometric data structure becomes clearly more important; and 4) the best feature selection performance is always obtained when the two factors are appropriately integrated. In summary, this paper not only validates the advantages of the proposed GLSPFS framework but also gains more insight into the information to be preserved in different feature selection tasks.
Lu, S, Mei, T, Wang, J, Zhang, J, Wang, Z & Li, S 2014, 'Browse-to-Search: Interactive Exploratory Search with Visual Entities', ACM Transactions on Information Systems, vol. 32, no. 4.
With the development of image search technology, users are no longer satisfied with searching for images using just metadata and textual descriptions. Instead, more search demands focus on retrieving images based on similarities in their content (textures, colors, shapes, etc.). Nevertheless, one image may deliver rich or complex content and multiple interests. Sometimes users do not sufficiently define or describe their demands for images even when general search interests exist, owing to a lack of specific knowledge to express their intent. A new form of information-seeking activity, referred to as exploratory search, is emerging in the research community; it generally combines browsing and searching content to help users gain additional knowledge and form accurate queries, thereby assisting users with their seeking and investigation activities. However, there have been few attempts at integrated exploratory search solutions in which image browsing is incorporated into the exploring loop. In this work, we investigate the challenges of understanding users' search interests from the images being browsed and of inferring their actual search intentions. We develop a novel system to explore an effective and efficient way of allowing users to seamlessly switch between browse and search processes and naturally complete visual-based exploratory search tasks. The system, called Browse-to-Search, enables users to specify their visual search interests by circling any visual objects in the webpages being browsed; the system then automatically forms visual entities to represent the users' underlying intent. A visual entity is not limited to the original image content, but is also encapsulated by the text-based browsing context and the associated heterogeneous attributes. We use large-scale image search technology to find the associated textual attributes from the repository. Users can then utilize the encapsulated visual entities to co...
Wu, Y, Ma, B, Yang, M, Zhang, J & Jia, Y 2014, 'Metric Learning Based Structural Appearance Model for Robust Visual Tracking', IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 5, pp. 865-877.
Xu, J, Wu, Q, Zhang, J, Shen, F & Tang, Z 2014, 'Boosting Separability in Semisupervised Learning for Object Classification', IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 7, pp. 1197-1208.
Tushar, W, Zhang, JA, Smith, DB, Poor, HV & Thiébaux, S 2014, 'Prioritizing Consumers in Smart Grid: A Game Theoretic Approach', IEEE Transactions on Smart Grid, vol. 5, no. 3, pp. 1429-1438.
This paper proposes an energy management technique for a consumer-to-grid system in smart grid. The benefit to consumers is made the primary concern to encourage consumers to participate voluntarily in energy trading with the central power station (CPS) in situations of energy deficiency. A novel system model motivating energy trading under the goal of social optimality is proposed. A single-leader multiple-follower Stackelberg game is then studied to model the interactions between the CPS and a number of energy consumers (ECs), and to find optimal distributed solutions for the optimization problem based on the system model. The CPS is considered as a leader seeking to minimize its total cost of buying energy from the ECs, and the ECs are the followers who decide on how much energy they will sell to the CPS for maximizing their utilities. It is shown that the game, which can be implemented distributedly, possesses a socially optimal solution, in which the sum of the benefits to all consumers is maximized, as the total cost to the CPS is minimized. Numerical analysis confirms the effectiveness of the game.
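The single-leader multiple-follower structure above can be illustrated with a toy model. This sketch is not the paper's system model: it assumes a hypothetical quadratic utility p*x_i - a_i*x_i^2 for each energy consumer (so the best response is x_i = p / (2*a_i)), and a leader that picks the smallest price whose induced supply meets a fixed demand D; the function names and cost coefficients are invented for illustration.

```python
# Hedged sketch of a single-leader multiple-follower Stackelberg interaction.
# Follower i sells x_i at price p with utility p*x_i - a_i*x_i**2, so its
# best response is x_i = p / (2*a_i). The leader (CPS) anticipates these
# responses and picks the lowest price whose induced total supply meets D.

def best_response(p, a):
    """Quantity consumer with cost coefficient a sells at price p."""
    return p / (2.0 * a)

def leader_price(demand, costs):
    """Solve sum_i p/(2*a_i) = D for p, the leader's cheapest feasible price."""
    return demand / sum(1.0 / (2.0 * a) for a in costs)

costs = [0.5, 1.0, 2.0]   # hypothetical consumer cost coefficients
D = 10.0                  # hypothetical energy deficit at the CPS
p_star = leader_price(D, costs)
supply = sum(best_response(p_star, a) for a in costs)
print(p_star, supply)     # induced supply exactly covers the demand
```

At the equilibrium sketched here no follower can gain by deviating from its best response, and the leader pays the minimum total cost p_star * D, mirroring the socially optimal outcome the paper establishes for its richer model.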
Zhang, JA 2013, 'Response to "On Mathematical Equivalence Between Vector OFDM and Quadrature OFDMA"', IEEE Transactions on Communications, vol. 61, pp. 815-815.
Liu, X, Yin, J, Wang, L, Liu, L, Liu, J, Hou, C & Zhang, J 2013, 'An Adaptive Approach To Learning Optimal Neighborhood Kernels', IEEE Transactions on Cybernetics, vol. 43, no. 1, pp. 371-384.
Learning an optimal kernel plays a pivotal role in kernel-based methods. Recently, an approach called optimal neighborhood kernel learning (ONKL) has been proposed, showing promising classification performance. It assumes that the optimal kernel will reside in the neighborhood of a pre-specified kernel. Nevertheless, how to specify such a kernel in a principled way remains unclear. To solve this issue, this paper treats the pre-specified kernel as an extra variable and jointly learns it with the optimal neighborhood kernel and the structure parameters of support vector machines. To avoid trivial solutions, we constrain the pre-specified kernel with a parameterized model. We first discuss the characteristics of our approach and in particular highlight its adaptivity. After that, two instantiations are demonstrated by modeling the pre-specified kernel as a common Gaussian radial basis function kernel and a linear combination of a set of base kernels in the way of multiple kernel learning (MKL), respectively. We show that the optimization in our approach is a min-max problem and can be efficiently solved by employing the extended level method and Nesterov's method. Also, we give the probabilistic interpretation for our approach and apply it to explain the existing kernel learning methods, providing another perspective for their commonness and differences. Comprehensive experimental results on 13 UCI data sets and another two real-world data sets show that via the joint learning process, our approach not only adaptively identifies the pre-specified kernel, but also achieves superior classification performance to the original ONKL and the related MKL algorithms.
Liu, X, Wang, L, Yin, J, Zhu, E & Zhang, J 2013, 'An Efficient Approach To Integrating Radius Information Into Multiple Kernel Learning', IEEE Transactions on Cybernetics, vol. 43, no. 2, pp. 557-569.
Integrating radius information has been demonstrated by recent work on multiple kernel learning (MKL) as a promising way to improve kernel learning performance. Directly integrating the radius of the minimum enclosing ball (MEB) into MKL as it is, however, not only incurs significant computational overhead but also possibly adversely affects the kernel learning performance due to the notorious sensitivity of this radius to outliers. Inspired by the relationship between the radius of the MEB and the trace of total data scattering matrix, this paper proposes to incorporate the latter into MKL to improve the situation. In particular, in order to well justify the incorporation of radius information, we strictly comply with the radius-margin bound of support vector machines (SVMs) and thus focus on the l2-norm soft-margin SVM classifier. Detailed theoretical analysis is conducted to show how the proposed approach effectively preserves the merits of incorporating the radius of the MEB and how the resulting optimization is efficiently solved. Moreover, the proposed approach achieves the following advantages over its counterparts: 1) more robust in the presence of outliers or noisy training samples; 2) more computationally efficient by avoiding the quadratic optimization for computing the radius at each iteration; and 3) readily solvable by the existing off-the-shelf MKL packages. Comprehensive experiments are conducted on University of California, Irvine, protein subcellular localization, and Caltech-101 data sets, and the results well demonstrate the effectiveness and efficiency of our approach.
Xin, J, Chen, K, Bai, L, Liu, D & Zhang, J 2013, 'Depth Adaptive Zooming Visual Servoing For A Robot With A Zooming Camera', International Journal of Advanced Robotic Systems, vol. 10, no. 1, pp. 1-11.
To solve the view visibility problem and keep the observed object in the field of view (FOV) during visual servoing, a depth adaptive zooming visual servoing strategy for a manipulator robot with a zooming camera is proposed. Firstly, a zoom control mechanism is introduced into the robot visual servoing system. It dynamically adjusts the camera's field of view to keep all feature points on the object within the FOV and to obtain high local resolution of the object at the end of visual servoing. Secondly, an invariant visual servoing method is employed to drive the robot to the desired position under the changing intrinsic parameters of the camera. Finally, a nonlinear depth adaptive estimation scheme in the invariant space, based on Lyapunov stability theory, is proposed to adaptively estimate the depth of the image features on the object. Three kinds of 4-DOF robot visual positioning simulation experiments are conducted. The simulation results show that the proposed approach achieves higher positioning precision.
Lu, S, Zhang, J, Wang, Z & Feng, D 2013, 'Fast Human Action Classification And VOI Localization With Enhanced Sparse Coding', Journal of Visual Communication and Image Representation, vol. 24, no. 2, pp. 127-136.
Sparse coding, which encodes the natural visual signal into a sparse space for visual codebook generation and feature quantization, has been successfully utilized for many image classification applications. However, it has seldom been explored for video analysis tasks. In particular, the increased complexity in characterizing the visual patterns of diverse human actions, with both spatial and temporal variations, imposes more challenges on the conventional sparse coding scheme. In this paper, we propose an enhanced sparse coding scheme that learns a discriminative dictionary and optimizes the local pooling strategy. Localizing when and where a specific action happens in realistic videos is another challenging task. By utilizing the sparse-coding-based representations of human actions, this paper further presents a novel coarse-to-fine framework to localize the Volumes of Interest (VOIs) for the actions. Firstly, local visual features are transformed into the sparse signal domain through our enhanced sparse coding scheme. Secondly, to avoid an exhaustive scan of entire videos for VOI localization, we extend Spatial Pyramid Matching into the temporal domain, namely Spatial Temporal Pyramid Matching, to obtain VOI candidates. Finally, a multi-level branch-and-bound approach is developed to refine the VOI candidates. The proposed framework also avoids prohibitive computations in local similarity matching (e.g., nearest-neighbour voting). Experimental results on two popular benchmark datasets (KTH and YouTube UCF) and the widely used localization dataset (MSR) demonstrate that our approach reduces computational cost significantly while maintaining classification accuracy comparable to that of state-of-the-art methods.
Zhang, J, Wu, Q, Kusakunniran, W, Ma, Y & Li, H 2013, 'A New View-Invariant Feature for Cross-View Gait Recognition', IEEE Transactions on Information Forensics and Security, vol. 8, no. 10, pp. 1642-1653.
Human gait is an important biometric feature which is able to identify a person remotely. However, change of view causes significant difficulties for recognizing gaits. This paper proposes a new framework to construct a new view-invariant feature for cross-view gait recognition. Our view-normalization process is performed in the input layer (i.e., on gait silhouettes) to normalize gaits from arbitrary views. That is, each sequence of gait silhouettes recorded from a certain view is transformed onto the common canonical view by using a corresponding domain transformation obtained through transform invariant low-rank textures (TILT). Then, an improved scheme of Procrustes shape analysis (PSA) is proposed and applied on a sequence of the normalized gait silhouettes to extract a novel view-invariant gait feature based on the Procrustes mean shape (PMS), and gait similarity is subsequently measured by the Procrustes distance (PD). Comprehensive experiments were carried out on widely adopted gait databases. It has been shown that the performance of the proposed method is promising when compared with other existing methods in the literature.
Zhang, JA, Collings, IB, Chen, CS, Roullet, L, Luo, L, Ho, S-W & Yuan, J 2013, 'Evolving small-cell communications towards mobile-over-FTTx networks', IEEE Communications Magazine, vol. 51, no. 12, pp. 92-101.
Small cell techniques are recognized as the best way to deliver high capacity for broadband cellular communications. Femtocell and distributed antenna systems (DASs) are important components in the overall small cell story, but are not the complete solution. They have major disadvantages of very limited cooperation capability and expensive deployment cost, respectively. In this article, we propose a novel mobile-over-FTTx (MoF) network architecture, where an FTTx network is enhanced as an integrated rather than a simple backhauling component of a new mobile network delivering low-cost and powerful small cell solutions. In part, the MoF architecture combines the advantages of femtocells and DASs, while overcoming their disadvantages. Implementation challenges and potential solutions are discussed. Simulation results are presented and demonstrate the strong potential of the MoF in boosting the capacity of mobile networks.
Zhang, JA, Huang, X, Suzuki, H & Chen, Z 2013, 'Gaussian approximation based interpolation for channel matrix inversion in MIMO-OFDM systems', IEEE Transactions on Wireless Communications, vol. 12, no. 3, pp. 1407-1417.
Channel matrix inversion, which requires significant hardware resource and computational power, is a very challenging problem in MIMO-OFDM systems. Casting the frequency-domain channel matrix into a polynomial matrix, interpolation-based matrix inversion provides a promising solution to this problem. In this paper, we propose novel algorithms for interpolation based matrix inversion, which require little prior information of the channel matrix and enable the use of simple low-complexity interpolators such as spline and low pass filter interpolators. By invoking the central limit theorem, we show that a Gaussian approximation function well characterizes the power of the polynomial coefficients. Some low-complexity and efficient schemes are then proposed to estimate the parameters of the Gaussian function. With these estimated parameters, we introduce phase shifted interpolation and propose two algorithms which can achieve good interpolation accuracy using general low-complexity interpolators. Simulation results show that up to 85% complexity saving can be achieved with small performance degradation.
This letter extends our previous work on the layered inverse Fast Fourier Transform (IFFT) structure to a multistage layered IFFT structure in which data symbols can be input at different stages of the IFFT. We first show that part of the IFFT in the transmitter of an OFDM system can be shifted to the receiver, while a conventional one-tap frequency-domain equalizer remains applicable. We then propose two IFFT split schemes, based on decimation-in-time and decimation-in-frequency IFFT algorithms, to enable interference-free symbol recovery with simple linear equalizers. Applications of the proposed schemes in multiple-access communications are investigated. Simulation results demonstrate the effectiveness of the proposed schemes in improving bit-error-rate performance.
Zhang, JA, Yang, T & Chen, Z 2013, 'Under-determined training and estimation for distributed transmit beamforming systems', IEEE Transactions on Wireless Communications, vol. 12, no. 4, pp. 1936-1946.
Distributed transmit beamforming (DTB) can significantly boost the signal-to-noise ratio (SNR) of a wireless communication system. To realize the benefits of DTB, generating and feeding back beamforming vector are very challenging tasks. Existing schemes have either enormous overhead or weak robustness in noisy channels. In this paper, we investigate the design of training sequences and beamforming vector estimators in DTB systems. We consider an under-determined case, where the length of training sequence N sent from each node is smaller than the number of source nodes M. We derive the optimal estimation of the beamforming vector that maximizes the beamforming gain and show that it can be well approximated as the linear minimum mean square error (LMMSE) estimator. Based on the LMMSE estimator, we investigate the optimal design of training sequences and propose efficient DTB schemes. We analytically show that these schemes can achieve approximately N times increased SNR in uncorrelated channels, and even higher gain in correlated ones. We also propose a concatenated training scheme which optimally combines the training signals over multiple frames to obtain the beamforming vector. Simulation results demonstrate that the proposed DTB schemes can yield significant gains even at very low SNRs, with total feedback bits much less than those required in the existing schemes.
Zhang, J & Huang, X 2012, 'Autocorrelation based coarse timing with differential normalization', IEEE Transactions on Wireless Communications, vol. 11, no. 2, pp. 526-530.
Two novel differential normalization factors, which depend on the severity of carrier frequency offset, are proposed for an autocorrelation-based coarse timing scheme. Compared with the conventional normalization factor based on signal energy, they improve the robustness of the timing metric to signal-to-noise ratio (SNR), sharpen the mainlobe of the timing metric, and reduce both missed-detection and false-alarm probabilities.
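For context, the conventional energy-normalized autocorrelation metric that the letter improves upon can be sketched as follows (a Schmidl-and-Cox-style baseline, assuming a preamble of two identical halves of length L; the proposed differential normalization factors are not reproduced here):

```python
def timing_metric(r, L):
    """Energy-normalized autocorrelation timing metric M(d) = |P(d)|^2 / R(d)^2,
    where P(d) correlates the two half-periods of a repeated preamble and
    R(d) is the received energy over the second half-window."""
    metrics = []
    for d in range(len(r) - 2 * L):
        P = sum(r[d + m].conjugate() * r[d + m + L] for m in range(L))
        R = sum(abs(r[d + m + L]) ** 2 for m in range(L))
        metrics.append(abs(P) ** 2 / (R ** 2 + 1e-12))  # guard against R = 0
    return metrics
```

The metric plateaus near 1 at the preamble start and stays close to 0 over uncorrelated noise; sharpening that mainlobe and suppressing the noise floor is precisely what the proposed normalization factors target.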
Kusakunniran, W, Wu, Q, Zhang, J & Li, H 2012, 'Cross-view and multi-view gait recognitions based on view transformation model using multi-layer perceptron', Pattern Recognition Letters, vol. 33, pp. 882-889.
Gait has been shown to be an efficient biometric feature for human identification at a distance. However, the performance of gait recognition can be affected by view variation, which makes cross-view gait recognition difficult. A novel method is proposed to address this difficulty using a view transformation model (VTM). The VTM is constructed through regression, adopting a multi-layer perceptron (MLP) as the regression tool. The VTM estimates the gait feature from one view using a well-selected region of interest (ROI) on the gait feature from another view. Trained VTMs can thus normalize gait features across views into the same view before gait similarity is measured. Moreover, this paper proposes a new multi-view gait recognition which estimates the gait feature on one view using selected gait features from several other views. Extensive experimental results demonstrate that the proposed method significantly outperforms other baseline methods in the literature for both cross-view and multi-view gait recognition. In particular, in our experiments, average accuracies of 99%, 98% and 93% are achieved for multi-view gait recognition using 5, 4 and 3 cameras, respectively.
Kusakunniran, W, Wu, Q, Zhang, J & Li, H 2012, 'Gait Recognition Under Various Viewing Angles Based On Correlated Motion Regression', IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 6, pp. 966-980.
It is well recognized that gait is an important biometric feature for identifying a person at a distance, e.g., in video surveillance applications. However, in reality, change of viewing angle poses a significant challenge for gait recognition. A novel approa
Thi, T, Cheng, L, Zhang, J, Wang, L & Satoh, S 2012, 'Integrating local action elements for action analysis', Computer Vision and Image Understanding, vol. 116, no. 3, pp. 378-395.
In this paper, we propose a framework for human action analysis from video footage. A video action sequence in our perspective is a dynamic structure of sparse local spatial-temporal patches termed action elements, so the problems of action analysis in video are carried out here based on the set of local characteristics as well as the global shape of a prescribed action. We first detect a set of action elements that are the most compact entities of an action; then we extend the idea of the Implicit Shape Model to space-time, in order to properly integrate the spatial and temporal properties of these action elements. In particular, we consider two different recipes for constructing action elements: one uses a Sparse Bayesian Feature Classifier to choose action elements from all detected Spatial Temporal Interest Points, and is termed discriminative action elements. The other detects affine-invariant local features from the holistic Motion History Images and picks action elements according to their compactness scores, and is called generative action elements. Action elements detected either way are then used to construct a voting space based on their local feature representations as well as their global configuration constraints. Our approach is evaluated in the two main contexts of current human action analysis challenges, action retrieval and action classification. Comprehensive experimental results show that our proposed framework marginally outperforms all existing state-of-the-art techniques on a range of different datasets.
Xu, J, Wu, Q, Zhang, J & Tang, Z 2012, 'Fast and Accurate Human Detection Using a Cascade of Boosted MS-LBP Features', IEEE Signal Processing Letters, vol. 19, no. 10, pp. 676-679.
In this letter, a new scheme for generating local binary patterns (LBP) is presented. This Modified Symmetric LBP (MS-LBP) feature takes advantage of both LBP and gradient features. It is then applied within a boosted cascade framework for human detection. By combining MS-LBP with Haar-like features in the boosted framework, the performance of heterogeneous-feature-based detectors is evaluated for the best trade-off between accuracy and speed. Two feature training schemes, namely the Single AdaBoost Training Scheme (SATS) and the Dual AdaBoost Training Scheme (DATS), are proposed and compared. On top of AdaBoost, two multidimensional feature projection methods are described. A comprehensive experiment is presented. Apart from obtaining higher detection accuracy, detection based on DATS is 17 times faster than the HOG method.
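As a point of reference, the standard 8-neighbour LBP code that MS-LBP modifies can be computed as below (baseline LBP only; the letter's symmetric modification and gradient combination are not reproduced):

```python
def lbp8(img, y, x):
    """Standard 8-neighbour local binary pattern at pixel (y, x): each
    neighbour is thresholded against the centre pixel, and the resulting
    bits are packed clockwise starting from the top-left neighbour."""
    c = img[y][x]
    ring = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(ring):
        if img[y + dy][x + dx] >= c:
            code |= 1 << bit
    return code
```

A histogram of such codes over a detection window yields the texture descriptor that boosting then selects from.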
Thi, T, Cheng, L, Zhang, J, Wang, L & Satoh, S 2012, 'Structured learning of local features for human action classification and localization', Image & Vision Computing, vol. 30, no. 1, pp. 1-14.
Human action recognition is a promising yet non-trivial computer vision field with many potential applications. Current advances in bag-of-feature approaches have brought significant insights into recognizing human actions within complex context. It is, however, a common practice in literature to consider action as merely an orderless set of local salient features. This representation has been shown to be oversimplified, which inherently limits traditional approaches from robust deployment in real-life scenarios. In this work, we propose and show that, by taking into account global configuration of local features, we can greatly improve recognition performance. We first introduce a novel feature selection process called Sparse Hierarchical Bayes Filter to select only the most contributive features of each action type based on neighboring structure constraints. We then present the application of structured learning in human action analysis. That is, by representing human action as a complex set of local features, we can incorporate different spatial and temporal feature constraints into the learning tasks of human action classification and localization. In particular, we tackle the problem of action localization in video using structured learning with two alternatives: one is Dynamic Conditional Random Field from probabilistic perspective; the other is Structural Support Vector Machine from max-margin point of view. We evaluate our modular classification-localization framework on various testbeds, in which our proposed framework is proven to be highly effective and robust compared against bag-of-feature methods.
Kusakunniran, W, Wu, Q, Zhang, J & Li, H 2012, 'Gait Recognition across Various Walking Speeds using Higher-order Shape Configuration based on Differential Composition Model', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 6, pp. 1654-1668.
Gait has been known as an effective biometric feature to identify a person at a distance. However, variation of walking speeds may lead to significant changes to human walking patterns. It causes many difficulties for gait recognition. A comprehensive analysis has been carried out in this paper to identify such effects. Based on the analysis, Procrustes shape analysis is adopted for gait signature description and relevant similarity measurement. To tackle the challenges raised by speed change, this paper proposes a higher order shape configuration for gait shape description, which deliberately conserves discriminative information in the gait signatures and is still able to tolerate the varying walking speed. Instead of simply measuring the similarity between two gaits by treating them as two unified objects, a differential composition model (DCM) is constructed. The DCM differentiates the different effects caused by walking speed changes on various human body parts. In the meantime, it also balances well the different discriminabilities of each body part on the overall gait similarity measurements. In this model, the Fisher discriminant ratio is adopted to calculate weights for each body part. Comprehensive experiments based on widely adopted gait databases demonstrate that our proposed method is efficient for cross-speed gait recognition and outperforms other state-of-the-art methods.
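The Procrustes machinery underlying this gait-signature comparison is compact. Below is a minimal sketch of the Procrustes distance between two 2-D shapes represented as complex landmark vectors (illustrative only; the paper's higher-order shape configuration and DCM body-part weighting are not reproduced):

```python
def procrustes_distance(s1, s2):
    """Full Procrustes distance between two 2-D shapes given as equal-length
    lists of complex landmark coordinates. Each shape is centred and
    scale-normalized; the optimal aligning rotation is then removed
    analytically via the complex inner product."""
    def normalize(s):
        c = sum(s) / len(s)                        # remove translation
        z = [p - c for p in s]
        norm = sum(abs(p) ** 2 for p in z) ** 0.5  # remove scale
        return [p / norm for p in z]
    z1, z2 = normalize(s1), normalize(s2)
    inner = sum(a.conjugate() * b for a, b in zip(z1, z2))
    # distance is 0 iff the shapes match up to translation, scale and rotation
    return max(0.0, 1.0 - abs(inner) ** 2) ** 0.5
```

Applied per body part, such distances can then be combined with per-part weights, which is the role the Fisher discriminant ratio plays in the DCM.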
Zhang, J, Li, N, Yang, Q & Hu, C 2012, 'Self-adaptive Chaotic Differential Evolution Algorithm for Solving Constrained Circular Packing Problem', Journal of Computational Information Systems, vol. 8, no. 18, pp. 7747-7755.
Packing circles into a circular container with an equilibrium constraint is an NP-hard layout optimization problem with broad applications in engineering. This paper studies a two-dimensional constrained packing problem. Classical differential evolution applied to this problem easily falls into local optima, so an adaptive chaotic differential evolution algorithm is proposed to improve performance. The weighting parameters are dynamically adjusted by chaotic mutation during the search, and the penalty factors of the fitness function are modified during iteration. To keep the population diverse, we limit the population's concentration; to enhance local search capability, we adopt adaptive mutation of the globally optimal individual. The improved algorithm maintains the basic algorithm's structure while extending the search scale, and it preserves population diversity while increasing search accuracy. Furthermore, it can escape premature convergence and speed up convergence. Numerical examples indicate the effectiveness and efficiency of the proposed algorithm.
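The chaotic parameter adjustment can be illustrated with a minimal DE/rand/1/bin loop in which a logistic map drives the mutation weight F (a hypothetical simplification under assumed parameter ranges, not the authors' algorithm, which also adapts penalty factors and limits population concentration):

```python
import random

def chaotic_de(f, bounds, pop_size=20, iters=200, cr=0.9, seed=1):
    """Minimal DE/rand/1/bin minimizer whose mutation weight F is driven by
    a logistic chaotic map instead of being fixed. Returns (best_x, best_f)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    z = 0.7                                    # logistic-map state
    for _ in range(iters):
        z = 4.0 * z * (1.0 - z)                # chaotic update of the map
        F = 0.4 + 0.5 * z                      # mutation weight varies over [0.4, 0.9]
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            jrand = rng.randrange(dim)         # guarantee at least one mutated gene
            trial = []
            for j in range(dim):
                if j == jrand or rng.random() < cr:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                else:
                    v = pop[i][j]
                lo, hi = bounds[j]
                trial.append(min(max(v, lo), hi))  # clamp to the feasible box
            ft = f(trial)
            if ft <= fit[i]:                   # greedy one-to-one selection
                pop[i], fit[i] = trial, ft
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

In the packing setting, f would be a penalized objective combining container violation and equilibrium error; here any box-constrained function works.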
Huang, X, Guo, YJ & Zhang, JA 2012, 'Sample rate conversion using B-spline interpolation for OFDM based software defined radios', IEEE Transactions on Communications, vol. 60, no. 8, pp. 2113-2122.
This paper proposes arbitrary ratio sample rate conversion (SRC) architectures and a simpler B-spline interpolation algorithm for orthogonal frequency division multiplexing (OFDM) based software defined radios (SDRs) with multiband and multi-channel capabilities. Different from conventional standalone digital front-end designs for SDRs, the proposed SRC architectures combine the B-spline interpolation with OFDM modulation and equalization for the OFDM transmitter and receiver respectively. With this combined design, the passband droop introduced by the B-spline interpolation can be more efficiently compensated using frequency-domain pre-distortion, instead of conventional time-domain pre-filtering, and hence an overall system complexity reduction is achieved. A novel multi-period B-spline interpolation and re-sampling structure is then constructed, and an interpolation algorithm with lower implementation complexity than that of the conventional Farrow structure is further developed. The SRC performance is also analysed by deriving signal-to-peak-distortion ratio formulas, which can be used as design tools for determining the required orders of B-splines in the OFDM transmitter and receiver respectively. Finally, SRC examples used in a high-speed multiband multi-channel microwave backhaul system are given and compared with conventional polyphase filterbank interpolation to demonstrate the practicality and performance of the proposed SRC architectures and interpolation algorithm.
Hedley, M & Zhang, J 2012, 'Accurate Wireless Localization in Sports', Computer, vol. 45, pp. 64-70.
Luo, L, Zhang, JA & Davis, LM 2012, 'Space-Time Block Code and Spatial Multiplexing Design for Quadrature-OFDMA Systems', IEEE Transactions on Communications, vol. 60, pp. 3133-3142.
To alleviate the high peak-to-average power ratio (PAPR), high user-terminal complexity, and sensitivity to carrier frequency offset (CFO) of current orthogonal frequency division multiple access (OFDMA) systems, a Quadrature OFDM (Q-OFDMA) system has recently been proposed in the single-input single-output setting. In this paper we study the realization of multi-input multi-output (MIMO) diversity- and multiplexing-oriented methods for Q-OFDMA systems. An Alamouti-like space-time block code (STBC) and a simple detection scheme for spatial multiplexing (SM) are constructed for Q-OFDMA systems; both zero-forcing (ZF) and minimum mean square error (MMSE) equalizers are investigated. The proposed STBC is a full-diversity scheme, which encodes in the intermediate domain and decodes in the frequency domain. Analytical and empirical results demonstrate that Q-OFDMA systems can be implemented flexibly and efficiently in a MIMO framework, and that the proposed scheme can easily be applied to OFDMA and Single-Carrier Frequency Division Multiple Access (SC-FDMA) by adjusting the Q-OFDMA parameters.
This letter proposes simple algorithms for computing a phase shift term, introduced to greatly improve the accuracy of complex signal interpolation and applicable to any interpolator. Based on a cost function that minimizes the phase transition between adjacent samples, the phase shift term can be computed easily, using either signal statistics obtained in advance or known base samples in real time. Simulation results, exemplified by channel interpolation in OFDM systems, show that the proposed phase estimators significantly improve interpolation performance for various interpolators, such as spline, low-pass filter, and linear and cubic polynomial interpolators, compared to the case without phase shifting.
Zhang, JA, Huang, X, Cantoni, A & Guo, YJ 2012, 'Sidelobe suppression with orthogonal projection for multicarrier systems', IEEE Transactions on Communications, vol. 60, no. 2, pp. 589-599.View/Download from: Publisher's site
Sidelobe suppression, or out-of-band emission reduction, in multicarrier systems is conventionally achieved via time-domain windowing, which is spectrum inefficient. Although some sidelobe cancellation and signal predistortion techniques have been proposed for spectrum shaping, they are generally not well balanced between complexity and suppression performance. In this paper, an efficient and low-complexity sidelobe suppression with orthogonal projection (SSOP) scheme is proposed. The SSOP scheme uses an orthogonal projection matrix for sidelobe suppression and adopts as few as one reserved subcarrier for recovering the distorted signal in the receiver. Unlike most known approaches, the SSOP scheme requires as few multiplications as the number of subcarriers in the band and enables straightforward parameter selection. Analytical and simulation results show that more than 50 dB of sidelobe suppression can be readily achieved with only a slight degradation in receiver performance.
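The core projection step is elementary linear algebra. For a single out-of-band constraint, projecting the transmit vector onto the orthogonal complement of a sidelobe-measurement vector nulls the emission measured by that vector (a one-constraint sketch; the SSOP scheme uses a full projection matrix plus a reserved subcarrier for receiver-side recovery):

```python
def project_out(x, a):
    """Project x onto the orthogonal complement of span{a}:
    y = x - a * <a, x> / <a, a>, so that the linear measurement <a, y>
    (e.g. the field radiated at one out-of-band frequency) becomes zero."""
    inner = sum(ai.conjugate() * xi for ai, xi in zip(a, x))
    norm2 = sum(abs(ai) ** 2 for ai in a)
    return [xi - (inner / norm2) * ai for xi, ai in zip(x, a)]
```

With K constraints stacked into a matrix A, the same idea generalizes to the projection I - A^H (A A^H)^{-1} A, which is where the reserved subcarrier comes in to undo the resulting in-band distortion.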
Paisitkriangkrai, S, Mei, T, Zhang, J & Hua, X-S 2011, 'Clip-based hierarchical representation for near-duplicate video detection', International Journal of Computer Mathematics, vol. 88, no. 18, pp. 3817-3833.
Shen, C, Paisitkriangkrai, S & Zhang, J 2011, 'Efficiently Learning a Detection Cascade with Sparse Eigenvectors', IEEE Transactions on Image Processing, vol. 19, no. 7, pp. 22-35.
Real-time object detection has many computer vision applications. Since Viola and Jones proposed the first real-time AdaBoost based face detection system, much effort has been spent on improving the boosting method. In this work, we first show that feature selection methods other than boosting can also be used for training an efficient object detector. In particular, we introduce greedy sparse linear discriminant analysis (GSLDA) for its conceptual simplicity and computational efficiency, with which slightly better detection performance than AdaBoost is achieved. Moreover, we propose a new technique, termed boosted greedy sparse linear discriminant analysis (BGSLDA), to efficiently train a detection cascade. BGSLDA exploits the sample reweighting property of boosting and the class-separability criterion of GSLDA. Experiments in the domain of highly skewed data distributions (e.g., face detection) demonstrate that classifiers trained with the proposed BGSLDA outperform AdaBoost and its variants. This finding provides a significant opportunity to argue that AdaBoost and similar approaches are not the only methods that can achieve high detection results for real-time object detection.
Paisitkriangkrai, S, Shen, C & Zhang, J 2011, 'Incremental Training of a Detector Using Online Sparse Eigendecomposition', IEEE Transactions on Image Processing, vol. 20, no. 1, pp. 213-226.
The ability to efficiently and accurately detect objects plays a very crucial role for many computer vision tasks. Recently, offline object detectors have shown tremendous success. However, one major drawback of offline techniques is that a complete set of training data has to be collected beforehand. In addition, once learned, an offline detector cannot make use of newly arriving data. To alleviate these drawbacks, online learning has been adopted with the following objectives: 1) the technique should be computationally and storage efficient; 2) the updated classifier must maintain high classification accuracy. In this paper, we propose an effective and efficient framework for learning an adaptive online greedy sparse linear discriminant analysis model. Unlike many existing online boosting detectors, which usually apply exponential or logistic loss, our online algorithm makes use of linear discriminant analysis' learning criterion, which not only maximizes class separation but also incorporates the asymmetry of training data distributions. We provide a better alternative to online boosting algorithms in the context of training a visual object detector. We demonstrate the robustness and efficiency of our methods on handwritten digit and face data sets. Our results confirm that object detection tasks benefit significantly when trained in an online manner.
Smith, DB, Hanlen, LW, Zhang, JA, Miniutti, D, Rodda, D & Gilbert, B 2011, 'First- and second-order statistical characterizations of the dynamic body area propagation channel of various bandwidths', Annals of Telecommunications, vol. 66, no. 3-4, pp. 187-203.
Comprehensive statistical characterizations of the dynamic narrowband on-body area and on-body to off-body area channels are presented. These characterizations are based on real-time measurements of the time-domain channel response at carrier frequencies near the 900- and 2,400-MHz industrial, scientific, and medical bands and at a carrier frequency near the 402-MHz medical implant communications band. We consider varying amounts of body movement, numerous transmit–receive pair locations on the human body, and various bandwidths. We also consider long periods, i.e., hours of everyday activity (predominantly indoor scenarios), for on-body channel characterization. Various adult human test subjects are used. It is shown, by applying the Akaike information criterion, that the Weibull and Gamma distributions generally fit agglomerates of received signal amplitude data and that in various individual cases the Lognormal distribution provides a good fit. We also characterize fade duration and fade depth with direct matching to second-order temporal statistics. These first- and second-order characterizations have important utility in the design and evaluation of body area communications systems.
Chen, Y, Zhang, J & Jayalath, ADS 2010, 'Estimation and compensation of clipping noise in OFDMA systems', IEEE Transactions on Wireless Communications, vol. 9, pp. 523-527.
Lu, S, Zhang, J & Feng, DD 2009, 'Detecting Ghost and Left Objects in Surveillance Video', International Journal of Pattern Recognition and Artificial Intelligence, vol. 23, no. 7, pp. 1503-1525.
Husain, SI, Yuan, J, Zhang, J & Martin, RK 2009, 'Time domain equalizer design using bit error rate minimization for UWB systems', EURASIP Journal on Wireless Communications and Networking, vol. 2009, pp. 9-9.
Luo, L, Zhang, J & Shi, Z 2009, 'Advanced receiver design for quadrature OFDMA systems', EURASIP Journal on Wireless Communications and Networking, vol. 2009, pp. 10-10.
Smith, DB, Zhang, JA, Hanlen, LW, Miniutti, D, Rodda, D & Gilbert, B 2009, 'Temporal correlation of dynamic on-body area radio channel', Electronics letters, vol. 45, pp. 1212-1213.
Chen, Y, Zhang, JA & Jayalath, ADS 2009, 'Low-complexity estimation of CFO and frequency independent I/Q mismatch for OFDM systems', EURASIP Journal on Wireless Communications and Networking, vol. 2009.
Zhang, J, Luo, L & Shi, Z 2009, 'Quadrature OFDMA systems based on layered FFT structure', IEEE Transactions on Communications, vol. 57, pp. 850-860.
Zhang, J, Smith, DB, Hanlen, LW, Miniutti, D, Rodda, D & Gilbert, B 2009, 'Stability of narrowband dynamic body area channel', IEEE Antennas and Wireless Propagation Letters, vol. 8, pp. 53-56.
Paisitkriangkrai, S, Shen, C & Zhang, J 2008, 'Performance evaluation of local features in human classification and detection', IET Computer Vision, vol. 2, no. 4, pp. 236-246.
Detecting pedestrians accurately is the first fundamental step for many computer vision applications such as video surveillance, smart vehicles, intersection traffic analysis and so on. The authors present an experimental study on pedestrian detection us
Paisitkriangkrai, S, Shen, C & Zhang, J 2008, 'Fast pedestrian detection using a cascade of boosted covariance features', IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 8, pp. 1140-1151.
Efficiently and accurately detecting pedestrians plays a very important role in many computer vision applications such as video surveillance and smart cars. In order to find the right feature for this task, we first present a comprehensive experimental study…
Chen, Y, Zhang, JA & Jayalath, D 2008, 'Clipping noise compensation for OFDM systems', Electronics Letters, vol. 44, pp. 1490-1491.
Zhang, J, Kennedy, RA & Abhayapala, TD 2008, 'Reduced-rank shift-invariant technique and its application for synchronization and channel identification in UWB systems', EURASIP Journal on Wireless Communications and Networking, vol. 2008, pp. 38-38.
Lu, S, Zhang, J & Feng, DD 2007, 'Detecting unattended packages through human activity recognition and object association', Pattern Recognition, vol. 40, no. 8, pp. 2173-2184.
This paper provides a novel approach to detecting unattended packages in public venues. Different from previous works on this topic, which are mostly limited to detecting static objects with no human nearby, we provide a solution which can detect an unattended…
Husain, SI, Yuan, J & Zhang, J 2007, 'Modified channel shortening receiver based on MSSNR algorithm for UWB channels', Electronics Letters, vol. 43, pp. 535-537.
Zhang, J, Jayalath, ADS & Chen, Y 2007, 'Asymmetric OFDM systems based on layered FFT structure', IEEE Signal Processing Letters, vol. 14, pp. 812-815.
Zhang, J, Kennedy, RA & Abhayapala, TD 2005, 'Cramér-Rao lower bounds for the synchronization of UWB signals', EURASIP Journal on Applied Signal Processing, vol. 2005, pp. 426-438.
Zhang, J, Arnold, J & Frater, M 2000, 'A cell-loss concealment technique for MPEG-2 coded video', IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 4, pp. 659-665.
Audio-visual and other multimedia services are seen as important sources of traffic for future telecommunication networks, including wireless networks. A major drawback with some wireless networks is that they introduce a significant number of transmission…
Arnold, J, Frater, M & Zhang, J 1999, 'Error resilience in the MPEG-2 video coding standard for cell based networks - a review', Signal Processing: Image Communication, vol. 14, no. 6, pp. 607-633.
The MPEG-2 video coding standard is being extensively used worldwide for the provision of digital video services. Many of these applications involve the transport of MPEG-2 video over cell-based (or packet) networks. Examples include the broadband integrated…
Frater, M, Arnold, J & Zhang, J 1999, 'MPEG 2 video error resilience experiments: The importance of considering the impact of the systems layer', Signal Processing: Image Communication, vol. 14, no. 3, pp. 269-275.
With increasing interest in the transport of video traffic over lossy networks, several techniques for improving the quality of video services in the presence of loss have been proposed, often using the MPEG 2 video coding algorithm as a basis…
Zhang, J, Frater, M, Arnold, J & Percival, T 1997, 'MPEG 2 video services for wireless ATM networks', IEEE Journal on Selected Areas in Communications, vol. 15, no. 1, pp. 119-127.
Audio-visual and other multimedia services are seen as an important source of traffic for future telecommunications networks, including wireless networks. In this paper, we examine the impact of the properties of a 50 Mb/s asynchronous transfer mode (ATM)…
Zhang, J 2006, 'Error Resilience for Video Coding Service' in Wu, HR & Rao, KR (eds), Digital Video Image Quality and Perceptual Coding, CRC Press, Taylor & Francis Group, USA, pp. 503-527.
This chapter is based on the author's PhD thesis.
Li, Z, Zou, Y, Wang, G & Zhang, J 2019, 'Scale-Informed Density Estimation for Dense Crowd Counting', 2019 IEEE International Conference on Visual Communications and Image Processing, VCIP 2019, IEEE Visual Communications and Image Processing, IEEE, Sydney, Australia.
Dense crowd counting (DCC) remains challenging due to scale variation and occlusion. Several deep-learning-based DCC methods have achieved state-of-the-art results on public datasets. However, experimental results show that scale variation is still the main factor hindering DCC performance. In this work, we propose a scale-informed dense crowd counting method focusing on handling the negative effect caused by scale variation. More specifically, we propose a method to obtain the scale information of a patch from its ground-truth density maps by estimating the mean value of the Gaussian kernel width, and a scale classifier is then designed and trained accordingly. Moreover, with the estimated scale information, two sub-nets are specifically designed to learn the density maps for large-scale and small-scale head patches separately. Experimental results validate the proposed method, which achieves the best performance on three dense crowd datasets.
Li, L, Liu, Z, Zhang, J & Zhou, X 2019, 'Learn Image Object Co-segmentation with Multi-scale Feature Fusion', 2019 IEEE International Conference on Visual Communications and Image Processing, VCIP 2019, IEEE Visual Communications and Image Processing, IEEE, Sydney, Australia.
Image object co-segmentation aims to segment common objects in a group of images. This paper proposes a novel neural network, which extracts multi-scale convolutional features at multiple layers via a modified VGG network and fuses them both within and across images as the intra-image and inter-image features. These two kinds of features are further fused at each scale as the multi-scale co-features of common objects, and finally the multi-scale co-features are summed and upsampled to obtain the co-segmentation results. To simplify the network and reduce the resource cost, which rises rapidly with input size, the proposed model adopts a reduced input size, less downsampling and dilated convolution. Experimental results on the public dataset demonstrate that the proposed model achieves performance comparable to state-of-the-art co-segmentation methods while effectively reducing the computation cost.
Zhang, L, Xu, J, Zhang, J & Gong, Y 2018, 'Information Enhancement for Travelogues via a Hybrid Clustering Model', 2018 Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing: Techniques and Applications, IEEE, Canberra, ACT, Australia, pp. 1-8.
Travelogues consist of textual information shared by tourists through web forums or other social media, and they often lack illustrations (images). On image-sharing websites like Flickr, users can post images with rich textual information: `title', `tag' and `description'. The topics of travelogues usually revolve around beautiful sceneries. Recommending corresponding landscape images for these travelogues can enhance the vividness of reading. However, it is difficult to fuse such information because the text attached to each image has diverse meanings/views. In this paper, we propose an unsupervised Hybrid Multiple Kernel K-means (HMKKM) model to link images and travelogues through multiple views. Multi-view matrices are built to reveal the correlations among several aspects. To further improve the performance, we add a regularisation based on textual similarity. To evaluate the effectiveness of the proposed method, a dataset is constructed from TripAdvisor and Flickr to find the related images for each travelogue. Experimental results demonstrate the superiority of the proposed model by comparison with other baselines.
Wang, Y, Shen, J & Zhang, J 2018, 'Deep Bi-Dense Networks for Image Super-Resolution', 2018 Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing: Techniques and Applications, IEEE, Canberra.
This paper proposes Deep Bi-Dense Networks (DBDN) for single image super-resolution (SISR). Our approach extends previous intra-block dense connection approaches by including novel inter-block dense connections. In this way, feature information propagates from a single dense block to all subsequent blocks, instead of to a single successor. To build a DBDN, we first construct intra-dense blocks, which extract and compress abundant local features via densely connected convolutional layers and compression layers for further feature learning. Then, we use an inter-block dense net to connect the intra-dense blocks, which allows each intra-dense block to propagate its own local features to all successors. Additionally, our bi-dense construction connects each block to the output, alleviating the vanishing-gradient problem in training. Evaluation of the proposed method on five benchmark datasets shows that DBDN outperforms the state of the art in SISR with a moderate number of network parameters.
Shen, J, Wang, Y & Zhang, J 2018, 'Memory optimized Deep Dense Network for Image Super-resolution', 2018 Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing: Techniques and Applications, IEEE, Canberra, Australia.
CNN methods for image super-resolution consume a large amount of memory during training, because the feature size does not decrease as the network goes deeper. To reduce memory consumption during training, we propose a memory-optimized deep dense network for image super-resolution. We first reduce redundant feature learning by rationally designing the skip connections and dense connections in the network. Then we adopt shared memory allocations to store concatenated features and Batch Normalization intermediate feature maps. The memory-optimized network consumes less memory than a normal dense network. We also evaluate the proposed architecture on highly competitive super-resolution benchmark datasets. Our deep dense network outperforms some existing methods and requires relatively little computation.
Yao, L, Kusakunniran, W, Wu, Q, Zhang, J & Tang, Z 2018, 'Robust CNN-based Gait Verification and Identification using Skeleton Gait Energy Image', 2018 Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing: Techniques and Applications, IEEE, Canberra, Australia.
As a behavioral biometric feature, gait has been widely applied to human verification and identification. Approaches to gait recognition fall into two categories: model-free approaches and model-based approaches. Model-free approaches are sensitive to appearance changes, while for model-based approaches it is difficult to extract reliable body models from gait sequences. In this paper, based on the robust skeleton points produced by a two-branch multi-stage CNN, a novel model-based feature, the Skeleton Gait Energy Image (SGEI), is proposed. Experimental results indicate that SGEI is more robust to clothing changes. Another contribution is that two different CNN-based architectures are proposed separately for gait verification and gait identification. Both architectures have been evaluated on the datasets, presenting satisfactory performance and increased robustness for gait recognition in unconstrained environments with view and clothing variations.
Huang, H, Xu, J, Zhang, J, Wu, Q & Kirsch, C 2018, 'Railway Infrastructure Defects Recognition using Fine-grained Deep Convolutional Neural Networks', 2018 Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing: Techniques and Applications, IEEE, Canberra, Australia.
Railway power supply infrastructure is one of the most important components of railway transportation. As the key step of the railway maintenance system, recognition of power supply infrastructure defects plays a vital role in the defect inspection sub-system. Traditionally, defect recognition is performed manually, which is time-consuming and labor-intensive. Inspired by the great success of deep neural networks in various vision tasks, this paper presents an end-to-end deep network to solve the railway infrastructure defect detection problem. More importantly, this paper is the first work to adopt the idea of deep fine-grained classification for railway defect detection. We propose a new bilinear deep network named the Spatial Transformer And Bilinear Low-Rank (STABLR) model and apply it to railway infrastructure defect detection. The experimental results demonstrate that the proposed method outperforms both machine learning methods based on hand-crafted features and classic deep neural network methods.
Zhao, M, Zhang, J, Zhang, C & Zhang, W 2018, 'Towards Locally Consistent Object Counting with Constrained Multi-stage Convolutional Neural Networks', ACCV 2018: Computer Vision, Asian Conference on Computer Vision, Springer, Perth, Australia, pp. 247-261.
High-density object counting in surveillance scenes is challenging mainly due to the drastic variation of object scales. The prevalence of deep learning has largely boosted object counting accuracy on several benchmark datasets. However, does the global count really count? Armed with this question, we dive into the predicted density map, whose summation over the whole region reports the global count, for more in-depth analysis. We observe that the object density maps generated by most existing methods usually lack local consistency, i.e., counting errors exist in local regions even though the global count seems to match the ground truth well. To address this problem, we propose constrained multi-stage Convolutional Neural Networks (CNNs) that jointly pursue a locally consistent density map from two aspects. Different from most existing methods, which mainly rely on multi-column architectures of plain CNNs, we exploit a stacking formulation of plain CNNs. Benefiting from the internal multi-stage learning process, the feature map can be repeatedly refined, allowing the density map to approach the ground-truth density distribution. For further refinement of the density map, we also propose a grid loss function. With finer local-region-based supervision, the underlying model is constrained to generate locally consistent density values that minimize the training errors, considering both global and local count accuracy. Experiments on two widely tested object counting benchmarks, with significant overall improvements over state-of-the-art methods, demonstrate the effectiveness of our approach.
Li, Q, Wu, Q, Zhu, C, Zhang, J & Zhao, W 2019, 'Unsupervised User Behavior Representation for Fraud Review Detection with Cold-Start Problem', Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, China, pp. 222-236.
Ye, P, Wang, Y, Xia, Y, An, P & Zhang, J 2018, 'Enhanced saliency prediction via free energy principle', Digital TV and Multimedia Communication, International Forum on Digital TV and Wireless Multimedia Communications, Springer, Shanghai, China, pp. 31-44.
Saliency prediction can be treated as an activity of the human brain. Most saliency prediction methods employ features to determine the contrast of an image area relative to its surroundings. However, only a few studies have investigated how human brain activities affect saliency prediction. In this paper, we propose an enhanced saliency prediction model based on the free energy principle. A new AR-RTV model, which combines the relative total variation (RTV) structure extractor with an autoregressive (AR) operator, is first utilized to decompose an original image into a predictable component and a surprise component. Then, we adopt the local entropy of the 'surprise' map and the gradient magnitude (GM) map to estimate the component saliency maps (sub-saliency) respectively. Finally, inspired by visual error sensitivity, a saliency augment operator is designed to enhance the final saliency map combined from the two sub-saliency maps. Experimental results on two benchmark databases demonstrate the superior performance of the proposed method compared to eleven state-of-the-art algorithms.
Li, Q, Wu, Q, Zhu, C & Zhang, J 2019, 'Bi-level Masked Multi-scale CNN-RNN Networks for Short Text Representation', 2019 International Conference on Document Analysis and Recognition (ICDAR), International Conference on Document Analysis and Recognition, IEEE, Sydney, Australia.
Representing short text is becoming extremely important for a variety of valuable applications. However, representing short text is critical yet challenging because it involves many informal words and typos (the noise problem) but only a few vocabulary terms in each text (the sparsity problem). Most of the existing work on representing short text relies on noise recognition and sparsity expansion. However, the noise in short text takes various forms and changes fast, and most current methods may fail to recognize it adaptively. Also, it is hard to explicitly expand a sparse text into a high-quality dense text. In this paper, we tackle the noise and sparsity problems in short text representation by learning multi-grain noise-tolerant patterns and then embedding the most significant patterns in a text as its representation. To achieve this goal, we propose a bi-level multi-scale masked CNN-RNN network to embed the most significant multi-grain noise-tolerant relations among words and characters in a text into a dense vector space. Comprehensive experiments on five large real-world data sets demonstrate that our method significantly outperforms the state-of-the-art competitors.
Huang, X, Fan, L, Wu, Q, Zhang, J & Yuan, C 2019, 'Fast registration for cross-source point clouds by using weak regional affinity and pixel-wise refinement', Proceedings - IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, Shanghai, China, pp. 1552-1557.
Many types of 3D acquisition sensors have emerged in recent years, and point clouds have been widely used in many areas. Accurate and fast registration of cross-source 3D point clouds from different sensors is an emerging research problem in computer vision. The problem is extremely challenging because cross-source point clouds contain a mixture of variations, such as density, partial overlap, large noise and outliers, and viewpoint changes. In this paper, an algorithm is proposed to align cross-source point clouds with both high accuracy and high efficiency. There are two main contributions: first, two components, weak region affinity and pixel-wise refinement, are proposed to maintain the global and local information of 3D point clouds. These two components are then integrated into an iterative tensor-based registration algorithm to solve the cross-source point cloud registration problem. We conduct experiments on a synthetic cross-source benchmark dataset and real cross-source datasets. Compared with six state-of-the-art methods, the proposed method achieves both higher efficiency and higher accuracy.
Li, Z, Gong, Y, Zhang, J, Yi, J, Wu, Q & Kirsch, C 2019, 'Sample adaptive multiple kernel learning for failure prediction of railway points', Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, Anchorage AK USA, pp. 2848-2856.
Railway points are among the key components of railway infrastructure. As part of the signal equipment, points control the routes of trains at railway junctions, having a significant impact on the reliability, capacity and punctuality of rail transport. Meanwhile, they are also one of the most fragile parts of railway systems, and points failures cause a large portion of railway incidents. Traditionally, maintenance of points is based on a fixed time interval or raised after equipment failures. Instead, it would be of great value to forecast points failures and take action beforehand, minimising any negative effect. To date, most existing prediction methods are either lab-based or rely on specially installed sensors, which makes them infeasible for large-scale implementation. Besides, they often use data from only one source. We therefore explore a new way that integrates readily available multi-source data to fulfil this task. We conducted our case study on the Sydney Trains rail network, an extensive network of passenger and freight railways. Unfortunately, real-world data are usually incomplete for various reasons, e.g., faults in the database, operational errors or transmission faults. Besides, railway points differ in their locations, types and other properties, which means it is hard to use a unified model to predict their failures. Aiming at this challenging task, we first constructed a dataset from multiple sources and selected key features with the help of domain experts. In this paper, we formulate the prediction task as a multiple kernel learning problem with missing kernels and present a robust multiple kernel learning algorithm for predicting points failures. Our model takes into account the missing pattern of the data as well as the inherent variance across different sets of railway points. Extensive experiments demonstrate the superiority of our algorithm compared…
Huang, H, Zheng, J, Zhang, J, Wu, Q & Xu, J 2019, 'Compare more nuanced: Pairwise alignment bilinear network for few-shot fine-grained learning', Proceedings - IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, Shanghai, China, pp. 91-96.
The recognition ability of human beings develops in a progressive way: children usually learn to discriminate various objects from coarse to fine-grained with limited supervision. Inspired by this learning process, we propose a simple yet effective model for Few-Shot Fine-Grained (FSFG) recognition, which tackles this challenging fine-grained recognition task using meta-learning. The proposed method, named Pairwise Alignment Bilinear Network (PABN), is an end-to-end deep neural network. Unlike traditional deep bilinear networks for fine-grained classification, which adopt self-bilinear pooling to capture the subtle features of images, the proposed model uses a novel pairwise bilinear pooling to compare the nuanced differences between base images and query images in order to learn a deep distance metric. To match base image features with query image features, we design feature alignment losses before the proposed pairwise bilinear pooling. Experimental results on four fine-grained classification datasets and one generic few-shot dataset demonstrate that the proposed model outperforms both state-of-the-art few-shot fine-grained and general few-shot methods.
Du, A, Huang, X, Zhang, J, Yao, L & Wu, Q 2019, 'Kpsnet: Keypoint Detection and Feature Extraction for Point Cloud Registration', Proceedings - International Conference on Image Processing, ICIP, IEEE International Conference on Image Processing, IEEE, Taipei, Taiwan, pp. 2576-2580.
This paper presents KPSNet, a KeyPoint Siamese Network that simultaneously learns a task-desirable keypoint detector and feature extractor. The keypoint detector is optimized to predict a score vector that signifies the probability of each candidate being a keypoint. The feature extractor is optimized to learn robust features of keypoints by exploiting the correspondence between the keypoints generated from the two inputs. For training, KPSNet does not require manual pairwise annotation of keypoints and local patches. Instead, we design an alignment module to establish the correspondence between the two inputs and generate positive and negative samples on the fly. Therefore, our method can be easily extended to new scenes. We test the proposed method on an open-source benchmark, and experiments show the validity of our method.
Zhao, M, Zhang, J, Zhang, C & Zhang, W 2019, 'Leveraging Heterogeneous Auxiliary Tasks to Assist Crowd Counting', 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Long Beach, CA, pp. 12728-12737.
Zhang, J, Wu, Q, Zhang, J, Shen, C & Lu, J 2019, 'Mind Your Neighbours: Image Annotation With Metadata Neighbourhood Graph Co-Attention Networks', 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE.
Zhang, P, Wu, Q, Xu, J & Zhang, J 2018, 'Long-Term Person Re-identification Using True Motion from Videos', Winter Conference on Applications of Computer Vision, IEEE, Lake Tahoe, NV, USA, pp. 494-502.
Zhang, J, Wu, Q, Zhang, J, Shen, C & Lu, J 2018, 'Kill Two Birds with One Stone: Weakly-Supervised Neural Network for Image Annotation and Tag Refinement', The Thirty-Second AAAI Conference on Artificial Intelligence, The Thirty-Second AAAI Conference on Artificial Intelligence, AAAI Press, USA, pp. 7550-7557.
The number of social images has exploded with the wide adoption of social networks, and people like to share their comments about them. These comments can be a description of the image, or of objects, attributes or scenes in it, which are normally used as user-provided tags. However, it is well known that user-provided tags are incomplete and imprecise to some extent. Directly using them can damage the performance of related applications, such as image annotation and retrieval. In this paper, we propose to learn an image annotation model and refine the user-provided tags simultaneously in a weakly-supervised manner. A deep neural network is utilized as the image feature learner and backbone annotation model, while visual consistency, semantic dependency and user-error sparsity are introduced as batch-level constraints to alleviate the tag noise. Therefore, our model is highly flexible and stable in handling large-scale image sets. Experimental results on two benchmark datasets indicate that our proposed model achieves the best performance compared to the state-of-the-art methods.
Guo, D, Zhao, W, Cui, Y, Wang, Z, Chen, S & Zhang, J 2018, 'Siamese network based features fusion for adaptive visual tracking', PRICAI 2018: Trends in Artificial Intelligence 15th Pacific Rim International Conference on Artificial Intelligence Nanjing, China, August 28–31, 2018 Proceedings (LNAI 11012), International Conference on Artificial Intelligence, Springer, China, pp. 759-771.
Visual object tracking is a popular but challenging problem in computer vision. The main challenge is the lack of prior knowledge of the tracking target, which may be supervised only by a bounding box given in the first frame. Besides, tracking suffers from many influences such as scale variations, deformations, partial occlusions and motion blur. To solve such a challenging problem, a suitable tracking framework is required to adapt to different tracking scenes. This paper presents a novel approach for robust visual object tracking by fusing multiple features in a Siamese Network. Hand-crafted appearance features and CNN features are combined to mutually compensate for their shortcomings and enhance their advantages. The proposed network proceeds as follows. First, different features are extracted from the tracking frames. Second, the extracted features are passed through Correlation Filters to learn corresponding templates, which are used to generate respective response maps. Finally, the multiple response maps are fused into a better response map, which helps locate the target more accurately. Comprehensive experiments are conducted on three benchmarks: Temple-Color, OTB50 and UAV123. Experimental results demonstrate that the proposed approach achieves state-of-the-art performance on these benchmarks.
Zhang, J, Wu, Q, Shen, C, Zhang, J, Lu, J & van den Hengel, A 2018, 'Goal-Oriented Visual Question Generation via Intermediate Rewards', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), European Conference on Computer Vision, Springer Link, Munich, Germany, pp. 189-204.
Despite significant progress in a variety of vision-and-language problems, developing a method capable of asking intelligent, goal-oriented questions about images has proven to be a formidable challenge. Towards this end, we propose a Deep Reinforcement Learning framework based on three new intermediate rewards, namely goal-achieved, progressive and informativeness, that encourage the generation of succinct questions, which in turn uncover valuable information towards the overall goal. By directly optimizing for questions that work quickly towards fulfilling the overall goal, we avoid the tendency of existing methods to generate long series of inane queries that add little value. We evaluate our model on the GuessWhat?! dataset and show that the resulting questions can help a standard 'Guesser' identify a specific object in an image at a much higher success rate.
Yao, Y, Zhang, J, Shen, F, Yang, W, Hua, XS & Tang, Z 2018, 'Extracting privileged information from untagged corpora for classifier learning', IJCAI International Joint Conference on Artificial Intelligence, International Joint Conference on Artificial Intelligence, International Joint Conference on Artificial Intelligence, Stockholm, Sweden, pp. 1085-1091.
The performance of data-driven learning approaches is often unsatisfactory when the training data is inadequate in either quantity or quality. Manually labeled privileged information (PI), e.g., attributes, tags or properties, is usually incorporated to improve classifier learning. However, manual labeling is time-consuming and labor-intensive. To address this issue, we propose to enhance classifier learning by extracting PI from untagged corpora, which effectively eliminates the dependency on manually labeled data. In detail, we treat each selected piece of PI as a subcategory and learn one classifier per subcategory independently. The classifiers for all subcategories are then integrated to form a more powerful category classifier. In particular, we propose a new instance-level multi-instance learning (MIL) model to simultaneously select a subset of training images from each subcategory and learn the optimal classifiers based on the selected images. Extensive experiments demonstrate the superiority of our approach.
Yao, Y, Zhang, J, Shen, F, Yang, W, Pu, H & Tang, Z 2018, 'Discovering and Distinguishing Multiple Visual Senses for Polysemous Words', Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, The AAAI Press, New Orleans, USA, pp. 523-530.
To reduce the dependence on labeled data, there have been increasing research efforts on learning visual classifiers by exploiting web images. One issue that limits their performance is the problem of polysemy. In this work, we present a novel framework that solves the problem of polysemy by allowing sense-specific diversity in search results. Specifically, we first discover a list of possible semantic senses to retrieve sense-specific images. Then we merge visually similar semantic senses and prune noise using the retrieved images. Finally, we train a visual classifier for each selected semantic sense and use the learned sense-specific classifiers to distinguish multiple visual senses. Extensive experiments on classifying images into sense-specific categories and re-ranking search results demonstrate the superiority of our proposed approach.
Li, Z, Zhang, J, Wu, Q & Kirsch, C 2018, 'Field-regularised factorization machines for mining the maintenance logs of equipment', AI 2018: Advances in Artificial Intelligence, 31st Australasian Joint Conference, Wellington, New Zealand, December 11-14, 2018, Proceedings (LNAI 11320), Springer, pp. 172-183.
© Springer Nature Switzerland AG 2018. Failure prediction is very important for railway infrastructure. Traditionally, data from various sensors are collected for this task, while the value of maintenance logs is often neglected. Maintenance records of equipment usually indicate equipment status and can therefore be valuable for predicting equipment faults. In this paper, we propose Field-regularised Factorization Machines (FrFMs) to predict failures of railway points from maintenance logs. Factorization Machines (FMs) and their variants are state-of-the-art algorithms designed for sparse data and are widely used in click-through-rate prediction and recommendation systems. Categorical variables are converted to binary features through one-hot encoding and then fed into these models; however, field information is discarded in this process. Field-regularised Factorization Machines incorporate this valuable information. Experiments on a railway maintenance-log dataset and another public dataset show the effectiveness of our methods.
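For readers unfamiliar with the base model referred to above, the second-order Factorization Machine scores a one-hot-encoded input with a linear term plus factorized pairwise interactions. The following is a minimal sketch under toy weights of our own choosing, not anything from the paper:

```python
# Sketch of second-order Factorization Machine prediction. The weights
# and the toy input below are illustrative assumptions, not paper data.

def fm_predict(x, w0, w, V):
    """FM score: w0 + sum_i w[i]*x[i] + sum_{i<j} <V[i], V[j]> * x[i]*x[j].

    The pairwise term uses the O(k*n) identity:
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f ((sum_i V[i][f]*x[i])**2 - sum_i (V[i][f]*x[i])**2)
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise

# One-hot encoded toy input: two categorical fields, each with two values.
x = [1.0, 0.0, 0.0, 1.0]
w0, w = 0.1, [0.2, 0.0, 0.0, 0.3]
V = [[0.1, 0.2], [0.0, 0.0], [0.0, 0.0], [0.3, -0.1]]
score = fm_predict(x, w0, w, V)  # → 0.61
```

The field regularisation the paper proposes would act on top of this model, grouping the rows of `V` by the categorical field each binary feature came from.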
Zhang, J, Zhang, J, Wu, Q, Xu, J, Lu, J, Phua, R, Curr, K & Tang, Z 2017, 'Historical image annotation by exploring the tag relevance', Proceedings - 4th Asian Conference on Pattern Recognition, ACPR 2017, IAPR Asian Conference on Pattern Recognition, IEEE, Nanjing, China, pp. 646-651.
© 2017 IEEE. Historical images usually carry enormous historical research value and are highly related to historical objects, events and background stories. Therefore, annotating these images requires selecting tags from a large set. In this paper, we propose to annotate historical images by exploring tag relevance. We measure tag relevance from three different perspectives: visual relevance, dependencies with other tags, and relationship with location-based meta-data. Using tag relevance as guidance, we generate three tag sub-sets and use them to fulfil the annotation. Experimental results on the benchmark dataset indicate the significance of exploring tag relevance in comparison with the baseline experiments.
Gong, Y, Li, Z, Zhang, J, Liu, W, Zheng, Y & Kirsch, C 2018, 'Network-wide Crowd Flow Prediction of Sydney Trains via Customized Online Non-negative Matrix Factorization', ACM International Conference on Information and Knowledge Management, ACM, Turin, Italy.
Zhao, M, Zhang, J, Porikli, F, Zhang, C & Zhang, W 2017, 'Learning a perspective-embedded deconvolution network for crowd counting', 2017 IEEE International Conference on Multimedia and Expo (ICME), IEEE International Conference on Multimedia and Expo, IEEE, Hong Kong, China, pp. 403-408.
We present a novel deep learning framework for crowd counting by learning a perspective-embedded deconvolution network. Perspective is an inherent property of most surveillance scenes. Unlike the traditional approaches that exploit the perspective as a separate normalization, we propose to fuse the perspective into a deconvolution network, aiming to obtain a robust, accurate and consistent crowd density map. Through layer-wise fusion, we merge perspective maps at different resolutions into the deconvolution network. With the injection of perspective, our network is driven to learn to combine the underlying scene geometric constraints adaptively, thus enabling an accurate interpretation from high-level feature maps to the pixel-wise crowd density map. In addition, our network allows generating density maps for arbitrary-sized input in an end-to-end fashion. The proposed method achieves competitive results on the WorldExpo2010 crowd dataset.
Xin, JN, Du, X & Zhang, J 2017, 'Deep learning for robust outdoor vehicle visual tracking', Proceedings - IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, Hong Kong, China, pp. 613-618.
© 2017 IEEE. Robust visual tracking for outdoor vehicles is still a challenging problem due to large appearance variations caused by illumination change, occlusion and scale variation, etc. In this paper, a deep-learning-based approach for robust outdoor vehicle tracking is proposed. Firstly, a stacked denoising auto-encoder is pre-trained to learn the feature representation of images. Then, a k-sparse constraint is added to the stacked denoising auto-encoder, and the encoder of the k-sparse stacked denoising auto-encoder (kSSDAE) is connected with a classification layer to construct a classification neural network. After fine-tuning, the classification neural network is applied to online tracking under a particle filter framework. Extensive tracking experiments are conducted on a challenging single-object online tracking benchmark to verify the effectiveness of our tracker. The experiments show that our tracker outperforms most state-of-the-art trackers.
Gu, S, Lu, Y, Zhang, L & Zhang, J 2017, 'RGB-D Tracking Based on Kernelized Correlation Filter with Deep Features', Neural Information Processing (Lecture Notes in Computer Science), International Conference on Neural Information Processing, Springer, Guangzhou, China, pp. 105-113.
© 2017, Springer International Publishing AG. This paper proposes a new RGB-D tracker built upon the Kernelized Correlation Filter (KCF) with deep features. KCF is a high-speed target tracker; however, the HOG feature used in KCF has some weaknesses, such as a lack of robustness to noise. We therefore use RGB-D deep features in KCF, i.e., deep features of both RGB and depth images, which contain abundant and discriminative information for tracking. The mixture of deep features greatly improves the performance of the tracker. Besides, KCF is sensitive to scale variations, while depth images help to handle this problem: by the principle of similar triangles, the ratio of scale variation can be observed directly. Tested on the Princeton RGB-D Tracking Benchmark, our RGB-D tracker achieves the highest accuracy when no occlusion happens. Meanwhile, tracking remains fast even though deep features are computed during tracking, with an average speed of 10 FPS.
Zuo, Y, Wu, Q & Zhang, J 2017, 'Minimum spanning forest with embedded edge inconsistency measurement for color-guided depth map upsampling', 2017 IEEE International Conference on Multimedia and Expo, IEEE, Hong Kong, China.
Color-guided depth map up-sampling, such as Markov-Random-Field-based (MRF-based) methods, is a popular depth map enhancement solution. It normally assumes edge consistency between the color image and the corresponding depth map, and calculates the coefficients of the smoothness term in the MRF according to this assumption. However, such consistency does not always hold, which leads to texture-copying artifacts and blurred depth edges. In this paper, we propose a novel coefficient computing scheme for the smoothness term in the MRF, based on the distance between pixels in a Minimum Spanning Forest, to better preserve depth edges. An explicit edge-inconsistency measurement is embedded into the edge weights of the Minimum Spanning Trees, which significantly mitigates texture-copying artifacts. The proposed method is evaluated on the Middlebury and ToFMark datasets and demonstrates improved results compared with state-of-the-art methods.
Yao, L, Kusakunniran, W, Wu, Q, Zhang, J & Tang, Z 2017, 'Robust Gait Recognition under Unconstrained Environments using Hybrid Descriptions', Proceedings of the 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), International Conference on Digital Image Computing: Techniques and Applications, IEEE, Sydney, Australia.
Gait is one of the key biometric features that has been widely applied for human identification. Appearance-based features and motion-based features are the two main representations used in gait recognition. However, appearance-based features are sensitive to body shape changes, and silhouette extraction from real-world images and videos also remains a challenge. As for motion features, due to the difficulty in extracting the underlying models from gait sequences, the localization of human joints lacks reliability and robustness. This paper proposes a new approach which utilizes Two-Point Gait (TPG) as the motion feature to remedy the deficiency of the appearance feature based on Gait Energy Image (GEI), in order to increase the robustness of gait recognition under unconstrained environments with view changes and cloth changes. Another contribution of this paper is that this is the first application of TPG to view-change and cloth-change issues since it was proposed. Extensive experiments show that the proposed method is more invariant to view and cloth changes, and can significantly improve the robustness of gait recognition.
Kusakunniran, W, Wu, Q & Zhang, J 2017, 'Action Recognition based on Correlated Codewords of Body Movements', Proceedings of the 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), International Conference on Digital Image Computing: Techniques and Applications, IEEE, Sydney, Australia, pp. 1-8.
Using spatio-temporal features is popular for action recognition. However, existing methods embed these local features into a global representation. Orders and correlations among local motions of each action are missing. This can make it difficult to distinguish closely related actions. This paper proposes a solution to address this challenge by encoding correlations of movements. Space-time interest points are detected in each action video. Then, feature descriptors are extracted from these key points and clustered into different codewords implicitly representing different characteristics of motions. The final representation of each action video is a combination of a bag of words and correlations between codewords. Then, the support vector machine is used as a classification tool. Based on the experimental results, the proposed method achieves a very promising performance and particularly outperforms the other existing methods that rely on spatio-temporal features.
Jiang, Z, Huynh, DQ, Zhang, J, Wu, Q & Zhang, J 2017, 'Part-based Data Association for Visual Tracking', Proceedings of Digital Image Computing: Techniques and Applications (DICTA), 2017 International Conference on, International Conference on Digital Image Computing: Techniques and Applications, IEEE, Sydney, NSW, Australia, pp. 1-8.
We present a method that integrates a part-based sparse appearance model in a Bayesian inference framework for tracking targets in video sequences. We formulate the sparse appearance model as a set of smoothed colour histograms corresponding to the object windows detected by the Deformable Part Model (DPM) detector. The data association of each body part between frames is solved based on the position constraint, appearance coherence, and motion consistency. To deal with missing and noisy observations, the part detection window in the following frame is also predicted using an interacting multiple model (IMM) tracker. We have tested our tracking method on all the video sequences that involve people in upright poses from the TB-50 and TB-100 benchmark video datasets. Our experimental results show that our tracking method outperforms six state-of-the-art tracking techniques.
Kusakunniran, W, Wu, Q, Ritthipravat, P & Zhang, J 2017, 'Three-stages hard exudates segmentation in retinal images', 2017 9th International Conference on Information Technology and Electrical Engineering (ICITEE), IEEE, Phuket, Thailand, pp. 1-6.
© 2017 IEEE. This paper proposes a three-stage method for hard exudate segmentation in retinal images. The first stage is pre-processing: a color transfer, based on statistical analysis, is applied so that all retinal images have the same color characteristics, and only the yellow channel of each image is used in the further analysis. The second stage is blob initialization: blob detection based on color, size and shape, including circularity and convexity, is used to identify initial pixels of hard exudates, and the detected blobs must not lie inside the optic disk. The third stage is segmentation: graph cut is iteratively applied on partitions of the image. Fine-tuned segmentation in sub-images is necessary because the proportion of hard-exudate pixels is significantly smaller than that of non-exudate pixels. The proposed method is evaluated on two well-known datasets, e-ophtha and DIARETDB1, at both pixel level and image level. Based on comprehensive comparisons with existing works, the proposed method is shown to be very promising. At the image level, it achieves 96% sensitivity and 94% specificity on the e-ophtha dataset, and 96% sensitivity and 98% specificity on the DIARETDB1 dataset.
Yao, Y, Zhang, J, Shen, F, Hua, X, Xu, J & Tang, Z 2016, 'Automatic image dataset construction with multiple textual metadata', Proceedings - IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, Seattle, Washington, USA.
© 2016 IEEE. The goal of this work is to automatically collect a large number of highly relevant images from the Internet for given queries. A novel image dataset construction framework is proposed by employing multiple textual metadata. Specifically, the given queries are first expanded by searching the Google Books Ngrams Corpora to obtain a richer semantic description, from which the visually non-salient and less relevant expansions are then filtered. After retrieving images from the Internet with the filtered expansions, we further remove noisy images by clustering and progressively trained Convolutional Neural Networks (CNNs). To verify the effectiveness of the proposed method, we construct a dataset with 10 categories, which is not only much larger than, but also has comparable cross-dataset generalization ability with, the manually labeled datasets STL-10 and CIFAR-10.
Wu, S, Jing, XY, Yue, D, Zhang, J, Yang, KJ & Yang, J 2016, 'Unsupervised visual domain adaptation via dictionary evolution', Proceedings - IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, Seattle, Washington, United States.
© 2016 IEEE. In real-world visual applications, distribution mismatch between samples from different domains may significantly degrade classification performance. To improve the generalization capability of classifiers across domains, domain adaptation has attracted a lot of interest in computer vision. This work focuses on unsupervised domain adaptation, which remains challenging because no labels are available in the target domain. Most of the attention has been dedicated to seeking domain-invariant features by exploring the shared structure between domains, ignoring the valuable discriminative information contained in the labeled source data. In this paper, we propose a Dictionary Evolution (DE) approach to construct discriminative features robust to domain shift. Specifically, DE aims to adapt a discriminative dictionary learnt on labeled source samples to unlabeled target samples through a gradual transition process. We show that the learnt dictionary is endowed with cross-domain data representation ability and powerful discriminant capability. Empirical results on real-world data sets demonstrate the advantages of the proposed approach over competing methods.
Zhou, T, Lu, Y, Di, H & Zhang, J 2016, 'Video object segmentation aggregation', Proceedings - IEEE International Conference on Multimedia and Expo (ICME) 2016, IEEE International Conference on Multimedia and Expo, IEEE, Seattle.
© 2016 IEEE. We present an approach for unsupervised object segmentation in unconstrained videos. Driven by the latest progress in this field, we argue that segmentation performance can be largely improved by aggregating the results generated by state-of-the-art algorithms. Initially, objects in individual frames are estimated through a per-frame aggregation procedure using majority voting. While this predicts relatively accurate object locations, the initial estimation fails to cover the parts that are wrongly labeled by more than half of the algorithms. To address this, we build a holistic appearance model from non-local appearance cues by linear regression. Then, we integrate the appearance priors and spatio-temporal information into an energy minimization framework to refine the initial estimation. We evaluate our method on challenging benchmark videos and demonstrate that it outperforms state-of-the-art algorithms.
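The per-frame majority-voting step described above can be sketched in a few lines. The masks and frame size below are toy assumptions, not data from the paper:

```python
# Sketch of per-frame aggregation by majority voting: several algorithms
# each produce a binary object mask, and a pixel enters the initial
# estimate only when a strict majority of algorithms label it as object.

def majority_vote(masks):
    """Aggregate equally sized binary masks (nested 0/1 lists) by majority."""
    n_algos = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            votes = sum(m[r][c] for m in masks)
            out[r][c] = 1 if votes * 2 > n_algos else 0  # strict majority
    return out

# Three algorithms' masks for a toy 2x3 frame.
m1 = [[1, 1, 0], [0, 1, 0]]
m2 = [[1, 0, 0], [0, 1, 1]]
m3 = [[1, 1, 1], [0, 0, 1]]
fused = majority_vote([m1, m2, m3])  # → [[1, 1, 0], [0, 1, 1]]
```

As the abstract notes, pixels mislabeled by more than half of the algorithms are lost at this stage, which is why the method then refines the estimate with an appearance model.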
Huang, X, Fan, L, Zhang, J, Wu, Q & Yuan, C 2016, 'Real Time Complete Dense Depth Reconstruction for a Monocular Camera', Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Las Vegas, Nevada, pp. 674-679.
In this paper, we aim to solve the problem of estimating complete dense depth maps from a monocular moving camera. By 'complete', we mean that depth information is estimated for every pixel and detailed reconstruction is achieved. Although this problem has previously been attempted, the accuracy of complete dense depth reconstruction remains an open problem. We propose a novel system which produces accurate complete dense depth maps. The new system consists of two subsystems running in separate threads, namely dense mapping and sparse patch-based tracking. For dense mapping, a new projection error computation method is proposed to enhance the gradient component in estimated depth maps. For tracking, a new sparse patch-based tracking method estimates camera pose by minimizing a normalized error term. The experiments demonstrate that the proposed method obtains improved performance in terms of completeness and accuracy compared to three state-of-the-art dense reconstruction methods: VSFM+CMVC, LSD-SLAM and REMODE.
Huang, X, Zhang, J, Wu, Q, Fan, L & Yuan, C 2016, 'A coarse-to-fine algorithm for registration in 3D street-view cross-source point clouds', Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing Techniques and Applications, IEEE, Gold Coast, Australia.
With the development of numerous 3D sensing technologies, object registration on cross-source point clouds has attracted researchers' interest. When the point clouds are captured by different kinds of sensors, there are large variations of different kinds. In this study, we address an even more challenging case in which the cross-source point clouds are acquired from a real street view: one is produced directly by a LiDAR system and the other is generated by running VSFM software on an image sequence captured by RGB cameras. When confronted with large-scale point clouds, previous methods mostly focus on point-to-point level registration and have many limitations, because the least-mean-error strategy shows poor ability in registering cross-source point clouds with large variations. In this paper, different from previous ICP-based methods and from a statistical view, we propose an effective coarse-to-fine algorithm to detect and register a small-scale SFM point cloud within a large-scale LiDAR point cloud. As the experimental results show, the model can successfully run on LiDAR and SFM point clouds, and hence can contribute to many applications, such as robotics and smart city development.
Zhao, Y, Di, H, Zhang, J, Lu, Y & Lv, F 2016, 'Recognizing human actions from low-resolution videos by region-based mixture models', Proceedings - IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, Seattle, Washington, United States.
© 2016 IEEE. Recognizing human actions from low-resolution (LR) videos is essential for many applications, including large-scale video surveillance, sports video analysis and intelligent aerial vehicles. Currently, state-of-the-art performance in action recognition is achieved by the use of dense trajectories extracted by optical flow algorithms. However, optical flow algorithms are far from perfect in LR videos. In addition, the spatial and temporal layout of features is a powerful cue for action discrimination, but most existing methods encode the layout by first segmenting body parts, which is not feasible in LR videos. To address these problems, we adopt the Layered Elastic Motion Tracking (LEMT) method to extract a set of long-term motion trajectories and a long-term common shape from each video sequence, where the extracted trajectories are much denser than sparse interest points (SIPs); we then present a hybrid feature representation to integrate both shape and motion features; and finally we propose a Region-based Mixture Model (RMM) for action classification. The RMM models the spatial layout of features without any need for body-part segmentation. Experiments are conducted on two publicly available LR human action datasets, of which the UT-Tower dataset is very challenging because the average height of human figures is only about 20 pixels. The proposed approach attains near-perfect accuracy on both datasets.
Zuo, Y, Wu, Q, Zhang, J & An, P 2016, 'Explicit modeling on depth-color inconsistency for color-guided depth up-sampling', Proceedings - IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, USA.
© 2016 IEEE. Color-guided depth up-sampling enhances the resolution of a depth map under the assumption that depth discontinuities and color image edges at corresponding locations are consistent. Among all reported methods, MRF and its variants form one of the major approaches, which has dominated this area for several years. However, the assumption above is not always true. The usual solution is to adjust the weighting inside the smoothness term of the MRF model, but no method has explicitly considered the inconsistency occurring between depth discontinuities and the corresponding color edges. In this paper, we propose a quantitative measurement of such inconsistency and explicitly embed it into the weighting value of the smoothness term. Such a solution has not been reported in the literature. The improved depth up-sampling based on the proposed method is evaluated on the Middlebury and ToFMark datasets and demonstrates promising results.
Zuo, Y, Wu, Q, An, P & Zhang, J 2016, 'Explicit measurement on depth-color inconsistency for depth completion', Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), IEEE International Conference on Image Processing, IEEE, Phoenix, AZ, USA, pp. 4037-4041.
Color-guided depth completion refines a depth map obtained through structured-light sensing by filling missing depth structure and de-noising. It is based on the assumption that depth discontinuities and color edges at corresponding locations are consistent. Among all proposed methods, the MRF-based method and its variants form one of the major approaches. However, the assumption above is not always true, which causes texture-copying and depth-discontinuity blurring artifacts. The state-of-the-art solutions usually modify the weighting inside the smoothness term of the MRF model. Because no method explicitly considers the inconsistency occurring between depth discontinuities and the corresponding color edges, they cannot adaptively control the effect of guidance from the color image when completing the depth map. In this paper, we propose a quantitative measurement of such inconsistency and explicitly embed it into the weighting value of the smoothness term. The proposed method is evaluated on NYU Kinect datasets and demonstrates promising results.
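To make the role of the smoothness weighting concrete, here is a minimal 1-D sketch of an MRF energy whose per-edge weight is reduced where an inconsistency signal is large. The weighting function and all values are illustrative assumptions, not the paper's actual formulation:

```python
# Toy 1-D MRF energy for depth refinement: a data term tying the result
# to observed depths, plus a smoothness term whose per-edge weight is
# down-weighted where an inconsistency measure (standing in for measured
# depth-color inconsistency) is large. Purely illustrative.

def mrf_energy(depth, observed, inconsistency, lam=1.0):
    """Return data term + lam * weighted smoothness term."""
    data = sum((d - o) ** 2 for d, o in zip(depth, observed))
    smooth = 0.0
    for i in range(len(depth) - 1):
        w = 1.0 / (1.0 + inconsistency[i])  # weaker smoothing across inconsistent edges
        smooth += w * (depth[i] - depth[i + 1]) ** 2
    return data + lam * smooth

# A perfect fit to the observations leaves only the smoothness cost;
# the second edge is half-weighted because its inconsistency is 1.
energy = mrf_energy([1, 2, 4], [1, 2, 4], [0, 1])  # → 3.0
```

The papers above differ in how the inconsistency value itself is measured; the point here is only that a larger measured inconsistency relaxes the smoothness constraint at that edge.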
Cho, N, Wu, Q, Xu, J & Zhang, J 2016, 'Content Authoring Using Single Image in Urban Environments for Augmented Reality', Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing Techniques and Applications, IEEE, Gold Coast, Australia, pp. 1-7.
Content authoring is one of the essentials of Augmented Reality (AR): it places augmented content on part of a real scene in order to enhance users' visual experience. For single 2D street-view images, the challenge arises from cluttered environments and the unknown position and orientation of the camera. Although existing methods based on 2D feature-point matching or vanishing-point registration may recover the camera pose, robustness remains challenging because of the uncertainty of feature-point detection in texture-less regions and the displacement of vanishing-point detection caused by irregular lines detected in the scene. By taking advantage of the characteristics of man-made objects (e.g. buildings) widely seen in street views, this paper proposes a simple yet efficient content authoring approach. In this approach, the dominant building plane where the virtual object will be placed is detected and then projected to the frontal-parallel view, on which the virtual object can be reliably placed. Once the virtual object and the real scene are embedded into each other on the frontal-parallel view, they can be converted back to the original view using the inverse projection without any distortion. Experiments on public databases show that the proposed method can recover the camera pose and implement content placement with promising performance.
Yao, Y, Hua, XS, Shen, F, Zhang, J & Tang, Z 2016, 'A domain robust approach for image dataset construction', MM 2016 - Proceedings of the 2016 ACM Multimedia Conference, ACM International Conference on Multimedia, ACM, Amsterdam, The Netherlands, pp. 212-216.
© 2016 ACM. There have been increasing research interests in automatically constructing image datasets by collecting images from the Internet. However, existing methods tend to have a weak domain adaptation ability, known as the 'dataset bias problem'. To address this issue, in this work, we propose a novel image dataset construction framework which can generalize well to unseen target domains. Specifically, the given queries are first expanded by searching the Google Books Ngrams Corpora (GBNC) to obtain a richer semantic description, from which the noisy query expansions are then filtered out. By treating each expansion as a 'bag' and the retrieved images therein as 'instances', we formulate image filtering as a multi-instance learning (MIL) problem with constrained positive bags. By this approach, images from different data distributions are kept while noisy images are filtered out. Comprehensive experiments on two challenging tasks demonstrate the effectiveness of the proposed approach.
Yao, Y, Zhang, J, Hua, XS, Shen, F & Tang, Z 2016, 'Extracting visual knowledge from the internet: Making sense of image data', MultiMedia Modeling (LNCS), International Conference on Multimedia Modeling, Springer, Miami, USA, pp. 862-873.
© Springer International Publishing Switzerland 2016. Recent successes in visual recognition can be primarily attributed to feature representation, learning algorithms, and the ever-increasing size of labeled training data. Extensive research has been devoted to the first two, but much less attention has been paid to the third. Due to the high cost of manual data labeling, the size of recent efforts such as ImageNet is still relatively small with respect to daily applications. In this work, we mainly focus on how to automatically generate image data for a given visual concept on a vast scale. With the generated image data, we can train a robust recognition model for the given concept. We evaluate the proposed webly supervised approach on the benchmark Pascal VOC 2007 dataset, and the results demonstrate the superiority of our method over many other state-of-the-art methods in image data collection.
Zhang, J, Zhang, J, Lu, J, Shen, C, Curr, K, Phua, R, Neville, R & Edmonds, E 2016, 'SLNSW-UTS: A Historical Image Dataset for Image Multi-Labeling and Retrieval', 2016 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2016, Digital Image Computing Techniques and Applications, IEEE, Gold Coast, Australia, pp. 1-6.
© 2016 IEEE. This paper introduces a dataset of historical images created by the State Library of New South Wales and the University of Technology Sydney (UTS). The dataset has a total of 29,713 images with 119 unique labels, and each image carries multiple labels. We use a CNN-based framework to explore the feasibility of our dataset for image multi-labeling and retrieval research, and extract semantic-level image features for future research use. The experimental results illustrate that effective deep learning models can be trained on our dataset. We also introduce five applications that can be studied on our historical image dataset.
Zhao, M, Zhang, C, Zhang, W, Li, W & Zhang, J 2015, 'Decorrelation-Stretch based Cloud Detection for Total Sky Images', 2015 Visual Communications and Image Processing (VCIP 2015), IEEE, Singapore.
Cloud detection plays an important role in total-sky-image-based solar forecasting and has received increasing attention in recent years. Accurate cloud detection in complicated total-sky images is especially challenging due to the low contrast and vague boundaries between cloud and sky regions. Unlike existing cloud detection methods, which use no preprocessing, a novel decorrelation-stretch (DS) based method is proposed in this work, where total-sky images are first preprocessed with the DS algorithm. With this enhancement, the color-feature disparity between cloud and sky is intensified notably, and a more accurate threshold can then be obtained by applying Minimum Cross Entropy (MCE) thresholding to the preprocessed image. Experimental results demonstrate that the proposed scheme achieves better performance than existing cloud detection methods on total-sky images, especially for images with low contrast or vague boundaries between cloud and sky regions.
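The MCE thresholding step mentioned above (often attributed to Li and Lee) picks the gray level that minimizes the cross entropy between the image and its two-region segmentation. A hedged sketch on a toy histogram, which is our own illustration rather than data from the paper, is:

```python
import math

# Sketch of Minimum Cross Entropy thresholding: for each candidate
# threshold t, split the histogram into below/above regions and score
# the split by the cross entropy (up to an additive constant):
#   cost(t) = -(S1 * log(mu1) + S2 * log(mu2)),
# where S is the region's sum of g*h(g) and mu its mean gray level.

def mce_threshold(hist):
    """hist[g] = number of pixels with gray level g (g starts at 0)."""
    best_t, best_cost = None, float("inf")
    L = len(hist)
    for t in range(1, L):
        w1, w2 = sum(hist[:t]), sum(hist[t:])
        if w1 == 0 or w2 == 0:
            continue  # one region empty: no valid split
        s1 = sum(g * hist[g] for g in range(t))
        s2 = sum(g * hist[g] for g in range(t, L))
        mu1, mu2 = s1 / w1, s2 / w2
        cost = 0.0
        if mu1 > 0:
            cost -= s1 * math.log(mu1)
        if mu2 > 0:
            cost -= s2 * math.log(mu2)
        if cost < best_cost:
            best_cost, best_t = cost, t
    return best_t

# Bimodal toy histogram: a dark 'sky' peak near level 2 and a bright
# 'cloud' peak near level 7; the threshold should land between them.
hist = [5, 20, 40, 20, 2, 3, 20, 40, 20, 5]
t = mce_threshold(hist)
```

In the paper's pipeline, this thresholding is applied only after the decorrelation stretch has widened the cloud/sky color disparity, which is what makes a single global threshold workable.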
Wang, Y, Zhang, J, Liu, Z, Wu, Q, Chou, P, Zhang, Z & Jia, Y 2014, 'Completed Dense Scene Flow in RGB-D Space', Computer Vision - ACCV 2014 Workshops, Asian Conference on Computer Vision, Springer International Publishing, Singapore, pp. 191-205.
Conventional scene flow, containing only translational vectors, cannot properly model 3D motion with rotation. Moreover, the accuracy of 3D motion estimation is restricted by several challenges such as large displacement, noise, and missing data (caused by sensing techniques or occlusion). Existing solutions fall into two kinds of approaches: local and global. However, local approaches cannot generate a smooth motion field, and global approaches have difficulty handling large-displacement motion. In this paper, a completed dense scene flow framework is proposed, which models both rotation and translation for general motion estimation. It combines a local method and a global method, exploiting their complementary characteristics to handle large-displacement motion and to enforce smoothness, respectively. The proposed framework operates in the RGB-D image space, which further improves computational efficiency. In a quantitative evaluation on the Middlebury dataset, our method outperforms other published methods. The improved performance is further confirmed on real data acquired by a Kinect sensor.
Liu, X, Wang, L, Yin, J, Dou, Y & Zhang, J 2015, 'Absent Multiple Kernel Learning', Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, AAAI Publications, Austin, Texas.
Multiple kernel learning (MKL) optimally combines the multiple channels of each sample to improve classification performance. However, existing MKL algorithms cannot effectively handle the situation where some channels are missing, which is common in practical applications. This paper proposes an absent MKL (AMKL) algorithm to address this issue. Different from existing approaches, in which missing channels are first imputed and a standard MKL algorithm is then deployed on the imputed data, our algorithm directly classifies each sample with its observed channels. Specifically, we define a margin for each sample in its own relevant space, which corresponds to the observed channels of that sample. The proposed AMKL algorithm then maximizes the minimum of all sample-based margins, which leads to a difficult optimization problem. We show that this problem can be reformulated as a convex one by applying the representer theorem, making it readily solvable via existing convex optimization packages. Extensive experiments are conducted on five MKL benchmark data sets to compare the proposed algorithm with existing imputation-based methods. As observed, our algorithm achieves superior performance, and the improvement becomes more significant as the missing ratio increases.
Huang, S, Zhang, J, Lu, S & Hua, X-S 2015, 'Social Friend Recommendation Based on Network Correlation and Feature Co-Clustering', Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, ACM International Conference on Multimedia Retrieval, ACM, Shanghai, pp. 315-322.
Friend recommendation is an important recommender application in social media. Major social websites such as Twitter and Facebook are all capable of recommending friends to individuals. However, friend recommendation is a difficult task, and most social websites use simple friend recommendation algorithms based on similarity and popularity, whose level of accuracy does not satisfy the majority of users.
In this paper we propose a two-stage procedure for more accurate friend recommendation. In the first stage, based on the relationship between different social networks, the Flickr tag network and contact network are aligned to generate a 'possible friend list'. In the second stage, under the assumption that 'a friend's friends also tend to be friends', co-clustering is applied to the tag and image information of the list to refine the recommendation result of the first stage. Experimental results show that the proposed method achieves good performance and that every stage contributes to the recommendation.
Cheng, H, Zhang, J, An, P & Liu, Z 2015, 'A Novel Saliency Model for Stereoscopic Images', Digital Image Computing: Techniques and Applications (DICTA), 2015 International Conference on, The International Conference on Digital Image Computing: Techniques and Applications (DICTA), IEEE, Adelaide, pp. 1-7.
In this paper, we propose a novel saliency model for stereoscopic images. To improve depth information for stereo saliency analysis, this model exploits depth information from three aspects: 1) we extract low-level features based on color-depth contrast in a local and global search range (local-global contrast); 2) to extract the topological structure from a depth map, a surrounding map based on a Boolean map is obtained as a weight to enhance the local-global contrast features; and 3) based on the saliency probability distribution in depth, we employ stereo center-prior enhancement to compute the final saliency. Experimental results on two recent eye-tracking databases show that our proposed method outperforms state-of-the-art saliency models.
Huang, X, Yuan, C & Zhang, J 2015, 'Graph Cuts Stereo Matching Based on Patch-Match and Ground Control Points Constraint', Advances in Multimedia Information Processing (LNCS), Pacific-Rim Conference on Multimedia, Springer, Gwangju, South Korea, pp. 14-23.
Stereo matching methods based on Patch-Match obtain good results on complex-texture regions but perform poorly on low-texture regions. In this paper, a new method that integrates Patch-Match and graph cuts (GC) is proposed in order to achieve good results in both complex- and low-texture regions. A label is randomly assigned to each pixel and optimized through a propagation process; these labels constitute a label space for each GC iteration. In addition, a Ground Control Points (GCPs) constraint term is added to the GC formulation to overcome the disadvantages of Patch-Match stereo in low-texture regions. The proposed method combines the spatial propagation of Patch-Match with the global property of GC. Experimental results on the Middlebury evaluation system show that the method outperforms all other Patch-Match based methods.
Huang, X, Zhang, J, Wu, Q, Yuan, C & Fan, L 2015, 'Dense Correspondence Using Non-local DAISY Forest', Proceedings of the 2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing Techniques and Applications, IEEE, Adelaide, pp. 1-8.
Dense correspondence computation is a critical computer vision task with many applications. Most existing dense correspondence methods consider all the neighbors connected to the center pixel and use a local support region. However, such an approach may only achieve a locally optimal solution. In this paper, we propose a non-local dense correspondence method that calculates the matching cost on a tree structure. It is non-local because all other nodes on the tree contribute to the matching cost of the current node. The proposed method consists of three steps: 1) DAISY descriptor computation; 2) edge-preserving segmentation and forest construction; 3) PatchMatch fast search. We test our algorithm on the Middlebury and Moseg datasets. The results show that the proposed method outperforms state-of-the-art methods in dense correspondence computation and has low computational complexity.
Xu, W, Miao, Z, Zhang, J & Tian, Y 2014, 'Learning spatio-temporal features for action recognition with modified hidden conditional random field', Computer Vision - ECCV 2014 Workshops: Zurich, Switzerland, September 6-7 and 12, 2014, Proceedings, Part I, European Conference on Computer Vision, Springer International Publishing, Zurich, Switzerland, pp. 786-801.
Previous work on human action analysis mainly focuses on designing hand-crafted local features and combining their context information. In this paper, we propose supervised feature learning as a way to learn spatio-temporal features. More specifically, a modified hidden conditional random field is applied to learn two high-level features conditioned on a certain action label: individual features that describe the appearance of local parts, and interaction features that capture their spatial constraints. To make the best of what has been learned, a new categorization model is proposed for action matching. It is inspired by the Deformable Part Model, with the intuition that actions can be modeled by local features in a changeable spatial and temporal dependency. Experimental results show that our algorithm successfully recognizes human actions with high accuracy on both simple atomic-action databases (KTH and Weizmann) and a complex interaction-activity database (CASIA).
Huang, S, Zhang, J, Liu, X & Wang, L 2014, 'A method of discriminative information preservation and in-dimension distance minimization for feature selection', Proceedings - International Conference on Pattern Recognition, International Conference on Pattern Recognition, IEEE, Swedish Soc Automated Image Anal, Stockholm, Sweden, pp. 1615-1620.
Preserving samples' pairwise similarity is essential for feature selection. In supervised learning, labels can be used as a direct measure of whether two samples are similar to each other. In unsupervised learning, however, such similarity information is usually unavailable. In this paper, we propose a new feature selection method based on spectral clustering, using discriminative information as the underlying data structure. A Laplacian matrix is used to obtain more partitioning information than previously proposed structures such as the eigenspace of the original data. The high-dimensional sample data are projected into a low-dimensional space, and the in-dimension distance is also considered to obtain a more compact clustering result. The proposed method can be solved efficiently by updating the projection matrix and its inverse normalized diagonal matrix. A comprehensive experimental study demonstrates that the proposed method outperforms many state-of-the-art feature selection algorithms on criteria including clustering/classification accuracy and Jaccard score.
Peng, F, Wu, Q, Fan, L, Zhang, J, You, Y, Lu, J & Yang, J 2014, 'Street view cross-sourced point cloud matching and registration', Proceedings of the 21st IEEE International Conference on Image Processing, IEEE International Conference on Image Processing, IEEE, Paris, France, pp. 2026-2030.
Object registration has been widely discussed with the development of various range sensing technologies. In most work, however, the reference and target point clouds are generated by the same technology, such as a Kinect range camera, a LiDAR sensor, or the Structure from Motion (SfM) technique. Cases in which the reference and target point clouds are generated by different technologies are rarely discussed. Due to the significant differences across such point cloud data in terms of density, sensing noise, scale, occlusion, etc., object registration between them becomes extremely difficult. In this study, we address for the first time an even more challenging case in which the differently sourced point clouds are acquired from a real street view: one is generated from an image sequence through the SfM process, and the other is produced directly by a LiDAR system. We propose a two-stage matching and registration algorithm to achieve object registration between these two point clouds. Experiments based on real building point cloud data demonstrate the effectiveness and efficiency of the proposed solution, which can be further developed to support related applications such as Location-Based Services.
Wang, D, Yuan, C, Sun, Y, Zhang, J & Zhou, H 2014, 'Fast Mode and Depth Decision Algorithm for Intra Prediction of Quality SHVC', Intelligent Computing Theory, International Conference on Intelligent Computing, Springer International Publishing, Taiyuan, China, pp. 693-699.
Scalable High-Efficiency Video Coding (SHVC) is an extension of High Efficiency Video Coding (HEVC). Since the coding procedure for HEVC is already very complex, and that for SHVC even more so, it is important to improve its coding speed. In this paper, we propose a fast mode and depth decision algorithm for intra prediction in quality SHVC. First, only partial modes are checked to determine the local minimum points (LMPs), based on the relationship between the modes and their corresponding Hadamard costs (HCs); then only partial depths are checked, skipping depths with low likelihood as indicated by their inter-layer correlations and textural features. The experimental results show that the proposed algorithm improves coding speed by 61.31% on average with negligible coding efficiency loss.
Liu, X, Wang, L, Zhang, J & Yin, J 2014, 'Sample-adaptive Multiple Kernel Learning', Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, AAAI Publication, Québec, Canada, pp. 1975-1981.
Existing multiple kernel learning (MKL) algorithms indiscriminately apply the same set of kernel combination weights to all samples. However, the utility of base kernels can vary across samples: a base kernel useful for one sample may become noisy for another. In this case, rigidly applying the same set of kernel combination weights can adversely affect learning performance. To improve this situation, we propose a sample-adaptive MKL algorithm, in which base kernels are allowed to be adaptively switched on or off with respect to each sample. We achieve this goal by assigning a latent binary variable to each base kernel when it is applied to a sample. The kernel combination weights and the latent variables are jointly optimized via the margin maximization principle. As demonstrated on five benchmark data sets, the proposed algorithm consistently outperforms comparable algorithms in the literature.
Xu, J, Wu, Q, Zhang, J, Silk, B, Ngo, GT & Tang, Z 2014, 'Efficient People Counting With Limited Manual Interfaces', 2014 International Conference on Digital lmage Computing: Techniques and Applications (DlCTA), Digital Image Computing Techniques and Applications, IEEE, Wollongong, NSW, Australia.
People counting is a topic with various practical applications. Over the last decade, two general approaches have been proposed to tackle this problem: a) counting based on individual human detection; b) counting by measuring the regression relation between crowd density and the number of people. Because regression-based methods avoid explicit people detection, which faces several well-known challenges, they are considered robust, particularly in complicated environments. An efficient regression-based method is proposed in this paper, which can be readily adopted into any existing video surveillance system. It uses color-based segmentation to extract foreground regions in images, and regression is established between the foreground density and the number of people. The method is fast and can deal with lighting condition changes. Experiments on public datasets and one captured dataset have shown the effectiveness and robustness of the method.
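The regression stage of such a pipeline can be sketched as below. This is a minimal illustration under assumed names, with a simple background-difference mask standing in for the paper's colour-based segmentation:

```python
import numpy as np

def foreground_density(frame, background, thresh=30):
    """Fraction of pixels whose absolute difference from the background
    model exceeds `thresh` (a stand-in for colour-based segmentation)."""
    return np.mean(np.abs(frame.astype(float) - background.astype(float)) > thresh)

def fit_count_regressor(densities, counts):
    """Least-squares fit of count = a * density + b from annotated frames."""
    A = np.c_[densities, np.ones(len(densities))]
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(counts, float), rcond=None)
    return coeffs  # (a, b)

def predict_count(coeffs, density):
    """Apply the fitted linear model, clamped at zero."""
    a, b = coeffs
    return max(0.0, a * density + b)
```

At run time, each frame's foreground density is fed through `predict_count`, so no per-person detection is needed.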
Guo, D, Zhang, J, Liu, X, Cui, Y & Zhao, C 2014, 'Multiple Kernel Learning Based Multi-view Spectral Clustering', 2014 22nd International Conference on Pattern Recognition (ICPR), International Conference on Pattern Recognition, IEEE, Stockholm, Sweden, pp. 3774-3779.
For a given data set, exploring its multi-view instances under a clustering framework is a practical way to boost clustering performance. This is because each view may reflect only partial information about the data; furthermore, due to noise and other factors, exploring instances from different views enhances the mining of the real structure and feature information within the data set. In this paper, we propose a multiple kernel spectral clustering algorithm over the multi-view instances of a given data set. By combining kernel matrix learning and spectral clustering optimization into one framework, the algorithm determines the kernel weights and clusters the multi-view data simultaneously. We compare the proposed algorithm with recently published methods on real-world datasets to show its efficiency.
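The combine-then-embed step underlying such a method can be sketched as follows. This is a minimal illustration with fixed (not jointly learned) kernel weights; the names are illustrative, and the paper's alternating optimization of weights and clusters is not reproduced here:

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Weighted sum of base kernel matrices (weights normalised to sum to 1)."""
    w = np.asarray(weights, float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

def spectral_embedding(K, n_clusters):
    """Rows of the top eigenvectors of the symmetrically normalised kernel,
    used as the spectral representation for clustering."""
    d = K.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = D_inv_sqrt @ K @ D_inv_sqrt
    evals, evecs = np.linalg.eigh(L)          # ascending eigenvalues
    return evecs[:, -n_clusters:]             # keep the largest ones
```

A k-means step on the rows of the embedding would complete the clustering; in the joint framework, the kernel weights would then be re-estimated in alternation with this step.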
Wang, Y, Di, H, Wang, B, Liang, W, Zhang, J & Jia, Y 2014, 'Depth Super-Resolution by Fusing Depth Imaging and Stereo Vision with Structural Determinant Information Inference', 2014 22nd International Conference on Pattern Recognition (ICPR), International Conference on Pattern Recognition, IEEE, Stockholm, Sweden, pp. 4212-4217.
In this paper, we present a depth super-resolution framework that fuses depth imaging and stereo vision to obtain high-resolution and high-accuracy depth maps. Depth cameras and stereo vision each have their own limitations, but their range-sensing characteristics are complementary; thus, combining both approaches can produce more satisfactory results than either one alone. Unlike previous fusion methods, we initially take the noisy depth observation from the depth camera as prior information about the scene structure. This prior is also utilized to infer structural determinant information, such as depth discontinuity and occlusion, which is essential for improving the quality of the depth map in the fusion process. Subsequently, the prior knowledge helps to overcome difficulties of intensity inconsistency in the image observations from the stereo vision component. Experimental results demonstrate the effectiveness of the proposed framework.
Guo, D, Zhang, J, Xu, M, He, X, Li, M & Zhao, C 2014, 'A Multiple Features Distance Preserving (MFDP) Model for Saliency Detection', 2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA), IEEE, Wollongong.
Saliency plays a vital role in, and has been widely applied to, various image analysis tasks such as content-aware image retargeting, image retrieval, and object detection. It is generally accepted that saliency detection can benefit from the integration of multiple visual features. However, most of the existing literature fuses multiple features at the saliency-map level without considering cross-feature information, i.e., it generates a saliency map from several maps each computed from an individual feature. In this paper, we propose a Multiple Features Distance Preserving (MFDP) model that seamlessly integrates multiple visual features through an alternating optimization process. Our method outperforms state-of-the-art methods on saliency detection. The saliency detected by our method is further combined with a seam carving algorithm and significantly improves image retargeting performance.
Xu, W, Miao, Z, Zhang, J, Zhang, Q & Wu, H 2013, 'Spatial-Temporal context for action recognition combined with confidence and contribution weight', 2013 Second IAPR Asian Conference on Pattern Recognition (ACPR 2013), 2nd IAPR Asian Conference on Pattern Recognition (ACPR), IEEE Computer Society, Naha, Japan, pp. 576-580.
Wang, S, Zhang, J & Miao, Z 2013, 'A New Edge Feature for Head-Shoulder Detection', 2013 IEEE International Conference on Image Processing, IEEE International Conference on Image Processing, IEEE, Melbourne, Australia, pp. 2822-2826.
In this work, we introduce a new edge feature to improve head-shoulder detection performance. Since head-shoulder detection is highly vulnerable to vague contours, our new edge feature is designed to extract and enhance the head-shoulder contour while suppressing other contours. The basic idea is that the head-shoulder contour can be predicted by filtering the edge image with edge patterns, which are generated from edge fragments through a learning process. This edge feature, known as En-Contour, can significantly enhance object contours such as the human head and shoulders. To evaluate the performance of En-Contour, we combine it with HOG+LBP as HOG+LBP+En-Contour. HOG+LBP is the state-of-the-art feature in pedestrian detection, and because human head-shoulder detection is a special case of pedestrian detection, we also use it as our baseline. Our experiments indicate that the new feature significantly improves on HOG+LBP.
Xu, J, Wu, Q, Zhang, J, Shen, F & Tang, Z 2013, 'Training boosting-like algorithms with semi-supervised subspace learning', 2013 IEEE International Conference on Image Processing, IEEE International Conference on Image Processing, IEEE, Melbourne, Australia, pp. 4302-4306.
Boosting algorithms have attracted great attention since the first real-time face detector by Viola & Jones, which performs feature selection and strong-classifier learning simultaneously. Researchers have since proposed decoupling these two procedures to improve the performance of boosting algorithms. Motivated by this, we propose a boosting-like algorithm framework that embeds semi-supervised subspace learning methods. It selects weak classifiers based on class separability, and the combination weights of the selected weak classifiers are obtained by subspace learning. Three typical algorithms are proposed under this framework and evaluated on public data sets. As shown by our experimental results, the proposed methods obtain superior performance over their supervised counterparts and AdaBoost.
Kusakunniran, W, Satoh, S, Zhang, J & Wu, Q 2013, 'Attribute-based learning for large scale object classification', 2013 IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, San Jose, California, USA, pp. 1-6.
Scalability to large numbers of classes is an important challenge for multi-class classification. Prediction can become computationally infeasible at test time when every classifier trained for each individual class must be evaluated. This paper proposes an attribute-based learning method to overcome this limitation. The first step is to define attributes and their associations with object classes automatically and simultaneously; such associations are learned with a greedy strategy under certain conditions. The second step is to learn a classifier for each attribute instead of each class. These trained classifiers are then used to predict classes based on their attribute representations. The proposed method also allows a trade-off between test-time complexity (which grows linearly with the number of attributes) and accuracy. Experiments on the Animals-with-Attributes and ILSVRC2010 datasets show that the performance of our method is promising compared with the state-of-the-art.
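The second step, predicting a class from attribute-classifier outputs, can be illustrated with a toy direct-attribute-prediction style sketch. The class/attribute table and the Hamming-distance matching rule here are illustrative assumptions, not the paper's learned associations:

```python
import numpy as np

# Illustrative class-attribute association table
# (columns: striped / aquatic / four-legged / carnivore).
CLASSES = ["zebra", "whale", "tiger"]
ASSOC = np.array([
    [1, 0, 1, 0],   # zebra
    [0, 1, 0, 0],   # whale
    [1, 0, 1, 1],   # tiger
])

def predict_class(attr_scores, assoc=ASSOC, classes=CLASSES):
    """Binarise the per-attribute classifier scores and return the class
    whose attribute signature is closest in Hamming distance."""
    preds = (np.asarray(attr_scores) > 0.5).astype(int)
    dists = np.abs(assoc - preds).sum(axis=1)
    return classes[int(np.argmin(dists))]
```

Test-time cost scales with the number of attribute classifiers (four here) rather than with the number of classes, which is the source of the scalability gain.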
Wang, S, Miao, Z & Zhang, J 2013, 'Simultaneously detect and segment pedestrian', 2013 IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, San Jose, USA, pp. 1-4.
We present a framework to simultaneously detect and segment pedestrians in images. Our work is based on a part-based method. We first segment the image into superpixels, then assemble superpixels into body-part candidates by comparing the assembled shape with a pre-built template library; a structure-based shape matching algorithm is developed to measure shape similarity. All body-part candidates are input into our modified AND/OR graph to generate the most reasonable combination. The graph describes the possible variations of body configuration and models the constraint relationships between body parts. Comparison experiments on a public database show the effectiveness of our framework.
Song, Y, Zhang, J, Cao, L & Sangeux, M 2013, 'On Discovering the Correlated Relationship between Static and Dynamic Data in Clinical Gait Analysis', Lecture Notes in Computer Science, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, Prague, Czech Republic, pp. 563-578.
'Gait' is a person's manner of walking. Patients may have an abnormal gait due to a range of physical impairments or brain damage. Clinical gait analysis (CGA) is a technique for identifying the underlying impairments that affect a patient's gait pattern, and it is critical for treatment planning. Essentially, CGA tries to use a patient's physical examination results, known as static data, to interpret the dynamic characteristics of an abnormal gait, known as dynamic data. This process is carried out by gait analysis experts, mainly based on their experience, which may lead to subjective diagnoses. To facilitate the automation of this process and form a relatively objective diagnosis, this paper proposes a new probabilistic correlated static-dynamic model (CSDM) to discover correlated relationships between the dynamic characteristics of gait and their root causes in the static data space. We propose an EM-based algorithm to learn the parameters of the CSDM. One of the main advantages of the CSDM is its ability to provide intuitive knowledge: for example, it can describe what kinds of static data will lead to what kinds of hidden gait patterns in the form of a decision tree, which helps us to infer dynamic characteristics from static data. Our initial experiments indicate that the CSDM is promising for discovering the correlated relationship between physical examination (static) and gait (dynamic) data.
Zhang, J, Schonfeld, D & Feng, DD 2012, 'Message from ICME 2012 general chairs', Proceedings of the 2012 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2012.
ICME 2012 is the thirteenth in the series of ICME conferences, held annually since 2000 in various cities throughout the world. The success of this conference would not have been possible without the generous help of sponsors: paper prizes and student travel grants are sponsored by National Information and Communications Technology Australia (NICTA), Microsoft Research, IBM Research, Canon Information Systems Research Australia (CiSRA), and the Advanced Analytics Institute (AAI) at the University of Technology, Sydney (UTS). ICME 2012 features a new plenary session, Time Machine!, consisting of expert presentations that re-introduce ideas published 'before their time' whose impact has not yet been fully realized. ICME 2012 also offers outstanding keynote lectures and research overviews, as well as several paper prizes, including a Best Paper Award, Best Student Paper Award, and Best Demo Award.
Shen, Y, Miao, Z & Zhang, J 2012, 'Unsupervised Online Learning Trajectory Analysis Based on Weighted Directed Graph', 2012 21st International Conference on Pattern Recognition (ICPR), International Conference on Pattern Recognition, IEEE, Tsukuba, Japan, pp. 1306-1309.
In this paper, we propose a novel unsupervised online-learning trajectory analysis method based on a weighted directed graph. Each trajectory is represented as a sequence of key points. In the training stage, the unsupervised expectation-maximization (EM) algorithm is applied to the training data to cluster key points; each class is a Gaussian distribution and is treated as a node of the graph. According to the classification of key points, we build a weighted directed graph to represent the trajectory network in the scene, where each path is a category of trajectories. In the test stage, we adopt an online EM algorithm to classify trajectories and update the graph. In our experiments, the approach achieves good performance compared with state-of-the-art approaches.
Zhang, J, Lu, S, Mei, T, Wang, J, Wang, Z, Feng, D, Sun, J & Li, S 2012, 'Browse-to-search', ACM International Conference on Multimedia, ACM, Nara, Japan, pp. 1323-1324.
Zhang, J, Wu, Y, Lu, S, Mei, T & Li, S 2012, 'Local visual words coding for low bit rate mobile visual search', ACM International Conference on Multimedia, ACM, Nara, Japan, pp. 989-992.
Mobile visual search has attracted extensive attention for its huge potential in numerous applications. Research on this topic has focused on two schemes: sending query images, and sending compact descriptors extracted on mobile phones. The first scheme requires about 30-40 KB of data to transmit, while the second can reduce the bit rate by 10 times. In this paper, we propose a third scheme for extremely low-bit-rate mobile visual search, which sends compressed visual words consisting of a vocabulary tree histogram and descriptor orientations rather than the descriptors themselves. This scheme further reduces the bit rate with little extra computational cost on the client. Specifically, we store a vocabulary tree and extract visual descriptors on the mobile client. A light-weight pre-retrieval is performed to obtain the visited leaf nodes in the vocabulary tree; the orientation of each local descriptor and the tree histogram are then encoded and transmitted to the server. Our new scheme transmits less than 1 KB of data, which reduces the bit rate of the second scheme by 3 times, and obtains about 30% improvement in search accuracy over the traditional Bag-of-Words baseline. The time cost is only 1.5 s on the client and 240 ms on the server.
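The client-side step of quantizing descriptors through a vocabulary tree into a leaf histogram can be sketched as below. A toy two-level tree with hand-set centres stands in for one trained by hierarchical k-means over millions of descriptors; all names are illustrative:

```python
import numpy as np

class VocabularyTree:
    """Tiny hierarchical vocabulary tree with fixed cluster centres.
    `centres` is a list of levels; level i is an array of shape
    (n_nodes_at_level_i, branch_factor, dim)."""
    def __init__(self, centres):
        self.centres = centres

    def quantize(self, desc):
        """Descend the tree, picking the nearest child at each level,
        and return the visited leaf index."""
        node = 0
        for level in self.centres:
            children = level[node]                              # (b, dim)
            branch = int(np.argmin(((children - desc) ** 2).sum(axis=1)))
            node = node * level.shape[1] + branch
        return node

def leaf_histogram(tree, descriptors, n_leaves):
    """Count how many descriptors fall into each leaf (the tree histogram)."""
    hist = np.zeros(n_leaves, dtype=int)
    for d in descriptors:
        hist[tree.quantize(d)] += 1
    return hist
```

The resulting leaf histogram, together with the descriptor orientations, is the quantity that gets entropy-coded and transmitted to the server in the scheme above.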
Xu, J, Wu, Q, Zhang, J & Tang, Z 2013, 'Object Detection Based on Co-Occurrence GMuLBP Features', 2012 IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE Computer Society, pp. 943-948.
Image co-occurrence has shown great power in object classification because it captures the characteristics of individual features and the spatial relationships between them simultaneously. For example, the Co-occurrence Histogram of Oriented Gradients (CoHOG) has achieved great success on the human detection task. However, the gradient orientation in CoHOG is sensitive to noise. In addition, CoHOG does not take gradient magnitude into account, which is a key component for reinforcing feature detection. In this paper, we propose a new LBP feature detector based on image co-occurrence. Building on uniform Local Binary Patterns, the new feature detector detects co-occurrence orientation through gradient magnitude calculation; it is known as CoGMuLBP. An extended version of the CoGMuLBP is also presented. The experimental results on the UIUC car data set show that the proposed features outperform state-of-the-art methods.
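A minimal sketch of the two ingredients the abstract combines, uniform LBP codes and spatial co-occurrence, can look as follows (a generic illustration, not the authors' CoGMuLBP implementation; the gradient-magnitude weighting is omitted):

```python
import numpy as np

def lbp_codes(img):
    """8-neighbour LBP code for every interior pixel of a grayscale image."""
    img = np.asarray(img, dtype=float)
    c = img[1:-1, 1:-1]
    # neighbours visited in a fixed circular order -> one bit each
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(c.shape, dtype=int)
    for bit, (dy, dx) in enumerate(offs):
        n = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (n >= c).astype(int) << bit
    return code

def is_uniform(code):
    """'Uniform' patterns have at most 2 transitions in the circular bit string."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8)) <= 2

def cooccurrence(codes, dy=0, dx=1):
    """256x256 co-occurrence counts of LBP codes at offset (dy, dx), dy, dx >= 0."""
    h, w = codes.shape
    a = codes[:h - dy, :w - dx].ravel()
    b = codes[dy:, dx:].ravel()
    mat = np.zeros((256, 256))
    np.add.at(mat, (a, b), 1)
    return mat
```

Accumulating co-occurrence matrices over several offsets, restricted to uniform patterns, gives a feature in the same spirit as CoHOG but built on LBP codes.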
Quek, A, Wang, Z, Zhang, J & Feng, D 2011, 'Structural Image Classification with Graph Neural Networks', Proceedings of 2011 International Conference on Digital Image Computing - Techniques and Applications, Digital Image Computing Techniques and Applications, IEEE, Noosa, Queensland, Australia, pp. 416-421.
Many approaches to image classification tend to transform an image into an unstructured set of numeric feature vectors obtained globally and/or locally, and as a result lose important relational information between regions. In order to encode the geometric relationships between image regions, we propose a variety of structural image representations that are not specialised for any particular image category. Besides the traditional grid-partitioning and global segmentation methods, we investigate the use of local scale-invariant region detectors. Regions are connected based not only upon nearest-neighbour heuristics, but also upon minimum spanning trees and Delaunay triangulation. In order to maintain the topological and spatial relationships between regions, and also to effectively process undirected connections represented as graphs, we utilise the recently-proposed graph neural network model. To the best of our knowledge, this is the first utilisation of the model to process graph structures based on local-sampling techniques, for the task of image classification. Our experimental results demonstrate great potential for further work in this domain.
Kusakunniran, W, Wu, Q, Zhang, J & Li, H 2011, 'Pairwise Shape configuration-based PSA for gait recognition under small viewing angle change', 2011 8th IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS), IEEE International Conference on Video and Signal Based Surveillance (AVSS), IEEE, Klagenfurt, Austria, pp. 17-22.
Two main components of Procrustes Shape Analysis (PSA) are adopted and adapted specifically to address gait recognition under small viewing angle change: 1) Procrustes Mean Shape (PMS) for gait signature description; 2) Procrustes Distance (PD) for similarity measurement. Pairwise Shape Configuration (PSC) is proposed as a shape descriptor in place of the Centroid Shape Configuration (CSC) used in conventional PSA. PSC tolerates shape change caused by viewing angle change better than CSC. A small variation of viewing angle has a large impact only on global gait appearance; without a major impact on local spatio-temporal motion, PSC, which effectively embeds local shape information, can generate robust view-invariant gait features. To enhance gait recognition performance, a novel boundary re-sampling process is proposed. It provides only the necessary re-sampled points to the PSC description and, at the same time, efficiently solves the problems of boundary point correspondence, boundary normalization and boundary smoothness. This re-sampling process adopts prior knowledge of body pose structure. Comprehensive experiments are carried out on the CASIA gait database. The proposed method is shown to significantly improve the performance of gait recognition under small viewing angle change, without the additional requirements of supervised learning, a known viewing angle or a multi-camera system, when compared with other methods in the literature.
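For reference, the Procrustes Distance used by PSA-based gait methods such as this one can be sketched with the textbook full-Procrustes distance on 2-D boundary points (illustrative only; the paper's PSC descriptor and re-sampling are not shown):

```python
import numpy as np

def to_preshape(points):
    """Centre a 2-D boundary (as complex numbers) and scale it to unit norm."""
    z = points[:, 0] + 1j * points[:, 1]
    z = z - z.mean()
    return z / np.linalg.norm(z)

def procrustes_distance(a, b):
    """Full Procrustes distance: the residual after the best rotation alignment."""
    u, v = to_preshape(a), to_preshape(b)
    # |<u, v>| is the cosine of the angle between the shapes after optimal rotation
    return float(np.sqrt(max(0.0, 1.0 - abs(np.vdot(u, v)) ** 2)))
```

Because pre-shapes are centred and unit-normalised, the distance is invariant to translation, scale and rotation, which is what makes it attractive as a gait similarity measure.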
Zhang, J & Liu, X 2011, 'Active Learning for Human Action Recognition with Gaussian Processes', Proceedings of 2011 International Conference on Image Processing, IEEE International Conference on Image Processing, IEEE, Brussels, Belgium, pp. 3253-3256.
This paper presents an active learning approach for recognizing human actions in videos based on a combined multiple-kernel method. We design the classifier based on Multiple Kernel Learning (MKL) through Gaussian Process (GP) regression. This classifier is then trained in an active learning manner: in each iteration, one optimal sample is selected for interactive annotation and incorporated into the training set. The selection of the sample is based on the heuristic feedback of the GP classifier. To our knowledge, GP-regression MKL-based active learning methods have not yet been applied to human action recognition. We test this approach on standard benchmarks. It outperforms the state-of-the-art techniques in accuracy while requiring significantly fewer training samples.
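A toy numpy sketch of the two ideas combined here follows: GP regression over a fixed-weight sum of kernels (standing in for learned MKL weights) and selection of the pool sample with the highest predictive uncertainty. All kernel choices and values are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def rbf(X, Y, gamma):
    """Gaussian (RBF) kernel matrix between two point sets."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def combined_kernel(X, Y, gammas=(0.5, 2.0), weights=(0.5, 0.5)):
    """Fixed-weight sum of RBF kernels -- a stand-in for learned MKL weights."""
    return sum(w * rbf(X, Y, g) for w, g in zip(weights, gammas))

def gp_posterior(Xtr, ytr, Xte, noise=1e-2):
    """GP regression posterior mean and variance under the combined kernel."""
    K = combined_kernel(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks = combined_kernel(Xte, Xtr)
    Kss = combined_kernel(Xte, Xte)
    Kinv = np.linalg.inv(K)
    mean = Ks @ Kinv @ ytr
    var = np.diag(Kss - Ks @ Kinv @ Ks.T)
    return mean, var

def select_next(Xtr, ytr, Xpool):
    """Active-learning step: query the pool sample the GP is least certain about."""
    _, var = gp_posterior(Xtr, ytr, Xpool)
    return int(np.argmax(var))
```

Samples far from the labelled set get high posterior variance and are queried first, which is the basic mechanism behind uncertainty-driven active learning.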
Li, Z, Wu, Q, Zhang, J & Geers, G 2011, 'SKRWM based descriptor for pedestrian detection in thermal images', 2011 IEEE 13th International Workshop on Multimedia Signal Processing (MMSP), IEEE International Workshop on Multimedia Signal Processing, IEEE, Hangzhou, China, pp. 1-6.
Pedestrian detection in a thermal image is a difficult task due to intrinsic challenges: 1) low image resolution, 2) thermal noise, 3) polarity changes, and 4) lack of color, texture or depth information. To address these challenges, we propose a novel mid-level feature descriptor for pedestrian detection in the thermal domain, which combines the pixel-level Steering Kernel Regression Weights Matrix (SKRWM) with its corresponding covariances. SKRWM properly captures the local structure of pixels, while the covariance computation further provides the correlations among low-level features. This mid-level feature descriptor not only captures pixel-level data differences and the spatial differences of local structure, but also explores the correlations among low-level features. In the case of human detection, the proposed mid-level feature descriptor can discriminatively distinguish pedestrians from complex backgrounds. To test the performance of the proposed feature descriptor, a popular classifier framework based on Principal Component Analysis (PCA) and a Support Vector Machine (SVM) is also built. Overall, our experimental results show that the proposed approach overcomes the problems caused by background subtraction while attaining detection accuracy comparable to the state of the art.
Kusakunniran, W, Wu, Q, Zhang, J & Li, H 2011, 'Speed-invariant gait recognition based on Procrustes Shape Analysis using higher-order shape configuration', 2011 18th IEEE International Conference on Image Processing (ICIP), IEEE International Conference on Image Processing, IEEE, Brussels, Belgium, pp. 545-548.
Walking speed change is considered a typical challenge hindering reliable human gait recognition. This paper proposes a novel method to extract speed-invariant gait features based on Procrustes Shape Analysis (PSA). Two major components of PSA, i.e., Procrustes Mean Shape (PMS) and Procrustes Distance (PD), are adopted and adapted specifically for the purpose of speed-invariant gait recognition. One of our major contributions in this work is that, instead of using the conventional Centroid Shape Configuration (CSC), which is not suitable for describing individual gait when body shape changes, particularly due to a change of walking speed, we propose a new descriptor named Higher-order derivative Shape Configuration (HSC), which can generate robust speed-invariant gait features. From the first order to higher orders, derivative shape configurations contain gait shape information at different levels. Intuitively, a higher order of derivative is able to describe gait with shape change caused by a larger change of walking speed. Encouraging experimental results show that our proposed method is efficient for speed-invariant gait recognition and evidently outperforms other existing methods in the literature.
Paisitkriangkrai, S, Shen, C & Zhang, J 2010, 'Face detection with effective feature extraction', Computer Vision - ACCV 2010, Asian Conference on Computer Vision, SpringerLink, Queenstown, New Zealand, pp. 460-470.
There is an abundant literature on face detection due to its important role in many vision applications. Since Viola and Jones proposed the first real-time AdaBoost-based face detector, Haar-like features have been adopted as the method of choice for frontal face detection. In this work, we show that simple features other than Haar-like features can also be applied to train an effective face detector. Since a single feature is not discriminative enough to separate faces from difficult non-faces, we further improve the generalization performance of our simple features by introducing feature co-occurrences. We demonstrate that our proposed features yield a performance improvement compared to Haar-like features. In addition, our findings indicate that features play a crucial role in the ability of the system to generalize.
Zhang, J, Shen, C & Geers, G 2010, 'Proceedings - 2010 Digital Image Computing: Techniques and Applications, DICTA 2010: Preface', Proceedings - 2010 Digital Image Computing: Techniques and Applications, DICTA 2010.
Saesue, W, Chou, CT & Zhang, J 2010, 'Video quality prediction in the presence of MAC contention and wireless channel error', 2010 IEEE International Symposium on "A World of Wireless, Mobile and Multimedia Networks", WoWMoM 2010 - Digital Proceedings.
This paper proposes an integrated model to predict the quality of video, expressed in terms of the mean square error (MSE) of the received video frames, in an IEEE 802.11e wireless network. The proposed system takes into account contention at the MAC layer, wireless channel error, queueing at the MAC layer, parameters of different 802.11e access categories (ACs), and the video characteristics of different H.264 data partitions (DPs). To the best of the authors' knowledge, this is the first system that takes these network and video characteristics into consideration to predict video quality in an IEEE 802.11e network. The proposed system consists of two components. The first component predicts the packet loss rate of each H.264 data partition by using a multi-dimensional discrete-time Markov chain (DTMC) coupled to an M/G/1 queue. The second component uses these packet loss rates and the video characteristics to predict the MSE of each received video frame. We verify the accuracy of the combined system by using discrete event simulation and real H.264-coded video sequences.
Wang, L, Cheng, L, Thi, TH & Zhang, J 2010, 'Human Action Recognition from Boosted Pose Estimation', 2010 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing Techniques and Applications, IEEE, Sydney, NSW, pp. 308-313.
This paper presents a unified framework for recognizing human actions in video using human pose estimation. Due to the high variation of human appearance and noisy background context, accurate human pose analysis is hard to achieve and rarely employed for the task of action recognition. In our approach, we take advantage of the current success of human detection and the view invariance of local feature-based approaches to design a pose-based action recognition system. We begin with a frame-wise human detection step to initialize the search space for human local parts, then integrate the detected parts into a human kinematic structure using a tree-structured graphical model. The final human articulation configuration is eventually used to infer the action class being performed, based on each single part's behavior and the overall structure variation. In our work, we also show that even with imprecise pose estimation, accurate action recognition can still be achieved based on informative clues from the overall pose part configuration. The promising results obtained on an action recognition benchmark show that our proposed framework is comparable to existing state-of-the-art action recognition algorithms.
Kusakunniran, W, Wu, Q, Zhang, J & Li, H 2010, 'Multi-view Gait Recognition Based on Motion Regression using Multilayer Perceptron', Proceedings: 2010 20th International Conference on Pattern Recognition (ICPR 2010), International Conference on Pattern Recognition, IEEE Computer Society, Istanbul, Turkey, pp. 2186-2189.
It has been shown that gait is an efficient biometric feature for identifying a person at a distance. However, obtaining reliable gait features when the viewing angle changes is a challenging problem, because body appearance can differ under various viewing angles. In this paper, the problem above is formulated as a regression problem, where a novel View Transformation Model (VTM) is constructed by adopting a Multilayer Perceptron (MLP) as the regression tool. It smoothly estimates the gait feature under an unknown viewing angle based on motion information in a well-selected Region of Interest (ROI) under other existing viewing angles. Thus, this proposal can normalize gait features under various viewing angles into a common viewing angle before gait similarity measurement is carried out. Encouraging experimental results have been obtained on a widely adopted benchmark database.
Kusakunniran, W, Wu, Q, Zhang, J & Li, H 2010, 'Support Vector Regression for Multi-view Gait Recognition Based on Local Motion Feature Selection', 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, San Francisco, CA, USA, pp. 974-981.
Gait is a well-recognized biometric feature that is used to identify a human at a distance. However, in real environments, appearance changes of individuals due to viewing angle changes cause many difficulties for gait recognition. This paper re-formulates this problem as a regression problem. A novel solution is proposed to create a View Transformation Model (VTM) across different points of view using Support Vector Regression (SVR). To facilitate the regression process, a new method is proposed to seek a local Region of Interest (ROI) under one viewing angle for predicting the corresponding motion information under another viewing angle. Thus, the well-constructed VTM is able to transfer gait information under one viewing angle into another, achieving view-independent gait recognition: it normalizes gait features under various viewing angles into a common viewing angle before similarity measurement is carried out. Extensive experimental results on a widely adopted benchmark dataset demonstrate that the proposed algorithm achieves significantly better performance than existing methods in the literature.
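The view-transformation idea can be sketched in a few lines; here kernel ridge regression stands in for the paper's SVR to keep the sketch dependency-light, and the feature extraction and ROI selection steps are omitted:

```python
import numpy as np

def rbf(X, Y, gamma=5.0):
    """Gaussian kernel between the row vectors of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class ViewTransformationModel:
    """Maps gait features under view A to view B by kernel ridge regression
    (a stand-in for the paper's Support Vector Regression)."""

    def __init__(self, gamma=5.0, lam=1e-4):
        self.gamma, self.lam = gamma, lam

    def fit(self, feats_a, feats_b):
        self.Xtr = np.asarray(feats_a, dtype=float)
        K = rbf(self.Xtr, self.Xtr, self.gamma)
        # one set of dual weights per output dimension, solved jointly
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(K)), feats_b)
        return self

    def transform(self, feats_a):
        return rbf(np.asarray(feats_a, dtype=float), self.Xtr, self.gamma) @ self.alpha
```

Once fitted on paired features from two views, `transform` normalizes probe features into the gallery's viewing angle before similarity measurement.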
Li, Z, Zhang, J, Wu, Q & Geers, GD 2010, 'Feature Enhancement Using Gradient Salience on Thermal Image', Proceedings. 2010 Digital Image Computing: Techniques and Applications (DICTA 2010), Digital Image Computing: Techniques and Applications, IEEE Computer Society, Sydney, Australia, pp. 556-562.
Feature enhancement in an image reinforces extracted features so that they can be used for object classification and detection. As thermal images lack texture and color information, techniques for visual image feature enhancement are insufficient when applied to thermal images. In this paper, we propose a new gradient-based approach for feature enhancement in thermal images. We use the statistical properties of the gradients of foreground object profiles, and formulate object features with gradient saliency. Empirical evaluation of the proposed approach shows significantly improved performance on human contours, which can be used for detection and classification.
Saesue, W, Chou, C & Zhang, J 2010, 'Cross-layer QoS-optimized EDCA adaptation for wireless video streaming', Proceedings of 2010 IEEE 17th International Conference on Image Processing, IEEE International Conference on Image Processing, IEEE, Hong Kong, pp. 2925-2928.
In this paper, we propose an adaptive cross-layer technique that optimally enhances the QoS of wireless video transmission in an IEEE 802.11e WLAN. The optimization takes into account the unequal error protection characteristics of video streaming, the IE
Thi, T, Zhang, J, Cheng, L, Wang, L & Satoh, S 2010, 'Human action recognition and localization in video using structured learning of local space-time features', 2010 Seventh IEEE International Conference on Advanced Video and Signal Based Surveillance, Advanced Video and Signal Based Surveillance, IEEE, Boston, MA, pp. 204-211.
This paper presents a unified framework for human action classification and localization in video using structured learning of local space-time features. Each human action class is represented by its own compact set of local patches. In our appr
Thi, T, Cheng, L, Zhang, J & Wang, L 2010, 'Implicit motion-shape model: A generic approach for action matching', Proceedings of 2010 IEEE 17th International Conference on Image Processing, IEEE International Conference on Image Processing, IEEE, Hong Kong, pp. 1477-1480.
We develop a robust technique to find similar matches of human actions in video. Given a query video, Motion History Images (MHI) are constructed for consecutive keyframes. This is followed by dividing the MHI into local Motion-Shape regions, which allow
Wang, W, Zhang, J & Shen, C 2010, 'Improved human detection and classification in thermal images', Proceedings - International Conference on Image Processing, ICIP, IEEE International Conference on Image Processing, IEEE, Hong Kong, pp. 2313-2316.
We present a new method for detecting pedestrians in thermal images. The method is based on the Shape Context Descriptor (SCD) within an AdaBoost cascade classifier framework. Compared with standard optical images, thermal imaging cameras offer a clear advantage for night-time video surveillance, and thermal imagery is also robust to lighting changes in the daytime. Experiments show that shape context features with boosting classification provide a significant improvement on human detection in thermal images. In this work, we have also compared our proposed method with rectangular features on a public dataset of thermal imagery. Results show that shape context features perform much better than conventional rectangular features on this task.
Paisitkriangkrai, S, Mei, T, Zhang, J & Hua, X 2010, 'Scalable clip-based near-duplicate video detection with ordinal measure', CIVR 2010 - 2010 ACM International Conference on Image and Video Retrieval, ACM, Xi'an, pp. 121-128.
Detection of duplicate or near-duplicate videos in large-scale databases plays an important role in video search. In this paper, we analyze the problem of near-duplicate detection and propose a practical and effective solution for real-time large-scale v
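The ordinal measure named in the title is a classic signature for near-duplicate detection: partition each frame into blocks and keep only the rank order of the block mean intensities, which is invariant to global brightness and contrast changes. A generic sketch follows (illustrative, not the paper's exact clip-based pipeline):

```python
import numpy as np

def ordinal_signature(frame, grid=(3, 3)):
    """Rank order of block mean intensities; unchanged by any monotonic
    (e.g. brightness/contrast) transform applied to the whole frame."""
    h, w = frame.shape
    gh, gw = grid
    means = np.array([frame[i * h // gh:(i + 1) * h // gh,
                            j * w // gw:(j + 1) * w // gw].mean()
                      for i in range(gh) for j in range(gw)])
    return means.argsort().argsort()  # rank of each block

def ordinal_distance(sig_a, sig_b):
    """L1 distance between two rank signatures (0 means identical ordering)."""
    return int(np.abs(sig_a - sig_b).sum())
```

Comparing sequences of such per-frame signatures is cheap, which is what makes ordinal measures attractive for real-time, large-scale matching.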
Thi, T, Cheng, L, Zhang, J, Wang, L & Satoh, S 2010, 'Weakly supervised action recognition using implicit shape models', Proceedings - International Conference on Pattern Recognition, 2010 20th International Conference on Pattern Recognition, ICPR 2010, IEEE, Istanbul, pp. 3517-3520.
In this paper, we present a robust framework for action recognition in video, that is able to perform competitively against the state-of-the-art methods, yet does not rely on sophisticated background subtraction preprocess to remove background features.
Khan, A, Zhang, J & Wang, Y 2010, 'Appearance-based re-identification of people in video', Proceedings - 2010 Digital Image Computing: Techniques and Applications, DICTA 2010, pp. 357-362.
This paper introduces the topic of appearance-based re-identification of people in video. This work is based on the colour information of people's clothing. Most of the work described in the literature uses full-body histograms. This paper evaluates the histogram method and describes ways of including spatial colour information, proposing a colour-based appearance descriptor called the Colour Context People Descriptor. All the methods are evaluated extensively and the results are reported in the experiments. It is concluded that adding spatial colour information greatly improves the re-identification results.
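One simple way to add the spatial colour information the abstract argues for is to histogram each horizontal body stripe separately; the sketch below is a generic illustration of that idea, not the paper's Colour Context People Descriptor:

```python
import numpy as np

def stripe_histograms(hue_img, n_stripes=3, n_bins=16):
    """Concatenated per-stripe hue histograms (hue values in [0, 1]); horizontal
    stripes preserve coarse spatial layout (roughly head / torso / legs)."""
    h = hue_img.shape[0]
    hists = []
    for s in range(n_stripes):
        stripe = hue_img[s * h // n_stripes:(s + 1) * h // n_stripes]
        hist, _ = np.histogram(stripe, bins=n_bins, range=(0.0, 1.0))
        hists.append(hist)
    v = np.concatenate(hists).astype(float)
    return v / v.sum()

def bhattacharyya(p, q):
    """Bhattacharyya coefficient in [0, 1]; 1 for identical histograms."""
    return float(np.sqrt(p * q).sum())
```

Unlike a single full-body histogram, this descriptor distinguishes, say, a red shirt with blue trousers from a blue shirt with red trousers.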
Zhang, J, Paisitkriangkrai, S & Shen, C 2009, 'An overview of fast pedestrian detection: Feature selection and cascade framework of boosted features', Proceedings - 2009 IEEE International Conference on Multimedia and Expo, ICME 2009, pp. 1566-1567.
Efficiently and accurately detecting pedestrians plays a crucial role in many vision applications such as video surveillance, multimedia retrieval and smart cars. In order to find the right feature for this task, we first present a comprehensive experimental study on pedestrian detection using state-of-the-art locally extracted features. Building upon our findings, we propose a new, simpler pedestrian detection framework based on covariance features. We conduct feature selection and weak classifier training in the Euclidean space for faster computation. To this end, two machine learning algorithms have been designed: AdaBoost with weighted Fisher linear discriminant analysis (WLDA) based weak classifiers, and Greedy Sparse Linear Discriminant Analysis (GSLDA). To further accelerate detection, we employ a faster strategy, multiple cascade layers with heterogeneous features, to exploit the efficiency of the Haar-like features and the discriminative power of the covariance features. Experimental results on different datasets show that the new pedestrian detector is not only comparable to state-of-the-art pedestrian detectors but also runs at a faster speed.
Kusakunniran, W, Wu, Q, Li, H & Zhang, J 2009, 'Automatic gait recognition using weighted binary pattern on video', Proceedings of Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, Advanced Video and Signal Based Surveillance, IEEE Computer Society, Genoa, Italy, pp. 49-54.
Human identification by recognizing spontaneous gait recorded in real-world settings is a tough and not yet fully resolved problem in biometrics research. Several issues contribute to the difficulty of this task, including various poses, different clothes, moderate to large changes of normal walking manner due to carrying diverse goods while walking, and the uncertainty of the environments in which people are walking. In order to achieve better gait recognition, this paper proposes a new method based on the Weighted Binary Pattern (WBP). WBP first constructs a binary pattern from a sequence of aligned silhouettes. Then, an adaptive weighting technique is applied to discriminate the significance of the bits in gait signatures. Compared with most existing methods in the literature, this method can better deal with gait frequency, local spatial-temporal human pose features, and global body shape statistics. The proposed method is validated on several well-known benchmark databases. The extensive and encouraging experimental results show that the proposed algorithm achieves high accuracy with low complexity and computational time.
Kusakunniran, W, Wu, Q, Li, H & Zhang, J 2009, 'Multiple Views Gait Recognition using View Transformation Model Based on Optimized Gait Energy Image', Proceedings of 2009 IEEE 12th International Conference on Computer Vision Workshops, IEEE International Conference on Computer Vision Workshops, IEEE, Kyoto, Japan, pp. 1058-1064.
Gait is a well-recognized biometric that has been widely used for human identification. However, current gait recognition may have difficulties when the viewing angle changes, because the viewing angle under which the gait signature database was generated may not be the same as the viewing angle at which the probe data are obtained. This paper proposes a new multi-view gait recognition approach which tackles the problems mentioned above. Differing from other approaches in the same category, this new method creates a so-called View Transformation Model (VTM) based on the spatial-domain Gait Energy Image (GEI) by adopting the Singular Value Decomposition (SVD) technique. To further improve the performance of the proposed VTM, Linear Discriminant Analysis (LDA) is used to optimize the obtained GEI feature vectors. When implementing SVD there are a few practical problems, such as large matrix size and over-fitting; in this paper, reduced SVD is introduced to alleviate their effects. Using the generated VTM, the viewing angles of gallery gait data and probe gait data can be transformed into the same direction, so gait signatures can be measured without difficulty. Extensive experiments show that the proposed algorithm can significantly improve multi-view gait recognition performance when compared with similar methods in the literature.
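Two building blocks named here, the Gait Energy Image and the reduced SVD, are straightforward to sketch (illustrative only; the paper's VTM construction and LDA step are omitted):

```python
import numpy as np

def gait_energy_image(silhouettes):
    """GEI: pixel-wise mean of size-normalised binary silhouettes over a gait cycle."""
    return np.asarray(silhouettes, dtype=float).mean(axis=0)

def reduced_svd_basis(gei_vectors, k):
    """Top-k right singular vectors of the centred data as a compact basis,
    avoiding the full decomposition of a large matrix."""
    X = np.asarray(gei_vectors, dtype=float)  # rows = flattened GEIs
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[:k]                             # (k, d) projection basis
```

Projecting flattened GEIs onto such a truncated basis is what keeps the matrix sizes manageable and limits over-fitting, as the abstract notes.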
Kusakunniran, W, Li, H & Zhang, J 2009, 'A direct method to self-calibrate a surveillance camera by observing a walking pedestrian', 2009 Digital Image Computing: Techniques and Applications, Digital Image Computing Techniques and Applications, IEEE, Melbourne, VIC, pp. 250-255.
Recent efforts show that it is possible to calibrate a surveillance camera simply from observing a walking human. This procedure can be seen as a special application of the camera self-calibration technique. Several methods have been proposed along this
Wang, W, Shen, C, Zhang, J & Paisitkriangkrai, S 2009, 'A two-layer night-time vehicle detector', 2009 Digital Image Computing: Techniques and Applications, Digital Image Computing Techniques and Applications, IEEE, Melbourne, VIC, pp. 162-167.
We present a two-layer night time vehicle detector in this work. At the first layer, vehicle headlight detection [1, 2, 3] is applied to find areas (bounding boxes) where the possible pairs of headlights locate in the image, the Haar feature based AdaBoo
Paisitkriangkrai, S, Shen, C & Zhang, J 2009, 'Efficiently training a better visual detector with sparse eigenvectors', 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2009, IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Miami, FL, pp. 1129-1136.
Face detection plays an important role in many vision applications. Since Viola and Jones proposed the first real-time AdaBoost-based object detection system, much effort has been spent on improving the boosting method. In this work, we first show
Thi, T, Lu, S, Zhang, J, Cheng, L & Wang, L 2009, 'Human body articulation for action recognition in video sequences', 6th IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2009, IEEE International Conference on Video and Signal Based Surveillance (AVSS), IEEE, Genova, pp. 92-97.
This paper presents a new technique for action recognition in video using a human body part-based approach, combining both local feature description of each body part and the global graphical model structure of the human action. The human body is divided into
Smith, D, Hanlen, L, Zhang, JA, Miniutti, D, Rodda, D & Gilbert, B 2009, 'Characterization of the Dynamic Narrowband On-Body to Off-Body Area Channel', 2009 IEEE International Conference on Communications, Vols 1-8, IEEE International Conference on Communications (ICC 2009), IEEE, Dresden, Germany, pp. 4207+.
Feng, D, Sikora, T, Siu, WC, Zhang, J, Guan, L & Dugelay, JL 2008, 'Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008: Preface', Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008.
Shen, C, Paisitkriangkrai, S & Zhang, J 2008, 'Face Detection from Few Training Examples', 2008 15th IEEE International Conference on Image Processing, Vols 1-5, 15th IEEE International Conference on Image Processing (ICIP 2008), IEEE, San Diego, CA, pp. 2764-2767.
Paisitkriangkrai, S, Shen, C & Zhang, J 2008, 'Real-time Pedestrian Detection Using a Boosted Multi-layer Classifier', The Eighth International Workshop on Visual Surveillance, in conjunction with the European Conference on Computer Vision (ECCV'08), 2008, IEEE International Workshop on Visual Surveillance, Institute of Electrical and Electronics Engineers, Marseille, France.
Techniques for detecting pedestrians in still images have attracted considerable research interest due to their wide applications such as video surveillance and intelligent transportation systems. In this paper, we propose a novel, simpler pedestrian detector using state-of-the-art locally extracted features, namely, covariance features. Covariance features were originally proposed in [1, 2]. Unlike the earlier work, where feature selection and weak classifier training are performed on the Riemannian manifold, we select features and train weak classifiers in the Euclidean space for faster computation. To this end, AdaBoost with weighted Fisher linear discriminant analysis based weak classifiers is adopted. Multiple-layer boosting with heterogeneous features is constructed to exploit the efficiency of the Haar-like feature and the discriminative power of the covariance feature simultaneously. Extensive experiments show that by combining the Haar-like and covariance features, we speed up the original covariance feature detector by up to an order of magnitude in processing time without compromising the detection performance. For the first time, the proposed work enables covariance feature based pedestrian detection to work in real time.
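The region covariance descriptor this work builds on can be sketched directly: collect a small feature vector at every pixel of a region and describe the region by the covariance of those vectors. The sketch below uses one common pixel-feature choice, not the detector's exact feature set:

```python
import numpy as np

def region_covariance(img, top, left, h, w):
    """Covariance of per-pixel features (x, y, I, |Ix|, |Iy|) over a rectangle."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)  # gradients along rows (y) and columns (x)
    ys, xs = np.mgrid[top:top + h, left:left + w]
    win = (slice(top, top + h), slice(left, left + w))
    F = np.stack([xs, ys, img[win], np.abs(gx[win]), np.abs(gy[win])],
                 axis=-1).reshape(-1, 5)
    return np.cov(F, rowvar=False)  # 5x5 symmetric descriptor
```

The resulting small symmetric matrix summarizes both the feature values and their correlations inside the region; the Euclidean-space simplification described above avoids the Riemannian geometry these matrices naturally live in.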
Ong, C, Lu, S & Zhang, J 2008, 'An approach for enhancing the results of detecting foreground objects and their moving shadows in surveillance video', Digital Image Computing: Techniques and Applications, Digital Image Computing Techniques and Applications, IEEE, Canberra, ACT, pp. 242-249.
Automated surveillance systems are becoming increasingly important, especially in the fields of computer vision and video processing. This paper describes a novel approach for improving the results of detecting foreground objects and their shadows in indoor
Paisitkriangkrai, S, Shen, C & Zhang, J 2008, 'An experimental study on pedestrian classification using local features', Proceedings - IEEE International Symposium on Circuits and Systems, IEEE International Symposium on Circuits and Systems, IEEE, Seattle, WA, pp. 2741-2744.
This paper presents an experimental study on pedestrian detection using state-of-the-art local feature extraction and support vector machine (SVM) classifiers. The performance of pedestrian detection using region covariance, histogram of oriented gradients…
Yu, J, Zhang, J, Sun, W, Yuan, L & Peng, G 2008, 'Crosstalk analysis of a smart sensor unit based on FBG and FOWLI', Proceedings of SPIE - The International Society for Optical Engineering, 19th International Conference on Optical Fibre Sensors, NA, Perth, WA, pp. 0-0.
The effective optical path method is proposed to analyze the measurement crosstalk of a smart fiber-optic sensor unit based on multiplexing fiber Bragg gratings (FBG) and fiber-optic white-light interferometry (FOWLI). According to the analysis, the cross…
Luo, C, Cai, X & Zhang, J 2008, 'GATE: A novel robust object tracking method using the particle filtering and level set method', Digital Image Computing: Techniques and Applications, Digital Image Computing Techniques and Applications, IEEE, Canberra, ACT, pp. 378-385.
This paper presents a novel algorithm for robust object tracking that combines the particle filtering method from recursive Bayesian estimation with image segmentation and optimisation techniques from active contour models and level set methods.
Saesue, W, Zhang, J & Chun, T 2008, 'Hybrid frame-recursive block-based distortion estimation model for wireless video transmission', Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008, IEEE International Workshop on Multimedia Signal Processing, IEEE, Cairns, QLD, pp. 774-779.
In wireless environments, video quality can be severely degraded by channel errors. Improving error robustness against packet loss in error-prone networks is a critical concern in wireless video networking research. Data pa…
Luo, C, Cai, X & Zhang, J 2008, 'Robust object tracking using the particle filtering and level set methods: A comparative experiment', Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008, IEEE International Workshop on Multimedia Signal Processing, IEEE, Cairns, QLD, pp. 359-364.
Robust visual tracking has become an important research topic in computer vision. A novel method for robust object tracking, GATE, improves object tracking in complex environments using particle filtering and the level set-based active contour…
Thi, T, Lu, S & Zhang, J 2008, 'Self-calibration of traffic surveillance camera using motion tracking', Proceedings of the 11th International IEEE Conference on Intelligent Transportation Systems, IEEE Conference on Intelligent Transportation Systems, IEEE, Beijing, China, pp. 304-309.
A statistical and computer vision approach that uses tracked moving-vehicle shapes to auto-calibrate traffic surveillance cameras is presented. The vanishing point of the traffic direction is obtained from linear regression over all tracked vehicle points. Pre…
Thi, T, Robert, K, Lu, S & Zhang, J 2008, 'Vehicle classification at nighttime using eigenspaces and support vector machine', 2008 Congress on Image and Signal Processing, International Congress on Image and Signal Processing (CISP), IEEE, Sanya, Hainan, pp. 422-426.
A robust framework to classify vehicles in nighttime traffic using vehicle eigenspaces and a support vector machine is presented. In this paper, a systematic approach is proposed and implemented to classify vehicles from roadside camera video sequences…
Xu, J, Ye, G & Zhang, H 2007, 'Long-term trajectory extraction for moving vehicles', 2007 IEEE NINTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 9th IEEE Workshop on Multimedia Signal Processing, IEEE, Chania, GREECE, pp. 223-226.
Yang, J & Zhang, J 2007, 'Offline swimmer cap tracking using trajectory interpolation', Proceedings - Digital Image Computing Techniques and Applications: 9th Biennial Conference of the Australian Pattern Recognition Society, DICTA 2007, pp. 579-585.
In this paper, we present a preliminary attempt to solve the difficult problem of tracking a swimmer's cap in swimming videos to facilitate swimmer performance assessment. Due to the great challenges posed by a moving camera and severe figure-background occlusions, an offline approach based on trajectory interpolation is adopted. Firstly, each frame is searched for hypothesized positions of the target cap using mean shift mode seeking. Secondly, most outliers due to ambiguities and noise are eliminated using lane constraints, and the hypotheses in the space-time volume are clustered into trajectory segments based on a spatial and temporal closeness criterion. Finally, cubic spline trajectory interpolation is used to infer the target cap position in occluded frames. Experiments show that satisfactory tracking results are achieved by our approach. © 2007 IEEE.
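The final step above, inferring positions in occluded frames from the surrounding trajectory segments, can be sketched with off-the-shelf cubic splines. This is an illustrative toy example, not the paper's pipeline; the frame indices and coordinates are invented for demonstration:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Detected cap positions per frame; frames 4-6 are occluded and missing.
frames = np.array([0, 1, 2, 3, 7, 8, 9])
xs = np.array([10.0, 12.1, 14.3, 16.2, 24.5, 26.4, 28.6])
ys = np.array([5.0, 5.2, 5.1, 5.3, 5.6, 5.5, 5.7])

# Fit one spline per coordinate over the observed frames.
spline_x, spline_y = CubicSpline(frames, xs), CubicSpline(frames, ys)

# Interpolate the occluded frames from the spline.
occluded = np.array([4, 5, 6])
filled = np.column_stack([spline_x(occluded), spline_y(occluded)])
```

Because a cubic spline passes exactly through its knots, observed detections are preserved while the occluded gap is filled with a smooth curve consistent with the motion on both sides.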
Lu, S, Zhang, J & Feng, D 2007, 'An efficient method for detecting ghost and left objects in surveillance video', 2007 IEEE Conference on Advanced Video and Signal Based Surveillance, AVSS 2007 Proceedings, 2007 IEEE Conference on Advanced Video and Signal Based Surveillance, AVSS 2007, NA, London, pp. 540-545.
This paper proposes an efficient method for detecting ghost and left objects in surveillance video, which, if not identified, may lead to errors or wasted computation in background modeling and object tracking in surveillance systems. This method contains…
Paisitkriangkrai, S, Shen, C & Zhang, J 2007, 'An experimental evaluation of local features for pedestrian classification', Proceedings - Digital Image Computing Techniques and Applications: 9th Biennial Conference of the Australian Pattern Recognition Society, DICTA 2007, Australian Pattern Recognition Society (APRS), NA, Glenelg, SA, pp. 53-60.
The ability to detect pedestrians is an important first step in many computer vision applications such as video surveillance. This paper presents an experimental study on pedestrian detection using state-of-the-art local feature extraction and support vector…
Luo, L, Zhang, J & Shi, Z 2007, 'Novel Block-Interleaved Multi-Code CDMA System for UWB Communications', Ultra-Wideband, 2007. ICUWB 2007. IEEE International Conference on, IEEE, pp. 648-652.
Zhang, J, Luo, L & Shi, Z 2007, 'Quadrature OFDMA systems', Global Telecommunications Conference, 2007. GLOBECOM'07. IEEE, IEEE, pp. 3734-3739.
Lu, S, Zhang, J & Feng, D 2006, 'A knowledge-based approach for detecting unattended packages in surveillance video', Proceedings - IEEE International Conference on Video and Signal Based Surveillance 2006, AVSS 2006, IEEE International Conference on Video and Signal Based Surveillance 2006, AVSS 2006, NA, Sydney, NSW, pp. 0-0.
This paper describes a novel approach for detecting unattended packages in surveillance video. Unlike the traditional approach of just detecting stationary objects in monitored scenes, our approach detects unattended packages based on accumulated knowledge…
Chen, J, Shen, J, Zhang, J & Wangsa, K 2006, 'A novel multimedia database system for efficient image/video retrieval based on hybrid-tree structure', Proceedings of the 2006 International Conference on Machine Learning and Cybernetics, 2006 International Conference on Machine Learning and Cybernetics, NA, Dalian, pp. 4353-4358.
With recent advances in computer vision, image processing and analysis, retrieval based on visual content has become a key component in achieving high-efficiency image queries over large multimedia databases. In this paper, we propose and develop…
Mathew, R, Yu, Z & Zhang, J 2006, 'Detecting new stable objects in surveillance video', 2005 IEEE 7th Workshop on Multimedia Signal Processing, 2005 IEEE 7th Workshop on Multimedia Signal Processing, MMSP 2005, NA, Shanghai, pp. 0-0.
We describe a novel method to detect new stable objects in video, i.e., new objects that appear in a scene and remain stationary for a period of time, such as a dropped bag or a parked car. Our method utilizes the sta…
Chen, Y, Zhang, J & Jayalath, ADS 2006, 'Multiband-ofdm uwb vs ieee802. 11n: system level design considerations', Vehicular Technology Conference, 2006. VTC 2006-Spring. IEEE 63rd, IEEE, pp. 1972-1976.
Lu, S, Zhang, J & Feng, D 2005, 'Classification of moving humans using eigen-features and support vector machines', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11th International Conference on Computer Analysis of Images and Patterns, CAIP 2005, NA, Versailles, pp. 522-529.
This paper describes a method of categorizing moving objects using eigen-features and support vector machines. Eigen-features, generally used in face recognition and static image classification, are applied to classify the moving objects detected from…
Yu, Z & Zhang, J 2004, 'Video deblocking with fine-grained scalable complexity for embedded mobile computing', 2004 7th International Conference on Signal Processing Proceedings, ICSP, pp. 1175-1180.
This paper addresses the need to reduce blocking artifacts after video decompression in embedded mobile computing devices such as mobile phones and PDAs with limited computational capability, where low bit rate coding is usually employed and video deblocking is highly desirable. A novel video deblocking method has been developed which consists of two steps: deblocking mode decision and deblock filtering. Blocking artifacts are detected by examining the values of several adjacent pixels. Depending on the degree of blocking artifacts, a filter mode and a corresponding filtering center are determined for a region of pixels. The deblocking filter is chosen from five candidate types, including variable-center filters and non-symmetric filters. Extensive experiments show that the proposed algorithm achieves both lower computational complexity and better visual quality compared to the MPEG-4 VM. Furthermore, targeting embedded mobile computing platforms, a scheme is developed to dynamically scale the complexity (and hence power consumption) of the deblocking algorithm with graceful visual quality degradation.
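The two-step structure described above, a mode decision followed by mode-dependent filtering, can be illustrated with a toy 1-D boundary filter. This is a generic sketch of the idea, not the paper's five filter types; the thresholds and filter taps here are invented for illustration:

```python
import numpy as np

def deblock_boundary(row, edge, strong_thresh=4, weak_thresh=12):
    """Toy mode decision for one 1-D block boundary at index `edge`:
    measure the step across the boundary, pick a filter strength,
    then smooth the two pixels straddling the edge."""
    p1, p0, q0, q1 = row[edge - 2:edge + 2]
    step = abs(q0 - p0)
    if step <= strong_thresh:            # small step: smooth aggressively
        mode = 'strong'
        row[edge - 1] = (2 * p1 + p0 + q0) / 4.0
        row[edge] = (p0 + q0 + 2 * q1) / 4.0
    elif step <= weak_thresh:            # moderate step: mild smoothing
        mode = 'weak'
        row[edge - 1] = (3 * p0 + q0) / 4.0
        row[edge] = (p0 + 3 * q0) / 4.0
    else:                                # large step: likely a real edge,
        mode = 'none'                    # so leave the pixels untouched
    return mode

# An 8-pixel row with a block boundary between indices 3 and 4.
row = np.array([100., 100., 100., 100., 108., 108., 108., 108.])
mode = deblock_boundary(row, edge=4)
```

Because the mode decision runs before any filtering, a complexity-scaling scheme like the one in the abstract could simply skip the filtering step for a growing fraction of boundaries as the power budget shrinks.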
Zhang, J, Kennedy, RA & Abhayapala, TD 2004, 'Cramer-Rao lower bounds for the time delay estimation of UWB signals', Communications, 2004 IEEE International Conference on, IEEE, pp. 3424-3428.
Zhang, J, Arnold, JF, Frater, MR & Pickering, MR 1997, 'Video error concealment using decoder motion vector estimation', IEEE TENCON'97 - IEEE REGIONAL 10 ANNUAL CONFERENCE, PROCEEDINGS, VOLS 1 AND 2, IEEE Region 10 Annual Conference on Speech and Image Technologies for Computing and Telecommunications (IEEE TENCON 97), IEEE, QUEENSLAND UNIV TECHNOL, BRISBANE, AUSTRALIA, pp. 777-780.
UTS Distinguished Visiting Scholars (DVS) Scheme (international)
Over the last seven years, A/Prof Jian Zhang has succeeded in the competitive selection process of the UTS DVS scheme, inviting four world-renowned professors and distinguished researchers to UTS for short visits, including:
- 2012: Professor Dan Schonfeld, IEEE Fellow, from University of Illinois at Chicago. Previous Editor-in-Chief of IEEE Transactions on Circuits & Systems for Video Technology
- 2013: Dr. Zhengyou Zhang, IEEE Fellow and ACM Fellow, from Microsoft Research US. He is a recipient of the 2013 Helmholtz Test of Time Award from the International Conference on Computer Vision (ICCV) and a world-class researcher in computer vision.
- 2017: Professor Jiebo Luo, IEEE Fellow, from University of Rochester. He was with Kodak Research for more than 15 years and is now a leading researcher in social multimedia research.
- 2018: Professor Ming Lin, IEEE Fellow and ACM Fellow, Chair Professor in the Department of Computer Science at the University of Maryland, College Park, and a world-class researcher in computer graphics.
UTS Key Technology Partner (KTP) Scheme (international)
In 2015, A/Prof Jian Zhang succeeded in the competitive selection process of the UTS KTP scheme to invite two KTP visitors to UTS:
- Professor Yao Lu from School of Computer Science and Technology, Beijing Institute of Technology (BIT)
- Professor Ping An from School of Communication & Info. Engineering at Shanghai University.
Both professors brought their research experience to UTS, and with their help we have recruited four dual-PhD students from BIT and Shanghai University (SHU).
Invited KTP visitors through BIT/UTS and SHU/UTS schemes (international)
As a research leader, Jian was invited to be a KTP visitor at the Beijing Institute of Technology in January 2015 and at Shanghai University in June 2015, in recognition of his collaboration with BIT and SHU.