Associate Professor Jian Zhang joined the University of Technology Sydney (UTS) in 2011 and is now with the School of Computing and Communication. Before that, he was a Principal Researcher with Data61 (formerly NICTA) and a Conjoint Associate Professor in the School of Computer Science & Engineering at the University of New South Wales.
A/Prof Zhang was with the Advanced Analytics Institute (AAi) and now leads the Multimedia and Data Analytics research lab in the Global Big Data Technologies Centre (GBDTC) at UTS, Sydney.
A/Prof Zhang has published over 120 papers in top journals and refereed conference proceedings. He has actively engaged in research collaboration with industry labs, supervised PhD students, and developed new multimedia analytics courses. Since 2011, as a leading chief investigator at UTS, he has led more than 10 research projects with industry labs, with a total value of over one million Australian dollars. These industry partners include Microsoft Research, Nokia Research Centre, Toshiba Tec and Huawei Technologies, located in the US, Finland, Japan, Australia and China. In addition to his papers and book chapters, he has co-authored more than ten patents filed in the US, UK, Japan and Australia, including six issued US patents and one Chinese patent. Detailed information, including scholarship and postdoc opportunities, can be found on his Personal Webpage.
Research Interest Areas:
· Image Processing & Computer Vision in Video Surveillance
· Pattern Recognition & Data Analytics
· Multimedia and Social Media Signal Processing
· Large Scale Image and Video Content Analysis
· Multimedia Information Retrieval
Prospective students may find information on scholarships funded by my research projects, the university and the Australian government here.
A/Prof Jian Zhang is a Senior Member of the IEEE, affiliated with its Circuits and Systems and Signal Processing Societies, and a member of the IEEE Signal Processing Society Technical Directions Board. He has been a conference keynote speaker, has been actively involved as general and technical program chair of IEEE conferences, and has served since 2006 as an associate editor on the editorial boards of the IEEE Transactions on Multimedia and the IEEE Transactions on Circuits and Systems for Video Technology. A/Prof Jian Zhang's key contributions and services are as follows:
1. Chairs of International Conferences and Professional Activities (selected key positions)
- Leading General Co-chair of 2012 IEEE International Conference on Multimedia and Expo in Melbourne (ICME12)
- Technical Co-Chair of the 2008 IEEE Multimedia Signal Processing Workshop (MMSP08)
- General Co-Chair of 2010 Digital Image Computing: Techniques and Applications (DICTA2010)
- Technical Program Co-chair of the 2014 IEEE Int. Conf. on Visual Communications and Image Processing (VCIP14)
- General Co-chair of the 2019 IEEE Int. Conf. on Visual Communications and Image Processing (VCIP19)
- Technical Program Co-chair of 2020 IEEE International Conference on Multimedia and Expo (ICME20) in London
2. IEEE Journal editorial boards
- Associate Editor of IEEE Transactions on Multimedia since 2017 (top 25% JCR Q1 rank)
- Associate Editor of IEEE Transactions on Circuits & Systems for Video Technology (top 25% JCR Q1 rank) 2006 – 2015
- Guest Editor of Computer Vision and Image Understanding for Special Issue (March 2016)
In 2019, A/Prof Jian Zhang was elected as a member of the IEEE SPS Technical Directions Board, an honour that involves helping to guide the technical direction of the IEEE Signal Processing Society. He also serves, and has served, on the Technical Committees of the IEEE SPS and IEEE CASS for Multimedia Signal Processing, Multimedia Systems, and Visual Signal Processing & Communications.
Can supervise: YES
I have two full PhD scholarships to fund outstanding PhD candidates in the following areas:
- Image processing & pattern recognition
- Data analytics and multimedia information retrieval
- Social multimedia signal processing
- 3D Computer vision
- Surveillance video content analysis
- Multimedia and new media Analytics
For international students, in addition to a living allowance, the scholarship also includes a tuition fee waiver. Please contact me for details.
Funded Research Projects:
Funded projects for which I am the leading chief investigator:
"Robust Automated Video Surveillance & Monitoring in Dynamic Scenes", National ICT Australia (NICTA) - Defence Science and Technology Organisation (DSTO) joint project award, $59,000 (2008-2011)
“Visual information enhanced online video search engines”, Microsoft Research Asia (MSRA) funded project, $53,000, (2011)
“Advancing 3D deformable surface reconstruction and tracking through RGB-D cameras” Microsoft Research funded project, $110,000 (2012-2014)
“Human detection in local residential area”, Industry Lab funded research project, $70,000 (2012-2013)
“Virtual Clothing fitting on Mobile”, Industry Lab funded project, $90,000 (2013-2014)
"3D image Content Processing", Industry Lab funded project, $70,000 (2013-2016)
"Safety video surveillance in a mining environment", Industry and Australian Government funded Project (Innovation Connection), $185,000 (2015-2017)
"Human Action Recognition in Shopping Area", Industry Lab funded project, $330,000 (2015-2016)
"Data Analytics and Computer Vision", Industry Lab funded project, $270,000 (2017-2020)
"Automated Sheep Counting in the Live Export Industry", Industry funded project, $270,000 (2018-2020)
"Trusted Fish Provenance and Quality Tracking System”, Industry funded project, $660,000 (2019-2021)
"Deep Learning Based Fish Species Recogniton, Size Estimation, and Fresshness Measurement, Industry funded project, $150,000 (2019 - 2022)
My Research Students (as the Principal Supervisor)
- Muming Zhao – PhD student (Submitted her PhD Thesis in July 2019)
- Zhibin Li – PhD student
- Yongshen Gong – PhD student
- Lina Li – PhD student
- Huaxi Huang – PhD student
- Lu Zhang – PhD student
- Anan Du – PhD student
- Lingyiang Yao – PhD Student
- Jialiang Shen – Master student (Submitted her MSc Thesis in July 2019)
Six completed UTS PhD students in my role as a principal supervisor since 2012:
- Dr. Yucheng Wang – PhD degree (completed in August 2017). He is currently working with Baidu – the search engine company in China.
- Dr. Shangrong Huang – PhD degree (completed in August 2017). He is now a lecturer at the School of Computer Science in Hunan University, China.
- Dr. Hao Cheng – PhD degree (completed in August 2018). He is now with Fairfax Media in the R&D department, Sydney, Australia.
- Dr. Yazhou Yao – PhD degree (completed in May 2019). He is now a professor at the School of Computer Science and Engineering, Nanjing University of Science and Technology, China.
- Dr. Xiaoshui Huang – PhD degree (completed in May 2019). He is now a Post-doc research fellow with the School of Electrical & Data Engineering, University of Technology Sydney, Australia
- Dr. Junjie Zhang – PhD degree (Final thesis submitted in July 2019). He is now a Post-doc research fellow with the School of Computer Science, The University of Adelaide, Australia
Three completed PhD students as a principal supervisor at UNSW/Data61:
- Dr. Worapan Kusakunniran, PhD degree (completed in 2013). He is now an associate professor in the Faculty of Information and Communication Technology at Mahidol University, Thailand
- Dr. Tuan Hue Thi – PhD degree (completed in 2012). He was a research engineer with Canon Information Systems Research Australia (CiSRA) and later joined Placemeter, a US start-up.
- Dr. Sakrapee (Paul) Paisitkriangkrai – PhD degree (completed in 2012). He spent five years with the Department of Computer Science at The University of Adelaide and is currently working with the Australian Taxation Office as a data miner.
Two completed PhD students as a joint supervisor at the University of Sydney:
- Dr. Shijun Lu – PhD degree (completed in 2012). He joined the Australian Defence Force as a software engineer in 2013.
- Dr. Shiyang Lu – PhD degree (completed in 2014). He is currently working with the Commonwealth Bank of Australia as a data scientist.
Courses Taught:
48450 Real-time Operating Systems
31338 Network Servers
32520 Systems Administration
49238 Telecommunication Networks Management
Publications:
Cheng, H, Zhang, J, Wu, Q & An, P 2019, 'A computational model for stereoscopic visual saliency prediction', IEEE Transactions on Multimedia, vol. 21, no. 3, pp. 678-689.
Depth information plays an important role in human vision as it provides additional cues that distinguish objects from their backgrounds. This paper explores depth information for analyzing stereoscopic saliency and presents a computational model that predicts stereoscopic visual saliency based on three aspects of human vision: 1) the pop-out effect; 2) comfort zones; and 3) background effects. Through an analysis of these three phenomena, we find that most of the stereoscopic saliency region can be explained. Our model comprises three modules, each describing one aspect of saliency distribution, and a control function that can be used to adjust the three models independently. The relationship between the three models is not mutually exclusive: one, two, or three phenomena may appear in one image. Therefore, to accurately determine which phenomena an image conforms to, we have devised a selection strategy that chooses the appropriate combination of models based on the content of the image. Our approach is implemented within a framework based on multi-feature analysis. The framework considers surrounding regions, color/depth contrast, and points of interest. The selection strategy can improve the performance of the framework. A series of experiments on two recent eye-tracking datasets shows that our proposed method outperforms several state-of-the-art saliency models.
Huang, X, Fan, L, Wu, Q, Zhang, J & Yuan, C 2019, 'Fast registration for cross-source point clouds by using weak regional affinity and pixel-wise refinement', Proceedings - IEEE International Conference on Multimedia and Expo, vol. 2019-July, pp. 1552-1557.
Many types of 3D acquisition sensors have emerged in recent years and point clouds have been widely used in many areas. Accurate and fast registration of cross-source 3D point clouds from different sensors is an emerging research problem in computer vision. This problem is extremely challenging because cross-source point clouds contain a mixture of variances, such as density differences, partial overlap, large noise and outliers, and viewpoint changes. In this paper, an algorithm is proposed to align cross-source point clouds with both high accuracy and high efficiency. There are two main contributions: first, two components, weak region affinity and pixel-wise refinement, are proposed to maintain the global and local information of 3D point clouds. Then, these two components are integrated into an iterative tensor-based registration algorithm to solve the cross-source point cloud registration problem. We conduct experiments on a synthetic cross-source benchmark dataset and real cross-source datasets. Compared with six state-of-the-art methods, the proposed method achieves both higher efficiency and accuracy.
Huang, Y, Xu, J, Wu, Q, Zheng, Z, Zhang, Z & Zhang, J 2019, 'Multi-pseudo Regularized Label for Generated Data in Person Re-Identification', IEEE Transactions on Image Processing, vol. 28, no. 3, pp. 1391-1403.
Sufficient training data is normally required to train deep learning models. However, due to the expensive manual process of labelling large numbers of images (i.e., annotation), the amount of available training data (i.e., real data) is always limited. To produce more data for training a deep network, a Generative Adversarial Network (GAN) can be used to generate artificial sample data (i.e., generated data). However, the generated data usually does not have annotation labels. To solve this problem, in this paper, we propose a virtual label called the Multi-pseudo Regularized Label (MpRL) and assign it to the generated data. With MpRL, the generated data is used to supplement the real training data when training a deep neural network in a semi-supervised learning fashion. To build the corresponding relationship between the real and generated data, MpRL assigns each generated sample a proper virtual label which reflects the likelihood of its affiliation with the predefined training classes in the real data domain. Unlike a traditional label, which is usually a single integer, the proposed virtual label is a set of weight-based values, each a number in (0,1] called a multi-pseudo label, reflecting the degree of relation between the generated sample and every predefined class of real data. A comprehensive evaluation is carried out by adopting two state-of-the-art convolutional neural networks (CNNs) in our experiments to verify the effectiveness of MpRL. Experiments demonstrate that by assigning MpRL to generated data, we can further improve person re-ID performance on five re-ID datasets, i.e., Market-1501, DukeMTMC-reID, CUHK03, VIPeR, and CUHK01. The proposed method obtains +6.29%, +6.30%, +5.58%, +5.84%, and +3.48% improvements in rank-1 accuracy over a strong CNN baseline on the five datasets respectively, and outperforms state-of-the-art methods.
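As an illustration of the weight-based virtual label described above, the sketch below builds a multi-pseudo label in (0,1] for one generated sample. This is not the authors' code: using the softmax of a pretrained classifier's logits as the class-affiliation estimate, and the `floor` parameter that keeps every weight strictly positive, are assumptions made for this sketch.

```python
import numpy as np

def multi_pseudo_label(logits, floor=1e-6):
    """Turn classifier logits for a GAN-generated image into a multi-pseudo
    label: one weight in (0, 1] per real training class, reflecting how
    strongly the generated sample resembles that class (assumed scheme)."""
    z = logits - logits.max()                 # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()       # softmax over the K real classes
    return np.clip(probs, floor, 1.0)         # keep every weight in (0, 1]

# A generated sample whose appearance mostly resembles class 0:
label = multi_pseudo_label(np.array([3.0, 0.5, 0.1]))
```

The resulting vector can then serve as a soft training target for the generated sample, in place of a single integer class label.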
Shen, W, Wu, Y, Yuan, J, Duan, L, Zhang, J & Jia, Y 2019, 'Robust Distracter-Resistive Tracker via Learning a Multi-Component Discriminative Dictionary', IEEE Transactions on Circuits and Systems for Video Technology.
Discriminative dictionary learning (DDL) provides an appealing paradigm for appearance modeling in visual tracking. However, most existing DDL-based trackers cannot handle drastic appearance changes, especially in scenarios with background clutter and/or similar-object interference. One reason is that they often suffer from the loss of subtle visual information which is critical to distinguish an object from distracters. In this paper, we explore the use of deep features extracted from Convolutional Neural Networks (CNNs) to improve the object representation and propose a robust distracter-resistive tracker via learning a multi-component discriminative dictionary. The proposed method exploits both intra-class and inter-class visual information to learn shared atoms and class-specific atoms. By imposing several constraints in the objective function, the learned dictionary is reconstructive, compressive and discriminative, and can thus better distinguish an object from the background. In addition, our convolutional features (deep features extracted from CNNs) carry structural information for object localization and balance the discriminative power and semantic information of the object. Tracking is carried out within a Bayesian inference framework where a joint decision measure is used to construct the observation model. To alleviate the drift problem, reliable tracking results obtained online are accumulated to update the dictionary. Both the qualitative and quantitative results on the CVPR2013 benchmark, the VOT2015 dataset and the SPOT dataset demonstrate that our tracker achieves better performance than state-of-the-art approaches.
Wang, Y, Shuai, Y, Zhu, Y, Zhang, J & An, P 2019, 'Jointly learning perceptually heterogeneous features for blind 3D video quality assessment', Neurocomputing, vol. 332, pp. 298-304.
3D video quality assessment (3D-VQA) is essential to various 3D video processing applications. However, it has not been well investigated how to make use of perceptual multi-channel video information to improve 3D-VQA under different distortion categories and degrees, especially under asymmetrical distortions. In this paper, we propose a new blind 3D-VQA metric by jointly learning perceptually heterogeneous features. First, a binocular spatio-temporal internal generative mechanism (BST-IGM) is proposed to decompose the views of a 3D video into multi-channel videos. Then, we extract perceptually heterogeneous features with the proposed multi-channel natural video statistics (MNVS) model, which characterize the 3D video information. Furthermore, a robust AdaBoosting Radial Basis Function (RBF) neural network is utilized to map the features to the overall quality of the 3D video. On two benchmark databases, extensive evaluations demonstrate that the proposed algorithm significantly outperforms several state-of-the-art quality metrics in terms of prediction accuracy and robustness.
Yang, D, Zou, YX, Zhang, J & Li, G 2019, 'C-RPNs: Promoting object detection in real world via a cascade structure of Region Proposal Networks', Neurocomputing.
Recently, significant progress has been made in object detection on common benchmarks (i.e., Pascal VOC). However, object detection in the real world is still challenging due to serious data imbalance: real-world images are dominated by easy samples, such as wide background regions and easily recognizable objects. Although two-stage detectors like Faster R-CNN have achieved great success in object detection thanks to the strategy of extracting region proposals with a Region Proposal Network, they adapt poorly to real-world object detection because they do not mine hard samples while extracting region proposals. To address this issue, we propose a cascade framework of Region Proposal Networks, referred to as C-RPNs, which adopts multiple stages to mine hard samples while extracting region proposals and to learn stronger classifiers. Meanwhile, a feature chain and a score chain are proposed to help learn more discriminative representations for proposals. Moreover, a loss function over the cascade stages is designed to train the cascade classifiers through backpropagation. Our proposed method has been evaluated on Pascal VOC and several challenging datasets such as BSBDV 2017 and CityPersons. It achieves competitive results compared with the current state of the art and attains all-sided improvements in error analysis, validating its efficacy for detection in the real world.
Yao, L, Kusakunniran, W, Wu, Q, Zhang, J, Tang, Z & Yang, W 2019, 'Robust gait recognition using hybrid descriptors based on Skeleton Gait Energy Image', Pattern Recognition Letters.
Gait features have been widely applied in human identification. The commonly used representations for gait recognition can be roughly classified into two categories: model-free features and model-based features. However, due to view variances and clothes changes, model-free features are sensitive to appearance changes, while for model-based features there is great difficulty in extracting the underlying models from gait sequences. Based on the confidence maps and the part affinity fields produced by a two-branch multi-stage CNN, a new model-based representation, the Skeleton Gait Energy Image (SGEI), is proposed in this paper. Another contribution is a hybrid representation, which uses SGEI to remedy the deficiency of model-free features such as the Gait Energy Image (GEI). The experimental results indicate that our proposed methods are more robust to clothes changes and contribute to increasing the robustness of gait recognition in unconstrained environments with view variances and clothes changes.
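The Gait Energy Image mentioned above is, in essence, the pixel-wise average of binary silhouettes over one gait cycle. A minimal sketch, assuming the silhouette frames are already size-normalised and centre-aligned (the alignment step is outside this sketch):

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Compute a Gait Energy Image: the pixel-wise mean of aligned binary
    silhouette frames (values in {0, 1}) over one gait cycle."""
    stack = np.stack(silhouettes).astype(np.float32)  # shape (T, H, W)
    return stack.mean(axis=0)                         # values in [0, 1]

# Three identical toy 4x4 "silhouettes" stand in for one gait cycle:
frames = [np.eye(4, dtype=np.uint8) for _ in range(3)]
gei = gait_energy_image(frames)
```

Pixels that are part of the body in every frame approach 1, rarely covered pixels approach 0, so the GEI encodes both body shape and motion frequency in a single image.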
Yao, Y, Shen, F, Zhang, J, Liu, L, Tang, Z & Shao, L 2019, 'Extracting Multiple Visual Senses for Web Learning', IEEE Transactions on Multimedia, vol. 21, no. 1, pp. 184-196.
Labeled image datasets have played a critical role in high-level image understanding. However, the process of manual labeling is both time consuming and labor intensive. To reduce the dependence on manually labeled data, there have been increasing research efforts on learning visual classifiers by directly exploiting web images. One issue that limits their performance is the problem of polysemy. Existing unsupervised approaches attempt to reduce the influence of visual polysemy by filtering out irrelevant images, but do not directly address polysemy. To this end, in this paper, we present a multimodal framework that solves the problem of polysemy by allowing sense-specific diversity in search results. Specifically, we first discover a list of possible semantic senses from untagged corpora to retrieve sense-specific images. Then, we merge visually similar semantic senses and prune noise by using the retrieved images. Finally, we train one visual classifier for each selected semantic sense and use the learned sense-specific classifiers to distinguish multiple visual senses. Extensive experiments on classifying images into sense-specific categories and reranking search results demonstrate the superiority of our proposed approach.
Yao, Y, Shen, F, Zhang, J, Liu, L, Tang, Z & Shao, L 2019, 'Extracting Privileged Information for Enhancing Classifier Learning', IEEE Transactions on Image Processing, vol. 28, no. 1, pp. 436-450.
The accuracy of data-driven learning approaches is often unsatisfactory when the training data is inadequate either in quantity or quality. Manually labeled privileged information (PI), e.g., attributes, tags or properties, is usually incorporated to improve classifier learning. However, the process of manually labeling is time-consuming and labor-intensive. Moreover, due to the limitations of personal knowledge, manually labeled PI may not be rich enough. To address these issues, we propose to enhance classifier learning by exploring PI from untagged corpora, which can effectively eliminate the dependency on manually labeled data and obtain much richer PI. In detail, we treat each selected PI as a subcategory and learn one classifier for each subcategory independently. The classifiers for all subcategories are integrated together to form a more powerful category classifier. Particularly, we propose a novel instance-level multi-instance learning model to simultaneously select a subset of training images from each subcategory and learn the optimal SVM classifiers based on the selected images. Extensive experiments on four benchmark data sets demonstrate the superiority of our proposed approach.
Zhang, J, Wu, Q, Zhang, J, Shen, C, Lu, J & Wu, Q 2019, 'Heritage image annotation via collective knowledge', Pattern Recognition, vol. 93, pp. 204-214.
Automatic image annotation can provide semantic illustrations for understanding image contents, and builds a foundation for developing algorithms that can search images within a large database. However, most current methods focus on solving the annotation problem by modeling the image visual content and tag semantic information, which overlooks additional information such as scene descriptions and locations. Moreover, the majority of current annotation datasets are visually consistent and only annotated with common visual objects and attributes, which makes classic methods ill-equipped to handle more diverse image annotation. To address the above issues, we propose to annotate images via collective knowledge: we uncover relationships between an image and its neighbors by measuring similarities among metadata and conduct metric learning to obtain representations of image contents; we also generate semantic representations for images given collective semantic information from their neighbors. The two representations, from different paradigms, are embedded together to train an annotation model. We ground our model on a heritage image collection gathered from library online open data. Annotations on the heritage image collection are not limited to common visual objects and are highly relevant to historical events, and the diversity of the heritage image content is much larger than in current datasets, which makes it more suitable for this task. Comprehensive experimental results on the benchmark dataset indicate that the proposed model achieves the best performance compared to baselines and state-of-the-art methods.
Huang, X, Zhang, J, Wu, Q, Fan, L & Yuan, C 2018, 'A coarse-to-fine algorithm for matching and registration in 3D cross-source point clouds', IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 10, pp. 2965-2977.
We propose an efficient method to deal with the matching and registration problem found in cross-source point clouds captured by different types of sensors. This task is especially challenging due to the presence of density variation, scale difference, a large proportion of noise and outliers, missing data and viewpoint variation. The proposed method has two stages: in the coarse matching stage, we use the ESF descriptor to select potential K regions from the candidate point clouds for the target. In the fine stage, we propose a scale-embedded generative GMM registration method to refine the results from the coarse matching stage. Following the fine stage, both the best region and the accurate camera pose relationship between the candidates and the target are found. We conduct experiments in which we apply the method to two applications: 3D object detection and localization in street-view outdoor (LiDAR/VSFM) cross-source point clouds, and 3D scene matching and registration in indoor (KinectFusion/VSFM) cross-source point clouds. The experimental results show that the proposed method performs well compared with existing methods. They also show that the proposed method is robust under various sensing techniques such as LiDAR, Kinect and RGB cameras.
Kusakunniran, W, Wu, Q, Ritthipravat, P & Zhang, J 2018, 'Hard exudates segmentation based on learned initial seeds and iterative graph cut', Computer Methods and Programs in Biomedicine, vol. 158, pp. 173-183.
This paper proposes a novel image co-segmentation method, which aims to segment the common objects in a group of images. The proposed method takes advantage of the reliability of simple images and successfully improves performance. The images are first ranked by complexity based on their saliency maps. Then, the simple images, in which objects are common and easy to segment, are selected and processed to obtain their segmentation results; these segmentation results are taken as samples of the targeted objects. Finally, the remaining complicated images are segmented with the guidance of the samples. Experiments on the iCoseg dataset demonstrate the superior performance and robustness of the proposed method.
Wang, Y, Zhang, J, Liu, Z, Wu, Q, Zhang, Z & Jia, Y 2018, 'Depth Super-Resolution on RGB-D Video Sequences with Large Displacement 3D Motion', IEEE Transactions on Image Processing, vol. 27, no. 7, pp. 3571-3585.
To enhance the resolution and accuracy of depth data, several video-based depth super-resolution methods have been proposed, which utilize neighboring depth images in the temporal domain. They often consist of two main stages: motion compensation of temporally neighboring depth images and fusion of the compensated depth images. However, large-displacement 3D motion often leads to compensation error, which is then carried into the fusion. A video-based depth super-resolution method with novel motion compensation and fusion approaches is proposed in this paper. We claim that the 3D nearest neighboring field (NNF) is a better choice than positions with true motion displacement for depth enhancement. To handle large-displacement 3D motion, the compensation stage utilizes the 3D NNF instead of the true motion used in previous methods. Next, the fusion approach is modeled as a regression problem to predict the super-resolution result efficiently for each depth image by using its compensated depth images. A new deep convolutional neural network architecture is designed for fusion, which is able to employ a large amount of video data to learn the complicated regression function. We comprehensively evaluate our method on various RGB-D video sequences to show its superior performance.
Xiao, L, Zhang, Y, Zhang, J, Wang, Q & Li, Y 2018, 'Combining HWEBING and HOG-MLBP features for pedestrian detection', The Journal of Engineering, vol. 2018, no. 16, pp. 1421-1426.
Zhang, J, Wu, Q, Shen, C, Zhang, J & Lu, J 2018, 'Multilabel Image Classification with Regional Latent Semantic Dependencies', IEEE Transactions on Multimedia, vol. 20, no. 10, pp. 2801-2813.
Deep convolutional neural networks (CNNs) have demonstrated advanced performance on single-label image classification, and progress has also been made in applying CNN methods to multilabel image classification, which requires annotating objects, attributes, scene categories, etc., in a single shot. Recent state-of-the-art approaches to multilabel image classification exploit the label dependencies in an image at the global level, largely improving the labeling capacity. However, predicting small objects and visual concepts is still challenging due to the limited discrimination of the global visual features. In this paper, we propose a regional latent semantic dependencies model (RLSD) to address this problem. The model includes a fully convolutional localization architecture to localize the regions that may contain multiple highly dependent labels. The localized regions are further sent to recurrent neural networks to characterize the latent semantic dependencies at the regional level. Experimental results on several benchmark datasets show that our proposed model achieves the best performance compared to state-of-the-art models, especially for predicting small objects occurring in the images. We also set up an upper-bound model (RLSD+ft-RPN) using bounding-box coordinates during training, and the experimental results show that our RLSD can approach the upper bound without using bounding-box annotations, which is more realistic in the real world.
Zhao, J, Mao, X & Zhang, J 2018, 'Learning deep facial expression features from image and optical flow sequences using 3D CNN', Visual Computer, vol. 34, no. 10, pp. 1461-1475.
Facial expression is highly correlated with facial motion. According to whether the temporal information of facial motion is used or not, facial expression features can be classified as static or dynamic. The former, which mainly include geometric features and appearance features, can be extracted by convolution or other learned filters; the latter, which aim to model the dynamic properties of facial motion, can be calculated through optical flow or other methods. When 3D convolutional neural networks (CNNs) are introduced, extracting the two types of features mentioned above becomes easy. In this paper, one 3D CNN architecture is presented to learn the static and dynamic features from facial image sequences and extract high-level dynamic features from optical flow sequences. Two types of dense optical flow, which contain the tracking information of facial muscle movement, are calculated according to different image-pair construction methods: one is the common optical flow, and the other is an enhanced optical flow called accumulative optical flow. Four components of each type of optical flow are used in experiments. Three databases, two acted databases and one nearly realistic database, are selected for the experiments. The experiments on the two acted databases achieve state-of-the-art accuracy, and indicate that the vertical component of optical flow has an advantage over other components in recognizing facial expression. The experimental results on the three selected databases show that more discriminative features can be learned from image sequences than from optical flow or accumulative optical flow sequences, and that the accumulative optical flow contains more motion information than optical flow if the frame distance of the image pairs used to calculate them is not too large.
Zuo, Y, Wu, Q, Zhang, J & An, P 2018, 'Explicit Edge Inconsistency Evaluation Model for Color-guided Depth Map Enhancement', IEEE Transactions on Circuits and Systems for Video Technology.
Color-guided depth enhancement refines depth maps under the assumption that depth edges and the color edges at the corresponding locations are consistent. Among the methods for this low-level vision task, Markov Random Fields (MRF), including its variants, is one of the major approaches and has dominated the area for several years. However, the assumption above is not always true. To tackle the problem, state-of-the-art solutions adjust the weighting coefficient inside the smoothness term of the MRF model. These methods lack an explicit evaluation model to quantitatively measure the inconsistency between the depth edge map and the color edge map, so they cannot adaptively control the strength of the guidance from the color image for depth enhancement, leading to defects such as texture-copy artifacts and blurred depth edges. In this paper, we propose a quantitative measurement of such inconsistency and explicitly embed it into the smoothness term. The proposed method demonstrates promising experimental results when compared with benchmark and state-of-the-art methods on the Middlebury, ToF-Mark and NYU datasets.
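As a toy illustration of embedding an edge-inconsistency measure in the smoothness term, the numpy sketch below down-weights the color guidance wherever the color and depth edge maps disagree. The Gaussian form and all names are assumptions for illustration, not the paper's actual model:

```python
import numpy as np

def guidance_weights(color_edges, depth_edges, sigma=0.5):
    """Per-pixel smoothness weights for a color-guided MRF.  Where the
    color and depth edge maps (both in [0, 1]) disagree, the measured
    inconsistency is high and the color guidance is suppressed, which is
    the intuition behind an explicit edge-inconsistency model; the
    exponential weighting is an illustrative assumption."""
    inconsistency = np.abs(color_edges - depth_edges)
    return np.exp(-inconsistency**2 / (2.0 * sigma**2))
```

A strong color edge with no matching depth edge (the texture-copy case) then contributes almost nothing to the smoothness term, while consistent edges keep full weight.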
Zuo, Y, Wu, Q, Zhang, J & An, P 2018, 'Minimum Spanning Forest with Embedded Edge Inconsistency Measurement Model for Guided Depth Map Enhancement', IEEE Transactions on Image Processing, vol. 27, no. 8, pp. 4145-4159.
Guided depth map enhancement based on Markov random field (MRF) normally assumes edge consistency between the color image and the corresponding depth map. Under this assumption, the low-quality depth edges can be refined according to the guidance from the high-quality color image. However, such consistency is not always true, which leads to texture-copying artifacts and blurring depth edges. In addition, the previous MRF-based models always calculate the guidance affinities in the regularization term via a non-structural scheme, which ignores the local structure on the depth map. In this paper, a novel MRF-based method is proposed. It computes these affinities via the distance between pixels in a space consisting of the minimum spanning trees (forest) to better preserve depth edges. Furthermore, inside each minimum spanning tree, the weights of edges are computed based on the explicit edge inconsistency measurement model, which significantly mitigates texture-copying artifacts. To further tolerate the effects caused by noise and better preserve depth edges, a bandwidth adaption scheme is proposed. Our method is evaluated for depth map super-resolution and depth map completion problems on synthetic and real data sets, including Middlebury, ToF-Mark, and NYU. A comprehensive comparison against 16 state-of-the-art methods is carried out. Both qualitative and quantitative evaluations present the improved performances.
Guo, D, Xu, J, Zhang, J, Xu, M, Cui, Y & He, X 2017, 'User relationship strength modeling for friend recommendation on Instagram', Neurocomputing, vol. 239, pp. 9-18.
Social strength modeling in the social media community has attracted increasing research interest. Different from Flickr, which has been explored by many researchers, Instagram is more popular with mobile users and is conducive to likes and comments, but has seldom been investigated. On Instagram, a user can post photos/videos, follow other users, and comment on and like other users' posts. These actions generate diverse forms of data that result in multiple user relationship views. In this paper, we propose a new framework to discover the underlying social relationship strength. User relationship learning under multiple views and relationship strength modeling are coupled into a single framework. In addition, given the learned relationship strength, a coarse-to-fine method is proposed for friend recommendation. Experiments on friend recommendation for Instagram are presented to show the effectiveness and efficiency of the proposed framework. As exhibited by our experimental results, it obtains better performance than other related methods. Although our method has been proposed for Instagram, it can easily be extended to other social media communities.
Cheng, H, Zhang, J, Wu, Q, An, P & Liu, Z 2017, 'Stereoscopic visual saliency prediction based on stereo contrast and stereo focus', EURASIP Journal on Image and Video Processing.
In this paper, we exploit two characteristics of stereoscopic vision: the pop-out effect and the comfort zone. We propose a visual saliency prediction model for stereoscopic images based on stereo contrast and stereo focus models. The stereo contrast model measures stereo saliency based on the color/depth contrast and the pop-out effect. The stereo focus model describes the degree of focus based on monocular focus and the comfort zone. After obtaining the values of the stereo contrast and stereo focus models in parallel, an enhancement based on clustering is performed on both values. We then apply a multi-scale fusion to form the respective maps of the two models. Last, we use a Bayesian integration scheme to integrate the two maps (the stereo contrast and stereo focus maps) into the stereo saliency map. Experimental results on two eye-tracking databases show that our proposed method outperforms the state-of-the-art saliency models.
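The final fusion step can be sketched by treating each saliency map as an independent posterior probability of a pixel being salient; this simplified product rule is an illustrative assumption, not the paper's exact Bayesian integration scheme:

```python
import numpy as np

def bayesian_fuse(s1, s2, eps=1e-8):
    """Fuse two saliency maps with values in [0, 1] as if each were an
    independent posterior probability of saliency.  A simplified sketch
    of Bayesian map integration; `eps` guards against division by zero."""
    num = s1 * s2
    den = num + (1.0 - s1) * (1.0 - s2) + eps
    return num / den
```

With this rule a map that is confident (e.g. 0.9) dominates one that is uninformative (0.5), which is the qualitative behaviour wanted when combining the stereo contrast and stereo focus maps.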
Edwards, D, Cheng, M, Wong, A, Zhang, J & Wu, Q 2017, 'Ambassadors of Knowledge Sharing: Co-produced Travel Information Through Tourist-Local Social Media Exchange', International Journal of Contemporary Hospitality Management, vol. 29, no. 2, pp. 690-708.
Purpose: The aim of this study is to understand the knowledge sharing structure and co-production of trip-related knowledge through online travel forums.
Design/methodology/approach: The travel forum threads were collected from TripAdvisor Sydney travel forum for the period from 2010 to 2014, which contains 115,847 threads from 8,346 conversations. The data analytical technique was based on a novel methodological approach - visual analytics including semantic pattern generation and network analysis.
Findings: Findings indicate that the knowledge structure is created by community residents who camouflage as local experts and serve as ambassadors of a destination. The knowledge structure presents collective intelligence co-produced by community residents and tourists. Further findings reveal how these community residents associate with each other and form a knowledge repertoire with information covering various travel domain areas.
Practical implications: The study offers valuable insights to help destination management organizations and tour operators identify existing and emerging tourism issues to achieve a competitive destination advantage.
Originality/value: This study highlights the process of social media mediated travel knowledge co-production. It also discovers how community residents engage in reaching out to tourists by camouflaging as ordinary users.
Huang, S, Zhang, J, Schonfeld, D, Wang, L & Hua, XS 2017, 'Two-Stage Friend Recommendation Based on Network Alignment and Series Expansion of Probabilistic Topic Model', IEEE Transactions on Multimedia, vol. 19, no. 6, pp. 1314-1326.
Precise friend recommendation is an important problem in social media. Although most social websites provide some kinds of auto friend searching functions, their accuracies are not satisfactory. In this paper, we propose a more precise auto friend recommendation method with two stages. In the first stage, by utilizing the information of the relationship between texts and users, as well as the friendship information between users, we align different social networks and choose some "possible friends." In the second stage, with the relationship between image features and users, we build a topic model to further refine the recommendation results. Because some traditional methods, such as variational inference and Gibbs sampling, have their limitations in dealing with our problem, we develop a novel method to find out the solution of the topic model based on series expansion. We conduct experiments on the Flickr dataset to show that the proposed algorithm recommends friends more precisely and faster than traditional methods.
Huang, X, Zhang, J, Fan, L, Wu, Q & Yuan, C 2017, 'A Systematic Approach for Cross-Source Point Cloud Registration by Preserving Macro and Micro Structures', IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3261-3276.
We propose a systematic approach for registering cross-source point clouds that come from different kinds of sensors. This task is especially challenging due to significant missing data, large variations in point density, scale differences, and a large proportion of noise and outliers. The robustness of the method is attributed to the extraction of macro and micro structures. The macro structure is the overall structure that maintains a similar geometric layout in cross-source point clouds. The micro structure is the element (e.g., a local segment) used to build the macro structure. We use a graph to organize these structures and convert the registration into graph matching. With a novel proposed descriptor, we conduct the graph matching in a discriminative feature space. The graph matching problem is solved by an improved graph matching solution that considers global geometrical constraints. Robust cross-source registration results are obtained by incorporating the graph matching outcome with RANSAC and ICP refinements. Compared with eight state-of-the-art registration algorithms, the proposed method consistently performs better on the Pisa Cathedral data and other challenging cases. For quantitative comparison, we propose two challenging cross-source datasets and conduct comparative experiments on more than 27 cases; the results show that we obtain much better performance than the other methods. The proposed method also shows high accuracy on same-source datasets.
Yao, Y, Zhang, J, Shen, F, Hua, X, Xu, J & Tang, Z 2017, 'A new web-supervised method for image dataset constructions', Neurocomputing, vol. 236, pp. 23-31.
The goal of this work is to automatically collect a large number of highly relevant natural images from the Internet for given queries. A novel automatic image dataset construction framework is proposed that employs multiple query expansions. Specifically, the given queries are first expanded by searching the Google Books Ngrams Corpus to obtain richer semantic descriptions, from which the visually non-salient and less relevant expansions are then filtered. After retrieving images from the Internet with the filtered expansions, we further filter noisy images by clustering-based and progressive Convolutional Neural Network (CNN)-based methods. To evaluate the performance of the proposed method for image dataset construction, we build an image dataset with 10 categories. We then run object detection on our dataset and on three other datasets constructed by weakly supervised, web-supervised and fully supervised learning; the experimental results indicate that our method is superior to the weakly supervised and web-supervised state-of-the-art methods. In addition, we perform cross-dataset classification to evaluate the performance of our dataset against two publicly available, manually labelled datasets, STL-10 and CIFAR-10.
Yao, Y, Zhang, J, Shen, F, Hua, X, Xu, J & Tang, Z 2017, 'Exploiting Web Images for Dataset Construction: A Domain Robust Approach', IEEE Transactions on Multimedia, vol. 19, no. 8, pp. 1771-1784.
Labeled image datasets have played a critical role in high-level image understanding. However, the process of manual labeling is both time-consuming and labor intensive. To reduce the cost of manual labeling, there has been increased research interest in automatically constructing image datasets by exploiting web images. Datasets constructed by existing methods tend to have a weak domain adaptation ability, which is known as the "dataset bias problem." To address this issue, we present a novel image dataset construction framework that can be generalized well to unseen target domains. Specifically, the given queries are first expanded by searching the Google Books Ngrams Corpus to obtain a rich semantic description, from which the visually nonsalient and less relevant expansions are filtered out. By treating each selected expansion as a "bag" and the retrieved images as "instances," image selection can be formulated as a multi-instance learning problem with constrained positive bags. We propose to solve the employed problems by the cutting-plane and concave-convex procedure algorithm. By using this approach, images from different distributions can be kept while noisy images are filtered out. To verify the effectiveness of our proposed approach, we build an image dataset with 20 categories. Extensive experiments on image classification, cross-dataset generalization, diversity comparison, and object detection demonstrate the domain robustness of our dataset.
Zhao, Y, Di, H, Zhang, J, Lu, Y, Lv, F & Li, Y 2017, 'Region-based Mixture Models for human action recognition in low-resolution videos', Neurocomputing.
State-of-the-art performance in human action recognition is achieved by the use of dense trajectories extracted by optical flow algorithms. However, optical flow algorithms are far from perfect in low-resolution (LR) videos. In addition, the spatial and temporal layout of features is a powerful cue for action discrimination, yet most existing methods encode the layout by first segmenting body parts, which is not feasible in LR videos. To address these problems, we adopt the Layered Elastic Motion Tracking (LEMT) method to extract a set of long-term motion trajectories and a long-term common shape from each video sequence, where the extracted trajectories are much denser than those of sparse interest points (SIPs); we then present a hybrid feature representation to integrate both shape and motion features; and finally we propose a Region-based Mixture Model (RMM) for action classification. The RMM encodes the spatial layout of features without any need for body part segmentation. Experimental results show that the approach is effective and, more importantly, more general for LR recognition tasks.
Huang, S, Zhang, J, Wang, L & Hua, X-S 2016, 'Social Friend Recommendation Based on Multiple Network Correlation', IEEE Transactions on Multimedia, vol. 18, no. 2, pp. 287-299.
Wang, Y, Zhang, J, Liu, Z, Wu, Q, Chou, P, Zhang, Z & Jia, Y 2016, 'Handling Occlusion and Large Displacement through Improved RGB-D Scene Flow Estimation', IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 7, pp. 1265-1278.
The accuracy of scene flow is restricted by several challenges such as occlusion and large displacement motion. When occlusion happens, the positions inside the occluded regions lose their corresponding counterparts in preceding and succeeding frames. Large displacement motion will increase the complexity of motion modeling and computation. Moreover, occlusion and large displacement motion are highly related problems in scene flow estimation, e.g., large displacement motion often leads to considerably occluded regions in the scene. An improved dense scene flow method based on red-green-blue-depth (RGB-D) data is proposed in this paper. To handle occlusion, we model the occlusion status for each point in our problem formulation, and jointly estimate the scene flow and occluded regions. To deal with large displacement motion, we employ an over-parameterized scene flow representation to model both the rotation and translation components of the scene flow, since large displacement motion cannot be well approximated using translational motion only. Furthermore, we employ a two-stage optimization procedure for this over-parameterized scene flow representation. In the first stage, we propose a new RGB-D PatchMatch method, which is mainly applied in the RGB-D image space to reduce the computational complexity introduced by the large displacement motion. According to the quantitative evaluation based on the Middlebury data set, our method outperforms other published methods. The improved performance is also comprehensively confirmed on the real data acquired by a Kinect sensor.
Ye, L, Liu, Z, Zhou, X, Shen, L & Zhang, J 2016, 'Saliency Detection Via Similar Image Retrieval', IEEE Signal Processing Letters, vol. 23, no. 6, pp. 838-842.
Cui, Y, Zhang, J, Guo, D & Jin, Z 2015, 'Robust facial landmark localization using classified random ferns and pose-based initialization', Signal Processing, vol. 110, pp. 46-53.
Lu, S, Mei, T, Wang, J, Zhang, J, Wang, Z & Li, S 2015, 'Exploratory Product Image Search With Circle-to-Search Interaction', IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 7, pp. 1190-1202.
Ma, X, Liu, D, Zhang, J & Xin, J 2015, 'A fast affine-invariant features for image stitching under large viewpoint changes', Neurocomputing, vol. 151, pp. 1430-1438.
Wang, S, Zhang, J, Han, TX & Miao, Z 2015, 'Sketch-Based Image Retrieval Through Hypothesis-Driven Object Boundary Selection With HLR Descriptor', IEEE Transactions on Multimedia, vol. 17, no. 7, pp. 1045-1057.
The appearance gap between sketches and photo-realistic images is a fundamental challenge in sketch-based image retrieval (SBIR) systems. The existence of noisy edges on photo-realistic images is a key factor in the enlargement of the appearance gap and significantly degrades retrieval performance. To bridge the gap, we propose a framework consisting of a new line-segment-based descriptor named histogram of line relationship (HLR) and a new noise impact reduction algorithm known as object boundary selection. HLR treats sketches and the extracted edges of photo-realistic images as a series of piece-wise line segments and captures the relationship between them. Based on the HLR, the object boundary selection algorithm aims to reduce the impact of noisy edges by selecting the shaping edges that best correspond to the object boundaries. Multiple hypotheses are generated for descriptors by hypothetical edge selection. The selection algorithm is formulated to find the best combination of hypotheses to maximize the retrieval score; a fast method is also proposed. To reduce the distraction of false matches in the scoring process, two constraints on spatial and coherent aspects are introduced. We tested the HLR descriptor and the proposed framework on public datasets and a new image dataset of three million images, which we recently collected for SBIR evaluation purposes. We compared the proposed HLR with state-of-the-art descriptors (SHoG, GF-HOG). The experimental results show that our HLR descriptor outperforms them. Combined with the object boundary selection algorithm, our framework significantly improves SBIR performance.
Wu, Y, Jia, Y, Li, P, Zhang, J & Yuan, J 2015, 'Manifold Kernel Sparse Representation of Symmetric Positive-Definite Matrices and Its Applications', IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3729-3741.
Zhou, T, Lu, Y, Lv, F, Di, H, Zhao, Q & Zhang, J 2015, 'Abrupt motion tracking via nearest neighbor field driven stochastic sampling', Neurocomputing, vol. 165, pp. 350-360.
Stochastic sampling based trackers have shown good performance for abrupt motion tracking and have therefore gained popularity in recent years. However, conventional methods tend to use a two-stage sampling paradigm in which the search space must be uniformly explored with an inefficient preliminary sampling phase. In this paper, we propose a novel sampling-based method in the Bayesian filtering framework to address this problem. Within the framework, nearest neighbor field estimation is utilized to compute the importance proposal probabilities, which guide the Markov chain search towards promising regions and thus enhance sampling efficiency; given the motion priors, a smoothing stochastic sampling Monte Carlo algorithm is proposed to approximate the posterior distribution through a smoothing weight-updating scheme. Moreover, to track abrupt and smooth motions simultaneously, we develop an abrupt-motion detection scheme which can discover the presence of abrupt motions during online tracking. Extensive experiments on challenging image sequences demonstrate the effectiveness and robustness of our algorithm in handling abrupt motions.
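The idea of an importance proposal built from a nearest neighbor field can be sketched in a few lines of numpy: candidate states are drawn with probability proportional to an NNF-derived similarity map, so sampling concentrates in promising regions instead of exploring uniformly. The function name and the flattened-index sampling are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def propose_states(similarity, n, rng):
    """Draw n candidate tracker states from an importance proposal built
    from a nearest-neighbour-field similarity map (higher similarity ->
    more proposals).  Returns (n, 2) integer (row, col) positions."""
    p = similarity.ravel() / similarity.sum()
    idx = rng.choice(p.size, size=n, p=p)
    return np.column_stack(np.unravel_index(idx, similarity.shape))
```

In a full tracker these proposals would seed the Markov chain search; here the point is only that a peaked similarity map makes the sampler ignore unpromising regions entirely.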
Kusakunniran, W, Wu, Q, Li, H, Zhang, J & Wang, L 2014, 'Recognizing Gaits across Views through Correlated Motion Co-clustering', IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 696-709.
Liu, XW, Wang, L, Zhang, J, Yin, JP & Liu, H 2014, 'Global and Local Structure Preservation for Feature Selection', IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 6, pp. 1083-1095.
The recent literature indicates that preserving global pairwise sample similarity is of great importance for feature selection and that many existing selection criteria essentially work in this way. In this paper, we argue that besides global pairwise sample similarity, the local geometric structure of data is also critical and that these two factors play different roles in different learning scenarios. In order to show this, we propose a global and local structure preservation framework for feature selection (GLSPFS) which integrates both global pairwise sample similarity and local geometric data structure to conduct feature selection. To demonstrate the generality of our framework, we employ methods that are well known in the literature to model the local geometric data structure and develop three specific GLSPFS-based feature selection algorithms. Also, we develop an efficient optimization algorithm with proven global convergence to solve the resulting feature selection problem. A comprehensive experimental study is then conducted in order to compare our feature selection algorithms with many state-of-the-art ones in supervised, unsupervised, and semisupervised learning scenarios. The result indicates that: 1) our framework consistently achieves statistically significant improvement in selection performance when compared with the currently used algorithms; 2) in supervised and semisupervised learning scenarios, preserving global pairwise similarity is more important than preserving local geometric data structure; 3) in the unsupervised scenario, preserving local geometric data structure becomes clearly more important; and 4) the best feature selection performance is always obtained when the two factors are appropriately integrated. In summary, this paper not only validates the advantages of the proposed GLSPFS framework but also gains more insight into the information to be preserved in different feature selection tasks.
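The two factors the abstract integrates can be sketched as a greedy per-feature score: a global term measuring how well a single feature reproduces the full pairwise similarity (kernel alignment), plus a local term measuring the feature's smoothness over a kNN-graph Laplacian. This is an illustrative simplification of the GLSPFS idea, not the paper's joint optimization; all names are assumed:

```python
import numpy as np

def glspfs_score(X, mu=0.5, k=2):
    """Score each feature of X (n samples x m features) by mu * global
    similarity preservation + (1 - mu) * local structure preservation.
    A greedy per-feature sketch of integrating the two factors."""
    n, m = X.shape
    K = X @ X.T                               # global pairwise similarity (linear kernel)
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    W = np.zeros((n, n))                      # symmetric kNN graph
    for i in range(n):
        for j in np.argsort(d[i])[1:k + 1]:
            W[i, j] = W[j, i] = 1.0
    L = np.diag(W.sum(1)) - W                 # graph Laplacian
    scores = np.empty(m)
    for j in range(m):
        f = X[:, j]
        Kj = np.outer(f, f)                   # kernel induced by feature j alone
        g = (K * Kj).sum() / (np.linalg.norm(K) * np.linalg.norm(Kj) + 1e-12)
        l = 1.0 / (1.0 + f @ L @ f)           # features smooth on the graph score higher
        scores[j] = mu * g + (1.0 - mu) * l
    return scores
```

On data with a clear cluster structure, a feature aligned with that structure outscores a noise feature under both terms, matching the paper's point that the best selection integrates the two factors.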
Lu, S, Mei, T, Wang, J, Zhang, J, Wang, Z & Li, S 2014, 'Browse-to-Search: Interactive Exploratory Search with Visual Entities', ACM Transactions on Information Systems, vol. 32, no. 4.
With the development of image search technology, users are no longer satisfied with searching for images using just metadata and textual descriptions. Instead, more search demands are focused on retrieving images based on similarities in their content (textures, colors, shapes, etc.). Nevertheless, one image may deliver rich or complex content and multiple interests. Sometimes users do not sufficiently define or describe their seeking demands for images even when general search interests appear, owing to a lack of specific knowledge to express their intents. A new form of information seeking activity, referred to as exploratory search, is emerging in the research community, which generally combines browsing and searching content together to help users gain additional knowledge and form accurate queries, thereby assisting the users with their seeking and investigation activities. However, there have been few attempts at addressing integrated exploratory search solutions when image browsing is incorporated into the exploring loop. In this work, we investigate the challenges of understanding users' search interests from the images being browsed and infer their actual search intentions. We develop a novel system to explore an effective and efficient way for allowing users to seamlessly switch between browse and search processes, and naturally complete visual-based exploratory search tasks. The system, called Browse-to-Search, enables users to specify their visual search interests by circling any visual objects in the webpages being browsed, and then the system automatically forms the visual entities to represent users' underlying intent. One visual entity is not limited by the original image content, but also encapsulated by the textual-based browsing context and the associated heterogeneous attributes. We use large-scale image search technology to find the associated textual attributes from the repository. Users can then utilize the encapsulated visual entities to co...
Thi, TH, Wang, L, Ye, N, Zhang, J, Maurer-Stroh, S & Cheng, L 2014, 'Recognizing flu-like symptoms from videos', BMC Bioinformatics, vol. 15, pp. 1-10.
BACKGROUND: Vision-based surveillance and monitoring is a potential alternative for early detection of respiratory disease outbreaks in urban areas, complementing molecular diagnostics and hospital and doctor visit-based alert systems. Visible actions representing typical flu-like symptoms include sneezing and coughing, which are associated with changing patterns of hand-to-head distances, among others. The technical difficulties lie in the high complexity and large variation of those actions as well as numerous similar background actions such as scratching the head, cell phone use, eating, drinking and so on. RESULTS: In this paper, we make a first attempt at the challenging problem of recognizing flu-like symptoms from videos. Since there was no related dataset available, we created a new public health dataset for action recognition that includes two major flu-like symptom related actions (sneeze and cough) and a number of background actions. We also developed a suitable novel algorithm by introducing two types of Action Matching Kernels, where both types aim to integrate two aspects of local features, namely the space-time layout and the Bag-of-Words representations. In particular, we show that the Pyramid Match Kernel and Spatial Pyramid Matching are both special cases of our proposed kernels. Besides experimenting on a standard testbed, the proposed algorithm is also evaluated on the new sneeze and cough set. Empirically, we observe that our approach achieves competitive performance compared to the state of the art, while recognition on the new public health dataset is shown to be a non-trivial task even with a simple single-person unobstructed view. CONCLUSIONS: Our sneeze and cough video dataset and newly developed action recognition algorithm are the first of their kind and aim to kick-start the field of action recognition of flu-like symptoms from videos. It will be challenging but necessary in future developments to consider more complex real-life scenarios of detecting ...
Tushar, W, Zhang, JA, Smith, DB, Poor, HV & Thiébaux, S 2014, 'Prioritizing Consumers in Smart Grid: A Game Theoretic Approach', IEEE Transactions on Smart Grid, vol. 5, no. 3, pp. 1429-1438.
This paper proposes an energy management technique for a consumer-to-grid system in smart grid. The benefit to consumers is made the primary concern to encourage consumers to participate voluntarily in energy trading with the central power station (CPS) in situations of energy deficiency. A novel system model motivating energy trading under the goal of social optimality is proposed. A single-leader multiple-follower Stackelberg game is then studied to model the interactions between the CPS and a number of energy consumers (ECs), and to find optimal distributed solutions for the optimization problem based on the system model. The CPS is considered as a leader seeking to minimize its total cost of buying energy from the ECs, and the ECs are the followers who decide on how much energy they will sell to the CPS for maximizing their utilities. It is shown that the game, which can be implemented distributedly, possesses a socially optimal solution, in which the sum of the benefits to all consumers is maximized, as the total cost to the CPS is minimized. Numerical analysis confirms the effectiveness of the game.
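The leader-follower structure described above can be sketched with a deliberately simple model: each energy consumer (follower) with quadratic discomfort cost c_i·x_i² best-responds to a unit price p by selling x_i(p) = p / (2·c_i), and the central power station (leader) picks the smallest price whose induced total supply meets its demand. The quadratic utility, names, and closed-form solution are illustrative assumptions, not the paper's exact Stackelberg model:

```python
import numpy as np

def stackelberg_price(costs, demand):
    """One-shot Stackelberg sketch.  Followers' best responses are
    x_i(p) = p / (2 * c_i); aggregate supply is linear in p, so the
    leader's cost-minimizing price is the one that exactly meets demand.
    Returns (price, per-follower supply)."""
    costs = np.asarray(costs, dtype=float)
    slope = np.sum(1.0 / (2.0 * costs))   # total supply per unit price
    price = demand / slope                # leader's choice at equilibrium
    supply = price / (2.0 * costs)        # followers' best responses
    return price, supply
```

Consumers with lower discomfort cost sell more at equilibrium, which mirrors the paper's point that the distributed game allocates the burden where it hurts least while the leader's total cost is minimized.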
Wang, D, Yuan, C, Sun, Y, Zhang, J & Jin, X 2014, 'A fast mode decision algorithm applied to Coarse-Grain quality Scalable Video Coding', Journal of Visual Communication and Image Representation, vol. 25, no. 7, pp. 1631-1639.
Wu, Y, Ma, B, Yang, M, Zhang, J & Jia, Y 2014, 'Metric Learning Based Structural Appearance Model for Robust Visual Tracking', IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 5, pp. 865-877.
Xu, J, Wu, Q, Zhang, J & Tang, Z 2014, 'Exploiting Universum data in AdaBoost using gradient descent', Image and Vision Computing, vol. 32, no. 8, pp. 550-557.
Xu, J, Wu, Q, Zhang, J, Shen, F & Tang, Z 2014, 'Boosting Separability in Semisupervised Learning for Object Classification', IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 7, pp. 1197-1208.
Liu, X, Wang, L, Yin, J, Zhu, E & Zhang, J 2013, 'An Efficient Approach To Integrating Radius Information Into Multiple Kernel Learning', IEEE Transactions on Cybernetics, vol. 43, no. 2, pp. 557-569.
Integrating radius information has been demonstrated by recent work on multiple kernel learning (MKL) as a promising way to improve kernel learning performance. Directly integrating the radius of the minimum enclosing ball (MEB) into MKL as it is, however, not only incurs significant computational overhead but also possibly adversely affects the kernel learning performance due to the notorious sensitivity of this radius to outliers. Inspired by the relationship between the radius of the MEB and the trace of total data scattering matrix, this paper proposes to incorporate the latter into MKL to improve the situation. In particular, in order to well justify the incorporation of radius information, we strictly comply with the radius-margin bound of support vector machines (SVMs) and thus focus on the l2-norm soft-margin SVM classifier. Detailed theoretical analysis is conducted to show how the proposed approach effectively preserves the merits of incorporating the radius of the MEB and how the resulting optimization is efficiently solved. Moreover, the proposed approach achieves the following advantages over its counterparts: 1) more robust in the presence of outliers or noisy training samples; 2) more computationally efficient by avoiding the quadratic optimization for computing the radius at each iteration; and 3) readily solvable by the existing off-the-shelf MKL packages. Comprehensive experiments are conducted on University of California, Irvine, protein subcellular localization, and Caltech-101 data sets, and the results well demonstrate the effectiveness and efficiency of our approach.
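The trace-of-scatter quantity used above as a surrogate for the squared MEB radius is straightforward to compute; the sketch below is illustrative (the function name is assumed), and the point is that the trace aggregates all squared distances to the mean, so a single outlier shifts it far less, proportionally, than it shifts the outlier-defined MEB radius:

```python
import numpy as np

def scatter_trace(X):
    """Trace of the total data scattering matrix of X (n samples x d
    features): trace((X - mean)^T (X - mean)), i.e. the sum of squared
    distances of all samples to the data mean."""
    Xc = X - X.mean(axis=0)
    return float(np.trace(Xc.T @ Xc))
```

Because every sample contributes to the sum, this quantity also avoids the per-iteration quadratic optimization needed to recompute the MEB radius exactly.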
Liu, X, Yin, J, Wang, L, Liu, L, Liu, J, Hou, C & Zhang, J 2013, 'An Adaptive Approach To Learning Optimal Neighborhood Kernels', IEEE Transactions on Cybernetics, vol. 43, no. 1, pp. 371-384.
Learning an optimal kernel plays a pivotal role in kernel-based methods. Recently, an approach called optimal neighborhood kernel learning (ONKL) has been proposed, showing promising classification performance. It assumes that the optimal kernel will reside in the neighborhood of a pre-specified kernel. Nevertheless, how to specify such a kernel in a principled way remains unclear. To solve this issue, this paper treats the pre-specified kernel as an extra variable and jointly learns it with the optimal neighborhood kernel and the structure parameters of support vector machines. To avoid trivial solutions, we constrain the pre-specified kernel with a parameterized model. We first discuss the characteristics of our approach and in particular highlight its adaptivity. After that, two instantiations are demonstrated by modeling the pre-specified kernel as a common Gaussian radial basis function kernel and a linear combination of a set of base kernels in the way of multiple kernel learning (MKL), respectively. We show that the optimization in our approach is a min-max problem and can be efficiently solved by employing the extended level method and Nesterov's method. Also, we give the probabilistic interpretation for our approach and apply it to explain the existing kernel learning methods, providing another perspective for their commonness and differences. Comprehensive experimental results on 13 UCI data sets and another two real-world data sets show that via the joint learning process, our approach not only adaptively identifies the pre-specified kernel, but also achieves superior classification performance to the original ONKL and the related MKL algorithms.
Lu, S, Zhang, J, Wang, Z & Feng, D 2013, 'Fast Human Action Classification And VOI Localization With Enhanced Sparse Coding', Journal of Visual Communication and Image Representation, vol. 24, no. 2, pp. 127-136.
Sparse coding, which encodes the natural visual signal into a sparse space for visual codebook generation and feature quantization, has been successfully utilized for many image classification applications. However, it has seldom been explored for video analysis tasks. In particular, the increased complexity in characterizing the visual patterns of diverse human actions, with both spatial and temporal variations, imposes more challenges on the conventional sparse coding scheme. In this paper, we propose an enhanced sparse coding scheme through learning a discriminative dictionary and optimizing the local pooling strategy. Localizing when and where a specific action happens in realistic videos is another challenging task. By utilizing the sparse coding based representations of human actions, this paper further presents a novel coarse-to-fine framework to localize the Volumes of Interest (VOIs) for the actions. Firstly, local visual features are transformed into the sparse signal domain through our enhanced sparse coding scheme. Secondly, in order to avoid exhaustive scanning of entire videos for VOI localization, we extend Spatial Pyramid Matching into the temporal domain, namely Spatial Temporal Pyramid Matching, to obtain the VOI candidates. Finally, a multi-level branch-and-bound approach is developed to refine the VOI candidates. The proposed framework is also able to avoid prohibitive computations in local similarity matching (e.g., nearest neighbors voting). Experimental results on two popular benchmark datasets (KTH and YouTube UCF) and the widely used localization dataset (MSR) demonstrate that our approach reduces computational cost significantly while maintaining classification accuracy comparable to that of the state-of-the-art methods.
Xin, J, Chen, K, Bai, L, Liu, D & Zhang, J 2013, 'Depth Adaptive Zooming Visual Servoing For A Robot With A Zooming Camera', International Journal of Advanced Robotic Systems, vol. 10, no. 1, pp. 1-11.
To solve the view visibility problem and keep the observed object in the field of view (FOV) during visual servoing, a depth adaptive zooming visual servoing strategy for a manipulator robot with a zooming camera is proposed. Firstly, a zoom control mechanism is introduced into the robot visual servoing system. It can dynamically adjust the camera's field of view to keep all the feature points on the object within the FOV and obtain high local resolution of the object at the end of visual servoing. Secondly, an invariant visual servoing method is employed to control the robot to the desired position under the changing intrinsic parameters of the camera. Finally, a nonlinear depth adaptive estimation scheme in the invariant space, using Lyapunov stability theory, is proposed to adaptively estimate the depth of the image features on the object. Three kinds of 4-DOF robot visual positioning simulation experiments are conducted. The simulation results show that the proposed approach achieves higher positioning precision.
Zhang, J, Wu, Q, Kusakunniran, W, Ma, Y & Li, H 2013, 'A New View-Invariant Feature for Cross-View Gait Recognition', IEEE Transactions on Information Forensics and Security, vol. 8, no. 10, pp. 1642-1653.
Human gait is an important biometric feature which is able to identify a person remotely. However, change of view causes significant difficulties for recognizing gaits. This paper proposes a new framework to construct a new view-invariant feature for cross-view gait recognition. Our view-normalization process is performed in the input layer (i.e., on gait silhouettes) to normalize gaits from arbitrary views. That is, each sequence of gait silhouettes recorded from a certain view is transformed onto the common canonical view by using the corresponding domain transformation obtained through transform invariant low-rank textures (TILT). Then, an improved scheme of Procrustes shape analysis (PSA) is proposed and applied to a sequence of the normalized gait silhouettes to extract a novel view-invariant gait feature based on the Procrustes mean shape (PMS), with gait similarity consecutively measured by the Procrustes distance (PD). Comprehensive experiments were carried out on widely adopted gait databases. It has been shown that the performance of the proposed method is promising when compared with other existing methods in the literature.
Zhang, JA 2013, 'Response to "On Mathematical Equivalence Between Vector OFDM and Quadrature OFDMA"', IEEE Transactions on Communications, vol. 61, p. 815.
Zhang, JA, Collings, IB, Chen, CS, Roullet, L, Luo, L, Ho, S-W & Yuan, J 2013, 'Evolving small-cell communications towards mobile-over-FTTx networks', IEEE Communications Magazine, vol. 51, no. 12, pp. 92-101.
Small cell techniques are recognized as the best way to deliver high capacity for broadband cellular communications. Femtocell and distributed antenna systems (DASs) are important components in the overall small cell story, but are not the complete solution. They have major disadvantages of very limited cooperation capability and expensive deployment cost, respectively. In this article, we propose a novel mobile-over-FTTx (MoF) network architecture, where an FTTx network is enhanced as an integrated rather than a simple backhauling component of a new mobile network delivering low-cost and powerful small cell solutions. In part, the MoF architecture combines the advantages of femtocells and DASs, while overcoming their disadvantages. Implementation challenges and potential solutions are discussed. Simulation results are presented and demonstrate the strong potential of the MoF in boosting the capacity of mobile networks.
Zhang, JA, Huang, X, Suzuki, H & Chen, Z 2013, 'Gaussian approximation based interpolation for channel matrix inversion in MIMO-OFDM systems', IEEE Transactions on Wireless Communications, vol. 12, no. 3, pp. 1407-1417.
Channel matrix inversion, which requires significant hardware resource and computational power, is a very challenging problem in MIMO-OFDM systems. Casting the frequency-domain channel matrix into a polynomial matrix, interpolation-based matrix inversion provides a promising solution to this problem. In this paper, we propose novel algorithms for interpolation based matrix inversion, which require little prior information of the channel matrix and enable the use of simple low-complexity interpolators such as spline and low pass filter interpolators. By invoking the central limit theorem, we show that a Gaussian approximation function well characterizes the power of the polynomial coefficients. Some low-complexity and efficient schemes are then proposed to estimate the parameters of the Gaussian function. With these estimated parameters, we introduce phase shifted interpolation and propose two algorithms which can achieve good interpolation accuracy using general low-complexity interpolators. Simulation results show that up to 85% complexity saving can be achieved with small performance degradation.
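The core idea of interpolation-based matrix inversion can be sketched independently of the paper's Gaussian-approximation machinery: invert the channel matrix exactly at only a few tones and interpolate the inverse entries across the remaining subcarriers. The sketch below uses naive entry-wise linear interpolation on a synthetic 2x2 MIMO-OFDM channel; it illustrates the general approach, not the proposed algorithm, and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N, nt = 64, 4                      # subcarriers; a 4-tap 2x2 MIMO channel
H_time = (rng.normal(size=(nt, 2, 2)) + 1j * rng.normal(size=(nt, 2, 2))) / np.sqrt(2)
H = np.fft.fft(H_time, n=N, axis=0)   # frequency-domain 2x2 channel per subcarrier

pilots = np.arange(0, N, 8)            # invert exactly only at every 8th tone
inv_p = np.linalg.inv(H[pilots])

# Entry-wise linear interpolation of the inverse across subcarriers
# (a simple stand-in for the spline / low-pass interpolators in the paper).
k = np.arange(N)
inv_hat = np.empty_like(H)
for i in range(2):
    for j in range(2):
        re = np.interp(k, pilots, inv_p[:, i, j].real, period=N)
        im = np.interp(k, pilots, inv_p[:, i, j].imag, period=N)
        inv_hat[:, i, j] = re + 1j * im

err = float(np.max(np.abs(inv_hat - np.linalg.inv(H))))
print("max interpolation error:", err)
```

The exact inverses are reproduced at the pilot tones; accuracy in between depends on how smoothly the inverse varies across frequency, which is what motivates the paper's more careful characterization of the polynomial coefficients.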
Zhang, JA, Luo, L & Huang, X 2013, 'Multicarrier Systems Based on Multistage Layered IFFT Structure', IEEE Signal Processing Letters, vol. 20, no. 7, pp. 665-668.
This letter extends our previous work on layered inverse Fast Fourier Transform (IFFT) structure to a multistage layered IFFT structure where data symbols can input at different stages of the IFFT. We first show that part of the IFFT in the transmitter of an OFDM system can be shifted to the receiver, while a conventional one-tap frequency-domain equalizer is still applicable. We then propose two IFFT split schemes based on decimation-in-time and decimation-in-frequency IFFT algorithms to enable interference-free symbol recovery with simple linear equalizers. Applications of the proposed schemes in multiple access communications are investigated. Simulation results demonstrate the effectiveness of the proposed schemes in improving bit-error-rate performance.
Zhang, JA, Yang, T & Chen, Z 2013, 'Under-determined training and estimation for distributed transmit beamforming systems', IEEE Transactions on Wireless Communications, vol. 12, no. 4, pp. 1936-1946.
Distributed transmit beamforming (DTB) can significantly boost the signal-to-noise ratio (SNR) of a wireless communication system. To realize the benefits of DTB, generating and feeding back beamforming vector are very challenging tasks. Existing schemes have either enormous overhead or weak robustness in noisy channels. In this paper, we investigate the design of training sequences and beamforming vector estimators in DTB systems. We consider an under-determined case, where the length of training sequence N sent from each node is smaller than the number of source nodes M. We derive the optimal estimation of the beamforming vector that maximizes the beamforming gain and show that it can be well approximated as the linear minimum mean square error (LMMSE) estimator. Based on the LMMSE estimator, we investigate the optimal design of training sequences and propose efficient DTB schemes. We analytically show that these schemes can achieve approximately N times increased SNR in uncorrelated channels, and even higher gain in correlated ones. We also propose a concatenated training scheme which optimally combines the training signals over multiple frames to obtain the beamforming vector. Simulation results demonstrate that the proposed DTB schemes can yield significant gains even at very low SNRs, with total feedback bits much less than those required in the existing schemes.
Huang, X, Guo, YJ & Zhang, JA 2012, 'Sample rate conversion using B-spline interpolation for OFDM based software defined radios', IEEE Transactions on Communications, vol. 60, no. 8, pp. 2113-2122.
This paper proposes arbitrary ratio sample rate conversion (SRC) architectures and a simpler B-spline interpolation algorithm for orthogonal frequency division multiplexing (OFDM) based software defined radios (SDRs) with multiband and multi-channel capabilities. Different from conventional standalone digital front-end designs for SDRs, the proposed SRC architectures combine the B-spline interpolation with OFDM modulation and equalization for OFDM transmitter and receiver respectively. With this combined design, the passband droop introduced by the B-spline interpolation can be more efficiently compensated using frequency-domain pre-distortion, instead of conventional time-domain pre-filtering, and hence an overall system complexity reduction is achieved. A novel multi-period B-spline interpolation and re-sampling structure is then constructed, and an interpolation algorithm with lower implementation complexity than that of the conventional Farrow structure is further developed. The SRC performance is also analysed by deriving the signal-to-peak distortion ratio formulas which can be used as design tools for determining the required orders of B-splines in the OFDM transmitter and receiver respectively. Finally, SRC examples used in a high-speed multiband multi-channel microwave backhaul system are given and compared with conventional polyphase filterbank interpolation to demonstrate the practicality and performance of the proposed SRC architectures and interpolation algorithm.
Kusakunniran, W, Wu, Q, Zhang, J & Li, H 2012, 'Cross-view and multi-view gait recognitions based on view transformation model using multi-layer perceptron', Pattern Recognition Letters, vol. 33, pp. 882-889.
Gait has been shown to be an efficient biometric feature for human identification at a distance. However, the performance of gait recognition can be affected by view variation, which leads to the difficulty of cross-view gait recognition. A novel method is proposed to solve this difficulty by using a view transformation model (VTM). The VTM is constructed through regression processes, adopting a multi-layer perceptron (MLP) as the regression tool. The VTM estimates the gait feature from one view using a well selected region of interest (ROI) on the gait feature from another view. Thus, trained VTMs can normalize gait features from different views into the same view before gait similarity is measured. Moreover, this paper proposes a new multi-view gait recognition method which estimates the gait feature in one view using selected gait features from several other views. Extensive experimental results demonstrate that the proposed method significantly outperforms other baseline methods in the literature for both cross-view and multi-view gait recognition. In our experiments, in particular, average accuracies of 99%, 98% and 93% are achieved for multi-view gait recognition using 5 cameras, 4 cameras and 3 cameras respectively.
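As a rough illustration of the VTM idea, one can learn a regression from view-A gait features to view-B features on training subjects and use it to normalize probes before matching. The sketch below substitutes ordinary least squares for the paper's multi-layer perceptron and runs on synthetic features, so all data, names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_train = 16, 100

# Synthetic stand-in data: gait features of the same subjects under two views,
# related by an unknown linear map plus noise (the real VTM is an MLP).
W_true = rng.normal(size=(d, d)) / np.sqrt(d)
F_viewA = rng.normal(size=(n_train, d))
F_viewB = F_viewA @ W_true + 0.01 * rng.normal(size=(n_train, d))

# Train the view transformation model: predict view-B features from view-A.
W_hat, *_ = np.linalg.lstsq(F_viewA, F_viewB, rcond=None)

# Normalize a probe recorded under view A into view B before similarity matching.
probe_A = rng.normal(size=(1, d))
probe_in_B = probe_A @ W_hat
print(probe_in_B.shape)
```

Once all features are mapped into a common view, any standard gait similarity measure can be applied, which is the essential point of view normalization.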
Kusakunniran, W, Wu, Q, Zhang, J & Li, H 2012, 'Gait Recognition across Various Walking Speeds using Higher-order Shape Configuration based on Differential Composition Model', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 6, pp. 1654-1668.
Gait has been known as an effective biometric feature to identify a person at a distance. However, variation of walking speeds may lead to significant changes to human walking patterns. It causes many difficulties for gait recognition. A comprehensive analysis has been carried out in this paper to identify such effects. Based on the analysis, Procrustes shape analysis is adopted for gait signature description and relevant similarity measurement. To tackle the challenges raised by speed change, this paper proposes a higher order shape configuration for gait shape description, which deliberately conserves discriminative information in the gait signatures and is still able to tolerate the varying walking speed. Instead of simply measuring the similarity between two gaits by treating them as two unified objects, a differential composition model (DCM) is constructed. The DCM differentiates the different effects caused by walking speed changes on various human body parts. In the meantime, it also balances well the different discriminabilities of each body part on the overall gait similarity measurements. In this model, the Fisher discriminant ratio is adopted to calculate weights for each body part. Comprehensive experiments based on widely adopted gait databases demonstrate that our proposed method is efficient for cross-speed gait recognition and outperforms other state-of-the-art methods.
Kusakunniran, W, Wu, Q, Zhang, J & Li, H 2012, 'Gait Recognition Under Various Viewing Angles Based On Correlated Motion Regression', IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 6, pp. 966-980.
It is well recognized that gait is an important biometric feature to identify a person at a distance, e.g., in video surveillance applications. However, in reality, change of viewing angle causes significant challenges for gait recognition.
Luo, L, Zhang, JA & Davis, LM 2012, 'Space-Time Block Code and Spatial Multiplexing Design for Quadrature-OFDMA Systems', IEEE Transactions on Communications, vol. 60, pp. 3133-3142.
To alleviate the problems of high peak-to-average power ratio (PAPR), high complexity in the user terminal and sensitivity to carrier frequency offset (CFO) in current orthogonal frequency division multiple access (OFDMA) systems, a Quadrature OFDM (Q-OFDMA) system has recently been proposed in the single-input single-output environment. In this paper we study the realization of multi-input multi-output (MIMO) diversity- and multiplexing-oriented methods for Q-OFDMA systems. An Alamouti-like space-time block code (STBC) and a simple detection scheme for spatial multiplexing (SM) in Q-OFDMA systems are constructed; both zero forcing (ZF) and minimum mean square error (MMSE) equalizers are investigated. The proposed STBC is a full diversity scheme, which encodes in the intermediate domain and decodes in the frequency domain. Analytical and empirical results demonstrate that Q-OFDMA systems can be implemented flexibly and efficiently in a MIMO framework, and the proposed scheme can be easily applied to OFDMA and Single-Carrier Frequency Division Multiple Access (SC-FDMA) by adjusting the parameters of Q-OFDMA.
Thi, T, Cheng, L, Zhang, J, Wang, L & Satoh, S 2012, 'Integrating local action elements for action analysis', Computer Vision and Image Understanding, vol. 116, no. 3, pp. 378-395.
In this paper, we propose a framework for human action analysis from video footage. A video action sequence in our perspective is a dynamic structure of sparse local spatial-temporal patches termed action elements, so the problems of action analysis in video are carried out here based on the set of local characteristics as well as the global shape of a prescribed action. We first detect a set of action elements that are the most compact entities of an action; then we extend the idea of the Implicit Shape Model to space-time, in order to properly integrate the spatial and temporal properties of these action elements. In particular, we consider two different recipes to construct action elements: one is to use a Sparse Bayesian Feature Classifier to choose action elements from all detected Spatial Temporal Interest Points, termed discriminative action elements. The other detects affine invariant local features from holistic Motion History Images and picks up action elements according to their compactness scores; these are called generative action elements. Action elements detected in either way are then used to construct a voting space based on their local feature representations as well as their global configuration constraints. Our approach is evaluated in the two main contexts of current human action analysis challenges, action retrieval and action classification. Comprehensive experimental results show that our proposed framework outperforms existing state-of-the-art techniques on a range of different datasets.
Thi, T, Cheng, L, Zhang, J, Wang, L & Satoh, S 2012, 'Structured learning of local features for human action classification and localization', Image & Vision Computing, vol. 30, no. 1, pp. 1-14.
Human action recognition is a promising yet non-trivial computer vision field with many potential applications. Current advances in bag-of-feature approaches have brought significant insights into recognizing human actions within complex context. It is, however, a common practice in the literature to consider action as merely an orderless set of local salient features. This representation has been shown to be oversimplified, which inherently limits traditional approaches from robust deployment in real-life scenarios. In this work, we propose and show that, by taking into account the global configuration of local features, we can greatly improve recognition performance. We first introduce a novel feature selection process called the Sparse Hierarchical Bayes Filter to select only the most contributive features of each action type based on neighboring structure constraints. We then present the application of structured learning to human action analysis. That is, by representing a human action as a complex set of local features, we can incorporate different spatial and temporal feature constraints into the learning tasks of human action classification and localization. In particular, we tackle the problem of action localization in video using structured learning with two alternatives: one is the Dynamic Conditional Random Field, from a probabilistic perspective; the other is the Structural Support Vector Machine, from a max-margin point of view. We evaluate our modular classification-localization framework on various testbeds, in which our proposed framework is proven to be highly effective and robust compared against bag-of-feature methods.
Xu, J, Wu, Q, Zhang, J & Tang, Z 2012, 'Fast and Accurate Human Detection Using a Cascade of Boosted MS-LBP Features', IEEE Signal Processing Letters, vol. 19, no. 10, pp. 676-679.
In this letter, a new scheme for generating local binary patterns (LBP) is presented. This Modified Symmetric LBP (MS-LBP) feature takes advantage of both LBP and gradient features. It is then applied in a boosted cascade framework for human detection. By combining MS-LBP with Haar-like features in the boosted framework, the performance of heterogeneous-feature-based detectors is evaluated for the best trade-off between accuracy and speed. Two feature training schemes, namely the Single AdaBoost Training Scheme (SATS) and the Dual AdaBoost Training Scheme (DATS), are proposed and compared. On top of AdaBoost, two multidimensional feature projection methods are described. A comprehensive experiment is presented. Apart from obtaining higher detection accuracy, the detection speed based on DATS is 17 times faster than the HOG method.
Zhang, J & Huang, X 2012, 'Autocorrelation based coarse timing with differential normalization', IEEE Transactions on Wireless Communications, vol. 11, no. 2, pp. 526-530.
Two novel differential normalization factors, depending on the severity of carrier frequency offset, are proposed for autocorrelation based coarse timing scheme. Compared with the conventional normalization factor based on signal energy, they improve the robustness of the timing metric to signal-to-noise ratio (SNR), improve the mainlobe sharpness of the timing metric and reduce both missed detection and false alarm probabilities.
Zhang, J, Li, N, Yang, Q & Hu, C 2012, 'Self-adaptive Chaotic Differential Evolution Algorithm for Solving Constrained Circular Packing Problem', Journal of Computational Information Systems, vol. 8, no. 18, pp. 7747-7755.
Packing circles into a circular container with an equilibrium constraint is an NP-hard layout optimization problem with broad applications in engineering. This paper studies a two-dimensional constrained packing problem. Classical differential evolution for solving this problem easily falls into local optima. An adaptive chaotic differential evolution algorithm is proposed in this paper to improve performance. The weighting parameters are dynamically adjusted by chaotic mutation during the search procedure. The penalty factors of the fitness function are modified during iteration. To keep the diversity of the population, we limit the population's concentration. To enhance the local search capability, we adopt adaptive mutation of the global optimal individual. The improved algorithm maintains the basic algorithm's structure while extending the search scale, and holds the diversity of the population while increasing the search accuracy. Furthermore, our improved algorithm can escape from premature convergence and speed up the convergence. Numerical examples indicate the effectiveness and efficiency of the proposed algorithm.
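The chaotic parameter adaptation described above can be illustrated with a minimal differential evolution loop in which the scale factor F follows a logistic map. This sketch omits the paper's penalty handling, concentration control and adaptive mutation of the best individual, and minimizes a plain sphere function instead of the packing objective; all parameter values are illustrative.

```python
import numpy as np

def chaotic_de(f, dim=5, pop=20, iters=200, seed=4):
    """DE/rand/1/bin with the scale factor F driven by a logistic chaotic map:
    a minimal sketch of chaotic parameter adaptation, not the paper's full
    algorithm."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, size=(pop, dim))
    fit = np.array([f(x) for x in X])
    z = 0.7                                   # logistic-map state
    for _ in range(iters):
        z = 4.0 * z * (1.0 - z)               # chaotic update in (0, 1)
        F, CR = 0.1 + 0.8 * z, 0.9            # chaotically varying scale factor
        for i in range(pop):
            a, b, c = rng.choice([j for j in range(pop) if j != i], 3, replace=False)
            mutant = X[a] + F * (X[b] - X[c])
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True   # ensure at least one mutated gene
            trial = np.where(cross, mutant, X[i])
            ft = f(trial)
            if ft < fit[i]:                   # greedy one-to-one selection
                X[i], fit[i] = trial, ft
    return X[np.argmin(fit)], float(fit.min())

sphere = lambda x: float(np.sum(x ** 2))
best_x, best_f = chaotic_de(sphere)
print(best_f)
```

The non-repeating chaotic sequence keeps F sweeping through its range, which is the diversity-preserving effect the abstract attributes to chaotic mutation.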
This letter proposes simple algorithms for computing a phase shift term, which is introduced to greatly improve the accuracy of complex signal interpolation, applicable to any interpolator. Based on a cost function targeting at minimizing the phase transition between adjacent samples, the phase shift term can be easily computed using either signal statistics obtained in advance or known base samples in real time. Simulation results, exemplified for channel interpolation in OFDM systems, show that the proposed phase estimators can significantly improve the interpolation performance for various interpolators such as spline, low-pass filter, and linear and cubic polynomial interpolators, compared to the case without phase shifting.
Zhang, JA, Huang, X, Cantoni, A & Guo, YJ 2012, 'Sidelobe suppression with orthogonal projection for multicarrier systems', IEEE Transactions on Communications, vol. 60, no. 2, pp. 589-599.
Sidelobe suppression, or out-of-band emission reduction, in multicarrier systems is conventionally achieved via time-domain windowing, which is spectrum inefficient. Although some sidelobe cancellation and signal predistortion techniques have been proposed for spectrum shaping, they are generally not well balanced between complexity and suppression performance. In this paper, an efficient and low-complexity sidelobe suppression with orthogonal projection (SSOP) scheme is proposed. The SSOP scheme uses an orthogonal projection matrix for sidelobe suppression, and adopts as few as one reserved subcarrier for recovering the distorted signal in the receiver. Unlike most known approaches, the SSOP scheme requires as few multiplications as the number of subcarriers in the band, and enables straightforward selection of parameters. Analytical and simulation results show that more than 50 dB sidelobe suppression can be readily achieved with only a slight degradation in receiver performance.
Paisitkriangkrai, S, Mei, T, Zhang, J & Hua, XS 2011, 'Clip-based hierarchical representation for near-duplicate video detection', International Journal of Computer Mathematics, vol. 88, no. 18, pp. 3817-3833.
Searching for near-duplicate content has become an important task in many multimedia applications, for example, images, videos and music. The ability to detect duplicate videos plays an important role in several video applications, for example, effective video search, copyright infringement and the study on users' behaviour on near-duplicate video production. Current web video search systems rely only on text keywords and, hence, fail to detect many duplicate videos. In this paper, we analyse the problem of near-duplicate detection and propose a practical solution for real-time large-scale video retrieval. Unlike many existing approaches which make use of video frames or key-frames, our solution is based on a more discriminative signature of video clips. The feature used in this paper is an extension of ordinal measures which have proven to be robust to change in brightness, compression formats and compression ratios. For efficient retrieval, we propose to use multi-probe locality sensitive hashing (MPLSH) to index the video clips for fast similarity search and high recall. MPLSH is able to filter out a large number of dissimilar clips from video database. To refine the search process, we apply a similarity voting based on video clip signatures. Experimental results on the dataset of 12,790 web videos show that the proposed approach improves average precision over the baseline colour histogram approach while satisfying real-time requirements.
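The basic ordinal measure underlying this signature is simple to sketch: partition a frame into blocks, average the intensity per block, and keep only the rank ordering of the averages, which is invariant to global monotonic brightness and contrast changes. The following is a minimal illustration on a synthetic frame, not the paper's extended clip-level feature.

```python
import numpy as np

def ordinal_signature(frame, grid=3):
    """Ordinal measure of a frame: partition into grid x grid blocks, average
    the intensity per block, and keep only the rank of each block's average.
    Ranks survive any monotonic global brightness/contrast change, which is
    the robustness property the paper builds on."""
    h, w = frame.shape
    means = [frame[i * h // grid:(i + 1) * h // grid,
                   j * w // grid:(j + 1) * w // grid].mean()
             for i in range(grid) for j in range(grid)]
    return tuple(np.argsort(np.argsort(means)))   # rank of each block

rng = np.random.default_rng(5)
frame = rng.uniform(0, 255, size=(90, 120))
brighter = 0.8 * frame + 30                       # global brightness/contrast shift

sig_a, sig_b = ordinal_signature(frame), ordinal_signature(brighter)
print(sig_a == sig_b)
```

Because the signature is a short integer tuple, it also hashes cheaply, which is what makes it a natural fit for LSH-style indexing.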
Paisitkriangkrai, S, Shen, C & Zhang, J 2011, 'Incremental Training of a Detector Using Online Sparse Eigendecomposition', IEEE Transactions on Image Processing, vol. 20, no. 1, pp. 213-226.
The ability to efficiently and accurately detect objects plays a very crucial role for many computer vision tasks. Recently, offline object detectors have shown a tremendous success. However, one major drawback of offline techniques is that a complete set of training data has to be collected beforehand. In addition, once learned, an offline detector cannot make use of newly arriving data. To alleviate these drawbacks, online learning has been adopted with the following objectives: 1) the technique should be computationally and storage efficient; 2) the updated classifier must maintain its high classification accuracy. In this paper, we propose an effective and efficient framework for learning an adaptive online greedy sparse linear discriminant analysis model. Unlike many existing online boosting detectors, which usually apply exponential or logistic loss, our online algorithm makes use of the linear discriminant analysis learning criterion that not only aims to maximize the class-separation criterion but also incorporates the asymmetrical property of training data distributions. We provide a better alternative for online boosting algorithms in the context of training a visual object detector. We demonstrate the robustness and efficiency of our methods on handwritten digit and face data sets. Our results confirm that object detection tasks benefit significantly when trained in an online manner.
Shen, C, Paisitkriangkrai, S & Zhang, J 2011, 'Efficiently Learning a Detection Cascade with Sparse Eigenvectors', IEEE Transactions on Image Processing, vol. 19, no. 7, pp. 22-35.
Real-time object detection has many computer vision applications. Since Viola and Jones proposed the first real-time AdaBoost based face detection system, much effort has been spent on improving the boosting method. In this work, we first show that feature selection methods other than boosting can also be used for training an efficient object detector. In particular, we introduce greedy sparse linear discriminant analysis (GSLDA) for its conceptual simplicity and computational efficiency, and slightly better detection performance is achieved compared with AdaBoost. Moreover, we propose a new technique, termed boosted greedy sparse linear discriminant analysis (BGSLDA), to efficiently train a detection cascade. BGSLDA exploits the sample reweighting property of boosting and the class-separability criterion of GSLDA. Experiments in the domain of highly skewed data distributions (e.g., face detection) demonstrate that classifiers trained with the proposed BGSLDA outperform AdaBoost and its variants. This finding provides a significant opportunity to argue that AdaBoost and similar approaches are not the only methods that can achieve high detection results for real-time object detection.
Smith, DB, Hanlen, LW, Zhang, JA, Miniutti, D, Rodda, D & Gilbert, B 2011, 'First- and second-order statistical characterizations of the dynamic body area propagation channel of various bandwidths', Annals of Telecommunications, vol. 66, no. 3-4, pp. 187-203.
Comprehensive statistical characterizations of the dynamic narrowband on-body area and on-body to off-body area channels are presented. These characterizations are based on real-time measurements of the time domain channel response at carrier frequencies near the 900- and 2,400-MHz industrial, scientific, and medical bands and at a carrier frequency near the 402-MHz medical implant communications band. We consider varying amounts of body movement, numerous transmit–receive pair locations on the human body, and various bandwidths. We also consider long periods, i.e., hours of everyday activity (predominantly indoor scenarios), for on-body channel characterization. Various adult human test subjects are used. It is shown, by applying the Akaike information criterion, that the Weibull and Gamma distributions generally fit agglomerates of received signal amplitude data and that in various individual cases the Lognormal distribution provides a good fit. We also characterize fade duration and fade depth with direct matching to second-order temporal statistics. These first- and second-order characterizations have important utility in the design and evaluation of body area communications systems.
Chen, Y, Zhang, J & Jayalath, ADS 2010, 'Estimation and compensation of clipping noise in OFDMA systems', Wireless Communications, IEEE Transactions on, vol. 9, pp. 523-527.
Husain, SI, Yuan, J, Zhang, J & Martin, RK 2009, 'Time domain equalizer design using bit error rate minimization for UWB systems', EURASIP Journal on Wireless Communications and Networking, vol. 2009, pp. 9-9.
Lu, S, Zhang, J & Feng, DD 2009, 'Detecting ghost and left objects in surveillance video', International Journal of Pattern Recognition and Artificial Intelligence, vol. 23, no. 7, pp. 1503-1525.
This paper proposes an efficient method for detecting ghost and left objects in surveillance video, which, if not identified, may lead to errors or wasted computational power in background modeling and object tracking in video surveillance systems. The method contains two main steps: the first is to detect stationary objects, which narrows the evaluation targets down to a small number of regions in the input image; the second is to discriminate the candidates between ghost and left objects. For the first step, we introduce a novel stationary object detection method based on continuous object tracking and shape matching. For the second step, we propose a fast and robust inpainting method to differentiate between ghost and left objects by reconstructing the real background using the candidate's corresponding regions in the current input and background images. The effectiveness of our method has been validated by experiments over a variety of video sequences and comparisons with existing state-of-the-art methods.
Luo, L, Zhang, J & Shi, Z 2009, 'Advanced receiver design for quadrature OFDMA systems', EURASIP Journal on Wireless Communications and Networking, vol. 2009, pp. 10-10.
Smith, DB, Zhang, JA, Hanlen, LW, Miniutti, D, Rodda, D & Gilbert, B 2009, 'Temporal correlation of dynamic on-body area radio channel', Electronics letters, vol. 45, pp. 1212-1213.
Chen, Y, Zhang, JA & Jayalath, ADS 2009, 'Low-complexity estimation of CFO and frequency independent I/Q mismatch for OFDM systems', EURASIP Journal on Wireless Communications and Networking, vol. 2009.
Zhang, J, Luo, L & Shi, Z 2009, 'Quadrature OFDMA systems based on layered FFT structure', Communications, IEEE Transactions on, vol. 57, pp. 850-860.
Zhang, J, Smith, DB, Hanlen, LW, Miniutti, D, Rodda, D & Gilbert, B 2009, 'Stability of narrowband dynamic body area channel', Antennas and Wireless Propagation Letters, IEEE, vol. 8, pp. 53-56.
Chen, Y, Zhang, JA & Jayalath, D 2008, 'Clipping noise compensation for OFDM systems', Electronics Letters, vol. 44, pp. 1490-1491.
Paisitkriangkrai, S, Shen, C & Zhang, J 2008, 'Fast pedestrian detection using a cascade of boosted covariance features', IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 8, pp. 1140-1151.
Efficiently and accurately detecting pedestrians plays a very important role in many computer vision applications such as video surveillance and smart cars. In order to find the right feature for this task, we first present a comprehensive experimental study…
Paisitkriangkrai, S, Shen, C & Zhang, J 2008, 'Performance evaluation of local features in human classification and detection', IET Computer Vision, vol. 2, no. 4, pp. 236-246.
Detecting pedestrians accurately is the first fundamental step for many computer vision applications such as video surveillance, smart vehicles, intersection traffic analysis and so on. The authors present an experimental study on pedestrian detection using…
Zhang, J, Kennedy, RA & Abhayapala, TD 2008, 'Reduced-rank shift-invariant technique and its application for synchronization and channel identification in UWB systems', EURASIP Journal on Wireless Communications and Networking, vol. 2008, pp. 38-38.
Husain, SI, Yuan, J & Zhang, J 2007, 'Modified channel shortening receiver based on MSSNR algorithm for UWB channels', Electronics Letters, vol. 43, pp. 535-537.
Zhang, J, Jayalath, ADS & Chen, Y 2007, 'Asymmetric OFDM systems based on layered FFT structure', Signal Processing Letters, IEEE, vol. 14, pp. 812-815.
Lu, S, Zhang, J & Feng, DD 2007, 'Detecting unattended packages through human activity recognition and object association', Pattern Recognition, vol. 40, no. 8, pp. 2173-2184.
This paper provides a novel approach to detect unattended packages in public venues. Different from previous works on this topic, which are mostly limited to detecting static objects where no human is nearby, we provide a solution which can detect an unattended…
Zhang, J, Kennedy, RA & Abhayapala, TD 2005, 'Cramér-Rao lower bounds for the synchronization of UWB signals', EURASIP Journal on Applied Signal Processing, vol. 2005, pp. 426-438.
Zhang, J, Arnold, J & Frater, M 2000, 'A cell-loss concealment technique for MPEG-2 coded video', IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 4, pp. 659-665.
Audio-visual and other multimedia services are seen as important sources of traffic for future telecommunication networks, including wireless networks. A major drawback with some wireless networks is that they introduce a significant number of transmission…
Arnold, J, Frater, M & Zhang, J 1999, 'Error resilience in the MPEG-2 video coding standard for cell based networks - a review', Signal Processing: Image Communication, vol. 14, no. 6, pp. 607-633.
The MPEG-2 video coding standard is being extensively used worldwide for the provision of digital video services. Many of these applications involve the transport of MPEG-2 video over cell-based (or packet) networks. Examples include the broadband integrated…
Frater, M, Arnold, J & Zhang, J 1999, 'MPEG 2 video error resilience experiments: The importance of considering the impact of the systems layer', Signal Processing: Image Communication, vol. 14, no. 3, pp. 269-275.
With increasing interest in the transport of video traffic over lossy networks, several techniques for improving the quality of video services in the presence of loss have been proposed, often using the MPEG 2 video coding algorithm as a basis. Many of the…
Zhang, J, Frater, M, Arnold, J & Percival, T 1997, 'MPEG 2 video services for wireless ATM networks', IEEE Journal on Selected Areas in Communications, vol. 15, no. 1, pp. 119-127.
Audio-visual and other multimedia services are seen as an important source of traffic for future telecommunications networks, including wireless networks. In this paper, we examine the impact of the properties of a 50 Mb/s asynchronous transfer mode (ATM)…
Zhang, J 2006, 'Error Resilience for Video Coding Service' in Wu, HR & Rao, KR (eds), Digital Video Image Quality and Perceptual Coding, CRC, Taylor & Francis group, USA, pp. 503-527.
This chapter is drawn from part of the author's PhD thesis.
Huang, H, Xu, J, Zhang, J, Wu, Q & Kirsch, C 2018, 'Railway Infrastructure Defects Recognition using Fine-grained Deep Convolutional Neural Networks', 2018 Digital Image Computing: Techniques and Applications (DICTA), IEEE, Canberra, Australia.
Huang, H, Zheng, J, Zhang, J, Wu, Q & Xu, J 2019, 'Compare more nuanced: Pairwise alignment bilinear network for few-shot fine-grained learning', Proceedings - IEEE International Conference on Multimedia and Expo, pp. 91-96.
The recognition ability of human beings is developed in a progressive way. Usually, children learn to discriminate various objects from coarse to fine-grained with limited supervision. Inspired by this learning process, we propose a simple yet effective model for the Few-Shot Fine-Grained (FSFG) recognition, which tries to tackle the challenging fine-grained recognition task using meta-learning. The proposed method, named Pairwise Alignment Bilinear Network (PABN), is an end-to-end deep neural network. Unlike traditional deep bilinear networks for fine-grained classification, which adopt the self-bilinear pooling to capture the subtle features of images, the proposed model uses a novel pairwise bilinear pooling to compare the nuanced differences between base images and query images for learning a deep distance metric. In order to match base image features with query image features, we design feature alignment losses before the proposed pairwise bilinear pooling. Experiment results on four fine-grained classification datasets and one generic few-shot dataset demonstrate that the proposed model outperforms both the state-of-the-art few-shot fine-grained and general few-shot methods.
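The contrast between self-bilinear pooling and the pairwise variant described above can be illustrated with a minimal sketch: instead of taking the outer product of a feature map with itself, the pairwise form takes the cross outer product between a base image's and a query image's feature maps. Shapes and names here are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def self_bilinear(feat):
    # Classical bilinear pooling: outer product of a (C, H*W) feature
    # map with itself, averaged over spatial positions -> (C, C).
    return feat @ feat.T / feat.shape[1]

def pairwise_bilinear(base, query):
    # Pairwise variant (sketch of the PABN idea): cross outer product
    # between base and query feature maps, capturing second-order
    # interactions *between* the pair rather than within one image.
    return base @ query.T / base.shape[1]

rng = np.random.default_rng(1)
base = rng.normal(size=(64, 49))    # hypothetical C=64 channels, 7x7 map
query = rng.normal(size=(64, 49))
P = pairwise_bilinear(base, query)  # (64, 64) pair-relation descriptor
```

Note that the pairwise form reduces to ordinary self-bilinear pooling when the base and query features coincide.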
Li, Q, Wu, Q, Zhu, C, Zhang, J & Zhao, W 2019, 'Unsupervised User Behavior Representation for Fraud Review Detection with Cold-Start Problem', Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, China, pp. 222-236.
Li, Z, Gong, Y, Zhang, J, Yi, J, Wu, Q & Kirsch, C 2019, 'Sample adaptive multiple kernel learning for failure prediction of railway points', Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2848-2856.
Railway points are among the key components of railway infrastructure. As a part of signal equipment, points control the routes of trains at railway junctions, having a significant impact on the reliability, capacity, and punctuality of rail transport. Meanwhile, they are also one of the most fragile parts of railway systems. Points failures cause a large portion of railway incidents. Traditionally, maintenance of points is based on a fixed time interval or raised after the equipment fails. Instead, it would be of great value if we could forecast points failures and take action beforehand, minimising any negative effect. To date, most existing prediction methods are either lab-based or rely on specially installed sensors, which makes them infeasible for large-scale implementation. Besides, they often use data from only one source. We therefore explore a new way that integrates multi-source data which are ready to hand to fulfil this task. We conducted our case study on the Sydney Trains rail network, which is an extensive network of passenger and freight railways. Unfortunately, real-world data are usually incomplete due to various reasons, e.g., faults in the database, operational errors or transmission faults. Besides, railway points differ in their locations, types and other properties, which means it is hard to use a unified model to predict their failures. Aiming at this challenging task, we first constructed a dataset from multiple sources and selected key features with the help of domain experts. In this paper, we formulate our prediction task as a multiple kernel learning problem with missing kernels. We present a robust multiple kernel learning algorithm for predicting points failures. Our model takes into account the missing pattern of data as well as the inherent variance of different sets of railway points. Extensive experiments demonstrate the superiority of our algorithm compared…
Shen, J, Wang, Y & Zhang, J 2018, 'Memory optimized Deep Dense Network for Image Super-resolution', Digital Image Computing: Techniques and Applications, IEEE, Canberra, Australia.
CNN methods for image super-resolution consume a large amount of training-time memory because the feature size does not decrease as the network goes deeper. To reduce memory consumption during training, we propose a memory-optimized deep dense network for image super-resolution. We first reduce redundant feature learning by rationally designing the skip connections and dense connections in the network. Then we adopt shared memory allocations to store concatenated features and Batch Normalization intermediate feature maps. The memory-optimized network consumes less memory than a normal dense network. We also evaluate our proposed architecture on highly competitive super-resolution benchmark datasets. Our deep dense network outperforms some existing methods and requires relatively less computation.
Yao, L, Kusakunniran, W, Wu, Q, Zhang, J & Tang, Z 2018, 'Robust CNN-based Gait Verification and Identification using Skeleton Gait Energy Image', 2018 Digital Image Computing: Techniques and Applications (DICTA), IEEE, Canberra, Australia.
Ye, P, Wang, Y, Xia, Y, An, P & Zhang, J 2019, 'Enhanced saliency prediction via free energy principle', Communications in Computer and Information Science, pp. 31-44.
Saliency prediction can be treated as an activity of the human brain. Most saliency prediction methods employ features to determine the contrast of an image area relative to its surroundings. However, only a few studies have investigated how human brain activities affect saliency prediction. In this paper, we propose an enhanced saliency prediction model via the free energy principle. A new AR-RTV model, which combines the relative total variation (RTV) structure extractor with an autoregressive (AR) operator, is first utilized to decompose an original image into a predictable component and a surprise component. Then, we adopt the local entropy of the 'surprise' map and the gradient magnitude (GM) map to estimate the two component saliency maps (sub-saliency maps) respectively. Finally, inspired by visual error sensitivity, a saliency augment operator is designed to enhance the final saliency map by combining the two sub-saliency maps. Experimental results on two benchmark databases demonstrate the superior performance of the proposed method compared to eleven state-of-the-art algorithms.
Zhang, L, Xu, J, Zhang, J & Gong, Y 2018, 'Information Enhancement for Travelogues via a Hybrid Clustering Model', Digital Image Computing: Techniques and Applications, IEEE, Canberra, ACT, Australia, pp. 1-8.
Zhao, M, Zhang, J, Zhang, C & Zhang, W 2018, 'Towards Locally Consistent Object Counting with Constrained Multi-stage Convolutional Neural Networks', ACCV 2018: Computer Vision, Asian Conference on Computer Vision, Springer, Perth, Australia, pp. 247-261.
High-density object counting in surveillance scenes is challenging mainly due to the drastic variation of object scales. The prevalence of deep learning has largely boosted object counting accuracy on several benchmark datasets. However, does the global count really count? Armed with this question, we dive into the predicted density map, whose summation over the whole region reports the global count, for more in-depth analysis. We observe that the object density maps generated by most existing methods usually lack local consistency, i.e., counting errors exist in local regions even though the global count seems to match the ground truth well. Towards this problem, in this paper we propose constrained multi-stage Convolutional Neural Networks (CNNs) to pursue a locally consistent density map from two aspects. Different from most existing methods that mainly rely on multi-column architectures of plain CNNs, we exploit a stacking formulation of plain CNNs. Benefiting from the internal multi-stage learning process, the feature map can be repeatedly refined, allowing the density map to approach the ground-truth density distribution. For further refinement of the density map, we also propose a grid loss function. With finer local-region-based supervision, the underlying model is constrained to generate locally consistent density values that minimize the training errors considering both global and local count accuracy. Experiments on two widely tested object counting benchmarks, with overall significant results compared with state-of-the-art methods, demonstrate the effectiveness of our approach.
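The local-consistency idea described above — supervising per-region counts rather than only the global sum — can be illustrated with a minimal sketch of a grid loss over a density map. The cell layout and uniform weighting are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def grid_loss(pred, gt, grid=(4, 4)):
    """Mean squared error between predicted and ground-truth *counts*
    inside each cell of a coarse grid laid over the density map, so
    errors cannot cancel across regions the way they can in a single
    global count."""
    gh, gw = grid
    h, w = pred.shape
    err = 0.0
    for i in range(gh):
        for j in range(gw):
            ys, ye = i * h // gh, (i + 1) * h // gh
            xs, xe = j * w // gw, (j + 1) * w // gw
            err += (pred[ys:ye, xs:xe].sum() - gt[ys:ye, xs:xe].sum()) ** 2
    return err / (gh * gw)

# A prediction with the correct global count but the mass in the wrong
# place: global-count error is zero, yet the grid loss is positive.
gt = np.zeros((8, 8)); gt[0, 0] = 5.0
pred = np.zeros((8, 8)); pred[7, 7] = 5.0
```

This is exactly the failure mode the abstract describes: the global sums match while local regions are badly wrong, and only the region-level supervision exposes it.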
Gong, Y, Li, Z, Zhang, J, Liu, W, Zheng, Y & Kirsch, C 2018, 'Network-wide Crowd Flow Prediction of Sydney Trains via customized Online Non-negative Matrix Factorization', ACM International Conference on Information and Knowledge Management, ACM, Turin, Italy.
Guo, D, Zhao, W, Cui, Y, Wang, Z, Chen, S & Zhang, J 2018, 'Siamese network based features fusion for adaptive visual tracking', PRICAI 2018: Trends in Artificial Intelligence, 15th Pacific Rim International Conference on Artificial Intelligence, Nanjing, China, August 28–31, 2018, Proceedings (LNAI 11012), Springer, pp. 759-771.
Visual object tracking is a popular but challenging problem in computer vision. The main challenge is the lack of prior knowledge of the tracking target, which may be supervised only by a bounding box given in the first frame. Besides, tracking suffers from many influences such as scale variations, deformations, partial occlusions and motion blur. To solve such a challenging problem, a suitable tracking framework is needed that adapts to different tracking scenes. This paper presents a novel approach for robust visual object tracking by fusing multiple features in a Siamese Network. Hand-crafted appearance features and CNN features are combined to mutually compensate for their shortcomings and enhance their advantages. The proposed network proceeds as follows. First, different features are extracted from the tracking frames. Second, the extracted features are processed via Correlation Filters respectively to learn corresponding templates, which are used to generate response maps. Finally, the multiple response maps are fused to obtain a better response map, which helps locate the target more accurately. Comprehensive experiments are conducted on three benchmarks: Temple-Color, OTB50 and UAV123. Experimental results demonstrate that the proposed approach achieves state-of-the-art performance on these benchmarks.
Li, Z, Zhang, J, Wu, Q & Kirsch, C 2018, 'Field-regularised factorization machines for mining the maintenance logs of equipment', AI 2018: Advances in Artificial Intelligence, 31st Australasian Joint Conference, Wellington, New Zealand, December 11–14, 2018, Proceedings (LNAI 11320), Springer, pp. 172-183.
Failure prediction is very important for railway infrastructure. Traditionally, data from various sensors are collected for this task, while the value of maintenance logs is often neglected. Maintenance records of equipment usually indicate equipment status; they can be valuable for the prediction of equipment faults. In this paper, we propose Field-regularised Factorization Machines (FrFMs) to predict failures of railway points using maintenance logs. The Factorization Machine (FM) and its variants are state-of-the-art algorithms designed for sparse data, widely used in click-through-rate prediction and recommendation systems. Categorical variables are converted to binary features through one-hot encoding and then fed into these models; however, field information is ignored in this process. We propose Field-regularised Factorization Machines to incorporate such valuable information. Experiments on a data set from railway maintenance logs and another public data set show the effectiveness of our methods.
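The Factorization Machine model that FrFMs build on scores an input as a bias, a linear term, and pairwise feature interactions factorized through latent vectors, computable in linear time via the standard reformulation. Below is a minimal sketch of the plain FM score only; the field regularisation proposed in the paper is not reproduced, and all names are illustrative.

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Plain Factorization Machine score:
    y(x) = w0 + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j,
    with the pairwise term computed in O(n*k) using
    0.5 * sum_f [ (sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2 ]."""
    linear = w0 + w @ x
    interactions = 0.5 * (((V.T @ x) ** 2).sum() - ((V ** 2).T @ (x ** 2)).sum())
    return linear + interactions

# Tiny example with hypothetical dimensions: n=6 features, k=3 factors.
rng = np.random.default_rng(3)
n, k = 6, 3
x = rng.normal(size=n)
w0, w, V = 0.5, rng.normal(size=n), rng.normal(size=(n, k))
score = fm_score(x, w0, w, V)
```

The linear-time trick matters precisely for the sparse one-hot inputs the abstract mentions, where the naive pairwise sum would be quadratic in the number of features.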
Yao, Y, Zhang, J, Shen, F, Yang, W, Pu, H & Tang, Z 2018, 'Discovering and Distinguishing Multiple Visual Senses for Polysemous Words', AAAI Conference on Artificial Intelligence, The AAAI Press, New Orleans, USA, pp. 523-530.
To reduce the dependence on labeled data, there have been increasing research efforts on learning visual classifiers by exploiting web images. One issue that limits their performance is the problem of polysemy. To solve this problem, in this work, we present a novel framework that addresses polysemy by allowing sense-specific diversity in search results. Specifically, we first discover a list of possible semantic senses to retrieve sense-specific images. Then we merge visually similar semantic senses and prune noise by using the retrieved images. Finally, we train a visual classifier for each selected semantic sense and use the learned sense-specific classifiers to distinguish multiple visual senses. Extensive experiments on classifying images into sense-specific categories and re-ranking search results demonstrate the superiority of our proposed approach.
Yao, Y, Zhang, J, Shen, F, Yang, W, Hua, XS & Tang, Z 2018, 'Extracting privileged information from untagged corpora for classifier learning', Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, pp. 1085-1091.
The performance of data-driven learning approaches is often unsatisfactory when the training data is inadequate in either quantity or quality. Manually labeled privileged information (PI), e.g., attributes, tags or properties, is usually incorporated to improve classifier learning. However, the process of manual labeling is time-consuming and labor-intensive. To address this issue, we propose to enhance classifier learning by extracting PI from untagged corpora, which can effectively eliminate the dependency on manually labeled data. In detail, we treat each selected PI as a subcategory and learn one classifier per subcategory independently. The classifiers for all subcategories are then integrated to form a more powerful category classifier. In particular, we propose a new instance-level multi-instance learning (MIL) model to simultaneously select a subset of training images from each subcategory and learn the optimal classifiers based on the selected images. Extensive experiments demonstrate the superiority of our approach.
Zhang, J, Wu, Q, Shen, C, Zhang, J, Lu, J & van den Hengel, A 2018, 'Goal-Oriented Visual Question Generation via Intermediate Rewards', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), European Conference on Computer Vision, Springer, Munich, Germany, pp. 189-204.
Despite significant progress on a variety of vision-and-language problems, developing a method capable of asking intelligent, goal-oriented questions about images has proven to be a substantial challenge. Towards this end, we propose a Deep Reinforcement Learning framework based on three new intermediate rewards, namely goal-achieved, progressive and informativeness, that encourage the generation of succinct questions, which in turn uncover valuable information towards the overall goal. By directly optimizing for questions that work quickly towards fulfilling the overall goal, we avoid the tendency of existing methods to generate long series of inane queries that add little value. We evaluate our model on the GuessWhat?! dataset and show that the resulting questions can help a standard 'Guesser' identify a specific object in an image at a much higher success rate.
Zhang, J, Wu, Q, Zhang, J, Shen, C & Lu, J 2018, 'Kill Two Birds with One Stone: Weakly-Supervised Neural Network for Image Annotation and Tag Refinement', The Thirty-Second AAAI Conference on Artificial Intelligence, AAAI Press, USA, pp. 7550-7557.
The number of social images has exploded with the wide adoption of social networks, and people like to share their comments about them. These comments can be a description of the image, or of objects, attributes or scenes in it, and are normally used as user-provided tags. However, it is well known that user-provided tags are incomplete and imprecise to some extent. Directly using them can damage the performance of related applications, such as image annotation and retrieval. In this paper, we propose to learn an image annotation model and refine the user-provided tags simultaneously in a weakly-supervised manner. A deep neural network is utilized for image feature learning and as the backbone annotation model, while visual consistency, semantic dependency, and user-error sparsity are introduced as constraints at the batch level to alleviate tag noise. Therefore, our model is highly flexible and stable in handling large-scale image sets. Experimental results on two benchmark datasets indicate that our proposed model achieves the best performance compared to the state-of-the-art methods.
Zhang, J, Zhang, J, Wu, Q, Xu, J, Lu, J, Phua, R, Curr, K & Tang, Z 2017, 'Historical image annotation by exploring the tag relevance', Proceedings - 4th Asian Conference on Pattern Recognition, ACPR 2017, IEEE, Nanjing, China, pp. 646-651.
Historical images usually carry enormous historical research value and are highly related to historical objects, events and background stories. Therefore, annotating these images always requires selecting tags from within a large set. In this paper, we propose to annotate historical images by exploring tag relevance. We measure tag relevance from three different perspectives, including its visual relevance, its dependencies with other tags and its relationship with location-based meta-data. Using tag relevance as guidance, we generate three tag sub-sets and use them to fulfil the annotation. Experimental results on the benchmark dataset indicate the significance of exploring tag relevance by comparison with the baseline experiments.
Zhang, P, Wu, Q, Xu, J & Zhang, J 2018, 'Long-Term Person Re-identification Using True Motion from Videos', Winter Conference on Applications of Computer Vision, IEEE, Lake Tahoe, NV, USA, pp. 494-502.
Gu, S, Lu, Y, Zhang, L & Zhang, J 2017, 'RGB-D Tracking Based on Kernelized Correlation Filter with Deep Features', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), International Conference on Neural Information Processing, Springer, Guangzhou, China, pp. 105-113.
This paper proposes a new RGB-D tracker built upon the Kernelized Correlation Filter (KCF) with deep features. KCF is a high-speed target tracker. However, the HOG feature used in KCF has some weaknesses, such as a lack of robustness to noise. Therefore, we use RGB-D deep features in KCF, i.e., deep features of the RGB and depth images, which contain abundant and discriminative information for tracking. The mixture of deep features greatly improves the performance of the tracker. Besides, KCF is sensitive to scale variations, while depth images help handle this problem: by the principle of similar triangles, the ratio of scale variation can be estimated simply. Tested on the Princeton RGB-D Tracking Benchmark, our RGB-D tracker achieves the highest accuracy when no occlusion happens. Meanwhile, we maintain high-speed tracking even though deep features are computed during tracking; the average speed is 10 FPS.
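The correlation-filter machinery that KCF builds on can be illustrated with a minimal single-channel, linear filter trained in closed form in the Fourier domain (MOSSE-style). This is only the core idea; KCF's kernelised, multi-channel formulation and the deep features added in this paper are not reproduced here, and all names are illustrative.

```python
import numpy as np

def train_filter(x, y, lam=1e-2):
    """Closed-form correlation filter in the Fourier domain:
    H = conj(X) * Y / (X * conj(X) + lam), where X, Y are the FFTs of
    the template patch and the desired response, and lam regularises."""
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return np.conj(X) * Y / (X * np.conj(X) + lam)

def detect(H, z):
    """Apply the learned filter to a new patch z; the response peaks
    at the target's (circular) displacement from the template."""
    return np.real(np.fft.ifft2(np.fft.fft2(z) * H))

# Train on a random template with the desired response peaking at the
# origin, then detect on a circularly shifted copy of the template.
rng = np.random.default_rng(2)
x = rng.normal(size=(32, 32))            # template patch (grayscale)
y = np.zeros((32, 32)); y[0, 0] = 1.0    # desired response: peak at origin
H = train_filter(x, y)
resp = detect(H, np.roll(x, (3, 4), axis=(0, 1)))
```

Because both training and detection are element-wise products in the frequency domain, the whole train/detect cycle runs in O(N log N), which is what makes KCF-style trackers fast enough to pair with deep features.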
Jiang, Z, Huynh, DQ, Zhang, J, Qiang, W & Zhang, J 2017, 'Part-based Data Association for Visual Tracking', Proceedings of the 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), IEEE, Sydney, NSW, Australia, pp. 1-8.
We present a method that integrates a part-based sparse appearance model in a Bayesian inference framework for tracking targets in video sequences. We formulate the sparse appearance model as a set of smoothed colour histograms corresponding to the object windows detected by the Deformable Part Model (DPM) detector. The data association of each body part between frames is solved based on the position constraint, appearance coherence, and motion consistency. To deal with missing and noisy observations, the part detection window in the following frame is also predicted using an interacting multiple model (IMM) tracker. We have tested our tracking method on all the video sequences that involve people in upright poses from the TB-50 and TB-100 benchmark video datasets. Our experimental results show that our tracking method outperforms six state-of-the-art tracking techniques.
Kusakunniran, W, Wu, Q & Zhang, J 2017, 'Action Recognition based on Correlated Codewords of Body Movements', Proceedings of the 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), IEEE, Sydney, Australia, pp. 1-8.
Using spatio-temporal features is popular for action recognition. However, existing methods embed these local features into a global representation, so orders and correlations among the local motions of each action are lost. This can make it difficult to distinguish closely related actions. This paper proposes a solution that encodes the correlations of movements. Space-time interest points are detected in each action video. Then, feature descriptors are extracted from these key points and clustered into codewords implicitly representing different characteristics of motion. The final representation of each action video is a combination of a bag of words and the correlations between codewords. The support vector machine is then used as a classification tool. Based on the experimental results, the proposed method achieves very promising performance and in particular outperforms other existing methods that rely on spatio-temporal features.
Kusakunniran, W, Wu, Q, Ritthipravad, P & Zhang, J 2017, 'Three-stages hard exudates segmentation in retinal images', 2017 9th International Conference on Information Technology and Electrical Engineering, ICITEE 2017, 2017 9th International Conference on Information Technology and Electrical Engineering, IEEE, Phuket, Thailand, pp. 1-6.
This paper proposes a three-stage method for hard exudate segmentation in retinal images. The first stage is pre-processing: a color transfer based on statistical analysis is applied so that all retinal images have the same color characteristics, and only the yellow channel of each image is used in further analysis. The second stage is blob initialization: blob detection based on color, size, and shape, including circularity and convexity, is used to identify initial pixels of hard exudates, and the detected blobs must not lie inside the optic disk. The third stage is segmentation: graph cut is applied iteratively on partitions of the image. This fine-tuned segmentation in sub-images is necessary because the portion of hard exudates is significantly smaller than the portion of non-hard exudates. The proposed method is evaluated on two well-known datasets, e-ophtha and DIARETDB1, at both the pixel level and the image level. Comprehensive comparisons with existing works show the proposed method to be very promising. At the image level, it achieves 96% sensitivity and 94% specificity on the e-ophtha dataset, and 96% sensitivity and 98% specificity on the DIARETDB1 dataset.
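The blob-initialization stage filters candidates by size and shape. A minimal hypothetical sketch of such a circularity/convexity filter (all thresholds and names are illustrative assumptions, not the paper's values):

```python
import math

def keep_blob(area, perimeter, hull_area, min_circ=0.5, min_conv=0.9,
              min_area=5, max_area=500):
    """Decide whether a candidate blob passes size/shape checks,
    in the spirit of the blob-initialization stage described above.
    """
    # Circularity is 1 for a perfect circle, smaller for elongated shapes.
    circularity = 4 * math.pi * area / (perimeter ** 2)
    # Convexity: blob area relative to its convex-hull area.
    convexity = area / hull_area
    return (min_area <= area <= max_area
            and circularity >= min_circ
            and convexity >= min_conv)
```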
Xin, JN, Du, X & Zhang, J 2017, 'Deep learning for robust outdoor vehicle visual tracking', Proceedings - IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, Hong Kong, China, pp. 613-618.
Robust visual tracking of outdoor vehicles is still a challenging problem due to large appearance variations caused by illumination changes, occlusion, scale variation, etc. In this paper, a deep-learning-based approach for robust outdoor vehicle tracking is proposed. First, a stacked denoising auto-encoder is pre-trained to learn a feature representation of images. Then, a k-sparse constraint is added to the stacked denoising auto-encoder, and the encoder of the k-sparse stacked denoising auto-encoder (kSSDAE) is connected with a classification layer to construct a classification neural network. After fine-tuning, the classification neural network is applied to online tracking under a particle filter framework. Extensive tracking experiments conducted on a challenging single-object online tracking benchmark verify the effectiveness of our tracker and show that it outperforms most state-of-the-art trackers.
Yao, L, Kusakunniran, W, Wu, Q, Zhang, J & Tang, Z 2017, 'Robust Gait Recognition under Unconstrained Environments using Hybrid Descriptions', Proceedings of the 2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA), International Conference on Digital Image Computing: Techniques and Applications, IEEE, Sydney, Australia.
Gait is one of the key biometric features and has been widely applied for human identification. Appearance-based and motion-based features are the two main representations used in gait recognition. However, appearance-based features are sensitive to body shape changes, and silhouette extraction from real-world images and videos remains a challenge. As for motion features, owing to the difficulty of extracting underlying models from gait sequences, the localization of human joints lacks reliability and robustness. This paper proposes a new approach which utilizes the Two-Point Gait (TPG) as a motion feature to remedy the deficiency of the appearance feature based on the Gait Energy Image (GEI), in order to increase the robustness of gait recognition in unconstrained environments with view and clothing changes. Another contribution of this paper is that it is the first application of TPG to view-change and clothing-change issues since TPG was proposed. Extensive experiments show that the proposed method is more invariant to view and clothing changes and can significantly improve the robustness of gait recognition.
Zhao, M, Zhang, J, Porikli, F, Zhang, C & Zhang, W 2017, 'Learning a perspective-embedded deconvolution network for crowd counting', 2017 IEEE International Conference on Multimedia and Expo (ICME), IEEE International Conference on Multimedia and Expo, IEEE, Hong Kong, China, pp. 403-408.
We present a novel deep learning framework for crowd counting by learning a perspective-embedded deconvolution network. Perspective is an inherent property of most surveillance scenes. Unlike traditional approaches that exploit the perspective as a separate normalization, we propose to fuse the perspective into a deconvolution network, aiming to obtain a robust, accurate and consistent crowd density map. Through layer-wise fusion, we merge perspective maps at different resolutions into the deconvolution network. With the injection of perspective, our network is driven to learn to combine the underlying scene geometric constraints adaptively, thus enabling an accurate interpretation from high-level feature maps to the pixel-wise crowd density map. In addition, our network allows generating density maps for arbitrary-sized input in an end-to-end fashion. The proposed method achieves competitive results on the WorldExpo2010 crowd dataset.
Zuo, Y, Wu, Q & Zhang, J 2017, 'Minimum spanning forest with embedded edge inconsistency measurement for color-guided depth map upsampling', 2017 IEEE International Conference on Multimedia and Expo, IEEE, Hong Kong, China.
Color-guided depth map up-sampling, such as Markov-Random-Field-based (MRF-based) methods, is a popular depth map enhancement solution. It normally assumes edge consistency between the color image and the corresponding depth map, and calculates the coefficients of the smoothness term in the MRF according to this assumption. However, this consistency does not always hold, which leads to texture-copying artifacts and blurred depth edges. In this paper, we propose a novel scheme for computing the coefficients of the smoothness term in the MRF, based on the distance between pixels in Minimum Spanning Trees (a Forest), to better preserve depth edges. An explicit edge inconsistency measurement is embedded into the edge weights of the Minimum Spanning Trees, which significantly mitigates texture-copying artifacts. The proposed method is evaluated on the Middlebury and ToF-Mark datasets and demonstrates improved results compared with state-of-the-art methods.
Cho, N, Wu, Q, Xu, J & Zhang, J 2016, 'Content Authoring Using Single Image in Urban Environments for Augmented Reality', Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing Techniques and Applications, IEEE, Gold Coast, Australia, pp. 1-7.
Content authoring is one of the essentials of Augmented Reality (AR): it emplaces augmented content on a real part of a scene in order to enhance users' visual experience. For single 2D street-view images, the challenge arises from cluttered environments and the unknown position and orientation of the camera. Although existing methods based on 2D feature point matching or vanishing point registration may recover the camera pose, robustness remains challenging because of the uncertainty of feature point detection in texture-less regions and the displacement of vanishing point detection caused by irregular lines detected in the scene. By taking advantage of the characteristics of man-made objects (e.g. buildings) widely seen in street views, this paper proposes a simple yet efficient content authoring approach. In this approach, the dominant building plane where the virtual object will be emplaced is detected and then projected to the frontal-parallel view, on which the virtual object can be reliably emplaced. Once the virtual object and the true scene are embedded into each other in the frontal-parallel view, they can be converted back to the original view using the inverse projection without any distortion. Experiments on public databases show that the proposed method can recover the camera pose and implement content emplacement with promising performance.
Huang, X, Fan, L, Zhang, J, Wu, Q & Yuan, C 2016, 'Real Time Complete Dense Depth Reconstruction for a Monocular Camera', Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Las Vegas, Nevada, pp. 674-679.
In this paper, we aim to solve the problem of estimating complete dense depth maps from a monocular moving camera. By 'complete', we mean that depth information is estimated for every pixel and detailed reconstruction is achieved. Although this problem has been attempted before, the accuracy of complete dense depth reconstruction remains an open issue. We propose a novel system which produces accurate, complete dense depth maps. The system consists of two subsystems running in separate threads, namely dense mapping and sparse patch-based tracking. For dense mapping, a new projection error computation method is proposed to enhance the gradient component in estimated depth maps. For tracking, a new sparse patch-based tracking method estimates camera pose by minimizing a normalized error term. The experiments demonstrate that the proposed method obtains improved performance in terms of completeness and accuracy compared to three state-of-the-art dense reconstruction methods: VSFM+CMVC, LSD-SLAM and REMODE.
Huang, X, Zhang, J, Wu, Q, Fan, L & Yuan, C 2016, 'A coarse-to-fine algorithm for registration in 3D street-view cross-source point clouds', Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing Techniques and Applications, IEEE, Gold Coast, Australia.
With the development of numerous 3D sensing technologies, registration of cross-source point clouds has aroused researchers' interest. When point clouds are captured by different kinds of sensors, there are large variations of different kinds. In this study, we address an even more challenging case in which the cross-source point clouds are acquired from a real street view: one is produced directly by a LiDAR system, and the other is generated by running VSFM software on an image sequence captured by RGB cameras. When confronted with large-scale point clouds, previous methods mostly focus on point-to-point registration and have many limitations, because the least-mean-error strategy copes poorly with registering highly variable cross-source point clouds. In this paper, unlike previous ICP-based methods, we take a statistical view and propose an effective coarse-to-fine algorithm to detect and register a small-scale SFM point cloud within a large-scale LiDAR point cloud. The experimental results show that the model runs successfully on LiDAR and SFM point clouds, and hence it can contribute to many applications such as robotics and smart city development.
Wu, S, Jing, XY, Yue, D, Zhang, J, Yang, KJ & Yang, J 2016, 'Unsupervised visual domain adaptation via dictionary evolution', Proceedings - IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, Seattle, Washington, United States.
In real-world visual applications, distribution mismatch between samples from different domains may significantly degrade classification performance. To improve the generalization capability of classifiers across domains, domain adaptation has attracted a lot of interest in computer vision. This work focuses on unsupervised domain adaptation, which is still challenging because no labels are available in the target domain. Most attention has been dedicated to seeking domain-invariant features by exploring the shared structure between domains, ignoring the valuable discriminative information contained in the labeled source data. In this paper, we propose a Dictionary Evolution (DE) approach to construct discriminative features robust to domain shift. Specifically, DE adapts a discriminative dictionary learnt from labeled source samples to unlabeled target samples through a gradual transition process. We show that the learnt dictionary is endowed with cross-domain data representation ability and powerful discriminant capability. Empirical results on real-world data sets demonstrate the advantages of the proposed approach over competing methods.
Yao, Y, Hua, XS, Shen, F, Zhang, J & Tang, Z 2016, 'A domain robust approach for image dataset construction', MM 2016 - Proceedings of the 2016 ACM Multimedia Conference, ACM International Conference on Multimedia, ACM, Amsterdam, The Netherlands, pp. 212-216.
There have been increasing research interests in automatically constructing image datasets by collecting images from the Internet. However, existing methods tend to have a weak domain adaptation ability, known as the "dataset bias" problem. To address this issue, we propose a novel image dataset construction framework which can generalize well to unseen target domains. Specifically, the given queries are first expanded by searching in the Google Books Ngrams Corpora (GBNC) to obtain a richer semantic description, from which the noisy query expansions are then filtered out. By treating each expansion as a "bag" and the retrieved images therein as "instances", we formulate image filtering as a multi-instance learning (MIL) problem with constrained positive bags. In this way, images from different data distributions are kept while noisy images are filtered out. Comprehensive experiments on two challenging tasks demonstrate the effectiveness of the proposed approach.
Yao, Y, Zhang, J, Hua, XS, Shen, F & Tang, Z 2016, 'Extracting visual knowledge from the internet: Making sense of image data', MultiMedia Modeling (LNCS), International Conference on Multimedia Modeling, Springer, Miami, USA, pp. 862-873.
Recent successes in visual recognition can be primarily attributed to feature representation, learning algorithms, and the ever-increasing size of labeled training data. Extensive research has been devoted to the first two, but much less attention has been paid to the third. Due to the high cost of manual data labeling, the size of recent efforts such as ImageNet is still relatively small with respect to daily applications. In this work, we mainly focus on how to automatically generate identifying image data for a given visual concept on a vast scale. With the generated image data, we can train a robust recognition model for the given concept. We evaluate the proposed webly supervised approach on the benchmark Pascal VOC 2007 dataset, and the results demonstrate the superiority of our method over many other state-of-the-art methods in image data collection.
Yao, Y, Zhang, J, Shen, F, Hua, X, Xu, J & Tang, Z 2016, 'Automatic image dataset construction with multiple textual metadata', Proceedings - IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, Seattle, Washington, USA.
The goal of this work is to automatically collect a large number of highly relevant images from the Internet for given queries. A novel image dataset construction framework is proposed that employs multiple textual metadata. Specifically, the given queries are first expanded by searching in the Google Books Ngrams Corpora to obtain a richer semantic description, from which the visually non-salient and less relevant expansions are then filtered. After retrieving images from the Internet with the filtered expansions, we further filter noisy images by clustering and by progressive Convolutional Neural Networks (CNN). To verify the effectiveness of the proposed method, we construct a dataset with 10 categories which is not only much larger than, but also has cross-dataset generalization ability comparable with, the manually labeled datasets STL-10 and CIFAR-10.
Zhang, J, Zhang, J, Lu, J, Shen, C, Curr, K, Phua, R, Neville, R & Edmonds, E 2016, 'SLNSW-UTS: A Historical Image Dataset for Image Multi-Labeling and Retrieval', 2016 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2016, Digital Image Computing Techniques and Applications, IEEE, Gold Coast, Australia, pp. 1-6.
This paper introduces a dataset of historical images created by the State Library of New South Wales and the University of Technology Sydney (UTS). The dataset has a total of 29713 images with 119 unique labels, and each image carries multiple labels. We use a CNN-based framework to explore the feasibility of our dataset for image multi-labeling and retrieval research, and extract semantic-level image features for future research use. The experimental results illustrate that effective deep learning models can be trained on our dataset. We also introduce five applications that can be studied on our historical image dataset.
Zhao, Y, Di, H, Zhang, J, Lu, Y & Lv, F 2016, 'Recognizing human actions from low-resolution videos by region-based mixture models', Proceedings - IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, Seattle, Washington, United States.
Recognizing human actions from low-resolution (LR) videos is essential for many applications, including large-scale video surveillance, sports video analysis and intelligent aerial vehicles. Currently, state-of-the-art performance in action recognition is achieved by the use of dense trajectories extracted by optical flow algorithms. However, optical flow algorithms are far from perfect in LR videos. In addition, the spatial and temporal layout of features is a powerful cue for action discrimination, yet most existing methods encode the layout by first segmenting body parts, which is not feasible in LR videos. To address these problems, we adopt the Layered Elastic Motion Tracking (LEMT) method to extract a set of long-term motion trajectories and a long-term common shape from each video sequence, where the extracted trajectories are much denser than those of sparse interest points (SIPs); we then present a hybrid feature representation to integrate both shape and motion features; and finally we propose a Region-based Mixture Model (RMM) for action classification. The RMM models the spatial layout of features without any need for body part segmentation. Experiments are conducted on two publicly available LR human action datasets, of which the UT-Tower dataset is very challenging because the average height of human figures is only about 20 pixels. The proposed approach attains near-perfect accuracy on both datasets.
Zhou, T, Lu, Y, Di, H & Zhang, J 2016, 'Video object segmentation aggregation', Proceedings - IEEE International Conference on Multimedia and Expo (ICME) 2016, IEEE International Conference on Multimedia and Expo, IEEE, Seattle.
We present an approach for unsupervised object segmentation in unconstrained videos. Driven by the latest progress in this field, we argue that segmentation performance can be largely improved by aggregating the results generated by state-of-the-art algorithms. Initially, objects in individual frames are estimated through a per-frame aggregation procedure using majority voting. While this can predict relatively accurate object locations, the initial estimation fails to cover parts that are wrongly labeled by more than half of the algorithms. To address this, we build a holistic appearance model from non-local appearance cues by linear regression. We then integrate the appearance priors and spatio-temporal information into an energy minimization framework to refine the initial estimation. We evaluate our method on challenging benchmark videos and demonstrate that it outperforms state-of-the-art algorithms.
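The per-frame majority-voting step described in this abstract can be sketched as follows (a hypothetical illustration, not the authors' implementation):

```python
import numpy as np

def majority_vote(masks):
    """Aggregate binary segmentation masks from several algorithms by
    per-pixel majority voting.
    masks: array-like of shape (n_algorithms, H, W) with 0/1 entries.
    """
    masks = np.asarray(masks)
    votes = masks.sum(axis=0)
    # A pixel is foreground when more than half of the algorithms agree.
    return (votes > masks.shape[0] / 2).astype(np.uint8)
```

As the abstract notes, pixels mislabeled by more than half of the algorithms are lost at this stage, which is why the refinement steps follow.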
Zuo, Y, Wu, Q, An, P & Zhang, J 2016, 'Explicit measurement on depth-color inconsistency for depth completion', Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), IEEE International Conference on Image Processing, IEEE, Phoenix, AZ, USA, pp. 4037-4041.
Color-guided depth completion refines depth maps obtained through structured-light sensing by filling in missing depth structure and denoising. It is based on the assumption that depth discontinuities and color edges at corresponding locations are consistent. Among the proposed methods, MRF-based methods and their variants form one of the major approaches. However, the assumption above does not always hold, which causes texture-copying and depth discontinuity blurring artifacts. State-of-the-art solutions usually modify the weighting inside the smoothness term of the MRF model, but because no method explicitly considers the inconsistency occurring between a depth discontinuity and the corresponding color edge, they cannot adaptively control the guidance from the color image when completing the depth map. In this paper, we propose a quantitative measurement of such inconsistency and explicitly embed it into the weighting value of the smoothness term. The proposed method is evaluated on the NYU Kinect datasets and demonstrates promising results.
Zuo, Y, Wu, Q, Zhang, J & An, P 2016, 'Explicit modeling on depth-color inconsistency for color-guided depth up-sampling', Proceedings - IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, USA.
Color-guided depth up-sampling enhances the resolution of a depth map under the assumption that depth discontinuities and color image edges at corresponding locations are consistent. Among all reported methods, MRF-based approaches, including their variants, have dominated this area for several years. However, the assumption above does not always hold. The usual solution is to adjust the weighting inside the smoothness term of the MRF model, but no method explicitly considers the inconsistency occurring between a depth discontinuity and the corresponding color edge. In this paper, we propose a quantitative measurement of such inconsistency and explicitly embed it into the weighting value of the smoothness term; such a solution has not been reported in the literature. The improved depth up-sampling based on the proposed method is evaluated on the Middlebury and ToFMark datasets and demonstrates promising results.
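To illustrate the general idea of embedding an explicit depth-color inconsistency measure into a smoothness weight (the function name, blending rule and parameters below are assumptions for illustration; the papers' actual formulations differ):

```python
import numpy as np

def smoothness_weight(color_grad, depth_edge_prob, sigma=10.0):
    """Sketch of an inconsistency-aware smoothness weight between two
    neighbouring pixels.
    color_grad: color-image gradient magnitude between the neighbours.
    depth_edge_prob: estimated probability of a true depth edge there.
    """
    # Color-based guidance alone: smooth less across strong color edges.
    w_color = np.exp(-color_grad / sigma)
    # Explicit inconsistency between the color-edge evidence (1 - w_color)
    # and the depth-edge evidence: large when only one of the two fires.
    inconsistency = abs((1.0 - w_color) - depth_edge_prob)
    # Down-weight the color guidance where it disagrees with depth,
    # falling back on the depth-edge evidence instead.
    return (1.0 - inconsistency) * w_color + inconsistency * (1.0 - depth_edge_prob)
```

With this blending, a strong color edge with no depth edge (pure texture) no longer suppresses smoothing, which is the texture-copying case the abstract targets.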
Cheng, H, Zhang, J, Ping, A & Liu, Z 2015, 'A Novel Saliency Model for Stereoscopic Images', Digital Image Computing: Techniques and Applications (DICTA), 2015 International Conference on, The International Conference on Digital Image Computing: Techniques and Applications (DICTA), IEEE, Adelaide, pp. 1-7.
In this paper, we propose a novel saliency model for stereoscopic images. To improve the use of depth information for stereo saliency analysis, this model exploits depth information from three aspects: 1) we extract low-level features based on color-depth contrast features in a local and global search range (local-global contrast); 2) to extract the topological structure from a depth map, a surrounding map based on a Boolean map is obtained as a weight to enhance the local-global contrast features; and 3) based on the saliency probability distribution in depth information, we employ stereo center prior enhancement to compute the final saliency. Experimental results on two recent eye-tracking databases show that our proposed method outperforms state-of-the-art saliency models.
Huang, S, Zhang, J, Lu, S & Hua, X-S 2015, 'Social Friend Recommendation Based on Network Correlation and Feature Co-Clustering', Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, ACM International Conference on Multimedia Retrieval, ACM, Shanghai, pp. 315-322.
Friend recommendation is an important recommender application in social media. Major social websites such as Twitter and Facebook are all capable of recommending friends to individuals. However, friend recommendation is a difficult task, and most social websites use simple friend recommendation algorithms based on similarity and popularity, whose level of accuracy does not satisfy the majority of users.
In this paper we propose a two-stage procedure for more accurate friend recommendation. In the first stage, based on the relationship between different social networks, the Flickr tag network and contact network are aligned to generate a "possible friend list". In the second stage, under the assumption that "a friend's friends also tend to be friends", co-clustering is applied to the tag and image information of the list to refine the recommendation result of the first stage. Experimental results show that the proposed method achieves good performance and that every stage contributes to the recommendation.
Huang, X, Yuan, C & Zhang, J 2015, 'Graph Cuts Stereo Matching Based on Patch-Match and Ground Control Points Constraint', Advances in Multimedia Information Processing (LNCS), Pacific-Rim Conference on Multimedia, Springer, Gwangju, South Korea, pp. 14-23.
Stereo matching methods based on Patch-Match obtain good results in complex-texture regions but perform poorly in low-texture regions. In this paper, a new method that integrates Patch-Match and graph cuts (GC) is proposed in order to achieve good results in both complex- and low-texture regions. A label is randomly assigned to each pixel and optimized through a propagation process; all these labels constitute a label space for each iteration of GC. In addition, a Ground Control Points (GCPs) constraint term is added to the GC to overcome the disadvantages of Patch-Match stereo in low-texture regions. The proposed method has the advantage of the spatial propagation of Patch-Match and the global property of GC. Experiments on the Middlebury evaluation system show that it outperforms all other Patch-Match-based methods.
Huang, X, Zhang, J, Wu, Q, Yuan, C & Fan, L 2015, 'Dense Correspondence Using Non-local DAISY Forest', Proceedings of the 2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing Techniques and Applications, IEEE, Adelaide, pp. 1-8.
Dense correspondence computation is a critical computer vision task with many applications. Most existing dense correspondence methods consider all the neighbors connected to the center pixels and use a local support region; however, such approaches may achieve only a locally optimal solution. In this paper, we propose a non-local dense correspondence method that calculates the match cost on a tree structure. It is non-local because all other nodes on the tree contribute to the match cost computed for the current node. The proposed method consists of three steps: 1) DAISY descriptor computation, 2) edge-preserving segmentation and forest construction, and 3) PatchMatch fast search. We test our algorithm on the Middlebury and Moseg datasets. The results show that the proposed method outperforms state-of-the-art methods in dense correspondence computation and has low computational complexity.
Liu, X, Wang, L, Ying, J, Dou, Y & Zhang, J 2015, 'Absent Multiple Kernel Learning', Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, AAAI Publications, Austin, Texas.
Multiple kernel learning (MKL) optimally combines the multiple channels of each sample to improve classification performance. However, existing MKL algorithms cannot effectively handle the situation where some channels are missing, which is common in practical applications. This paper proposes an absent MKL (AMKL) algorithm to address this issue. Unlike existing approaches, where missing channels are first imputed and a standard MKL algorithm is then deployed on the imputed data, our algorithm directly classifies each sample with its observed channels. Specifically, we define a margin for each sample in its own relevant space, which corresponds to the observed channels of that sample. The proposed AMKL algorithm then maximizes the minimum of all sample-based margins, which leads to a difficult optimization problem. We show that this problem can be reformulated as a convex one by applying the representer theorem, making it readily solvable via existing convex optimization packages. Extensive experiments are conducted on five MKL benchmark data sets to compare the proposed algorithm with existing imputation-based methods. Our algorithm achieves superior performance, and the improvement becomes more significant as the missing ratio increases.
Wang, Y, Zhang, J, Liu, Z, Wu, Q, Chou, P, Zhang, Z & Jai, Y 2014, 'Completed Dense Scene Flow in RGB-D Space', Computer Vision - ACCV 2014 Workshops, Asian Conference on Computer Vision, Springer International Publishing, Singapore, pp. 191-205.
Conventional scene flow, containing only translational vectors, cannot properly model 3D motion with rotation. Moreover, the accuracy of 3D motion estimation is restricted by several challenges such as large displacement, noise, and missing data (caused by sensing techniques or occlusion). Existing solutions fall into two kinds of approaches, local and global; however, local approaches cannot generate a smooth motion field, and global approaches have difficulty handling large-displacement motion. In this paper, a completed dense scene flow framework is proposed which models both rotation and translation for general motion estimation. It combines a local method and a global method, exploiting their complementary characteristics to handle large-displacement motion and enforce smoothness, respectively. The proposed framework operates in RGB-D image space, where computational efficiency is further improved. In a quantitative evaluation on the Middlebury dataset, our method outperforms other published methods, and the improved performance is further confirmed on real data acquired by a Kinect sensor.
Xu, W, Miao, Z, Zhang, J & Tian, Y 2014, 'Learning spatio-temporal features for action recognition with modified hidden conditional random field', Computer Vision - ECCV 2014 Workshops: Zurich, Switzerland, September 6-7 and 12, 2014, Proceedings, Part I, European Conference on Computer Vision, Springer International Publishing, Zurich, Switzerland, pp. 786-801.
Previous work on human action analysis mainly focuses on designing hand-crafted local features and combining their context information. In this paper, we propose supervised feature learning as a way to learn spatio-temporal features. More specifically, a modified hidden conditional random field is applied to learn two high-level features conditioned on a certain action label: the individual features describe the appearance of local parts, and the interaction features capture their spatial constraints. To make the best of what has been learned, a new categorization model is proposed for action matching. It is inspired by the Deformable Part Model, with the intuition that actions can be modeled by local features in a changeable spatial and temporal dependency. Experimental results show that our algorithm successfully recognizes human actions with high accuracy on both simple atomic action databases (KTH and Weizmann) and a complex interaction activity database (CASIA).
Zhao, M, Zhang, C, Zhang, W, Li, W & Zhang, J 2015, 'Decorrelation-Stretch based Cloud Detection for Total Sky Images', 2015 Visual Communications and Image Processing (VCIP 2015), IEEE, Singapore.View/Download from: UTS OPUS or Publisher's site
Cloud detection plays an important role in total-sky image based solar forecasting and has received increasing attention in recent years. Accurate cloud detection for complicated total-sky images is especially challenging due to the low contrast and vague boundaries between cloud and sky regions. Unlike existing cloud detection methods, which apply no preprocessing, a novel decorrelation-stretch (DS) based method is proposed in this work, in which the total-sky images are first preprocessed with the DS algorithm. With this enhancement, the color feature disparity between cloud and sky is intensified notably, and a more accurate threshold can then be obtained by applying Minimum Cross Entropy (MCE) to the preprocessed image. Experimental results demonstrate that the proposed scheme achieves better performance than existing cloud detection methods on total-sky images, especially for images with low contrast or vague boundaries between cloud and sky regions.
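The two-stage pipeline described in this abstract, a decorrelation stretch to amplify colour disparity followed by Minimum Cross Entropy thresholding, might be sketched as below. The function names and per-pixel formulation are illustrative; the MCE criterion is the standard Li-Lee form and assumes strictly positive values:

```python
import numpy as np

def decorrelation_stretch(pixels, target_sigma=1.0):
    # pixels: (N, 3) float array of colour values; whiten along the
    # principal axes of the colour covariance, then rescale uniformly,
    # which exaggerates colour differences between cloud and sky
    mu = pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    whiten = eigvecs @ np.diag(1.0 / np.sqrt(np.maximum(eigvals, 1e-12))) @ eigvecs.T
    return (pixels - mu) @ whiten * target_sigma + mu

def mce_threshold(values):
    # Minimum Cross Entropy threshold (Li & Lee criterion, up to a
    # constant): minimise -m1(t)*log(mu1(t)) - m2(t)*log(mu2(t)),
    # where m and mu are the sum and mean of each side of the split
    values = np.asarray(values, dtype=float)
    best_t, best_cost = None, np.inf
    for t in np.unique(values)[1:]:
        lo, hi = values[values < t], values[values >= t]
        cost = -(lo.sum() * np.log(lo.mean()) + hi.sum() * np.log(hi.mean()))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t
```

A scalar colour feature (e.g. a red-to-blue ratio) computed from the stretched image would then be passed to `mce_threshold` to separate cloud from sky.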
Guo, D, Zhang, J, Xu, M, He, X, Li, M & Zhao, C 2014, 'A Multiple Features Distance Preserving (MFDP) Model for Saliency Detection', Digital Image Computing Techniques and Applications, IEEE, Wollongong.View/Download from: UTS OPUS or Publisher's site
Saliency plays a vital role in various image analysis tasks, such as content-aware image retargeting, image retrieval and object detection. It is generally accepted that saliency detection can benefit from the integration of multiple visual features. However, most existing methods fuse multiple features at the saliency map level without considering cross-feature information, i.e., they generate a saliency map from several maps each computed from an individual feature. In this paper, we propose a Multiple Feature Distance Preserving (MFDP) model that seamlessly integrates multiple visual features through an alternating optimization process. Our method outperforms state-of-the-art methods on saliency detection. Saliency detected by our method, further combined with the seam carving algorithm, significantly improves image retargeting performance.
Guo, D, Zhang, J, Liu, X, Cui, Y & Zhao, C 2014, 'Multiple Kernel Learning Based Multi-view Spectral Clustering', 2014 22nd International Conference on Pattern Recognition (ICPR), International Conference on Pattern Recognition, IEEE, Stockholm, Sweden, pp. 3774-3779.View/Download from: UTS OPUS or Publisher's site
For a given data set, exploring its multi-view instances under a clustering framework is a practical way to boost clustering performance, because each view might reflect only partial information about the data. Furthermore, owing to noise and other factors, exploring instances from different views enhances the mining of the real structure and feature information within the data set. In this paper, we propose a multiple kernel spectral clustering algorithm that operates on the multi-view instances of a given data set. By combining kernel matrix learning and spectral clustering optimization into a single framework, the algorithm determines the kernel weights and clusters the multi-view data simultaneously. We compare the proposed algorithm with recently published methods on real-world datasets to show its efficiency.
Huang, S, Zhang, J, Liu, X & Wang, L 2014, 'A method of discriminative information preservation and in-dimension distance minimization for feature selection', Proceedings - International Conference on Pattern Recognition, International Conference on Pattern Recognition, IEEE, Stockholm, Sweden, pp. 1615-1620.View/Download from: Publisher's site
Preserving samples' pairwise similarity is essential for feature selection. In supervised learning, labels can be used as a direct measure of whether two samples are similar to each other. In unsupervised learning, however, such similarity information is usually unavailable. In this paper, we propose a new feature selection method through spectral clustering that uses discriminative information as an underlying data structure. A Laplacian matrix is used to obtain more partitioning information than previously proposed structures such as the eigenspace of the original data. The high-dimensional sample data are projected into a low-dimensional space. The in-dimension distance is also considered, to obtain a more compact clustering result. The proposed method can be solved efficiently by updating the projection matrix and its inverse normalized diagonal matrix. A comprehensive experimental study demonstrates that the proposed method outperforms many state-of-the-art feature selection algorithms under different criteria, including clustering/classification accuracy and Jaccard score.
Liu, X, Wang, L, Zhang, J & Yin, J 2014, 'Sample-adaptive Multiple Kernel Learning', Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, AAAI Publication, Québec, Canada, pp. 1975-1981.View/Download from: UTS OPUS
Existing multiple kernel learning (MKL) algorithms indiscriminately apply the same set of kernel combination weights to all samples. However, the utility of base kernels can vary across samples: a base kernel useful for one sample may become noisy for another. In this case, rigidly applying the same set of kernel combination weights can adversely affect learning performance. To improve this situation, we propose a sample-adaptive MKL algorithm in which base kernels are allowed to be adaptively switched on or off with respect to each sample. We achieve this by assigning a latent binary variable to each base kernel when it is applied to a sample. The kernel combination weights and the latent variables are jointly optimized via the margin maximization principle. As demonstrated on five benchmark data sets, the proposed algorithm consistently outperforms comparable ones in the literature.
Peng, F, Wu, Q, Fan, L, Zhang, J, You, Y, Lu, J & Yang, J 2014, 'Street view cross-sourced point cloud matching and registration', Proceedings of the 21st IEEE International Conference on Image Processing, IEEE International Conference on Image Processing, IEEE, Paris, France, pp. 2026-2030.View/Download from: UTS OPUS or Publisher's site
Object registration has been widely discussed with the development of various range sensing technologies. In most work, however, the point clouds of reference and target are generated by the same technology, such as a Kinect range camera, LiDAR sensor, or Structure from Motion technique. Cases in which reference and target point clouds are generated by different technologies are rarely discussed. Due to the significant differences across various point cloud data in terms of point cloud density, sensing noise, scale, occlusion etc., object registration between such different point clouds becomes extremely difficult. In this study, we address for the first time an even more challenging case in which the differently-sourced point clouds are acquired from a real street view. One is generated on the basis of an image sequence through the SfM process, and the other is produced directly by the LiDAR system. We propose a two-stage matching and registration algorithm to achieve object registration between these two different point clouds. The experiments are based on real building object point cloud data and demonstrate the effectiveness and efficiency of the proposed solution. The newly proposed solution can be further developed to contribute to several related applications, such as Location Based Service.
Wang, D, Yuan, C, Sun, Y, Zhang, J & Zhou, H 2014, 'Fast Mode and Depth Decision Algorithm for Intra Prediction of Quality SHVC', Intelligent Computing Theory, International Conference on Intelligent Computing, Springer International Publishing, Taiyuan, China, pp. 693-699.View/Download from: Publisher's site
Scalable High-Efficiency Video Coding (SHVC) is an extension of High Efficiency Video Coding (HEVC). Since the coding procedure for HEVC is already very complex, and that of SHVC even more so, it is important to improve its coding speed. In this paper, we propose a fast mode and depth decision algorithm for intra prediction in quality SHVC. First, only partial modes are checked to determine the local minimum points (LMPs), based on the relationships between the modes and their corresponding Hadamard costs (HC); then only partial depths are checked, skipping depths with low likelihood as indicated by their inter-layer correlations and textural features. Experimental results show that the proposed algorithm improves coding speed by 61.31% on average with negligible coding efficiency loss.
Wang, Y, Di, H, Wang, B, Liang, W, Zhang, J & Jia, Y 2014, 'Depth Super-Resolution by Fusing Depth Imaging and Stereo Vision with Structural Determinant Information Inference', 2014 22nd International Conference on Pattern Recognition (ICPR), International Conference on Pattern Recognition, IEEE, Stockholm, Sweden, pp. 4212-4217.View/Download from: UTS OPUS
In this paper, we present a depth super-resolution framework that fuses depth imaging and stereo vision to produce high-resolution and high-accuracy depth maps. Depth cameras and stereo vision each have their own limitations, but their range-sensing characteristics are complementary; combining both approaches can therefore produce more satisfactory results than either one alone. Unlike previous fusion methods, we initially take the noisy depth observation from the depth camera as prior information about scene structure. This prior information is also used to infer structural determinant information, such as depth discontinuity and occlusion, which is essential to improving the quality of the depth map in the fusion process. Subsequently, the prior knowledge helps to overcome difficulties of intensity inconsistency in the image observations from the stereo vision component. Experimental results demonstrate the effectiveness of the proposed framework.
Xu, J, Wu, Q, Zhang, J, Silk, B, Ngo, GT & Tang, Z 2014, 'Efficient People Counting With Limited Manual Interfaces', 2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing Techniques and Applications, IEEE, Wollongong, NSW, Australia.View/Download from: UTS OPUS
People counting is a topic with various practical applications. Over the last decade, two general approaches have been proposed to tackle this problem: a) counting based on individual human detection; b) counting by measuring the regression relation between crowd density and the number of people. Because the regression-based method avoids explicit people detection, which faces several well-known challenges, it is considered a robust method, particularly in complicated environments. An efficient regression-based method is proposed in this paper, which can be readily adopted into any existing video surveillance system. It uses color-based segmentation to extract foreground regions in images, and regression is established between the foreground density and the number of people. The method is fast and can deal with lighting condition changes. Experiments on public datasets and one captured dataset have shown the effectiveness and robustness of the method.
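The regression step described above, relating foreground density to a head count, can be sketched with a plain least-squares fit. The training pairs below are made up for illustration, not data from the paper:

```python
import numpy as np

# hypothetical training pairs: fraction of foreground pixels per frame
# versus the ground-truth number of people in that frame
density = np.array([0.05, 0.10, 0.21, 0.30, 0.42])
counts = np.array([2.0, 4.0, 8.0, 12.0, 17.0])

# least-squares fit: count ~ a * density + b
A = np.vstack([density, np.ones_like(density)]).T
a, b = np.linalg.lstsq(A, counts, rcond=None)[0]

def estimate_people(foreground_density):
    # apply the fitted line, clamped at zero people
    return max(0.0, a * foreground_density + b)
```

In a deployed system the density values would come from the color-based foreground segmentation described in the abstract, and the fit would be calibrated per camera view.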
Kusakunniran, W, Satoh, S, Zhang, J & Wu, Q 2013, 'Attribute-based learning for large scale object classification', 2013 IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, San Jose, California, USA, pp. 1-6.View/Download from: UTS OPUS or Publisher's site
Scalability to large numbers of classes is an important challenge for multi-class classification. Prediction at test time can be computationally infeasible when it requires running every classifier trained for each individual class. This paper proposes an attribute-based learning method to overcome this limitation. The first step is to define attributes and their associations with object classes automatically and simultaneously; these associations are learned with a greedy strategy under certain conditions. The second is to learn a classifier for each attribute instead of each class. The trained classifiers are then used to predict classes from their attribute representations. The proposed method also allows a trade-off between test-time complexity (which grows linearly with the number of attributes) and accuracy. Experiments on the Animals-with-Attributes and ILSVRC2010 datasets show that the performance of our method is promising compared with the state-of-the-art.
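The prediction side of attribute-based learning can be sketched as follows: run one classifier per attribute, then pick the class whose attribute signature best matches the scores. The classes, attributes and signatures here are invented for illustration and are not from the paper:

```python
import numpy as np

# class -> binary attribute signature (attributes: striped, four-legged, winged)
signatures = {
    "zebra": np.array([1, 1, 0]),
    "horse": np.array([0, 1, 0]),
    "eagle": np.array([0, 0, 1]),
}

def predict_class(attribute_scores):
    # attribute_scores: per-attribute classifier outputs in [0, 1];
    # choose the class whose signature is closest in L1 distance
    return min(signatures, key=lambda c: np.abs(signatures[c] - attribute_scores).sum())
```

Test-time cost grows with the number of attribute classifiers (three here) rather than the number of classes, which is the scalability argument made in the abstract.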
Wang, S, Miao, Z & Zhang, J 2013, 'Simultaneously detect and segment pedestrian', 2013 IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, San Jose, USA, pp. 1-4.View/Download from: UTS OPUS or Publisher's site
We present a framework to simultaneously detect and segment pedestrians in images. Our work builds on a part-based method. We first segment the image into superpixels, then assemble superpixels into body part candidates by comparing the assembled shape with a pre-built template library. A structure-based shape matching algorithm is developed to measure shape similarity. All body part candidates are input into our modified AND/OR graph to generate the most reasonable combination; the graph describes possible variations of body configuration and models the constraint relationships between body parts. Comparison experiments on a public database show the effectiveness of our framework.
Wang, S, Zhang, J & Miao, Z 2013, 'A New Edge Feature for head-shoulder Detection', 2013 IEEE International Conference on Image Processing, IEEE International Conference on Image Processing, IEEE, Melbourne, Australia, pp. 2822-2826.View/Download from: UTS OPUS or Publisher's site
In this work, we introduce a new edge feature to improve head-shoulder detection performance. Since head-shoulder detection is vulnerable to vague contours, our new edge feature is designed to extract and enhance the head-shoulder contour while suppressing other contours. The basic idea is that the head-shoulder contour can be predicted by filtering the edge image with edge patterns, which are generated from edge fragments through a learning process. This edge feature, known as En-Contour, can significantly enhance object contours such as the human head and shoulder. To evaluate its performance, we combine it with HOG+LBP as HOG+LBP+En-Contour. HOG+LBP is the state-of-the-art feature in pedestrian detection, and because human head-shoulder detection is a special case of pedestrian detection, we also use it as our baseline. Our experiments indicate that the new feature significantly improves on HOG+LBP.
Xu, J, Wu, Q, Zhang, J, Shen, F & Tang, Z 2013, 'Training boosting-like algorithms with semi-supervised subspace learning', 2013 IEEE International Conference on Image Processing, IEEE International Conference on Image Processing, IEEE, Melbourne, Australia, pp. 4302-4306.View/Download from: UTS OPUS or Publisher's site
Boosting algorithms have attracted great attention since Viola & Jones built the first real-time face detector by performing feature selection and strong-classifier learning simultaneously. Researchers have since proposed decoupling these two procedures to improve the performance of boosting algorithms. Motivated by this, we propose a boosting-like algorithm framework that embeds semi-supervised subspace learning methods. It selects weak classifiers based on class separability, and the combination weights of the selected weak classifiers are obtained by subspace learning. Three typical algorithms are proposed under this framework and evaluated on public data sets. Our experimental results show that the proposed methods obtain superior performance over their supervised counterparts and AdaBoost.
Xu, W, Miao, Z, Zhang, J, Zhang, Q & Wu, H 2013, 'Spatial-temporal context for action recognition combined with confidence and contribution weight', Proceedings - 2nd IAPR Asian Conference on Pattern Recognition, ACPR 2013, pp. 576-580.View/Download from: Publisher's site
In this paper, we propose a new method for human action analysis in videos. In our perspective, a video sequence of human action can be modeled through feature distributions over the spatial-temporal domain. Relationships between features and each defined action are also explored to form discriminative feature sets. In our work, we first capture contextual correlations between local features through multiple windows. We then mine confidences from association rules and learn contributions from a trained SVM based on sample videos. Finally, through analysis of the feature distributions and their interactions over the spatial-temporal domain, we combine the contextual correlations and the relationships between words and their related actions to derive weights of bag-of-feature words for action matching. In most cases, our experiments indicate that the new method outperforms previously published results on the Weizmann and KTH datasets.
Song, Y, Zhang, J, Cao, L & Sangeux, M 2013, 'On Discovering the Correlated Relationship between Static and Dynamic Data in Clinical Gait Analysis', Lecture Notes in Computer Science, Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, Prague, Czech Republic, pp. 563-578.View/Download from: UTS OPUS or Publisher's site
`Gait' is a person's manner of walking. Patients may have an abnormal gait due to a range of physical impairments or brain damage. Clinical gait analysis (CGA) is a technique for identifying the underlying impairments that affect a patient's gait pattern, and it is critical for treatment planning. Essentially, CGA tries to use a patient's physical examination results, known as static data, to interpret the dynamic characteristics of an abnormal gait, known as dynamic data. This process is carried out by gait analysis experts, mainly based on their experience, which may lead to subjective diagnoses. To facilitate the automation of this process and form a relatively objective diagnosis, this paper proposes a new probabilistic correlated static-dynamic model (CSDM) to discover correlated relationships between the dynamic characteristics of gait and their root causes in the static data space. We propose an EM-based algorithm to learn the parameters of the CSDM. One of the main advantages of the CSDM is its ability to provide intuitive knowledge: for example, it can describe which kinds of static data will lead to which hidden gait patterns in the form of a decision tree, helping us infer dynamic characteristics from static data. Our initial experiments indicate that the CSDM is promising for discovering the correlated relationship between physical examination (static) and gait (dynamic) data.
Shen, Y, Miao, Z & Zhang, J 2012, 'Unsupervised Online Learning Trajectory Analysis Based on Weighted Directed Graph', 2012 21st International Conference on Pattern Recognition (ICPR), International Conference on Pattern Recognition, IEEE, Tsukuba, Japan, pp. 1306-1309.View/Download from: UTS OPUS
In this paper, we propose a novel unsupervised online learning trajectory analysis method based on a weighted directed graph. Each trajectory is represented as a sequence of key points. In the training stage, the unsupervised expectation-maximization (EM) algorithm is applied to the training data to cluster key points; each class is a Gaussian distribution and is treated as a node of the graph. From the classification of key points, we build a weighted directed graph representing the trajectory network in the scene, where each path is a category of trajectories. In the test stage, we adopt an online EM algorithm to classify trajectories and update the graph. In our experiments, the approach obtains good performance compared with state-of-the-art approaches.
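The graph-building step described above reduces to simple bookkeeping once key points have been clustered into node labels: each observed transition between labels increments a directed edge weight. The representation below is an assumption for illustration, not the paper's exact construction:

```python
from collections import defaultdict

def build_transition_graph(trajectories):
    # trajectories: lists of node labels, i.e. key points already
    # assigned to Gaussian clusters; edge weight counts transitions
    graph = defaultdict(int)
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            graph[(a, b)] += 1
    return dict(graph)
```

Online updating then amounts to re-running the inner loop for each newly classified trajectory.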
Xu, J, Wu, Q, Zhang, J & Tang, Z 2013, 'Object Detection Based on Co-Occurrence GMuLBP Features', 2012 IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE Computer Society, pp. 943-948.View/Download from: UTS OPUS or Publisher's site
Image co-occurrence has shown great power in object classification because it captures the characteristics of individual features and the spatial relationships between them simultaneously. For example, the Co-occurrence Histogram of Oriented Gradients (CoHOG) has achieved great success in human detection. However, the gradient orientation in CoHOG is sensitive to noise, and CoHOG does not take gradient magnitude into account, which is a key component in reinforcing feature detection. In this paper, we propose a new LBP feature detector based on image co-occurrence. Building on uniform Local Binary Patterns, the new detector captures co-occurrence orientation through gradient magnitude calculation, and is known as CoGMuLBP. An extended version of the CoGMuLBP is also presented. Experimental results on the UIUC car data set show that the proposed features outperform state-of-the-art methods.
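As background for the descriptor above, plain 8-neighbour LBP codes (the ingredient CoGMuLBP builds on) can be computed as below. This sketch covers only the basic codes, not the uniform-pattern mapping, the gradient-magnitude stage, or the co-occurrence statistics:

```python
import numpy as np

def lbp_codes(img):
    # 8-neighbour LBP for the interior pixels of a 2D grayscale array:
    # each neighbour >= centre contributes one bit of an 8-bit code
    c = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(c.shape, dtype=np.int32)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.int32) << bit
    return code
```

A co-occurrence descriptor would then histogram pairs of these codes at fixed spatial offsets rather than individual codes.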
Zhang, J, Lu, S, Mei, T, Wang, J, Wang, Z, Feng, D, Sun, J & Li, S 2012, 'Browse-to-search', ACM International Conference on Multimedia, ACM, Nara, Japan, pp. 1323-1324.View/Download from: UTS OPUS
Mobile visual search has attracted extensive attention for its huge potential for numerous applications. Research on this topic has been focused on two schemes: sending query images, and sending compact descriptors extracted on mobile phones. The first scheme requires about 30-40KB data to transmit, while the second can reduce the bit rate by 10 times. In this paper, we propose a third scheme for extremely low bit rate mobile visual search, which sends compressed visual words consisting of vocabulary tree histogram and descriptor orientations rather than descriptors. This scheme can further reduce the bit rate with few extra computational costs on the client. Specifically, we store a vocabulary tree and extract visual descriptors on the mobile client. A light-weight pre-retrieval is performed to obtain the visited leaf nodes in the vocabulary tree. The orientation of each local descriptor and the tree histogram are then encoded to be transmitted to server. Our new scheme transmits less than 1KB data, which reduces the bit rate in the second scheme by 3 times, and obtains about 30% improvement in terms of search accuracy over the traditional Bag-of-Words baseline. The time cost is only 1.5 secs on the client and 240 msecs on the server.
Zhang, J, Schonfeld, D & Feng, DD 2012, 'Message from ICME 2012 general chairs', Proceedings of the 2012 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2012.View/Download from: Publisher's site
ICME 2012 is the thirteenth in the series of ICME conferences, held annually since 2000 in various cities throughout the world. The success of this conference would not have been possible without the generous help of sponsors. Paper prizes and Student Travel Grants are sponsored by National Information and Communications Technology Australia (NICTA), Microsoft Research, IBM Research, Canon Information Systems Research Australia (CiSRA), and the Advanced Analytics Institute (AAI) at the University of Technology, Sydney (UTS). ICME 2012 features a new plenary session, Time Machine! The session consists of a series of expert presentations that re-introduce ideas published "before their time" and whose impact, as a result, has not yet been fully realized. ICME 2012 also offers outstanding lectures, including keynote lectures and research overviews, and several paper prizes, including the Best Paper Award, Best Student Paper Award, and Best Demo Award.
Zhang, J, Wu, Y, Lu, S, Mei, T & Li, S 2012, 'Local visual words coding for low bit rate mobile visual search', ACM International Conference on Multimedia, ACM, Nara, Japan, pp. 989-992.View/Download from: UTS OPUS or Publisher's site
Mobile visual search has attracted extensive attention for its huge potential for numerous applications. Research on this topic has been focused on two schemes: sending query images, and sending compact descriptors extracted on mobile phones. The first scheme requires about 30-40KB data to transmit, while the second can reduce the bit rate by 10 times. In this paper, we propose a third scheme for extremely low bit rate mobile visual search, which sends compressed visual words consisting of vocabulary tree histogram and descriptor orientations rather than descriptors. This scheme can further reduce the bit rate with few extra computational costs on the client. Specifically, we store a vocabulary tree and extract visual descriptors on the mobile client. A light-weight pre-retrieval is performed to obtain the visited leaf nodes in the vocabulary tree. The orientation of each local descriptor and the tree histogram are then encoded to be transmitted to server. Our new scheme transmits less than 1KB data, which reduces the bit rate in the second scheme by 3 times, and obtains about 30% improvement in terms of search accuracy over the traditional Bag-of-Words baseline. The time cost is only 1.5 secs on the client and 240 msecs on the server.
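The vocabulary-side bookkeeping can be illustrated with a flat (rather than tree-structured) vocabulary: assign each local descriptor to its nearest visual word and accumulate a histogram. The names and the flat quantiser are simplifications of the vocabulary tree described above:

```python
import numpy as np

def word_histogram(descriptors, words):
    # descriptors: (n, d) local features; words: (k, d) visual vocabulary
    # squared Euclidean distance from every descriptor to every word
    d2 = ((descriptors[:, None, :] - words[None, :, :]) ** 2).sum(axis=-1)
    assignment = d2.argmin(axis=1)
    return np.bincount(assignment, minlength=len(words))
```

Transmitting this small histogram (plus descriptor orientations) instead of the raw descriptors is what pushes the query payload below 1KB in the scheme above.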
Kusakunniran, W, Wu, Q, Zhang, J & Li, H 2011, 'Pairwise Shape configuration-based PSA for gait recognition under small viewing angle change', 2011 8th IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS), IEEE International Conference on Video and Signal Based Surveillance (AVSS), IEEE, Klagenfurt, Austria, pp. 17-22.View/Download from: UTS OPUS or Publisher's site
Two main components of Procrustes Shape Analysis (PSA) are adopted and adapted specifically to address gait recognition under small viewing angle change: 1) Procrustes Mean Shape (PMS) for gait signature description; 2) Procrustes Distance (PD) for similarity measurement. Pairwise Shape Configuration (PSC) is proposed as a shape descriptor in place of the Centroid Shape Configuration (CSC) used in conventional PSA, and it better tolerates shape change caused by viewing angle change. Small variations in viewing angle have a large impact only on global gait appearance, with no major impact on local spatio-temporal motion, so PSC, which effectively embeds local shape information, can generate robust view-invariant gait features. To enhance recognition performance, a novel boundary re-sampling process is proposed. It provides only the necessary re-sampled points to the PSC description while efficiently solving the problems of boundary point correspondence, boundary normalization and boundary smoothness; this re-sampling process adopts prior knowledge of body pose structure. Comprehensive experiments are carried out on the CASIA gait database. The proposed method significantly improves gait recognition under small viewing angle change, without the additional requirements of supervised learning, a known viewing angle or a multi-camera system, when compared with other methods in the literature.
Kusakunniran, W, Wu, Q, Zhang, J & Li, H 2011, 'Speed-invariant gait recognition based on Procrustes Shape Analysis using higher-order shape configuration', 2011 18th IEEE International Conference on Image Processing (ICIP), IEEE International Conference on Image Processing, IEEE, Brussels, Belgium, pp. 545-548.View/Download from: UTS OPUS or Publisher's site
Walking speed change is a typical challenge hindering reliable human gait recognition. This paper proposes a novel method to extract speed-invariant gait features based on Procrustes Shape Analysis (PSA). Two major components of PSA, i.e., Procrustes Mean Shape (PMS) and Procrustes Distance (PD), are adopted and adapted specifically for speed-invariant gait recognition. One of our major contributions is that, instead of using the conventional Centroid Shape Configuration (CSC), which is not suitable for describing individual gait when body shape changes, particularly due to changes in walking speed, we propose a new descriptor named Higher-order derivative Shape Configuration (HSC) that generates robust speed-invariant gait features. From the first order to higher orders, the derivative shape configuration contains gait shape information at different levels; intuitively, a higher order of derivative can describe gait whose shape change is caused by a larger change in walking speed. Encouraging experimental results show that the proposed method is effective for speed-invariant gait recognition and clearly outperforms existing methods in the literature.
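Procrustes Distance, the similarity measure both of these gait papers build on, can be sketched for 2D boundary shapes encoded as complex vectors: centring, normalising and taking the modulus of the inner product removes translation, scale and rotation. This is the textbook full Procrustes distance, not the papers' PSC or HSC descriptors:

```python
import numpy as np

def procrustes_distance(u, v):
    # u, v: complex vectors of corresponding boundary points (x + iy)
    u = u - u.mean()                 # remove translation
    v = v - v.mean()
    u = u / np.linalg.norm(u)        # remove scale
    v = v / np.linalg.norm(v)
    corr = np.abs(np.vdot(u, v))     # |<u, v>| is rotation-invariant
    return np.sqrt(max(0.0, 1.0 - corr ** 2))
```

Identical shapes under any similarity transform get distance zero, so gait signatures built this way are compared independently of camera scale and in-plane rotation.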
Li, Z, Wu, Q, Zhang, J & Geers, G 2011, 'SKRWM based descriptor for pedestrian detection in thermal images', 2011 IEEE 13th International Workshop on Multimedia Signal Processing (MMSP), IEEE International Workshop on Multimedia Signal Processing, IEEE, Hangzhou, China, pp. 1-6.View/Download from: UTS OPUS or Publisher's site
Pedestrian detection in thermal images is a difficult task due to intrinsic challenges: 1) low image resolution; 2) thermal noise; 3) polarity changes; 4) lack of color, texture or depth information. To address these challenges, we propose a novel mid-level feature descriptor for pedestrian detection in the thermal domain, which combines the pixel-level Steering Kernel Regression Weights Matrix (SKRWM) with the corresponding covariances. The SKRWM properly captures the local structure of pixels, while the covariance computation further provides the correlations among low-level features. This mid-level descriptor thus captures both pixel-level data differences and the spatial differences of local structure. For human detection, it can discriminatively distinguish pedestrians from complex backgrounds. To test the performance of the proposed descriptor, a popular classifier framework based on Principal Component Analysis (PCA) and a Support Vector Machine (SVM) is also built. Overall, our experimental results show that the proposed approach overcomes the problems caused by background subtraction while attaining detection accuracy comparable to the state-of-the-art.
Paisitkriangkrai, S, Shen, C & Zhang, J 2010, 'Face detection with effective feature extraction', Computer Vision ACCV 2010, Asian Conference on Computer Vision, SpringerLink, Queenstown, New Zealand, pp. 460-470.View/Download from: UTS OPUS or Publisher's site
There is an abundant literature on face detection due to its important role in many vision applications. Since Viola and Jones proposed the first real-time AdaBoost based face detector, Haar-like features have been adopted as the method of choice for frontal face detection. In this work, we show that simple features other than Haar-like features can also be applied to train an effective face detector. Since a single feature is not discriminative enough to separate faces from difficult non-faces, we further improve the generalization performance of our simple features by introducing feature co-occurrences. We demonstrate that our proposed features yield a performance improvement over Haar-like features. In addition, our findings indicate that features play a crucial role in the system's ability to generalize.
Quek, A, Wang, Z, Zhang, J & Feng, D 2011, 'Structural Image Classification with Graph Neural Networks', Proceedings of 2011 International Conference on Digital Image Computing - Techniques and Applications, Digital Image Computing Techniques and Applications, IEEE, Noosa, Queensland, Australia, pp. 416-421.View/Download from: UTS OPUS
Many approaches to image classification tend to transform an image into an unstructured set of numeric feature vectors obtained globally and/or locally, and as a result lose important relational information between regions. In order to encode the geometric relationships between image regions, we propose a variety of structural image representations that are not specialised for any particular image category. Besides the traditional grid-partitioning and global segmentation methods, we investigate the use of local scale-invariant region detectors. Regions are connected based not only upon nearest-neighbour heuristics, but also upon minimum spanning trees and Delaunay triangulation. In order to maintain the topological and spatial relationships between regions, and also to effectively process undirected connections represented as graphs, we utilise the recently-proposed graph neural network model. To the best of our knowledge, this is the first utilisation of the model to process graph structures based on local-sampling techniques, for the task of image classification. Our experimental results demonstrate great potential for further work in this domain.
Zhang, J & Liu, X 2011, 'Active Learning for Human Action Recognition with Gaussian Processes', Proceedings of 2011 International Conference on Image Processing, IEEE International Conference on Image Processing, IEEE, Brussels, Belgium, pp. 3253-3256.View/Download from: UTS OPUS or Publisher's site
This paper presents an active learning approach for recognizing human actions in videos based on a multiple-kernel combination method. We design the classifier with Multiple Kernel Learning (MKL) through Gaussian Process (GP) regression. This classifier is then trained in an active learning loop: in each iteration, one optimal sample is selected to be interactively annotated and incorporated into the training set. The selection of the sample is based on the heuristic feedback of the GP classifier. To our knowledge, GP-regression MKL-based active learning methods have not previously been applied to human action recognition. We test this approach on standard benchmarks; it outperforms state-of-the-art techniques in accuracy while requiring significantly fewer training samples.
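The select-annotate-retrain loop described above can be sketched with a single-kernel GP classifier and plain uncertainty sampling. This is a minimal sketch only: the paper's MKL combination and its specific selection heuristic are not reproduced, and the two-dimensional synthetic data standing in for action features is an assumption:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Toy two-class data standing in for action feature vectors.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Seed the training set with five labelled samples from each class.
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(200) if i not in labeled]

clf = GaussianProcessClassifier(kernel=1.0 * RBF(1.0))
for _ in range(15):
    clf.fit(X[labeled], y[labeled])
    # Uncertainty sampling: query the pool point whose predicted class
    # probability is closest to 0.5 (where the GP is least confident).
    proba = clf.predict_proba(X[pool])[:, 1]
    pick = pool[int(np.argmin(np.abs(proba - 0.5)))]
    labeled.append(pick)
    pool.remove(pick)

accuracy = clf.score(X, y)
```

The point of the loop is that annotation effort goes only to the samples the current classifier finds most ambiguous, which is why far fewer labels are needed than with random sampling.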
Khan, A, Zhang, J & Wang, Y 2010, 'Appearance-based re-identification of people in video', Proceedings - 2010 Digital Image Computing: Techniques and Applications, DICTA 2010, pp. 357-362.View/Download from: Publisher's site
This paper introduces the topic of appearance-based re-identification of people in video, based on colour information from people's clothing. Most of the work described in the literature uses full-body histograms. This paper evaluates the histogram method and describes ways of including spatial colour information, proposing a colour-based appearance descriptor called the Colour Context People Descriptor. All methods are evaluated extensively and the results reported in the experiments. We conclude that adding spatial colour information greatly improves re-identification results.
Kusakunniran, W, Wu, Q, Zhang, J & Li, H 2010, 'Multi-view Gait Recognition Based on Motion Regression using Multilayer Perceptron', Proceedings: 2010 20th International Conference Pattern Recognition (ICPR 2010), International Conference Pattern Recognition, IEEE Computer Society, Istanbul Turkey, pp. 2186-2189.View/Download from: UTS OPUS or Publisher's site
It has been shown that gait is an efficient biometric feature for identifying a person at a distance. However, obtaining reliable gait features when the viewing angle changes is challenging, because body appearance can differ across viewing angles. In this paper, the problem is formulated as a regression problem in which a novel View Transformation Model (VTM) is constructed by adopting a Multilayer Perceptron (MLP) as the regression tool. It smoothly estimates the gait feature under an unknown viewing angle based on motion information in a well-selected Region of Interest (ROI) under other existing viewing angles. Thus, this proposal can normalize gait features under various viewing angles into a common viewing angle before gait similarity measurement is carried out. Encouraging experimental results have been obtained on a widely adopted benchmark database.
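The regression view of a VTM, learning a mapping from features under one viewing angle to features under another, can be sketched with scikit-learn's `MLPRegressor`. The synthetic features and the tanh-of-linear "view change" below are illustrative assumptions, not the paper's gait data or ROI selection:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Synthetic stand-in: gait feature vectors observed under view A, and a
# fixed smooth mapping (unknown to the model) producing view-B features.
n, d = 300, 8
feats_a = rng.uniform(size=(n, d))
W = rng.normal(size=(d, d))
feats_b = np.tanh(feats_a @ W)

# Train the "view transformation model" as a multi-output MLP regression.
vtm = MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000, random_state=0)
vtm.fit(feats_a[:250], feats_b[:250])

# Normalise unseen view-A features into view B before any gait matching.
pred_b = vtm.predict(feats_a[250:])
err = np.mean((pred_b - feats_b[250:]) ** 2)
```

Once both gallery and probe features are mapped into the same view, ordinary similarity measurement applies unchanged, which is the core of the normalization idea.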
Kusakunniran, W, Wu, Q, Zhang, J & Li, H 2010, 'Support Vector Regression for Multi-view Gait Recognition Based on Local Motion Feature Selection', 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, San Francisco CA, USA, pp. 974-981.View/Download from: UTS OPUS or Publisher's site
Gait is a well recognized biometric feature used to identify a human at a distance. However, in real environments, appearance changes of individuals due to viewing angle changes cause many difficulties for gait recognition. This paper re-formulates the problem as a regression problem. A novel solution is proposed to create a View Transformation Model (VTM) across different points of view using Support Vector Regression (SVR). To facilitate the regression, a new method is proposed to seek a local Region of Interest (ROI) under one viewing angle for predicting the corresponding motion information under another viewing angle. The well-constructed VTM is thus able to transfer gait information from one viewing angle to another, achieving view-independent gait recognition: it normalizes gait features under various viewing angles into a common viewing angle before similarity measurement is carried out. Extensive experimental results on a widely adopted benchmark dataset demonstrate that the proposed algorithm achieves significantly better performance than existing methods in the literature.
Li, Z, Zhang, J, Wu, Q & Geers, GD 2010, 'Feature Enhancement Using Gradient Salience on Thermal Image', Proceedings. 2010 Digital Image Computing: Techniques and Applications (DICTA 2010), Digital Image Computing: Techniques and Applications, IEEE Computer Society, Sydney, Australia, pp. 556-562.View/Download from: UTS OPUS or Publisher's site
Feature enhancement reinforces extracted features in an image so that they can be used for object classification and detection. Because thermal images lack texture and colour information, techniques for visual-image feature enhancement do not transfer directly to them. In this paper, we propose a new gradient-based approach for feature enhancement in thermal images. We use the statistical properties of the gradients of foreground object profiles and formulate object features with gradient saliency. Empirical evaluation of the proposed approach shows significantly improved performance on human contours, which can be used for detection and classification.
Paisitkriangkrai, S, Mei, T, Zhang, J & Hua, X 2010, 'Scalable clip-based near-duplicate video detection with ordinal measure', CIVR 2010 - 2010 ACM International Conference on Image and Video Retrieval, ACM International Conference on Image and Video Retrieval, ACM-CIVR 2010, NA, Xi'an, pp. 121-128.View/Download from: UTS OPUS or Publisher's site
Detection of duplicate or near-duplicate videos on large-scale database plays an important role in video search. In this paper, we analyze the problem of near-duplicates detection and propose a practical and effective solution for real-time large-scale v
Saesue, W, Chou, C & Zhang, J 2010, 'Cross-layer QoS-optimized EDCA adaptation for wireless video streaming', Proceedings of 2010 IEEE 17th International Conference on Image Processing, IEEE International Conference on Image Processing, IEEE, Hong Kong, pp. 2925-2928.View/Download from: UTS OPUS or Publisher's site
In this paper, we propose an adaptive cross-layer technique that optimally enhances the QoS of wireless video transmission in an IEEE 802.11e WLAN. The optimization takes into account the unequal error protection characteristics of video streaming, the IE
Saesue, W, Chou, CT & Zhang, J 2010, 'Video quality prediction in the presence of mac contention and wireless channel error', 2010 IEEE International Symposium on "A World of Wireless, Mobile and Multimedia Networks", WoWMoM 2010 - Digital Proceedings.View/Download from: Publisher's site
This paper proposes an integrated model to predict the quality of video, expressed in terms of the mean square error (MSE) of the received video frames, in an IEEE 802.11e wireless network. The proposed system takes into account contention at the MAC layer, wireless channel error, queueing at the MAC layer, the parameters of different 802.11e access categories (ACs), and the video characteristics of different H.264 data partitions (DPs). To the best of the authors' knowledge, this is the first system that takes these network and video characteristics into consideration to predict video quality in an IEEE 802.11e network. The proposed system consists of two components. The first predicts the packet loss rate of each H.264 data partition using a multi-dimensional discrete-time Markov chain (DTMC) coupled to an M/G/1 queue. The second uses these packet loss rates and the video characteristics to predict the MSE of each received video frame. We verify the accuracy of the combined system using discrete event simulation and real H.264 coded video sequences.
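The DTMC building block of such loss-rate prediction can be illustrated on a toy two-state (Gilbert-Elliott) channel: solve for the chain's stationary distribution and take the expected per-state loss. The paper's multi-dimensional DTMC coupled to an M/G/1 queue is far richer; the transition matrix and loss probabilities below are assumed example values:

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution pi of a DTMC with row-stochastic transition
    matrix P, solved from pi P = pi together with sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Two-state Gilbert-Elliott channel: Good <-> Bad.
P = np.array([[0.95, 0.05],
              [0.40, 0.60]])
loss = np.array([0.01, 0.30])   # per-state packet loss probability (assumed)
pi = stationary_distribution(P)
avg_loss = float(pi @ loss)     # long-run packet loss rate
```

Feeding per-partition loss rates like `avg_loss` into a distortion model is then what yields the frame-level MSE prediction described in the abstract.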
Thi, T, Cheng, L, Zhang, J & Wang, L 2010, 'Implicit motion-shape model: A generic approach for action matching', Proceedings of 2010 IEEE 17th International Conference on Image Processing, IEEE International Conference on Image Processing, IEEE, Hong Kong, pp. 1477-1480.View/Download from: UTS OPUS or Publisher's site
We develop a robust technique to find similar matches of human actions in video. Given a query video, Motion History Images (MHI) are constructed for consecutive keyframes. This is followed by dividing the MHI into local Motion-Shape regions, which allow
Thi, T, Cheng, L, Zhang, J, Wang, L & Satoh, S 2010, 'Weakly supervised action recognition using implicit shape models', Proceedings - International Conference on Pattern Recognition, 2010 20th International Conference on Pattern Recognition, ICPR 2010, IEEE, Istanbul, pp. 3517-3520.View/Download from: UTS OPUS or Publisher's site
In this paper, we present a robust framework for action recognition in video, that is able to perform competitively against the state-of-the-art methods, yet does not rely on sophisticated background subtraction preprocess to remove background features.
Thi, T, Zhang, J, Cheng, L, Wang, L & Satoh, S 2010, 'Human action recognition and localization in video using structured learning of local space-time features', 2010 Seventh IEEE International Conference on Advanced Video and Signal Based Surveillance, Advanced Video and Signal Based Surveillance, IEEE, Boston, MA, pp. 204-211.View/Download from: UTS OPUS or Publisher's site
This paper presents a unified framework for human action classification and localization in video using structured learning of local space-time features. Each human action class is represented by a set of its own compact set of local patches. In our appr
Wang, L, Cheng, L, Thi, TH & Zhang, J 2010, 'Human Action Recognition from Boosted Pose Estimation', 2010 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing Techniques and Applications, IEEE, Sydney, NSW, pp. 308-313.View/Download from: UTS OPUS
This paper presents a unified framework for recognizing human actions in video using human pose estimation. Due to the high variation of human appearance and noisy background context, accurate human pose analysis is hard to achieve and is rarely employed for action recognition. In our approach, we take advantage of the current success of human detection and the view invariance of local feature-based approaches to design a pose-based action recognition system. We begin with a frame-wise human detection step to initialize the search space for local human parts, then integrate the detected parts into a human kinematic structure using a tree-structured graphical model. The final articulation configuration is used to infer the action class being performed, based on the behaviour of each part and the overall structure variation. We also show that, even with imprecise pose estimation, accurate action recognition can still be achieved from informative cues in the overall pose-part configuration. The promising results obtained on action recognition benchmarks show that our proposed framework is comparable to existing state-of-the-art action recognition algorithms.
Wang, W, Zhang, J & Shen, C 2010, 'Improved human detection and classification in thermal images', Proceedings - International Conference on Image Processing, ICIP, IEEE International Conference on Image Processing, IEEE, Hong Kong, pp. 2313-2316.View/Download from: UTS OPUS or Publisher's site
We present a new method for detecting pedestrians in thermal images, based on the Shape Context Descriptor (SCD) within an AdaBoost cascade classifier framework. Compared with standard optical images, thermal imaging cameras offer a clear advantage for night-time video surveillance and are robust to lighting changes in the daytime. Experiments show that shape context features with boosted classification provide a significant improvement for human detection in thermal images. We have also compared the proposed method with rectangle features on a public dataset of thermal imagery; results show that shape context features perform much better than conventional rectangular features on this task.
Zhang, J, Shen, C & Geers, G 2010, 'Proceedings - 2010 Digital Image Computing: Techniques and Applications, DICTA 2010: Preface', Proceedings - 2010 Digital Image Computing: Techniques and Applications, DICTA 2010.View/Download from: Publisher's site
Kusakunniran, W, Li, H & Zhang, J 2009, 'A direct method to self-calibrate a surveillance camera by observing a walking pedestrian', 2009 Digital Image Computing: Techniques and Applications, Digital Image Computing Techniques and Applications, IEEE, Melbourne, VIC, pp. 250-255.View/Download from: UTS OPUS or Publisher's site
Recent efforts show that it is possible to calibrate a surveillance camera simply from observing a walking human. This procedure can be seen as a special application of the camera self-calibration technique. Several methods have been proposed along this
Kusakunniran, W, Wu, Q, Li, H & Zhang, J 2009, 'Automatic gait recognition using weighted binary pattern on video', Proceedings of Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, Advanced Video and Signal Based Surveillance, IEEE Computer Society, Genoa, Italy, pp. 49-54.View/Download from: UTS OPUS
Human identification by recognizing spontaneous gait recorded in real-world settings is a tough and not yet fully resolved problem in biometrics research. Several issues contribute to the difficulty of this task, including varied poses, different clothes, moderate to large changes in normal walking manner due to carrying diverse goods, and the uncertainty of the environments in which people are walking. To achieve better gait recognition, this paper proposes a new method based on Weighted Binary Pattern (WBP). WBP first constructs a binary pattern from a sequence of aligned silhouettes; an adaptive weighting technique is then applied to discriminate the significance of the bits in the gait signature. Compared with most existing methods in the literature, this method better captures gait frequency, local spatio-temporal human pose features, and global body shape statistics. The proposed method is validated on several well-known benchmark databases. Extensive and encouraging experimental results show that the proposed algorithm achieves high accuracy with low complexity and computational time.
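The binary-pattern idea can be sketched as packing each pixel's temporal sequence of silhouette bits into one code, then weighting bits per frame. This is only a toy sketch under assumed random silhouettes; the paper's adaptive weighting scheme is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(2)

# A toy gait cycle: T aligned binary silhouettes of size H x W.
T, H, W = 8, 16, 12
silhouettes = (rng.uniform(size=(T, H, W)) > 0.5).astype(np.uint8)

# Pack each pixel's T temporal bits into a single integer code,
# producing one binary-pattern image for the whole cycle.
bit_values = 1 << np.arange(T, dtype=np.uint32)          # bit i <- frame i
pattern = np.tensordot(bit_values, silhouettes.astype(np.uint32), axes=(0, 0))

# Illustrative bit weighting: weight each frame's bit by how much that
# frame's pixels vary (a simple stand-in for adaptive weighting).
bit_weights = silhouettes.reshape(T, -1).var(axis=1)
signature = np.tensordot(bit_weights, silhouettes.astype(float), axes=(0, 0))
```

The weighted signature keeps the per-pixel temporal structure while letting informative frames contribute more, which is the intuition behind weighting bits rather than treating all frames equally.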
Kusakunniran, W, Wu, Q, Li, H & Zhang, J 2009, 'Multiple Views Gait Recognition using View Transformation Model Based on Optimized Gait Energy Image', Proceedings of 2009 IEEE 12th International Conference on Computer Vision Workshops, IEEE International Conference on Computer Vision Workshops, IEEE, Kyoto, Japan, pp. 1058-1064.View/Download from: UTS OPUS
Gait is one of the well-recognized biometrics that has been widely used for human identification. However, current gait recognition can have difficulties when the viewing angle changes, because the viewing angle under which the gait signature database was generated may not match the viewing angle under which the probe data are obtained. This paper proposes a new multi-view gait recognition approach that tackles this problem. Unlike other approaches in the same category, the new method creates a View Transformation Model (VTM) based on the spatial-domain Gait Energy Image (GEI) by adopting the Singular Value Decomposition (SVD) technique. To further improve the performance of the proposed VTM, Linear Discriminant Analysis (LDA) is used to optimize the obtained GEI feature vectors. Implementing SVD raises practical problems such as large matrix size and over-fitting; reduced SVD is introduced to alleviate their effects. Using the generated VTM, the viewing angles of gallery and probe gait data can be transformed into the same direction, so gait signatures can be compared without difficulty. Extensive experiments show that the proposed algorithm significantly improves multi-view gait recognition performance compared with similar methods in the literature.
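The GEI itself, the representation the VTM operates on, is simply the per-pixel average of an aligned, size-normalised binary silhouette sequence. A minimal sketch (the swinging-bar "cycle" below is an assumed toy input):

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Gait Energy Image: per-pixel mean of an aligned, size-normalised
    binary silhouette sequence; values lie in [0, 1], where stable body
    regions are bright and swinging limbs are grey."""
    s = np.asarray(silhouettes, dtype=float)
    return s.mean(axis=0)

# Toy cycle: a vertical bar that shifts one column per frame.
frames = np.zeros((4, 5, 5))
for t in range(4):
    frames[t, :, t] = 1.0
gei = gait_energy_image(frames)
```

Stacking each subject's GEI vectors across views is then what forms the matrix that the paper factorises with (reduced) SVD.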
Paisitkriangkrai, S, Shen, C & Zhang, J 2009, 'Efficiently training a better visual detector with sparse eigenvectors', 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2009, IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Miami, FL, pp. 1129-1136.View/Download from: UTS OPUS or Publisher's site
Face detection plays an important role in many vision applications. Since Viola and Jones proposed the first real-time AdaBoost based object detection system, much effort has been spent on improving the boosting method. In this work, we first show
Smith, D, Hanlen, L, Zhang, JA, Miniutti, D, Rodda, D & Gilbert, B 2009, 'Characterization of the Dynamic Narrowband On-Body to Off-Body Area Channel', 2009 IEEE International Conference on Communications, Vols 1-8, IEEE International Conference on Communications (ICC 2009), IEEE, Dresden, Germany, pp. 4207+.
Thi, T, Lu, S, Zhang, J, Cheng, L & Wang, L 2009, 'Human body articulation for action recognition in video sequences', 6th IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2009, IEEE International Conference on Video and Signal Based Surveillance (AVSS), IEEE, Genova, pp. 92-97.View/Download from: UTS OPUS or Publisher's site
This paper presents a new technique for action recognition in video using human body part-based approach, combining both local feature description of each body part, and global graphical model structure of the human action. The human body is divided into
Wang, W, Shen, C, Zhang, J & Paisitkriangkrai, S 2009, 'A two-layer night-time vehicle detector', 2009 Digital Image Computing: Techniques and Applications, Digital Image Computing Techniques and Applications, IEEE, Melbourne, VIC, pp. 162-167.View/Download from: UTS OPUS or Publisher's site
We present a two-layer night-time vehicle detector in this work. At the first layer, vehicle headlight detection [1, 2, 3] is applied to find areas (bounding boxes) where possible pairs of headlights are located in the image; the Haar feature based AdaBoo
Zhang, J, Paisitkriangkrai, S & Shen, C 2009, 'An overview of fast pedestrian detection: Feature selection and cascade framework of boosted features', Proceedings - 2009 IEEE International Conference on Multimedia and Expo, ICME 2009, pp. 1566-1567.View/Download from: Publisher's site
Efficiently and accurately detecting pedestrians plays a crucial role in many vision applications such as video surveillance, multimedia retrieval and smart cars. In order to find the right feature for this task, we first present a comprehensive experimental study on pedestrian detection using state-of-the-art locally-extracted features. Building upon our findings, we propose a new, simpler pedestrian detection framework based on covariance features. We conduct feature selection and weak classifier training in the Euclidean space for faster computation. To this end, two machine learning algorithms have been designed: AdaBoost with weighted Fisher linear discriminant analysis (WLDA) based weak classifiers, and Greedy Sparse Linear Discriminant Analysis (GSLDA). To further accelerate detection, we employ multiple cascade layers with heterogeneous features to exploit the efficiency of the Haar-like features and the discriminative power of the covariance features. Experimental results on different datasets show that the new pedestrian detector is not only comparable in performance to state-of-the-art pedestrian detectors but also runs faster.
Feng, D, Sikora, T, Siu, WC, Zhang, J, Guan, L & Dugelay, JL 2008, 'Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008: Preface', Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008.View/Download from: Publisher's site
Luo, C, Cai, X & Zhang, J 2008, 'GATE: A novel robust object tracking method using the particle filtering and level set method', Digital Image Computing: Techniques and Applications, Digital Image Computing Techniques and Applications, IEEE, Canberra, ACT, pp. 378-385.View/Download from: UTS OPUS or Publisher's site
This paper presents a novel algorithm for robust object tracking based on the particle filtering method employed in recursive Bayesian estimation and image segmentation and optimisation techniques employed in active contour models and level set methods.
Luo, C, Cai, X & Zhang, J 2008, 'Robust object tracking using the particle filtering and level set methods: A comparative experiment', Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008, IEEE International Workshop on Multimedia Signal Processing, IEEE, Cairns, QLD, pp. 359-364.View/Download from: UTS OPUS or Publisher's site
Robust visual tracking has become an important topic of research in computer vision. A novel method for robust object tracking, GATE, improves object tracking in complex environments using the particle filtering and the level set-based active contou
Ong, C, Lu, S & Zhang, J 2008, 'An approach for enhancing the results of detecting foreground objects and their moving shadows in surveillance video', Digital Image Computing: Techniques and Applications, Digital Image Computing Techniques and Applications, IEEE, Canberra, ACT, pp. 242-249.View/Download from: UTS OPUS or Publisher's site
Automated surveillance systems are becoming increasingly important, especially in the fields of computer vision and video processing. This paper describes a novel approach for improving the results of detecting foreground objects and their shadows in indoor
Paisitkriangkra, S, Shen, C & Zhang, J 2008, 'Real-time Pedestrian Detection Using a Boosted Multi-layer Classifier', The Eighth International Workshop on Visual Surveillance, in conjunction with European Conference on Computer Vision (ECCV'08), 2008, IEEE International Workshop on Visual Surveillance, Institute of Electrical and Electronics Engineers, Marseille France.View/Download from: UTS OPUS
Techniques for detecting pedestrians in still images have attracted considerable research interest due to their wide applications such as video surveillance and intelligent transportation systems. In this paper, we propose a novel, simpler pedestrian detector using state-of-the-art locally extracted features, namely covariance features. Covariance features were originally proposed in [1, 2]. Unlike that work, where feature selection and weak classifier training are performed on the Riemannian manifold, we select features and train weak classifiers in the Euclidean space for faster computation. To this end, AdaBoost with weighted Fisher linear discriminant analysis based weak classifiers is adopted. Multiple-layer boosting with heterogeneous features is constructed to exploit the efficiency of the Haar-like feature and the discriminative power of the covariance feature simultaneously. Extensive experiments show that by combining the Haar-like and covariance features, we speed up the original covariance feature detector by up to an order of magnitude in processing time without compromising detection performance. For the first time, the proposed work enables covariance feature based pedestrian detection to run in real time.
Paisitkriangkrai, S, Shen, C & Zhang, J 2008, 'An experimental study on pedestrian classification using local features', Proceedings - IEEE International Symposium on Circuits and Systems, IEEE International Symposium on Circuits and Systems, IEEE, Seattle, WA, pp. 2741-2744.View/Download from: UTS OPUS or Publisher's site
This paper presents an experimental study on pedestrian detection using state-of-the-art local feature extraction and support vector machine (SVM) classifiers. The performance of pedestrian detection using region covariance, histogram of oriented gradien
Saesue, W, Zhang, J & Chun, T 2008, 'Hybrid frame-recursive block-based distortion estimation model for wireless video transmission', Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008, IEEE International Workshop on Multimedia Signal Processing, IEEE, Cairns, QLD, pp. 774-779.View/Download from: UTS OPUS or Publisher's site
In wireless environments, video quality can be severely degraded due to channel errors. Improving error robustness towards the impact of packet loss in error-prone network is considered as a critical concern in wireless video networking research. Data pa
Shen, C, Paisitkriangkrai, S & Zhang, J 2008, 'Face detection from few training examples', Proceedings - International Conference on Image Processing, ICIP, pp. 2764-2767.View/Download from: Publisher's site
Face detection in images is very important for many multimedia applications. Haar-like wavelet features have become dominant in face detection because of their tremendous success since Viola and Jones proposed their AdaBoost based detection system. While the Haar features' simplicity makes rapid computation possible, their discriminative power is limited. As a consequence, a large training dataset is required to train a classifier, which may hamper application in scenarios where a large labeled dataset is difficult to obtain. In this work, we address the problem of learning to detect faces from a small set of training examples. In particular, we propose to use covariance features. For better classification performance, a linear hyperplane classifier based on Fisher discriminant analysis (FDA) is proposed. Compared with the decision stump, FDA is more discriminative and therefore fewer weak learners are needed. We show that the detection rate can be significantly improved with covariance features on a small dataset (a few hundred positive examples), compared to the Haar features used in most current face detection systems.
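The region covariance descriptor recurring in these papers can be sketched as the covariance matrix of simple per-pixel feature vectors over an image region, in the spirit of Tuzel et al. The particular feature choice `[x, y, I, |Ix|, |Iy|]` below is an illustrative assumption; the papers use richer variants:

```python
import numpy as np

def region_covariance(gray):
    """Region covariance descriptor: covariance of per-pixel feature
    vectors [x, y, I, |Ix|, |Iy|] over a grayscale region. The result is
    a small symmetric matrix whose size is independent of region size."""
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    iy, ix = np.gradient(gray.astype(float))     # gradients along rows, cols
    feats = np.stack([xs, ys, gray, np.abs(ix), np.abs(iy)], axis=-1)
    return np.cov(feats.reshape(-1, 5), rowvar=False)

# Descriptor of a smooth 8x8 ramp patch (assumed toy input).
patch = np.outer(np.linspace(0, 1, 8), np.linspace(0, 1, 8))
C = region_covariance(patch)
```

Because the descriptor is a fixed-size symmetric positive semidefinite matrix, regions of any size map to the same compact representation, which is what makes boosting over such features practical.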
Thi, T, Lu, S & Zhang, J 2008, 'Self-calibration of traffic surveillance camera using motion tracking', Proceedings of the 11th International IEEE Conference on Intelligent Transportation Systems, IEEE Conference on Intelligent Transportation Systems, IEEE, Beijing, China, pp. 304-309.View/Download from: UTS OPUS or Publisher's site
A statistical and computer vision approach using tracked moving vehicle shapes for auto-calibrating traffic surveillance cameras is presented. Vanishing point of the traffic direction is picked up from Linear Regression of all tracked vehicle points. Pre
Thi, T, Robert, K, Lu, S & Zhang, J 2008, 'Vehicle classification at nighttime using eigenspaces and support vector machine', 2008 Congress on Image and Signal Processing, International Congress on Image and Signal Processing (CISP), IEEE, Sanya, Hainan, pp. 422-426.View/Download from: UTS OPUS or Publisher's site
A robust framework to classify vehicles in nighttime traffic using vehicle eigenspaces and support vector machine is presented. In this paper, a systematic approach has been proposed and implemented to classify vehicles from roadside camera video sequenc
Yu, J, Zhang, J, Sun, W, Yuan, L & Peng, G 2008, 'Crosstalk analysis of a smart sensor unit based on FBG and FOWLI', Proceedings of SPIE - The International Society for Optical Engineering, 19th International Conference on Optical Fibre Sensors, NA, Perth, WA, pp. 0-0.View/Download from: Publisher's site
The effective optical path method is proposed to analyze the measurement crosstalk of a smart fiber optic sensor unit based on multiplexing fiber Bragg gratings (FBG) and fiber optical white light interferometry (FOWLI). According to the analysis, the cross
Jie, X, Ye, G & Jian, Z 2007, 'Long-term trajectory extraction for moving vehicles', 2007 IEEE 9Th International Workshop on Multimedia Signal Processing, MMSP 2007 - Proceedings, pp. 223-226.View/Download from: Publisher's site
In recent years, trajectory analysis of moving vehicles in video-based traffic monitoring systems has drawn the attention of many researchers. Trajectory extraction is a fundamental step required prior to trajectory analysis. Much previous work has focused on trajectory extraction via tracking; however, such methods often fail to achieve long-term consistent trajectories. In this paper, we propose a robust approach for extracting long-term trajectories of moving vehicles in traffic monitoring using the SIFT descriptor. Experimental results show that the proposed method outperforms tracking-based techniques.
Lu, S, Zhang, J & Feng, D 2007, 'An efficient method for detecting ghost and left objects in surveillance video', 2007 IEEE Conference on Advanced Video and Signal Based Surveillance, AVSS 2007 Proceedings, 2007 IEEE Conference on Advanced Video and Signal Based Surveillance, AVSS 2007, NA, London, pp. 540-545.View/Download from: Publisher's site
This paper proposes an efficient method for detecting ghost and left objects in surveillance video, which, if not identified, may lead to errors or wasted computation in background modeling and object tracking in surveillance systems. This method contains…
Luo, L, Zhang, J & Shi, Z 2007, 'Novel Block-Interleaved Multi-Code CDMA System for UWB Communications', Ultra-Wideband, 2007. ICUWB 2007. IEEE International Conference on, IEEE, pp. 648-652.
Paisitkriangkrai, S, Shen, C & Zhang, J 2007, 'An experimental evaluation of local features for pedestrian classification', Proceedings - Digital Image Computing Techniques and Applications: 9th Biennial Conference of the Australian Pattern Recognition Society, DICTA 2007, Australian Pattern Recognition Society (APRS), Glenelg, SA, pp. 53-60.
The ability to detect pedestrians is an important first step in many computer vision applications such as video surveillance. This paper presents an experimental study on pedestrian detection using state-of-the-art local feature extraction and support vector…
Yang, J & Zhang, J 2007, 'Offline swimmer cap tracking using trajectory interpolation', Proceedings - Digital Image Computing Techniques and Applications: 9th Biennial Conference of the Australian Pattern Recognition Society, DICTA 2007, pp. 579-585.
In this paper, we present a preliminary attempt to solve the difficult problem of tracking a swimmer's cap in swimming videos to facilitate swimmer performance assessment. Due to the great challenges posed by a moving camera and severe figure-background occlusions, an offline approach based on trajectory interpolation is adopted. First, each frame is searched for hypothesized positions of the target cap using mean shift mode seeking. Second, most outliers due to ambiguities and noise are eliminated using lane constraints, and the hypotheses in the space-time volume are clustered into trajectory segments based on a spatial and temporal closeness criterion. Finally, cubic spline trajectory interpolation is used to infer the target cap position in occluded frames. Experiments show that our approach achieves satisfactory tracking results. © 2007 IEEE.
Zhang, J, Luo, L & Shi, Z 2007, 'Quadrature OFDMA systems', Global Telecommunications Conference, 2007. GLOBECOM'07. IEEE, IEEE, pp. 3734-3739.
Chen, J, Shen, J, Zhang, J & Wangsa, K 2006, 'A novel multimedia database system for efficient image/video retrieval based on hybrid-tree structure', Proceedings of the 2006 International Conference on Machine Learning and Cybernetics, Dalian, pp. 4353-4358.
With recent advances in computer vision, image processing and analysis, a retrieval process based on visual content has become a key component in achieving high-efficiency image query for large multimedia databases. In this paper, we propose and develop…
Chen, Y, Zhang, J & Jayalath, ADS 2006, 'Multiband-OFDM UWB vs IEEE 802.11n: system level design considerations', Vehicular Technology Conference, 2006. VTC 2006-Spring. IEEE 63rd, IEEE, pp. 1972-1976.
Lu, S, Zhang, J & Feng, D 2006, 'A knowledge-based approach for detecting unattended packages in surveillance video', Proceedings - IEEE International Conference on Video and Signal Based Surveillance 2006, AVSS 2006, Sydney, NSW.
This paper describes a novel approach for detecting unattended packages in surveillance video. Unlike the traditional approach of simply detecting stationary objects in monitored scenes, our approach detects unattended packages based on accumulated knowledge…
Mathew, R, Yu, Z & Zhang, J 2006, 'Detecting new stable objects in surveillance video', 2005 IEEE 7th Workshop on Multimedia Signal Processing, MMSP 2005, Shanghai.
We describe a novel method to detect new stable objects in video, that is, new objects that appear in a scene and remain stationary for a period of time, such as a dropped bag or a parked car. Our method utilizes the sta…
Lu, S, Zhang, J & Feng, D 2005, 'Classification of moving humans using eigen-features and support vector machines', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11th International Conference on Computer Analysis of Images and Patterns, CAIP 2005, Versailles, pp. 522-529.
This paper describes a method of categorizing moving objects using eigen-features and support vector machines. Eigen-features, generally used in face recognition and static image classification, are applied to classify the moving objects detected from…
Yu, Z & Zhang, J 2004, 'Video deblocking with fine-grained scalable complexity for embedded mobile computing', 2004 7th International Conference on Signal Processing Proceedings, ICSP, pp. 1175-1180.
This paper addresses the need of reducing blocking artifacts after video decompression in embedded mobile computing devices such as mobile phones and PDAs with limited computational capability, where low bit rate coding is usually employed and video deblocking is highly desirable. A novel video deblocking method has been developed which consists of two steps: deblocking mode decision and deblock filtering. Blocking artifacts are detected by examining the value of several adjacent pixels. Depending on the degree of blocking artifacts, a filter mode and a corresponding filtering center are determined for a region of pixels. The deblocking filter is chosen from five different types of candidates including variable center filters and non-symmetric filters. Extensive experiments show that the proposed algorithm has achieved both lower computational complexity and better visual quality as compared to MPEG-4 VM. Furthermore, targeting the need of embedded mobile computing platforms, a scheme is developed to dynamically scale the complexity (and hence power consumption) of the deblocking algorithm with graceful visual quality degradation.
Zhang, J, Kennedy, RA & Abhayapala, TD 2004, 'Cramer-Rao lower bounds for the time delay estimation of UWB signals', Communications, 2004 IEEE International Conference on, IEEE, pp. 3424-3428.
Zhang, J, Arnold, JF, Frater, MR & Pickering, MR 1997, 'Video error concealment using decoder motion vector estimation', IEEE Region 10 Annual International Conference, Proceedings/TENCON, pp. 777-780.
Audio-visual and other multimedia services are seen as important sources of traffic for future wireless networks. An important characteristic of such networks is that they introduce a significant number of errors into the transmitted digital bitstream. For services such as video, these errors can have the effect of degrading the quality of the received service to the point where it is unusable. In this paper, we introduce a technique that allows the concealment of the impact of these errors. Our work is based on MPEG 2 encoded video (although our scheme would work equally well with other standards) to be transmitted over a wireless network whose data structures are similar to those of Asynchronous Transfer Mode (ATM). Our simulations include the impact of the MPEG 2 Systems Layer and cover cell loss rates up to 5%.
UTS Distinguished Visiting Scholars (DVS) Scheme (international)
Over the last seven years, A/Prof Jian Zhang has succeeded in the competitive selection process of the UTS DVS scheme to invite four world-renowned professors and distinguished researchers to UTS for short visits, including:
- 2012: Professor Dan Schonfeld, IEEE Fellow, from the University of Illinois at Chicago; former Editor-in-Chief of IEEE Transactions on Circuits & Systems for Video Technology.
- 2013: Dr. Zhengyou Zhang, IEEE Fellow and ACM Fellow, from Microsoft Research, US. He is also a recipient of the 2013 Helmholtz Test of Time Award from the International Conference on Computer Vision and a world-class researcher in computer vision.
- 2017: Professor Jiebo Luo, IEEE Fellow, from the University of Rochester. He was with Kodak Research for more than 15 years and is now a leading researcher in social multimedia.
- 2018: Professor Ming Lin, IEEE Fellow and ACM Fellow, Chair Professor in the Department of Computer Science at the University of Maryland, College Park, and a world-class researcher in computer graphics.
UTS Key Technology Partner (KTP) Scheme (international)
In 2015, A/Prof Jian Zhang succeeded in the competitive selection process of the UTS KTP scheme to invite two KTP visitors to UTS:
- Professor Yao Lu from School of Computer Science and Technology, Beijing Institute of Technology (BIT)
- Professor Ping An from the School of Communication and Information Engineering at Shanghai University.
Both professors brought their research experience to UTS, and with their help we have recruited four dual-PhD students from BIT and SHU.
Invited KTP visitors through BIT/UTS and SHU/UTS schemes (international)
As a research leader, Jian was invited to be a KTP visitor at the Beijing Institute of Technology in January 2015 and at Shanghai University in June 2015, in recognition of his collaboration with BIT and SHU.