JingSong Xu is a research fellow with the Global Big Data Technologies Center at the University of Technology, Sydney. He received the B.Eng. and Ph.D. degrees from the School of Computer Science and Engineering, Nanjing University of Science and Technology, China, in 2007 and 2014, respectively. He was an exchange student at the University of New South Wales (UNSW), Sydney and National Information and Communications Technology Australia (NICTA), Sydney from Sep. 2010 to Sep. 2012. He also visited the University of Technology, Sydney (UTS) from Sep. 2011 to Dec. 2014.
- Xu, J., Ni, Z., Wu, Q., Zhang, J., Liu, H., Zhang, P. & Chen, W. 2015, 'Systems and Methods for Pedestrian Detection in Images', US Patent 9008365
International Journals/Transactions/Conferences Reviewer:
·IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT)
·Pattern Recognition Letters (PRL)
·Signal Processing Letters (SPL)
·Multimedia Tools and Applications
·International Journal of Image and Video Processing
·International Conference on Digital Image Computing: Techniques and Applications (DICTA2012, 2013, 2014)
·Multimedia Signal Processing (MSP2011)
·IEEE Visual Communications and Image Processing (VCIP2013, VCIP2014)
·International Conference on Image Processing (ICIP2014)
Can supervise: YES
- Computer Vision
- Image Processing
- Pattern Recognition
- Multimedia Processing
- Machine Learning
Xu, M., Tang, Z., Yao, Y., Yao, L., Liu, H. & Xu, J. 2017, 'Deep Learning for Person Reidentification Using Support Vector Machines', Advances in Multimedia, vol. 2017.View/Download from: UTS OPUS or Publisher's site
© 2017 Mengyu Xu et al. Due to variations in viewpoint, pose, and illumination, a given individual may appear considerably different across different camera views. Tracking individuals across camera networks with no overlapping fields of view is still a challenging problem. Previous works mainly focus on feature representation and metric learning individually, which tends to yield suboptimal solutions. To address this issue, in this work we propose a novel framework that performs feature representation learning and metric learning jointly. Different from previous works, we represent pairs of pedestrian images as a new resized input and use a linear Support Vector Machine to replace the softmax activation function for similarity learning. Dropout and data augmentation techniques are also employed in this model to prevent the network from overfitting. Extensive experiments on two publicly available datasets, VIPeR and CUHK01, demonstrate the effectiveness of our proposed approach.
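The abstract above describes replacing the softmax output with a linear SVM objective for similarity learning. As a minimal sketch (not the authors' implementation; the function name and margin value are illustrative), a multiclass squared-hinge (L2-SVM) loss over raw network scores could look like this:

```python
import numpy as np

def l2_svm_loss(scores, labels, margin=1.0):
    """Multiclass L2-SVM (squared hinge) loss over a batch.

    scores: (N, C) raw outputs of the network's last layer
    labels: (N,) integer class ids
    """
    n = scores.shape[0]
    correct = scores[np.arange(n), labels][:, None]       # (N, 1) true-class scores
    margins = np.maximum(0.0, scores - correct + margin)  # hinge per class
    margins[np.arange(n), labels] = 0.0                   # no penalty for true class
    return np.sum(margins ** 2) / n
```

Minimizing this loss instead of cross-entropy pushes the true-class score a margin above the others, which is the SVM-style criterion the abstract refers to.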
Guo, D., Xu, J., Zhang, J., Xu, M., Cui, Y. & He, X. 2017, 'User relationship strength modeling for friend recommendation on Instagram', Neurocomputing, vol. 239, pp. 9-18.View/Download from: UTS OPUS or Publisher's site
© 2017 Elsevier B.V. Social strength modeling in the social media community has attracted increasing research interest. Different from Flickr, which has been explored by many researchers, Instagram is more popular among mobile users and encourages likes and comments, but it has seldom been investigated. On Instagram, a user can post photos/videos, follow other users, and comment on and like other users' posts. These actions generate diverse forms of data that result in multiple user relationship views. In this paper, we propose a new framework to discover the underlying social relationship strength. User relationship learning under multiple views and relationship strength modeling are coupled into a single framework. In addition, given the learned relationship strength, a coarse-to-fine method is proposed for friend recommendation. Experiments on friend recommendation for Instagram demonstrate the effectiveness and efficiency of the proposed framework, which obtains better performance than other related methods. Although our method has been proposed for Instagram, it can easily be extended to other social media communities.
Yao, Y., Zhang, J., Shen, F., Hua, X., Xu, J. & Tang, Z. 2017, 'A new web-supervised method for image dataset constructions', Neurocomputing, vol. 236, pp. 23-31.View/Download from: UTS OPUS or Publisher's site
© 2017. The goal of this work is to automatically collect a large number of highly relevant natural images from the Internet for given queries. A novel automatic image dataset construction framework is proposed by employing multiple query expansions. Specifically, the given queries are first expanded by searching in the Google Books Ngrams Corpus to obtain a richer semantic description, from which the visually non-salient and less relevant expansions are then filtered. After retrieving images from the Internet with the filtered expansions, we further filter noisy images by clustering and progressive Convolutional Neural Network (CNN) based methods. To evaluate the performance of our proposed method for image dataset construction, we build an image dataset with 10 categories. We then run object detection on our dataset and on three other image datasets constructed by weakly supervised, web-supervised, and fully supervised learning; the experimental results indicate that our method is superior to weakly supervised and web-supervised state-of-the-art methods. In addition, we perform cross-dataset classification to evaluate our dataset against two publicly available manually labelled datasets, STL-10 and CIFAR-10.
Yao, Y., Zhang, J., Shen, F., Hua, X., Xu, J. & Tang, Z. 2017, 'Exploiting Web Images for Dataset Construction: A Domain Robust Approach', IEEE Transactions on Multimedia, vol. 19, no. 8, pp. 1771-1784.View/Download from: UTS OPUS or Publisher's site
© 2017 IEEE. Labeled image datasets have played a critical role in high-level image understanding. However, the process of manual labeling is both time-consuming and labor intensive. To reduce the cost of manual labeling, there has been increased research interest in automatically constructing image datasets by exploiting web images. Datasets constructed by existing methods tend to have a weak domain adaptation ability, which is known as the "dataset bias problem." To address this issue, we present a novel image dataset construction framework that generalizes well to unseen target domains. Specifically, the given queries are first expanded by searching the Google Books Ngrams Corpus to obtain a rich semantic description, from which the visually nonsalient and less relevant expansions are filtered out. By treating each selected expansion as a "bag" and the retrieved images as "instances," image selection can be formulated as a multi-instance learning problem with constrained positive bags. We propose to solve the resulting optimization problem with the cutting-plane and concave-convex procedure algorithms. By using this approach, images from different distributions can be kept while noisy images are filtered out. To verify the effectiveness of our proposed approach, we build an image dataset with 20 categories. Extensive experiments on image classification, cross-dataset generalization, diversity comparison, and object detection demonstrate the domain robustness of our dataset.
Xu, J., Wu, Q., Zhang, J. & Tang, Z. 2014, 'Exploiting Universum data in AdaBoost using gradient descent', Image and Vision Computing, vol. 32, no. 8, pp. 550-557.View/Download from: UTS OPUS or Publisher's site
Xu, J., Wu, Q., Zhang, J., Shen, F. & Tang, Z. 2014, 'Boosting Separability in Semisupervised Learning for Object Classification', IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 7, pp. 1197-1208.View/Download from: UTS OPUS or Publisher's site
Shen, F., Tang, Z. & Xu, J. 2013, 'Locality constrained representation based classification with spatial pyramid patches', Neurocomputing, vol. 101, pp. 104-115.View/Download from: UTS OPUS or Publisher's site
In this work, we propose a linear representation based face recognition (FR) method incorporating locality information from both spatial features and training samples. Instead of holistic face images, the proposed method operates on spatial pyramid local patches, which are aggregated by a Bayesian fusion method. The locality constraint on the representation coefficients leads to an approximately sparse representation, which effectively explores the discriminative nature of spatial local features. Different from sparse representation based classification (SRC), which imposes an ℓ1-norm constraint on the coefficients, the proposed locality constrained representation based classification (LCRC) is formulated with a computationally efficient ℓ2-norm. The proposed method is robust to two crucial problems in face recognition: occlusion and lack of training data. A simple locality based concentration index (LCI) is defined to measure the reliability of each local patch, by which not only the heavily corrupted patches but also the less discriminant ones are rejected. Due to the use of both local patches and the locality constraint, less training data is required by the proposed method. Based on the locality constrained representation, we present three algorithms which outperform the state-of-the-art on the AR and Extended Yale B datasets for both the occlusion and single sample per person (SSPP) problems. © 2012.
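The ℓ2-norm locality constraint described above admits a closed-form, ridge-like solution. A minimal sketch (illustrative, not the paper's code; the weighting scheme and parameter names are assumptions) of coding a test sample over training atoms with a locality-weighted penalty:

```python
import numpy as np

def lcrc_coefficients(X, y, lam=0.01):
    """Locality constrained coding of a test sample y over training samples X.

    X: (d, n) matrix whose columns are training samples
    y: (d,)  test sample
    Atoms far from y receive a larger penalty, yielding an approximately
    sparse solution via an efficient l2 (ridge-like) closed form.
    """
    dist = np.linalg.norm(X - y[:, None], axis=0)   # distance of each atom to y
    D = np.diag(dist / dist.max())                  # normalised locality penalty
    A = X.T @ X + lam * D @ D                       # regularised normal equations
    return np.linalg.solve(A, X.T @ y)
```

Because the penalty is quadratic, the coefficients come from a single linear solve, which is the efficiency advantage over ℓ1-based SRC that the abstract points out.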
Xu, J., Wu, Q., Zhang, J. & Tang, Z. 2012, 'Fast and Accurate Human Detection Using a Cascade of Boosted MS-LBP Features', IEEE Signal Processing Letters, vol. 19, no. 10, pp. 676-679.View/Download from: UTS OPUS or Publisher's site
In this letter, a new scheme for generating local binary patterns (LBP) is presented. This Modified Symmetric LBP (MS-LBP) feature takes advantage of both LBP and gradient features. It is then applied in a boosted cascade framework for human detection. By combining MS-LBP with Haar-like features in the boosted framework, the performance of detectors based on heterogeneous features is evaluated for the best trade-off between accuracy and speed. Two feature training schemes, namely the Single AdaBoost Training Scheme (SATS) and the Dual AdaBoost Training Scheme (DATS), are proposed and compared. On top of AdaBoost, two multidimensional feature projection methods are described. A comprehensive experiment is presented. Apart from obtaining higher detection accuracy, detection based on DATS is 17 times faster than the HOG method.
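The MS-LBP feature builds on the standard local binary pattern. For reference, a plain 8-neighbour LBP code for a single 3x3 patch can be sketched as follows (a generic baseline only; the MS-LBP modification itself is not reproduced here):

```python
import numpy as np

def lbp_code(patch):
    """8-neighbour LBP code of the centre pixel of a 3x3 patch.

    Each neighbour contributes one bit: 1 if its intensity is >= the
    centre pixel, 0 otherwise, read clockwise from the top-left corner.
    """
    c = patch[1, 1]
    nb = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
          patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    return sum((1 << i) for i, v in enumerate(nb) if v >= c)
```

Histograms of such codes over detection windows are what cascade classifiers like the one above select from.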
Cho, N., Wu, Q., Xu, J. & Zhang, J. 2016, 'Content Authoring Using Single Image in Urban Environments for Augmented Reality', Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing Techniques and Applications, IEEE, Gold Coast, Australia, pp. 1-7.View/Download from: UTS OPUS or Publisher's site
Content authoring is one of the essentials of Augmented Reality (AR): emplacing an augmented content on a true part of a real scene in order to enhance users' visual experience. For the case of single 2D street-view images, the challenge emerges because of cluttered environments and the unknown position and orientation of the camera pose. Although existing methods based on 2D feature point matching or vanishing point registration may recover the camera pose, their robustness is always challenged by the uncertainty of feature point detection on texture-less regions and by displacement of vanishing point detection caused by irregular lines detected in the scene. By taking advantage of the characteristics of man-made objects (e.g. buildings) widely seen in street views, this paper proposes a simple yet efficient content authoring approach. In this approach, the dominant building plane where the virtual object will be emplaced is detected and then projected to the frontal-parallel view, on which the virtual object can be reliably emplaced. Once the virtual object and the true scene are embedded in each other on the frontal-parallel view, they can be converted back to the original view using inverse projection without any distortion. Experiments on public databases show that the proposed method can recover the camera pose and implement content emplacement with promising performance.
Yao, Y., Zhang, J., Shen, F., Hua, X., Xu, J. & Tang, Z. 2016, 'Automatic image dataset construction with multiple textual metadata', Proceedings - IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, Seattle, Washington, USA.View/Download from: UTS OPUS or Publisher's site
© 2016 IEEE. The goal of this work is to automatically collect a large number of highly relevant images from the Internet for given queries. A novel image dataset construction framework is proposed by employing multiple textual metadata. Specifically, the given queries are first expanded by searching in the Google Books Ngrams Corpus to obtain a richer semantic description, from which the visually non-salient and less relevant expansions are then filtered. After retrieving images from the Internet with the filtered expansions, we further filter noisy images by clustering and progressive Convolutional Neural Network (CNN) based methods. To verify the effectiveness of our proposed method, we construct a dataset with 10 categories, which is not only much larger than, but also has comparable cross-dataset generalization ability to, the manually labeled datasets STL-10 and CIFAR-10.
Xu, J., Wu, Q., Zhang, J., Silk, B., Ngo, G.T. & Tang, Z. 2014, 'Efficient People Counting With Limited Manual Interfaces', 2014 International Conference on Digital lmage Computing: Techniques and Applications (DlCTA), Digital Image Computing Techniques and Applications, IEEE, Wollongong, NSW, Australia.View/Download from: UTS OPUS
People counting is a topic with various practical applications. Over the last decade, two general approaches have been proposed to tackle this problem: a) counting based on individual human detection; b) counting by measuring the regression relation between crowd density and the number of people. Because the regression based method avoids explicit people detection, which faces several well-known challenges, it has been considered a robust method, particularly in complicated environments. An efficient regression based method is proposed in this paper, which can be readily adopted into any existing video surveillance system. It adopts color based segmentation to extract foreground regions in images. Regression is established between the foreground density and the number of people. This method is fast and can deal with lighting condition changes. Experiments on public datasets and one captured dataset have shown the effectiveness and robustness of the method.
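The regression step the abstract describes, mapping foreground density to a head count, can be sketched as a simple least-squares fit. This is a minimal illustration under the assumption of a linear density-count relation, not the paper's actual model; all names and data are illustrative:

```python
import numpy as np

def fit_count_regressor(densities, counts):
    """Least-squares line mapping foreground density to people count."""
    A = np.stack([densities, np.ones_like(densities)], axis=1)
    coef, *_ = np.linalg.lstsq(A, counts, rcond=None)
    return coef  # (slope, intercept)

def predict_count(coef, density):
    """Estimate the number of people from a foreground density value."""
    return coef[0] * density + coef[1]
```

In a deployed system the densities would come from the color-based foreground segmentation, and the fitted line would then turn each new frame's density into a count.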
Xu, J., Wu, Q., Zhang, J., Shen, F. & Tang, Z. 2013, 'Training boosting-like algorithms with semi-supervised subspace learning', 2013 IEEE International Conference on Image Processing, IEEE International Conference on Image Processing, IEEE, Melbourne, Australia, pp. 4302-4306.View/Download from: UTS OPUS or Publisher's site
Boosting algorithms have attracted great attention since Viola and Jones built the first real-time face detector by performing feature selection and strong classifier learning simultaneously. On the other hand, researchers have proposed decoupling these two procedures to improve the performance of boosting algorithms. Motivated by this, we propose a boosting-like algorithm framework that embeds semi-supervised subspace learning methods. It selects weak classifiers based on class separability, and the combination weights of the selected weak classifiers are obtained by subspace learning. Three typical algorithms are proposed under this framework and evaluated on public data sets. As shown by our experimental results, the proposed methods obtain superior performance over their supervised counterparts and AdaBoost.
Xu, J., Wu, Q., Zhang, J. & Tang, Z. 2013, 'Object Detection Based on Co-Occurrence GMuLBP Features', 2012 IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE Computer Society, pp. 943-948.View/Download from: UTS OPUS or Publisher's site
Image co-occurrence has shown great power in object classification because it captures the characteristics of individual features and the spatial relationships between them simultaneously. For example, the Co-occurrence Histogram of Oriented Gradients (CoHOG) has achieved great success on the human detection task. However, the gradient orientation in CoHOG is sensitive to noise. In addition, CoHOG does not take gradient magnitude into account, which is a key component for reinforcing feature detection. In this paper, we propose a new LBP feature detector based on image co-occurrence. Building on uniform Local Binary Patterns, the new feature detector detects co-occurrence orientation through gradient magnitude calculation. It is known as CoGMuLBP. An extended version of the CoGMuLBP is also presented. The experimental results on the UIUC car data set show that the proposed features outperform state-of-the-art methods.