JingSong Xu is a research fellow with the Global Big Data Technologies Center at the University of Technology, Sydney. He received the B.Eng. and Ph.D. degrees from the School of Computer Science and Engineering, Nanjing University of Science and Technology, China, in 2007 and 2014, respectively. He was an exchange student at the University of New South Wales (UNSW), Sydney, and National Information and Communications Technology Australia (NICTA), Sydney, from Sep. 2010 to Sep. 2012. He also visited the University of Technology, Sydney (UTS) from Sep. 2011 to Dec. 2014.
- Xu, J; Ni, Z; Wu, Q; Zhang, J; Liu, H; Zhang, P; Chen, W (2015) Systems and Methods for Pedestrian Detection in Images, US Patent 9008365
Reviewer for international journals, transactions and conferences:
· IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT)
· Pattern Recognition Letters (PRL)
· IEEE Signal Processing Letters (SPL)
· Multimedia Tools and Applications
· International Journal of Image and Video Processing
· International Conference on Digital Image Computing: Techniques and Applications (DICTA 2012, 2013, 2014)
· Multimedia Signal Processing (MSP 2011)
· IEEE Visual Communications and Image Processing (VCIP 2013, VCIP 2014)
· International Conference on Image Processing (ICIP 2014)
Can supervise: YES
- Computer Vision
- Image Processing
- Pattern Recognition
- Multimedia Processing
- Machine Learning
Switch and Routing Essentials
Unix System Administration
Huang, Y, Xu, J, Wu, Q, Zheng, Z, Zhang, Z & Zhang, J 2019, 'Multi-pseudo Regularized Label for Generated Data in Person Re-Identification', IEEE Transactions on Image Processing, vol. 28, no. 3, pp. 1391-1403.
Sufficient training data is normally required to train deep models. However, because manually labelling (i.e., annotating) a large number of images is expensive, the amount of available training data (i.e., real data) is always limited. To produce more data for training a deep network, a Generative Adversarial Network (GAN) can be used to generate artificial samples (i.e., generated data). However, the generated data usually has no annotation labels. To solve this problem, we propose a virtual label called the Multi-pseudo Regularized Label (MpRL) and assign it to the generated data. With MpRL, the generated data is used to supplement the real training data when training a deep neural network in a semi-supervised fashion. To establish the correspondence between real and generated data, MpRL assigns each generated sample a virtual label that reflects the likelihood that the sample belongs to each predefined training class in the real data domain. Unlike a traditional label, which is usually a single integer, the proposed virtual label is a set of weight-based values, each a number in (0,1] called a multi-pseudo label, reflecting the degree of relation between the generated sample and each predefined class of real data. A comprehensive evaluation is carried out with two state-of-the-art convolutional neural networks (CNNs) to verify the effectiveness of MpRL. Experiments demonstrate that assigning MpRL to generated data further improves person re-ID performance on five re-ID datasets, i.e., Market-1501, DukeMTMC-reID, CUHK03, VIPeR, and CUHK01. The proposed method obtains rank-1 accuracy improvements of +6.29%, +6.30%, +5.58%, +5.84%, and +3.48%, respectively, over a strong CNN baseline on the five datasets, and outperforms state-of-the-art methods.
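The abstract describes the multi-pseudo label only at a high level. As a rough illustration, the sketch below assumes that each generated sample already has positive per-class scores (how they are obtained is not shown), turns them into weights in (0,1] by dividing by the largest score, and trains against those weights with a soft-label cross-entropy. The function names and the normalisation are assumptions for illustration, not the paper's exact formulation.

```python
import math
from typing import List


def multi_pseudo_label(class_scores: List[float]) -> List[float]:
    """Hypothetical mapping from positive per-class scores to weights in (0, 1].

    Dividing by the largest score gives the most likely class weight 1.0 and
    keeps every other weight strictly positive.
    """
    if not class_scores or min(class_scores) <= 0:
        raise ValueError("scores must be non-empty and strictly positive")
    top = max(class_scores)
    return [s / top for s in class_scores]


def log_softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def soft_label_cross_entropy(weights: List[float], logits: List[float]) -> float:
    """Cross-entropy against the weights renormalised into a distribution."""
    total = sum(weights)
    targets = [w / total for w in weights]
    return -sum(t * lp for t, lp in zip(targets, log_softmax(logits)))


# A generated sample that resembles class 0 most and class 2 somewhat.
label = multi_pseudo_label([0.6, 0.1, 0.3])
loss = soft_label_cross_entropy(label, [2.0, 0.5, 1.0])
```

Unlike a one-hot target, every class keeps a non-zero share of the target mass, so the loss never pushes a generated sample entirely into a single identity.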
Guo, D, Xu, J, Zhang, J, Xu, M, Cui, Y & He, X 2017, 'User relationship strength modeling for friend recommendation on Instagram', Neurocomputing, vol. 239, pp. 9-18.
© 2017 Elsevier B.V. Social strength modeling in social media communities has attracted increasing research interest. Unlike Flickr, which has been explored by many researchers, Instagram is more popular among mobile users and is conducive to likes and comments, yet it has seldom been investigated. On Instagram, a user can post photos and videos, follow other users, and comment on and like other users' posts. These actions generate diverse forms of data that result in multiple views of user relationships. In this paper, we propose a new framework to discover the underlying strength of social relationships. User relationship learning under multiple views and relationship strength modeling are coupled into a single process. In addition, given the learned relationship strength, a coarse-to-fine method is proposed for friend recommendation. Experiments on friend recommendation for Instagram demonstrate the effectiveness and efficiency of the proposed framework, and our experimental results show that it outperforms other related methods. Although the method was proposed for Instagram, it can easily be extended to other social media communities.
Yao, Y, Zhang, J, Shen, F, Hua, X, Xu, J & Tang, Z 2017, 'A new web-supervised method for image dataset constructions', Neurocomputing, vol. 236, pp. 23-31.
© 2017. The goal of this work is to automatically collect a large number of highly relevant natural images from the Internet for given queries. A novel automatic image dataset construction framework is proposed that employs multiple query expansions. Specifically, the given queries are first expanded by searching the Google Books Ngrams Corpora to obtain richer semantic descriptions, from which the visually non-salient and less relevant expansions are then filtered out. After retrieving images from the Internet with the filtered expansions, we further filter out noisy images by clustering and progressive Convolutional Neural Network (CNN)-based methods. To evaluate the proposed method for image dataset construction, we build an image dataset with 10 categories. We then run object detection on our dataset and on three other image datasets constructed by weakly supervised, web-supervised, and fully supervised learning; the experimental results indicate that our method is more effective than state-of-the-art weakly supervised and web-supervised methods. In addition, we perform cross-dataset classification to compare our dataset with two publicly available manually labelled datasets, STL-10 and CIFAR-10.
Yao, Y, Zhang, J, Shen, F, Hua, X, Xu, J & Tang, Z 2017, 'Exploiting Web Images for Dataset Construction: A Domain Robust Approach', IEEE Transactions on Multimedia, vol. 19, no. 8, pp. 1771-1784.
© 2017 IEEE. Labeled image datasets have played a critical role in high-level image understanding. However, manual labeling is both time-consuming and labor-intensive. To reduce the cost of manual labeling, there has been increasing research interest in automatically constructing image datasets by exploiting web images. Datasets constructed by existing methods tend to have weak domain adaptation ability, which is known as the "dataset bias problem." To address this issue, we present a novel image dataset construction framework that generalizes well to unseen target domains. Specifically, the given queries are first expanded by searching the Google Books Ngrams Corpus to obtain a rich semantic description, from which the visually nonsalient and less relevant expansions are filtered out. By treating each selected expansion as a "bag" and the retrieved images as "instances," image selection can be formulated as a multi-instance learning problem with constrained positive bags. We propose to solve this problem with the cutting-plane and concave-convex procedure algorithms. With this approach, images from different distributions can be kept while noisy images are filtered out. To verify the effectiveness of the proposed approach, we build an image dataset with 20 categories. Extensive experiments on image classification, cross-dataset generalization, diversity comparison, and object detection demonstrate the domain robustness of our dataset.
Xu, M, Tang, Z, Yao, Y, Yao, L, Liu, H & Xu, J 2017, 'Deep Learning for Person Reidentification Using Support Vector Machines', Advances in Multimedia, vol. 2017.
© 2017 Mengyu Xu et al. Due to variations in viewpoint, pose, and illumination, a given individual may appear considerably different across camera views. Tracking individuals across camera networks with non-overlapping fields of view remains a challenging problem. Previous works mainly address feature representation and metric learning separately, which tends to yield suboptimal solutions. To address this issue, we propose a novel framework that learns the feature representation and the metric jointly. Unlike previous works, we represent each pair of pedestrian images as a new resized input and use a linear Support Vector Machine in place of the softmax activation function for similarity learning. In addition, dropout and data augmentation are employed in this model to prevent the network from overfitting. Extensive experiments on two publicly available datasets, VIPeR and CUHK01, demonstrate the effectiveness of the proposed approach.
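Replacing softmax with a linear SVM usually means training a linear decision value with a hinge-type loss instead of cross-entropy. The sketch below assumes the common L2-SVM (squared hinge) objective and pair labels y = +1 for the same person and y = -1 for different people; the pair feature vectors and function names are hypothetical, and the paper's exact loss may differ.

```python
from typing import Sequence


def linear_score(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """Linear SVM decision value w.x + b for a pair feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def squared_hinge_loss(scores: Sequence[float], labels: Sequence[int],
                       w: Sequence[float], c: float = 1.0) -> float:
    """L2-SVM objective: 0.5 * ||w||^2 + C * sum(max(0, 1 - y * s)^2).

    labels are +1 (same identity) or -1 (different identities).
    """
    reg = 0.5 * sum(wi * wi for wi in w)
    data = sum(max(0.0, 1.0 - y * s) ** 2 for s, y in zip(scores, labels))
    return reg + c * data
```

For example, with `w = [0.5, -0.25]` a pair scored at exactly +1 and labelled "same" incurs no data loss, so only the regulariser contributes; pairs on the wrong side of the margin are penalised quadratically.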
Xu, J, Wu, Q, Zhang, J & Tang, Z 2014, 'Exploiting Universum data in AdaBoost using gradient descent', Image and Vision Computing, vol. 32, no. 8, pp. 550-557.
Xu, J, Wu, Q, Zhang, J, Shen, F & Tang, Z 2014, 'Boosting Separability in Semisupervised Learning for Object Classification', IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 7, pp. 1197-1208.
Shen, F, Tang, Z & Xu, J 2013, 'Locality constrained representation based classification with spatial pyramid patches', Neurocomputing, vol. 101, pp. 104-115.
Xu, J, Wu, Q, Zhang, J & Tang, Z 2012, 'Fast and Accurate Human Detection Using a Cascade of Boosted MS-LBP Features', IEEE Signal Processing Letters, vol. 19, no. 10, pp. 676-679.
In this letter, a new scheme for generating local binary patterns (LBP) is presented. This Modified Symmetric LBP (MS-LBP) feature takes advantage of both LBP and gradient features, and is applied in a boosted cascade framework for human detection. By combining MS-LBP with Haar-like features in the boosted framework, detectors based on heterogeneous features are evaluated to find the best trade-off between accuracy and speed. Two feature training schemes, namely the Single AdaBoost Training Scheme (SATS) and the Dual AdaBoost Training Scheme (DATS), are proposed and compared. On top of AdaBoost, two multidimensional feature projection methods are described. Comprehensive experiments are presented. Besides achieving higher detection accuracy, the DATS-based detector is 17 times faster than the HOG method.
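The abstract does not define the MS-LBP operator itself, so the sketch below shows only the two well-known building blocks it relates to: the standard 8-neighbour LBP code and the centre-symmetric LBP (CS-LBP), which compares opposite neighbour pairs and therefore behaves like a thresholded directional gradient. Neither function is claimed to be the exact MS-LBP definition; the image is a hypothetical list of grey-level rows.

```python
from typing import List

# Clockwise neighbour offsets (row, col), starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: List[List[int]], r: int, c: int) -> int:
    """Standard LBP: bit i is set when neighbour i >= the centre pixel (256 codes)."""
    centre = img[r][c]
    code = 0
    for i, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << i
    return code


def cs_lbp_code(img: List[List[int]], r: int, c: int, t: int = 0) -> int:
    """Centre-symmetric LBP: compares the 4 opposite neighbour pairs (i, i + 4).

    Each bit acts like a thresholded gradient along one direction (16 codes).
    """
    code = 0
    for i in range(4):
        (dr1, dc1), (dr2, dc2) = OFFSETS[i], OFFSETS[i + 4]
        if img[r + dr1][c + dc1] - img[r + dr2][c + dc2] > t:
            code |= 1 << i
    return code
```

On a patch with a bright top row, both operators set the bits for the top neighbours, but CS-LBP ignores the centre pixel and halves the code length, which is one reason symmetric variants are attractive for fast cascades.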
Huang, H, Xu, J, Zhang, J, Wu, Q & Kirsch, C 2018, 'Railway Infrastructure Defects Recognition using Fine-grained Deep Convolutional Neural Networks', 2018 Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing: Techniques and Applications, IEEE, Canberra, Australia.
Zhang, L, Xu, J, Zhang, J & Gong, Y 2018, 'Information Enhancement for Travelogues via a Hybrid Clustering Model', Digital Image Computing: Techniques and Applications, IEEE, Canberra, ACT, Australia, pp. 1-8.
Huang, H, Zheng, J, Zhang, J, Wu, Q & Xu, J 2019, 'Compare more nuanced: Pairwise alignment bilinear network for few-shot fine-grained learning', Proceedings - IEEE International Conference on Multimedia and Expo, pp. 91-96.
© 2019 IEEE. The recognition ability of human beings develops progressively. Children usually learn to discriminate objects from coarse to fine-grained levels with limited supervision. Inspired by this learning process, we propose a simple yet effective model for Few-Shot Fine-Grained (FSFG) recognition, which tackles the challenging fine-grained recognition task using meta-learning. The proposed method, named Pairwise Alignment Bilinear Network (PABN), is an end-to-end deep neural network. Unlike traditional deep bilinear networks for fine-grained classification, which adopt self-bilinear pooling to capture the subtle features of images, the proposed model uses a novel pairwise bilinear pooling to compare the nuanced differences between base images and query images and thereby learn a deep distance metric. To match base image features with query image features, we design feature alignment losses that are applied before the proposed pairwise bilinear pooling. Experimental results on four fine-grained classification datasets and one generic few-shot dataset demonstrate that the proposed model outperforms state-of-the-art few-shot fine-grained and general few-shot methods.
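The difference between self-bilinear and pairwise bilinear pooling can be seen in a few lines. In this sketch, which assumes feature maps are given as L x C lists (L spatial locations, C channels) and uses a hypothetical function name, self-bilinear pooling sums the outer product of an image's features with themselves, while the pairwise form sums the outer product of a base image's features with a query image's features at the same location. The pairwise form is only meaningful when location l in one image corresponds to location l in the other, which is the role the abstract assigns to the alignment losses; the paper's normalisation steps are not shown.

```python
from typing import List

Matrix = List[List[float]]


def bilinear_pool(feats_a: Matrix, feats_b: Matrix) -> Matrix:
    """Sum over spatial locations of the outer product a_l b_l^T (C_a x C_b).

    Pass the same map twice for self-bilinear pooling, or a base map and a
    query map for a pairwise variant that assumes the locations are aligned.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both feature maps need the same number of locations")
    ca, cb = len(feats_a[0]), len(feats_b[0])
    out = [[0.0] * cb for _ in range(ca)]
    for a, b in zip(feats_a, feats_b):
        for i in range(ca):
            for j in range(cb):
                out[i][j] += a[i] * b[j]
    return out


base = [[1.0, 0.0], [0.0, 2.0]]   # hypothetical 2 locations x 2 channels
query = [[3.0, 1.0], [1.0, 1.0]]
self_pooled = bilinear_pool(base, base)
pair_pooled = bilinear_pool(base, query)
```

The self-pooled matrix is always symmetric, whereas the pairwise one generally is not, because it records how each base channel co-activates with each query channel.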
Zhang, P, Wu, Q, Xu, J & Jian, Z 2018, 'Long-Term Person Re-identification Using True Motion from Videos', Winter Conference on Applications of Computer Vision, IEEE, Lake Tahoe, NV, USA, pp. 494-502.
Cho, N, Wu, Q, Xu, J & Zhang, J 2016, 'Content Authoring Using Single Image in Urban Environments for Augmented Reality', Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing Techniques and Applications, IEEE, Gold Coast, Australia, pp. 1-7.
Content authoring is one of the essentials of Augmented Reality (AR): it places augmented content on part of a real scene to enhance users' visual experience. For single 2D street-view images, the task is challenging because of cluttered environments and the unknown camera position and orientation (camera pose). Although existing methods based on 2D feature point matching or vanishing point registration can recover the camera pose, their robustness remains limited, because feature point detection is unreliable in texture-less regions and vanishing point detection is displaced by irregular lines detected in the scene. By exploiting the characteristics of man-made objects (e.g., buildings) commonly seen in street views, this paper proposes a simple yet efficient content authoring approach. In this approach, the dominant building plane on which the virtual object will be placed is detected and then projected to a frontal-parallel view, on which the virtual object can be reliably placed. Once the virtual object has been embedded into the real scene on the frontal-parallel view, the result can be converted back to the original view by inverse projection without any distortion. Experiments on public databases show that the proposed method can recover camera pose and place content with promising performance.
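Projecting a planar facade to a frontal-parallel view and back is a planar homography. The sketch below assumes the four corners of the dominant plane are already known (the plane detection step is not shown) and that the target view is a hypothetical 100 x 100 square; it estimates the 3 x 3 matrix H with the standard direct linear transform, fixing H[2][2] = 1 and solving the resulting 8 x 8 system by Gaussian elimination. The inverse projection is obtained by estimating the homography in the opposite direction.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """3x3 H (with H[2][2] fixed to 1) mapping four src points onto four dst points."""
    rows: Matrix = []
    rhs: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear in h
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v])
        rhs.append(v)
    h = solve(rows, rhs) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def project(hm: Matrix, p: Point) -> Point:
    """Apply H to a point in homogeneous coordinates and dehomogenise."""
    x, y = p
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)


# Hypothetical facade corners in the image and a frontal-parallel target square.
facade = [(10.0, 20.0), (110.0, 30.0), (120.0, 140.0), (5.0, 130.0)]
frontal = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
to_frontal = homography(facade, frontal)
back_to_image = homography(frontal, facade)  # inverse projection
```

A virtual object drawn on the frontal square can then be mapped into the image point by point with `project(back_to_image, ...)`. Fixing H[2][2] = 1 is a common normalisation that fails only if the true H[2][2] is zero.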
Yao, Y, Zhang, J, Shen, F, Hua, X, Xu, J & Tang, Z 2016, 'Automatic image dataset construction with multiple textual metadata', Proceedings - IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE, Seattle, Washington, USA.
© 2016 IEEE. The goal of this work is to automatically collect a large number of highly relevant images from the Internet for given queries. A novel image dataset construction framework is proposed that employs multiple textual metadata. Specifically, the given queries are first expanded by searching the Google Books Ngrams Corpora to obtain a richer semantic description, from which the visually non-salient and less relevant expansions are then filtered out. After retrieving images from the Internet with the filtered expansions, we further filter out noisy images by clustering and progressive Convolutional Neural Networks (CNNs). To verify the effectiveness of the proposed method, we construct a dataset with 10 categories, which is not only much larger than the manually labeled datasets STL-10 and CIFAR-10 but also has comparable cross-dataset generalization ability.
Xu, J, Wu, Q, Zhang, J, Silk, B, Ngo, GT & Tang, Z 2014, 'Efficient People Counting With Limited Manual Interfaces', 2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing Techniques and Applications, IEEE, Wollongong, NSW, Australia.
People counting is a topic with various practical applications. Over the last decade, two general approaches have been proposed to tackle this problem: (a) counting based on individual human detection, and (b) counting by learning a regression between crowd density and the number of people. Because regression-based methods avoid explicit people detection, which faces several well-known challenges, they are considered robust, particularly in complicated environments. This paper proposes an efficient regression-based method that can be readily adopted by any existing video surveillance system. It uses color-based segmentation to extract foreground regions in images, and a regression is established between the foreground density and the number of people. The method is fast and can handle changes in lighting conditions. Experiments on public datasets and one captured dataset demonstrate the effectiveness and robustness of the method.
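The regression step described above can be sketched in a few lines. This assumes the colour-based segmentation has already produced a binary foreground mask (not shown), and that a single linear model, count = a * density + b, fitted by ordinary least squares is adequate; the paper's actual regression model and any perspective correction are not specified here, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def foreground_density(mask: Sequence[Sequence[int]]) -> float:
    """Fraction of pixels marked as foreground (1) in a binary mask."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def predict_count(a: float, b: float, density: float) -> float:
    """Estimated number of people for a frame with the given foreground density."""
    return a * density + b
```

In use, `fit_linear` would be called once on densities and manual counts from a few annotated frames, after which each new frame needs only `foreground_density` and `predict_count`, which is what keeps this family of methods fast.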
Xu, J, Wu, Q, Zhang, J, Shen, F & Tang, Z 2013, 'Training boosting-like algorithms with semi-supervised subspace learning', 2013 IEEE International Conference on Image Processing, IEEE International Conference on Image Processing, IEEE, Melbourne, Australia, pp. 4302-4306.
Boosting algorithms have attracted great attention since Viola and Jones built the first real-time face detector, which performs feature selection and strong classifier learning simultaneously. On the other hand, researchers have proposed decoupling these two procedures to improve the performance of boosting algorithms. Motivated by this, we propose a boosting-like algorithm framework that embeds semi-supervised subspace learning methods. It selects weak classifiers based on class separability, and the combination weights of the selected weak classifiers are obtained by subspace learning. Three typical algorithms are proposed under this framework and evaluated on public datasets. Our experimental results show that the proposed methods outperform their supervised counterparts and AdaBoost.
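The decoupled pipeline in this abstract (select weak classifiers by class separability, then compute combination weights separately) can be illustrated with a simplified, fully supervised stand-in. The sketch below assumes labels in {0, 1} and precomputed weak-classifier outputs; it scores each weak classifier with the one-dimensional Fisher criterion and weights the selected ones by the LDA direction under a diagonal within-class covariance. This is not the paper's semi-supervised subspace learning, and all names are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def _split(outputs: Sequence[float], labels: Sequence[int]):
    pos = [o for o, y in zip(outputs, labels) if y == 1]
    neg = [o for o, y in zip(outputs, labels) if y != 1]
    return pos, neg


def fisher_score(outputs: Sequence[float], labels: Sequence[int]) -> float:
    """Class separability of one weak classifier: (m1 - m0)^2 / (v1 + v0)."""
    pos, neg = _split(outputs, labels)
    spread = pvariance(pos) + pvariance(neg)
    return (mean(pos) - mean(neg)) ** 2 / (spread + 1e-12)


def select_and_weight(weak_outputs: List[List[float]], labels: Sequence[int],
                      k: int) -> Dict[int, float]:
    """Pick the k most separable weak classifiers and give each a weight.

    The weight (m1 - m0) / (v1 + v0) is the LDA direction under a diagonal
    within-class covariance, a simple stand-in for subspace learning.
    """
    ranked = sorted(range(len(weak_outputs)),
                    key=lambda j: fisher_score(weak_outputs[j], labels),
                    reverse=True)[:k]
    weights = {}
    for j in ranked:
        pos, neg = _split(weak_outputs[j], labels)
        weights[j] = (mean(pos) - mean(neg)) / (pvariance(pos) + pvariance(neg) + 1e-12)
    return weights
```

The strong classifier would then be the sign of the weighted sum of the selected weak outputs minus a threshold. Unlike AdaBoost, the weights here do not depend on the order in which weak classifiers were chosen.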
Xu, J, Wu, Q, Zhang, J & Tang, Z 2013, 'Object Detection Based on Co-Occurrence GMuLBP Features', 2012 IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo, IEEE Computer Society, pp. 943-948.
Image co-occurrence has shown great power in object classification because it simultaneously captures the characteristics of individual features and the spatial relationships between them. For example, the Co-occurrence Histogram of Oriented Gradients (CoHOG) has achieved great success in human detection. However, the gradient orientation in CoHOG is sensitive to noise. In addition, CoHOG does not take gradient magnitude into account, although it is a key component for reinforcing feature detection. In this paper, we propose a new LBP-based feature detector built on image co-occurrence. Building on uniform Local Binary Patterns, the new detector captures co-occurrence orientation through gradient magnitude calculation and is named CoGMuLBP. An extended version of CoGMuLBP is also presented. Experimental results on the UIUC car dataset show that the proposed features outperform state-of-the-art methods.
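Co-occurrence features such as CoHOG count how often pairs of quantised codes appear at a fixed spatial offset. The sketch below is a generic illustration of that counting step, not the CoGMuLBP detector itself: it assumes a 2-D map of integer codes (for example LBP codes or quantised gradient orientations) and a single offset, and builds the joint histogram as a dictionary. A full descriptor would repeat this for several offsets and image cells and concatenate the histograms.

```python
from collections import Counter
from typing import Dict, List, Tuple


def cooccurrence_histogram(codes: List[List[int]], dr: int,
                           dc: int) -> Dict[Tuple[int, int], int]:
    """Count pairs (code at (r, c), code at (r + dr, c + dc)) inside the map."""
    rows, cols = len(codes), len(codes[0])
    hist: Counter = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip pairs that leave the map
                hist[(codes[r][c], codes[r2][c2])] += 1
    return dict(hist)
```

Because each bin is indexed by a pair of codes, the histogram records both what patterns occur and how they are arranged relative to one another, which is the property the abstract attributes to co-occurrence features.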