I am currently a research fellow at the University of Technology Sydney. I am conducting research on mixed 2D-3D scene understanding, including 3D-2D segmentation, 2D-3D feature learning and 2D-3D registration.
I am now working on a project to build a 3D model from a 3D point cloud or 2D images, with the goal of enabling human-level scene understanding.
- Reviewer for the journals IEEE Transactions on Image Processing (TIP), IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), IEEE Transactions on Multimedia (T-MM), EURASIP Journal on Image and Video Processing (JIVP), and Signal Processing: Image Communication.
- Reviewer for the conferences ICIP, ICME, CVPRW and DICTA.
Computer vision, Artificial intelligence
Huang, X, Zhang, J, Fan, L, Wu, Q & Yuan, C 2017, 'A Systematic Approach for Cross-Source Point Cloud Registration by Preserving Macro and Micro Structures', IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3261-3276.
We propose a systematic approach for registering cross-source point clouds that come from different kinds of sensors. This task is especially challenging due to significant missing data, large variations in point density, scale differences, a large proportion of noise, and outliers. The robustness of the method comes from the extraction of macro and micro structures. The macro structure is the overall structure that maintains a similar geometric layout across cross-source point clouds. A micro structure is an element (e.g., a local segment) used to build the macro structure. We use graphs to organize these structures and convert registration into graph matching. With a newly proposed descriptor, we conduct the graph matching in a discriminative feature space. The graph matching problem is solved by an improved graph matching algorithm that considers global geometric constraints. Robust cross-source registration results are obtained by combining the graph matching outcome with RANSAC and ICP refinements. Compared with eight state-of-the-art registration algorithms, the proposed method invariably outperforms them on the Pisa Cathedral and other challenging cases. For quantitative comparison, we propose two challenging cross-source data sets and conduct comparative experiments on more than 27 cases; the results show that our method performs much better than the other methods. The proposed method also achieves high accuracy on same-source data sets.
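The final step above refines the graph-matching result with RANSAC and ICP. As a rough illustration of the ICP refinement only, here is a minimal point-to-point ICP sketch, assuming small clouds (brute-force nearest neighbours) and an initial alignment that is already close; the function names are hypothetical and this is not the paper's implementation.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares R, t such that dst ~= src @ R.T + t (Kabsch / SVD)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t


def icp(source, target, iters=30):
    """Point-to-point ICP; returns the accumulated R, t and the aligned source."""
    R_total, t_total = np.eye(3), np.zeros(3)
    current = source.copy()
    for _ in range(iters):
        # Nearest target point for every current source point (O(N*M) memory)
        d2 = ((current[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        matches = target[d2.argmin(axis=1)]
        R, t = best_rigid_transform(current, matches)
        current = current @ R.T + t
        # Compose the new step with the transform accumulated so far
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total, current
```

ICP only converges locally, which is why a coarse global step (graph matching plus RANSAC in the abstract) has to come first.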
Huang, X, Zhang, J, Wu, Q, Fan, L & Yuan, C 2017, 'A coarse-to-fine algorithm for matching and registration in 3D cross-source point clouds', IEEE Transactions on Circuits and Systems for Video Technology.
We propose an efficient method for the matching and registration problem in cross-source point clouds captured by different types of sensors. This task is especially challenging due to density variation, scale differences, a large proportion of noise and outliers, missing data and viewpoint variation. The proposed method has two stages. In the coarse matching stage, we use the ESF descriptor to select K potential regions from the candidate point clouds for the target. In the fine stage, we propose a scale-embedded generative GMM registration method to refine the results of the coarse matching stage. After the fine stage, both the best region and accurate camera pose relationships between the candidates and the target are found. We apply the method to two applications: 3D object detection and localization in outdoor street-view (LiDAR/VSFM) cross-source point clouds, and 3D scene matching and registration in indoor (KinectFusion/VSFM) cross-source point clouds. The experimental results show that the proposed method performs well compared with existing methods. They also show that the proposed method is robust across sensing techniques such as LiDAR, Kinect and RGB cameras.
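Because cross-source clouds differ in scale, the fine stage has to recover a scale factor as well as the camera pose. The paper does this with a scale-embedded GMM; as a simpler, swapped-in illustration of joint scale, rotation and translation estimation, the sketch below implements Umeyama's closed-form similarity estimate. It assumes point correspondences are already known, which a GMM-based method does not require; the function name is hypothetical.

```python
import numpy as np


def umeyama_similarity(src, dst):
    """Estimate s, R, t minimising sum ||dst_i - (s * R @ src_i + t)||^2."""
    n, dim = src.shape
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / n  # cross-covariance between target and source
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(dim)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[-1, -1] = -1.0  # keep a proper rotation
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / n  # variance of the source points
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t
```

In GMM-based registration the correspondences are soft, and a weighted version of this estimate is typically recomputed at every iteration instead of once.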
Wang, S, Hu, L, Cao, L, Huang, X, Lian, D & Liu, W 2018, 'Attention-based transactional context embedding for next-item recommendation', Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, AAAI, New Orleans, United States, pp. 2532-2539.
Recommending the next item to a user in a transactional context is practical yet challenging in applications such as marketing campaigns. Transactional context refers to the items that are observable in a transaction. Most existing transaction-based recommender systems (TBRSs) make recommendations mainly by considering recently occurring items instead of all the items observed in the current context. Moreover, they often assume a rigid order between items within a transaction, which is not always realistic. More importantly, a long transaction often contains many items irrelevant to the next choice, which tend to overwhelm the influence of the few truly relevant ones. We therefore posit that a good TBRS should not only consider all the observed items in the current transaction but also weight them by relevance, building an attentive context that outputs the proper next item with high probability. To this end, we design an effective attention-based transaction embedding model (ATEM) for context embedding that weights each observed item in a transaction without assuming an order. An empirical study on real-world transaction datasets shows that ATEM significantly outperforms state-of-the-art methods in terms of both accuracy and novelty.
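The core idea of weighting each observed item by relevance and combining the items without assuming an order can be illustrated with softmax attention pooling. This is a minimal sketch, assuming a single query vector and dot-product scoring; the names are hypothetical, and ATEM's actual attention parameterisation and training are not reproduced here.

```python
import numpy as np


def attentive_context(item_embs, query):
    """Order-free attentive context: softmax-weighted sum of observed item embeddings."""
    scores = item_embs @ query  # one relevance score per observed item
    scores = scores - scores.max()  # numerical stability before exp
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ item_embs, weights


def next_item_scores(context, candidate_embs):
    """Score every candidate next item against the context (higher = more likely)."""
    return candidate_embs @ context
```

Because the pooling is a weighted sum, shuffling the items in a transaction leaves the context unchanged, and irrelevant items can receive small weights instead of diluting the context.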
Huang, X, Fan, L, Zhang, J, Wu, Q & Yuan, C 2016, 'Real Time Complete Dense Depth Reconstruction for a Monocular Camera', Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Las Vegas, Nevada, pp. 674-679.
In this paper, we aim to solve the problem of estimating complete dense depth maps from a moving monocular camera. By 'complete', we mean that depth is estimated for every pixel and a detailed reconstruction is achieved. Although this problem has been attempted before, the accuracy of complete dense depth reconstruction remains an open problem. We propose a novel system that produces accurate, complete dense depth maps. The system consists of two subsystems running in separate threads: dense mapping and sparse patch-based tracking. For dense mapping, a new projection error computation method is proposed to enhance the gradient component in the estimated depth maps. For tracking, a new sparse patch-based tracking method estimates the camera pose by minimizing a normalized error term. The experiments demonstrate that the proposed method improves completeness and accuracy compared with three state-of-the-art dense reconstruction methods: VSFM+CMVS, LSD-SLAM and REMODE.
Huang, X, Zhang, J, Wu, Q, Fan, L & Yuan, C 2016, 'A coarse-to-fine algorithm for registration in 3D street-view cross-source point clouds', Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing Techniques and Applications, IEEE, Gold Coast, Australia.
With the development of numerous 3D sensing technologies, object registration on cross-source point clouds has attracted researchers' interest. When point clouds are captured by different kinds of sensors, they exhibit large variations of many kinds. In this study, we address an even more challenging case in which the cross-source point clouds are acquired from a real street view: one is produced directly by a LiDAR system, and the other is generated with the VSFM software from an image sequence captured by RGB cameras. For large-scale point clouds, previous methods mostly focus on point-to-point registration, and these methods have many limitations, because the least-mean-error strategy performs poorly when registering cross-source point clouds with large variations. In this paper, unlike previous ICP-based methods, we take a statistical perspective and propose an effective coarse-to-fine algorithm to detect and register a small-scale SFM point cloud within a large-scale LiDAR point cloud. The experimental results show that the model runs successfully on LiDAR and SFM point clouds, so it can contribute to many applications, such as robotics and smart city development.
Huang, X, Yuan, C & Zhang, J 2015, 'Graph Cuts Stereo Matching Based on Patch-Match and Ground Control Points Constraint', Advances in Multimedia Information Processing (LNCS), Pacific-Rim Conference on Multimedia, Springer, Gwangju, South Korea, pp. 14-23.
Stereo matching methods based on Patch-Match obtain good results in regions with complex texture but perform poorly in low-texture regions. In this paper, we propose a new method that integrates Patch-Match and graph cuts (GC) to achieve good results in both complex and low-texture regions. A label is randomly assigned to each pixel and then optimized through a propagation process. These labels constitute the label space for each GC iteration. In addition, a Ground Control Points (GCPs) constraint term is added to the GC energy to overcome the weakness of Patch-Match stereo in low-texture regions. The proposed method combines the spatial propagation of Patch-Match with the global optimization of GC. Experiments on the Middlebury evaluation system show that it outperforms all the other PatchMatch-based methods.
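Graph cuts minimise an energy over per-pixel labels, and the abstract adds a GCP constraint term to it. The exact terms are not given, so the sketch below only evaluates a hypothetical energy of that general shape: a data term, Potts smoothness over 4-connected neighbours, and an L1 penalty pulling GCP pixels toward their trusted disparities. The weights `lam` and `mu` and the GCP dictionary format are assumptions, and the graph-cut minimisation itself (for example, expansion moves over the PatchMatch label proposals) is not shown.

```python
import numpy as np


def stereo_energy(labels, data_cost, gcps, lam=1.0, mu=2.0):
    """E = sum_p D_p(f_p) + lam * sum_{p~q} [f_p != f_q] + mu * sum_{p in GCP} |f_p - g_p|.

    labels:    (H, W) integer disparity label per pixel
    data_cost: (H, W, L) matching cost of each disparity label at each pixel
    gcps:      dict {(row, col): trusted disparity} (hypothetical format)
    """
    h, w = labels.shape
    rows, cols = np.indices((h, w))
    data = data_cost[rows, cols, labels].sum()
    # Potts smoothness: count label changes between horizontal and vertical neighbours
    smooth = (labels[:, 1:] != labels[:, :-1]).sum() + (labels[1:, :] != labels[:-1, :]).sum()
    # GCP term: penalise deviation from trusted disparities at control points
    gcp = sum(abs(int(labels[r, c]) - g) for (r, c), g in gcps.items())
    return float(data + lam * smooth + mu * gcp)
```

The GCP term adds evidence exactly where the data term is weakest: in low-texture regions many labels have similar matching cost, so trusted control points help select among them.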
Huang, X, Zhang, J, Wu, Q, Yuan, C & Fan, L 2015, 'Dense Correspondence Using Non-local DAISY Forest', Proceedings of the 2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Digital Image Computing Techniques and Applications, IEEE, Adelaide, pp. 1-8.
Dense correspondence computation is a critical computer vision task with many applications. Most existing dense correspondence methods consider the neighbors connected to the center pixel within a local support region. However, such an approach may only achieve a locally optimal solution. In this paper, we propose a non-local dense correspondence computation method that calculates the match cost on a tree structure. It is non-local because all other nodes on the tree contribute to the match cost computation for the current node. The proposed method consists of three steps: 1) DAISY descriptor computation, 2) edge-preserving segmentation and forest construction, and 3) PatchMatch fast search. We test our algorithm on the Middlebury and Moseg datasets. The results show that the proposed method outperforms state-of-the-art methods in dense correspondence computation and has low computational complexity.
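A well-known way to let every node of a tree contribute to every other node's match cost, with influence decaying along the tree path, is the two-pass non-local cost aggregation used in tree-based stereo. The sketch below shows that generic scheme and is not necessarily the paper's exact formulation. The tree is given as a hypothetical parent array (`-1` marks the root), `weight[v]` is the edge weight between `v` and its parent, and `sigma` controls the decay.

```python
import math
from collections import deque


def nonlocal_aggregate(cost, parent, weight, sigma=1.0):
    """Return agg[v] = sum_u exp(-D(u, v) / sigma) * cost[u], where D is the tree path length."""
    n = len(cost)
    children = [[] for _ in range(n)]
    root = None
    for v, p in enumerate(parent):
        if p < 0:
            root = v
        else:
            children[p].append(v)
    # Breadth-first order: every node appears after its parent
    order, queue = [], deque([root])
    while queue:
        v = queue.popleft()
        order.append(v)
        queue.extend(children[v])
    # Similarity between each node and its parent
    sim = [0.0 if parent[v] < 0 else math.exp(-weight[v] / sigma) for v in range(n)]
    # Pass 1 (leaves to root): up[v] aggregates the subtree rooted at v
    up = [float(c) for c in cost]
    for v in reversed(order):
        if parent[v] >= 0:
            up[parent[v]] += sim[v] * up[v]
    # Pass 2 (root to leaves): add contributions from outside each subtree
    agg = up[:]
    for v in order:
        p = parent[v]
        if p >= 0:
            agg[v] = sim[v] * agg[p] + (1.0 - sim[v] ** 2) * up[v]
    return agg
```

Two linear passes give every node a contribution from all other nodes, so the cost stays O(n) per label instead of O(n^2).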
Shi, X, Huang, X & Zhang, D 2013, 'Fast stitch algorithm on aerial images', Proceedings - 2013 International Conference on Mechatronic Sciences, Electric Engineering and Computer, MEC 2013, pp. 2213-2217.
Because aerial images have high resolution, existing stitching algorithms are too slow. In this paper, we propose a fast stitching algorithm. First, we build a Gaussian pyramid and select the key pyramid image according to image characteristics. Then, we extract feature points and use a region removal algorithm to obtain the key feature matches of an image. The region removal algorithm selects a key feature match within each circular region and removes the redundant matches, which improves the efficiency of image stitching. Finally, we analyze the relationship between the total error and the number of bundle adjustment iterations, which shows that the error decreases quickly in the first 50 iterations. Experiments on two datasets show that our algorithm improves both speed and precision.
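The abstract does not say how the region removal algorithm picks the key match in each circle. One plausible reading, sketched here as a hypothetical interpretation, is greedy suppression: sort matches by a quality score (assumed here, e.g. descriptor similarity), keep the best match, and discard any other match whose keypoint lies within a circle of the given radius around a kept one.

```python
def region_removal(points, scores, radius):
    """Greedy circular-region suppression of feature matches (hypothetical interpretation).

    points: list of (x, y) keypoint positions, one per match
    scores: match quality, higher is better (assumed)
    Returns the sorted indices of the kept matches.
    """
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        xi, yi = points[i]
        # Keep this match only if it lies outside every circle around a kept match
        if all((xi - points[k][0]) ** 2 + (yi - points[k][1]) ** 2 > radius ** 2 for k in kept):
            kept.append(i)
    return sorted(kept)
```

Thinning matches this way reduces the number of correspondences passed to bundle adjustment while keeping them spread across the image.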
Huang, X, 'Learning a 3D descriptor for cross-source point cloud registration from synthetic data'.
With the development of 3D sensors, registration of 3D data (e.g. point clouds) coming from different kinds of sensors is indispensable and in great demand. However, point cloud registration between different sensors is challenging because of density variation, missing data, different viewpoints, noise and outliers, and geometric transformation. In this paper, we propose a method that learns a 3D descriptor for finding correspondences between these challenging point clouds. To train the deep learning framework, we use synthetic 3D point clouds as input. Starting from a synthetic dataset, we use a region-based sampling method to select a reasonable, large and diverse set of training samples. Then, we use data augmentation to make our network robust to rotation. We focus on the more general case of point clouds coming from different sensors, called cross-source point clouds. The experiments show that our descriptor generalizes not only to new scenes but also to different sensors. The results demonstrate that the proposed method successfully aligns two 3D cross-source point clouds and outperforms state-of-the-art methods.
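Rotation augmentation exposes the network to many orientations of the same synthetic sample. A minimal sketch, assuming augmentation means applying uniformly random 3D rotations about the cloud's centroid (the actual sampling range used for training is not stated), draws a random unit quaternion and converts it to a rotation matrix; the function names are hypothetical.

```python
import numpy as np


def random_rotation(rng):
    """Uniformly distributed 3D rotation matrix from a random unit quaternion."""
    q = rng.normal(size=4)
    w, x, y, z = q / np.linalg.norm(q)  # a normalised Gaussian 4-vector is uniform on S^3
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


def augment(points, rng, copies=4):
    """Return `copies` randomly rotated versions of an (N, 3) cloud, rotated about its centroid."""
    centroid = points.mean(axis=0)
    return [(points - centroid) @ random_rotation(rng).T + centroid for _ in range(copies)]
```

Rotations preserve all pairwise distances, so every augmented copy is the same shape in a new orientation, and the descriptor is trained to respond to it in the same way.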