Dr Long joined UTS in 2010 and obtained his PhD from UTS in 2014. Before joining UTS, he gained more than six years of industry R&D experience.
He currently leads a research group conducting application-driven research on machine learning and data mining. In particular, his research interests focus on several application domains, such as Healthcare, Smart Home, NLP, Education, and Social Media.
He has been an assessor for more than 18 ARC (Australian Research Council) proposals, including DP, LP and DECRA schemes.
He serves as a reviewer for several AI-related conferences, e.g. IJCAI. He was the Job Match Chair for KDD 2015 and IJCAI 2017; the Job Match program provides a face-to-face recruitment opportunity for conference attendees and sponsors.
Dr Long has published 20+ papers in ERA Rank A conferences (e.g. AAAI, ICDM, and CIKM) and journals (e.g. TKDE, TKDD, TCYB, WWW and Pattern Recognition). He currently focuses on application-driven research that aims to develop innovative ideas inspired by industry partners' real requirements.
Dr Long also focuses on training his PhD students and delivering industry training.
Wang, S., Li, X., Chang, X., Yao, L., Sheng, Q.Z. & Long, G. 2017, 'Learning multiple diagnosis codes for ICU patients with local disease correlation mining', ACM Transactions on Knowledge Discovery from Data, vol. 11, no. 3.
© 2017 ACM. In the era of big data, a mechanism that can automatically annotate disease codes to patients' records in the medical information system is in demand. The purpose of this work is to propose a framework that automatically annotates the disease labels of multi-source patient data in Intensive Care Units (ICUs). We extract features from two main sources, medical charts and notes. The Bag-of-Words model is used to encode the features. Unlike most of the existing multi-label learning algorithms that globally consider correlations between diseases, our model learns disease correlation locally in the patient data. To achieve this, we derive a local disease correlation representation to enrich the discriminant power of each patient data. This representation is embedded into a unified multi-label learning framework. We develop an alternating algorithm to iteratively optimize the objective function. Extensive experiments have been conducted on a real-world ICU database. We have compared our algorithm with representative multi-label learning algorithms. Evaluation results have shown that our proposed method has state-of-the-art performance in the annotation of multiple diagnostic codes for ICU patients. This study suggests that problems in the automated diagnosis code annotation can be reliably addressed by using a multi-label learning model that exploits disease correlation. The findings of this study will greatly benefit health care and management in ICU considering that the automated diagnosis code annotation can significantly improve the quality and management of health care for both patients and caregivers.
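The Bag-of-Words encoding step described above can be sketched in a few lines. This is an illustrative toy, not the paper's pipeline: plain whitespace tokenization and a shared vocabulary are assumed, whereas the actual system builds separate feature sets from charts and notes.

```python
from collections import Counter

def bag_of_words(documents):
    """Encode each document as a term-count vector over a shared vocabulary.

    A minimal sketch of Bag-of-Words encoding; whitespace tokenization
    is an assumption, not the paper's actual preprocessing.
    """
    vocab = sorted({tok for doc in documents for tok in doc.lower().split()})
    vectors = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        vectors.append([counts.get(tok, 0) for tok in vocab])
    return vocab, vectors

# Two hypothetical (synthetic) clinical note snippets
notes = ["chest pain and shortness of breath",
         "breath sounds clear no chest pain"]
vocab, X = bag_of_words(notes)
```

Each row of `X` is then a fixed-length feature vector that the multi-label learner can consume.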
Zhang, Q., Wu, J., Zhang, P., Long, G. & Zhang, C. 2017, 'Collective Hyping Detection System for Identifying Online Spam Activities', IEEE Intelligent Systems.
Online reviews are extensively utilized by potential buyers to make business decisions. Unfortunately, fraudsters offer to write spam reviews for product promotion or competitor defamation, which drives online business holders to adopt this type of vicious strategy to increase their profits. These fake reviews always mislead users who shop online. Though existing anti-spam strategies have proved effective in detecting traditional spam activities, evolving spam schemes can successfully cheat conventional testing by buying the comments of a massive number of random but genuine users, which are sold by specific web markets, i.e., the User Cloud. A more crucial problem is that such spam activities turn into a kind of 'advertising campaign' among business owners as they need to maintain their rank in the top few positions. In this paper, we propose a new Collaborative Marketing Hyping Detection solution, which aims to identify spam comments generated by the Spam Reviewer Cloud and to detect products which adopt an evolving spam strategy for promotion. Our experiments validate the existence of Collaborative Marketing Hyping activities on a real-life e-commerce platform and also demonstrate that our model can effectively and accurately identify these advanced spam activities.
Pan, S., Wu, J., Zhu, X., Long, G. & Zhang, C. 2017, 'Boosting for graph classification with universum', Knowledge and Information Systems, vol. 50, no. 1, pp. 53-77.
© 2016 Springer-Verlag London. Recent years have witnessed extensive studies of graph classification due to the rapid increase in applications involving structural data and complex relationships. To support graph classification, all existing methods require that training graphs be relevant (or belong) to the target class, but cannot integrate graphs irrelevant to the class of interest into the learning process. In this paper, we study a new universum graph classification framework which leverages additional 'non-example' graphs to help improve graph classification accuracy. We argue that although universum graphs do not belong to the target class, they may contain meaningful structure patterns that help enrich the feature space for graph representation and classification. To support universum graph classification, we propose a mathematical programming algorithm, ugBoost, which integrates discriminative subgraph selection and margin maximization into a unified framework to fully exploit the universum. Because informative subgraph exploration in a universum setting requires the search of a large space, we derive an upper-bound discriminative score for each subgraph and employ a branch-and-bound scheme to prune the search space. By using the explored subgraphs, our graph classification model intends to maximize the margin between positive and negative graphs and simultaneously minimize the loss on the universum graph examples. The subgraph exploration and the learning are integrated and performed iteratively so that each can benefit the other. Experimental results and comparisons on real-world datasets demonstrate the performance of our algorithm.
Pan, S., Wu, J., Zhu, X., Long, G. & Zhang, C. 2017, 'Task Sensitive Feature Exploration and Learning for Multitask Graph Classification', IEEE Transactions on Cybernetics, vol. 47, no. 3, pp. 744-758.
Multitask learning (MTL) is commonly used for jointly optimizing multiple learning tasks. To date, all existing MTL methods have been designed for tasks with feature-vector represented instances, but cannot be applied to structured data, such as graphs. More importantly, when carrying out MTL, existing methods mainly focus on exploring overall commonality or disparity between tasks for learning, but cannot explicitly capture task relationships in the feature space, so they are unable to answer important questions such as what exactly is shared between tasks and what is unique to one task. In this paper, we formulate a new multitask graph learning problem, and propose a task sensitive feature exploration and learning algorithm for multitask graph classification. Because graphs do not have features readily available, we advocate a task sensitive feature exploration and learning paradigm to jointly discover discriminative subgraph features across different tasks. In addition, a feature learning process is carried out to categorize each subgraph feature into one of three categories: 1) common feature; 2) task auxiliary feature; and 3) task specific feature, indicating whether the feature is shared by all tasks, by a subset of tasks, or by only one specific task, respectively. The feature learning and the multiple task learning are iteratively optimized to form a multitask graph classification model with a global optimization goal. Experiments on real-world functional brain analysis and chemical compound categorization demonstrate the algorithm's performance. Results confirm that our method can explicitly capture task correlations and uniqueness in the feature space, and explicitly answer what is shared between tasks and what is unique to a specific task.
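The three feature categories above can be illustrated with a toy counting sketch. Note this is only an illustration of the categories' definitions: the paper learns the assignment jointly with the classifiers, not by simple membership counting, and all names below are hypothetical.

```python
def categorize_features(task_features):
    """Label each subgraph feature as 'common', 'auxiliary', or 'specific'
    according to whether it is selected by all tasks, a subset of tasks,
    or exactly one task.

    task_features: dict mapping task name -> set of selected feature ids.
    Illustrative only; the paper's method learns this assignment jointly.
    """
    n_tasks = len(task_features)
    all_feats = set().union(*task_features.values())
    labels = {}
    for f in all_feats:
        k = sum(1 for feats in task_features.values() if f in feats)
        if k == n_tasks:
            labels[f] = "common"
        elif k > 1:
            labels[f] = "auxiliary"
        else:
            labels[f] = "specific"
    return labels

# Hypothetical subgraph features g1..g4 selected by three tasks
tasks = {"t1": {"g1", "g2"}, "t2": {"g1", "g3"}, "t3": {"g1", "g2", "g4"}}
cats = categorize_features(tasks)
```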
Zhang, Q., Wu, J., Zhang, P., Long, G. & Zhang, C. 2017, 'Collective Hyping Detection System for Identifying Online Spam Activities', IEEE Intelligent Systems, vol. 32, no. 5, pp. 53-63.
© 2001-2011 IEEE. Although existing antispam strategies detect traditional spam activities effectively, evolving spam schemes can successfully cheat conventional testing by buying the comments that are written by genuine users and sold by specific web markets. Such spam activities turn into a kind of advertising campaign among business owners to maintain their rank in top positions. This article proposes a new collaborative marketing hyping detection solution that aims to identify spam comments generated by the Spam Reviewer Cloud and detect products that adopt an evolving spam strategy for promotion. The authors propose an unsupervised learning model that combines heterogeneous product review networks in an attempt to discover collective hyping activities. Their experiments validate the existence of the collaborative marketing hyping activities on a real-life ecommerce platform and demonstrate that their model can effectively and accurately identify these advanced spam activities.
© 2017 Springer Science+Business Media, LLC. Recommender systems are designed to solve the information overload problem and have been widely studied for many years. Conventional recommender systems tend to take only the ratings of users on products into account. With the development of Web 2.0, rating networks in many online communities (e.g. Netflix and Douban) allow users not only to co-comment on or co-rate their interests (e.g. movies and books), but also to build explicit social networks. Recent recommendation models use various social data, such as observable links, but methods incorporating this explicit social information normally adopt similarity measures (e.g. cosine similarity) to evaluate the explicit relationships in the network; they do not consider the latent and implicit relationships, such as social influence. A target user's purchase behavior or interest, for instance, is not always determined by their directly connected relationships and may be significantly influenced by people of high reputation they do not know in the network, or by others who have expertise in specific domains (e.g. famous social communities). Based on the above observations, in this paper we first simulate social influence diffusion in the network to find the global and local influence nodes, and then embed this dual influence data into a traditional recommendation model to improve accuracy. Mathematically, we formulate the global and local influence data as new dual social influence regularization terms and embed them into a matrix factorization-based recommendation model. Experiments on real-world datasets demonstrate the effective performance of the proposed method.
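The idea of embedding influence regularization into matrix factorization can be sketched as follows. This is a toy gradient-descent sketch under stated assumptions, not the paper's objective: the influence term here simply pulls a user's latent factors toward those of their influencers, and all data, weights and hyperparameters are made up for illustration.

```python
import random

def mf_social_step(R, U, V, influence, lam=0.1, beta=0.1, lr=0.01):
    """One gradient step of matrix factorization with an added
    social-influence regularizer that draws each user's factors toward
    influential users. A sketch only, not the paper's exact formulation.

    R: dict (user, item) -> rating; U, V: lists of latent factor vectors;
    influence: dict user -> list of (influencer, strength).
    """
    K = len(U[0])
    for (u, i), r in R.items():
        pred = sum(U[u][k] * V[i][k] for k in range(K))
        err = r - pred
        for k in range(K):
            gu = -2 * err * V[i][k] + 2 * lam * U[u][k]
            gv = -2 * err * U[u][k] + 2 * lam * V[i][k]
            U[u][k] -= lr * gu
            V[i][k] -= lr * gv
    # Influence regularizer: pull u toward each influencer v with strength s
    for u, neigh in influence.items():
        for v, s in neigh:
            for k in range(K):
                U[u][k] -= lr * 2 * beta * s * (U[u][k] - U[v][k])
    return U, V

def squared_error(R, U, V):
    K = len(U[0])
    return sum((r - sum(U[u][k] * V[i][k] for k in range(K))) ** 2
               for (u, i), r in R.items())

# Tiny synthetic example: 3 users, 2 items, one influence edge
random.seed(0)
U = [[random.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(3)]
V = [[random.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(2)]
R = {(0, 0): 5.0, (1, 0): 3.0, (2, 1): 4.0}
influence = {1: [(0, 0.5)]}
before = squared_error(R, U, V)
for _ in range(50):
    U, V = mf_social_step(R, U, V, influence)
after = squared_error(R, U, V)
```

Fifty such steps reduce the reconstruction error on the toy ratings while keeping user 1's factors near those of the influential user 0.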
© 2014 Elsevier B.V. Feature selection improves the quality of a model by filtering out the noisy or redundant features. In unsupervised scenarios, the selection is challenging due to the unavailability of labels. To overcome this, graphs that can unfold the geometric structure on the manifold are usually used to regularize the selection process. These graphs can be constructed from either the local view or the global view. As the local graph is more discriminative, previous methods tended to use the local graph rather than the global graph, but the global graph also carries useful information. In light of this, we propose a multiple-graph unsupervised feature selection method that leverages the information from both local and global graphs. In addition, we enforce the l2,p norm to achieve more flexible sparse learning. Experiments inspecting the effects of the multiple graphs and the l2,p norm are conducted on various datasets, and comparisons with other mainstream methods are also presented. The results support that multiple graphs can be better than a single graph in unsupervised feature selection, and that the overall performance of the proposed method is higher than that of the compared methods.
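The l2,p norm mentioned above is usually defined row-wise: take the 2-norm of each row of the weight matrix, then combine the row norms with power p. Smaller p (0 < p <= 1) pushes more rows to exactly zero, which is what makes the sparsity "flexible". A minimal sketch with plain lists, independent of the paper's solver:

```python
def l2p_norm(W, p):
    """Compute the l2,p norm of a matrix W (rows = features):
    (sum over rows of ||w_row||_2 ** p) ** (1/p).

    Row-sparsity in W corresponds to discarding the associated features.
    """
    row_norms = [sum(x * x for x in row) ** 0.5 for row in W]
    return sum(n ** p for n in row_norms) ** (1.0 / p)

# Toy weight matrix: row 2-norms are 5, 0, and 1
W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
```

For p = 1 this reduces to the familiar l2,1 norm (sum of row norms), here 5 + 0 + 1 = 6.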
Wang, S., Chang, X., Li, X., Long, G., Yao, L. & Sheng, Q. 2016, 'Diagnosis Code Assignment Using Sparsity-based Disease Correlation Embedding', IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 12, pp. 3191-3202.
With the latest developments in database technologies, it has become easier than before to store the medical records of hospital patients from their first day of admission. Intensive Care Units (ICUs) in modern medical information systems can record patient events in relational databases every second. Knowledge mining from these huge volumes of medical data is beneficial to both caregivers and patients. Given a set of electronic patient records, a system that effectively assigns disease labels can facilitate medical database management and also benefit other researchers, e.g. pathologists. In this paper, we propose a framework to achieve that goal. Medical chart and note data of a patient are used to extract distinctive features. To encode patient features, we apply a Bag-of-Words encoding method for both chart and note data. We also propose a model that takes into account both global information and local correlations between diseases. Correlated diseases are characterized by a graph structure that is embedded in our sparsity-based framework. Our algorithm captures disease relevance when labeling disease codes rather than making individual decisions with respect to a specific disease. At the same time, globally optimal values are guaranteed by our proposed convex objective function. Extensive experiments have been conducted on a real-world large-scale ICU database. The evaluation results demonstrate that our method improves multi-label classification results by successfully incorporating disease correlations.
Wang, S., Pan, P., Long, G., Chen, W., Li, X. & Sheng, Q.Z. 2016, 'Compact representation for large-scale unconstrained video analysis', World Wide Web, vol. 19, no. 2, pp. 231-246.
Recently invented features (e.g. Fisher vector, VLAD) have achieved state-of-the-art performance in large-scale video analysis systems that aim to understand the contents of videos, such as concept recognition and event detection. However, these features are high-dimensional representations, which remarkably increases computation costs and correspondingly deteriorates the performance of subsequent learning tasks. Notably, the situation becomes even worse when dealing with large-scale video data where the number of class labels is limited. To address this problem, we propose a novel algorithm to compactly represent huge amounts of unconstrained video data. Specifically, redundant feature dimensions are removed by our proposed feature selection algorithm. Considering that unlabeled videos are easy to obtain on the web, we apply this feature selection algorithm in a semi-supervised framework to cope with the shortage of class information. Different from most existing semi-supervised feature selection algorithms, our proposed algorithm does not rely on manifold approximation, i.e. the graph Laplacian, which is quite expensive for large amounts of data. Thus, it is possible to apply the proposed algorithm to a real large-scale video analysis system. Moreover, due to the difficulty of solving the non-smooth objective function, we develop an efficient iterative approach to seek the global optimum. Extensive experiments are conducted on several real-world video datasets, including KTH, CCV, and HMDB. The experimental results demonstrate the effectiveness of the proposed algorithm.
Zhang, P., He, J., Long, G., Huang, G. & Zhang, C. 2016, 'Towards anomalous diffusion sources detection in a large network', ACM Transactions on Internet Technology, vol. 16, no. 1.
© 2016 ACM. Witnessing the wide spread of malicious information in large networks, we develop an efficient method to detect anomalous diffusion sources and thus protect networks from security and privacy attacks. To date, most existing work on diffusion source detection is based on the assumption that network snapshots reflecting information diffusion can be obtained continuously. However, obtaining snapshots of an entire network requires deploying detectors on all network nodes and is thus very expensive. Alternatively, in this article, we study the diffusion source locating problem by learning from information diffusion data collected from only a small subset of network nodes. Specifically, we present a new regression learning model that can detect anomalous diffusion sources by jointly solving five challenges: an unknown number of source nodes, few activated detectors, unknown initial propagation time, uncertain propagation paths and uncertain propagation time delays. We theoretically analyze the strength of the model and derive performance bounds. We empirically test and compare the model using both synthetic and real-world networks to demonstrate its performance.
Zhang, Q., Zhang, P., Long, G., Ding, W., Zhang, C. & Wu, X. 2016, 'Online learning from trapezoidal data streams', IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 10, pp. 2709-2723.
© 1989-2012 IEEE. In this paper, we study a new problem of continuous learning from doubly-streaming data where both data volume and feature space increase over time. We refer to the doubly-streaming data as trapezoidal data streams and the corresponding learning problem as online learning from trapezoidal data streams. The problem is challenging because both data volume and data dimension increase over time, and existing online learning, online feature selection, and streaming feature selection algorithms are inapplicable. We propose a new Online Learning with Streaming Features algorithm (OLSF for short) and its two variants, which combine online learning and streaming feature selection to enable learning from trapezoidal data streams with infinite training instances and features. When a new training instance carrying new features arrives, a classifier updates the existing features by following the passive-aggressive update rule and updates the new features by following the structural risk minimization principle. Feature sparsity is then introduced by using the projected truncation technique. We derive performance bounds of the OLSF algorithm and its variants. We also conduct experiments on real-world data sets to show the performance of the proposed algorithms.
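The passive-aggressive update rule mentioned above has a closed form for a linear classifier: leave the weights unchanged when the example is already classified with margin at least 1 (passive), otherwise make the smallest correction that satisfies the margin (aggressive). A minimal sketch of that rule alone; the abstract's handling of newly arriving features is a separate mechanism not shown here.

```python
def pa_update(w, x, y):
    """Passive-aggressive update for a linear classifier on example (x, y),
    y in {-1, +1}. No change if hinge loss is zero; otherwise step by
    tau = loss / ||x||^2 in the direction y * x.
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)
    if loss == 0.0:
        return w                                # passive: margin satisfied
    tau = loss / sum(xi * xi for xi in x)       # aggressive: minimal correction
    return [wi + tau * y * xi for wi, xi in zip(w, x)]

# Starting from zero weights, one update achieves margin exactly 1
w = pa_update([0.0, 0.0], [1.0, 2.0], +1)
margin = sum(wi * xi for wi, xi in zip(w, [1.0, 2.0]))
```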
Hu, R., Pan, S., Jiang, J. & Long, G. 2017, 'Graph ladder networks for network classification', International Conference on Information and Knowledge Management, Proceedings, pp. 2103-2106.
© 2017 ACM. Numerous network representation-based algorithms for network classification have emerged in recent years, but many suffer from two limitations. First, they separate network representation learning and node classification into two steps, which may lead to sub-optimal results because the node representation may not fit the classification model well, and vice versa. Second, they are mostly shallow methods that can only capture linear and simple relationships in the data. In this paper, we propose an effective deep learning model, Graph Ladder Networks (GLN), for node classification in networks. Our model learns a ladder network which unifies representation learning and network classification into one single framework by exploiting both labeled and unlabeled nodes in a network. To integrate both structure and node content information in the networks, the most recently developed graph convolutional network is further employed. Experiments on the popular academic network dataset Citeseer demonstrate that our approach achieves outstanding performance compared to other state-of-the-art algorithms.
Hu, R., Yu, C.P., Fung, S.F., Pan, S., Wang, H. & Long, G. 2017, 'Universal network representation for heterogeneous information networks', Proceedings of the International Joint Conference on Neural Networks, pp. 388-395.
© 2017 IEEE. Network representation aims to represent the nodes in a network as continuous and compact vectors, and has attracted much attention in recent years due to its ability to capture complex structure relationships inside networks. However, existing network representation methods are commonly designed for homogeneous information networks where all the nodes (entities) of a network are of the same type, e.g., papers in a citation network. In this paper, we propose a universal network representation approach (UNRA), that represents different types of nodes in heterogeneous information networks in a continuous and common vector space. The UNRA is built on our latest mutually updated neural language module, which simultaneously captures inter-relationship among homogeneous nodes and node-content correlation. Relationships between different types of nodes are also assembled and learned in a unified framework. Experiments validate that the UNRA achieves outstanding performance, compared to six other state-of-the-art algorithms, in node representation, node classification, and network visualization. In node classification, the UNRA achieves a 3% to 132% performance improvement in terms of accuracy.
Wang, C., Pan, S., Long, G., Zhu, X. & Jiang, J. 2017, 'MGAE: Marginalized graph autoencoder for graph clustering', International Conference on Information and Knowledge Management, Proceedings, pp. 889-898.
© 2017 ACM. Graph clustering aims to discover community structures in networks, a task that is fundamentally challenging mainly because the topology structure and the content of graphs are difficult to represent for clustering analysis. Recently, graph clustering has moved from traditional shallow methods to deep learning approaches, thanks to the unique feature representation learning capability of deep learning. However, existing deep approaches for graph clustering can only exploit structure information, while ignoring the content information associated with the nodes in a graph. In this paper, we propose a novel marginalized graph autoencoder (MGAE) algorithm for graph clustering. The key innovation of MGAE is that it advances the autoencoder to the graph domain, so graph representation learning can not only be carried out in a purely unsupervised setting by leveraging structure and content information, but can also be stacked in a deep fashion to learn effective representations. From a technical viewpoint, we propose a marginalized graph convolutional network to corrupt network node content, allowing node content to interact with network features, and marginalize the corrupted features in a graph autoencoder context to learn graph feature representations. The learned features are fed into a spectral clustering algorithm for graph clustering. Experimental results on benchmark datasets demonstrate the superior performance of MGAE compared to numerous baselines.
Zhang, Q., Wu, J., Zhang, P., Long, G., Tsang, I.W. & Zhang, C. 2017, 'Inferring latent network from cascade data for dynamic social recommendation', Proceedings - IEEE International Conference on Data Mining, ICDM, pp. 669-678.
© 2016 IEEE. Social recommendation explores social information to improve the quality of a recommender system. It can be further divided into explicit and implicit social network recommendation. The former assumes the existence of explicit social connections between users in addition to the rating data. The latter assumes the availability of only the ratings, since explicit social data may not be available, and even when they are, they usually consist of binary decision values (e.g., whether two people are friends) while the strength of the relationships is missing. Most works in this field use only rating data to infer latent social networks, ignoring the dynamic nature of users, whose preferences drift distinctly over time. To this end, we propose a new Implicit Dynamic Social Recommendation (IDSR) model, which infers the latent social network from cascade data. By mining the cascade data, it fully exploits the temporal information and identifies dynamic changes in users in time, using the latest updated social network to make recommendations. Experiments and comparisons on three real-world datasets show that the proposed model outperforms state-of-the-art solutions in both explicit and implicit scenarios.
Chang, X., Yang, Y., Long, G., Zhang, C. & Hauptmann, A.G. 2016, 'Dynamic concept composition for zero-example event detection', Proceedings of 30th AAAI Conference on Artificial Intelligence, AAAI 2016, AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence, Phoenix, Arizona, United States, pp. 3464-3470.
© Copyright 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. In this paper, we focus on automatically detecting events in unconstrained videos without the use of any visual training exemplars. In principle, zero-shot learning makes it possible to train an event detection model based on the assumption that events (e.g. birthday party) can be described by multiple mid-level semantic concepts (e.g. "blowing candle", "birthday cake"). Towards this goal, we first pre-train a bundle of concept classifiers using data from other sources. Then we evaluate the semantic correlation of each concept w.r.t. the event of interest and pick the relevant concept classifiers, which are applied to all test videos to obtain multiple prediction score vectors. While most existing systems combine the predictions of the concept classifiers with fixed weights, we propose to learn the optimal weights of the concept classifiers for each test video by exploring a set of online available videos with free-form text descriptions of their content. To validate the effectiveness of the proposed approach, we have conducted extensive experiments on the latest TRECVID MEDTest 2014, MEDTest 2013 and CCV datasets. The experimental results confirm the superiority of the proposed approach.
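The final scoring step above, combining per-concept classifier predictions with per-video weights rather than fixed ones, can be sketched as a weighted average. The weight learning itself (from web videos with text descriptions) is not reproduced; the scores and weights below are invented for illustration.

```python
def event_score(concept_scores, weights):
    """Combine per-concept classifier scores for one test video into a
    single event score using per-video weights (normalized weighted sum).
    A sketch of the composition step only; the weights are assumed given.
    """
    assert len(concept_scores) == len(weights)
    z = sum(weights)
    return sum(w * s for w, s in zip(weights, concept_scores)) / z

# Hypothetical scores for concepts ["blowing candle", "birthday cake"]
score = event_score([0.9, 0.6], [0.7, 0.3])
```

A video where the highly weighted concept fires strongly thus scores higher than one where only a low-weight concept responds.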
Yan, Y., Xu, Z., Tsang, W., Long, G. & Yang, Y. 2016, 'Robust Semi-supervised Learning through Label Aggregation', Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), AAAI Conference on Artificial Intelligence, AAAI, Phoenix, USA, pp. 2244-2250.
Semi-supervised learning is proposed to exploit both labeled and unlabeled data. However, as the scale of data in real-world applications increases significantly, conventional semi-supervised algorithms usually lead to massive computational cost and cannot be applied to large-scale datasets. In addition, label noise is usually present in practical applications due to human annotation, which very likely results in remarkable degeneration of performance in semi-supervised methods. To address these two challenges, in this paper, we propose an efficient RObust Semi-Supervised Ensemble Learning (ROSSEL) method, which generates pseudo-labels for unlabeled data using a set of weak annotators, and combines them to approximate the ground-truth labels to assist semi-supervised learning. We formulate the weighted combination process as a multiple label kernel learning (MLKL) problem which can be solved efficiently. Compared with other semi-supervised learning algorithms, the proposed method has linear time complexity. Extensive experiments on five benchmark datasets demonstrate the superior effectiveness, efficiency and robustness of the proposed algorithm.
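The pseudo-label aggregation step described in this abstract can be sketched as a weighted vote over the weak annotators' outputs. The fixed weights below stand in for the MLKL-learned combination, which is not reproduced here; all data is synthetic.

```python
def aggregate_pseudo_labels(annotations, weights):
    """Weighted-vote aggregation of weak annotators' labels into one
    pseudo-label per instance, approximating the ground truth.

    annotations: list of label lists, one per annotator, aligned by
    instance index. weights: one weight per annotator (assumed given).
    """
    n = len(annotations[0])
    pseudo = []
    for i in range(n):
        tally = {}
        for labels, w in zip(annotations, weights):
            tally[labels[i]] = tally.get(labels[i], 0.0) + w
        pseudo.append(max(tally, key=tally.get))   # label with most weight
    return pseudo

# Three weak annotators labeling three unlabeled instances
votes = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
pseudo = aggregate_pseudo_labels(votes, [0.5, 0.3, 0.4])
```

The resulting pseudo-labels then serve as approximate ground truth for the downstream semi-supervised learner.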
Zhang, Q., Zhang, P., Long, G., Ding, W., Zhang, C. & Wu, X. 2015, 'Towards mining trapezoidal data streams', Proceedings - IEEE International Conference on Data Mining, ICDM, IEEE International Conference on Data Mining, IEEE, Atlantic City, New Jersey, United States, pp. 1111-1116.
© 2015 IEEE. We study a new problem of learning from doubly-streaming data where both data volume and feature space increase over time. We refer to the problem as mining trapezoidal data streams. The problem is challenging because both data volume and feature space are increasing, and existing online learning, online feature selection and streaming feature selection algorithms are inapplicable. We propose a new Sparse Trapezoidal Streaming Data mining algorithm (STSD) and its two variants which combine online learning and online feature selection to enable learning from trapezoidal data streams with infinite training instances and features. Specifically, when new training instances carrying new features arrive, the classifier updates the existing features by following the passive-aggressive update rule used in online learning and updates the new features with the structural risk minimization principle. Feature sparsity is also introduced using the projected truncation technique. Extensive experiments on UCI data sets demonstrate the performance of the proposed algorithms.
Zhang, Q., Zhang, Q., Long, G., Zhang, P. & Zhang, C. 2016, 'Exploring heterogeneous product networks for discovering collective marketing hyping behavior', Advances in Knowledge Discovery and Data Mining - LNCS, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Auckland, New Zealand, pp. 40-51.
© Springer International Publishing Switzerland 2016. Online spam comments often mislead users during online shopping. Existing online spam detection methods rely on semantic clues, behavioral footprints, and relational connections between users in review systems. Although these methods can successfully identify spam activities, evolving fraud strategies can escape the detection rules by purchasing positive comments from massive random users, i.e., the user Cloud. In this paper, we study a new problem, Collective Marketing Hyping detection, for detecting spam comments generated from the user Cloud. It is defined as detecting a group of marketing hyping products with untrustworthy marketing promotion behaviour. We propose a new learning model that uses heterogeneous product networks extracted from product review systems. Our model aims to mine a group of hyping activities, which differs from existing models that only detect a single product with hyping activities. We show the existence of Collective Marketing Hyping behavior in real-life networks. Experimental results demonstrate that the product information network can effectively detect intentional fraudulent product promotions.
Hu, R., Pan, S., Long, G., Zhu, X., Jiang, J. & Zhang, C. 2016, 'Co-clustering enterprise social networks', Proceedings of the International Joint Conference on Neural Networks, IEEE International Joint Conference on Neural Networks, IEEE, Vancouver, Canada, pp. 107-114.
© 2016 IEEE. An enterprise social network (ESN) involves diversified user groups, from producers, suppliers and logistics to end consumers, and its users have different scales, broad interests, and various objectives, such as advertising, branding, and customer relationship management. In addition, such a highly diversified network is also featured with rich content, including recruiting messages, advertisements, news releases, and customer complaints. Due to this complex nature, an immediate need is to properly organize a chaotic enterprise social network into functional groups, where each group corresponds to a set of peers with business interactions and common objectives, and further understand the business role of each group, such as their common interests and the key features that distinguish them from other groups. In this paper, we argue that due to the unique characteristics of enterprise social networks, simply clustering ESN nodes or using existing topic discovery methods cannot effectively discover functional groups and understand their roles. Alternatively, we propose CENFLD, which carries out co-clustering on enterprise social networks for functional group discovery and understanding. CENFLD is a co-factorization based framework which combines network topology structures and rich content information, including interactions between nodes and correlations between node content, to discover functional user groups. Because the number of functional groups is highly data driven and hard to estimate, CENFLD employs a hold-out test principle to find the group number optimally complying with the underlying data. Experiments and comparisons with state-of-the-art approaches on 13 real-world enterprise/organizational networks validate the performance of CENFLD.
Bai, Y., Wang, H., Wu, J., Zhang, Y., Jiang, J. & Long, G. 2016, 'Evolutionary lazy learning for Naive Bayes classification', Proceedings of the International Joint Conference on Neural Networks, IEEE International Joint Conference on Neural Networks, IEEE, Canada, pp. 3124-3129.View/Download from: UTS OPUS or Publisher's site
© 2016 IEEE. Most improvements for Naive Bayes (NB) share a common yet important flaw: they split the modeling of the classifier into two separate stages - preprocessing (e.g., feature selection and data expansion) and building the NB classifier. The first stage does not take the NB objective function into consideration, so classification performance cannot be guaranteed. Motivated by these facts and aiming to improve NB with accurate classification, we present a new learning algorithm called Evolutionary Local Instance Weighted Naive Bayes (ELWNB) to extend NB for classification. ELWNB seamlessly combines local NB, instance-weighted dataset extension and evolutionary algorithms. Experiments on 20 UCI benchmark datasets demonstrate that ELWNB significantly outperforms NB and several other improved NB algorithms.
Zhang, Q., Wu, J., Yang, H., Lu, W., Long, G. & Zhang, C. 2016, 'Global and local influence-based social recommendation', International Conference on Information and Knowledge Management, Proceedings, ACM International Conference on Information and Knowledge Management, ACM, Indianapolis, USA, pp. 1917-1920.View/Download from: UTS OPUS or Publisher's site
© 2016 ACM. Social recommendation has been widely studied in recent years. Existing social recommendation models use various explicit pieces of social information as regularization terms; e.g., social links are considered as new constraints. However, social influence, an implicit source of information in social networks, is seldom considered, even though it often drives recommendations in social networks. In this paper, we introduce a new global and local influence-based social recommendation model. Based on the observation that user purchase behaviour is influenced by both global influential nodes and the user's local influential nodes, we formulate the global and local influence as regularization terms and incorporate them into a matrix factorization-based recommendation model. Experimental results on large data sets demonstrate the performance of the proposed method.
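The idea of adding influence as a regularizer to matrix factorization can be sketched as follows (the toy ratings, the influencer map and all hyper-parameters are assumptions for illustration, not the paper's model):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy (user, item, rating) triples and an assumed local-influencer map
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (2, 1, 2.0)]
influencer = {0: 1, 2: 1}   # users 0 and 2 follow influential user 1

k, lr, lam, beta = 4, 0.05, 0.02, 0.1
U = rng.normal(scale=0.1, size=(3, k))   # user factors
V = rng.normal(scale=0.1, size=(2, k))   # item factors

for _ in range(200):
    for u, i, r in ratings:
        err = r - U[u] @ V[i]                    # rating residual
        U[u] += lr * (err * V[i] - lam * U[u])   # standard MF updates
        V[i] += lr * (err * U[u] - lam * V[i])
        if u in influencer:                      # influence regularizer:
            v = influencer[u]                    # pull the user's factors
            U[u] -= lr * beta * (U[u] - U[v])    # toward the influencer's

pred = U[0] @ V[0]   # reconstructed rating for (user 0, item 0)
```

The extra term simply penalizes the distance between a user's latent factors and those of an influential node, on top of the usual squared-error objective.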
Unankard, S., Li, X. & Long, G. 2015, 'Invariant event tracking on social networks', Database Systems for Advanced Applications (LNCS), Database Systems for Advanced Applications, Springer, Hanoi, Vietnam, pp. 517-521.View/Download from: Publisher's site
© 2015, Springer International Publishing Switzerland. All Rights Reserved. When an event is emerging and actively discussed on social networks, its related issues may change from time to time. People may focus on different issues of an event at different times. An invariant event is an event with changing subsequent issues that lasts for a period of time. Examples of invariant events include government elections, natural disasters, and breaking news. This paper describes our demonstration system for tracking invariant events over social networks. Our system is able to summarize continuous invariant events and track their developments along a timeline. We detect invariant events by utilizing Clique Percolation Method (CPM) community mining. We also present an approach to event tracking based on the relationships between communities. Twitter messages related to the 2013 Australian Federal Election are used to demonstrate the effectiveness of our approach. As the first system of its kind, ours provides a benchmark for further development of monitoring tools for social events.
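The Clique Percolation Method at the core of the detection step can be sketched in plain Python (the toy graph is an assumption; CPM merges adjacent k-cliques, i.e., k-cliques sharing k-1 nodes, into one community):

```python
from itertools import combinations

def cpm_communities(edges, k=3):
    """Clique Percolation Method: adjacent k-cliques form a community."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)
    # enumerate all k-cliques (fine for toy graphs; exponential in general)
    cliques = [set(c) for c in combinations(nodes, k)
               if all(b in adj[a] for a, b in combinations(c, 2))]
    # union-find over cliques that overlap in k-1 nodes
    parent = list(range(len(cliques)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)
    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return list(groups.values())

# toy mention graph: two triangles share an edge, a third is separate
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (2, 4),
         (4, 5), (5, 6), (5, 7), (6, 7)]
comms = cpm_communities(edges, k=3)
```

Here the triangles {1,2,3} and {2,3,4} share two nodes and percolate into one community, while {5,6,7} stays separate; node 4's single edge to node 5 forms no triangle, so it does not bridge them.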
Zhang, Q., Yu, L. & Long, G. 2015, 'SocialTrail: Recommending Social Trajectories from Location-Based Social Networks', Databases Theory and Applications (LNCS), Australasian Database Conference, Springer International Publishing, Melbourne, VIC, Australia, pp. 314-317.View/Download from: Publisher's site
Trajectory recommendation plays an important role in travel planning. Most existing systems are mainly designed for spot recommendation without an understanding of the overall trip, and tend to utilize homogeneous data only (e.g., geo-tagged images). Furthermore, they focus on the popularity of locations and fail to consider other important factors such as traveling time and sequence. In this paper, we propose a novel system that not only integrates geo-tagged images and check-in data to discover meaningful social trajectories that enrich the travel information, but also takes both temporal and spatial factors into consideration to make trajectory recommendation more accurate.
Jiang, X., Liu, W., Cao, L. & Long, G. 2015, 'Coupled Collaborative Filtering for Context-aware Recommendation', AAAI Publications, Twenty-Ninth AAAI Conference on Artificial Intelligence, Student Abstracts, AAAI Conference on Artificial Intelligence, AAAI, Austin Texas, USA, pp. 4172-4173.View/Download from: UTS OPUS
Context-aware features have been widely recognized as important factors in recommender systems. However, as a major technique in recommender systems, traditional Collaborative Filtering (CF) does not provide a straightforward way of integrating context-aware information into personal recommendation. We propose a Coupled Collaborative Filtering (CCF) model to measure the contextual information and use it to improve recommendations. In the proposed approach, coupled similarity is calculated from inter-item, intra-context and inter-context interactions among item, user and context-aware factors. Experiments based on different types of CF models demonstrate the effectiveness of our design.
Jiang, X., Peng, X. & Long, G. 2015, 'Discovering sequential rental patterns by fleet tracking', Data Science (LNCS), Second International Conference on Data Science, Springer, Sydney, Australia, pp. 42-49.View/Download from: Publisher's site
© Springer International Publishing Switzerland 2015. As one of the most well-known methods for customer analysis, sequential pattern mining generally focuses on customers' business transactions to discover their behaviors. However, in the real-world rental industry, behaviors are usually linked to other factors reflecting actual equipment circumstances. Fleet tracking factors, such as location and usage, have been widely considered as important features to improve work performance and predict customer preferences. In this paper, we propose an innovative sequential pattern mining method to discover rental patterns by combining business transactions with fleet tracking factors. A novel sequential pattern mining framework is designed to detect effective items by utilizing both business transactions and fleet tracking information. Experimental results on real datasets confirm the effectiveness of our approach.
Jiang, J., Lu, J., Zhang, G. & Long, G. 2013, 'Optimal Cloud Resource Auto-scaling for Web Application', IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid), IEEE, Delft, Netherlands, pp. 58-65.View/Download from: UTS OPUS or Publisher's site
In the on-demand cloud environment, web application providers have the potential to scale virtual resources up or down to achieve cost-effective outcomes. True elasticity and cost-effectiveness in the pay-per-use cloud business model, however, have not yet been achieved. To address this challenge, we propose a novel cloud resource auto-scaling scheme at the virtual machine (VM) level for web application providers. The scheme automatically predicts the number of web requests and discovers an optimal cloud resource demand with a cost-latency trade-off. Based on this demand, the scheme makes a resource scaling decision - up, down or NOP (no operation) - in each time-unit re-allocation. We have implemented the scheme on the Amazon cloud platform and evaluated it using three real-world web log datasets. Our experimental results demonstrate that the proposed scheme achieves resource auto-scaling with an optimal cost-latency trade-off, as well as low SLA violations.
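The cost-latency trade-off behind the up/down/NOP decision can be illustrated with a toy model (the capacity, cost and penalty parameters below are assumptions for illustration, not the paper's formulation):

```python
def scale_decision(current_vms, predicted_requests,
                   vm_capacity=100, vm_cost=1.0, latency_penalty=0.05):
    """Pick the VM count minimizing VM cost plus a latency penalty on
    overload, then emit an up / down / NOP decision (toy model)."""
    best_n, best_cost = current_vms, float("inf")
    for n in range(1, 21):
        overload = max(0, predicted_requests - n * vm_capacity)
        total = n * vm_cost + latency_penalty * overload
        if total < best_cost:
            best_n, best_cost = n, total
    if best_n > current_vms:
        return "up", best_n
    if best_n < current_vms:
        return "down", best_n
    return "NOP", best_n
```

For a predicted load of 450 requests, a fleet of 2 VMs scales up to 5 (paying for a fifth VM is cheaper than the latency penalty on 50 overflow requests), while a fleet already at 5 VMs returns NOP.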
Long, G. & Jiang, J. 2013, 'Graph Based Feature Augmentation for Short and Sparse Text Classification', Lecture Notes in Computer Science, International Conference on Advanced Data Mining and Applications, Springer, China, pp. 456-467.View/Download from: UTS OPUS or Publisher's site
Short text classification - of snippets, search queries, micro-blogs and product reviews - is a challenging task, mainly because short texts have insufficient word co-occurrence information and a very sparse document-term representation. To address this problem, we propose a novel multi-view classification method that combines the original document-term representation with a new graph-based feature representation. Our proposed method uses all documents to construct a neighbour graph from shared co-occurring words. Multi-Dimensional Scaling (MDS) is then applied to extract a low-dimensional feature representation from the graph, which is augmented with the original text features for learning. Experiments on several benchmark datasets show that the proposed multi-view classifier, trained from the augmented feature representation, obtains significant performance gains compared to the baseline methods.
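The augmentation pipeline - build a neighbour structure from shared words, embed it with MDS, and concatenate the embedding with the original features - can be sketched with classical MDS (the toy doc-term matrix and the distance transform are assumptions for illustration, not the paper's construction):

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS: embed points from a pairwise-distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]       # top eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# toy doc-term matrix; documents sharing co-occurring words are neighbours
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
shared = X @ X.T                             # shared-word counts
D = 1.0 / (1.0 + shared)                     # more shared words -> closer
np.fill_diagonal(D, 0.0)
Z = classical_mds(D, dim=2)
augmented = np.hstack([X, Z])                # original + graph features
```

Documents 0 and 1 share words, so they land closer together in the MDS embedding than documents from the other cluster, and the classifier trains on the concatenated feature matrix.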
Long, G., Chen, L., Zhu, X. & Zhang, C. 2012, 'TCSST: transfer classification of short & sparse text using external data', Proc. Of The 21st ACM Conference on Information and Knowledge Management (CIKM-12), ACM International Conference on Information and Knowledge Management, ACM, Hawaii, USA, pp. 764-772.View/Download from: UTS OPUS or Publisher's site
Short & sparse text is becoming more prevalent on the web, in forms such as search snippets, micro-blogs and product reviews. Accurately classifying short & sparse text has emerged as an important yet challenging task. Existing work has considered utilizing external data (e.g., Wikipedia) to alleviate data sparseness by appending topics detected from external data as new features. However, training a classifier on features concatenated from different spaces is not easy, considering that the features have different physical meanings and different significance to the classification task. Moreover, it exacerbates the "curse of dimensionality" problem. In this study, we propose a transfer classification method, TCSST, that exploits external data to tackle the data sparsity issue. The transfer classifier is learned in the original feature space. Considering that labels for the external data may not be readily available or sufficient, TCSST further exploits unlabeled external data to aid the transfer classification. We develop novel strategies to allow TCSST to iteratively select high-quality unlabeled external data to help with the classification. We evaluate the performance of TCSST on both benchmark and real-world data sets. Our experimental results demonstrate that the proposed method is effective in classifying very short & sparse text, consistently outperforming existing and baseline methods.
Jiang, J., Lu, J., Zhang, G. & Long, G. 2011, 'Scaling-Up Item-Based Collaborative Filtering Recommendation Algorithm Based on Hadoop', 2011 IEEE World Congress on Services (SERVICES 2011), IEEE World Congress on Services, IEEE, Washington, DC, pp. 490-497.View/Download from: UTS OPUS or Publisher's site
Collaborative filtering (CF) techniques have achieved widespread success in E-commerce. The tremendous growth in the number of customers and products in recent years poses key challenges for recommender systems: high-quality recommendations are required, and more recommendations per second must be performed for millions of customers and products. Thus, improving the scalability and efficiency of CF algorithms has become increasingly important and difficult. In this paper, we developed and implemented a scaling-up item-based collaborative filtering algorithm on MapReduce, splitting the three most costly computations in the proposed algorithm into four MapReduce phases, each of which can be independently executed on different nodes in parallel. We also proposed efficient partition strategies, not only to enable parallel computation in each MapReduce phase but also to maximize data locality and minimize communication cost. Experimental results show the good scalability and efficiency of the item-based CF algorithm on a Hadoop cluster.
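The core of an item-based CF job on MapReduce - group ratings by user, emit co-rated item pairs in a map phase, and reduce each pair to a similarity score - can be sketched in-process (toy data; the actual system runs these phases as distributed Hadoop jobs with its own partition strategies):

```python
import math
from collections import defaultdict
from itertools import combinations

# toy (user, item, rating) records
ratings = [("u1", "A", 4), ("u1", "B", 5), ("u2", "A", 5),
           ("u2", "B", 4), ("u2", "C", 1), ("u3", "C", 5)]

# phase 1 (map + shuffle): group ratings by user
by_user = defaultdict(list)
for user, item, r in ratings:
    by_user[user].append((item, r))

# phase 2 (map): each user record emits its co-rated item pairs
pair_vals = defaultdict(list)
for items in by_user.values():
    for (i, ri), (j, rj) in combinations(sorted(items), 2):
        pair_vals[(i, j)].append((ri, rj))

# phase 3 (reduce): cosine similarity per item pair over its co-ratings
sim = {}
for (i, j), vals in pair_vals.items():
    num = sum(a * b for a, b in vals)
    den = (math.sqrt(sum(a * a for a, _ in vals))
           * math.sqrt(sum(b * b for _, b in vals)))
    sim[(i, j)] = num / den
```

Because each user's record emits its pairs independently, the pair-emission phase parallelizes by user and the similarity phase parallelizes by item pair, which is what makes the MapReduce decomposition effective.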
Dr. Long is managing more than $2m in external research grants, including two ARC Linkage projects and three research contract projects.
His industry partners include:
1) Australian Federal Department of Health (2016 - current)
2) Coates Hire Pty Ltd (the largest Australian rental company) (2012 - 2015)
3) Mission Australia Pty Ltd (2014 - current)
4) Australian Research Alliance for Children & Youth (2015 - current)
5) Global Business College Australia (2015 - current)
6) MakeMagic Australia Pty Ltd (2016 - current)
7) Gubei Tech Co. Ltd. (China) (2016 - current)