Dr Wei Liu is a Senior Lecturer and Data Science Research Leader at the School of Computer Science and the Advanced Analytics Institute at the University of Technology Sydney (UTS). Before joining UTS, he was a Machine Learning Researcher and Project Manager at National ICT Australia (now Data61). He obtained his PhD in Data Science from the University of Sydney (USYD). His research outputs are published in prestigious journals and conferences, and he has received three Best Paper Awards.
Dr Liu has been leading industry-driven data analytics research that makes real-world impacts. He has led a number of significant research projects funded by government agencies and industrial organisations, spanning internet security, insurance, trading, transportation, and infrastructure sectors. He has developed advanced data mining models and software tools that accurately identify causes of network incidents. He has also designed cutting-edge predictive models for problems including rare event prediction, fraud/intrusion detection, and emerging trends detection.
Dr Liu is a program committee member of world-leading international Data Mining and Artificial Intelligence conferences, such as KDD (the ACM SIGKDD Conference on Knowledge Discovery and Data Mining), AAAI (the AAAI Conference on Artificial Intelligence), and ICDM (the IEEE International Conference on Data Mining). He has also been a reviewer for top journals such as TKDE, TNNLS, TKDD and TPAMI.
Can supervise: YES
Main Research Interests:
- Tensor and matrix factorization
- Deep learning, Representation learning
- Causal inference, Granger causality
- Game theoretical modeling, Adversarial learning
- Graph mining, Dynamic network analysis
- Anomaly and Outlier detection
Data Mining and Knowledge Discovery; Data Analytics; Databases
Yang, P, Ormerod, JT, Liu, W, Ma, C, Zomaya, AY & Yang, JYH 2019, 'AdaSampling for Positive-Unlabeled and Label Noise Learning With Bioinformatics Applications', IEEE Transactions on Cybernetics, vol. 49, no. 5, pp. 1932-1943.
Class labels are required for supervised learning but may be corrupted or missing in various applications. In binary classification, for example, when only a subset of positive instances is labeled whereas the remaining are unlabeled, positive-unlabeled (PU) learning is required to model from both positive and unlabeled data. Similarly, when class labels are corrupted by mislabeled instances, methods are needed for learning in the presence of class label noise (LN). Here we propose adaptive sampling (AdaSampling), a framework for both PU learning and learning with class LN. By iteratively estimating the class mislabeling probability with an adaptive sampling procedure, the proposed method progressively reduces the risk of selecting mislabeled instances for model training and subsequently constructs highly generalizable models even when a large proportion of mislabeled instances is present in the data. We demonstrate the utility of the proposed methods using simulation and benchmark data, and compare them to alternative approaches that are commonly used for PU learning and/or learning with LN. We then introduce two novel bioinformatics applications where AdaSampling is used to: 1) identify kinase-substrates from mass spectrometry-based phosphoproteomics data and 2) predict transcription factor target genes by integrating various next-generation sequencing data.
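The adaptive-sampling loop can be illustrated with a minimal sketch on synthetic 1-D data, using scikit-learn's logistic regression in place of the paper's models (the data, sample sizes, and iteration count here are all made up for illustration): unlabeled instances currently believed to be negative are preferentially sampled as the negative training set, and those beliefs are re-estimated each round.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy PU data: labeled positives near +2; the unlabeled pool mixes
# 50 hidden positives (also near +2) with 150 true negatives (near -2).
X_pos = rng.normal(2.0, 1.0, (100, 1))
X_unl = np.vstack([rng.normal(2.0, 1.0, (50, 1)),
                   rng.normal(-2.0, 1.0, (150, 1))])

# Initially, treat every unlabeled instance as equally likely to be negative.
p_neg = np.full(len(X_unl), 1.0)

for _ in range(5):
    # Adaptive sampling: draw a "negative" set, preferring instances
    # currently believed to be negative.
    idx = rng.choice(len(X_unl), size=100, replace=False,
                     p=p_neg / p_neg.sum())
    X = np.vstack([X_pos, X_unl[idx]])
    y = np.r_[np.ones(len(X_pos)), np.zeros(100)]
    clf = LogisticRegression().fit(X, y)
    # Re-estimate each unlabeled instance's probability of being negative.
    p_neg = clf.predict_proba(X_unl)[:, 0] + 1e-6

# Hidden positives (first 50 rows) end up with low negative-probability,
# so they are progressively excluded from the negative training set.
print(p_neg[:50].mean() < p_neg[50:].mean())
```

Each round, the mislabeling risk estimate sharpens: instances the current model scores as positive become less likely to be sampled as negatives in the next round.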
Yin, Z, Wang, F, Liu, W & Chawla, S 2018, 'Sparse Feature Attacks in Adversarial Learning', IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 6, pp. 1164-1177.
Adversarial learning is the study of machine learning techniques deployed in non-benign environments. Example applications include classification for detecting spam, network intrusion detection, and credit card scoring. In fact, as the use of machine learning grows in diverse application domains, the possibility for adversarial behavior is likely to increase. When adversarial learning is modelled in a game-theoretic setup, the standard assumption about the adversary (player) behavior is the ability to change all features of the classifier (the opponent player) at will. The adversary pays a cost proportional to the size of the 'attack'. We refer to this form of adversarial behavior as a dense feature attack. However, the aim of an adversary is not just to subvert a classifier but to carry out data transformation in a way such that spam continues to remain effective. We demonstrate that an adversary could potentially achieve this objective by carrying out a sparse feature attack. We design an algorithm to show how a classifier should be designed to be robust against sparse adversarial attacks. Our main insight is that sparse feature attacks are best defended by designing classifiers which use ℓ1 regularizers.
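The defensive insight — ℓ1 regularization against sparse attacks — can be sketched on toy data (the feature layout, attack budget, and regularization strength below are invented for illustration, not taken from the paper): an ℓ1-regularized classifier zeroes out irrelevant features, so an attack that pours its budget into such a feature is wasted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Toy spam-like data: only the first 3 of 20 features carry real signal.
n, d = 400, 20
X = rng.normal(size=(n, d))
y = (X[:, :3].sum(axis=1) > 0).astype(int)

# Two learners: a dense (l2) model and the l1-regularized defense.
dense = LogisticRegression(penalty='l2').fit(X, y)
sparse = LogisticRegression(penalty='l1', solver='liblinear', C=0.5).fit(X, y)

# A sparse feature attack: the adversary changes a single (irrelevant)
# feature heavily instead of nudging all features a little.
X_adv = X.copy()
X_adv[:, 10] += 20.0

acc_dense = dense.score(X_adv, y)
acc_sparse = sparse.score(X_adv, y)

# The l1 model drives many weights to exactly zero, so attacks on those
# features cannot move its decision; the l2 model keeps small nonzero
# weights everywhere and remains exposed.
n_zero_dense = int((np.abs(dense.coef_) < 1e-6).sum())
n_zero_sparse = int((np.abs(sparse.coef_) < 1e-6).sum())
print(n_zero_sparse > n_zero_dense)
```

The zero-weight count is the mechanism: a feature with weight exactly zero contributes nothing to the decision, no matter how far the adversary moves it.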
Nguyen, H, Liu, W & Chen, F 2017, 'Discovering Congestion Propagation Patterns in Spatio-Temporal Traffic Data', IEEE Transactions on Big Data, vol. 3, no. 2, pp. 169-180.
Traffic congestion is a condition of a segment in the road network where the traffic demand is greater than the available road capacity. The detection of unusual traffic patterns including congestions is a significant research problem in the data mining and knowledge discovery community. However, to the best of our knowledge, the discovery of propagations, or causal interactions among detected traffic congestions, has not been appropriately investigated before. In this research, we introduce algorithms which construct causality trees from congestions and estimate their propagation probabilities based on temporal and spatial information of the congestions. Frequent sub-structures of these causality trees reveal not only recurring interactions among spatio-temporal congestions, but potential bottlenecks or flaws in the designs of existing traffic networks. Our algorithms have been validated by experiments on a travel time data set recorded from an urban road network.
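A minimal sketch of the causality-tree construction, with hypothetical congestion events, road adjacency, and lag threshold (the paper additionally estimates propagation probabilities; here recurring parent-to-child edges are simply counted):

```python
from collections import defaultdict

# Hypothetical congestion events: (segment_id, start_minute).
events = [('A', 0), ('B', 4), ('C', 7), ('A', 60), ('B', 63)]
# Hypothetical road adjacency: congestion can propagate along these links.
adjacent = {('A', 'B'), ('B', 'C')}
MAX_LAG = 10  # minutes within which propagation is considered plausible

# Link each event to its most recent plausible upstream cause,
# yielding a forest of causality trees.
edges = defaultdict(int)
for seg_b, t_b in events:
    causes = [(seg_a, t_a) for seg_a, t_a in events
              if (seg_a, seg_b) in adjacent and 0 < t_b - t_a <= MAX_LAG]
    if causes:
        parent = max(causes, key=lambda e: e[1])  # most recent cause wins
        edges[(parent[0], seg_b)] += 1

# Frequent edges hint at recurring propagation (possible network bottlenecks).
print(dict(edges))  # {('A', 'B'): 2, ('B', 'C'): 1}
```

Here the A-to-B link recurs in both rush periods, which is the kind of frequent sub-structure the paper mines as a potential design flaw in the network.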
Pang, LX, Chawla, S, Liu, W & Zheng, Y 2013, 'On detection of emerging anomalous traffic patterns using GPS data', Data & Knowledge Engineering, vol. 87, pp. 357-373.
Liu, W, Yin, Z & Chawla, S 2019, 'Adversarial Attack, Defense, and Applications with Deep Learning Frameworks' in Alazab, M & Tang, M (eds), Deep Learning Applications for Cyber Security, Springer.
This book addresses questions of how deep learning methods can be used to advance cyber security objectives, including detection, modeling, monitoring and analysis of, as well as defense against, various threats to sensitive data and security ...
Liu, W & Quan, D 2018, 'Scalable Multimodal Factorization for Learning from Very Big Data' in Seng, KP, Ang, L-M, Gao, J & Liew, AW-C (eds), Multimodal Analytics for Next-Generation Big Data Technologies and Applications, Springer, Switzerland.
Recent technology advances in data acquisition bring to research communities new opportunities as well as new challenges. They enable researchers to acquire multiple modes of information about the real world. This multimodal data can be naturally and efficiently represented by a multi-way structure, so-called tensors, which can be analyzed to extract the underlying core patterns of the observed data. Multiple datasets obtained from different acquisition methods and sensors are increasingly available. The increasing availability of multiple modalities, captured in correlated tensors, provides a complete picture of the whole data patterns. Given large-scale datasets, existing distributed methods for joint analysis of multi-dimensional data generated from multiple sources decompose them on several computing nodes following the Map-Reduce paradigm. How to improve the performance of Map-Reduce based factorization algorithms as observed data gets bigger is still an open problem. This requires an even more efficient solution that not only reduces communication overhead but also optimizes factors faster.
In this book chapter, we provide readers knowledge about Tensor Factorization and joint analysis of several correlated Tensors. We propose a Scalable Multimodal Factorization (SMF) algorithm for analyzing correlated big multimodal data. It has two key features to enable big multimodal data analysis. Firstly, SMF's design, based on Apache Spark, enables it to have the smallest communication cost. Secondly, its optimized solver converges faster. These key advantages reduce factorization's time complexity. As a result, SMF's performance is extremely efficient as the data increases. Confirmed by our experiments with 1 billion known entries, SMF outperforms the currently fastest Coupled Tensor Factorization and Tensor Factorization by 17.8 and 3.8 times, respectively. Compellingly, SMF achieves this speed with the highest accuracy.
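The coupled-factorization objective behind this family of methods can be sketched in a toy, in-memory form: two matrices sharing one mode are jointly factorized by alternating least squares in plain NumPy. This is only an analogue for intuition — SMF itself is a distributed Apache Spark tensor method, and the sizes and rank below are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two matrices coupled through their first mode (e.g. users x items
# and users x tags), generated from a shared rank-4 factor U_true.
U_true = rng.normal(size=(30, 4))
V_true = rng.normal(size=(20, 4))
W_true = rng.normal(size=(15, 4))
X = U_true @ V_true.T
Y = U_true @ W_true.T

# Random initial factors.
U = rng.normal(size=(30, 4))
V = rng.normal(size=(20, 4))
W = rng.normal(size=(15, 4))

for _ in range(30):
    # Alternating least squares: the shared factor U is updated
    # against BOTH coupled matrices at once.
    A = V.T @ V + W.T @ W
    U = np.linalg.solve(A.T, (X @ V + Y @ W).T).T
    V = np.linalg.solve(U.T @ U, (X.T @ U).T).T
    W = np.linalg.solve(U.T @ U, (Y.T @ U).T).T

err = np.linalg.norm(X - U @ V.T) + np.linalg.norm(Y - U @ W.T)
print(err < 1e-6)  # True: the exact low-rank structure is recovered
```

In a distributed setting, the cost of shipping intermediate products such as `X @ V` between nodes is exactly the communication overhead the chapter's design aims to minimize.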
Chivukula, AS, Li, J & Liu, W 2018, 'Discovering granger-causal features from deep learning networks', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), The 31st Australasian Joint Conference on Artificial Intelligence, Wellington, New Zealand, pp. 692-705.
© Springer Nature Switzerland AG 2018. In this research, we propose deep networks that discover Granger causes from multivariate temporal data generated in financial markets. We introduce a Deep Neural Network (DNN) and a Recurrent Neural Network (RNN) that discover Granger-causal features for bivariate regression on bivariate time series data distributions. These features are subsequently used to discover Granger-causal graphs for multivariate regression on multivariate time series data distributions. Our supervised feature learning process in proposed deep regression networks has favourable F-tests for feature selection and t-tests for model comparisons. The experiments, minimizing root mean squared errors in the regression analysis on real stock market data obtained from Yahoo Finance, demonstrate that our causal features significantly improve the existing deep learning regression models.
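The Granger-causality F-test underlying these features can be sketched with ordinary lag regression on synthetic series (plain least squares stands in for the paper's deep networks; the coefficients and lag order below are made up): x Granger-causes y if adding x's lags significantly reduces the residual sum of squares of a model that already uses y's own lags.

```python
import numpy as np

rng = np.random.default_rng(0)
T, p = 500, 2  # series length and lag order

# Synthetic series: x drives y with a one-step delay, so x Granger-causes y.
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

def lagged(series, p, T):
    # Column k holds lag k+1 of the series, aligned with targets y[p:].
    return np.column_stack([series[p - k - 1:T - k - 1] for k in range(p)])

Y = y[p:]
restricted = np.column_stack([np.ones(T - p), lagged(y, p, T)])
unrestricted = np.column_stack([restricted, lagged(x, p, T)])

def rss(A, b):
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    r = b - A @ coef
    return r @ r

rss_r, rss_u = rss(restricted, Y), rss(unrestricted, Y)
# F-test: do x's lags significantly reduce the residual sum of squares?
F = ((rss_r - rss_u) / p) / (rss_u / (T - p - unrestricted.shape[1]))
print(F > 10)  # True: strong evidence that x Granger-causes y
```

The paper's contribution is to compute such tests over features learned by DNN/RNN regressors rather than raw lags, but the restricted-vs-unrestricted F comparison is the same.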
Gong, Y, Li, Z, Zhang, J, Liu, W, Zheng, Y & Kirsch, C 2018, 'Network-wide Crowd Flow Prediction of Sydney Trains via customized Online Non-negative Matrix Factorization', ACM International Conference on Information and Knowledge Management, ACM, Turin, Italy.
Liu, W & Chivukula, A 2018, 'Discovering Granger-causal Features from Deep Learning Networks', The 31st Australasian Joint Conference on Artificial Intelligence, Springer, Wellington, New Zealand, pp. 692-692.
Verma, S, Liu, W, Wang, C & Zhu, L 2018, 'Hybrid networks: Improving deep learning networks via integrating two views of images', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), International Conference on Neural Information Processing, Springer, Siem Reap, Cambodia, pp. 46-58.
The principal component analysis network (PCANet) is an unsupervised parsimonious deep network, utilizing principal components as filters in the layers. It creates an amalgamated view of the data by transforming it into column vectors, which destroys its spatial structure while obtaining the principal components. In this research, we first propose a tensor-factorization based method referred to as the Tensor Factorization Networks (TFNet). The TFNet retains the spatial structure of the data by preserving its individual modes. This representation provides a minutiae view of the data while extracting matrix factors. However, the above methods are restricted to extracting a single representation and thus incur information loss. To alleviate this information loss, we propose the Hybrid Network (HybridNet) to simultaneously learn filters from both views of the data. Comprehensive results on multiple benchmark datasets validate the superiority of integrating both views of the data in our proposed HybridNet.
Wang, S, Hu, L, Cao, L, Huang, X, Lian, D & Liu, W 2018, 'Attention-based transactional context embedding for next-item recommendation', Proceedings of the AAAI Conference on Artificial Intelligence, AAAI, New Orleans, United States, pp. 2532-2539.
To recommend the next item to a user in a transactional context is practical yet challenging in applications such as marketing campaigns. Transactional context refers to the items that are observable in a transaction. Most existing transaction-based recommender systems (TBRSs) make recommendations by mainly considering recently occurring items instead of all the ones observed in the current context. Moreover, they often assume a rigid order between items within a transaction, which is not always practical. More importantly, a long transaction often contains many items irrelevant to the next choice, which tends to overwhelm the influence of a few truly relevant ones. Therefore, we posit that a good TBRS should not only consider all the observed items in the current transaction but also weight them with different relevance to build an attentive context that outputs the proper next item with a high probability. To this end, we design an effective attention-based transaction embedding model (ATEM) for context embedding to weight each observed item in a transaction without assuming order. The empirical study on real-world transaction datasets proves that ATEM significantly outperforms the state-of-the-art methods in terms of both accuracy and novelty.
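The order-free attentive context can be sketched as softmax-weighted pooling over item embeddings (the embedding tables and attention vector below are random stand-ins; ATEM learns all of these jointly):

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, d = 50, 8

# Hypothetical learned parameters (random here; trained in the real model).
item_emb = rng.normal(size=(n_items, d))   # input item embeddings
att_w = rng.normal(size=d)                 # attention scoring vector
out_emb = rng.normal(size=(n_items, d))    # output item embeddings

def next_item_scores(transaction):
    E = item_emb[transaction]              # embeddings of observed items
    a = np.exp(E @ att_w)
    a /= a.sum()                           # softmax attention weights
    context = a @ E                        # attentive context embedding
    return out_emb @ context               # score every candidate next item

scores = next_item_scores([3, 17, 42])
print(scores.shape)  # (50,)
# No rigid order is assumed: shuffling the transaction changes nothing.
print(np.allclose(scores, next_item_scores([42, 3, 17])))  # True
```

The attention weights let one relevant item dominate a long transaction full of irrelevant ones, which is exactly the failure mode of plain averaging that the paper targets.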
Braytee, A, Liu, W & Kennedy, PJ 2017, 'Supervised context-aware non-negative matrix factorization to handle high-dimensional high-correlated imbalanced biomedical data', Proceedings of the International Joint Conference on Neural Networks, 2017 International Joint Conference on Neural Networks, Anchorage, AK, USA, pp. 4512-4519.
Traditional feature selection techniques are used to identify a subset of the most useful features, and consider the rest as unimportant, redundant or noisy. In the presence of highly correlated features, many variable selection methods treat correlated features as redundant and remove them. In this paper, a novel supervised feature selection algorithm, SCANMF, is proposed by jointly integrating correlation analysis and structural analysis of the balanced supervised non-negative matrix factorization (NMF). Furthermore, an ℓ2,1-norm minimization constraint is incorporated into the objective function to guarantee sparsity in the feature matrix rows and reduce noisy features. Our algorithm exploits the discriminative information, feature combinations, and the original features in the context of a supervised NMF method, which can be beneficial for both classification and interpretation. An efficient iterative algorithm is designed to solve the constrained optimization problem with guaranteed convergence. Finally, a series of extensive experiments are conducted on 8 complex datasets. Promising results using multiple classifiers demonstrate the effectiveness and efficiency of our algorithm over state-of-the-art methods.
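The row-sparsity effect of an ℓ2,1 constraint can be illustrated with its proximal operator on a small hand-made coefficient matrix (this is not SCANMF's iterative NMF scheme, just the mechanism by which the norm removes whole features):

```python
import numpy as np

# Feature coefficient matrix: rows = features, columns = latent components.
W = np.array([[ 1.00,  0.50,  0.20],
              [ 0.80, -1.20,  0.30],
              [ 0.02,  0.01, -0.02],   # weak (noisy) feature
              [-0.50,  0.90, -0.40],
              [ 0.03, -0.01,  0.02]])  # weak (noisy) feature

# The l2,1 norm sums the l2 norms of the rows.
l21 = np.linalg.norm(W, axis=1).sum()

def prox_l21(W, lam):
    """Proximal operator of lam * ||W||_{2,1}: shrink each row jointly."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return W * scale

W_sparse = prox_l21(W, lam=0.3)
zero_rows = np.where(~W_sparse.any(axis=1))[0]
print(zero_rows)  # [2 4]: weak features are removed as entire rows
```

Because the shrinkage acts on whole rows rather than single entries, a feature is either kept across all components or discarded entirely — which is what makes the norm a feature selector.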
Braytee, A, Liu, W, Catchpoole, DR & Kennedy, PJ 2017, 'Multi-label feature selection using correlation information', International Conference on Information and Knowledge Management, Proceedings, ACM on Conference on Information and Knowledge Management, ACM, Singapore, Singapore, pp. 1649-1656.
High-dimensional multi-labeled data contain instances, where each instance is associated with a set of class labels and has a large number of noisy and irrelevant features. Feature selection has been shown to have great benefits in improving the classification performance in machine learning. In multi-label learning, to select the discriminative features among multiple labels, several challenges should be considered: interdependent labels, different instances sharing different label correlations, correlated features, and missing and flawed labels. This work is part of a project at The Children's Hospital at Westmead (TB-CHW), Australia, to explore the genomics of childhood leukaemia. In this paper, we propose CMFS (a Correlated and Multi-label Feature Selection method), based on non-negative matrix factorization (NMF), for simultaneously performing feature selection and addressing the aforementioned challenges. Significantly, a major advantage of our research is to exploit the correlation information contained in features, labels and instances to select the relevant features among multiple labels. Furthermore, ℓ2,1-norm regularization is incorporated in the objective function to undertake feature selection by imposing sparsity on the feature matrix rows. We employ CMFS to decompose the data and multi-label matrices into a low-dimensional space. To solve the objective function, an efficient iterative optimization algorithm is proposed with guaranteed convergence. Finally, extensive experiments are conducted on high-dimensional multi-labeled datasets. The experimental results demonstrate that our method significantly outperforms state-of-the-art multi-label feature selection methods.
Chivukula, AS & Liu, W 2017, 'Adversarial learning games with deep learning models', Proceedings of the International Joint Conference on Neural Networks, International Joint Conference on Neural Networks, IEEE, Anchorage, AK, USA, pp. 2758-2767.
Deep learning has been found to be vulnerable to changes in the data distribution. This means that inputs that have an imperceptibly and immeasurably small difference from training data correspond to a completely different class label in deep learning. Thus an existing deep learning network like a Convolutional Neural Network (CNN) is vulnerable to adversarial examples. We design an adversarial learning algorithm for supervised learning in general and CNNs in particular. Adversarial examples are generated by a game theoretic formulation on the performance of deep learning. In the game, the interaction between an intelligent adversary and deep learning model is a two-person sequential noncooperative Stackelberg game with stochastic payoff functions. The Stackelberg game is solved by the Nash equilibrium which is a pair of strategies (learner weights and genetic operations) from which there is no incentive for either learner or adversary to deviate. The algorithm performance is evaluated under different strategy spaces on MNIST handwritten digits data. We show that the Nash equilibrium leads to solutions robust to subsequent adversarial data manipulations. Results suggest that game theory and stochastic optimization algorithms can be used to study performance vulnerabilities in deep learning models.
Do, Q, Liu, W & Chen, F 2017, 'Discovering both explicit and implicit similarities for cross-domain recommendation', Advances in Knowledge Discovery and Data Mining (LNAI), Pacific Asia Conference on Advances in Knowledge Discovery and Data Mining, Springer, Jeju, South Korea, pp. 618-630.
Recommender systems have become one of the most important techniques for businesses today. Improving their performance requires a thorough understanding of latent similarities among users and items. This issue is addressable given the recent abundance of datasets across domains. However, the question of how to utilize this cross-domain rich information to improve recommendation performance is still an open problem. In this paper, we propose a cross-domain recommender as the first algorithm utilizing both explicit and implicit similarities between datasets across sources for performance improvement. Validated on real-world datasets, our proposed idea outperforms the current cross-domain recommendation methods by more than 2 times. Yet, the more interesting observation is that both explicit and implicit similarities between datasets help to better suggest unknown information from cross-domain sources.
Luo, L, Liu, W, Koprinska, I & Chen, F 2015, 'DAAR: A discrimination-aware association rule classifier for decision support', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Conference on Big Data Analytics and Knowledge Discovery (DAWAK), Springer, Spain, pp. 47-68.
Undesirable correlations between sensitive attributes (such as race, gender or personal status) and the class label (such as recruitment decision and approval of credit card) may lead to biased decisions in data analytics. In this paper, we investigate how to build discrimination-aware models even when the available training set is intrinsically discriminating based on the sensitive attributes. We propose a new classification method called Discrimination-Aware Association Rule classifier (DAAR), which integrates a new discrimination-aware measure and an association rule mining algorithm. We evaluate the performance of DAAR on three real datasets from different domains and compare DAAR with two non-discrimination-aware classifiers (a standard association rule classification algorithm and the state-of-the-art association rule algorithm SPARCCC), and also with a recently proposed discrimination-aware decision tree method. Our comprehensive evaluation is based on three measures: predictive accuracy, discrimination score and inclusion score. The results show that DAAR is able to effectively filter out the discriminatory rules and decrease the discrimination severity on all datasets with insignificant impact on the predictive accuracy. We also find that DAAR generates a small set of rules that are easy to understand and apply, helping users make discrimination-free decisions.
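A discrimination score of the general kind such classifiers filter on can be computed in a few lines (the records below are hypothetical, and this is the widely used approval-rate-gap baseline, not DAAR's own rule-level measure):

```python
# Hypothetical applicant records: (gender, outcome), outcome 1 = approved.
records = [('F', 1), ('F', 0), ('F', 0), ('F', 0),
           ('M', 1), ('M', 1), ('M', 1), ('M', 0)]

def rate(group):
    """Approval rate within one value of the sensitive attribute."""
    hits = [o for g, o in records if g == group]
    return sum(hits) / len(hits)

# Gap in approval rates between groups:
# 0 means parity, larger values mean more discrimination.
disc_score = rate('M') - rate('F')
print(disc_score)  # 0.5
```

A discrimination-aware classifier would reject rules whose score exceeds a threshold, keeping only rules that predict the outcome without leaning on the sensitive attribute.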
Verma, S, Liu, W, Wang, C & Zhu, L 2017, 'Extracting highly effective features for supervised learning via simultaneous tensor factorization', Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI, San Francisco, USA, pp. 4995-4996.
Real world data is usually generated over multiple time periods associated with multiple labels, which can be represented as multiple labeled tensor sequences. These sequences are linked together, sharing some common features while exhibiting their own unique features. Conventional tensor factorization techniques are limited to extracting either common or unique features, but not both simultaneously. However, both types of these features are important in many machine learning systems as they inherently affect the systems' performance. In this paper, we propose a novel supervised tensor factorization technique which simultaneously extracts ordered common and unique features. Classification results using features extracted by our method on the CIFAR-10 database achieve significantly better performance than other factorization methods, illustrating the effectiveness of the proposed technique.
Braytee, A, Catchpoole, DR, Kennedy, PJ & Liu, W 2016, 'Balanced Supervised Non-Negative Matrix Factorization for Childhood Leukaemia Patients', CIKM '16 Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, ACM International Conference on Information and Knowledge Management, ACM, Indianapolis, Indiana, USA.
Supervised feature extraction methods have received considerable attention in the data mining community due to their capability to improve the classification performance of the unsupervised dimensionality reduction methods. With increasing dimensionality, several methods based on supervised feature extraction are proposed to achieve a feature ranking, especially on microarray gene expression data. This paper proposes a method with twofold objectives: it implements a balanced supervised non-negative matrix factorization (BSNMF) to handle the class imbalance problem in supervised non-negative matrix factorization techniques. Furthermore, it proposes an accurate gene ranking method based on our proposed BSNMF for microarray gene expression datasets. To the best of our knowledge, this is the first work to handle the class imbalance problem in supervised feature extraction methods. This work is part of a Human Genome project at The Children's Hospital at Westmead (TB-CHW), Australia. Our experiments indicate that the factorized components using the supervised feature extraction approach have more classification capability than the unsupervised one, but drastically fail in the presence of the class imbalance problem. Our proposed method outperforms the state-of-the-art methods and shows promise in overcoming this concern.
Braytee, A, Liu, W & Kennedy, P 2016, 'A Cost-Sensitive Learning Strategy for Feature Extraction from Imbalanced Data', International Conference on Neural Information Processing, Springer International Publishing, Kyoto, Japan, pp. 78-86.
In this paper, novel cost-sensitive principal component analysis (CSPCA) and cost-sensitive non-negative matrix factorization (CSNMF) methods are proposed for handling the problem of feature extraction from imbalanced data. The presence of highly imbalanced data misleads existing feature extraction techniques to produce biased features, which results in poor classification performance, especially for the minority class. To solve this problem, we propose a cost-sensitive learning strategy for feature extraction techniques that uses the imbalance ratio of classes to discount the majority samples. This strategy is adapted to the popular feature extraction methods such as PCA and NMF. The main advantage of the proposed methods is that they are able to lessen the inherent bias of the extracted features to the majority class in existing PCA and NMF algorithms. Experiments on twelve public datasets with different levels of imbalance ratios show that the proposed methods outperformed the state-of-the-art methods on multiple classifiers.
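The imbalance-ratio discount can be illustrated with a weighted-PCA sketch in plain NumPy (the toy data, class weights, and weighting-by-frequency scheme below are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Imbalanced data: 200 majority samples spread along x, 20 minority along y.
X_maj = rng.normal([0.0, 0.0], [3.0, 0.3], size=(200, 2))
X_min = rng.normal([0.0, 0.0], [0.3, 6.0], size=(20, 2))
X = np.vstack([X_maj, X_min])

def top_component(X, w):
    """First principal component under per-sample weights w (sums to 1)."""
    mu = w @ X
    Xc = (X - mu) * np.sqrt(w)[:, None]   # weighted centring and scaling
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

# Plain PCA: every sample weighted equally -> dominated by the majority class.
pc_plain = top_component(X, np.full(220, 1 / 220))

# Cost-sensitive weighting: discount each sample by its class frequency,
# so both classes contribute equally to the covariance.
w = np.r_[np.full(200, 1 / 200), np.full(20, 1 / 20)]
pc_cost = top_component(X, w / w.sum())

print(abs(pc_plain[0]) > abs(pc_plain[1]))  # True: follows the majority axis
print(abs(pc_cost[1]) > abs(pc_cost[0]))    # True: minority now shapes the basis
```

The weighted covariance gives each class equal total mass, so the extracted basis no longer ignores the directions in which only the minority class varies.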
Cheema, P, Khoa, NLD, Alamdari, MM, Liu, W, Wang, Y, Chen, F & Runcie, P 2016, 'On Structural Health Monitoring Using Tensor Analysis and Support Vector Machine with Artificial Negative Data', CIKM '16 Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, ACM International Conference on Information and Knowledge Management, ACM, Indianapolis, Indiana, USA, pp. 1813-1822.
Structural health monitoring is a condition-based technology to monitor infrastructure using sensing systems. Since we usually only have data associated with the healthy state of a structure, one-class approaches are more practical. However, tuning the parameters for one-class techniques (like one-class Support Vector Machines) still remains a relatively open and difficult problem. Moreover, in structural health monitoring, data are usually multi-way, highly redundant and correlated, so a matrix-based two-way approach cannot capture all these relationships and correlations together. Tensor analysis allows us to analyse the multi-way vibration data at the same time. In our approach, we propose the use of tensor learning and support vector machines with artificial negative data generated by density estimation techniques for damage detection, localization and estimation in a one-class manner. The artificial negative data can help tuning SVM parameters and calibrating probabilistic outputs, which is not possible to do with one-class SVM. The proposed method shows promising results using data from laboratory-based structures and also with data collected from the Sydney Harbour Bridge, one of the most iconic structures in Australia. The method works better than the one-class approach and the approach without using tensor analysis.
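The artificial-negative idea can be sketched with uniform box sampling and a probabilistic SVM (a simplification of the paper's density-estimation approach, on invented 2-D "vibration feature" data): generating negatives turns the one-class problem into a binary one with calibrated probabilities.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# "Healthy state" data only (the one-class setting).
X_healthy = rng.normal(0.0, 1.0, size=(300, 2))

# Artificial negatives: uniform samples over an inflated bounding box,
# a crude stand-in for the paper's density-estimation-based generation.
lo, hi = X_healthy.min(0) - 2, X_healthy.max(0) + 2
X_neg = rng.uniform(lo, hi, size=(300, 2))

X = np.vstack([X_healthy, X_neg])
y = np.r_[np.ones(300), np.zeros(300)]

# A binary SVM with probabilistic outputs replaces the one-class SVM,
# making parameter tuning and probability calibration straightforward.
clf = SVC(probability=True).fit(X, y)

healthy_prob = clf.predict_proba([[0.0, 0.0]])[0, 1]  # centre: healthy
damaged_prob = clf.predict_proba([[4.0, 4.0]])[0, 1]  # far away: anomalous
print(healthy_prob > 0.5 > damaged_prob)
```

Because both classes are now present, standard cross-validation can tune the SVM's parameters, which is exactly the difficulty the paper notes for one-class SVMs.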
Chen, Q, Hu, L, Xu, J, Liu, W & Cao, L 2015, 'Document Similarity Analysis via Involving Both Explicit and Implicit Semantic Couplings', Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015, International Conference on Data Science and Advanced Analytics, IEEE, Paris.
Do, D & Liu, W 2016, 'ASTEN: an Accurate and Scalable Approach to Coupled Tensor Factorization', The International Joint Conference on Neural Networks, IEEE, Vancouver, Canada, pp. 99-106.
Coupled Tensor Factorization (CTF) has become one of the most popular methods for joint analysis of high dimensional data generated from multiple sources. The goal of CTF is to factorize correlated datasets into latent factors efficiently. This research was undertaken with a particular goal of improving the accuracy of CTF. It is important to optimize the factorization of each single tensor of the coupled tensors. To achieve this, we introduce ASTEN, an Accurate and Scalable Tensor factorization method, where the objective function is optimized with respect to every single tensor and matrix. Differing from algorithms with a traditional objective function which forces shared modes among tensors to have identical factors, ASTEN enables each tensor to have its own discriminative factor on the shared mode and thus is capable of finding the accurate approximation of every tensor. Furthermore, to make it highly scalable in handling big data, we design it to be fully distributed and scalable with respect to the number of tensors, their dimensions, their sizes and the number of data partitions. In addition, we provide our theoretical proof and experimental evidence that our algorithm converges to an optimum. Experiments on both real and synthetic datasets demonstrate that our proposed ASTEN outperforms alternative existing algorithms.
Do, Q, Pham, T, Liu, W & Ramamohanarao, K 2016, 'WTEN: An Advanced Coupled Tensor Factorization Strategy for Learning from Imbalanced Data', Web Information Systems Engineering – WISE 2016, International Conference on Web Information Systems Engineering, Springer International Publishing, Shanghai, China, pp. 537-552.
Learning from imbalanced and sparse data in multi-mode and high-dimensional tensor formats efficiently is a significant problem in data mining research. On one hand, Coupled Tensor Factorization (CTF) has become one of the most popular methods for joint analysis of heterogeneous sparse data generated from different sources. On the other hand, techniques such as sampling, cost-sensitive learning, etc. have been applied to many supervised learning models to handle imbalanced data. This research focuses on studying the effectiveness of combining advantages of both CTF and imbalanced data learning techniques for missing entry prediction, especially for entries with rare class labels. Importantly, we have also investigated the implication of joint analysis of the main tensor and extra information. One of our major goals is to design a robust weighting strategy for CTF to be able to not only effectively recover missing entries but also perform well when the entries are associated with imbalanced labels. Experiments on both real and synthetic datasets show that our approach outperforms existing CTF algorithms on imbalanced data.
Liu, W 2016, 'Factorization of multiple tensors for supervised feature extraction', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), International Conference on Neural Information Processing, Springer, Kyoto, Japan, pp. 406-414.View/Download from: UTS OPUS or Publisher's site
Tensors are effective representations for complex and time-varying networks. The factorization of a tensor provides a high-quality low-rank compact basis for each dimension of the tensor, which facilitates the interpretation of important structures of the represented data. Many existing tensor factorization (TF) methods assume there is one tensor that needs to be decomposed into low-rank factors. However in practice, data are usually generated from different time periods or by different class labels, and are represented by a sequence of multiple tensors associated with different labels. When one needs to analyse and compare multiple tensors, existing TF methods are unsuitable for discovering all potentially useful patterns, as they usually fail to discover either common or unique factors among the tensors: (1) if each tensor is factorized separately, the factor matrices will fail to explicitly capture the common information shared by different tensors, and (2) if the tensors are concatenated together to form a larger 'overall' tensor which is then factorized, the intrinsic unique subspaces that are specific to each tensor will be lost. This issue arises mainly because existing tensor factorization methods handle data observations in an unsupervised way, considering only features but not labels of the data. To tackle this problem, we design a novel probabilistic tensor factorization model that takes both features and class labels of tensors into account, and produces informative common and unique factors of all tensors simultaneously. Experimental results on feature extraction in classification problems demonstrate the effectiveness of the factors discovered by our method.
Nguyen, H, Liu, W, Rivera, P & Chen, F 2016, 'TrafficWatch: Real-time traffic incident detection and monitoring using social media', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, New Zealand, pp. 540-551.View/Download from: Publisher's site
Social media has become a valuable source of real-time information. The Transport Management Centre (TMC) of the New South Wales state government in Australia has been collaborating with us to develop TrafficWatch, a system that leverages Twitter as a channel for transport network monitoring and incident and event management. The system utilises advanced web technologies and state-of-the-art machine learning algorithms. Crawled tweets are first filtered to those reporting incidents in Australia, and then divided into groups by online clustering and classification algorithms. Findings from the use of TrafficWatch at TMC demonstrate that it has strong potential to report incidents earlier than other data sources, as well as to identify unreported incidents. TrafficWatch also improves TMC's network monitoring capabilities for assessing the network impacts of incidents and events.
Rashidi, L, Kan, A, Bailey, J, Chan, J, Leckie, C, Liu, W, Rajasegarar, S & Ramamohanarao, K 2016, 'Node re-ordering as a means of anomaly detection in time-evolving graphs', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, Riva del Garda, Italy, pp. 162-178.View/Download from: UTS OPUS or Publisher's site
Anomaly detection is a vital task for maintaining and improving any dynamic system. In this paper, we address the problem of anomaly detection in time-evolving graphs, where graphs are a natural representation for data in many types of applications. A key challenge in this context is how to process large volumes of streaming graphs. We propose a pre-processing step before running any further analysis on the data, in which we permute the rows and columns of the adjacency matrix. This pre-processing step expedites graph mining techniques such as anomaly detection, PageRank, and graph coloring. In this paper, we focus on detecting anomalies in a sequence of graphs based on rank correlations of the reordered nodes. The merits of our approach lie in its simplicity and resilience to challenges such as unsupervised input, large volumes and high velocities of data. We evaluate the scalability and accuracy of our method on real graphs, where it facilitates graph processing while producing more deterministic orderings. We show that the proposed approach is capable of revealing anomalies more efficiently based on node rankings. Furthermore, our method can produce visual representations of graphs that are useful for graph compression.
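The rank-correlation idea can be sketched as follows (a hedged illustration, not the paper's method: here nodes are ordered simply by degree, and consecutive snapshots are scored by the Spearman correlation of their node orderings).

```python
import numpy as np

def degree_ranks(adj):
    degrees = adj.sum(axis=1)
    order = np.argsort(degrees)           # nodes in ascending degree order
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))  # rank of each node
    return ranks

def spearman(r1, r2):
    return np.corrcoef(r1, r2)[0, 1]      # Pearson on ranks = Spearman

rng = np.random.default_rng(1)
base = (rng.random((50, 50)) < 0.1).astype(int)
snapshots = [base, base.copy()]           # two identical snapshots...
perm = rng.permutation(50)
snapshots.append(base[np.ix_(perm, perm)])  # ...then a reshuffled one

ranks = [degree_ranks(a) for a in snapshots]
scores = [spearman(ranks[t], ranks[t + 1]) for t in range(2)]
```

A low correlation between consecutive orderings flags the snapshot as a candidate anomaly.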
Wang, S, Liu, W, Wu, J, Cao, L, Meng, Q & Kennedy, PJ 2016, 'Training deep neural networks on imbalanced data sets', Proceedings of the International Joint Conference on Neural Networks, IEEE International Joint Conference on Neural Networks, IEEE, Vancouver, Canada, pp. 4368-4374.View/Download from: UTS OPUS or Publisher's site
Deep learning has become increasingly popular in both academic and industrial areas in recent years. Various domains including pattern recognition, computer vision, and natural language processing have witnessed the great power of deep networks. However, current studies on deep learning mainly focus on data sets with balanced class labels, and its performance on imbalanced data is not well examined. Imbalanced data sets exist widely in the real world and pose great challenges for classification tasks. In this paper, we focus on the problem of classification using deep networks on imbalanced data sets. Specifically, a novel loss function called mean false error, together with its improved version mean squared false error, is proposed for the training of deep networks on imbalanced data sets. The proposed method can effectively capture classification errors from the majority class and the minority class equally. Experiments and comparisons demonstrate the superiority of the proposed approach over conventional methods in classifying imbalanced data sets with deep neural networks.
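The core intuition behind the mean-false-error loss can be shown in a short numpy sketch (illustrative only; the paper defines the loss for deep network training): average the error within each class first, then sum the per-class means, so the minority class is not drowned out.

```python
import numpy as np

def mean_error(pred, y):
    return np.mean((pred - y) ** 2)       # plain MSE over all samples

def mean_false_error(pred, y):
    # per-class mean errors summed, so both classes contribute equally
    classes = np.unique(y)
    return sum(np.mean((pred[y == c] - y[y == c]) ** 2) for c in classes)

# 90 majority samples predicted well, 10 minority samples predicted badly.
y = np.array([0] * 90 + [1] * 10, dtype=float)
pred = np.array([0.05] * 90 + [0.3] * 10)

mse = mean_error(pred, y)
mfe = mean_false_error(pred, y)
```

Plain MSE stays small because the majority class dominates, while the per-class loss exposes the poor minority-class fit.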
Jiang, X, Liu, W, Cao, L & Long, G 2015, 'Coupled Collaborative Filtering for Context-aware Recommendation', AAAI Publications, Twenty-Ninth AAAI Conference on Artificial Intelligence, Student Abstracts, AAAI Conference on Artificial Intelligence, AAAI, Austin Texas, USA, pp. 4172-4173.View/Download from: UTS OPUS
Context-aware features have been widely recognized as important factors in recommender systems. However, as a major technique in recommender systems, traditional Collaborative Filtering (CF) does not provide a straightforward way of integrating context-aware information into personalized recommendation. We propose a Coupled Collaborative Filtering (CCF) model to measure the contextual information and use it to improve recommendations. In the proposed approach, coupled similarity is computed from the inter-item, intra-context and inter-context interactions among item, user and context-aware factors. Experiments based on different types of CF models demonstrate the effectiveness of our design.
Khoa, NLD, Zhang, B, Wang, Y, Liu, W, Chen, F, Mustapha, S & Runcie, P 2015, 'On Damage Identification in Civil Structures Using Tensor Analysis', Advances in Knowledge Discovery and Data Mining: 19th Pacific-Asia Conference Proceedings, Part 1, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer International Publishing, Ho Chi Minh City, Vietnam, pp. 459-471.View/Download from: Publisher's site
Luo, L, Liu, W, Koprinska, I & Chen, F 2015, 'Discovering causal structures from time series data via enhanced granger causality', AI 2015: Advances in Artificial Intelligence (LNCS), Australasian Joint Conference on Artificial Intelligence, Springer, Canberra, Australia, pp. 365-378.View/Download from: Publisher's site
Granger causality has been applied to explore predictive causal relations among multiple time series in various fields. However, the existence of nonstationary distributional changes among the time series variables poses significant challenges. By analyzing a real dataset, we observe that factors such as noise, distribution changes and shifts increase the complexity of the modelling, and large errors often occur when the underlying distribution shifts with time. Motivated by this challenge, we propose a new regression model for causal structure discovery – a Linear Model with Weighted Distribution Shift (linear WDS), which improves the prediction accuracy of the Granger causality model by taking into account the weights of the distribution-shift samples and by optimizing a quadratic-mean based objective function. The linear WDS is integrated in the Granger causality model to improve the inference of the predictive causal structure. The performance of the enhanced Granger causality model is evaluated on synthetic datasets and real traffic datasets, and the proposed model is compared with three different regression-based Granger causality models (standard linear regression, robust regression and quadratic-mean-based regression). The results show that the enhanced Granger causality model outperforms the other models especially when there are distribution shifts in the data.
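The baseline that the linear WDS model extends can be sketched with a minimal (unweighted) Granger test (a hedged illustration with assumed lag order and synthetic series): x Granger-causes y if adding lagged x to an autoregression of y reduces the residual error.

```python
import numpy as np

def lagged_design(series, lag):
    # column k holds the value at t-(k+1), aligned with target y[lag:]
    return np.column_stack([series[lag - k - 1: len(series) - k - 1]
                            for k in range(lag)])

def residual_sse(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

rng = np.random.default_rng(2)
lag, n = 2, 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()   # x drives y at lag 1

target = y[lag:]
restricted = lagged_design(y, lag)                           # y's own lags only
full = np.column_stack([restricted, lagged_design(x, lag)])  # plus x's lags

sse_restricted = residual_sse(restricted, target)
sse_full = residual_sse(full, target)
```

The paper's contribution is to replace the plain least-squares fit above with a regression that down-weights distribution-shift samples.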
Luo, L, Liu, W, Koprinska, I & Chen, F 2015, 'Discrimination-aware association rule mining for unbiased data analytics', Big Data Analytics and Knowledge Discovery: 17th International Conference, DaWaK 2015, Valencia, Spain, September 1-4, 2015, Proceedings, International Conference on Big Data Analytics and Knowledge Discovery, Springer International Publishing, Valencia; Spain, pp. 108-120.View/Download from: Publisher's site
A discriminatory dataset refers to a dataset with undesirable correlation between sensitive attributes and the class label, which often leads to biased decision making in data analytics processes. This paper investigates how to build discrimination-aware models even when the available training set is intrinsically discriminating based on some sensitive attributes, such as race, gender or personal status. We propose a new classification method called Discrimination-Aware Association Rule classifier (DAAR), which integrates a new discrimination-aware measure and an association rule mining algorithm. We evaluate the performance of DAAR on three real datasets from different domains and compare it with two non-discrimination-aware classifiers (a standard association rule classification algorithm and the state-of-the-art association rule algorithm SPARCCC), and also with a recently proposed discrimination-aware decision tree method. The results show that DAAR is able to effectively filter out the discriminatory rules and decrease the discrimination on all datasets with insignificant impact on the predictive accuracy.
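The notion of an undesirable correlation between a sensitive attribute and the class label can be quantified with a common surrogate measure (a hedged sketch; the exact discrimination-aware measure used by DAAR is defined in the paper): the difference in positive-outcome rates between two sensitive groups.

```python
import numpy as np

def discrimination_score(sensitive, label, positive=1, group_a=0, group_b=1):
    # difference in positive-label rates between the two sensitive groups
    rate_a = np.mean(label[sensitive == group_a] == positive)
    rate_b = np.mean(label[sensitive == group_b] == positive)
    return rate_a - rate_b

# Toy data: group 0 receives the positive label far more often than group 1.
sensitive = np.array([0] * 50 + [1] * 50)
label = np.array([1] * 40 + [0] * 10 + [1] * 10 + [0] * 40)

score = discrimination_score(sensitive, label)
```

Applying such a score to the records matched by each candidate rule is one way to filter out discriminatory rules.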
Shao, J, Yin, J, Liu, W & Cao, L 2015, 'Actionable Combined High Utility Itemset Mining', AAAI'15 Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, AAAI Press, Austin, Texas, USA, pp. 4206-4207.View/Download from: UTS OPUS
Shao, J, Yin, J, Liu, W & Cao, L 2015, 'Mining Actionable Combined Patterns of High Utility and Frequency', Proceedings of the IEEE International Conference on Data Science and Advanced Analytics, IEEE International Conference on Data Science and Advanced Analytics, IEEE, Paris, pp. 1-10.View/Download from: UTS OPUS or Publisher's site
In recent years, the importance of identifying actionable patterns has been increasingly recognized, so that decision-support actions can be inspired by the resultant patterns. A typical shift is towards identifying high utility rather than highly frequent patterns. Accordingly, High Utility Itemset (HUI) Mining methods have become quite popular, as well as faster and more reliable than before. However, the current research focus has been on improving efficiency, while the coupling relationships between items are ignored. It is important to study the item and itemset couplings inbuilt in the data. For example, the utility of one itemset might be lower than the user-specified threshold until an additional itemset is included; conversely, an item's utility might only become high when another item joins it. As a result, even though some absolutely high utility itemsets can be discovered, many redundant itemsets sharing the same items are often mined (e.g., if the utility of a diamond is high enough, all its supersets are also HUIs). Such itemsets are not actionable, and sellers cannot make higher profit if marketing strategies are created on top of such findings. To this end, we introduce a new framework for mining actionable high utility association rules, called Combined Utility-Association Rules (CUAR), which aims to find itemset combinations of high utility and strong association while incorporating item/itemset relations. Experimental outcomes on both real and synthetic datasets show the algorithm to be efficient.
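The redundancy problem described above is easy to see on a toy transaction database (a hedged brute-force sketch of high-utility itemset mining, not the CUAR algorithm; items, profits and the threshold are invented): every itemset containing the high-profit diamond clears the utility threshold.

```python
from itertools import combinations

# Utility of an itemset in a transaction = quantity * unit profit,
# summed over all transactions containing the whole itemset.
profit = {'bread': 1, 'milk': 2, 'diamond': 50}
transactions = [
    {'bread': 3, 'milk': 1},
    {'bread': 1, 'diamond': 1},
    {'milk': 2, 'diamond': 1},
]

def utility(itemset):
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):   # transaction contains itemset
            total += sum(t[item] * profit[item] for item in itemset)
    return total

items = sorted(profit)
all_itemsets = [frozenset(c) for r in (1, 2, 3)
                for c in combinations(items, r)]
threshold = 50
huis = {s for s in all_itemsets if utility(s) >= threshold}
```

Here every HUI found contains the diamond, illustrating why utility alone, without association strength, yields redundant and non-actionable patterns.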
Chan, J, Vinh, NX, Liu, W, Bailey, J, Leckie, CA, Ramamohanarao, K & Pei, J 2014, 'Structure-aware distance measures for comparing clusterings in graphs', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 362-373.View/Download from: Publisher's site
Clustering in graphs aims to group vertices with similar patterns of connections. Applications include discovering communities and latent structures in graphs. Many algorithms have been proposed to find graph clusterings, but an open problem is the need for suitable comparison measures to quantitatively validate these algorithms, to perform consensus clustering, and to track evolving (graph) clusters across time. To date, most comparison measures have focused on comparing the vertex groupings, and completely ignore the difference in the structural approximations in the clusterings, which can lead to counter-intuitive comparisons. In this paper, we propose new measures that account for differences in those approximations. We focus on comparison measures for two important graph clustering approaches, community detection and blockmodelling, and propose comparison measures that work for weighted (and unweighted) graphs.
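The grouping-only baseline that the paper improves on can be sketched with pair counting (a hedged illustration: a plain Rand index over vertex groupings, which by construction ignores the structural approximation the paper's measures account for).

```python
import numpy as np
from itertools import combinations

def rand_index(c1, c2):
    # fraction of vertex pairs on which the two clusterings agree
    agree, total = 0, 0
    for i, j in combinations(range(len(c1)), 2):
        same1 = c1[i] == c1[j]
        same2 = c2[i] == c2[j]
        agree += (same1 == same2)   # pair treated consistently by both
        total += 1
    return agree / total

a = np.array([0, 0, 1, 1, 2, 2])
b = np.array([0, 0, 1, 1, 1, 1])   # merges the last two groups of a

ri_same = rand_index(a, a)
ri_diff = rand_index(a, b)
```

Two clusterings with identical groupings always score 1 here even when they imply very different block structures, which is exactly the counter-intuitive behaviour the paper addresses.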
Liu, W, Lee, D & Rao, K 2014, 'Using local information to significantly improve classification performance', CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management, pp. 1947-1950.View/Download from: Publisher's site
In this research we propose to derive new features based on data samples' local information, with the aim of improving the performance of general supervised learning algorithms. The creation of new features is inspired by the measure of average precision, which is known to be a robust measure that is insensitive to the number of retrieved items in information retrieval. We use the idea of average precision to weight the neighbours of an instance and show that this weighting strategy is insensitive to the number of neighbours in the locality. Information captured in the new features allows a general classifier to learn additional useful peripheral knowledge that is helpful in building effective classification models. We comprehensively evaluate our method on real datasets and the results show substantial improvements in the performance of classifiers including SVM, Bayesian networks, random forest, and C4.5.
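One way to picture the idea is the following sketch (a loose, hedged interpretation; the paper's exact feature construction may differ): treat an instance's k nearest training neighbours as a ranked list, mark positive-class neighbours as "relevant", and use the average precision of that list as a new feature.

```python
import numpy as np

def average_precision(relevant):
    hits, score = 0, 0.0
    for rank, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            score += hits / rank      # precision at each relevant rank
    return score / max(hits, 1)

def local_ap_features(X, y, k=5):
    feats = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                          # exclude the instance itself
        nn = np.argsort(d)[:k]                 # ranked nearest neighbours
        feats.append(average_precision(y[nn] == 1))
    return np.array(feats)

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
f = local_ap_features(X, y)
```

The derived feature summarises the label composition of an instance's neighbourhood in a rank-weighted way, which a downstream classifier can exploit.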
Liu, W, Sarda, A, Chen, F & Geers, G 2014, 'Forecasting changes of traffic flow caused by road incidents', 21st World Congress on Intelligent Transport Systems, ITSWC 2014: Reinventing Transportation in Our Connected World.
This paper explores the potential of supervised machine learning techniques for forecasting changes of traffic flow caused by road incidents, based on incident features. Data fusion approaches are carried out on a high quality SCATS dataset measuring traffic flow in a major Australian city, and on an incident log dataset covering four months of road incidents. Based on incident features, a range of both prevalent and advanced machine learning algorithms are applied to these data, and their accuracies are evaluated. We then examine the effectiveness of such models in categorizing changes of traffic flow as either trivial or non-trivial in the extent of their responses to incidents. The models are promising in their capacity, correctly predicting with more than 70% accuracy that a change of traffic flow will be major. This has significant implications for determining the optimal allocation of resources for both road traffic control and incident response units.
Wang, F, Liu, W & Chawla, S 2014, 'On Sparse Feature Attacks in Adversarial Learning', Proceedings - IEEE International Conference on Data Mining, ICDM, IEEE International Conference on Data Mining, IEEE, Shenzhen, China, pp. 1013-1018.View/Download from: UTS OPUS or Publisher's site
Adversarial learning is the study of machine learning techniques deployed in non-benign environments. Example applications include classification for detecting spam email, network intrusion detection and credit card scoring. In fact, as the gamut of application domains of machine learning grows, the possibility and opportunity for adversarial behavior will only increase. Until now, the standard assumption in modeling adversarial behavior has been to empower an adversary to change all features of the classifier at will, with the adversary paying a cost proportional to the size of the attack. We refer to this form of adversarial behavior as a dense feature attack. However, the aim of an adversary is not just to subvert a classifier but to transform the data in such a way that spam continues to appear like spam to the user as much as possible. We demonstrate that an adversary achieves this objective by carrying out a sparse feature attack. We design an algorithm that shows how a classifier should be constructed to be robust against sparse adversarial attacks. Our main insight is that sparse feature attacks are best defended against by designing classifiers that use l1 regularizers.
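The defensive principle named in the abstract, l1 regularisation, can be sketched directly (a hedged illustration, not the paper's algorithm: l1-penalised logistic regression via proximal gradient descent, where soft-thresholding zeroes weak weights so the classifier does not lean on fragile features).

```python
import numpy as np

def soft_threshold(w, t):
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def train_l1_logistic(X, y, lam=0.1, lr=0.1, iters=300):
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
        grad = X.T @ (p - y) / len(y)      # logistic loss gradient
        w = soft_threshold(w - lr * grad, lr * lam)  # proximal l1 step
    return w

rng = np.random.default_rng(4)
n = 200
informative = rng.normal(size=(n, 2))
noise = rng.normal(scale=0.1, size=(n, 8))      # weak nuisance features
X = np.hstack([informative, noise])
y = (informative @ np.array([2.0, -2.0]) > 0).astype(float)

w = train_l1_logistic(X, y)
```

The l1 penalty drives the nuisance weights to exactly zero, leaving a sparse weight vector that a sparse feature attack has fewer levers against.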
Chan, J, Liu, W, Kan, A, Leckie, C & Bailey, J 2013, 'Discovering latent blockmodels in sparse and noisy graphs using non-negative matrix factorisation', International Conference on Information and Knowledge Management, Proceedings, pp. 811-816.View/Download from: Publisher's site
Blockmodelling is an important technique in social network analysis for discovering the latent structure in graphs. A blockmodel partitions the set of vertices in a graph into groups, where there are either many edges or few edges between any two groups. For example, in the reply graph of a question and answer forum, blockmodelling can identify the group of experts by their many replies to questioners, and the group of questioners by their lack of replies among themselves but many replies from experts. Non-negative matrix factorisation has been successfully applied to many problems, including blockmodelling. However, these existing approaches can fail to discover the true latent structure when the graphs have strong background noise or are sparse, which is typical of most real graphs. In this paper, we propose a new non-negative matrix factorisation approach that can discover blockmodels in sparse and noisy graphs. We use synthetic and real datasets to show that our approach achieves much higher accuracy with comparable running times.
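The connection between NMF and blockmodelling can be sketched as follows (a hedged, generic tri-factorisation baseline with standard multiplicative updates; the paper's method adds further machinery for sparse and noisy graphs): approximate a symmetric adjacency matrix A by C M Cᵀ, where C holds soft group memberships and M is the group-to-group density image.

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 60, 2
A = np.zeros((n, n))
A[:30, :30] = rng.random((30, 30)) < 0.9   # dense within-group edges
A[30:, 30:] = rng.random((30, 30)) < 0.9
A[:30, 30:] = rng.random((30, 30)) < 0.05  # sparse between-group edges
A = np.triu(A, 1)
A = A + A.T                                 # symmetric, no self-loops

C = rng.random((n, k))                      # soft memberships (non-negative)
M = rng.random((k, k))
M = (M + M.T) / 2                           # symmetric block image
eps = 1e-9

def loss():
    return float(np.sum((A - C @ M @ C.T) ** 2))

start = loss()
for _ in range(100):
    # multiplicative updates keep C and M non-negative
    C *= (A @ C @ M) / (C @ M @ C.T @ C @ M + eps)
    CtC = C.T @ C
    M *= (C.T @ A @ C) / (CtC @ M @ CtC + eps)
end = loss()
```

On this planted two-block graph the reconstruction error drops sharply; the hard part the paper tackles is making such a factorisation reliable when the blocks are faint.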
Yang, P, Liu, W, Zhou, BB, Chawla, S & Zomaya, AY 2013, 'Ensemble-based wrapper methods for feature selection and class imbalance learning', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 544-555.View/Download from: Publisher's site
The wrapper feature selection approach is useful in identifying informative feature subsets from high-dimensional datasets. Typically, an inductive algorithm "wrapped" in a search algorithm is used to evaluate the merit of the selected features. However, significant bias may be introduced when dealing with highly imbalanced datasets. That is, the selected features may favour one class while being less useful for the other. In this paper, we propose an ensemble-based wrapper approach for feature selection from data with highly imbalanced class distributions. The key idea is to create multiple balanced datasets from the original imbalanced dataset via sampling, and subsequently evaluate feature subsets using an ensemble of base classifiers, each trained on a balanced dataset. The proposed approach provides a unified framework that incorporates ensemble feature selection and multiple sampling in a mutually beneficial way. The experimental results indicate that, overall, features selected by the ensemble-based wrapper are significantly better than those selected by wrappers with a single inductive algorithm in imbalanced data classification.
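The key idea can be sketched in a few lines (a hedged toy version: the base classifier here is a simple nearest-centroid rule scored in-sample, standing in for the inductive algorithms used in the paper): draw several balanced samples by undersampling the majority class, and score a candidate feature subset by the ensemble's average accuracy.

```python
import numpy as np

def balanced_sample(X, y, rng):
    minority = np.flatnonzero(y == 1)
    majority = rng.choice(np.flatnonzero(y == 0), size=len(minority),
                          replace=False)          # undersample the majority
    idx = np.concatenate([minority, majority])
    return X[idx], y[idx]

def centroid_accuracy(X, y):
    # train a nearest-centroid rule and score it on the same balanced sample
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (np.linalg.norm(X - c1, axis=1) <
            np.linalg.norm(X - c0, axis=1)).astype(int)
    return np.mean(pred == y)

def score_subset(X, y, features, n_models=5, seed=0):
    rng = np.random.default_rng(seed)
    scores = [centroid_accuracy(*balanced_sample(X[:, features], y, rng))
              for _ in range(n_models)]
    return float(np.mean(scores))

rng = np.random.default_rng(5)
n_min, n_maj = 20, 180
X_inf = np.concatenate([rng.normal(2, 1, n_min), rng.normal(0, 1, n_maj)])
X_noise = rng.normal(size=n_min + n_maj)
X = np.column_stack([X_inf, X_noise])
y = np.array([1] * n_min + [0] * n_maj)

good = score_subset(X, y, [0])   # informative feature
bad = score_subset(X, y, [1])    # pure noise feature
```

Because every ensemble member sees a balanced sample, the subset score cannot be inflated by features that only help the majority class.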
Chan, J, Liu, W, Leckie, C, Bailey, J & Ramamohanarao, K 2012, 'SeqiBloc: Mining multi-time spanning blockmodels in dynamic graphs', Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 651-659.View/Download from: Publisher's site
Blockmodelling is an important technique for decomposing graphs into sets of roles. Vertices playing the same role have similar patterns of interactions with vertices in other roles. These roles, along with the role-to-role interactions, can succinctly summarise the underlying structure of the studied graphs. As the underlying graphs evolve with time, it is important to study how their blockmodels evolve too. This will enable us to detect role changes across time, to detect different patterns of interactions, for example, weekday and weekend behaviour, and to study how the structure of the underlying dynamic graph evolves. To date, there has been limited research on dynamic blockmodels, and existing approaches focus on smoothing role changes between adjacent time instances. However, this approach can overfit during stationary periods where the underlying structure does not change but there is random noise in the graph. Therefore, an approach is needed to a) find blockmodels across spans of time and b) find the stationary periods. In this paper, we propose an information theoretic framework, SeqiBloc, combined with a change point detection approach to achieve a) and b). In addition, we propose new vertex equivalence definitions that include time, and show how they relate back to our information theoretic approach. We demonstrate their usefulness and superior accuracy over existing work on synthetic and real datasets.
Liu, W, Chawla, S, Bailey, J, Leckie, C & Ramamohanarao, K 2012, 'An efficient adversarial learning strategy for constructing robust classification boundaries', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 649-660.View/Download from: Publisher's site
Traditional classification methods assume that the training and the test data arise from the same underlying distribution. However in some adversarial settings, the test set can be deliberately constructed in order to increase the error rates of a classifier. A prominent example is email spam where words are transformed to avoid word-based features embedded in a spam filter. Recent research has modeled interactions between a data miner and an adversary as a sequential Stackelberg game, and solved its Nash equilibrium to build classifiers that are more robust to subsequent manipulations on training data sets. However in this paper we argue that the iterative algorithm used in the Stackelberg game, which solves an optimization problem at each step of play, is sufficient but not necessary for achieving Nash equilibria in classification problems. Instead, we propose a method that transforms singular vectors of a training data matrix to simulate manipulations by an adversary, and from that perspective a Nash equilibrium can be obtained by solving a novel optimization problem only once. We show that compared with the iterative algorithm used in recent literature, our one-step game significantly reduces computing time while still being able to produce good Nash equilibria results.
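The singular-vector manipulation can be pictured with a minimal sketch (a hedged illustration only; the paper's actual transformation is more involved): shrink the leading singular value of the training matrix so the dominant direction of the data, the one an adversary would target, is attenuated in the perturbed copy.

```python
import numpy as np

rng = np.random.default_rng(6)
# A data matrix with one clearly dominant direction.
A = rng.normal(size=(40, 8)) @ np.diag([5.0, 2, 1, 1, 1, 1, 1, 1])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_shrunk = s.copy()
s_shrunk[0] *= 0.5                      # attenuate the dominant direction
A_perturbed = U @ np.diag(s_shrunk) @ Vt

# The perturbation is rank-one: it lives entirely in the top singular pair.
delta = A - A_perturbed
```

Because the change is confined to one singular pair, it mimics a structured, low-dimensional manipulation rather than arbitrary noise.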
Liu, W, Kan, A, Chan, J, Bailey, J, Leckie, C, Pei, J & Kotagiri, R 2012, 'On compressing weighted time-evolving graphs', ACM International Conference Proceeding Series, pp. 2319-2322.View/Download from: Publisher's site
Existing graph compression techniques mostly focus on static graphs. However, for many practical graphs such as social networks, the edge weights frequently change over time. This phenomenon raises the question of how to compress dynamic graphs while maintaining most of their intrinsic structural patterns at each time snapshot. In this paper we show that the encoding cost of a dynamic graph is proportional to the heterogeneity of a three dimensional tensor that represents the dynamic graph. We propose an effective algorithm that compresses a dynamic graph by reducing the heterogeneity of its tensor representation, while also maintaining a maximum lossy compression error at any time stamp of the dynamic graph. The bounded compression error benefits compressed graphs in that they retain good approximations of the original edge weights, and hence properties of the original graph (such as shortest paths) are well preserved. To the best of our knowledge, this is the first work that compresses weighted dynamic graphs with bounded lossy compression error at any time snapshot of the graph.
Pang, LX, Chawla, S, Liu, W & Zheng, Y 2011, 'On mining anomalous patterns in road traffic streams', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 237-251.View/Download from: Publisher's site
A large number of taxicabs in major metropolitan cities are now equipped with GPS devices. Since taxis are on the road nearly twenty-four hours a day (with drivers changing shifts), they can act as reliable sensors for monitoring the behavior of traffic. In this paper we use GPS data from taxis to monitor the emergence of unexpected behavior in the Beijing metropolitan area. We adapt likelihood ratio tests (LRT), which have previously been used mostly in epidemiological studies, to describe traffic patterns. To the best of our knowledge, the use of LRT in the traffic domain is not only novel but also results in very accurate and rapid detection of anomalous behavior.
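The flavour of a likelihood ratio test on traffic counts can be shown in a short sketch (hedged: the paper's LRT over spatio-temporal regions is richer; this toy version compares a single region's Poisson rate against the global rate).

```python
import numpy as np

def poisson_loglik(counts, rate):
    # log-likelihood up to -sum(log k!), a constant that cancels in the ratio
    return float(np.sum(counts * np.log(rate) - rate))

def lrt_statistic(region, rest):
    all_counts = np.concatenate([region, rest])
    rate_global = all_counts.mean()            # null: one shared rate
    rate_region, rate_rest = region.mean(), rest.mean()  # alt: separate rates
    null = poisson_loglik(all_counts, rate_global)
    alt = poisson_loglik(region, rate_region) + poisson_loglik(rest, rate_rest)
    return 2.0 * (alt - null)                  # large values flag anomalies

rng = np.random.default_rng(7)
normal_region = rng.poisson(10, 50)
anomalous_region = rng.poisson(25, 50)         # unexpected surge in traffic
rest = rng.poisson(10, 500)

stat_normal = lrt_statistic(normal_region, rest)
stat_anomalous = lrt_statistic(anomalous_region, rest)
```

Regions whose statistic greatly exceeds the usual chi-squared range are reported as anomalous traffic behavior.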