Anjin Liu is a Postdoctoral Research Associate in the Centre for Artificial Intelligence, Faculty of Engineering and Information Technology, University of Technology Sydney. He received the BIT degree (Honours) from the University of Sydney in 2012. His research interests include concept drift detection, adaptive data stream learning, multi-stream learning, machine learning and big data analytics.
Can supervise: YES
Concept Drift, Adaptive Data Stream Learning, Multi-stream Learning
© IEEE. Concept drift describes unforeseeable changes in the underlying distribution of streaming data over time. Concept drift research involves the development of methodologies and techniques for drift detection, understanding and adaptation. Data analysis has revealed that machine learning in a concept drift environment will result in poor learning results if the drift is not addressed. To help researchers identify which research topics are significant and how to apply related techniques in data analysis tasks, it is necessary that a high-quality, instructive review of current research developments and trends in the concept drift field is conducted. In addition, due to the rapid development of concept drift research in recent years, the methodologies of learning under concept drift have become noticeably systematic, unveiling a framework which has not been mentioned in the literature. This paper reviews over 130 high-quality publications in concept drift related research areas, analyzes up-to-date developments in methodologies and techniques, and establishes a framework of learning under concept drift including three main components: concept drift detection, concept drift understanding, and concept drift adaptation. This paper lists and discusses 10 popular synthetic datasets and 14 publicly available benchmark datasets used for evaluating the performance of learning algorithms aimed at handling concept drift. Also, concept drift related research directions are covered and discussed. By providing state-of-the-art knowledge, this survey will directly support researchers in their understanding of research developments in the field of learning under concept drift.
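As a minimal illustration of the detection component of this framework, the sketch below monitors a learner's online error rate in the style of the error-rate-based detectors the survey reviews. The function name and thresholds are illustrative, not taken from the paper:

```python
import math

def error_rate_drift(errors, warn=2.0, drift=3.0):
    """Illustrative error-rate-based drift detector: track the running
    error rate p and its binomial std s, remember the best (p_min, s_min)
    seen so far, and flag a warning or drift when p + s rises too far
    above it. Thresholds are illustrative, not from any specific paper."""
    p, p_min, s_min, status = 0.0, 1.0, 1.0, []
    for i, e in enumerate(errors, 1):
        p += (e - p) / i                    # running error rate
        s = math.sqrt(p * (1 - p) / i)      # std of a binomial proportion
        if p + s < p_min + s_min:
            p_min, s_min = p, s             # best performance seen so far
        if p + s > p_min + drift * s_min:
            status.append('drift')
        elif p + s > p_min + warn * s_min:
            status.append('warning')
        else:
            status.append('stable')
    return status

# 50 correct predictions followed by 50 errors: the detector stays
# stable at first and flags drift once the error rate jumps
status = error_rate_drift([0] * 50 + [1] * 50)
```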
Liu, A, Lu, J, Liu, F & Zhang, G 2018, 'Accumulating regional density dissimilarity for concept drift detection in data streams', Pattern Recognition, vol. 76, pp. 256-272.View/Download from: UTS OPUS or Publisher's site
© 2017 Elsevier Ltd. In a non-stationary environment, newly received data may have different knowledge patterns from the data used to train learning models. As time passes, a learning model's performance may become increasingly unreliable. This problem is known as concept drift and is a common issue in real-world domains. Concept drift detection has attracted increasing attention in recent years. However, very few existing methods pay attention to small regional drifts, and their accuracy may vary due to differing statistical significance tests. This paper presents a novel concept drift detection method, based on regional-density estimation, named nearest neighbor-based density variation identification (NN-DVI). It consists of three components. The first is a k-nearest neighbor-based space-partitioning schema (NNPS), which transforms unmeasurable discrete data instances into a set of shared subspaces for density estimation. The second is a distance function that accumulates the density discrepancies in these subspaces and quantifies the overall differences. The third component is a tailored statistical significance test by which the confidence interval of a concept drift can be accurately determined. The distance applied in NN-DVI is sensitive to regional drift and has been proven to follow a normal distribution. As a result, the NN-DVI's accuracy and false-alarm rate are statistically guaranteed. Additionally, several benchmarks have been used to evaluate the method, including both synthetic and real-world datasets. The overall results show that NN-DVI has better performance in terms of addressing problems related to concept drift detection.
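The core intuition of the nearest-neighbour density approach can be sketched in plain NumPy: if two windows come from the same distribution, the k nearest neighbours of any point mix the two windows roughly evenly; under drift they do not. This is a hypothetical simplification for illustration, not the published NN-DVI algorithm:

```python
import numpy as np

def knn_density_discrepancy(old, new, k=5):
    """Illustrative k-NN density discrepancy between two windows:
    for each point, measure how far the share of new-window points
    among its k nearest neighbours deviates from a balanced 50/50 mix.
    A hypothetical simplification of the NN-DVI idea."""
    data = np.vstack([old, new])
    labels = np.array([0] * len(old) + [1] * len(new))
    disc = 0.0
    for i in range(len(data)):
        d = np.linalg.norm(data - data[i], axis=1)
        nn = np.argsort(d)[1:k + 1]        # k nearest neighbours, excluding self
        frac_new = np.mean(labels[nn])     # share of new-window neighbours
        disc += abs(frac_new - 0.5)        # deviation from balanced mixing
    return disc / len(data)

rng = np.random.default_rng(0)
same = knn_density_discrepancy(rng.normal(0, 1, (100, 2)),
                               rng.normal(0, 1, (100, 2)))
drift = knn_density_discrepancy(rng.normal(0, 1, (100, 2)),
                                rng.normal(3, 1, (100, 2)))
```

Drifted windows mix less in neighbourhoods, so the discrepancy statistic is larger; the real method additionally calibrates this with a significance test.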
Chow, D, Liu, A, Zhang, G & Lu, J 2019, 'Knowledge graph-based entity importance learning for multi-stream regression on Australian fuel price forecasting', Proceedings of the International Joint Conference on Neural Networks.View/Download from: Publisher's site
© 2019 IEEE. A knowledge graph (KG) represents a collection of interlinked descriptions of entities. It has become a key focus for organising and utilising this type of data for applications. Many graph embedding techniques have been proposed to simplify the manipulation while preserving the inherent structure of the KG. However, scant attention has been given to the investigation of the importance of the entities (the nodes of KGs). In this paper, we propose a novel entities importance learning framework that investigates how to weight the entities and use them as a prior knowledge for solving multi-stream regression problems. The framework consists of KG feature extraction, multi-stream correlation analysis, and entity importance learning. To evaluate the proposed method, we implemented the framework based on Wikidata and applied it to Australian retail fuel price forecasting. The experiment results indicate that the proposed method reduces prediction error, which supports the weighted knowledge graph information as a means for improving machine learning model accuracy.
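The weighting idea can be sketched as follows (a hypothetical simplification, not the paper's learning framework): score each entity's stream by the absolute correlation of its history with the target stream, then normalise the scores into weights.

```python
import numpy as np

def entity_weights(streams, target):
    """Hypothetical simplification of entity-importance weighting:
    weight each entity's stream by the absolute Pearson correlation
    of its history with the target stream, normalised to sum to 1."""
    w = np.array([abs(np.corrcoef(s, target)[0, 1]) for s in streams])
    return w / w.sum()

price = np.arange(10.0)              # illustrative target stream
related = 2 * price + 1              # perfectly correlated entity stream
noisy = np.array([1.0, -1.0] * 5)    # weakly related entity stream
w = entity_weights([related, noisy], price)
```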
Liu, A, Song, Y, Zhang, G & Lu, J 2017, 'Regional concept drift detection and density synchronized drift adaptation', IJCAI International Joint Conference on Artificial Intelligence, International Joint Conference on Artificial Intelligence, International Joint Conference on Artificial Intelligence Organization, Melbourne, Australia, pp. 2280-2286.View/Download from: UTS OPUS
In data stream mining, the emergence of new patterns or a pattern ceasing to exist is called concept drift. Concept drift makes the learning process complicated because of the inconsistency between existing data and upcoming data. Since concept drift was first proposed, numerous articles have been published to address this issue in terms of distribution analysis. However, most distribution-based drift detection methods assume that a drift happens at an exact time point, and the data that arrived before that point is considered unimportant. Thus, if a drift only occurs in a small region of the entire feature space, the other non-drifted regions may also be suspended, thereby reducing the learning efficiency of models. To retrieve non-drifted information from suspended historical data, we propose a local drift degree (LDD) measurement that can continuously monitor regional density changes. Instead of suspending all historical data after a drift, we synchronize the regional density discrepancies according to LDD. Experimental evaluations on three benchmark data sets show that our concept drift adaptation algorithm improves accuracy compared to other methods.
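The regional monitoring idea can be illustrated as follows (a hypothetical simplification of the LDD statistic, not the published measurement): compare how densely the old and new windows populate the neighbourhood of a query point, so that stable and drifted regions can be told apart.

```python
import numpy as np

def local_drift_degree(old, new, point, radius=1.0):
    """Illustrative local-drift-degree style statistic: compare the
    fraction of each window falling within `radius` of `point`.
    Values near 0 suggest a stable region; large magnitudes suggest
    regional drift. A hypothetical simplification for illustration."""
    in_old = np.mean(np.linalg.norm(old - point, axis=1) < radius)
    in_new = np.mean(np.linalg.norm(new - point, axis=1) < radius)
    return (in_new - in_old) / (in_new + in_old + 1e-12)

rng = np.random.default_rng(1)
old = rng.normal(0, 1, (500, 2))
new = np.vstack([rng.normal(0, 1, (400, 2)),      # unchanged region
                 rng.normal(5, 0.5, (100, 2))])   # new regional cluster
stable = local_drift_degree(old, new, np.array([0.0, 0.0]))
drifted = local_drift_degree(old, new, np.array([5.0, 5.0]))
```

A model would then keep training data from regions where the statistic stays near zero instead of discarding the whole history.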
Liu, A, Zhang, G & Lu, J 2017, 'Fuzzy time windowing for gradual concept drift adaptation', IEEE International Conference on Fuzzy Systems, International Conference on Fuzzy Systems, IEEE, Naples, Italy, pp. 1-6.View/Download from: UTS OPUS or Publisher's site
© 2017 IEEE. The aim of machine learning is to find hidden insights into historical data, and then apply them to forecast the future data or trends. Machine learning algorithms optimize learning models for lowest error rate based on the assumption that the historical data and the data to be predicted conform to the same knowledge pattern (data distribution). However, if the historical data is not enough, or the knowledge pattern keeps changing (data uncertainty), this assumption will become invalid. In data stream mining, this phenomenon of knowledge pattern changing is called concept drift. To address this issue, we propose a novel fuzzy windowing concept drift adaptation (FW-DA) method. Compared to conventional windowing-based drift adaptation algorithms, FW-DA achieves higher accuracy by allowing the sliding windows to keep an overlapping period so that the data instances belonging to different concepts can be determined more precisely. In addition, FW-DA statistically guarantees that the upcoming data conforms to the inferred knowledge pattern with a certain confidence level. To evaluate FW-DA, four experiments were conducted using both synthetic and real-world data sets. The experiment results show that FW-DA outperforms the other windowing-based methods including state-of-the-art drift adaptation methods.
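The overlapping-window idea can be sketched with a simple trapezoidal membership function (the parameterisation here is illustrative, not the paper's): instances well inside the window count fully, instances in the overlap period count partially, and older instances are dropped.

```python
def fuzzy_window_weight(age, window, overlap):
    """Illustrative fuzzy time window: instances younger than
    window - overlap get full weight 1.0, weight decays linearly to 0
    across the overlap period, and older instances are excluded.
    The overlap lets instances near a suspected drift point belong
    partially to both the old and the new concept."""
    if age <= window - overlap:
        return 1.0
    if age >= window:
        return 0.0
    return (window - age) / overlap

# recent instances count fully, boundary instances partially
weights = [fuzzy_window_weight(a, window=100, overlap=20) for a in (50, 90, 100)]
```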
Liu, A, Zhang, G, Lu, J, Lu, N & Lin, CT 2016, 'An online competence-based concept drift detection algorithm', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Australasian Joint Conference on Artificial Intelligence, Springer, Hobart, TAS, Australia, pp. 416-428.View/Download from: UTS OPUS or Publisher's site
© Springer International Publishing AG 2016. The ability to adapt to new learning environments is a vital feature of contemporary case-based reasoning systems. It is imperative that decision makers know when and how to discard outdated cases and apply new cases to perform smart maintenance operations. Competence-based empirical distance has recently been proposed as a measurement that can estimate the difference between case sample sets without knowing the actual case distributions. It is reportedly one of the most accurate drift detection algorithms on both synthetic and real-world data sets. However, as the construction of a competence model has to retain every case in memory, it is not suitable for online drift detection. In addition, the high computational complexity O(n²) also limits its practical application, especially when dealing with large-scale data sets under time constraints. In this paper, therefore, we propose a space-based online case grouping strategy, and a new case group enhanced competence distance (CGCD), to address these issues. The experiment results show that the proposed strategy and related algorithms significantly improve the efficiency of the current leading competence-based drift detection algorithm.
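The grouping strategy can be illustrated with a simple grid scheme (hypothetical; the paper's space-based strategy is more elaborate): hash cases into cells and summarise each cell by a count and centroid, so that later distance computations run on groups rather than on all n cases.

```python
import numpy as np
from collections import defaultdict

def group_cases(cases, cell=1.0):
    """Illustrative space-based case grouping: hash each case into a
    grid cell and keep only a per-cell count and centroid, so that
    group-level comparisons replace O(n^2) pairwise case distances."""
    groups = defaultdict(list)
    for c in cases:
        key = tuple(np.floor(c / cell).astype(int))
        groups[key].append(c)
    return {k: (len(v), np.mean(v, axis=0)) for k, v in groups.items()}

cases = np.array([[0.1, 0.1], [0.2, 0.3], [1.5, 1.5]])
groups = group_cases(cases)
```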
Liu, A, Zhang, G & Lu, J 2014, 'A Novel Weighting Method for Online Ensemble Learning with the Presence of Concept Drift', Proceedings of the 11th International FLINS Conference, Decision Making and Soft Computing, International Fuzzy Logic and Intelligent Technologies in Nuclear Science Conference, World Scientific Publishing Co. Pte. Ltd., Brazil, pp. 550-555.View/Download from: UTS OPUS or Publisher's site
Ensembles of classifiers are a very popular method for online and incremental learning in non-stationary environments, as they improve on the accuracy of single classifiers and can recover from drifting concepts without explicit drift detection. However, current ensemble weighting methods do not consider the relationship between a test instance and each ensemble member's training domain. As a result, a locally correct ensemble member may be down-weighted unfairly because its prediction on an out-of-domain test instance is wrong. These inaccuracies increase when there is a significant concept change. In this paper, therefore, we propose a fuzzy online ensemble weighting method that takes into consideration the degree of membership of each instance in each ensemble member's training domain, together with a modified majority voting method, to improve the ability of ensembles to handle online classification tasks with concept drift.
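The membership-weighted voting idea can be sketched as follows (the Gaussian membership function and domain summary used here are illustrative assumptions, not the paper's method): each member's vote is scaled by how well the test instance falls inside that member's training domain.

```python
import numpy as np

def domain_membership(x, centroid, spread):
    """Hypothetical Gaussian membership of a test instance in an
    ensemble member's training domain, summarised by a centroid and
    spread (the paper's membership function may differ)."""
    return float(np.exp(-np.sum((x - centroid) ** 2) / (2 * spread ** 2)))

def fuzzy_weighted_vote(x, members):
    """Sketch of membership-weighted majority voting: each member's
    vote is scaled by its domain membership for x, so mistakes on
    out-of-domain instances are discounted rather than penalised."""
    votes = {}
    for predict, centroid, spread in members:
        w = domain_membership(x, centroid, spread)
        votes[predict(x)] = votes.get(predict(x), 0.0) + w
    return max(votes, key=votes.get)

# two toy members trained on different regions of the feature space
members = [(lambda x: 'A', np.zeros(2), 1.0),
           (lambda x: 'B', np.full(2, 5.0), 1.0)]
label = fuzzy_weighted_vote(np.array([0.1, 0.1]), members)
```

An instance near the origin lies in the first member's domain, so that member's vote dominates even in a larger ensemble.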
In online machine learning, the ability to adapt to new concepts quickly is highly desired. In this paper, we propose a novel concept drift detection method, called Anomaly Analysis Drift Detection (AADD), to improve the performance of machine learning algorithms under non-stationary environments. The proposed AADD method is based on an anomaly analysis of the learner's accuracy associated with the similarity between the learner's training domain and the test data. The method first identifies whether there are conflicts between the current concept and newly arriving data. The learner then incrementally learns the non-conflicting data, which will not decrease its accuracy on previously trained data, for concept extension; otherwise, a new learner is created based on the new data. Experiments illustrate that the AADD method can detect new concepts quickly and learn extensional drift incrementally.