Zheng, Z, Wei, W, Liu, C, Cao, W, Cao, L & Bhatia, M 2016, 'An effective contrast sequential pattern mining approach to taxpayer behavior analysis', World Wide Web, vol. 19, no. 4, pp. 633-651.
Data mining for client behavior analysis has become increasingly important in business. However, further analysis of transactions and sequential behaviors would be of even greater value, especially in financial services such as banking and insurance, and in government. In a real-world business application of taxation debt collection, understanding the relationship between taxpayers' sequential behaviors (payment, lodgment and actions) and compliance with their debt obligations requires finding the contrast sequential behavior patterns between compliant and non-compliant taxpayers. Contrast Patterns (CP) are defined as the itemsets showing the difference/discrimination between two classes/datasets (Dong and Li, 1999). However, existing CP mining methods can only mine itemset patterns and are not suitable for mining sequential patterns, such as time-ordered transactions in taxpayer sequential behaviors. Little work has been conducted on Contrast Sequential Pattern (CSP) mining so far. To address this issue, we develop a CSP mining approach, eCSP, using an effective CSP-tree structure, which improves the PrefixSpan tree (Pei et al., 2001) for mining contrast patterns. We propose heuristics and interestingness filtering criteria and integrate them into the CSP-tree to reduce the search space and to find business-interesting patterns. The performance of the proposed approach is evaluated on three real-world datasets. In addition, a case study shows how to apply the approach to analyze taxpayer behavior. The results show very promising performance and convincing business value.
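As an illustration of what "contrast" means for sequential patterns, the sketch below measures how often an ordered pattern occurs in each class and takes the ratio of the two supports. It is not the eCSP algorithm or its CSP-tree; the function names, the toy event labels and the use of a support ratio as the contrast measure are all assumptions made for this example.

```python
def contains_subsequence(sequence, pattern):
    """Return True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def support(dataset, pattern):
    """Fraction of sequences in `dataset` that contain `pattern` as a subsequence."""
    if not dataset:
        return 0.0
    return sum(contains_subsequence(seq, pattern) for seq in dataset) / len(dataset)


def contrast_rate(positive, negative, pattern):
    """Support ratio between two classes (hypothetical contrast measure)."""
    s_pos = support(positive, pattern)
    s_neg = support(negative, pattern)
    if s_neg == 0:
        return float("inf") if s_pos > 0 else 0.0
    return s_pos / s_neg


# Toy behavior sequences (invented for illustration only).
compliant = [
    ["lodge", "pay"],
    ["call", "lodge", "pay"],
    ["lodge", "call"],
]
non_compliant = [
    ["call", "letter"],
    ["lodge", "letter", "call"],
    ["lodge", "letter", "pay"],
]

pattern = ["lodge", "pay"]
rate = contrast_rate(compliant, non_compliant, pattern)
```

Here the pattern "lodge, then pay" is contained in two of the three compliant sequences and one of the three non-compliant ones, so its support is twice as high in the compliant class. A tree-based miner would explore many candidate patterns and prune the search rather than scoring one pattern at a time.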
Li, J, Wang, C, Wei, W, Li, M & Liu, C 2013, 'Efficient mining of contrast patterns on large scale imbalanced real-life data', Lecture Notes in Computer Science, vol. 7818, no. 1, pp. 62-73.
Contrast pattern mining has been studied intensively for its strong discriminative capability. However, state-of-the-art methods rarely consider the class imbalance problem, which has proved to be a major challenge in mining large-scale data. This paper introduces a novel pattern, the converging pattern, which refers to itemsets whose supports contrast sharply from the minority class to the majority one. A novel algorithm, ConvergMiner, which adopts T*-tree and branch-and-bound pruning strategies to mine converging patterns efficiently, is proposed. Substantial experiments in online banking fraud detection show that ConvergMiner greatly outperforms existing cost-sensitive classification methods in terms of predictive accuracy. In particular, the efficiency improves as data imbalance increases.
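The idea of itemsets whose supports differ sharply between a small and a large class can be sketched with a brute-force search. This is not ConvergMiner and does not use a T*-tree or branch-and-bound pruning; the thresholds, function names and toy transactions are assumptions for illustration. Supports are computed per class, so the minority class is not swamped by the majority.

```python
from itertools import combinations


def itemset_support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def sharp_contrast_itemsets(minority, majority, min_minority_support,
                            max_majority_support, max_size=2):
    """Brute-force search for itemsets frequent in the minority class
    and rare in the majority class (hypothetical thresholds)."""
    candidates = sorted({item for t in minority for item in t})
    found = []
    for size in range(1, max_size + 1):
        for itemset in combinations(candidates, size):
            s_min = itemset_support(minority, itemset)
            s_maj = itemset_support(majority, itemset)
            if s_min >= min_minority_support and s_maj <= max_majority_support:
                found.append((itemset, s_min, s_maj))
    return found


# Toy imbalanced data (invented): 2 fraudulent vs 8 genuine transactions.
fraud = [
    {"new_payee", "night"},
    {"new_payee", "night", "large"},
]
genuine = [{"known_payee", "day"}] * 5 + [
    {"night", "known_payee"},
    {"new_payee", "day"},
    {"large", "day"},
]

patterns = sharp_contrast_itemsets(fraud, genuine,
                                   min_minority_support=0.9,
                                   max_majority_support=0.05)
```

With these toy values only the pair ("new_payee", "night") passes both thresholds: each item alone also appears in one genuine transaction, but the combination appears only in the fraud class. Exhaustive enumeration grows exponentially with itemset size, which is why practical miners rely on pruning.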
Wei, W, Li, J, Cao, L, Ou, Y & Chen, J 2013, 'Effective Detection of Sophisticated Online Banking Fraud in Extremely Imbalanced Data', World Wide Web, vol. 16, no. 4, pp. 449-475.
Sophisticated online banking fraud reflects the integrative abuse of resources in social, cyber and physical worlds. Its detection is a typical use case of the broad-based Wisdom Web of Things (W2T) methodology. However, very limited information is available to distinguish dynamic fraud from genuine customer behavior in such an extremely sparse and imbalanced data environment, which makes instant and effective detection increasingly important and challenging. In this paper, we propose an effective online banking fraud detection framework that synthesizes relevant resources and incorporates several advanced data mining techniques. By building a contrast vector for each transaction based on its customer's historical behavior sequence, we profile the differentiating rate of each current transaction against the customer's behavior preference. A novel algorithm, ContrastMiner, is introduced to efficiently mine contrast patterns and distinguish fraudulent from genuine behavior, followed by an effective pattern selection and risk scoring that combines predictions from different models. Results from experiments on large-scale real online banking data demonstrate that our system can achieve substantially higher accuracy and lower alert volume than the latest benchmarking fraud detection system incorporating domain knowledge and traditional fraud detection methods.
Xu, J, Wei, W & Cao, L 2017, 'Copula-based high dimensional cross-market dependence modeling', Proceedings - 2017 International Conference on Data Science and Advanced Analytics, DSAA 2017, IEEE International Conference on Data Science and Advanced Analytics, IEEE, Tokyo, Japan, pp. 734-743.
Dependence across multiple financial markets, such as stock and foreign exchange rate markets, is high-dimensional, contains various relationships, and often presents complicated dependence structures and characteristics such as asymmetrical dependence. Modeling such dependence structures is very challenging. Although copula has been demonstrated to be effective in describing dependence between variables in recent studies, building effective dependence structures to address the above complexities significantly challenges existing copula models. In this paper, we propose a new D vine-based model with a bottom-up strategy to construct high-dimensional dependence structures. The new modeling outcomes are applied to trade 15 stock market indices and 10 currency rates over 16 years as a case study. Extensive experimental results show that this model and its intrinsic design significantly outperform typical models and industry baselines, as shown by the log-likelihood and Vuong test, and Value at Risk - a widely used industrial benchmark. Our model provides interpretable knowledge and profound insights into the high-dimensional dependence structures across data sources.
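For readers unfamiliar with D-vines, the sketch below enumerates the pair-copulas of a standard D-vine for a given variable order: tree 1 links neighbors, and each higher tree links variables further apart, conditioned on the variables between them. It does not reproduce the paper's bottom-up construction strategy or fit any copula; the function name and the placeholder variable labels are assumptions for this example.

```python
def d_vine_pair_copulas(order):
    """List the pair-copulas of a standard D-vine on `order`.

    Returns one list per tree; each edge is (left, right, conditioning_set),
    where the conditioning set holds the variables lying between
    `left` and `right` in the chosen order.
    """
    d = len(order)
    trees = []
    for level in range(1, d):              # tree 1 .. tree d-1
        edges = []
        for start in range(d - level):
            left = order[start]
            right = order[start + level]
            conditioning = tuple(order[start + 1:start + level])
            edges.append((left, right, conditioning))
        trees.append(edges)
    return trees


# Four placeholder variables (hypothetical labels).
trees = d_vine_pair_copulas(["A", "B", "C", "D"])
for level, edges in enumerate(trees, start=1):
    for left, right, cond in edges:
        given = " | " + ",".join(cond) if cond else ""
        print(f"tree {level}: c({left},{right}{given})")
```

A D-vine on d variables has d(d-1)/2 pair-copulas, so the 25 series in the case study (15 indices plus 10 currency rates) would involve 300 of them. This gives a sense of why the choice of variable order and structure matters in high dimensions.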
Wei, W, Li, J, Cao, L, Sun, J, Liu, C & Li, M 2013, 'Optimal Allocation of High Dimensional Assets through Canonical Vines', Advances in Knowledge Discovery and Data Mining: 17th Pacific-Asia Conference, PAKDD 2013, Gold Coast, Australia, April 14-17, 2013, Proceedings, Part I, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Gold Coast, Australia, pp. 366-377.
Canonical Vine, Mean Variance Criterion, Financial Return.