Zhigang Zheng was awarded a PhD in Computing Science and joined UTS in 2012.
His research interests are in the areas of sequential pattern mining and data mining applications, especially customer data and behavior pattern analysis.
Zheng, Z, Wei, W, Liu, C, Cao, W, Cao, L & Bhatia, M 2016, 'An effective contrast sequential pattern mining approach to taxpayer behavior analysis', World Wide Web, vol. 19, no. 4, pp. 633-651.
Data mining for client behavior analysis has become increasingly important in business; however, further analysis of transactions and sequential behaviors would be of even greater value, especially in the financial service industry (such as banking and insurance), government and so on. In a real-world business application of taxation debt collection, in order to understand the internal relationship between taxpayers' sequential behaviors (payment, lodgment and actions) and compliance to their debt, we need to find the contrast sequential behavior patterns between compliant and non-compliant taxpayers. Contrast Patterns (CP) are defined as the itemsets showing the difference/discrimination between two classes/datasets (Dong and Li, 1999). However, the existing CP mining methods, which can only mine itemset patterns, are not suitable for mining sequential patterns, such as time-ordered transactions in taxpayer sequential behaviors. Little work has been conducted on Contrast Sequential Pattern (CSP) mining so far. Therefore, to address this issue, we develop a CSP mining approach, eCSP, by using an effective CSP-tree structure, which improves the PrefixSpan tree (Pei et al., 2001) for mining contrast patterns. We propose some heuristics and interestingness filtering criteria, and integrate them into the CSP-tree seamlessly to reduce the search space and to find business-interesting patterns as well. The performance of the proposed approach is evaluated on three real-world datasets. In addition, we use a case study to show how to implement the approach to analyse taxpayer behaviour. The results show a very promising performance and convincing business value.
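As a rough illustration of what a contrast sequential pattern is, the sketch below computes the support of an ordered pattern in two classes of sequences and the ratio between the two. It is not the eCSP algorithm or its CSP-tree; the function names, the support-ratio measure and the single-item event sequences are assumptions made for illustration only.

```python
from typing import List, Sequence


def contains(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def support(db: List[Sequence[str]], pattern: Sequence[str]) -> float:
    """Fraction of sequences in `db` that contain `pattern`."""
    if not db:
        return 0.0
    return sum(contains(s, pattern) for s in db) / len(db)


def contrast_ratio(db_a: List[Sequence[str]],
                   db_b: List[Sequence[str]],
                   pattern: Sequence[str]) -> float:
    """Support in class A divided by support in class B (assumed measure)."""
    sup_a = support(db_a, pattern)
    sup_b = support(db_b, pattern)
    if sup_b == 0.0:
        return float("inf") if sup_a > 0.0 else 0.0
    return sup_a / sup_b
```

A pattern whose ratio is far from 1 separates the two classes; a practical miner would also apply minimum-support and interestingness filters rather than scoring patterns one at a time.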
© 2016 The Authors. Published by Elsevier B.V. As an important tool for behavior informatics, negative sequential patterns (NSP) (such as missing medical treatments) are critical and sometimes much more informative than positive sequential patterns (PSP) (e.g. using a medical service) in many intelligent systems and applications such as intelligent transport systems, healthcare and risk management, as they often involve non-occurring but interesting behaviors. However, discovering NSP is much more difficult than identifying PSP due to the significant problem complexity caused by non-occurring elements, high computational cost and huge search space in calculating negative sequential candidates (NSC). So far, the problem has not been formalized well, and very few approaches have been proposed to mine for specific types of NSP, which rely on database re-scans after identifying PSP in order to calculate the NSC supports. This has been shown to be very inefficient or even impractical, since the NSC search space is usually huge. This paper proposes a very innovative and efficient theoretical framework: Set theory-based NSP mining (ST-NSP), and a corresponding algorithm, e-NSP, to efficiently identify NSP by involving only the identified PSP, without re-scanning the database. Accordingly, negative containment is first defined to determine whether a data sequence contains a negative sequence based on set theory. Second, an efficient approach is proposed to convert the negative containment problem to a positive containment problem. The NSC supports are then calculated based only on the corresponding PSP. This not only avoids the need for additional database scans, but also enables the use of existing PSP mining algorithms to mine for NSP. Finally, a simple but efficient strategy is proposed to generate NSC. Theoretical analyses show that e-NSP performs particularly well on datasets with a small number of elements in a sequence, a large number of itemsets and low minimum s...
Yin, J, Zheng, Z, Cao, L, Song, Y & Wei, W 2013, 'Efficiently Mining Top-K High Utility Sequential Patterns', 2013 IEEE 13th International Conference on Data Mining, IEEE International Conference on Data Mining, IEEE, Dallas, TX, USA, pp. 1259-1264.
High utility sequential pattern mining is an emerging topic in the data mining community. Compared to the classic frequent sequence mining, the utility framework provides more informative and actionable knowledge since the utility of a sequence indicates business value and impact. However, the introduction of "utility" makes the problem fundamentally different from the frequency-based pattern mining framework and brings about dramatic challenges. Although the existing high utility sequential pattern mining algorithms can discover all the patterns satisfying a given minimum utility, it is often difficult for users to set a proper minimum utility. Too small a value may produce thousands of patterns, whereas too large a value may lead to no findings. In this paper, we propose a novel framework called top-k high utility sequential pattern mining to tackle this critical problem. Accordingly, an efficient algorithm, Top-k high Utility Sequence (TUS for short) mining, is designed to identify top-k high utility sequential patterns without minimum utility. In addition, three effective features are introduced to handle the efficiency problem, including two strategies for raising the threshold and one pruning strategy for filtering unpromising items. Our experiments are conducted on both synthetic and real datasets. The results show that TUS incorporating the efficiency-enhanced strategies demonstrates impressive performance without missing any high utility sequential patterns.
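The general idea of replacing a user-supplied minimum utility with an internal threshold that rises as better patterns are found can be sketched as follows. This is only an illustration of top-k selection with a min-heap, not the TUS algorithm or its specific threshold-raising and pruning strategies; the function name and the assumption that candidates arrive as (pattern, utility) pairs are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple

Pattern = Tuple[str, ...]


def top_k_by_utility(candidates: Iterable[Tuple[Pattern, float]],
                     k: int) -> List[Tuple[float, Pattern]]:
    """Keep the k highest-utility patterns, raising the threshold as we go."""
    if k <= 0:
        return []
    heap: List[Tuple[float, Pattern]] = []  # min-heap ordered by utility
    threshold = 0.0  # no user-supplied minimum utility is needed
    for pattern, utility in candidates:
        if len(heap) < k:
            heapq.heappush(heap, (utility, pattern))
        elif utility > threshold:
            # Evict the current weakest pattern in favour of the new one.
            heapq.heapreplace(heap, (utility, pattern))
        if len(heap) == k:
            # The weakest kept utility becomes the new internal threshold.
            threshold = heap[0][0]
    return sorted(heap, reverse=True)
```

In a real miner the rising threshold would also be used to prune the search before candidates are fully evaluated, which is where most of the saving comes from.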
Yin, J, Zheng, Z & Cao, L 2012, 'USpan: an efficient algorithm for mining high utility sequential patterns', Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM International Conference on Knowledge Discovery and Data Mining, ACM, Beijing, China, pp. 660-668.
Sequential pattern mining plays an important role in many applications, such as bioinformatics and consumer behavior analysis. However, the classic frequency-based framework often leads to many patterns being identified, most of which are not informative enough for business decision-making. In frequent pattern mining, a recent effort has been to incorporate utility into the pattern selection framework, so that high utility (frequent or infrequent) patterns are mined which address typical business concerns such as dollar value associated with each pattern. In this paper, we incorporate utility into sequential pattern mining, and a generic framework for high utility sequence mining is defined. An efficient algorithm, USpan, is presented to mine for high utility sequential patterns. In USpan, we introduce the lexicographic quantitative sequence tree to extract the complete set of high utility sequences and design concatenation mechanisms for calculating the utility of a node and its children with two effective pruning strategies. Substantial experiments on both synthetic and real datasets show that USpan efficiently identifies high utility sequences from large scale data with very low minimum utility.
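To make the notion of sequence utility more concrete, here is a minimal sketch under simplifying assumptions that do not come from the abstract: each element of a sequence is a single (item, quantity) pair, the utility of an item occurrence is quantity times a per-item unit profit, and when a pattern matches a sequence in several ways the highest-utility match is taken. It does not implement USpan, its lexicographic quantitative sequence tree, or its pruning strategies; all names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def pattern_utility(sequence: Sequence[Tuple[str, int]],
                    pattern: Sequence[str],
                    unit_profit: Dict[str, float]) -> Optional[float]:
    """Highest total utility over all ordered matches of `pattern`.

    Returns None when the pattern does not occur in the sequence.
    """
    neg_inf = float("-inf")
    # best[j] = best utility of matching the first j pattern items so far.
    best: List[float] = [0.0] + [neg_inf] * len(pattern)
    for item, quantity in sequence:
        # Go right to left so one sequence element matches at most one slot.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == item and best[j - 1] != neg_inf:
                gain = quantity * unit_profit[item]
                best[j] = max(best[j], best[j - 1] + gain)
    return None if best[-1] == neg_inf else best[-1]
```

Under these assumptions, the utility of a pattern over a whole database would be the sum of this value across the sequences that contain it.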
Dong, X, Zheng, Z, Cao, L, Zhao, Y, Zhang, C, Li, J, Wei, W & Ou, Y 2011, 'e-NSP: efficient negative sequential pattern mining based on identified positive patterns without database rescanning', Proceedings of the 20th ACM International Conference on Information and Knowledge Management, ACM International Conference on Information and Knowledge Management, ACM, Glasgow, Scotland, UK, pp. 825-830.
Mining Negative Sequential Patterns (NSP) is much more challenging than mining Positive Sequential Patterns (PSP) due to the high computational complexity and huge search space required in calculating Negative Sequential Candidates (NSC). Very few approaches are available for mining NSP, which mainly rely on re-scanning databases after identifying PSP. As a result, they are very inefficient. In this paper, we propose an efficient algorithm for mining NSP, called e-NSP, which mines for NSP by only involving the identified PSP, without re-scanning databases. First, negative containment is defined to determine whether or not a data sequence contains a negative sequence. Second, an efficient approach is proposed to convert the negative containment problem to a positive containment problem. The supports of NSC are then calculated based only on the corresponding PSP. Finally, a simple but efficient approach is proposed to generate NSC. With e-NSP, mining NSP does not require additional database scans, and the existing PSP mining algorithms can be integrated into e-NSP to mine for NSP efficiently. e-NSP is compared with two currently available NSP mining algorithms on 14 synthetic and real-life datasets. Intensive experiments show that e-NSP takes as little as 3% of the runtime of the baseline approaches and is applicable for efficient mining of NSP in large datasets.
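The idea of deriving a negative candidate's support from positive supports alone can be illustrated for the simplest case, a candidate with exactly one negated element. The sketch assumes that a data sequence contains <a, not b, c> when it contains <a, c> but not <a, b, c>, so the support count is the difference of the two positive support counts. This is an illustrative reading of the approach, not the full e-NSP algorithm; the function name and the dictionary of positive support counts are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Pattern = Tuple[str, ...]


def single_negative_support(positive_support: Dict[Pattern, int],
                            before: Sequence[str],
                            negated: str,
                            after: Sequence[str]) -> int:
    """Support count of <before, NOT negated, after> from positive counts.

    Both positive patterns must already have known support counts;
    a KeyError is raised otherwise, since no database scan is performed.
    """
    without_negated = tuple(before) + tuple(after)
    with_negated = tuple(before) + (negated,) + tuple(after)
    # Every sequence containing <before, negated, after> also contains
    # <before, after>, so the difference is never negative.
    return positive_support[without_negated] - positive_support[with_negated]
```

For instance, with positive counts of 3 for ("a", "c") and 1 for ("a", "b", "c"), the candidate <a, not b, c> gets a count of 2 without touching the database again.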
Zheng, Z, Zhao, Y, Zuo, Z & Cao, L 2010, 'An Efficient GA-Based Algorithm for Mining Negative Sequential Patterns', Advances in Knowledge Discovery and Data Mining - Lecture Notes in Artificial Intelligence, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer Berlin / Heidelberg, Hyderabad, India, pp. 262-273.
Negative sequential pattern mining has attracted increasing attention in recent data mining research because it considers negative relationships between itemsets, which are ignored by positive sequential pattern mining. However, the search space for mining negative patterns is much bigger than that for positive ones. When the support threshold is low, in particular, there will be huge amounts of negative candidates. This paper proposes a Genetic Algorithm (GA) based algorithm to find negative sequential patterns with novel crossover and mutation operations, which are efficient at passing good genes on to next generations without generating candidates. An effective dynamic fitness function and a pruning method are also provided to improve performance. The results of extensive experiments show that the proposed method can find negative patterns efficiently and has remarkable performance compared with some other negative pattern mining algorithms.
Zheng, Z, Zhao, Y, Zuo, Z & Cao, L 2009, 'Negative-GSP: An Efficient Method for Mining Negative Sequential Patterns', Proceedings of the 8th Australasian Data Mining Conference (AusDM'09): Data Mining and Analytics - Conferences in Research and Practice in Information Technology Volume 101, Australian Data Mining Conference, Australian Computer Society, Melbourne, Australia, pp. 63-67.
Unlike traditional positive sequential pattern mining, negative sequential pattern mining considers both positive and negative relationships between items. Negative sequential pattern mining doesn't necessarily follow the Apriori principle, and its search space is much larger than that of positive pattern mining. After giving definitions and some constraints of negative sequential patterns, this paper proposes a new method for mining negative sequential patterns, called Negative-GSP. Negative-GSP can find negative sequential patterns effectively and efficiently by joining and pruning, and extensive experimental results show the efficiency of the method.
Cao, L, Luo, D, Xiao, Y & Zheng, Z 2008, 'Agent Collaboration for Multiple Trading Strategy Integration', Lecture Notes in Artificial Intelligence Vol 4953: Agent and Multi-Agent Systems: Technologies and Applications, International KES Symposium on Agents and Multiagent systems - Technologies and Applications, Springer Berlin, Incheon, Korea, pp. 361-370.
The collaboration of agents can undertake complicated tasks that cannot be handled well by a single agent. This is even more true for executing multiple goals at the same time. In this paper, we demonstrate the use of trading agent collaboration in integrating multiple trading strategies. Trading agents are used for developing quality trading strategies to support smart actions in the market. Evolutionary trading agents are armed with evolutionary computing capability to optimize strategy parameters. To develop even smarter trading strategies (which we call golden strategies), multiple Evolutionary and Collaborative trading agents negotiate with each other for m loops to search multiple local strategies with best parameter combinations. They also integrate multiple classes of strategies for trading agents to achieve the best global objectives acceptable for trader needs. Tests of five classes of trading strategies on ten years of data from five markets have shown that agent collaboration for strategy integration can achieve much better trading performance than either individually optimized or randomly chosen strategies.