Wang, C, Cao, L, Gaussier, E, Li, J, Ou, Y & Luo, D 2014, 'Coupled Behavior Representation, Modeling, Analysis, and Reasoning', IEEE Intelligent Systems, vol. 29, no. 4, pp. 66-69.
Behavior refers to the action, reaction, or property of an entity, human or otherwise, in response to situations or stimuli in its environment [1]. In-depth analysis of behavior has been increasingly recognized as a crucial means of understanding and disclosing the interior driving forces and intrinsic cause-effects in business and social applications, including Web community analysis, counter-terrorism, fraud detection, and customer relationship management. With the deepening and widening of social/business intelligence and its networking, there is a pressing need to consolidate and formalize the concept of behavior in order to scrutinize the native intention, lifecycle, and impact of behavior on complex problems and business issues. Although there is an emerging focus on deep behavior studies, such as social network analysis [2], periodic behavior analysis [3], and the behavior informatics approach [1], previous research has mainly focused on individual behaviors without considering their interactions. However, with increasing network and community-based events and applications, such as group-based crime and social network interactions, the coupling relationships between behaviors contribute to the intrinsic causes and impacts of eventual business and social problems. In real-world applications, group behavior interactions (that is, coupled behaviors) are widely seen in natural, social, and artificial behavior-related problems. Complex behavior and social applications often exhibit strong explicit or implicit coupling relationships, both between their entities and between their properties. Moreover, it is quite difficult to model, analyze, and check behaviors coupled with one another because of the complexity arising from data, domain, context, and impact perspectives. Due to the emerging popularity and importance of coupled behaviors, the representation, modeling, analysis, mining and learning, and determination of coupled behaviors are becoming increasingly essential yet challenging
Wei, W, Li, J, Cao, L, Ou, Y & Chen, J 2013, 'Effective Detection of Sophisticated Online Banking Fraud in Extremely Imbalanced Data', World Wide Web, vol. 16, no. 4, pp. 449-475.
Sophisticated online banking fraud reflects the integrative abuse of resources in the social, cyber and physical worlds. Its detection is a typical use case of the broad-based Wisdom Web of Things (W2T) methodology. However, very limited information is available to distinguish dynamic fraud from genuine customer behavior in such an extremely sparse and imbalanced data environment, which makes instant and effective detection increasingly important and challenging. In this paper, we propose an effective online banking fraud detection framework that synthesizes relevant resources and incorporates several advanced data mining techniques. By building a contrast vector for each transaction based on the customer's historical behavior sequence, we profile the differentiating rate of each current transaction against the customer's behavior preference. A novel algorithm, ContrastMiner, is introduced to efficiently mine contrast patterns and distinguish fraudulent from genuine behavior, followed by effective pattern selection and risk scoring that combines predictions from different models. Experiments on large-scale real online banking data demonstrate that our system achieves substantially higher accuracy and lower alert volume than the latest benchmark fraud detection system incorporating domain knowledge and traditional fraud detection methods.
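ContrastMiner itself is not described here in enough detail to reproduce. As a minimal sketch of the underlying idea, assuming transactions have already been discretized into sets of hypothetical behavior items (e.g. `new_payee`, `night`), the standard contrast-pattern measure is the growth rate: support among fraudulent transactions divided by support among genuine ones. The brute-force search, function names and thresholds below are illustrative assumptions, not the paper's.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions (sets of items) that contain every item in itemset."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def contrast_patterns(fraud, genuine, max_len=2, min_fraud_support=0.5, min_growth=2.0):
    """Brute-force search for itemsets far more frequent in fraud than in genuine data.

    Growth rate = support(fraud) / support(genuine); infinite if absent from genuine.
    Returns (pattern, fraud_support, genuine_support, growth_rate) tuples.
    """
    items = sorted(set().union(*fraud))
    found = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            pattern = frozenset(combo)
            s_fraud = support(pattern, fraud)
            if s_fraud < min_fraud_support:
                continue
            s_genuine = support(pattern, genuine)
            growth = float("inf") if s_genuine == 0 else s_fraud / s_genuine
            if growth >= min_growth:
                found.append((pattern, s_fraud, s_genuine, growth))
    return found

# Hypothetical, hand-made example data (not from the paper).
fraud = [{"new_payee", "night", "high_amount"}, {"new_payee", "high_amount"}, {"new_payee", "night"}]
genuine = [{"known_payee", "day"}, {"new_payee", "day"},
           {"known_payee", "night", "high_amount"}, {"known_payee", "day", "high_amount"}]

for pattern, s_f, s_g, g in contrast_patterns(fraud, genuine):
    print(sorted(pattern), round(s_f, 2), round(s_g, 2), g)
```

In this toy data `high_amount` alone is rejected because it is also common among genuine transactions, while `new_payee` together with `high_amount` never occurs in the genuine set and receives an infinite growth rate. A production miner would prune the search space rather than enumerate every combination.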
Cao, L, Ou, Y & Yu, P 2012, 'Coupled Behavior Analysis With Applications', IEEE Transactions On Knowledge And Data Engineering, vol. 24, no. 8, pp. 1378-1392.
Coupled behaviors refer to the activities of one to many actors who are associated with each other in terms of certain relationships. With increasing network and community-based events and applications, such as group-based crime and social network intera
Many real-life applications involve multiple sequences that are coupled with each other. It is unreasonable either to study the coupled sequences separately or to simply merge them into one sequence, because the information about their interacting relationships would be lost. Furthermore, such coupled sequences frequently undergo significant changes, which are likely to degrade the performance of a trained model. Taking the detection of abnormal trading activity patterns in stock markets as an example, this paper proposes a Hidden Markov Model-based approach to address these two issues. Our approach is suitable for analyzing multiple coupled sequences and adapts to significant sequence changes automatically. Substantial experiments on a real dataset show that our approach is effective.
Market surveillance plays an important role as a mechanism in constructing market models. From a data analysis perspective, we view it as valuable for smart trading, in designing legal and profitable trading strategies, and for smart regulation, in maintaining market integrity, transparency and fairness. Existing trading pattern analysis focuses only on interday data, which discloses explicit and high-level market dynamics. Meanwhile, the existing market surveillance systems of large exchanges face crucial challenges from diversified, dynamic, distributed and cyber-based misuse, mis-disclosure and misdealing of information, announcements and orders within one market or across multiple markets. There is therefore a crucial need to develop innovative and workable methods for smart trading and surveillance. To deal with these issues, this paper proposes the innovative concept of microstructure pattern analysis and corresponding approaches. Microstructure pattern analysis studies the trading behavior patterns of traders in market microstructure data by utilizing market microstructure knowledge. The identified market microstructure patterns are then used to power market trading and surveillance agents that automatically detect or design profitable and legal trading strategies, or monitor abnormal market dynamics and trader behavior. Such agent-driven market trading and surveillance systems can greatly enhance analytical, discovery and decision-support capability compared with current predefined rule- and alert-based systems.
Yu, JX, Ou, Y, Zhang, C & Zhang, S 2005, 'Identifying interesting visitors through Web log classification', IEEE Intelligent Systems, vol. 20, no. 3, pp. 55-59.
Web site owners have trouble identifying customer purchasing patterns from their Web logs because the two aren't directly related. Thus, organizations must understand their customers' behavior, preferences, and future needs. This imperative leads many companies to develop a great many e-service systems for data collection and analysis. Web mining is a popular technique for analyzing visitor activities in e-service systems. It mainly includes Web text mining, Web structure mining and Web log mining. Our Web log mining approach classifies a particular site's visitors into different groups on the basis of their purchase interest.
Li, M, Li, J, Ou, Y & Luo, D 2015, 'A coupled similarity kernel for pairwise support vector machine' in Agents and Data Mining Interaction (LNCS), Springer, Germany, pp. 114-123.
© 2015 Springer International Publishing Switzerland. The support vector machine (SVM) is a supervised learning model with associated learning algorithms that analyze data and recognize patterns. The SVM has shown strong classification performance in various applications; however, the original SVM was designed for numerical data. To apply the SVM to nominal data, most previous research either replaced each nominal value with a number or transformed it into a one-hot vector. Neither method preserves the structure of the original nominal data or the similarity between values, which leads to information loss and reduces classification performance. In this work, we design a novel coupled similarity metric for nominal attributed data. Because this metric is pairwise, we also propose an adapted SVM that can handle it. Experimental results show that the proposed method outperforms the traditional SVM and other popular classification methods on various public data sets.
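The information-loss argument can be made concrete. In the sketch below, using a hypothetical nominal attribute `colour`, one-hot encoding places every pair of distinct values at the same Euclidean distance (√2), while integer coding imposes an arbitrary order, so neither reflects how similar the values actually are. This only illustrates the motivation; it does not implement the paper's coupled similarity kernel.

```python
import math
from itertools import combinations

def one_hot(value, categories):
    """Encode a nominal value as a 0/1 vector over a fixed category list."""
    return [1.0 if c == value else 0.0 for c in categories]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

categories = ["red", "green", "blue"]  # hypothetical nominal attribute
vectors = {c: one_hot(c, categories) for c in categories}
codes = {c: i for i, c in enumerate(categories)}  # "replace each value with a number"

for a, b in combinations(categories, 2):
    # One-hot: every pair is sqrt(2) apart. Integer codes: distance depends on list order.
    print(a, b, round(euclidean(vectors[a], vectors[b]), 4), abs(codes[a] - codes[b]))
```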
Luo, C, Zhao, Y, Luo, D, Ou, Y & Liu, L 2010, 'Recent Advances of Exception Mining in Stock Market' in Pedro Furtado (ed), Evolving Application Domains of Data Warehousing and Mining: Trends and Solutions, IGI Global, Washington, DC, USA, pp. 212-232.
This chapter aims to provide a comprehensive survey of current advanced technologies for exception mining in the stock market. Stock market surveillance aims to identify market anomalies so as to provide a fair and efficient trading platform. Market surveillance technologies have developed from simple statistical rules to more advanced technologies, such as data mining and artificial intelligence. This chapter first introduces the basic concepts of exception mining in the stock market. It then presents recent advances in exception mining in this domain, discusses the key issues, and analyzes the advantages and disadvantages of the advanced technologies. Furthermore, our model of OMM (Outlier Mining on Multiple time series) is introduced. Finally, the chapter points out future research directions and related practical issues.
Zhao, Y, Zhang, H, Cao, L, Bohlscheid, H, Ou, Y & Zhang, C 2009, 'Data Mining Applications in Social Security' in Cao, L, Yu, PS, Zhang, C & Zhang, H (eds), Data Mining for Business Applications, Springer, New York, USA, pp. 81-96.
This chapter presents four applications of data mining in social security. The first is an application of decision tree and association rules to find the demographic patterns of customers. Sequence mining is used in the second application to find activity sequence patterns related to debt occurrence. In the third application, combined association rules are mined from heterogeneous data sources to discover patterns of slow payers and quick payers. In the last application, clustering and analysis of variance are employed to check the effectiveness of a new policy.
Li, M, Li, J, Ou, Y, Zhang, Y, Luo, D, Bahtia, M & Cao, L 2012, 'Coupled K-nearest centroid classification for non-iid data', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Transactions on Computational Collective Intelligence XV: International Conference on Practical Applications on Agents and Multi-Agent Systems, Springer Verlag, Salamanca, pp. 89-100.
Most traditional classification methods assume that objects, attributes and values are independent and identically distributed (iid). However, real-world data, such as multi-agent data and behavioral data, usually contain strong couplings among values, attributes and objects, which greatly challenges existing methods and tools. This work targets coupling similarities from these three perspectives and designs a novel classification method that applies a weighted K-Nearest Centroid to obtain the coupled similarity for non-iid data. From the value and attribute perspectives, coupled similarity serves as a metric for nominal objects that considers not only the intra-coupled similarity within an attribute but also the inter-coupled similarity between attributes. From the object perspective, we propose a more effective method that measures the centroid object by connecting all related objects. Extensive experiments on UCI and student data sets show that the proposed method achieves higher accuracy than classical methods, especially on imbalanced data.
Li, M, Li, J, Ou, Y, Zhang, Y, Luo, D, Bahtia, M & Cao, L 2014, 'Learning heterogeneous coupling relationships between non-IID terms', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), International Workshop on Agents and Data Mining Interaction, Springer, Saint Paul, MN, pp. 79-91.
With the rapid proliferation of social media and online communities, a vast amount of text data has been generated. As discovering insightful value in text data has become more important, a variety of text mining and processing algorithms, such as classification, clustering and similarity comparison, have been created in recent years. Most previous research uses a vector-space model for text representation and analysis. However, the vector-space model does not utilize information about term-to-term relationships. Moreover, classic classification methods also ignore the relationships between documents. In other words, traditional text mining techniques assume that terms and documents are independent and identically distributed (iid). In this paper, we introduce a novel term representation that involves the coupled relations from term to term. This coupled representation provides much richer information, enabling us to create a coupled similarity metric for measuring document similarity, and a K-Nearest centroid classifier based on coupled document similarity is applied to the classification task. Experiments verify that the proposed approach outperforms the classic vector-space based classifier, and show its potential advantages and richness for exploring other text mining tasks. © 2014 Springer-Verlag.
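The paper's coupled term relations are not specified in this abstract. As a rough stand-in, assuming term coupling can be approximated by co-occurrence, the sketch below derives a term-to-term similarity matrix `S` from the cosine between terms' document columns and then compares documents with the soft-cosine form aᵀSb / √(aᵀSa · bᵀSb). Soft cosine is a swapped-in technique, not the authors' method; the point is only that two documents sharing no terms can still come out as similar.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)

def term_similarity(doc_term):
    """Term-term similarity: cosine between the document columns of two terms."""
    n_terms = len(doc_term[0])
    columns = [[row[j] for row in doc_term] for j in range(n_terms)]
    return [[cosine(columns[i], columns[j]) for j in range(n_terms)] for i in range(n_terms)]

def soft_cosine(a, b, S):
    """Document similarity a^T S b / sqrt(a^T S a * b^T S b)."""
    def bilinear(x, y):
        return sum(x[i] * S[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))
    denom = math.sqrt(max(0.0, bilinear(a, a)) * max(0.0, bilinear(b, b)))
    return 0.0 if denom == 0 else bilinear(a, b) / denom

# Hypothetical corpus over the vocabulary [stock, share, price, weather].
corpus = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
S = term_similarity(corpus)
doc_a, doc_b, doc_c = [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]
print(cosine(doc_a, doc_b), soft_cosine(doc_a, doc_b, S), soft_cosine(doc_a, doc_c, S))
```

Plain cosine scores the "stock" and "share" documents as unrelated, whereas the co-occurrence matrix links the two terms through shared contexts; the "weather" document stays at zero.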
Zhu, X, Yu, Y, Ou, Y, Luo, D, Zhang, C & Chen, J 2013, 'System modeling of a smart-home healthy lifestyle assistant', Lecture Notes in Computer Science, International Workshop on Agents and Data Mining Interaction, Springer, Valencia, Spain, pp. 65-78.
A system model is presented for a Smart-home Healthy Lifestyle Assistant System (SHLAS), covering healthy lifestyle promotion by intelligently collecting and analyzing context information, executing control instructions and suggesting health plans for
Dong, X, Zheng, Z, Cao, L, Zhao, Y, Zhang, C, Li, J, Wei, W & Ou, Y 2011, 'e-NSP: efficient negative sequential pattern mining based on identified positive patterns without database rescanning', Proceedings of the 20th ACM International Conference on Information and Knowledge Management, ACM International Conference on Information and Knowledge Management, ACM, Glasgow, Scotland, UK, pp. 825-830.
Mining Negative Sequential Patterns (NSP) is much more challenging than mining Positive Sequential Patterns (PSP) due to the high computational complexity and huge search space required in calculating Negative Sequential Candidates (NSC). Very few approaches are available for mining NSP, which mainly rely on re-scanning databases after identifying PSP. As a result, they are very inefficient. In this paper, we propose an efficient algorithm for mining NSP, called e-NSP, which mines for NSP by only involving the identified PSP, without re-scanning databases. First, negative containment is defined to determine whether or not a data sequence contains a negative sequence. Second, an efficient approach is proposed to convert the negative containment problem to a positive containment problem. The supports of NSC are then calculated based only on the corresponding PSP. Finally, a simple but efficient approach is proposed to generate NSC. With e-NSP, mining NSP does not require additional database scans, and the existing PSP mining algorithms can be integrated into e-NSP to mine for NSP efficiently. e-NSP is compared with two currently available NSP mining algorithms on 14 synthetic and real-life datasets. Intensive experiments show that e-NSP takes as little as 3% of the runtime of the baseline approaches and is applicable for efficient mining of NSP in large datasets.
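The key step, turning negative containment into positive containment, can be sketched briefly. The sketch assumes single-item elements (e-NSP itself handles itemset elements) and the containment definition as summarized here: a data sequence contains a negative sequence if it contains the maximum positive subsequence (MPS) and none of the positive partners obtained by flipping one negative element to positive. Under that assumption the support follows from positive-pattern sequence-id sets alone: sup(ns) = sup(MPS) − |∪ᵢ sids(pᵢ)|, since every sequence containing a partner also contains the MPS. The helper `sid_sets_for` is a hypothetical stand-in for the output of a PSP miner; `contains_negative` re-checks the definition directly for comparison.

```python
def contains(data_seq, pattern):
    """True if pattern occurs in data_seq as a subsequence (order kept, gaps allowed)."""
    it = iter(data_seq)
    return all(item in it for item in pattern)  # 'in' consumes the iterator

def mps(neg_seq):
    """Maximum positive subsequence: drop every negative element."""
    return [item for item, negative in neg_seq if not negative]

def positive_partners(neg_seq):
    """One positive partner per negative element: the MPS with that element made positive."""
    return [[item for j, (item, negative) in enumerate(neg_seq) if not negative or j == i]
            for i, (_, neg) in enumerate(neg_seq) if neg]

def sid_sets_for(patterns, db):
    """Stand-in for PSP-mining output: positive pattern -> ids of sequences containing it."""
    return {tuple(p): {sid for sid, ds in enumerate(db) if contains(ds, p)} for p in patterns}

def nsp_support(neg_seq, sid_sets):
    """Support of a negative sequence computed only from positive-pattern sid sets."""
    violating = set()
    for partner in positive_partners(neg_seq):
        violating |= sid_sets[tuple(partner)]
    return len(sid_sets[tuple(mps(neg_seq))]) - len(violating)

def contains_negative(ds, neg_seq):
    """Direct check of the negative-containment definition, used for comparison."""
    return contains(ds, mps(neg_seq)) and not any(contains(ds, p) for p in positive_partners(neg_seq))

db = [list("abc"), list("ac"), list("abbc"), list("bc"), list("acd")]
ns = [("a", False), ("b", True), ("c", False)]  # <a, not b, c>
sids = sid_sets_for([mps(ns)] + positive_partners(ns), db)
print(nsp_support(ns, sids), sum(contains_negative(ds, ns) for ds in db))
```

Both counts agree, but `nsp_support` never touches `db`: it only combines sid sets that a positive-pattern miner would already have produced, which is the saving the abstract describes.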
Wang, C, Cao, L, Li, J, Wei, W, Ou, Y & Wang, M 2011, 'Coupled Nominal Similarity in Unsupervised Learning', Proceedings of the 20th ACM international conference on Information and knowledge management, ACM International Conference on Information and Knowledge Management, ACM, Glasgow, UK, pp. 973-978.
The similarity between nominal objects is not straightforward, especially in unsupervised learning. This paper proposes coupled similarity metrics for nominal objects, which consider not only the intra-coupled similarity within an attribute (i.e., value frequency distribution) but also the inter-coupled similarity between attributes (i.e., feature dependency aggregation). Four metrics are designed to calculate the inter-coupled similarity between two categorical values by considering their relationships with other attributes. Theoretical analysis reveals that these metrics have equivalent accuracy, and that the intersection-based metric is more efficient than the others, in particular for large-scale data. Substantial experiments on extensive UCI data sets verify the theoretical conclusions. In addition, clustering experiments based on the derived dissimilarity metrics show a significant performance improvement.
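A minimal sketch of intra- and inter-coupled value similarity follows. It assumes the forms as we understand them from the coupled nominal similarity literature: intra-coupled similarity |g(x)|·|g(y)| / (|g(x)| + |g(y)| + |g(x)|·|g(y)|), where g(v) is the set of objects taking value v; an intersection-based inter-coupled similarity that sums min(P(w|x), P(w|y)) over the values w of another attribute shared by both; equal weights over the other attributes; and a product to combine the two. Treat these choices and the toy data as assumptions to check against the paper, not as its definitive formulation.

```python
from collections import defaultdict

def value_groups(data, j):
    """g_j: map each value of attribute j to the set of object indices holding it."""
    groups = defaultdict(set)
    for i, row in enumerate(data):
        groups[row[j]].add(i)
    return groups

def intra_sim(data, j, x, y):
    """Intra-coupled similarity: driven by how frequent values x and y are in attribute j."""
    groups = value_groups(data, j)
    gx, gy = len(groups[x]), len(groups[y])
    denom = gx + gy + gx * gy
    return 0.0 if denom == 0 else gx * gy / denom

def cond_prob(data, k, w, j, x):
    """P(attribute k = w | attribute j = x), estimated by counting."""
    gx = value_groups(data, j)[x]
    return len(value_groups(data, k)[w] & gx) / len(gx) if gx else 0.0

def inter_sim_k(data, j, k, x, y):
    """Intersection-based inter-coupled similarity of x, y (attribute j) relative to attribute k."""
    groups = value_groups(data, j)
    shared = {data[i][k] for i in groups[x]} & {data[i][k] for i in groups[y]}
    return sum(min(cond_prob(data, k, w, j, x), cond_prob(data, k, w, j, y)) for w in shared)

def inter_sim(data, j, x, y):
    """Average over all other attributes (equal weights are an assumption; needs >= 2 attributes)."""
    others = [k for k in range(len(data[0])) if k != j]
    return sum(inter_sim_k(data, j, k, x, y) for k in others) / len(others)

def coupled_value_sim(data, j, x, y):
    """Combine intra- and inter-coupled similarity by multiplication (assumed)."""
    return intra_sim(data, j, x, y) * inter_sim(data, j, x, y)

# Hypothetical objects described by (colour, size).
data = [("red", "small"), ("red", "small"), ("pink", "small"), ("pink", "large"), ("blue", "large")]
print(coupled_value_sim(data, 0, "red", "pink"), coupled_value_sim(data, 0, "red", "blue"))
```

Red and pink both co-occur with `small`, so they score higher than red and blue, whereas simple matching would give every distinct pair of colours the same similarity of zero.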
Cao, L, Ou, Y, Yu, P & Wei, G 2010, 'Detecting Abnormal Coupled Sequences and Sequence Changes in Group-based Manipulative Trading Behaviors', Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data, ACM SIGKDD International Conference on Knowledge Discovery and Data, ACM, Washington DC, DC, USA, pp. 85-93.
In capital market surveillance, an emerging trend is that a group of hidden manipulators collaborate to manipulate three trading sequences (buy orders, sell orders and trades) by carefully arranging their prices, volumes and times in order to mislead other investors, affect the instrument's movement, and thus maximize personal benefit. If analysis of such hidden group-based behavior focuses on only one of these three sequences, or merges them into one sequence per investor, the coupling relationships among them, expressed through trading actions and their prices, volumes and times, are lost, and the findings are highly likely to mismatch the genuine business facts. Typical sequence analysis approaches, which mainly identify patterns in a single sequence, therefore cannot be used here. This paper addresses a novel topic, namely coupled behavior analysis in hidden groups. In particular, we propose a coupled Hidden Markov Model (HMM)-based approach to detect abnormal group-based trading behaviors. The resulting models cater for (1) multiple sequences from a group of people, (2) interactions among them, (3) sequence item properties, and (4) significant changes among coupled sequences. We demonstrate our approach by detecting abnormal manipulative trading behaviors in orderbook-level stock data. The results are evaluated against alerts generated by the exchange's surveillance system from both technical and computational perspectives. They show that the proposed coupled and adaptive HMMs outperform both a standard HMM that models only a single sequence and an HMM that combines multiple single sequences without considering the coupling relationship. Further work on coupled behavior analysis, including coupled sequence/event analysis, hidden group analysis and behavior dynamics, is critical.
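The abstract does not give the model equations, so the following is only a sketch of the generic idea behind a coupled HMM, under stated assumptions: two chains (hypothetically, buy-order and sell-order sequences; the paper couples three), discrete observations, and each chain's next hidden state conditioned on the previous states of both chains. The forward recursion then runs over joint states (i, j), which is what lets the likelihood reflect the interaction that separate or merged single-sequence HMMs ignore. A low log-likelihood under a model trained on normal trading could serve as an anomaly score; the adaptive part of the paper's method is not reproduced, and all parameter values are made up.

```python
import math
from itertools import product

def coupled_forward_loglik(obs1, obs2, pi1, pi2, A1, A2, B1, B2):
    """Log-likelihood of two coupled discrete observation sequences under a two-chain coupled HMM.

    A1[i][j][k] = P(chain 1 moves to k | chain 1 was in i, chain 2 was in j)
    A2[i][j][l] = P(chain 2 moves to l | chain 1 was in i, chain 2 was in j)
    B1[k][o], B2[l][o] = emission probabilities; pi1, pi2 = initial distributions.
    The forward variables are rescaled at every step to avoid numerical underflow.
    """
    assert len(obs1) == len(obs2), "coupled sequences must be aligned"
    joint = list(product(range(len(pi1)), range(len(pi2))))
    alpha = {(i, j): pi1[i] * pi2[j] * B1[i][obs1[0]] * B2[j][obs2[0]] for i, j in joint}
    loglik = 0.0
    for t in range(len(obs1)):
        if t > 0:
            prev = alpha
            alpha = {
                (k, l): sum(prev[i, j] * A1[i][j][k] * A2[i][j][l] for i, j in joint)
                * B1[k][obs1[t]] * B2[l][obs2[t]]
                for k, l in joint
            }
        scale = sum(alpha.values())  # assumes the observations have non-zero probability
        loglik += math.log(scale)
        alpha = {s: a / scale for s, a in alpha.items()}
    return loglik

# Hypothetical two-state chains with binary observations.
pi1, pi2 = [0.6, 0.4], [0.5, 0.5]
A1 = [[[0.7, 0.3], [0.4, 0.6]], [[0.2, 0.8], [0.5, 0.5]]]
A2 = [[[0.9, 0.1], [0.3, 0.7]], [[0.6, 0.4], [0.1, 0.9]]]
B1 = [[0.8, 0.2], [0.3, 0.7]]
B2 = [[0.6, 0.4], [0.25, 0.75]]
print(coupled_forward_loglik([0, 1, 1, 0], [1, 1, 0, 0], pi1, pi2, A1, A2, B1, B2))
```

The joint state space grows as the product of the chains' state counts, which is manageable for a handful of coupled sequences but is the main cost of modeling the coupling explicitly.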
Luo, C, Zhao, Y, Cao, L, Ou, Y & Liu, L 2008, 'Outlier Mining on Multiple Time Series Data in Stock Market', PRICAI 2008: Trends in Artificial Intelligence, Pacific Rim International Conference on Artificial Intelligence, Springer, Hanoi, Vietnam, pp. 1010-1015.
In the stock market, the key surveillance function is identifying market anomalies, such as insider trading and market manipulation, to provide a fair and efficient trading platform [2,6]. Insider trading refers to trades based on privileged information unavailable to the public. Market manipulation refers to trades or actions that aim to interfere with the demand for or supply of a given stock in order to make its price increase or decrease in a particular way. The rapid growth of stock data calls for new intelligent technologies. Outlier mining technologies have been used to detect market manipulation and insider trading. The objective of outlier mining is to find data objects that are grossly different from or inconsistent with the majority of the data. However, in stock market data, outliers are highly intermixed with normal data, and it is difficult to judge whether an object is an outlier. Therefore, a more effective and efficient approach is needed. This paper presents a new technique for outlier detection on multiple time series data in the stock market. First, a principal curve algorithm is used to detect outliers in individual measurements of the stock market. Then, each generated outlier is scored by its probability of being a real alert. To improve accuracy and precision, these outliers are combined using rules associated with domain knowledge. Experimental results on real stock market data show that the proposed model is feasible in practice and achieves higher accuracy and precision than traditional methods.
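A principal curve fit is beyond a short example, so the sketch below swaps in a simpler per-measure detector (the modified z-score based on the median and median absolute deviation, with the commonly used 0.6745 constant and 3.5 cut-off) and then applies a hypothetical domain rule: raise an alert only when at least two measures are outlying at the same time point. Only the overall shape, per-measurement detection followed by rule-based combination, follows the abstract; the detector, the rule and the data are assumptions.

```python
from statistics import median

def modified_z(series):
    """Modified z-scores: 0.6745 * (x - median) / MAD; all zeros if MAD is 0."""
    med = median(series)
    mad = median(abs(x - med) for x in series)
    if mad == 0:
        return [0.0] * len(series)
    return [0.6745 * (x - med) / mad for x in series]

def combined_alerts(measures, threshold=3.5, min_agree=2):
    """Time indices where at least min_agree measures are individually outlying."""
    flags = [[abs(z) > threshold for z in modified_z(s)] for s in measures.values()]
    length = len(next(iter(measures.values())))
    return [t for t in range(length) if sum(f[t] for f in flags) >= min_agree]

# Hypothetical intraday measurements for one stock.
measures = {
    "price": [10.0, 10.1, 9.9, 10.0, 10.2, 13.0, 10.0, 10.1],
    "volume": [100, 105, 98, 102, 99, 400, 101, 97],
    "spread": [0.10, 0.11, 0.12, 0.10, 0.11, 0.10, 0.50, 0.12],
}
print(combined_alerts(measures))
```

Here the spread spike at index 6 is outlying on its own but is not confirmed by another measure, so only index 5 (price and volume together) survives the combination rule; lowering `min_agree` to 1 would report both.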
Luo, C, Zhao, Y, Cao, L, Ou, Y & Zhang, C 2008, 'Exception Mining on Multiple Time Series in Stock Market', 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM international Conference on Web Intelligence and Intelligent Agent Technology, Springer, Sydney, Australia, pp. 690-693.
This paper presents our research on exception mining on multiple time series data, which aims to assist stock market surveillance by identifying market anomalies. Traditional stock market surveillance technologies have shown their limitations in handling large amounts of complicated stock market data. In our research, Outlier Mining on Multiple time series (OMM) is proposed to improve the effectiveness of exception detection for stock market surveillance. We present the idea of our research, analyze its challenges, and summarize potential research directions.
Ou, Y, Cao, L, Luo, C & Liu, L 2008, 'Mining Exceptional Activity Patterns in Microstructure Data', 2008 IEEE/WIC/ACM international Conference on Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM international Conference on Web Intelligence and Intelligent Agent Technology, IEEE Computer Society, University of Technology, Sydney, Australia, pp. 884-887.
Market surveillance plays an important role in maintaining market integrity, transparency and fairness. Existing trading pattern analysis focuses only on interday data, which discloses explicit and high-level market dynamics. Meanwhile, existing market surveillance systems face challenges from the misuse, mis-disclosure and misdealing of information, announcements and orders within one market or across multiple markets. Therefore, there is a crucial need to develop workable methods for smart surveillance. To deal with these issues, we propose an innovative methodology: microstructure activity pattern analysis. Based on this methodology, we carry out a case study in identifying exceptional microstructure activity patterns. Experiments on real-life stock data show that microstructure activity pattern analysis opens a new and effective means of understanding and analyzing market dynamics. The resulting findings, such as exceptional microstructure activity patterns, can greatly enhance the learning, detection, adaptation and decision-making capability of market surveillance.
Ou, Y, Cao, L, Luo, C & Zhang, C 2008, 'Domain-Driven Local Exceptional Pattern Mining for Detecting Stock Price Manipulation', Lecture Notes in Computer Science Vol 5351: PRICAI 2008: Trends in Artificial Intelligence, Pacific Rim International Conference on Artificial Intelligence, Springer, Hanoi, Vietnam, pp. 849-858.
Recently, a new data mining methodology, Domain Driven Data Mining (D3M), has been developed. On top of data-centered pattern mining, D3M generally targets actionable knowledge discovery under domain-specific circumstances. It strongly emphasizes the involvement of domain intelligence throughout the data mining process, and consequently leads to deliverables that satisfy business user needs and support decision-making. Following the D3M methodology, this paper investigates local exceptional patterns in real-life microstructure stock data for detecting stock price manipulation. Unlike existing pattern analysis, which works mainly on interday data, we deal with tick-by-tick data. Our approach proposes new mechanisms for constructing microstructure order sequences by involving domain factors and business logic, and for measuring the interestingness of patterns from a business perspective. Experiments on real-life exchange data demonstrate that the outcomes generated by following D3M can satisfy business expectations and support business users in taking actions for market surveillance.
Cao, L, Zhao, Y, Figueiredo, F, Ou, Y & Luo, D 2007, 'Mining High Impact Exceptional Behavior Patterns', Emerging Technologies in Knowledge Discovery and Data Mining: Revised Selected Papers of PAKDD 2007 International Workshops, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Nanjing, China, pp. 56-63.
In the real world, exceptional behavior can be seen in many situations, such as security-oriented fields. Such behavior is rare and dispersed, yet some of it may be associated with significant impact on society. A typical example is the September 11 attacks. The key feature of such rare but significant behavior is its high potential to be linked with significant impact, so identifying it before it generates impact is very important. In this paper, we develop several types of high impact exceptional behavior patterns. These include frequent behavior patterns associated with either positive or negative impact, and frequent behavior patterns that lead to both positive and negative impact. Our experiments in mining debt-associated customer behavior in the social security area show that these approaches are useful for identifying exceptional behavior, deeply understanding customer behavior, and streamlining business processes.
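As a minimal sketch of impact-oriented pattern mining, assuming each customer record is a set of hypothetical activity codes labelled with a positive or negative impact (e.g. no debt versus debt), the code below finds frequent activity sets and labels each by the impact its confidence points to. The "both" label, used when neither impact reaches the confidence threshold, is a simplified, hypothetical rule, not the paper's definition of patterns leading to both impacts.

```python
from itertools import combinations

def impact_patterns(records, min_support=0.3, min_conf=0.7, max_len=2):
    """Label frequent activity sets by the impact they are associated with.

    records: list of (set_of_activities, impact), impact in {"positive", "negative"}.
    Returns {pattern: (support, confidence_of_negative_impact, label)}.
    Label rule (hypothetical): "negative" or "positive" if that impact's confidence
    reaches min_conf, otherwise "both".
    """
    n = len(records)
    activities = sorted(set().union(*(acts for acts, _ in records)))
    result = {}
    for k in range(1, max_len + 1):
        for combo in combinations(activities, k):
            pattern = frozenset(combo)
            impacts = [impact for acts, impact in records if pattern <= acts]
            if not impacts or len(impacts) / n < min_support:
                continue
            conf_neg = impacts.count("negative") / len(impacts)
            if conf_neg >= min_conf:
                label = "negative"
            elif 1 - conf_neg >= min_conf:
                label = "positive"
            else:
                label = "both"
            result[pattern] = (len(impacts) / n, conf_neg, label)
    return result

# Hypothetical customer activity records (not real Centrelink data).
records = [
    ({"late_report", "address_change"}, "negative"),
    ({"late_report"}, "negative"),
    ({"late_report", "phone_contact"}, "negative"),
    ({"phone_contact"}, "positive"),
    ({"phone_contact", "address_change"}, "positive"),
    ({"address_change"}, "negative"),
    ({"phone_contact"}, "positive"),
    ({"address_change", "phone_contact"}, "positive"),
]
for pattern, (sup, conf_neg, label) in impact_patterns(records).items():
    print(sorted(pattern), sup, round(conf_neg, 2), label)
```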
Ou, Y, Cao, L, Yu, T & Zhang, C 2007, 'Detecting Turning Points of Trading Price and Return Volatility for Market', Workshop on Agents & Data Mining Interaction (ADMI 2007), International Workshop on Agents and Data Mining Interaction, IEEE Computer Soc, San Jose, pp. 491-494.
The trading agent concept is very useful for trading strategy design and market mechanism design. In this paper, we introduce the use of trading agents for market surveillance. Market surveillance agents can be developed to present market surveillance officers and management teams with alerts and indicators of abnormal market movements. In particular, we investigate strategies for market surveillance agents to detect the impact of company announcements on market movements. This paper examines the performance of segmentation on the time series of trading price and of return volatility, respectively. The purpose of segmentation is to detect turning points in market movements caused by announcements, which are useful for identifying indicators of insider trading. The experimental results indicate that segmentation on the return volatility time series outperforms segmentation on the trading price time series: turning points are easier to detect in return volatility than in trading price. The results will be used to code market surveillance agents that monitor abnormal market movements before the disclosure of market-sensitive announcements. In this way, market surveillance agents can assist market surveillance officers with indicators and alerts.
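The abstract does not name the segmentation algorithm. A common choice for piecewise-linear segmentation is the top-down approach, used here purely as an assumption: recursively split a series at the point that deviates most from the straight line joining the segment's endpoints, until every segment fits within `max_error`. The split points serve as candidate turning points, and the same function could be applied to a price series and to a return-volatility series for comparison. The series and threshold below are hypothetical.

```python
def interpolation_error(series, start, end):
    """Largest absolute deviation of series[start..end] from the line joining its endpoints,
    together with the index where it occurs (None if the segment has no interior points)."""
    y0, y1 = series[start], series[end]
    worst, worst_idx = 0.0, None
    for i in range(start + 1, end):
        line = y0 + (y1 - y0) * (i - start) / (end - start)
        err = abs(series[i] - line)
        if err > worst:
            worst, worst_idx = err, i
    return worst, worst_idx

def top_down_segment(series, max_error):
    """Split recursively at the worst-fitting point until every segment's error is
    <= max_error. Returns the sorted split indices (candidate turning points)."""
    breakpoints = []

    def split(start, end):
        err, idx = interpolation_error(series, start, end)
        if idx is not None and err > max_error:
            breakpoints.append(idx)
            split(start, idx)
            split(idx, end)

    if len(series) >= 3:
        split(0, len(series) - 1)
    return sorted(breakpoints)

# Hypothetical volatility-like series: rise, fall, partial recovery.
print(top_down_segment([1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3], max_error=0.5))
```

`max_error` controls sensitivity: a smaller value yields more, finer turning points, which matters when comparing how cleanly turning points emerge in price versus volatility.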
Zhao, Y, Cao, L, Morrow, YK, Ou, Y, Ni, J & Zhang, C 2006, 'Discovering debtor patterns of Centrelink customers', Data mining 2006; Proceedings of AusDM 2006, Australian Data Mining Conference, ACS Inc, Sydney, Australia, pp. 135-144.
Zhang, S, Liu, L, Lu, J & Ou, Y 2004, 'Is minimum-support appropriate to identifying large itemsets?', Pricai 2004: Trends In Artificial Intelligence, Proceedings, Pacific Rim International Conference on Artificial Intelligence, Springer-Verlag Berlin, Auckland, New Zealand, pp. 474-484.