Wang, C, Cao, L, Gaussier, E, Li, J, Ou, Y & Luo, D 2014, 'Coupled Behavior Representation, Modeling, Analysis, and Reasoning', IEEE Intelligent Systems, vol. 29, no. 4, pp. 66-69.
Behavior refers to the action, reaction, or property of an entity, human or otherwise, to situations or stimuli in its environment [1]. The in-depth analysis of behavior has been increasingly recognized as a crucial means for understanding and disclosing interior driving forces and intrinsic cause-effects on business and social applications, including Web community analysis, counter-terrorism, fraud detection, and customer relationship management. With the deepening and widening of social/business intelligences and their networking, the concept of behavior is in great demand to be consolidated and formalized to deeply scrutinize the native behavior intention, lifecycle, and impact on complex problems and business issues. Although there's an emerging focus on deep behavior studies, such as social network analysis [2], periodic behavior analysis [3], and the behavior informatics approach [1], previous research work has mainly focused on individual behaviors without considering their interactions. However, with increasing network and community-based events as well as their applications, such as group-based crime and social network interactions, coupling relationships between behaviors contribute to the intrinsic causes and impacts of eventual business and social problems. In real-world applications, group behavior interactions (that is, coupled behaviors) are widely seen in natural, social, and artificial behavior-related problems. Complex behavior and social applications often exhibit strong explicit or implicit coupling relationships between both their entities and properties. Moreover, it's also quite difficult to model, analyze, and check behaviors coupled with one another due to the complexity from data, domain, context, and impact perspectives. Due to the emerging popularity and importance of coupled behaviors, the representation, modeling, analysis, mining and learning, and determination of coupled behaviors are becoming increasingly essential yet challenging.
Wei, W, Li, J, Cao, L, Ou, Y & Chen, J 2013, 'Effective Detection of Sophisticated Online Banking Fraud in Extremely Imbalanced Data', World Wide Web, vol. 16, no. 4, pp. 449-475.
Sophisticated online banking fraud reflects the integrative abuse of resources in social, cyber and physical worlds. Its detection is a typical use case of the broad-based Wisdom Web of Things (W2T) methodology. However, there is very limited information available to distinguish dynamic fraud from genuine customer behavior in such an extremely sparse and imbalanced data environment, which makes instant and effective detection increasingly important and challenging. In this paper, we propose an effective online banking fraud detection framework that synthesizes relevant resources and incorporates several advanced data mining techniques. By building a contrast vector for each transaction based on its customer's historical behavior sequence, we profile the differentiating rate of each current transaction against the customer's behavior preference. A novel algorithm, ContrastMiner, is introduced to efficiently mine contrast patterns and distinguish fraudulent from genuine behavior, followed by an effective pattern selection and risk scoring that combines predictions from different models. Results from experiments on large-scale real online banking data demonstrate that our system can achieve substantially higher accuracy and lower alert volume than the latest benchmarking fraud detection system incorporating domain knowledge and traditional fraud detection methods.
Cao, L, Ou, Y & Yu, P 2012, 'Coupled Behavior Analysis With Applications', IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 8, pp. 1378-1392.
Coupled behaviors refer to the activities of one to many actors who are associated with each other in terms of certain relationships. With increasing network and community-based events and applications, such as group-based crime and social network intera
Yu, JX, Ou, Y, Zhang, C & Zhang, S 2005, 'Identifying interesting visitors through Web log classification', IEEE Intelligent Systems, vol. 20, no. 3, pp. 55-59.
Web site owners have trouble identifying customer purchasing patterns from their Web logs because the two aren't directly related. Thus, organizations must understand their customers' behavior, preferences, and future needs. This imperative leads many companies to develop a great many e-service systems for data collection and analysis. Web mining is a popular technique for analyzing visitor activities in e-service systems. It mainly includes Web text mining, Web structure mining and Web log mining. Our Web log mining approach classifies a particular site's visitors into different groups on the basis of their purchase interest.
Li, M, Li, J, Ou, Y & Luo, D 2015, 'A coupled similarity kernel for pairwise support vector machine' in Agents and Data Mining Interaction (LNCS), Springer, Germany, pp. 114-123.
© 2015 Springer International Publishing Switzerland. Support vector machine (SVM) is a supervised learning model with associated learning algorithms that analyzes data and recognizes patterns. In various applications, the SVM shows advantages in classification performance; however, the original SVM was designed for numerical data. To use the SVM on nominal data, most previous research either replaced each nominal value with a number or transformed the nominal value into a one-hot vector. Neither method can represent the structure of the original nominal data or the similarity between values, which leads to information loss and reduces classification performance. In this work, we design a novel coupled similarity metric between nominally attributed data. Because this metric is pairwise, we also propose an adapted SVM which can handle it. The experimental results show that the proposed method outperforms the traditional SVM and other popular classification methods on various public data sets.
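The abstract's point about one-hot vectors losing similarity information can be illustrated with a minimal sketch. The category names and helper functions below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

def one_hot(value, categories):
    # Encode a nominal value as a 0/1 vector over the known categories.
    return [1.0 if value == c else 0.0 for c in categories]

def euclidean(u, v):
    # Plain Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical nominal attribute: "red" and "orange" are intuitively closer
# to each other than "red" and "blue".
categories = ["red", "orange", "blue"]
vectors = {c: one_hot(c, categories) for c in categories}

d_red_orange = euclidean(vectors["red"], vectors["orange"])
d_red_blue = euclidean(vectors["red"], vectors["blue"])

# Every pair of distinct values ends up at the same distance, so the
# encoding cannot express that some values are more alike than others.
print(d_red_orange, d_red_blue)
```

Any two distinct one-hot vectors differ in exactly two positions, which is why the distance is the same for every pair regardless of what the values mean.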
Luo, C, Zhao, Y, Luo, D, Ou, Y & Liu, L 2010, 'Recent Advances of Exception Mining in Stock Market' in Pedro Furtado (ed), Evolving Application Domains of Data Warehousing and Mining: Trends and Solutions, IGI Global, Washington, DC, USA, pp. 212-232.
This chapter aims to provide a comprehensive survey of the current advanced technologies of exception mining in the stock market. Stock market surveillance aims to identify market anomalies so as to provide a fair and efficient trading platform. The technologies of market surveillance have developed from simple statistical rules to more advanced technologies, such as data mining and artificial intelligence. This chapter provides the basic concepts of exception mining in the stock market. Then the recent advances of exception mining in this domain are presented and the key issues are discussed. The advantages and disadvantages of the advanced technologies are analyzed. Furthermore, our model of OMM (Outlier Mining on Multiple time series) is introduced. Finally, this chapter points out the future research directions and related issues in reality.
Li, M, Li, J, Ou, Y, Zhang, Y, Luo, D, Bahtia, M & Cao, L 2014, 'Learning heterogeneous coupling relationships between non-IID terms', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), International Workshop on Agents and Data Mining Interaction, Springer, Saint Paul, MN, pp. 79-91.
With the rapid proliferation of social media and online communities, a vast amount of text data has been generated. Discovering the insightful value of this text data has become increasingly important, and a variety of text mining and processing algorithms have been created in recent years, such as classification, clustering, and similarity comparison. Most previous research uses a vector-space model for text representation and analysis. However, the vector-space model does not utilise information about the relationships between terms. Moreover, the classic classification methods also ignore the relationships between text documents. In other words, the traditional text mining techniques assume that terms and documents are independent and identically distributed (iid). In this paper, we introduce a novel term representation that involves the coupled relations between terms. This coupled representation provides much richer information that enables us to create a coupled similarity metric for measuring document similarity, and a K-Nearest centroid classifier based on coupled document similarity is applied to the classification task. Experiments verify that the proposed approach outperforms the classic vector-space based classifier, and show potential advantages and richness in exploring other text mining tasks. © 2014 Springer-Verlag.
Li, M, Li, J, Ou, Y, Zhang, Y, Luo, D, Bahtia, M & Cao, L 2012, 'Coupled K-nearest centroid classification for non-iid data', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Transactions on Computational Collective Intelligence XV: International Conference on Practical Applications on Agents and Multi-Agent Systems, Springer Verlag, Salamanca, pp. 89-100.
Most traditional classification methods assume the independence and identical distribution (iid) of objects, attributes and values. However, real world data, such as multi-agent data and behavioral data, usually contains strong couplings among values, attributes and objects, which greatly challenges existing methods and tools. This work targets the coupling similarities from these three perspectives and designs a novel classification method that applies a weighted K-Nearest Centroid to obtain the coupled similarity for non-iid data. From value and attribute perspectives, coupled similarity serves as a metric for nominal objects, which considers not only intra-coupled similarity within an attribute but also inter-coupled similarity between attributes. From the object perspective, we propose a more effective method that measures the centroid object by connecting all related objects. Extensive experiments on UCI and student data sets reveal that the proposed method achieves higher accuracy than classical methods, especially on imbalanced data.
Zhu, X, Yu, Y, Ou, Y, Luo, D, Zhang, C & Chen, J 2013, 'System modeling of a smart-home healthy lifestyle assistant', Lecture Notes in Computer Science, International Workshop on Agents and Data Mining Interaction, Springer, Valencia, Spain, pp. 65-78.
A system modeling is presented for a Smart-home Healthy Lifestyle Assistant System (SHLAS), covering healthy lifestyle promotion by intelligently collecting and analyzing context information, executing control instruction and suggesting health plans for
Wang, C, Cao, L, Li, J, Wei, W, Ou, Y & Wang, M 2011, 'Coupled Nominal Similarity in Unsupervised Learning', Proceedings of the 20th ACM International Conference on Information and Knowledge Management, ACM International Conference on Information and Knowledge Management, ACM, Glasgow, UK, pp. 973-978.
The similarity between nominal objects is not straightforward, especially in unsupervised learning. This paper proposes coupled similarity metrics for nominal objects, which consider not only intra-coupled similarity within an attribute (i.e., value frequency distribution) but also inter-coupled similarity between attributes (i.e., feature dependency aggregation). Four metrics are designed to calculate the inter-coupled similarity between two categorical values by considering their relationships with other attributes. The theoretical analysis reveals their equivalent accuracy and superior efficiency based on intersection against others, in particular for large-scale data. Substantial experiments on extensive UCI data sets verify the theoretical conclusions. In addition, experiments of clustering based on the derived dissimilarity metrics show a significant performance improvement.
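The idea of comparing two categorical values through their relationships with another attribute can be sketched as follows. This is only an illustration under stated assumptions: the overlap-of-conditional-distributions formula, the function names, and the toy data are hypothetical and are not one of the four metrics defined in the paper.

```python
from collections import Counter

def conditional_distribution(rows, value):
    # Estimate P(b | a == value) from co-occurrence counts in rows of (a, b).
    counts = Counter(b for a, b in rows if a == value)
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()}

def inter_coupled_similarity(rows, x, y):
    # Assumed illustrative measure: overlap of the two conditional
    # distributions, summed over the values of the other attribute
    # that co-occur with both x and y.
    px = conditional_distribution(rows, x)
    py = conditional_distribution(rows, y)
    shared = set(px) & set(py)
    return sum(min(px[b], py[b]) for b in shared)

# Hypothetical data: (vehicle type, fuel type)
rows = [
    ("sedan", "petrol"), ("sedan", "petrol"),
    ("sedan", "diesel"), ("sedan", "diesel"),
    ("hatchback", "petrol"), ("hatchback", "diesel"),
    ("truck", "diesel"), ("truck", "diesel"),
]

sim_sedan_hatchback = inter_coupled_similarity(rows, "sedan", "hatchback")
sim_sedan_truck = inter_coupled_similarity(rows, "sedan", "truck")
print(sim_sedan_hatchback, sim_sedan_truck)
```

In this toy data "sedan" and "hatchback" have the same fuel-type profile, so they score as more similar to each other than "sedan" and "truck", even though a plain equality test would treat all three values as equally different.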
Dong, X, Zheng, Z, Cao, L, Zhao, Y, Zhang, C, Li, J, Wei, W & Ou, Y 2011, 'e-NSP: efficient negative sequential pattern mining based on identified positive patterns without database rescanning', Proceedings of the 20th ACM International Conference on Information and Knowledge Management, ACM International Conference on Information and Knowledge Management, ACM, Glasgow, Scotland, UK, pp. 825-830.
Mining Negative Sequential Patterns (NSP) is much more challenging than mining Positive Sequential Patterns (PSP) due to the high computational complexity and huge search space required in calculating Negative Sequential Candidates (NSC). Very few approaches are available for mining NSP, which mainly rely on re-scanning databases after identifying PSP. As a result, they are very inefficient. In this paper, we propose an efficient algorithm for mining NSP, called e-NSP, which mines for NSP by only involving the identified PSP, without re-scanning databases. First, negative containment is defined to determine whether or not a data sequence contains a negative sequence. Second, an efficient approach is proposed to convert the negative containment problem to a positive containment problem. The supports of NSC are then calculated based only on the corresponding PSP. Finally, a simple but efficient approach is proposed to generate NSC. With e-NSP, mining NSP does not require additional database scans, and the existing PSP mining algorithms can be integrated into e-NSP to mine for NSP efficiently. e-NSP is compared with two currently available NSP mining algorithms on 14 synthetic and real-life datasets. Intensive experiments show that e-NSP takes as little as 3% of the runtime of the baseline approaches and is applicable for efficient mining of NSP in large datasets.
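The idea of computing the support of a negative candidate from positive supports alone can be sketched for the simplest case. This is a simplified illustration, assuming single-item elements and exactly one negative element; the containment rule used here, the helper names, and the toy database are assumptions, and the paper's general definitions and formulas are not reproduced.

```python
def contains(data_seq, pattern):
    # True if pattern is a (not necessarily contiguous) subsequence of data_seq.
    it = iter(data_seq)
    return all(item in it for item in pattern)

def support(database, pattern):
    # Number of data sequences that contain the positive pattern.
    return sum(contains(seq, pattern) for seq in database)

# Hypothetical sequence database with single-item elements.
database = [
    ["a", "c"],
    ["a", "b", "c"],
    ["a", "d", "c"],
    ["b", "c"],
    ["a", "c", "b"],
]

# Supports of positive patterns, as a positive pattern miner would report them.
sup_ac = support(database, ["a", "c"])
sup_abc = support(database, ["a", "b", "c"])

# Assumed containment rule for <a, not-b, c>: the sequence contains <a, c>
# but does not contain <a, b, c>. The support then follows by subtraction,
# with no further pass over the database.
sup_a_notb_c = sup_ac - sup_abc

# Cross-check by scanning the database directly under the same rule.
direct = sum(
    contains(seq, ["a", "c"]) and not contains(seq, ["a", "b", "c"])
    for seq in database
)
print(sup_a_notb_c, direct)
```

The subtraction works because every sequence containing `<a, b, c>` also contains `<a, c>`, so the two counts are nested.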
Luo, C, Zhao, Y, Cao, L, Ou, Y & Liu, L 2008, 'Outlier Mining on Multiple Time Series Data in Stock Market', PRICAI 2008: Trends in Artificial Intelligence, Pacific Rim International Conference on Artificial Intelligence, Springer, Hanoi, Vietnam, pp. 1010-1015.
In the stock market, the key surveillance function is identifying market anomalies, such as insider trading and market manipulation, to provide a fair and efficient trading platform [2,6]. Insider trading refers to trades on privileged information unavailable to the public. Market manipulation refers to a trade or action which aims to interfere with the demand or supply of a given stock to make the price increase or decrease in a particular way. Recently, new intelligent technologies are required to deal with the challenges of the rapid increase of stock data. Outlier mining technologies have been used to detect market manipulation and insider trading. The objective of outlier mining is to find the data objects which are grossly different from or inconsistent with the majority of data. However, in stock market data, outliers are highly intermixed with normal data and it is difficult to judge whether an object is an outlier or not. Therefore, a more effective and more efficient approach is in demand. This paper presents a new technique for outlier detection on multiple time series data in the stock market. At first, a principal curve algorithm is used to detect the outliers from individual measurements of the stock market. Then, the generated outliers are measured with the probability of being real alerts. To improve the accuracy and precision, these outliers are combined by some rules associated with the domain knowledge. The experimental results on real stock market data show that the proposed model is feasible in practice and achieves a higher accuracy and precision than traditional methods.
Ou, Y, Cao, L, Luo, C & Zhang, C 2008, 'Domain-Driven Local Exceptional Pattern Mining for Detecting Stock Price Manipulation', Lecture Notes in Computer Science Vol 5351: PRICAI 2008: Trends in Artificial Intelligence, Pacific Rim International Conference on Artificial Intelligence, Springer, Hanoi, Vietnam, pp. 849-858.
Recently, a new data mining methodology, Domain Driven Data Mining (D3M), has been developed. On top of data-centered pattern mining, D3M generally targets actionable knowledge discovery under domain-specific circumstances. It strongly appreciates the involvement of domain intelligence in the whole process of data mining, and consequently leads to deliverables that can satisfy business user needs and decision-making. Following the methodology of D3M, this paper investigates local exceptional patterns in real-life microstructure stock data for detecting stock price manipulations. Different from existing pattern analysis, which mainly works on interday data, we deal with tick-by-tick data. Our approach proposes new mechanisms for constructing microstructure order sequences by involving domain factors and business logics, and for measuring the interestingness of patterns from a business concern perspective. Experiments on real-life exchange data demonstrate that the outcomes generated by following D3M can satisfy business expectations and support business users in taking actions for market surveillance.
Luo, C, Zhao, Y, Cao, L, Ou, Y & Zhang, C 2008, 'Exception Mining on Multiple Time Series in Stock Market', 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Springer, Sydney, Australia, pp. 690-693.
This paper presents our research on exception mining on multiple time series data, which aims to assist stock market surveillance by identifying market anomalies. Traditional technologies for stock market surveillance have shown their limitations in handling large amounts of complicated stock market data. In our research, Outlier Mining on Multiple time series (OMM) is proposed to improve the effectiveness of exception detection for stock market surveillance. The idea of our research is presented, challenges on the research are analyzed, and potential research directions are summarized.
Ou, Y, Cao, L, Yu, T & Zhang, C 2007, 'Detecting Turning Points of Trading Price and Return Volatility for Market', Workshop on Agents & Data Mining Interaction (ADMI 2007), International Workshop on Agents and Data Mining Interaction, IEEE Computer Soc, San Jose, pp. 491-494.
The trading agent concept is very useful for trading strategy design and market mechanism design. In this paper, we introduce the use of trading agents for market surveillance. Market surveillance agents can be developed for market surveillance officers and management teams to present them with alerts and indicators of abnormal market movements. In particular, we investigate the strategies for market surveillance agents to detect the impact of company announcements on market movements. This paper examines the performance of segmentation on the time series of trading price and return volatility, respectively. The purpose of segmentation is to detect the turning points of market movements caused by announcements, which are useful to identify the indicators of insider trading. The experimental results indicate that the segmentation on the time series of return volatility outperforms that on the time series of trading price. It is easier to detect the turning points of return volatility than the turning points of trading price. The results will be used to code market surveillance agents for them to monitor abnormal market movements before the disclosure of market sensitive announcements. In this way, the market surveillance agents can assist market surveillance officers with indicators and alerts.
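How a return-volatility series might be derived from a price series, and how a candidate turning point might be read off it, can be sketched as follows. This is a minimal illustration under stated assumptions: log returns, a rolling population standard deviation as the volatility measure, and a "largest jump" heuristic are all choices made here, not the segmentation procedure used in the paper. The prices are invented.

```python
import math
import statistics

def log_returns(prices):
    # Log return between each pair of consecutive prices.
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def rolling_volatility(returns, window):
    # Population standard deviation of returns over a sliding window.
    return [
        statistics.pstdev(returns[i - window:i])
        for i in range(window, len(returns) + 1)
    ]

# Hypothetical prices: flat at first, then unsettled.
prices = [100.0, 100.0, 100.0, 100.0, 100.0, 104.0, 99.0, 105.0, 98.0]

returns = log_returns(prices)
vol = rolling_volatility(returns, window=3)

# Assumed heuristic: the candidate turning point is the position in the
# volatility series with the largest change from the previous position.
jumps = [abs(b - a) for a, b in zip(vol, vol[1:])]
candidate = jumps.index(max(jumps)) + 1
print(vol, candidate)
```

The volatility series stays at zero while the price is flat and rises once the price starts moving, which is the kind of change a segmentation method would look for.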
Cao, L, Zhao, Y, Figueiredo, F, Ou, Y & Luo, D 2007, 'Mining High Impact Exceptional Behavior Patterns', Emerging Technologies in Knowledge Discovery and Data Mining: Revised Selected Papers of PAKDD 2007 International Workshops, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Nanjing, China, pp. 56-63.
In the real world, exceptional behavior can be seen in many situations such as security-oriented fields. Such behavior is rare and dispersed, while some of it may be associated with significant impact on society. A typical example is the events of September 11. The key feature of such rare but significant behavior is its high potential to be linked with some significant impact. Identifying such particular behavior before it generates impact on the world is very important. In this paper, we develop several types of high impact exceptional behavior patterns. The patterns include frequent behavior patterns which are associated with either positive or negative impact, and frequent behavior patterns that lead to both positive and negative impact. Our experiments in mining debt-associated customer behavior in social-security areas show the above approaches are useful in identifying exceptional behavior to deeply understand customer behavior and streamline business process.
Zhang, S, Liu, L, Lu, J & Ou, Y 2004, 'Is minimum-support appropriate to identifying large itemsets?', PRICAI 2004: Trends in Artificial Intelligence, Proceedings, Pacific Rim International Conference on Artificial Intelligence, Springer-Verlag Berlin, Auckland, New Zealand, pp. 474-484.