Wei, W, Li, J, Cao, L, Ou, Y & Chen, J 2013, 'Effective Detection of Sophisticated Online Banking Fraud in Extremely Imbalanced Data', World Wide Web, vol. 16, no. 4, pp. 449-475.
Sophisticated online banking fraud reflects the integrative abuse of resources in social, cyber and physical worlds. Its detection is a typical use case of the broad-based Wisdom Web of Things (W2T) methodology. However, very little information is available to distinguish dynamic fraud from genuine customer behavior in such an extremely sparse and imbalanced data environment, which makes instant and effective detection increasingly important and challenging. In this paper, we propose an effective online banking fraud detection framework that synthesizes relevant resources and incorporates several advanced data mining techniques. By building a contrast vector for each transaction based on its customer's historical behavior sequence, we profile the differentiating rate of each current transaction against the customer's behavior preference. A novel algorithm, ContrastMiner, is introduced to efficiently mine contrast patterns and distinguish fraudulent from genuine behavior, followed by an effective pattern selection and risk scoring that combines predictions from different models. Results from experiments on large-scale real online banking data demonstrate that our system can achieve substantially higher accuracy and lower alert volume than the latest benchmarking fraud detection system incorporating domain knowledge and traditional fraud detection methods.
Cao, L, Ou, Y & Yu, P 2012, 'Coupled Behavior Analysis With Applications', IEEE Transactions On Knowledge And Data Engineering, vol. 24, no. 8, pp. 1378-1392.
Coupled behaviors refer to the activities of one to many actors who are associated with each other in terms of certain relationships. With increasing network and community-based events and applications, such as group-based crime and social network intera
Yu, JX, Ou, Y, Zhang, C & Zhang, S 2005, 'Identifying interesting visitors through Web log classification', IEEE Intelligent Systems, vol. 20, no. 3, pp. 55-59.
Web site owners have trouble identifying customer purchasing patterns from their Web logs because the two aren't directly related. Thus, organizations must understand their customers' behavior, preferences, and future needs. This imperative leads many companies to develop a great many e-service systems for data collection and analysis. Web mining is a popular technique for analyzing visitor activities in e-service systems. It mainly includes Web text mining, Web structure mining and Web log mining. Our Web log mining approach classifies a particular site's visitors into different groups on the basis of their purchase interest.
Li, M, Li, J, Ou, Y & Luo, D 2015, 'A coupled similarity kernel for pairwise support vector machine' in Agents and Data Mining Interaction (LNCS), Springer, Germany, pp. 114-123.
© 2015 Springer International Publishing Switzerland. A support vector machine (SVM) is a supervised learning model with associated learning algorithms that analyze data and recognize patterns. In various applications the SVM shows an advantage in classification performance; however, the original SVM was designed for numerical data. To use the SVM on nominal data, most previous research either replaced each nominal value with a number or transformed it into a one-hot vector. Neither method represents the structure of the original nominal data or the similarity between values, which leads to information loss and reduces classification performance. In this work, we design a novel coupled similarity metric between nominally attributed data. Because this metric is pairwise, we also propose an adapted SVM that can handle it. The experimental results show that the proposed method outperforms the traditional SVM and other popular classification methods on various public data sets.
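The information loss described in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the attribute values and the helper names `one_hot` and `euclidean` are hypothetical. Under one-hot encoding every pair of distinct values is equally far apart, while integer coding imposes an arbitrary order.

```python
import math


def one_hot(value, categories):
    """Return a one-hot vector for `value` over the ordered `categories`."""
    return [1.0 if value == c else 0.0 for c in categories]


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical nominal attribute; the values are illustrative only.
categories = ["red", "crimson", "blue"]
vectors = {c: one_hot(c, categories) for c in categories}

# One-hot: "red" is exactly as far from "crimson" as it is from "blue".
d_red_crimson = euclidean(vectors["red"], vectors["crimson"])
d_red_blue = euclidean(vectors["red"], vectors["blue"])

# Integer coding: the gap depends only on the arbitrary category order.
codes = {c: i for i, c in enumerate(categories)}
gap_red_crimson = abs(codes["red"] - codes["crimson"])
gap_red_blue = abs(codes["red"] - codes["blue"])
```

Neither encoding can express that two values are more alike than a third, which is the gap a data-derived similarity metric is meant to fill.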
Li, M, Li, J, Ou, Y, Zhang, Y, Luo, D, Bahtia, M & Cao, L 2014, 'Learning heterogeneous coupling relationships between non-IID terms', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), International Workshop on Agents and Data Mining Interaction, Springer, Saint Paul, MN, pp. 79-91.
With the rapid proliferation of social media and online communities, a vast amount of text data has been generated. Discovering the value in this text data has become increasingly important, and a variety of text mining and processing algorithms have been created in recent years, such as classification, clustering and similarity comparison. Most previous research uses a vector-space model for text representation and analysis. However, the vector-space model does not utilise information about the relationships between terms. Moreover, classic classification methods also ignore the relationships between text documents. In other words, traditional text mining techniques assume that terms and documents are independent and identically distributed (iid). In this paper, we introduce a novel term representation that involves the coupled relations from term to term. This coupled representation provides much richer information that enables us to create a coupled similarity metric for measuring document similarity, and a K-Nearest centroid classifier based on coupled document similarity is applied to the classification task. Experiments verify that the proposed approach outperforms the classic vector-space based classifier, and show potential advantages and richness in exploring other text mining tasks. © 2014 Springer-Verlag.
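The limitation of the vector-space model noted in this abstract can be shown with a small sketch. It is an illustration only and not the paper's method: the `cosine` helper and the example phrases are assumptions. With a plain bag-of-words representation, two documents that share no terms score zero even when their terms are closely related.

```python
import math
from collections import Counter


def cosine(doc_a, doc_b):
    """Cosine similarity between whitespace-tokenised bag-of-words vectors."""
    a, b = Counter(doc_a.split()), Counter(doc_b.split())
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical documents: related in meaning, but with no shared terms.
sim_related = cosine("cheap car", "inexpensive automobile")

# Identical documents score close to 1 (up to floating-point rounding).
sim_same = cosine("cheap car", "cheap car")
```

A representation that models term-to-term relations would be expected to give the first pair a non-zero score.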
Li, M, Li, J, Ou, Y, Zhang, Y, Luo, D, Bahtia, M & Cao, L 2012, 'Coupled K-nearest centroid classification for non-iid data', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Transactions on Computational Collective Intelligence XV: International Conference on Practical Applications on Agents and Multi-Agent Systems, Springer Verlag, Salamanca, pp. 89-100.
Most traditional classification methods assume the independence and identical distribution (iid) of objects, attributes and values. However, real-world data, such as multi-agent data and behavioral data, usually contains strong couplings among values, attributes and objects, which greatly challenges existing methods and tools. This work targets the coupling similarities from these three perspectives and designs a novel classification method that applies a weighted K-Nearest Centroid to obtain the coupled similarity for non-iid data. From the value and attribute perspectives, coupled similarity serves as a metric for nominal objects, which considers not only intra-coupled similarity within an attribute but also inter-coupled similarity between attributes. From the object perspective, we propose a more effective method that measures the centroid object by connecting all related objects. Extensive experiments on UCI and student data sets reveal that the proposed method outperforms classical methods in accuracy, especially on imbalanced data.
Zhu, X, Yu, Y, Ou, Y, Luo, D, Zhang, C & Chen, J 2013, 'System modeling of a smart-home healthy lifestyle assistant', Lecture Notes in Computer Science, International Workshop on Agents and Data Mining Interaction, Springer, Valencia, Spain, pp. 65-78.
A system modeling is presented for a Smart-home Healthy Lifestyle Assistant System (SHLAS), covering healthy lifestyle promotion by intelligently collecting and analyzing context information, executing control instruction and suggesting health plans for
Wang, C, Cao, L, Li, J, Wei, W, Ou, Y & Wang, M 2011, 'Coupled Nominal Similarity in Unsupervised Learning', Proceedings of the 20th ACM international conference on Information and knowledge management, ACM International Conference on Information and Knowledge Management, ACM, Glasgow, UK, pp. 973-978.
The similarity between nominal objects is not straightforward, especially in unsupervised learning. This paper proposes coupled similarity metrics for nominal objects, which consider not only intra-coupled similarity within an attribute (i.e., value frequency distribution) but also inter-coupled similarity between attributes (i.e. feature dependency aggregation). Four metrics are designed to calculate the inter-coupled similarity between two categorical values by considering their relationships with other attributes. The theoretical analysis reveals their equivalent accuracy and superior efficiency based on intersection against others, in particular for large-scale data. Substantial experiments on extensive UCI data sets verify the theoretical conclusions. In addition, experiments of clustering based on the derived dissimilarity metrics show a significant performance improvement.
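As a rough illustration of what a frequency-based intra-coupled similarity could look like, the sketch below scores two values of one attribute from their occurrence counts alone. The abstract does not state the paper's formulas, so the specific expression, the function name `intra_similarity` and the sample column are all assumptions made for this example.

```python
from collections import Counter


def intra_similarity(column, x, y):
    """Assumed frequency-based similarity between values x and y of one attribute.

    Uses f(x) * f(y) / (f(x) + f(y) + f(x) * f(y)), where f is the number of
    occurrences in `column`. This form is an assumption for illustration.
    """
    counts = Counter(column)
    fx, fy = counts[x], counts[y]
    if fx == 0 or fy == 0:
        return 0.0  # a value that never occurs carries no frequency evidence
    return (fx * fy) / (fx + fy + fx * fy)


# Hypothetical attribute column: "a" occurs 3 times, "b" twice, "c" once.
column = ["a", "a", "a", "b", "b", "c"]

s_ab = intra_similarity(column, "a", "b")  # 6 / 11
s_ac = intra_similarity(column, "a", "c")  # 3 / 7
```

In this assumed form, pairs of frequent values score higher than pairs involving a rare value. An inter-coupled term, as the abstract describes, would additionally consider how the two values co-occur with values of other attributes.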
Luo, C, Zhao, Y, Cao, L, Ou, Y & Liu, L 2008, 'Outlier Mining on Multiple Time Series Data in Stock Market', PRICAI 2008: Trends in Artificial Intelligence, Pacific Rim International Conference on Artificial Intelligence, Springer, Hanoi, Vietnam, pp. 1010-1015.
In the stock market, the key surveillance function is identifying market anomalies, such as insider trading and market manipulation, to provide a fair and efficient trading platform [2,6]. Insider trading refers to trades made on privileged information unavailable to the public. Market manipulation refers to a trade or action that aims to interfere with the demand or supply of a given stock to make the price increase or decrease in a particular way. Recently, new intelligent technologies have been required to deal with the challenges of the rapid increase in stock data. Outlier mining technologies have been used to detect market manipulation and insider trading. The objective of outlier mining is to find the data objects that are grossly different from or inconsistent with the majority of data. However, in stock market data, outliers are highly intermixed with normal data, and it is difficult to judge whether an object is an outlier or not. Therefore, a more effective and more efficient approach is in demand. This paper presents a new technique for outlier detection on multiple time series data in the stock market. First, a principal curve algorithm is used to detect outliers in individual measurements of the stock market. Then, the generated outliers are measured by their probability of being real alerts. To improve accuracy and precision, these outliers are combined using rules associated with domain knowledge. The experimental results on real stock market data show that the proposed model is feasible in practice and achieves higher accuracy and precision than traditional methods.
Ou, Y, Cao, L, Luo, C & Zhang, C 2008, 'Domain-Driven Local Exceptional Pattern Mining for Detecting Stock Price Manipulation', Lecture Notes in Computer Science Vol 5351: PRICAI 2008: Trends in Artificial Intelligence, Pacific Rim International Conference on Artificial Intelligence, Springer, Hanoi, Vietnam, pp. 849-858.
Recently, a new data mining methodology, Domain Driven Data Mining (D3M), has been developed. On top of data-centered pattern mining, D3M generally targets actionable knowledge discovery under domain-specific circumstances. It strongly emphasizes the involvement of domain intelligence in the whole process of data mining, and consequently leads to deliverables that can satisfy business user needs and support decision-making. Following the methodology of D3M, this paper investigates local exceptional patterns in real-life microstructure stock data for detecting stock price manipulations. Unlike existing pattern analysis, which mainly uses interday data, we deal with tick-by-tick data. Our approach proposes new mechanisms for constructing microstructure order sequences by involving domain factors and business logic, and for measuring the interestingness of patterns from a business perspective. Experiments on real-life exchange data demonstrate that the outcomes generated by following D3M can satisfy business expectations and support business users in taking actions for market surveillance.
Cao, L, Zhao, Y, Figueiredo, F, Ou, Y & Luo, D 2007, 'Mining High Impact Exceptional Behavior Patterns', Emerging Technologies in Knowledge Discovery and Data Mining: Revised Selected Papers of PAKDD 2007 International Workshops, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Nanjing, China, pp. 56-63.
In the real world, exceptional behavior can be seen in many situations, such as security-oriented fields. Such behavior is rare and dispersed, though some of it may be associated with significant impact on society. A typical example is the events of September 11. The key feature of such rare but significant behavior is its high potential to be linked with some significant impact. Identifying such behavior before it generates an impact on the world is very important. In this paper, we develop several types of high impact exceptional behavior patterns. The patterns include frequent behavior patterns that are associated with either positive or negative impact, and frequent behavior patterns that lead to both positive and negative impact. Our experiments in mining debt-associated customer behavior in social-security areas show that the above approaches are useful in identifying exceptional behavior, helping to deeply understand customer behavior and streamline business processes.