Collective Learning and Big Data Application
Seminar Chairman: Assoc Professor Jinyan Li, Advanced Analytics Institute (AAi), UTS
Abstract
Much real-world data has complex dependencies between individual tuples. For example, the chance that a patient has a particular disease depends on the prevalence of the disease in the immediate neighbourhood. One approach to handling such linked data is collective learning. In collective learning, one deals with a set of data points at a time. The dependencies between the data points are modeled as a graph, with the nodes representing the tuples and the edges representing the influence of the tuples on one another. A variety of domains lend themselves naturally to such graph-based modeling, and a variety of collective learning and inference approaches have been proposed in the literature. In this talk, I will give a brief introduction to collective learning and describe two applications.
The first of these is a sentiment analysis task. Sentiment analysis is the task of identifying the sentiment expressed in a given piece of text about the target entity under discussion. In this work we look at the problem of analyzing sentiment at different granularities. For example, we want to analyze sentiment about a movie as a whole as well as about the acting, directing, etc. Models built for such multi-grain sentiment analysis assume a fully labeled corpus at the fine-grained level, the coarse-grained level, or both. A huge number of online reviews are not fully labeled at either level, but are partially labeled at both. We propose a multi-grain collective classification framework that not only exploits the information available at all levels but also uses the intra-dependencies within each level and the inter-dependencies between the levels. We demonstrate empirically that the proposed framework gives better performance at both levels than baseline approaches. Part of this work was reported in ECAI 2010 and is joint work with S. Shivashankar and Shamshu Dharwez.
The second task is that of functional site prediction in proteins. Functional site prediction is an important problem in the structural genomics era, where we have a large number of experimentally determined protein structures with unknown function. The functional sites provide useful insights into protein function. In this work, we propose a method for predicting the functional residues of a given protein from its three-dimensional (3D) structure. Our method exploits the correlation between the labels of interacting residues to obtain significant performance improvements over existing methods on the benchmark dataset. We represent each protein as a weighted undirected residue interaction network, in which residues that are spatially proximal in terms of their van der Waals radii are connected by an edge. The edge weight captures the correlation between the labels of the interacting residues, and this correlation is estimated from the features of those residues. We then obtain a label assignment by minimizing the combined cost of residue-wise label misclassification and violation of the label correlation constraints. We solve this problem in two stages: the first stage minimizes the residue-wise label misclassification cost, and it is followed by an iterative collective inference scheme that adjusts the labels predicted in the first stage so as to minimize the correlation constraint violations. Our approach significantly outperforms state-of-the-art methods on a standard benchmark dataset. This work was reported in ACM BCB 2012 and is joint work with Ashish V. Tendulkar, Saradindu Kar and Deepak Vijayakeerthi.
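To make the two-stage idea concrete, here is a minimal Python sketch. It is not the method from the talk: it assumes binary labels, assumes stage-one scores are already available as probabilities, assumes every edge weight is non-negative (so an edge penalizes disagreeing labels), and swaps in a simple iterated-conditional-modes style update for the collective inference step. The function name `collective_inference`, the residue names and all numbers are hypothetical.

```python
def collective_inference(local_prob, edges, max_iters=20):
    """Two-stage labeling sketch (illustrative assumptions only).

    local_prob: dict node -> assumed stage-one probability that label is 1.
    edges: dict (node_i, node_j) -> non-negative weight; a larger weight
           means a larger penalty when the two labels disagree.
    """
    # Stage 1: label every node independently from its local score.
    labels = {n: int(p >= 0.5) for n, p in local_prob.items()}

    # Build an undirected adjacency list from the weighted edges.
    nbrs = {n: [] for n in local_prob}
    for (i, j), w in edges.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))

    def node_cost(n, y):
        # Local misclassification cost plus the weight of violated edges.
        p = local_prob[n]
        unary = (1.0 - p) if y == 1 else p
        pairwise = sum(w for m, w in nbrs[n] if labels[m] != y)
        return unary + pairwise

    # Stage 2: revisit nodes until no label changes (greedy local updates).
    for _ in range(max_iters):
        changed = False
        for n in sorted(labels):
            best = min((0, 1), key=lambda y: node_cost(n, y))
            if best != labels[n]:
                labels[n] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy network: residue "c" is locally ambiguous but is
# strongly tied to two confidently positive neighbours.
scores = {"a": 0.9, "b": 0.8, "c": 0.45, "d": 0.1}
links = {("a", "c"): 0.3, ("b", "c"): 0.3, ("c", "d"): 0.1}
print(collective_inference(scores, links))
```

In this toy case the first stage labels "c" as 0, and the second stage flips it to 1 because agreeing with "a" and "b" lowers the combined cost. Greedy updates of this kind only reach a local minimum, so the visiting order and the initial labels can affect the result.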
Short biography of the Speaker
Balaraman Ravindran is an associate professor at the Department of Computer Science and Engineering at the Indian Institute of Technology Madras. He completed his Ph.D. at the Department of Computer Science, University of Massachusetts, Amherst. He worked with Prof. Andrew G. Barto on an algebraic framework for abstraction in Reinforcement Learning.
His current research interests span the broader area of machine learning, ranging from Spatio-temporal Abstractions in Reinforcement Learning to social network analysis and Data/Text Mining. Much of the work in his group is directed toward understanding interactions and learning from them.
Overview of the AAi seminar series
The Advanced Analytics Seminar Series presents the latest theoretical advances and empirical experience in a broad range of interdisciplinary and business-oriented analytics fields. It covers topics related to data mining, machine learning, statistics, bioinformatics, behavior informatics, marketing analytics and multimedia analytics. It also provides a platform for showcasing commercial products in ubiquitous advanced analytics. Speakers are invited from both academia and industry. The series is held every Friday afternoon at the garden-like UTS Blackfriars Campus. You are warmly welcome to attend.
Jinyan Li, Seminar Coordinator, Associate Professor
Advanced Analytics Institute, School of Software, Faculty of Engineering and IT
University of Technology, Sydney
P.O. Box 123, Broadway, NSW 2007, Australia
Tel: 02 9514-9264 (office)