Marian-Andrei is interested in stochastic behavioural modelling of human actions online, at the intersection of applied statistics, artificial intelligence and social data science. His research has an interdisciplinary focus. He has led two research grants: the first on quantifying the social influence of automatic diffusion systems in the electoral process (with social scientists), and the second on detecting hate speech for the early prediction of mass atrocities and genocides (with political scientists).
Marian-Andrei's work has been published in the most prestigious venues in the field of Data Science and Web Research, such as the International World Wide Web Conference (WWW), the Conference on Web Search and Data Mining (WSDM), the International Conference on Web and Social Media (ICWSM), and the Conference on Information and Knowledge Management (CIKM). He serves as a PC member for prestigious conferences and journals, such as AAAI, WWW, ICWSM and the Journal of Machine Learning Research.
Media attention. My work has received significant media attention, including:
- Both Business Insider and the ANU Reporter covered our findings on the influence of bots in the 2016 US elections.
- I presented my findings on the privacy of Wikipedia editors to the Wikimedia Foundation (the legal entity that handles and represents Wikipedia) in the March 2016 edition of the Wikimedia Research Showcase. The showcase was live streamed on YouTube and reached an international audience of both researchers and the general public.
- My Wikipedia privacy work was featured in ANU’s news media outlet.
- My work on social media popularity was covered by the ANU Reporter and NCI News.
Can supervise: YES
- Machine Learning for social media;
- Big Social Data Science: algorithms and applications;
- Influence, polarisation and radicalisation through the prism of online social media;
- Spatio-temporal information diffusion;
- (Technical) stochastic point process modelling, epidemic models, Bayesian learning.
See here for the complete list of courses taught and student projects.
Teaching. I hold a pedagogical degree in higher education and have 10 years of teaching experience. Overall, I have delivered more than 600 hours of lectures and tutoring to undergraduate, Masters and Honours students, and I have lectured in international degree programs of excellence, such as the Erasmus Mundus Masters of Excellence DMKM and the Franco-Ukrainian Masters IDSM (a cooperation between the University Lumière Lyon and the University of Kharkov, Ukraine).
Supervision completion. More than 45 students: 4 PhD students, 2 RA/postdocs, 1 visiting postgraduate student, 5 Honours (Masters by research) students, 4 summer scholar students and more than 30 coursework Masters students.
Teaching quality. For the past four years, I have obtained high evaluations in ANU’s official Student Experience of Learning and Teaching (SELT) surveys (see attached 2017 SELT evaluation of my teaching).
Diverse teaching. I have taught a wide range of CS subjects (Programming, Calculus, Networking, Algorithm Design), Machine Learning and Data Mining subjects (association rule mining, decision trees, clustering, symbolic learning, ensemble methods) and Social Media Analysis. This document details the complete list of these courses.
Rizoiu, M-A, Velcin, J & Lallich, S 2015, 'Semantic-enriched visual vocabulary construction in a weakly supervised context', Intelligent Data Analysis, vol. 19, no. 1, pp. 161-185.
Rizoiu, M-A, Guille, A & Velcin, J 2015, 'CommentWatcher: An Open Source Web-based platform for analyzing discussions on web forums.', CoRR, vol. abs/1504.07459.
Kim, YM, Velcin, J, Bonnevay, S & Rizoiu, MA 2015, 'Temporal multinomial mixture for instance-oriented evolutionary clustering', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9022, pp. 593-604.
© Springer International Publishing Switzerland 2015. Evolutionary clustering aims at capturing the temporal evolution of clusters. This issue is particularly important in the context of social media data, which are naturally temporally driven. In this paper, we propose a new probabilistic model-based evolutionary clustering technique. The Temporal Multinomial Mixture (TMM) is an extension of the classical mixture model that optimizes feature co-occurrences in a trade-off with temporal smoothness. Our model is evaluated on two recent case studies on opinion aggregation over time. We compare four different probabilistic clustering models and we show the superiority of our proposal in the task of instance-oriented clustering.
Rizoiu, M-A, Velcin, J & Lallich, S 2014, 'How to Use Temporal-Driven Constrained Clustering to Detect Typical Evolutions.', International Journal on Artificial Intelligence Tools, vol. 23.
Rizoiu, M-A, Velcin, J & Lallich, S 2013, 'Unsupervised feature construction for improving data representation and semantics', Journal of Intelligent Information Systems, vol. 40, no. 3, pp. 501-527.
Muşat, C, Trăuşan-Matu, S, Velcin, J & Rizoiu, MA 2012, 'Automatic extraction of conceptual labels from topic models', UPB Scientific Bulletin, Series C: Electrical Engineering, vol. 74, no. 2, pp. 57-68.
This work outlines a novel system that automatically extracts conceptual labels for statistically obtained topics. By creating a projection of the topic, which is a distribution over all the vocabulary words, over the WordNet ontology we succeed in associating concepts to the said groups of words. The most important contributions of this paper are connected to the validation of the role of these concepts as topical labels and the determination of correlations that emerge between the utility of these labels and the strength of the relation between the concepts and the topics.
Rizoiu, MA & Velcin, J 2011, 'Topic extraction for ontology learning' in Ontology Learning and Knowledge Discovery Using the Web: Challenges and Recent Advances, pp. 38-60.
This chapter addresses the issue of topic extraction from text corpora for ontology learning. The first part provides an overview of some of the most significant solutions present today in the literature. These solutions deal mainly with the inferior layers of the Ontology Learning Layer Cake. They are related to the challenges of the Terms and Synonyms layers. The second part shows how these pieces can be bound together into an integrated system for extracting meaningful topics. While the extracted topics are not proper concepts as yet, they constitute a convincing approach towards concept building and therefore ontology learning. This chapter concludes by discussing the research undertaken for filling the gap between topics and concepts as well as perspectives that emerge today in the area of topic extraction. © 2011, IGI Global.
Rizoiu, M-A, Mishra, S, Kong, Q, Carman, M & Xie, L 2018, 'SIR-Hawkes: Linking Epidemic Models and Hawkes Processes to Model Diffusions in Finite Populations', Web Conference 2018: Proceedings of the World Wide Web Conference (WWW2018), 27th World Wide Web (WWW) Conference, Association for Computing Machinery, Lyon, France, pp. 419-428.
Mishra, S, Rizoiu, MA & Xie, L 2018, 'Modeling popularity in asynchronous social media streams with recurrent neural networks', 12th International AAAI Conference on Web and Social Media, ICWSM 2018, pp. 201-210.
Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. Understanding and predicting the popularity of online items is an important open problem in social media analysis. Considerable progress has been made recently in data-driven predictions, and in linking popularity to external promotions. However, the existing methods typically focus on a single source of external influence, whereas for many types of online content such as YouTube videos or news articles, attention is driven by multiple heterogeneous sources simultaneously - e.g. microblogs or traditional media coverage. Here, we propose RNN-MAS, a recurrent neural network for modeling asynchronous streams. It is a sequence generator that connects multiple streams of different granularity via joint inference. We show RNN-MAS not only outperforms the current state-of-the-art YouTube popularity prediction system by 17%, but also captures complex dynamics, such as seasonal trends of unseen influence. We define two new metrics: the promotion score quantifies the gain in popularity from one unit of promotion for a YouTube video; the loudness level captures the effects of a particular user tweeting about the video. We use the loudness level to compare the effects of a video being promoted by a single highly-followed user (in the top 1% most followed users) against being promoted by a group of mid-followed users. We find that results depend on the type of content being promoted: superusers are more successful in promoting Howto and Gaming videos, whereas the cohort of regular users is more influential for Activism videos. This work provides more accurate and explainable popularity predictions, as well as computational tools for content producers and marketers to allocate resources for promotion campaigns.
Wu, S, Rizoiu, MA & Xie, L 2018, 'Beyond views: Measuring and predicting engagement in online videos', 12th International AAAI Conference on Web and Social Media, ICWSM 2018, pp. 434-443.
Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. The share of videos in internet traffic has been growing, so understanding how videos capture attention on a global scale is also of growing importance. Most current research focuses on modeling the number of views, but we argue that video engagement, or time spent watching, is a more appropriate measure for resource allocation problems in attention, networking, and promotion activities. In this paper, we present a first large-scale measurement of video-level aggregate engagement from publicly available data streams, on a collection of 5.3 million YouTube videos published over two months in 2016. We study a set of metrics including time and the average percentage of a video watched. We define a new metric, relative engagement, that is calibrated against video properties and strongly correlates with recognized notions of quality. Moreover, we find that engagement measures of a video are stable over time, thus separating the concerns for modeling engagement and those for popularity - the latter is known to be unstable over time and driven by external promotions. We also find engagement metrics predictable from a cold-start setup, having most of their variance explained by video context, topics and channel information - R²=0.77. Our observations imply several prospective uses of engagement metrics - choosing engaging topics for video production, or promoting engaging videos in recommender systems.
Rizoiu, MA, Graham, T, Zhang, R, Zhang, Y, Ackland, R & Xie, L 2018, 'DEBATENIGHT: The role and influence of socialbots on twitter during the first 2016 U.S. presidential debate', 12th International AAAI Conference on Web and Social Media, ICWSM 2018, pp. 300-309.
Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. Serious concerns have been raised about the role of 'socialbots' in manipulating public opinion and influencing the outcome of elections by retweeting partisan content to increase its reach. Here we analyze the role and influence of socialbots on Twitter by determining how they contribute to retweet diffusions. We collect a large dataset of tweets during the 1st U.S. presidential debate in 2016 and we analyze its 1.5 million users from three perspectives: user influence, political behavior (partisanship and engagement) and botness. First, we define a measure of user influence based on the user's active contributions to information diffusions, i.e. their tweets and retweets. Given that Twitter does not expose the retweet structure - it associates all retweets with the original tweet - we model the latent diffusion structure using only tweet time and user features, and we implement a scalable novel approach to estimate influence over all possible unfoldings. Next, we use partisan hashtag analysis to quantify user political polarization and engagement. Finally, we use the BotOrNot API to measure user botness (the likelihood of being a bot). We build a two-dimensional 'polarization map' that allows for a nuanced analysis of the interplay between botness, partisanship and influence. We find that not only are socialbots more active on Twitter - starting more retweet cascades and retweeting more - but they are 2.5 times more influential than humans, and more politically engaged. Moreover, pro-Republican bots are both more influential and more politically engaged than their pro-Democrat counterparts. However we caution against blanket statements that software designed to appear human dominates politics-related activity on Twitter. Firstly, it is known that accounts controlled by teams of humans (e.g. organizational accounts) are often identified as bots. Seco...
Kong, Q, Rizoiu, M-A, Wu, S & Xie, L 2018, 'Will This Video Go Viral: Explaining and Predicting the Popularity of Youtube Videos.', WWW (Companion Volume), ACM, pp. 175-178.
Rizoiu, M-A, Xie, L, Sanner, S, Cebrian, M, Yu, H & Van Hentenryck, P 2017, 'Expecting to be HIP: Hawkes Intensity Processes for Social Media Popularity', Proceedings of the 26th International Conference on World Wide Web (WWW'17), 26th International Conference on World Wide Web (WWW), Association for Computing Machinery, Perth, Australia, pp. 735-744.
Rizoiu, MA & Xie, L 2017, 'Online popularity under promotion: Viral potential, forecasting, and the economics of time', Proceedings of the 11th International Conference on Web and Social Media, ICWSM 2017, pp. 182-191.
© Copyright 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. Modeling the popularity dynamics of an online item is an important open problem in computational social science. This paper presents an in-depth study of popularity dynamics under external promotions, especially in predicting popularity jumps of online videos, and determining effective and efficient schedules to promote online content. The recently proposed Hawkes Intensity Process (HIP) models popularity as a non-linear interplay between exogenous stimuli and the endogenous reactions. Here, we propose two novel metrics based on HIP: to describe popularity gain per unit of promotion, and to quantify the time it takes for such effects to unfold. We make increasingly accurate forecasts of future popularity by including information about the intrinsic properties of the video, promotions it receives, and the non-linear effects of popularity ranking. We illustrate by simulation the interplay between the unfolding of popularity over time, and the time-sensitive value of resources. Lastly, our model lends a novel explanation of the commonly adopted periodic and constant promotion strategy in advertising, as increasing the perceived viral potential. This study provides quantitative guidelines about setting promotion schedules considering content virality, timing, and economics.
Mishra, S, Rizoiu, M-A & Xie, L 2016, 'Feature Driven and Point Process Approaches for Popularity Prediction', CIKM'16: Proceedings of the 2016 ACM Conference on Information and Knowledge Management, 25th ACM International Conference on Information and Knowledge Management (CIKM), Association for Computing Machinery, IUPUI, Indianapolis, IN, pp. 1069-1078.
Rizoiu, M-A, Xie, L, Caetano, T & Cebrian, M 2016, 'Evolution of Privacy Loss in Wikipedia', Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM'16), 9th Annual ACM International Conference on Web Search and Data Mining (WSDM), Association for Computing Machinery, San Francisco, CA, pp. 215-224.
Rizoiu, M-A, Velcin, J, Bonnevay, S & Lallich, S 2016, 'ClusPath: a temporal-driven clustering to infer typical evolution paths', Data Mining and Knowledge Discovery, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery, Springer, Riva del Garda, Italy, pp. 1324-1349.
Rizoiu, M-A, Velcin, J & Lallich, S 2012, 'How to Use Temporal-Driven Constrained Clustering to Detect Typical Evolutions', International Journal on Artificial Intelligence Tools, IEEE 24th International Conference on Tools with Artificial Intelligence (ICTAI), World Scientific Publ Co Pte Ltd, Athens, Greece.
Rizoiu, MA 2013, 'Semi-supervised structuring of complex data', IJCAI International Joint Conference on Artificial Intelligence, pp. 3239-3240.
The objective of the thesis is to explore how complex data can be treated using unsupervised machine learning techniques, in which additional information is injected to guide the exploratory process. Starting from specific problems, our contributions take into account the different dimensions of the complex data: their nature (image, text), the additional information attached to the data (labels, structure, concept ontologies) and the temporal dimension. Special attention is given to data representation and how additional information can be leveraged to improve this representation.
Rizoiu, M-A, Velcin, J & Lallich, S 2012, 'Structuring typical evolutions using Temporal-Driven Constrained Clustering', 2012 IEEE 24th International Conference on Tools with Artificial Intelligence (ICTAI 2012), vol. 1, IEEE 24th International Conference on Tools with Artificial Intelligence (ICTAI), IEEE, Athens, Greece, pp. 610-617.
We propose a system which employs conceptual knowledge to improve topic models by removing unrelated words from the simplified topic description. We use WordNet to detect which topical words are not conceptually similar to the others and then test our assumptions against human judgment. Results obtained on two different corpora in different test conditions show that the words detected as unrelated had a much greater probability than the others to be chosen by human evaluators as not being part of the topic at all. We prove that there is a strong correlation between the said probability and an automatically calculated topical fitness and we discuss the variation of the correlation depending on the method and data used. © 2011 Springer-Verlag Berlin Heidelberg.
Musat, CC, Velcin, J, Trausan-Matu, S & Rizoiu, MA 2011, 'Improving topic evaluation using conceptual knowledge', IJCAI International Joint Conference on Artificial Intelligence, pp. 1866-1871.
The growing number of statistical topic models has led to the need to better evaluate their output. Traditional evaluation means estimate the model's fitness to unseen data. It has recently been proven that the output of human judgment can greatly differ from these measures. Thus, the need for methods that better emulate human judgment is pressing. In this paper we present a system that computes the conceptual relevance of individual topics from a given model on the basis of information drawn from a given concept hierarchy, in this case WordNet. The notion of conceptual relevance is regarded as the ability to attribute a concept to each topic and separate words related to the topic from the unrelated ones based on that concept. In multiple experiments we prove the correlation between the automatic evaluation method and the answers received from human evaluators, for various corpora and difficulty levels. By changing the evaluation focus from a statistical one to a conceptual one we were able to detect which topics are conceptually meaningful and rank them accordingly.
Rizoiu, M-A, Velcin, J & Chauchat, J-H 2010, 'Regrouper les données textuelles et nommer les groupes à l'aide de classes recouvrantes.', EGC, Cépaduès-Éditions, pp. 561-572.