Shaowu Liu received his PhD degree in Computer Science from Deakin University in 2016. Currently, he is a postdoctoral research fellow in the School of Computer Science and the Advanced Analytics Institute, University of Technology Sydney. His current research interests include User Behavior Analytics, Interpretable Machine Learning, and Representation Learning of Knowledge Graphs. Besides research, he is also a data scientist with experience in FinTech and Digital Health projects sponsored by companies and government.
Can supervise: YES
- User Behavior Analytics
- Interpretable Machine Learning
- Representation Learning of Knowledge Graphs
Dr Shaowu Liu teaches various computer science and data science subjects at UTS and previously at Deakin University:
- Analytics Capstone Project (2017, 2018, 2019) @ University of Technology Sydney
- Modern Data Science (2016) @ Deakin University
- Introduction to Computer Science (2016) @ Deakin University
- Enterprise Business Intelligence (2015) @ Deakin University
- Multimedia Delivery Systems (2014) @ Deakin University
- Multimedia Systems and Technology (2014, 2015) @ Deakin University
- Database and Information Retrieval (2014, 2015) @ Deakin University
- Data Structures and Algorithms (2013, 2014) @ Deakin University
Yan, Z, Liu, J & Liu, S 2019, 'DPWeVote: differentially private weighted voting protocol for cloud-based decision-making', Enterprise Information Systems, vol. 13, no. 2, pp. 236-256.
© 2018 Informa UK Limited, trading as Taylor & Francis Group. With the advent of Industry 4.0, cloud computing techniques have been increasingly adopted by industry practitioners to achieve better workflows. One important application is cloud-based decision-making, in which multiple enterprise partners need to arrive at an agreed decision. Such a cooperative decision-making problem is sometimes formulated as a weighted voting game, in which enterprise partners express 'YES/NO' opinions. Nevertheless, existing cryptographic approaches to the Cloud-Based Weighted Voting Game have restricted collusion tolerance and rely heavily on trusted servers, which are not always available. In this work, we consider the more realistic scenarios of having semi-honest cloud server/partners and assuming maximal collusion tolerance. To resolve the privacy issues in such scenarios, the DPWeVote protocol is proposed, which incorporates the Randomized Response technique and consists of the following three phases: the Randomized Weights Collection phase, the Randomized Opinions Collection phase, and the Voting Results Release phase. Experiments on synthetic data demonstrate that the proposed DPWeVote protocol retains an acceptable utility for decision-making while preserving privacy in a semi-honest environment.
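The Randomized Response technique mentioned in this abstract can be illustrated with a minimal sketch. This is the classic binary randomized response applied to unweighted YES/NO opinions, not the DPWeVote protocol itself; the function names and the `p_truth` parameter are illustrative assumptions.

```python
import random


def randomize(vote: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true vote with probability p_truth, otherwise report its opposite."""
    return vote if rng.random() < p_truth else not vote


def estimate_yes_fraction(reports: list, p_truth: float) -> float:
    """Unbiased estimate of the true YES fraction from the noisy reports.

    If pi is the true YES fraction, the expected observed YES fraction is
    p_truth * pi + (1 - p_truth) * (1 - pi); solving for pi gives the formula
    below. It requires p_truth != 0.5.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)


# Hypothetical data: 10,000 partners, 60% of whom vote YES.
rng = random.Random(0)
true_votes = [i < 6000 for i in range(10000)]
reports = [randomize(v, 0.75, rng) for v in true_votes]
estimate = estimate_yes_fraction(reports, 0.75)
print(f"estimated YES fraction: {estimate:.3f}")
```

A `p_truth` closer to 0.5 hides each individual opinion better but makes the aggregate estimate noisier, which is the privacy and utility trade-off the abstract refers to.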
© 2018 John Wiley & Sons, Ltd. Jaccard Similarity has been widely used to measure the distance between two sets (or preference profiles) owned by two different users. Yet, in the private data collection scenario, the untrusted curator must estimate an approximately accurate Jaccard similarity of the involved users without being allowed to access their preference profiles. This paper aims to address the above requirements by considering the local differential privacy model. To achieve this, we initially focused on a particular hash technique, MinHash, which was originally invented to estimate the Jaccard similarity efficiently. We designed the PrivMin algorithm to perturb the MinHash signature by adopting the Exponential mechanism, and built the Locally Differentially Private Jaccard Similarity Estimation (LDP-JSE) protocol to allow the untrusted curator to approximately estimate Jaccard similarity. Theoretical and empirical results demonstrate that the proposed protocol can retain a highly acceptable utility of the estimated similarity while preserving privacy.
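As background for the MinHash technique named in this abstract, the following sketch shows plain (non-private) MinHash estimation of Jaccard similarity. It does not implement PrivMin or the LDP-JSE protocol; the seeded SHA-256 hash family and all names are assumptions chosen for illustration.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """One member of a seeded hash family, built from SHA-256 (an illustrative choice)."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items: set, num_hashes: int = 256) -> list:
    """For each hash function, keep the minimum hash value over the (non-empty) set."""
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    """The fraction of signature positions that agree estimates the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)


# Hypothetical preference profiles sharing 50 of 150 distinct items.
profile_a = {f"item{i}" for i in range(0, 100)}
profile_b = {f"item{i}" for i in range(50, 150)}
estimated = estimate_jaccard(minhash_signature(profile_a), minhash_signature(profile_b))
print(f"exact={exact_jaccard(profile_a, profile_b):.3f} estimated={estimated:.3f}")
```

The estimate works because, for a single random hash function, the probability that two sets share the same minimum hash value equals their Jaccard similarity; more hash functions reduce the variance of the estimate.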
Vo, NNY, He, X, Liu, S & Xu, G 2019, 'Deep learning for decision making and the optimization of socially responsible investments and portfolio', Decision Support Systems, vol. 124.
© 2019 Elsevier B.V. A socially responsible investment portfolio takes into consideration the environmental, social and governance aspects of companies. It has become an emerging topic for both financial investors and researchers recently. Traditional investment and portfolio theories, which are used for the optimization of financial investment portfolios, are inadequate for decision-making and the construction of an optimized socially responsible investment portfolio. In response to this problem, we introduced a Deep Responsible Investment Portfolio (DRIP) model that contains a Multivariate Bidirectional Long Short-Term Memory neural network, to predict stock returns for the construction of a socially responsible investment portfolio. The deep reinforcement learning technique was adapted to retrain neural networks and rebalance the portfolio periodically. Our empirical data revealed that the DRIP framework could achieve competitive financial performance and better social impact compared to traditional portfolio models, sustainable indexes and funds.
Liu, S, Li, G, Tran, T & Jiang, Y 2017, 'Preference Relation-based Markov Random Fields for Recommender Systems', Machine Learning, vol. 106, no. 4, pp. 523-546.
© 2016 The Author(s). A preference relation-based Top-N recommendation approach is proposed to capture both second-order and higher-order interactions among users and items. Traditionally, Top-N recommendation was achieved by predicting the item ratings first and then inferring the item rankings, based on the assumption that explicit feedback such as ratings is available, and the assumption that optimizing the ratings is equivalent to optimizing the item rankings. Nevertheless, neither assumption always holds in real-world applications. The proposed approach drops these assumptions by exploiting preference relations, a more practical form of user feedback. Furthermore, the proposed approach enjoys the representational power of Markov Random Fields, so side information such as item and user attributes can be easily incorporated. Compared to related work, the proposed approach has the unique property of modeling both second-order and higher-order interactions among users and items. To the best of our knowledge, this is the first time both types of interactions have been captured in preference relation-based methods. Experimental results on public datasets demonstrate that both types of interactions have been properly captured, and significantly improved Top-N recommendation performance has been achieved.
Beliakov, G, Li, G & Liu, S 2015, 'Parallel bucket sorting on graphics processing units based on convex optimization', Optimization, vol. 64, no. 4, pp. 1033-1055.
Moonsamy, V, Rong, J & Liu, S 2014, 'Mining permission patterns for contrasting clean and malicious android applications', Future Generation Computer Systems: The International Journal of eScience, vol. 36, pp. 122-132.
Liu, S, Law, R, Rong, J, Li, G & Hall, J 2013, 'Analyzing changes in hotel customers' expectations by trip mode', International Journal of Hospitality Management, vol. 34, pp. 359-371.
Huy, QV, Li, G, Sukhorukova, NS, Beliakov, G, Liu, S, Philippe, C, Amiel, H & Ugon, A 2012, 'K-Complex Detection Using a Hybrid-Synergic Machine Learning Method', IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, no. 6, pp. 1478-1490.
Vu, HQ, Liu, S, Yang, X, Li, Z & Ren, Y 2012, 'Identifying microphone from noisy recordings by using representative instance one class-classification approach', Journal of Networks, vol. 7, no. 6, pp. 908-917.
Rapid growth of technical developments has created huge challenges for microphone forensics, a subcategory of audio forensic science, because of the availability of numerous digital recording devices and the massive amount of recording data. Demand for fast and efficient methods to assure the integrity and authenticity of information is becoming more and more important in criminal investigation. Machine learning has emerged as an important technique to support the audio analysis processes of microphone forensic practitioners. However, its application to real-life situations using supervised learning still faces great challenges due to the expense of collecting data and updating systems. In this paper, we introduce a new machine learning approach called One-class Classification (OCC) to be applied to microphone forensics; we demonstrate its capability on a corpus of audio samples collected from several microphones. In addition, we propose a representative instance classification framework (RICF) that can effectively improve the performance of OCC algorithms for recording signals with noise. Experimental results and analysis indicate that OCC has the potential to benefit microphone forensic practitioners in developing new tools and techniques for effective and efficient analysis. © 2012 Academy Publisher.
Recommender systems have become an important tool for users to identify interesting items as well as for businesses to promote their products to the right users. With the rapid development of social networks, travelers have started to seek recommendations and advice from web services such as TripAdvisor and Yelp. Although the initial purpose of travelers is to share their opinions on social networks, this provides an opportunity for hospitality businesses to learn about their customers' preferences. Given these data on preferences, recent advances in data science research have made it possible to build automatic recommender systems that can generate hotel recommendations tailored to each traveler. This chapter introduces the basic concepts and tools for creating hotel recommender systems.
Beliakov, G & Liu, S 2014, 'Parallel Monotone Spline Interpolation and Approximation on GPUs' in Couturier, R (ed), Designing Scientific Applications on GPUs, CRC Press, USA, pp. 295-310.
Monotonicity preserving interpolation and approximation have received substantial attention in the last thirty years because of their numerous applications in computer-aided design, statistics, and machine learning [9, 10, 19]. Constrained splines are particularly popular because of their flexibility in modeling different geometrical shapes, sound theoretical properties, and availability of numerically stable algorithms [9, 10, 26]. In this work we examine parallelization and adaptation for GPUs of a few algorithms of monotone spline interpolation and data smoothing, which arose in the context of estimating probability distributions.
Estimating Cumulative Probability distribution Functions (CDF) from data is quite common in data analysis. In our particular case we faced this problem in the context of partitioning univariate data with the purpose of efficient sorting. It was necessary to partition large data sets into chunks of approximately equal size, so that these chunks could be sorted independently and subsequently concatenated. In order to do that, the empirical CDF of the data was used to find the quantiles, which served to partition the data. The CDF was estimated from the data based on a number of pairs $(x_i, y_i)$, $i = 1, \ldots, n$, where $y_i$ was the proportion of data no larger than $x_i$. As data could come from a variety of distributions, a distribution-free nonparametric fitting procedure was required to interpolate the above pairs. Needless to say, the whole process was aimed at GPU, and hence the use of CPU for invoking serial algorithms had to be minimized.
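The partitioning idea described above (estimate quantiles from an empirical CDF, split the data into roughly equal chunks, sort each chunk independently, then concatenate) can be sketched serially as follows. This is a CPU-only illustration that reads quantiles directly from a sorted sample; it does not use the monotone spline fitting or GPU parallelization of the chapter, and the function names and parameters are assumptions.

```python
import bisect
import random


def quantile_splitters(sample: list, num_buckets: int) -> list:
    """Pick num_buckets - 1 splitters at the k/num_buckets quantiles of the sample."""
    s = sorted(sample)
    n = len(s)
    splitters = []
    for k in range(1, num_buckets):
        rank = -(-k * n // num_buckets)  # ceil(k * n / num_buckets)
        splitters.append(s[rank - 1])
    return splitters


def bucket_sort(data: list, num_buckets: int = 4, sample_size: int = 256, seed: int = 0) -> list:
    """Partition by estimated quantiles, sort each bucket, and concatenate."""
    if len(data) <= 1:
        return list(data)
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    splitters = quantile_splitters(sample, num_buckets)
    buckets = [[] for _ in range(num_buckets)]
    for x in data:
        # Bucket i receives values in (splitters[i-1], splitters[i]].
        buckets[bisect.bisect_left(splitters, x)].append(x)
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))  # each bucket could be sorted independently
    return result


rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(10000)]
assert bucket_sort(data) == sorted(data)
```

Because the splitters come from quantiles rather than from evenly spaced values, the buckets stay close to equal in size even for skewed distributions, which is what makes independent sorting of the chunks balanced.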
Biddle, R, Liu, S & Xu, G 2018, 'Semi-Supervised Soft K-Means Clustering of Life Insurance Questionnaire Responses', Proceedings - 2018 5th International Conference on Behavioral, Economic, and Socio-Cultural Computing, BESC 2018, pp. 30-31.
© 2018 IEEE. The life insurance questionnaire is a large document containing responses in a mixture of structured and unstructured data. The unstructured data poses issues for the user, in the form of extra input effort, and the insurance company, in the form of interpretation and analysis. In this work, we aim to address these problems by proposing a semi-supervised framework for clustering responses into categories using vector space embedding of responses and soft k-means clustering. Our experiments show that our method achieves adequate results. The resulting category clusters from our method can be used for analysis and to replace free text input questions with structured questions in the questionnaire.
Biddle, R, Liu, S, Tilocca, P & Xu, G 2018, 'Automated Underwriting in Life Insurance: Predictions and Optimisation', ADC 2018: Databases Theory and Applications (LNCS), Australasian Database Conference, Springer, Gold Coast, QLD, Australia, pp. 135-146.
Underwriting is an important stage in the life insurance process and is concerned with accepting individuals into an insurance fund and on what terms. It is a tedious and labour-intensive process for both the applicant and the underwriting team. An applicant must fill out a large survey containing thousands of questions about their life. The underwriting team must then process this application and assess the risks posed by the applicant and offer them insurance products as a result. Our work implements and evaluates classical data mining techniques to help automate some aspects of the process to ease the burden on the underwriting team as well as optimise the survey to improve the applicant experience. Logistic Regression, XGBoost and Recursive Feature Elimination are proposed as techniques for the prediction of underwriting outcomes. We conduct experiments on a dataset provided by a leading Australian life insurer and show that our early-stage results are promising and serve as a foundation for further work in this space.
Vo, NNY, Liu, S, He, X & Xu, G 2018, 'Multimodal Mixture Density Boosting Network for Personality Mining', Advances in Knowledge Discovery and Data Mining (LNCS), Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Melbourne, Australia, pp. 644-655.
Knowing people's personalities is useful in various real-world applications, such as personnel selection. Traditionally, we have had to rely on qualitative methodologies, e.g. surveys or psychology tests, to determine a person's traits. However, recent advances in machine learning have made it possible to automate this process by inferring personalities from textual data. Despite their success, text-based methods ignore facial expressions and the way people speak, which can also carry important information about human characteristics. In this work, a personality mining framework is proposed to exploit all the information from videos, including visual, auditory, and textual perspectives. Using a state-of-the-art cascade network built on advanced gradient boosting algorithms, our proposed methodology achieves lower prediction errors than most current machine learning algorithms. Our multimodal mixture density boosting network performs especially well on datasets with small sample sizes, which is useful for learning problems in psychology, where big data is often not available.
Vo, NNY, Xu, G, Liu, S, Brownlow, EJ, Culbert, B & Chu, C 2018, 'Client Churn Prediction with Call Log Analysis', Database Systems for Advanced Applications, International Conference on Database Systems for Advanced Applications, Springer, Gold Coast, Australia, pp. 752-763.
Yin, J, Zhou, Z, Liu, S, Wu, Z & Xu, G 2018, 'Social Spammer Detection: A Multi-Relational Embedding Approach', Pacific-Asia Conference on Knowledge Discovery and Data, Springer Link, Melbourne, VIC, Australia.
Zhou, Z, Liu, S, Xu, G, Xie, X, Yin, J, Li, Y & Zhang, W 2018, 'Knowledge-based Recommendation with Hierarchical Collaborative Embedding', PAKDD 2018: Advances in Knowledge Discovery and Data Mining, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Melbourne, Australia, pp. 222-234.
Data sparsity is a common issue in recommendation systems, particularly collaborative filtering. In real recommendation scenarios, user preferences are often quantitatively sparse because of the nature of the application. To address the issue, we propose a knowledge graph-based semantic information enhancement mechanism to enrich the user preferences. Specifically, the proposed Hierarchical Collaborative Embedding (HCE) model leverages both the network structure and the text information embedded in knowledge bases to supplement traditional collaborative filtering. The HCE model jointly learns the latent representations from user preferences, the linkages between items and the knowledge base, and the semantic representations from the knowledge base. Experimental results on a GitHub dataset demonstrate that semantic information from the knowledge base has been properly captured, resulting in improved recommendation performance.
Zhu, D, Pang, N, Li, G & Liu, S 2016, 'WiseFi: Activity Localization and Recognition on Commodity Off-the-shelf WiFi Devices', IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems, IEEE, Sydney, Australia.
Zhu, D, Pang, N, Li, G & Liu, S 2017, 'NotiFi: Non-Invasive Abnormal Activity Detection Using Fine-grained WiFi Signals', Proceedings of the International Joint Conference on Neural Networks (IJCNN), International Joint Conference on Neural Networks, IEEE, Anchorage, Alaska, USA, pp. 1766-1773.
We build a ubiquitous abnormal activity detection system, namely NotiFi, for accurately detecting abnormal activities on commercial off-the-shelf (COTS) IEEE 802.11 devices. In contrast to traditional wearable sensor-based and computer vision-based systems, which require additional sensors or sufficient lighting in line-of-sight (LoS) scenarios, we proceed directly with abnormal activity characterization and activity modeling at the WiFi signal level based on Channel State Information (CSI). The intuition of NotiFi is that whenever the human body occludes the wireless signal transmitted from the access point to the receiver, the phase and amplitude information of the CSI changes sensitively. By creating multiple hierarchical Dirichlet processes, NotiFi automatically learns the number of human body activity categories for abnormal detection. Experimental results in three typical indoor environments indicate that NotiFi can achieve satisfactory performance in accuracy, robustness and stability.
Pang, N, Zhu, D, Li, G & Liu, S 2017, 'WarnFi: Non-Invasive WiFi-based Abnormal Activity Sensing Using Non-parametric Model', IEEE Military Communications Conference, IEEE, Baltimore, MD, USA.
Liu, S, Pang, N, Xu, G & Liu, H 2017, 'Collaborative Filtering via Different Preference Structures', International Conference on Knowledge Science, Engineering and Management, Springer, Melbourne, Australia, pp. 309-321.
Recently, social network websites have started to provide third-party sign-in options via the OAuth 2.0 protocol. For example, users can log in to the Netflix website using their Facebook accounts. By using this service, accounts of the same user are linked together, and so is their information. This fact provides an opportunity to create more complete profiles of users, leading to improved recommender systems. However, user opinions distributed over different platforms are in different preference structures, such as ratings, rankings, pairwise comparisons, voting, etc. As existing collaborative filtering techniques assume the homogeneity of preference structure, it remains a challenging task to learn from different preference structures simultaneously. In this paper, we propose a fuzzy preference relation-based approach to enable collaborative filtering via different preference structures. Experimental results on public datasets demonstrate that our approach can effectively learn from different preference structures, and shows strong resistance to the noise and bias introduced by cross-structure preference learning.
Liu, S, Xu, G, Zhu, X & Zhou, Z 2017, 'Towards Simplified Insurance Application via Sparse Questionnaire Optimization', 2017 International Conference on Behavioral, Economic, Socio-cultural Computing (BESC), International Conference on Behavioral, Economic, and Socio-Cultural Computing, IEEE, Poland.
Life insurance application requires in-person meetings with underwriters, tedious paperwork, and an average waiting period of six weeks before an offer can be made. This outdated process has become a barrier to broader consumer adoption, resulting in a large coverage gap. In this work, we aim to close this gap by leveraging data mining techniques to optimize the insurance questionnaire form. Our experiment on 10 years of insurance application data has identified that only ~2% of all questions are highly relevant to determining the risks of applicants, resulting in a significantly simplified questionnaire.
Zhou, Z, Xu, G, Zhu, X & Liu, S 2017, 'Latent Factor Analysis for Low-dimensional Implicit Preference Prediction', International Conference on Behavioral, Economic, Socio-cultural Computing, IEEE, Poland, pp. 1-2.
User preference prediction aims to predict a user's future preferences on a large number of items according to his/her preference history. To achieve this goal, many models have been proposed, but mainly for explicit preference data, such as 5-star ratings. Nevertheless, real-world data are often in implicit format, such as purchase actions, and the number of items is not always large. In this paper, we demonstrate the use of latent factor models for the task of predicting user preferences on implicit and low-dimensional datasets.
Liu, S, Li, G, Tran, T & Jiang, Y 2015, 'Preference Relation-based Markov Random Fields for Recommender Systems', ACML 2015: Proceedings of 7th Asian Conference on Machine Learning, Asian Conference on Machine Learning, The Proceedings of Machine Learning Research, Hong Kong, pp. 1-16.
A preference relation-based Top-N recommendation approach, PrefMRF, is proposed to capture both the second-order and the higher-order interactions among users and items. Traditionally, Top-N recommendation was achieved by predicting the item ratings first and then inferring the item rankings, based on the assumption that explicit feedback such as ratings is available, and the assumption that optimizing the ratings is equivalent to optimizing the item rankings. Nevertheless, neither assumption always holds in real-world applications. The proposed PrefMRF approach drops these assumptions by explicitly exploiting preference relations, a more practical form of user feedback. Compared to related work, the proposed PrefMRF approach has the unique property of modeling both the second-order and the higher-order interactions among users and items. To the best of our knowledge, this is the first time both types of interactions have been captured in a preference relation-based method. Experimental results on public datasets demonstrate that both types of interactions have been properly captured, and significantly improved Top-N recommendation performance has been achieved.
Liu, S, Tran, T, Li, G & Jiang, Y 2014, 'Ordinal Random Fields for Recommender Systems', JMLR: Workshop and Conference Proceedings, Asian Conference on Machine Learning, The Proceedings of Machine Learning Research, Nha Trang City, Vietnam.
Recommender Systems heavily rely on numerical preferences, whereas the importance of ordinal preferences has only been recognised in recent works on Ordinal Matrix Factorisation (OMF). Although the OMF can effectively exploit ordinal properties, it captures only the higher-order interactions among users and items, without considering the localised interactions properly. This paper employs Markov Random Fields (MRF) to investigate the localised interactions, and proposes a unified model called Ordinal Random Fields (ORF) to take advantage of both the representational power of the MRF and the ease of modelling ordinal preferences by the OMF. Experimental results on public datasets demonstrate that the proposed ORF model can capture both types of interactions, resulting in improved
Moonsamy, V, Rong, J, Liu, S, Li, G & Batten, L 2013, 'Contrasting Permission Patterns between Clean and Malicious Android Applications', Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, International Conference on Security and Privacy for Communication Networks, Springer, Sydney, Australia, pp. 69-85.
The Android platform uses a permission system model to allow users and developers to regulate access to private information and system resources required by applications. Permissions have proved useful for inferring the behaviors and characteristics of an application. In this paper, a novel method to extract contrasting permission patterns for clean and malicious applications is proposed. In contrast to existing work, both required and used permissions were considered when discovering the patterns. We evaluated our methodology on a clean dataset and a malware dataset, each comprising 1227 applications. Our empirical results suggest that our permission patterns can capture key differences between clean and malicious applications, which can assist in characterizing these two types of applications.
Vu, H, Liu, S, Li, Z & Li, G 2011, 'Microphone Identification using One Class-Classification Approach', The 2nd Workshop on Applications and Techniques in Information Security, International Conference on Applications and Techniques in Information Security (ATIS), Melbourne, Australia, pp. 30-37.
Rapid growth of technical developments has created huge challenges for microphone forensics, a sub-category of audio forensic science, because of the availability of numerous digital recording devices and the massive amount of recording data. Demand for fast and efficient methods to assure the integrity and authenticity of information is becoming more and more important in criminal investigation. Machine learning has emerged as an important technique to support the audio analysis processes of microphone forensic practitioners. However, its application to real-life situations using supervised learning still faces great challenges due to the expense of collecting data and updating systems. In this paper, we introduce a new machine learning approach called One-class Classification (OCC) to be applied to microphone forensics; we demonstrate its capability on a corpus of audio samples collected from several microphones. Research results and analysis indicate that OCC has the potential to benefit microphone forensic practitioners in developing new tools and techniques for effective and efficient analysis.