Meng, Q, Catchpoole, D, Skillicorn, D & Kennedy, PJ 2017, 'DBNorm: Normalizing high-density oligonucleotide microarray data based on distributions', BMC Bioinformatics, vol. 18, no. 1.
© 2017 The Author(s). Background: Data from patients with rare diseases is often produced using different platforms and probe sets because patients are widely distributed in space and time. Aggregating such data requires a method of normalization that makes patient records comparable. Results: This paper proposes DBNorm, an algorithm implemented as an R package that normalizes arbitrarily distributed data to a common, comparable form. Specifically, DBNorm merges data distributions by fitting a function to each of them and using the probability of each element under its fitted distribution to merge it into a global distribution. DBNorm provides state-of-the-art fitting functions, including Polynomial, Fourier and Gaussian distributions, and also allows users to define their own fitting functions if required. Conclusions: DBNorm is compared with z-score, average difference, quantile normalization and ComBat on a number of self-generated and publicly available microarray datasets, using statistics, visualization and, where class labels are known, classification. The experimental results show that DBNorm achieves better normalization results than conventional methods. Finally, the approach has the potential to be applicable outside bioinformatics analysis.
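The probability-based mapping described in this abstract can be illustrated with a minimal Python sketch. It is not the DBNorm R package and does not use its API: the function name `normalize_to_target`, the sample values and the choice of a standard normal target are assumptions made for illustration, and only a Gaussian fit is shown. Each batch is fitted with a Gaussian, every value is converted to its cumulative probability under that fit, and the probability is mapped through the inverse CDF of a shared target distribution.

```python
from statistics import NormalDist


def normalize_to_target(values, target):
    """Map values onto `target` via their probability under a fitted Gaussian.

    Hypothetical helper for illustration only; not part of DBNorm.
    """
    # Fit step: a Gaussian estimated from the batch (sample mean and stdev).
    fitted = NormalDist.from_samples(values)
    # Mapping step: value -> cumulative probability -> value in the target.
    return [target.inv_cdf(fitted.cdf(v)) for v in values]


# Two made-up batches measured on very different scales.
batch_a = [2.1, 2.4, 2.2, 2.9, 2.6]
batch_b = [101.0, 130.0, 112.0, 155.0, 121.0]

# Assumed common target: the standard normal distribution.
target = NormalDist(mu=0.0, sigma=1.0)

norm_a = normalize_to_target(batch_a, target)
norm_b = normalize_to_target(batch_b, target)
```

With a Gaussian fit and a Gaussian target, this mapping reduces to a z-score-style rescaling. A more flexible fitting function would change only the fit step, while the probability mapping would stay the same.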
Asabere, NY, Xia, F, Meng, Q, Li, F & Liu, H 2015, 'Scholarly paper recommendation based on social awareness and folksonomy', International Journal of Parallel, Emergent and Distributed Systems, vol. 30, no. 3, pp. 211-232.
© 2014 Taylor & Francis. The significant proliferation of research papers in both conferences and journals has made it difficult for researchers to easily access relevant scholarly papers for academic learning. This has been a substantial problem for many researchers. Conferences, in comparison with journals, have an aspect of social learning and networking, which leads to personal familiarisation through various interactions among researchers. In this paper, we improve the social awareness of conference participants by proposing a novel folksonomy-based paper recommendation algorithm, called socially aware recommendation of scholarly papers (SARSP). SARSP recommends papers issued by active participants (APs) to other Group Profile Participants at the same conference, based on the similarity of their research interests. In addition, SARSP computes the social ties between an AP and other conference participants to effectively generate social recommendations of scholarly papers. We evaluate our proposed algorithm using a real-world data-set. Our experimental results confirm that SARSP offers a significant improvement over existing methods.
Asabere, NY, Xia, F, Meng, Q, Li, F & Liu, H 2014, 'Scholarly paper recommendation based on social awareness and folksonomy', International Journal of Parallel, Emergent and Distributed Systems.
Brownlow, J, Chu, C, Fu, B, Xu, G, Culbert, B & Meng, Q 2018, 'Cost-sensitive churn prediction in fund management services', Database Systems for Advanced Applications (LNCS), International Conference on Database Systems for Advanced Applications, Springer, Gold Coast, QLD, Australia, pp. 776-788.
© Springer International Publishing AG, part of Springer Nature 2018. Churn prediction is vital to companies because it identifies potential churners so that losses can be prevented in advance. Although it has been addressed as a classification task and a variety of models have been employed in practice, fund management services present several special challenges. One is that financial data is extremely imbalanced, since only a tiny proportion of customers leave every year. Another is a unique cost-sensitive learning problem, i.e., costs of wrong predictions for churners should be related to their account balances, while costs of wrong predictions for non-churners should be the same. To address these issues, this paper proposes a new churn prediction model based on ensemble learning. In our model, multiple classifiers are built using sampled datasets to tackle the imbalanced data issue while exploiting data fully. Moreover, a novel sampling strategy is proposed to deal with the unique cost-sensitive issue. This model has been deployed in one of the leading fund management institutions in Australia, and its effectiveness has been fully validated in real applications.
Culbert, B, Fu, B, Brownlow, J, Chu, C, Meng, Q & Xu, G 2018, 'Customer Churn Prediction in Superannuation: A Sequential Pattern Mining Approach', Databases Theory and Applications (LNCS), Australasian Database Conference, Springer, Gold Coast, QLD, Australia, pp. 123-134.
The role of churn modelling is to maximize the value of marketing dollars spent and minimize the attrition of valuable customers. Though churn prediction is a common classification task, traditional approaches cannot be employed directly due to the unique issues inherent within the wealth management industry. In this paper we address the issue of unseen churn in superannuation, whereby customer accounts become dormant following the discontinuation of compulsory employer contributions, and suggest solutions to the problem of scarce customer engagement data. To address these issues, this paper proposes a new approach for churn prediction and its application in the superannuation industry. We use the extreme gradient boosting algorithm coupled with contrast sequential pattern mining to extract behaviors preceding a churn event. The results demonstrate a significant lift in the performance of prediction models when pattern features are used in combination with demographic and account features.
Chu, C, Brownlow, J, Meng, Q, Fu, B, Culbert, B, Zhu, M, Xu, G & He, X 2017, 'Combining heterogeneous features for time series prediction', 2017 International Conference on Behavioral, Economic, Socio-cultural Computing (BESC), International Conference on Behavioral, Economic, Socio-cultural Computing, IEEE, Krakow, Poland.
Time series prediction is a challenging task in practice, and various methods have been proposed for it. However, most existing methods exploit only the historical series of values. The resulting predictive models may therefore be ineffective in some cases because (1) the historical series of values is usually insufficient, and (2) features from heterogeneous sources, such as the intrinsic features of the data samples themselves, which could be very useful, are not taken into consideration. To address these issues, we propose a novel method that learns the predictive model from a combination of dynamic features extracted from the series of historical values and static features of the data samples. To evaluate the performance of our proposed method, we compare it with linear regression and boosted trees, and the experimental results validate our method's superiority.
Meng, Q, Catchpoole, D, Skillicorn, D & Kennedy, PJ 2019, 'Relational autoencoder for feature extraction', Proceedings of the International Joint Conference on Neural Networks, International Joint Conference on Neural Networks, IEEE, Anchorage, AK, USA, pp. 364-371.
© 2017 IEEE. Feature extraction becomes increasingly important as data grows high dimensional. The autoencoder, a neural network based feature extraction method, has achieved great success in generating abstract features of high dimensional data. However, it fails to consider the relationships between data samples, which may affect experimental results when using original and new features. In this paper, we propose a Relational Autoencoder model that considers both data features and their relationships. We also extend it to work with other major autoencoder models, including Sparse Autoencoder, Denoising Autoencoder and Variational Autoencoder. The proposed relational autoencoder models are evaluated on a set of benchmark datasets, and the experimental results show that considering data relationships can generate more robust features, which achieve lower reconstruction loss and, in turn, a lower error rate in subsequent classification compared to the other variants of autoencoders.
Meng, Q, Wu, J, Ellis, J & Kennedy, PJ 2017, 'Dynamic island model based on spectral clustering in genetic algorithm', Proceedings of the International Joint Conference on Neural Networks, International Joint Conference on Neural Networks, IEEE, Anchorage, AK, USA, pp. 1724-1731.
© 2017 IEEE. Maintaining relatively high diversity is important for avoiding premature convergence in population-based optimization methods. The island model is widely considered a major approach to achieving this because of its flexibility and high efficiency. The model maintains a group of sub-populations on different islands and allows sub-populations to interact with each other via predefined migration policies. However, the current island model has some drawbacks. One is that after a certain number of generations, different islands may retain quite similar, converged sub-populations, thereby losing diversity and decreasing efficiency. Another drawback is that determining the number of islands to maintain is very challenging. Meanwhile, initializing many sub-populations increases the randomness of the island model. To address these issues, we propose a dynamic island model (DIM-SP) which forces each island to maintain different sub-populations, controls the number of islands dynamically and starts with one sub-population. The proposed island model outperforms three other state-of-the-art island models on three baseline optimization problems: job shop scheduling, travelling salesman and quadratic multiple knapsack.
Wang, S, Liu, W, Wu, J, Cao, L, Meng, Q & Kennedy, PJ 2016, 'Training deep neural networks on imbalanced data sets', Proceedings of the International Joint Conference on Neural Networks, IEEE International Joint Conference on Neural Networks, IEEE, Vancouver, Canada, pp. 4368-4374.
© 2016 IEEE. Deep learning has become increasingly popular in both academic and industrial areas in the past years. Various domains including pattern recognition, computer vision, and natural language processing have witnessed the great power of deep networks. However, current studies on deep learning mainly focus on data sets with balanced class labels, while its performance on imbalanced data is not well examined. Imbalanced data sets exist widely in the real world and pose great challenges for classification tasks. In this paper, we focus on the problem of classification using deep networks on imbalanced data sets. Specifically, a novel loss function called mean false error, together with its improved version, mean squared false error, is proposed for the training of deep networks on imbalanced data sets. The proposed method can effectively capture classification errors from both the majority class and the minority class equally. Experiments and comparisons demonstrate the superiority of the proposed approach over conventional methods in classifying imbalanced data sets with deep neural networks.
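The idea of weighting both classes equally can be illustrated with a small Python sketch. It assumes binary labels (0 for the majority class, 1 for the minority class), a per-sample error of half the squared difference, and a loss built from the per-class mean errors: their sum for mean false error and the sum of their squares for mean squared false error. The function names are hypothetical, and the sketch only evaluates the loss on fixed predictions; it is not training code for a deep network.

```python
def class_mean_errors(y_true, y_pred):
    """Return (mean error on negative samples, mean error on positive samples).

    Assumes labels are 0 (negative/majority) or 1 (positive/minority) and
    uses half the squared difference as the per-sample error.
    """
    neg = [0.5 * (t - p) ** 2 for t, p in zip(y_true, y_pred) if t == 0]
    pos = [0.5 * (t - p) ** 2 for t, p in zip(y_true, y_pred) if t == 1]
    if not neg or not pos:
        raise ValueError("both classes must be present")
    return sum(neg) / len(neg), sum(pos) / len(pos)


def mean_false_error(y_true, y_pred):
    """Sum of the two per-class mean errors."""
    fpe, fne = class_mean_errors(y_true, y_pred)
    return fpe + fne


def mean_squared_false_error(y_true, y_pred):
    """Sum of the squared per-class mean errors."""
    fpe, fne = class_mean_errors(y_true, y_pred)
    return fpe ** 2 + fne ** 2


# Made-up example: 8 majority samples predicted well, 2 minority samples badly.
y_true = [0] * 8 + [1] * 2
y_pred = [0.1] * 8 + [0.4, 0.2]

# Ordinary mean error over all samples, dominated by the majority class.
plain = sum(0.5 * (t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
mfe = mean_false_error(y_true, y_pred)
msfe = mean_squared_false_error(y_true, y_pred)
```

In this made-up example the ordinary mean error is about 0.054, while the mean false error is about 0.255, because the two poorly predicted minority samples are averaged on their own rather than diluted by the eight majority samples.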
Willey, K, Meng, Q & Gardner, AP 2015, 'Insights from using a subject specific Facebook group for student engagement and learning', Research in Engineering Education Symposium 2015, Dublin, Ireland.
Although discussion boards have been available in the Learning Management System (LMS) for several years, they have not served well as a means of extending student engagement outside class time. The social media site Facebook was incorporated into an Engineering Mechanics class with the aim of increasing subject specific student engagement. This paper reports a small preliminary study exploring the effect of the introduction of the Facebook group. These students found the Facebook group increased the frequency of their engagement with the subject material compared to other subjects, and they considered it valuable because almost all students and the instructor were involved. However, students emphasised that the Facebook group was a supplement to, and not a substitute for, the face-to-face lecture and tutorial sessions. This study confirmed the value of undertaking focus groups with students to assist interpretation of data collected by more objective methods such as social network analysis.
Meng, Q, Tafavogh, S & Kennedy, PJ 2014, 'Community detection on heterogeneous networks by multiple semantic-path clustering', 2014 6th International Conference on Computational Aspects of Social Networks, CASoN 2014, International Conference on Computational Aspects of Social Networks (CASoN), IEEE, Porto, Portugal, pp. 7-12.
© 2014 IEEE. Heterogeneous networks have become a commonly used model to represent complex and abstract social phenomena. They allow objects to have many different relationships and represent relationships by semantic paths which connect object types via a sequence of relations. A major challenge in community detection on heterogeneous networks is how to organize and combine different semantic paths. To obtain the desired clustering, we propose a novel community detection method for heterogeneous networks based on matrix decomposition and semantic paths. The major advantage of this method is that it treats objects individually and assigns them different combinations of semantic-path weights so as to improve the clustering quality. Comparative experiments against two other state-of-the-art methods, spectral clustering and path-selection clustering, confirm that the proposed method achieves better clustering results.
Tafavogh, S, Meng, Q, Catchpoole, DR & Kennedy, PJ 2014, 'Automated quantitative and qualitative analysis of whole neuroblastoma tumour images for prognosis', Proceedings of the IASTED 11th International Conference on Biomedical Engineering, IASTED International Conference on Biomedical Engineering, ACTA Press, Zurich, Switzerland, pp. 244-251.
Meng, Q & Kennedy, PJ 2013, 'Discovering Influential Authors in Heterogeneous Academic Networks by a Co-ranking Method', Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, ACM International Conference on Information and Knowledge Management, Association for Computing Machinery, San Francisco, California, USA, pp. 1029-1036.
Research in ranking networked entities is widely applicable to many problems such as optimizing search engines, building recommendation systems and discovering influential nodes in social networks. However, many famous ranking approaches like PageRank are limited to solving this problem in homogeneous networks and are not applicable to heterogeneous networks. Faced with this problem, we propose a co-ranking method to evaluate scientific publications and authors. This novel approach is a flexible framework based on a set of customized rules taking into account both topological features of networks and the included citations. The approach ranks authors and publications iteratively and uses the results of each round to reinforce the ranks of authors and publications. Unlike traditional approaches to assessing publication, which require a great number of citations, our method lowers this requirement. This co-ranking approach has been validated using data collected from DBLP and CiteSeer, and the results suggest that it is effective and efficient in ranking authors and publications based on limited numbers of citations in heterogeneous networks and that it has fast convergence.
Meng, Q & Kennedy, PJ 2012, 'Determining the number of clusters in co-authorship networks using social network theory', The 2nd International Conference on Social Computing and Its Applications, International Conference on Social Computing and Its Applications, IEEE, Xiangtan, Hunan, China, pp. 337-343.
Spectral clustering is a modern data clustering methodology with many notable advantages. However, this method has a weakness in that it requires researchers to specify a priori the number of clusters. In most cases, it is a challenge to know the number of clusters accurately. Here, we propose a novel way to solve this problem by involving the concept of group leaders and members from social network theory. From the perspective of social networks, groups are organized by leaders, so identifying group leaders provides a hint for finding the number of clusters in social networks. However, because a group can have more than one leader, we also propose an algorithm to combine leaders from the same group. The number of leaders after the combination is expected to be the number of clusters in a network. We validate this proposed approach by using spectral clustering to cluster data comprising the co-authorship network from the University of Technology, Sydney (UTS). The experimental results show that our proposed method is effective in determining the number of clusters and can help spectral clustering achieve better clusters compared with other methods of calculating the number of clusters.
Meng, Q & Kennedy, PJ 2012, 'Using field of research codes to discover research groups from co-authorship networks', IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, IEEE, Istanbul, Turkey, pp. 289-293.
Nowadays, academic collaboration has become more prevalent and crucial than ever before, and many studies of academic collaboration analysis are based on co-authorship networks. This paper aims to build a novel co-authorship network by importing field of research codes based on Newman's model, and then analyze and extract research groups via spectral clustering. To demonstrate the effectiveness of this revised network, we take the academic collaboration at the University of Technology, Sydney (UTS) as an example. The result of this study advances methods for selecting the most prolific research groups and individuals in research institutions, and provides scientific evidence for policymakers to manage laboratories and research groups more efficiently in the future.
Meng, Q & Kennedy, PJ 2012, 'Using network evolution theory and singular value decomposition method to improve accuracy of link prediction in social networks', Proceedings of the Tenth Australasian Data Mining Conference (AusDM-12), Australian Data Mining Conference, Australian Computer Society, Sydney, pp. 175-181.
Link prediction in large networks, especially social networks, has received significant recent attention. Although there are many papers contributing methods for link prediction, the accuracy of most predictors is generally low as they treat all nodes equally. We propose an effective approach to identifying the level of activities of nodes in networks by observing their behaviour during network evolution. It is clear that nodes that have been active previously contribute more to the changes in a network than stable nodes, which have low activity. We apply truncated singular value decomposition (SVD) to exclude the interference of stable nodes by treating them as noise in our dataset. Finally, in order to test the effectiveness of our proposed method, we use co-authorship networks from an Australian university from between 2006 and 2011 as an experimental dataset. The results show that our proposed method achieves higher accuracy in link prediction than previous methods, especially in predicting new links.