
Dr Guandong Xu

Biography

Dr Guandong Xu is a senior lecturer in the Advanced Analytics Institute at the University of Technology Sydney. He received his BSc and MSc degrees in Computer Science and Engineering from Zhejiang University, China, and his PhD in Computer Science from Victoria University. He subsequently held several positions, including Postdoctoral Research Fellow and Vice-Chancellor Postdoctoral Fellow in the Centre for Applied Informatics at Victoria University, Australia, and Research Assistant Professor in the Department of Computer Science at Aalborg University, Denmark. He was an Endeavour Postdoctoral Research Fellow at the University of Tokyo in 2008.

Professional

Guandong has more than 80 publications in the areas of Web Data Mining, Recommender Systems, Social Web and Social Network Analysis, and Applied Informatics. He has authored three monographs and one edited book, and edited five conference proceedings with Springer, Taylor & Francis, and IGI Global, along with dozens of journal and conference papers.

He has served on editorial boards or as a guest editor for several international journals, such as the Computer Journal, Journal of Systems and Software, World Wide Web Journal and International Journal of Social Network Mining, and he is the Assistant Editor-in-Chief of World Wide Web Journal. He is also active in organizing and serving international conferences and workshops, e.g., ASONAM 2014 and BESC 2014.

Senior Lecturer, A/DRsch Advanced Analytics Institute
Core Member, AAI - Advanced Analytics Institute
B.Sc(ZJU), M.Sc(ZJU), PhD
Member, Association for Computing Machinery
Member, Institute of Electrical and Electronics Engineers
 
Phone
+61 2 9514 3788

Research Interests

  • Data mining, Machine learning
  • Web usage mining, Web community, Web personalization and Recommender System
  • Information retrieval and processing, Web search
  • Social network analysis, Social media mining, Social Analytics

Can supervise: Yes
Registered at Level 1

Database, Data Analytics, Text Analytics, Recommender Systems

Books

Xu, G., Zong, Y. & Yang, Z. 2013, Applied Data Mining, 1st, CRC Press, USA.
Luo, T., Chen, S., Xu, G. & Zhou, J. 2013, Trust-based Collective View Prediction, 1st Edition, Springer Berlin / Heidelberg, Germany.
Collective view prediction is to judge the opinions of an active web user based on unknown elements by referring to the collective mind of the whole community. Content-based recommendation and collaborative filtering are two mainstream collective view prediction techniques. They generate predictions by analyzing the text features of the target object or the similarity of users' past behaviors. Still, these techniques are vulnerable to artificially injected noise data, because they are not able to judge the reliability and credibility of the information sources. Trust-based Collective View Prediction describes new approaches for tackling this problem by utilizing users' trust relationships from the perspectives of fundamental theory, trust-based collective view prediction algorithms and real case studies. The book consists of two main parts: a theoretical foundation and an algorithmic study. The first part reviews several basic concepts and methods related to collective view prediction, such as state-of-the-art recommender systems, sentiment analysis, collective view, trust management, the relationship between collective view and trustworthiness, and trust in collective view prediction. In the second part, the authors present their models and algorithms based on a quantitative analysis of data from more than 300 thousand users of popular product-reviewing websites. They also introduce two new trust-based prediction algorithms: one collaborative algorithm based on the second-order Markov random walk model, and one Bayesian fitting model for combining multiple predictors.
Xu, G. & Li, L. 2013, Social media mining and social network analysis: Emerging research, IGI Global, USA.
Social Media Mining and Social Network Analysis: Emerging Research highlights the advancements made in social network analysis and social web mining and their influence in the fields of computer science, information systems, sociology, organization science, and more. This collection of perspectives on developmental practice is useful for industrial practitioners as well as researchers and scholars. © 2013 by IGI Global. All rights reserved.
Xu, G., Zhang, Y. & Li, L. 2011, Web Mining and Social Networking - Techniques and Applications, 1st, Springer Berlin / Heidelberg, Germany.
This book examines the techniques and applications involved in the Web Mining, Web Personalization and Recommendation and Web Community Analysis domains, including a detailed presentation of the principles, developed algorithms, and systems of the research in these areas. The applications of web mining, and the issue of how to incorporate web mining into web personalization and recommendation systems are also reviewed. Additionally, the volume explores web community mining and analysis to find the structural, organizational and temporal developments of web communities and reveal the societal sense of individuals or communities. The volume will benefit both academic and industry communities interested in the techniques and applications of web search, web data management, web mining and web knowledge discovery, as well as web community and social network analysis.

Chapters

Xu, G., Wu, Z., Cao, J. & Tao, H. 2014, 'Models for Community Dynamics' in Alhajj, R. & Rokne, J. (eds), Encyclopedia of Social Network Analysis and Mining, Springer Reference, pp. 969-982.
In the realm of network science, a complex network is a graph with non-trivial topological features. A social network, one of the important real-world complex networks, is usually modeled as a graph, where the nodes are called actors and the edges connecting nodes are used to represent various ties. A dynamic network is usually defined as a sequence of snapshot graphs indexed by time. Community dynamics aims to process such a dynamic network to produce a sequence of communities, that is, one community for each timestamp. Unlike traditional community detection methods on a static network, community dynamics assumes that obtaining the communities of the current timestamp relies on the results of the previous timestamps.
Li, L., Xiao, H. & Xu, G. 2013, 'Recommending Related Microblogs' in Xu, G. & Li, L. (eds), Social Media Mining and Social Network Analysis: Emerging Research, IGI Global, Hershey, USA, pp. 202-210.
Computing similarity between short microblogs is an important step in microblog recommendation. In this chapter, the authors utilize three kinds of approaches (a traditional term-based approach, a WordNet-based semantic approach, and a topic-based approach) to compute similarities between microblogs and recommend the top related ones to users. They conduct an experimental study on the effectiveness of the three approaches in terms of precision. The results show that the WordNet-based semantic similarity approach has a relatively higher precision than the traditional term-based approach, and the topic-based approach performs worst with 548 tweets as the dataset. In addition, the authors calculate the Kendall tau distance between the two lists generated by any two of the WordNet, term, and topic approaches. Its average over all 548 list pairs shows that the WordNet-based and term-based approaches generally have high agreement in the ranking of related tweets, while the topic-based approach has relatively high disagreement with the WordNet-based approach in the ranking of related tweets.
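The Kendall tau distance mentioned in this abstract counts the item pairs that two rankings order differently. The following is a minimal Python sketch of that standard measure, not the chapter's own code; the function names and the tweet identifiers in the usage note are hypothetical, and both rankings are assumed to contain the same items with no ties.

```python
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Count the item pairs that the two rankings place in opposite order."""
    if set(ranking_a) != set(ranking_b):
        raise ValueError("both rankings must contain the same items")
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    discordant = 0
    for x, y in combinations(ranking_a, 2):
        # A pair is discordant when its relative order differs between lists.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            discordant += 1
    return discordant


def normalized_kendall_tau(ranking_a, ranking_b):
    """Scale the distance to [0, 1]: 0 is identical order, 1 is reversed."""
    n = len(ranking_a)
    pairs = n * (n - 1) // 2
    if pairs == 0:
        return 0.0
    return kendall_tau_distance(ranking_a, ranking_b) / pairs
```

For example, `kendall_tau_distance(["t1", "t2", "t3"], ["t2", "t1", "t3"])` counts a single swapped pair. Averaging the normalized value over many list pairs gives an agreement summary of the kind the abstract describes.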
Xu, G., Gu, Y. & Yi, X. 2013, 'On Group Extraction and Fusion for Tag-Based Social Recommendation' in Xu, G. & Li, L. (eds), Social Media Mining and Social Network Analysis: Emerging Research, IGI Global, Hershey, USA, pp. 211-223.
With the recent information explosion, social websites have become popular in many Web 2.0 applications where social annotation services allow users to annotate various resources with freely chosen words, i.e., tags, which can help users find preferred resources. However, obtaining the proper relationship among user, resource, and tag is still a challenge in social annotation-based recommendation research. In this chapter, the authors aim to utilize the affinity relationship between tags and resources and between tags and users to extract group information. The key idea is to obtain the implicit relationship groups among users, resources, and tags and then fuse them to generate recommendations. The authors experimentally demonstrate that their strategy outperforms the state-of-the-art algorithms that fail to consider the latent relationships among tagging data.
Zong, Y. & Xu, G. 2013, 'Clustering Algorithms for Tags' in Xu, G. & Li, L. (eds), Social Media Mining and Social Network Analysis: Emerging Research, IGI Global, Hershey, USA, pp. 39-53.
With the development and application of social media, more and more user-generated content is created. Tag data, a typical kind of user-generated content, has attracted considerable interest from researchers. In general, tags are textual descriptions freely chosen by users to label digital data sources in social tagging systems. Poor retrieval performance remains a major problem of most social tagging systems, resulting from the ambiguity, redundancy, and weak semantics of tags. Clustering is a useful tool for improving information retrieval in these systems. In this chapter, the authors (1) review the background of state-of-the-art tag clustering and the tag data description, (2) present five kinds of tag similarity measurements proposed by researchers, and (3) propose a new clustering algorithm for tags based on local information derived from a kernel function. This chapter aims to benefit both academic and industry communities who are interested in the techniques and applications of tag clustering.
Xu, G. 2010, 'Building User Communities of Interests by Using Latent Semantic Analysis' in Collaborative Search and Communities of Interest: Trends in Knowledge Sharing and Assessment, IGI Global, pp. 38-68.
Zhang, Y. & Xu, G. 2009, 'Singular Value Decomposition' in Encyclopedia of Database Systems, pp. 2657-2658.

Conferences

Chen, Y., Li, X., Li, L., Liu, G. & Xu, G. 2016, 'Modeling user mobility via user psychological and geographical behaviors towards point-of-interest recommendation', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 364-380.
© Springer International Publishing Switzerland 2016. The pervasive use of Location-based Social Networks calls for precise and personalized Point-of-Interest (POI) recommendation to predict which places users prefer. Modeling user mobility, as an important component of understanding user preference, plays an essential role in POI recommendation. However, existing methods mainly model user mobility by analyzing the check-in data and formulating a distribution, without considering why a user checks in at a specific place from a psychological perspective. In this paper, we propose a POI recommendation algorithm that models user mobility by considering check-in data and geographical information. Specifically, with check-in data, we propose a novel probabilistic latent factor model to formulate user psychological behavior from the perspective of utility theory, which could help reveal the inner information underlying the comparative choice behaviors of users. The geographical behavior of all the historical check-ins, captured by a power law distribution, is then combined with the probabilistic latent factor model to form the POI recommendation algorithm. Extensive evaluation experiments conducted on two real-world datasets confirm the superiority of our approach over state-of-the-art methods.
Hazber, M.A.G., Li, R., Zhang, Y. & Xu, G. 2016, 'An approach for mapping relational database into ontology', Proceedings - 2015 12th Web Information System and Application Conference, WISA 2015, pp. 120-125.
© 2015 IEEE. Sharing and reusing the big data in relational databases in a semantic way has become a big challenge. In this paper, we propose a new approach to enable semantic web applications to access relational databases (RDBs) and their contents by semantic methods. Domain ontologies can be used to formulate RDB schema and data in order to simplify the mapping of the underlying data sources. Our method consists of two main phases: building an ontology from an RDB schema and automatically generating ontology instances from RDB data. In the first phase, we studied different cases of RDB schema to be mapped into an ontology represented in RDF(S)-OWL, while in the second phase, the mapping rules are used to transform RDB data into ontological instances represented in RDF triples. Our approach is demonstrated with examples and validated by an ontology validator.
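The second phase described in this abstract, turning relational rows into RDF triples, can be illustrated with a small Python sketch. It loosely follows the IRI pattern of the W3C Direct Mapping (row IRI `Table/key=value`, predicate IRI `Table#column`) and is not the paper's mapping rules. The function name, the base IRI, and the sample table are hypothetical, and skipping NULL values is an assumption of this sketch.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_to_triples(base_iri, table, primary_key, rows):
    """Map each row (a dict of column -> value) to (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        # One subject IRI per row, built from the table name and primary key.
        subject = f"{base_iri}{table}/{primary_key}={row[primary_key]}"
        # The table acts as the class of the row.
        triples.append((subject, RDF_TYPE, f"{base_iri}{table}"))
        for column, value in row.items():
            if value is None:
                # Assumption of this sketch: NULL columns produce no triple.
                continue
            triples.append((subject, f"{base_iri}{table}#{column}", value))
    return triples


# Hypothetical sample data for illustration only.
people = [{"ID": 7, "fname": "Bob", "addr": None}]
for triple in rows_to_triples("http://example.org/", "People", "ID", people):
    print(triple)
```

A full mapping would also type literals and turn foreign keys into links between row IRIs; both are left out here to keep the example short.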
Xu, G. 2016, 'Improving Music Recommendation Using Distributed Representation', World Wide Web 2016 Conference, ACM, Montreal, Canada, pp. 125-126.
Hazber, M.A.G., Li, R., Gu, X., Xu, G. & Li, Y. 2016, 'Semantic SPARQL Query in a Relational Database Based on Ontology Construction', Proceedings - 2015 11th International Conference on Semantics, Knowledge and Grids, SKG 2015, pp. 25-32.
© 2015 IEEE. Constructing an ontology from RDBs and querying it through ontologies is a fundamental problem for the development of the semantic web. This paper proposes an approach to extract an ontology directly from an RDB in the form of OWL/RDF triples, to ensure its availability on the semantic web. We automatically construct an OWL ontology from the RDB schema using direct mapping rules. The mapping rules provide the basic rules for generating RDF triples from RDB data, even for columns containing null values, and enable semantic query engines to answer more relevant queries. We then rewrite SQL queries as SPARQL queries by translating SQL relational algebra into equivalent SPARQL. The proposed method is demonstrated with examples, and its effectiveness is evaluated by experimental results.
Wang, D., Deng, S., Zhang, X. & Xu, G. 2016, 'Learning music embedding with metadata for context aware recommendation', ICMR 2016 - Proceedings of the 2016 ACM International Conference on Multimedia Retrieval, pp. 249-253.
© 2016 ACM. Contextual factors can benefit music recommendation and retrieval tasks remarkably. However, how to acquire and utilize the contextual information still needs to be studied. In this paper, we propose a context-aware music recommendation approach, which can recommend music appropriate for users' contextual preference for music. In analogy to matrix factorization methods for collaborative filtering, the proposed approach does not require songs to be described by features beforehand; instead, it learns music pieces' embeddings (vectors in a low-dimensional continuous space) from music playing records and the corresponding metadata, and infers users' general and contextual preference for music from their playing records with the learned embeddings. Our approach can then recommend appropriate music pieces. Experimental evaluations on a real-world dataset show that the proposed approach outperforms baseline methods.
Hazber, M.A.G., Li, R., Xu, G. & Alalayah, K.M. 2016, 'An approach for automatically generating R2RML-based direct mapping from relational databases', Communications in Computer and Information Science, pp. 151-169.
© Springer Science+Business Media Singapore 2016. For integrating relational databases (RDBs) into semantic web applications, the W3C RDB2RDF Working Group recommended two approaches, Direct Mapping (DM) and R2RML. The DM provides a set of mapping rules according to the RDB schema, while R2RML allows users to manually define mappings according to an existing target ontology. The major problem with using R2RML is the effort of creating R2RML mapping documents manually. This may introduce many mistakes into the R2RML documents and requires domain experts. In this paper, we propose and implement an approach to generate R2RML mapping documents automatically from an RDB schema. The R2RML mapping reflects the behavior of the DM specification and allows any R2RML parser to generate a set of RDF triples from relational data. The input of the generation approach is a DBsInfo class that is automatically generated from the relational schema. An experimental prototype is developed and shows the effectiveness of our approach.
Li, F., Xu, G. & Cao, L. 2015, 'Coupled Matrix Factorization within Non-IID Context', Proceedings, Part II, 19th Pacific-Asia Conference, PAKDD 2015, Springer, Ho Chi Minh City, Vietnam, pp. 707-719.
Recommender systems research has experienced different stages such as from user preference understanding to content analysis. Typical recommendation algorithms were built on the following bases: (1) assuming users and items are IID, namely independent and identically distributed, and (2) focusing on specific aspects such as user preferences or contents. In reality, complex recommendation tasks involve and request (1) personalized outcomes to tailor heterogeneous subjective preferences; and (2) explicit and implicit objective coupling relationships between users, items, and ratings to be considered as intrinsic forces driving preferences. This inevitably involves the non-IID complexity and the need of combining subjective preference with objective couplings hidden in recommendation applications. In this paper, we propose a novel generic coupled matrix factorization (CMF) model by incorporating non-IID coupling relations between users and items. Such couplings integrate the intra-coupled interactions within an attribute and inter-coupled interactions among different attributes. Experimental results on two open data sets demonstrate that the user/item couplings can be effectively applied in RS and CMF outperforms the benchmark methods.
Chinchore, A., Jiang, F. & Xu, G. 2015, 'Intelligent Sybil attack detection on abnormal connectivity behavior in mobile social networks', Knowledge Management in Organizations - Lecture Notes in Business Information Processing, Knowledge Management in Organizations, Springer, Maribor, Slovenia, pp. 602-617.
© Springer International Publishing Switzerland 2015. There has been a large body of research on mobile networks in the literature, focusing on a variety of secured applications over the network, including the use of their connections, fake identification and attacks on social groups. These applications are created with the intention of collecting confidential information, money laundering, blackmailing and performing other criminal activity. The purpose of this research is to identify the behavior of the honest node (network account) and the fake node (network account) on a mobile social network. In this research, a behavior survey of these nodes is carried out and further analysed with the help of a graph-based Sybil detection system. This paper particularly studies Sybil attacks and their defense system for the IoT (Internet-of-Things) environment. Specifically, each forged Sybil node is to be identified and tracked on the basis of the nodes' connectivity and the timing and frequency of their connections with each other. A Sybil node has a forged identity in different locations and also reports its virtual location information to servers.
Liu, L., Chen, S., Hsu, C.H., Xu, G., Zhang, X., Li, L., Su, G., Liu, M., Huang, Z., Zhu, T., Jin, J., Carlson, D., Chen, W., Wang, B., An, N. & Yang, Y. 2015, 'Message from the PUDA 2014 Workshop Chairs', Proceedings - 2014 IEEE International Conference on Ubiquitous Intelligence and Computing, 2014 IEEE International Conference on Autonomic and Trusted Computing, 2014 IEEE International Conference on Scalable Computing and Communications and Associated Symposia/Workshops, UIC-ATC-ScalCom 2014, p. xxxvii.
Li, L., Sun, Y., Su, C., Xiong, S. & Xu, G. 2015, 'Hashtag Biased Ranking for Keyword Extraction from Microblog Posts', Knowledge Science, Engineering and Management, The 8th International Conference on Knowledge Science, Engineering and Management, Springer, Chongqing, pp. 348-359.
Nowadays, a huge amount of text is being generated for social networking purposes on the Web. Keyword extraction from such text benefits many applications such as advertising, search, and content filtering. Recent studies show that graph-based ranking is more effective than traditional term or document frequency based approaches. However, most work in the literature constructs a word-to-word graph within a document or a collection of documents before applying a kind of random walk. Such a graph does not consider the influence of document importance on keyword extraction. Moreover, social text like a microblog post usually has special social features, such as hashtags, which can help us understand its topic. In this paper, we propose hashtag-biased ranking for keyword extraction from a collection of microblog posts. We first build a word-post weighted graph by taking into account the posts themselves. Then, a hashtag-biased random walk is applied on this graph, which guides our approach to extract keywords according to the hashtag topic. Last, the final ranking of a word is determined by the stationary probability after a number of iterations. We evaluate our proposed method on real Chinese microblog posts. Experiments show that our method is more effective than the traditional word-to-word graph based ranking in terms of precision.
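A random walk whose restart step is biased toward chosen nodes is commonly computed by power iteration, as in personalized PageRank. The Python sketch below illustrates that general idea on a small weighted word graph. It is not the paper's word-post graph or its exact update rule; the function name, the damping value, and the toy graph are assumptions made for illustration.

```python
def biased_random_walk(graph, bias, damping=0.85, iterations=50):
    """Approximate stationary scores of a random walk with biased restarts.

    graph: dict node -> dict of neighbour -> edge weight (every node is a key).
    bias:  dict node -> non-negative restart weight (e.g. hashtag-related words).
    """
    nodes = list(graph)
    total_bias = sum(bias.get(n, 0.0) for n in nodes)
    restart = {n: bias.get(n, 0.0) / total_bias for n in nodes}
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Restart mass goes only to the biased nodes.
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            weight_sum = sum(graph[n].values())
            if weight_sum == 0:
                # A node without outgoing edges sends its mass to the restart set.
                for m in nodes:
                    nxt[m] += damping * score[n] * restart[m]
                continue
            for m, w in graph[n].items():
                nxt[m] += damping * score[n] * w / weight_sum
        score = nxt
    return score


# Hypothetical toy graph: two word clusters joined by one weak edge.
toy_graph = {
    "phone": {"battery": 1.0},
    "battery": {"phone": 1.0, "pizza": 0.2},
    "pizza": {"battery": 0.2, "cheese": 1.0},
    "cheese": {"pizza": 1.0},
}
scores = biased_random_walk(toy_graph, bias={"phone": 1.0})
print(sorted(scores, key=scores.get, reverse=True))
```

Words close to the biased node accumulate more probability mass, which is the effect the abstract attributes to guiding extraction by the hashtag topic.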
Fu, B., Xu, G., Cao, L., Wang, Z. & Wu, Z. 2015, 'Coupling multiple views of relations for recommendation', Advances in Knowledge Discovery and Data Mining - LNCS, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Ho Chi Minh City, Vietnam, pp. 732-743.
© Springer International Publishing Switzerland 2015. Learning user/item relation is a key issue in recommender system, and existing methods mostly measure the user/item relation from one particular aspect, e.g., historical ratings, etc. However, the relations between users/items could be influenced by multifaceted factors, so any single type of measure could get only a partial view of them. Thus it is more advisable to integrate measures from different aspects to estimate the underlying user/item relation. Furthermore, the estimation of underlying user/item relation should be optimal for current task. To this end, we propose a novel model to couple multiple relations measured on different aspects, and determine the optimal user/item relations via learning the optimal way of integrating these relation measures. Specifically, matrix factorization model is extended in this paper by considering the relations between latent factors of different users/items. Experiments are conducted and our method shows good performance and outperforms other baseline methods.
Qi, L., Huang, Y., Li, L. & Xu, G. 2015, 'Learning to rank domain experts in microblogging by combining text and non-text features', Proceedings of Behavioral, Economic and Socio-cultural Computing (BESC), 2015 International Conference on, 2015 International Conference on Behavioral, Economic and Socio-cultural Computing (BESC), IEEE, Nanjing, China, pp. 28-31.
Currently, microblog search engines can find related users according to input topic keywords. Traditional approaches rank users by their authentication information or their self-descriptions (introductions or labels). However, many users may not publish posts closely related to their certification profile. In this paper, we study the problem of identifying domain-dependent influential users (or topic experts). We propose to fuse non-text features and text features to analyze the influence of users. In addition, we compare three kinds of sorting methods: order-based rank aggregation, greedy selection based rank aggregation, and the SVM Rank method. Our experimental results show that the highest precision is achieved by the SVM Rank method.
Medvediev, K., Xu, G., Berkovsky, S. & Onikienko, Y. 2015, 'An analysis of new visitors' website behaviour before & after TV advertising', Proceedings of the 2015 International Conference on Behavioral, Economic and Socio-cultural Computing (BESC), 2015 International Conference on Behavioral, Economic and Socio-cultural Computing (BESC), IEEE, Nanjing, China, pp. 109-115.
This paper explores and analyses the actions of users on an e-commerce website after they have watched TV-advertising. The analysis considers factors such as month, day and time of the website visit. This article utilises visualization tools for the analysis of the frequency ratios (probabilities) of searches, conversions, bookings made by new visitors on the website.
Cuzzocrea, A., Moussa, R., Xu, G. & Grasso, G.M. 2015, 'Cloud-Based OLAP over Big Data: Application Scenarios and Performance Analysis', Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, IEEE, Shen Zhen, pp. 921-927.
Following our previous research results, in this paper we provide two authoritative application scenarios that build on top of OLAP*, a middleware for parallel processing of OLAP queries that truly realizes effective and efficient OLAP over Big Data. We have provided two authoritative case studies, namely parallel OLAP data cube processing and virtual OLAP data cube design, for which we also propose a comprehensive performance evaluation and analysis. The derived analysis clearly confirms the benefits of our proposed framework.
Li, X., Xu, G., Chen, E. & Li, L. 2015, 'MARS: A multi-aspect Recommender system for Point-of-Interest', Proceedings of the 2015 IEEE 31st International Conference on Data Engineering (ICDE), IEEE International Conference on Data Engineering (ICDE), IEEE, Seoul, Korea, pp. 1436-1439.
With the pervasive use of GPS-enabled smart phones, location-based services, e.g., Location Based Social Networking (LBSN), have emerged. Point-of-Interest (POI) recommendation, as a typical component in LBSN, provides additional value to both customers and merchants in terms of user experience and business turnover. Existing POI recommendation systems mainly adopt Collaborative Filtering (CF), which only exploits user-given ratings (i.e., a user's overall evaluation) of a merchant while disregarding the differences in user preference across multiple aspects, which commonly exist in real scenarios. Meanwhile, besides ratings, most LBSNs also provide a review function that allows customers to give their opinions when dealing with merchants, which is often overlooked in these recommender systems. In this demo, we present MARS, a novel POI recommender system based on multi-aspect user preference learning from reviews by using utility theory. We first introduce the organization of our system, and then show how the user preferences across multiple aspects are integrated into our system alongside several case studies of mining user preference and POI recommendations.
Li, X., Xu, G., Chen, E. & Li, L. 2015, 'Learning User Preferences across Multiple Aspects for Merchant Recommendation', Proceedings of the 2015 IEEE International Conference on Data Mining, IEEE International Conference on Data Mining (ICDM), IEEE, Atlantic City, NJ, pp. 865-870.
With the pervasive use of mobile devices, Location Based Social Networks (LBSNs) have emerged in past years. These LBSNs, allowing their users to share personal experiences and opinions on visited merchants, have very rich and useful information which enables a new breed of location-based services, namely, Merchant Recommendation. Existing techniques for merchant recommendation simply treat each merchant as an item and apply conventional recommendation algorithms, e.g., Collaborative Filtering, to recommend merchants to a target user. However, they do not differentiate the user's real preferences on various aspects, and thus can only achieve limited success. In this paper, we aim to address this problem by utilizing and analyzing user reviews to discover user preferences in different aspects. Following the intuition that a user rating represents a personalized rational choice, we propose a novel utility-based approach by combining collaborative and individual views to estimate user preference (i.e., rating). An optimization algorithm based on a Gaussian model is developed to train our merchant recommendation approach. Lastly we evaluate the proposed approach in terms of effectiveness, efficiency and cold-start using two real-world datasets. The experimental results show that our approach outperforms the state-of-the-art methods. Meanwhile, a real mobile application is implemented to demonstrate the practicability of our method.
Chen, X., Liu, L., Luo, D., Xu, G., Lu, Y., Liu, M. & Gao, R. 2013, 'A Spectral Clustering Algorithm Based on Hierarchical Method', Agents and Data Mining Interaction - 9th International Workshop, ADMI 2013, Springer, Saint Paul, MN, USA, pp. 111-123.
Wu, L., Xiong, H., Du, L., Liu, B., Xu, G., Ge, Y., Fu, Y., Zhou, Y. & Li, J. 2014, 'Heterogeneous Metric Learning with Content-based Regularization for Software Artifact Retrieval', Proceedings of the IEEE International Conference on Data Mining, 2014 IEEE International Conference on Data Mining, IEEE, Piscataway, USA, pp. 610-619.
The problem of software artifact retrieval has the goal of effectively locating software artifacts, such as a piece of source code, in a large code repository. This problem has traditionally been addressed through textual queries. In other words, information retrieval techniques are exploited based on the textual similarity between queries and the textual representation of software artifacts, which is generated by collecting words from comments, identifiers, and descriptions of programs. However, in addition to this semantic information, there is rich information embedded in the source code itself. Source code, if analyzed properly, can be a rich source for enhancing software artifact retrieval. To this end, in this paper, we develop a feature extraction method for source code. Specifically, this method can capture both the inherent information in the source code and the semantic information hidden in the comments, descriptions, and identifiers of the source code. Moreover, we design a heterogeneous metric learning approach, which allows code features and text features to be integrated into the same latent semantic space. This, in turn, can help to measure artifact similarity by exploiting the joint power of both code and text features. Finally, extensive experiments on real-world data show that the proposed method can improve the performance of software artifact retrieval by a significant margin.
Li, F., Xu, G. & Cao, L. 2014, 'Coupled Item-Based Matrix Factorization', Proceedings, Part I of the Web Information Systems Engineering - WISE 2014 - 15th International Conference, Web Information Systems Engineering, Springer, Thessaloniki, Greece, pp. 1-14.
View/Download from: UTS OPUS or Publisher's site
The essence of the cold-start and sparsity challenges in Recommender Systems (RS) is that the extant techniques, such as Collaborative Filtering (CF) and Matrix Factorization (MF), mainly rely on the user-item rating matrix, which sometimes is not informative enough for predicting recommendations. To solve these challenges, the objective item attributes are incorporated as complementary information. However, most of the existing methods for inferring the relationships between items assume that the attributes are 'independently and identically distributed' (iid), which does not always hold in reality. In fact, the attributes are more or less coupled with each other by some implicit relationships. Therefore, in this paper we propose an attribute-based coupled similarity measure to capture the implicit relationships between items. We then integrate the implicit item coupling into MF to form the Coupled Item-based Matrix Factorization (CIMF) model. Experimental results on two open data sets demonstrate that CIMF outperforms the benchmark methods.
Hu, L., Cao, J., Xu, G., Cao, L., Gu, Z. & Cao, W. 2014, 'Deep modeling of group preferences for group-based recommendation', Proceedings of the National Conference on Artificial Intelligence, AI Access Foundation, pp. 1861-1867.
View/Download from: UTS OPUS
Nowadays, most recommender systems (RSs) mainly aim to suggest appropriate items for individuals. Due to the social nature of human beings, group activities have become an integral part of our daily life, thus motivating the study on group RS (GRS). However, most existing methods used by GRS make recommendations through aggregating individual ratings or individual predictive results rather than considering the collective features that govern user choices made within a group. As a result, such methods are heavily sensitive to data, hence they often fail to learn group preferences when the data are slightly inconsistent with predefined aggregation assumptions. To this end, we devise a novel GRS approach which accommodates both individual choices and group decisions in a joint model. More specifically, we propose a deep-architecture model built with collective deep belief networks and dual-wing restricted Boltzmann machines. With such a deep model, we can use high-level features, which are induced from lower-level features, to represent group preference so as to relieve the vulnerability of data. Finally, the experiments conducted on a real-world dataset prove the superiority of our deep model over other state-of-the-art methods.
Liu, N., Li, L., Xu, G. & Yang, Z. 2014, 'Identifying Domain-Dependent Influential Microblog Users: A Post-Feature Based Approach', Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27-31, 2014, Québec City, Québec, Canada, pp. 3122-3123.
View/Download from: UTS OPUS
Bu, Z., Wu, Z., Qian, L., Cao, J. & Xu, G. 2014, 'A backbone extraction method with Local Search for complex weighted networks', 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2014, August 17-20, 2014, IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Beijing, China, pp. 85-88.
View/Download from: UTS OPUS or Publisher's site
The backbone is the natural abstraction of a complex network, which can help people to understand it in a more simplified form. Backbone extraction becomes more challenging as many networks grow to large scale and their weight distributions span several orders of magnitude. Traditional filter-based methods tend to include many outliers in the backbone. What is more, they often suffer from computational inefficiency: the exhaustive search of all nodes or edges is often prohibitively expensive. In this work, we propose a Local Search based Backbone Extraction Heuristic (LS-BEH) to find the backbone in a complex weighted network. First, a strict filtering rule is carefully designed to determine which edges are preserved or discarded. Second, we present a local search model to examine part of the edges in an iterative way. Experimental results on two real-life networks demonstrate the advantage of LS-BEH over the classic disparity filter method in terms of either effectiveness or efficiency.
Wang, Z., Luo, T., Xu, G. & Wang, X. 2014, 'The Application of Cartesian-Join of Bloom Filters to Supporting Membership Query of Multidimensional Data', 2014 IEEE International Congress on Big Data, Anchorage, AK, USA, June 27 - July 2, 2014, pp. 288-295.
View/Download from: UTS OPUS or Publisher's site
Cuzzocrea, A. & Xu, G. 2014, 'Towards a Framework for Supporting Web Search of Complex Objects via Multidimensional Paradigms', 2014 14th International Conference on Computational Science and Its Applications, Guimaraes, Portugal, June 30 - July 3, 2014, pp. 217-220.
View/Download from: UTS OPUS or Publisher's site
Li, X., Zhang, L., Luo, P., Chen, E., Xu, G., Zong, Y. & Guan, C. 2014, 'Mining user tasks from print logs', 2014 International Joint Conference on Neural Networks, IJCNN 2014, International Joint Conference on Neural Networks, IEEE, Beijing, China, pp. 1250-1257.
View/Download from: UTS OPUS or Publisher's site
With many applications emerging on the World Wide Web, large amounts of user interaction data are collected and exploited to discover user behavior or interest patterns. In this paper, we attempt to exploit a new kind of interaction data, namely print logs, where each record consists of the URLs a user selected for printing with a popular web printing tool. Users usually print web content based on an intention (subtask or task). Mining common print tasks from print logs can therefore capture users' intentions, which benefits many web applications, such as task-oriented recommendation and behavior targeting. However, this is not easy because of the difficulty of URL topic representation and task formulation. To this end, we propose a general framework, named UPT (Users Print Tasks mining framework), for mining print tasks from print logs. Specifically, we attempt to leverage Delicious (a social bookmarking web service) as an external thesaurus to expand the expression of each URL by selecting tags associated with the domain of each URL. Then, we construct a tag co-occurrence graph where similar tags can be clustered as subtasks. If we view each subtask as an item, then the print log is transformed into a transaction database, on which an efficient pattern mining algorithm is proposed to induce tasks. Finally, we evaluate the effectiveness of the proposed framework through experiments on a real print log.
Cuzzocrea, A. & Xu, G. 2014, 'A Novel Heuristic Scheme for Modeling and Managing Time Bound Constraints in Data-Intensive Grid and Cloud Infrastructures', Proceedings of On the Move to Meaningful Internet Systems: OTM 2014 Workshops - Confederated International Workshops: OTM Academy, OTM Industry Case Studies Program, C&TC, EI2N, INBAST, ISDE, META4eS, MSC and OnToContent 2014, On the Move to Meaningful Internet Systems, Springer Verlag, Amantea, Italy, pp. 172-191.
View/Download from: UTS OPUS or Publisher's site
Inspired by the emerging Cloud Computing challenge, in this paper we provide a comprehensive framework for modeling and managing time bound constraints in data-intensive Grid and Cloud infrastructures, along with its experimental assessment and analysis. We provide both conceptual and theoretical contributions of the proposed framework, along with a heuristic scheme, called RGDTExec, that solves all possible instances of the problem underlying the proposed framework by exploiting a suitable greedy algorithm, called RGDTExecRun. As we demonstrate throughout the paper, the framework keeps several aspects of research innovations that are beneficial in a wide range of application scenarios.
Xu, G. 2014, '2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2014, Beijing, China, August 17-20, 2014', IEEE.
View/Download from: UTS OPUS
Gu, Q., Zhang, Y., Cao, J., Xu, G. & Cuzzocrea, A. 2014, 'A confidence-based entity resolution approach with incomplete information', DSAA 2014 - Proceedings of the 2014 IEEE International Conference on Data Science and Advanced Analytics, Institute of Electrical and Electronics Engineers Inc., pp. 97-103.
View/Download from: UTS OPUS or Publisher's site
Entity resolution identifies records from different data sources that refer to the same real-world entity, and it is an important prerequisite for integrating data from multiple sources. Entity resolution mainly relies on similarity measures on data records. Unfortunately, the quality of data sources is often poor in practice. Web data sources in particular often provide only incomplete information, which makes it difficult to directly apply similarity measures to identify the same entities. In order to address this problem, the concept of confidence is introduced to measure the trustworthiness of the similarity calculation. An adaptive rule-based approach is used to calculate the similarity between records, and its confidence is also derived. Then the similarity and confidence are propagated on the entity relational graph until a fixed point is reached. Finally, any pair of records can be determined as matched or unmatched based on a threshold. We performed a series of experiments on real data sets, and the experimental results show that our approach performs better than others.
Hu, L., Cao, W., Cao, J., Xu, G., Cao, L. & Gu, Z. 2014, 'Bayesian Heteroskedastic Choice Modeling on Non-identically Distributed Linkages', Proceedings of the 2014 IEEE International Conference on Data Mining, 2014 IEEE International Conference on Data Mining, IEEE, Shenzhen, China, pp. 851-856.
View/Download from: UTS OPUS or Publisher's site
Choice modeling (CM) aims to describe and predict choices according to attributes of subjects and options. If we regard each choice as the formation of a link between subjects and options, CM can immediately be bridged to the link analysis and prediction (LAP) problem. However, such a mapping is often neither trivial nor straightforward. In LAP problems, the only available observations are links among objects, while their attributes are often inaccessible. Therefore, we extend CM into a latent feature space to avoid the need for explicit attributes. Moreover, LAP is usually based on a binary linkage assumption that models observed links as positive instances and unobserved links as negative instances. Instead, we use a weaker assumption that treats unobserved links as pseudo negative instances. Furthermore, most subjects or options may be quite heterogeneous due to the long-tail distribution, which conventional LAP approaches fail to capture. To address the above challenges, we propose a Bayesian heteroskedastic choice model to represent the non-identically distributed linkages in LAP problems. Finally, the empirical evaluation on real-world datasets proves the superiority of our approach.
You, Y., Xu, G., Cao, J., Zhang, Y. & Huang, G. 2013, 'Leveraging visual features and hierarchical dependencies for conference information extraction', Lecture Notes in Computer Science, 15th Asia-Pacific Web Conference, APWeb 2013, Springer, Sydney, pp. 404-416.
View/Download from: UTS OPUS or Publisher's site
Traditional information extraction methods mainly rely on visual feature assisted techniques; but without considering the hierarchical dependencies within the paragraph structure, some important information is missing. This paper proposes an integrated a
Li, F., Xu, G., Cao, L., Fan, X. & Niu, Z. 2013, 'CGMF: Coupled Group-Based Matrix Factorization for Recommender System', Lecture Notes in Computer Science, 14th International Conference of Web Information Systems Engineering – WISE 2013, Springer, Nanjing, China, pp. 289-298.
View/Download from: UTS OPUS or Publisher's site
With the advent of social influence, social recommender systems have become an active research topic for making recommendations based on the ratings of the users that have close social relations with the given user. The underlying assumption is that a user's taste is similar to that of his/her friends in social networks. In fact, users enjoy different groups of items with different preferences. A user may be treated as trustworthy by his/her friends on some specific groups rather than on all groups. Unfortunately, most of the extant social recommender systems are not able to differentiate users' social influence in different groups, resulting in unsatisfactory recommendation results. Moreover, most extant systems mainly rely on social relations, but overlook the influence of relations between items. In this paper, we propose an innovative coupled group-based matrix factorization model for recommender systems by leveraging the user and item groups learned by topic modeling and incorporating couplings between users and items and within users and items. Experiments conducted on publicly available data sets demonstrate the effectiveness of our approach.
Hu, L., Cao, J., Xu, G., Wang, J., Gu, Z. & Cao, L. 2013, 'Cross-Domain Collaborative Filtering via Bilinear Multilevel Analysis', Proceedings of the 23rd International Joint Conference on Artificial Intelligence, International Joint Conference on Artificial Intelligence, IJCAI/AAAI, Beijing, China, pp. 2626-2632.
View/Download from: UTS OPUS
Cross-domain collaborative filtering (CDCF), which aims to leverage data from multiple domains to relieve the data sparsity issue, has become an emerging research topic in recent years. However, current CDCF methods mainly consider user and item factors but largely neglect the heterogeneity of domains, which may lead to improper knowledge transfer. To address this problem, we propose a novel CDCF model, the Bilinear Multilevel Analysis (BLMA), which seamlessly introduces multilevel analysis theory to the most successful collaborative filtering method, matrix factorization (MF). Specifically, we employ BLMA to more efficiently address the determinants of ratings from a hierarchical view by jointly considering domain, community, and user effects so as to overcome the issues caused by traditional MF approaches. Moreover, a parallel Gibbs sampler is provided to learn these effects. Finally, experiments conducted on a real-world dataset demonstrate the superiority of the BLMA over other state-of-the-art methods.
Fu, B., Xu, G., Wang, Z. & Cao, L. 2013, 'Leveraging Supervised Label Dependency Propagation for Multi-label Learning', 2013 IEEE 13th International Conference on Data Mining, International Conference on Data Mining, IEEE, Dallas, TX, USA, pp. 1061-1066.
View/Download from: UTS OPUS or Publisher's site
Exploiting label dependency is a key challenge in multi-label learning, and current methods solve this problem mainly by training models on the combination of related labels and original features. However, label dependency cannot be exploited dynamically and mutually in this way. Therefore, we propose a novel paradigm of leveraging label dependency in an iterative way. Specifically, each label's prediction will be updated and also propagated to other labels via a random walk with restart process. Meanwhile, the label propagation is implemented as a supervised learning procedure via optimizing a loss function, so that more appropriate label dependency can be learned. Extensive experiments are conducted, and the results demonstrate that our method can achieve considerable improvements in terms of several evaluation metrics.
Wang, Z., Luo, T., Xu, G. & Wang, X. 2013, 'A New Indexing Technique for Supporting By-attribute Membership Query of Multidimensional Data', Web-Age Information Management - WAIM 2013 International Workshops: HardBD, MDSP, BigEM, TMSN, LQPM, BDMS, Beidaihe, China, June 14-16, 2013. Proceedings, pp. 266-277.
View/Download from: UTS OPUS or Publisher's site
Hu, L., Cao, J., Xu, G., Cao, L., Gu, Z. & Zhu, C. 2013, 'Personalized recommendation via cross-domain triadic factorization', Proceedings of the 22nd international conference on World Wide Web WWW'13, International World Wide Web Conference, ACM, Rio de Janeiro, Brazil, pp. 595-606.
View/Download from: UTS OPUS or Publisher's site
Collaborative filtering (CF) is a major technique in recommender systems to help users find their potentially desired items. Since the data sparsity problem is commonly encountered in real-world scenarios, Cross-Domain Collaborative Filtering (CDCF) has become an emerging research topic in recent years. However, due to the lack of sufficiently dense explicit feedback, and even the absence of any feedback in domains where users are not involved, current CDCF approaches may not perform satisfactorily in user preference prediction. In this paper, we propose a generalized Cross Domain Triadic Factorization (CDTF) model over the triadic relation user-item-domain, which can better capture the interactions between domain-specific user factors and item factors. In particular, we devise two CDTF algorithms to leverage users' explicit and implicit feedback respectively, along with a genetic-algorithm-based method for tuning the weight parameters to trade off influence among domains optimally. Finally, we conduct experiments to evaluate our models and compare them with other state-of-the-art models using two real-world datasets. The results show the superiority of our models over the comparative models.
Li, X., Zhang, L., Chen, E., Zong, Y. & Xu, G. 2013, 'Mining Frequent Patterns in Print Logs with Semantically Alternative Labels', Lecture Notes in Computer Science, 9th International Conference, ADMA 2013, Springer Berlin / Heidelberg, Hangzhou, pp. 107-119.
View/Download from: UTS OPUS or Publisher's site
It is common today for users to print information from webpages, owing to the popularity of printers and the internet. Thus, many web printing tools such as Smart Print and PrintUI have been developed for online printing. In order to improve users' printing experience, the interaction data between users and these tools are collected to form so-called print log data, where each record is the set of URLs selected for printing by a user within a certain period of time. Mining frequent patterns from these print log data can capture user intentions for other applications, such as printing recommendation and behavior targeting. However, mining frequent patterns by directly using the URL as the item representation in print log data faces two challenges: data sparsity and pattern interpretability. To tackle these challenges, we attempt to leverage the Delicious API (a social bookmarking web service) as an external thesaurus to expand the semantics of each URL by selecting tags associated with the domain of each URL. In this setting, frequent pattern mining is applied to the tag representation of each URL rather than the URL or domain representation. With the enhancement of the semantically alternative tag representation, the semantics of each URL is substantially improved, thus yielding useful frequent patterns. To this end, in this paper we propose a novel pattern mining problem, namely mining frequent patterns with semantically alternative labels, and propose an efficient algorithm named PaSAL (Frequent Patterns with Semantically Alternative Labels Mining Algorithm) for this problem. Specifically, we propose a new constraint named conflict matrix to prune redundant patterns and achieve high efficiency. Finally, we evaluate the proposed algorithm on real print log data.
Wu, L., Chin, A., Xu, G., Du, L., Wang, X., Meng, K., Guo, Y. & Zhou, Y. 2013, 'Who Will Follow Your Shop? Exploiting Multiple Information Sources in Finding Followers', Lecture Notes in Computer Science, DASFAA 2013, Springer Berlin / Heidelberg, Wuhan, pp. 401-415.
View/Download from: UTS OPUS or Publisher's site
WuXianGouXiang is an O2O (offline to online and vice versa) based mobile application that recommends nearby coupons and deals to users, and through which users can also follow the shops they are interested in. If the potential followers of a shop can be discovered, the merchants' targeted advertising can be more effective and the recommendations for users will also be improved. In this paper, we propose to predict the link relations between users and shops based on following behavior. In order to better model the characteristics of the shops, we first adopt Topic Modeling to analyze the semantics of their descriptions and then propose a novel approach, named INtent Induced Topic Search (INITS), to update the hidden topics of the shops with and without a description. In addition, we leverage the user logs and search engine results to get the similarity between users and shops. Then we adopt the latent factor model to calculate the similarity between users and shops, in which we use the multiple information sources to regularize the factorization. The experimental results demonstrate that the proposed approach is effective for detecting followers of the shops and the INITS model is useful for shop topic inference.
Cuzzocrea, A., Moussa, R. & Xu, G. 2013, 'OLAP*: Effectively and Efficiently Supporting Parallel OLAP over Big Data', Lecture Notes in Computer Science, MEDI 2013, Springer Berlin / Heidelberg, Amantea, Italy, pp. 38-49.
View/Download from: UTS OPUS or Publisher's site
In this paper, we investigate solutions relying on data partitioning schemes for parallel building of OLAP data cubes, suitable to novel Big Data environments, and we propose the framework OLAP*, along with the associated benchmark TPC-H*d, a suitable transformation of the well-known data warehouse benchmark TPC-H. We demonstrate through performance measurements the efficiency of the proposed framework, developed on top of the ROLAP server Mondrian.
Yi, X., Paulet, R., Bertino, E. & Xu, G. 2014, 'Private data warehouse queries', Proceedings of the 18th ACM Symposium on Access Control Models and Technologies, 18th ACM Symposium on Access Control Models and Technologies, ACM, Amsterdam, pp. 25-36.
View/Download from: UTS OPUS or Publisher's site
Publicly accessible data warehouses are an indispensable resource for data analysis. But they also pose a significant risk to the privacy of the clients, since a data warehouse operator may follow the client's queries and infer what the client is interested in. Private Information Retrieval (PIR) techniques allow the client to retrieve a cell from a data warehouse without revealing to the operator which cell is retrieved. However, PIR cannot be used to hide OLAP operations performed by the client, which may disclose the client's interest. This paper presents a solution for private data warehouse queries on the basis of the Boneh-Goh-Nissim cryptosystem which allows one to evaluate any multi-variate polynomial of total degree 2 on ciphertexts. By our solution, the client can perform OLAP operations on the data warehouse and retrieve one (or more) cell without revealing any information about which cell is selected. Furthermore, our solution supports some types of statistical analysis on data warehouse, such as regression and variance analysis, without revealing the client's interest. Our solution ensures both the server's security and the client's security.
Wu, Z., Yin, W., Cao, J., Xu, G. & Cuzzocrea, A. 2013, 'Community Detection in Multi-relational Social Networks', Lecture Notes in Computer Science, 14th International Conference of Web Information Systems Engineering – WISE 2013, Springer Berlin / Heidelberg, Nanjing, pp. 43-56.
View/Download from: UTS OPUS or Publisher's site
Multi-relational networks are ubiquitous in many fields such as bibliography, Twitter, and healthcare. There have been many studies in the literature aimed at discovering communities from social networks. However, most of them have focused on single-relational networks. A handful of methods detect communities from multi-relational networks by converting them to single-relational networks first. Nevertheless, they commonly assume that different relations are independent of each other, which is clearly unrealistic in real-life cases. In this paper, we attempt to address this challenge by introducing a novel co-ranking framework, named MutuRank. It makes full use of the mutual influence between relations and actors to transform the multi-relational network into a single-relational network. We then present GMM-NK (Gaussian Mixture Model with Neighbor Knowledge), based on the local consistency principle, to enhance the performance of the spectral clustering process in discovering overlapping communities. Experimental results on both synthetic and real-world data demonstrate the effectiveness of the proposed method.
Xu, G. 2013, 'Advances in Knowledge Discovery and Data Mining, 17th Pacific-Asia Conference, PAKDD 2013, Gold Coast, Australia, April 14-17, 2013, Proceedings, Part II', Springer.
View/Download from: UTS OPUS or Publisher's site
Liu, L., Fan, D., Liu, M. & Xu, G. 2012, 'A MapReduce-Based Parallel Clustering Algorithm for Large Protein-Protein Interaction Networks', Lecture Notes in Computer Science, 8th International Conference, ADMA 2012, Springer, Nanjing, China, pp. 138-148.
View/Download from: UTS OPUS or Publisher's site
Clustering proteins or identifying functionally related proteins in Protein-Protein Interaction (PPI) networks is one of the most computation-intensive problems in the proteomic community. Most research has focused on improving the accuracy of the clustering algorithms. However, the high computation cost of these clustering algorithms, such as Girvan and Newman's clustering algorithm, has been an obstacle to their use on large-scale PPI networks. In this paper, we propose an algorithm, called Clustering-MR, to address the problem. Our solution can effectively parallelize Girvan and Newman's clustering algorithm based on edge betweenness using MapReduce. We evaluated the performance of our Clustering-MR algorithm in a cloud environment with different sizes of testing datasets and different numbers of worker nodes. The experimental results show that our Clustering-MR algorithm can achieve high performance for large-scale PPI networks with more than 1000 proteins or 5000 interactions.
Shangguan, Q., Hu, L., Cao, J. & Xu, G. 2012, 'Book Recommendation Based On Joint Multi-Relational Model', Second International Conference on Social Computing and Its Applications, IEEE, Xiangtan, China, pp. 523-530.
View/Download from: UTS OPUS
Pan, R., Xu, G., Dolog, P. & Zong, Y. 2012, 'Group Division for Recommendation in Tag-based Systems', 2012 Second International Conference on Cloud and Green Computing, IEEE, Xiangtan, China, pp. 399-404.
View/Download from: UTS OPUS
Zhou, J., Luo, T. & Xu, G. 2012, 'Academic Recommendation on Graph with Dynamic Transfer Chain', 2012 Second International Conference on Cloud and Green Computing, Cloud and Green Computing (CGC), 2012 Second International Conference on, IEEE, Xiangtan, China, pp. 331-336.
View/Download from: UTS OPUS
Chen, X., Li, L., Xu, G., Yang, Z. & Kitsuregawa, M. 2012, 'Recommending Related Microblogs: A Comparison Between Topic and WordNet based Approaches', Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, AAAI Press, Toronto, pp. 2417-2418.
View/Download from: UTS OPUS
Fu, B., Wang, Z., Pan, R., Xu, G. & Dolog, P. 2012, 'Learning Tree Structure of Label Dependency for Multi-label Learning', Advances in Knowledge Discovery and Data Mining Lecture Notes in Computer Science, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer Berlin / Heidelberg, Kuala Lumpur, Malaysia, pp. 159-170.
View/Download from: UTS OPUS or Publisher's site
There always exists some kind of label dependency in multi-label data. Learning and utilizing those dependencies could improve the learning performance further. Therefore, an approach for multi-label learning is proposed in this paper, which quantifies the dependencies of pairwise labels firstly, and then builds a tree structure of the labels to describe them. Thus the approach could find out potential strong label dependencies and produce more generalized dependent relationships. The experimental results have validated that compared with other state-of-the-art algorithms, the method is not only a competitive alternative, but also has shown better performance after ensemble learning especially.
Xu, G. & Wu, Z. 2012, 'On Smart and Accurate Contextual Advertising', Lecture Notes in Computer Science, Database Systems for Advanced Applications, Springer Berlin / Heidelberg, Busan, South Korea, pp. 104-104.
View/Download from: UTS OPUS or Publisher's site
... Wide Web to attract customers, has become one of the most important marketing channels. As one prevalent type of Web advertising, contextual advertising refers to the placement of the most relevant commercial ads into the content of a Web page, so as to increase the number of ad clicks. However, some problems such as homonymy and polysemy, low intersection of keywords, and context mismatch can lead to the selection of irrelevant ads for a generic page, so that traditional keyword matching techniques generally show poor accuracy. Furthermore, existing contextual advertising techniques only consider how to select ads that are as relevant as possible for a generic page, without considering the positional effect of the ad placement in the page. In this paper, we propose a new contextual advertising framework to tackle these problems, which (1) uses Wikipedia concept and category information to enrich the semantic representation of a page (or a textual ad) and (2) takes the placement position of embedded ads into account. To accomplish these steps, we first map each page (or ad) into three feature vectors: a keyword vector, a concept vector and a category vector. Second, we determine the relevant ads for a given page based on a similarity measure which combines the above three feature vectors. In dealing with position-wise contextual advertising, the relevant ads are selected based on not only global context relevance but also local context relevance, so that the embedded ads yield contextual relevance to both the whole targeted page and the insertion positions where the ads are placed. We experimentally validate our approach by using a real ads set, a real pages set, and a set of more than 260,000 concepts and 12,000 categories from Wikipedia. The experimental results show that our approach performs better than simple keyword matching and can improve the precision of ad selection effectively.
Li, L., Xiao, H. & Xu, G. 2012, 'Finding Related Micro-blogs Based on WordNet', Database Systems for Advanced Applications - 17th International Conference, DASFAA 2012, International Workshops: FlashDB, ITEMS, SNSM, SIM3, DQDI, Busan, South Korea, April 15-19, 2012. Proceedings, pp. 115-122.
View/Download from: UTS OPUS or Publisher's site
Zong, Y., Xu, G., Jin, P., Yi, X., Chen, E. & Wu, Z. 2012, 'A projective clustering algorithm based on significant local dense areas', Proceedings of The 2012 International Joint Conference on Neural Networks (IJCNN), The 2012 International Joint Conference on Neural Networks (IJCNN), IEEE, Brisbane, Australia, pp. 1-8.
View/Download from: UTS OPUS
Hu, L., Cao, J., Xu, G. & Gu, Z. 2012, 'Latent Informative Links Detection', Advances in Knowledge-Based and Intelligent Information and Engineering Systems, 16th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, IOS Press, San Sebastian, Spain, pp. 1233-1242.
View/Download from: UTS OPUS
Fu, B., Wang, Z., Pan, R., Xu, G. & Dolog, P. 2012, 'An Integrated Pruning Criterion for Ensemble Learning Based on Classification Accuracy and Diversity', Proceedings of 7th International Conference on Knowledge Management in Organizations: Service and Cloud Computing, Springer Berlin / Heidelberg, Salamanca, Spain, pp. 47-58.
View/Download from: UTS OPUS
Liu, L., Zhou, Y., Liu, M., Xu, G., Chen, X., Fan, D. & Wang, Q. 2012, 'Preemptive Hadoop Jobs Scheduling under a Deadline', Proceedings of Eighth International Conference on Semantics, Knowledge and Grids, IEEE Computer Society, Beijing, China, pp. 72-79.
View/Download from: UTS OPUS
XU, G. 2012, 'Web Technologies and Applications - 14th Asia-Pacific Web Conference, APWeb 2012, Kunming, China, April 11-13, 2012. Proceedings', Springer.
View/Download from: UTS OPUS or Publisher's site
Xu, G. 2012, '2012 Second International Conference on Cloud and Green Computing, CGC 2012, Xiangtan, Hunan, China, November 1-3, 2012', 2012 Second International Conference on Cloud and Green Computing, CGC 2012, IEEE, Xiangtan, Hunan, China.
Xu, G. 2012, 'Health Information Science - First International Conference, HIS 2012, Beijing, China, April 8-10, 2012. Proceedings', Springer.
View/Download from: UTS OPUS or Publisher's site
Xu, G., Gu, Y., Dolog, P., Zhang, Y. & Kitsuregawa, M. 2011, 'SemRec: A Semantic Enhancement Framework for Tag Based Recommendation.', Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, AAAI Publications, Twenty-Fifth AAAI Conference on Artificial Intelligence, AAAI Press, San Francisco, California, pp. 1267-1272.
View/Download from: UTS OPUS
Li, L., Xu, G., Yang, Z., Zhang, Y. & Kitsuregawa, M. 2011, 'A Feature-Free Flexible Approach to Topical Classification of Web Queries', Proceedings of Seventh International Conference on Semantics Knowledge and Grid, Semantics Knowledge and Grid (SKG), 2011 Seventh International Conference on, IEEE, Beijing, China, pp. 59-66.
View/Download from: UTS OPUS
Wu, Z., Xu, G., Pan, R., Zhang, Y., Hu, Z. & Lu, J. 2011, 'Leveraging Wikipedia concept and category information to enhance contextual advertising', Proceedings of the 20th ACM Conference on Information and Knowledge Management, CIKM '11 Proceedings of the 20th ACM international conference on Information and knowledge management, ACM, Glasgow, United Kingdom, pp. 2105-2108.
View/Download from: UTS OPUS
Zong, Y., Xu, G., Jin, P., Zhang, Y., Chen, E. & Pan, R. 2011, 'APPECT: An Approximate Backbone-Based Clustering Algorithm for Tags', Advanced Data Mining and Applications, Lecture Notes in Computer Science, 7th International Conference, ADMA 2011, Springer Berlin / Heidelberg, Beijing, China, pp. 175-189.
View/Download from: UTS OPUS or Publisher's site
In social annotation systems, users label digital resources by using tags which are freely chosen textual descriptions. Tags are used to index, annotate and retrieve resources as additional metadata of the resources. Poor retrieval performance remains a major problem of most social tagging systems, resulting from the severe difficulty of ambiguity, redundancy and less semantic nature of tags. Clustering is a useful tool to address the aforementioned difficulties. Most research on tag clustering directly applies traditional clustering algorithms such as K-means or Hierarchical Agglomerative Clustering to tagging data, which possess inherent drawbacks such as sensitivity to initialization. In this paper, we instead make use of the approximate backbone of tag clustering results to find better tag clusters. In particular, we propose an APProximate backbonE-based Clustering algorithm for Tags (APPECT). The main steps of APPECT are: (1) we execute the K-means algorithm on a tag similarity matrix M times and collect a set of tag clustering results Z = {C_1, C_2, ..., C_m}; (2) we form the approximate backbone of Z by executing a greedy search; (3) we fix the approximate backbone as the initial tag clustering result and then assign the remaining tags to the corresponding clusters based on similarity. Experimental results on three real world datasets, namely MedWorm, MovieLens and Dmoz, demonstrate the effectiveness and the superiority of the proposed method against the traditional approaches.
Zong, Y., Xu, G., Jin, P., Dolog, P. & Jiang, S. 2011, 'A Local Information Passing Clustering Algorithm for Tagging Systems', Database Systems for Advanced Applications, Lecture Notes in Computer Science, 16th International Conference, DASFAA 2011, International Workshops: GDB, SIM3, FlashDB, SNSMW, DaMEN, DQIS, Springer Berlin / Heidelberg, Hong Kong, China, pp. 333-343.
View/Download from: UTS OPUS or Publisher's site
Under social tagging systems, a typical Web2.0 application, users label digital data sources by using tags which are freely chosen textual descriptions. Tags are used to index, annotate and retrieve resource as an additional metadata of resource. Poor retrieval performance remains a major problem of most social tagging systems resulting from the severe difficulty of ambiguity, redundancy and less semantic nature of tags. Clustering method is a useful tool to increase the ability of information retrieval in the aforementioned systems. In this paper, we propose a novel clustering algorithm named LIPC (Local Information Passing Clustering algorithm). The main steps of LIPC are: (1) we estimate a KNN neighbor directed graph G of tags and calculate the kernel density of each tag in its neighborhood; (2) we generate local information, local coverage and local kernel of each tag; (3) we pass the local information on G by I and O operators until they are converged and tag priory are generated; (4) we use tag priory to find out the clusters of tags. Experimental results on two real world datasets namely MedWorm and MovieLens demonstrate the efficiency and the superiority of the proposed method.
Xu, G., Zong, Y., Pan, R., Dolog, P. & Jin, P. 2014, 'On Kernel Information Propagation for Tag Clustering in Social Annotation Systems', Knowledge-Based and Intelligent Information and Engineering Systems Lecture Notes in Computer Science, International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, Springer Berlin / Heidelberg, Kaiserslautern, Germany, pp. 505-514.
View/Download from: UTS OPUS or Publisher's site
In social annotation systems, users label digital resources by using tags which are freely chosen textual descriptors. Tags are used to index, annotate and retrieve resources as additional metadata of the resources. Poor retrieval performance remains a major challenge of most social annotation systems, resulting from the severe problems of ambiguity, redundancy and less semantic nature of tags. Clustering is a useful approach to handle these problems in social annotation systems. In this paper, we propose a novel clustering algorithm named kernel information propagation for tag clustering. This approach makes use of the kernel density estimation of the KNN neighbor directed graph as a start to reveal the prestige rank of tags in tagging data. The random walk with restart algorithm is then employed to determine the center points of tag clusters. The main strength of the proposed approach is the capability of partitioning tags from the perspective of tag prestige rank rather than the intuitive similarity calculation itself. Experimental studies on three real world datasets demonstrate the effectiveness and superiority of the proposed method.
Xu, G., Gu, Y., Zhang, Y., Yang, Z. & Kitsuregawa, M. 2011, 'TOAST: A Topic-Oriented Tag-Based Recommender System', Web Information System Engineering WISE 2011 Lecture Notes in Computer Science, International Conference on Web Information Systems Engineering, Springer Berlin / Heidelberg, Sydney, Australia, pp. 158-171.
View/Download from: UTS OPUS or Publisher's site
Social Annotation Systems have emerged as a popular application with the advance of Web 2.0 technologies. Tags, generated by users using arbitrary words to express their own opinions and perceptions on various resources, provide a new intermediate dimension between users and resources, which is deemed to convey user preference information. Using clustering for topic extraction and incorporating it with the capture of user preference and resource affiliation is becoming an effective practice in tag-based recommender systems. In this paper, we aim to address these challenges via a topic graph approach. We first propose a Topic Oriented Graph (TOG), which models the user preference and resource affiliation on various topics. Based on the graph, we devise a Topic-Oriented Tag-based Recommendation System (TOAST) by using preference propagation on the graph. We conduct experiments on two real datasets to demonstrate that our approach outperforms other state-of-the-art algorithms.
Hijikata, Y. & Xu, G. 2010, 'SNSMW 2010 Workshop Organizers' Message', Database Systems for Advanced Applications, 15th International Conference on DASFAA 2010, Springer-Verlag Berlin, Tsukuba, Japan, pp. 239-239.
Zhang, Y., Xu, G., Wang, L. & Bennett, K. 2010, 'A framework of data integration, knowledge management and user behaviour modelling in healthcare applications of diabetes', Proceedings of the Twenty-First Australasian Database Conference, ADC '10 Proceedings of the Twenty-First Australasian Conference on Database Technologies, Australian Computer Society, Brisbane, Australia, pp. 3-4.
View/Download from: UTS OPUS
Xu, G., Zong, Y., Dolog, P. & Zhang, Y. 2014, 'Co-clustering Analysis of Weblogs Using Bipartite Spectral Projection Approach', Knowledge-Based and Intelligent Information and Engineering Systems Lecture Notes in Computer Science, International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, Springer Berlin / Heidelberg, Cardiff, UK, pp. 398-407.
View/Download from: UTS OPUS or Publisher's site
Web clustering is an approach for aggregating Web objects into various groups according to underlying relationships among them. Finding co-clusters of Web objects is an interesting topic in the context of Web usage mining, which is able to capture the underlying user navigational interest and content preference simultaneously. In this paper we will present an algorithm using bipartite spectral clustering to cocluster Web users and pages. The usage data of users visiting Web sites is modeled as a bipartite graph and the spectral clustering is then applied to the graph representation of usage data. The proposed approach is evaluated by experiments performed on real datasets, and the impact of using various clustering algorithms is also investigated. Experimental results have demonstrated the employed method can effectively reveal the subset aggregates of Web users and pages which are closely related.
Zong, Y., Xu, G., Dolog, P., Zhang, Y. & Liu, R. 2014, 'Co-clustering for Weblogs in Semantic Space', Web Information Systems Engineering WISE 2010 Lecture Notes in Computer Science, International Conference on Web Information Systems Engineering, Springer Berlin / Heidelberg, Hong Kong, China, pp. 120-127.
View/Download from: UTS OPUS or Publisher's site
Web clustering is an approach for aggregating web objects into various groups according to underlying relationships among them. Finding co-clusters of web objects in semantic space is an interesting topic in the context of web usage mining, which is able to capture the underlying user navigational interest and content preference simultaneously. In this paper we present a novel web co-clustering algorithm named Co-Clustering in Semantic space (COCS) to simultaneously partition web users and pages via a latent semantic analysis approach. In COCS, we first train the latent semantic space of weblog data by using the Probabilistic Latent Semantic Analysis (PLSA) model, then project all weblog data objects into this semantic space with a probability distribution to capture the relationship among web pages and web users, and finally propose a clustering algorithm to generate the co-cluster corresponding to each semantic factor in the latent semantic space via probability inference. The proposed approach is evaluated by experiments performed on real datasets in terms of precision and recall metrics. Experimental results have demonstrated that the proposed method can effectively reveal the co-aggregates of web users and pages which are closely related.
Li, L., Xu, G., Zhang, Y. & Kitsuregawa, M. 2009, 'Enhancing Web Search by Aggregating Results of Related Web Queries', Web Information Systems Engineering - WISE 2009 Lecture Notes in Computer Science, Web Information Systems Engineering - WISE 2009, Springer Berlin / Heidelberg, Poznan, Poland, pp. 203-217.
View/Download from: UTS OPUS or Publisher's site
Currently, commercial search engines have implemented methods to suggest alternative Web queries to users, which helps them specify alternative related queries in pursuit of finding needed Web pages. In this paper, we address the Web search problem on related queries to improve retrieval quality by devising a novel search rank aggregation mechanism. Given an initial query and the suggested related queries, our search system concurrently processes their search result lists from an existing search engine and then forms a single list aggregated from all the retrieved lists. In particular, we propose a generic rank aggregation framework which considers not only the number of wins that an item won in a competition, but also the quality of its competitor items in calculating the ranking of Web items. The framework combines the traditional and random walk based rank aggregation methods to produce a more reasonable list for users. Experimental results show that the proposed approach can clearly improve the retrieval quality in a parallel manner over the traditional search strategy that serially returns result lists. Moreover, we also empirically investigate how different rank aggregation methods affect the retrieval performance.
Thongkam, J., Xu, G. & Zhang, Y. 2008, 'AdaBoost algorithm with random forests for predicting breast cancer survivability', Proceedings of the International Joint Conference on Neural Networks, 2008 IEEE International Joint Conference on Neural Networks, IEEE, Hong Kong, China, pp. 3062-3069.
View/Download from: UTS OPUS
Xu, G., Zhang, Y. & Yi, X. 2008, 'Modelling User Behaviour for Web Recommendation Using LDA Model', Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and International Conference on Intelligent Agent Technology - Workshops, IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT '08., IEEE, Sydney, NSW, Australia, pp. 529-532.
View/Download from: UTS OPUS or Publisher's site
Thongkam, J., Xu, G., Zhang, Y. & Huang, F. 2008, 'Support Vector Machine for Outlier Detection in Breast Cancer Survivability Prediction', Advanced Web and Network Technologies, and Applications Lecture Notes in Computer Science, Asia Pacific Web Conference, Springer Berlin / Heidelberg, Shenyang, China, pp. 99-109.
View/Download from: UTS OPUS or Publisher's site
Finding and removing misclassified instances are important steps in data mining and machine learning that affect the performance of the data mining algorithm in general. In this paper, we propose a C-Support Vector Classification Filter (C-SVCF) to identify and remove the misclassified instances (outliers) in breast cancer survivability samples collected from Srinagarind hospital in Thailand, to improve the accuracy of the prediction models. Only instances that are correctly classified by the filter are passed to the learning algorithm. Performance of the proposed technique is measured with accuracy and area under the receiver operating characteristic curve (AUC), and compared with several popular ensemble filter approaches including AdaBoost, Bagging and ensemble of SVM with AdaBoost and Bagging filters. Our empirical results indicate that C-SVCF is an effective method for identifying misclassified outliers. This approach significantly benefits ongoing research of developing accurate and robust prediction models for breast cancer survivability.
Zhang, Y. & Xu, G. 2008, 'Using Web Clustering for Web Communities Mining and Analysis', Proceedings of 2008 IEEE / WIC / ACM International Conference on Web Intelligence, IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT '08., IEEE, Sydney, NSW, Australia.
View/Download from: UTS OPUS
Xu, G. 2008, 'Progress in WWW Research and Development, 10th Asia-Pacific Web Conference, APWeb 2008, Shenyang, China, April 26-28, 2008. Proceedings', 10th Asia-Pacific Web Conference, APWeb 2008, Springer, Shenyang, China.
Xu, G. 2008, 'Advanced Web and Network Technologies, and Applications, APWeb 2008 International Workshops: BIDM, IWHDM, and DeWeb Shenyang, China, April 26-28, 2008. Revised Selected Papers', Advanced Web and Network Technologies, and Applications, APWeb 2008 International Workshops: BIDM, IWHDM, and DeWeb, Springer, Shenyang, China.
View/Download from: UTS OPUS or Publisher's site
Zhang, Y. & Xu, G. 2007, 'On Web Communities Mining and Analysis', Third International Conference on Semantics, Knowledge and Grid, Xian, Shan Xi, China, October 29-31, 2007, Third International Conference on Semantics, Knowledge and Grid, Shan Xi, China, pp. 20-25.
View/Download from: UTS OPUS or Publisher's site
Xu, G., Zhang, Y. & Begg, R. 2006, 'Mining Gait Pattern For Clinical Locomotion Diagnosis Based On Clustering Techniques', Advanced Data Mining And Applications, Proceedings, 2nd International Conference on Advanced Data Mining and Applications, Springer-Verlag Berlin, Xian, China, pp. 296-307.
View/Download from: UTS OPUS
Scientific gait (walking) analysis provides valuable information about an individual's locomotion function, in turn, to assist clinical diagnosis and prevention, such as assessing treatment for patients with impaired postural control and detecting risk o
Xu, G., Zhang, Y. & Zhou, X. 2006, 'Discovering task-oriented usage pattern for web recommendation', Database Technologies 2006, Proceedings of the 17th Australasian Database Conference, 17th Australasian Database Conference, Australian Computer Society, Hobart, Tasmania, Australia, pp. 167-174.
View/Download from: UTS OPUS
Zhang, Y., Xu, G. & Zhou, X. 2005, 'A Latent Usage Approach for Clustering Web Transaction and Building User Profile', Lecture Notes in Computer Science, First International Conference, ADMA 2005, Springer Berlin / Heidelberg, Wuhan, China.
View/Download from: UTS OPUS
Xu, G., Zhang, Y. & Zhou, X. 2005, 'Towards User Profiling for Web Recommendation', Lecture Notes in Computer Science, 18th Australian Joint Conference on Artificial Intelligence, Springer Berlin / Heidelberg, Sydney, Australia.
View/Download from: UTS OPUS
Xu, G., Zhang, Y., Ma, J. & Zhou, X. 2005, 'Discovering User Access Pattern Based on Probabilistic Latent Factor Model', Database Technologies 2005, Proceedings of the Sixteenth Australasian Database Conference, 16th Australasian Database Conference, Australian Computer Society, Newcastle, Australia, pp. 27-35.
View/Download from: UTS OPUS
Xu, G., Zhang, Y. & Zhou, X. 2005, 'Using Probabilistic Latent Semantic Analysis for Web Page Grouping', 15th International Workshop on Research Issues in Data Engineering (RIDE-SDMA 2005), Stream Data Mining and Applications, 15th International Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications, IEEE Computer Society, Tokyo, Japan.
View/Download from: UTS OPUS
Xu, G., Zhang, Y. & Zhou, X. 2005, 'A Web Recommendation Technique Based on Probabilistic Latent Semantic Analysis', Lecture Notes in Computer Science, 6th International Conference on Web Information Systems Engineering, Springer Berlin / Heidelberg, New York, NY, USA.
View/Download from: UTS OPUS
Liu, W., Zhu, X., Xu, G., Zhang, Q. & Gao, L. 2005, 'A DNA Based Evolutionary Algorithm for the Minimal Set Cover Problem', Advances in Intelligent Computing, International Conference on Intelligent Computing, ICIC 2005, Hefei, China, August 23-26, 2005, Proceedings, Part II, International Conference on Intelligent Computing, ICIC 2005, Hefei, China, pp. 80-89.
View/Download from: UTS OPUS or Publisher's site

Journal articles

Li, F., Xu, G. & Cao, L. 2016, 'Two-level matrix factorization for recommender systems', Neural Computing and Applications.
View/Download from: UTS OPUS or Publisher's site
Many existing recommendation methods such as matrix factorization (MF) mainly rely on the user–item rating matrix, which sometimes is not informative enough and often suffers from the cold-start problem. To address this challenge, complementary textual relations between items are incorporated into recommender systems (RS) in this paper. Specifically, we first apply a novel weighted textual matrix factorization (WTMF) approach to compute the semantic similarities between items, then integrate the inferred item semantic relations into MF and propose a two-level matrix factorization (TLMF) model for RS. Experimental results on two open data sets not only demonstrate the superiority of the TLMF model over benchmark methods, but also show the effectiveness of TLMF for solving the cold-start problem.
Xu, G., Fu, B. & Gu, Y. 2016, 'Point-of-Interest Recommendations via a Supervised Random Walk Algorithm', IEEE Intelligent Systems, vol. 31, no. 1, pp. 15-23.
View/Download from: UTS OPUS or Publisher's site
Recently, location-based social networks (LBSNs) such as Foursquare and Whrrl have emerged as a new application for users to establish personal social networks and review various points of interest (POIs), triggering a new recommendation service aimed at helping users locate more preferred POIs. Although users' check-in activities could be explicitly considered as user ratings, in turn being utilized directly for collaborative filtering-based recommendations, such solutions don't differentiate the sentiment of reviews accompanying check-ins, resulting in unsatisfactory recommendations. This article proposes a new POI recommendation framework by simultaneously incorporating user check-ins and reviews along with side information into a tripartite graph and predicting personalized POI recommendations via a sentiment-supervised random walk algorithm. The experiments conducted on real data demonstrate the superiority of this approach in comparison with state-of-the-art techniques.
Yi, X., Paulet, R., Bertino, E. & Xu, G. 2016, 'Private Cell Retrieval from Data Warehouses', IEEE Transactions on Information Forensics and Security, vol. 11, no. 6, pp. 1346-1361.
View/Download from: UTS OPUS or Publisher's site
© 2015 IEEE. Publicly accessible data warehouses are an indispensable resource for data analysis. However, they also pose a significant risk to the privacy of the clients, since a data warehouse operator may follow the client's queries and infer what the client is interested in. Private information retrieval (PIR) techniques allow the client to retrieve a cell from a data warehouse without revealing to the operator which cell is retrieved and, therefore, protect the privacy of the client's queries. However, PIR cannot be used to hide online analytical processing (OLAP) operations performed by the client, which may disclose the client's interest. This paper presents a solution for private cell retrieval from a data warehouse on the basis of the Paillier cryptosystem. With our solution, the client can privately perform OLAP operations on the data warehouse and retrieve one (or more) cells without revealing any information about which cell is selected. In addition, we propose a solution for private block download on the basis of the Paillier cryptosystem. Our private block download allows the client to download an encrypted block from a data warehouse without revealing which block in a cloaking region is downloaded and improves the feasibility of our private cell retrieval. Our solutions ensure both the server's privacy and the client's privacy. Our experiments have shown that our solutions are practical.
He, W. & Xu, G. 2016, 'Social media analytics: unveiling the value, impact and implications of social media analytics for the management and use of online information', Online Information Review, vol. 40, no. 1.
View/Download from: UTS OPUS or Publisher's site
Zhang, Z., Liu, Y., Xu, G. & Luo, G. 2016, 'Recommendation using DMF-based fine tuning method', Journal of Intelligent Information Systems, pp. 1-14.
View/Download from: UTS OPUS or Publisher's site
© 2016 Springer Science+Business Media New York. Recommender Systems (RS) have been comprehensively analyzed in the past decade, and the Matrix Factorization (MF)-based Collaborative Filtering (CF) method has proved to be a useful model for improving recommendation performance. Factors inferred from item rating patterns yield vectors that MF uses to characterize both items and users. A recommendation can be concluded from a good correspondence between item and user factors. A basic MF model starts with an objective function, which consists of the squared error between the original training matrix and the predicted matrix plus a regularization term (with regularization parameters). To learn the predicted matrix, recommender systems minimize the regularized squared error. However, two important details have been ignored: (1) the predicted matrix becomes more and more accurate as the iterations are carried out, so a fixed value of the regularization parameters may not be the most suitable choice; (2) the final distribution trend of the ratings in the predicted matrix is not similar to that of the original training matrix. Therefore, we propose a Dynamic-MF algorithm and a fine tuning method, which is quite general, to overcome these problems. Other information, such as social relations, can be easily incorporated into this method (model). The experimental analysis on two large datasets demonstrates that our approaches outperform the basic MF-based method.
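The basic MF baseline that this abstract refers to can be illustrated with a minimal sketch. It is not the authors' Dynamic-MF or fine tuning method: it only shows a regularized squared-error objective minimized by stochastic gradient descent with a fixed regularization parameter. The function names, the toy ratings and all hyperparameters (`k`, `lr`, `reg`, `epochs`) are assumptions made for illustration.

```python
import random


def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.05, epochs=500, seed=0):
    """Fit user factors P and item factors Q to (user, item, rating) triples by SGD."""
    rng = random.Random(seed)
    # Small positive random initialisation (an illustrative choice).
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with a fixed L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def objective(ratings, P, Q, reg):
    """Squared error on observed ratings plus an L2 penalty on all factors."""
    sq_err = sum((r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2
                 for u, i, r in ratings)
    penalty = sum(x * x for row in P + Q for x in row)
    return sq_err + reg * penalty


# Hypothetical toy data: (user index, item index, rating).
toy_ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
               (2, 1, 2), (2, 2, 5), (0, 2, 1)]
P0, Q0 = train_mf(toy_ratings, 3, 3, epochs=0)   # untrained factors
P, Q = train_mf(toy_ratings, 3, 3)               # trained factors
```

Comparing `objective(toy_ratings, P0, Q0, 0.05)` with `objective(toy_ratings, P, Q, 0.05)` shows how far training reduces the regularized error. The `reg` value here stays constant across iterations, which is the design choice the abstract questions.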
Hazber, M.A.G., Li, R., Gu, X. & Xu, G. 2016, 'Integration mapping rules: Transforming relational database to semantic web ontology', Applied Mathematics and Information Sciences, vol. 10, no. 3, pp. 881-901.
View/Download from: Publisher's site
© 2016 NSP. Semantic integration has become an attractive area of research in several disciplines, such as information integration, databases and ontologies. A huge amount of data is still stored in relational databases (RDBs) that can be used to build ontologies, but these databases cannot be used directly by the semantic web. Therefore, one of the main challenges of the semantic web is mapping relational databases to ontologies (RDF(S)-OWL). Moreover, manual mapping of web contents to ontologies is impractical because the web contains billions of pages and most of these contents are generated from relational databases. Hence, we propose a new approach, which enables semantic web applications to access relational databases and their contents by semantic methods. Domain ontologies can be used to formulate relational database schema and data in order to simplify the mapping (transformation) of the underlying data sources. Our method consists of two main phases: building an ontology from an RDB schema and automatically generating ontology instances from RDB data. In the first phase, we studied different cases of RDB schema to be mapped into an ontology represented in RDF(S)-OWL, while in the second phase, the mapping rules are used to transform RDB data to ontological instances represented in RDF triples. Our approach is demonstrated with examples, validated by an ontology validator and implemented using Apache Jena in the Java language and MySQL. This approach is effective for building ontologies and important for mining semantic information from huge web resources.
Xu, G., Wu, Z., Li, G. & Chen, E. 2015, 'Improving contextual advertising matching by using Wikipedia thesaurus knowledge', Knowledge And Information Systems, vol. 43, no. 3, pp. 599-631.
View/Download from: UTS OPUS or Publisher's site
As a prevalent type of Web advertising, contextual advertising refers to the placement of the most relevant commercial ads within the content of a Web page, to provide a better user experience and as a result increase the user's ad-click rate. However, due to the intrinsic problems of homonymy and polysemy, the low intersection of keywords, and a lack of sufficient semantics, traditional keyword matching techniques are not able to effectively handle contextual matching and retrieve relevant ads for the user, resulting in an unsatisfactory performance in ad selection. In this paper, we introduce a new contextual advertising approach to overcome these problems, which uses Wikipedia thesaurus knowledge to enrich the semantic expression of a target page (or an ad). First, we map each page into a keyword vector, upon which two additional feature vectors, the Wikipedia concept and category vector derived from the Wikipedia thesaurus structure, are then constructed. Second, to determine the relevant ads for a given page, we propose a linear similarity fusion mechanism, which combines the above three feature vectors in a unified manner. Last, we validate our approach using a set of real ads, real pages along with the external Wikipedia thesaurus. The experimental results show that our approach outperforms the conventional contextual advertising matching approaches and can substantially improve the performance of ad selection.
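A linear similarity fusion of the kind described above can be sketched as a weighted sum of per-vector cosine similarities. This is a minimal illustration under stated assumptions: the fusion weights, the sparse-dictionary representation and the toy page and ads are all hypothetical, and the paper's actual weighting scheme and Wikipedia-derived features are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fused_similarity(page, ad, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of keyword, concept and category cosine similarities."""
    parts = ("keyword", "concept", "category")
    return sum(w * cosine(page[p], ad[p]) for w, p in zip(weights, parts))


def rank_ads(page, ads, weights=(0.5, 0.3, 0.2)):
    """Return ad names ordered from most to least similar to the page."""
    return sorted(ads, key=lambda name: fused_similarity(page, ads[name], weights),
                  reverse=True)


# Hypothetical page about a car; "jaguar" is ambiguous at the keyword level.
page = {
    "keyword": {"jaguar": 1.0, "speed": 1.0},
    "concept": {"Jaguar Cars": 1.0},
    "category": {"Automobiles": 1.0},
}
ads = {
    "zoo_tickets": {
        "keyword": {"jaguar": 1.0, "zoo": 1.0},
        "concept": {"Jaguar (animal)": 1.0},
        "category": {"Felines": 1.0},
    },
    "car_dealer": {
        "keyword": {"jaguar": 1.0, "dealer": 1.0},
        "concept": {"Jaguar Cars": 1.0},
        "category": {"Automobiles": 1.0},
    },
}
ranking = rank_ads(page, ads)
```

In this toy case both ads have the same keyword similarity to the page, so keyword matching alone cannot separate them; the concept and category terms are what place `car_dealer` ahead of `zoo_tickets`.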
Xu, G., Zong, Y., Jin, P., Pan, R. & Wu, Z. 2015, 'KIPTC: a kernel information propagation tag clustering algorithm', Journal of Intelligent Information Systems, vol. 45, no. 1, pp. 95-112.
View/Download from: UTS OPUS or Publisher's site
In the social annotation systems, users annotate digital data sources by using tags which are freely chosen textual descriptions. Tags are used to index, annotate and retrieve resource as an additional metadata of resource. Poor retrieval performance remains a major challenge of most social annotation systems resulting from several problems of ambiguity, redundancy and less semantic nature of tags. Clustering is a useful tool to handle these problems in social annotation systems. In this paper, we propose a novel tag clustering algorithm based on kernel information propagation. This approach makes use of the kernel density estimation of the kNN neighborhood directed graph as a start to reveal the prestige rank of tags in tagging data. The random walk with restart algorithm is then employed to determine the center points of tag clusters. The main strength of the proposed approach is the capability of partitioning tags from the perspective of tag prestige rank rather than the intuitive similarity calculation itself. Experimental studies on the six real world data sets demonstrate the effectiveness and superiority of the proposed method against other state-of-the-art clustering approaches in terms of various evaluation metrics.
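The random walk with restart step mentioned above can be illustrated with a generic power-iteration sketch. It is not the KIPTC algorithm: the kernel density estimation, the kNN graph construction and the selection of cluster centers are omitted, and the toy tag graph, the restart probability and the function name are assumptions for illustration only.

```python
def random_walk_with_restart(adj, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk that jumps back to `seed`.

    `adj` maps each node to a dict {neighbour: edge weight}; every neighbour
    must also appear as a key of `adj`.
    """
    nodes = list(adj)
    p = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(max_iter):
        # With probability `restart` the walker returns to the seed node.
        nxt = {n: (restart if n == seed else 0.0) for n in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            if total == 0:
                # Dangling node: return its mass to the seed.
                nxt[seed] += (1 - restart) * p[u]
                continue
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / total
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Hypothetical tag graph: two tight groups joined by one weak edge.
tag_graph = {
    "python": {"code": 1.0, "programming": 1.0},
    "code": {"python": 1.0, "programming": 1.0},
    "programming": {"python": 1.0, "code": 1.0, "recipe": 0.1},
    "recipe": {"programming": 0.1, "cooking": 1.0, "food": 1.0},
    "cooking": {"recipe": 1.0, "food": 1.0},
    "food": {"recipe": 1.0, "cooking": 1.0},
}
scores = random_walk_with_restart(tag_graph, "python")
```

Tags in the same group as the seed receive higher scores than tags reachable only through the weak edge, which is the proximity signal a clustering step could build on.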
Li, X., Xu, G., Chen, E. & Zong, Y. 2015, 'Learning recency based comparative choice towards point-of-interest recommendation', Expert Systems with Applications, vol. 42, no. 9, pp. 4274-4283.
View/Download from: UTS OPUS or Publisher's site
With the prevalence of GPS-enabled smart phones, the Location Based Social Network (LBSN) has emerged and become a hot research topic during the past few years. As one of the most important components in LBSNs, Points of Interest (POIs) have been extensively studied by both academia and industry, yielding POI recommendations to enhance user experience in exploring the city. In conventional methods, rating vectors for both users and POIs are utilized for similarity calculation, which might yield inaccuracy due to differences in user biases. In our opinion, the rating values themselves do not give the exact preferences of users; however, the numeric order of ratings given by a user within a certain period provides a hint of that user's preference order over POIs. Firstly, we propose an approach to model users' preferences by employing utility theory. Secondly, we devise a collection-wise learning method over partial orders through an effective stochastic gradient descent algorithm. We test our model on two real world datasets, i.e., Yelp and TripAdvisor, by comparing with some state-of-the-art approaches including PMF and several user preference modeling methods. In terms of MAP and Recall, we achieve an average improvement of 15% with regard to the baseline methods. The results show the significance of comparative choice in a certain time window and show its superiority to the existing methods.
Jeong, Y.-S., Shyu, M.-L., Xu, G. & Wagner, R.R. 2015, 'Guest Editorial: Advanced Technologies and Services for Multimedia Big Data Processing', Multimedia Tools and Applications, vol. 74, no. 10, pp. 3413-3418.
Xu, G., Wu, Z., Zhang, Y. & Cao, J. 2015, 'Social networking meets recommender systems: survey', International Journal of Social Network Mining, vol. 2, no. 1, pp. 64-100.
Today, the emergence of web-based communities and hosted services such as social networking sites, wikis and folksonomies brings tremendous freedom of web autonomy and facilitates collaboration and knowledge sharing between users. Along with the interaction between users and computers, social media is rapidly becoming an important part of our digital experience, ranging from digital textual information to diverse multimedia forms. These aspects and characteristics constitute the core of the second generation of the web. Social networking (SN) and recommender systems (RS) are two popular topics in the current Web 2.0 era, where the former emphasises the generation, dissemination and evolution of user relations, and the latter focuses on using the collective preferences of users to improve user experience and loyalty in various web applications. Leveraging users' social connections can alleviate the common problems of sparsity and cold-start encountered in RS. This paper aims to summarise the research progress and findings in these two areas and to showcase the benefits of integrating these two research strengths.
Wu, Z., Shi, J., Lu, C., Chen, E., Xu, G., Li, G., Xie, S. & Yu, P.S. 2015, 'Constructing plausible innocuous pseudo queries to protect user query intention', Information Sciences, vol. 325, pp. 215-226.
Deng, S., Wang, D., Li, X. & Xu, G. 2015, 'Exploring user emotion in microblogs for music recommendation', Expert Systems with Applications, vol. 42, no. 23, pp. 9284-9293.
© 2015 Elsevier Ltd. All rights reserved. Context-aware recommendation has become increasingly important and popular in recent years, as users are immersed in enormous amounts of music content and have difficulty making their choices. User emotion, as one of the most important contexts, has the potential to improve music recommendation, but has not yet been fully explored due to the great difficulty of emotion acquisition. This article utilizes users' microblogs to extract their emotions at different granularity levels and during different time windows. The approach then correlates three elements: the user, the music, and the user's emotion when he/she is listening to the music piece. Based on the associations extracted from a data set crawled from a Chinese Twitter service, we develop several emotion-aware methods to perform music recommendation. We conduct a series of experiments and show that considering user emotional context can indeed improve recommendation performance in terms of hit rate, precision, recall, and F1 score.
Li, Y., Li, Y. & Xu, G. 2015, 'Protecting private geosocial networks against practical hybrid attacks with heterogeneous information', Neurocomputing.
© 2016 Elsevier B.V. GeoSocial Networks (GSNs) are becoming increasingly popular due to their power in providing high-performance and flexible service capabilities. More and more Internet users have accepted this innovative service model. However, although GSNs have great business value for data analysis when integrated with location information, publishing GSN data may seriously compromise users' privacy. In this paper, we study the identity disclosure problem in publishing GSN data. We first discuss the attack problem by considering both location-based and structure-based properties as background knowledge, and then formalize two general models, named (k,m)-anonymity and (k,m,l)-anonymity. Then we propose a complete solution to achieve (k,m)-anonymization and (k,m,l)-anonymization to protect the released data from the above attacks. We also take data utility into consideration by defining specific information loss metrics. Experiments on real-world data validate that the proposed methods can protect GSN datasets from the attacks while retaining good utility.
Durao, F., Bayyapu, K., Xu, G., Dolog, P. & Lage, R. 2014, 'Expanding user's query with tag-neighbors for effective medical information retrieval', Multimedia Tools and Applications, vol. 71, no. 2, pp. 905-929.
Fu, B., Wang, Z., Xu, G. & Cao, L. 2014, 'Multi-label learning based on iterative label propagation over graph', Pattern Recognition Letters, vol. 42, no. 1, pp. 85-90.
Deng, S., Huang, L. & Xu, G. 2014, 'Social network-based service recommendation with trust enhancement', Expert Systems with Applications, vol. 41, no. 18, pp. 8075-8084.
Given the increasing applications of service computing and cloud computing, a large number of Web services are deployed on the Internet, triggering research on Web service recommendation. Beyond service QoS, the use of user feedback is becoming the current trend in service recommendation. As in traditional recommender systems, sparsity, cold-start and trustworthiness are major issues challenging service recommendation when adopting similarity-based approaches. Meanwhile, with the prevalence of social networks, people nowadays actively interact with various computers and users, resulting in a huge volume of available data, such as service information, user-service ratings, interaction logs, and user relationships. This work is therefore motivated by the question of how to incorporate trust relationships in social networks with user feedback for service recommendation. In this paper, we propose a social network-based service recommendation method with trust enhancement known as RelevantTrustWalker. First, a matrix factorization method is utilized to assess the degree of trust between users in a social network. Next, an extended random walk algorithm is proposed to obtain recommendation results. To evaluate the accuracy of the algorithm, experiments on a real-world dataset are conducted, and the experimental results indicate that the quality of the recommendations and the speed of the method are improved compared with existing algorithms. © 2014 Elsevier Ltd. All rights reserved.
Cao, L. & Xu, G. 2014, 'Behavior Informatics: A New Perspective', IEEE Intelligent Systems, vol. 29, no. 4, pp. 62-80.
Agarwal, N., Zhou, A. & Xu, G. 2014, 'Social cyber systems—Challenges, opportunities, and beyond', Journal of Systems and Software, vol. 94, pp. 1-3.
Xu, G., Zhou, A. & Agarwal, N. 2014, 'Special issue on social computing and its applications', Computer Journal, vol. 57, no. 9, pp. 1279-1280.
Gu, Y., Yang, Z., Xu, G., Nakano, M., Toyoda, M. & Kitsuregawa, M. 2014, 'Exploration on efficient similar sentences extraction', World Wide Web, vol. 17, no. 4, pp. 595-626.
Measuring the semantic similarity between sentences is an essential issue for many applications, such as text summarization, Web page retrieval, question-answer models, image extraction, and so forth. A few studies have explored this issue using several techniques, e.g., knowledge-based strategies, corpus-based strategies, hybrid strategies, etc. Most of these studies focus on how to improve effectiveness. In this paper, we address the efficiency issue, i.e., for a given sentence collection, how to efficiently discover the top-k semantically similar sentences to a query. Previous methods cannot handle big data efficiently, i.e., applying such strategies directly is time consuming because every candidate sentence needs to be tested. We propose efficient strategies to tackle this problem based on a general framework. The basic idea is that for each similarity, we build a corresponding index during preprocessing. Traversing these indices during querying avoids testing many candidates and so improves efficiency. Moreover, an optimal aggregation algorithm is introduced to assemble these similarities. Our framework is general enough that many similarity metrics can be incorporated, as will be discussed in the paper. We conduct extensive experimental evaluation on three real datasets to evaluate the efficiency of our proposal. In addition, we illustrate the trade-off between effectiveness and efficiency. The experimental results demonstrate that our proposal outperforms the state-of-the-art techniques on efficiency while keeping the same high precision. © 2013 Springer Science+Business Media New York.
Wu, Z., Xu, G., Lu, C., Chen, E.X., Zhang, Y. & Zhang, H. 2013, 'Position-wise contextual advertising: Placing relevant ads at appropriate positions of a web page', Neurocomputing, vol. 120, no. 1, pp. 524-535.
Web advertising, a form of online advertising, which uses the Internet as a medium to post product or service information and attract customers, has become one of the most important marketing channels. As one prevalent type of web advertising, contextual
Xu, G., Yu, J.X. & Lee, W. 2013, 'Social networks and social Web mining', World Wide Web-Internet And Web Information Systems, vol. 16, no. 5-6, pp. 541-544.
Li, L., Xu, G., Yang, Z., Dolog, P., Zhang, Y. & Kitsuregawa, M. 2013, 'An efficient approach to suggesting topically related web queries using hidden topic model', World Wide Web, vol. 16, no. 3, pp. 273-297.
Keyword-based Web search is a widely used approach for locating information on the Web. However, Web users usually suffer from the difficulties of organizing and formulating appropriate input queries due to the lack of sufficient domain knowledge, which greatly affects the search performance. An effective tool to meet the information needs of a search engine user is to suggest Web queries that are topically related to their initial inquiry. Accurately computing query-to-query similarity scores is a key to improve the quality of these suggestions. Because of the short lengths of queries, traditional pseudo-relevance or implicit-relevance based approaches expand the expression of the queries for the similarity computation. They explicitly use a search engine as a complementary source and directly extract additional features (such as terms or URLs) from the top-listed or clicked search results. In this paper, we propose a novel approach by utilizing the hidden topic as an expandable feature. This has two steps. In the offline model-learning step, a hidden topic model is trained, and for each candidate query, its posterior distribution over the hidden topic space is determined to re-express the query instead of the lexical expression. In the online query suggestion step, after inferring the topic distribution for an input query in a similar way, we then calculate the similarity between candidate queries and the input query in terms of their corresponding topic distributions; and produce a suggestion list of candidate queries based on the similarity scores. Our experimental results on two real data sets show that the hidden topic based suggestion is much more efficient than the traditional term or URL based approach, and is effective in finding topically related queries for suggestion.
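The online suggestion step described above can be illustrated with a minimal sketch. It assumes topic distributions have already been inferred offline, and it uses cosine similarity as the comparison measure; the abstract does not name the measure, so that choice, the function names and the example queries are all hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic distributions of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def suggest(input_topics, candidates, top_k=3):
    """Rank candidate queries by topic similarity to the input query.

    `candidates` maps a query string to its (assumed precomputed)
    posterior distribution over hidden topics.
    """
    ranked = sorted(
        candidates,
        key=lambda name: cosine(input_topics, candidates[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical three-topic distributions, for illustration only.
candidates = {
    "cheap flights": [0.8, 0.1, 0.1],
    "python tutorial": [0.1, 0.8, 0.1],
    "hotel deals": [0.6, 0.1, 0.3],
}
suggestions = suggest([0.7, 0.1, 0.2], candidates, top_k=2)
```

Because the comparison happens in the low-dimensional topic space, no search engine call is needed at suggestion time, which is consistent with the efficiency claim in the abstract.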
Liu, L., Chen, X., Luo, D., Lu, Y., Xu, G. & Liu, M. 2013, 'HSC: A spectral clustering algorithm combined with hierarchical method', Neural Network World, vol. 23, no. 6, pp. 499-521.
Most traditional clustering algorithms perform poorly on structures more complex than a convex spherical sample space. In the past few years, several spectral clustering algorithms have been proposed to cluster arbitrarily shaped data in various real applications. However, spectral clustering relies on datasets in which each cluster is approximately well separated. When a cluster has an obvious inflection point within a non-convex space, the spectral clustering algorithm would mistakenly split one cluster into different clusters. In this paper, we propose a novel spectral clustering algorithm called HSC, combined with a hierarchical method, which obviates this disadvantage of spectral clustering by not using the misleading information of noisy neighboring data points. A simple clustering procedure is applied to eliminate the misleading information, and thus the HSC algorithm can cluster both convex shaped data and arbitrarily shaped data more efficiently and accurately. Experiments on both synthetic and real data sets show that HSC outperforms other popular clustering algorithms. Furthermore, we observed that HSC can also be used to estimate the number of clusters.
Wu, Z., Xu, G., Zong, Y., Yi, X., Chen, E. & Zhang, Y. 2012, 'Executing SQL queries over encrypted character strings in the Database-As-Service model', Knowledge-based Systems, vol. 35, pp. 332-348.
Wu, Z., Xu, G., Zhang, Y., Dolog, P. & Lu, C. 2012, 'An Improved Contextual Advertising Matching Approach Based On Wikipedia Knowledge', Computer Journal, vol. 55, no. 3, pp. 277-292.
The current boom of the Web is associated with the revenues originated from Web advertising. As one prevalent type of Web advertising, contextual advertising refers to the placement of the most relevant commercial textual ads within the content of a Web
Wu, Z., Xu, G., Zhang, Y., Cao, Z., Li, G. & Hu, Z. 2012, 'GMQL: A graphical multimedia query language', Knowledge-based Systems, vol. 26, pp. 135-143.
The rapid increase of multimedia data makes multimedia query more and more important. To better satisfy users' query requirements, developing a functional multimedia query language is becoming a promising and interesting task. In this paper, we propose a graphical multimedia query language called GMQL, which is developed based on a semi-structured data organization model. In GMQL, we combine the advantages of graphs and texts, making the query language clearer, easier to use and more expressive. We first present the notations and basic capabilities of GMQL through query examples. Second, we discuss the GMQL query processing techniques. Last, we evaluate and analyze our multimedia query language through comparison with other existing multimedia query languages. The evaluation results show that GMQL has powerful expressiveness and is thus well suited to multimedia information retrieval.
Li, L., Zhong, L., Xu, G. & Kitsuregawa, M. 2012, 'A feature-free search query classification approach using semantic distance', Expert Systems with Applications, vol. 39, no. 12, pp. 10739-10748.
When classifying search queries into a set of target categories, conventional machine learning based approaches usually make use of external sources of information to obtain additional features for search queries and training data for target categories. Unfortunately, these approaches rely on large amounts of training data for high classification precision. Moreover, they are known to suffer from an inability to adapt to different target categories, which may be caused by the dynamic changes observed in both Web topic taxonomy and Web content. In this paper, we propose a feature-free classification approach using semantic distance. We analyze the queries and categories themselves and utilize the number of Web pages containing both a query and a category as a semantic distance to determine their similarity. The most attractive feature of our approach is that it only utilizes the Web page counts estimated by a search engine to provide search query classification with respectable accuracy. In addition, it can be easily adapted to changes in the target categories, whereas machine learning based approaches require an extensive updating process, e.g., re-labeling outdated training data and re-training classifiers, which is time consuming and costly. We conduct an experimental study on the effectiveness of our approach using a set of rank measures and show that our approach performs competitively with some popular state-of-the-art solutions which, however, frequently use external sources and are inherently inflexible.
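A minimal sketch of classifying a query from page counts alone follows. The abstract does not give the distance formula, so this uses a Jaccard-style co-occurrence ratio purely as a stand-in; the function names and every hit count below are hypothetical, not values from the paper or from any real search engine.

```python
def cooccurrence_similarity(query_hits, category_hits, joint_hits):
    """Jaccard-style ratio: pages with both terms over pages with either.

    This is an assumed stand-in for the paper's semantic distance.
    """
    union = query_hits + category_hits - joint_hits
    return joint_hits / union if union > 0 else 0.0


def classify(query, categories, hits, joint):
    """Pick the category whose co-occurrence with the query is strongest.

    `hits[term]` is the page count for a single term and
    `joint[(query, category)]` is the count of pages containing both.
    """
    return max(
        categories,
        key=lambda c: cooccurrence_similarity(
            hits[query], hits[c], joint[(query, c)]
        ),
    )


# Made-up page counts, for illustration only.
hits = {"jaguar speed": 1000, "Animals": 50000, "Cars": 80000}
joint = {("jaguar speed", "Animals"): 600, ("jaguar speed", "Cars"): 300}
label = classify("jaguar speed", ["Animals", "Cars"], hits, joint)
```

Adding or renaming a target category only requires fetching new counts, with no retraining step, which is the adaptability the abstract emphasises.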
Pan, R., Dolog, P. & Xu, G. 2012, 'KNN-Based Clustering for Improving Social Recommender Systems', Agents and Data Mining Interaction - 8th International Workshop, ADMI 2012, Valencia, Spain, June 4-5, 2012, Revised Selected Papers, pp. 115-125.
Li, L., Xu, G., Zhang, Y. & Kitsuregawa, M. 2011, 'Random walk based rank aggregation to improving web search', Knowledge-based Systems, vol. 24, no. 7, pp. 943-951.
In Web search, with the aid of related query recommendation, Web users can revise their initial queries over several serial rounds in pursuit of the Web pages they need. In this paper, we address the Web search problem of aggregating the search results of related queries to improve retrieval quality. Given an initial query and the suggested related queries, our search system concurrently processes their search result lists from an existing search engine and then forms a single list aggregated from all the retrieved lists. We specifically propose a generic rank aggregation framework which consists of three steps. First we build a so-called Win/Loss graph of Web pages according to a competition rule, and then apply the random walk mechanism on the Win/Loss graph. Last we sort these Web pages by their ranks using a PageRank-like rank mechanism. The proposed framework considers not only the number of wins that an item achieved in competitions, but also the quality of its competitor items when calculating the ranking of Web page items. Experimental results show that our search system, working in a parallel manner, clearly improves retrieval quality over the traditional search strategy that serially returns result lists. Moreover, we also provide empirical evidence demonstrating how different rank aggregation methods affect retrieval quality.
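The three steps above can be sketched roughly as follows. The competition rule used here (a page "wins" against every page ranked below it in the same list), the loser-to-winner edge direction, and the damping value are assumptions made for illustration; the abstract does not specify them, and the function names are hypothetical.

```python
from itertools import combinations


def build_win_loss_graph(ranked_lists):
    """Return {loser: {winner: weight}} from pairwise rank comparisons."""
    graph = {}
    for ranking in ranked_lists:
        for page in ranking:
            graph.setdefault(page, {})
        # combinations() preserves list order, so `winner` is ranked
        # above `loser` in this list.
        for winner, loser in combinations(ranking, 2):
            graph[loser][winner] = graph[loser].get(winner, 0) + 1
    return graph


def random_walk_scores(graph, damping=0.85, iterations=100):
    """PageRank-like power iteration: score flows from losers to winners."""
    pages = list(graph)
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for loser, winners in graph.items():
            total = sum(winners.values())
            if total == 0:
                # A page that never lost has no out-edges; spread its
                # score uniformly so probability mass is conserved.
                for p in pages:
                    new[p] += damping * scores[loser] / n
            else:
                for winner, weight in winners.items():
                    new[winner] += damping * scores[loser] * weight / total
        scores = new
    return scores


def aggregate(ranked_lists):
    """Merge several ranked lists into one, best page first."""
    scores = random_walk_scores(build_win_loss_graph(ranked_lists))
    return sorted(scores, key=scores.get, reverse=True)


merged = aggregate([["a", "b", "c"], ["a", "b", "c"]])
```

Because a page inherits score from the pages it beat, a win over a strong competitor counts for more than a win over a weak one, which mirrors the "quality of competitor items" idea in the abstract.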
Xu, G., Li, L., Zhang, Y., Yi, X. & Kitsuregawa, M. 2011, 'Modeling user hidden navigational behavior for Web recommendation', Web Intelligence and Agent Systems-An international journal, vol. 9, no. 3, pp. 239-255.
Web users exhibit a variety of navigational interests through clicking a sequence of Web pages. Analysis of Web usage data leads to the discovery of Web user access patterns and, in turn, helps users locate preferable Web content via collaborative recommendation techniques. In the context of Web usage mining, Latent Semantic Analysis (LSA) based on probability inference provides a promising approach to capture not only hidden user navigational patterns, but also the associations between users, pages and hidden navigational patterns. The discovered user access patterns can be used as a usage reference base for identifying new users' access preferences and making usage-based collaborative Web recommendations. In this paper, we propose a novel usage-based Web recommendation framework, in which Latent Dirichlet Allocation (LDA) is employed to learn the underlying task space from the training Web log data and infer the task distribution for a target user via task inference. The main advantages of the adapted LDA model are its capabilities of efficiently learning the semantic usage information from the Web log data and effectively inferring the access preference of the target user even from a few Web clicks that might be unseen in the training data. We also investigate how to determine an optimal number of tasks, which is another important problem commonly encountered in latent semantic analysis. Experiments conducted on a real Web log dataset show that this approach can achieve better recommendation performance than other existing techniques. The discovered task-simplex expression can also help Web designers and users better understand user navigational preferences.
Zong, Y., Xu, G., Jin, P., Zhang, Y. & Chen, E. 2011, 'HC_AB: A new heuristic clustering algorithm based on Approximate Backbone', Information Processing Letters, vol. 111, no. 17, pp. 857-863.
Clustering is an important research area with numerous applications in pattern recognition, machine learning, and data mining. Since the clustering problem on numeric data sets can be formulated as a typical combinatorial optimization problem, many studies have addressed the design of heuristic algorithms for finding sub-optimal solutions in a reasonable period of time. However, most heuristic clustering algorithms are sensitive to initialization and do not guarantee high quality results. Recently, the Approximate Backbone (AB), i.e., the commonly shared intersection of several sub-optimal solutions, has been proposed to address the initialization sensitivity problem. In this paper, we introduce the AB into heuristic clustering to overcome the initialization sensitivity of conventional heuristic clustering algorithms. The main advantage of the proposed method is its capability of restricting the initial search space around the optimal result by defining the AB, which in turn reduces the impact of initialization on clustering and eventually improves the performance of heuristic clustering. Experiments on synthetic and real-world data sets are performed to validate the effectiveness of the proposed approach in comparison to three conventional heuristic clustering algorithms and three other algorithms with improved initialization.
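One way to read "the commonly shared intersection of several sub-optimal solutions" is as the groups of points that land in the same cluster in every run. The sketch below implements that reading only; it is an interpretation of the abstract rather than the HC_AB algorithm itself, and the function name and example labels are hypothetical.

```python
def approximate_backbone(solutions):
    """Return groups of point indices that share a cluster in every solution.

    `solutions` is a list of label lists, one per clustering run, all of
    the same length. Two points have the same tuple of labels across runs
    exactly when they are co-clustered in every run, so grouping by that
    tuple is unaffected by how each run happens to number its clusters.
    """
    n_points = len(solutions[0])
    groups = {}
    for i in range(n_points):
        signature = tuple(labels[i] for labels in solutions)
        groups.setdefault(signature, []).append(i)
    # Singletons carry no agreement information, so they are left out.
    return [members for members in groups.values() if len(members) > 1]


# Two hypothetical runs over five points; cluster ids differ between runs.
runs = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
]
backbone = approximate_backbone(runs)
```

Groups found this way could seed the initial centers of a subsequent heuristic run, which is the sense in which the backbone narrows the initial search space.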
Zong, Y., Xu, G., Zhang, Y., Jiang, H. & Li, M. 2010, 'A Robust Iterative Refinement Clustering Algorithm With Smoothing Search Space', Knowledge-based Systems, vol. 23, no. 5, pp. 389-396.
Iterative refinement clustering algorithms are widely used in data mining, but they are sensitive to initialization. In the past decades, many modified initialization methods have been proposed to reduce the influence of the initialization sensitivity problem. The essence of iterative refinement clustering algorithms is the local search method. The large number of local minimum points embedded in the search space makes the local search problem hard and sensitive to initialization. The fewer the local minimum points, the more robust a local search algorithm is to initialization. In this paper, we propose a TopDown Clustering algorithm with Smoothing Search Space (TDCS3) to reduce the influence of initialization. The main steps of TDCS3 are: (1) dynamically reconstruct a series of smoothed search spaces into a hierarchical structure by filling the local minimum points; (2) at the top level of the hierarchical structure, run an existing iterative refinement clustering algorithm with random initialization to generate a clustering result; (3) from the second level down to the bottom level of the hierarchical structure, run the same clustering algorithm with the initialization derived from the previous clustering result. Experimental results on 3 synthetic and 10 real-world data sets show that TDCS3 is significantly effective at finding better, more robust clustering results and at reducing the impact of initialization.
Zhang, Y. & Xu, G. 2009, 'On Web Communities Mining And Recommendation', Concurrency and Computation: Practice and Experience, vol. 21, no. 5, pp. 561-582.
Because of the lack of a uniform schema for web documents and the sheer amount and dynamics of web data, both the effectiveness and the efficiency of information management and retrieval of web data are often unsatisfactory when using conventional data m
Thongkam, J., Xu, G., Zhang, Y. & Huang, F. 2009, 'Toward breast cancer survivability prediction models through improving training space', Expert Systems with Applications, vol. 36, no. 10, pp. 12200-12209.
Due to the difficulties posed by outliers and skewed data, the prediction of breast cancer survivability has presented many challenges in the field of data mining and pattern recognition, especially in medical research. To solve these problems, we have proposed a hybrid approach to generating higher quality data sets for the creation of improved breast cancer survival prediction models. This approach comprises two main steps: (1) utilization of an outlier filtering approach based on C-Support Vector Classification (C-SVC) to identify and eliminate outlier instances; and (2) application of over-sampling with replacement to increase the number of instances in the minority class. In order to assess the capability and effectiveness of the proposed approach, several measurement methods including basic performance measures (e.g., accuracy, sensitivity, and specificity), Area Under the receiver operating characteristic Curve (AUC) and F-measure were utilized. Moreover, a 10-fold cross-validation method was used to reduce the bias and variance of the results of the breast cancer survivability prediction models. Results indicate that the proposed approach improves the performance of breast cancer survivability prediction models by up to 28.34% due to the improved training data space.
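Step (2) above, over-sampling with replacement, can be sketched in a few lines. The sketch covers only that step (the C-SVC outlier filter is omitted), and it assumes each smaller class is topped up to the size of the largest class; the abstract does not state the target ratio, and the function name and toy data are hypothetical.

```python
import random
from collections import Counter


def oversample_with_replacement(rows, labels, seed=0):
    """Duplicate random minority-class rows until all classes are equal in size.

    Rows are drawn with replacement, so the same instance may be copied
    more than once. The original rows are kept unchanged at the front.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [row for row, y in zip(rows, labels) if y == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: three majority instances (label 0), one minority (label 1).
X = [[1.0], [2.0], [3.0], [4.0]]
y = [0, 0, 0, 1]
X_bal, y_bal = oversample_with_replacement(X, y)
```

In a cross-validated evaluation such as the 10-fold setup mentioned above, resampling would normally be applied inside each training fold only, so that duplicated instances never appear in both the training and the test split.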