Xueping Peng obtained his joint PhD degrees in computer software and theory from Beijing Institute of Technology (BIT) in 2013 and in computer science from University of Technology Sydney (UTS) in 2015. He is a Lecturer at the Centre for Artificial Intelligence, University of Technology Sydney. Prior to this position, he was a Postdoctoral Research Fellow at the Centre for Artificial Intelligence, UTS.
He is currently conducting application-driven research focusing on artificial intelligence, data mining and healthcare.
Can supervise: YES
- Data Mining
- Artificial Intelligence
Niu, K, Guo, J, Pan, Y, Gao, X, Peng, X, Li, N & Li, H 2020, 'Multichannel Deep Attention Neural Networks for the Classification of Autism Spectrum Disorder Using Neuroimaging and Personal Characteristic Data', COMPLEXITY, vol. 2020.
Niu, K, Zhao, X, Li, F, Li, N, Peng, X & Chen, W 2019, 'UTSP: User-based two-step recommendation with popularity normalization towards diversity and novelty', IEEE Access, vol. 7, pp. 145426-145434.
Zhang, X, Lu, W, Li, F, Peng, X & Zhang, R 2019, 'Deep Feature Fusion Model for Sentence Semantic Matching', Computers, Materials & Continua, vol. 61, no. 2, pp. 601-616.
Xu, G, Qiu, L & Peng, X 2016, 'Hot Topic Extraction and Public Opinion Classification of Tibetan Texts', Journal of Digital Information Management, vol. 14, no. 3, pp. 160-167.
The increasing amount of Tibetan information has made Tibetan text processing popular and highly significant. In this study, Tibetan hot topic extraction and public opinion classification were investigated to accelerate the development of Tibetan information processing. First, Tibetan word segmentation for hot topic extraction was presented. Second, feature selection based on term frequency and on document frequency was adopted to reduce feature dimensions. Third, a vector space model was used for text representation. Finally, a statistics-based method was used to extract hot topics. For public opinion classification, a keyword table of public opinion had to be established. According to field, 18 classes were selected for public opinion classification, and the keyword table was constructed by domain experts. The approach to public opinion classification was introduced on the basis of the proposed similarity computation method. Based on the proposed approaches, an application system was developed and used to carry out the experiments. Experiments show that the proposed method can extract topics effectively and classify public opinion rapidly. This research is helpful and meaningful for text classification, information retrieval, and the construction of high-quality corpora.
Tibetan text clustering has potential in the Tibetan information processing domain. In this paper, clustering research across Chinese and Tibetan texts is proposed to benefit Chinese and Tibetan machine translation and sentence alignment. A Tibetan and Chinese keyword table is the main means of implementing text clustering across these two languages. An improved K-means algorithm and an improved density-based spatial clustering of applications with noise (DBSCAN) algorithm are proposed. Experiments show that the improved K-means algorithm yields stable text clustering results and performs better than traditional K-means after eliminating the limitation of randomly selecting the initial k data points. The improved DBSCAN algorithm obtains good performance through reasonable parameter setting, and it performs better than the improved K-means. The study is helpful and meaningful for the parallel corpus construction of Chinese and Tibetan texts.
Zhao, Y, Niu, Z & Peng, X 2014, 'Research on Data Mining Technologies for Complicated Attributes Relationship in Digital Library Collections', Applied Mathematics & Information Sciences, vol. 8, no. 3, pp. 1173-1178.
The authors present research on data mining technologies for complicated attribute relationships in digital library collections. First, the motivation and background of the work are introduced. Second, related preliminary research is described: the authors studied the attributes of digital library collections, proposed a parallel discretization algorithm based on z-score theory, and, using this algorithm, discovered a complicated conditional relationship among attributes, which explains why traditional data prediction algorithms did not work well. Finally, a stratified decision tree algorithm for predicting the value of digital collections is put forward to solve the problem. The algorithm introduces the concept of stratified attributes. It expands the selection of splitting attributes in a decision tree from flat information to stereoscopic information, eliminates the influence of the complicated conditional attribute relationship, allows existing decision tree algorithms to be used in a nested fashion, and removes a bottleneck for data mining applications in digital library evaluation.
Peng, X, Niu, Z & Huang, S 2012, 'Query Suggestion Based on the Query Semantics and Clickthrough Data', Advanced Science Letters, vol. 9, no. 1, pp. 748-753.
Query suggestion plays an important role in improving the usability of search engines. For a given query raised by a specific user, the query suggestion technique aims to recommend relevant queries that may suit the user's potential information needs. Due to the complexity of Web structure and the ambiguity of users' input queries, most existing suggestion algorithms suffer from poor recommendation accuracy. In this paper, aiming to provide semantically relevant queries for users, we develop a novel, effective and efficient query suggestion model based on query semantics and clickthrough data. First, we propose a method that combines query similarity with query semantics information and calculates subject relevance among queries using word frequency information and the word's concept in the Knowledge Network (HowNet). Second, we propose another method that uses a query-URL bipartite graph to learn a low-rank query feature space and then builds a query similarity matrix model based on these features. Based on these, we design a ranking algorithm to propagate similarities over users' query log information and finally recommend semantically relevant queries to users. Empirical experiments on the click-through data of a commercial search engine demonstrate the effectiveness and efficiency of our method.
Peng, X, Niu, Z, Huang, S & Zhao, Y 2012, 'Personalized Web Search Using Clickthrough Data and Web Page Rating', Journal of Computers, vol. 7, no. 10, pp. 2578-2584.
Personalization of Web search is to carry out retrieval for each user incorporating his/her interests. We propose a novel technique to construct a personalized information retrieval model from users' clickthrough data and Web page ratings. This model builds on user-based collaborative filtering technology and the top-N resource recommending algorithm, and consists of three parts: the user profile, user-based collaborative filtering, and the personalized search model. First, we compute the user's preference score from the clicked sequence score and the Web page rating to construct the user profile. Then the model finds users similar to a given user with a user-based collaborative filtering algorithm and calculates the recommendable Web page score. Finally, personalized information retrieval is modeled for three cases: rating information from the user himself is available; at least rating information from similar users is available; no rating information is used. Experimental results indicate that our technique significantly improves search performance.
Zhao, Y, Niu, Z, Peng, X & Dai, L 2011, 'A Discretization Algorithm of Numerical Attributes for Digital Library Evaluation Based on Data Mining Technology', Lecture Notes in Computer Science, vol. 7008, pp. 70-76.
We present a discretization algorithm for numerical attributes of digital collections. In our research, data mining technology is introduced into digital library evaluation to provide better decision-making support. However, data prediction algorithms do not work well with the traditional discretization method during the data mining process. The reason is that the numerical attributes of digital collections are complicated and do not share the same scale of distribution distance. We study the characteristics of numerical attributes and put forward a discretization method based on the Z-score idea of mathematical statistics. This algorithm can reflect the dynamic semantic distance for different numerical attributes and significantly enhances the precision and recall of data prediction algorithms. Furthermore, a ‘nonlinear conditional relationship’ among attributes of digital collections is discovered during the study of the discretization algorithm; this relationship affects the results that traditional data mining algorithms achieve in practice.
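The Z-score idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `zscore_discretize` and the cut points at -1, 0 and 1 standard deviations are assumptions chosen for illustration. The point it shows is that attributes on very different scales map to the same bins once each value is expressed as a distance from its own attribute's mean.

```python
from statistics import mean, pstdev


def zscore_discretize(values, cut_points=(-1.0, 0.0, 1.0)):
    """Assign each value a bin index based on its z-score.

    The cut points are hypothetical; the source abstract does not state them.
    """
    mu = mean(values)
    sigma = pstdev(values)
    bins = []
    for v in values:
        # A constant attribute has no spread, so every value sits at z = 0.
        z = 0.0 if sigma == 0 else (v - mu) / sigma
        # The bin index is the number of cut points the z-score exceeds.
        bins.append(sum(z > c for c in cut_points))
    return bins


# Two attributes on different scales receive identical bin assignments.
loans = zscore_discretize([10, 20, 30])
downloads = zscore_discretize([1000, 2000, 3000])
```

Because the bins depend only on standardized distance, a fixed-width discretization tuned for one attribute is not silently reused on another with a different spread.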
Cao, Y, Niu, Z, Zhao, K & Peng, X 2010, 'Near duplicated Web pages detection based on concept and semantic network', Ruanjian Xuebao/Journal of Software, vol. 22, no. 8, pp. 1816-1826.
Reprinting by websites and blogs produces a great number of redundant Web pages. To improve search efficiency and user satisfaction, near-Duplicate Web pages Detection based on Concept and Semantic network (DWDCS) is proposed. In the course of developing a near-duplicate detection system for a multi-billion-page repository, this paper makes two research contributions. First, the key concept is extracted, instead of the key phrase, to build a Small World Network (SWN). This not only reduces the complexity of the semantic network but also resolves the “expression difference” problem. Second, this paper considers both syntactic and semantic information to represent documents and compute their similarities. In a large-scale test, experimental results demonstrate that this approach outperforms both I-Match and key phrase extraction algorithms based on SWN. Advantages such as linear time and space complexity, and no need for a corpus, make the algorithm valuable in practice.
Peng, X, Long, G, Pan, S, Jiang, J & Niu, Z 2019, 'Attentive Dual Embedding for Understanding Medical Concept in Electronic Health Record', The 2019 International Joint Conference on Neural Networks (IJCNN 2019), Budapest, Hungary.
Peng, X, Long, G, Shen, T, Wang, S, Jiang, J & Blumenstein, M 2019, 'Temporal Self-Attention Network for Medical Concept Embedding', 19th IEEE International Conference on Data Mining (ICDM), International Conference on Data Mining, Beijing, China.
In longitudinal electronic health records (EHRs), the event records of a patient are distributed over a long period of time, and the temporal relations between the events reflect sufficient domain knowledge to benefit prediction tasks such as the rate of inpatient mortality. Medical concept embedding is a feature extraction method that transforms a set of medical concepts with a specific time stamp into a vector, which is then fed into a supervised learning algorithm. The quality of the embedding significantly determines the learning performance over the medical data. In this paper, we propose a medical concept embedding method that applies a self-attention mechanism to represent each medical concept. We propose a novel attention mechanism that captures the contextual information and temporal relationships between medical concepts. A light-weight neural net, “Temporal Self-Attention Network (TeSAN)”, is then proposed to learn medical concept embedding based solely on the proposed attention mechanism. To test the effectiveness of our proposed methods, we have conducted clustering and prediction tasks on two public EHR datasets, comparing TeSAN against five state-of-the-art embedding methods. The experimental results demonstrate that the proposed TeSAN model is superior to all the compared methods. To the best of our knowledge, this work is the first to exploit temporal self-attentive relations between medical events.
Wang, Y, Long, G, Peng, X, Clarke, A, Stevenson, R & Gerrard, L 2019, 'Interactive Deep Metric Learning for Healthcare Cohort Discovery', Le T. et al. (eds) Data Mining. AusDM 2019. Communications in Computer and Information Science, Adelaide, Australia.
Jia, B, Niu, K, Hou, X, Li, N, Peng, X, Gu, P & Jia, R 2019, 'Prediction for Student Academic Performance Using SMNaive Bayes Model', Lecture Notes in Computer Science, Advanced Data Mining and Applications, Springer Verlag, Dalian, China.
Predicting students' academic performance is very important for students' future development. Every year, a large number of students cannot graduate from college on time for various reasons. A large volume of students' academic data has been generated in the process of promoting education informatization. It has become critical to predict student performance and help students graduate on time by making the best use of these data. Machine learning models that predict students' performance are widely available. However, some existing machine learning models still suffer from low accuracy in predicting students' performance. To solve this problem, we propose an SMNaive Bayes (SMNB) model, which integrates Sequential Minimal Optimization (SMO) and Naive Bayes to make the prediction result more accurate. The basic idea is that the model predicts students' performance in professional courses from their basic course performance in the previous stage. In particular, the SMO algorithm is used in the first step to predict students' academic performance and produce initial prediction results; Naive Bayes then decides on the inconsistent results of the initial prediction; lastly, the final prediction results for students' professional course performance are produced. To test the effectiveness of our proposed model, we have conducted extensive experiments comparing SMNB against four prediction methods. The experimental results demonstrate that the proposed SMNB model is superior to all the compared methods.
Yin, Q, Niu, K, Li, N, Peng, X & Pan, Y 2019, 'ACO-RR: Ant Colony Optimization Ridge Regression in Reuse of Smart City System', Reuse in the Big Data Era. ICSR 2019. Lecture Notes in Computer Science, The 18th International Conference on Software and Systems Reuse, Springer, Cham, Cincinnati, OH, United States, pp. 204-219.
With the rapid development of artificial intelligence, governments of different countries have been focusing on building smart cities. Building a smart city is a system construction process that not only requires a lot of human and material resources but also takes a long period of time. Because they lack sufficient human and material resources, many small and medium-sized cities find intelligent construction a key challenge, compared with large cities with abundant resources. Reusing existing smart city systems to assist the intelligent construction of small and medium-sized cities is a reasonable way to address this challenge. Following this idea, we propose an Ant Colony Optimization Ridge Regression (ACO-RR) model, a smart city evaluation method based on ridge regression. The model helps small and medium-sized cities select and reuse existing smart city systems from different success stories according to their individual characteristics. Furthermore, the proposed model tackles the limitation that the selection of ridge parameters affects stability and generalization ability, since the parameters of traditional ridge regression are chosen manually and arbitrarily. To evaluate our model's performance, we conduct experiments on a real-world smart city data set. The experimental results demonstrate that our model outperforms baseline methods such as support vector machines and neural networks.
Jiang, X, Peng, X & Long, G 2015, 'Discovering sequential rental patterns by fleet tracking', Data Science (LNCS), International Conference on Data Science, Springer, Sydney, Australia, pp. 42-49.
As one of the best-known methods for customer analysis, sequential pattern mining generally focuses on customer business transactions to discover customer behaviors. However, in the real-world rental industry, behaviors are usually linked to other factors relating to the actual circumstances of the equipment. Fleet tracking factors, such as location and usage, have been widely considered important features for improving work performance and predicting customer preferences. In this paper, we propose an innovative sequential pattern mining method to discover rental patterns by combining business transactions with fleet tracking factors. A novel sequential pattern mining framework is designed to detect the effective items by utilizing both business transactions and fleet tracking information. Experimental results on real datasets demonstrate the effectiveness of our approach.
Huang, S, Liu, X, Peng, X & Niu, Z 2012, 'Fine-grained Product Features Extraction and Categorization in Reviews Opinion Mining', 2012 IEEE 12th International Conference on Data Mining Workshops (ICDMW), International Conference on Data Mining Workshops (ICDMW), IEEE, Institute of Electrical and Electronics Engineers, Brussels, Belgium, pp. 680-686.
With the growth of user-generated content on the Web, opinion mining of product reviews has increasingly become a research practice of great value to e-commerce, search and recommendation. Unfortunately, the number of reviews can rise to hundreds or even thousands, especially for popular items, which makes it laborious for potential buyers and manufacturers to read through them to make a wise decision. Besides, the free format and the uncertainty of review expressions make fine-grained product feature extraction and categorization more difficult than traditional information extraction. In this work, we propose to treat product feature extraction as a sequence labeling task and employ a discriminative learning model using Conditional Random Fields (CRFs) to tackle it. We incorporate part-of-speech features and sentence structure features into the CRFs learning process. For product feature categorization, we introduce semantic knowledge-based and distributional context-based similarity measures to calculate the similarities between product feature expressions; an effective categorizing algorithm based on graph pruning is then proposed to classify the collection of feature expressions into different semantic groups. Empirical studies demonstrate the effectiveness and efficiency of our approaches compared with counterpart methods.
Huang, S, Peng, X, Niu, Z & Wang, K 2011, 'News topic detection based on hierarchical clustering and named entity', 2011 7th International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE), Natural Language Processing and Knowledge Engineering, IEEE, Tokushima, Japan, pp. 280-284.
News topic detection is the process of organizing news story collections and real-time news/broadcast streams into news topics. Unlike traditional text analysis, it is a process of incremental clustering and is generally divided into retrospective topic detection and online topic detection. This paper considers how the features of modern news data have changed over time and presents a new topic detection strategy based on hierarchical clustering and named entities. The topic detection process is likewise divided into retrospective and online steps, and named entities in the news stories are employed in the topic clustering algorithm. For efficiency and precision in the online step, this paper first clusters the news stories in each time window into micro-clusters and then extracts three representation vectors for each micro-cluster to calculate its similarity to existing topics. The experimental results show a remarkable improvement compared with the most widely applied recent topic detection methods.
Peng, X, Huang, S & Niu, Z 2010, 'A study on personalized recommendation model based on search behaviors and resource properties', 2nd International Conference on Information Engineering and Computer Science - Proceedings, ICIECS 2010.
This paper presents a personalized recommendation model that recommends potentially interesting resources to users based on the users' search behaviors and resource properties. The model builds on user-based collaborative filtering technology and the top-N resource recommending algorithm, and consists of three parts: description of users' preferences, calculation of similar users, and the resource recommending model. First, our model generates users' preferences for resources by calculating the relevance score between the query string and the resource, the score of the resource owner, the score of the resource category and the score of the browse sequence. Then it finds users similar to a given user from the previously calculated preferences. Finally, it recommends filtered and sorted resources to users based on the top-N resource recommendation model. Our experiments show that this recommendation model is more accurate than a model based purely on users' search behaviors.
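The three parts named in the abstract above (preferences, similar users, top-N recommendation) can be sketched as follows. This is a generic user-based collaborative filtering illustration, not the paper's model: the cosine similarity measure, the function names `cosine` and `recommend_top_n`, the neighbour count `k`, and the sample preference scores are all assumptions.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse preference vectors (dicts)."""
    shared = set(u) & set(v)
    num = sum(u[i] * v[i] for i in shared)
    den = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0


def recommend_top_n(prefs, user, n=2, k=2):
    """Recommend up to n unseen resources using the k most similar users."""
    neighbours = sorted(
        ((cosine(prefs[user], prefs[other]), other) for other in prefs if other != user),
        reverse=True,
    )[:k]
    scores = {}
    for sim, other in neighbours:
        if sim <= 0:
            continue  # users with no overlapping preferences contribute nothing
        for item, score in prefs[other].items():
            if item not in prefs[user]:
                # Weight each neighbour's preference by how similar they are.
                scores[item] = scores.get(item, 0.0) + sim * score
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


# Hypothetical preference scores per user and resource.
prefs = {
    "alice": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},
    "carol": {"d": 5},
}
suggestions = recommend_top_n(prefs, "alice")
```

In the paper the preference scores come from query relevance, resource owner, resource category and browse sequence; here they are simply given as numbers.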
Cao, Y, Peng, X, Kun, Z, Niu, Z, Xu, G & Wang, W 2009, 'Query expansion based on query log and small world characteristic', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 573-580.
Automatic query expansion is an effective way to solve the word mismatch and short query problems. This paper presents a novel approach to Expand Queries Based on User log and Small world characteristic of the document (QEBUS). When a query is submitted, the synonymic concept of the query is obtained by searching a synonymic concept dictionary. Then the query log is explored, and key words are extracted from the documents clicked by users based on the small world network (SWN) characteristic. By analyzing the semantic network of the document based on SWN and exploring the correlations between the key words and the queries based on mutual information, high-quality expansion terms can be obtained. The experimental results show that our technique significantly outperforms some traditional query expansion methods.
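The mutual-information step mentioned above can be illustrated with a small sketch of pointwise mutual information (PMI) between query terms and key words from clicked documents. The abstract does not specify the exact formulation, so the use of PMI, the function name `pmi_scores`, and the sample click pairs are assumptions.

```python
from collections import Counter
from math import log2


def pmi_scores(pairs):
    """Pointwise mutual information for (query term, key word) co-occurrences.

    `pairs` holds one entry per observed click: the query term and a key word
    extracted from the clicked document.
    """
    pair_counts = Counter(pairs)
    query_counts = Counter(q for q, _ in pairs)
    keyword_counts = Counter(k for _, k in pairs)
    total = len(pairs)
    return {
        (q, k): log2(count * total / (query_counts[q] * keyword_counts[k]))
        for (q, k), count in pair_counts.items()
    }


# Hypothetical click log entries.
clicks = [
    ("jaguar", "car"),
    ("jaguar", "car"),
    ("jaguar", "cat"),
    ("python", "snake"),
]
scores = pmi_scores(clicks)
```

Key words with a high score for a query term co-occur with it more often than chance would predict, which makes them candidates for expansion terms.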
Peng, X, Cao, Y & Niu, Z 2008, 'Mining web access log for the personalization recommendation', Proceedings - 2008 International Conference on MultiMedia and Information Technology, MMIT 2008, pp. 172-175.
This paper presents a personalization recommendation model that recommends potentially interesting resources to users based on the users' web access log. The model builds on the Apriori algorithm and the tf-idf technique, and consists of three parts: resource description, extraction of the user's preferences, and personalization recommendation. First, our model generates a text space vector for each resource by analyzing the resource information obtained from mining the user's web access log. Then it obtains the user's interest set by applying the Apriori algorithm to these vectors. Finally, it recommends filtered and sorted resources to users with a content-based recommendation model.
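The Apriori step can be sketched as level-wise frequent itemset mining. This is a textbook version, not the paper's implementation: treating the key words of each accessed resource as one transaction, the function name `apriori`, and the sample sessions are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return every itemset contained in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in transactions)

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    size = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every subset one item smaller must already be frequent.
        current = {
            c
            for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, size - 1))
            and support(c) >= min_support
        }
    return frequent


# Hypothetical key word sets extracted from one user's accessed resources.
sessions = [{"ml", "nlp"}, {"ml", "nlp", "db"}, {"ml", "db"}, {"nlp"}]
interests = apriori(sessions, min_support=2)
```

The resulting itemsets, with their support counts, play the role of the interest set that the recommendation step then matches against resource vectors.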
- Beijing Institute of Technology (BIT)
- Australia Federal Department of Health
- oOH!media Limited