Peng, H, Zheng, Y, Blumenstein, M, Tao, D & Li, J 2018, 'CRISPR/Cas9 cleavage efficiency regression through boosting algorithms and Markov sequence profiling', BIOINFORMATICS, vol. 34, no. 18, pp. 3069-3077.
Zhao, Z, Peng, H, Lan, C, Zheng, Y, Fang, L & Li, J 2018, 'Imbalance learning for the prediction of N-6-Methylation sites in mRNAs', BMC GENOMICS, vol. 19.
Peng, H, Zheng, Y, Zhao, Z, Liu, T & Li, J 2018, 'Recognition of CRISPR/Cas9 off-target sites through ensemble learning of uneven mismatch distributions.', Bioinformatics, vol. 34, no. 17, pp. i757-i765.
Motivation: CRISPR/Cas9 is driving a broad range of innovative applications from basic biology to biotechnology and medicine. One of its current issues is off-target editing, which must be resolved and, ideally, avoided completely when the system is used. Results: We developed an ensemble learning method to detect the off-target sites of a single guide RNA (sgRNA) among its thousands of genome-wide candidates. Nucleotide mismatches between on-target and off-target sites have been studied recently. We confirm that there is strong mismatch enrichment, with positional preferences, in the regions close to the 5' end of the off-target sequences. Compared with on-target sites, the sequences of non-edited sites can also be characterized by changes in GC composition and by position-specific binary mismatch features. In this new feature space, an ensemble strategy was applied to train a prediction model. The model achieved a mean area under the receiver operating characteristic curve (AUROC) of 0.99 and a mean area under the precision-recall curve (AUPRC) of 0.45 in cross-validation on large datasets, outperforming state-of-the-art methods in various test scenarios. Our predicted off-target sites also agree closely with those detected by high-throughput sequencing techniques. In particular, two case studies on selecting sgRNAs to cure hearing loss and retinal degeneration partly demonstrate the effectiveness of our method. Availability and implementation: Python and MATLAB versions of the source code for detecting the off-target sites of a given sgRNA, together with the supplementary files, are freely available at https://github.com/penn-hui/OfftargetPredict. Supplementary information: Supplementary data are available at Bioinformatics online.
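The abstract mentions GC composition changes and position-specific binary mismatch features. Below is a minimal sketch of how such features could be computed for one sgRNA/candidate pair. It assumes both sequences are already aligned, equal-length protospacers written 5' to 3' with the PAM excluded. The function names are hypothetical, and this is not the code in the OfftargetPredict repository.

```python
# Hypothetical sketch: position-specific binary mismatch flags plus a GC-content
# change for an sgRNA and one aligned off-target candidate. Not the authors' code.

def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def mismatch_features(sgrna: str, candidate: str) -> list[float]:
    """One 0/1 mismatch flag per position, followed by the GC-content change.

    Assumes equal-length, pre-aligned sequences (e.g. 20 nt, PAM excluded).
    """
    if len(sgrna) != len(candidate):
        raise ValueError("sequences must be aligned and of equal length")
    sgrna, candidate = sgrna.upper(), candidate.upper()
    flags = [1.0 if a != b else 0.0 for a, b in zip(sgrna, candidate)]
    gc_change = gc_fraction(candidate) - gc_fraction(sgrna)
    return flags + [gc_change]


features = mismatch_features("GAGTCCGAGCAGAAGAAGAA", "GAGTCTAAGCAGAAGAAGAA")
```

Here the candidate differs at positions 5 and 6 (0-based), and its GC fraction drops from 0.5 to 0.4. A model could then be trained on such vectors. The ensemble strategy itself is not shown.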
Zheng, Y, Peng, H, Zhang, X, Zhao, Z, Yin, J & Li, J 2018, 'Predicting adverse drug reactions of combined medication from heterogeneous pharmacologic databases.', BMC Bioinformatics, vol. 19, no. Suppl 19, pp. 49-59.
BACKGROUND: Early and accurate identification of potential adverse drug reactions (ADRs) for combined medication is vital for public health. Existing methods rely either on expensive wet-lab experiments or on detecting existing associations in related records. Thus, they inevitably suffer from under-reporting, delays in reporting, and an inability to detect ADRs for new and rare drugs. The current application of machine learning methods is severely impeded by the lack of proper drug representations and credible negative samples. Therefore, a way to represent drugs properly and to select credible negative samples is vital for applying machine learning to this problem. RESULTS: In this work, we propose a machine learning method to predict ADRs of combined medication from pharmacologic databases by building up highly-credible negative samples (HCNS-ADR). Specifically, we first fuse heterogeneous information from different databases and represent each drug as a multi-dimensional vector according to its chemical substructures, target proteins, substituents, and related pathways. Then, a drug-pair vector is obtained by appending the vector of one drug to that of the other. Next, we construct a drug-disease-gene network and devise a scoring method to measure the interaction probability of every drug pair via network analysis. Drug pairs with lower interaction probability are preferentially selected as negative samples. Following that, the validated positive samples and the selected credible negative samples are projected into a lower-dimensional space using principal component analysis. Finally, a classifier is built for each ADR using its positive and negative samples with reduced dimensions. The proposed method is evaluated on simulated prediction for 1276 ADRs and 1048 drugs, using four machine learning algorithms and comparing against two baseline approaches. Extensive experiments show that the proposed way to represent drugs characterizes drugs accu...
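Two HCNS-ADR steps from the abstract are easy to illustrate: concatenating drug vectors into a pair vector, and choosing the lowest-scoring unlabeled pairs as negatives. The sketch below takes the interaction scores as plain input. In the paper they come from network analysis of a drug-disease-gene graph, and that part is not reproduced. Function names and the toy data are hypothetical. The PCA and per-ADR classifier steps are also omitted.

```python
# Hypothetical sketch of two steps described above: building a drug-pair vector
# by concatenation and selecting credible negatives as the unlabeled pairs with
# the lowest interaction score. Scores are placeholder inputs here.

def pair_vector(drug_a: list[int], drug_b: list[int]) -> list[int]:
    """Append the feature vector of one drug to that of the other."""
    return drug_a + drug_b


def select_negatives(scores: dict[tuple[str, str], float],
                     positives: set[tuple[str, str]],
                     k: int) -> list[tuple[str, str]]:
    """Return the k non-positive drug pairs with the lowest interaction score."""
    candidates = [pair for pair in scores if pair not in positives]
    candidates.sort(key=lambda pair: scores[pair])
    return candidates[:k]


scores = {("d1", "d2"): 0.9, ("d1", "d3"): 0.1, ("d2", "d3"): 0.4, ("d3", "d4"): 0.05}
negatives = select_negatives(scores, positives={("d1", "d2")}, k=2)
```

With these toy scores, the two selected negatives are `("d3", "d4")` and `("d1", "d3")`. Concatenation is order-dependent, so an implementation would need a fixed convention for which drug comes first (for example, sorted drug IDs).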
Peng, H, Lan, C, Zheng, Y, Hutvagner, G, Tao, D & Li, J 2017, 'Cross disease analysis of co-functional microRNA pairs on a reconstructed network of disease-gene-microRNA tripartite.', BMC Bioinformatics, vol. 18, pp. 1-17.
MicroRNAs always function cooperatively in regulating gene expression. Dysfunctions of these co-functional microRNAs can play significant roles in disease development. We focus on multi-disease associated co-functional microRNAs that cooperatively regulate their common dysfunctional target genes in the development of multiple diseases. This research is potentially useful for studying human diseases at the transcriptional level and for developing multi-purpose microRNA therapeutics. We designed a computational method to detect multi-disease associated co-functional microRNA pairs and conducted cross disease analysis on a reconstructed disease-gene-microRNA (DGR) tripartite network. The DGR tripartite network is constructed by integrating newly predicted disease-microRNA associations with the relationships among diseases, microRNAs and genes maintained in existing databases. The association prediction method uses a set of reliable negative disease-microRNA samples and a pre-computed kernel matrix instead of kernel functions. From this reconstructed DGR tripartite network, multi-disease associated co-functional microRNA pairs are detected together with their common dysfunctional target genes and ranked by a novel scoring method. We also conducted proof-of-concept case studies on cancer-related co-functional microRNA pairs as well as on non-cancer disease-related microRNA pairs. By prioritizing the co-functional microRNAs related to a series of diseases, we found that co-function is not unusual. We also confirmed that microRNA regulation in the development of cancers is more complex and has more unique properties than in non-cancer diseases.
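To make the pair-detection step concrete, here is a minimal sketch that enumerates microRNA pairs sharing at least two diseases and ranks them by their shared target genes. The scoring rule (shared-disease count times the Jaccard overlap of target sets) is a hypothetical placeholder, not the paper's novel scoring method. The sketch also assumes each microRNA's target set has already been restricted to dysfunctional genes.

```python
from itertools import combinations

# Hypothetical sketch: detect multi-disease associated microRNA pairs and rank
# them by overlap of (pre-filtered) dysfunctional target genes. The score is an
# illustrative placeholder, not the scoring method proposed in the paper.

def rank_cofunctional_pairs(mirna_diseases: dict[str, set[str]],
                            mirna_targets: dict[str, set[str]],
                            min_diseases: int = 2):
    """Return (pair, shared_diseases, shared_targets, score) tuples, best first."""
    ranked = []
    for a, b in combinations(sorted(mirna_diseases), 2):
        diseases = mirna_diseases[a] & mirna_diseases[b]
        if len(diseases) < min_diseases:
            continue  # keep only pairs associated with multiple common diseases
        targets_a = mirna_targets.get(a, set())
        targets_b = mirna_targets.get(b, set())
        shared = targets_a & targets_b
        union = targets_a | targets_b
        jaccard = len(shared) / len(union) if union else 0.0
        ranked.append(((a, b), diseases, shared, len(diseases) * jaccard))
    ranked.sort(key=lambda item: item[3], reverse=True)
    return ranked


ranking = rank_cofunctional_pairs(
    {"miR-a": {"D1", "D2", "D3"}, "miR-b": {"D1", "D2"}, "miR-c": {"D3"}},
    {"miR-a": {"G1", "G2", "G3"}, "miR-b": {"G2", "G3", "G4"}, "miR-c": {"G1"}},
)
```

In this toy network only `("miR-a", "miR-b")` shares two diseases. Its common targets are G2 and G3, and its placeholder score is 2 × 0.5 = 1.0.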
Zheng, Y, Peng, H, Zhang, X, Gao, X & Li, J 2018, 'Predicting Drug Targets from Heterogeneous Spaces using Anchor Graph Hashing and Ensemble Learning', Proceedings of the International Joint Conference on Neural Networks, International Joint Conference on Neural Networks, IEEE, Rio de Janeiro, Brazil.
© 2018 IEEE. The in silico prediction of potential drug-target interactions is of critical importance in drug research. Existing computational methods have achieved remarkable prediction accuracy, but they usually have poor prediction efficiency because of their computational cost. To improve prediction efficiency, we propose to predict drug targets by integrating heterogeneous features with anchor graph hashing and ensemble learning. First, we encode each drug as a 5682-bit vector and each target as a 4198-bit vector using their respective heterogeneous features. Then, these vectors are embedded into a low-dimensional Hamming space using anchor graph hashing. Next, we append the hashing bits of a target to those of a drug to form a vector representing the drug-target pair. Finally, vectors of positive samples (known drug-target pairs) and randomly selected negative samples are used to train and evaluate the ensemble learning model. The proposed method is evaluated on simulated target prediction for 1094 drugs from DrugBank. Extensive comparison experiments demonstrate that it achieves high prediction efficiency while preserving satisfactory accuracy: it is 99.3 times faster than the best method in the literature, the 'Pairwise Kernel Method', with an AUC only 0.001 lower.
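The efficiency gain comes from working with short binary codes. The sketch below shows the pair-building step (target bits appended to drug bits) and a Hamming distance computed with XOR. It assumes the codes have already been produced by some hashing method. Anchor graph hashing itself is not implemented, and the short bit strings and function names are hypothetical stand-ins.

```python
# Hypothetical sketch: build a drug-target pair code by appending the target's
# hash bits to the drug's hash bits, and compare codes by Hamming distance.
# The hashing step (anchor graph hashing in the paper) is assumed done.

def pair_code(drug_bits: str, target_bits: str) -> str:
    """Drug bits followed by target bits, as one bit string."""
    return drug_bits + target_bits


def hamming(code_a: str, code_b: str) -> int:
    """Number of differing positions between two equal-length bit strings."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have equal length")
    # XOR of the integer values leaves a 1 exactly where the bits differ
    return bin(int(code_a, 2) ^ int(code_b, 2)).count("1")


code = pair_code("1010", "110")      # "1010110"
distance = hamming(code, "1000111")  # differs at two positions
```

XOR plus a popcount on short codes is much cheaper than comparing the original multi-thousand-bit feature vectors. That is the general motivation for hashing-based approaches like the one described.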
Zhang, X, Liu, Y, Zheng, Y, Zhao, Z, Li, J & Liu, Y 2018, 'Distinction between Ships and Icebergs in SAR Images Using Ensemble Loss Trained Convolutional Neural Networks', AI 2018: Advances in Artificial Intelligence (LNAI), Australasian Joint Conference on Artificial Intelligence, Springer, Wellington, New Zealand, pp. 216-223.
With global warming, more new shipping routes will open in the polar regions, particularly in the Arctic, and will be used by more and more ships. Synthetic aperture radar (SAR) has been widely used to monitor ships and icebergs for maritime surveillance and safety in Arctic waters. Compared with detecting ships or icebergs, distinguishing ships from icebergs in SAR images remains challenging. In this work, we propose a novel convex loss function, called ensemble loss, that incorporates the traits of cross entropy and hinge loss for training convolutional neural networks (CNNs). The CNN model trained with ensemble loss to distinguish ships from icebergs is evaluated on a real-world SAR data set, where it achieves a higher classification accuracy of 90.15%. Experiments on another real-world image data set also confirm the effectiveness of the proposed ensemble loss.
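The abstract says ensemble loss is convex and combines cross entropy with hinge loss, but it does not give the formula. One simple way to get a loss with those two properties is a convex combination of the logistic (binary cross entropy) loss and the hinge loss. The sketch below assumes labels +1 (ship) / -1 (iceberg) and a real-valued network score, with a hypothetical weight `lam`. It is an illustration, not the paper's definition.

```python
import math

# Hypothetical sketch: a convex mix of binary cross entropy (in margin form) and
# hinge loss for labels y in {+1, -1}. The weight `lam` and this exact form are
# assumptions; the paper's ensemble loss may be defined differently.

def logistic_loss(y: int, score: float) -> float:
    """Binary cross entropy as log(1 + exp(-y * score)), computed stably."""
    margin = y * score
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def hinge_loss(y: int, score: float) -> float:
    return max(0.0, 1.0 - y * score)


def mixed_loss(y: int, score: float, lam: float = 0.5) -> float:
    """Non-negative weighted sum of two convex losses, so convex in `score`."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * logistic_loss(y, score) + (1.0 - lam) * hinge_loss(y, score)


loss_at_zero = mixed_loss(+1, 0.0)  # 0.5 * ln 2 + 0.5 * 1
```

Unlike pure hinge loss, the logistic term keeps a small gradient on correctly classified examples beyond the margin. Keeping `lam` in [0, 1] preserves convexity.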
Zheng, Y, Ghosh, S & Li, J 2017, 'An optimized drug similarity framework for side-effect prediction', Computing in Cardiology, Computing in Cardiology Conference, Rennes, France, pp. 1-4.
© 2017 IEEE Computer Society. All rights reserved. Drug side effects are crucial issues both in pre-market drug development and in post-market clinical use. They account for one-third of drug failures and cause significant mortality and severe morbidity. Thus, the early identification of potential drug side effects is of great interest. Most existing methods directly use only a few drug similarities for side-effect prediction, overlooking the performance gains available from integrating and optimizing drug similarities. In this study, we propose an optimized drug similarity framework (ODSF) to improve side-effect prediction. First, the framework integrates four different drug similarities into a comprehensive similarity. Next, the comprehensive similarity is optimized via clustering and then enhanced by indirect drug similarity. Finally, the optimized drug similarity is used for side-effect prediction. ODSF was evaluated on simulated side-effect prediction for 917 drugs from DrugBank. Extensive comparison experiments demonstrate that ODSF captures drug features from diverse perspectives and that prediction performance improves significantly owing to the optimized drug similarity.
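The integration, enhancement and prediction steps of ODSF can be sketched with small similarity matrices. Several choices below are hypothetical and are not given in the abstract: similarities are fused by equal-weight averaging, indirect similarity is the strongest two-step path (the max over intermediate drugs of the min of the two edges), and side effects are scored by similarity-weighted voting. The clustering-based optimization step is omitted.

```python
# Hypothetical sketch of ODSF-style steps: fuse similarity matrices, enhance with
# indirect (two-step) similarity, then score side effects by weighted voting.
# Averaging, the max-min rule and voting are assumptions; clustering is omitted.

Matrix = list[list[float]]


def fuse(similarities: list[Matrix]) -> Matrix:
    """Element-wise mean of equally sized drug-similarity matrices."""
    n, k = len(similarities[0]), len(similarities)
    return [[sum(s[i][j] for s in similarities) / k for j in range(n)]
            for i in range(n)]


def add_indirect(sim: Matrix) -> Matrix:
    """Raise sim[i][j] to the strongest path i -> m -> j when that is larger."""
    n = len(sim)
    out = [row[:] for row in sim]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            via = max((min(sim[i][m], sim[m][j])
                       for m in range(n) if m not in (i, j)), default=0.0)
            out[i][j] = max(sim[i][j], via)
    return out


def side_effect_scores(sim: Matrix, labels: list[set[str]], drug: int) -> dict[str, float]:
    """Sum the similarity of `drug` to every other drug known to cause each effect."""
    scores: dict[str, float] = {}
    for other, effects in enumerate(labels):
        if other == drug:
            continue
        for effect in effects:
            scores[effect] = scores.get(effect, 0.0) + sim[drug][other]
    return scores


s1 = [[1, 0.5, 0.0], [0.5, 1, 0.75], [0.0, 0.75, 1]]
s2 = [[1, 1.0, 0.25], [1.0, 1, 0.25], [0.25, 0.25, 1]]
enhanced = add_indirect(fuse([s1, s2]))
predicted = side_effect_scores(enhanced, [set(), {"nausea"}, {"nausea", "rash"}], drug=0)
```

In the toy example, drugs 0 and 2 are only weakly similar directly (0.125). Through drug 1 they reach 0.5, and drug 0 then scores 1.25 for nausea and 0.5 for rash.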
Zheng, Y, Lan, C, Peng, H & Li, J 2016, 'Using Constrained Information Entropy to Detect Rare Adverse Drug Reactions from Medical Forums', 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, pp. 2460-2463.
Adverse drug reaction (ADR) detection is critical to avoid malpractice, yet it is challenging because of uncertainty in pre-marketing review and under-reporting in post-marketing surveillance. To address this, social-media-based ADR detection methods have recently been proposed. However, existing studies mostly use co-occurrence-based methods and face several issues; in particular, they miss rare ADRs and cannot distinguish irrelevant ADRs. In this work, we introduce a constrained information entropy (CIE) method to solve these problems. CIE first recognizes drug-related adverse reactions using a predefined keyword dictionary and then captures both high- and low-frequency (rare) ADRs by information entropy. Extensive experiments on a medical forum dataset demonstrate that CIE outperforms state-of-the-art co-occurrence-based methods, especially in detecting rare ADRs.
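A possible reading of the two CIE stages is sketched below. First, dictionary terms are matched in forum posts. Then, for each term, the Shannon entropy of its mention distribution across drugs is computed, so that a term concentrated on one drug has low entropy even if it is rarely mentioned. The single-word matching, the per-term entropy and the `max_entropy` threshold are all hypothetical. The paper's "constrained" entropy is not reproduced here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sketch of dictionary matching plus entropy scoring of ADR terms
# across drugs. Single-word matching, this entropy definition and the threshold
# are illustrative assumptions, not the CIE method itself.

def adr_entropies(posts: list[tuple[str, str]], adr_terms: set[str]) -> dict[str, float]:
    """posts are (drug, text) pairs; return each matched term's entropy in bits."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for drug, text in posts:
        words = set(text.lower().split())
        for term in adr_terms & words:  # keyword-dictionary filter
            counts[term][drug] += 1
    entropies = {}
    for term, per_drug in counts.items():
        total = sum(per_drug.values())
        entropies[term] = -sum((c / total) * math.log2(c / total)
                               for c in per_drug.values())
    return entropies


def drug_specific_adrs(entropies: dict[str, float], max_entropy: float = 0.5) -> set[str]:
    """Terms concentrated on few drugs (low entropy), regardless of frequency."""
    return {term for term, h in entropies.items() if h <= max_entropy}


posts = [("drugA", "felt dizzy and had a headache"), ("drugB", "headache after dose"),
         ("drugA", "strange tinnitus tonight"), ("drugC", "headache again")]
entropies = adr_entropies(posts, {"headache", "tinnitus", "dizzy"})
```

Here "headache" is spread evenly over three drugs (entropy log2 3 ≈ 1.585 bits), while "tinnitus" and "dizzy" each appear once, for drugA only (entropy 0). A frequency threshold alone would discard those rare terms.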
Ghosh, S, Zheng, Y, Lammers, T, Chen, YY, Fitzmaurice, C, Johnston, S & Li, J 2016, 'Deriving public sector workforce insights: A case study using Australian public sector employment profiles', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), International Conference on Advanced Data Mining and Applications, Springer, Gold Coast, Queensland, Australia, pp. 764-774.
© Springer International Publishing AG 2016. Effective approaches for measuring human capital in the public sector and government agencies are essential for robust workforce planning against changing economic conditions. To this end, adopting innovative hypothesis-driven workforce data analysis can help discover hidden patterns and trends in the workforce. These trends are useful for decision making and support the development of policies to reach desired employment outcomes. In this study, the data challenges of, and approaches to, a real-life workforce analytics scenario are described. Statistical results from numerous workforce data experiments are combined to derive three hypotheses that are useful to public sector organisations for human resources management and decision making.