MapReduce-based approach for accurate error correction of next-generation sequencing data
Seminar Title: MapReduce-based approach for accurate error correction of next-generation sequencing data
Speaker: A/Prof. Liang ZHAO
Date and time: 1:00 – 2:00pm, Wed 9 March, 2016
Seminar venue: CB11.12.113
Seminar Chairman: A/Prof. Jinyan Li, AAi
Abstract:
Next-generation sequencing platforms have produced huge amounts of sequence data, which is revolutionizing every aspect of genetic and genomic research. However, these datasets contain a substantial number of machine-induced errors; for example, the substitution error rate can be as high as 2.5%. Existing error-correction methods are still far from perfect. In fact, they sometimes introduce more errors than they correct, especially the prevalent k-mer-based methods. Existing methods have also made limited use of parallelization and cloud computing. We describe the error-correction method MEC, which uses a two-layered MapReduce approach to achieve both high accuracy and cloud scalability. In the first layer, input sequences are mapped to groups so that candidate erroneous bases can be identified in parallel. In the second layer, candidate erroneous bases at the same position are linked together across all groups so that corrections are statistically reliable. Experiments on real and simulated datasets show that our method remarkably outperforms existing approaches: its per-position error rate is consistently the lowest, and its correction gain the highest.
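To make the two-layer idea concrete, here is a toy, single-process sketch in Python. It is not the MEC implementation. The grouping key (shared k-mer seeds), the thresholds K, MIN_DEPTH and MIN_VOTES, and all function names are assumptions chosen only for illustration, and a real system would run the map and reduce steps on a MapReduce framework rather than in plain loops.

```python
from collections import Counter, defaultdict

K = 4          # seed length used to form groups (hypothetical choice)
MIN_DEPTH = 3  # minimum column depth before a base can be flagged
MIN_VOTES = 2  # number of groups that must agree before a base is corrected


def map_to_groups(reads):
    """Layer 1 map: emit (seed, (read_id, seed_offset)) for every k-mer in a read."""
    for read_id, seq in reads.items():
        for i in range(len(seq) - K + 1):
            yield seq[i:i + K], (read_id, i)


def find_candidates(reads, members):
    """Layer 1 reduce: align group members on the shared seed, flag minority bases."""
    columns = defaultdict(Counter)  # aligned column -> base counts
    for read_id, offset in members:
        for pos, base in enumerate(reads[read_id]):
            columns[pos - offset][base] += 1
    for read_id, offset in members:
        for pos, base in enumerate(reads[read_id]):
            counts = columns[pos - offset]
            consensus, support = counts.most_common(1)[0]
            depth = sum(counts.values())
            # Flag only when a strict majority of a deep enough column disagrees.
            if depth >= MIN_DEPTH and base != consensus and support > depth / 2:
                yield (read_id, pos), consensus


def correct_reads(reads):
    """Run both layers in-process and return the corrected reads."""
    groups = defaultdict(list)
    for seed, member in map_to_groups(reads):
        groups[seed].append(member)

    # Layer 2: link candidates for the same (read, position) across all groups.
    votes = defaultdict(Counter)
    for members in groups.values():
        for key, suggestion in find_candidates(reads, members):
            votes[key][suggestion] += 1

    corrected = {read_id: list(seq) for read_id, seq in reads.items()}
    for (read_id, pos), counter in votes.items():
        base, n_groups = counter.most_common(1)[0]
        if n_groups >= MIN_VOTES:
            corrected[read_id][pos] = base
    return {read_id: "".join(seq) for read_id, seq in corrected.items()}


example_reads = {
    "r1": "ACGTTGCAAGCT",
    "r2": "ACGTTGCAAGCT",
    "r3": "ACGTTGCAAGCT",
    "r4": "ACGTTGCTAGCT",  # substitution at index 7 (A -> T)
}
print(correct_reads(example_reads)["r4"])  # prints ACGTTGCAAGCT
```

In this sketch a single group can only nominate a candidate. A base is changed only when several independent groups agree, which is one simple way to read "statistically reliable" in the description above.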
Biography:
Dr. Liang Zhao is an associate professor in the School of Computing and Electronic Information at Guangxi University. He earned his bachelor's degree in computer science from Wuhan University, China, in 2007 and his Ph.D. in bioinformatics from Nanyang Technological University, Singapore, in 2012. He completed his postdoctoral training in statistical genetics at Baylor College of Medicine, USA, in 2015. Dr. Zhao's research interests include statistical genetics, immunoinformatics, computational biology, big data analysis, cloud computing, and data mining and machine learning theories and their applications. He has published several peer-reviewed journal and conference papers and pioneered work on antibody-specific B-cell epitope prediction.