Aaron Darling is a Professor in Computational Genomics and Bioinformatics in the ithree institute of the UTS Faculty of Science. He has more than a decade of experience developing computational methods for comparative genomics and evolutionary modeling, and in 2013 he moved from the University of California, Davis to start a computational genomics group at UTS.
Darling embarked on his research career at the University of Wisconsin-Madison. Following a bachelor's degree in Computer Science, he worked with members of the UW-Madison Genome Center to sequence and analyze the first genomes of pathogenic E. coli. During this time Darling led the development of several widely used computational methods for analyzing genomic data, including the mpiBLAST open-source parallel BLAST software and the Mauve software for comparing multiple genome sequences.
Following the award of a Ph.D. at UW-Madison, Darling received a fellowship from the US National Science Foundation to pursue postdoctoral studies at The University of Queensland. After two years at UQ, he returned to the United States to develop a research program in computational metagenomics at UC Davis. Computational metagenomics is the study of uncultivated microorganisms from the environment using computational methods.
Darling now applies his experience to understanding the relationship between humans and microorganisms, in collaboration with microbiologists at the ithree institute.
- PLOS Computational Biology
- ISMB Microbiome COSI (2018, 2019)
- Workshop on Algorithms in Bioinformatics (2013)
- RECOMB Comparative Genomics (2011)
Professional society memberships:
- President, Australian Bioinformatics and Computational Biology Society (ABACBS)
Can supervise: YES
Designing and developing scalable computational algorithms to identify the complete set of genetic differences between two or more organisms, and relating these differences to aspects of the organisms' biology, in particular associating genomic changes with phenotypic changes.
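As a rough illustration of the first step of such a comparison, the sketch below lists substitutions and insertions/deletions between two sequences that have already been aligned (gaps written as `-`). It is a minimal, hypothetical example: the function name `call_differences` and its output format are illustrative assumptions, it is not code from Mauve or any other published tool, and real whole-genome comparison must also handle rearrangements, which this sketch ignores.

```python
from typing import List, Tuple


def call_differences(ref_aln: str, qry_aln: str) -> List[Tuple[int, str, str, str]]:
    """Report differences between two pre-aligned sequences (hypothetical helper).

    Positions are 1-based coordinates in the ungapped reference. Each record is
    (ref_position, ref_base, query_base, kind), where kind is 'SNP',
    'insertion' (gap in the reference) or 'deletion' (gap in the query).
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    diffs = []
    ref_pos = 0  # number of reference bases consumed so far
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r == q:
            continue  # identical column (including a gap in both sequences)
        if r == "-":
            # Extra base in the query; reported at the preceding reference base.
            diffs.append((ref_pos, "-", q, "insertion"))
        elif q == "-":
            diffs.append((ref_pos, r, "-", "deletion"))
        else:
            diffs.append((ref_pos, r, q, "SNP"))
    return diffs


if __name__ == "__main__":
    for record in call_differences("ACGT-ACGT", "ACCTTAC-T"):
        print(record)
```

Reporting positions in reference coordinates keeps the output comparable across several query genomes aligned to the same reference.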
The vast majority of life on the planet is microbial, and most of it cannot be studied by laboratory cultivation. Metagenomics involves sequencing DNA from microbes taken directly from the environment. Current metagenomic methods require advanced computational, statistical, and machine learning techniques to identify the organisms present in a sample and to characterize their potential to encode functional proteins.
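One simple idea behind identifying the organisms in a sample is to compare the short subsequences (k-mers) of each sequencing read with those of known reference genomes. The toy sketch below assigns a read to the reference that shares the most exact k-mers with it. The names `build_index` and `classify_read`, the value of k and the reference sequences are hypothetical; production classifiers also handle reverse complements, taxonomic hierarchies, sequencing errors and far larger indexes.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the names of the reference genomes that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, genome in references.items():
        for kmer in kmers(genome, k):
            index[kmer].add(name)
    return index


def classify_read(read: str, index: Dict[str, Set[str]], k: int) -> Optional[str]:
    """Return the reference sharing the most k-mers with the read, or None.

    Ties are broken arbitrarily here; a real classifier would report ambiguity.
    """
    votes: Counter = Counter()
    for kmer in kmers(read, k):
        for name in index.get(kmer, ()):
            votes[name] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy references, far shorter than real genomes.
    refs = {"genomeA": "ACGTACGTTTGACCA", "genomeB": "GGGCCCTTTAAAGGG"}
    idx = build_index(refs, 5)
    print(classify_read("CGTACGTT", idx, 5))
```

Building the index once and reusing it for every read is what makes this approach scale to the millions of reads in a metagenome.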
Life is thought to have existed on Earth for at least four billion years. During this time, evolution has shaped the genomes of modern organisms. Using statistical methods such as continuous time Markov chain models, we can infer the history of genome evolution that led to modern organisms. I am interested in applying methods from statistical mechanics and financial market modeling to develop scalable algorithms for reconstructing evolutionary histories.
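The simplest continuous time Markov chain model of nucleotide substitution is the Jukes-Cantor (JC69) model, in which every base changes to each of the other three at the same rate. The sketch below is a textbook illustration, not one of the methods described in this profile: it builds the JC69 rate matrix Q, computes the transition probabilities P(t) = exp(Qt) numerically, and inverts the known closed form to estimate an evolutionary distance from the observed fraction of differing sites. As an assumption of the sketch, rates are scaled so that t is measured in expected substitutions per site.

```python
import math

Matrix = list  # a 4x4 matrix stored as a list of row lists


def jc69_rate_matrix(mu: float = 1.0) -> Matrix:
    """JC69 rates: mu/3 to each other base, diagonal -mu so rows sum to zero."""
    return [[-mu if i == j else mu / 3.0 for j in range(4)] for i in range(4)]


def mat_mult(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][m] * b[m][j] for m in range(4)) for j in range(4)] for i in range(4)]


def expm(q: Matrix, t: float, terms: int = 30, squarings: int = 10) -> Matrix:
    """exp(Q t) by scaling and squaring with a truncated Taylor series."""
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(4)] for i in range(4)]
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    term = [row[:] for row in result]  # current Taylor term A^n / n!
    for n in range(1, terms):
        term = [[x / n for x in row] for row in mat_mult(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(4)] for i in range(4)]
    for _ in range(squarings):
        result = mat_mult(result, result)  # undo the scaling: exp(A)^(2^s)
    return result


def jc69_closed_form(d: float):
    """(P[same base], P[specific different base]) after d substitutions/site."""
    decay = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay


def jc69_distance(p: float) -> float:
    """Estimate substitutions per site from the observed proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("distance is undefined under JC69 when p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    P = expm(jc69_rate_matrix(), 0.5)
    print([round(x, 4) for x in P[0]])
    print(round(jc69_distance(0.2), 4))
```

The distance correction matters because the observed fraction of differing sites saturates at 0.75, so it increasingly underestimates the number of substitutions that actually occurred as divergence grows.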
Next-generation DNA sequencing
DNA is fundamentally a molecule that encodes digital information. New sequencing technology enables us to read this biological information en masse so that it can be analyzed computationally. I am interested in designing sequencing experiments and protocols in ways that maximize the useful information obtained about a biological system.
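A basic quantity in sequencing experiment design is how many reads are needed to cover a genome. The sketch below uses the classic Lander-Waterman (Poisson) approximation: with N reads of length L on a genome of size G, the mean depth is c = N * L / G and a position is left uncovered with probability about exp(-c). The function names and example numbers are illustrative assumptions. The approximation ignores coverage bias and repeats, and for metagenomes the uneven abundance of community members: a genome making up a fraction a of the DNA receives roughly a fraction a of the reads.

```python
import math


def mean_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean coverage depth c = N * L / G."""
    return num_reads * read_length / genome_size


def expected_fraction_covered(num_reads: int, read_length: int, genome_size: int) -> float:
    """Lander-Waterman/Poisson estimate of the fraction of positions covered by at least one read."""
    return 1.0 - math.exp(-mean_depth(num_reads, read_length, genome_size))


def reads_needed(target_fraction: float, read_length: int, genome_size: int) -> int:
    """Smallest read count whose expected covered fraction reaches target_fraction."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    depth = -math.log(1.0 - target_fraction)  # solve 1 - exp(-c) = target for c
    return math.ceil(depth * genome_size / read_length)


if __name__ == "__main__":
    # Illustrative numbers: a ~4.6 Mbp bacterial genome and 150 bp reads.
    n = reads_needed(0.999, 150, 4_600_000)
    print(n, round(mean_depth(n, 150, 4_600_000), 2))
```

Because the uncovered fraction falls exponentially with depth, each additional "nine" of coverage costs roughly the same number of extra reads, which is one reason sequencing budgets are often better spent on more samples than on ever-deeper coverage of one.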
Professor Darling supervises research higher degree students.
Ayres, DL, Cummings, MP, Baele, G, Darling, AE, Lewis, PO, Swofford, DL, Huelsenbeck, JP, Lemey, P, Rambaut, A & Suchard, MA 2019, 'BEAGLE 3: Improved Performance, Scaling, and Usability for a High-Performance Computing Library for Statistical Phylogenetics.', Systematic Biology, vol. 68, no. 6, pp. 1052-1061.
BEAGLE is a high-performance likelihood-calculation library for phylogenetic inference. The BEAGLE library defines a simple, but flexible, application programming interface (API), and includes a collection of efficient implementations for calculation under a variety of evolutionary models on different hardware devices. The library has been integrated into recent versions of popular phylogenetics software packages including BEAST and MrBayes and has been widely used across a diverse range of evolutionary studies. Here, we present BEAGLE 3 with new parallel implementations, increased performance for challenging data sets, improved scalability, and better usability. We have added new OpenCL and central processing unit-threaded implementations to the library, allowing the effective utilization of a wider range of modern hardware. Further, we have extended the API and library to support concurrent computation of independent partial likelihood arrays, for increased performance of nucleotide-model analyses with greater flexibility of data partitioning. For better scalability and usability, we have improved how phylogenetic software packages use BEAGLE in multi-GPU (graphics processing unit) and cluster environments, and introduced an automated method to select the fastest device given the data set, evolutionary model, and hardware. For application developers who wish to integrate the library, we have also developed an online tutorial. To evaluate the effect of the improvements, we ran a variety of benchmarks on state-of-the-art hardware. For a partitioned exemplar analysis, we observe run-time performance improvements as high as 5.9-fold over our previous GPU implementation. BEAGLE 3 is free, open-source software licensed under the Lesser GPL and available at https://beagle-dev.github.io.
Coil, D, Jospin, G, Darling, A, Wallis, C, Davis, I, Harris, S, Eisen, J, Holcombe, L & O'Flynn, C 2019, 'Genomes from Bacteria Associated with the Canine Oral Cavity: a Test Case for Automated Genome-Based Taxonomic Assignment', PLoS ONE, vol. 14, no. 6.
Taxonomy for bacterial isolates is commonly assigned via sequence analysis. However, the most common sequence-based approaches (e.g. 16S rRNA gene-based phylogeny or whole genome comparisons) are still labor intensive and subjective to varying degrees. Here we present a set of 33 bacterial genomes, isolated from the canine oral cavity. Taxonomy of these isolates was first assigned by PCR amplification of the 16S rRNA gene, Sanger sequencing, and taxonomy assignment using BLAST. After genome sequencing, taxonomy was revisited through a manual process using a combination of average nucleotide identity (ANI), concatenated marker gene phylogenies, and 16S rRNA gene phylogenies. This taxonomy was then compared to the automated taxonomic assignment given by the recently proposed Genome Taxonomy Database (GTDB). We found the results of all three methods to be similar (25 out of the 33 had matching genera), but the GTDB approach was less subjective, and required far less labor. The primary differences in the remaining taxonomic assignments related to proposed taxonomy changes by the GTDB team.
DeMaere, MZ & Darling, AE 2019, 'bin3C: exploiting Hi-C sequencing data to accurately resolve metagenome-assembled genomes', Genome Biology, vol. 20, pp. 46-46.
Fritz, A, Hofmann, P, Majda, S, Dahms, E, Dröge, J, Fiedler, J, Lesker, TR, Belmann, P, DeMaere, MZ, Darling, AE, Sczyrba, A, Bremges, A & McHardy, AC 2019, 'CAMISIM: simulating metagenomes and microbial communities.', Microbiome, vol. 7, no. 1.
BACKGROUND: Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required. RESULTS: We describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, on several thousand small data sets generated with CAMISIM. CONCLUSIONS: CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation. All data sets and the software are freely available at https://github.com/CAMI-challenge/CAMISIM.
Roy Chowdhury, P, Fourment, M, DeMaere, MZ, Monahan, L, Merlino, J, Gottlieb, T, Darling, AE & Djordjevic, SP 2019, 'Identification of a novel lineage of plasmids within phylogenetically diverse subclades of IncHI2-ST1 plasmids.', Plasmid, vol. 102, pp. 56-61.
IncHI2-ST1 plasmids play an important role in co-mobilizing genes conferring resistance to critically important antibiotics and heavy metals. Here we present the identification and analysis of IncHI2-ST1 plasmid pSPRC-Echo1, isolated from an Enterobacter hormaechei strain from a Sydney hospital, which predates other multi-drug resistant IncHI2-ST1 plasmids reported from Australia. Our time-resolved phylogenetic analysis indicates that pSPRC-Echo1 represents a new lineage of IncHI2-ST1 plasmids and shows how their diversification relates to the era of antibiotics.
We developed Hackflex, a low-cost method for the production of Illumina-compatible sequencing libraries that allows up to 11 times more libraries for high-throughput Illumina sequencing to be generated at a fixed cost. Quality of library preparation was tested by constructing libraries from E. coli MG1655 genomic DNA using either Hackflex, standard Nextera Flex or a variation of standard Nextera Flex in which the bead-linked transposase is diluted prior to use. We demonstrated that Hackflex can produce high-quality libraries with highly uniform coverage, equivalent to the standard Nextera Flex kit. Using Hackflex, we achieved a per-sample reagent cost for library preparation of A$8.66, which is 8.23 times lower than the standard Nextera Flex protocol at the advertised retail price. An additional simple modification to the protocol enables a further price reduction, to up to 11-fold lower than the standard protocol, or about A$6.50 per sample. This method will allow researchers to construct more libraries within a given budget, thereby yielding more data and facilitating research programs where sequencing large numbers of libraries is beneficial.
Monahan, LG, DeMaere, MZ, Cummins, ML, Djordjevic, SP, Roy Chowdhury, P & Darling, AE 2019, 'High contiguity genome sequence of a multidrug-resistant hospital isolate of Enterobacter hormaechei.', Gut Pathogens, vol. 11, no. 1.
Background: Enterobacter hormaechei is an important emerging pathogen and a key member of the highly diverse Enterobacter cloacae complex. E. hormaechei strains can persist and spread in nosocomial environments, and often exhibit resistance to multiple clinically important antibiotics. However, the genomic regions that harbour resistance determinants are typically highly repetitive and impossible to resolve with standard short-read sequencing technologies. Results: Here we used both short- and long-read methods to sequence the genome of a multidrug-resistant hospital isolate (C15117), which we identified as E. hormaechei. Hybrid assembly generated a complete circular chromosome of 4,739,272 bp and a fully resolved plasmid of 339,920 bp containing several antibiotic resistance genes. The strain also harboured a 34,857 bp repeat encoding copper resistance, which was present in both the chromosome and plasmid. Long reads that unambiguously spanned this repeat were required to resolve the chromosome and plasmid into separate replicons. Conclusion: This study provides important insights into the evolution and potential spread of antimicrobial resistance in a nosocomial E. hormaechei strain. More broadly, it further exemplifies the power of long-read sequencing technologies, particularly the Oxford Nanopore platform, for the characterisation of bacteria with complex resistance loci and large repeat elements.
Bogema, DR, McKinnon, J, Liu, M, Hitchick, N, Miller, N, Venturini, C, Iredell, J, Darling, AE, Roy Chowdhury, P & Djordjevic, SP 2019, 'Whole-genome analysis of extraintestinal Escherichia coli sequence type 73 from a single hospital over a 2 year period identified different circulating clonal groups.', Microbial Genomics.
Sequence type (ST)73 has emerged as one of the most frequently isolated extraintestinal pathogenic Escherichia coli. To examine the localized diversity of ST73 clonal groups, including their mobile genetic element profile, we sequenced the genomes of 16 multiple-drug resistant ST73 isolates from patients with urinary tract infection from a single hospital in Sydney, Australia, between 2009 and 2011. Genome sequences were used to generate a SNP-based phylogenetic tree to determine the relationship of these isolates in a global context with ST73 sequences (n=210) from public databases. There was no evidence of a dominant outbreak strain of ST73 in patients from this hospital; rather, we identified at least eight separate groups, several of which recurred, over a 2-year period. The inferred phylogeny of all ST73 strains (n=226) including the ST73 clone D i2 reference genome shows high bootstrap support and clusters into four major groups that correlate with serotype. The Sydney ST73 strains carry a wide variety of virulence-associated genes, but the presence of iss, pic and several iron-acquisition operons was notable.
Kretzschmar, AL, Verma, A, Murray, S, Kahlke, T, Fourment, M & Darling, A 2019, 'Trial by phylogenetics - Evaluating the Multi-Species Coalescent for phylogenetic inference on taxa with high levels of paralogy (Gonyaulacales, Dinophyceae)'.
Publicly available next-generation sequencing datasets of non-model organisms, such as marine protists, offer opportunities to explore their evolutionary relationships. In this study we explored the effects that dataset and model selection have on the phylogenetic inference of the Gonyaulacales, single-celled marine algae of the phylum Dinoflagellata with genomes that show extensive paralogy. We developed a method for identifying and extracting single copy genes from RNA-seq libraries and compared phylogenies inferred from these single copy genes with those inferred from commonly used genetic markers and phylogenetic methods. Comparison of two datasets and three different phylogenetic models showed that exclusive use of ribosomal DNA sequences, maximum likelihood and gene concatenation produced results very different from those obtained with the multi-species coalescent. The multi-species coalescent has recently been recognized as being robust to the inclusion of paralogs, including hidden paralogs present in single copy gene sets (pseudoorthologs). Comparisons of model fit strongly favored the multi-species coalescent over a concatenated alignment (single tree) model for these data. Our findings suggest that the multi-species coalescent (inferred either via Maximum Likelihood or Bayesian Inference) should be considered for future phylogenetic studies of organisms where accurate selection of orthologs is difficult.
Deutscher, AT, Burke, CM, Darling, AE, Riegler, M, Reynolds, OL & Chapman, TA 2018, 'Near full-length 16S rRNA gene next-generation sequencing revealed Asaia as a common midgut bacterium of wild and domesticated Queensland fruit fly larvae.', Microbiome, vol. 6, no. 1.
BACKGROUND: Gut microbiota affects tephritid (Diptera: Tephritidae) fruit fly development, physiology, behavior, and thus the quality of flies mass-reared for the sterile insect technique (SIT), a target-specific, sustainable, environmentally benign form of pest management. The Queensland fruit fly, Bactrocera tryoni (Tephritidae), is a significant horticultural pest in Australia and can be managed with SIT. Little is known about the impacts that laboratory-adaptation (domestication) and mass-rearing have on the tephritid larval gut microbiome. Read lengths of previous fruit fly next-generation sequencing (NGS) studies have limited the resolution of microbiome studies, and the diversity within populations is often overlooked. In this study, we used a new near full-length (> 1300 nt) 16S rRNA gene amplicon NGS approach to characterize gut bacterial communities of individual B. tryoni larvae from two field populations (developing in peaches) and three domesticated populations (mass- or laboratory-reared on artificial diets). RESULTS: Near full-length 16S rRNA gene sequences were obtained for 56 B. tryoni larvae. OTU clustering at 99% similarity revealed that gut bacterial diversity was low and significantly lower in domesticated larvae. Bacteria commonly associated with fruit (Acetobacteraceae, Enterobacteriaceae, and Leuconostocaceae) were detected in wild larvae, but were largely absent from domesticated larvae. However, Asaia, an acetic acid bacterium not frequently detected within adult tephritid species, was detected in larvae of both wild and domesticated populations (55 out of 56 larval gut samples). Larvae from the same single peach shared a similar gut bacterial profile, whereas larvae from different peaches collected from the same tree had different gut bacterial profiles. Clustering of the Asaia near full-length sequences at 100% similarity showed that the wild flies from different locations had different Asaia strains. CONCLUSIONS: Variation in the gut bac...
Darling, AE, Fritz, A, Hofmann, P, Majda, S, Dahms, E, Droge, J, Fiedler, J, Lesker, TR, Belmann, P, DeMaere, MZ, Sczyrba, A, Bremges, A & McHardy, AC 2018, 'CAMISIM: Simulating metagenomes and microbial communities'.
DeMaere, MZ & Darling, AE 2018, 'Sim3C: simulation of Hi-C and Meta3C proximity ligation sequencing technologies.', GigaScience, vol. 7, no. 2, pp. 1-12.
Chromosome conformation capture (3C) and Hi-C DNA sequencing methods have rapidly advanced our understanding of the spatial organization of genomes and metagenomes. Many variants of these protocols have been developed, each with their own strengths. Currently there is no systematic means for simulating sequence data from this family of sequencing protocols, potentially hindering the advancement of algorithms to exploit this new datatype. We describe a computational simulator that, given simple parameters and reference genome sequences, will simulate Hi-C sequencing on those sequences. The simulator models the basic spatial structure in genomes that is commonly observed in Hi-C and 3C datasets, including the distance-decay relationship in proximity ligation, differences in the frequency of interaction within and across chromosomes, and the structure imposed by cells. A means to model the 3D structure of randomly generated topologically associating domains is provided. The simulator considers several sources of error common to 3C and Hi-C library preparation and sequencing methods, including spurious proximity ligation events and sequencing error. We have introduced the first comprehensive simulator for 3C and Hi-C sequencing protocols. We expect the simulator to have use in testing of Hi-C data analysis algorithms, as well as more general value for experimental design, where questions such as the required depth of sequencing, enzyme choice, and other decisions can be made in advance in order to ensure adequate statistical power with respect to experimental hypothesis testing.
O'Donoghue, S, Baldi, B, Clark, S, Darling, A, Hogan, J, Kaur, S, Maier-Hein, L, McCarthy, D, Moore, W, Stenau, E, Swedlow, J, Vuong, J & Procter, J 2018, 'Visualization of Biomedical Data'.
The rapid increase in volume and complexity of biomedical data requires changes in research, communication, training, and clinical practices. This includes learning how to effectively integrate automated analysis with high-data-density visualizations that clearly express complex phenomena. In this review, we summarize key principles and resources from data visualization research that address this difficult challenge. We then survey how visualization is being used in a selection of emerging biomedical research areas, including: 3D genomics, single-cell RNA-seq, the protein structure universe, phosphoproteomics, augmented-reality surgery, and metagenomics. While specific areas need highly tailored visualization tools, there are common visualization challenges that can be addressed with general methods and strategies. Unfortunately, poor visualization practices are also common; however, there are good prospects for improvements and innovations that will revolutionize how we see and think about our data. We outline initiatives aimed at fostering these improvements via better tools, peer-to-peer learning, and interdisciplinary collaboration with computer scientists, science communicators, and graphic designers.
O'Donoghue, SI, Baldi, BF, Clark, SJ, Darling, AE, Hogan, JM, Kaur, S, Maier-Hein, L, McCarthy, DJ, Moore, WJ, Stenau, E, Swedlow, JR, Vuong, J & Procter, JB 2018, 'Visualization of Biomedical Data', Annual Review of Biomedical Data Science, vol. 1, no. 1, pp. 275-304.
Van Deynze, A, Zamora, P, Delaux, P-M, Heitmann, C, Gibson, D, Schwartz, K, Berry, A, Graham, D, Jayaraman, D, Rajasekar, S, Maeda, J, Bhatnagar, S, Jospin, G, Darling, AE, Jeannotte, R, Lopez, J, Weimer, B, Eisen, J, Shapiro, H-Y, Ané, J-M & Bennett, A 2018, 'Nitrogen fixation in a landrace of maize is supported by a mucilage-associated diazotrophic microbiota.', PLoS Biology, vol. 16, no. 8.
Vu, D, Darling, AE & Matsen, FA 2018, 'Online Bayesian Phylogenetic Inference: Theoretical Foundations via Sequential Monte Carlo', Systematic Biology, vol. 67, no. 3, pp. 503-517.
Time-resolved phylogenetic methods use information about the time of sample collection to estimate the rate of evolution. Originally, the models used to estimate evolutionary rates were quite simple, assuming that all lineages evolve at the same rate, an assumption commonly known as the molecular clock. Richer and more complex models have since been introduced to capture the phenomenon of substitution rate variation among lineages. Two well-known model extensions are the local clock, wherein all lineages in a clade share a common substitution rate, and the uncorrelated relaxed clock, wherein the substitution rate on each lineage is independent of other lineages while being constrained to fit some parametric distribution. We introduce a further model extension, called the flexible local clock (FLC), which provides a flexible framework to combine relaxed clock models with local clock models. We evaluate the flexible local clock on simulated and real datasets and show that it provides substantially improved fit to an influenza dataset. An implementation of the model is available for download from https://www.github.com/4ment/flc.
Fourment, M, Claywell, BC, Dinh, V, McCoy, C, Matsen IV, FA & Darling, AE 2018, 'Effective Online Bayesian Phylogenetics via Sequential Monte Carlo with Guided Proposals.', Systematic Biology, vol. 67, no. 3, pp. 490-502.
Modern infectious disease outbreak surveillance produces continuous streams of sequence data which require phylogenetic analysis as data arrives. Current software packages for Bayesian phylogenetic inference are unable to quickly incorporate new sequences as they become available, making them less useful for dynamically unfolding evolutionary stories. This limitation can be addressed by applying a class of Bayesian statistical inference algorithms called sequential Monte Carlo (SMC) to conduct online inference, wherein new data can be continuously incorporated to update the estimate of the posterior probability distribution. In this article, we describe and evaluate several different online phylogenetic sequential Monte Carlo (OPSMC) algorithms. We show that proposing new phylogenies with a density similar to the Bayesian prior suffers from poor performance, and we develop "guided" proposals that better match the proposal density to the posterior. Furthermore, we show that the simplest guided proposals can exhibit pathological behavior in some situations, leading to poor results, and that the situation can be resolved by heating the proposal density. The results demonstrate that relative to the widely used MCMC-based algorithm implemented in MrBayes, the total time required to compute a series of phylogenetic posteriors as sequences arrive can be significantly reduced by the use of OPSMC, without incurring a significant loss in accuracy.
Bogema, DR, Micallef, ML, Liu, M, Padula, MP, Djordjevic, SP, Darling, AE & Jenkins, C 2018, 'Analysis of Theileria orientalis draft genome sequences reveals potential species-level divergence of the Ikeda, Chitose and Buffeli genotypes.', BMC Genomics, vol. 19, no. 1, pp. 298-298.
BACKGROUND: Theileria orientalis (Apicomplexa: Piroplasmida) has caused clinical disease in cattle of Eastern Asia for many years and its recent rapid spread throughout Australian and New Zealand herds has caused substantial economic losses to production through cattle deaths, late-term abortion and morbidity. Disease outbreaks have been linked to the detection of a pathogenic genotype of T. orientalis, genotype Ikeda, which is also responsible for disease outbreaks in Asia. Here, we sequenced and compared the draft genomes of one pathogenic (Ikeda) and two apathogenic (Chitose, Buffeli) isolates of T. orientalis sourced from Australian herds. RESULTS: Using de novo assembled sequences and a single nucleotide variant (SNV) analysis pipeline, we found extensive genetic divergence between the T. orientalis genotypes. A genome-wide phylogeny reconstructed to address continued confusion over nomenclature of this species displayed concordance with prior phylogenetic studies based on the major piroplasm surface protein (MPSP) gene. However, average nucleotide identity (ANI) values revealed that the divergence between isolates is comparable to that observed between other theilerias which represent distinct species. Analysis of SNVs revealed putative recombination between the Chitose and Buffeli genotypes and also between Australian and Japanese Ikeda isolates. Finally, to inform future vaccine studies, dN/dS ratios and surface location predictions were analysed. Six predicted surface protein targets were confirmed to be expressed during the piroplasm phase of the parasite by mass spectrometry. CONCLUSIONS: We used whole genome sequencing to demonstrate that the T. orientalis Ikeda, Chitose and Buffeli variants show substantial genetic divergence. Our data indicate that future researchers could potentially consider disease-associated Ikeda and closely related genotypes as a separate species from non-pathogenic Chitose and Buffeli.
Darling, AE & DeMaere, MZ 2017, 'Critical Assessment of Metagenome Interpretation − a benchmark of metagenomics software', Nature Methods, pp. 1063-1073.
In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were generated from newly sequenced ~700 microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones and representing common experimental setups. Across all datasets, assembly and genome binning programs performed well for species represented by individual genomes, while performance was substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below the family level. Parameter settings substantially impacted performances, underscoring the importance of program reproducibility. While highlighting current challenges in computational metagenomics, the CAMI results provide a roadmap for software selection to answer specific research questions.
Background: Chromosome conformation capture (3C) and HiC DNA sequencing methods have rapidly advanced our understanding of the spatial organization of genomes and metagenomes. Many variants of these protocols have been developed, each with their own strengths. Currently there is no systematic means for simulating sequence data from this family of sequencing protocols. Findings: We describe a computational simulator that, given reference genome sequences and some basic parameters, will simulate HiC sequencing on those sequences. The simulator models the basic spatial structure in genomes that is commonly observed in HiC and 3C datasets, including the distance-decay relationship in proximity ligation, differences in the frequency of interaction within and across chromosomes, and the structure imposed by cells. A means to model the 3D structure of topologically associating domains (TADs) is provided. The simulator also models several sources of error common to 3C and HiC library preparation and sequencing methods, including spurious proximity ligation events and sequencing error. Conclusions: We have introduced the first comprehensive simulator for 3C and HiC sequencing protocols. We expect the simulator to have use in testing of HiC data analysis algorithms, as well as more general value for experimental design, where questions such as the required depth of sequencing, enzyme choice, and other decisions must be made in advance in order to ensure adequate statistical power to test the relevant hypotheses.
Quince, C, Delmont, TO, Raguideau, S, Alneberg, J, Darling, AE, Collins, G & Eren, AM 2017, 'DESMAN: a new tool for de novo extraction of strains from metagenomes', Genome Biology, vol. 18.
Wang, K, Chen, Y-Q, Salido, MM, Kohli, GS, Kong, J-L, Liang, H-J, Yao, Z-T, Xie, Y-T, Wu, H-Y, Cai, S-Q, Drautz-Moses, DI, Darling, AE, Schuster, SC, Yang, L & Ding, Y 2017, 'The rapid in vivo evolution of Pseudomonas aeruginosa in ventilator-associated pneumonia patients leads to attenuated virulence.', Open Biology, vol. 7, no. 9, pp. 1-13.
Pseudomonas aeruginosa is an opportunistic pathogen that causes severe airway infections in humans. These infections are usually difficult to treat and associated with high mortality rates. While colonizing the human airways, P. aeruginosa can accumulate genetic mutations that often improve its adaptation to the host environment. Understanding these evolutionary traits may provide important clues for the development of effective therapies to treat P. aeruginosa infections. In this study, 25 P. aeruginosa isolates were longitudinally sampled from the airways of four ventilator-associated pneumonia (VAP) patients. PacBio and Illumina sequencing were used to analyse the in vivo evolutionary trajectories of these isolates. Our analysis showed that positive selection predominantly shaped P. aeruginosa genomes during VAP infections and led to three convergent evolution events, including loss-of-function mutations of lasR and mpl, and a pyoverdine-deficient phenotype. Specifically, lasR encodes one of the major transcriptional regulators in quorum sensing, whereas mpl encodes an enzyme responsible for recycling cell wall peptidoglycan. We also found that P. aeruginosa isolated at late stages of VAP infections produce less elastase and are less virulent in vivo than their earlier isolated counterparts, suggesting that the short-term in vivo evolution of P. aeruginosa leads to attenuated virulence.
Fourment, Darling, AE & Holmes, EC 2017, 'The Impact of Migratory Flyways on the Spread of Avian Influenza Virus in North America', BMC Evolutionary Biology, vol. 17, no. 1.
Wild birds are the major reservoir hosts for influenza A viruses (AIVs) and have been implicated in the emergence of pandemic events in livestock and human populations. Understanding how AIVs spread within and across continents is therefore critical to the development of successful strategies to manage and reduce the impact of influenza outbreaks. In North America many bird species undergo seasonal migratory movements along a North-South axis, thereby fostering opportunities for viruses to spread over long distances. However, the role played by such avian flyways in shaping the genetic structure of AIV populations has proven controversial. To assess the relative contribution of bird migration along flyways to the genetic structure of AIV we performed a large-scale phylogeographic study of viruses sampled in the USA and Canada, involving the analysis of 3805 to 4505 sequences from 36 to 38 geographic localities depending on the gene data set. To assist this we developed a maximum likelihood-based genetic algorithm to explore a wide range of complex spatial models, thereby depicting a more complete picture of the migration network than previous studies. Based on phylogenies estimated from nucleotide data sets, our results show that AIV migration rates within flyways are significantly higher than those between flyways, indicating that the migratory patterns of birds play a key role in pathogen dispersal. These findings provide valuable insights into the evolution, maintenance and transmission of AIVs, in turn allowing the development of improved programs for surveillance and risk assessment.
Reid, C, Wyrsch, E, Chowdhury, PR, Zingali, T, Liu, M, Darling, A, Chapman, T & Djordjevic, S 2017, 'Porcine commensal Escherichia coli: A reservoir for class 1 integrons associated with IS26'.
Porcine faecal waste is a serious environmental pollutant. Carriage of antimicrobial resistance and virulence-associated genes (VAGs) and the zoonotic potential of commensal Escherichia coli from swine are largely unknown. Furthermore, little is known about the role of commensal E. coli as contributors to the mobilisation of antimicrobial resistance genes between food animals and the environment. Here, we report whole genome sequence analysis of 141 E. coli from the faeces of healthy pigs. Most strains belonged to phylogroups A and B1 and carried i) a class 1 integron; ii) VAGs linked with extraintestinal infection in humans; iii) antimicrobial resistance genes blaTEM, aphA1, cmlA, strAB, tet(A) A, dfrA12, dfrA5, sul1, sul2, sul3; iv) IS26; and v) heavy metal resistance genes (merA, cusA, terA). Carriage of the sulphonamide resistance gene sul3 was notable in this study. The 141 strains belonged to 42 multilocus sequence types, but clonal complex 10 featured prominently. Structurally diverse class 1 integrons that were frequently associated with IS26 carried unique genetic features that were also identified in extraintestinal pathogenic E. coli (ExPEC) from humans. This study provides the first detailed genomic analysis and point of reference for commensal E. coli of porcine origin, facilitating tracking of specific lineages and the mobile resistance genes they carry.
Reid, CJ, Wyrsch, ER, Roy Chowdhury, P, Zingali, T, Liu, M, Darling, AE, Chapman, TA & Djordjevic, SP 2017, 'Porcine commensal Escherichia coli: a reservoir for class 1 integrons associated with IS26', Microbial Genomics, vol. 3, no. 12, pp. 1-13.
Porcine faecal waste is a serious environmental pollutant. Carriage of antimicrobial-resistance genes (ARGs) and virulence-associated genes (VAGs), and the zoonotic potential of commensal Escherichia coli from swine are largely unknown. Furthermore, little is known about the role of commensal E. coli as contributors to the mobilization of ARGs between food animals and the environment. Here, we report whole-genome sequence analysis of 103 class 1 integron-positive E. coli from the faeces of healthy pigs from two commercial production facilities in New South Wales, Australia. Most strains belonged to phylogroups A and B1, and carried VAGs linked with extraintestinal infection in humans. The 103 strains belonged to 37 multilocus sequence types and clonal complex 10 featured prominently. Seventeen ARGs were detected and 97% (100/103) of strains carried three or more ARGs. Heavy-metal-resistance genes merA, cusA and terA were also common. IS26 was observed in 98% (101/103) of strains and was often physically associated with structurally diverse class 1 integrons that carried unique genetic features, which may be tracked. This study provides, to our knowledge, the first detailed genomic analysis and point of reference for commensal E. coli of porcine origin in Australia, facilitating tracking of specific lineages and the mobile resistance genes they carry.
Darling, AE, Liu, M, Worden, P, Monahan, L, DeMaere, M, Burke, C, Djordjevic, S & Charles, I 2017, 'Evaluation of ddRADseq for reduced representation metagenome sequencing', PeerJ, vol. 5.
'Who is doing what' is the ultimate open question in microbiome study. Shotgun metagenomics is often applied to gain knowledge of functional roles for bacteria in microbial communities, where the data can be used to predict protein encoding genes and enzymatic pathways present in the community, sometimes leading to testable hypotheses for microbial function. We describe a method and basic analysis for a metagenomic adaptation of the double digest restriction site associated DNA sequencing (ddRADseq) protocol for reduced representation metagenome profiling. This technique takes advantage of the sequence specificity of restriction endonucleases to construct an Illumina-compatible sequencing library containing DNA fragments that are between a pair of restriction sites located within close proximity. This results in a reduced sequencing library with coverage breadth that can be tuned by size selection.
We assessed the performance of the metagenomic ddRADseq approach by applying the method to human stool samples and generating sequence data. We evaluated the extent to which ddRADseq data provides an unbiased reduced representation for microbiome profiling.
Although ddRADseq does introduce some bias in taxonomic representation, the bias is likely to be small relative to DNA extraction bias. ddRADseq appears feasible and could have value as a tool for metagenome-wide association studies.
Gardiner, M, Vicaretti, M, Sparks, J, Bansal, S, Bush, S, Liu, M, Darling, A, Harry, E & Burke, CM 2017, 'A longitudinal study of the diabetic skin and wound microbiome', PeerJ, vol. 5, pp. e3543-e3543.
BACKGROUND: Type II diabetes is a chronic health condition which is associated with skin conditions including chronic foot ulcers and an increased incidence of skin infections. The skin microbiome is thought to play important roles in skin defence and immune functioning. Diabetes affects the skin environment, and this may perturb the skin microbiome with possible implications for skin infections and wound healing. This study examines the skin and wound microbiome in type II diabetes. METHODS: Eight type II diabetic subjects with chronic foot ulcers were followed over a time course of 10 weeks, sampling from both foot skin (swabs) and wounds (swabs and debrided tissue) every two weeks. A control group of eight subjects was also followed over 10 weeks, and skin swabs were collected from the foot skin every two weeks. Samples were processed for DNA and subjected to 16S rRNA gene PCR and sequencing of the V4 region. RESULTS: The diabetic skin microbiome was significantly less diverse than control skin. Community composition was also significantly different between diabetic and control skin; however, the most abundant taxa were similar between groups, with differences driven by very low-abundance members of the skin communities. Chronic wounds tended to be dominated by the most abundant skin Staphylococcus, while other abundant wound taxa differed by patient. No significant correlations were found between wound duration or healing status and the abundance of any particular taxa. DISCUSSION: The major difference observed in this study of the skin microbiome associated with diabetes was a significant reduction in diversity. The long-term effects of reduced diversity are not yet well understood, but are often associated with disease conditions.
Chernomoretz, A, Stolovitzky, G, Labaj, PP, Graf, AB, Darling, A, Burke, C, Noushmehr, H, Moraes, MO, Dias-Neto, E, Guo, Y, Xie, Z, Lee, P, Shi, L, Ruiz-Perez, CA, Mercedes Zambrano, M, Siam, R, Ouf, A, Richard, H, Lafontaine, I, Wieler, LH, Semmler, T, Ahmed, N, Prithiviraj, B, Nedunuri, N, Mehr, S, Banihashemi, K, Lista, F, Anselmo, A, Suzuki, H, Kuroda, M, Yamashita, R, Sato, Y, Kaminuma, E, Alpuche Aranda, CM, Martinez, J, Dada, C, Dybwad, M, Oliveira, M, Schuster, S, Siwo, GH, Jang, S, Seo, SC, Hwang, SH, Ossowski, S, Bezdan, D, Chaker, S, Chatziefthimiou, AD, Udekwu, K, Liungdahl, P, Sezerman, U, Meydan, C, Elhaik, E, Gonnet, G, Schriml, LM, Mongodin, E, Huttenhower, C, Gilbert, J, Mason, CE, Eisen, J, Hirschberg, D & Hernandez, M 2016, 'The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium inaugural meeting report', Microbiome, vol. 4.
Joss, TV, Burke, CM, Hudson, BJ, Darling, AE, Forer, M, Alber, DG, Charles, IG & Stow, NW 2016, 'Bacterial Communities Vary between Sinuses in Chronic Rhinosinusitis Patients', Frontiers in Microbiology, vol. 6, pp. 1-11.
Chronic rhinosinusitis (CRS) is a common and potentially debilitating disease characterized by inflammation of the sinus mucosa for longer than 12 weeks. Bacterial colonization of the sinuses and its role in the pathogenesis of this disease is an ongoing area of research. Recent advances in culture-independent molecular techniques for bacterial identification have the potential to provide a more accurate and complete assessment of the sinus microbiome; however, there is little concordance in results between studies, possibly due to differences in the sampling location and techniques. This study aimed to determine whether the microbial communities from one sinus could be considered representative of all sinuses, and to examine differences between two commonly used methods for sample collection: swabs and tissue biopsies. High-throughput DNA sequencing of the bacterial 16S rRNA gene was applied to both swab and tissue samples from multiple sinuses of 19 patients undergoing surgery for treatment of CRS. Results from swabs and tissue biopsies showed a high degree of similarity, indicating that swabbing is sufficient to recover the microbial community from the sinuses. Microbial communities from different sinuses within individual patients differed to varying degrees, demonstrating that it is possible for distinct microbiomes to exist simultaneously in different sinuses of the same patient. The sequencing results correlated well with culture-based pathogen identification conducted in parallel, although culturing missed many species detected by sequencing. This finding has implications for future research into the sinus microbiome, which should take this heterogeneity into account by sampling patients from more than one sinus.
Coil, DA, Alexiev, A, Wallis, C, O'Flynn, C, Deusch, O, Davis, I, Horsfall, A, Kirkwood, N, Jospin, G, Eisen, JA, Harris, S & Darling, AE 2016, 'Draft genome sequences of 26 Porphyromonas strains isolated from the canine oral microbiome', Genome Announcements, vol. 3, no. 2.
© 2015 Coil et al. We present the draft genome sequences for 26 strains of Porphyromonas (P. canoris, P. gulae, P. cangingivalis, P. macacae, and 7 unidentified) and an unidentified member of the Porphyromonadaceae family. All of these strains were isolated from the canine oral cavity, from dogs with and without early periodontal disease.
DeMaere, MZ & Darling, AE 2016, 'Deconvoluting simulated metagenomes: the performance of hard- and soft-clustering algorithms applied to metagenomic chromosome conformation capture (3C)', PeerJ, vol. 4.
Chowdhury, PR, DeMaere, M, Chapman, T, Worden, P, Charles, IG, Darling, AE & Djordjevic, SP 2016, 'Comparative genomic analysis of toxin-negative strains of Clostridium difficile from humans and animals with symptoms of gastrointestinal disease', BMC Microbiology, vol. 16.
Coil, D, Jospin, G & Darling, AE 2015, 'A5-miseq: an updated pipeline to assemble microbial genomes from Illumina MiSeq data', Bioinformatics, vol. 31, no. 4, pp. 587-589.
© 2015 Liu M and Darling A. We review currently available technologies for deconvoluting metagenomic data into individual genomes that represent populations, strains, or genotypes present in the community. An evaluation of chromosome conformation capture (3C) and related techniques in the context of metagenomics is presented, using mock microbial communities as a reference. We provide the first independent reproduction of the metagenomic 3C technique described last year, propose some simple improvements to that protocol, and compare the quality of the data with that provided by the more complex Hi-C protocol.
O'Flynn, C, Deusch, O, Darling, AE, Eisen, JA, Wallis, C, Davis, IJ & Harris, SJ 2015, 'Comparative Genomics of the Genus Porphyromonas Identifies Adaptations for Heme Synthesis within the Prevalent Canine Oral Species Porphyromonas cangingivalis', Genome Biology and Evolution, vol. 7, no. 12, pp. 3397-3413.
Wyrsch, E, Roy Chowdhury, P, Abraham, S, Santos, J, Darling, AE, Charles, IG, Chapman, TA & Djordjevic, SP 2015, 'Comparative genomic analysis of a multiple antimicrobial resistant enterotoxigenic E. coli O157 lineage from Australian pigs', BMC Genomics, vol. 16, pp. 1-11.
BACKGROUND: Enterotoxigenic Escherichia coli (ETEC) are a major economic threat to pig production globally, with serogroups O8, O9, O45, O101, O138, O139, O141, O149 and O157 implicated as the leading diarrhoeal pathogens affecting pigs below four weeks of age. A multiple antimicrobial resistant ETEC O157 (O157 SvETEC) representative of O157 isolates from a pig farm in New South Wales, Australia that experienced repeated bouts of pre- and post-weaning diarrhoea resulting in multiple fatalities was characterized here. Enterohaemorrhagic E. coli (EHEC) O157:H7 cause both sporadic and widespread outbreaks of foodborne disease, predominantly have a ruminant origin and belong to the ST11 clonal complex. Here, for the first time, we conducted comparative genomic analyses of two epidemiologically-unrelated porcine, disease-causing ETEC O157; E. coli O157 SvETEC and E. coli O157:K88 734/3, and examined their phylogenetic relationship with EHEC O157:H7. RESULTS: O157 SvETEC and O157:K88 734/3 belong to a novel sequence type (ST4245) that comprises part of the ST23 complex and are genetically distinct from EHEC O157. Comparative phylogenetic analysis using PhyloSift shows that E. coli O157 SvETEC and E. coli O157:K88 734/3 group into a single clade and are most similar to the extraintestinal avian pathogenic Escherichia coli (APEC) isolate O78 that clusters within the ST23 complex. Genome content was highly similar between E. coli O157 SvETEC, O157:K88 734/3 and APEC O78, with variability predominantly limited to laterally acquired elements, including prophages, plasmids and antimicrobial resistance gene loci. Putative ETEC virulence factors, including the toxins STb and LT and the K88 (F4) adhesin, were conserved between O157 SvETEC and O157:K88 734/3. The O157 SvETEC isolate also encoded the heat stable enterotoxin STa and a second allele of STb, whilst a prophage within O157:K88 734/3 encoded the serum survival gene bor. Both isolates harbor a large repertoire of antibi...
Becker, EA, Seitzer, PM, Tritt, A, Larsen, D, Krusor, M, Yao, AI, Wu, D, Madern, D, Eisen, JA, Darling, AE & Facciotti, MT 2014, 'Phylogenetically Driven Sequencing of Extremely Halophilic Archaea Reveals Strategies for Static and Dynamic Osmo-response', PLOS Genetics, vol. 10, no. 11.
Beitel, C, Froenicke, L, Lang, JM, Korf, IF, Michelmore, RW, Eisen, JA & Darling, AE 2014, 'Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products', PeerJ, vol. 2.
Metagenomics is a valuable tool for the study of microbial communities but has been limited by the difficulty of binning the resulting sequences into groups corresponding to the individual species and strains that constitute the community. Moreover, there are presently no methods to track the flow of mobile DNA elements such as plasmids through communities or to determine which of these are co-localized within the same cell. We address these limitations by applying Hi-C, a technology originally designed for the study of three-dimensional genome structure in eukaryotes, to measure the cellular co-localization of DNA sequences. We leveraged Hi-C data generated from a simple synthetic metagenome sample to accurately cluster metagenome assembly contigs into groups that contain nearly complete genomes of each species. The Hi-C data also reliably associated plasmids with the chromosomes of their host and with each other. We further demonstrated that Hi-C data provides a long-range signal of strain-specific genotypes, indicating such data may be useful for high-resolution genotyping of microbial populations. Our work demonstrates that Hi-C sequencing data provide valuable information for metagenome analyses that are not currently obtainable by other methods. This metagenomic Hi-C method could facilitate future studies of the fine-scale population structure of microbes, as well as studies of how antibiotic resistance plasmids (or other genetic elements) mobilize in microbial communities. The method is not limited to microbiology; the genetic architecture of other heterogeneous populations of cells could also be studied with this technique.
We describe a method for sequencing full-length 16S rRNA gene amplicons using the high-throughput Illumina MiSeq platform. The resulting sequences have about 100-fold higher accuracy than standard Illumina reads and are chimera-filtered using information from a single-molecule dual-tagging scheme that boosts the signal available for chimera detection. We demonstrate that the data provides fine-scale phylogenetic resolution not available from Illumina amplicon methods targeting smaller variable regions of the 16S rRNA gene.
Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection. In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated to sample metadata. These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454).
Earl, D, Nguyen, N, Hickey, G, Harris, RS, Fitzgerald, S, Beal, K, Seledtsov, I, Molodtsov, V, Raney, BJ, Clawson, H, Kim, J, Kemena, C, Chang, J-M, Erb, I, Poliakov, A, Hou, M, Herrero, J, Kent, WJ, Solovyev, V, Darling, AE, Ma, J, Notredame, C, Brudno, M, Dubchak, I, Haussler, D & Paten, B 2014, 'Alignathon: a competitive assessment of whole-genome alignment methods', Genome Research, vol. 24, no. 12, pp. 2077-2089.
Pineda, SS, Sollod, B, Wilson, D, Darling, AE, Sunagar, K, Undheim, E, Kely, L, Agostinho, A, Fry, B & King, GF 2014, 'Diversification of a single ancestral gene into a successful toxin superfamily in highly venomous Australian funnel-web spiders', BMC Genomics, vol. 15, no. 1, pp. 1-16.
Background: Spiders have evolved pharmacologically complex venoms that serve to rapidly subdue prey and deter predators. The major toxic factors in most spider venoms are small, disulfide-rich peptides. While there is abundant evidence that snake venoms evolved by recruitment of genes encoding normal body proteins followed by extensive gene duplication accompanied by explosive structural and functional diversification, the evolutionary trajectory of spider-venom peptides is less clear. Results: Here we present evidence of a spider-toxin superfamily encoding a high degree of sequence and functional diversity that has evolved via accelerated duplication and diversification of a single ancestral gene. The peptides within this toxin superfamily are translated as prepropeptides that are posttranslationally processed to yield the mature toxin. The N-terminal signal sequence, as well as the protease recognition site at the junction of the propeptide and mature toxin, are conserved, whereas the remainder of the propeptide and mature toxin sequences are variable. All toxin transcripts within this superfamily exhibit a striking cysteine codon bias. We show that different pharmacological classes of toxins within this peptide superfamily evolved under different evolutionary selection pressures. Conclusions: Overall, this study reinforces the hypothesis that spiders use a combinatorial peptide library strategy to evolve a complex cocktail of peptide toxins that target neuronal receptors and ion channels in prey and predators. We show that the ω-hexatoxins that target insect voltage-gated calcium channels evolved under the influence of positive Darwinian selection in an episodic fashion, whereas the κ-hexatoxins that target insect calcium-activated potassium channels appear to be under negative selection. A majority of the diversifying sites in the ω-hexatoxins are concentrated on the molecular surface of the toxins, thereby facilitating neofunctionalisation leading to new toxin...
Lauro, FM, Senstius, SJ, Cullen, J, Neches, R, Jensen, RM, Brown, MV, Darling, AE, Givskov, M, McDougald, D, Hoeke, R, Ostrowski, M, Philip, GK, Paulsen, IT & Grzymski, JJ 2014, 'The common oceanographer: crowdsourcing the collection of oceanographic data', PLoS Biology, vol. 12, no. 9, pp. e1001947-e1001947.
Darling, AE, McKinnon, J, Santos, J, Charles, IG, Roy Chowdhury, P, Djordjevic, S & Worden, P 2014, 'A draft genome of Escherichia coli sequence type 127 strain 2009-46', Gut Pathogens, vol. Sept 1, no. 6, pp. 32-32.
Background: Clostridium difficile is the leading cause of infectious diarrhea in humans and responsible for large outbreaks of enteritis in neonatal pigs in both North America and Europe. Disease caused by C. difficile typically occurs during antibiotic therapy and its emergence over the past 40 years is linked with the widespread use of broad-spectrum antibiotics in both human and veterinary medicine. Results: We sequenced the genome of Clostridium difficile 5.3 using the Illumina Nextera XT and MiSeq technologies. Assembly of the sequence data reconstructed a 4,009,318 bp genome in 27 scaffolds with an N50 of 786 kbp. The genome has extensive similarity to other sequenced C. difficile genomes, but also has several genes that are potentially related to virulence and pathogenicity that are not present in the reference C. difficile strain. Conclusion: Genome sequencing of human and animal isolates is needed to understand the molecular events driving the emergence of C. difficile as a gastrointestinal pathogen of humans and food animals and to better define its zoonotic potential.
Bendiks, ZA, Lang, JM, Darling, AE, Eisen, JA & Coil, DA 2013, 'Draft Genome Sequence of Microbacterium sp. Strain UCD-TDU (Phylum Actinobacteria)', Genome Announcements, vol. 1, no. 2, pp. 1-2.
Here, we present the draft genome sequence of Microbacterium sp. strain UCD-TDU, a member of the phylum Actinobacteria. The assembly contains 3,746,321 bp (in 8 scaffolds). This strain was isolated from a residential toilet as part of an undergraduate student research project to sequence reference genomes of microbes from the built environment.
Coil, DA, Doctor, JI, Lang, JM, Darling, AE & Eisen, JA 2013, 'Draft genome sequence of Kocuria sp. strain UCD-OTCP (phylum Actinobacteria)', Genome Announcements, vol. 1, no. 3.
© 2013 Coil et al. Here, we present the draft genome of Kocuria sp. strain UCD-OTCP, a member of the phylum Actinobacteria, isolated from a restaurant chair cushion. The assembly contains 3,791,485 bp (G+C content of 73%) and is contained in 68 scaffolds.
Diep, AL, Lang, JM, Darling, AE, Eisen, JA & Coil, DA 2013, 'Draft Genome Sequence of Dietzia sp. Strain UCD-THP (Phylum Actinobacteria)', Genome Announcements, vol. 1, no. 3, pp. 198-204.
Here, we present the draft genome sequence of an actinobacterium, Dietzia sp. strain UCD-THP, isolated from a residential toilet handle. The assembly contains 3,915,613 bp. The genome sequences of only two other Dietzia species have been published, those of Dietzia alimentaria and Dietzia cinnamea.
Flanagan, JC, Lang, JM, Darling, AE, Eisen, JA & Coil, DA 2013, 'Draft Genome Sequence of Curtobacterium flaccumfaciens Strain UCD-AKU (Phylum Actinobacteria)', Genome Announcements, vol. 1, no. 3, pp. 1-2.
Here we present the draft genome of an actinobacterium, Curtobacterium flaccumfaciens strain UCD-AKU, isolated from a residential carpet. The genome assembly contains 3,692,614 bp in 130 contigs. This is the first member of the Curtobacterium genus to be sequenced.
Holland-Moritz, HE, Bevans, DR, Lang, JM, Darling, AE, Eisen, JA & Coil, DA 2013, 'Draft Genome Sequence of Leucobacter sp. Strain UCD-THU (Phylum Actinobacteria)', Genome Announcements, vol. 1, no. 3, pp. 1-2.
Here we present the draft genome of Leucobacter sp. strain UCD-THU. The genome contains 3,317,267 bp in 11 scaffolds. This strain was isolated from a residential toilet as part of an undergraduate project to sequence reference genomes of microbes from the built environment.
Lang, J, Darling, AE & Eisen, JA 2013, 'Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees and Supermatrices', PLoS ONE, vol. 8, no. 4, pp. 1-15.
Over 3000 microbial (bacterial and archaeal) genomes have been made publicly available to date, providing an unprecedented opportunity to examine evolutionary genomic trends and offering valuable reference data for a variety of other studies such as me...
Lo, JR, Lang, JM, Darling, AE, Eisen, JA & Coil, DA 2013, 'Draft genome sequence of an Actinobacterium, Brachybacterium muris strain UCD-AY4', Genome Announcements, vol. 1, no. 2, pp. 1-2.
Here we present the draft genome of an actinobacterium, Brachybacterium muris UCD-AY4. The assembly contains 3,257,338 bp and has a GC content of 70%. This strain was isolated from a residential bath towel and has a 16S rRNA gene 99.7% identical to that of the original B. muris strain, C3H-21.
Rands, C, Darling, AE, Fujita, M, Kong, L, Webster, M, Clabaut, C, Emes, R, Heger, A, Meader, S, Hawkins, M, Eisen, M, Teiling, C, Affourtit, J, Boese, B, Grant, P, Grant, BR, Eisen, JA, Abzhanov, A & Ponting, C 2013, 'Insights into the Evolution of Darwin's Finches from Comparative Analysis of the Geospiza magnirostris Genome Sequence', BMC Genomics, vol. 14, pp. 1-15.
Background: A classical example of repeated speciation coupled with ecological diversification is the evolution of 14 closely related species of Darwin's (Galápagos) finches (Thraupidae, Passeriformes). Their adaptive radiation in the Galápagos archipelago took place in the last 2-3 million years and some of the molecular mechanisms that led to their diversification are now being elucidated. Here we report evolutionary analyses of the genome of the large ground finch, Geospiza magnirostris. Results: 13,291 protein-coding genes were predicted from a 991.0 Mb G. magnirostris genome assembly. We then defined gene orthology relationships and constructed whole genome alignments between G. magnirostris and other vertebrate genomes. We estimate that 15% of genomic sequence is functionally constrained between G. magnirostris and zebra finch. Genic evolutionary rate comparisons indicate that similar selective pressures acted along the G. magnirostris and zebra finch lineages, suggesting that historical effective population size values have been similar in both lineages. Twenty-one otherwise highly conserved genes were identified that each show evidence for positive selection on amino acid changes in the Darwin's finch lineage. Two of these genes (Igf2r and Pou1f1) have been implicated in beak morphology changes in Darwin's finches. Five of 47 genes showing evidence of positive selection in early passerine evolution have cilia-related functions, and may be examples of adaptively evolving reproductive proteins. Conclusions: These results provide insights into past evolutionary processes that have shaped G. magnirostris genes and its genome, and provide the necessary foundation upon which to build population genomics resources that will shed light on more contemporaneous adaptive and non-adaptive processes that have contributed to the evolution of the Darwin's finches.
Rinke, C, Schwientek, P, Sczyrba, A, Ivanova, N, Anderson, IJ, Cheng, J, Darling, AE, Malfatti, S, Swan, BK, Gies, EA, Dodsworth, JA, Hedlund, BP, Tsiamis, G, Sievert, SM, Liu, W, Eisen, JA, Hallam, SJ, Kyrpides, NC, Stepanauskas, R, Rubin, E, Hugenholtz, P & Woyke, T 2013, 'Insights into the phylogeny and coding potential of microbial dark matter', Nature, vol. 499, no. 7459, pp. 431-437.
Genome sequencing enhances our understanding of the biological world by providing blueprints for the evolutionary and functional diversity that shapes the biosphere. However, microbial genomes that are currently available are of limited phylogenetic breadth, owing to our historical inability to cultivate most microorganisms in the laboratory. We apply single-cell genomics to target and sequence 201 uncultivated archaeal and bacterial cells from nine diverse habitats belonging to 29 major mostly uncharted branches of the tree of life, so-called 'microbial dark matter'. With this additional genomic information, we are able to resolve many intra- and inter-phylum-level relationships and to propose two new superphyla. We uncover unexpected metabolic features that extend our understanding of biology and challenge established boundaries between the three domains of life. These include a novel amino acid use for the opal stop codon, an archaeal-type purine synthesis in Bacteria and complete sigma factors in Archaea similar to those in Bacteria. The single-cell genomes also served to phylogenetically anchor up to 20% of metagenomic reads in some habitats, facilitating organism-level interpretation of ecosystem function. This study greatly expands the genomic representation of the tree of life and provides a systematic step towards a better understanding of biological evolution on our planet.
Sheppard, SK, Didelot, X, Jolley, KA, Darling, AE, Pascoe, B, Meric, G, Kelly, DJ, Cody, A, Colles, FM, Strachan, NJC, Ogden, ID, Forbes, K, French, NP, Carter, P, Miller, WG, McCarthy, ND, Owen, R, Litrup, E, Egholm, M, Affourtit, JP, Bentley, SD, Parkhill, J, Maiden, MCJ & Falush, D 2013, 'Progressive genome-wide introgression in agricultural Campylobacter coli', Molecular Ecology, vol. 22, no. 4, pp. 1051-1064.
Treangen, T, Koren, S, Sommer, D, Liu, B, Astrovskaya, I, Ondov, B, Darling, AE, Phillippy, A & Pop, M 2013, 'MetAMOS: A Modular And Open Source Metagenomic Assembly And Analysis Pipeline', Genome Biology, vol. 14, no. 1, pp. 1-20.
We describe MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds,
Islam, MA, Labbate, M, Djordjevic, SP, Alam, M, Darling, AE, Melvold, JA, Holmes, AJ, Johura, FT, Cravioto, A, Charles, IG & Stokes, H 2013, 'Indigenous Vibrio cholerae strains from a non-endemic region are pathogenic', Open Biology, vol. 3, p. 120181.
Of the 200+ serogroups of Vibrio cholerae, only O1 or O139 strains are reported to cause cholera, and mostly in endemic regions. Cholera outbreaks elsewhere are considered to be via importation of pathogenic strains. Using established animal models, we show that diverse V. cholerae strains indigenous to a nonendemic environment (Sydney, Australia), including non-O1/O139 serogroup strains, are able to both colonize the intestine and result in fluid accumulation despite lacking virulence factors believed to be important. Most strains lacked the type three secretion system considered a mediator of diarrhoea in non-O1/O139 V. cholerae. Multi-locus sequence typing (MLST) showed that the Sydney isolates did not form a single clade and were distinct from O1/O139 toxigenic strains. There was no correlation between genetic relatedness and the profile of virulence-associated factors. Current analyses of diseases mediated by V. cholerae focus on endemic regions, with only those strains that possess particular virulence factors considered pathogenic. Our data suggest that factors other than those previously well described are of potential importance in influencing disease outbreaks.
Ayres, D, Darling, AE, Zwickl, D, Beerli, P, Holder, M, Lewis, P, Huelsenbeck, J, Ronquist, F, Swofford, D, Cummings, M, Rambaut, A & Suchard, M 2012, 'BEAGLE: An Application Programming Interface And High-performance Computing Library For Statistical Phylogenetics', Systematic Biology, vol. 61, no. 1, pp. 170-173.
Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood es
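BEAGLE's core task is evaluating phylogenetic likelihoods. As a minimal sketch of that computation (Felsenstein's pruning algorithm under the Jukes-Cantor model), here it is in plain Python rather than through BEAGLE's actual API; the nested-tuple tree format and all function names are hypothetical:

```python
import math

BASES = "ACGT"


def jc69_prob(same: bool, t: float) -> float:
    """Jukes-Cantor transition probability over a branch of length t
    (expected substitutions per site)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def conditional_likelihoods(node, seqs, site):
    """Return L[x] = P(data below node at this site | node has base x).

    A leaf is a sequence name (str); an internal node is a tuple of
    (child, branch_length) pairs. This tree format is an assumption.
    """
    if isinstance(node, str):
        observed = seqs[node][site]
        return [1.0 if b == observed else 0.0 for b in BASES]
    result = [1.0] * len(BASES)
    for child, t in node:
        child_l = conditional_likelihoods(child, seqs, site)
        for i, x in enumerate(BASES):
            # "Pruning": sum over the child's unobserved state y.
            result[i] *= sum(
                jc69_prob(x == y, t) * child_l[j] for j, y in enumerate(BASES)
            )
    return result


def log_likelihood(tree, seqs):
    """Sum of per-site log-likelihoods, assuming independent sites and
    uniform base frequencies at the root."""
    n_sites = len(next(iter(seqs.values())))
    total = 0.0
    for site in range(n_sites):
        root = conditional_likelihoods(tree, seqs, site)
        total += math.log(sum(0.25 * v for v in root))
    return total


inner = (("A", 0.1), ("B", 0.2))
tree = ((inner, 0.05), ("C", 0.3))
seqs = {"A": "ACGT", "B": "ACGA", "C": "TCGT"}
print(log_likelihood(tree, seqs))
```

Libraries such as BEAGLE exist because this per-site, per-node computation dominates the run time of likelihood-based and Bayesian tree search.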
Cadillo-Quiroz, H, Didelot, X, Held, N, Herrera, A, Darling, AE, Reno, M, Krause, D & Whitaker, R 2012, 'Patterns Of Gene Flow Define Species Of Thermophilic Archaea', PLoS Biology, vol. 10, no. 2, pp. 1-11.
Despite a growing appreciation of their vast diversity in nature, mechanisms of speciation are poorly understood in Bacteria and Archaea. Here we use high-throughput genome sequencing to identify ongoing speciation in the thermoacidophilic Archaeon Sulfo
Didelot, X, Meric, G, Falush, D & Darling, AE 2012, 'Impact Of Homologous And Non-homologous Recombination In The Genomic Evolution Of Escherichia coli', BMC Genomics, vol. 13, pp. 1-15.
Background: Escherichia coli is an important species of bacteria that can live as a harmless inhabitant of the guts of many animals, as a pathogen causing life-threatening conditions or freely in the non-host environment. This diversity of lifestyles has
Lynch, E, Langille, M, Darling, AE, Wilbanks, E, Haltiner, C, Shao, K, Starr, M, Teiling, C, Harkins, T, Edwards, R, Eisen, JA & Facciotti, M 2012, 'Sequencing Of Seven Haloarchaeal Genomes Reveals Patterns Of Genomic Flux', PLoS ONE, vol. 7, no. 7, pp. 1-13.
We report the sequencing of seven genomes from two haloarchaeal genera, Haloferax and Haloarcula. Ease of cultivation and the existence of well-developed genetic and biochemical tools for several diverse haloarchaeal species make haloarchaea a model grou
Ronquist, F, Teslenko, M, van der Mark, P, Ayres, D, Darling, AE, Hohna, S, Larget, B, Liu, L, Suchard, M & Huelsenbeck, J 2012, 'MrBayes 3.2: Efficient Bayesian Phylogenetic Inference And Model Choice Across A Large Model Space', Systematic Biology, vol. 61, no. 3, pp. 539-542.
Since its introduction in 2001, MrBayes has grown in popularity as a software package for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) methods. With this note, we announce the release of version 3.2, a major upgrade to the latest
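MrBayes samples a posterior distribution by MCMC. The toy Metropolis-Hastings sampler below is not MrBayes code: it assumes two aligned sequences, the Jukes-Cantor model, and an exponential prior with mean 1 on their distance (all choices made for illustration), and samples the posterior of that single distance:

```python
import math
import random


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of a pairwise alignment with n_diff mismatches out of
    n_sites under Jukes-Cantor, given distance d."""
    if d <= 0:
        return -math.inf
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * decay)  # P(x, x) for one site
    p_diff = 0.25 * (0.25 - 0.25 * decay)  # P(x, y) for one specific y != x
    log_l = (n_sites - n_diff) * math.log(p_same)
    if n_diff:
        if p_diff <= 0.0:
            return -math.inf
        log_l += n_diff * math.log(p_diff)
    return log_l


def sample_distance(n_sites, n_diff, n_iter=20000, step=0.03, seed=1):
    """Random-walk Metropolis-Hastings on d with an Exponential(1) prior."""
    rng = random.Random(seed)
    d = 0.1
    log_post = jc69_log_likelihood(d, n_sites, n_diff) - d  # log prior = -d
    samples = []
    for _ in range(n_iter):
        # Reflecting at zero keeps the proposal symmetric, so the
        # Hastings ratio reduces to the posterior ratio.
        proposal = abs(d + rng.uniform(-step, step))
        proposal_log_post = jc69_log_likelihood(proposal, n_sites, n_diff) - proposal
        if rng.random() < math.exp(min(0.0, proposal_log_post - log_post)):
            d, log_post = proposal, proposal_log_post
        samples.append(d)
    return samples


samples = sample_distance(n_sites=1000, n_diff=100)
kept = samples[2000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
# Closed-form maximum-likelihood JC distance, for comparison.
ml_estimate = -0.75 * math.log(1.0 - 4.0 / 3.0 * 100 / 1000)
print(f"posterior mean {posterior_mean:.4f}, ML estimate {ml_estimate:.4f}")
```

With this much data the prior has little influence, so the posterior mean should land close to the maximum-likelihood distance; real phylogenetic samplers also propose changes to tree topology and model parameters.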
Tritt, A, Eisen, JA, Facciotti, M & Darling, AE 2012, 'An Integrated Pipeline For De Novo Assembly Of Microbial Genomes', PLoS ONE, vol. 7, no. 9, pp. 1-9.
Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtai
High-throughput DNA sequencing technologies have spurred the development of numerous novel methods for genome assembly. With few exceptions, these algorithms are heuristic and require one or more parameters to be manually set by the user. One approach to
Earl, D, Bradnam, K, St John, J, Darling, AE, Lin, D, Fass, J, Hung, O, Buffalo, V, Zerbino, D, Diekhans, M, Nguyen, N, Ariyaratne, P, Sung, W, Ning, Z, Haimel, M, Simpson, J, Fonseca, N, Birol, I, Docking, T, Ho, I, Rokhsar, D & Chikhi, R 2011, 'Assemblathon 1: A Competitive Assessment Of De Novo Short Read Assembly Methods', Genome Research, vol. 21, no. 12, pp. 2224-2241.
Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively ass
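Assembly comparisons commonly report contiguity statistics such as N50, and NG50, which measures against an expected genome size instead of the assembly's own size. A minimal sketch of both, with a hypothetical function name:

```python
def nx50(contig_lengths, target_total=None):
    """Length of the contig at which sorted lengths first cover half the target.

    With target_total=None the target is the assembly size (N50); passing an
    expected genome size gives NG50. Returns 0 if half is never reached.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths) if target_total is None else target_total
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # integer comparison avoids halving
            return length
    return 0


contigs = [100, 80, 60, 40, 20]
print(nx50(contigs), nx50(contigs, target_total=500))
```

NG50 is useful when assemblies differ in total size, because N50 can be inflated simply by discarding short contigs.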
Darling, AE, Mau, B & Perna, N 2010, 'progressiveMauve: Multiple Genome Alignment With Gene Gain, Loss And Rearrangement', PLoS ONE, vol. 5, no. 6, pp. 1-17.
Background: Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. Methodology/Princip
Didelot, X, Lawson, D, Darling, A & Falush, D 2010, 'Inference of Homologous Recombination in Bacteria Using Whole-Genome Sequences', Genetics, vol. 186, no. 4, pp. 1435-U567.
Background: Microbial life dominates the earth, but many species are difficult or even impossible to study under laboratory conditions. Sequencing DNA directly from the environment, a technique commonly referred to as metagenomics, is an important tool f
Srivastava, M, Simakov, O, Chapman, J, Fahey, B, Gauthier, MEA, Mitros, T, Richards, GS, Conaco, C, Dacre, M, Hellsten, U, Larroux, C, Putnam, NH, Stanke, M, Adamska, M, Darling, A, Degnan, SM, Oakley, TH, Plachetzki, DC, Zhai, Y, Adamski, M, Calcino, A, Cummins, SF, Goodstein, DM, Harris, C, Jackson, DJ, Leys, SP, Shu, S, Woodcroft, BJ, Vervoort, M, Kosik, KS, Manning, G, Degnan, BM & Rokhsar, DS 2010, 'The Amphimedon queenslandica genome and the evolution of animal complexity', Nature, vol. 466, no. 7307, pp. 720-U3.
Chan, C, Beiko, R, Darling, AE & Ragan, M 2009, 'Lateral Transfer Of Genes And Gene Fragments In Prokaryotes', Genome Biology and Evolution, vol. 1, pp. 429-438.
Lateral genetic transfer (LGT) involves the movement of genetic material from one lineage into another and its subsequent incorporation into the new host genome via genetic recombination. Studies in individual taxa have indicated lateral origins for stre
BACKGROUND: In prokaryotes and some eukaryotes, genetic material can be transferred laterally among unrelated lineages and recombined into new host genomes, providing metabolic and physiological novelty. Although the process is usually framed in terms of gene sharing (e.g. lateral gene transfer, LGT), there is little reason to imagine that the units of transfer and recombination correspond to entire, intact genes. Proteins often consist of one or more spatially compact structural regions (domains) which may fold autonomously and which, singly or in combination, confer the protein's specific functions. As LGT is frequent in strongly selective environments and natural selection is based on function, we hypothesized that domains might also serve as modules of genetic transfer, i.e. that regions of DNA that are transferred and recombined between lineages might encode intact structural domains of proteins. METHODOLOGY/PRINCIPAL FINDINGS: We selected 1,462 orthologous gene sets representing 144 prokaryotic genomes, and applied a rigorous two-stage approach to identify recombination breakpoints within these sequences. Recombination breakpoints are very significantly over-represented in gene sets within which protein domain-encoding regions have been annotated. Within these gene sets, breakpoints significantly avoid the domain-encoding regions (domons), except where these regions constitute most of the sequence length. Recombination breakpoints that fall within longer domons are distributed uniformly at random, but those that fall within shorter domons may show a slight tendency to avoid the domon midpoint. As we find no evidence for differential selection against nucleotide substitutions following the recombination event, any bias against disruption of domains must be a consequence of the recombination event per se. CONCLUSIONS/SIGNIFICANCE: This is the first systematic study relating the units of LGT to structural features at the protein level. Many genes have been int...
Acquisition and loss of genetic material are essential forces in bacterial microevolution. They have been repeatedly linked with adaptation of lineages to new lifestyles, and in particular, pathogenicity. Comparative genomics has the potential to elucida
Genome evolution underpins all of biology, yet its principles can be difficult to communicate to the non-specialist. To facilitate broader understanding of genome evolution, we have designed an interactive 3D environment that enables visualization of div
Kropinski, AM, Borodovsky, M, Carver, TJ, Cerdeño-Tárraga, AM, Darling, A, Lomsadze, A, Mahadevan, P, Stothard, P, Seto, D, Van Domselaar, G & Wishart, DS 2009, 'In silico identification of genes in bacteriophage DNA.', Methods in molecular biology (Clifton, N.J.), vol. 502, pp. 57-89.
One of the most satisfying aspects of a genome sequencing project is the identification of the genes contained within it. These are of two types: those which encode tRNAs and those which produce proteins. After a general introduction on the properties of protein-encoding genes and the utility of the Basic Local Alignment Search Tool (BLASTX) to identify genes through homologs, a variety of tools are discussed by their creators. These include, for genome annotation, GeneMark, Artemis, and BASys; and, for genome comparisons, Artemis Comparison Tool (ACT), Mauve, CoreGenes, and GeneOrder.
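The tools named above rely on statistical gene models and homology. As a much simpler illustration of the starting point for protein-coding gene prediction, here is a naive six-frame open reading frame (ORF) scan. It is not how GeneMark or the other tools work; the minimum length and the restriction to ATG starts (bacterial and phage genes also use GTG and TTG) are assumptions:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Return ORFs as (start, end, strand) in 0-based, half-open
    forward-strand coordinates that include the stop codon.

    Each ORF starts at the first ATG after the previous stop in its frame,
    so nested ATGs are not reported, and ORFs without a stop codon before
    the end of the sequence are ignored.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    # Count codons from the ATG up to (not including) the stop.
                    if (i - start) // 3 >= min_codons:
                        end = i + 3
                        if strand == "+":
                            orfs.append((start, end, strand))
                        else:  # map reverse-strand coordinates back
                            orfs.append((n - end, n - start, strand))
                    start = None
    return orfs


print(find_orfs("ATGAAATAG", min_codons=2))
```

Real gene finders add codon-usage models, ribosome binding site signals and overlap resolution on top of this kind of scan.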
Miklos, I & Darling, AE 2009, 'Efficient Sampling Of Parsimonious Inversion Histories With Application To Genome Rearrangement In Yersinia', Genome Biology and Evolution, vol. 1, pp. 153-164.
Inversions are among the most common mutations acting on the order and orientation of genes in a genome, and polynomial-time algorithms exist to obtain a minimal length series of inversions that transform one genome arrangement to another. However, the m
Rissman, A, Mau, B, Biehl, B, Darling, AE, Glasner, J & Perna, N 2009, 'Reordering Contigs Of Draft Genomes Using The Mauve Aligner', Bioinformatics, vol. 25, no. 16, pp. 2071-2073.
Mauve Contig Mover provides a new method for proposing the relative order of contigs that make up a draft genome based on comparison to a complete or draft reference genome. A novel application of the Mauve aligner and viewer provides an automated reorde
Timmins, M, Thomas-Hall, S, Darling, AE, Zhang, E, Hankamer, B, Marx, U & Schenk, P 2009, 'Phylogenetic And Molecular Analysis Of Hydrogen-producing Green Algae', Journal Of Experimental Botany, vol. 60, no. 6, pp. 1691-1702.
A select set of microalgae are reported to be able to catalyse photobiological H(2) production from water. Based on the model organism Chlamydomonas reinhardtii, a method was developed for the screening of naturally occurring H(2)-producing microalgae. B
Treangen, T, Darling, AE, Achaz, G, Ragan, M, Messeguer, X & Rocha, E 2009, 'A Novel Heuristic For Local Multiple Alignment Of Interspersed DNA Repeats', IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 2, pp. 180-189.
Pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides between related species. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered fro
Genome structure variation has profound impacts on phenotype in organisms ranging from microbes to humans, yet little is known about how natural selection acts on genome arrangement. Pathogenic bacteria such as Yersinia pestis, which causes bubonic and p
Friedberg, R, Darling, AE & Yancopoulos, S 2008, 'Genome rearrangement by the double cut and join operation.', Methods in molecular biology (Clifton, N.J.), vol. 452, pp. 385-416.
The Double Cut and Join is an operation acting locally at four chromosomal positions without regard to chromosomal context. This chapter discusses its application and the resulting menu of operations for genomes consisting of arbitrary numbers of circular chromosomes, as well as for a general mix of linear and circular chromosomes. In the general case the menu includes: inversion, translocation, transposition, formation and absorption of circular intermediates, conversion between linear and circular chromosomes, block interchange, fission, and fusion. This chapter discusses the well-known edge graph and its dual, the adjacency graph, recently introduced by Bergeron et al. Step-by-step procedures are given for constructing and manipulating these graphs. Simple algorithms are given in the adjacency graph for computing the minimal DCJ distance between two genomes and finding a minimal sorting; and use of an online tool (Mauve) to generate synteny blocks and apply DCJ is described.
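In the adjacency graph, the minimal DCJ distance is commonly computed as d = N - (C + I/2), where N is the number of genes, C the number of cycles and I the number of odd paths (paths with an odd number of edges). A sketch under the assumption that both genomes contain the same genes, each exactly once; the chromosome encoding (a list of signed gene numbers plus a circular flag) and the function names are hypothetical:

```python
def adjacency_vertices(genome):
    """Map each gene extremity, e.g. (3, 'h'), to the adjacency or telomere
    vertex containing it. Returns (vertex_of, vertex_sizes)."""
    vertex_of, sizes = {}, []
    for genes, circular in genome:
        ext = []
        for g in genes:
            tail, head = (abs(g), "t"), (abs(g), "h")
            ext.extend([tail, head] if g > 0 else [head, tail])
        # Neighbouring extremities of consecutive genes form adjacencies.
        groups = [(ext[i], ext[i + 1]) for i in range(1, len(ext) - 1, 2)]
        if circular:
            groups.append((ext[-1], ext[0]))
        else:
            groups = [(ext[0],)] + groups + [(ext[-1],)]  # two telomeres
        for group in groups:
            for e in group:
                vertex_of[e] = len(sizes)
            sizes.append(len(group))
    return vertex_of, sizes


def dcj_distance(genome_a, genome_b):
    va, sa = adjacency_vertices(genome_a)
    vb, sb = adjacency_vertices(genome_b)
    if set(va) != set(vb) or sum(sa) != len(va) or sum(sb) != len(vb):
        raise ValueError("genomes must share the same genes, each present once")
    n_genes = len(va) // 2
    offset = len(sa)  # B vertices are numbered after A vertices
    parent = list(range(len(sa) + len(sb)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each extremity is one edge joining its A vertex to its B vertex.
    for e in va:
        ra, rb = find(va[e]), find(offset + vb[e])
        if ra != rb:
            parent[ra] = rb
    edges, telomeric = {}, {}
    for e in va:
        root = find(va[e])
        edges[root] = edges.get(root, 0) + 1
    for v, size in enumerate(sa + sb):
        root = find(v)
        telomeric[root] = telomeric.get(root, False) or size == 1
    cycles = sum(1 for r in edges if not telomeric[r])
    odd_paths = sum(1 for r in edges if telomeric[r] and edges[r] % 2 == 1)
    return n_genes - (cycles + odd_paths // 2)  # odd paths come in pairs


print(dcj_distance([([1, 2, 3], False)], [([1, -2, 3], False)]))
```

Components containing a telomere are paths and all others are cycles, which is why the sketch only needs to track component edge counts and whether a telomere is present.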
Glasner, JD, Plunkett, G, Anderson, BD, Baumler, DJ, Biehl, BS, Burland, V, Cabot, EL, Darling, AE, Mau, B, Neeno-Eckwall, EC, Pot, D, Qiu, Y, Rissman, AI, Worzella, S, Zaremba, S, Fedorko, J, Hampton, T, Liss, P, Rusch, M, Shaker, M, Shaull, L, Shetty, P, Thotakura, S, Whitmore, J, Blattner, FR, Greene, JM & Perna, NT 2008, 'Enteropathogen Resource Integration Center (ERIC): bioinformatics support for research on biodefense-relevant enterobacteria', Nucleic Acids Research, vol. 36, pp. D519-D523.
Darling, AE, Treangen, TJ, Messeguer, X & Perna, NT 2007, 'Analyzing patterns of microbial evolution using the Mauve genome alignment system', Methods in Molecular Biology, vol. 396, pp. 135-152.
During the course of evolution, genomes can undergo large-scale mutation events such as rearrangement and lateral transfer. Such mutations can result in significant variations in gene order and gene content among otherwise closely related organisms. The Mauve genome alignment system can successfully identify such rearrangement and lateral transfer events in comparisons of multiple microbial genomes even under high levels of recombination. This chapter outlines the main features of Mauve and provides examples that describe how to use Mauve to conduct a rigorous multiple genome comparison and study evolutionary patterns. © Humana Press Inc.
Glasner, JD, Rusch, M, Liss, P, Plunkett, G, Cabot, EL, Darling, A, Anderson, BD, Infield-Harm, P, Gilson, MC & Perna, NT 2006, 'ASAP: a resource for annotating, curating, comparing, and disseminating genomic data', Nucleic Acids Research, vol. 34, pp. D41-D45.
ASAP is a comprehensive web-based system for community genome annotation and analysis. ASAP is being used for a large-scale effort to augment and curate annotations for genomes of enterobacterial pathogens and for additional genome sequences. New tools, such as the genome alignment program Mauve, have been incorporated into ASAP in order to improve display and analysis of related genomes. Recent improvements to the database and challenges for future development of the system are discussed. ASAP is available on the web at https://asap.ahabs.wisc.edu/asap/logon.php.
Mau, B, Glasner, JD, Darling, AE & Perna, NT 2006, 'Genome-wide detection and analysis of homologous recombination among sequenced strains of Escherichia coli', Genome Biology, vol. 7, no. 5, p. R44.
Comparisons of complete bacterial genomes reveal evidence of lateral transfer of DNA across otherwise clonally diverging lineages. Some lateral transfer events result in acquisition of novel genomic segments and are easily detected through genome comparison. Other more subtle lateral transfers involve homologous recombination events that result in substitution of alleles within conserved genomic regions. This type of event is observed infrequently among distantly related organisms. It is reported to be more common within species, but the frequency has been difficult to quantify since the sequences under comparison tend to have relatively few polymorphic sites. Here we report a genome-wide assessment of homologous recombination among a collection of six complete Escherichia coli and Shigella flexneri genome sequences. We construct a whole-genome multiple alignment and identify clusters of polymorphic sites that exhibit atypical patterns of nucleotide substitution using a random walk-based method. The analysis reveals one large segment (approximately 100 kb) and 186 smaller clusters of single base pair differences that suggest lateral exchange between lineages. These clusters include portions of 10% of the 3,100 genes conserved in six genomes. Statistical analysis of the functional roles of these genes reveals that several classes of genes are over-represented, including those involved in recombination, transport and motility. We demonstrate that intraspecific recombination in E. coli is much more common than previously appreciated and may show a bias for certain types of genes. The described method provides high-specificity, conservative inference of past recombination events.
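The abstract does not spell out the random walk's scoring. As an illustration of the general idea only, here is a generic reset-at-zero random walk that flags runs of polymorphic sites supporting an alternative grouping of strains; the per-site labels, reward, penalty and threshold are hypothetical, and this is not the published method:

```python
def high_scoring_segments(supports, reward=1.0, penalty=-2.0, threshold=4.0):
    """Find runs of polymorphic sites that score unusually high.

    supports[i] is True if polymorphic site i supports a hypothetical
    alternative grouping of strains. The walk adds `reward` for supporting
    sites and `penalty` otherwise, resets when it falls to zero or below, and
    reports each excursion whose peak reaches `threshold` as
    (first_site, peak_site, peak_score), with inclusive site indices.
    """
    segments = []
    score = best = 0.0
    start, best_end = 0, None
    for i, supported in enumerate(supports):
        score += reward if supported else penalty
        if score > best:
            best, best_end = score, i
        if score <= 0:
            if best >= threshold:
                segments.append((start, best_end, best))
            score = best = 0.0
            start, best_end = i + 1, None
    if best >= threshold:
        segments.append((start, best_end, best))
    return segments


sites = [False, False, True, True, True, True, True, False, False, False, True, False]
print(high_scoring_segments(sites))
```

Because the penalty outweighs the reward, isolated supporting sites are absorbed by the walk, and only dense clusters climb high enough to be reported, which matches the "high-specificity, conservative" flavour described above.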
GRIL is a tool to automatically identify collinear regions in a set of bacterial-size genome sequences. GRIL uses three basic steps. First, regions of high sequence identity are located. Second, some of these regions are filtered based on user-specified
Darling, AE, Mau, B, Blattner, F & Perna, N 2004, 'Mauve: Multiple Alignment Of Conserved Genomic Sequence With Rearrangements', Genome Research, vol. 14, no. 7, pp. 1394-1403.
As genomes evolve, they undergo large-scale evolutionary processes that present a challenge to sequence comparison not posed by short sequences. Recombination causes frequent genome rearrangements, horizontal transfer introduces new sequences into bacter
Glasner, J, Liss, P, Plunkett, G, Darling, AE, Prasad, T, Rusch, M, Byrnes, A, Gilson, M, Biehl, B, Blattner, F & Perna, N 2003, 'ASAP, A Systematic Annotation Package For Community Analysis Of Genomes', Nucleic Acids Research, vol. 31, no. 1, pp. 147-151.
ASAP (a systematic annotation package for community analysis of genomes) is a relational database and web interface developed to store, update and distribute genome sequence data and functional characterization (https://asap.ahabs.wisc.edu/annotation/php
Wei, J, Goldberg, M, Burland, V, Venkatesan, M, Deng, W, Fournier, G, Mayhew, G, Plunkett, G, Rose, D, Darling, AE, Mau, B, Perna, N, Payne, S, Runyen-Janecky, L, Zhou, S, Schwartz, D & Blattner, F 2003, 'Complete Genome Sequence And Comparative Genomics Of Shigella flexneri Serotype 2a Strain 2457T', Infection And Immunity, vol. 71, no. 5, pp. 2775-2786.
We determined the complete genome sequence of Shigella flexneri serotype 2a strain 2457T (4,599,354 bp). Shigella species cause >1 million deaths per year from dysentery and diarrhea and have a lifestyle that is markedly different from those of closely r
Wei, J, Goldberg, MB, Burland, V, Venkatesan, MM, Deng, W, Fournier, G, Mayhew, GF, Plunkett, G, Rose, DJ, Darling, A, Mau, B, Perna, NT, Payne, SM, Runyen-Janecky, LJ, Zhou, S, Schwartz, DC & Blattner, FR 2003, 'Erratum: Complete genome sequence and comparative genomics of Shigella flexneri serotype 2a strain 2457T (Infection and Immunity (2003) 71:5 (2775-2786))', Infection and Immunity, vol. 71, no. 7, p. 4223.
Genomes evolve as modules. In prokaryotes (and some eukaryotes), genetic
material can be transferred between species and integrated into the genome via
homologous or illegitimate recombination. There is little reason to imagine
that the units of transfer correspond to entire genes; however, such units have
not been rigorously characterized. We examined fragmentary genetic transfers in
single-copy gene families from 144 prokaryotic genomes and found that
breakpoints are located significantly closer to the boundaries of genomic
regions that encode annotated structural domains of proteins than expected by
chance, particularly when recombining sequences are more divergent. This
correlation results from recombination events themselves and not from
differential nucleotide substitution. We report the first systematic study
relating genetic recombination to structural features at the protein level.
Multiple genome alignment remains a challenging problem. Effects of
recombination including rearrangement, segmental duplication, gain, and loss
can create a mosaic pattern of homology even among closely related organisms.
We describe a method to align two or more genomes that have undergone
large-scale recombination, particularly genomes that have undergone substantial
amounts of gene gain and loss (gene flux). The method utilizes a novel
alignment objective score, referred to as a sum-of-pairs breakpoint score. We
also apply a probabilistic alignment filtering method to remove erroneous
alignments of unrelated sequences, which are commonly observed in other genome
alignment methods. We describe new metrics for quantifying genome alignment
accuracy which measure the quality of rearrangement breakpoint predictions and
indel predictions. The progressive genome alignment algorithm demonstrates
markedly improved accuracy over previous approaches in situations where genomes
have undergone realistic amounts of genome rearrangement, gene gain, loss, and
duplication. We apply the progressive genome alignment algorithm to a set of 23
completely sequenced genomes from the genera Escherichia, Shigella, and
Salmonella. The 23 enterobacteria have an estimated 2.46 Mbp of genomic content
conserved among all taxa and total unique content of 15.2 Mbp. We document
substantial population-level variability among these organisms driven by
homologous recombination, gene gain, and gene loss. Free, open-source software
implementing the described genome alignment approach is available from
Darling, A & Stoye, J 2013, 'Preface' in Algorithms in Bioinformatics, pp. VI-VI.
Kehr, B, Reinert, K & Darling, AE 2012, 'Hidden breakpoints in genome alignments', Algorithms in Bioinformatics (LNCS), Workshop on Algorithms in Bioinformatics (WABI), Springer, Ljubljana, Slovenia, pp. 391-403.
During the course of evolution, an organism's genome can
undergo changes that affect the large-scale structure of the genome.
These changes include gene gain, loss, duplication, chromosome fusion,
fission, and rearrangement. When gene gain and loss occur in addition
to other types of rearrangement, breakpoints of rearrangement can exist
that are only detectable by comparison of three or more genomes. An
arbitrarily large number of these 'hidden' breakpoints can exist among
genomes that exhibit no rearrangements in pairwise comparisons.
We present an extension of the multichromosomal breakpoint median
problem to genomes that have undergone gene gain and loss. We then
demonstrate that the median distance among three genomes can be used
to calculate a lower bound on the number of hidden breakpoints present.
We provide an implementation of this calculation including the median
distance, along with some practical improvements on the time complexity
of the underlying algorithm.
We apply our approach to measure the abundance of hidden breakpoints
in simulated data sets under a wide range of evolutionary scenarios.
We demonstrate that in simulations the hidden breakpoint counts
depend strongly on relative rates of inversion and gene gain/loss. Finally,
we apply current multiple genome aligners to the simulated genomes,
and show that all aligners introduce a high degree of error in hidden
breakpoint counts, and that this error grows with evolutionary distance
in the simulation. Our results suggest that hidden breakpoint error may
be pervasive in genome alignments.
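The pairwise quantity that can hide breakpoints is easy to compute. A minimal sketch, assuming single linear chromosomes written as signed gene numbers and counting only internal adjacencies (chromosome ends are ignored); the function names are hypothetical, and the sketch does not compute the breakpoint median described above:

```python
def adjacency_set(order):
    """Adjacencies of a signed gene order; (a, b) and (-b, -a) are the same
    adjacency read from opposite strands."""
    adj = set()
    for a, b in zip(order, order[1:]):
        adj.add((a, b))
        adj.add((-b, -a))
    return adj


def pairwise_breakpoints(x, y):
    """Breakpoints between two gene orders after restricting both to the
    genes they share, i.e. after discounting gene gain and loss."""
    shared = {abs(g) for g in x} & {abs(g) for g in y}
    xs = [g for g in x if abs(g) in shared]
    ys = [g for g in y if abs(g) in shared]
    y_adj = adjacency_set(ys)
    return sum(1 for a, b in zip(xs, xs[1:]) if (a, b) not in y_adj)


print(pairwise_breakpoints([1, 2, 3], [1, -2, 3]))
```

For the toy genomes [1, 2], [2, 3] and [3, 1], every pairwise count is 0, yet no single linear order of genes 1, 2 and 3 produces all three by gene loss alone, so at least one breakpoint is visible only when the three genomes are compared together.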
Attie, O, Darling, AE & Yancopoulos, S 2010, 'The Rise And Fall Of Breakpoint Reuse Depending On Genome Resolution', BMC Bioinformatics, 9th Annual Conference on Research in Computational Molecular Biology (RECOMB)/Satellite Workshop on Comparative Genomics, BioMed Central Ltd, Galway, Ireland, pp. 1-15.
Background: During evolution, large-scale genome rearrangements of chromosomes shuffle the order of homologous genome sequences ('synteny blocks') across species. Some years ago, a controversy erupted in genome rearrangement studies over whether rearrang
Treangen, TJ, Darling, AE, Ragan, MA & Messeguer, X 2008, 'Gapped extension for local multiple alignment of interspersed DNA repeats', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 74-86.
The identification of homologous DNA is a fundamental building block of comparative genomic and molecular evolution studies. To date, pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with a previously described efficient filtration method for local multiple alignment. During gapped extension, we use the MUSCLE implementation of progressive multiple alignment with iterative refinement. The resulting gapped extensions potentially contain alignments of unrelated sequence. We detect and remove such undesirable alignments using a hidden Markov model (HMM) to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any strand/species-symmetric nucleotide substitution matrix, and we have developed a method to adapt an arbitrary substitution matrix (e.g. HOXD) to organisms with different G+C content. We evaluate the performance of our method and previous approaches on a hybrid dataset of real genomic DNA with simulated interspersed repeats. Our method outperforms existing methods in terms of sensitivity, positive predictive value, and localizing boundaries of homology. The described methods have been implemented in the free, open-source procrastAligner software, available from: http://alggen.lsi.upc.es/recerca/align/procrastination © 2008 Springer-Verlag Berlin Heidelberg.
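The filtering step above uses an HMM to predict the posterior probability of homology. As a simplified sketch of that general idea (not procrastAligner's model): a two-state HMM (homologous or unrelated) over the columns of a gap-free pairwise alignment, with scaled forward-backward posteriors. The identity-based emission model stands in for a substitution matrix such as HOXD, and the parameter values are hypothetical:

```python
def posterior_homology(seq_a, seq_b, p_identity=0.9, p_stay=0.99):
    """Posterior probability that each column of a gap-free pairwise
    alignment is homologous, from a two-state HMM (homologous/unrelated)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("expected two non-empty sequences of equal length")

    def emit(state, x, y):
        if state == 0:  # homologous: identical bases are likely
            return 0.25 * (p_identity if x == y else (1.0 - p_identity) / 3.0)
        return 0.25 * 0.25  # unrelated: two independent uniform bases

    trans = [[p_stay, 1.0 - p_stay], [1.0 - p_stay, p_stay]]
    states = (0, 1)
    n = len(seq_a)
    # Scaled forward pass: fwd[i] is normalised to sum to 1 at every column.
    fwd, scale = [], []
    for i in range(n):
        if i == 0:
            cur = [0.5 * emit(s, seq_a[0], seq_b[0]) for s in states]
        else:
            cur = [
                emit(s, seq_a[i], seq_b[i])
                * sum(fwd[i - 1][r] * trans[r][s] for r in states)
                for s in states
            ]
        c = sum(cur)
        fwd.append([v / c for v in cur])
        scale.append(c)
    # Backward pass, rescaled with the same per-column constants.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = [
            sum(trans[s][t] * emit(t, seq_a[i + 1], seq_b[i + 1]) * bwd[i + 1][t]
                for t in states) / scale[i + 1]
            for s in states
        ]
    posterior = []
    for i in range(n):
        h, u = fwd[i][0] * bwd[i][0], fwd[i][1] * bwd[i][1]
        posterior.append(h / (h + u))
    return posterior


a = "ACGTACGTACGTACGTACGT" + "A" * 20
b = "ACGTACGTACGTACGTACGT" + "C" * 20
post = posterior_homology(a, b)
print([round(p, 3) for p in post[::5]])
```

Columns with low posterior (here, the mismatched tail) are the ones such a filter would trim from a gapped extension.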
Darling, AE, Treangen, T, Zhang, L, Kuiken, C, Messeguer, X & Perna, N 2006, 'Procrastination Leads To Efficient Filtration For Local Multiple Alignment', Algorithms In Bioinformatics, Proceedings, 6th International Workshop on Algorithms in Bioinformatics (WABI 2006), Springer-Verlag Berlin, Zurich, Switzerland, pp. 126-137.
We describe an efficient local multiple alignment filtration heuristic for identification of conserved regions in one or more DNA sequences. The method incorporates several novel ideas: (1) palindromic spaced seed patterns to match both DNA strands simul
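Palindromic spaced seeds let a single seed lookup cover both DNA strands. A minimal sketch of why, assuming a seed pattern written as a string of '1' (must match) and '0' (don't care) positions and a lexicographically smallest "canonical" key; this is an illustration with hypothetical function names, not procrastAligner's implementation:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def seed_key(window, pattern):
    """Keep only the bases at the pattern's '1' (match) positions."""
    return "".join(base for base, p in zip(window, pattern) if p == "1")


def canonical_seed_key(window, pattern):
    """Strand-independent key for a window.

    For a palindromic pattern, the key of the reverse-complemented window
    equals the reverse complement of the window's key, so taking the smaller
    of the two gives the same key for both strands.
    """
    if pattern != pattern[::-1]:
        raise ValueError("pattern must be palindromic")
    key = seed_key(window, pattern)
    return min(key, revcomp(key))


def index_seeds(seq, pattern):
    """Map each canonical key to the start positions of windows producing it."""
    k = len(pattern)
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(canonical_seed_key(seq[i:i + k], pattern), []).append(i)
    return index


pattern = "1101011"
w = "ACGTTGA"
seq = w + "CC" + revcomp(w)
print(index_seeds(seq, pattern)[canonical_seed_key(w, pattern)])
```

With a non-palindromic pattern the two strands would sample different positions, so matching both strands would need a second pattern or a second lookup.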
Mau, B, Darling, AE & Perna, N 2004, 'Identifying Evolutionarily Conserved Segments Among Multiple Divergent And Rearranged Genomes', Comparative Genomics, RECOMB International Workshop on Comparative Genomics, Springer-Verlag Berlin, Bertinoro, Italy, pp. 72-84.
We describe a new method for reliably identifying conserved segments among genome sequences that have undergone rearrangement, horizontal transfer, and substantial nucleotide-level divergence. A Gibbs-like sampler explores different combinations of seque
Darling, AE, Mau, B, Craven, M & Perna, N 2004, 'Multiple Alignments Of Rearranged Genomes', 2004 IEEE Computational Systems Bioinformatics Conference, Proceedings, IEEE Computational Systems Bioinformatics Conference (CSB 2004), IEEE Computer Society, Stanford, CA, pp. 738-739.
The nature of large-scale evolutionary processes that shape genomes over time fundamentally differs from the forces governing local evolution within individual genes. Large-scale events such as horizontal transfer, genome re-arrangements, gene duplicatio