
Associate Professor Aaron Darling


Aaron Darling is an Associate Professor in Computational Genomics and Bioinformatics in the UTS Faculty of Science's ithree institute. He has over a decade of experience developing computational methods for comparative genomics and evolutionary modeling and in 2013 moved from the University of California-Davis to start a computational genomics group at UTS.

Darling embarked on his research career at the University of Wisconsin-Madison. Following a bachelor's degree in Computer Science, he worked with members of the UW-Madison Genome Center to sequence and analyze the first genomes of pathogenic E. coli. During this time Darling led the development of some widely used computational methods for analysing genomic data, including the mpiBLAST open source parallel BLAST software and the Mauve software for comparing multiple genome sequences.

Following the award of a Ph.D. at UW-Madison, Darling received a fellowship from the US National Science Foundation to pursue postdoctoral studies at The University of Queensland. After two years at UQ he moved to UC Davis to develop a research program in computational metagenomics -- the study of uncultivated microorganisms from the environment using computational methods.

Darling now brings his experience to understand the relationship between humans and microorganisms in collaboration with microbiologists at the ithree institute.


Journal Editor:

  • BMC Bioinformatics

Conference chair:

  • Workshop on Algorithms in Bioinformatics (2013)
  • RECOMB Comparative Genomics (2011)

Professional society memberships:

  • Australian Bioinformatics and Computational Biology Society (ABACBS)
Associate Professor, The ithree Institute
Core Member, ithree - Institute of Infection, Immunity and Innovation
BSc Computer Sciences, PhD Computational Biology
Member, American Society for Microbiology
+61 2 9514 2232

Research Interests

Comparative genomics

Designing and developing scalable computational algorithms to identify the complete set of genetic differences between two or more organisms and relating these differences to aspects of the organism's biology. Associating genomic changes to phenotypic changes. 

Computational metagenomics

The vast majority of life on the planet is microbial, and most of it cannot be studied by laboratory cultivation. Metagenomics involves DNA sequencing of microbes taken directly from the environment. Current metagenomic methods require advanced computational, statistical, and machine learning techniques to identify the organisms present in a sample and characterize their potential for encoding functional proteins.

Genome evolution

Life is thought to have existed on Earth for at least four billion years. During this time, evolution has shaped the genomes of modern organisms. Using statistical methods such as continuous time Markov chain models we can infer the history of genome evolution that led to modern organisms. I am interested in applying methods from statistical mechanics and financial market modeling to develop scalable computational methods to reconstruct evolutionary histories.
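
As a minimal illustration of the continuous time Markov chain idea, the sketch below uses the Jukes-Cantor (JC69) substitution model. JC69 is chosen here only because it is the simplest such model; it is an assumption for illustration, not a description of the specific methods used in the group's software. The function names are hypothetical.

```python
import math

BASES = "ACGT"

def jc69_transition_matrix(d):
    """4x4 JC69 transition probabilities for a branch of length d,
    where d is the expected number of substitutions per site."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability the base is unchanged
    p_diff = 0.25 - 0.25 * decay   # probability of each specific change
    return [[p_same if i == j else p_diff for j in range(4)]
            for i in range(4)]

def pair_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of two aligned, gap-free DNA sequences separated by
    distance d (d must be > 0 if the sequences differ anywhere)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matrix = jc69_transition_matrix(d)
    total = 0.0
    for a, b in zip(seq_a, seq_b):
        i, j = BASES.index(a), BASES.index(b)
        # 0.25 is the stationary frequency of each base under JC69
        total += math.log(0.25 * matrix[i][j])
    return total
```

Evaluating `pair_log_likelihood` over a range of values of `d` and keeping the best one gives a simple maximum-likelihood estimate of the evolutionary distance between two sequences; inferring full evolutionary histories extends the same calculation to trees.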

Next-generation DNA sequencing

DNA is fundamentally a molecule that encodes digital information. New sequencing technology enables us to read this biological information en masse so that it can be analyzed computationally. I am interested in designing sequencing experiments and protocols in ways that maximize the useful information obtained about a biological system.

Can supervise: Yes

I am actively seeking students with a computational, mathematical, or statistical background to undertake Ph.D. studies and research.

Associate Professor Darling supervises research higher degree students.


Darling, A. & Stoye, J. 2013, 'Preface' in Algorithms in Bioinformatics, pp. VI-VI.


Fourment, Darling, A.E. & Holmes, E.C. 2016, 'The Impact of Migratory Flyways on the Spread of Avian Influenza Virus in North America', Annual Meeting of the Society of Molecular Biology and Evolution, Gold Coast.
Fourment, Darling, A.E. & Matsen, E. 2016, 'Phylogenetic inference with streaming data using sequential Monte Carlo', Sydney Bioinformatics Research Symposium, Sydney.
Kehr, B., Reinert, K. & Darling, A.E. 2012, 'Hidden breakpoints in genome alignments', Algorithms in Bioinformatics (LNCS), Workshop on Algorithms in Bioinformatics (WABI), Springer, Ljubljana, Slovenia, pp. 391-403.
During the course of evolution, an organism's genome can undergo changes that affect the large-scale structure of the genome. These changes include gene gain, loss, duplication, chromosome fusion, fission, and rearrangement. When gene gain and loss occurs in addition to other types of rearrangement, breakpoints of rearrangement can exist that are only detectable by comparison of three or more genomes. An arbitrarily large number of these 'hidden breakpoints' can exist among genomes that exhibit no rearrangements in pairwise comparisons. We present an extension of the multichromosomal breakpoint median problem to genomes that have undergone gene gain and loss. We then demonstrate that the median distance among three genomes can be used to calculate a lower bound on the number of hidden breakpoints present. We provide an implementation of this calculation including the median distance, along with some practical improvements on the time complexity of the underlying algorithm. We apply our approach to measure the abundance of hidden breakpoints in simulated data sets under a wide range of evolutionary scenarios. We demonstrate that in simulations the hidden breakpoint counts depend strongly on relative rates of inversion and gene gain/loss. Finally we apply current multiple genome aligners to the simulated genomes, and show that all aligners introduce a high degree of error in hidden breakpoint counts, and that this error grows with evolutionary distance in the simulation. Our results suggest that hidden breakpoint error may be pervasive in genome alignments.
Attie, O., Darling, A.E. & Yancopoulos, S. 2010, 'The Rise And Fall Of Breakpoint Reuse Depending On Genome Resolution', BMC Bioinformatics, 9th Annual Conference on Research in Computational Molecular Biology (RECOMB)/Satellite Workshop on Comparative Genomics, Biomed Central Ltd, Galway, Ireland, pp. 1-15.
Background: During evolution, large-scale genome rearrangements of chromosomes shuffle the order of homologous genome sequences ('synteny blocks') across species. Some years ago, a controversy erupted in genome rearrangement studies over whether rearrang
Treangen, T.J., Darling, A.E., Ragan, M.A. & Messeguer, X. 2008, 'Gapped extension for local multiple alignment of interspersed DNA repeats', Bioinformatics Research and Applications, pp. 74-86.
Darling, A.E., Treangen, T., Zhang, L., Kuiken, C., Messeguer, X. & Perna, N. 2006, 'Procrastination Leads To Efficient Filtration For Local Multiple Alignment', Algorithms in Bioinformatics, Proceedings, 6th International Workshop on Algorithms in Bioinformatics (WABI 2006), Springer-Verlag Berlin, Zurich, Switzerland, pp. 126-137.
We describe an efficient local multiple alignment filtration heuristic for identification of conserved regions in one or more DNA sequences. The method incorporates several novel ideas: (1) palindromic spaced seed patterns to match both DNA strands simul
Mau, B., Darling, A.E. & Perna, N. 2004, 'Identifying Evolutionarily Conserved Segments Among Multiple Divergent And Rearranged Genomes', Comparative Genomics, RECOMB International Workshop on Comparative Genomics, Springer-Verlag Berlin, Bertinoro, Italy, pp. 72-84.
We describe a new method for reliably identifying conserved segments among genome sequences that have undergone rearrangement, horizontal transfer, and substantial nucleotide-level divergence. A Gibbs-like sampler explores different combinations of seque
Darling, A.E., Mau, B., Craven, M. & Perna, N. 2004, 'Multiple Alignments Of Rearranged Genomes', 2004 IEEE Computational Systems Bioinformatics Conference, Proceedings, IEEE Computational Systems Bioinformatics Conference (CSB 2004), IEEE Computer Soc, Stanford, CA, pp. 738-739.
The nature of large-scale evolutionary processes that shape genomes over time fundamentally differs from the forces governing local evolution within individual genes. Large-scale events such as horizontal transfer, genome re-arrangements, gene duplicatio

Journal articles

Gardiner, M., Vicaretti, M., Sparks, J., Bansal, S., Bush, S., Liu, M., Darling, A., Harry, E. & Burke, C.M. 2017, 'A longitudinal study of the diabetic skin and wound microbiome.', PeerJ, vol. 5, pp. e3543-e3543.
BACKGROUND: Type II diabetes is a chronic health condition which is associated with skin conditions including chronic foot ulcers and an increased incidence of skin infections. The skin microbiome is thought to play important roles in skin defence and immune functioning. Diabetes affects the skin environment, and this may perturb skin microbiome with possible implications for skin infections and wound healing. This study examines the skin and wound microbiome in type II diabetes. METHODS: Eight type II diabetic subjects with chronic foot ulcers were followed over a time course of 10 weeks, sampling from both foot skin (swabs) and wounds (swabs and debrided tissue) every two weeks. A control group of eight control subjects was also followed over 10 weeks, and skin swabs collected from the foot skin every two weeks. Samples were processed for DNA and subject to 16S rRNA gene PCR and sequencing of the V4 region. RESULTS: The diabetic skin microbiome was significantly less diverse than control skin. Community composition was also significantly different between diabetic and control skin, however the most abundant taxa were similar between groups, with differences driven by very low abundant members of the skin communities. Chronic wounds tended to be dominated by the most abundant skin Staphylococcus, while other abundant wound taxa differed by patient. No significant correlations were found between wound duration or healing status and the abundance of any particular taxa. DISCUSSION: The major difference observed in this study of the skin microbiome associated with diabetes was a significant reduction in diversity. The long-term effects of reduced diversity are not yet well understood, but are often associated with disease conditions.
Joss, T.V., Burke, C.M., Hudson, B.J., Darling, A.E., Forer, M., Alber, D.G., Charles, I.G. & Stow, N.W. 2016, 'Bacterial Communities Vary between Sinuses in Chronic Rhinosinusitis Patients.', Frontiers in microbiology, vol. 6, pp. 1532-1532.
Chronic rhinosinusitis (CRS) is a common and potentially debilitating disease characterized by inflammation of the sinus mucosa for longer than 12 weeks. Bacterial colonization of the sinuses and its role in the pathogenesis of this disease is an ongoing area of research. Recent advances in culture-independent molecular techniques for bacterial identification have the potential to provide a more accurate and complete assessment of the sinus microbiome, however there is little concordance in results between studies, possibly due to differences in the sampling location and techniques. This study aimed to determine whether the microbial communities from one sinus could be considered representative of all sinuses, and examine differences between two commonly used methods for sample collection, swabs, and tissue biopsies. High-throughput DNA sequencing of the bacterial 16S rRNA gene was applied to both swab and tissue samples from multiple sinuses of 19 patients undergoing surgery for treatment of CRS. Results from swabs and tissue biopsies showed a high degree of similarity, indicating that swabbing is sufficient to recover the microbial community from the sinuses. Microbial communities from different sinuses within individual patients differed to varying degrees, demonstrating that it is possible for distinct microbiomes to exist simultaneously in different sinuses of the same patient. The sequencing results correlated well with culture-based pathogen identification conducted in parallel, although the culturing missed many species detected by sequencing. This finding has implications for future research into the sinus microbiome, which should take this heterogeneity into account by sampling patients from more than one sinus.
Chowdhury, P.R., DeMaere, M., Chapman, T., Worden, P., Charles, I.G., Darling, A.E. & Djordjevic, S.P. 2016, 'Comparative genomic analysis of toxin-negative strains of Clostridium difficile from humans and animals with symptoms of gastrointestinal disease', BMC Microbiology, vol. 16.
Burke, C.M. & Darling, A.E. 2016, 'A method for high precision sequencing of near full-length 16S rRNA genes on an Illumina MiSeq.', PeerJ, vol. 4, p. e2492.
The bacterial 16S rRNA gene has historically been used in defining bacterial taxonomy and phylogeny. However, there are currently no high-throughput methods to sequence full-length 16S rRNA genes present in a sample with precision. We describe a method for sequencing near full-length 16S rRNA gene amplicons using the high throughput Illumina MiSeq platform and test it using DNA from human skin swab samples. Proof of principle of the approach is demonstrated, with the generation of 1,604 sequences greater than 1,300 nt from a single Nano MiSeq run, with accuracy estimated to be 100-fold higher than standard Illumina reads. The reads were chimera filtered using information from a single molecule dual tagging scheme that boosts the signal available for chimera detection. This method could be scaled up to generate many thousands of sequences per MiSeq run and could be applied to other sequencing platforms. This has great potential for populating databases with high quality, near full-length 16S rRNA gene sequences from under-represented taxa and environments and facilitates analyses of microbial communities at higher resolution.
MetaSUB International Consortium 2016, 'The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium inaugural meeting report.', Microbiome, vol. 4, no. 1, p. 24.
The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium is a novel, interdisciplinary initiative comprised of experts across many fields, including genomics, data analysis, engineering, public health, and architecture. The ultimate goal of the MetaSUB Consortium is to improve city utilization and planning through the detection, measurement, and design of metagenomics within urban environments. Although continual measures occur for temperature, air pressure, weather, and human activity, including longitudinal, cross-kingdom ecosystem dynamics can alter and improve the design of cities. The MetaSUB Consortium is aiding these efforts by developing and testing metagenomic methods and standards, including optimized methods for sample collection, DNA/RNA isolation, taxa characterization, and data visualization. The data produced by the consortium can aid city planners, public health officials, and architectural designers. In addition, the study will continue to lead to the discovery of new species, global maps of antimicrobial resistance (AMR) markers, and novel biosynthetic gene clusters (BGCs). Finally, we note that engineered metagenomic ecosystems can help enable more responsive, safer, and quantified cities.
DeMaere, M.Z. & Darling, A.E. 2016, 'Deconvoluting simulated metagenomes: the performance of hard- and soft- clustering algorithms applied to metagenomic chromosome conformation capture (3C).', PeerJ, vol. 4, p. e2676.
BACKGROUND: Chromosome conformation capture, coupled with high throughput DNA sequencing in protocols like Hi-C and 3C-seq, has been proposed as a viable means of generating data to resolve the genomes of microorganisms living in naturally occurring environments. Metagenomic Hi-C and 3C-seq datasets have begun to emerge, but the feasibility of resolving genomes when closely related organisms (strain-level diversity) are present in the sample has not yet been systematically characterised. METHODS: We developed a computational simulation pipeline for metagenomic 3C and Hi-C sequencing to evaluate the accuracy of genomic reconstructions at, above, and below an operationally defined species boundary. We simulated datasets and measured accuracy over a wide range of parameters. Five clustering algorithms were evaluated (2 hard, 3 soft) using an adaptation of the extended B-cubed validation measure. RESULTS: When all genomes in a sample are below 95% sequence identity, all of the tested clustering algorithms performed well. When sequence data contains genomes above 95% identity (our operational definition of strain-level diversity), a naive soft-clustering extension of the Louvain method achieves the highest performance. DISCUSSION: Previously, only hard-clustering algorithms have been applied to metagenomic 3C and Hi-C data, yet none of these perform well when strain-level diversity exists in a metagenomic sample. Our simple extension of the Louvain method performed the best in these scenarios, however, accuracy remained well below the levels observed for samples without strain-level diversity. Strain resolution is also highly dependent on the amount of available 3C sequence data, suggesting that depth of sequencing must be carefully considered during experimental design. Finally, there appears to be great scope to improve the accuracy of strain resolution through further algorithm development.
Coil, D., Jospin, G. & Darling, A.E. 2015, 'A5-miseq: an updated pipeline to assemble microbial genomes from Illumina MiSeq data.', Bioinformatics (Oxford, England), vol. 31, no. 4, pp. 587-589.
MOTIVATION: Open-source bacterial genome assembly remains inaccessible to many biologists because of its complexity. Few software solutions exist that are capable of automating all steps in the process of de novo genome assembly from Illumina data. RESULTS: A5-miseq can produce high-quality microbial genome assemblies on a laptop computer without any parameter tuning. A5-miseq does this by automating the process of adapter trimming, quality filtering, error correction, contig and scaffold generation and detection of misassemblies. Unlike the original A5 pipeline, A5-miseq can use long reads from the Illumina MiSeq, use read pairing information during contig generation and includes several improvements to read trimming. Together, these changes result in substantially improved assemblies that recover a more complete set of reference genes than previous methods. AVAILABILITY: A5-miseq is licensed under the GPL open-source license. Source code and precompiled binaries for Mac OS X 10.6+ and Linux 2.6.15+ are available from http://sourceforge.net/projects/ngopt CONTACT: aaron.darling@uts.edu.au SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Wyrsch, E., Roy Chowdhury, P., Abraham, S., Santos, J., Darling, A.E., Charles, I.G., Chapman, T.A. & Djordjevic, S.P. 2015, 'Comparative genomic analysis of a multiple antimicrobial resistant enterotoxigenic E. coli O157 lineage from Australian pigs.', BMC genomics, vol. 16, pp. 165-165.
BACKGROUND: Enterotoxigenic Escherichia coli (ETEC) are a major economic threat to pig production globally, with serogroups O8, O9, O45, O101, O138, O139, O141, O149 and O157 implicated as the leading diarrhoeal pathogens affecting pigs below four weeks of age. A multiple antimicrobial resistant ETEC O157 (O157 SvETEC) representative of O157 isolates from a pig farm in New South Wales, Australia that experienced repeated bouts of pre- and post-weaning diarrhoea resulting in multiple fatalities was characterized here. Enterohaemorrhagic E. coli (EHEC) O157:H7 cause both sporadic and widespread outbreaks of foodborne disease, predominantly have a ruminant origin and belong to the ST11 clonal complex. Here, for the first time, we conducted comparative genomic analyses of two epidemiologically-unrelated porcine, disease-causing ETEC O157; E. coli O157 SvETEC and E. coli O157:K88 734/3, and examined their phylogenetic relationship with EHEC O157:H7. RESULTS: O157 SvETEC and O157:K88 734/3 belong to a novel sequence type (ST4245) that comprises part of the ST23 complex and are genetically distinct from EHEC O157. Comparative phylogenetic analysis using PhyloSift shows that E. coli O157 SvETEC and E. coli O157:K88 734/3 group into a single clade and are most similar to the extraintestinal avian pathogenic Escherichia coli (APEC) isolate O78 that clusters within the ST23 complex. Genome content was highly similar between E. coli O157 SvETEC, O157:K88 734/3 and APEC O78, with variability predominantly limited to laterally acquired elements, including prophages, plasmids and antimicrobial resistance gene loci. Putative ETEC virulence factors, including the toxins STb and LT and the K88 (F4) adhesin, were conserved between O157 SvETEC and O157:K88 734/3. The O157 SvETEC isolate also encoded the heat stable enterotoxin STa and a second allele of STb, whilst a prophage within O157:K88 734/3 encoded the serum survival gene bor. Both isolates harbor a large repertoire of antibi...
Dunitz, M.I., Lang, J.M., Jospin, G., Darling, A.E., Eisen, J.A. & Coil, D.A. 2015, 'Swabs to genomes: a comprehensive workflow.', PeerJ, vol. 3, p. e960.
The sequencing, assembly, and basic analysis of microbial genomes, once a painstaking and expensive undertaking, has become much easier for research labs with access to standard molecular biology and computational tools. However, there are a confusing variety of options available for DNA library preparation and sequencing, and inexperience with bioinformatics can pose a significant barrier to entry for many who may be interested in microbial genomics. The objective of the present study was to design, test, troubleshoot, and publish a simple, comprehensive workflow from the collection of an environmental sample (a swab) to a published microbial genome; empowering even a lab or classroom with limited resources and bioinformatics experience to perform it.
Coil, D.A., Alexiev, A., Wallis, C., O'Flynn, C., Deusch, O., Davis, I., Horsfall, A., Kirkwood, N., Jospin, G., Eisen, J.A., Harris, S. & Darling, A.E. 2015, 'Draft genome sequences of 26 Porphyromonas strains isolated from the canine oral microbiome.', Genome announcements, vol. 3, no. 2.
We present the draft genome sequences for 26 strains of Porphyromonas (P. canoris, P. gulae, P. cangingivalis, P. macacae, and 7 unidentified) and an unidentified member of the Porphyromonadaceae family. All of these strains were isolated from the canine oral cavity, from dogs with and without early periodontal disease.
O'Flynn, C., Deusch, O., Darling, A.E., Eisen, J.A., Wallis, C., Davis, I.J. & Harris, S.J. 2015, 'Comparative Genomics of the Genus Porphyromonas Identifies Adaptations for Heme Synthesis within the Prevalent Canine Oral Species Porphyromonas cangingivalis.', Genome biology and evolution, vol. 7, no. 12, pp. 3397-3413.
Porphyromonads play an important role in human periodontal disease and recently have been shown to be highly prevalent in canine mouths. Porphyromonas cangingivalis is the most prevalent canine oral bacterial species in both plaque from healthy gingiva and plaque from dogs with early periodontitis. The ability of P. cangingivalis to flourish in the different environmental conditions characterized by these two states suggests a degree of metabolic flexibility. To characterize the genes responsible for this, the genomes of 32 isolates (including 18 newly sequenced and assembled) from 18 Porphyromonad species from dogs, humans, and other mammals were compared. Phylogenetic trees inferred using core genes largely matched previous findings; however, comparative genomic analysis identified several genes and pathways relating to heme synthesis that were present in P. cangingivalis but not in other Porphyromonads. Porphyromonas cangingivalis has a complete protoporphyrin IX synthesis pathway potentially allowing it to synthesize its own heme unlike pathogenic Porphyromonads such as Porphyromonas gingivalis that acquire heme predominantly from blood. Other pathway differences such as the ability to synthesize siroheme and vitamin B12 point to enhanced metabolic flexibility for P. cangingivalis, which may underlie its prevalence in the canine oral cavity.
Liu, M. & Darling, A. 2015, 'Metagenomic Chromosome Conformation Capture (3C): techniques, applications, and challenges.', F1000Research, vol. 4, p. 1377.
We review currently available technologies for deconvoluting metagenomic data into individual genomes that represent populations, strains, or genotypes present in the community. An evaluation of chromosome conformation capture (3C) and related techniques in the context of metagenomics is presented, using mock microbial communities as a reference. We provide the first independent reproduction of the metagenomic 3C technique described last year, propose some simple improvements to that protocol, and compare the quality of the data with that provided by the more complex Hi-C protocol.
Darling, A.E., Worden, P.J., Chapman, T., Roy Chowdhury, P., Charles, I.G. & Djordjevic, S.P. 2014, 'The genome of Clostridium difficile 5.3', Gut Pathogens, vol. 6, no. 4.
Background Clostridium difficile is the leading cause of infectious diarrhea in humans and responsible for large outbreaks of enteritis in neonatal pigs in both North America and Europe. Disease caused by C. difficile typically occurs during antibiotic therapy and its emergence over the past 40 years is linked with the widespread use of broad-spectrum antibiotics in both human and veterinary medicine. Results We sequenced the genome of Clostridium difficile 5.3 using the Illumina Nextera XT and MiSeq technologies. Assembly of the sequence data reconstructed a 4,009,318 bp genome in 27 scaffolds with an N50 of 786 kbp. The genome has extensive similarity to other sequenced C. difficile genomes, but also has several genes that are potentially related to virulence and pathogenicity that are not present in the reference C. difficile strain. Conclusion Genome sequencing of human and animal isolates is needed to understand the molecular events driving the emergence of C. difficile as a gastrointestinal pathogen of humans and food animals and to better define its zoonotic potential.
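The N50 statistic quoted in the abstract above summarizes assembly contiguity: it is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name `n50` and the example lengths are hypothetical and are not taken from the paper or its software.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that brings the running total to half of
        # the assembly size defines the N50.
        if running >= half_total:
            return length

# Toy example: total length 20, so the threshold is 10.
print(n50([2, 3, 4, 5, 6]))
```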
Pineda, S.S., Sollod, B., Wilson, D., Darling, A.E., Sunagar, K., Undheim, E., Kely, L., Agostinho, A., Fry, B. & King, G.F. 2014, 'Diversification of a single ancestral gene into a successful toxin superfamily in highly venomous Australian funnel-web spiders', BMC Genomics, vol. 15, no. 1, pp. 1-16.
Background Spiders have evolved pharmacologically complex venoms that serve to rapidly subdue prey and deter predators. The major toxic factors in most spider venoms are small, disulfide-rich peptides. While there is abundant evidence that snake venoms evolved by recruitment of genes encoding normal body proteins followed by extensive gene duplication accompanied by explosive structural and functional diversification, the evolutionary trajectory of spider-venom peptides is less clear. Results Here we present evidence of a spider-toxin superfamily encoding a high degree of sequence and functional diversity that has evolved via accelerated duplication and diversification of a single ancestral gene. The peptides within this toxin superfamily are translated as prepropeptides that are posttranslationally processed to yield the mature toxin. The N-terminal signal sequence, as well as the protease recognition site at the junction of the propeptide and mature toxin are conserved, whereas the remainder of the propeptide and mature toxin sequences are variable. All toxin transcripts within this superfamily exhibit a striking cysteine codon bias. We show that different pharmacological classes of toxins within this peptide superfamily evolved under different evolutionary selection pressures. Conclusions Overall, this study reinforces the hypothesis that spiders use a combinatorial peptide library strategy to evolve a complex cocktail of peptide toxins that target neuronal receptors and ion channels in prey and predators. We show that the ω-hexatoxins that target insect voltage-gated calcium channels evolved under the influence of positive Darwinian selection in an episodic fashion, whereas the κ-hexatoxins that target insect calcium-activated potassium channels appear to be under negative selection. A majority of the diversifying sites in the ω-hexatoxins are concentrated on the molecular surface of the toxins, thereby facilitating neofunctionalisation leading to new toxin...
Beitel, C., Froenicke, L., Lang, J.M., Korf, I.F., Michelmore, R.W., Eisen, J.A. & Darling, A.E. 2014, 'Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products', PeerJ, vol. 2.
Metagenomics is a valuable tool for the study of microbial communities but has been limited by the difficulty of binning the resulting sequences into groups corresponding to the individual species and strains that constitute the community. Moreover, there are presently no methods to track the flow of mobile DNA elements such as plasmids through communities or to determine which of these are co-localized within the same cell. We address these limitations by applying Hi-C, a technology originally designed for the study of three-dimensional genome structure in eukaryotes, to measure the cellular co-localization of DNA sequences. We leveraged Hi-C data generated from a simple synthetic metagenome sample to accurately cluster metagenome assembly contigs into groups that contain nearly complete genomes of each species. The Hi-C data also reliably associated plasmids with the chromosomes of their host and with each other. We further demonstrated that Hi-C data provides a long-range signal of strain-specific genotypes, indicating such data may be useful for high-resolution genotyping of microbial populations. Our work demonstrates that Hi-C sequencing data provide valuable information for metagenome analyses that are not currently obtainable by other methods. This metagenomic Hi-C method could facilitate future studies of the fine-scale population structure of microbes, as well as studies of how antibiotic resistance plasmids (or other genetic elements) mobilize in microbial communities. The method is not limited to microbiology; the genetic architecture of other heterogeneous populations of cells could also be studied with this technique.
Darling, A.E., Jospin, G., Lowe, E., Matsen IV, F.A., Bik, H.M. & Eisen, J.A. 2014, 'PhyloSift: phylogenetic analysis of genomes and metagenomes', PeerJ, vol. 2.
Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection. In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated to sample metadata. These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454).
Earl, D., Nguyen, N., Hickey, G., Harris, R.S., Fitzgerald, S., Beal, K., Seledtsov, I., Molodtsov, V., Raney, B.J., Clawson, H., Kim, J., Kemena, C., Chang, J.M., Erb, I., Poliakov, A., Hou, M., Herrero, J., Kent, W.J., Solovyev, V., Darling, A.E., Ma, J., Notredame, C., Brudno, M., Dubchak, I., Haussler, D. & Paten, B. 2014, 'Alignathon: a competitive assessment of whole-genome alignment methods.', Genome research, vol. 24, no. 12, pp. 2077-2089.
Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one was comprised of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.
Lauro, F.M., Senstius, S.J., Cullen, J., Neches, R., Jensen, R.M., Brown, M.V., Darling, A.E., Givskov, M., McDougald, D., Hoeke, R., Ostrowski, M., Philip, G.K., Paulsen, I.T. & Grzymski, J.J. 2014, 'The Common Oceanographer: Crowdsourcing the Collection of Oceanographic Data', PLoS Biology, vol. 12, no. 9.
Becker, E.A., Seitzer, P.M., Tritt, A., Larsen, D., Krusor, M., Yao, A.I., Wu, D., Madern, D., Eisen, J.A., Darling, A.E. & Facciotti, M.T. 2014, 'Phylogenetically driven sequencing of extremely halophilic archaea reveals strategies for static and dynamic osmo-response.', PLoS genetics, vol. 10, no. 11, p. e1004784.
Organisms across the tree of life use a variety of mechanisms to respond to stress-inducing fluctuations in osmotic conditions. Cellular response mechanisms and phenotypes associated with osmoadaptation also play important roles in bacterial virulence, human health, agricultural production and many other biological systems. To improve understanding of osmoadaptive strategies, we have generated 59 high-quality draft genomes for the haloarchaea (a euryarchaeal clade whose members thrive in hypersaline environments and routinely experience drastic changes in environmental salinity) and analyzed these new genomes in combination with those from 21 previously sequenced haloarchaeal isolates. We propose a generalized model for haloarchaeal management of cytoplasmic osmolarity in response to osmotic shifts, where potassium accumulation and sodium expulsion during osmotic upshock are accomplished via secondary transport using the proton gradient as an energy source, and potassium loss during downshock is via a combination of secondary transport and non-specific ion loss through mechanosensitive channels. We also propose new mechanisms for magnesium and chloride accumulation. We describe the expansion and differentiation of haloarchaeal general transcription factor families, including two novel expansions of the TATA-binding protein family, and discuss their potential for enabling rapid adaptation to environmental fluxes. We challenge a recent high-profile proposal regarding the evolutionary origins of the haloarchaea by showing that inclusion of additional genomes significantly reduces support for a proposed large-scale horizontal gene transfer into the ancestral haloarchaeon from the bacterial domain. The combination of broad (17 genera) and deep (5 species in four genera) sampling of a phenotypically unified clade has enabled us to uncover both highly conserved and specialized features of osmoadaptation. Finally, we demonstrate the broad utility of such datasets, for m...
Islam, M.A., Labbate, M., Djordjevic, S.P., Alam, M., Darling, A.E., Melvold, J.A., Holmes, A.J., Johura, F.T., Cravioto, A., Charles, I.G. & Stokes, H. 2013, 'Indigenous Vibrio cholerae strains from a non-endemic region are pathogenic', Open Biology, vol. 3, p. 120181.
Of the 200+ serogroups of Vibrio cholerae, only O1 or O139 strains are reported to cause cholera, and mostly in endemic regions. Cholera outbreaks elsewhere are considered to be via importation of pathogenic strains. Using established animal models, we show that diverse V. cholerae strains indigenous to a non-endemic environment (Sydney, Australia), including non-O1/O139 serogroup strains, are able to both colonize the intestine and result in fluid accumulation despite lacking virulence factors believed to be important. Most strains lacked the type three secretion system considered a mediator of diarrhoea in non-O1/O139 V. cholerae. Multi-locus sequence typing (MLST) showed that the Sydney isolates did not form a single clade and were distinct from O1/O139 toxigenic strains. There was no correlation between genetic relatedness and the profile of virulence-associated factors. Current analyses of diseases mediated by V. cholerae focus on endemic regions, with only those strains that possess particular virulence factors considered pathogenic. Our data suggest that factors other than those previously well described are of potential importance in influencing disease outbreaks.
Treangen, T., Koren, S., Sommer, D., Liu, B., Astrovskaya, I., Ondov, B., Darling, A.E., Phillippy, A. & Pop, M. 2013, 'MetAMOS: A Modular And Open Source Metagenomic Assembly And Analysis Pipeline', Genome Biology, vol. 14, no. 1.
We describe MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds,
Lang, J., Darling, A.E. & Eisen, J.A. 2013, 'Phylogeny Of Bacterial And Archaeal Genomes Using Conserved Genes: Supertrees And Supermatrices', PLoS ONE, vol. 8, no. 4, pp. 1-15.
Over 3000 microbial (bacterial and archaeal) genomes have been made publicly available to date, providing an unprecedented opportunity to examine evolutionary genomic trends and offering valuable reference data for a variety of other studies such as me
Rands, C., Darling, A.E., Fujita, M., Kong, L., Webster, M., Clabaut, C., Emes, R., Heger, A., Meader, S., Hawkins, M., Eisen, M., Teiling, C., Affourtit, J., Boese, B., Grant, P., Grant, B.R., Eisen, J.A., Abzhanov, A. & Ponting, C. 2013, 'Insights Into The Evolution Of Darwin's Finches From Comparative Analysis Of The Geospiza magnirostris Genome Sequence', BMC Genomics, vol. 14, pp. 1-15.
Background: A classical example of repeated speciation coupled with ecological diversification is the evolution of 14 closely related species of Darwin's (Galapagos) finches (Thraupidae, Passeriformes). Their adaptive radiation in the Galapagos archipela
Rinke, C., Schwientek, P., Sczyrba, A., Ivanova, N., Anderson, I.J., Cheng, J., Darling, A.E., Malfatti, S., Swan, B.K., Gies, E.A., Dodsworth, J.A., Hedlund, B.P., Tsiamis, G., Sievert, S.M., Liu, W., Eisen, J.A., Hallam, S.J., Kyrpides, N.C., Stepanauskas, R., Rubin, E., Hugenholtz, P. & Woyke, T. 2013, 'Insights into the phylogeny and coding potential of microbial dark matter', Nature, vol. 499, no. 7459, pp. 431-437.
Genome sequencing enhances our understanding of the biological world by providing blueprints for the evolutionary and functional diversity that shapes the biosphere. However, microbial genomes that are currently available are of limited phylogenetic breadth, owing to our historical inability to cultivate most microorganisms in the laboratory. We apply single-cell genomics to target and sequence 201 uncultivated archaeal and bacterial cells from nine diverse habitats belonging to 29 major mostly uncharted branches of the tree of life, so-called 'microbial dark matter'. With this additional genomic information, we are able to resolve many intra- and inter-phylum-level relationships and to propose two new superphyla. We uncover unexpected metabolic features that extend our understanding of biology and challenge established boundaries between the three domains of life. These include a novel amino acid use for the opal stop codon, an archaeal-type purine synthesis in Bacteria and complete sigma factors in Archaea similar to those in Bacteria. The single-cell genomes also served to phylogenetically anchor up to 20% of metagenomic reads in some habitats, facilitating organism-level interpretation of ecosystem function. This study greatly expands the genomic representation of the tree of life and provides a systematic step towards a better understanding of biological evolution on our planet.
Holland-Moritz, H.E., Bevans, D.R., Lang, J.M., Darling, A.E., Eisen, J.A. & Coil, D.A. 2013, 'Draft Genome Sequence of Leucobacter sp. Strain UCD-THU (Phylum Actinobacteria).', Genome announcements, vol. 1, no. 3, pp. S69-S82.
Here we present the draft genome of Leucobacter sp. strain UCD-THU. The genome contains 3,317,267 bp in 11 scaffolds. This strain was isolated from a residential toilet as part of an undergraduate project to sequence reference genomes of microbes from the built environment.
Flanagan, J.C., Lang, J.M., Darling, A.E., Eisen, J.A. & Coil, D.A. 2013, 'Draft Genome Sequence of Curtobacterium flaccumfaciens Strain UCD-AKU (Phylum Actinobacteria).', Genome announcements, vol. 1, no. 3, p. 5.
Here we present the draft genome of an actinobacterium, Curtobacterium flaccumfaciens strain UCD-AKU, isolated from a residential carpet. The genome assembly contains 3,692,614 bp in 130 contigs. This is the first member of the Curtobacterium genus to be sequenced.
Diep, A.L., Lang, J.M., Darling, A.E., Eisen, J.A. & Coil, D.A. 2013, 'Draft Genome Sequence of Dietzia sp. Strain UCD-THP (Phylum Actinobacteria).', Genome announcements, vol. 1, no. 3, pp. 198-204.
Here, we present the draft genome sequence of an actinobacterium, Dietzia sp. strain UCD-THP, isolated from a residential toilet handle. The assembly contains 3,915,613 bp. The genome sequences of only two other Dietzia species have been published, those of Dietzia alimentaria and Dietzia cinnamea.
Coil, D.A., Doctor, J.I., Lang, J.M., Darling, A.E. & Eisen, J.A. 2013, 'Draft Genome Sequence of Kocuria sp. Strain UCD-OTCP (Phylum Actinobacteria).', Genome announcements, vol. 1, no. 3, pp. 198-204.
Here, we present the draft genome of Kocuria sp. strain UCD-OTCP, a member of the phylum Actinobacteria, isolated from a restaurant chair cushion. The assembly contains 3,791,485 bp (G+C content of 73%) and is contained in 68 scaffolds.
Bendiks, Z.A., Lang, J.M., Darling, A.E., Eisen, J.A. & Coil, D.A. 2013, 'Draft Genome Sequence of Microbacterium sp. Strain UCD-TDU (Phylum Actinobacteria).', Genome announcements, vol. 1, no. 2, p. e0012013.
Here, we present the draft genome sequence of Microbacterium sp. strain UCD-TDU, a member of the phylum Actinobacteria. The assembly contains 3,746,321 bp (in 8 scaffolds). This strain was isolated from a residential toilet as part of an undergraduate student research project to sequence reference genomes of microbes from the built environment.
Lo, J.R., Lang, J.M., Darling, A.E., Eisen, J.A. & Coil, D.A. 2013, 'Draft genome sequence of an Actinobacterium, Brachybacterium muris strain UCD-AY4.', Genome announcements, vol. 1, no. 2, p. e0008613.
Here we present the draft genome of an actinobacterium, Brachybacterium muris UCD-AY4. The assembly contains 3,257,338 bp and has a GC content of 70%. This strain was isolated from a residential bath towel and has a 16S rRNA gene 99.7% identical to that of the original B. muris strain, C3H-21.
Sheppard, S.K., Didelot, X., Jolley, K.A., Darling, A.E., Pascoe, B., Meric, G., Kelly, D.J., Cody, A., Colles, F.M., Strachan, N.J.C., Ogden, I.D., Forbes, K., French, N.P., Carter, P., Miller, W.G., McCarthy, N.D., Owen, R., Litrup, E., Egholm, M., Affourtit, J.P., Bentley, S.D., Parkhill, J., Maiden, M.C.J. & Falush, D. 2013, 'Progressive genome-wide introgression in agricultural Campylobacter coli', Molecular Ecology, vol. 22, no. 4, pp. 1051-1064.
Ronquist, F., Teslenko, M., Van Der Mark, P., Ayres, D., Darling, A.E., Hohna, S., Larget, B., Liu, L., Suchard, M. & Huelsenbeck, J. 2012, 'MrBayes 3.2: Efficient Bayesian Phylogenetic Inference And Model Choice Across A Large Model Space', Systematic Biology, vol. 61, no. 3, pp. 539-542.
Since its introduction in 2001, MrBayes has grown in popularity as a software package for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) methods. With this note, we announce the release of version 3.2, a major upgrade to the latest
Cadillo-Quiroz, H., Didelot, X., Held, N., Herrera, A., Darling, A.E., Reno, M., Krause, D. & Whitaker, R. 2012, 'Patterns Of Gene Flow Define Species Of Thermophilic Archaea', PLoS Biology, vol. 10, no. 2, pp. 1-11.
Despite a growing appreciation of their vast diversity in nature, mechanisms of speciation are poorly understood in Bacteria and Archaea. Here we use high-throughput genome sequencing to identify ongoing speciation in the thermoacidophilic Archaeon Sulfo
Lynch, E., Langille, M., Darling, A.E., Wilbanks, E., Haltiner, C., Shao, K., Starr, M., Teiling, C., Harkins, T., Edwards, R., Eisen, J.A. & Facciotti, M. 2012, 'Sequencing Of Seven Haloarchaeal Genomes Reveals Patterns Of Genomic Flux', PLoS ONE, vol. 7, no. 7, pp. 1-13.
We report the sequencing of seven genomes from two haloarchaeal genera, Haloferax and Haloarcula. Ease of cultivation and the existence of well-developed genetic and biochemical tools for several diverse haloarchaeal species make haloarchaea a model grou
Tritt, A., Eisen, J.A., Facciotti, M. & Darling, A.E. 2012, 'An Integrated Pipeline For De Novo Assembly Of Microbial Genomes', PLoS ONE, vol. 7, no. 9, pp. 1-9.
Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtai
Ayres, D., Darling, A.E., Zwickl, D., Beerli, P., Holder, M., Lewis, P., Huelsenbeck, J., Ronquist, F., Swofford, D., Cummings, M., Rambaut, A. & Suchard, M. 2012, 'Beagle: An Application Programming Interface And High-performance Computing Library For Statistical Phylogenetics', Systematic Biology, vol. 61, no. 1, pp. 170-173.
Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood es
Didelot, X., Meric, G., Falush, D. & Darling, A.E. 2012, 'Impact Of Homologous And Non-homologous Recombination In The Genomic Evolution Of Escherichia Coli', BMC Genomics, vol. 13, pp. 1-15.
Background: Escherichia coli is an important species of bacteria that can live as a harmless inhabitant of the guts of many animals, as a pathogen causing life-threatening conditions or freely in the non-host environment. This diversity of lifestyles has
Earl, D., Bradnam, K., St John, J., Darling, A.E., Lin, D., Fass, J., Hung, O., Buffalo, V., Zerbino, D., Diekhans, M., Nguyen, N., Ariyaratne, P., Sung, W., Ning, Z., Haimel, M., Simpson, J., Fonseca, N., Birol, I., Docking, T., Ho, I., Rokhsar, D. & Chikhi, R. 2011, 'Assemblathon 1: A Competitive Assessment Of De Novo Short Read Assembly Methods', Genome Research, vol. 21, no. 12, pp. 2224-2241.
Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively ass
Darling, A.E., Tritt, A., Eisen, J.A. & Facciotti, M. 2011, 'Mauve Assembly Metrics', Bioinformatics, vol. 27, no. 19, pp. 2756-2757.
High-throughput DNA sequencing technologies have spurred the development of numerous novel methods for genome assembly. With few exceptions, these algorithms are heuristic and require one or more parameters to be manually set by the user. One approach to
Darling, A.E., Mau, B. & Perna, N. 2010, 'progressiveMauve: Multiple Genome Alignment With Gene Gain, Loss And Rearrangement', PLoS ONE, vol. 5, no. 6, pp. 1-17.
Background: Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. Methodology/Princip
Morgan, J., Darling, A.E. & Eisen, J.A. 2010, 'Metagenomic Sequencing Of An In Vitro-simulated Microbial Community', PLoS ONE, vol. 5, no. 4, pp. 1-10.
Background: Microbial life dominates the earth, but many species are difficult or even impossible to study under laboratory conditions. Sequencing DNA directly from the environment, a technique commonly referred to as metagenomics, is an important tool f
Srivastava, M., Simakov, O., Chapman, J., Fahey, B., Gauthier, M.E.A., Mitros, T., Richards, G.S., Conaco, C., Dacre, M., Hellsten, U., Larroux, C., Putnam, N.H., Stanke, M., Adamska, M., Darling, A., Degnan, S.M., Oakley, T.H., Plachetzki, D.C., Zhai, Y., Adamski, M., Calcino, A., Cummins, S.F., Goodstein, D.M., Harris, C., Jackson, D.J., Leys, S.P., Shu, S., Woodcroft, B.J., Vervoort, M., Kosik, K.S., Manning, G., Degnan, B.M. & Rokhsar, D.S. 2010, 'The Amphimedon queenslandica genome and the evolution of animal complexity.', Nature, vol. 466, no. 7307, pp. 720-726.
Sponges are an ancient group of animals that diverged from other metazoans over 600 million years ago. Here we present the draft genome sequence of Amphimedon queenslandica, a demosponge from the Great Barrier Reef, and show that it is remarkably similar to other animal genomes in content, structure and organization. Comparative analysis enabled by the sequencing of the sponge genome reveals genomic events linked to the origin and early evolution of animals, including the appearance, expansion and diversification of pan-metazoan transcription factor, signalling pathway and structural genes. This diverse 'toolkit' of genes correlates with critical aspects of all metazoan body plans, and comprises cell cycle control and growth, development, somatic- and germ-cell specification, cell adhesion, innate immunity and allorecognition. Notably, many of the genes associated with the emergence of animals are also implicated in cancer, which arises from defects in basic processes associated with metazoan multicellularity.
Didelot, X., Lawson, D., Darling, A. & Falush, D. 2010, 'Inference of homologous recombination in bacteria using whole-genome sequences.', Genetics, vol. 186, no. 4, pp. 1435-1449.
Bacteria and archaea reproduce clonally, but sporadically import DNA into their chromosomes from other organisms. In many of these events, the imported DNA replaces a homologous segment in the recipient genome. Here we present a new method to reconstruct the history of recombination events that affected a given sample of bacterial genomes. We introduce a mathematical model that represents both the donor and the recipient of each DNA import as an ancestor of the genomes in the sample. The model represents a simplification of the previously described coalescent with gene conversion. We implement a Monte Carlo Markov chain algorithm to perform inference under this model from sequence data alignments and show that inference is feasible for whole-genome alignments through parallelization. Using simulated data, we demonstrate accurate and reliable identification of individual recombination events and global recombination rate parameters. We applied our approach to an alignment of 13 whole genomes from the Bacillus cereus group. We find, as expected from laboratory experiments, that the recombination rate is higher between closely related organisms and also that the genome contains several broad regions of elevated levels of recombination. Application of the method to the genomic data sets that are becoming available should reveal the evolutionary history and private lives of populations of bacteria and archaea. The methods described in this article have been implemented in a computer software package, ClonalOrigin, which is freely available from http://code.google.com/p/clonalorigin/.
Rissman, A., Mau, B., Biehl, B., Darling, A.E., Glasner, J. & Perna, N. 2009, 'Reordering Contigs Of Draft Genomes Using The Mauve Aligner', Bioinformatics, vol. 25, no. 16, pp. 2071-2073.
Mauve Contig Mover provides a new method for proposing the relative order of contigs that make up a draft genome based on comparison to a complete or draft reference genome. A novel application of the Mauve aligner and viewer provides an automated reorde
Timmins, M., Thomas-Hall, S., Darling, A.E., Zhang, E., Hankamer, B., Marx, U. & Schenk, P. 2009, 'Phylogenetic And Molecular Analysis Of Hydrogen-producing Green Algae', Journal Of Experimental Botany, vol. 60, no. 6, pp. 1691-1702.
A select set of microalgae are reported to be able to catalyse photobiological H(2) production from water. Based on the model organism Chlamydomonas reinhardtii, a method was developed for the screening of naturally occurring H(2)-producing microalgae. B
Esteban-Marcos, A., Darling, A.E. & Ragan, M. 2009, 'Seevolution: Visualizing Chromosome Evolution', Bioinformatics, vol. 25, no. 7, pp. 960-961.
Genome evolution underpins all of biology, yet its principles can be difficult to communicate to the non-specialist. To facilitate broader understanding of genome evolution, we have designed an interactive 3D environment that enables visualization of div
Treangen, T., Darling, A.E., Achaz, G., Ragan, M., Messeguer, X. & Rocha, E. 2009, 'A Novel Heuristic For Local Multiple Alignment Of Interspersed DNA Repeats', IEEE/ACM Transactions On Computational Biology And Bioinformatics, vol. 6, no. 2, pp. 180-189.
Pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides between related species. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered fro
Miklos, I. & Darling, A.E. 2009, 'Efficient Sampling Of Parsimonious Inversion Histories With Application To Genome Rearrangement In Yersinia', Genome Biology and Evolution, vol. 1, pp. 153-164.
Inversions are among the most common mutations acting on the order and orientation of genes in a genome, and polynomial-time algorithms exist to obtain a minimal length series of inversions that transform one genome arrangement to another. However, the m
Didelot, X., Darling, A.E. & Falush, D. 2009, 'Inferring Genomic Flux In Bacteria', Genome Research, vol. 19, no. 2, pp. 306-317.
Acquisition and loss of genetic material are essential forces in bacterial microevolution. They have been repeatedly linked with adaptation of lineages to new lifestyles, and in particular, pathogenicity. Comparative genomics has the potential to elucida
Chan, C., Beiko, R., Darling, A.E. & Ragan, M. 2009, 'Lateral Transfer Of Genes And Gene Fragments In Prokaryotes', Genome Biology and Evolution, vol. 1, pp. 429-438.
Lateral genetic transfer (LGT) involves the movement of genetic material from one lineage into another and its subsequent incorporation into the new host genome via genetic recombination. Studies in individual taxa have indicated lateral origins for stre
Chan, C.X., Darling, A.E., Beiko, R.G. & Ragan, M.A. 2009, 'Are protein domains modules of lateral genetic transfer?', PLoS ONE, vol. 4, no. 2, p. e4524.
BACKGROUND: In prokaryotes and some eukaryotes, genetic material can be transferred laterally among unrelated lineages and recombined into new host genomes, providing metabolic and physiological novelty. Although the process is usually framed in terms of gene sharing (e.g. lateral gene transfer, LGT), there is little reason to imagine that the units of transfer and recombination correspond to entire, intact genes. Proteins often consist of one or more spatially compact structural regions (domains) which may fold autonomously and which, singly or in combination, confer the protein's specific functions. As LGT is frequent in strongly selective environments and natural selection is based on function, we hypothesized that domains might also serve as modules of genetic transfer, i.e. that regions of DNA that are transferred and recombined between lineages might encode intact structural domains of proteins. METHODOLOGY/PRINCIPAL FINDINGS: We selected 1,462 orthologous gene sets representing 144 prokaryotic genomes, and applied a rigorous two-stage approach to identify recombination breakpoints within these sequences. Recombination breakpoints are very significantly over-represented in gene sets within which protein domain-encoding regions have been annotated. Within these gene sets, breakpoints significantly avoid the domain-encoding regions (domons), except where these regions constitute most of the sequence length. Recombination breakpoints that fall within longer domons are distributed uniformly at random, but those that fall within shorter domons may show a slight tendency to avoid the domon midpoint. As we find no evidence for differential selection against nucleotide substitutions following the recombination event, any bias against disruption of domains must be a consequence of the recombination event per se. CONCLUSIONS/SIGNIFICANCE: This is the first systematic study relating the units of LGT to structural features at the protein level. Many genes have been in...
Kropinski, A.M., Borodovsky, M., Carver, T.J., Cerdeño-Tárraga, A.M., Darling, A., Lomsadze, A., Mahadevan, P., Stothard, P., Seto, D., Van Domselaar, G. & Wishart, D.S. 2009, 'In silico identification of genes in bacteriophage DNA.', Methods in molecular biology (Clifton, N.J.), vol. 502, pp. 57-89.
One of the most satisfying aspects of a genome sequencing project is the identification of the genes contained within it. These are of two types: those which encode tRNAs and those which produce proteins. After a general introduction on the properties of protein-encoding genes and the utility of the Basic Local Alignment Search Tool (BLASTX) to identify genes through homologs, a variety of tools are discussed by their creators. These include for genome annotation: GeneMark, Artemis, and BASys; and, for genome comparisons: Artemis Comparison Tool (ACT), Mauve, CoreGenes, and GeneOrder.
Darling, A.E., Miklos, I. & Ragan, M. 2008, 'Dynamics Of Genome Rearrangement In Bacterial Populations', PLoS Genetics, vol. 4, no. 7, pp. 1-16.
Genome structure variation has profound impacts on phenotype in organisms ranging from microbes to humans, yet little is known about how natural selection acts on genome arrangement. Pathogenic bacteria such as Yersinia pestis, which causes bubonic and p
Friedberg, R., Darling, A.E. & Yancopoulos, S. 2008, 'Genome rearrangement by the double cut and join operation.', Methods in molecular biology (Clifton, N.J.), vol. 452, pp. 385-416.
The Double Cut and Join is an operation acting locally at four chromosomal positions without regard to chromosomal context. This chapter discusses its application and the resulting menu of operations for genomes consisting of arbitrary numbers of circular chromosomes, as well as for a general mix of linear and circular chromosomes. In the general case the menu includes: inversion, translocation, transposition, formation and absorption of circular intermediates, conversion between linear and circular chromosomes, block interchange, fission, and fusion. This chapter discusses the well-known edge graph and its dual, the adjacency graph, recently introduced by Bergeron et al. Step-by-step procedures are given for constructing and manipulating these graphs. Simple algorithms are given in the adjacency graph for computing the minimal DCJ distance between two genomes and finding a minimal sorting; and use of an online tool (Mauve) to generate synteny blocks and apply DCJ is described.
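The distance computation mentioned in this abstract can be illustrated with a minimal sketch. It assumes the simplest setting only: both genomes contain the same genes, each exactly once, and every chromosome is circular. In that case the adjacency graph decomposes into cycles and the DCJ distance is N - C, where N is the number of genes and C the number of cycles. The function names and the signed-integer gene encoding below are illustrative assumptions, not taken from the chapter or from Mauve, and linear chromosomes (which add a path term to the formula) are not handled.

```python
def _adjacency_partners(genome):
    """Map each gene extremity to the extremity it is joined to.

    genome: list of circular chromosomes, each a list of signed ints
    (hypothetical encoding: +g reads tail->head, -g reads head->tail).
    """
    partner = {}
    for chrom in genome:
        for i, gene in enumerate(chrom):
            nxt = chrom[(i + 1) % len(chrom)]  # wrap around: circular chromosome
            right = (abs(gene), "h" if gene > 0 else "t")  # end we leave from
            left = (abs(nxt), "t" if nxt > 0 else "h")     # end we arrive at
            partner[right] = left
            partner[left] = right
    return partner


def dcj_distance_circular(genome_a, genome_b):
    """DCJ distance N - C for two all-circular genomes on the same gene set."""
    pa = _adjacency_partners(genome_a)
    pb = _adjacency_partners(genome_b)
    if pa.keys() != pb.keys():
        raise ValueError("genomes must contain the same genes, once each")
    n_genes = len(pa) // 2  # two extremities per gene
    seen = set()
    cycles = 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        cur = start
        while True:
            seen.add(cur)
            via_a = pa[cur]   # follow an adjacency of genome A
            seen.add(via_a)
            cur = pb[via_a]   # then an adjacency of genome B
            if cur == start:
                break
    return n_genes - cycles


if __name__ == "__main__":
    print(dcj_distance_circular([[1, 2, 3]], [[1, 2, 3]]))          # identical genomes
    print(dcj_distance_circular([[1, 2, 3]], [[1, -2, 3]]))         # one inversion
    print(dcj_distance_circular([[1, 2, 3, 4]], [[1, 2], [3, 4]]))  # one fission
```

Each cycle is traced by alternately following an adjacency in the first genome and one in the second, so identical genomes give one cycle per gene and a distance of zero.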
Glasner, J.D., Plunkett III, G., Anderson, B.D., Baumler, D.J., Biehl, B.S., Burland, V., Cabot, E.L., Darling, A.E., Mau, B., Neeno-Eckwall, E.C., Pot, D., Qiu, Y., Rissman, A.I., Worzella, S., Zaremba, S., Fedorko, J., Hampton, T., Liss, P., Rusch, M., Shaker, M., Shaull, L., Shetty, P., Thotakura, S., Whitmore, J., Blattner, F.R., Greene, J.M. & Perna, N.T. 2008, 'Enteropathogen Resource Integration Center (ERIC): bioinformatics support for research on biodefense-relevant enterobacteria', Nucleic Acids Research, vol. 36, pp. D519-D523.
Darling, A.E., Treangen, T.J., Messeguer, X. & Perna, N.T. 2007, 'Analyzing patterns of microbial evolution using the mauve genome alignment system.', Methods in molecular biology (Clifton, N.J.), vol. 396, pp. 135-152.
During the course of evolution, genomes can undergo large-scale mutation events such as rearrangement and lateral transfer. Such mutations can result in significant variations in gene order and gene content among otherwise closely related organisms. The Mauve genome alignment system can successfully identify such rearrangement and lateral transfer events in comparisons of multiple microbial genomes even under high levels of recombination. This chapter outlines the main features of Mauve and provides examples that describe how to use Mauve to conduct a rigorous multiple genome comparison and study evolutionary patterns.
Mau, B., Glasner, J.D., Darling, A.E. & Perna, N.T. 2006, 'Genome-wide detection and analysis of homologous recombination among sequenced strains of Escherichia coli', Genome Biology, vol. 7, no. 5, p. R44.
BACKGROUND: Comparisons of complete bacterial genomes reveal evidence of lateral transfer of DNA across otherwise clonally diverging lineages. Some lateral transfer events result in acquisition of novel genomic segments and are easily detected through genome comparison. Other more subtle lateral transfers involve homologous recombination events that result in substitution of alleles within conserved genomic regions. This type of event is observed infrequently among distantly related organisms. It is reported to be more common within species, but the frequency has been difficult to quantify since the sequences under comparison tend to have relatively few polymorphic sites. RESULTS: Here we report a genome-wide assessment of homologous recombination among a collection of six complete Escherichia coli and Shigella flexneri genome sequences. We construct a whole-genome multiple alignment and identify clusters of polymorphic sites that exhibit atypical patterns of nucleotide substitution using a random walk-based method. The analysis reveals one large segment (approximately 100 kb) and 186 smaller clusters of single base pair differences that suggest lateral exchange between lineages. These clusters include portions of 10% of the 3,100 genes conserved in six genomes. Statistical analysis of the functional roles of these genes reveals that several classes of genes are over-represented, including those involved in recombination, transport and motility. CONCLUSION: We demonstrate that intraspecific recombination in E. coli is much more common than previously appreciated and may show a bias for certain types of genes. The described method provides high-specificity, conservative inference of past recombination events.
Glasner, J.D., Rusch, M., Liss, P., Plunkett, G., Cabot, E.L., Darling, A., Anderson, B.D., Infield-Harm, P., Gilson, M.C. & Perna, N.T. 2006, 'ASAP: a resource for annotating, curating, comparing, and disseminating genomic data', Nucleic Acids Research, vol. 34, no. Database issue, pp. D41-D45.
ASAP is a comprehensive web-based system for community genome annotation and analysis. ASAP is being used for a large-scale effort to augment and curate annotations for genomes of enterobacterial pathogens and for additional genome sequences. New tools, such as the genome alignment program Mauve, have been incorporated into ASAP in order to improve display and analysis of related genomes. Recent improvements to the database and challenges for future development of the system are discussed. ASAP is available on the web at https://asap.ahabs.wisc.edu/asap/logon.php.
Darling, A.E., Mau, B., Blattner, F. & Perna, N. 2004, 'Mauve: multiple alignment of conserved genomic sequence with rearrangements', Genome Research, vol. 14, no. 7, pp. 1394-1403.
As genomes evolve, they undergo large-scale evolutionary processes that present a challenge to sequence comparison not posed by short sequences. Recombination causes frequent genome rearrangements, horizontal transfer introduces new sequences into bacter
Darling, A.E., Mau, B., Blattner, F. & Perna, N. 2004, 'GRIL: genome rearrangement and inversion locator', Bioinformatics, vol. 20, no. 1, pp. 122-124.
GRIL is a tool to automatically identify collinear regions in a set of bacterial-size genome sequences. GRIL uses three basic steps. First, regions of high sequence identity are located. Second, some of these regions are filtered based on user-specified
Wei, J., Goldberg, M., Burland, V., Venkatesan, M., Deng, W., Fournier, G., Mayhew, G., Plunkett, G., Rose, D., Darling, A.E., Mau, B., Perna, N., Payne, S., Runyen-Janecky, L., Zhou, S., Schwartz, D. & Blattner, F. 2003, 'Complete genome sequence and comparative genomics of Shigella flexneri serotype 2a strain 2457T', Infection and Immunity, vol. 71, no. 5, pp. 2775-2786.
We determined the complete genome sequence of Shigella flexneri serotype 2a strain 2457T (4,599,354 bp). Shigella species cause >1 million deaths per year from dysentery and diarrhea and have a lifestyle that is markedly different from those of closely r
Glasner, J., Liss, P., Plunkett, G., Darling, A.E., Prasad, T., Rusch, M., Byrnes, A., Gilson, M., Biehl, B., Blattner, F. & Perna, N. 2003, 'ASAP, a systematic annotation package for community analysis of genomes', Nucleic Acids Research, vol. 31, no. 1, pp. 147-151.
ASAP (a systematic annotation package for community analysis of genomes) is a relational database and web interface developed to store, update and distribute genome sequence data and functional characterization (https://asap.ahabs.wisc.edu/annotation/php
Wei, J., Goldberg, M.B., Burland, V., Venkatesan, M.M., Deng, W., Fournier, G., Mayhew, G.F., Plunkett, G., Rose, D.J., Darling, A., Mau, B., Perna, N.T., Payne, S.M., Runyen-Janecky, L.J., Zhou, S., Schwartz, D.C. & Blattner, F.R. 2003, 'Erratum: Complete genome sequence and comparative genomics of Shigella flexneri serotype 2a strain 2457T (Infection and Immunity (2003) 71:5 (2775-2786))', Infection and Immunity, vol. 71, no. 7, p. 4223.
Chan, C.X., Beiko, R.G., Darling, A.E. & Ragan, M.A., 'Protein domains as units of genetic transfer'.
Genomes evolve as modules. In prokaryotes (and some eukaryotes), genetic material can be transferred between species and integrated into the genome via homologous or illegitimate recombination. There is little reason to imagine that the units of transfer correspond to entire genes; however, such units have not been rigorously characterized. We examined fragmentary genetic transfers in single-copy gene families from 144 prokaryotic genomes and found that breakpoints are located significantly closer to the boundaries of genomic regions that encode annotated structural domains of proteins than expected by chance, particularly when recombining sequences are more divergent. This correlation results from recombination events themselves and not from differential nucleotide substitution. We report the first systematic study relating genetic recombination to structural features at the protein level.
Darling, A.E., Mau, B. & Perna, N.T., 'Progressive Mauve: Multiple alignment of genomes with gene flux and rearrangement'.
Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. We describe a method to align two or more genomes that have undergone large-scale recombination, particularly genomes that have undergone substantial amounts of gene gain and loss (gene flux). The method utilizes a novel alignment objective score, referred to as a sum-of-pairs breakpoint score. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The progressive genome alignment algorithm demonstrates markedly improved accuracy over previous approaches in situations where genomes have undergone realistic amounts of genome rearrangement, gene gain, loss, and duplication. We apply the progressive genome alignment algorithm to a set of 23 completely sequenced genomes from the genera Escherichia, Shigella, and Salmonella. The 23 enterobacteria have an estimated 2.46 Mbp of genomic content conserved among all taxa and total unique content of 15.2 Mbp. We document substantial population-level variability among these organisms driven by homologous recombination, gene gain, and gene loss. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve.

Fred Hutchinson Cancer Research Center
Dr. Erick Matsen

University of California - Davis
Professor Jonathan A. Eisen

NSW Department of Primary Industries
Dr. Jef Hammond, Dr. Toni A. Chapman, Dr. Daniel Bogema

Helmholtz Centre for Infection Research
Prof. Alice McHardy