Next generation transcriptomes for next generation genomes using est2assembly
© 2009 Papanicolaou et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
BACKGROUND: The decreasing costs of capillary-based Sanger sequencing and next generation technologies, such as 454 pyrosequencing, have prompted an explosion of transcriptome projects in non-model species, where even shallow sequencing of transcriptomes can now be used to examine a range of research questions. This rapid growth in data has outstripped the ability of researchers working on non-model species to analyze and mine transcriptome data efficiently.

RESULTS: Here we present a semi-automated platform, 'est2assembly', that processes raw sequence data from Sanger or 454 sequencing into a hybrid de novo assembly, annotates it and produces GMOD-compatible output, including a SeqFeature database suitable for GBrowse. Users are able to parameterize assembler variables, judge assembly quality and determine the optimal assembly for their specific needs. We used est2assembly to process Drosophila and Bicyclus public Sanger EST data and then compared the results to published 454 data as well as eight new insect transcriptome collections.

CONCLUSIONS: Analysis of such a wide variety of data allows us to understand how these new technologies can assist EST project design. We find that assembler parameterization is as essential as standardized methods for judging the output of EST projects. Further, even shallow sequencing using 454 produces sufficient data to be of wide use to the community. est2assembly is an important tool to assist manual curation of gene models, which are a valuable resource in their own right, especially for species that are due to acquire a genome project using next generation sequencing.
We would like to thank Karl Gordon (CSIRO) for helping with end-user testing, two anonymous referees for improving the manuscript and the following for making pre-publication data available: Chris Jiggins and his laboratory (Univ. of Cambridge), Owen McMillan and his laboratory (State Univ. of N. Carolina), Yannick Pauchet and Iva Fuková (Univ. of Exeter). Further, Bastien Chevreux provided development versions of MIRA and excellent support, Jose Blanca provided sff_extract, James Wasmuth provided support for prot4EST, Ralf Schmid for annot8r, Derek Huntley for SEAN and Steffi Gebauer-Jung for TrimbyWindow. David Clements and Scott Cain helped with Chado and GBrowse. We also thank the TU-Dresden Deimos PC-Farm for computational support. The authors report no conflicting interests. AP was supported by the Max Planck Gesellschaft and the European Union Research Network GAMEXP; DGH was supported by the Max Planck Gesellschaft; RHfC was supported by the European Union Research Network EMBEK1.
This is the final version of the article. Available from the publisher via the DOI in this record.