

The basic pipeline from before (i.e., in the publication) was to use CLC Genomics Workbench to do the following:
Admittedly, at the time I was quite new to graduate school and to handling and analyzing whole-genome sequencing data, and being more seasoned now, I thought I would update my approach to one that makes more sense and uses some different tools.

I’m also coming full circle and playing around with this approach again in some of the work I am currently doing. Not many have taken notice of this work (at least based on citations), though I recently had an inquiry from someone who was trying to implement it. Moreover, I think that even in cases of high sequencing coverage, where a de novo assembly is the norm, this type of approach could complement existing assembly methods to produce a better overall assembly. Despite this and other shortcomings, I hold out some hope that this type of assembly approach will become more common as more and more high-quality reference genomes from across the tree of life become available. Obviously, we were making the assumption that enough synteny exists to prevent our method from producing spurious assemblies, and this assumption may or may not hold up once evidence starts emerging (so far, none has across broader taxonomic scales). This was helped along by the fact that we were working with bird genomes, which are the best-case scenario for this type of approach in vertebrates.

A (still in progress) best practices workflow for reference-guided genome assembly

A couple of years ago I led a paper in PLoS ONE outlining a reference-guided genome assembly approach that, more-or-less, simply maps reads from the target species to a high-quality guide genome and exports a consensus, producing a genome sequence for the target species one is interested in. As the paper pointed out, this has been applied between strains of single species, but hasn’t really been tried across larger evolutionary distances. We found that with the 3-5x genome coverage we had, this approach produced a better genome assembly, based on several metrics, than simply performing de novo genome assembly, despite mapping to a guide genome that was quite distantly related in both cases.
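
To make the "map reads, then export a consensus" idea concrete, here is a toy sketch in Python. It is not what CLC Genomics Workbench does internally and not the pipeline from the paper; it assumes the reads have already been mapped to the guide genome without gaps, and the function name `call_consensus`, the `(start, sequence)` read format, and the `min_depth` parameter are all hypothetical choices made for illustration. Each guide position takes the most common base among the reads covering it, and positions with too little coverage become `N`.

```python
from collections import Counter


def call_consensus(guide_length, aligned_reads, min_depth=1):
    """Build a majority-rule consensus from gapless read alignments.

    aligned_reads is an iterable of (start, sequence) pairs, where start is
    the 0-based position on the guide genome where the read begins.
    Positions covered by fewer than min_depth reads are reported as 'N'.
    """
    # One base counter per guide position (a very simple pileup).
    pileup = [Counter() for _ in range(guide_length)]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < guide_length:
                pileup[pos][base.upper()] += 1

    consensus = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # not enough reads to call a base
        else:
            # Most frequent base; ties go to the base seen first.
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Hypothetical reads mapped to a 10 bp guide; positions 6-7 are uncovered.
reads = [(0, "ACGT"), (2, "GTTA"), (2, "GATA"), (8, "CC")]
print(call_consensus(10, reads))  # ACGTTANNCC
```

Raising `min_depth` trades completeness for confidence, which matters at 3-5x coverage where many positions are supported by only one or two reads.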
