This lesson is being piloted (Beta version)

(27255) Week 11 - Genome Assembly and Annotation: Glossary

Key Points

Introduction
  • You can start working in Galaxy by becoming familiar with importing data, using histories, and manipulating workflows.

  • Tools such as NanoPlot and Kraken2 report the quantity (yield) and quality (Q-score, N50, and read length distribution) of your reads, and can reveal possible contamination.

  • QC results help you decide how to approach the assembly. They can show whether more sequencing data need to be generated or whether part of the data needs to be filtered out. The right decision depends on the case.
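
To make the QC metrics above concrete, here is a minimal sketch of how yield, N50, and a mean Q-score can be computed from a list of read lengths and per-base Phred scores. The function names and the example numbers are made up for illustration, and averaging quality through error probabilities is one common convention; the tools themselves may differ in detail.

```python
import math


def n50(lengths):
    """Return the N50: the length L such that reads of length >= L
    contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def mean_qscore(phred_scores):
    """Average Phred scores by converting them to error probabilities,
    averaging those, and converting the mean back to a Phred score."""
    if not phred_scores:
        raise ValueError("phred_scores must not be empty")
    probs = [10 ** (-q / 10) for q in phred_scores]
    return -10 * math.log10(sum(probs) / len(probs))


# Hypothetical read lengths (in bases) for a tiny data set.
read_lengths = [1000, 2000, 3000, 4000, 5000]
total_yield = sum(read_lengths)   # total number of sequenced bases
read_n50 = n50(read_lengths)      # half of the bases are in reads this long or longer
```

A low yield relative to the expected genome size, or a low N50, are the kind of signals that suggest generating more data or filtering before assembly.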

Filtering and Assembly
  • Before assembling a genome, filter and trim your reads properly to get the best possible assembly. Various algorithms and tools are available to assemble and polish a genome; choose the one that suits your needs.

  • You can sanity-check your assembly using prior knowledge: (1) the expected genome size, (2) the expected genome structure (circular or linear), (3) the distribution of contig lengths and depths, and (4) the assembly graph. From this information, you can decide whether you need more sequencing depth (by re-sequencing) or another assembly run if the result is too fragmented.
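
The first three sanity checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the contig records, the 10% size tolerance, and the two-fold depth threshold are hypothetical choices, not values prescribed by any assembler.

```python
def check_assembly(contigs, expected_size, size_tolerance=0.1, depth_factor=2.0):
    """Compare an assembly against prior expectations.

    contigs: list of dicts with keys "name", "length", "depth", "circular".
    expected_size: expected genome size in bases.
    """
    total = sum(c["length"] for c in contigs)
    size_ok = abs(total - expected_size) <= size_tolerance * expected_size

    # Use the depth of the longest contig as the reference depth
    # (an assumption: the longest contig is taken to be chromosomal).
    longest = max(contigs, key=lambda c: c["length"])
    ref_depth = longest["depth"]
    depth_outliers = [
        c["name"]
        for c in contigs
        if c["depth"] > depth_factor * ref_depth
        or c["depth"] < ref_depth / depth_factor
    ]

    return {
        "total_length": total,
        "size_ok": size_ok,
        "n_contigs": len(contigs),
        "circular": [c["name"] for c in contigs if c["circular"]],
        "depth_outliers": depth_outliers,
    }


# Hypothetical assembly: one large linear contig and two small ones.
example_contigs = [
    {"name": "contig_1", "length": 8_300_000, "depth": 50, "circular": False},
    {"name": "contig_2", "length": 120_000, "depth": 160, "circular": True},
    {"name": "contig_3", "length": 30_000, "depth": 12, "circular": False},
]
report = check_assembly(example_contigs, expected_size=8_500_000)
```

A depth outlier is not necessarily an error: a small circular contig with high depth could be a multi-copy plasmid, while a low-depth contig could point to contamination. The assembly graph still needs to be inspected by eye.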

Taxonomic Placement
  • Taxonomic placement of a newly assembled genome can be achieved by identifying the nearest reference organism and placing the query genome into an existing tree in the database. Examples of such tools are autoMLST and GTDB.
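
As a toy illustration of "identifying the nearest reference organism", the sketch below ranks reference sequences by the Jaccard similarity of their k-mer sets. This is a simplified stand-in chosen for clarity, not the method used by autoMLST or GTDB; the sequences, names, and k-mer size are all hypothetical.

```python
def kmers(seq, k=4):
    """Return the set of all k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items / all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_reference(query, references, k=4):
    """Return the name of the most similar reference and all scores."""
    query_kmers = kmers(query, k)
    scores = {
        name: jaccard(query_kmers, kmers(seq, k))
        for name, seq in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical, very short sequences; real genomes are millions of bases.
references = {
    "ref_A": "ACGTACGTTGCC",
    "ref_B": "GGGGCCCCAAAATTTT",
}
best, scores = nearest_reference("ACGTACGTTGCA", references)
```

Real tools work on whole genomes and then place the query into a phylogenetic tree; the nearest-reference step shown here is only the first half of that process.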

Genome Annotation
  • Genome annotation starts by identifying genes and other functional elements (rRNA, tRNA, etc.) within the nucleotide sequence. The predicted genes are then compared against databases of interest to predict the functions they encode.
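
To give a feel for the gene identification step, here is a deliberately naive sketch that scans the forward strand for open reading frames (an ATG start codon followed in frame by a stop codon). Real gene finders use far more sophisticated models and also scan the reverse strand; the function name, the minimum length, and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) positions of forward-strand ORFs.

    Positions are 0-based with an exclusive end, and each ORF
    includes its stop codon. min_codons counts the stop codon too.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a new ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close the current ORF
    return sorted(orfs)


# Hypothetical sequence containing a single short ORF.
example_seq = "CCATGAAATTTTAGGG"
orfs = find_orfs(example_seq)
orf_sequences = [example_seq[s:e] for s, e in orfs]
```

The sequences found this way would then be translated and compared against a database to predict their function, which is the second step described above.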

Tools for analysing genomes
  • Genome annotation starts by identifying genes and other functional elements (rRNA, tRNA, etc.) within the nucleotide sequence. The predicted genes are then compared against databases of interest to predict the functions they encode.

Extra
  • Galaxy might not have all the tools and versions required to run your analysis. Being able to work with command-line tools and programming languages gives you flexibility in your research.

Glossary

FIXME