The following tables are taken from:
A beginner’s guide to eukaryotic genome annotation
Mark Yandell & Daniel Ence. Nature Reviews Genetics 13, 329-342 (May 2012). doi:10.1038/nrg3174
Three basic approaches to genome annotation
Commonly used annotation tools
Software | Description | Refs |
---|---|---|
Ab initio and evidence-drivable gene predictors | ||
Augustus | Accepts expressed sequence tag (EST)-based and protein-based evidence hints. Highly accurate | 66,67 |
mGene | Support vector machine (SVM)-based discriminative gene predictor. Directly predicts 5′ and 3′ untranslated regions (UTRs) and poly(A) sites | 133 |
SNAP | Accepts EST and protein-based evidence hints. Easily trained | 62 |
FGENESH | Training files are constructed by SoftBerry and supplied to users | 72 |
Geneid | First published in 1992 and revised in 2000. Accepts external hints from EST and protein-based evidence | 134 |
GeneMark | A self-training gene finder | 69,70 |
Twinscan | Extension of the popular Genscan algorithm that can use homology between two genomes to guide gene prediction | 71 |
GAZE | Highly configurable gene predictor | 74 |
GenomeScan | Extension of the popular Genscan algorithm that can use BLASTX searches to guide gene prediction | 135 |
Conrad | Discriminative gene predictor that uses conditional random fields (CRFs) | 136 |
Contrast | Discriminative gene predictor that uses both SVMs and CRFs | 137 |
CRAIG | Discriminative gene predictor that uses CRFs | 138 |
Gnomon | Hidden Markov model (HMM) tool based on Genscan that uses EST and protein alignments to guide gene prediction | 73 |
GeneSeqer | A tool for identifying potential exon–intron structure in precursor mRNAs (pre-mRNAs) by splice site prediction and spliced alignment | 139 |
EST, protein and RNA-seq aligners and assemblers | ||
BLAST | Suite of rapid database search tools that uses Karlin–Altschul statistics | 31,32,33 |
BLAT | Faster than BLAST but has fewer features | 42 |
Splign | Splice-aware tool designed to align cDNA to genomic sequence | 44 |
Spidey | mRNA-to-DNA alignment tool that is designed to account for possible paralogous alignments | 45 |
Prosplign | Global alignment tool that uses BLAST hits to align in a splice-site- and paralogy-aware manner | 140 |
sim4 | Splice-aware cDNA-to-DNA alignment tool | 46 |
Exonerate | Splice-site-aware alignment algorithm that can align both protein and EST sequences to a genome | 43 |
Cufflinks | Extension to TopHat. Uses TopHat outputs to create transcript models | 54 |
Trinity | High-quality de novo transcriptome assembler | 50 |
MapSplice | Spliced aligner that does not use a model of canonical splice junction | 141 |
TopHat | Transcriptome aligner that aligns RNA sequencing (RNA-seq) reads to a reference genome using Bowtie to identify splice sites | 51 |
GSNAP | A fast short-read aligner | 52 |
Choosers and combiners | ||
JIGSAW | Combines evidence from alignment and ab initio gene prediction tools to produce a consensus gene model | 78 |
EVidenceModeler | Produces a consensus gene model by combining evidence from protein and transcript alignments together with ab initio predictions using weights for both abundance and the sources of the evidence | 79 |
GLEAN | Tool for creating consensus gene lists by integrating gene evidence through latent class analysis | 80 |
Evigan | Probabilistic evidence combiner that uses a Bayesian network to weigh and integrate evidence from ab initio predictors, alignments and expression data to produce a consensus gene model | 81 |
Genome annotation pipelines | ||
PASA | Annotation pipeline that aligns EST and protein sequences to the genome and produces evidence-driven consensus gene models | 56,82 |
MAKER | Annotation pipeline that uses BLAST and Exonerate to align protein and EST sequences. Also accepts features from RNA-seq alignment tools (such as TopHat). Massively parallel | 10,83 |
NCBI | The genome annotation pipeline from the US National Center for Biotechnology Information (NCBI). Uses BLAST alignments together with predictions from Gnomon and GenomeScan to produce gene models | 142 |
Ensembl | Ensembl’s genome annotation pipeline. Uses species-specific and cross-species alignments to build gene models. Also annotates non-coding RNAs | 107 |
Genome browsers for curation | ||
Artemis | Java-based genome browser for feature viewing and annotation. Can use binary alignment map (BAM) files as input | 99 |
Apollo | Java-based genome browser that allows the user to create and edit gene models and write their edits to a remote database | 97 |
JBrowse | JavaScript- and HTML-based genome browser that can be embedded into wikis for community work. Excellent for Web-based use | 87 |
IGV | Genome browser that supports BAM files and expression data | 143 |
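
The "Choosers and combiners" rows describe tools that merge evidence from several sources into one consensus gene model, weighting each source differently. The toy Python sketch below illustrates only that general idea. It is not the algorithm of JIGSAW, EVidenceModeler, GLEAN or Evigan; the function name `pick_consensus`, the source labels, the weights and the coordinates are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical per-source weights (assumption: invented for this sketch,
# not taken from any of the tools in the table).
SOURCE_WEIGHTS = {"ab_initio": 1.0, "protein": 5.0, "transcript": 10.0}


def pick_consensus(candidates):
    """Return the exon structure with the highest summed source weight.

    `candidates` is a list of (source, exons) pairs, where `exons` is a
    tuple of (start, end) coordinates proposed by that evidence source.
    Identical structures accumulate weight, so both the number of
    supporting predictions and their sources affect the score.
    """
    scores = defaultdict(float)
    for source, exons in candidates:
        scores[tuple(exons)] += SOURCE_WEIGHTS[source]
    # Pick the structure whose accumulated weight is largest.
    return max(scores, key=scores.get)


# Made-up candidate gene models for a single locus.
candidates = [
    ("ab_initio", ((100, 200), (300, 450))),
    ("ab_initio", ((100, 200), (300, 400))),
    ("protein", ((100, 200), (300, 400))),
    ("transcript", ((100, 210), (300, 400))),
]

best = pick_consensus(candidates)
print(best)
```

With these made-up weights, a single transcript alignment (weight 10) outscores one ab initio prediction plus one protein alignment that agree with each other (combined weight 6). Real combiners score at a finer grain than whole gene structures and use more elaborate models, so treat this only as a way to picture what "weighting the evidence" means.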