ACT (Artemis Comparison Tool) can disply pairwise comparisons between two or more DNA sequences.
To use ACT we will need three essential files: sequence 1, sequence 2, and the comparison file (eg blast output). Additionally we can load annotation file to show features in the sequences (eg. gff or embl etc).
I my case I wanted to compare a gene or part of scaffold with a haplotype contig
gene to contig
I will need to
extract gene sequence from gff
compare gene sequence with contig sequence, ie. blastn
modify the gene annotation to fit it’s sequence coordinate (eg. starting from 1)
load gene seq (sequence 1), blastn output (comparison file), and contig seq (sequence 2) to ACT
load gff annotations to sequence 1 and 2 to disply features
I have a bash script to do this (file_act_gene.sh). To load gff quicker the gff needs bgzip and tabix
#!/bin/bash
## usage: ./file_act_coord.sh [scaffold No] [start Coord] [end Coord] [haplo contig No]### get sequence of range
python alan_extract_fa.py Smansoni_v7.fa SM_V7_$1$2$3 > SM_V7_$1_$2-$3.fa
### blast fraction to contig
blastn -query SM_V7_$1_$2-$3.fa -subject SM_V7_${4}.fa -outfmt 6 > SM_V7_$1_$2-$3_$4.blast.out
### annotation for contig
grep SM_V7_$4 Sm_v7.2_Basic.gff | sort -k1,1 -k4,4n > $4.gff
bgzip $4.gff
tabix $4.gff.gz
### annotation for selected region #### pay attention to usage of variable in awkdeclare -i START=$2declare -i END=$3
cat Sm_v7.2_Basic.gff | awk '$1=="SM_V7_""'$1'"'| awk '$4>='$START' && $5<='$END''| awk '{print "SM_V7_""'$1'""_""'$2'""-""'$3'",$2,$3,$4-'$START'+1, $5-'$START'+1,$6,$7,$8,$9}'|awk '$4>0'|tr ' ''\t'|sort -k1,1 -k4,4n> SM_V7_$1_$2-$3_rel.gff
### load into ACT: SM_V7_$1_$2-$3.fa, SM_V7_${4}.fa, SM_V7_$1_$2-$3_$4.blast.out### read after: SM_V7_$1_$2-$3_rel.gff, $4.gff.gz
act-scaffold
Just another tip for Artemis, after loading the fasta file, use the menu “Read Entry” instead of “Read Entry into” to load gff. The latter option can cause an error.