This is a note to myself.
I would like to load a single gene into a genome browser (Artemis/JBrowse). Besides extracting the gene sequence, I also need to adjust the gene's coordinates in the GFF, because the extracted sequence now starts at position 1.
I have the annotation GFF for the whole genome (Sm_v7.2.gff) and I would like to extract gene Smp_093620 (the gene ID is passed to the script as an argument). Its features in the genome GFF are:
SM_V7_2 AUGUSTUS exon 344806 345499 . + . Parent=Smp_093620.1
SM_V7_2 AUGUSTUS five_prime_UTR 344806 345760 . + . Parent=Smp_093620.1
SM_V7_2 AUGUSTUS gene 344806 347235 0.11 + . ID=Smp_093620
SM_V7_2 AUGUSTUS mRNA 344806 347235 0.11 + . ID=Smp_093620.1;Parent=Smp_093620
SM_V7_2 AUGUSTUS exon 345758 347235 . + . Parent=Smp_093620.1
SM_V7_2 AUGUSTUS CDS 345761 347137 0.95 + 0 ID=Smp_093620.1.cds;Parent=Smp_093620.1
SM_V7_2 AUGUSTUS three_prime_UTR 347138 347235 . + . Parent=Smp_093620.1
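The shift applied to every feature is new = old - START + 1, where START is the start of the gene feature (344806 here). A minimal sketch of that arithmetic in bash, using the CDS coordinates from the lines above; the function name `rel_coord` is only an illustrative name, not part of the script in this note:

```shell
#!/bin/bash
# rel_coord is a hypothetical helper: shift a genome coordinate so that START maps to 1.
rel_coord() {
    local coord=$1 start=$2
    echo $(( coord - start + 1 ))
}

START=344806                 # start of the gene feature for Smp_093620
rel_coord 345761 "$START"    # CDS start, prints 956
rel_coord 347137 "$START"    # CDS end, prints 2332
```

The `+ 1` is there because GFF coordinates are 1-based, so the first base of the gene must become 1 rather than 0.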
This is the bash script “reset_gff_coord.sh”:
#!/bin/bash
## Usage: ./reset_gff_coord.sh GENE_ID   ($1 is the first bash argument, my gene ID)
## set START to the start coord (column 4) of the "gene" feature in the genome gff
START="$(grep "$1" Sm_v7.2.gff | awk -F'\t' '$3 == "gene" {print $4; exit}')"
## reset feature coordinates by offsetting $4 and $5 by START; the shell variable is passed
## to awk with -v, and tab-separated input/output (GFF is tab-delimited) keeps column 9 intact
grep "$1" Sm_v7.2.gff | awk -F'\t' -v OFS='\t' -v start="$START" '{$4 = $4 - start + 1; $5 = $5 - start + 1; print}' > "${1}_rel.gff"
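One thing to watch with `grep` on the gene ID is that it also matches longer IDs that merely begin with the same string. The following is a sketch of a stricter variant, not the script I actually used: it does everything in one awk call that reads the GFF twice. The function name `reset_coords` and the made-up neighbouring gene in the demo are assumptions for illustration, and it assumes a tab-delimited GFF with IDs of the form GENE, GENE.1, GENE.1.cds as in the example above.

```shell
#!/bin/bash
# reset_coords is a hypothetical function: print the features of one gene with
# coordinates shifted so that the gene starts at 1.
reset_coords() {
    local gene=$1 gff=$2
    # The GFF is given twice: pass 1 records the gene start, pass 2 shifts and prints.
    awk -F'\t' -v OFS='\t' -v gene="$gene" '
        NR == FNR {
            if ($3 == "gene" && index($9 ";", "ID=" gene ";")) start = $4
            next
        }
        start != "" && (index($9 ";", "=" gene ";") || index($9, "=" gene ".")) {
            $4 = $4 - start + 1
            $5 = $5 - start + 1
            print
        }
    ' "$gff" "$gff"
}

# Demo on two lines copied from the example plus one invented line, in a temporary file.
demo=$(mktemp)
printf 'SM_V7_2\tAUGUSTUS\tgene\t344806\t347235\t0.11\t+\t.\tID=Smp_093620\n' > "$demo"
printf 'SM_V7_2\tAUGUSTUS\tCDS\t345761\t347137\t0.95\t+\t0\tID=Smp_093620.1.cds;Parent=Smp_093620.1\n' >> "$demo"
# Invented neighbour whose ID only shares a prefix; it should be skipped.
printf 'SM_V7_2\tAUGUSTUS\tgene\t500\t900\t.\t+\t.\tID=Smp_0936201\n' >> "$demo"
reset_coords Smp_093620 "$demo"
rm -f "$demo"
```

On the demo input this should print the gene line with coordinates 1 and 2430 and the CDS line with 956 and 2332, and leave out the invented neighbour.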
So I ran ./reset_gff_coord.sh Smp_093620, which writes Smp_093620_rel.gff:
SM_V7_2 AUGUSTUS exon 1 694 . + . Parent=Smp_093620.1
SM_V7_2 AUGUSTUS five_prime_UTR 1 955 . + . Parent=Smp_093620.1
SM_V7_2 AUGUSTUS gene 1 2430 0.11 + . ID=Smp_093620
SM_V7_2 AUGUSTUS mRNA 1 2430 0.11 + . ID=Smp_093620.1;Parent=Smp_093620
SM_V7_2 AUGUSTUS exon 953 2430 . + . Parent=Smp_093620.1
SM_V7_2 AUGUSTUS CDS 956 2332 0.95 + 0 ID=Smp_093620.1.cds;Parent=Smp_093620.1
SM_V7_2 AUGUSTUS three_prime_UTR 2333 2430 . + . Parent=Smp_093620.1
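A quick sanity check on the result can catch an offset that went wrong (for example an empty START). This is only a sketch under my own assumptions: `check_rel_gff` is a hypothetical helper name, and the rules it tests (the gene feature starts at 1, no start below 1, no end before its start) are simply what I would expect from a correctly shifted, tab-delimited file.

```shell
#!/bin/bash
# check_rel_gff is a hypothetical helper: succeed only if the gene feature starts at 1
# and no feature has a start below 1 or an end before its start.
check_rel_gff() {
    awk -F'\t' '
        $3 == "gene" && $4 == 1 { gene_ok = 1 }
        $4 < 1 || $5 < $4       { bad = 1 }
        END { exit !(gene_ok && !bad) }
    ' "$1"
}

# Demo on two lines of the shifted output, written to a temporary file.
demo=$(mktemp)
printf 'SM_V7_2\tAUGUSTUS\tgene\t1\t2430\t0.11\t+\t.\tID=Smp_093620\n' > "$demo"
printf 'SM_V7_2\tAUGUSTUS\tCDS\t956\t2332\t0.95\t+\t0\tID=Smp_093620.1.cds;Parent=Smp_093620.1\n' >> "$demo"
if check_rel_gff "$demo"; then
    echo "coordinates look relative"
else
    echo "coordinates look wrong"
fi
rm -f "$demo"
```

On the real output it would be called as `check_rel_gff Smp_093620_rel.gff`.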