Bash variable and argument

Set a bash variable from the output of a command, then use it together with a script argument in later processing.

This is a note to myself.

I would like to load a single gene into a genome browser (Artemis/JBrowse). Besides extracting the gene sequence, I also need to modify the coordinates of the gene in the gff, because the extracted sequence now starts from 1.

I have the annotation gff for the whole genome (Sm_v7.2.gff) and I would like to extract gene Smp_093620 (provided as an argument to the script). These are its features in the genome gff:

SM_V7_2	AUGUSTUS	exon	344806	345499	.	+	.	Parent=Smp_093620.1
SM_V7_2	AUGUSTUS	five_prime_UTR	344806	345760	.	+	.	Parent=Smp_093620.1
SM_V7_2	AUGUSTUS	gene	344806	347235	0.11	+	.	ID=Smp_093620
SM_V7_2	AUGUSTUS	mRNA	344806	347235	0.11	+	.	ID=Smp_093620.1;Parent=Smp_093620
SM_V7_2	AUGUSTUS	exon	345758	347235	.	+	.	Parent=Smp_093620.1
SM_V7_2	AUGUSTUS	CDS	345761	347137	0.95	+	0	ID=Smp_093620.1.cds;Parent=Smp_093620.1
SM_V7_2	AUGUSTUS	three_prime_UTR	347138	347235	.	+	.	Parent=Smp_093620.1
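
The first step is to capture the gene start in a shell variable. Here is a minimal sketch of command substitution, assuming two lines from the excerpt above are written to a temporary file (the temporary file and the hard-coded CDS start are for illustration only):

```shell
# Two sample records copied from the excerpt above, written to a temporary file
tmp=$(mktemp)
printf 'SM_V7_2\tAUGUSTUS\tgene\t344806\t347235\t0.11\t+\t.\tID=Smp_093620\n' > "$tmp"
printf 'SM_V7_2\tAUGUSTUS\tCDS\t345761\t347137\t0.95\t+\t0\tID=Smp_093620.1.cds;Parent=Smp_093620.1\n' >> "$tmp"

# Command substitution: run the command and store its output in START
START=$(awk -F'\t' '$3 == "gene" {print $4}' "$tmp")
echo "gene start: $START"

# Shift the CDS start so that the gene begins at 1: new = old - START + 1
echo "CDS start relative to gene: $(( 345761 - START + 1 ))"

rm -f "$tmp"
```

With these two records the gene start is 344806, and the CDS start 345761 becomes 956.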

This is the bash script “reset_gff_coord.sh”:

#!/bin/bash
## set the gene start coord in the genome gff as START; $1 is the first bash argument (my gene ID)
START="$(grep "$1" Sm_v7.2.gff | awk -F'\t' '$3 == "gene" {print $4}')"
## reset feature coordinates by subtracting START from $4 and $5 and adding 1; the shell variable is passed into awk with -v, and tab stays the field separator so $9 is kept intact
grep "$1" Sm_v7.2.gff | awk -v start="$START" 'BEGIN {FS = OFS = "\t"} {$4 = $4 - start + 1; $5 = $5 - start + 1; print}' > "${1}_rel.gff"
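
The point to watch is how the shell variable reaches awk. The awk program sits in single quotes, so the shell does not expand $START inside it. A minimal sketch of two common ways around this, with the gene start hard-coded for illustration:

```shell
START=344806   # gene start from the example, hard-coded for illustration

# 1) Close the single quotes around the variable so the shell expands it
echo "345761" | awk '{print $1 - '"$START"' + 1}'

# 2) Pass the value with -v; the awk program stays fully quoted
echo "345761" | awk -v start="$START" '{print $1 - start + 1}'
```

Both commands print 956. The -v form is usually easier to read and does not break if the value contains characters that awk would interpret as code.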

So I ran ./reset_gff_coord.sh Smp_093620, which wrote the following to Smp_093620_rel.gff:

SM_V7_2	AUGUSTUS	exon	1	694	.	+	.	Parent=Smp_093620.1
SM_V7_2	AUGUSTUS	five_prime_UTR	1	955	.	+	.	Parent=Smp_093620.1
SM_V7_2	AUGUSTUS	gene	1	2430	0.11	+	.	ID=Smp_093620
SM_V7_2	AUGUSTUS	mRNA	1	2430	0.11	+	.	ID=Smp_093620.1;Parent=Smp_093620
SM_V7_2	AUGUSTUS	exon	953	2430	.	+	.	Parent=Smp_093620.1
SM_V7_2	AUGUSTUS	CDS	956	2332	0.95	+	0	ID=Smp_093620.1.cds;Parent=Smp_093620.1
SM_V7_2	AUGUSTUS	three_prime_UTR	2333	2430	.	+	.	Parent=Smp_093620.1
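
A quick sanity check is that shifting must not change feature lengths. The following is only a sketch, assuming the absolute and relative records are in the same order; the two temporary files stand in for the real input and output gff:

```shell
# Hypothetical check: end - start must be the same before and after the shift
abs=$(mktemp); rel=$(mktemp)
printf 'SM_V7_2\tAUGUSTUS\tgene\t344806\t347235\t0.11\t+\t.\tID=Smp_093620\n' > "$abs"
printf 'SM_V7_2\tAUGUSTUS\tgene\t1\t2430\t0.11\t+\t.\tID=Smp_093620\n' > "$rel"

# paste joins the records side by side with a tab:
# columns 4-5 hold the absolute coordinates, columns 13-14 the relative ones
mismatches=$(paste "$abs" "$rel" | awk -F'\t' '($5 - $4) != ($14 - $13)' | wc -l)
echo "mismatching features: $((mismatches))"

rm -f "$abs" "$rel"
```

For the gene line above, both spans cover 2430 bases, so no mismatch is reported.
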
Z. Lu
Computer biologist, amateur photographer, vintage fan and web lover.