After searching the web, here are a few methods:
samtools
samtools faidx [FASTA file] [Contig]:[Start]-[End]
Region coordinates are 1-based and inclusive.
e.g., samtools faidx ~/Smansoni/V7/Smansoni_v7.fa SM_V7_1:1-10
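If you reuse samtools-style region strings in your own Python code, remember that they are 1-based and end-inclusive, while Python slices are 0-based and half-open. Below is a minimal sketch of that conversion. The function name region_to_slice is hypothetical, and it only accepts the full contig:start-end form (samtools also accepts a bare contig or contig:start, which this sketch does not handle).

```python
import re

# contig:start-end, e.g. "SM_V7_1:1-10". The contig part is greedy so that
# names containing ':' still split at the last colon.
_REGION_RE = re.compile(r"^(?P<contig>.+):(?P<start>\d+)-(?P<end>\d+)$")


def region_to_slice(region):
    """Turn a 1-based, end-inclusive region string into a 0-based,
    half-open (contig, start, end) tuple for Python slicing."""
    match = _REGION_RE.match(region)
    if match is None:
        raise ValueError(f"expected contig:start-end, got {region!r}")
    start, end = int(match.group("start")), int(match.group("end"))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {region!r}")
    # 1-based [start, end] covers the same bases as 0-based [start - 1, end)
    return match.group("contig"), start - 1, end


contig, start, end = region_to_slice("SM_V7_1:1-10")
print(contig, start, end)  # SM_V7_1 0 10
```

So the samtools example above corresponds to sequence[0:10] in Python.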
Similar tools in Python are pyfaidx and pyfasta.
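pyfaidx reads sequences through a samtools-compatible .fai index instead of loading the whole genome into memory. To show what such a lookup returns, here is a minimal standard-library sketch that simply loads every record into a dict and fetches a 1-based, inclusive region the same way samtools faidx does. The function names read_fasta and fetch are hypothetical, and taking the record name as the first word of the header is an assumption of this sketch. For a real genome, prefer pyfaidx or pyfasta.

```python
import io


def read_fasta(handle):
    """Load every FASTA record into a {name: sequence} dict.

    Assumption: the record name is the first whitespace-separated word
    of the header line; the rest of the header is ignored.
    """
    records = {}
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)  # sequence lines may be wrapped
    if name is not None:
        records[name] = "".join(chunks)
    return records


def fetch(records, contig, start, end):
    """Return bases start..end of contig (1-based, inclusive)."""
    return records[contig][start - 1:end]


example = io.StringIO(">SM_V7_1 chromosome 1\nACGTACGTAC\nGGTTCCAA\n>SM_V7_2\nTTTT\n")
genome = read_fasta(example)
print(fetch(genome, "SM_V7_1", 1, 10))  # ACGTACGTAC
print(fetch(genome, "SM_V7_1", 9, 12))  # ACGG (spans the wrapped line)
```

Because the wrapped lines are joined on load, regions that cross a line break come out as one contiguous string.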
Using BioMart
Python script
There is a Python script available for this purpose by peterthorpe5: https://github.com/peterthorpe5/public_scripts/blob/master/genomic_upstream_regions/get_upstream_regions.py
We can also write our own Python script using Biopython's Seq module together with the gffutils and pyfasta packages. It needs a GFF3 annotation file and a genome FASTA file, and can extract upstream regions based on either the gene or the mRNA feature.
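The key detail in such a script is strand handling: for a feature on the + strand the upstream region lies before its start, while for a feature on the - strand it lies after its end and must be reverse-complemented. The sketch below is not the script referred to in this answer. To stay self-contained it avoids gffutils, pyfasta and Biopython, and uses a plain GFF3 line parser plus a dict of contig sequences (a hypothetical in-memory stand-in for a pyfasta or pyfaidx object). The helper names are hypothetical, features with strand "." or "?" are treated as +, and percent-escaped GFF3 attribute values are not decoded. Pass feature_type="mRNA" instead of "gene" to work from transcripts.

```python
import io

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T/N only in this sketch)."""
    return seq.translate(_COMPLEMENT)[::-1]


def iter_features(gff_handle, feature_type="gene"):
    """Yield (seqid, start, end, strand, attributes) for one GFF3 feature type.

    GFF3 columns are tab-separated; start and end are 1-based and inclusive.
    """
    for line in gff_handle:
        if line.startswith("##FASTA"):
            break  # embedded sequence section follows: no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        yield cols[0], int(cols[3]), int(cols[4]), cols[6], attrs


def upstream_region(genome, seqid, start, end, strand, length):
    """Return `length` bases upstream of a feature, read 5'->3' on its strand.

    + strand: 1-based bases start-length .. start-1
    - strand: 1-based bases end+1 .. end+length, reverse-complemented
    Regions are clipped at contig ends, so they may be shorter than `length`.
    """
    contig = genome[seqid]
    if strand == "-":
        # 1-based end+1 .. end+length == 0-based slice [end, end + length)
        return reverse_complement(contig[end:end + length])
    # 1-based start-length .. start-1 == 0-based slice [start-1-length, start-1)
    return contig[max(0, start - 1 - length):start - 1]


genome = {"chr1": "AACCGGTTACGTAAAC"}
gff = io.StringIO(
    "##gff-version 3\n"
    "chr1\texample\tgene\t9\t12\t.\t+\t.\tID=geneA\n"
    "chr1\texample\tgene\t3\t6\t.\t-\t.\tID=geneB\n"
)
results = {}
for seqid, start, end, strand, attrs in iter_features(gff, "gene"):
    results[attrs["ID"]] = upstream_region(genome, seqid, start, end, strand, 4)
    print(attrs["ID"], results[attrs["ID"]])
# geneA GGTT
# geneB GTAA
```

For a real genome, the dict lookup would be replaced by an indexed FASTA reader so that only the requested slices are read from disk.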
The full script, including its core part, is available on GitHub.