454 reads often have unequal lengths, and I got an error when I ran STAR on them. I had some success with hisat2.
The STAR error looks like this:
EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length
@SRR22XXXX.85715
TTAATGGTTGTCGTATGATATTTGTTACGATAATGAGGCTTTTTGTGATAGAAATATCATTAATGTTAATAATTGTAGGTGTAGAGATAAAGGAGG...
SOLUTION: fix your fastq file
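The message says that at least one record has a quality string whose length differs from its sequence. Here is a minimal sketch for listing such records, assuming plain four-line FASTQ records (no wrapped sequence lines). The function name check_fastq_lengths is hypothetical:

```shell
# check_fastq_lengths FILE
# Print "read_id seq_len qual_len" for every record whose sequence and
# quality strings differ in length. Assumes exactly 4 lines per record.
check_fastq_lengths() {
  awk 'NR % 4 == 1 { id = $0 }               # header line (@...)
       NR % 4 == 2 { seqlen = length($0) }   # sequence line
       NR % 4 == 0 { if (length($0) != seqlen) print id, seqlen, length($0) }' "$1"
}
```

Running `check_fastq_lengths reads.fastq` (the file name is a placeholder) prints nothing when every record is consistent.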
Inspired by my colleague’s work, I tried hisat2:
First, download the source code, decompress it, and run make.
Similar to other aligners, you need to build the genome index first:
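A rough sketch of the index build and the alignment step, not necessarily the exact commands I ran. The function names and their argument order are hypothetical wrappers, and it assumes hisat2-build and hisat2 are on your PATH:

```shell
# build_index GENOME_FASTA INDEX_PREFIX [THREADS]
# Builds the HISAT2 index files (INDEX_PREFIX.1.ht2, ...) from a reference FASTA.
build_index() {
  hisat2-build -p "${3:-1}" "$1" "$2"
}

# align_unpaired INDEX_PREFIX READS_FASTQ OUT_SAM [THREADS]
# Aligns single-end FASTQ reads against the index and writes SAM output.
align_unpaired() {
  hisat2 -p "${4:-1}" -q -x "$1" -U "$2" -S "$3"
}
```

For example, `build_index genome.fa genome_index 4` followed by `align_unpaired genome_index reads.fastq outfile.sam 4`. The file names here are placeholders.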
In the hisat2 alignment command, as explained in the manual, -p sets the number of threads, -U lists the files containing unpaired reads to be aligned, -q indicates FASTQ input, and -S names the SAM output file.
The mapping is also pretty fast.
For counting the mapped reads, it’s common to use htseq-count, but it is slow. I tried another tool “featureCounts”, which also uses a gtf annotation:
featureCounts -t exon -g gene_id -a annotation.gtf -o counts_out.txt outfile.sam
For reversely stranded libraries, you can add “-s 2”.
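For downstream tools that only need gene IDs and counts, the output table can be trimmed. This is a small sketch assuming the usual featureCounts layout: a leading “#” comment line, a header row starting with “Geneid”, and the first sample's counts in column 7. The function name simplify_counts is hypothetical:

```shell
# simplify_counts COUNTS_FILE
# Print "gene_id<TAB>count" for the first sample column of a featureCounts table,
# skipping the "#" comment line and the "Geneid" header row.
simplify_counts() {
  awk -F '\t' '!/^#/ && $1 != "Geneid" { print $1 "\t" $7 }' "$1"
}
```

For example, `simplify_counts counts_out.txt > gene_counts.tsv`.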
The counting is super fast: in under 1 min you can get your count file for downstream analyses, so there is no need to submit a job to a cluster. Below is the hilarious output interface: