Map Roche 454 reads to a reference genome

Because 454 reads often have unequal read lengths, I got an error using STAR. I got some success when using hisat2.

The STAR error looks like this:

EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length
@SRR22XXXX.85715
TTAATGGTTGTCGTATGATATTTGTTACGATAATGAGGCTTTTTGTGATAGAAATATCATTAATGTTAATAATTGTAGGTGTAGAGATAAAGGAGG...

SOLUTION: fix your fastq file

Inspired by my colleague’s work, I tried hisat2:

First download the source code, decompress and make.

Similar to other aligners, you need to build the genome index first:

hisat2-build -f genome.fasta index_name

It takes about 26 min for a 360M genome.

Then the run command is like this:

bin/software/hisat2-2.1.0/hisat2 -p 8 hisat2-index/index_name -U reads2map.fastq -q -S outfile.sam

Where as explained in the manual p is for threads, -U indicates files containing unpaired reads to be aligned, -q for fastq input, and -S for sam output.

The mapping is also pretty fast.

For counting the mapped reads, it’s common to use htseq-count, but it is slow. I tried another tool “featureCounts", which also uses a gtf annotation:

featureCounts -t exon -g gene_id -a annotation.gtf -o counts_out.txt outfile.sam

You can also use “-s 2” for reversely stranded.

The counting is super fast, within < 1min you can get your count file for downstream analyses. So there is no need to submit a job to a cluster. Below is the hilarious output interface:


        ==========     _____ _    _ ____  _____  ______          _____
        =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \
          =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
            ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
              ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
        ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
	v1.4.5-p1

//========================== featureCounts setting ====================\\
||                                                                     ||
||             Input files : 1 SAM file                                ||
||                           S SRR2245469egg-v7.sam                    ||
||                                                                     ||
||             Output file : Eggcounts-featureCounts_s1.txt            ||
||             Annotations : V7_r7.gtf				       ||
||                                                                     ||
||                 Threads : 1                                         ||
||                   Level : meta-feature level                        ||
||              Paired-end : no                                        ||
||         Strand specific : yes                                       ||
||      Multimapping reads : not counted                               ||
|| Multi-overlapping reads : not counted                               ||
||                                                                     ||
\\===================== http://subread.sourceforge.net/ ===============//
Z. Lu avatar
Z. Lu
Data scientist, bioinformatician, retro fan and web lover.
comments powered by Disqus