Structure
Primary
- promoters
- intron-exon junctions
- 5' and 3' UTRs
- polyA site
Secondary
Tertiary
RNAseq can provide
- 5' TSS
- 5' UTR
- exon-intron boundaries
- 3' UTR
- polyA site
- alternative usage of any of above
Fusion genes: cytogenetic derangements; genomic amplification; translocations; deletions
long non-coding RNAs (lncRNAs)
- longer than 200 nt
- do not overlap protein-coding exons
- can control transcription as enhancers (eRNA) or competitors (ceRNA), or may be transcriptional noise
small non-coding RNAs
- miRNA (21-23 nt)
- piRNA
- endo-siRNA
- snoRNA
- snRNA
- tRNA (73-93 nt)
- moRNA
- eRNA
Reads coverage:
- Illumina for coverage
- SOLiD for accuracy
- Roche 454 / PacBio for length
Pre-processing: remove low-quality bases and artifacts, incl. adapters and library-construct sequences
QC
- base quality: filter low-quality bases (Trimmomatic / FASTX-Toolkit / PRINSEQ)
- ambiguous bases
- adapters (TagCleaner / CutAdapt)
- read length
- sequence-specific bias
- GC-content
- duplicates (removal not recommended for DGE)
- sequence contamination
- low-complexity sequences / polyA tails
mapping stats: samtools or RSeQC
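The base-quality and GC-content checks listed under QC can be sketched in a few lines. This is only an illustration, assuming Phred+33 encoded qualities and a simple 3'-end trim; the function names and the threshold of 20 are hypothetical, and tools such as Trimmomatic use more elaborate strategies (e.g. sliding windows).

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(seq, qual, min_q=20):
    """Remove bases from the 3' end while their quality is below min_q."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def gc_content(seq):
    """Fraction of G/C bases in a sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GCgc" for base in seq) / len(seq)


# Toy read: 'I' encodes Q40, '#' encodes Q2.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIIII##")
trimmed_gc = gc_content(trimmed_seq)
```

Reads that become too short after trimming would normally be discarded, which corresponds to the read length check above.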
de novo assembly
differs from genome assembly; de novo assemblers use de Bruijn graphs
- mapping based assembly: Cufflinks and Scripture
- de novo assembly: Velvet + Oases; Trinity (Inchworm-Chrysalis-Butterfly)
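To make the de Bruijn graph idea concrete, here is a minimal sketch of graph construction from reads. It assumes error-free reads and a hypothetical k of 4; real assemblers such as Velvet or Trinity add error correction, coverage tracking and graph simplification, none of which is shown.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, each k-mer is an edge
    from its prefix (k-1)-mer to its suffix (k-1)-mer."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Two overlapping toy reads; the shared k-mer GGCG appears as a repeated edge.
reads = ["ATGGCG", "GGCGTA"]
graph = de_bruijn_graph(reads, k=4)
```

Repeated edges reflect k-mer coverage. In a transcriptome, branching nodes can correspond to alternative isoforms rather than to errors, which is one reason the problem differs from genome assembly.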
Read mapping
- reads per gene: htseq-count / Qualimap / Bedtools / Cufflinks (tools differ in how they handle multi-mapping reads)
- reads per transcript: Expectation Maximization (EM) Approach: Cufflinks / eXpress
- reads per exon: DEXSeq
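The EM approach to per-transcript counts can be illustrated with a toy example. The sketch below assumes every read is described only by the set of transcripts it is compatible with, and it ignores transcript length and fragment bias; it is not how Cufflinks or eXpress are implemented, and all names are hypothetical.

```python
def em_abundance(read_compat, transcripts, n_iter=100):
    """Estimate the fraction of reads originating from each transcript.

    read_compat: list of sets, one per read, naming compatible transcripts.
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        # E-step: split each read among its compatible transcripts
        # in proportion to the current abundance estimates.
        counts = {t: 0.0 for t in transcripts}
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: re-estimate abundances from the fractional counts.
        n_reads = len(read_compat)
        theta = {t: counts[t] / n_reads for t in transcripts}
    return theta


# 3 reads unique to A, 1 unique to B, 4 ambiguous between A and B.
toy_reads = [{"A"}] * 3 + [{"B"}] + [{"A", "B"}] * 4
theta = em_abundance(toy_reads, ["A", "B"])
```

In this toy case the ambiguous reads end up split 3:1, following the evidence from the uniquely assigned reads.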
DEXSeq
- input from GTF + bam/sam
- counts per exon (from script) -> table in R
- normalise by estimating size factor
- estimate exon-specific dispersion values
- test for differential exon usage
- can be used for alternative splicing prediction (ExonCountSet)
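The size-factor step can be sketched with the median-of-ratios method introduced by DESeq. Treat this as an assumption about what "estimating size factor" refers to; the function below is a plain-Python illustration with hypothetical names, not the DEXSeq code.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per exon or gene), each a list of per-sample counts.
    Rows containing a zero are skipped because their geometric mean is zero.
    """
    n_samples = len(counts[0])
    log_rows = [[math.log(c) for c in row]
                for row in counts if all(c > 0 for c in row)]
    # Log of the geometric mean of each row across samples.
    log_geo_means = [sum(row) / n_samples for row in log_rows]
    factors = []
    for j in range(n_samples):
        log_ratios = [row[j] - g for row, g in zip(log_rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
sf = size_factors(toy_counts)
```

Dividing each sample's counts by its size factor puts the samples on a common scale before dispersion estimation.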
DE analysis
expression of the same gene across different cells follows a log-normal distribution (qPCR)
variability between individuals: negative binomial distribution (DESeq / edgeR)
zero inflation is hard to fit with a negative binomial model
tweeDEseq: Poisson-Tweedie family
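The reason for the negative binomial model is overdispersion: replicate counts vary more than a Poisson model (variance equal to the mean) allows. A rough sketch, assuming the common parameterisation variance = mean + dispersion * mean^2 and a simple method-of-moments estimate; DESeq and edgeR use more careful estimators that share information across genes.

```python
from statistics import mean, variance


def nb_variance(mu, dispersion):
    """Variance of a negative binomial with mean mu and the given dispersion."""
    return mu + dispersion * mu ** 2


def moment_dispersion(counts):
    """Method-of-moments dispersion estimate from replicate counts.

    Returns 0.0 when the sample variance does not exceed the mean
    (no evidence of overdispersion beyond Poisson).
    """
    mu = mean(counts)
    var = variance(counts)  # sample variance (n - 1 denominator)
    if mu == 0:
        return 0.0
    return max(0.0, (var - mu) / mu ** 2)


# Hypothetical counts for one gene across five individuals.
replicates = [10, 30, 20, 60, 5]
disp = moment_dispersion(replicates)
```

For these counts the sample variance (475) far exceeds the mean (25), which a Poisson model cannot accommodate.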
Normalisation
- RPKM/FPKM
- TPM
- TMM
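RPKM and TPM differ only in the order of the two scaling steps, which a short sketch makes visible. The counts and lengths below are made up; TMM is not shown because it needs a reference sample and trimming of extreme log-ratios.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalise first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    rate_sum = sum(rates)
    return [r * 1e6 / rate_sum for r in rates]


# Three hypothetical genes whose counts are proportional to their lengths.
toy_counts = [100, 200, 700]
toy_lengths = [1000, 2000, 7000]
rpkm_values = rpkm(toy_counts, toy_lengths)
tpm_values = tpm(toy_counts, toy_lengths)
```

TPM values always sum to one million within a sample, which makes them easier to compare across samples than RPKM/FPKM.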
small ncRNAs
miRNA
- miRDeep2
- miRanalyzer
miRNA target (SVMs)
complementarity of the first 7-8 nt of the miRNA (seed) to the mRNA
thermodynamic stability; position of particular GC/AU matches
- TargetScan
- DIANA-microT
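The seed-complementarity criterion can be sketched as a plain string search. This assumes a 7-nt seed at miRNA positions 2-8 (a common convention, not stated in these notes) and perfect Watson-Crick pairing; real predictors such as TargetScan or DIANA-microT also score site context, stability and conservation. The sequences and function names are illustrative.

```python
def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence (A/U/G/C only)."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq))


def seed_sites(mirna, utr, seed_start=1, seed_len=7):
    """0-based positions in the UTR that perfectly match the miRNA seed.

    seed_start=1, seed_len=7 selects miRNA nucleotides 2-8 (1-based).
    """
    seed = mirna[seed_start:seed_start + seed_len]
    site = reverse_complement_rna(seed)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


# Toy miRNA (5'->3') and toy 3' UTR fragment (5'->3').
toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAAACUACCUCAGGG"
sites = seed_sites(toy_mirna, toy_utr)
```

The site is searched as the reverse complement of the seed because the miRNA and the mRNA pair in antiparallel orientation.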
Databases:
- microRNA.org
- miRBase
- piRNABank
- Rfam
- miRGator
- mirWIP
- TarBase
- miRTarBase
- RNAmmer
tRNA prediction
- tRNAscan-SE