exomePeak2: README.md

2023-08-10 18:28| 来源: 网络整理| 查看: 265

In exomePeak2: Bias Awared Peak Calling and Quantification for MeRIP-Seq The exomePeak2 user's guide

2019-12-20

Introduction

exomePeak2 provides bias awared quantification and peak detection on Methylated RNA immunoprecipitation sequencing data (MeRIP-Seq). MeRIP-Seq is a commonly applied sequencing technology to measure the transcriptome-wide location and abundance of RNA modification sites under a given cellular condition. However, the quantification and peak calling in MeRIP-Seq are sensitive to PCR amplification bias which is prevalent in next generation sequencing (NGS) techniques. In addition, the RNA-Seq based count data exhibits biological variation in small reads count. exomePeak2 collectively address these challanges by introducing a rich set of robust data science models tailored for MeRIP-Seq. With exomePeak2, users can perform peak calling, modification site quantification, and differential analysis with a straightforward one-step function. Alternatively, users could customize methods for their own analysis through multi-step functions and diagnostic plots.

Package Installation

To install exomePeak2 from Github, use the following codes.

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install(c("SummarizedExperiment","cqn","Rsamtools", "GenomicAlignments","GenomicRanges","GenomicFeatures", "DESeq2","ggplot2","mclust", "genefilter","BSgenome","BiocParallel", "IRanges","S4Vectors","quantreg", "reshape2","rtracklayer","apeglm","RMariaDB")) if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools") devtools::install_github("ZW-xjtlu/exomePeak2") Peak Calling

For peak calling of MeRIP-Seq experiment, exomePeak2 demands the reads alignment results in BAM files. Users can specify the biological replicates of the IP and input samples by a character vector of the corresponding BAM directories at the arguments bam_ip and bam_input separately.

In the following example, the transcript annotation is provided using GFF files. Transcript annotation can also be provided by the TxDb object. exomePeak2 will automatically download the TxDb if the genome argument is filled with the corresponding UCSC genome name.

The genome sequence is required to conduct GC content bias correction. If the genome argument is missing ( = NULL ), exomPeak2 will perform peak calling without correcting the GC content bias.

library(exomePeak2) GENE_ANNO_GTF = system.file("extdata", "example.gtf", package="exomePeak2") f1 = system.file("extdata", "IP1.bam", package="exomePeak2") f2 = system.file("extdata", "IP2.bam", package="exomePeak2") f3 = system.file("extdata", "IP3.bam", package="exomePeak2") f4 = system.file("extdata", "IP4.bam", package="exomePeak2") IP_BAM = c(f1,f2,f3,f4) f1 = system.file("extdata", "Input1.bam", package="exomePeak2") f2 = system.file("extdata", "Input2.bam", package="exomePeak2") f3 = system.file("extdata", "Input3.bam", package="exomePeak2") INPUT_BAM = c(f1,f2,f3) exomePeak2(bam_ip = IP_BAM, bam_input = INPUT_BAM, gff_dir = GENE_ANNO_GTF, genome = "hg19", paired_end = FALSE) ## class: SummarizedExomePeak ## dim: 31 7 ## metadata(0): ## assays(2): counts GCsizeFactors ## rownames(31): peak_11 peak_13 ... control_13 control_14 ## rowData names(2): GC_content feature_length ## colnames(7): IP1.bam IP2.bam ... Input2.bam Input3.bam ## colData names(3): design_IP design_Treatment sizeFactor

exomePeak2 will export the modification peaks in formats of BED file and CSV table, the data will be saved automatically under a folder named by exomePeak2_output.

The modification peak statistics are derived from the βi, 1 terms in the following linear regression design.

log2(Qi, j) = βi, 0 + βi, 1I(ρ(j) = IP)+ti, j

Qi, j is the expected value of reads abundence of modification i under sample j. βi, 0 is the intercept coefficient, βi, 1 is the coefficient for IP/input log2 fold change, I(ρ(j)=IP) is the regression covariate that is the indicator variable for the sample j being IP sample. ti, j is the regression offset that account for the sequencing depth variation and the GC content biases.

Under the default settings, the linear models fitted are the regularized GLM (Generalized Linear Model) of NB developed by DESeq2. If one of the IP and input group has no biological replicates, Poisson GLMs will be fitted to the modification peaks.

Explaination over the columns of the exported table:

chr: the chromosomal name of the peak. chromStart: the start of the peak on the chromosome. chromEnd: the end of the peak on the chromosome. name: the unique ID of the modification peak. score: the -log2 p value of the peak. strand: the strand of the peak on genome. thickStart: the start position of the peak. thickEnd: the end position of the peak. itemRgb: the column for the RGB encoded color in BED file visualization. blockCount: the block (exon) number within the peak. blockSizes: the widths of blocks. blockStarts: the start positions of blocks. geneID: the gene ID of the peak. ReadsCount.input: the reads count of the input sample. ReadsCount.IP: the reads count of the IP sample. log2FoldChange: the estimates of IP over input log2 fold enrichment (coefficient estimates of βi, 1). pvalue: the Wald test p value on the modification coefficient. padj: the adjusted Wald test p value using BH approach. Differential Modification Analysis

The code below could conduct differential modification analysis (Comparison of Two Conditions) on exon regions defined by the transcript annotation.

In differential modification mode, exomePeak2 will first perform Peak calling on exon regions using both the control and treated samples. Then, it will conduct the differential modification analysis on peaks reported from peak calling using an interactive GLM.

f1 = system.file("extdata", "treated_IP1.bam", package="exomePeak2") TREATED_IP_BAM = c(f1) f1 = system.file("extdata", "treated_Input1.bam", package="exomePeak2") TREATED_INPUT_BAM = c(f1) exomePeak2(bam_ip = IP_BAM, bam_input = INPUT_BAM, bam_treated_input = TREATED_INPUT_BAM, bam_treated_ip = TREATED_IP_BAM, gff_dir = GENE_ANNO_GTF, genome = "hg19", paired_end = FALSE) ## class: SummarizedExomePeak ## dim: 23 9 ## metadata(0): ## assays(2): counts GCsizeFactors ## rownames(23): peak_10 peak_11 ... control_5 control_6 ## rowData names(2): GC_content feature_length ## colnames(9): IP1.bam IP2.bam ... treated_IP1.bam ## treated_Input1.bam ## colData names(3): design_IP design_Treatment sizeFactor

In differential modification mode, exomePeak2 will export the differential modification peaks in formats of BED file and CSV table, the data will also be saved automatically under a folder named by exomePeak2_output.

The peak statistics in differential modification setting are derived from the interactive coefficient βi, 3 in the following regression design of the NB GLM:

log2(Qi, j) = βi, 0 + βi, 1I(ρ(j)=IP)+βi, 2I(ρ(j) = Treatment)+βi, 3I(ρ(j) = IP&Treatment)+ti, j

Explaination for the additional table columns:

ModLog2FC_control: the modification log2 fold enrichment in the control condition. ModLog2FC_treated: the modification log2 fold enrichment in the treatment condition. DiffModLog2FC: the log2 Fold Change estimates of differential modification (coefficient estimates of βi, 3). pvalue: the Wald test p value on the differential modification coefficient. padj: the adjusted Wald test p value using BH approach. Quantification and Statistical Analysis with Single Based Modification Annotation

exomePeak2 supports the modification quantification and differential modification analysis on single based modification annotation. The modification sites with single based resolution can provide a more accurate mapping of modification locations compared with the peaks called directly from the MeRIP-seq datasets.

Some of the datasets in epitranscriptomics have single based resolution, e.x. Data generated by the m6A-CLIP-Seq or m6A-miCLIP-Seq techniques. Reads count on the single based modification sites could provide a more accurate and consistent quantification on MeRIP-Seq experiments with single based annotation.

exomePeak2 will automatically initiate the mode of single based modification quantification by providing a sigle based annotation file under the argument mod_annot.

The single based annotation information should be provided to the exomePeak2 function in the format of a GRanges object.

f2 = system.file("extdata", "mod_annot.rds", package="exomePeak2") MOD_ANNO_GRANGE

【本文地址】

公司简介

联系我们

今日新闻

推荐新闻

专题文章