This version has currently been tested on Linux distributions only.
What is Map2Peak?
Map2Peak is an ultrafast peak-calling bioinformatics tool that combines read alignment and peak calling in a single process. Map2Peak focuses only on genomic regions that are likely to contain Transcription Factor Binding Sites (TFBS). By focusing on such regions, Map2Peak can quickly discard the majority of background reads from the analysis, which speeds up the overall process. In addition, Map2Peak does not discard multi-mappable reads; it allocates some of them to a unique mapping location. This ability to use multi-mappable reads enhances Map2Peak's capability to identify TFBS in repeat elements of the genome.
The input files required by Map2Peak:
- ChIP-Seq read file (FASTQ)
- Control read file (FASTQ)
- Bowtie index of the reference genome
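The inputs listed above could be verified before starting a long run. The following is a minimal sketch, not part of Map2Peak: the function name `check_map2peak_inputs` is hypothetical, and it assumes a standard (small) Bowtie 1 index, which is stored as six `.ebwt` files sharing one basename.

```shell
# Hypothetical pre-flight helper; it is NOT part of Map2Peak.
# Usage: check_map2peak_inputs CHIP_FASTQ CONTROL_FASTQ INDEX_BASENAME
check_map2peak_inputs() {
    local chip="$1" control="$2" index="$3"
    local status=0 f suffix

    # Both read files must exist and be non-empty.
    for f in "$chip" "$control"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty read file: $f" >&2
            status=1
        fi
    done

    # Assumption: a standard Bowtie 1 index, i.e. six .ebwt files
    # named INDEX_BASENAME.<suffix>.ebwt.
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -e "$index.$suffix.ebwt" ]; then
            echo "missing index file: $index.$suffix.ebwt" >&2
            status=1
        fi
    done

    return "$status"
}
```

The function returns 0 only when every file is present, so it can guard a run, for example "check_map2peak_inputs chip.fastq control.fastq directory/hg19 && Map2Peak ...".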
The output files are:
- BED file containing peak locations
- Alignment file
Map2Peak is particularly useful to researchers whose goal is to quickly identify TFBS and who do not require the read alignment file.
You may download the source files from the downloads section.
To build from source:
- extract the source files
- go to the source files folder
- run make
- run make install
$ sudo make install
To use Map2Peak, an understanding of Bowtie parameters is necessary because Map2Peak uses Bowtie for read alignment. For read alignment, use the following Bowtie parameters in conjunction with the Map2Peak parameters. Please see the example section for a complete example.
-v: Number of mismatches allowed.
-m: Suppress all alignments for a particular read or pair if more than 'm' reportable alignments exist for it.
--best: Make Bowtie guarantee that reported alignments are "best" in terms of stratum (i.e. number of mismatches).
--strata: If many valid alignments exist and are reportable and they fall into more than one alignment "stratum", report only those alignments that fall into the best stratum.
-S: Print alignments in SAM format.
The Map2Peak parameters are:
-V: p-value cutoff for peak calling. Default value is 1e-5. The smaller the cutoff, the more statistically significant the called peaks.
-N: Name string used to create output files after peak calling. Default is ''.
-R: Fraction of total ChIP-Seq reads. After every 'R' fraction of the reads is aligned, Map2Peak checks for read density saturation. Default is 0.01.
-T: Threshold to check read saturation. Once this threshold is reached, Map2Peak moves to Phase 2. Default is 0.02. As the value of 'T' is increased, Map2Peak becomes slower and more reads are mapped in Phase 1.
-E: Membership probability cutoff used to classify genomic regions as background or signal. Default is 0.9. The lower the cutoff, the larger the pseudogenome (i.e., the Phase 3 genome) and the slower the Map2Peak computations.
-g: Sets the genome. Default is "hs" for the human genome. You can set it to "mm" for mouse, "ce" for C. elegans and "dm" for Drosophila melanogaster.
-L: Sets the fragment length of the ChIP library. Map2Peak will use this value instead of estimating the fragment length.
-D: Name of the control file.
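To illustrate how the Bowtie and Map2Peak parameters above combine on one command line, here is a small sketch that only prints a command and does not run it. The function name `build_map2peak_cmd` and its argument order are hypothetical. The placement of options mirrors the complete example in this document; that -V and -g can sit next to -D and -N is an assumption.

```shell
# Hypothetical helper; prints (does not run) a Map2Peak command line.
# Usage: build_map2peak_cmd INDEX CHIP_FASTQ CONTROL_FASTQ NAME OUT_SAM [PVALUE] [GENOME]
build_map2peak_cmd() {
    local index="$1" chip="$2" control="$3" name="$4" sam="$5"
    local pvalue="${6:-1e-5}"   # -V, default taken from the parameter list
    local genome="${7:-hs}"     # -g, default taken from the parameter list

    # Bowtie options first (-v, -m, --best, --strata), then Map2Peak
    # options (-V, -g, -D, -N), then index, reads and the SAM output.
    printf 'Map2Peak -v 2 -m 1 --best --strata -V %s -g %s -D %s -N %s %s %s -S %s\n' \
        "$pvalue" "$genome" "$control" "$name" "$index" "$chip" "$sam"
}

build_map2peak_cmd directory/hg19 ChIPDirectory/ENCFF469MGV.fastq \
    ControlDirectory/ENCFF295VZD.fastq MGV ENCFF469MGV.sam
```

Printing the command first makes it easy to review the options before committing to a long alignment run.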
- Download and extract ChIP-Seq file
$ wget -q 'https://www.encodeproject.org/files/ENCFF469MGV/@@download/ENCFF469MGV.fastq.gz';
$ gunzip ENCFF469MGV.fastq.gz;
- Download and extract control file
$ wget -q 'https://www.encodeproject.org/files/ENCFF295VZD/@@download/ENCFF295VZD.fastq.gz';
$ gunzip ENCFF295VZD.fastq.gz;
- Download hg19 index
$ wget -q 'ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/hg19.ebwt.zip';
$ unzip hg19.ebwt.zip;
- run Map2Peak
$ Map2Peak -v 2 -m 1 --best --strata -D ControlDirectory/ENCFF295VZD.fastq -N MGV directory/hg19 ChIPDirectory/ENCFF469MGV.fastq -S ENCFF469MGV.sam;
- Also, you must have the Boost C++ libraries installed. Most Linux distributions have these preinstalled.
- “directory” is the location of the hg19 Bowtie index files.
- “ChIPDirectory” is the location of the ChIP-Seq FASTQ file.
- “ControlDirectory” is the location of the control FASTQ file.
- Important: Use only a single FASTQ file each for control and ChIP-Seq.
- Peak output file will be “MGVPeaks.bed”.
In this example, up to 2 mismatches are allowed, and only uniquely aligned reads in the best stratum are kept during read alignment in Phase 1.
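Once the run has finished, the peak file can be inspected with standard tools. The sketch below is not part of Map2Peak: the function name `summarize_peaks` is hypothetical, and it assumes the peak file follows the usual BED layout, with tab-separated chromosome, start and end in the first three columns. The exact columns written by Map2Peak are not documented here.

```shell
# Hypothetical helper; it is NOT part of Map2Peak.
# Counts peaks and reports the mean peak width of a BED file.
# Assumption: columns 1-3 are chrom, start, end, separated by tabs.
summarize_peaks() {
    awk -F '\t' '
        /^(track|browser|#)/ { next }              # skip header and comment lines
        NF >= 3 { n++; total += $3 - $2 }          # width = end - start
        END {
            if (n > 0) printf "peaks=%d mean_width=%.1f\n", n, total / n
            else       print  "peaks=0"
        }' "$1"
}
```

For the example above, it would be called as "summarize_peaks MGVPeaks.bed".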
Map2PeakSource.zip: contains Map2Peak source files.