ChIP-seq ambiguous tag mapper - a Gibbs sampling algorithm for the mapping of ambiguous ChIP-seq sequence tags. A collaboration with the Lunyak laboratory at the Buck Institute for Age Research.
Download
Contents:
To run our ambiguous tag mapping program, first map the tags to the genome with Bowtie, then re-format the Bowtie output before running our script (see Documentation and/or parts A-E below).
Alternatively, to test our program, you can use the test data we provide and run the program directly (see Documentation and/or parts D and E below).
Notice: the scripts were changed and updated on February 16, 2011. Please use the updated version.
A. Run the Bowtie program to obtain the initial mapping of sequence tags.
B. Format the output from the Bowtie program:
1. The output file from Bowtie needs to be re-formatted before “gibbsAM.pl” can be run.
2. Run the script “format_bowtie_result.pl” to change the format.
3. Command line:
perl format_bowtie_result.pl -p directory -i bowtie_output_file -o output.bed
-p: directory where the Bowtie output file (initial mapping of tags) is located;
-i: the Bowtie output file (initial mapping of tags), including both unique and ambiguous tags;
-o: the name of the output file. The default name is “mapping_result.bed”.
4. The resulting output file is in this format: “tag_id chr>position>strand,chr>position>strand,…”
e.g.
HWI-EAS229_75_30DY0AAXX:4:1:0:1282/1 chr18>6452262>+,
HWI-EAS229_75_30DY0AAXX:4:1:0:1282/2 chr18>6452351>+,chr4>66122359>-,
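The format above can be read back with a few lines of Python. This is only a minimal sketch for inspecting the formatted file, not part of the distributed Perl scripts. The function name `parse_mapping_line` is hypothetical, and the code assumes that the tag id and the site list are separated by whitespace and that every site is followed by a comma, as in the two example lines.

```python
def parse_mapping_line(line):
    """Parse one line of the form 'tag_id chr>position>strand,chr>position>strand,'."""
    # Assumption: tag id and site list are separated by whitespace (space or tab).
    tag_id, sites_field = line.split(None, 1)
    sites = []
    for entry in sites_field.strip().split(","):
        if not entry:  # skip the empty piece left by the trailing comma
            continue
        chrom, position, strand = entry.split(">")
        sites.append((chrom, int(position), strand))
    return tag_id, sites


# The two example lines shown above.
example_lines = [
    "HWI-EAS229_75_30DY0AAXX:4:1:0:1282/1 chr18>6452262>+,",
    "HWI-EAS229_75_30DY0AAXX:4:1:0:1282/2 chr18>6452351>+,chr4>66122359>-,",
]

parsed = dict(parse_mapping_line(line) for line in example_lines)

# A tag with one candidate site is unique; more than one site makes it ambiguous.
unique_tags = sorted(tag for tag, sites in parsed.items() if len(sites) == 1)
ambiguous_tags = sorted(tag for tag, sites in parsed.items() if len(sites) > 1)
```

In this example the first tag has a single candidate site and is unique, while the second tag has two candidate sites and is ambiguous.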
C. Organize unique tags:
1. In order to set the parameters for the algorithm, the unique tags need to be organized.
2. Run “get_unique_file.pl” to organize the unique tags.
3. Command line:
perl get_unique_file.pl -p directory -i formatted_mapping_file -o output.bed -l region_length
-p: directory where the formatted mapping file is located
-i: the formatted mapping file generated by “format_bowtie_result.pl”
-o: name of output file, the default name is "unique_screen_result.bed"
-l: length of adjacent region for co-located tags, the default value is 147
4. The resulting output file has this format:
“chr position_start position_end unique_tag_count”
e.g.
chr1 795260 795406 226
chr1 830067 830213 166
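As a quick check of this output, the lines can be parsed and the region lengths compared with the -l value. The sketch below is illustrative only: the names `UniqueRegion` and `parse_unique_line` are hypothetical, and the code assumes whitespace-separated columns. For the two example lines, counting both end positions gives a region length of 147, which matches the default of -l.

```python
from typing import NamedTuple


class UniqueRegion(NamedTuple):
    chrom: str
    start: int
    end: int
    unique_tag_count: int


def parse_unique_line(line):
    """Parse one line of the form 'chr position_start position_end unique_tag_count'."""
    # Assumption: the four columns are separated by whitespace.
    chrom, start, end, count = line.split()
    return UniqueRegion(chrom, int(start), int(end), int(count))


# The two example lines shown above.
example_lines = [
    "chr1 795260 795406 226",
    "chr1 830067 830213 166",
]

regions = [parse_unique_line(line) for line in example_lines]

# Region length when both the start and the end position are counted.
lengths = [region.end - region.start + 1 for region in regions]
```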
D. Run "gibbsAM.pl":
1. After the preparation steps above, run “gibbsAM.pl” to apply the Gibbs sampling method, which assigns each ambiguous tag to a specific genomic site.
2. Command line:
perl gibbsAM.pl -p path -f mapping_result.file -u unique_mapping.file -o output.bed -l region_length
-r maximal_tag_number -a ambiguous_confidence -m iteration_number
-p: the directory where the files (formatted mapping file & the file for unique tags) are located.
-f: the formatted mapping file generated by “format_bowtie_result.pl”.
-u: the file for unique tags generated by “get_unique_file.pl”.
-o: name of the output file.
-l: the length of the adjacent region for co-located tags. The default value is 147.
-r: the maximal tag count used to construct the likelihood table. The default value is 50.
-a: the relative confidence of ambiguous tags. The default value is 0.2.
-m: the number of iterations. The default value is 5.
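When the program is called from another script, the command line can be assembled with the documented defaults made explicit. The following is a hedged sketch rather than a supported wrapper: the function `build_gibbsam_command` and its keyword names are hypothetical, the directory "data" and the output name "output.bed" are placeholders, and the two input names are simply the default output names from parts B and C.

```python
import shlex


def build_gibbsam_command(path, mapping_file, unique_file, output_file,
                          region_length=147, max_tag_number=50,
                          ambiguous_confidence=0.2, iterations=5):
    """Return the gibbsAM.pl command as an argument list.

    The keyword defaults mirror the documented defaults of -l, -r, -a and -m.
    """
    return [
        "perl", "gibbsAM.pl",
        "-p", path,
        "-f", mapping_file,
        "-u", unique_file,
        "-o", output_file,
        "-l", str(region_length),
        "-r", str(max_tag_number),
        "-a", str(ambiguous_confidence),
        "-m", str(iterations),
    ]


# Placeholder directory and output name; only the iteration count is changed.
command = build_gibbsam_command(
    "data", "mapping_result.bed", "unique_screen_result.bed", "output.bed",
    iterations=10,
)
command_line = shlex.join(command)  # requires Python 3.8 or later
print(command_line)
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the input files are in place.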
E. Sample Data