Inferring nucleosome positions with their histone mark annotation from ChIP data

Introduction: The nucleosome is the basic repeating unit of chromatin. It contains two copies each of the four core histones H2A, H2B, H3 and H4, together with about 147 base pairs of DNA. The residues of the histone proteins are subject to numerous post-translational modifications, such as methylation or acetylation. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) provides genome-wide occupancy data for these modified histone proteins, and interpreting these data requires appropriate computational methods.
The software: NucHunter is an algorithm that uses data from ChIP-seq experiments to infer positioned nucleosomes. It can predict positioned nucleosomes from one or multiple ChIP-seq BAM files, and it can also be used in conjunction with a control experiment.
Contact: For information and support contact: mammana AT molgen.mpg.de or chung AT molgen.mpg.de
Reference: Our paper was published in Bioinformatics (August 1, 2013). For details and for citing us, please refer to
Alessandro Mammana, Martin Vingron, and Ho-Ryun Chung. Inferring nucleosome positions with their histone mark annotation from ChIP data. Bioinformatics 2013 29: 2547-2554.
The article can be accessed through this link.
Click here to download the compressed jar archive

Usage examples:

Check the available sub-commands
java -jar NucHunter.jar
No command-line argument provided.
Available commands:
    callnucs    Run NucHunter with one or several bam files.
    fraglen     Analyze fragment length of a single bam file.
    fitpars     Try to find out the best values for the fragment length and sigma for a single bam file. Because peak calling is run several times, it is recommended to run this command in a cluster and to use the multithreading options.
    --help      Print this help menu.
Check all the available options for nucleosome calling
java -jar NucHunter.jar callnucs
The following options are required: -in
Usage: [options]
  Options:
    -chunkSize
        Maximum portion of the genome to load at each step.The statistical modelling for peak detection depends on this option. (default: 1000000)
    -ctrl
        Control BAM file. This file is required to have a postprocessing and a pValue for each peak
    -fLen
        Average fragment length. If used once, the same parameters will be used for all bam files,if used as many times as the bam files are, each file will be matched to a parameter following the input order.If not used the value 147 will be applied to all libraries. (default: [])
  * -in
        Input BAM file(s) to be used. This option can be repeated to analyze different histone modifications.
    -minq
        Minimum mapping quality (according to the minq field in BAM/SAM format). Low-quality reads will be discarded. If used once, the same parameters will be used for all bam files,if used as many times as the bam files are, each file will be matched to a parameter following the input order.If not used the value 0 will be applied to all libraries. (default: [0])
    -minRatio
        Unbalanced read counts for postprocessing. Peaks where there is a strongly unbalanced contribution of reads from negative strand and positive strand will be filtered out. This parmeter specifies the minimum ratio between the two contributions. Set to -1 to disable. (default: 0.25)
    -out
        Output directory (default: )
    -pName
        Project name. All output files will be named according to it. (default: ChIP_peaks)
    -pval
        P-value threshold based on the number of tags that fall in a certain window around the peak. The radius of this window is specified by the parameter -wrad. (default: 9.999999999999999E-6)
    -reg
        Regions to be analyzed (don't set this option for a whole genome analysis). Path to a .bed file containing the genomic intervals to be analyzed (the first three fields are enough). Regions are treated independently in the peak calling step, if they are too small the null model for the the z-score is not reliable.
    -sigma
        Scale parameter for the template (default: 50.0)
    -wins
        How to smooth the control signal. One or more window half sizes can be selected and the signal will be smoothed taking the maximum of the averages of the signal in each window size.A window size of -1 means that the chromosome-wide average will also be considered. If the -ctrl option is not set this feature is not used. (default: [-1]) (default: [])
    -wrad
        Interval length for postprocessing. The read counts and noise estimation will be done according to the signal in this interval. (default: 146)
    -zT
        z-score threshold for peak detection (default: 3.0)
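The -reg option takes a .bed file in which the first three fields (chromosome, start, end) are sufficient. As a minimal sketch, such a file could be prepared as follows. The file name regions.bed and the coordinates are hypothetical placeholders, chosen to be fairly large because the help text warns that small regions make the null model unreliable.

```shell
# Write a hypothetical regions file: chromosome, start, end (tab-separated).
# The coordinates are placeholders, not regions from the paper.
printf 'chr1\t1000000\t3000000\n'  > regions.bed
printf 'chr2\t5000000\t8000000\n' >> regions.bed

# Sanity check: count lines with fewer than three tab-separated fields
bad=$(awk -F'\t' 'NF < 3 {n++} END {print n+0}' regions.bed)
echo "malformed lines: $bad"

# The file could then be passed to callnucs, for example:
# java -jar NucHunter.jar callnucs -in H3K4me3.bam -fLen 150 -reg regions.bed
```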
Call nucleosomes on the file "H3K4me3.bam" with average fragment length 150
java -jar NucHunter.jar callnucs -in H3K4me3.bam -fLen 150
Infer the average fragment length for the file "H3K4me3.bam"
java -jar NucHunter.jar fraglen -in H3K4me3.bam
Call nucleosomes on the file "H3K4me3.bam" with average fragment length 150 using as a control experiment the file "Input.bam"
java -jar NucHunter.jar callnucs -in H3K4me3.bam -fLen 150 -ctrl Input.bam
Call nucleosomes on the files "H3K4me3.bam", "H3K27ac.bam", and "H3K36me3.bam" with average fragment lengths 150, 130 and 180 respectively, using as a control experiment the file "Input.bam"
java -jar NucHunter.jar callnucs -in H3K4me3.bam -fLen 150 -in H3K27ac.bam -fLen 130 -in H3K36me3.bam -fLen 180 -ctrl Input.bam
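With several libraries, the -in and -fLen options are matched by input order, so it can help to assemble the command from a list of pairs rather than typing it by hand. The following is only a sketch: the "file:length" pair notation and the variable names are illustrative assumptions and not part of NucHunter, and it assumes file names without spaces.

```shell
# Paired "bam:fragment length" entries (values taken from the example above)
libs="H3K4me3.bam:150 H3K27ac.bam:130 H3K36me3.bam:180"

cmd="java -jar NucHunter.jar callnucs"
for lib in $libs; do
    bam=${lib%%:*}    # text before the colon
    flen=${lib##*:}   # text after the colon
    # One -in/-fLen pair per library, kept in the same order
    cmd="$cmd -in $bam -fLen $flen"
done
cmd="$cmd -ctrl Input.bam"

# Print the assembled command for inspection; run it with: eval "$cmd"
echo "$cmd"
```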
Read part of the options from the file "cliopts" (Linux environment; the -a option requires GNU xargs)
xargs -a cliopts java -jar NucHunter.jar callnucs
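As an illustration of what the "cliopts" file might contain, the sketch below writes a small options file and previews the command that xargs would build. The contents of the file, including the output directory name "results", are hypothetical. The word "echo" stands in for the real call so that nothing is executed.

```shell
# Write a hypothetical options file; xargs splits it on blanks and newlines
cat > cliopts <<'EOF'
-in H3K4me3.bam -fLen 150
-ctrl Input.bam
-out results
EOF

# Dry run: "echo" prints the expanded command instead of running it
preview=$(xargs -a cliopts echo java -jar NucHunter.jar callnucs)
echo "$preview"
```

Because xargs appends the arguments from the file after the ones given on the command line, any further options can be written directly after callnucs.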