BiQ Analyzer HT
This website provides the BiQ Analyzer HT software tool (Lutsik et al. 2011; Becker et al. 2014) for analyzing medium-scale amplicon sequencing data (typically using MiSeq). The BiQ Analyzer (Bock et al. 2005) software tool for analyzing classical small-scale bisulfite sequencing data (typically based on Sanger sequencing) is available from a separate website.
​
About
BiQ Analyzer HT is a software tool designed to aid in the analysis of medium-scale bisulfite sequencing data (typically using MiSeq). You can use BiQ Analyzer HT to process a variety of bisulfite sequencing reads from one or several sequencing experiments whilst keeping analysis times low.
​
If you run into problems while running BiQ Analyzer HT, please contact us by e-mail: support@bocklab.org.
​
Download & Install BiQ Analyzer HT
You will need to first download and install a copy of Java for your platform. We recommend using the installer where available as this will make the set up process easier. BiQ Analyzer has been tested to run with Java 1.8.0_411 on both macOS (14.3) and Windows (11).
​
Once you have installed Java, you can download BiQ Analyzer for your platform.
​
macOS
You can use the macOS Disk Image (recommended) or the Cross Platform Java Files.
​
Windows
You can use the Cross Platform Java Files.
​
Linux
You can use the Cross Platform Java Files.
​
Using BiQ Analyzer HT
If you encounter any issues whilst using the software, please check the frequently asked questions or e-mail support@bocklab.org.
​
Starting BiQ Analyzer HT
macOS (Disk Image)
The easiest way to run BiQ Analyzer HT on macOS is to download the corresponding disk image. Mount the disk image and drag BiQ Analyzer HT to your Applications folder. Then, because of macOS security features, right-click BiQ Analyzer HT in your Applications folder and choose Open.
Alternatively, after attempting to open the application once, you can open System Settings, scroll down to Privacy & Security on the left, then to Security on the right, and choose ‘Open Anyway’.
You can now skip ahead to the section 'Configuration'.
Cross Platform (Java) for Windows / Linux & macOS
Once you have installed Java, you will need to check that it can be launched from your Terminal emulator of choice. This is usually Terminal.app on macOS, and cmd.exe on Windows. You can open the terminal on macOS by typing ‘terminal’ into Spotlight and the command prompt on Windows by typing ‘command prompt’ into the start menu.
Once this is open, you can check that Java is installed by running java --version (Java 8 only accepts java -version). If the output is something like:
​
java 22.0.1 2024-04-16
Java(TM) SE Runtime Environment (build 22.0.1+8-16)
Java HotSpot(TM) 64-Bit Server VM (build 22.0.1+8-16, mixed mode, sharing)
​
then you can run BiQ Analyzer by double-clicking on BiQ_Analyser.bat (Windows) or running BiQ_Analyser.sh (macOS/Linux). You can then skip ahead to 'Configuration'.
​
If instead you saw an error message, then you can find the path to Java in one of several ways:
​
macOS
On macOS run: mdfind -name 'java' | grep '/bin/java$'.
Linux
On Linux run: find / -name java 2>/dev/null | grep '/bin/java$'.
Windows
On Windows run: for %i in (java.exe) do @echo. %~$PATH:i
​
This will result in either a list, or a single entry, such as:
​
/Library/Java/JavaVirtualMachines/jdk1.8.0_401.jdk/Contents/Home/jre/bin/java
/Library/Java/JavaVirtualMachines/jdk-11.0.22.jdk/Contents/Home/bin/java
​
You can then edit BiQ_Analyzer_HT.sh (Linux/macOS) or BiQ_Analyzer_HT.bat (Windows) in your chosen text editor and replace java with the full path listed above.
​
After you’ve edited the corresponding file, you can launch the jar file by double-clicking BiQ_Analyzer.bat on Windows or by running BiQ_Analyzer.sh in a terminal on macOS or Linux. As a result of macOS security features, you may need to run xattr -c BiQ_Analyzer_HT.sh and chmod +x BiQ_Analyzer_HT.sh first.
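As a cross-platform alternative to the platform-specific lookup commands above, the search can be sketched in Python. This assumes Python 3 is available, which BiQ Analyzer HT itself does not require, and the function name find_java is purely illustrative. Note that shutil.which only searches the directories on PATH, so it will not find Java installations located elsewhere.

```python
import shutil


def find_java(candidates=("java", "java.exe")):
    """Return the full path of the first Java launcher found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


print(find_java() or "Java was not found on PATH")
```

If a path is printed, it can be used in place of java in the launcher script as described above.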
Configuration
For a basic analysis run, BiQ Analyzer HT does not require any configuration.
​
In a typical experimental scenario, several target amplicons are amplified from the bisulfite-treated DNA of each sample. BiQ Analyzer HT assumes that the sequence reads for each sample-amplicon combination of a bisulfite project are stored in a single FASTA or FASTQ file. For a project that combines two sequencing methods, such as oxidative bisulfite analysis, the reads for each sample-amplicon combination are stored in two files, one for each sequencing approach.
Since the exact multiplexing strategy is experiment-specific and hard to generalize, BiQ Analyzer HT requires the direct sequencing output to be demultiplexed with third-party tools. We recommend the Galaxy barcode splitter as an adequate solution.
​
Alternatively, BiQ Analyzer HT supports loading mapped reads from genome-wide sequencing experiments. The reads should be stored in SAM or BAM files, one file per analyzed sample.
​
Analysis
BiQ Analyzer HT starts with a welcome panel, at the bottom of which is some short guidance information. The “New project” button leads to the dialogue that helps you to select an output directory for the new analysis project. Before selecting it please verify that the location is accessible for writing and there is enough space on the corresponding storage device. Alternatively an existing project can be opened by pressing “Open project” and selecting the output directory of an existing analysis project. The existing analysis project should contain a file “biqanalyzerht.xml” which is written to the output directory when you save your analysis.
​
If you have chosen to create a new project and selected a directory, you must then choose the number of readsets per sample per amplicon and select the sequencing approach used to obtain these readsets. The first sequencing approach has to be “Bisulfite Sequencing”, while for the second you can choose between “Oxidative Bisulfite Sequencing (oxBS)”, “TET assisted Bisulfite Sequencing (TAB)”, “Chemical Modification-Assisted Bisulfite Sequencing (CAB)” and “Formyl Chemical Modification-Assisted Bisulfite Sequencing (f-CAB)”.
​
After the directory has been selected, the BiQ Analyzer HT workspace is initialized. The summary of the newly created project is given in a corresponding tab of the main panel. The overview of the project is divided into three parts. On the upper left is a table with basic information for every sample and reference sequence combination. On the upper right the mean methylation heatmaps are displayed: one for each sequencing approach and one for the difference between the two. The lower panel shows a graphical summary of the readset selected in the table.
​
Loading Data
First add the required number of samples to the project by selecting the “Add sample” option in the “Analysis” menu.
​​
Once the project has at least one sample, load reference sequences via “File”->“Load reference sequences”. BiQ Analyzer HT requires genomic reference sequences of the sequenced loci, in which the potential methylation sites can be easily detected. The reference should originate from the DNA strand that was actually amplified after the bisulfite conversion.
Each loaded reference will be added to each sample in the project.
BiQ Analyzer HT supports two ways of structuring the analysis project: either by samples or by reference sequences.
​
Each loaded reference can be assigned to an existing genomic location by specifying the coordinates and the strand of the corresponding genomic region. The respective form is located in the reference summary panel. The genomic location can also be fetched from the FASTA/FASTQ header. For that, the header should contain the location in the form “range=chrN:NNNNNN-NNNNNNN” or “chrN_NNNNNNN_NNNNNNN+”.
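To check in advance whether a reference header carries a location in one of the two forms above, a small parser can help. The following is a minimal sketch and not the parser used by BiQ Analyzer HT; the function name and the regular expressions are assumptions. In particular, it assumes that the trailing sign of the second form is the strand, and it does not try to read a strand from the first form.

```python
import re

# "range=chrN:START-END"
RANGE_STYLE = re.compile(r"range=(chr\w+):(\d+)-(\d+)")
# "chrN_START_END+" or "chrN_START_END-" (trailing sign assumed to be the strand)
UNDERSCORE_STYLE = re.compile(r"(chr[^_\W]+)_(\d+)_(\d+)([+-])")


def parse_location(header):
    """Return (chromosome, start, end, strand) from a header line, or None."""
    match = RANGE_STYLE.search(header)
    if match:
        return match.group(1), int(match.group(2)), int(match.group(3)), None
    match = UNDERSCORE_STYLE.search(header)
    if match:
        return match.group(1), int(match.group(2)), int(match.group(3)), match.group(4)
    return None


print(parse_location(">amplicon1 range=chr7:27135000-27136000"))
print(parse_location(">chrX_1000_2000-"))
```

Headers for which the function returns None would need the location entered manually in the reference summary panel.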
​
Before loading into BiQ Analyzer HT, the sequence reads should be prepared: the initial set of sequence reads from the sequencing machine should be split into batches by sample and reference sequence, giving one multi-sequence FASTA or FASTQ file for each sample/reference combination. This is done by matching the sample-specific sequence tags and primer sequences in the read sequences. If the sequencing was done on a FLX (Roche 454) System, this can be done with the sff-tools included in the analysis software package; in other cases we recommend the Galaxy barcode splitter as an adequate solution. The files containing the reads can be loaded into BiQ Analyzer HT in two ways.
​
-
To load a single set of reads, select the corresponding leaf in the project tree and choose “Load sequence reads” in the “File” menu. BiQ Analyzer HT will ask for the location of the files.
-
To simplify the loading of reads, use the “Load reads by filename” option. In this case the read files should have filenames identical to those of the corresponding reference sequence files. “Load reads by filename” should be selected once for each sample.
​
As in most high-throughput sequencing technologies, the submitted DNA fragments are sequenced in both directions, so each loaded read set can contain reads with opposite orientations. The BiQ Analyzer HT alignment algorithm automatically corrects the orientation of each read by aligning both the original read and its reverse complement to the reference sequence and selecting the variant that gives the higher alignment score.
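The idea of the orientation correction can be illustrated with a deliberately simplified sketch. It is not the BiQ Analyzer HT alignment algorithm: instead of a real pairwise alignment it counts matches over ungapped placements of the read on the reference, it treats a T in the read opposite a C in the reference as a match (as expected after bisulfite conversion), and it assumes uppercase sequences. All function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def bisulfite_match(ref_base, read_base):
    # An unmethylated C is read as T after bisulfite conversion.
    return ref_base == read_base or (ref_base == "C" and read_base == "T")


def ungapped_score(ref, read):
    """Best number of matching positions over all ungapped placements of read on ref."""
    best = 0
    for offset in range(max(1, len(ref) - len(read) + 1)):
        window = ref[offset:offset + len(read)]
        score = sum(bisulfite_match(r, q) for r, q in zip(window, read))
        best = max(best, score)
    return best


def orient(ref, read):
    """Return the read or its reverse complement, whichever scores higher."""
    flipped = reverse_complement(read)
    return read if ungapped_score(ref, read) >= ungapped_score(ref, flipped) else flipped
```

For example, with the reference AACGTCCGATTGCA, a read that was sequenced from the opposite direction is returned in reference orientation by orient, while a read already in reference orientation is returned unchanged.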
​
Finally, the project data can be loaded into BiQ Analyzer HT as a table prepared in a spreadsheet editor. The table should be stored in a tab-separated plain text file and have three columns for a bisulfite analysis and four columns otherwise:
-
a column with sample identifiers,
-
a column with full paths to the reference sequence FASTA/FastQ files,
-
a column with full paths to the corresponding FASTA/FastQ files with the first sequence reads and,
-
a column with full paths to the corresponding FASTA/FastQ files with the second sequence reads if needed for the used analysis type.
​
Thus the number of rows in the table should be at most the number of samples multiplied by the number of references in the project (or the total number of available files with reads). The table should also have a header row: BiQ Analyzer HT skips the first row, so a table without a header would lose its first data row.
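Before loading such a table, it can be useful to check it against the rules above. The following sketch is an assumption about what a sensible check looks like, not a description of how BiQ Analyzer HT validates its input; the function name and the n_readsets parameter are hypothetical. It expects a header row, one column per read set after the sample and reference columns, and at most one row per sample/reference pair.

```python
import csv
import io


def check_project_table(text, n_readsets=1):
    """Return a list of problems found in a tab-separated project table."""
    expected = 2 + n_readsets  # sample, reference, then one column per read set
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    problems = []
    seen = set()
    for line_no, row in enumerate(rows[1:], start=2):  # the first row is the header
        if len(row) != expected:
            problems.append(
                f"line {line_no}: expected {expected} columns, found {len(row)}"
            )
            continue
        key = (row[0], row[1])
        if key in seen:
            problems.append(f"line {line_no}: duplicate sample/reference pair")
        seen.add(key)
    return problems
```

An empty result list means that the table has the expected shape. The check does not verify that the listed files exist.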
​
Analysis Setup
After selecting a leaf in the project tree, a tab with a settings form appears in the BiQ Analyzer HT main panel. The settings form is divided into four categories – alignment, quality filtering, sorting and output.
​
The filtering parameters correspond to alignment and bisulfite quality measures (e.g. alignment score, sequence identity, bisulfite conversion rate, sequence length and - in case of Fastq files - sequence quality), as well as to the extracted methylation information (mean methylation level of the read, fraction of unrecognized methylation sites etc.). The set of reads that pass the filtering can be sorted in a number of ways (e.g. by alignment score, sequence identity or methylation level).
​
There are also global settings accessible via “File”->“Settings…”. In these global settings one can find options to choose the methylation context and the colors for the diagrams. These settings will be saved locally and loaded each time BiQ Analyzer HT is started.
​
Running the Analysis
The processing and analysis of the loaded data can be run for one selected amplicon sample or for all combinations. These options are located in the second section of the “Analysis” menu. As soon as the analysis is finished the main application panel will be updated and the results of the analysis will be loaded. A running analysis can be stopped at any time via “Analysis”->“Terminate”.
​
The BiQ Analyzer HT backend processes the loaded data and outputs DNA methylation information to the project output folder in several forms.
​
First of all the results of the analysis are reflected in the project summary. Information about processed read sets, e.g. the read counts, basic DNA methylation and bisulfite quality statistics, is written to the summary table, and the mean methylation values are used to update the corresponding cells of the zoomable project methylation heatmaps. In case of a project with two readsets per amplicon per sample the user can choose between a heatmap for each of the sequencing types as well as a difference heatmap.
​
By choosing a row in the project summary table, the user can display zoomable bar diagrams for the specified readset. Using the drop-down menu it is possible to choose between a bar diagram for each of the sequencing types, a difference bar diagram and a comparison bar diagram.
​
The summary statistics are also available for each analyzed sample (as a summary table) and reference sequence (as zoomable heatmaps of averaged methylation profiles). Here, too, the user can choose between the three types of heatmap if the analysis consists of two readsets per amplicon per sample.
​
For each analyzed sample-reference combination a number of result tabs are added to the main application panel. The “Summary” tab gives short information about the run, including the mean methylation level calculated for the amplicon and the elapsed analysis time. The “Results” tab gives a table with analysis information for each methylation site.
​
The alignment viewer allows you to inspect a multiple alignment of the sequence reads to the reference sequence of the bisulfite sequenced amplicon obtained through the merger of pairwise alignments. The alignment has methylation sites highlighted in accordance with their states. Faster scrolling in the viewer is enabled by holding Ctrl while scrolling with the mouse wheel.
​
The methylation heatmap represents the extracted methylation patterns of the bisulfite reads graphically. Columns of the heatmap are formed by the methylation sites found in the reference sequence by matching the analyzed methylation context, while rows correspond to the sequence reads.
​
The table in the “Results” tab contains analysis information for each analyzed read that passed the filtering. The columns of the table correspond to the columns of the tab-separated file named results.tsv located in the project output directory, and include alignment score, sequence identity, methylation pattern and mean methylation level.
​
All of the above tables and graphics are exported to the project folder. The state of the analysis project can be saved to the hard drive at any time point by selecting File -> Save in the system menu or pressing a respective button in the toolbar.
​​
Command Line Interface
BiQ Analyzer HT features a command line interface. To run BiQ Analyzer HT in command line mode, the executable JAR file should be started with the “-nogui” argument in the following way (assuming the Java binaries are available on your $PATH or %PATH%):
​
java -jar "BiQ Analyzer.jar" -nogui [OPTIONS]
​
The list of all available options is accessible via -help. At minimum you should provide the arguments:
-rseq (genomic reference sequence in a single FASTA file) and
-bseq (bisulfite sequence reads in one FASTA file or as a directory of FASTA files).
The output directory name can be specified with -outdir; by default, BiQ Analyzer HT creates an output directory named analysis_run/.
The output directory contains the following result files:
-
summary.dat, a short summary of the analysis run.
-
results.tsv, a tab-separated table with the processing and analysis results (a row per each analyzed read)
-
heatmap.png, methylation heatmap
-
pearlNecklace.png, pearl necklace diagram, summarizing methylation information for each CpG
-
sourceSequences.mfa, source FASTA sequences of the reads that passed the quality filters
-
alignment.mfa, multi-sequence FASTA file containing multiple alignment of the bisulfite reads to the genomic reference sequence
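When many amplicons are processed in batch, the command line can be assembled in a script. The sketch below only builds the argument list from the options named above (-nogui, -rseq, -bseq, -outdir). It assumes that each option takes its value as the following argument, which should be confirmed with -help, and the file names used are placeholders.

```python
import shlex


def build_command(reference, reads, outdir=None, jar="BiQ Analyzer.jar"):
    """Build the argument list for a command line run (option syntax is an assumption)."""
    cmd = ["java", "-jar", jar, "-nogui", "-rseq", reference, "-bseq", reads]
    if outdir is not None:
        cmd += ["-outdir", outdir]
    return cmd


cmd = build_command("amplicon.fa", "reads/", outdir="run1")
print(shlex.join(cmd))  # shell-quoted form, useful for logging
# To execute the command: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids quoting problems with the space in the JAR file name.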
​
​Troubleshooting
As the number of sequences grows, the data structures that store the sequence pileup may exceed the available Java heap space. If the number of sequences reaches the order of 20,000 or more, you may want to increase the initial and maximum Java heap size. To do so, launch the tool directly with java, e.g. java -Xms2048m -Xmx2048m -jar "BiQ Analyzer.jar"
​
The available heap space can be extended by increasing the numbers after the -Xms and -Xmx command line options, which specify the initial and maximum size of the Java heap (in megabytes when followed by m), respectively.
​
If you encounter any other unexpected issues, please reach out to us by e-mail: support@bocklab.org. We’ll do our best to help.
Frequently Asked Questions
​
Q: How do I preprocess my data to use it in BiQ Analyzer HT?
A: Data files returned from a sequencing machine cannot be used with BiQ Analyzer HT directly and have to be preprocessed. To prepare the data we recommend Galaxy and provide information on how to use it.
​
Q: What do I do with bedGraph files?
A: Exported bedGraph files can be used with either the IGV Browser or the UCSC genome browser to visualize modification levels and compare them with each other or with other information available for that genomic region. Each bedGraph file loaded into the IGV browser appears as a new track containing information about the modification levels. By zooming in, one can review the data and compare different tracks at a position.
References & License
​If you use BiQ Analyzer HT in your own work please cite:
​
Lutsik, P., Feuerbach, L., Arand, J., Lengauer, T., Walter, J., & Bock, C. (2011). BiQ Analyzer HT: locus-specific analysis of DNA methylation by high-throughput bisulfite sequencing. Nucleic acids research, 39, W551-W556.
​
BiQ Analyzer is made available for non-commercial use under the terms of the BiQ Analyzer End User License Agreement.
Preprocessing Data
Data files returned from a sequencing machine cannot be used with BiQ Analyzer HT directly and have to be preprocessed. To prepare the data we recommend a selection of tools included in Galaxy and provide a step by step guideline on how to use them.
​
Galaxy
Galaxy can be used in various ways. We recommend using the main public Galaxy server, which can be accessed at https://usegalaxy.org/. Other options are a local installation, a cloud instance, Slipstream, or one of the many other server-based versions. All these options are described at http://galaxyproject.org/. In this tutorial we describe the usage of the main public server.
​
After opening Galaxy, one can see a list of tools on the left side and a history of loaded and processed files on the right. The space in the middle shows the currently open file or result.
Loading Files
To load files, the user can choose “Get Data” -> “Upload File”. The file format should be detected automatically, but the user can specify it if necessary. Files can be uploaded from the local computer, by URL or via an FTP server. The user can further specify the genome of their reads and let the tool automatically convert spaces to tabs while uploading the file. Uploaded files appear consecutively numbered in the history on the right side and can be accessed there.
FastQ quality encoding - FASTQ Groomer
The BiQ Analyzer HT quality filter supports only Sanger or Illumina 1.8+ encoding. The FASTQ Groomer can be used to convert encodings. This tool can be found at “NGS: QC and manipulation” -> “FASTQ Groomer”. Given a loaded file and its encoding style, the FASTQ Groomer changes the encoding to Sanger unless something else is specified in the advanced options. If the loaded file is already in Sanger format, it will not be changed. After using the FASTQ Groomer on a file, a new file with a name containing “[]FASTQ Groomer on data[]” will appear in the history. When working with paired-end sequencing data, this tool must be used as the first step.
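To decide whether grooming is needed, one can inspect the quality characters of a FASTQ file. The heuristic below is a sketch, not part of Galaxy or BiQ Analyzer HT, and its function names are illustrative. Sanger and Illumina 1.8+ files use an ASCII offset of 33, older Illumina files use 64, and characters below ';' cannot occur with an offset of 64. Treating characters above 'J' as evidence for an offset of 64 is an assumption that can fail for files with unusually high quality scores. The helper also assumes four-line FASTQ records without wrapped sequence lines.

```python
import itertools


def quality_lines(handle):
    """Return an iterator over the quality line of each four-line FASTQ record."""
    return itertools.islice(handle, 3, None, 4)


def guess_phred_offset(lines):
    """Guess the ASCII offset (33 or 64) of quality strings; None if undecidable."""
    codes = [ord(ch) for line in lines for ch in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:
        return 33  # characters below ';' cannot occur with an offset of 64
    if max(codes) > 74:
        return 64  # heuristic: characters above 'J' are unusual with an offset of 33
    return None
```

A result of 64 suggests that the file should be converted with the FASTQ Groomer before it is used with the quality filter.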
Clipping Adapter sequences - Clip
For a paired-end sequencing approach, the user will end up with two files per lane. One contains the reads started at the 5’ end and the other the reads started at the 3’ end. The Clip tool, accessible at “NGS: QC and manipulation” -> “Clip”, removes adapter sequences to prepare the joining step. The user has to enter a groomed file and choose an adapter sequence. The standard Illumina adapter sequence is “AGATCGGAAGAGC”. It is also necessary to specify which reads should be kept and which should be discarded based on the clipping results.
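What adapter clipping does to a single read can be sketched as follows. This is a simplified illustration with exact matching only, not the algorithm of the Galaxy Clip tool; the function name and the min_overlap parameter are hypothetical. The adapter is searched within the read, and if it is not found completely, a partial adapter at the 3’ end of the read is removed.

```python
def clip_adapter(read, quality, adapter="AGATCGGAAGAGC", min_overlap=5):
    """Remove the adapter and everything after it from a read and its quality string."""
    pos = read.find(adapter)
    if pos == -1:
        # Look for the start of the adapter at the 3' end of the read.
        for k in range(len(adapter) - 1, min_overlap - 1, -1):
            if read.endswith(adapter[:k]):
                pos = len(read) - k
                break
    if pos == -1:
        return read, quality  # no adapter found, read is left unchanged
    return read[:pos], quality[:pos]
```

Reads that become very short after clipping are typically the ones to discard, which corresponds to the keep/discard choice mentioned above.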
Joining paired reads - FASTQ joiner
The FASTQ joiner, accessible at “NGS: QC and manipulation” -> “FASTQ joiner”, can join the two files of a paired-end sequencing run and produce one file containing single-end reads.
Trimming bad quality bases at read ends - FASTQ Quality Trimmer
The user can choose to trim reads that have ends with bad quality scores. The FASTQ Quality Trimmer tool (“NGS: QC and manipulation” -> “FASTQ Quality Trimmer”) uses a sliding window approach to check for bases with bad quality and trims them until the remaining area has a good quality score. The user has to specify the current FASTQ file and a minimal quality score that should be reached. Window size, step size and other parameters can also be set, but the default parameters lead to usable results. One should only trim from the 3’ end, because the 5’ end contains the primer sequences needed for demultiplexing later on and should therefore not be altered. The 5’ end can be trimmed after demultiplexing.
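The sliding window idea can be sketched in a few lines. This is an illustration under simple assumptions (mean quality per window, a step size of one base, an ASCII offset of 33) and not the exact procedure of the FASTQ Quality Trimmer; the function name and default values are hypothetical.

```python
def trim_3prime(read, quality, min_score=20, window=3, offset=33):
    """Trim the 3' end until the last window of bases has a mean quality >= min_score."""
    scores = [ord(ch) - offset for ch in quality]
    end = len(scores)
    while end > 0:
        win = scores[max(0, end - window):end]
        if sum(win) / len(win) >= min_score:
            break
        end -= 1  # move the window one base towards the 5' end
    return read[:end], quality[:end]
```

Because the window mean is used, a single low-quality base can remain at the 3’ end when its neighbours are good; a window size of 1 trims strictly base by base. The 5’ end is never touched, in line with the recommendation above.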
​Demultiplexing - Barcode Splitter Version 1.0.0
As a last step, it is necessary to demultiplex the FASTQ file if more than one amplicon was sequenced per lane. To do so, the user has to upload another file containing the primers, which act as barcodes in this case, belonging to the different amplicons. Such a file should be tab-separated and contain in each line an identifier for the amplicon and the primer. All primers must have the same length. The Barcode Splitter 1.0.0 can be found at “NGS: QC and manipulation” -> “Barcode Splitter”. After specifying the barcode file, the current FASTQ file and the end of the reads at which the primers can be found, the user can start the demultiplexing. The number of allowed mismatches and the number of allowed primer nucleotide deletions can be set to improve the results.
After completion, a new file appears in the history containing a table of the different identifiers, the number of reads that matched each identifier and a link to a file containing these reads. Opening this file and clicking the download button on the demultiplexing result file in the history downloads a zip archive including all new FASTQ files. The file extension of those files is '.txt', but they contain FASTQ data and can therefore simply be renamed. Sequencing platforms such as Illumina MiSeq and HiSeq demultiplex by barcodes themselves, resulting in FASTQ files that each contain only reads for one barcode per amplicon. The barcode is given in the first line of each read. If the raw read files come from a platform that does not do this automatically, this step must be repeated for all files resulting from the first demultiplexing step.
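The barcode file described above can be generated and checked with a short script. This is a sketch with a hypothetical function name and placeholder primer sequences; it writes one tab-separated line per amplicon and refuses primers of different lengths, as required by the Barcode Splitter.

```python
def barcode_table(primers):
    """Return barcode file content; primers maps amplicon identifier -> primer sequence."""
    lengths = {len(seq) for seq in primers.values()}
    if len(lengths) > 1:
        raise ValueError(f"primers differ in length: {sorted(lengths)}")
    return "".join(f"{name}\t{seq.upper()}\n" for name, seq in primers.items())


# Placeholder identifiers and sequences, for illustration only.
print(barcode_table({"ampA": "ACGTAC", "ampB": "TTGACC"}), end="")
```

The returned text can be saved to a plain text file and uploaded to Galaxy alongside the FASTQ file.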
Creating a spreadsheet
To easily load the created files into BiQ Analyzer HT, it is helpful to create a spreadsheet containing the paths to all the files. Such a spreadsheet starts with one line for comments and a description. If this line is empty, it will be ignored. The following lines need to be tab-delimited and contain a sample name, the path to a reference file and the paths to one or two read files, depending on the kind of analysis to be done. Each possible combination of sample and amplicon needs to be listed on a separate line. If there are no reads for a combination, the corresponding line can end after the reference.
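For projects with many samples and amplicons, the lines of such a spreadsheet can be generated rather than typed. The sketch below enumerates every sample/amplicon combination and ends the line after the reference when no read files are known for it. The function name, the column titles in the first line and all paths are illustrative assumptions.

```python
import itertools


def project_rows(samples, amplicons, reads):
    """Yield one tab-delimited line per sample/amplicon combination.

    amplicons maps an amplicon name to its reference file path;
    reads maps (sample, amplicon) to a list of one or two read file paths.
    """
    yield "sample\treference\treads_1\treads_2"  # first line: comments/description
    for sample, amplicon in itertools.product(samples, amplicons):
        fields = [sample, amplicons[amplicon]]
        fields += list(reads.get((sample, amplicon), []))
        yield "\t".join(fields)


rows = project_rows(
    ["S1", "S2"],
    {"A": "/ref/A.fa"},
    {("S1", "A"): ["/reads/S1_A.fastq"]},
)
print("\n".join(rows))
```

Writing the printed lines to a plain text file gives a table that can be opened in a spreadsheet editor for review before it is loaded.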