What is MULTo?
MULTo is a set of Python scripts that identify, store and retrieve, for each genomic position, the minimum sequence
length required for that position to be unique across the genome or transcriptome. We have computed the minimum unique
length (in the range 20-255 nt) and stored the values in a set of binary files (one per chromosome).
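To make the definition concrete, here is a minimal brute-force sketch of a minimum unique length computation on a toy sequence. It is an illustration only, not the algorithm used by the MULTo scripts: the function name, the `not_unique` sentinel value of 0, and the decision to ignore the reverse strand are all assumptions made for this example.

```python
def minimum_unique_lengths(seq, min_len=20, max_len=255, not_unique=0):
    """Brute-force sketch: for each start position, find the shortest
    substring length (min_len..max_len) that occurs exactly once in seq.

    Hypothetical helper for illustration. Positions with no unique length
    in the range get the assumed sentinel `not_unique`.
    """
    result = []
    for i in range(len(seq)):
        mul = not_unique
        for k in range(min_len, max_len + 1):
            if i + k > len(seq):
                break  # substring would run past the end of the sequence
            kmer = seq[i:i + k]
            # Count all (possibly overlapping) occurrences of this substring.
            count = sum(1 for j in range(len(seq) - k + 1)
                        if seq[j:j + k] == kmer)
            if count == 1:
                mul = k
                break
        result.append(mul)
    return result


# Toy example with a lowered minimum length so the result is easy to follow.
toy = minimum_unique_lengths("ACGTACGA", min_len=1, max_len=8)
```

In the toy sequence, the substring starting at position 0 only becomes unique at length 4 ("ACGT"), because "ACG" also occurs at position 4.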
Why use MULTo files?
MULTo has been developed to allow efficient retrieval of uniquely mappable positions within genomes or transcriptomes,
for use in correcting density measures in next-generation sequencing experiments. A common strategy in RNA-Sequencing is to
consider only uniquely mapping reads and to correct the gene models used for RPKM estimation so that they count only the
uniquely mappable positions. With MULTo this information is available for normalizing sequence data of arbitrary length.
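The correction described above can be sketched as follows: a position counts as uniquely mappable for a given read length when its minimum unique length is no greater than that read length, and the number of such positions replaces the gene length in the RPKM denominator. The function names and the `not_unique` sentinel value of 0 are assumptions for this sketch, not part of the MULTo scripts.

```python
def count_mappable_positions(mul_values, read_length, not_unique=0):
    """Count positions whose minimum unique length fits within read_length.

    `not_unique` is an assumed sentinel for positions that never become
    unique within the stored range.
    """
    return sum(1 for v in mul_values
               if v != not_unique and v <= read_length)


def rpkm(read_count, mappable_positions, total_mapped_reads):
    """Reads per kilobase of mappable sequence per million mapped reads."""
    if mappable_positions == 0 or total_mapped_reads == 0:
        return 0.0
    return read_count * 1e9 / (mappable_positions * total_mapped_reads)


# Hypothetical minimum unique lengths for six positions of a gene model.
mul_values = [20, 25, 0, 50, 255, 30]
n_mappable = count_mappable_positions(mul_values, read_length=30)
```

With 30 nt reads, only the positions with values 20, 25 and 30 qualify, so `n_mappable` is 3 rather than the full length of 6.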
How to use MULTo files?
It is very easy to query the precomputed MULTo files, either from the command line (see example below) or by integrating querying
into existing programs. Using our scripts, one can query the files for a single region or many regions at a time and define the level
of summarization of the results. See the documentation below for more information.
Genomic Minimum Unique Length Files:
Species | Single reads |
Mouse (C57/Bl6): | mm9 |
Human: | hg19 |
Transcriptome (Gene-level) Minimum Unique Length Files:
Transcriptome (Isoform-level) Minimum Unique Length Files:
Scripts for the generation and querying of MUL files:
The multo package contains Python scripts for the generation and/or querying of MULTo files and can be downloaded here.
A readme file with explanations and examples is found here.
Examples of commands for querying MULTo files
Making use of the uniqueness files within your own software could not be easier.
Below is a simple Python code snippet that reads MULTo files and returns uniqueness information:
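def read_genomicregion(binfile, start_coord, end_coord):
    # binfile: a per-chromosome MULTo file opened in binary mode ('rb')
    binfile.seek(start_coord, 0)  # absolute offset; the snippet reads one byte per position
    # bytearray yields integers on both Python 2 and Python 3
    return list(bytearray(binfile.read(end_coord - start_coord + 1)))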
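As a usage sketch, the example below writes a small synthetic file with one byte per position and queries a region from it. The function is restated with `bytearray` so that the block runs on Python 3 by itself. The byte values are made up for illustration, and the file is a stand-in rather than a real MULTo file.

```python
import os
import tempfile


def read_genomicregion(binfile, start_coord, end_coord):
    """Return the stored values for start_coord..end_coord (inclusive)."""
    binfile.seek(start_coord, 0)
    return list(bytearray(binfile.read(end_coord - start_coord + 1)))


# Made-up values standing in for per-position minimum unique lengths.
values = [20, 21, 0, 35, 255, 40, 22, 20]

# Write the synthetic data to a temporary binary file.
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(bytearray(values))
    path = tmp.name

try:
    # The file must be opened in binary mode for byte-wise reads.
    with open(path, "rb") as binfile:
        region = read_genomicregion(binfile, 2, 5)
finally:
    os.remove(path)
```

Because both coordinates are inclusive, the query for positions 2 to 5 returns four values. Requests that extend past the end of the file simply return fewer values, so callers may want to check the length of the result.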
Reference
The MULTo framework has been developed by Helena Storvall, Daniel Ramsköld and Rickard Sandberg.
If you find this resource useful, please cite our paper:
Storvall H, Ramsköld D, Sandberg R (2013) Efficient and Comprehensive Representation of Uniqueness for Next-Generation Sequencing by Minimum Unique Length Analyses. PLoS ONE 8(1): e53822. doi:10.1371/journal.pone.0053822