Sandberg Lab: MULTo

What is MULTo?

MULTo is a set of Python scripts that identify, store and retrieve the minimum sequence length required at each genomic position to be unique across the genome or transcriptome. We have computed the minimum unique lengths (in the range 20-255 nt) and stored them in a set of binary files (one per chromosome).

Why use MULTo files?

MULTo has been developed to allow efficient retrieval of uniquely mappable positions within genomes or transcriptomes, for use in correcting density measures in next-generation sequencing experiments. A common strategy in RNA-Sequencing is to consider only uniquely mapping reads and to correct the gene models used for RPKM estimation so that they count only the uniquely mappable positions. With MULTo, this information is available for normalizing sequence data of arbitrary read length.
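As a rough illustration of the correction described above, the sketch below computes RPKM twice for the same gene: once with the full annotated length and once with only the uniquely mappable positions as the effective length. The function name `rpkm`, its parameters and all numbers are hypothetical and chosen for illustration; they are not part of the MULTo package.

```python
def rpkm(read_count, effective_length, total_mapped_reads):
    """Reads per kilobase of gene model per million mapped reads.

    effective_length is the number of positions counted in the gene model;
    for the corrected estimate, pass the number of uniquely mappable positions.
    """
    if effective_length <= 0 or total_mapped_reads <= 0:
        # No usable positions or no reads: report zero rather than dividing by zero.
        return 0.0
    return read_count * 1e9 / (effective_length * total_mapped_reads)


# Hypothetical gene: 2,000 nt annotated, 1,500 positions uniquely mappable,
# 300 uniquely mapping reads out of 20 million uniquely mapped reads in total.
annotated = rpkm(300, 2000, 20000000)
corrected = rpkm(300, 1500, 20000000)
print(annotated, corrected)
```

Because reads can only be assigned to uniquely mappable positions, dividing by the full annotated length underestimates expression for genes with repetitive regions; using the mappable length removes that bias.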

How to use MULTo files?

It is very easy to query the precomputed MULTo files, either from the command line (see the example below) or by integrating the queries into existing programs. Using our scripts, one can query the files for one or many regions at a time and define the level of summarization of the results. See the documentation below for more information.

Genomic Minimum Unique Length Files:

Species	Single reads
Mouse (C57/Bl6):	mm9
Human:	hg19

Transcriptome (Gene-level) Minimum Unique Length Files:

Species	Single reads	Paired-end fragments
Mouse (C57/Bl6):	Refseq (mm9)	Refseq (mm9, 250nt)
Human:	Refseq (hg19)	Refseq (hg19, 250nt)

Transcriptome (Isoform-level) Minimum Unique Length Files:

Species	Single reads	Paired-end fragments
Mouse (C57/Bl6):	Refseq (mm9)	Refseq (mm9, 250nt)
Human:	Refseq (hg19)	Refseq (hg19, 250nt)

Scripts for the generation and querying of MUL files:

The multo package contains Python scripts for the generation and/or querying of MULTo files and can be downloaded here.
A readme file with explanations and examples can be found here.

Examples of commands for querying MULTo files

Making use of the uniqueness files within your own software could not be easier. Below is a simple Python code snippet that reads MULTo files and returns uniqueness information:


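  def read_genomicregion(binfile, start_coord, end_coord):
      # binfile: a MULTo file opened in binary mode ('rb'); one byte per position
      binfile.seek(start_coord, 0)
      # bytearray yields integers on both Python 2 and Python 3
      return list(bytearray(binfile.read(end_coord - start_coord + 1)))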
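A minimal sketch of how the returned values might be used follows: it counts the positions in a region that are uniquely mappable for a given read length, that is, positions whose stored minimum unique length does not exceed the read length. It repeats the reader function so that it runs on its own. The helper `count_unique_positions` is hypothetical, the in-memory byte buffer stands in for a real chromosome file, and treating values outside the documented 20-255 range as "not unique" is an assumption that should be checked against the readme.

```python
import io


def read_genomicregion(binfile, start_coord, end_coord):
    # Same logic as the snippet above: one byte per position, returned as integers.
    binfile.seek(start_coord, 0)
    return list(bytearray(binfile.read(end_coord - start_coord + 1)))


def count_unique_positions(binfile, start_coord, end_coord, read_length, min_length=20):
    """Count positions whose minimum unique length is at most read_length.

    Assumption: values below min_length (for example 0) do not encode a valid
    minimum unique length and are therefore counted as not unique.
    """
    values = read_genomicregion(binfile, start_coord, end_coord)
    return sum(1 for v in values if min_length <= v <= read_length)


# Synthetic stand-in for a chromosome file; real files come from the MULTo downloads.
demo = io.BytesIO(bytes(bytearray([0, 20, 25, 36, 50, 75, 100, 255, 0, 30])))

n_unique_36 = count_unique_positions(demo, 1, 6, read_length=36)
n_unique_100 = count_unique_positions(demo, 0, 9, read_length=100)
print(n_unique_36, n_unique_100)
```

With a real file, `demo` would be replaced by the chromosome file opened in binary mode, and the resulting count could serve as the effective length of a gene model when normalizing read densities.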

Reference

The MULTo framework has been developed by Helena Storvall, Daniel Ramsköld and Rickard Sandberg.
If you find this resource useful, please cite our paper:
Storvall H, Ramsköld D, Sandberg R (2013) Efficient and Comprehensive Representation of Uniqueness for Next-Generation Sequencing by Minimum Unique Length Analyses. PLoS ONE 8(1): e53822. doi:10.1371/journal.pone.0053822