MoSBAT: Motif Similarity Based on Affinity of Targets

Measuring motif similarity is essential for identifying functionally related transcription factors (TFs) and RNA-binding proteins (RBPs), and for annotating de novo motifs. Here, we describe Motif Similarity Based on Affinity of Targets (MoSBAT), an approach for measuring the similarity of motifs by computing their affinity profiles across varying sequences. We show that MoSBAT accurately identifies TFs with similar in vitro binding activities even when their motifs are derived from noisy measurements, successfully associates de novo ChIP-seq motifs with their respective TFs, and outperforms existing motif comparison tools in both tasks.
MoSBAT Webserver
Input
To compare your motifs using MoSBAT, you will need a position frequency matrix (PFM) in CIS-BP format, or a file containing several such PFMs. Each motif should include the following lines:
1. Identifier: Motif<tab>Motif_ID
2. Motif Header:
DNA Motif Header:
Pos<tab>A<tab>C<tab>G<tab>T
or
RNA Motif Header:
Pos<tab>A<tab>C<tab>G<tab>U
3. Position Frequency Line(s): PositionNumber<tab>Freq_A<tab>Freq_C<tab>Freq_G<tab>Freq_T<tab>
• PositionNumber starts at 1
• The frequencies at each position should sum to 1
Example
Motif PITX1
Pos A C G T
1 0.19 0.27 0.11 0.43
2 0.05 0.01 0.00 0.94
3 0.88 0.04 0.06 0.01
4 0.99 0.00 0.00 0.01
5 0.02 0.02 0.13 0.84
6 0.06 0.86 0.01 0.07
7 0.04 0.77 0.03 0.15
8 0.23 0.41 0.14 0.22
Motif HOXA2
Pos A C G T
1 0.23 0.28 0.24 0.25
2 0.23 0.27 0.25 0.25
3 0.16 0.23 0.15 0.46
4 0.51 0.05 0.26 0.18
5 0.52 0.21 0.11 0.17
6 0.20 0.25 0.08 0.47
7 0.16 0.17 0.36 0.31
8 0.40 0.16 0.20 0.23
9 0.22 0.28 0.28 0.22
If the PFM file contains more than one motif, two newlines should separate consecutive motifs. An example can be downloaded here.
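The format rules above can be checked before uploading. The following is a minimal sketch of a validator, not part of MoSBAT itself: the function name `parse_cisbp_pfms`, the `tol` parameter, and the choice of a 0.02 tolerance (to allow for frequencies rounded to two decimals, as in the example) are all assumptions made for illustration. The sample input reuses the first three positions of each example motif.

```python
from typing import Dict, List


def parse_cisbp_pfms(text: str, tol: float = 0.02) -> Dict[str, List[List[float]]]:
    """Parse CIS-BP style PFM text into {motif_id: [[f1, f2, f3, f4], ...]}.

    Hypothetical helper: `tol` is an assumed allowance for rounding error
    when checking that each position's frequencies sum to 1.
    """
    motifs: Dict[str, List[List[float]]] = {}
    current = None
    for raw in text.splitlines():
        fields = raw.split()  # tolerant: accepts tabs or spaces
        if not fields:
            continue  # blank lines separate motifs
        if fields[0] == "Motif":
            if len(fields) < 2:
                raise ValueError("Motif line is missing an identifier")
            current = fields[1]
            motifs[current] = []
        elif fields[0] == "Pos":
            # The header must name the DNA or RNA alphabet in this order.
            if fields[1:] not in (["A", "C", "G", "T"], ["A", "C", "G", "U"]):
                raise ValueError(f"unexpected header: {raw!r}")
        else:
            if current is None:
                raise ValueError("frequency line found before any Motif line")
            pos = int(fields[0])
            freqs = [float(x) for x in fields[1:5]]
            if pos != len(motifs[current]) + 1:
                raise ValueError(f"{current}: positions must count up from 1")
            if len(freqs) != 4 or abs(sum(freqs) - 1.0) > tol:
                raise ValueError(f"{current}: position {pos} does not sum to 1")
            motifs[current].append(freqs)
    return motifs


# First three positions of each example motif, truncated for brevity.
SAMPLE = (
    "Motif\tPITX1\n"
    "Pos\tA\tC\tG\tT\n"
    "1\t0.19\t0.27\t0.11\t0.43\n"
    "2\t0.05\t0.01\t0.00\t0.94\n"
    "3\t0.88\t0.04\t0.06\t0.01\n"
    "\n\n"
    "Motif\tHOXA2\n"
    "Pos\tA\tC\tG\tT\n"
    "1\t0.23\t0.28\t0.24\t0.25\n"
    "2\t0.23\t0.27\t0.25\t0.25\n"
    "3\t0.16\t0.23\t0.15\t0.46\n"
)

if __name__ == "__main__":
    for motif_id, rows in parse_cisbp_pfms(SAMPLE).items():
        print(motif_id, len(rows), "positions")
```

The parser raises `ValueError` on the first rule it sees broken, which is usually enough to locate a malformed line in a hand-edited file.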
Motif Comparison Settings
Once your PFM(s) have been input, you can choose to compare your motifs to:
A. The CIS-BP public collection of Human and Mouse TF motifs
B. A user-defined collection of motifs (in the CIS-BP format described above)
You then select whether your motifs are DNA or RNA, and can adjust two settings (the length and the number of random sequences) to change the random sequence pool.
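To make the role of the two random-sequence settings more concrete, here is a rough sketch of the general idea of comparing motifs through their score profiles over a shared pool of random sequences. It is an illustration under simplifying assumptions and should not be read as MoSBAT's actual scoring: the affinity definition (sum over offsets of the product of per-position frequencies), the pseudocount, the forward-strand-only scan, the use of Pearson correlation, and the default values of `n` and `length` are all choices made here for the example. The README in the source repository describes the real procedure.

```python
import math
import random
from typing import List

PFM = List[List[float]]  # one [A, C, G, T] row per motif position


def random_sequences(n: int, length: int, seed: int = 0) -> List[str]:
    """Draw n uniform random DNA sequences of the given length."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n)]


def affinity(pfm: PFM, seq: str, pseudo: float = 0.01) -> float:
    """Assumed toy affinity: sum over offsets of the product of frequencies."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    width = len(pfm)
    total = 0.0
    for start in range(len(seq) - width + 1):
        score = 1.0
        for row, base in zip(pfm, seq[start:start + width]):
            score *= row[idx[base]] + pseudo  # pseudocount avoids hard zeros
        total += score
    return total


def pearson(x: List[float], y: List[float]) -> float:
    """Plain Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("a profile has zero variance")
    return cov / math.sqrt(vx * vy)


def similarity(pfm_a: PFM, pfm_b: PFM, n: int = 2000, length: int = 50,
               seed: int = 0) -> float:
    """Correlate the two motifs' affinity profiles over one shared pool."""
    seqs = random_sequences(n, length, seed)
    profile_a = [affinity(pfm_a, s) for s in seqs]
    profile_b = [affinity(pfm_b, s) for s in seqs]
    return pearson(profile_a, profile_b)
```

In a sketch like this, a larger pool (more sequences) gives a more stable correlation estimate at the cost of run time, and longer sequences contain more candidate binding sites per sequence. Whether the same trade-offs hold for the web server's settings depends on MoSBAT's real scoring.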
Output
MoSBAT outputs the following files, which contain all pairwise comparisons of motifs in the first set of PFMs to those in the second set:
• Full Matrix of MoSBAT-a Results (results.affinity.correl.txt) – Text file containing a matrix of motif similarities based on sequence affinities (MoSBAT-a). The matrix has dimensions Motif_Set_1 by Motif_Set_2.
• Full Matrix of MoSBAT-e Results (results.energy.correl.txt) – Text file containing a matrix of motif similarities based on sequence energies (MoSBAT-e). The matrix has dimensions Motif_Set_1 by Motif_Set_2.
For convenience, MoSBAT also identifies the most similar pairs of motifs and their offsets. These outputs are available as heatmaps and browser-viewable tables:
• Heatmap of Top MoSBAT-a Hits (results.affinity.correl.heatmap.jpg) – Heatmap image displaying up to the top 10x10 MoSBAT-a motif similarity values (from results.affinity.correl.txt).
• Table of Top MoSBAT-a Hits (results.affinity.correl.htm) – HTML table of the top 1000 pairs of motifs (from results.affinity.correl.txt). Includes MoSBAT-a scores and a histogram of offsets for the top 100 pairs of motifs.
• Heatmap of Top MoSBAT-e Hits (results.energy.correl.heatmap.jpg) – Heatmap image displaying up to the top 10x10 MoSBAT-e motif similarity values (from results.energy.correl.txt).
• Table of Top MoSBAT-e Hits (results.energy.correl.htm) – HTML table of the top 1000 pairs of motifs (from results.energy.correl.txt). Includes MoSBAT-e scores and a histogram of offsets for the top 100 pairs of motifs.
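If you want to rank pairs yourself from one of the full matrix files, something along the following lines may work. This is a sketch under an assumption about the file layout that this page does not confirm: a tab-delimited matrix whose first row holds the Motif_Set_2 identifiers and whose first column holds the Motif_Set_1 identifiers. The function name `top_pairs` and the toy values in the demo are made up for illustration; check an actual output file before relying on it.

```python
import csv
import io
from typing import List, Tuple


def top_pairs(matrix_text: str, k: int = 5) -> List[Tuple[str, str, float]]:
    """Return the k highest-scoring (row_motif, column_motif, score) pairs.

    Assumed layout: tab-delimited, header row of column motif IDs, and a
    leading row motif ID on every data row.
    """
    rows = [r for r in csv.reader(io.StringIO(matrix_text), delimiter="\t") if r]
    header, data = rows[0], rows[1:]
    if not data:
        return []
    # Some writers omit the empty corner cell, making the header one field
    # shorter than the data rows; accept both shapes.
    col_ids = header[1:] if len(header) == len(data[0]) else header
    pairs = []
    for row in data:
        row_id = row[0]
        for col_id, value in zip(col_ids, row[1:]):
            pairs.append((row_id, col_id, float(value)))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:k]


if __name__ == "__main__":
    # Toy matrix with made-up scores, only to show the call.
    demo = "\tM1\tM2\nPITX1\t0.9\t0.1\nHOXA2\t0.2\t0.7\n"
    for row_id, col_id, score in top_pairs(demo, k=2):
        print(row_id, col_id, score)
```

For a real run you would pass the contents of results.affinity.correl.txt or results.energy.correl.txt, for example `top_pairs(open(path).read(), k=20)`.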
MoSBAT source code
The source code for MoSBAT can be downloaded from the project's GitHub repository: https://github.com/csglab/MoSBAT.
A detailed README is included, with instructions on how to run MoSBAT and some common extensions.
MoSBAT Requirements
Unix-compatible OS
R version 3.0.1 or later (http://www.r-project.org/)
R "gplots" library (https://cran.r-project.org/web/packages/gplots/index.html)