MoSBAT: Motif Similarity Based on Affinity of Targets

 

Measuring motif similarity is essential for identifying functionally related transcription factors (TFs) and RNA-binding proteins (RBPs), and for annotating de novo motifs. Here, we describe Motif Similarity Based on Affinity of Targets (MoSBAT), an approach that measures the similarity of motifs by computing and comparing their affinity profiles across a pool of random sequences. We show that MoSBAT accurately identifies TFs with similar in vitro binding activities, even when their motifs are derived from noisy measurements, and correctly associates de novo ChIP-seq motifs with their respective TFs. In both tasks, MoSBAT outperforms existing motif comparison tools.
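The core idea can be illustrated with a minimal Python sketch. It assumes a simple scoring scheme that is not necessarily the one MoSBAT uses: a motif's affinity for a sequence is taken as the sum, over every window on both strands, of the product of its position frequencies (plus a small pseudocount), and the similarity of two motifs is the Pearson correlation of their affinity profiles over the same random sequence pool. The function names and default parameters are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def affinity(pfm, seq, pseudocount=1e-3):
    """Assumed scoring: sum over all windows on both strands of the product
    of position frequencies. The pseudocount keeps a zero frequency from
    zeroing out a window (frequencies are not renormalized afterwards)."""
    width = len(pfm)
    revcomp = "".join(COMPLEMENT[b] for b in reversed(seq))
    total = 0.0
    for strand in (seq, revcomp):
        for start in range(len(strand) - width + 1):
            window_score = 1.0
            for j, base in enumerate(strand[start:start + width]):
                window_score *= pfm[j][ALPHABET.index(base)] + pseudocount
            total += window_score
    return total


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def motif_similarity(pfm1, pfm2, n_seqs=500, seq_len=50, seed=1):
    """Correlate the affinity profiles of two motifs over one shared pool
    of uniformly random DNA sequences (hypothetical defaults)."""
    rng = random.Random(seed)
    pool = ["".join(rng.choice(ALPHABET) for _ in range(seq_len))
            for _ in range(n_seqs)]
    profile1 = [affinity(pfm1, s) for s in pool]
    profile2 = [affinity(pfm2, s) for s in pool]
    return pearson(profile1, profile2)
```

Because this sketch scores both strands, a motif and its reverse complement receive identical affinity profiles and a similarity of 1, without the two matrices ever being aligned explicitly.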

 

MoSBAT Webserver

 

Input

To compare your motifs using MoSBAT, you will need a CIS-BP-formatted position frequency matrix (PFM), or a file containing several of them. Each motif should include the following lines:

1. Identifier: Motif<tab>Motif_ID

2. Motif Header:

DNA Motif Header: Pos<tab>A<tab>C<tab>G<tab>T

or

RNA Motif Header: Pos<tab>A<tab>C<tab>G<tab>U

3. Position Frequency Line(s): PositionNumber<tab>Freq_A<tab>Freq_C<tab>Freq_G<tab>Freq_T (Freq_U for RNA motifs)

• PositionNumber starts at 1

• The frequencies at each position should sum to 1

Example (columns are shown space-aligned here; in the file they are tab-separated):

Motif   PITX1
Pos     A       C       G       T
1       0.19    0.27    0.11    0.43
2       0.05    0.01    0.00    0.94
3       0.88    0.04    0.06    0.01
4       0.99    0.00    0.00    0.01
5       0.02    0.02    0.13    0.84
6       0.06    0.86    0.01    0.07
7       0.04    0.77    0.03    0.15
8       0.23    0.41    0.14    0.22


Motif   HOXA2
Pos     A       C       G       T
1       0.23    0.28    0.24    0.25
2       0.23    0.27    0.25    0.25
3       0.16    0.23    0.15    0.46
4       0.51    0.05    0.26    0.18
5       0.52    0.21    0.11    0.17
6       0.20    0.25    0.08    0.47
7       0.16    0.17    0.36    0.31
8       0.40    0.16    0.20    0.23
9       0.22    0.28    0.28    0.22

If the PFM file contains more than one motif, separate consecutive motifs with two new lines. An example file can be downloaded here.
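As an illustration of the input format, the following Python sketch parses a multi-motif CIS-BP-style PFM file and checks the two rules listed above (positions numbered from 1, frequencies summing to 1). The function name, the returned structure, and the 0.02 tolerance (which allows for rounding, as in the example rows that sum to 0.99 or 1.01) are assumptions and are not part of MoSBAT.

```python
def parse_cisbp_pfms(text, tol=0.02):
    """Parse CIS-BP-style PFM text into
    {motif_id: {"alphabet": "ACGT" or "ACGU", "rows": [[fA, fC, fG, fT/U], ...]}}.

    Fields are split on any whitespace (tabs in a real file), so motif IDs
    are assumed not to contain spaces. Blank separator lines are skipped.
    """
    motifs = {}
    current = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines separate motifs
        if fields[0] == "Motif":
            current = fields[1]
            motifs[current] = {"alphabet": None, "rows": []}
            continue
        if current is None:
            raise ValueError(f"line {lineno}: data before any 'Motif' line")
        if fields[0] == "Pos":
            alphabet = "".join(fields[1:])
            if alphabet not in ("ACGT", "ACGU"):
                raise ValueError(f"line {lineno}: unexpected header {alphabet!r}")
            motifs[current]["alphabet"] = alphabet
            continue
        rows = motifs[current]["rows"]
        # PositionNumber must start at 1 and increase by one per line.
        position = int(fields[0])
        if position != len(rows) + 1:
            raise ValueError(f"line {lineno}: expected position {len(rows) + 1}")
        freqs = [float(f) for f in fields[1:5]]
        # Each position needs four frequencies summing to ~1.
        if len(freqs) != 4 or abs(sum(freqs) - 1.0) > tol:
            raise ValueError(f"line {lineno}: frequencies do not sum to 1")
        rows.append(freqs)
    return motifs
```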

 

Motif Comparison Settings

Once you have uploaded your PFM(s), you can compare your motifs to either:

A. The public CIS-BP collection of human and mouse TF motifs

B. A user-defined collection of motifs (in the CIS-BP format described above)

You then specify whether your motifs are DNA or RNA, and can optionally adjust two settings, the length and the number of random sequences, that define the random sequence pool.
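To show what the two random-pool settings control, here is a minimal Python sketch that builds such a pool. It assumes uniform base composition and independent positions; the parameter names (n_seqs, seq_len, molecule) are hypothetical and do not correspond to MoSBAT's actual options.

```python
import random


def random_sequence_pool(n_seqs, seq_len, molecule="DNA", seed=None):
    """Return n_seqs uniformly random sequences, each seq_len bases long.

    molecule selects the alphabet: "DNA" -> ACGT, "RNA" -> ACGU.
    A fixed seed makes the pool reproducible.
    """
    alphabets = {"DNA": "ACGT", "RNA": "ACGU"}
    if molecule not in alphabets:
        raise ValueError("molecule must be 'DNA' or 'RNA'")
    if n_seqs < 1 or seq_len < 1:
        raise ValueError("n_seqs and seq_len must be positive")
    rng = random.Random(seed)
    alphabet = alphabets[molecule]
    return ["".join(rng.choice(alphabet) for _ in range(seq_len))
            for _ in range(n_seqs)]
```

Longer and more numerous sequences give each motif more windows to score, which generally makes the affinity profiles, and the similarity estimates derived from them, less noisy at the cost of longer run times.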

 

Output

MoSBAT outputs the following files, which contain all pairwise comparisons between the motifs in the first set of PFMs and those in the second set:

• Full Matrix of MoSBAT-a Results (results.affinity.correl.txt): Text file containing a matrix of motif similarities based on sequence affinities (MoSBAT-a). The matrix has dimensions Motif_Set_1 by Motif_Set_2.

• Full Matrix of MoSBAT-e Results (results.energy.correl.txt): Text file containing a matrix of motif similarities based on sequence energies (MoSBAT-e). The matrix has dimensions Motif_Set_1 by Motif_Set_2.
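The difference between the two matrices can be sketched in Python as follows. The sketch assumes that each motif already has an affinity profile over a shared random sequence pool, and it assumes (this page does not confirm it) that a sequence's energy is the logarithm of its affinity; the sign convention does not affect the correlation. MoSBAT-a-style scores then correlate raw affinities and MoSBAT-e-style scores correlate the log-transformed values. The function names and the tab-delimited layout are hypothetical and may differ from MoSBAT's actual files.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def similarity_matrices(profiles1, profiles2):
    """Return (affinity_scores, energy_scores), each keyed by (motif1, motif2).

    profiles1/profiles2 map motif IDs to positive affinities over the SAME
    random sequence pool. Energies are taken as log(affinity) (an assumption).
    """
    energies2 = {m: [math.log(a) for a in p] for m, p in profiles2.items()}
    affinity_scores, energy_scores = {}, {}
    for m1, p1 in profiles1.items():
        e1 = [math.log(a) for a in p1]
        for m2, p2 in profiles2.items():
            affinity_scores[(m1, m2)] = pearson(p1, p2)
            energy_scores[(m1, m2)] = pearson(e1, energies2[m2])
    return affinity_scores, energy_scores


def format_matrix(scores, rows, cols):
    """Hypothetical tab-delimited Motif_Set_1 x Motif_Set_2 layout:
    a header row of column motif IDs, then one row per Set 1 motif."""
    lines = ["\t".join([""] + cols)]
    for r in rows:
        lines.append("\t".join([r] + [f"{scores[(r, c)]:.3f}" for c in cols]))
    return "\n".join(lines) + "\n"
```

In a toy case where one profile is [1, 2, 4, 8] and the other is its square, the energy correlation is exactly 1 while the affinity correlation is about 0.98, so the two matrices can rank motif pairs differently.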

 

For convenience, MoSBAT also identifies the most similar pairs of motifs and their offsets. These results are available as heatmaps and browser-viewable tables:

• Heatmap of Top MoSBAT-a Hits (results.affinity.correl.heatmap.jpg): Heatmap image of the top MoSBAT-a motif similarity values from results.affinity.correl.txt, showing at most 10 × 10 motifs.

• Table of Top MoSBAT-a Hits (results.affinity.correl.htm): HTML table of the top 1,000 motif pairs from results.affinity.correl.txt, with their MoSBAT-a scores and a histogram of offsets for the top 100 pairs.

• Heatmap of Top MoSBAT-e Hits (results.energy.correl.heatmap.jpg): Heatmap image of the top MoSBAT-e motif similarity values from results.energy.correl.txt, showing at most 10 × 10 motifs.

• Table of Top MoSBAT-e Hits (results.energy.correl.htm): HTML table of the top 1,000 motif pairs from results.energy.correl.txt, with their MoSBAT-e scores and a histogram of offsets for the top 100 pairs.
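Selecting the most similar pairs from a full similarity matrix can be sketched in Python as follows. Ranking by descending score mirrors the top-1,000 tables described above; the rule used here to pick heatmap rows and columns (the first distinct motifs encountered in the ranked list, up to 10 of each) is an assumption, and offsets are not computed because this page does not describe how MoSBAT derives them.

```python
def top_pairs(scores, n=1000):
    """Return the n highest-scoring ((motif1, motif2), score) items, best
    first. scores maps (motif1, motif2) -> similarity."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


def heatmap_labels(ranked_pairs, max_dim=10):
    """Hypothetical heatmap selection: collect up to max_dim distinct row
    motifs and max_dim distinct column motifs, in ranked order."""
    rows, cols = [], []
    for (m1, m2), _score in ranked_pairs:
        if m1 not in rows and len(rows) < max_dim:
            rows.append(m1)
        if m2 not in cols and len(cols) < max_dim:
            cols.append(m2)
    return rows, cols
```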

 

 

 

MoSBAT source code

 

The source code for MoSBAT can be downloaded from the project's GitHub repository: https://github.com/csglab/MoSBAT. The repository includes a detailed README with instructions for running MoSBAT and for some common extensions.

 

MoSBAT Requirements

Unix-compatible OS

R version 3.0.1 or later (http://www.r-project.org/)

R "gplots" library (https://cran.r-project.org/web/packages/gplots/index.html)