********************************************************************************
MEME - Motif discovery tool
********************************************************************************
MEME version 4.10.0 (Release date: Wed May 21 10:35:36 2014 +1000)

For further information on how to interpret these results or to get
a copy of the MEME software please access http://meme.nbcr.net.

This file may be used as input to the MAST algorithm for searching
sequence databases for matches to groups of motifs. MAST is available
for interactive use and downloading at http://meme.nbcr.net.
********************************************************************************


********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:

Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************


********************************************************************************
TRAINING SET
********************************************************************************
DATAFILE= motifs/475/475.seqs.fa
ALPHABET= ACGT
Sequence name            Weight Length  Sequence name            Weight Length
-------------            ------ ------  -------------            ------ ------
15801                    1.0000    500  48789                    1.0000    500
48861                    1.0000    500  46528                    1.0000    500
38048                    1.0000    500  49562                    1.0000    500
40020                    1.0000    500  49629                    1.0000    500
44206                    1.0000    500  37892                    1.0000    500
48016                    1.0000    500
********************************************************************************

********************************************************************************
COMMAND LINE SUMMARY
********************************************************************************
This information can also be useful in the event you wish to report a
problem with the MEME software.

command: meme motifs/475/475.seqs.fa -oc motifs/475 -dna -minw 12 -maxw 21 -nmotifs 3 -maxsize 500000

model:  mod=         zoops    nmotifs=         3    evt=           inf
object function=  E-value of product of p-values
width:  minw=           12    maxw=           21    minic=        0.00
width:  wg=             11    ws=              1    endgaps=       yes
nsites: minsites=        2    maxsites=       11    wnsites=       0.8
theta:  prob=            1    spmap=         uni    spfuzz=        0.5
global: substring=     yes    branching=      no    wbranch=        no
em:     prior=   dirichlet    b=            0.01    maxiter=        50
        distance=    1e-05
data:   n=            5500    N=              11
strands: +
sample: seed=            0    seqfrac=         1
Letter frequencies in dataset:
A 0.303 C 0.208 G 0.221 T 0.268
Background letter frequencies (from dataset with add-one prior applied):
A 0.303 C 0.208 G 0.221 T 0.268
********************************************************************************


********************************************************************************
MOTIF  1 MEME    width =  16  sites =   6  llr = 89  E-value = 3.1e+001
********************************************************************************
--------------------------------------------------------------------------------
Motif 1 Description
--------------------------------------------------------------------------------
Simplified        A  :3:58:a2:2:::3:2
pos.-specific     C  :2a::2::a3a:733:
probability       G  82:228:::::a3::8
matrix            T  23:3:::8:5:::37:

[Relative entropy plot, 0.0 to 2.3 bits per position; column layout not recoverable]
Relative Entropy (21.3 bits)

Multilevel           GACAAGATCTCGCATG
consensus             T T     C  GCC
sequence                          T

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 1 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name            Start   P-value            Site
-------------            ----- ---------            ----------------
49629                       65  7.94e-10 TCTTCTTTTC GTCTAGATCTCGCCTG CACCATCCTT
44206                       22  6.98e-09 GGAAAACGAT GACAAGATCTCGGCTG ATGTGTCGAC
37892                      299  3.79e-08 CCTGTATGAT GGCAAGATCACGCATG CCTTCCGCCA
49562                       27  2.59e-07 GACAAGCTCT GCCGAGATCTCGCTCA CGGGGGAACT
38048                      431  2.76e-07 CTCGTTTCTG TTCAGGATCCCGCTCG TGTCCACTAT
40020                      285  5.78e-07 GAATTAGAAA GACTACAACCCGGATG TTTATAAAAA
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 1 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
49629                             7.9e-10  64_[+1]_420
44206                               7e-09  21_[+1]_463
37892                             3.8e-08  298_[+1]_186
49562                             2.6e-07  26_[+1]_458
38048                             2.8e-07  430_[+1]_54
40020                             5.8e-07  284_[+1]_200
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 1 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 1 width=16 seqs=6
49629                    (   65) GTCTAGATCTCGCCTG  1
44206                    (   22) GACAAGATCTCGGCTG  1
37892                    (  299) GGCAAGATCACGCATG  1
49562                    (   27) GCCGAGATCTCGCTCA  1
38048                    (  431) TTCAGGATCCCGCTCG  1
40020                    (  285) GACTACAACCCGGATG  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 1 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 16 n= 5335 bayes= 9.45322 E= 3.1e+001
  -923   -923    192    -69
    14    -32    -40     31
  -923    227   -923   -923
    72   -923    -40     31
   146   -923    -40   -923
  -923    -32    192   -923
   172   -923   -923   -923
   -86   -923   -923    163
  -923    227   -923   -923
   -86     68   -923     90
  -923    227   -923   -923
  -923   -923    218   -923
  -923    168     59   -923
    14     68   -923     31
  -923     68   -923    131
   -86   -923    192   -923
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 1 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 16 nsites= 6 E= 3.1e+001
  0.000000  0.000000  0.833333  0.166667
  0.333333  0.166667  0.166667  0.333333
  0.000000  1.000000  0.000000  0.000000
  0.500000  0.000000  0.166667  0.333333
  0.833333  0.000000  0.166667  0.000000
  0.000000  0.166667  0.833333  0.000000
  1.000000  0.000000  0.000000  0.000000
  0.166667  0.000000  0.000000  0.833333
  0.000000  1.000000  0.000000  0.000000
  0.166667  0.333333  0.000000  0.500000
  0.000000  1.000000  0.000000  0.000000
  0.000000  0.000000  1.000000  0.000000
  0.000000  0.666667  0.333333  0.000000
  0.333333  0.333333  0.000000  0.333333
  0.000000  0.333333  0.000000  0.666667
  0.166667  0.000000  0.833333  0.000000
--------------------------------------------------------------------------------
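The log-odds entries above appear, in spot checks, to follow 100 * log2(p / b), where p is the matching entry of the letter-probability matrix and b is the background frequency of that letter from the command-line summary. That formula is an inference from the numbers in this file, not MEME's documented computation. A minimal Python sketch under that assumption (the name log_odds_row is hypothetical):

```python
import math

ALPHABET = "ACGT"
# Background frequencies as printed (to 3 decimals) in the command-line summary.
BACKGROUND = {"A": 0.303, "C": 0.208, "G": 0.221, "T": 0.268}


def log_odds_row(probs, background=BACKGROUND):
    """Score one matrix row as round(100 * log2(p / b)) per letter.

    Assumption inferred from this file, not MEME's documented formula.
    A zero probability has no finite log-odds; this sketch returns None
    where MEME prints a large negative placeholder (e.g. -923).
    """
    row = []
    for letter, p in zip(ALPHABET, probs):
        if p == 0:
            row.append(None)
        else:
            row.append(round(100 * math.log2(p / background[letter])))
    return row


if __name__ == "__main__":
    # Rows 1 and 2 of the motif 1 letter-probability matrix (A, C, G, T).
    print(log_odds_row([0.000000, 0.000000, 0.833333, 0.166667]))
    print(log_odds_row([0.333333, 0.166667, 0.166667, 0.333333]))
```

For the first two rows of motif 1 the results land within one unit of the printed scores; the small differences are plausibly due to the background being displayed to only three decimals. The derivation of the zero-probability placeholders (-923, -865 and -1010 in this file) is not modelled.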
--------------------------------------------------------------------------------
Motif 1 regular expression
--------------------------------------------------------------------------------
G[AT]C[AT]AGATC[TC]CG[CG][ACT][TC]G
--------------------------------------------------------------------------------

Time 1.23 secs.

********************************************************************************


********************************************************************************
MOTIF  2 MEME    width =  17  sites =   4  llr = 73  E-value = 2.3e+001
********************************************************************************
--------------------------------------------------------------------------------
Motif 2 Description
--------------------------------------------------------------------------------
Simplified        A  :33:3:::::::::3::
pos.-specific     C  a::a8:53:aa::5:a8
probability       G  :35::::::::::53::
matrix            T  :53::a58a::aa:5:3

[Relative entropy plot, 0.0 to 2.3 bits per position; column layout not recoverable]
Relative Entropy (26.4 bits)

Multilevel           CTGCCTCTTCCTTCTCC
consensus             AA A TC     GA T
sequence              GT           G

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 2 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name            Start   P-value            Site
-------------            ----- ---------            -----------------
37892                      440  8.77e-10 AATACGAAGT CGGCCTCTTCCTTGACC AAGCAGAAAG
44206                       72  1.32e-09 AATACGGAAA CTGCCTTTTCCTTGTCT GCGAGGGCGC
49629                       84  5.50e-09 TCGCCTGCAC CATCCTTCTCCTTCTCC AATTCTTGGC
40020                       50  6.28e-09 ACTTGTCTAG CTACATCTTCCTTCGCC GTTTACATCA
--------------------------------------------------------------------------------
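Each multilevel consensus appears to be derivable from the motif's letter-probability matrix: the top row takes the most probable letter at each position (ties resolved in A, C, G, T order), and each lower row shows the next most probable letter where its probability is at least 0.2. The 0.2 cut-off is an assumption inferred from this file, not a documented MEME parameter; it reproduces the three rows printed for motif 2 and is consistent with the rows for motifs 1 and 3. A Python sketch (multilevel_consensus and CUTOFF are hypothetical names):

```python
ALPHABET = "ACGT"
CUTOFF = 0.2  # assumption: minimum probability for letters on the lower rows


def multilevel_consensus(matrix, cutoff=CUTOFF):
    """Return the consensus rows for a letter-probability matrix.

    Row 0 holds the most probable letter at every position; row k (k >= 1)
    holds the (k+1)-th most probable letter where its probability is at
    least `cutoff`, and a blank otherwise.
    """
    # Rank letters per position by decreasing probability. sorted() is
    # stable, so tied letters keep A, C, G, T order.
    ranked = [sorted(zip(ALPHABET, probs), key=lambda lp: -lp[1]) for probs in matrix]
    rows = []
    for level in range(len(ALPHABET)):
        chars = []
        for letters in ranked:
            letter, p = letters[level]
            chars.append(letter if level == 0 or p >= cutoff else " ")
        row = "".join(chars).rstrip()
        if not row:
            break  # deeper levels can only have lower probabilities
        rows.append(row)
    return rows


# Motif 2 letter-probability matrix (columns A, C, G, T) from this file.
MOTIF2 = [
    [0.0, 1.0, 0.0, 0.0], [0.25, 0.0, 0.25, 0.5], [0.25, 0.0, 0.5, 0.25],
    [0.0, 1.0, 0.0, 0.0], [0.25, 0.75, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.5, 0.0, 0.5], [0.0, 0.25, 0.0, 0.75], [0.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0], [0.0, 0.5, 0.5, 0.0], [0.25, 0.0, 0.25, 0.5],
    [0.0, 1.0, 0.0, 0.0], [0.0, 0.75, 0.0, 0.25],
]

if __name__ == "__main__":
    for row in multilevel_consensus(MOTIF2):
        print(row)
```

Leading blanks are kept so each letter stays under its motif position, matching the column layout of the Multilevel/consensus/sequence block.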
--------------------------------------------------------------------------------
Motif 2 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
37892                             8.8e-10  439_[+2]_44
44206                             1.3e-09  71_[+2]_412
49629                             5.5e-09  83_[+2]_400
40020                             6.3e-09  49_[+2]_434
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 2 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 2 width=17 seqs=4
37892                    (  440) CGGCCTCTTCCTTGACC  1
44206                    (   72) CTGCCTTTTCCTTGTCT  1
49629                    (   84) CATCCTTCTCCTTCTCC  1
40020                    (   50) CTACATCTTCCTTCGCC  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 2 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 17 n= 5324 bayes= 10.3772 E= 2.3e+001
  -865    226   -865   -865
   -28   -865     18     90
   -28   -865    118    -10
  -865    226   -865   -865
   -28    185   -865   -865
  -865   -865   -865    190
  -865    127   -865     90
  -865     27   -865    148
  -865   -865   -865    190
  -865    226   -865   -865
  -865    226   -865   -865
  -865   -865   -865    190
  -865   -865   -865    190
  -865    127    118   -865
   -28   -865     18     90
  -865    226   -865   -865
  -865    185   -865    -10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 2 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 17 nsites= 4 E= 2.3e+001
  0.000000  1.000000  0.000000  0.000000
  0.250000  0.000000  0.250000  0.500000
  0.250000  0.000000  0.500000  0.250000
  0.000000  1.000000  0.000000  0.000000
  0.250000  0.750000  0.000000  0.000000
  0.000000  0.000000  0.000000  1.000000
  0.000000  0.500000  0.000000  0.500000
  0.000000  0.250000  0.000000  0.750000
  0.000000  0.000000  0.000000  1.000000
  0.000000  1.000000  0.000000  0.000000
  0.000000  1.000000  0.000000  0.000000
  0.000000  0.000000  0.000000  1.000000
  0.000000  0.000000  0.000000  1.000000
  0.000000  0.500000  0.500000  0.000000
  0.250000  0.000000  0.250000  0.500000
  0.000000  1.000000  0.000000  0.000000
  0.000000  0.750000  0.000000  0.250000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 2 regular expression
--------------------------------------------------------------------------------
C[TAG][GAT]C[CA]T[CT][TC]TCCTT[CG][TAG]C[CT]
--------------------------------------------------------------------------------

Time 2.35 secs.

********************************************************************************


********************************************************************************
MOTIF  3 MEME    width =  12  sites =  11  llr = 108  E-value = 1.2e+002
********************************************************************************
--------------------------------------------------------------------------------
Motif 3 Description
--------------------------------------------------------------------------------
Simplified        A  51::535:::77
pos.-specific     C  ::1a57::18:2
probability       G  2:::1::8123:
matrix            T  399:::528::1

[Relative entropy plot, 0.0 to 2.3 bits per position; column layout not recoverable]
Relative Entropy (14.2 bits)

Multilevel           ATTCACAGTCAA
consensus            T   CAT   G
sequence

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 3 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name            Start   P-value            Site
-------------            ----- ---------            ------------
48861                      252  7.40e-07 GTGTTTGGGG TTTCACAGTCAA GAATGCCTGT
46528                      424  1.11e-06 TTTGAGCAAA GTTCACAGTCAA CCACGGTCAC
44206                      395  1.46e-06 TTCACGTTTC ATTCACAGTCAC TGTCACACGC
15801                      157  2.41e-06 GCAAAGACAA ATTCAAAGTCAA AAATTGACGA
37892                       22  3.41e-06 AAAGATGCAA ATTCCCTGTCAT GGAAATAAAT
49629                      373  1.17e-05 GCTGTGACCG TTTCGCTGTCGA AATCTTAATC
38048                       22  1.60e-05 AATGATGACA ATTCCCTTTGAA CAATTTTATG
40020                      419  2.46e-05 TTGTTTTGAT GTTCACAGCCGA GTCAATCTGT
48016                      129  4.16e-05 GTACTGTTCG AACCCCTGTCAA GTTGTCATGG
48789                      208  4.16e-05 CCGTGGTTCA ATTCCATGGCAC GTTAATTTTG
49562                      421  1.04e-04 CGGCAGAAAC TTTCCAATTGGA GAGAAGATGC
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 3 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
48861                             7.4e-07  251_[+3]_237
46528                             1.1e-06  423_[+3]_65
44206                             1.5e-06  394_[+3]_94
15801                             2.4e-06  156_[+3]_332
37892                             3.4e-06  21_[+3]_467
49629                             1.2e-05  372_[+3]_116
38048                             1.6e-05  21_[+3]_467
40020                             2.5e-05  418_[+3]_70
48016                             4.2e-05  128_[+3]_360
48789                             4.2e-05  207_[+3]_281
49562                              0.0001  420_[+3]_68
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 3 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 3 width=12 seqs=11
48861                    (  252) TTTCACAGTCAA  1
46528                    (  424) GTTCACAGTCAA  1
44206                    (  395) ATTCACAGTCAC  1
15801                    (  157) ATTCAAAGTCAA  1
37892                    (   22) ATTCCCTGTCAT  1
49629                    (  373) TTTCGCTGTCGA  1
38048                    (   22) ATTCCCTTTGAA  1
40020                    (  419) GTTCACAGCCGA  1
48016                    (  129) AACCCCTGTCAA  1
48789                    (  208) ATTCCATGGCAC  1
49562                    (  421) TTTCCAATTGGA  1
//

--------------------------------------------------------------------------------
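Because the BLOCKS sections list every site already aligned, a letter-probability matrix can be recovered by counting letters column by column. The sketch below assumes only the site-line layout shown here (name, start in parentheses, site, weight); parse_blocks and probability_matrix are illustrative names, not part of MEME. For motif 3 the raw frequencies agree with the printed letter-probability matrix to six decimals, which suggests no pseudocounts enter that particular matrix.

```python
ALPHABET = "ACGT"

# Motif 3 in BLOCKS format, copied from this file.
MOTIF3_BLOCKS = """\
BL   MOTIF 3 width=12 seqs=11
48861                    (  252) TTTCACAGTCAA  1
46528                    (  424) GTTCACAGTCAA  1
44206                    (  395) ATTCACAGTCAC  1
15801                    (  157) ATTCAAAGTCAA  1
37892                    (   22) ATTCCCTGTCAT  1
49629                    (  373) TTTCGCTGTCGA  1
38048                    (   22) ATTCCCTTTGAA  1
40020                    (  419) GTTCACAGCCGA  1
48016                    (  129) AACCCCTGTCAA  1
48789                    (  208) ATTCCATGGCAC  1
49562                    (  421) TTTCCAATTGGA  1
//
"""


def parse_blocks(text):
    """Return the aligned site strings from one BLOCKS-format section."""
    sites = []
    for line in text.splitlines():
        line = line.strip()
        if line == "//":
            break  # end of the block
        if not line or line.startswith("BL "):
            continue  # blank line or the 'BL   MOTIF ...' header
        # Site lines look like '<name> (<start>) <SITE> <weight>'.
        sites.append(line.split(")", 1)[1].split()[0])
    return sites


def probability_matrix(sites):
    """Per-position letter frequencies (A, C, G, T), no pseudocounts."""
    n = len(sites)
    return [
        [sum(site[pos] == letter for site in sites) / n for letter in ALPHABET]
        for pos in range(len(sites[0]))
    ]


if __name__ == "__main__":
    for row in probability_matrix(parse_blocks(MOTIF3_BLOCKS)):
        print("  ".join(f"{p:.6f}" for p in row))
```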
--------------------------------------------------------------------------------
Motif 3 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 12 n= 5379 bayes= 9.28648 E= 1.2e+002
    85  -1010    -28      2
  -174  -1010  -1010    176
 -1010   -119  -1010    176
 -1010    227  -1010  -1010
    58    113   -128  -1010
   -15    181  -1010  -1010
    85  -1010  -1010     76
 -1010  -1010    189    -56
 -1010   -119   -128    161
 -1010    198    -28  -1010
   126  -1010     31  -1010
   126    -19  -1010   -156
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 3 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 12 nsites= 11 E= 1.2e+002
  0.545455  0.000000  0.181818  0.272727
  0.090909  0.000000  0.000000  0.909091
  0.000000  0.090909  0.000000  0.909091
  0.000000  1.000000  0.000000  0.000000
  0.454545  0.454545  0.090909  0.000000
  0.272727  0.727273  0.000000  0.000000
  0.545455  0.000000  0.000000  0.454545
  0.000000  0.000000  0.818182  0.181818
  0.000000  0.090909  0.090909  0.818182
  0.000000  0.818182  0.181818  0.000000
  0.727273  0.000000  0.272727  0.000000
  0.727273  0.181818  0.000000  0.090909
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Motif 3 regular expression
--------------------------------------------------------------------------------
[AT]TTC[AC][CA][AT]GTC[AG]A
--------------------------------------------------------------------------------

Time 3.63 secs.
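The regular expressions appear to summarise each position by the letters whose probability is at least 0.2, most probable first, with ties kept in A, C, G, T order. That cut-off is inferred from this file rather than taken from MEME documentation; applied to the printed matrices it reproduces the expressions for all three motifs. A Python sketch for motif 3 (motif_regex and CUTOFF are hypothetical names):

```python
import re

ALPHABET = "ACGT"
CUTOFF = 0.2  # assumption: letters below this probability are left out


def motif_regex(matrix, cutoff=CUTOFF):
    """Summarise a letter-probability matrix as a regular expression.

    Each position becomes a single letter or a bracket class listing the
    letters with probability >= cutoff, most probable first.
    """
    parts = []
    for probs in matrix:
        # sorted() is stable, so tied letters keep A, C, G, T order.
        ranked = sorted(zip(ALPHABET, probs), key=lambda lp: -lp[1])
        kept = [letter for letter, p in ranked if p >= cutoff]
        parts.append(kept[0] if len(kept) == 1 else "[" + "".join(kept) + "]")
    return "".join(parts)


# Motif 3 letter-probability matrix (columns A, C, G, T), as printed above.
MOTIF3 = [
    [0.545455, 0.000000, 0.181818, 0.272727],
    [0.090909, 0.000000, 0.000000, 0.909091],
    [0.000000, 0.090909, 0.000000, 0.909091],
    [0.000000, 1.000000, 0.000000, 0.000000],
    [0.454545, 0.454545, 0.090909, 0.000000],
    [0.272727, 0.727273, 0.000000, 0.000000],
    [0.545455, 0.000000, 0.000000, 0.454545],
    [0.000000, 0.000000, 0.818182, 0.181818],
    [0.000000, 0.090909, 0.090909, 0.818182],
    [0.000000, 0.818182, 0.181818, 0.000000],
    [0.727273, 0.000000, 0.272727, 0.000000],
    [0.727273, 0.181818, 0.000000, 0.090909],
]

if __name__ == "__main__":
    pattern = motif_regex(MOTIF3)
    print(pattern)
    for site in ["TTTCACAGTCAA", "AACCCCTGTCAA"]:
        print(site, bool(re.fullmatch(pattern, site)))
```

Because low-probability letters are dropped, the expression does not match every site MEME reported: the motif 3 site in sequence 48016, AACCCCTGTCAA, has A at position 2 where the expression requires T.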
********************************************************************************


********************************************************************************
SUMMARY OF MOTIFS
********************************************************************************

--------------------------------------------------------------------------------
Combined block diagrams: non-overlapping sites with p-value < 0.0001
--------------------------------------------------------------------------------
SEQUENCE NAME            COMBINED P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
15801                            4.51e-03  156_[+3(2.41e-06)]_332
48789                            6.76e-02  207_[+3(4.16e-05)]_281
48861                            3.46e-03  251_[+3(7.40e-07)]_237
46528                            1.46e-03  423_[+3(1.11e-06)]_65
38048                            3.52e-05  21_[+3(1.60e-05)]_397_\
                                           [+1(2.76e-07)]_54
49562                            4.31e-04  26_[+1(2.59e-07)]_458
40020                            3.54e-09  49_[+2(6.28e-09)]_218_\
                                           [+1(5.78e-07)]_118_[+3(2.46e-05)]_70
49629                            3.34e-12  64_[+1(7.94e-10)]_3_[+2(5.50e-09)]_\
                                           272_[+3(1.17e-05)]_116
44206                            9.50e-13  21_[+1(6.98e-09)]_34_[+2(1.32e-09)]_\
                                           306_[+3(1.46e-06)]_94
37892                            7.09e-12  21_[+3(3.41e-06)]_265_\
                                           [+1(3.79e-08)]_125_[+2(8.77e-10)]_44
48016                            1.98e-01  128_[+3(4.16e-05)]_360
--------------------------------------------------------------------------------

********************************************************************************


********************************************************************************
Stopped because nmotifs = 3 reached.
********************************************************************************

CPU: seaotter.hsd1.wa.comcast.net

********************************************************************************
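The block diagrams can be read as follows. Numbers between brackets count bases not covered by any motif site, [+N] marks a site of motif N on the + strand, and in the combined diagrams the site p-value follows in parentheses. For every diagram in this file the pieces add up to the 500-base sequence length. The hypothetical helper block_diagram below rebuilds a diagram from 1-based site starts and widths as a way of checking that reading. It is a sketch: it does not reproduce MEME's line wrapping with "\", and how MEME prints touching or overlapping sites is not shown in this file.

```python
def block_diagram(seq_length, sites):
    """Build a MEME-style block diagram for one sequence.

    `sites` holds (start, width, motif, p_value) tuples with 1-based starts
    on the + strand; p_value may be None to omit it, as in the per-motif
    diagrams. Sites are assumed not to overlap.
    """
    pieces = []
    covered_to = 0  # last base (1-based) covered so far
    for start, width, motif, p_value in sorted(sites, key=lambda s: s[0]):
        gap = start - 1 - covered_to  # uncovered bases before this site
        if gap > 0:
            pieces.append(str(gap))
        label = f"+{motif}" if p_value is None else f"+{motif}({p_value:.2e})"
        pieces.append(f"[{label}]")
        covered_to = start - 1 + width
    if seq_length > covered_to:
        pieces.append(str(seq_length - covered_to))
    return "_".join(pieces)


if __name__ == "__main__":
    # Sequence 49629 (length 500): motif 1 at 65, motif 2 at 84, motif 3 at 373.
    print(block_diagram(500, [(65, 16, 1, 7.94e-10), (84, 17, 2, 5.50e-09),
                              (373, 12, 3, 1.17e-05)]))
```

For 49629 this prints 64_[+1(7.94e-10)]_3_[+2(5.50e-09)]_272_[+3(1.17e-05)]_116, which is the summary line above with its "\" continuation joined.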