********************************************************************************
MEME - Motif discovery tool
********************************************************************************
MEME version 4.10.0 (Release date: Wed May 21 10:35:36 2014 +1000)

For further information on how to interpret these results or to get
a copy of the MEME software please access http://meme.nbcr.net.

This file may be used as input to the MAST algorithm for searching
sequence databases for matches to groups of motifs.  MAST is available
for interactive use and downloading at http://meme.nbcr.net.
********************************************************************************


********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:

Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************


********************************************************************************
TRAINING SET
********************************************************************************
DATAFILE= motifs/301/301.seqs.fa
ALPHABET= ACGT
Sequence name            Weight Length  Sequence name            Weight Length
-------------            ------ ------  -------------            ------ ------
42779                    1.0000    500  43051                    1.0000    500
47069                    1.0000    500  47123                    1.0000    500
48163                    1.0000    500  50130                    1.0000    500
50305                    1.0000    500  10291                    1.0000    500
44009                    1.0000    500  44900                    1.0000    500
45657                    1.0000    500  12731                    1.0000    500
31486                    1.0000    500  42791                    1.0000    500
49281                    1.0000    500  50306                    1.0000    500
48957                    1.0000    500  49975                    1.0000    500
46129                    1.0000    500  49497                    1.0000    500
46116                    1.0000    500
********************************************************************************

********************************************************************************
COMMAND LINE SUMMARY
********************************************************************************
This information can also be useful in the event you wish to report a
problem with the MEME software.

command: meme motifs/301/301.seqs.fa -oc motifs/301 -dna -minw 12 -maxw 21 -nmotifs 3 -maxsize 500000

model:  mod=         zoops    nmotifs=         3    evt=           inf
object function=  E-value of product of p-values
width:  minw=           12    maxw=           21    minic=        0.00
width:  wg=             11    ws=              1    endgaps=       yes
nsites: minsites=        2    maxsites=       21    wnsites=       0.8
theta:  prob=            1    spmap=         uni    spfuzz=        0.5
global: substring=     yes    branching=      no    wbranch=        no
em:     prior=   dirichlet    b=            0.01    maxiter=        50
        distance=    1e-05
data:   n=           10500    N=              21
strands: +
sample: seed=            0    seqfrac=         1
Letter frequencies in dataset:
A 0.263 C 0.236 G 0.232 T 0.268
Background letter frequencies (from dataset with add-one prior applied):
A 0.263 C 0.236 G 0.232 T 0.268
********************************************************************************


********************************************************************************
MOTIF 1 MEME width = 19 sites = 8 llr = 127 E-value = 1.2e+000
********************************************************************************
--------------------------------------------------------------------------------
	Motif 1 Description
--------------------------------------------------------------------------------
Simplified        A  14695a:6:9191aa5:46
pos.-specific     C  14:11:3::1:::::3:6:
probability       G  834:4:819:619::1a:3
matrix            T  :::::::31:3::::1::1

         bits    2.1                 *
                 1.9      *       ** *
                 1.7      *       ** *
                 1.5    * *  ** **** *
Relative         1.3    * ** ** **** *
Entropy          1.1 * ** ** ** **** **
(22.9 bits)      0.8 * ** ** ** **** **
                 0.6 * ************* ***
                 0.4 *************** ***
                 0.2 *******************
                 0.0 -------------------

Multilevel           GAAAAAGAGAGAGAAAGCA
consensus             CG G CT  T    C AG
sequence              G

--------------------------------------------------------------------------------
	Motif 1 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name             Start   P-value                    Site
-------------             ----- ---------            -------------------
10291                       404  7.38e-10 TGTTTCCTTT GCAAAAGAGAGAGAACGAG TTCCGTGGAT
49281                       480  2.32e-09 ATATATATTC GCGAGACAGATAGAAAGCA TC
12731                       303  3.32e-09 AGTGACTTCA GAGAGAGAGAGAGAAAGAT AGAGAGAGCC
47123                       194  9.62e-09 CACAGCCACG CAAAAAGAGAGGGAAAGCA ACGAGGCCTG
42791                       340  3.61e-08 AGTGACTACT GGGAGACAGAAAGAACGAA GATGACATGA
50306                       132  4.56e-08 CGTCCGACGA GCAAAAGTGATAAAAGGCA AGACTCAGAT
50130                        42  1.91e-07 ATGGTGTAAC AGACAAGGGAGAGAATGCA ATCGATCTCG
43051                       297  2.02e-07 CTGAAGGCCA GAAACAGTTCGAGAAAGCG TTCCAAACAT
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
10291                             7.4e-10  403_[+1]_78
49281                             2.3e-09  479_[+1]_2
12731                             3.3e-09  302_[+1]_179
47123                             9.6e-09  193_[+1]_288
42791                             3.6e-08  339_[+1]_142
50306                             4.6e-08  131_[+1]_350
50130                             1.9e-07  41_[+1]_440
43051                               2e-07  296_[+1]_185
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 1 width=19 seqs=8
10291                    (  404) GCAAAAGAGAGAGAACGAG  1
49281                    (  480) GCGAGACAGATAGAAAGCA  1
12731                    (  303) GAGAGAGAGAGAGAAAGAT  1
47123                    (  194) CAAAAAGAGAGGGAAAGCA  1
42791                    (  340) GGGAGACAGAAAGAACGAA  1
50306                    (  132) GCAAAAGTGATAAAAGGCA  1
50130                    (   42) AGACAAGGGAGAGAATGCA  1
43051                    (  297) GAAACAGTTCGAGAAAGCG  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 19 n= 10122 bayes= 10.3041 E= 1.2e+000
  -107    -92    169   -965
    51     67     11   -965
   125   -965     69   -965
   173    -92   -965   -965
    92    -92     69   -965
   192   -965   -965   -965
  -965      8    169   -965
   125   -965    -89    -10
  -965   -965    191   -110
   173    -92   -965   -965
  -107   -965    143    -10
   173   -965    -89   -965
  -107   -965    191   -965
   192   -965   -965   -965
   192   -965   -965   -965
    92      8    -89   -110
  -965   -965    210   -965
    51    140   -965   -965
   125   -965     11   -110
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 19 nsites= 8 E= 1.2e+000
 0.125000  0.125000  0.750000  0.000000
 0.375000  0.375000  0.250000  0.000000
 0.625000  0.000000  0.375000  0.000000
 0.875000  0.125000  0.000000  0.000000
 0.500000  0.125000  0.375000  0.000000
 1.000000  0.000000  0.000000  0.000000
 0.000000  0.250000  0.750000  0.000000
 0.625000  0.000000  0.125000  0.250000
 0.000000  0.000000  0.875000  0.125000
 0.875000  0.125000  0.000000  0.000000
 0.125000  0.000000  0.625000  0.250000
 0.875000  0.000000  0.125000  0.000000
 0.125000  0.000000  0.875000  0.000000
 1.000000  0.000000  0.000000  0.000000
 1.000000  0.000000  0.000000  0.000000
 0.500000  0.250000  0.125000  0.125000
 0.000000  0.000000  1.000000  0.000000
 0.375000  0.625000  0.000000  0.000000
 0.625000  0.000000  0.250000  0.125000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 regular expression
--------------------------------------------------------------------------------
G[ACG][AG]A[AG]A[GC][AT]GA[GT]AGAA[AC]G[CA][AG]
--------------------------------------------------------------------------------




Time  3.98 secs.

********************************************************************************


********************************************************************************
MOTIF 2 MEME width = 20 sites = 19 llr = 207 E-value = 1.6e+001
********************************************************************************
--------------------------------------------------------------------------------
	Motif 2 Description
--------------------------------------------------------------------------------
Simplified        A  231:::7:72:31:21:2:6
pos.-specific     C  143:1::8166:47211:63
probability       G  2:34:9::11:223542:4:
matrix            T  5446913222454:2578:1

         bits    2.1
                 1.9
                 1.7      *
                 1.5     ** *
Relative         1.3     ** *     *   *
Entropy          1.1    *****  *  *   **
(15.7 bits)      0.8    *****  *  *  ***
                 0.6    ****** *  *  ****
                 0.4  * ********* * *****
                 0.2 ********************
                 0.0 --------------------

Multilevel           TCTTTGACACCTCCGTTTCA
consensus            ATCG  T  TTATGAGGAGC
sequence             GAG

--------------------------------------------------------------------------------
	Motif 2 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name             Start   P-value                    Site
-------------             ----- ---------            --------------------
42779                       348  2.15e-11 TAAATGTCAG TTTTTGACACCTTCGGTTCA TCGCGTTCTG
50306                       395  9.59e-08 ACAGCCACAT ACGTTGACACCTCCCCTTCA CACGATGATC
49281                       394  2.20e-07 TTTCTGCGAG TACTTGACACCAGCGTCTGA TCGTGTGTCT
31486                        24  9.40e-07 CGTCGGCGTG TTGGTGTCACTTCGCGTTGC TGCCGAGACT
45657                       378  1.17e-06 ATCACAGGTT GACTTGACACCATGGTGACA GAGGTCGTGT
10291                       322  1.30e-06 ATAGTTGTTT ACTGTGTCTCTATCGGTTGC GTTTTGGGGG
46129                       129  1.45e-06 GACGAACTTT TCATTGACTCTTTCGTGTCC ATCTAAGAAA
44900                       455  1.97e-06 TGAGATTCCT CTTGCGACACCTCCATTTGA TTCTTCGACT
43051                       262  2.91e-06 GAAAGAGATT GCGTTGACAACAAGGTGTCA CAAAACTGAA
50130                       308  3.20e-06 ACAACGAACG GACGTGTCAATACCAGTTCC GCAAACACAC
49497                       338  4.61e-06 CGCGTGAAGG TTGTTGTCATCGTCGCTTCT GGTTGATTCG
12731                        26  4.61e-06 GGGTGCTTCG TCGGTTACACTTGCTGTACA CTAGACAGAG
48163                       406  5.49e-06 AGCGCGATTC GCTGTGTCACCTTCTACTCA CTACCGCGTT
47123                       138  7.68e-06 ACAGAGAAGC AATTTGATTCTTCGCTTTCA AATAAACCTG
42791                       122  1.33e-05 ATTTTCTAAC TCTTTGATAACTGCAGGAGA AAATCTTCAG
48957                        42  1.65e-05 GGGGCATCAT ATCGTGACCGCGTCGTTTCC TCCCCGATCC
49975                        71  2.18e-05 AATCATCAAG TATTTTACGTTTACGTTTGC TGAATCCCCT
50305                        51  2.33e-05 CGATGCCTGA TTTTCGATATTACGAGTTGA AAGTAATAGG
44009                       338  5.13e-05 TTTTTCTTCT TTCGTGTCGTCGCCTTTACT CTTGCTTCTA
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
42779                             2.2e-11  347_[+2]_133
50306                             9.6e-08  394_[+2]_86
49281                             2.2e-07  393_[+2]_87
31486                             9.4e-07  23_[+2]_457
45657                             1.2e-06  377_[+2]_103
10291                             1.3e-06  321_[+2]_159
46129                             1.4e-06  128_[+2]_352
44900                               2e-06  454_[+2]_26
43051                             2.9e-06  261_[+2]_219
50130                             3.2e-06  307_[+2]_173
49497                             4.6e-06  337_[+2]_143
12731                             4.6e-06  25_[+2]_455
48163                             5.5e-06  405_[+2]_75
47123                             7.7e-06  137_[+2]_343
42791                             1.3e-05  121_[+2]_359
48957                             1.7e-05  41_[+2]_439
49975                             2.2e-05  70_[+2]_410
50305                             2.3e-05  50_[+2]_430
44009                             5.1e-05  337_[+2]_143
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 2 width=20 seqs=19
42779                    (  348) TTTTTGACACCTTCGGTTCA  1
50306                    (  395) ACGTTGACACCTCCCCTTCA  1
49281                    (  394) TACTTGACACCAGCGTCTGA  1
31486                    (   24) TTGGTGTCACTTCGCGTTGC  1
45657                    (  378) GACTTGACACCATGGTGACA  1
10291                    (  322) ACTGTGTCTCTATCGGTTGC  1
46129                    (  129) TCATTGACTCTTTCGTGTCC  1
44900                    (  455) CTTGCGACACCTCCATTTGA  1
43051                    (  262) GCGTTGACAACAAGGTGTCA  1
50130                    (  308) GACGTGTCAATACCAGTTCC  1
49497                    (  338) TTGTTGTCATCGTCGCTTCT  1
12731                    (   26) TCGGTTACACTTGCTGTACA  1
48163                    (  406) GCTGTGTCACCTTCTACTCA  1
47123                    (  138) AATTTGATTCTTCGCTTTCA  1
42791                    (  122) TCTTTGATAACTGCAGGAGA  1
48957                    (   42) ATCGTGACCGCGTCGTTTCC  1
49975                    (   71) TATTTTACGTTTACGTTTGC  1
50305                    (   51) TTTTCGATATTACGAGTTGA  1
44009                    (  338) TTCGTGTCGTCGCCTTTACT  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 20 n= 10101 bayes= 9.24673 E= 1.6e+001
   -32   -216    -14     97
     0     64  -1089     46
  -232     16     18     65
 -1089  -1089     86    111
 -1089   -116  -1089    174
 -1089  -1089    194   -135
   138  -1089  -1089     24
 -1089    183  -1089    -76
   138   -216   -114    -76
   -74    129   -214    -35
 -1089    129  -1089     65
    26  -1089    -56     97
  -132     64    -56     46
 -1089    164     18  -1089
   -32    -58    103    -76
  -232   -116     66     82
 -1089   -116    -14    135
   -32  -1089  -1089    156
 -1089    142     66  -1089
   114     42  -1089   -135
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 20 nsites= 19 E= 1.6e+001
 0.210526  0.052632  0.210526  0.526316
 0.263158  0.368421  0.000000  0.368421
 0.052632  0.263158  0.263158  0.421053
 0.000000  0.000000  0.421053  0.578947
 0.000000  0.105263  0.000000  0.894737
 0.000000  0.000000  0.894737  0.105263
 0.684211  0.000000  0.000000  0.315789
 0.000000  0.842105  0.000000  0.157895
 0.684211  0.052632  0.105263  0.157895
 0.157895  0.578947  0.052632  0.210526
 0.000000  0.578947  0.000000  0.421053
 0.315789  0.000000  0.157895  0.526316
 0.105263  0.368421  0.157895  0.368421
 0.000000  0.736842  0.263158  0.000000
 0.210526  0.157895  0.473684  0.157895
 0.052632  0.105263  0.368421  0.473684
 0.000000  0.105263  0.210526  0.684211
 0.210526  0.000000  0.000000  0.789474
 0.000000  0.631579  0.368421  0.000000
 0.578947  0.315789  0.000000  0.105263
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 regular expression
--------------------------------------------------------------------------------
[TAG][CTA][TCG][TG]TG[AT]CA[CT][CT][TA][CT][CG][GA][TG][TG][TA][CG][AC]
--------------------------------------------------------------------------------




Time  7.80 secs.

********************************************************************************


********************************************************************************
MOTIF 3 MEME width = 19 sites = 6 llr = 105 E-value = 5.7e+001
********************************************************************************
--------------------------------------------------------------------------------
	Motif 3 Description
--------------------------------------------------------------------------------
Simplified        A  :87:::a28::27:3:2:a
pos.-specific     C  a:::37:8:::22a:2:8:
probability       G  :23::3:::8252::832:
matrix            T  :::a7:::2282::7:5::

         bits    2.1 *            *
                 1.9 *  *  *      *    *
                 1.7 *  *  *      *    *
                 1.5 *  *  ** *   * * **
Relative         1.3 ** * ******  * * **
Entropy          1.1 ***********  *** **
(25.2 bits)      0.8 ***********  *** **
                 0.6 *********** **** **
                 0.4 *********** *******
                 0.2 *******************
                 0.0 -------------------

Multilevel           CAATTCACAGTGACTGTCA
consensus              G CG        A G
sequence

--------------------------------------------------------------------------------
	Motif 3 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name             Start   P-value                    Site
-------------             ----- ---------            -------------------
47123                        69  4.18e-12 CGACACTTGG CAATTCACAGTGACTGTCA ATGATTTGGA
45657                       331  6.53e-11 ACACCAAAAC CAATTCACAGTTACTGTCA ACGTTTTTCC
47069                       348  1.13e-08 CTGCAGTGTT CAATTCACTGTCGCAGGCA ATACGTTTCG
48957                       399  1.31e-08 ATCAATCAAT CAATCGACAGGAACTGACA TTCTTGTTGG
49975                       354  3.29e-08 AAACCAGATA CAGTTGACATTGCCTCGCA AACACGACTA
49497                       371  4.21e-08 TGATTCGAGT CGGTCCAAAGTGACAGTGA GCGAAGTCGG
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
47123                             4.2e-12  68_[+3]_413
45657                             6.5e-11  330_[+3]_151
47069                             1.1e-08  347_[+3]_134
48957                             1.3e-08  398_[+3]_83
49975                             3.3e-08  353_[+3]_128
49497                             4.2e-08  370_[+3]_111
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 3 width=19 seqs=6
47123                    (   69) CAATTCACAGTGACTGTCA  1
45657                    (  331) CAATTCACAGTTACTGTCA  1
47069                    (  348) CAATTCACTGTCGCAGGCA  1
48957                    (  399) CAATCGACAGGAACTGACA  1
49975                    (  354) CAGTTGACATTGCCTCGCA  1
49497                    (  371) CGGTCCAAAGTGACAGTGA  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 19 n= 10122 bayes= 11.1671 E= 5.7e+001
  -923    208   -923   -923
   166   -923    -48   -923
   134   -923     52   -923
  -923   -923   -923    190
  -923     50   -923    131
  -923    150     52   -923
   192   -923   -923   -923
   -66    182   -923   -923
   166   -923   -923    -68
  -923   -923    184    -68
  -923   -923    -48    163
   -66    -50    110    -68
   134    -50    -48   -923
  -923    208   -923   -923
    34   -923   -923    131
  -923    -50    184   -923
   -66   -923     52     90
  -923    182    -48   -923
   192   -923   -923   -923
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 19 nsites= 6 E= 5.7e+001
 0.000000  1.000000  0.000000  0.000000
 0.833333  0.000000  0.166667  0.000000
 0.666667  0.000000  0.333333  0.000000
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.333333  0.000000  0.666667
 0.000000  0.666667  0.333333  0.000000
 1.000000  0.000000  0.000000  0.000000
 0.166667  0.833333  0.000000  0.000000
 0.833333  0.000000  0.000000  0.166667
 0.000000  0.000000  0.833333  0.166667
 0.000000  0.000000  0.166667  0.833333
 0.166667  0.166667  0.500000  0.166667
 0.666667  0.166667  0.166667  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.333333  0.000000  0.000000  0.666667
 0.000000  0.166667  0.833333  0.000000
 0.166667  0.000000  0.333333  0.500000
 0.000000  0.833333  0.166667  0.000000
 1.000000  0.000000  0.000000  0.000000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 regular expression
--------------------------------------------------------------------------------
CA[AG]T[TC][CG]ACAGTGAC[TA]G[TG]CA
--------------------------------------------------------------------------------




Time 11.36 secs.

********************************************************************************


********************************************************************************
SUMMARY OF MOTIFS
********************************************************************************

--------------------------------------------------------------------------------
	Combined block diagrams: non-overlapping sites with p-value < 0.0001
--------------------------------------------------------------------------------
SEQUENCE NAME            COMBINED P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
42779                            4.58e-07  347_[+2(2.15e-11)]_133
43051                            1.18e-05  261_[+2(2.91e-06)]_15_\
                                           [+1(2.02e-07)]_185
47069                            1.09e-04  347_[+3(1.13e-08)]_134
47123                            2.61e-14  68_[+3(4.18e-12)]_50_[+2(7.68e-06)]_\
                                           36_[+1(9.62e-09)]_288
48163                            2.36e-02  405_[+2(5.49e-06)]_75
50130                            1.32e-06  41_[+1(1.91e-07)]_247_\
                                           [+2(3.20e-06)]_173
50305                            2.39e-02  50_[+2(2.33e-05)]_430
10291                            1.64e-08  96_[+2(4.58e-05)]_205_\
                                           [+2(1.30e-06)]_62_[+1(7.38e-10)]_78
44009                            8.42e-02  337_[+2(5.13e-05)]_143
44900                            3.70e-04  266_[+3(5.50e-05)]_169_\
                                           [+2(1.97e-06)]_26
45657                            3.58e-09  330_[+3(6.53e-11)]_28_\
                                           [+2(1.17e-06)]_103
12731                            1.53e-07  25_[+2(4.61e-06)]_257_\
                                           [+1(3.32e-09)]_179
31486                            1.57e-04  23_[+2(9.40e-07)]_77_[+3(8.13e-05)]_\
                                           361
42791                            2.40e-06  121_[+2(1.33e-05)]_198_\
                                           [+1(3.61e-08)]_142
49281                            8.09e-09  393_[+2(2.20e-07)]_66_\
                                           [+1(2.32e-09)]_2
50306                            5.96e-08  131_[+1(4.56e-08)]_244_\
                                           [+2(9.59e-08)]_86
48957                            1.62e-06  41_[+2(1.65e-05)]_337_\
                                           [+3(1.31e-08)]_83
49975                            1.22e-05  70_[+2(2.18e-05)]_263_\
                                           [+3(3.29e-08)]_128
46129                            5.09e-04  50_[+3(2.24e-05)]_59_[+2(1.45e-06)]_\
                                           352
49497                            1.89e-06  337_[+2(4.61e-06)]_13_\
                                           [+3(4.21e-08)]_111
46116                            9.51e-01  500
--------------------------------------------------------------------------------

********************************************************************************


********************************************************************************
Stopped because nmotifs = 3 reached.
********************************************************************************

CPU: seaotter.hsd1.wa.comcast.net

********************************************************************************