********************************************************************************
MEME - Motif discovery tool
********************************************************************************
MEME version 4.10.0 (Release date: Wed May 21 10:35:36 2014 +1000)

For further information on how to interpret these results or to get
a copy of the MEME software please access http://meme.nbcr.net.

This file may be used as input to the MAST algorithm for searching
sequence databases for matches to groups of motifs.  MAST is available
for interactive use and downloading at http://meme.nbcr.net.
********************************************************************************


********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:

Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************


********************************************************************************
TRAINING SET
********************************************************************************
DATAFILE= motifs/20/20.seqs.fa
ALPHABET= ACGT
Sequence name            Weight Length  Sequence name            Weight Length
-------------            ------ ------  -------------            ------ ------
47447                    1.0000    500  47558                    1.0000    500
49802                    1.0000    500  35370                    1.0000    500
45797                    1.0000    500  42689                    1.0000    500
46905                    1.0000    500  44308                    1.0000    500
54466                    1.0000    500
********************************************************************************

********************************************************************************
COMMAND LINE SUMMARY
********************************************************************************
This information can also be useful in the event you wish to report a
problem with the MEME software.

command: meme motifs/20/20.seqs.fa -oc motifs/20 -dna -minw 12 -maxw 21 -nmotifs 3 -maxsize 500000

model:  mod=         zoops    nmotifs=         3    evt=           inf
object function=  E-value of product of p-values
width:  minw=           12    maxw=           21    minic=        0.00
width:  wg=             11    ws=              1    endgaps=       yes
nsites: minsites=        2    maxsites=        9    wnsites=       0.8
theta:  prob=            1    spmap=         uni    spfuzz=        0.5
global: substring=     yes    branching=      no    wbranch=        no
em:     prior=   dirichlet    b=            0.01    maxiter=        50
        distance=    1e-05
data:   n=            4500    N=               9
strands: +
sample: seed=            0    seqfrac=         1
Letter frequencies in dataset:
A 0.264 C 0.250 G 0.231 T 0.256
Background letter frequencies (from dataset with add-one prior applied):
A 0.264 C 0.250 G 0.231 T 0.256
********************************************************************************


********************************************************************************
MOTIF 1 MEME width = 12 sites = 7 llr = 82 E-value = 9.9e+001
********************************************************************************
--------------------------------------------------------------------------------
        Motif 1 Description
--------------------------------------------------------------------------------
Simplified        A  :31::4:::::3
pos.-specific     C  :::::4::::a1
probability       G  9:3::1a:14:6
matrix            T  176aa::a96::

         bits    2.1       *
                 1.9    ** **  *
                 1.7    ** **  *
                 1.5 *  ** *** *
Relative         1.3 *  ** *** *
Entropy          1.1 ** ** *****
(16.9 bits)      0.8 ** ** *****
                 0.6 ************
                 0.4 ************
                 0.2 ************
                 0.0 ------------

Multilevel           GTTTTAGTTTCG
consensus             AG  C   G A
sequence

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 1 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name             Start   P-value                Site
-------------             ----- ---------            ------------
42689                       191  2.17e-07 ATATTGCGGG GTTTTCGTTGCG CTTGCGCTGG
46905                       178  3.20e-07 GTCTGAGAGT GTGTTAGTTTCG GCACCTACGG
49802                        16  1.34e-06 GTACCACCAG GTATTCGTTTCG TCGATGGTCG
45797                       235  2.25e-06 CAAAATACAT GTGTTGGTTGCG ATGGTTCCAC
35370                        94  3.83e-06 CCGAGAAAAC GATTTAGTTTCC TATACATTGC
47558                       450  6.04e-06 TCAATCCTAC TTTTTCGTTGCA TTCTAGCTGT
47447                       192  8.51e-06 GTGGTTCGAG GATTTAGTGTCA ATAACACATG
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 1 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
42689                             2.2e-07  190_[+1]_298
46905                             3.2e-07  177_[+1]_311
49802                             1.3e-06  15_[+1]_473
45797                             2.3e-06  234_[+1]_254
35370                             3.8e-06  93_[+1]_395
47558                               6e-06  449_[+1]_39
47447                             8.5e-06  191_[+1]_297
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 1 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 1 width=12 seqs=7
42689                    (  191) GTTTTCGTTGCG  1
46905                    (  178) GTGTTAGTTTCG  1
49802                    (   16) GTATTCGTTTCG  1
45797                    (  235) GTGTTGGTTGCG  1
35370                    (   94) GATTTAGTTTCC  1
47558                    (  450) TTTTTCGTTGCA  1
47447                    (  192) GATTTAGTGTCA  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 1 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 12 n= 4401 bayes= 9.90047 E= 9.9e+001
  -945   -945    189    -84
    12   -945   -945    148
   -88   -945     31    116
  -945   -945   -945    197
  -945   -945   -945    197
    70     78    -69   -945
  -945   -945    211   -945
  -945   -945   -945    197
  -945   -945    -69    174
  -945   -945     89    116
  -945    200   -945   -945
    12    -80    130   -945
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 1 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 12 nsites= 7 E= 9.9e+001
 0.000000  0.000000  0.857143  0.142857
 0.285714  0.000000  0.000000  0.714286
 0.142857  0.000000  0.285714  0.571429
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.000000  0.000000  1.000000
 0.428571  0.428571  0.142857  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.000000  0.142857  0.857143
 0.000000  0.000000  0.428571  0.571429
 0.000000  1.000000  0.000000  0.000000
 0.285714  0.142857  0.571429  0.000000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 1 regular expression
--------------------------------------------------------------------------------
G[TA][TG]TT[AC]GTT[TG]C[GA]
--------------------------------------------------------------------------------

Time  0.80 secs.

********************************************************************************


********************************************************************************
MOTIF 2 MEME width = 12 sites = 9 llr = 92 E-value = 2.7e+002
********************************************************************************
--------------------------------------------------------------------------------
        Motif 2 Description
--------------------------------------------------------------------------------
Simplified        A  ::1::83a:2:3
pos.-specific     C  ::92::3:12a3
probability       G  a1:6922:86:3
matrix            T  :9:21:1:1:::

         bits    2.1 *
                 1.9 *      *  *
                 1.7 *   *  *  *
                 1.5 *** *  *  *
Relative         1.3 *** ** *  *
Entropy          1.1 *** ** ** *
(14.7 bits)      0.8 *** ** ** *
                 0.6 ****** ****
                 0.4 ****** *****
                 0.2 ****** *****
                 0.0 ------------

Multilevel           GTCGGAAAGGCA
consensus               C GC  A C
sequence                T  G  C G

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 2 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name             Start   P-value                Site
-------------             ----- ---------            ------------
42689                       430  3.62e-07 ACTACTCCTT GTCGGAGAGGCC ATCCAATTTT
47447                        95  1.88e-06 ATCTCGTGAA GTCCGAAAGGCA GCGCGGAGGG
35370                       271  3.64e-06 GGAAAGGGCA GTCTGACAGACG TCAGAATTAG
46905                       364  7.96e-06 TTTTTTCCGA GTCCGAGAGCCC CATAGAAAGC
45797                       188  1.09e-05 GGAAAAAGAT GTCGGGCACGCG ACCCGGGAGA
49802                        75  1.28e-05 GATACTCCTC GGCGGGCAGGCA GGCGGAATAC
54466                       259  1.81e-05 CACCCTGGGT GTAGGAAAGCCC TTTTCCTGGT
44308                       225  1.81e-05 CAGACGAAGT GTCTGAAATGCA AACTGCATGT
47558                        24  4.58e-05 GTTCTCGTGG GTCGTATAGACG GCGCAACGTT
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 2 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
42689                             3.6e-07  429_[+2]_59
47447                             1.9e-06  94_[+2]_394
35370                             3.6e-06  270_[+2]_218
46905                               8e-06  363_[+2]_125
45797                             1.1e-05  187_[+2]_301
49802                             1.3e-05  74_[+2]_414
54466                             1.8e-05  258_[+2]_230
44308                             1.8e-05  224_[+2]_264
47558                             4.6e-05  23_[+2]_465
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 2 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 2 width=12 seqs=9
42689                    (  430) GTCGGAGAGGCC  1
47447                    (   95) GTCCGAAAGGCA  1
35370                    (  271) GTCTGACAGACG  1
46905                    (  364) GTCCGAGAGCCC  1
45797                    (  188) GTCGGGCACGCG  1
49802                    (   75) GGCGGGCAGGCA  1
54466                    (  259) GTAGGAAAGCCC  1
44308                    (  225) GTCTGAAATGCA  1
47558                    (   24) GTCGTATAGACG  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 2 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 12 n= 4401 bayes= 8.93074 E= 2.7e+002
  -982   -982    211   -982
  -982   -982   -105    180
  -124    183   -982   -982
  -982    -17    126    -20
  -982   -982    194   -120
   156   -982     -6   -982
    34     42     -6   -120
   192   -982   -982   -982
  -982   -117    175   -120
   -25    -17    126   -982
  -982    200   -982   -982
    34     42     53   -982
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 2 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 12 nsites= 9 E= 2.7e+002
 0.000000  0.000000  1.000000  0.000000
 0.000000  0.000000  0.111111  0.888889
 0.111111  0.888889  0.000000  0.000000
 0.000000  0.222222  0.555556  0.222222
 0.000000  0.000000  0.888889  0.111111
 0.777778  0.000000  0.222222  0.000000
 0.333333  0.333333  0.222222  0.111111
 1.000000  0.000000  0.000000  0.000000
 0.000000  0.111111  0.777778  0.111111
 0.222222  0.222222  0.555556  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.333333  0.333333  0.333333  0.000000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 2 regular expression
--------------------------------------------------------------------------------
GTC[GCT]G[AG][ACG]AG[GAC]C[ACG]
--------------------------------------------------------------------------------

Time  1.65 secs.

********************************************************************************


********************************************************************************
MOTIF 3 MEME width = 16 sites = 7 llr = 94 E-value = 3.4e+001
********************************************************************************
--------------------------------------------------------------------------------
        Motif 3 Description
--------------------------------------------------------------------------------
Simplified        A  :913:1:3::::43:3
pos.-specific     C  a:31a6a:::a91191
probability       G  :::3:3:1:::1:1:6
matrix            T  :163:::6aa::441:

         bits    2.1
                 1.9 *   * * ***
                 1.7 *   * * ***
                 1.5 *   * * ****  *
Relative         1.3 **  * * ****  *
Entropy          1.1 **  * * ****  *
(19.3 bits)      0.8 **  * * ****  *
                 0.6 *** ********  **
                 0.4 *** ********* **
                 0.2 *** ************
                 0.0 ----------------

Multilevel           CATACCCTTTCCATCG
consensus              CG G A    TA A
sequence                T

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 3 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name             Start   P-value                  Site
-------------             ----- ---------            ----------------
35370                         6  8.37e-08      TATTA CATTCGCGTTCCATCG AAGTCGACAG
42689                       150  1.18e-07 ACAAATGAAC CTTTCCCTTTCCAACG GCAACGGAAG
46905                        65  1.96e-07 TACTTTGGTA CAAACCCTTTCCTCCG GGGAAAGGGG
45797                        71  2.48e-07 AGCAACGGAG CATCCGCTTTCCTTCC TCCAAAATCG
49802                       333  4.83e-07 TACACACACA CACACCCATTCCCTCA CATAACAATC
47447                        31  5.51e-07 TGTTCGCGCC CATGCCCATTCCTGTG CCTCGAGGAG
44308                       376  1.91e-06 CATATCGTGG CACGCACTTTCGAACA TCCATAGTAC
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 3 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
35370                             8.4e-08  5_[+3]_479
42689                             1.2e-07  149_[+3]_335
46905                               2e-07  64_[+3]_420
45797                             2.5e-07  70_[+3]_414
49802                             4.8e-07  332_[+3]_152
47447                             5.5e-07  30_[+3]_454
44308                             1.9e-06  375_[+3]_109
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 3 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 3 width=16 seqs=7
35370                    (    6) CATTCGCGTTCCATCG  1
42689                    (  150) CTTTCCCTTTCCAACG  1
46905                    (   65) CAAACCCTTTCCTCCG  1
45797                    (   71) CATCCGCTTTCCTTCC  1
49802                    (  333) CACACCCATTCCCTCA  1
47447                    (   31) CATGCCCATTCCTGTG  1
44308                    (  376) CACGCACTTTCGAACA  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 3 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 16 n= 4365 bayes= 9.12571 E= 3.4e+001
  -945    200   -945   -945
   170   -945   -945    -84
   -88     19   -945    116
    12    -80     31     16
  -945    200   -945   -945
   -88    119     31   -945
  -945    200   -945   -945
    12   -945    -69    116
  -945   -945   -945    197
  -945   -945   -945    197
  -945    200   -945   -945
  -945    178    -69   -945
    70    -80   -945     75
    12    -80    -69     75
  -945    178   -945    -84
    12    -80    130   -945
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 3 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 16 nsites= 7 E= 3.4e+001
 0.000000  1.000000  0.000000  0.000000
 0.857143  0.000000  0.000000  0.142857
 0.142857  0.285714  0.000000  0.571429
 0.285714  0.142857  0.285714  0.285714
 0.000000  1.000000  0.000000  0.000000
 0.142857  0.571429  0.285714  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.285714  0.000000  0.142857  0.571429
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.000000  0.000000  1.000000
 0.000000  1.000000  0.000000  0.000000
 0.000000  0.857143  0.142857  0.000000
 0.428571  0.142857  0.000000  0.428571
 0.285714  0.142857  0.142857  0.428571
 0.000000  0.857143  0.000000  0.142857
 0.285714  0.142857  0.571429  0.000000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
        Motif 3 regular expression
--------------------------------------------------------------------------------
CA[TC][AGT]C[CG]C[TA]TTCC[AT][TA]C[GA]
--------------------------------------------------------------------------------

Time  2.36 secs.

********************************************************************************


********************************************************************************
SUMMARY OF MOTIFS
********************************************************************************

--------------------------------------------------------------------------------
        Combined block diagrams: non-overlapping sites with p-value < 0.0001
--------------------------------------------------------------------------------
SEQUENCE NAME            COMBINED P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
47447                            2.41e-07  30_[+3(5.51e-07)]_48_[+2(1.88e-06)]_\
    85_[+1(8.51e-06)]_297
47558                            3.25e-03  23_[+2(4.58e-05)]_414_\
    [+1(6.04e-06)]_39
49802                            2.28e-07  15_[+1(1.34e-06)]_47_[+2(1.28e-05)]_\
    246_[+3(4.83e-07)]_152
35370                            3.81e-08  5_[+3(8.37e-08)]_72_[+1(3.83e-06)]_\
    165_[+2(3.64e-06)]_218
45797                            1.72e-07  70_[+3(2.48e-07)]_101_\
    [+2(1.09e-05)]_35_[+1(2.25e-06)]_254
42689                            4.39e-10  149_[+3(1.18e-07)]_25_\
    [+1(2.17e-07)]_227_[+2(3.62e-07)]_59
46905                            1.75e-08  64_[+3(1.96e-07)]_97_[+1(3.20e-07)]_\
    174_[+2(7.96e-06)]_125
44308                            2.66e-04  224_[+2(1.81e-05)]_139_\
    [+3(1.91e-06)]_109
54466                            4.06e-02  258_[+2(1.81e-05)]_230
--------------------------------------------------------------------------------

********************************************************************************


********************************************************************************
Stopped because nmotifs = 3 reached.
********************************************************************************

CPU: seaotter.hsd1.wa.comcast.net

********************************************************************************