********************************************************************************
MEME - Motif discovery tool
********************************************************************************
MEME version 4.10.0 (Release date: Wed May 21 10:35:36 2014 +1000)

For further information on how to interpret these results or to get
a copy of the MEME software please access http://meme.nbcr.net.

This file may be used as input to the MAST algorithm for searching
sequence databases for matches to groups of motifs.  MAST is available
for interactive use and downloading at http://meme.nbcr.net.
********************************************************************************


********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:

Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************


********************************************************************************
TRAINING SET
********************************************************************************
DATAFILE= motifs/163/163.seqs.fa
ALPHABET= ACGT
Sequence name            Weight Length  Sequence name            Weight Length
-------------            ------ ------  -------------            ------ ------
21248                    1.0000    500  23311                    1.0000    500
23645                    1.0000    500  261433                   1.0000    500
264407                   1.0000    500  26545                    1.0000    500
26702                    1.0000    500  270027                   1.0000    500
35499                    1.0000    500  9770                     1.0000    500
********************************************************************************

********************************************************************************
COMMAND LINE SUMMARY
********************************************************************************
This information can also be useful in the event you wish to report a
problem with the MEME software.

command: meme motifs/163/163.seqs.fa -oc motifs/163 -dna -minw 12 -maxw 21 -nmotifs 3 -maxsize 500000

model:  mod=         zoops    nmotifs=         3    evt=           inf
object function=  E-value of product of p-values
width:  minw=           12    maxw=           21    minic=        0.00
width:  wg=             11    ws=              1    endgaps=       yes
nsites: minsites=        2    maxsites=       10    wnsites=       0.8
theta:  prob=            1    spmap=         uni    spfuzz=        0.5
global: substring=     yes    branching=      no    wbranch=        no
em:     prior=   dirichlet    b=            0.01    maxiter=        50
        distance=    1e-05
data:   n=            5000    N=              10
strands: +
sample: seed=            0    seqfrac=         1
Letter frequencies in dataset:
A 0.268 C 0.226 G 0.239 T 0.267
Background letter frequencies (from dataset with add-one prior applied):
A 0.268 C 0.226 G 0.239 T 0.267
********************************************************************************


********************************************************************************
MOTIF  1 MEME    width =  12  sites =   8  llr = 94  E-value = 2.5e+000
********************************************************************************
--------------------------------------------------------------------------------
	Motif 1 Description
--------------------------------------------------------------------------------
Simplified        A  ::::1::8:895
pos.-specific     C  11:a::9:13:4
probability       G  ::a::a:18::1
matrix            T  99::9:111:1:

         bits    2.1   ** *
                 1.9   ** *
                 1.7   ** *
                 1.5   ** **
Relative         1.3 *******   *
Entropy          1.1 ******* ***
(17.0 bits)      0.9 ***********
                 0.6 ************
                 0.4 ************
                 0.2 ************
                 0.0 ------------

Multilevel           TTGCTGCAGAAA
consensus                     C C
sequence

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name             Start   P-value                Site
-------------             ----- ---------            ------------
23311                       113  6.85e-08 GGAGGGATGT TTGCTGCAGAAA GTGCTTTGAT
270027                      156  1.26e-07 GCGCTTCACA TTGCTGCAGAAC GAGGACATTT
35499                       242  6.55e-07 CTTGGATCCG TCGCTGCAGAAA GTGCTGAGTG
21248                       425  1.94e-06 TAATTACAAC TTGCTGCACCAC TTAACAACTC
264407                      430  3.57e-06 GCTGCCAATG CTGCTGCTGAAA AGTCAGAAGC
26545                         2  4.05e-06          G TTGCTGCGTAAA AGTTTGTTCT
26702                       208  6.82e-06 AGAGAAGCGA TTGCAGCAGATC AGAATCACCA
9770                        399  7.51e-06 AAATGACATT TTGCTGTAGCAG ACGTCTCAAA
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
23311                             6.8e-08  112_[+1]_376
270027                            1.3e-07  155_[+1]_333
35499                             6.6e-07  241_[+1]_247
21248                             1.9e-06  424_[+1]_64
264407                            3.6e-06  429_[+1]_59
26545                               4e-06  1_[+1]_487
26702                             6.8e-06  207_[+1]_281
9770                              7.5e-06  398_[+1]_90
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 1 width=12 seqs=8
23311                    (  113) TTGCTGCAGAAA  1
270027                   (  156) TTGCTGCAGAAC  1
35499                    (  242) TCGCTGCAGAAA  1
21248                    (  425) TTGCTGCACCAC  1
264407                   (  430) CTGCTGCTGAAA  1
26545                    (    2) TTGCTGCGTAAA  1
26702                    (  208) TTGCAGCAGATC  1
9770                     (  399) TTGCTGTAGCAG  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 12 n= 4890 bayes= 9.25326 E= 2.5e+000
  -965    -85   -965    171
  -965    -85   -965    171
  -965   -965    206   -965
  -965    214   -965   -965
  -110   -965   -965    171
  -965   -965    206   -965
  -965    195   -965   -109
   148   -965    -93   -109
  -965    -85    165   -109
   148     15   -965   -965
   171   -965   -965   -109
    90     73    -93   -965
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 12 nsites= 8 E= 2.5e+000
 0.000000  0.125000  0.000000  0.875000
 0.000000  0.125000  0.000000  0.875000
 0.000000  0.000000  1.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.125000  0.000000  0.000000  0.875000
 0.000000  0.000000  1.000000  0.000000
 0.000000  0.875000  0.000000  0.125000
 0.750000  0.000000  0.125000  0.125000
 0.000000  0.125000  0.750000  0.125000
 0.750000  0.250000  0.000000  0.000000
 0.875000  0.000000  0.000000  0.125000
 0.500000  0.375000  0.125000  0.000000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 regular expression
--------------------------------------------------------------------------------
TTGCTGCAG[AC]A[AC]
--------------------------------------------------------------------------------




Time  1.16 secs.

********************************************************************************


********************************************************************************
MOTIF  2 MEME    width =  20  sites =  10  llr = 128  E-value = 6.3e+001
********************************************************************************
--------------------------------------------------------------------------------
	Motif 2 Description
--------------------------------------------------------------------------------
Simplified        A  21:::74512543173:2:3
pos.-specific     C  6:9192536:353826a447
probability       G  26::11:12:2::::::1::
matrix            T  :319::1118:14111:36:

         bits    2.1                 *
                 1.9                 *
                 1.7   * *           *
                 1.5   ***           *
Relative         1.3   ***    *      *  *
Entropy          1.1   ***    *   *  * **
(18.5 bits)      0.9   ****   *   **** **
                 0.6 *******  * * **** **
                 0.4 ******* ********* **
                 0.2 ********************
                 0.0 --------------------

Multilevel           CGCTCACACTACTCACCCTC
consensus            AT   CACGACAA CA TCA
sequence             G         G C    A

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name             Start   P-value                    Site
-------------             ----- ---------            --------------------
261433                      478  1.21e-09 ACATTATATC CTCTCACAGTACCCACCTTC ACC
23645                        90  2.74e-08 TTTTGCTTTA GGCTCACCTTACTCAACCTC TCGAGATACA
23311                       444  2.74e-08 ACCTACCTGA CGCTCAAACTCAAAACCTCC GTCGATCGTT
270027                      479  8.56e-08 AACAAAATCA CGCTCGCCCAAACCACCACC AC
26702                       137  2.14e-07 AACAGGCAGG CGTTGAAACTCACCACCCTC GGAGGCGAAG
9770                        413  2.35e-07 TGTAGCAGAC GTCTCAAACTACTCCACGCC TTCGAGTGCT
35499                       477  1.75e-06 ATACACACAT ATCTCATACACATCAACCCA CACC
26545                       467  3.06e-06 CTCTTCATCG CACTCACGGTGCACCTCCTC CTCTCAATGA
264407                      384  7.27e-06 AGAAACAAAA CGCCCCCTATATTCACCTTA CCAAGCAAAA
21248                        81  9.09e-06 GACTTAATTC AGCTCCACCTGCATTCCATA CGAGTTCAGG
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
261433                            1.2e-09  477_[+2]_3
23645                             2.7e-08  89_[+2]_391
23311                             2.7e-08  443_[+2]_37
270027                            8.6e-08  478_[+2]_2
26702                             2.1e-07  136_[+2]_344
9770                              2.4e-07  412_[+2]_68
35499                             1.8e-06  476_[+2]_4
26545                             3.1e-06  466_[+2]_14
264407                            7.3e-06  383_[+2]_97
21248                             9.1e-06  80_[+2]_400
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 2 width=20 seqs=10
261433                   (  478) CTCTCACAGTACCCACCTTC  1
23645                    (   90) GGCTCACCTTACTCAACCTC  1
23311                    (  444) CGCTCAAACTCAAAACCTCC  1
270027                   (  479) CGCTCGCCCAAACCACCACC  1
26702                    (  137) CGTTGAAACTCACCACCCTC  1
9770                     (  413) GTCTCAAACTACTCCACGCC  1
35499                    (  477) ATCTCATACACATCAACCCA  1
26545                    (  467) CACTCACGGTGCACCTCCTC  1
264407                   (  384) CGCCCCCTATATTCACCTTA  1
21248                    (   81) AGCTCCACCTGCATTCCATA  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 20 n= 4810 bayes= 8.90689 E= 6.3e+001
   -42    141    -26   -997
  -142   -997    133     17
  -997    199   -997   -142
  -997   -117   -997    175
  -997    199   -125   -997
   139    -18   -125   -997
    58    114   -997   -142
    90     41   -125   -142
  -142    141    -26   -142
   -42   -997   -997    158
    90     41    -26   -997
    58    114   -997   -142
    16     41   -997     58
  -142    182   -997   -142
   139    -18   -997   -142
    16    141   -997   -142
  -997    214   -997   -997
   -42     82   -125     17
  -997     82   -997    117
    16    163   -997   -997
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 20 nsites= 10 E= 6.3e+001
 0.200000  0.600000  0.200000  0.000000
 0.100000  0.000000  0.600000  0.300000
 0.000000  0.900000  0.000000  0.100000
 0.000000  0.100000  0.000000  0.900000
 0.000000  0.900000  0.100000  0.000000
 0.700000  0.200000  0.100000  0.000000
 0.400000  0.500000  0.000000  0.100000
 0.500000  0.300000  0.100000  0.100000
 0.100000  0.600000  0.200000  0.100000
 0.200000  0.000000  0.000000  0.800000
 0.500000  0.300000  0.200000  0.000000
 0.400000  0.500000  0.000000  0.100000
 0.300000  0.300000  0.000000  0.400000
 0.100000  0.800000  0.000000  0.100000
 0.700000  0.200000  0.000000  0.100000
 0.300000  0.600000  0.000000  0.100000
 0.000000  1.000000  0.000000  0.000000
 0.200000  0.400000  0.100000  0.300000
 0.000000  0.400000  0.000000  0.600000
 0.300000  0.700000  0.000000  0.000000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 regular expression
--------------------------------------------------------------------------------
[CAG][GT]CTC[AC][CA][AC][CG][TA][ACG][CA][TAC]C[AC][CA]C[CTA][TC][CA]
--------------------------------------------------------------------------------




Time  2.31 secs.

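The log-odds and probability matrices above appear to be two views of the same data. The sketch below assumes, from the numbers in this report rather than from MEME documentation, that each score is roughly 100 * log2(p / background), rounded to an integer, and that zero probabilities are printed as a large negative placeholder (here -997). The function name `log_odds_score` is hypothetical. Because the background frequencies are printed to only three decimals, a recomputed score may differ from the reported one by a unit.

```python
import math

# Background frequencies copied from the COMMAND LINE SUMMARY section.
BACKGROUND = {"A": 0.268, "C": 0.226, "G": 0.239, "T": 0.267}


def log_odds_score(p, letter):
    """Return round(100 * log2(p / background)), or None when p is zero.

    Assumption: this scaling is inferred from the report's own numbers.
    """
    if p == 0.0:
        return None  # the report shows a large negative placeholder instead
    return round(100 * math.log2(p / BACKGROUND[letter]))


# First row of the Motif 2 probability matrix, in A, C, G, T order.
row = dict(zip("ACGT", [0.2, 0.6, 0.2, 0.0]))
scores = {letter: log_odds_score(p, letter) for letter, p in row.items()}
print(scores)
```

For this row the recomputed scores agree with the first row of the Motif 2 log-odds matrix (-42, 141, -26, and the placeholder for T).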
********************************************************************************


********************************************************************************
MOTIF  3 MEME    width =  18  sites =   9  llr = 115  E-value = 2.1e+002
********************************************************************************
--------------------------------------------------------------------------------
	Motif 3 Description
--------------------------------------------------------------------------------
Simplified        A  ::::29::::4229111:
pos.-specific     C  ::2:::1122211::2:1
probability       G  37:a8:8982327:3379
matrix            T  738::11::6:4:1632:

         bits    2.1    *
                 1.9    *
                 1.7    *
                 1.5    * * *     *   *
Relative         1.3   **** **    *   *
Entropy          1.1 *********    *   *
(18.5 bits)      0.9 *********   **  **
                 0.6 **********  *** **
                 0.4 *********** *** **
                 0.2 ******************
                 0.0 ------------------

Multilevel           TGTGGAGGGTATGATGGG
consensus            GTC A   CCGAA GTT
sequence                      GCG   C

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name             Start   P-value                   Site
-------------             ----- ---------            ------------------
9770                        100  5.03e-10 ATTGTGGAAG TGTGGAGGGTAAGATTGG AAAGCAACAG
23311                        18  9.53e-08 GTGTGAAGAA TTTGGAGGGGGTGTTGGG AGTTTTTGGA
261433                      319  1.78e-07 ATACAAATGT TGCGGAGGCCGGGATCGG GAGGGGACGA
35499                       135  4.55e-07 CGTTTATATT GGTGGAGGGGGTGAAGGC AGCTGCGCTG
270027                       80  5.87e-07 TGAAGGTGTT GGCGAAGGGTATGAGGAG GACAAACAAA
264407                      185  1.20e-06 TTTTAGGGCA GTTGGATGGTACAATTGG AAGCACTCCC
23645                       171  1.49e-06 ATCCAAAGTT TGTGATGGGTATAATCTG ACTTTTGTGC
26545                       266  1.71e-06 GCGACGATCT TGTGGAGCGTCGCAGAGG TGATAGGGGT
26702                       475  4.40e-06 AATCATTGAC TTTGGACGCCCAGAGTTG CACCCTCC
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
9770                                5e-10  99_[+3]_383
23311                             9.5e-08  17_[+3]_465
261433                            1.8e-07  318_[+3]_164
35499                             4.6e-07  134_[+3]_348
270027                            5.9e-07  79_[+3]_403
264407                            1.2e-06  184_[+3]_298
23645                             1.5e-06  170_[+3]_312
26545                             1.7e-06  265_[+3]_217
26702                             4.4e-06  474_[+3]_8
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 3 width=18 seqs=9
9770                     (  100) TGTGGAGGGTAAGATTGG  1
23311                    (   18) TTTGGAGGGGGTGTTGGG  1
261433                   (  319) TGCGGAGGCCGGGATCGG  1
35499                    (  135) GGTGGAGGGGGTGAAGGC  1
270027                   (   80) GGCGAAGGGTATGAGGAG  1
264407                   (  185) GTTGGATGGTACAATTGG  1
23645                    (  171) TGTGATGGGTATAATCTG  1
26545                    (  266) TGTGGAGCGTCGCAGAGG  1
26702                    (  475) TTTGGACGCCCAGAGTTG  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 18 n= 4830 bayes= 9.19973 E= 2.1e+002
  -982   -982     48    132
  -982   -982    148     32
  -982     -2   -982    154
  -982   -982    206   -982
   -27   -982    170   -982
   173   -982   -982   -126
  -982   -102    170   -126
  -982   -102    189   -982
  -982     -2    170   -982
  -982     -2    -10    105
    73     -2     48   -982
   -27   -102    -10     73
   -27   -102    148   -982
   173   -982   -982   -126
  -127   -982     48    105
  -127     -2     48     32
  -127   -982    148    -27
  -982   -102    189   -982
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 18 nsites= 9 E= 2.1e+002
 0.000000  0.000000  0.333333  0.666667
 0.000000  0.000000  0.666667  0.333333
 0.000000  0.222222  0.000000  0.777778
 0.000000  0.000000  1.000000  0.000000
 0.222222  0.000000  0.777778  0.000000
 0.888889  0.000000  0.000000  0.111111
 0.000000  0.111111  0.777778  0.111111
 0.000000  0.111111  0.888889  0.000000
 0.000000  0.222222  0.777778  0.000000
 0.000000  0.222222  0.222222  0.555556
 0.444444  0.222222  0.333333  0.000000
 0.222222  0.111111  0.222222  0.444444
 0.222222  0.111111  0.666667  0.000000
 0.888889  0.000000  0.000000  0.111111
 0.111111  0.000000  0.333333  0.555556
 0.111111  0.222222  0.333333  0.333333
 0.111111  0.000000  0.666667  0.222222
 0.000000  0.111111  0.888889  0.000000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 regular expression
--------------------------------------------------------------------------------
[TG][GT][TC]G[GA]AGG[GC][TCG][AGC][TAG][GA]A[TG][GTC][GT]G
--------------------------------------------------------------------------------




Time  3.36 secs.

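The regular expression is a compressed summary of the probability matrix: in this report every bracketed letter has a probability of at least 0.2 at its position, and rarer letters are left out. As a result, not every site that MEME assigned to the motif necessarily matches the expression. The sketch below checks two of the Motif 3 sites listed above with Python's standard `re` module; the dictionary names are illustrative only.

```python
import re

# Motif 3 regular expression, copied from the report.
MOTIF3_REGEX = re.compile(
    r"[TG][GT][TC]G[GA]AGG[GC][TCG][AGC][TAG][GA]A[TG][GTC][GT]G"
)

# Two sites from the "Motif 3 sites sorted by position p-value" table.
sites = {
    "9770": "TGTGGAGGGTAAGATTGG",
    "23311": "TTTGGAGGGGGTGTTGGG",
}

# True when the whole 18-letter site fits the expression.
results = {name: bool(MOTIF3_REGEX.fullmatch(site)) for name, site in sites.items()}
print(results)
```

The site in 9770 matches, while the site in 23311 does not: it has T at position 14, where the matrix gives T a probability of only 0.111, so the expression lists A alone.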
********************************************************************************


********************************************************************************
SUMMARY OF MOTIFS
********************************************************************************

--------------------------------------------------------------------------------
	Combined block diagrams: non-overlapping sites with p-value < 0.0001
--------------------------------------------------------------------------------
SEQUENCE NAME            COMBINED P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
21248                            2.93e-04  80_[+2(9.09e-06)]_324_\
    [+1(1.94e-06)]_64
23311                            1.08e-11  17_[+3(9.53e-08)]_77_[+1(6.85e-08)]_\
    319_[+2(2.74e-08)]_37
23645                            5.02e-07  89_[+2(2.74e-08)]_61_[+3(1.49e-06)]_\
    312
261433                           7.96e-09  318_[+3(1.78e-07)]_141_\
    [+2(1.21e-09)]_3
264407                           7.39e-07  184_[+3(1.20e-06)]_181_\
    [+2(7.27e-06)]_26_[+1(3.57e-06)]_59
26545                            5.24e-07  1_[+1(4.05e-06)]_252_[+3(1.71e-06)]_\
    183_[+2(3.06e-06)]_14
26702                            1.77e-07  136_[+2(2.14e-07)]_51_\
    [+1(6.82e-06)]_255_[+3(4.40e-06)]_8
270027                           3.02e-10  79_[+3(5.87e-07)]_58_[+1(1.26e-07)]_\
    311_[+2(8.56e-08)]_2
35499                            1.79e-08  56_[+3(7.25e-05)]_60_[+3(4.55e-07)]_\
    89_[+1(6.55e-07)]_223_[+2(1.75e-06)]_4
9770                             4.82e-11  99_[+3(5.03e-10)]_153_\
    [+2(3.62e-05)]_108_[+1(7.51e-06)]_2_[+2(2.35e-07)]_68
--------------------------------------------------------------------------------

********************************************************************************


********************************************************************************

Stopped because nmotifs = 3 reached.
********************************************************************************

CPU: seaotter.hsd1.wa.comcast.net

********************************************************************************
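A combined block diagram can be read as alternating spacer lengths and motif occurrences, so the start of each site follows from a running sum. The sketch below is one way to decode a diagram, assuming the continuation backslash and line break have already been removed and that spacer lengths and motif widths add up to the sequence length. The function name `site_starts` is hypothetical; the widths come from the three MOTIF headers in this report.

```python
import re

# Motif widths from the MOTIF 1, 2 and 3 header lines.
MOTIF_WIDTHS = {1: 12, 2: 20, 3: 18}

# One motif occurrence, e.g. "[+3(9.53e-08)]": strand "+", motif number, p-value.
SITE_TOKEN = re.compile(r"\[\+(\d+)\(([^)]+)\)\]")


def site_starts(diagram):
    """Return ([(motif, 1-based start, p-value), ...], total length covered)."""
    position = 0
    sites = []
    for token in diagram.split("_"):
        match = SITE_TOKEN.fullmatch(token)
        if match:
            motif = int(match.group(1))
            sites.append((motif, position + 1, float(match.group(2))))
            position += MOTIF_WIDTHS[motif]
        else:
            position += int(token)  # spacer: number of unmatched letters
    return sites, position


# Diagram for sequence 23311, with the line continuation joined.
diagram = "17_[+3(9.53e-08)]_77_[+1(6.85e-08)]_319_[+2(2.74e-08)]_37"
sites, length = site_starts(diagram)
print(sites, length)
```

For sequence 23311 this gives starts 18, 113 and 444 for motifs 3, 1 and 2, which agree with the per-motif site tables, and a total of 500, the sequence length listed in the TRAINING SET section.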