********************************************************************************
MEME - Motif discovery tool
********************************************************************************
MEME version 4.10.0 (Release date: Wed May 21 10:35:36 2014 +1000)

For further information on how to interpret these results or to get
a copy of the MEME software please access http://meme.nbcr.net.

This file may be used as input to the MAST algorithm for searching
sequence databases for matches to groups of motifs.  MAST is available
for interactive use and downloading at http://meme.nbcr.net.
********************************************************************************


********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:

Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************


********************************************************************************
TRAINING SET
********************************************************************************
DATAFILE= motifs/385/385.seqs.fa
ALPHABET= ACGT
Sequence name            Weight Length  Sequence name            Weight Length
-------------            ------ ------  -------------            ------ ------
10800                    1.0000    500  10927                    1.0000    500
10986                    1.0000    500  14971                    1.0000    500
21237                    1.0000    500  21414                    1.0000    500
24698                    1.0000    500  25426                    1.0000    500
25618                    1.0000    500  264048                   1.0000    500
3001                     1.0000    500  38907                    1.0000    500
4732                     1.0000    500  5121                     1.0000    500
7122                     1.0000    500  7123                     1.0000    500
8217                     1.0000    500  9552                     1.0000    500
bd917                    1.0000    500
********************************************************************************

********************************************************************************
COMMAND LINE SUMMARY
********************************************************************************
This information can also be useful in the event you wish to report a
problem with the MEME software.

command: meme motifs/385/385.seqs.fa -oc motifs/385 -dna -minw 12 -maxw 21 -nmotifs 3 -maxsize 500000

model:  mod=         zoops    nmotifs=         3    evt=           inf
object function=  E-value of product of p-values
width:  minw=           12    maxw=           21    minic=        0.00
width:  wg=             11    ws=              1    endgaps=       yes
nsites: minsites=        2    maxsites=       19    wnsites=       0.8
theta:  prob=            1    spmap=         uni    spfuzz=        0.5
global: substring=     yes    branching=      no    wbranch=        no
em:     prior=   dirichlet    b=            0.01    maxiter=        50
        distance=    1e-05
data:   n=            9500    N=              19
strands: +
sample: seed=            0    seqfrac=         1
Letter frequencies in dataset:
A 0.278 C 0.245 G 0.214 T 0.263
Background letter frequencies (from dataset with add-one prior applied):
A 0.278 C 0.245 G 0.214 T 0.263
********************************************************************************


********************************************************************************
MOTIF  1 MEME	width =  21  sites =  13  llr = 185  E-value = 2.9e-005
********************************************************************************
--------------------------------------------------------------------------------
	Motif 1 Description
--------------------------------------------------------------------------------
Simplified        A  22:a42282a:56139426:2
pos.-specific     C  72a:676:8:54265:28288
probability       G  15:::122::22:22:3::2:
matrix            T  :1::::::::2:21:1112::

         bits    2.2
                 2.0   *
                 1.8   **     *
                 1.6   **     *     *
Relative         1.3   **   * *     *   *
Entropy          1.1   **   ***     * * **
(20.5 bits)      0.9 * **** ***     * * **
                 0.7 * ********* ** * ****
                 0.4 * ************** ****
                 0.2 *********************
                 0.0 ---------------------

Multilevel           CGCACCCACACAACCAACACC
consensus            AA  AAA A GCCGA G C A
sequence              C        T   G C

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name             Start   P-value                    Site
-------------             ----- ---------            ---------------------
10800                       446  5.86e-11 TGGCACACCC CGCAACCACACCACAACCACC CCACCTTTCT
9552                        350  9.60e-10 CAACAAAGAG CACAACCACACAAGCACCACC GTCGTCTACA
38907                       267  8.64e-09 GCCTCTTCCT CCCACCAACACCCCCAACTCC GCTCTTGGAA
7122                         25  2.20e-08 TATCAAGGAT ACCACGCACAGAACCAGCACC ATGGCAACTA
10986                        29  2.20e-08 ACCAACTCCA CGCACACAAACAACCAACCCA CCTTCGTCTC
7123                        363  3.75e-08 CGAGCTTCTT CACAACAACATGACAACCACC CGTTGTGTAT
24698                       386  8.25e-08 AAAGCACAAC CCCACCCACACAACAATAACA TAAAGCGACA
3001                        107  9.94e-08 TCTGTCCATC CGCAACGGCAGGAGGAGCACC GAAGGAGGCA
21414                       429  8.63e-07 CAAGGACTGC AGCAACCACAGCAACAACCGA GCCAAGGGGA
25618                       185  9.20e-07 TCATGATCTT GGCACACACATCTCGTGCACC AACTATCGCA
8217                        451  9.80e-07 CACGTCCCAC CGCACAGAAACATCAAGATCC TCCCTCACAG
21237                       115  1.58e-06 CGCCACTGTG AACACCAGCACCCTCAACAGC CTCTGCCCGA
10927                       134  2.18e-06 CGCAATGTGA CTCACCCAAATACGGAATCCC GAAACAACTG
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
10800                             5.9e-11  445_[+1]_34
9552                              9.6e-10  349_[+1]_130
38907                             8.6e-09  266_[+1]_213
7122                              2.2e-08  24_[+1]_455
10986                             2.2e-08  28_[+1]_451
7123                              3.7e-08  362_[+1]_117
24698                             8.3e-08  385_[+1]_94
3001                              9.9e-08  106_[+1]_373
21414                             8.6e-07  428_[+1]_51
25618                             9.2e-07  184_[+1]_295
8217                              9.8e-07  450_[+1]_29
21237                             1.6e-06  114_[+1]_365
10927                             2.2e-06  133_[+1]_346
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 1 width=21 seqs=13
10800                    (  446) CGCAACCACACCACAACCACC  1
9552                     (  350) CACAACCACACAAGCACCACC  1
38907                    (  267) CCCACCAACACCCCCAACTCC  1
7122                     (   25) ACCACGCACAGAACCAGCACC  1
10986                    (   29) CGCACACAAACAACCAACCCA  1
7123                     (  363) CACAACAACATGACAACCACC  1
24698                    (  386) CCCACCCACACAACAATAACA  1
3001                     (  107) CGCAACGGCAGGAGGAGCACC  1
21414                    (  429) AGCAACCACAGCAACAACCGA  1
25618                    (  185) GGCACACACATCTCGTGCACC  1
8217                     (  451) CGCACAGAAACATCAAGATCC  1
21237                    (  115) AACACCAGCACCCTCAACAGC  1
10927                    (  134) CTCACCCAAATACGGAATCCC  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 21 n= 9120 bayes= 9.98347 E= 2.9e-005
   -27    150   -147  -1035
   -27     -9    111   -177
 -1035    203  -1035  -1035
   185  -1035  -1035  -1035
    47    133  -1035  -1035
   -27    150   -147  -1035
   -27    133    -48  -1035
   161  -1035    -48  -1035
   -27    165  -1035  -1035
   185  -1035  -1035  -1035
 -1035    113     11    -19
    73     65    -48  -1035
   115     -9  -1035    -77
  -185    133     11   -177
    15     91     11  -1035
   173  -1035  -1035   -177
    47     -9     52   -177
   -85    165  -1035   -177
   115     -9  -1035    -77
 -1035    179    -48  -1035
   -27    165  -1035  -1035
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 21 nsites= 13 E= 2.9e-005
 0.230769  0.692308  0.076923  0.000000
 0.230769  0.230769  0.461538  0.076923
 0.000000  1.000000  0.000000  0.000000
 1.000000  0.000000  0.000000  0.000000
 0.384615  0.615385  0.000000  0.000000
 0.230769  0.692308  0.076923  0.000000
 0.230769  0.615385  0.153846  0.000000
 0.846154  0.000000  0.153846  0.000000
 0.230769  0.769231  0.000000  0.000000
 1.000000  0.000000  0.000000  0.000000
 0.000000  0.538462  0.230769  0.230769
 0.461538  0.384615  0.153846  0.000000
 0.615385  0.230769  0.000000  0.153846
 0.076923  0.615385  0.230769  0.076923
 0.307692  0.461538  0.230769  0.000000
 0.923077  0.000000  0.000000  0.076923
 0.384615  0.230769  0.307692  0.076923
 0.153846  0.769231  0.000000  0.076923
 0.615385  0.230769  0.000000  0.153846
 0.000000  0.846154  0.153846  0.000000
 0.230769  0.769231  0.000000  0.000000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 regular expression
--------------------------------------------------------------------------------
[CA][GAC]CA[CA][CA][CA]A[CA]A[CGT][AC][AC][CG][CAG]A[AGC]C[AC]C[CA]
--------------------------------------------------------------------------------




Time  3.47 secs.

********************************************************************************


********************************************************************************
MOTIF  2 MEME	width =  16  sites =   7  llr = 105  E-value = 2.1e+000
********************************************************************************
--------------------------------------------------------------------------------
	Motif 2 Description
--------------------------------------------------------------------------------
Simplified        A  :a3:a1::::16731:
pos.-specific     C  ::::::1::1:3:1::
probability       G  a:4a:9:793913199
matrix            T  ::3:::9316:::4:1

         bits    2.2 *  *
                 2.0 *  *
                 1.8 ** **
                 1.6 ** ***  * *   **
Relative         1.3 ** ****** *   **
Entropy          1.1 ** ****** * * **
(21.6 bits)      0.9 ** ****** * * **
                 0.7 ** ********** **
                 0.4 ************* **
                 0.2 ****************
                 0.0 ----------------

Multilevel           GAGGAGTGGTGAATGG
consensus              A    T G CGA
sequence               T

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name             Start   P-value                  Site
-------------             ----- ---------            ----------------
24698                       126  2.17e-09 CAATGCAAGG GAAGAGTGGGGAATGG CATCTATGGA
8217                         56  1.20e-08 ATATCGTTCC GATGAGTGGTGAGCGG TCGCTCGCCC
14971                       193  1.20e-08 CAGTGCACGT GATGAGTTGTGCATGG TGCATGATAT
25618                        22  1.28e-07 CTGTAGATTT GAGGAGCGGTGAGTGT GAGAGTCTAA
21414                       185  2.07e-07 CATTTTGAAC GAGGAGTTGCAAAAGG CCATGCTCGC
264048                      205  2.17e-07 CGAATCTCTT GAGGAATGTTGCAAGG CTGTCTTTGG
21237                       281  2.52e-07 CAGAGCCGTT GAAGAGTGGGGGAGAG TGAAACAGCA
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
24698                             2.2e-09  125_[+2]_359
8217                              1.2e-08  55_[+2]_429
14971                             1.2e-08  192_[+2]_292
25618                             1.3e-07  21_[+2]_463
21414                             2.1e-07  184_[+2]_300
264048                            2.2e-07  204_[+2]_280
21237                             2.5e-07  280_[+2]_204
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 2 width=16 seqs=7
24698                    (  126) GAAGAGTGGGGAATGG  1
8217                     (   56) GATGAGTGGTGAGCGG  1
14971                    (  193) GATGAGTTGTGCATGG  1
25618                    (   22) GAGGAGCGGTGAGTGT  1
21414                    (  185) GAGGAGTTGCAAAAGG  1
264048                   (  205) GAGGAATGTTGCAAGG  1
21237                    (  281) GAAGAGTGGGGGAGAG  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 16 n= 9215 bayes= 9.33972 E= 2.1e+000
  -945   -945    222   -945
   185   -945   -945   -945
     4   -945    100     12
  -945   -945    222   -945
   185   -945   -945   -945
   -96   -945    200   -945
  -945    -78   -945    170
  -945   -945    174     12
  -945   -945    200    -88
  -945    -78     42    112
   -96   -945    200   -945
   104     22    -58   -945
   136   -945     42   -945
     4    -78    -58     70
   -96   -945    200   -945
  -945   -945    200    -88
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 16 nsites= 7 E= 2.1e+000
 0.000000  0.000000  1.000000  0.000000
 1.000000  0.000000  0.000000  0.000000
 0.285714  0.000000  0.428571  0.285714
 0.000000  0.000000  1.000000  0.000000
 1.000000  0.000000  0.000000  0.000000
 0.142857  0.000000  0.857143  0.000000
 0.000000  0.142857  0.000000  0.857143
 0.000000  0.000000  0.714286  0.285714
 0.000000  0.000000  0.857143  0.142857
 0.000000  0.142857  0.285714  0.571429
 0.142857  0.000000  0.857143  0.000000
 0.571429  0.285714  0.142857  0.000000
 0.714286  0.000000  0.285714  0.000000
 0.285714  0.142857  0.142857  0.428571
 0.142857  0.000000  0.857143  0.000000
 0.000000  0.000000  0.857143  0.142857
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 regular expression
--------------------------------------------------------------------------------
GA[GAT]GAGT[GT]G[TG]G[AC][AG][TA]GG
--------------------------------------------------------------------------------




Time  6.75 secs.

********************************************************************************


********************************************************************************
MOTIF  3 MEME	width =  12  sites =   5  llr = 75  E-value = 2.2e+000
********************************************************************************
--------------------------------------------------------------------------------
	Motif 3 Description
--------------------------------------------------------------------------------
Simplified        A  :::::::aa2:4
pos.-specific     C  :::a2::::88:
probability       G  :aa:8aa:::2:
matrix            T  a::::::::::6

         bits    2.2  **  **
                 2.0 **** **
                 1.8 **** ****
                 1.6 *********
Relative         1.3 ***********
Entropy          1.1 ***********
(21.5 bits)      0.9 ************
                 0.7 ************
                 0.4 ************
                 0.2 ************
                 0.0 ------------

Multilevel           TGGCGGGAACCT
consensus                C    AGA
sequence

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name             Start   P-value                Site
-------------             ----- ---------            ------------
4732                        183  3.53e-08 TCGAAGAAAA TGGCGGGAACCT AGTATCCCTT
21237                       313  7.26e-08 AGCAGGTTCT TGGCGGGAACCA AATTTTGCAT
8217                          8  1.03e-07    TAGCAAG TGGCGGGAACGT TTCAAGTCTC
10927                       463  1.84e-07 GTGGACGAAA TGGCGGGAAACT GCGTATGTAG
5121                        167  3.01e-07 ACAGACTCGT TGGCCGGAACCA ACGCTGACGC
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
4732                              3.5e-08  182_[+3]_306
21237                             7.3e-08  312_[+3]_176
8217                                1e-07  7_[+3]_481
10927                             1.8e-07  462_[+3]_26
5121                                3e-07  166_[+3]_322
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 3 width=12 seqs=5
4732                     (  183) TGGCGGGAACCT  1
21237                    (  313) TGGCGGGAACCA  1
8217                     (    8) TGGCGGGAACGT  1
10927                    (  463) TGGCGGGAAACT  1
5121                     (  167) TGGCCGGAACCA  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 12 n= 9291 bayes= 11.1106 E= 2.2e+000
  -897   -897   -897    193
  -897   -897    222   -897
  -897   -897    222   -897
  -897    203   -897   -897
  -897    -29    190   -897
  -897   -897    222   -897
  -897   -897    222   -897
   185   -897   -897   -897
   185   -897   -897   -897
   -47    170   -897   -897
  -897    170    -10   -897
    53   -897   -897    119
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 12 nsites= 5 E= 2.2e+000
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.000000  1.000000  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.000000  0.200000  0.800000  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.000000  0.000000  1.000000  0.000000
 1.000000  0.000000  0.000000  0.000000
 1.000000  0.000000  0.000000  0.000000
 0.200000  0.800000  0.000000  0.000000
 0.000000  0.800000  0.200000  0.000000
 0.400000  0.000000  0.000000  0.600000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 regular expression
--------------------------------------------------------------------------------
TGGC[GC]GGAA[CA][CG][TA]
--------------------------------------------------------------------------------




Time  9.82 secs.

********************************************************************************


********************************************************************************
SUMMARY OF MOTIFS
********************************************************************************

--------------------------------------------------------------------------------
	Combined block diagrams: non-overlapping sites with p-value < 0.0001
--------------------------------------------------------------------------------
SEQUENCE NAME            COMBINED P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
10800                            1.31e-06  445_[+1(5.86e-11)]_34
10927                            1.20e-06  133_[+1(2.18e-06)]_308_\
    [+3(1.84e-07)]_26
10986                            1.04e-04  28_[+1(2.20e-08)]_451
14971                            3.02e-04  192_[+2(1.20e-08)]_292
21237                            1.24e-09  114_[+1(1.58e-06)]_145_\
    [+2(2.52e-07)]_16_[+3(7.26e-08)]_176
21414                            1.98e-06  184_[+2(2.07e-07)]_228_\
    [+1(8.63e-07)]_51
24698                            1.19e-08  125_[+2(2.17e-09)]_244_\
    [+1(8.25e-08)]_94
25426                            8.38e-01  500
25618                            4.20e-06  21_[+2(1.28e-07)]_147_\
    [+1(9.20e-07)]_295
264048                           1.58e-03  204_[+2(2.17e-07)]_280
3001                             5.39e-04  106_[+1(9.94e-08)]_373
38907                            9.54e-05  266_[+1(8.64e-09)]_213
4732                             3.10e-04  182_[+3(3.53e-08)]_306
5121                             5.79e-03  166_[+3(3.01e-07)]_322
7122                             2.10e-04  24_[+1(2.20e-08)]_455
7123                             5.78e-05  76_[+1(2.19e-05)]_265_\
    [+1(3.75e-08)]_117
8217                             6.48e-11  7_[+3(1.03e-07)]_36_[+2(1.20e-08)]_\
    379_[+1(9.80e-07)]_29
9552                             3.03e-05  349_[+1(9.60e-10)]_47_\
    [+1(2.40e-05)]_62
bd917                            8.29e-01  500
--------------------------------------------------------------------------------

********************************************************************************


********************************************************************************
Stopped because nmotifs = 3 reached.
********************************************************************************

CPU: seaotter.hsd1.wa.comcast.net

********************************************************************************