********************************************************************************
MEME - Motif discovery tool
********************************************************************************
MEME version 4.10.0 (Release date: Wed May 21 10:35:36 2014 +1000)

For further information on how to interpret these results or to get
a copy of the MEME software please access http://meme.nbcr.net.

This file may be used as input to the MAST algorithm for searching
sequence databases for matches to groups of motifs.  MAST is available
for interactive use and downloading at http://meme.nbcr.net.
********************************************************************************


********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:

Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************


********************************************************************************
TRAINING SET
********************************************************************************
DATAFILE= motifs/414/414.seqs.fa
ALPHABET= ACGT
Sequence name            Weight Length  Sequence name            Weight Length
-------------            ------ ------  -------------            ------ ------
47869                    1.0000    500  39785                    1.0000    500
49937                    1.0000    500  48498                    1.0000    500
39232                    1.0000    500  39741                    1.0000    500
49858                    1.0000    500  40651                    1.0000    500
49862                    1.0000    500  33153                    1.0000    500
********************************************************************************

********************************************************************************
COMMAND LINE SUMMARY
********************************************************************************
This information can also be useful in the event you wish to report a
problem with the MEME software.

command: meme motifs/414/414.seqs.fa -oc motifs/414 -dna -minw 12 -maxw 21 -nmotifs 3 -maxsize 500000

model:  mod=         zoops    nmotifs=         3    evt=           inf
object function=  E-value of product of p-values
width:  minw=           12    maxw=           21    minic=        0.00
width:  wg=             11    ws=              1    endgaps=       yes
nsites: minsites=        2    maxsites=       10    wnsites=       0.8
theta:  prob=            1    spmap=         uni    spfuzz=        0.5
global: substring=     yes    branching=      no    wbranch=        no
em:     prior=   dirichlet    b=            0.01    maxiter=        50
        distance=    1e-05
data:   n=            5000    N=              10
strands: +
sample: seed=            0    seqfrac=         1
Letter frequencies in dataset:
A 0.278 C 0.247 G 0.197 T 0.278
Background letter frequencies (from dataset with add-one prior applied):
A 0.278 C 0.247 G 0.197 T 0.278
********************************************************************************


********************************************************************************
MOTIF 1 MEME width = 16 sites = 5 llr = 78 E-value = 1.4e+002
********************************************************************************
-------------------------------------------------------------------------------- Motif 1 Description -------------------------------------------------------------------------------- Simplified A ::28:::282:4:::: pos.-specific C :8222:84:6::4::a probability G a:2:8a2422a64::: matrix T :24:::::::::2aa: bits 2.3 * * * 2.1 * * * * 1.9 * * * *** 1.6 * ** * *** Relative 1.4 * *** * *** Entropy 1.2 ** **** * ** *** (22.4 bits) 0.9 ** **** * ** *** 0.7 ** ************* 0.5 ** ************* 0.2 ** ************* 0.0 ---------------- Multilevel GCTAGGCCACGGCTTC consensus TACC GGGA AG sequence C A G T G -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ---------------- 39741 379 7.73e-10 TCAGTTGAAA GCTAGGCGACGAGTTC CAAAGAGTTC 48498 460 2.56e-08 GGCCGGCATC GCTAGGCCGAGGGTTC AACCGAGAAG 49937 266 4.99e-08 TGCTTGTTCT GCAAGGGAACGGCTTC CCAAATACTC 33153 184 1.31e-07 TTAATGTACT GCGACGCCAGGGTTTC TTGGTTGGGT 39785 88 1.40e-07 CCTTCTCCGC GTCCGGCGACGACTTC GCTTTCGGAG -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- 39741 7.7e-10 378_[+1]_106 48498 2.6e-08 459_[+1]_25 49937 5e-08 265_[+1]_219 33153 1.3e-07 183_[+1]_301 39785 1.4e-07 87_[+1]_397 -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 1 in BLOCKS format 
--------------------------------------------------------------------------------
BL   MOTIF 1 width=16 seqs=5
39741                    (  379) GCTAGGCGACGAGTTC  1
48498                    (  460) GCTAGGCCGAGGGTTC  1
49937                    (  266) GCAAGGGAACGGCTTC  1
33153                    (  184) GCGACGCCAGGGTTTC  1
39785                    (   88) GTCCGGCGACGACTTC  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 16 n= 4850 bayes= 10.1721 E= 1.4e+002
  -897   -897    234   -897
  -897    169   -897    -47
   -47    -30      2     52
   152    -30   -897   -897
  -897    -30    202   -897
  -897   -897    234   -897
  -897    169      2   -897
   -47     70    102   -897
   152   -897      2   -897
   -47    128      2   -897
  -897   -897    234   -897
    53   -897    160   -897
  -897     70    102    -47
  -897   -897   -897    184
  -897   -897   -897    184
  -897    202   -897   -897
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 16 nsites= 5 E= 1.4e+002
 0.000000  0.000000  1.000000  0.000000
 0.000000  0.800000  0.000000  0.200000
 0.200000  0.200000  0.200000  0.400000
 0.800000  0.200000  0.000000  0.000000
 0.000000  0.200000  0.800000  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.000000  0.800000  0.200000  0.000000
 0.200000  0.400000  0.400000  0.000000
 0.800000  0.000000  0.200000  0.000000
 0.200000  0.600000  0.200000  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.400000  0.000000  0.600000  0.000000
 0.000000  0.400000  0.400000  0.200000
 0.000000  0.000000  0.000000  1.000000
 0.000000  0.000000  0.000000  1.000000
 0.000000  1.000000  0.000000  0.000000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1
regular expression -------------------------------------------------------------------------------- G[CT][TACG][AC][GC]G[CG][CGA][AG][CAG]G[GA][CGT]TTC -------------------------------------------------------------------------------- Time 0.89 secs. ******************************************************************************** ******************************************************************************** MOTIF 2 MEME width = 21 sites = 7 llr = 112 E-value = 2.4e+002 ******************************************************************************** -------------------------------------------------------------------------------- Motif 2 Description -------------------------------------------------------------------------------- Simplified A 7a:76:1:191:3:3:1:1a6 pos.-specific C :::1::1:::::173:374:4 probability G 1:7::1:a91611:::33::: matrix T 1:31497:::39434a3:4:: bits 2.3 * 2.1 * 1.9 * * * * 1.6 * ** * * Relative 1.4 ** * *** * * * Entropy 1.2 ** * *** * * * * * (23.0 bits) 0.9 ** ** *** * * * * ** 0.7 ************ * * * ** 0.5 ************ * * **** 0.2 ************ *** **** 0.0 --------------------- Multilevel AAGAATTGGAGTTCTTCCCAA consensus T T T ATA GGT C sequence C T -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 2 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- --------------------- 33153 45 1.18e-10 GATCCATTGT TAGAATTGGAGTTCTTGCCAA TCTCACGTAA 39741 56 5.15e-09 AAACAATCCT AATAATTGGAGTACCTGGAAA CCATCCCTTA 47869 37 1.42e-08 AGCATACACC AAGCATTGGATTATTTTCTAA TTGCTAATTT 39232 93 3.14e-08 CAAGCCTGTC AAGATTTGGAATTTATAGCAC GACGGCTGTG 39785 396 6.73e-08 CGCACCGCCG GAGTATTGGGGTCCCTTCCAC TCCCACACTC 40651 447 7.73e-08 ATCAGCGCCC AAGATTCGAATTGCTTCCTAC ATACGCAAGA 49862 113 1.29e-07 AATCTCTCTA AATATGAGGAGGTCATCCTAA ATTTGGTGAG 
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
33153                             1.2e-10  44_[+2]_435
39741                             5.1e-09  55_[+2]_424
47869                             1.4e-08  36_[+2]_443
39232                             3.1e-08  92_[+2]_387
39785                             6.7e-08  395_[+2]_84
40651                             7.7e-08  446_[+2]_33
49862                             1.3e-07  112_[+2]_367
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 2 width=21 seqs=7
33153                    (   45) TAGAATTGGAGTTCTTGCCAA  1
39741                    (   56) AATAATTGGAGTACCTGGAAA  1
47869                    (   37) AAGCATTGGATTATTTTCTAA  1
39232                    (   93) AAGATTTGGAATTTATAGCAC  1
39785                    (  396) GAGTATTGGGGTCCCTTCCAC  1
40651                    (  447) AAGATTCGAATTGCTTCCTAC  1
49862                    (  113) AATATGAGGAGGTCATCCTAA  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 21 n= 4800 bayes= 9.263 E= 2.4e+002
   136   -945    -46    -96
   185   -945   -945   -945
  -945   -945    186      4
   136    -79   -945    -96
   104   -945   -945     62
  -945   -945    -46    162
   -96    -79   -945    136
  -945   -945    234   -945
   -96   -945    212   -945
   162   -945    -46   -945
   -96   -945    153      4
  -945   -945    -46    162
     4    -79    -46     62
  -945    153   -945      4
     4     21   -945     62
  -945   -945   -945    184
   -96     21     53      4
  -945    153     53   -945
   -96     80   -945     62
   185   -945   -945   -945
   104     80   -945   -945
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 21 nsites= 7 E= 2.4e+002
 0.714286  0.000000  0.142857  0.142857
 1.000000  0.000000  0.000000  0.000000
 0.000000  0.000000  0.714286  0.285714
 0.714286  0.142857  0.000000  0.142857
 0.571429  0.000000  0.000000  0.428571
 0.000000  0.000000  0.142857  0.857143
 0.142857  0.142857  0.000000  0.714286
 0.000000  0.000000  1.000000  0.000000
 0.142857  0.000000  0.857143  0.000000
 0.857143  0.000000  0.142857  0.000000
 0.142857  0.000000  0.571429  0.285714
 0.000000  0.000000  0.142857  0.857143
 0.285714  0.142857  0.142857  0.428571
 0.000000  0.714286  0.000000  0.285714
 0.285714  0.285714  0.000000  0.428571
 0.000000  0.000000  0.000000  1.000000
 0.142857  0.285714  0.285714  0.285714
 0.000000  0.714286  0.285714  0.000000
 0.142857  0.428571  0.000000  0.428571
 1.000000  0.000000  0.000000  0.000000
 0.571429  0.428571  0.000000  0.000000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 regular expression
--------------------------------------------------------------------------------
AA[GT]A[AT]TTGGA[GT]T[TA][CT][TAC]T[CGT][CG][CT]A[AC]
--------------------------------------------------------------------------------

Time 1.78 secs.
******************************************************************************** ******************************************************************************** MOTIF 3 MEME width = 12 sites = 5 llr = 65 E-value = 3.1e+002 ******************************************************************************** -------------------------------------------------------------------------------- Motif 3 Description -------------------------------------------------------------------------------- Simplified A 2:::a22::::2 pos.-specific C 8:2a:462aa:4 probability G :a8:::28::a4 matrix T :::::4:::::: bits 2.3 * * 2.1 * * *** 1.9 * ** *** 1.6 **** **** Relative 1.4 **** **** Entropy 1.2 ***** **** (18.6 bits) 0.9 ***** **** 0.7 ***** ****** 0.5 ************ 0.2 ************ 0.0 ------------ Multilevel CGGCACCGCCGC consensus A C TAC G sequence AG A -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 3 sites sorted by position p-value -------------------------------------------------------------------------------- Sequence name Start P-value Site ------------- ----- --------- ------------ 48498 384 1.26e-07 ACCGAAGTTT CGGCACGGCCGG GAGTAAAATT 49858 474 5.11e-07 CAACCAATAA CGCCATCGCCGG CTCCAGAGTT 39785 261 6.72e-07 TCCGACCTTG CGGCAACGCCGA TTTCCACTGT 49937 153 7.39e-07 CACACACAAA CGGCATCCCCGC AAAAGCCCGT 39741 312 1.77e-06 TGCCTCCTCC AGGCACAGCCGC AGAAGTTTTC -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- Motif 3 block diagrams -------------------------------------------------------------------------------- SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM ------------- ---------------- ------------- 48498 1.3e-07 383_[+3]_105 49858 5.1e-07 473_[+3]_15 39785 6.7e-07 260_[+3]_228 49937 7.4e-07 152_[+3]_336 39741 1.8e-06 311_[+3]_177 
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 3 width=12 seqs=5
48498                    (  384) CGGCACGGCCGG  1
49858                    (  474) CGCCATCGCCGG  1
39785                    (  261) CGGCAACGCCGA  1
49937                    (  153) CGGCATCCCCGC  1
39741                    (  312) AGGCACAGCCGC  1
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 12 n= 4890 bayes= 9.36591 E= 3.1e+002
   -47    169   -897   -897
  -897   -897    234   -897
  -897    -30    202   -897
  -897    202   -897   -897
   185   -897   -897   -897
   -47     70   -897     52
   -47    128      2   -897
  -897    -30    202   -897
  -897    202   -897   -897
  -897    202   -897   -897
  -897   -897    234   -897
   -47     70    102   -897
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 12 nsites= 5 E= 3.1e+002
 0.200000  0.800000  0.000000  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.000000  0.200000  0.800000  0.000000
 0.000000  1.000000  0.000000  0.000000
 1.000000  0.000000  0.000000  0.000000
 0.200000  0.400000  0.000000  0.400000
 0.200000  0.600000  0.200000  0.000000
 0.000000  0.200000  0.800000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.000000  1.000000  0.000000  0.000000
 0.000000  0.000000  1.000000  0.000000
 0.200000  0.400000  0.400000  0.000000
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 3 regular expression
--------------------------------------------------------------------------------
[CA]G[GC]CA[CTA][CAG][GC]CCG[CGA]
--------------------------------------------------------------------------------

Time 2.66 secs.

********************************************************************************


********************************************************************************
SUMMARY OF MOTIFS
********************************************************************************

--------------------------------------------------------------------------------
	Combined block diagrams: non-overlapping sites with p-value < 0.0001
--------------------------------------------------------------------------------
SEQUENCE NAME            COMBINED P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
47869                            2.89e-04  36_[+2(1.42e-08)]_443
39785                            3.03e-10  87_[+1(1.40e-07)]_157_\
    [+3(6.72e-07)]_123_[+2(6.73e-08)]_84
49937                            1.56e-06  152_[+3(7.39e-07)]_101_\
    [+1(4.99e-08)]_219
48498                            1.64e-07  383_[+3(1.26e-07)]_64_\
    [+1(2.56e-08)]_25
39232                            1.91e-04  92_[+2(3.14e-08)]_387
39741                            5.14e-13  55_[+2(5.15e-09)]_235_\
    [+3(1.77e-06)]_55_[+1(7.73e-10)]_106
49858                            6.27e-03  473_[+3(5.11e-07)]_15
40651                            1.23e-03  446_[+2(7.73e-08)]_33
49862                            2.77e-03  112_[+2(1.29e-07)]_367
33153                            1.03e-09  44_[+2(1.18e-10)]_118_\
    [+1(1.31e-07)]_301
--------------------------------------------------------------------------------

********************************************************************************


********************************************************************************
Stopped because nmotifs = 3 reached.
********************************************************************************

CPU: seaotter.hsd1.wa.comcast.net

********************************************************************************