AS DOWNLOADED FROM GRAMENE SITE:
http://www.gramene.org/documentation/Alignment_docs/cornsensus.html

The documentation here refers to file: UnigeneToRice.txt, stored in this directory.

/*
  This documents the processing of Zea mays Clusters and ESTs to the rice genome.
  Lenny Teytelman
  Sun Dec 8 11:30:43 2002
*/

The BACs/PACs are from the GenBank Entrez Nucleotide query:

  "Oryza [ORGN] AND (30000 [SLEN]:250000 [SLEN]) AND ((htg [KYWD] OR BAC [ALL] OR chromosome [TITL] OR PAC [ALL]) NOT (marker [TITL] OR cDNA [TITL] OR mRNA [TITL] OR RAPD [TITL] OR GSS [KYWD] OR telomere [TITL] OR protein[TITL]))"

for BACs, and

The Dupont Unigene set is from
http://www.agron.missouri.edu/files_dl/MMP/Cornsensus.fasta

The average Clusters and ESTs length is 1,000.

3,420 sequences were compared to 10,678 Clusters and ESTs using BLAT with minIdentity=50. The 22,785 BLAT hits were filtered using the pslReps utility with the -singleHit parameter. This resulted in 9,162 alignments.

The lengths of the matches are distributed as follows:

Length of hits   Count
--------------   -----
0-100              596
100-150            776
150-200            899
200-250            930
250-300            886
300-350            815
350-400            724
400-450            541
450-500            462
500-550            362
550-600            296
600-650            270
650-700            224
700-750            188
750-800            163
>800              1030

Removing matches with less than 150bp match-length leaves 7,770 hits.

Many of the Clusters and ESTs hit more than once. The distribution of the hit frequencies is:

# Of Hits per Feature   Count
---------------------   -----
01                       5725
02                        894
03                         73
04                          8
06                          1

Clusters and ESTs that hit more than three times are removed, with 7,732 hits remaining.
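The two filtering steps described above (drop matches shorter than 150bp, then drop Clusters and ESTs that hit more than three times) can be sketched roughly as follows. This is a minimal illustration, not the script used to produce UnigeneToRice.txt: the `Hit` record, its field names, and the `filter_hits` function are hypothetical. The second half only re-derives the totals quoted in this document from the hits-per-feature table.

```python
from collections import Counter
from typing import NamedTuple


class Hit(NamedTuple):
    # Hypothetical record; the field names are assumptions, not taken from the source.
    feature: str       # Cluster or EST name
    clone: str         # sequenced clone (BAC/PAC) name
    match_length: int  # matched bases


def filter_hits(hits, min_length=150, max_hits_per_feature=3):
    """Drop short matches, then drop features that still hit too many times."""
    long_enough = [h for h in hits if h.match_length >= min_length]
    per_feature = Counter(h.feature for h in long_enough)
    return [h for h in long_enough
            if per_feature[h.feature] <= max_hits_per_feature]


# Cross-check of the totals using the hits-per-feature table from this document.
hits_per_feature = {1: 5725, 2: 894, 3: 73, 4: 8, 6: 1}
total_hits = sum(n * count for n, count in hits_per_feature.items())
kept_hits = sum(n * count for n, count in hits_per_feature.items() if n <= 3)
kept_features = sum(count for n, count in hits_per_feature.items() if n <= 3)
print(total_hits, kept_hits, kept_features)  # 7770 7732 6692
```

The cross-check reproduces the 7,770 hits before the frequency filter, the 7,732 hits after it, and the 6,692 unique Clusters and ESTs reported further on.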
These matches have the following distribution of the percent identity per hit:

% Identity   Count
----------   -----
43               1
53               2
55               1
58               1
60               1
61               1
62               2
63               1
64               2
65               2
66               7
67               5
68               3
69               7
70              18
71              18
72              13
73              21
74              35
75              33
76              63
77              64
78             104
79             119
80             176
81             195
82             296
83             354
84             457
85             586
86             731
87             820
88             896
89             870
90             708
91             485
92             303
93             168
94              69
95              54
96              19
97              13
98               8

The distribution of the sequenced clone gaps is:

Bac Gap Length   Count
--------------   -----
01000             5000
02000             1458
03000              715
04000              262
05000              117
06000               39
07000               30
08000               23
09000               15
10000               12
20000               31
30000                9
40000                8
50000                4
60000                7
70000                1
>90000               1

The hits represent 6,692 unique Clusters and ESTs (62% of the total 10,678) and 2,503 sequenced clones (73% of the 3,420). Those having at least one gap of length 50 or above are considered multi-exon hits. 5,888 are multi-exon and 817 are single-exon hits.

------------------------------------------------------------------------
gramene@gramene.org
Last modified: Thu Jan 23 17:03:12 2003
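The multi-exon rule stated in this document (at least one gap of length 50 or above) and the coverage percentages can be sketched as below. The function name `is_multi_exon` and the example gap lists are hypothetical, and representing a hit as a list of gap lengths is an assumption. The percentages are truncated to whole numbers, which is also an assumption, chosen because it matches the figures quoted in the text.

```python
def is_multi_exon(gap_lengths, min_gap=50):
    """Return True if at least one gap is min_gap bases or longer."""
    return any(gap >= min_gap for gap in gap_lengths)


# Hypothetical gap lists for three hits; not data from UnigeneToRice.txt.
examples = {"hit_a": [1200, 30], "hit_b": [10, 49], "hit_c": []}
for name, gaps in examples.items():
    label = "multi-exon" if is_multi_exon(gaps) else "single-exon"
    print(name, label)

# Coverage figures from the document, truncated to whole percent.
feature_pct = 6692 * 100 // 10678  # unique Clusters and ESTs with a hit
clone_pct = 2503 * 100 // 3420     # sequenced clones with a hit
print(feature_pct, clone_pct)  # 62 73
```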