MaizeDB has been busy integrating data from the published literature and from the new plant genome projects. WWW sites of interest to cooperators are summarized on p. iv of this Newsletter. While our goal remains to provide access to a comprehensive genome resource, we are also working toward interoperability with other data repositories, such as the NCBI databases and other maize and plant genome databases. The focus is genetically defined loci (17558 records) and maps (874), complete with documentation and functional annotation. Documentation includes the tools used, notably genetic stocks and source germplasm (21,534) and probes (151,579; aka markers, clones, primers), their sources and availability, map scores (8610) and recombination data (1950). Functional annotation includes agronomic traits (565), phenotypic variations (1001), locus expression and properties, and gene products (1359). Literature citations (62,465), including authors with addresses (5523), are considered key documentation. In this report we present the highlights, a brief summary of Maize Conference 2001 feedback, a report on recent data types, and a table summarizing interoperability status with some major external database repositories.

Highlights this year:

major feedback from the maize genetics community, posted on the homepage
redesign of home page, featuring easier access routes in a central location
full text search of the entire www site (Google)
comparative map graphic utility in collaboration with the Rice Genome Program, Tsukuba Japan
user accesses up 60% over last year, at 8000 accesses/day.

Note: these accesses were via MaizeDB services (browse utilities, forms, full text searches). They do not include the new Google site search (an additional 25%) nor indexing activities by Internet robots and spiders.

Feedback synopsis.

We are delighted to report the success of the Maize Genetics Executive Committee Chair, Jeff Bennetzen, in eliciting a wide response from the community at the 2001 Maize Genetics Conference at Lake Geneva, WI. Thank you, Jeff! In the past we have relied on our own sense, based on interactions with other genome and related databases on the Internet and a steady trickle of advice from the community. The Maize Conference 2001 feedback indicated an interest in retrieving data by graphical map displays (15% of responses), in comparative genomics (30%), in retrieval of map information for a particular nucleotide sequence (30%) and generally in easier, friendlier access (40%). We have made a start on addressing community wishes (see highlights above). We will soon be implementing a graphical view of the genetically anchored BACs and, additionally, a BLAST utility that returns map information in a custom report.

We are quite concerned that many of you find the MaizeDB interface difficult to navigate. We hope the changes instituted this year will help, as well as those in progress. Consider our new utilities and rearrangements as first drafts, ready to be polished, embellished or surgically transformed based on your ongoing input and our resources. In particular, try Google first for efficient searches.

Summary of recent datatypes.

SSR markers. Simple Sequence Repeats. Currently we represent data for 1735 SSRs, of which 590 were discovered in public cDNA sequences. Data have been integrated from the Maize Mapping Project (US), Maize Mapping Consortium (EU), Pioneer, NC State, and the Brookhaven National Laboratory Acemaz. We are in contact with the Doebley maize evolution project and with CIMMYT regarding their SSR diversity data. Dynamic data summaries, organized by bin locations, and detailed map snapshots are provided by the SSR link in the database sidebar. A comparative map tool, featured in a central box on our home page, permits dynamic comparison of map coordinates for SSRs mapped in distinct populations, as well as with other major maize maps.

ESTs, Expressed Sequence Tags: cDNA sequences most often deposited in the dbEST division of GenBank (NCBI, National Center for Biotechnology Information). Currently 112,582 cDNA GenBank accessions for some 66,996 clones are represented in MaizeDB and are linked to maps and other genetic and genomic information. Of these, 1,172 have genetic map locations. Note that Google searches find all accessions.

EST Data flow: NCBI (GenBank) regularly sends files to MaizeDB with new ESTs or updates. MaizeDB processes these and imports information about each clone, the library, the source and availability. The universal sequence accessions are used to form links to GenBank and to ZmDB (for clone submissions from Stanford). Information needed to create links to ZmDB contigs and to the TIGR gene index is extracted using the ZmDB table-maker or downloaded from an ftp site (TIGR), then processed and updated several times per year.

EST Data Access: Zea mays-specific BLAST searches of all entries in dbEST are currently supported at NCBI and ZmDB. Mapped clones are accessible by the MaizeDB Probe Browser, an alphabetical tabulation that can be delimited by bin location(s); it lists map coordinates and has dynamic links to selected MaizeDB pages (images, probe details), as well as to GenBank, ZmDB and TIGR. The first few letters of the Probe name reflect the source; acronyms relevant to ESTs include csu, California State University; isu, Iowa State University; and std, Stanford University. One exception: EST clones developed by Tim Helentjaris, but mapped to a uaz probed site, are named by plate location and begin with 1C, 2C, 5C, 6C or 7C. A complete list of institutional acronyms is provided on-line with the suggested guidelines for nomenclature, www.agron.missouri.edu/maize_nomenclature.html, and under locus names. With permission of Ginny Walbot, we mirror a set of trace files for the Stanford sequences, converted to various formats by Deverie Bongard-Pierce (Mass. General Hospital).
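
The naming convention above can be illustrated with a small Python sketch. It is only an illustration, not part of MaizeDB: the function name and the example probe names are hypothetical, and the table holds only the three acronyms and the plate prefixes listed in this report, not the complete on-line list.

```python
# Prefix-to-source table, limited to the EST-related acronyms named in this report.
SOURCE_BY_PREFIX = {
    "csu": "California State University",
    "isu": "Iowa State University",
    "std": "Stanford University",
}

# Plate-location prefixes used for the EST clones developed by Tim Helentjaris.
PLATE_PREFIXES = ("1C", "2C", "5C", "6C", "7C")


def probe_source(probe_name: str) -> str:
    """Guess the source of a probe from the first letters of its name (hypothetical helper)."""
    if probe_name.startswith(PLATE_PREFIXES):
        return "Helentjaris EST clone (named by plate location)"
    prefix = probe_name[:3].lower()
    # Anything not in the short table is left for the complete on-line acronym list.
    return SOURCE_BY_PREFIX.get(prefix, "unknown (consult the on-line acronym list)")


# Example probe names below are made up for illustration only.
for name in ("csu1234", "isu42", "5C04B07", "xyz1"):
    print(name, "->", probe_source(name))
```

The plate-location test comes first because those names do not start with an institutional acronym at all.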

Unigene Overgo BAC (Bacterial Artificial Chromosome) Anchors.

Currently 4300 Unigene-Overgo primer pairs are represented in MaizeDB, complete with links to public sequences included in the assembly. These data were provided by the partnership of DuPont and Incyte Genomics with the Maize Mapping Project. We anticipate an additional 6000 Unigenes to be added in summer 2001. Of the initial 4300 Unigenes, over 2200 have been anchored to an EcoRI and/or HindIII BAC clone (B73 libraries) available from CUGI. These probes, of type 'Overgo', are listed with the public sequence accessions and clones that contribute to the Unigene. This process builds on the EST data flow described above. While DuPont EST and genomic data per se are not supplied to MaizeDB, they contribute (a) to refining the public consensus sequence, (b) to the assembly, often a collapse of multiple public assemblies, and (c) to masking repeat sequences. In MaizeDB, the assembly is linked to the appropriate contributing (public) sequences, to map coordinates inherited from a public sequence, and to the corresponding ZmDB-computed Unigene clone. Of note, some 450,000 BAC clones from 3 public libraries are in queue at CUGI for contig assignment, based on fingerprint and marker data. CUGI updates the BAC contig computation approximately monthly and incorporates marker data submissions processed at MaizeDB.

Access to marker data currently is via Google, by focused searches on the Probe form, or by exploring the CUGI site. In process: (a) tabular summaries with query access and overview; (b) a physical/genetic map graphical display; (c) a BLAST utility that will return genetic map information and links to relevant external databases, including CUGI, TIGR, and ZmDB.

MTM Mutator Stocks.

Of the new genetic Stocks represented in MaizeDB, 8,436 are from the Cold Spring Harbor resource and include 24 kernel phenotypes. Files were provided by MTM on request by MaizeDB, and phenotypic descriptors have been harmonized with MaizeDB listings, largely provided by Gerry Neuffer and the Stock Center. To view phenotypes and their MTM Stocks, see the MaizeDB side bar, What's New, and scroll to July 2000. If Mutator is known to be 'On' or 'Off', this is part of the Stock Name.

External Database Interoperability
Database(1)             Data retrieved(2)                MaizeDB data linked(3)   Links(4)
GenBank (EMBL,DDBJ)(5)  Sequences                        Variation, Probes, Loci  116,980
PubMed                  Abstracts, full text, links out  Reference                2,805
ZmDB(5)                 Sequences, contigs, clones       Probe                    135,369
TIGR                    Gene index, paralogs             Probe                    50,466
CUGI(5)                 BAC contigs, clones              Probe                    68,545
RiceGenes               Rice maps                        Probe                    367
GrainGenes              Triticeae maps                   Probe                    287
SwissProt(5)            Protein sequence, function       Gene Product             448
Enzyme                  Reactions, sequences, pathways   Gene Product             329
ProSite                 Motif sequence and function      Term (Protein Feature)   1035
GRIN(5)                 Germplasm evaluations, other     Stocks                   3348
MTM                     Descriptions, images             Stocks                   8436

(1) Database WWW sites are provided on p. iv of this Newsletter.
(2) Data retrieved from the external database via the MaizeDB-created link.
(3) Entities or data classes in MaizeDB with links to the external database listed.
(4) Number of distinct accessions in MaizeDB for a given site. Thus multiple links of a GenBank accession to EMBL or DDBJ are counted once. Similarly, two distinct records, for example a Variation and a Probe with a link to the same GenBank accession, are also counted once.
(5) These databases have reciprocal links with MaizeDB.
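
The counting rule for the Links column (distinct accessions per site, with GenBank, EMBL and DDBJ treated as one site and the MaizeDB record type ignored) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the procedure actually used at MaizeDB: the tuple layout, the function name and the accession strings are all hypothetical placeholders.

```python
from collections import defaultdict

# Assumption: EMBL and DDBJ links are folded into GenBank, since they share accessions.
SITE_ALIASES = {"EMBL": "GenBank", "DDBJ": "GenBank"}


def count_distinct_accessions(links):
    """Count distinct accessions per external site.

    links: iterable of (maizedb_record_type, external_site, accession) tuples.
    The record type is ignored, so a Variation and a Probe that link to the
    same accession contribute a single count.
    """
    accessions_by_site = defaultdict(set)
    for _record_type, site, accession in links:
        canonical_site = SITE_ALIASES.get(site, site)
        accessions_by_site[canonical_site].add(accession)
    return {site: len(accs) for site, accs in accessions_by_site.items()}


# Placeholder data: these accession strings are invented for illustration.
example_links = [
    ("Probe", "GenBank", "ACC0001"),
    ("Probe", "EMBL", "ACC0001"),         # same accession via EMBL: not counted again
    ("Variation", "GenBank", "ACC0001"),  # same accession from another record type
    ("Probe", "GenBank", "ACC0002"),
    ("Gene Product", "SwissProt", "PRT0001"),
]
counts = count_distinct_accessions(example_links)
print(counts)
```

Using a set per site is what makes both kinds of duplicate (mirror databases and multiple record types) collapse to one count.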

Mary Polacco
Ed Coe
July 2001

