One of the key developments in the Plant Genome Initiative is the design and implementation of a database and network system for genetic data, analysis of data, and linked access to sequences, clones, biosynthetic pathways, and the like, across species boundaries. In addition to its grants through the Competitive Grants Program of the Cooperative State Research Service, the Initiative supports database development through the Agricultural Research Service; both are branches of the U.S. Dept. of Agriculture. A Plant Genome Database is being developed by "Prototype Developers" working in concert, for maize, soybean, wheat, forest trees, and Arabidopsis. The structure will encompass higher plant data and is to be centered at the National Agricultural Library (NAL). The newsletter "Probe", available from Plant Genome Data and Information Center, USDA - National Agricultural Library, 10301 Baltimore Blvd. Room 1402, Beltsville, MD 20705-2351, offers coverage of current issues, descriptions of developing programs, and updates on plant genome activities. On-line access to the first implementations of the plant genome databases is expected to be available by 1993. The Maize Genome Database (Maizedb) is a developing prototype in this network. This is a report on the development and progress of Maizedb.

In January of 1991, Cooperators (maize geneticists, breeders and molecular biologists) were surveyed for help in defining essential components and structures for the database. The survey elicited many suggestions, creative ideas, offers of data, and expressions of interest in contributing to and participating in the effort. Substantial further input on how to proceed was obtained after the plans and potential of the database were presented at the annual Maize Genetics Conference in March, 1991, and in this Newsletter (MNL 65:54). These responses made clear that interest was strong among respondents, that the database was hoped to be as comprehensive as possible, and that integration with other genome databases was much needed.

In April, 1991, an Advisory Group of 18 scientists derived a conceptual framework and a beginning plan for the prototype effort. Consensus was particularly strong that a useful and effective prototype should be developed as promptly as feasible. The Advisory also defined subcommittees and responsibilities for Nomenclature and Standards, User Needs and Quality Control, Clone Banks, Quantitative Characters and Descriptors, Germplasm Characterization, Prototype Development, and other high-priority data compilation and collection. Conceptualization of canonical structures and planning for User Needs and Quality Control were especially guided by evaluations of existing database projects, made through visits by one to several members of the group to Yale University (E. coli), Livermore National Laboratory (human), Welch Library at Johns Hopkins University (human GDB), DuPont (soybean, Arabidopsis, maize), University of Missouri and Washington University (nematodes), Agrigenetics (maize and other crops), and Lawrence Berkeley Laboratory (human).

The Advisory, which provides continuing guidance and evaluations, includes geneticists from universities, government, and industry, each of whom enthusiastically agreed to help. The membership includes some scientists with extensive experience in developing large relational databases. The members of the Advisory Group are Bill Beavis, Mary Berlyn, Peter Bretting, Ben Burr, Vicki Chandler, Jim Coors, Neil Cowen, Larry Darrah, Tim Helentjaris, Dave Hoisington, Kendall Lamkey, Oliver Nelson, Jean Romero-Severson, Margaret Smith, Charles Stuber, and Scott Tingey.

A Working Group (Ed Coe, Mary Berlyn, Stan Letovsky, Mary Polacco, Marty Sachs, Denis Hancock) began in June, 1991, to build a prototype based on the E. coli design developed by Berlyn with Letovsky. Because the power and specificity of a relational database depend on highly structured planning of an interconnecting design, intensive communication ensued between the Yale developers and the Columbia (Missouri) group, through which the structure was defined, conceptualized, and refined. The initial implementation of the Maizedb prototype was installed in December, 1991. Stock data were then imported after a combination of parsing and manual revision of the stocklist to systematize the information. Site (gene list) data were imported, and in-depth entries of alleles, phenotypes, and products were made in selected components in order to evaluate the implications of the design. The list of reciprocal translocations with breakpoints, and a one-year block of references, were incorporated in February, 1992. Refinements of the details and the extent of the data are continuing. The implementation was immediately effective but, most importantly, was instructive for debugging and for further conceptualization of the design.
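The interconnecting design described above can be illustrated with a minimal sketch. It is not the Maizedb schema: the table names, column names, and rows below are hypothetical placeholders, and SQLite is used through Python only for convenience. The sketch assumes that sites (genes), alleles, and stocks are held in separate tables joined by keys, so that a question such as "which stocks carry any allele of a given site" is answered by following the links rather than by searching a flat stocklist.

```python
import sqlite3


def build_sketch_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the links between tables
    conn.executescript(
        """
        CREATE TABLE site (
            site_id    INTEGER PRIMARY KEY,
            symbol     TEXT NOT NULL UNIQUE,
            chromosome INTEGER
        );
        CREATE TABLE allele (
            allele_id INTEGER PRIMARY KEY,
            site_id   INTEGER NOT NULL REFERENCES site(site_id),
            name      TEXT NOT NULL,
            phenotype TEXT
        );
        CREATE TABLE stock (
            stock_id INTEGER PRIMARY KEY,
            label    TEXT NOT NULL
        );
        -- A stock may carry many alleles, and an allele may be in many stocks.
        CREATE TABLE stock_allele (
            stock_id  INTEGER NOT NULL REFERENCES stock(stock_id),
            allele_id INTEGER NOT NULL REFERENCES allele(allele_id),
            PRIMARY KEY (stock_id, allele_id)
        );
        """
    )
    return conn


def load_placeholder_rows(conn):
    """Insert made-up rows; none of these names are real maize data."""
    conn.executemany(
        "INSERT INTO site VALUES (?, ?, ?)",
        [(1, "gene1", 1), (2, "gene2", 5)],
    )
    conn.executemany(
        "INSERT INTO allele VALUES (?, ?, ?, ?)",
        [
            (1, 1, "gene1-a", "placeholder phenotype A"),
            (2, 1, "gene1-b", "placeholder phenotype B"),
            (3, 2, "gene2-a", "placeholder phenotype C"),
        ],
    )
    conn.executemany(
        "INSERT INTO stock VALUES (?, ?)",
        [(1, "STOCK-001"), (2, "STOCK-002")],
    )
    conn.executemany(
        "INSERT INTO stock_allele VALUES (?, ?)",
        [(1, 1), (2, 2), (2, 3)],
    )


def stocks_for_site(conn, symbol):
    """Return labels of stocks carrying any allele of the named site."""
    rows = conn.execute(
        """
        SELECT DISTINCT stock.label
        FROM site
        JOIN allele       ON allele.site_id = site.site_id
        JOIN stock_allele ON stock_allele.allele_id = allele.allele_id
        JOIN stock        ON stock.stock_id = stock_allele.stock_id
        WHERE site.symbol = ?
        ORDER BY stock.label
        """,
        (symbol,),
    ).fetchall()
    return [label for (label,) in rows]


if __name__ == "__main__":
    db = build_sketch_db()
    load_placeholder_rows(db)
    print(stocks_for_site(db, "gene1"))
```

Because each allele must point to an existing site, and each stock entry to an existing allele, a free-form stocklist has to be parsed and regularized before it can be loaded into such a structure, which is consistent with the manual revision mentioned above.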

Designs and structures are shared and planned among the Prototype Developers in quarterly meetings organized by D. Bigwood of NAL. The rationales for the implementation design of each group (Yale & Missouri for maize; LBL & Ames & Albany for soybean, wheat, and pines; Harvard & Ohio State for Arabidopsis; NAL for the combined prototype) are developing interactively.

The Nomenclature and Standards Committee, chaired by Oliver Nelson, developed recommended revisions for nomenclature and standards, which were presented and discussed at the 1992 Maize Genetics Conference. The new standards will be distributed to Cooperators in the near future.

A compilation of the data and a combined map of RFLP markers in recombinant inbred lines are being prepared by Ben Burr and Tim Helentjaris; similar data are in preparation for the Immortal F2 at Missouri, and other data from public and company sources are being assembled. An expanded set of recombinant inbreds is being typed by Charles Stuber. When completed, these materials will substantially increase the mapping-data resource and the ability of research workers to evaluate and to use markers and maps.

To provide a data baseline for Germplasm Characterization of selected production materials, priority characterization of 100 elite inbred lines for isozymes and DNA molecular (RFLP) probes is being carried out by Biogenetic Services; reproduction of seed of most of the 'typed' pedigrees has been completed at Missouri, so that these characterized strains can be placed in repository collections. Data from previous research in ARS and at NC State, with 22 isozymes on over 400 inbred lines, have been obtained and will soon be in the database. Pedigree data for inbred lines and other elite germplasm are expected to be available for incorporation in the near future.

Projects have been initiated to assemble, verify, and compile cytogenetic data and mutant data, to incorporate these data into systematic formats, and to carry out priority research on mapping of cytogenetic sites and mutants. This work is being carried out by Dave Weber at Illinois State University, and by Gerry Neuffer and colleagues at the University of Missouri.

Plans are being made, with the help of Al Kriz at Urbana, to compile a record of the Maize Genetics Cooperation Stock Center materials and to initiate computerized records for this operation.

A demonstration of the Maizedb Prototype was given at the Maize Genetics Conference in Asilomar, in March, 1992. On the basis of comments and responses from the demonstration, and of experience with the implementation before and after the demo, requirements for restructuring have been defined and will be carried through as part of the next phase of the implementation. Systematization of information for genome analysis and manipulation, and for resource and utilization requirements, is a conspicuous and desirable result of less than a year of effort; it will make the information accessible to a much wider and more diverse research and development community. The contrast with static files of decayable information, and with the comprehensive but casual organization of data in the past, stands out increasingly as the prototype is refined.

