Stanford, California
Stanford University
2001 Report from the Maize Gene Discovery Project --Walbot, V, Hake, S1, Freeling, M2, Schmidt, B3, Smith, L3, Chandler, VL4, Galbraith, D4, Larkins, B4, Sachs, M5, Brendel, V6

1USDA-ARS-PGEC, 2University of California-Berkeley, 3University of California-San Diego, 4University of Arizona, 5Maize Genetics Cooperation Stock Center, 6Iowa State University

The Maize Gene Discovery, DNA Sequencing and Phenotypic Analysis project (NSF DBI-9872657) was initiated in October 1998 during the first year of the Plant Genome Research Program. This large project involves 10 investigators and about 20 staff members. An important lesson from the Maize Gene Discovery Project has been that careful planning and expert knowledge must be combined with flexibility in exploiting research findings and in modifying methods and goals when new data are in hand. We are interested in furthering maize research through cooperation and collaboration with other projects.

New ZmDB Blast Alert Service. ZmDB introduces a new Blast Service - Blast Alert! ZmDB users can subscribe their blast search query sequences by filling out a simple registration form. Users will store their query sequences and blast specifications at ZmDB. An initial blast search will be carried out against the current database and the blast output will be sent to the user by email. For each registered query, a Query-ID is also sent to the user and is required for unsubscribing. If any new blast hits are found whenever ZmDB updates its sequence database, an email alert will be sent to the corresponding user. ZmDB Blast Alert Service can be accessed by clicking the "Search ZmDB" button at the ZmDB front page.

Table 1. Key Components of the Project
Subproject Status Comments
Sequence 50,000 ESTs Currently ~95,000; ~28,000 Tentative Unique Genes Exciting discovery that organs express different members of gene families; >2200 complete cDNAs
ZmDB web site key resource for corn Databases maintained 1. EST assembly 2. Phenotypes of mutants 3. Microarray protocols & data 4. Genomic sequences.
RescueMu site sequencing ~4,000 likely germinal mutations sequenced with goal of 15,000 1 MB per month of gene-enriched maize genomic DNA per month. More than half of the sequences flanking likely germinal insertions of RescueMu are readily identified as genes by matches to ESTs. Only 4% of insertions match retrotransposons.
Phenotypes of Mu-induced mutants On-going in the Sachs, Freeling, Schmidt, Smith and Hake labs Ear, kernel and seedling screening for >20,000 families. Summer 2001 in-depth 3000 adult families screened at the Coop. We plan to repeat the adult screen next year.
Microarraying ESTs 4 array types now, with Unigene set in 2001-2002  Major effort managed by Univ. AZ (David Galbraith, Vicki Chandler). Current arrays cover 11,000 "genes" and the Unigene1.1, 1.2 and 1.3 arrays will cover ~24,000
Postdoctoral training 4 fellows 1 fellow at Stanford (oligo arrays); 1 at Arizona (microarray); 2 fellows at ISU (bioinformatics & intron retention experiments)
Library plates: Immortalized RescueMu Resource for PCR screening to find germinal insertions  Initial plates available from summer 2000 tagging populations with transposed RescueMu elements

EST Sequencing. We have deposited nearly twice the original goal of 50,000 high quality ESTs (majority are phred >40, far above the GenBank minimum of 15 = 1 error per 33 bases for ESTs) mainly from inbreds B73, W23, and Oh43. Average length varies per library and has ranged from 380 bp to 520 bp. EST sequencing will continue during 2001-2002 (fourth project year) toward a new goal of 120,000 ESTs. We have been remarkably fruitful in gene discovery with >28,000 likely genes so far from the complete maize database of >114,000 ESTs. If short (100-200 b) and low quality, mainly pre-1998 ESTs are eliminated, ~25,000 genes are defined by ESTs. We attribute our success to libraries prepared from specific developmental stages of different organs. Our current project is BMS tissue culture cells; this library will be deeply sampled as the initial 1000 clones indicate very high diversity.

Our major findings from EST analysis are [1] that maize alleles are quite polymorphic and [2] that specific gene family members are expressed in different organs. In contrast, when such ESTs share >80-85% similarity over ~100 bases, they show a similar pattern of hybridization on microarrays fabricated with the Maize Gene Discovery ESTs (Fernandes et al., 2002; Y. Cho et al., in preparation). Comparisons of ESTs always have the criticism that insufficient sampling prevents detection of rare transcripts, but for gene families with robust EST support, we often find quite different patterns of expression. For example, TUC (Tentative Unique Contig) 09-07-6344.1 and TUC09-07-7818.1 share >99% sequence similarity to maize cytosolic glyceraldehyde-3-phosphate dehydrogenase genes Gpc3 and Gpc4, respectively. Eleven ESTs derived from the mixed late tassel stages library were from Gpc4 (TUCO9-07-7818.1), but none were found for Gpc3 (TUCO9-07-6344.1). TUC04-05-8646.1 and TUC07-14-6306 both share ~95% nucleotide sequence and high overall similarity to maize 22kDa alpha zein protein. TUC04-05-8646.1 is expressed in seedling root tissue (library 614) as judged by the presence of 20 ESTs, although there are no matches to TUC07-14-6306. In contrast TUC07-14-6306 has ESTs expressed in the early embryo library (687), but there are no TUC04-05-8646.1 ESTs in the embryo group. It seems very unlikely that there has been contamination with endosperm in either the root or embryo libraries because none of the common zeins dominating endosperm expression were found. A third example of possible tissue-specific expression is provided by TUC01-26-861.2 and TUC07-14-6194.1, which share ~89% sequence similarity to a subunit of the vacuolar proton ATPase. TUC07-14-6194.1 is expressed in root with 23 ESTs from library 614, but no ESTs were recovered from any other source. There are no root ESTs in the likely homeologous TUC01-26-861.2. These data convince us that examination of gene-specific expression patterns using tools with greater resolution than cDNA microarrays will be required to properly annotate the transcript distribution of individual genes.

Figure 1. ESTs from 2 cm (618) and 1-2 mm (946) immature tassels overlap by only 14%. More than 5000 ESTs from two stages of tassel development were compared for overlaps of 50 bases, >95% similarity. C = contig; S = singlet.

Maize Gene Discovery Microarrays. Approximately 15 months after starting EST sequencing, coPI Galbraith began distributing microarrays: two identical slides for $150. >600 slides have been distributed as of Nov. 2001. PCR yield of cDNA inserts has been ~90% up to 96%, ensuring excellent coverage of the EST diversity. By using leftover plasmid sequencing templates or the consolidated clones from Stanford, we effectively eliminate "tracking problems" in going from sequencing to microarrays. Thus far four EST projects, ~11,000 genes, are available The first of three Unigene1 slide types with 8,000 genes each will be available soon. The 24,000 Unigene1 array set should be available by December 2002. We estimate that Unigene arrays will contain about half of the expected number of maize genes.

Sequencing Genomic DNA. The second route to gene discovery is sequencing genomic DNA in a strategy that combines gene discovery with a key step in functional genomics, generating an insertional mutant. RescueMu, a 4.7 kb Mu1 derivative with pBluescript and other markers, inserts preferentially into genes and allows direct cloning of flanking maize DNA into E. coli (Raizada et al. 2001). Mutagenized plants are grown in grids of up to 48 rows and 48 columns, DNA is prepared from tissue samples from an entire row or column, and 96 (48 rows + 48 columns) E. coli libraries immortalize the RescueMu insertion sites. Maize inserts of up to 15 kb are amplified by PCR protocol; given the relatively compact size of maize introns (85% are <200 bp) most RescueMu plasmids contain an entire gene and its regulatory regions within the average ~ 6 - 7 kb maize insert. 96-well library plates containing the recovered plasmids organized by rows and columns of the grids are available to the community for PCR screening to find insertions in genes of interest. Heritable (or very rare, early somatic insertions) will be represented in both a row and a column; seed for germinal mutants is obtained from the Maize Coop. To complement ESTs for gene discovery, the Project is sequencing row plasmids (bidirectionally, about 400 bases from each Mu end), with a goal of completing 8 grids, about 15,000 likely germinal insertions and 30 MB of genomic sequences enriched for genes. This genomic sequence will identify genes not yet found by EST sequencing. The genomic DNA also contains presumptive introns and promoter regions, recognized after annotation against ESTs by the ISU staff; all of these results are displayed through ZmDB

In the Protocols section you will find information for efficiency screening of library plates to determine which plant has a likely germinal insertion mutation of interest to you. Also, the sequencing plan does not identify all germinal insertions (by definition these are sequenced multiple times), hence you may find it useful to do a PCR screen to determine if there is both a row and a column insertion in your favorite gene. Also, the library plates each contain a collection of 10 50% of the "gene-rich" part of the corn genome. A somatic plasmid of your favorite gene could contain the regulatory region and other information that interests you. Such plasmids can be found by transforming the DNA samples into E. coli to reconstruct a "gene-enriched" library.

Overcoming Transposon Biology to Make Mutants More Efficiently. We encountered a serious problem in scaling from the few dozen plants analyzed initially to the thousands in early grids. RescueMu transposition out of the transgene arrays was only 10-30%, compared to the ~100% of Mu1. In a massive project coordinated by coPI Vicki Chandler, the team surveyed 1200 RescueMu outcross progeny in January 2000 by DNA blot hybridization to identify individuals with transposed RescueMu but no transgene array. Fourteen individuals had 2 or 3 transposed RescueMu, and these were outcrossed multiple times to generate the tagging populations for summer 2000. In the Stanford 2000 grid, we found 100% RescueMu insertion frequency from the once transposed elements; other grids had 150% tagging efficiency. Now we are testing two smaller RescueMu elements, miniMu (2.2 kb) and midiMu (2.7 kb) in a new "cleaner" Mutator background (multiple transposase-encoding MuDR but few other Mu elements) and in a standard, medium copy Mu line. Native 1.4 - 2.2 kb Mu elements transpose more frequently than larger elements, and the smaller plasmid size will be more amenable to shotgun sequencing approaches.

Phenotype Database and Direct Access to Transposon-Tagged Mutants. Each plant in a RescueMu grid is self-pollinated (or outcrossed if male or female sterile). Because we have systematically selected for the absence of Mutator silencing and new germinal insertions of RescueMu, the forward mutation frequency of our materials is about twice the "best lines" reported by Don Robertson. Eight percent of ears have segregating seed mutations, at the seedling stage15 - 28% of families have a visible mutation, and adult families have nearly a 40% mutation frequency (1154 mutants in 3000 families). CoPI Marty Sachs and his staff at the Maize Coop manage the seed and ear screen, and propagate lines as needed. coPI Mike Freeling performs a detailed seedling phenotype screen at UC-Berkeley. Descriptions in a controlled vocabulary and photos are logged into a ZmDB database designed by UC-SD and ISU. During the summer of 2001 we hosted an adult plant phenotype screen at the Maize Coop. 14 Maize Gene Discovery personnel (including four summer REU students) and community researchers annotated 3000 families of 30 individuals each. All phenotypes found in two or more individuals were logged and checked by two scorers. Most new mutants are caused by standard Mu elements. Nonetheless, RescueMu sequencing identifies likely germinal insertions into many genes, and seed requests based on insertion location have started. Furthermore, our public phenotype database can greatly accelerate maize research. It is difficult and time-consuming to establish effective Mutator populations and to analyze the phenotypes; now, maize researchers can go on line to screen phenotypes: and get seed from the Coop. There are "no strings" attached to seed distribution.

Database of Project Information and Analytical Tools. ZmDB is the gateway to Maize Gene Discovery project data, materials, and analytical tools. CoPI Brendel manages ZmDB, and he and his laboratory have written software for gene identification and splice site prediction. As we relate EST data with genomic sequences obtained from the RescueMu-tagged sites, we face the same annotation problems as encountered in whole genome sequencing efforts. We have devised a number of computational tools that greatly facilitate our work. The SplicePredictor program assigns probabilities to potential splice sites based on species-specific training. (Brendel & Kleffe, 1998; Xing & Brendel, 2000). The GeneSeqer program implements a dynamic programming algorithm for spliced alignment of ESTs or proteins to genomic DNA, with scores optimizing both sequence similarity and splice site probabilites (Usuka et al., 2000; Usuka & Brendel, 2000). An upcoming service will be a registered sequence search conducted by ZmDB staff; individuals can post sequences for 3 months and receive notification when a new EST or RescueMu genomic sequence is a match.

Please Note: Notes submitted to the Maize Genetics Cooperation Newsletter may be cited only with consent of the authors.

Return to the MNL 76 On-Line Index
Return to the Maize Newsletter Index
Return to the Maize Genome Database Page