Purdue University
Assessment of gene content, colinearity and evolution in barley, maize, rice, sorghum and wheat
--Bennetzen, JL

In 1999, the Plant Genome Program (PGP) at the US National Science Foundation funded a two-year project to investigate the structural relationships of orthologous regions of several grass genomes. I was the PI on this project, and the co-PIs were Dr. Jorge Dubcovsky (UC Davis), Andris Kleinhofs (Washington State Univ.), Phillip SanMiguel (Purdue University), and Bruno Sobral (National Center for Genome Resources). The goal of this project was to sequence a total of 20 bacterial artificial chromosomes (BACs), with inserts drawn from five orthologous regions in each of five different grass genomes. The genomes originally chosen were barley, maize, rice, sorghum and wheat, because they are important grasses with known phylogenetic relationships. At the time of the proposal, some of the BAC libraries were of unproven quality, so we could not be sure that all 25 possible BACs would be recovered; hence our proposal to sequence only 20. The proposal was funded at 100% of the requested support.

The chief justification for the proposal related to our nearly complete deficiency in understanding the composition and organization of most grass genomes. Thus, more comprehensive studies of grass genome structure and/or evolution could not be proposed in a reasonable manner until the basal characteristics of these genomes were identified. Moreover, these studies would give a first general impression of the natures, lineages and rates of genome rearrangement in the 50 or so million years of grass diversification from a common ancestor.

At the same time that this proposal was funded, we received word that a similar proposal focused exclusively on maize/sorghum comparisons would be funded by the NSF PGP. This project, "Colinearity of Maize and Sorghum at the DNA Sequence Level" (PI, Dr. Jo Messing), is described elsewhere in this Newsletter issue. Because the depth of the Messing et al. proposal was so much greater than our two-year proposal, we decided to de-emphasize maize and sorghum. Hence, in our final project, we have sequenced six barley, one maize (the Wx1 region), five rice, two sorghum and eight wheat BACs. We also sequenced one pearl millet BAC, from the Wx1-orthologous region, the first large stretch of DNA sequenced from that genome, as part of a collaboration with Dr. Katrien Devos (John Innes Centre). In total, we sequenced 23 BACs from six orthologous regions, for a total of about 3 Mb of completed sequence. These results amount to about 120% of the proposed goals of the project.

Most of the results of this project have not yet been published, although the sequences have been deposited in GenBank. One paper (Dubcovsky, J et al., Plant Physiol 125:1342, 2001) has been published, two more are in press, an additional four have been submitted, and several others are in preparation. The results of these studies have been quite interesting, and would require several pages of text to summarize. However, I will try to point out a few major observations in the next two paragraphs.

All of the large grass genomes that we have investigated by BAC sequence analysis (barley, maize, pearl millet and wheat) have relatively high gene density compared to their genome size, but still much lower than the gene densities observed for rice and sorghum (about one gene per 8 kb). The greater size of orthologous regions in the large genomes is largely caused by the presence of retrotransposon insertions. Each of the large genomes often has these elements inserted as nested series, although the degree varies between regions and between species. Most retrotransposons appear to be fairly recent insertions (within the last few million years) in each genome, and we find very few cases where the same element is identified at the same location in even our most closely related species (barley and wheat).

Comparisons of gene composition and arrangement between the selected areas have been complicated by the fact that gene-finding procedures are far from perfect. Many small genes may be missed, while pieces of mobile DNAs are often misidentified as genes. Despite these constraints, we have been able to use highly conservative analyses to estimate the degree of conserved gene content and order among the six grasses that we have investigated. The somewhat surprising result is that we see a high frequency of small genic rearrangements. Small inversions encompassing one or a few genes have been seen, while duplications or deletions of tandem gene family members are also frequent events. In maize, the deletion of one or a few genes is fairly common. This loss of genes may be tolerated in maize because of its polyploid origin. Most surprisingly, translocations of single genes or a few adjacent genes to different chromosomes also appear to be fairly routine events, at least from an evolutionary perspective. In four regions that we have studied in this and other projects, we see that only 80-90% of the same genes are found in the same region in comparisons of maize with rice or maize with sorghum. Even more dramatically, because of the numerous small rearrangements, we see different adjacent genes in 20-60% of the pairwise gene comparisons between maize and rice or between maize and sorghum. We often see better colinearity when comparing rice with sorghum, despite their more ancient divergence, probably because they are closer to true diploids. These genic rearrangement results demonstrate that rice and sorghum will be poor surrogate species for such technologies as chromosome walking. Hence, to understand the genic composition and order of the entire maize genome, we will need to sequence the maize genome.
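
The two measures discussed above (shared gene content and changes in adjacent genes) can be illustrated with a minimal sketch. This is not the analysis pipeline used in the project. The gene identifiers and gene orders are invented toy data, the function names are hypothetical, and the definition of a "different adjacent gene" used here (an adjacent gene pair in one region that is not adjacent, in either orientation, in the other) is one assumed interpretation.

```python
def shared_gene_fraction(region_a, region_b):
    """Fraction of genes in region_a that also occur in region_b."""
    genes_b = set(region_b)
    shared = [gene for gene in region_a if gene in genes_b]
    return len(shared) / len(region_a)


def changed_adjacency_fraction(region_a, region_b):
    """Fraction of adjacent gene pairs in region_a that are NOT adjacent
    in region_b. Orientation is ignored, so an inverted pair still counts
    as conserved. A pair involving a gene missing from region_b counts
    as changed (an assumption made for this sketch)."""
    pairs_a = [frozenset(pair) for pair in zip(region_a, region_a[1:])]
    pairs_b = {frozenset(pair) for pair in zip(region_b, region_b[1:])}
    if not pairs_a:
        return 0.0
    changed = sum(1 for pair in pairs_a if pair not in pairs_b)
    return changed / len(pairs_a)


# Toy gene orders (hypothetical ortholog labels, not real annotations).
# Relative to region_1, region_2 has g4 and g5 swapped and lacks g9.
region_1 = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10"]
region_2 = ["g1", "g2", "g3", "g5", "g4", "g6", "g7", "g8", "g10"]

content = shared_gene_fraction(region_1, region_2)
adjacency = changed_adjacency_fraction(region_1, region_2)
print(f"shared gene content: {content:.2f}")
print(f"changed adjacencies: {adjacency:.2f}")
```

In this toy case a single small rearrangement and a single gene loss leave 90% of the genes shared but change four of the nine adjacencies, which shows how a modest number of small events can disrupt local gene order far more than gene content.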

Perhaps the most important outcome of this project has been the resultant synergy in data generation and (especially) analysis that was gained by assembly of a diverse set of highly trained young scientists. Dr. Wusirika Ramakrishna (Purdue Univ.) was in charge of the overall sequencing project, with exceptional collaborative contributions from John Emberton, Dr. Jianxin Ma, Matt Ogden, Dr. Yong-Jin Park, and Dr. Yinan Yuan at Purdue University. Dr. Nils Rostoks (Washington State Univ.) led all of the barley studies, while Drs. Carlos Busso and Liuling Yan took the lead on the wheat genome analyses. Dr. Bryan Shiloff, working with Drs. Callum Bell and Bill Beavis at NCGR, developed some important annotation tools for the project, and was a vital consultant on all of our genome analyses. In future years, I am sure that this crew of outstanding scientists will continue as highly productive plant geneticists.

Please Note: Notes submitted to the Maize Genetics Cooperation Newsletter may be cited only with consent of the authors.
