Diversification of the R2R3 Myb gene family and the segmental allotetraploid origin of the maize genome
--Braun, EL, Grotewold, E

The maize genome is thought to have arisen by the reversion of an ancient polyploid to disomic inheritance (reviewed by White and Doebley, Trends Genet. 14:327-332, 1998). Comparisons of divergence times for specific duplicated loci in maize indicate that they exhibit two different coalescent times, corresponding to divergence times of approximately 11.4 mya (million years ago) and 20.5 mya, suggesting that the ancestor of maize was a segmental allotetraploid (Gaut and Doebley, PNAS 94:6809-6814, 1997). The maize genome duplication is expected to result in the doubling of any gene families present prior to the duplication event, suggesting that surveys of large gene families are likely to reveal a signature of the segmental allotetraploid origin of the maize genome.

As a part of a larger survey of the R2R3 Myb gene family in maize (Rabinowicz et al., submitted for publication), we examined recent Myb gene duplications in maize. This survey of R2R3 Myb genes was accomplished by using RT-PCR (reverse transcriptase-polymerase chain reaction) to amplify a short segment of the Myb genes using a pair of degenerate primers corresponding to the conserved DNA recognition helices. The RNA used for RT-PCR was prepared from seedlings and various tissues of maize plants under normal growth conditions. Analysis of these segments is complicated by their limited length (averaging 129 bp) and their biased codon usage (mean GC content of third codon positions is 90%). These factors result in high variance of individual distance estimates and cause most commonly used methods of estimating synonymous distances to significantly underestimate the number of substitutions when highly divergent sequences are compared. However, analysis of the data can reveal general patterns, such as the presence of Myb genes that originated during the maize genome duplication. In fact, one of the gene pairs analyzed by Gaut and Doebley (PNAS 94:6809-6814, 1997) corresponds to the Myb genes encoded by C1 and Pl, indicating that Myb genes were duplicated during the allotetraploid origin of the maize genome. Among the 44 recently duplicated Myb genes identified (Table 1), we found 10 groups of Myb sequences that correspond to Myb genes that are likely to have undergone duplication during the allotetraploid origin of the maize genome. However, we also found indications of additional recent gene duplications and complex patterns of evolution for Myb genes in maize.

Five groups of recently duplicated R2R3 Myb genes have three or more member sequences, clearly indicating the existence of recent Myb gene duplications that do not reflect the maize genome duplication. The largest group of recently duplicated Myb genes identified by this study, group 2 (Table 1), has four additional members based upon the unweighted maximum parsimony (MP) estimate of phylogeny for the Myb genes of maize obtained using amino acid sequences (Rabinowicz et al., submitted for publication). These sequences may correspond to genes that have diverged at a higher rate than other Myb genes, although it is important to note that any accelerated divergence must have occurred at synonymous sites. Indeed, these results suggest that there may be currently unappreciated sources of rate variation at synonymous sites in the maize genome. This rate variation probably does not reflect differences in codon usage, since all four of the divergent sequences exhibit biased codon usage (third codon position GC content ranges from 81.4% to 86%). However, additional sources of rate variation may include factors such as gene conversion resulting in slower than expected divergence between specific Myb genes or differences in the rate at which synonymous mutations accumulate in genes with different chromosomal locations (such as that noted for mammals by Wolfe et al., Nature 337:283-285, 1989).

A total of 26 Myb sequences corresponding to 13 recently duplicated pairs of genes were identified among the 82 R2R3 Myb genes sequenced as a part of the survey performed by Rabinowicz et al. (submitted for publication), including the Myb genes encoded by C1 and Pl. At least 10 of these pairs are likely to reflect duplications that occurred during the maize genome duplication. Two of these groups of sequences, groups 10 and 17 (Table 1), appear to represent very recent divergences that may have occurred after the allotetraploid origin of the maize genome. One pair of sequences, group 11, corresponds to a gene clade that has two additional members based upon the MP estimate of maize Myb gene phylogeny (Rabinowicz et al., submitted for publication). Like the additional sequences that appear to belong to group 2, these sequences may correspond to genes that have diverged at a higher rate than other Myb genes. The degree of codon bias for the divergent sequences does exhibit some variation from that observed for other Myb genes, since IP49 is less biased (third codon position GC content is 69.8%) while IP108 is highly biased (third codon position GC content is 97.7%). However, the absence of a consistent pattern suggests that the divergence of these genes does not reflect their differences in codon bias.

The basis for the maintenance of duplicated genes in organisms has been the subject of substantial debate, since duplicated genes are predicted to exhibit functional redundancy. One possibility is that duplicated genes rapidly establish different patterns of expression, making both genes subject to selection because their activity is necessary in different tissues. In fact, different patterns of gene expression have been noted for duplicated Myb genes, such as the duplicated C1 and Pl genes of maize (Cone et al., Plant Cell 5:1795-1805, 1993). A similar situation was evident for two additional pairs of Myb genes corresponding to groups 7 and 12 from Table 1. However, the remaining pairs exhibit at least some overlap in their expression patterns and two groups (group 9 and group 17 from Table 1) exhibit complete overlap in their expression patterns. Although these data cannot exclude the possibility that subtle differences in expression patterns exist for some of these gene pairs, they are not consistent with the hypothesis that patterns of expression are often altered following gene duplications. Instead, they suggest that patterns of gene expression may exhibit some degree of conservation, at least over relatively short evolutionary time scales.

The R2R3 Myb gene sequences obtained by Rabinowicz et al. (submitted for publication) provide evidence for the existence of duplicated Myb genes in maize that reflect the segmental allotetraploid origin of the maize genome. However, they also provide evidence for additional gene duplications that cannot be explained by the maize genome duplication, as well as evidence for unappreciated sources of rate variation at synonymous positions in a subset of maize Myb gene sequences. The availability of these short segments from Myb genes will facilitate future work, such as obtaining full-length cDNAs to determine the similarity between their carboxyl-terminal regions and mapping these genes to firmly establish that their origin reflects the segmental allotetraploid origin of the maize genome. Furthermore, the recently duplicated Myb genes identified in this study suggest the existence of 18 or fewer gene duplications associated with the duplication of the maize genome, which is substantially lower than that expected if the sampling of maize Myb genes is complete. The excess of Myb genes without closely related paralogues in this dataset suggests that the sampling of Myb genes in maize remains incomplete. Alternatively, duplicated loci that were not detected by this study may have been lost, may not be expressed at detectable levels under the growth conditions examined, or may be obscured by rate differences. Regardless of the specific explanations, it is clear that the Myb genes present in maize have undergone many recent duplications and that the biological basis for these duplications is relatively complex.

Table 1. Recently duplicated R2R3 Myb genes identified in maize (a).

Group    Sequences                         Ks (b)    Divergence (mya) (c)
1        Pl, C1                            0.0597     4.6
2 (d)    P, IP20, 1C1, IQ68, 1H48, 2H67    0.4233    32.6
3        IF17, IQ32                        0.1850    14.2
4        1C4 (e), IP59 (e)                 0.1752    13.5
5        3H101, IP126, IP39                0.0996     7.7
6        4H48, IF41, IM66                  0.2172    16.7
7        IM16, IP29                        0.1408    10.9
8        IP122, IP156                      0.2171    16.7
9        HX30, IP148                       0.0653     5.0
10       IF50, IP26                        0.0309     2.4
11 (f)   IM61 (e), IQ26                    0.2029    15.6
12       IF45, IP119                       0.0630     4.8
13       1C18 (e), IF55 (e)                0.0951     7.3
14       1H9 (e), IM65 (e), IP47 (e)       0.2619    20.1
15       IP45, IP71, IP74                  0.3710    28.5
16       IF13, IF14                        0.1010     7.8
17 (g)   IP19, IP34                        0.0000     0.0
18       IP102, IP124                      0.1375    10.6

a Recent duplications were identified by screening the Myb sequences for those with uncorrected synonymous distances less than 0.3 and uncorrected nonsynonymous distances less than 0.1.

b Ks (synonymous distance) of the most divergent comparison. Synonymous distances were calculated by MEGA 1.01 (computer program available from the Institute of Molecular Evolutionary Genetics at the Pennsylvania State University, University Park, PA) using the method of Nei and Gojobori (Mol. Biol. Evol. 3: 418-426, 1986) with the Jukes-Cantor correction for multiple hits. This method will produce underestimates of the synonymous distance for more ancient duplications, due to the codon bias in this dataset. However, the underestimation will be fairly modest for the divergence times considered in this table.
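
The Jukes-Cantor correction named in footnote b can be sketched as follows. This is a minimal illustration of the standard formula, d = -(3/4) ln(1 - 4p/3), where p is the uncorrected proportion of synonymous differences; the function name `jukes_cantor` is hypothetical, and the snippet is not a reimplementation of MEGA or of the Nei-Gojobori site-counting step.

```python
import math


def jukes_cantor(p: float) -> float:
    """Return the Jukes-Cantor corrected distance for an uncorrected
    proportion of differences p (hypothetical helper, for illustration)."""
    # The correction is undefined once p reaches the saturation value 0.75.
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    # The corrected distance always equals or exceeds the uncorrected one.
    for p in (0.0, 0.1, 0.3):
        print(f"p = {p:.2f}  corrected d = {jukes_cantor(p):.4f}")
```

Because the corrected value grows faster than p, a corrected Ks in the table can exceed the uncorrected screening threshold given in footnote a.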

c Divergence time in millions of years before present calculated by assuming that synonymous mutations accumulate at an average rate of 6.5 x 10^-9 substitutions per synonymous site per year (see Gaut et al. PNAS 93: 10274-10279, 1996). The sampling variance of individual distance estimates indicates that the coefficient of variation for specific divergence times ranges from approximately 30% to 50%.
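
The conversion from Ks to divergence time described in footnote c can be sketched as below. The footnote gives only the rate; the factor of two (both lineages accumulate substitutions after the duplication, so T = Ks / 2r) is an assumption here, although it reproduces tabulated values such as those for groups 1, 2, and 14. The names `RATE` and `divergence_mya` are hypothetical.

```python
# Synonymous substitution rate from footnote c (per site per year).
RATE = 6.5e-9


def divergence_mya(ks: float, rate: float = RATE) -> float:
    """Estimate divergence time in millions of years from a synonymous
    distance, assuming T = Ks / (2 * rate). The factor of two is an
    assumption, not stated in the footnote."""
    if ks < 0 or rate <= 0:
        raise ValueError("ks must be non-negative and rate positive")
    years = ks / (2.0 * rate)
    return years / 1e6


if __name__ == "__main__":
    # Ks values taken from Table 1 (groups 1, 2, and 14).
    for group, ks in ((1, 0.0597), (2, 0.4233), (14, 0.2619)):
        print(f"group {group}: {divergence_mya(ks):.1f} mya")
```

Given the 30% to 50% coefficient of variation noted in the footnote, the single decimal place shown should be read as a point estimate rather than a precise date.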

d Four additional sequences (1C42, IF25, IF35, IM44) belong to this group based upon the unweighted maximum parsimony estimate of phylogeny. They may represent rapidly evolving sequences.

e These sequences exhibit less extreme codon bias, with less than 80% GC in third codon positions.
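
The third codon position GC content used as the measure of codon bias here can be computed along these lines. This is a minimal sketch that assumes the sequence is in frame, starting at the first position of a codon; the function name `gc3_percent` and the example sequences are hypothetical and are not taken from the survey data.

```python
def gc3_percent(seq: str) -> float:
    """Return the percentage of G or C at third codon positions.

    Assumes seq is an in-frame coding segment; trailing bases that do
    not complete a codon are ignored."""
    # Indices 2, 5, 8, ... are the third positions of complete codons.
    thirds = seq.upper()[2::3]
    if not thirds:
        raise ValueError("sequence must contain at least one full codon")
    gc = sum(1 for base in thirds if base in "GC")
    return 100.0 * gc / len(thirds)


if __name__ == "__main__":
    # Synthetic examples, not survey sequences.
    print(gc3_percent("ATGGCCAAC"))  # third positions are G, C, C
    print(gc3_percent("ATGGCAAAT"))  # third positions are G, A, T
```

For a 129 bp segment (43 codons), each third-position change shifts this percentage by roughly 2.3 points, so the 80% cutoff in footnote e corresponds to a difference of only a few codons.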

f Two additional sequences (IP49, IP108) belong to this group based upon the unweighted maximum parsimony estimate of phylogeny. They may represent rapidly evolving sequences.

g These sequences exhibit 3 nonsynonymous differences in the sequenced region.
 


Please Note: Notes submitted to the Maize Genetics Cooperation Newsletter may be cited only with consent of the authors.
