Michigan Technological University

Codon bias in maize nuclear genes
--Wilbur H. Campbell

Considerable interest has recently centered on the codon bias of higher plant nuclear genes. This is partly because a much larger number of plant nuclear genes have been sequenced recently, but also because it has been realized that the codon bias of monocots and dicots differs significantly. The difference is particularly striking when some large genes recently sequenced from maize, such as the cDNAs for nitrate and nitrite reductases, are compared with their corresponding dicot genes. The maize genes were found to be encoded with a much smaller set of codons than the dicot genes for these enzymes. Furthermore, the codon set used for the maize genes was narrowly biased toward synonymous codons ending in G or C, while the dicot genes showed little bias toward these G+C ending codons and, in fact, used all codons for encoding the polypeptides.

Murray et al. (Nucl. Acids Res. 17:477, 1989) described the codon usage in 207 plant genes and concluded that it differed between monocots and dicots. We (Campbell & Gowri, Plant Physiol. in press, 1989) have also analyzed codon usage in 100 monocot and 63 dicot genes, which included all data available in GenBank (Release 57) as well as many recently published sequences. Although the total number of genes we analyzed appears smaller than in the prior study, we included only one example for genes represented by gene families (e.g. Cab, RbcS, Zeins) when the members of the family did not differ in codon bias. In addition, we defined a set of preferred codons for each gene by selecting those codons which accounted for 85% of the amino acids encoded in the gene's sequence. This allowed a more compact and, we hope, more understandable presentation of codon bias data for a broader audience.
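The preferred codon set described above can be illustrated with a short Python sketch. It is not the procedure used in the cited analysis. It assumes one plausible reading of the 85% rule: codons are ranked by how often they occur in the gene, and the most frequent ones are taken until they cover at least 85% of the encoded residues. The function names `codon_counts` and `preferred_codons`, the exclusion of stop codons, and the alphabetical tie-breaking are all assumptions made for illustration.

```python
from collections import Counter

# Standard stop codons; excluded because they encode no amino acid.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_counts(cds):
    """Count sense codons in an in-frame coding sequence (DNA or RNA)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c not in STOP_CODONS)


def preferred_codons(cds, coverage=0.85):
    """Return the most frequent codons that together cover `coverage`
    of the encoded residues (assumed reading of the 85% rule)."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    chosen, covered = [], 0
    # Most frequent first; ties broken alphabetically for reproducibility.
    for codon, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered >= coverage * total:
            break
        chosen.append(codon)
        covered += n
    return chosen
```

With this sketch, `len(codon_counts(cds))` gives the total number of distinct codons a gene uses and `len(preferred_codons(cds))` gives the size of its preferred set, the two figures quoted for individual genes in this note.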
In any case, while we also found that dicots and monocots differed in codon usage, our results showed that this difference was not a simple case of dicot nuclear genes showing less preference than monocot genes for synonymous codons ending in G+C. In fact, monocots appear to have two classes of genes with respect to codon usage: those with a strong preference for G+C ending codons and those with less preference for these synonymous codons. When all the data we collected were plotted as gene number versus G+C percent in the 3rd position of the codon, dicot genes were found to have a unimodal distribution with its mode at about 45%, while monocot genes had a bimodal distribution with modes at 50% and 95%. A similar plot for maize nuclear genes is presented in Figure 1.

Figure 1. A plot of the number of maize genes versus the G+C percent in the third position of the codons of their respective coding sequences. A 5% G+C window was used. The data were taken from Campbell & Gowri, Plant Physiol. in press, 1989.
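A plot of this kind can be tabulated with the following Python sketch, which computes the G+C percent at third codon positions for each gene and counts genes per 5% window. It is an illustration only, not the code behind Figure 1. The function names `gc3_percent` and `gc3_histogram` are hypothetical, the input is assumed to be a mapping from gene name to in-frame coding sequence, and any stop codon is assumed to have been removed beforehand.

```python
from collections import Counter


def gc3_percent(cds):
    """Percent G+C at the third position of each codon in an in-frame CDS."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    third = [cds[i + 2] for i in range(0, usable, 3)]
    if not third:
        raise ValueError("sequence is shorter than one codon")
    return 100.0 * sum(base in "GC" for base in third) / len(third)


def gc3_histogram(genes, window=5):
    """Count genes per G+C window; keys are the lower edge of each window.

    `genes` maps a gene name to its coding sequence. A value of exactly
    100% is placed in the top window (95-100% for the default width).
    """
    bins = Counter()
    for cds in genes.values():
        pct = gc3_percent(cds)
        start = min(int(pct // window) * window, 100 - window)
        bins[start] += 1
    return dict(sorted(bins.items()))
```

A bimodal distribution such as the one described for monocot genes would show up as two separated clusters of well-populated windows in the returned dictionary.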

This difference in codon usage among maize nuclear genes is clearly illustrated when homologous genes are compared. For example, the chloroplastic and cytosolic glyceraldehyde-3-phosphate dehydrogenases (MZEG3PD1 and MZEG3PD2) differ significantly in codon usage: MZEG3PD1 uses a total of 39 codons for all amino acids encoded, with a preferred set of 29 codons, whereas MZEG3PD2 uses a total of 51 codons, with a preferred set of 40 codons. A similar difference in codon usage is found among the catalase genes, with MZECAT1 and MZECAT2 having a codon preference like that of MZEG3PD2, while MZECAT3 resembles MZEG3PD1 in codon usage pattern. The data available on nuclear genes of other monocots show a similar difference, but fewer examples are available to illustrate it using homologous genes. However, these differences between homologous genes are not found when the dicots are analyzed. Thus, it would appear that different mechanisms governing silent mutations in coding sequences have been operating in monocots and dicots during the evolution of these species.

We also noted in our review on codon usage in plant genes that the bimodal pattern of G+C percent in the 3rd position of codons for monocot nuclear genes was similar to the pattern observed for human genes (and perhaps for those of other warm-blooded vertebrates). The bimodal distribution of codon usage in human genes has been explained by the finding that the human genome is a mosaic of A+T and G+C rich regions, which have been called isochores (Aota & Ikemura, Nucl. Acids Res. 14:6345, 1986). Wolfe et al. (Nature 337:283, 1989) showed that A+T and G+C rich regions of mammalian genomes have different rates of mutation at silent sites, which may account in part for the existence of isochores in the genomes of these organisms. This leads to the question: do isochores exist in the nuclear genome of maize, and do they account for the differences in codon usage among maize nuclear genes?

Bernardi and coworkers (Salinas et al., Nucl. Acids Res. 16:4269, 1988; Matassi et al., Nucl. Acids Res. 17:5273, 1989) have analyzed plant nuclear genomes to determine if isochores exist. Originally, they used buoyant density analysis of genomic DNA fragments to compare 3 dicots and 3 monocots, but more recently they compared 5 dicots and 9 monocots. They concluded that isochores exist in all plant genomes, but that the genomes of dicots usually have a lower G+C content than those of monocots. However, the recent study showed a dicot (Oenothera hookeri) with a much higher G+C content and a monocot (Allium cepa) with a much lower G+C content than is usual for their respective groups. Their analysis of the Poaceae indicates that all grasses, including maize, have a high G+C content, and that these species display evidence for isochores in their nuclear genomes. They suggest that the bimodal distribution of codon usage among the monocot genes sequenced so far arises because these genes come from regions of their respective genomes which differ in G+C content.

Thus, in the future, as more maize genes are sequenced and mapped to their chromosomes, it should be possible to relate their G+C content to their map positions and predict the isochore structure of the maize genome. However, the mechanism underlying the evolution of the maize genome into a mosaic of A+T and G+C rich regions is yet to be explained. Furthermore, it is not clear why some plant genomes have evolved toward a higher G+C content relative to others. Finally, it would appear that the differences in codon bias among maize nuclear genes, which probably reflect their genomic environment more than other features such as mRNA stability or translation efficiency, may have little physiological significance.

Please Note: Notes submitted to the Maize Genetics Cooperation Newsletter may be cited only with consent of the authors
