ALBANY, NEW YORK

State University of New York

Codon usage tables for zein and non-zein genes: an update

--Douglas A. Hamilton and Joseph P. Mascarenhas

In an earlier codon usage table for maize based on 25 nuclear genes (D. M. Bashe and J. P. Mascarenhas, MNL 63: 4-5, 1989), it was found that at least one of the zeins (22 kD family) was not typical of the codon usage profile of other maize genes. With the additional sequences now available, we have updated the codon usage table. A total of 56 maize nuclear genes have now been analyzed. This analysis shows that zeins of the 19 kD and 22 kD families exhibit a codon usage pattern different from that of the bulk of other nuclear-encoded genes. Accordingly, we have created two tables, one for the 19 and 22 kD zeins and the second for nuclear genes other than those of the 19 and 22 kD zeins. Codon usage in maize has also been discussed by W. H. Campbell and G. Gowri (Plant Physiol. 92: 1-11, 1990).

Genes were selected from the GenBank database (Release 64, 6/90) on the basis of the presence of a complete coding sequence, either as mRNA or as combined exons from a genomic sequence. Codon usage tables were created by the repetitive addition of tables generated for individual genes with the Genetics Computer Group (GCG) Sequence Analysis Software (Version 6.2) program "CodonFrequency" (J. Devereux, P. Haeberli and O. Smithies, Nucleic Acids Res. 12: 387-395, 1984). Fourteen sequences for the 19 and 22 kD zein genes and forty-two sequences of other nuclear genes were used (Table 1). The results in Table 2 show the number of occurrences of each codon for an amino acid, and the number per 100 codons for that amino acid. It is interesting that GAG, the most frequently used codon for the majority of nuclear genes, is entirely absent from the table for the 19 and 22 kD zein genes.
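The table-building step described above can be sketched in Python. This is only a minimal illustration and not the GCG "CodonFrequency" program: the function names, the abbreviated genetic code, and the two toy sequences are assumptions made for the example.

```python
from collections import Counter

# Abbreviated genetic code (assumption: only the codons needed for the toy
# sequences below; a real run would list all 64 codons).
GENETIC_CODE = {
    "GAA": "Glu", "GAG": "Glu",
    "AAA": "Lys", "AAG": "Lys",
    "ATG": "Met",
    "TAA": "Term", "TAG": "Term", "TGA": "Term",
}

def count_codons(cds):
    """Count the codons of one complete coding sequence, read from its first base."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def combine(tables):
    """Add the per-gene codon counts into one cumulative table."""
    total = Counter()
    for table in tables:
        total.update(table)
    return total

def percent_usage(counts):
    """Occurrences of each codon per 100 codons for the same amino acid."""
    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[GENETIC_CODE[codon]] += n
    return {
        codon: 100 * n / per_amino_acid[GENETIC_CODE[codon]]
        for codon, n in counts.items()
    }

# Hypothetical toy coding sequences, not taken from GenBank.
genes = ["ATGGAGGAGAAGTGA", "ATGGAAGAGAAATAG"]
table = combine(count_codons(g) for g in genes)
usage = percent_usage(table)
```

In this toy case GAG accounts for three of the four Glu codons, so its usage is 75 per 100 Glu codons. Adding per-gene counts before computing percentages weights each gene by its length, which is consistent with the cumulative occurrence counts reported in Table 2.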

To test the ability of the tables to distinguish the correct reading frame of a sequence, and to determine whether the sequence used a "19 & 22 kD zein" or an "other nuclear gene" codon preference, the tables were used to analyze several maize sequences whose reading frames were known. The results are shown in Table 3. As in our earlier report, the scoring was calculated as follows: for each codon in the sequence, a score was assigned as the percent occurrence of that codon divided by the highest percent occurrence of any codon for that amino acid, as determined by the table. The scores for all codons in the sequence were summed and divided by the total number of codons to give a percent similarity to the table. The genes in Table 3 were scored in each of the three reading frames, over the same region, using both of the tables. The first frame is that which has been identified as the coding frame. All nuclear genes tested, as well as zeins of the 15-16 kD classes, are correctly distinguished by the "other nuclear gene" table. In contrast, the coding frames of zeins of the 19 and 22 kD classes are not correctly distinguished by a codon usage table made from other nuclear genes, but are distinguished by the "19 & 22 kD zein" table. Note that the correct reading frames of the two other seed storage proteins tested (MZEGLUT2E and MZEGLB1SA) are also distinguished by the "other nuclear gene" table. The alternate coding preference raises important questions about the evolutionary origin of the 19 and 22 kD families of zein genes.
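The scoring rule described above can be sketched in Python. This is only a minimal illustration and not the program used to produce Table 3: the function names and the toy sequence are assumptions, only the Arg, Gly, Glu, Lys, Met and Trp percentages from the "other nuclear genes" columns of Table 2 are included, and codons missing from this abbreviated table are assumed to score zero.

```python
# Percent usage per amino acid, copied from the "other nuclear genes"
# columns of Table 2 (subset only: Arg, Gly, Glu, Lys, Met, Trp).
PERCENT = {
    "Arg": {"CGA": 4, "CGC": 39, "CGG": 16, "CGT": 9, "AGA": 6, "AGG": 27},
    "Gly": {"GGA": 12, "GGC": 50, "GGG": 21, "GGT": 17},
    "Glu": {"GAA": 17, "GAG": 83},
    "Lys": {"AAA": 13, "AAG": 87},
    "Met": {"ATG": 100},
    "Trp": {"TGG": 100},
}

# Score of each codon: its percent usage divided by the highest percent
# usage of any codon for the same amino acid.
CODON_SCORE = {
    codon: pct / max(codons.values())
    for codons in PERCENT.values()
    for codon, pct in codons.items()
}

def frame_score(seq, frame):
    """Percent similarity to the table for one reading frame (offset 0, 1 or 2)."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not codons:
        return 0.0
    # Assumption: a codon absent from the abbreviated table contributes 0.
    total = sum(CODON_SCORE.get(codon, 0.0) for codon in codons)
    return 100 * total / len(codons)

def score_frames(seq):
    """Scores for reading frames 1, 2 and 3, in that order."""
    return [frame_score(seq, offset) for offset in range(3)]

# Hypothetical toy sequence whose first frame reads Met-Glu-Lys-Glu-Lys.
toy = "ATGGAGAAGGAGAAG"
scores = score_frames(toy)
best_frame = scores.index(max(scores)) + 1
```

For the toy sequence, every codon in frame 1 is the preferred codon for its amino acid, so frame 1 receives the highest score and is selected as the likely coding frame. Running the same function with percentages from the "19 and 22 kD zein" columns would give the second set of scores compared in Table 3.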

We thank David Bashe for his program to calculate reading frame scores in Table 3.

Permission of the authors is not required for citing the codon usage tables.

Table 1.  GenBank sequences used in the preparation of the codon usage tables.
GenBank locus Description
"Other Nuclear Genes"
MZEA1G A1 gene for NADPH-dependent reductase
MZEACT1G Actin 1 gene
MZEADH1F Alcohol dehydrogenase (ADH1-F) mRNA
MZEADH1FA Alcohol dehydrogenase (ADH1-1F) gene
MZEADH2NR Alcohol dehydrogenase (ADH2-N) mRNA
MZEALBB32 Albumin b-32 mRNA
MZEALD Aldolase mRNA
MZEANT ATP/ADP translocator mRNA
MZEBRNZW UDP glucose flavonoid glycosyl transferase (Bz-W22)
MZEBRNZZA UDP glucose flavonoid glycosyl transferase (Bz-McC)
MZECAT1I Catalase-1 isoenzyme (cat-1) mRNA
MZECAT3I Catalase-3 isoenzyme (cat-1) mRNA
MZEEG2R Endosperm glutelin-2 protein mRNA
MZEGAPDH Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) mRNA
MZEGGST3B Glutathione-S-transferase (GSTIII) mRNA
MZEGLB1SA Embryo globulin S allele mRNA
MZEGLUT2E Endosperm glutelin-2 gene
MZEGSTI Glutathione-S-transferase I mRNA
MZEGSTIB Glutathione-S-transferase (GST-I) mRNA
MZEH3C2 Histone 3 gene
MZEH3C4 Histone 3 gene
MZEH4C14 Histone 4 gene
MZEH4C7 Histone 4 gene
MZEHSP701,2 Heat shock protein 70, exons 1+2
MZELHCP Light-harvesting chlorophyll a/b binding protein mRNA
MZEMPL3 Major lipid body protein L3 mRNA
MZENAR,1 NADH:nitrate reductase (NR) mRNA (5' + 3' ends)
MZENDMEX NADP-dependent malic enzyme (Me1) mRNA
MZEPCSSU RuBisCo small subunit mRNA
MZEPEPCR Phosphoenolpyruvate carboxylase (PEPCase) mRNA
MZEPLTP Phospholipid transfer protein mRNA
MZEPOD Pyruvate, orthophosphate dikinase mRNA
MZERBCS rbcS gene for RuBisCo small subunit
MZEREGG Lc regulatory protein mRNA
MZESOD2A Superoxide dismutase 2 (SOD2) mRNA
MZESOD3I Superoxide dismutase-3 isoenzyme mRNA
MZESUSYSG Sucrose synthase gene (shrunken)
MZETPI1 Triosephosphate isomerase 1, exon 1
MZEWAXY Amyloplast-specific transit protein (waxy locus)
MZEZE15A3 15kD zein
MZEZE15G 15kD zein
MZEZE16 16kD zein

"19 & 22 kD Zein Genes"
MZEI19 19kD zein
MZEZE19A 19kD zein
MZEZE19B1 19kD zein
MZEZE19C1 19kD zein
MZEZE19C2 19kD zein
MZEZE19D1 19kD zein
MZEZE22A 22kD zein
MZEZE22B 22kD zein
MZEZEA20M 19kD zein
MZEZEA30M 19kD zein
MZEZEAZ124 19kD zein
MZEZEPCM1 22kD zein
MZEZEZ4G 22kD zein
MZEZEZG3A 22kD zein

Table 2.  Table of codon usage with the number of occurrences of each codon, and the occurrences per 100 codons of the same amino acid.
 
                    Other Nuclear Genes       19 and 22 kD Zein Genes
Amino Acid  Codon   Occurrences  % Usage      Occurrences  % Usage
Arg         CGA     34           4            3            8
            CGC     332          39           0            0
            CGG     134          16           4            11
            CGT     80           9            3            8
            AGA     48           6            2            5
            AGG     231          27           25           68
Leu         CTA     39           3            130          20
            CTC     466          36           70           11
            CTG     515          40           79           12
            CTT     165          13           141          22
            TTA     9            1            70           11
            TTG     104          8            157          24
Ser         TCA     59           7            60           26
            TCC     272          30           30           13
            TCG     174          19           18           8
            TCT     88           10           62           27
            AGC     269          30           49           21
            AGT     45           5            13           6
Thr         ACA     77           10           34           33
            ACC     361          47           39           38
            ACG     217          28           14           13
            ACT     117          15           17           16
Pro         CCA     159          17           148          48
            CCC     268          29           56           18
            CCG     347          38           23           7
            CCT     144          16           80           26
Ala         GCA     147          10           137          29
            GCC     598          39           99           21
            GCG     459          30           61           13
            GCT     311          21           179          38
Gly         GGA     151          12           4            7
            GGC     610          50           14           24
            GGG     254          21           3            5
            GGT     211          17           37           64
Val         GTA     41           4            24           19
            GTC     411          37           19           15
            GTG     500          44           64           50
            GTT     172          15           20           16
Lys         AAA     97           13           5            38
            AAG     673          87           8            62
Asn         AAC     392          81           132          89
            AAT     90           19           17           11
Gln         CAA     67           12           407          69
            CAG     496          88           185          31
His         CAC     265          73           15           45
            CAT     96           27           18           55
Glu         GAA     165          17           15           100
            GAG     780          83           0            0
Asp         GAC     552          76           6            67
            GAT     176          24           3            33
Tyr         TAC     381          89           82           77
            TAT     48           11           24           23
Cys         TGC     242          86           21           58
            TGT     39           14           15           42
Phe         TTC     493          85           106          63
            TTT     87           15           63           37
Ile         ATA     42           6            35           24
            ATC     476          73           56           39
            ATT     134          21           53           37
Met         ATG     391          100          69           100
Trp         TGG     187          100          1            100
Term.       TAA     7            17           0            0
            TAG     10           24           14           100
            TGA     24           59           0            0

Table 3.  Scoring of some maize coding regions on the basis of the codon usage tables (highest score in bold print).
 
Score for reading frame based on codon usage for:
              (other nuclear genes)   (19 + 22 kD zeins)
Gene tested   1    2    3             1    2    3
MZEA1G        77   55   50            46   50   49
MZEGGST3      77   60   56            48   49   46
MZEGLB1SA     74   62   45            44   49   41
MZEGLUT2E     74   72   57            46   46   66
MZEH3C4       79   58   55            42   54   46
MZELHCP       84   57   48            48   49   46
MZEPEPCR      73   58   46            49   54   49
MZERBCS       89   53   51            54   56   48
MZESUSYSG     81   57   47            52   48   43
MZEZE15G      83   68   50            48   52   50
MZEZE16       79   68   54            43   46   60
MZEI19        47   60   49            76   64   70
MZEZE19A      46   62   48            76   63   67
MZEZE22A      51   65   46            73   68   65
MZEZE22B      49   63   45            71   67   65


Please Note: Notes submitted to the Maize Genetics Cooperation Newsletter may be cited only with consent of the authors

Return to the MNL 65 On-Line Index