The teosinte branched1 (tb1) gene has been cloned (Doebley, Stec and Hubbard, 1997, Nature 386: 485-488), but its complete structure is not known. Because some evolutionary analyses of nucleotide variation in tb1 require knowledge of which regions are transcribed and which are not, we are analyzing the structure of this gene. Here we report our current, preliminary evidence on the structure of tb1, obtained by RT-PCR and by comparison of genomic and cDNA sequences.
The longest known tb1 cDNA is 1306 bp, excluding the polyA tail. On northern blots, tb1 hybridizes to a message between 1.4 and 1.6 kb in length. The known 1306 bp cDNA plus a 100 bp polyA tail would therefore be sufficient to account for a message at the lower end of this range. The genomic and cDNA sequences are fully colinear, without any intervening sequences (introns), so tb1 may be an intronless gene and the known cDNA could be full length. One concern with this interpretation is that the genomic sequence immediately upstream of the 5’ end of the cDNA lacks an obvious TATA-box. If the interpretation is correct, tb1 would belong to the class of genes that lack a TATA-box.
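The size argument above is simple arithmetic, and a minimal sketch of it may make the comparison easier to check. The numbers come from the text; the 100 bp polyA tail is the text's own working assumption, and the function and variable names are ours, chosen only for illustration.

```python
# Values quoted in the text. The poly(A) tail length is an assumed figure,
# not a measurement.
CDNA_BP = 1306                      # longest known tb1 cDNA, without poly(A)
POLYA_BP = 100                      # assumed poly(A) tail length
NORTHERN_RANGE_BP = (1400, 1600)    # message size seen on northern blots


def within_range(length_bp, bounds):
    """Return True if length_bp lies inside the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= length_bp <= high


predicted_message_bp = CDNA_BP + POLYA_BP
print(predicted_message_bp)                                   # 1406
print(within_range(predicted_message_bp, NORTHERN_RANGE_BP))  # True
```

The predicted message sits at the lower edge of the northern range, so the comparison is compatible with a full-length cDNA but cannot by itself exclude a somewhat longer transcript.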
Approximately 840 bp upstream of the 5’ end of the cDNA, the genomic sequence contains a short open reading frame (ORF) of about 100 bp with an adjacent TATA-box-like motif (Fig. 1). To test whether this TATA-box-like motif might be part of the tb1 promoter, we employed RT-PCR using one primer (JD105: gaagaccaactcatctgacc) located in the 100 bp ORF and another (JD82: ccgatctggtagctgagg) within the region covered by the cDNA (Fig. 1). RT-PCR yielded a 196 bp product corresponding to the genomic sequence that flanks a 795 bp intron. The intron is bounded by the conserved donor-acceptor (gt...ag) splice sites. Thus, the tb1 transcript can include the small 100 bp ORF, and the TATA-box-like motif just upstream of it may form part of the promoter. If transcription begins about 35 bp downstream of the putative TATA-box, then the predicted length of the transcript (after removal of the intron) would be about 1304 bp plus the polyA tail. This length is sufficient to account for the 1.4 to 1.6 kb message seen on northern blots. There are two difficulties with this model. (1) The first ATG, which is only 10 bp from the putative transcription start site, is not in the expected reading frame and would produce a polypeptide only two amino acids long. The second ATG, however, is in the expected reading frame. (2) The known cDNA clone extends into the intron, so one would have to infer that this cDNA was derived from an unprocessed message.
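The reasoning in this paragraph can be sketched in a few lines of code: the size expected if the same primer pair amplified unspliced template, the gt...ag boundary check, and the count of codons read from an ATG before the first stop. The product and intron sizes are from the text. The toy intron and toy ORF sequences are hypothetical stand-ins, not tb1 sequence, and the function names are ours.

```python
# Sizes quoted in the text.
RT_PCR_PRODUCT_BP = 196   # spliced product from primers JD105 and JD82
INTRON_BP = 795           # intron absent from the RT-PCR product

# Size expected if the same primers amplified unspliced (genomic) template.
genomic_amplicon_bp = RT_PCR_PRODUCT_BP + INTRON_BP

STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_canonical_splice_sites(intron):
    """True if the intron starts with 'gt' (donor) and ends with 'ag' (acceptor)."""
    s = intron.lower()
    return len(s) >= 4 and s.startswith("gt") and s.endswith("ag")


def codons_before_stop(seq, start):
    """Count codons read in frame from index `start` up to the first stop codon.

    Returns None if no in-frame stop codon is found.
    """
    seq = seq.upper()
    count = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return count
        count += 1
    return None


# Hypothetical sequences for illustration only (not taken from tb1).
TOY_INTRON = "gtaagt" + "c" * 20 + "tttcag"
TOY_SHORT_ORF = "ATGGCTTAAGG"   # ATG, one more codon, then a stop

print(genomic_amplicon_bp)                     # 991
print(has_canonical_splice_sites(TOY_INTRON))  # True
print(codons_before_stop(TOY_SHORT_ORF, 0))    # 2
```

A product of 196 bp rather than 991 bp is what distinguishes spliced message from unspliced template, and a count of two codons corresponds to the two-amino-acid polypeptide described for the first ATG.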
While the exact structure of tb1 remains uncertain, the RT-PCR result establishes that the transcribed region of the gene can include the 100 bp ORF, which lies 35 bp downstream of the TATA-box-like motif. For the purpose of our evolutionary analysis of nucleotide variation in tb1, we have therefore used the position (*) 35 bp downstream of the TATA-box-like motif as the point of division between the 5’ non-transcribed region and the transcript. Additional experiments are under way to resolve the full structure of this gene.
Figure 1. Composite nucleotide sequence of tb1 from cDNA and genomic sequences. Putative exons (upper case), an intron (lower case), intron splicing sites (underlined), TATA box-like motif (double underlined), primers used in RT-PCR (arrows, black boxes), 5’ (+) and 3’ (!) ends of the cDNA, start and stop codons (gray boxes) are shown. For evolutionary analyses of nucleotide variation, the gene was partitioned into the 5’ non-transcribed and transcribed regions at 35 bp downstream of the TATA-box-like motif.