As provided to MNL by Sandy Clifton, Wash U Genome Sequencing. The report below
was supplied to the project's advisory group in October 2006. An updated report
will be prepared as the MNL goes to press after the Maize Meeting, March 2007.
Overview of original project goals. These will also be the project deliverables.
Project Objectives/Deliverables:
1. Provide the complete sequence and structures of all maize genes and their
locations (in linear order) on both the genetic and physical maps of maize.
2. The gene space of B73 maize (gene sequences and
adjacent regulatory regions) should be finished to high quality according to
currently acceptable standards.
3. If applicable, the sizes of gaps between the genes should be
estimated and draft sequences of repetitive DNA between genes presented where
possible.
4. The sequence will be fully integrated with the genetic and
physical maps.
5. Annotation will include gene models, predicted exon/intron
structure, incorporation of EST and full-length cDNA data, gene ontology, and
relationship with homologs in other organisms, including, but not limited to,
the other sequenced plant genomes.
6. Annotation will be coordinated with existing maize community
and comparative databases with the eventual goal of generating complete
curation of the genomic sequences to a standard set by established model
organism databases.
Research Activities and Results.
The first 6 months of the year were spent coordinating the various tasks
assigned to the participating institutions: Washington University School of
Medicine Genome Sequencing Center (GSC), Arizona Genomics Institute (AGI),
Cold Spring Harbor Laboratory (CSHL), and Iowa State University (ISU).
The AGI has the responsibility for choosing a minimal tiling path (MTP)
of mapped BAC clones and preparing DNA for sequencing in consultation with the
GSC, the primary sequencing center.
By year's end, an optimized pipeline from MTP clone selection to BAC
library construction was developed.
AGI is on target to process an additional 960 clones/month until the
entire maize genome is covered with BACs.
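The idea of a minimal tiling path can be illustrated with a short sketch. This is not AGI's actual selection pipeline, which is not described in this report; it is a generic greedy interval-cover routine, and the `Clone` type, the coordinate units, and the `min_overlap` parameter are assumptions introduced only for illustration.

```python
from typing import List, NamedTuple


class Clone(NamedTuple):
    """A mapped BAC clone (hypothetical representation)."""
    name: str
    start: int  # position on the physical map, arbitrary units
    end: int


def minimal_tiling_path(clones: List[Clone], region_start: int,
                        region_end: int, min_overlap: int = 0) -> List[Clone]:
    """Greedily pick few clones that together cover [region_start, region_end].

    At each step, among the clones that begin early enough to overlap the
    covered part by at least `min_overlap`, take the one reaching furthest.
    """
    ordered = sorted(clones, key=lambda c: c.start)
    path: List[Clone] = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        # The first clone only has to reach the region start; later clones
        # must overlap the previously chosen clone by `min_overlap`.
        reach = covered_to if not path else covered_to - min_overlap
        best = None
        while i < len(ordered) and ordered[i].start <= reach:
            if best is None or ordered[i].end > best.end:
                best = ordered[i]
            i += 1
        if best is None or best.end <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best)
        covered_to = best.end
    return path
```

Calling `minimal_tiling_path(clones, 0, 300)` on a list of overlapping clones returns the chosen clones in map order, and raises `ValueError` where no clone bridges a gap. A production pipeline would also weigh fingerprint quality and other evidence that this sketch ignores.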
In addition, the first months of the past grant year were spent
optimizing procedures and improving computer access and communication among the
three Centers (GSC, AGI and CSHL) sharing production and finishing tasks.
Bimonthly or monthly conference calls among the three centers involved in the
production/finishing procedures continue to be an effective means of
anticipating and dealing with problems in a timely manner. A smoothly operating
protocol is now in place, and throughput numbers are on track to meet the
project goals. Clone selection has increased to 1,000 clones per month, with
library construction and production sequencing throughput scaling
proportionally.
The bioinformatics teams at CSHL and Iowa State have been crucial to the
development and/or adaptation of software that has allowed the project to move
forward in a timely manner. The largely manual clone-picking process at AGI has
been improved in cooperation with CSHL by automating the data pipeline and
setting up visualization methods using the existing GMOD tools CMap and GBrowse
(http://www.gmod.org/node/).
Annotation is also a responsibility of the bioinformatics teams, and the first
year of this endeavor has focused on developing the protocols that will form the
basis of the annotation pipeline. While some analysis protocols can be adopted
directly from the Gramene Project (www.gramene.org) with little modification,
new approaches are required to address characteristics that are specific to the
maize genome, such as repetitive sequences. Repeat classification is also
essential for understanding how proliferation of different transposon classes
contributed to the expansion of the maize genome. To classify repeat sequences,
we are utilizing the MIPS Repeat Element Catalog (MIPS-REcat;
http://mips.gsf.de/proj/plant/webapp/recat/RecatTreeFrameset.jsp). Of course,
most users will be interested in the gene space between the repetitive regions.
We are using both ab initio gene prediction and evidence-based gene-build
approaches to define protein-coding genes. We are also working with Brad Barbazuk
(Donald Danforth Plant Science Center) to adopt TWINSCAN software, which is being
trained to annotate the maize genome. The staffs of CSHL and GSC have worked
closely to develop a standardized GenBank submission record that will be
accurately parsed as primary annotation for the sequenced clone. The Maize Project differs from other
clone-by-clone sequencing projects in that most clones will not be sequenced
beyond the Phase I level. Thus
most clones will be represented as multiple contigs. Information on how these contigs are oriented and associated
into scaffolds will be essential to users of the genome browser. We have therefore established methods
to encode this information within the GenBank record submitted by GSC, so that
it can be conveyed by the CSHL annotation team to the user. The ISU team has been developing a scaffolding approach
that makes use of retrotransposons.
Preliminary experiments conducted on maize BACs have been promising. Using 257
"random" BACs finished by the GSC and downloaded from GenBank, it was possible
with this approach to conservatively obtain 1.5 additional scaffolds per BAC.
The first release
of the MaizeSequence Browser at CSHL (www.maizesequence.org) went live on 28 September
2006. The browser and database
infrastructure are powered by Ensembl (http://www.ensembl.org/index.html), which has proven itself highly
robust and flexible in the service of genome projects. The interface provides convenient entry
points to the genome, both by searching and browsing, and displays salient
features, including predicted genes, markers, repeats, and expressed and
conserved regions. A high priority
was to make searching and browsing easy for all members of the maize community,
whether geneticist, breeder, or molecular biologist. Thus, entry is currently
possible by sequence accession, physical position, genetic position, and
conserved synteny with rice (http://www.maizesequence.org). The BLAST search
engine will be added in December 2006, and the ability to view annotated BAC
clones for the maize project in ContigView will be available in January 2007.
Future plans include an automated notification system that allows end users to
"subscribe" to specific regions of the maize genome. The system, leveraged by
the annotation pipeline, will notify users when a region of interest has an
updated sequence or marker alignment. Another goal is to use Ensembl's
Distributed Annotation System (DAS) infrastructure to provide alignments of
procured data sets (such as mRNAs and full-length cDNAs). A third feature is the
visual integration of the larger-scale FPC view with the more targeted,
sequence-based BAC view, which will provide a uniform browsing context.
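The planned region "subscription" feature can be pictured with a minimal sketch. The report does not describe how the notification system will be implemented, so everything here is an assumption made for illustration: the `Region` and `Subscription` types, the coordinate convention (inclusive start and end), and the function name `users_to_notify` are hypothetical and are not part of Ensembl or the MaizeSequence browser.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    """A genomic interval with inclusive coordinates (hypothetical model)."""
    chromosome: str
    start: int
    end: int

    def overlaps(self, other: "Region") -> bool:
        # Two intervals on the same chromosome overlap when each one
        # starts before the other one ends.
        return (self.chromosome == other.chromosome
                and self.start <= other.end
                and other.start <= self.end)


@dataclass(frozen=True)
class Subscription:
    user: str
    region: Region


def users_to_notify(subscriptions: List[Subscription],
                    updated: Region) -> List[str]:
    """Return each user (once, in first-seen order) whose subscribed
    region overlaps the region that was just updated."""
    notified: List[str] = []
    for sub in subscriptions:
        if sub.region.overlaps(updated) and sub.user not in notified:
            notified.append(sub.user)
    return notified
```

In this sketch, the annotation pipeline would call `users_to_notify` with the interval of each newly updated sequence or marker alignment. With many subscriptions, an interval tree would be a more suitable lookup structure than the linear scan used here.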
Outreach is an important part of the project. In the past year, outreach
activities to the maize community included soliciting maize researchers for
preliminary requirements for the maize genome sequence site, coordinating with
the maize community database MaizeGDB, establishing appropriate contacts with
existing and future maize research initiatives, and attending annual meetings.
To establish browser requirements, we solicited feedback from maize researchers
at CSHL, Missouri, Iowa, and the Plant Gene Expression Center on the existing
Gramene browser and on specific maize requirements. This was done via phone
calls, in-person meetings, and email exchanges. To enhance the existing working
relationship with MaizeGDB, Dr. Lawrence and her group spent a day at CSHL
reviewing the browser, discussing mechanisms for linking between the project
sites, data exchange, and establishing a working model for feedback between the
groups. To make the broader maize community aware of ongoing efforts, Drs.
Lawrence and Ware coauthored "MGSC: Gramene and MaizeGDB cooperate to provide
access to sequences and related data," published in the Maize Genetics
Cooperation Newsletter, volume 80, describing efforts to coordinate on the
delivery of the maize sequence to the community. In order to build upon
existing resources we have worked closely with several community members for
data integration. These include annotation of maize retrotransposon elements
with Drs. Phillip San Miguel and Jeff Bennetzen, the maize optical map with Dr.
Schwartz's group, and gene predictions with Dr. Brad Barbazuk.
The outreach program at CSHL is largely focused on developing a website with 3D
graphics for high school students and the general public. The outreach team has
been waiting for enough data to support this activity. Now that the first large
"super contig" is nearing completion, they will begin building the website and
incorporating these data.
A "standard presentation" has been developed to describe the project to other
scientists at meetings. Dr. Wilson and co-PIs have made several presentations to
the plant biology community concerning the maize genome sequencing project:
1. The National Corn Growers Association Meeting Action Team, Chesterfield MO,
December 2006.
2. Plant Genomics European Meeting, Venice, Italy, October 2006.
3. National Academy of Science Workshop on Agricultural Biotechnology for the
Global Public Good, Chennai, India, October 2006.
4. Plant Genomics in China VII, Harbin, China, August 2006.
5. China Agricultural University, China, August 2006.
6. Monsanto, St. Louis, May 2006.
7. Biology of Genomes, Cold Spring Harbor, May 2006.
8. Maize Genetics, Asilomar, March 2006.
9. Advances in Genome Biology, February 2006.
10. Plant and Animal Genomes, San Diego, January 2006.
The GSC maintains a web site where current progress can be viewed
(http://genome.wustl.edu/genome.cgi?GENOME=Zea%20mays%20mays%20cv.%20B73&SECTION=research).
Other related links can be found at
http://genome.wustl.edu/genome.cgi?GENOME=Zea%20mays%20mays%20cv.%20B73&SECTION=links