As provided to MNL by Sandy Clifton, Wash U Genome Sequencing. The report below was supplied to the project's advisory group in October 2006.  An updated report will be prepared as the MNL goes to press after the Maize Meeting, March 2007.

 

 

Overview of original project goals.  These will also be the project deliverables.

 

Project Objectives/Deliverables:

 

1.  Provide the complete sequence and structures of all maize genes and their locations (in linear order) on both the genetic and physical maps of maize. 

 

2.  The gene space of B73 maize (gene sequences and adjacent regulatory regions) should be finished to high quality according to currently acceptable standards.

                                                                          

3.  If applicable, the sizes of gaps between the genes should be estimated and draft sequences of repetitive DNA between genes presented where possible. 

 

4.  The sequence will be fully integrated with the genetic and physical maps. 

 

5.  Annotation will include gene models, predicted exon/intron structure, incorporation of EST and full-length cDNA data, gene ontology, and relationships with homologs in other organisms, including, but not limited to, the other sequenced plant genomes.

 

6.  Annotation will be coordinated with existing maize community and comparative databases with the eventual goal of generating complete curation of the genomic sequences to a standard set by established model organism databases. 

 

Research Activities and Results.

 

The first 6 months of the year were spent coordinating the tasks assigned to the participating institutions: the Washington University School of Medicine Genome Sequencing Center (GSC), the Arizona Genomics Institute (AGI), Cold Spring Harbor Laboratory (CSHL), and Iowa State University (ISU).  AGI is responsible for choosing a minimal tiling path (MTP) of mapped BAC clones and for preparing DNA for sequencing in consultation with the GSC, the primary sequencing center.  By year's end, an optimized pipeline running from MTP clone selection to BAC library construction had been developed.  AGI is on target to process an additional 960 clones/month until the entire maize genome is covered with BACs.  The first months of the past grant year were also spent optimizing procedures and improving computer access and communication among the three centers (GSC, AGI, and CSHL) that share production and finishing tasks.  Bimonthly or monthly conference calls among these three centers have been, and continue to be, an effective means of anticipating problems and dealing with them in a timely manner.  A smoothly operating protocol is now in place, and throughput numbers are on track to meet the project goals.  Clone selection has increased to 1,000 clones per month, with library construction and production sequencing throughput scaling proportionally.
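As an illustration of what choosing a minimal tiling path involves, the sketch below applies a standard greedy interval-cover strategy to clones placed on a physical map.  It is not the AGI pipeline, which this report does not describe in detail; the Clone record, the coordinate units, and the min_overlap parameter are all assumptions made for the example.

```python
from typing import List, NamedTuple


class Clone(NamedTuple):
    """A mapped clone; name and map coordinates are hypothetical."""
    name: str
    start: int
    end: int


def minimal_tiling_path(clones: List[Clone], min_overlap: int = 0) -> List[Clone]:
    """Greedy tiling: from the current clone, pick the overlapping clone
    that extends coverage farthest; start a new segment at any gap."""
    # Sort by start; among equal starts, put the longest clone first.
    ordered = sorted(clones, key=lambda c: (c.start, -c.end))
    path: List[Clone] = []
    i, n = 0, len(ordered)
    while i < n:
        # Skip clones that add no new coverage beyond the last chosen clone.
        if path and ordered[i].end <= path[-1].end:
            i += 1
            continue
        # Begin a new covered segment with the leftmost remaining clone.
        current = ordered[i]
        path.append(current)
        i += 1
        while True:
            best = None
            # Consider clones overlapping the current one by >= min_overlap.
            while i < n and ordered[i].start <= current.end - min_overlap:
                candidate = ordered[i]
                if candidate.end > current.end and (
                    best is None or candidate.end > best.end
                ):
                    best = candidate
                i += 1
            if best is None:
                break  # no clone extends coverage: gap or end of map
            path.append(best)
            current = best
    return path


# Hypothetical clones on one contig of the physical map.
example_clones = [
    Clone("A", 0, 100),
    Clone("B", 50, 180),
    Clone("C", 90, 200),
    Clone("D", 150, 300),
    Clone("E", 210, 320),
]
mtp = minimal_tiling_path(example_clones)
print([c.name for c in mtp])
```

Requiring a larger min_overlap trades a slightly longer path for more sequence shared between neighboring clones, which makes it easier to join them later.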

 

The bioinformatics teams at CSHL and Iowa State have been crucial in developing or adapting the software that has allowed the project to move forward in a timely manner.  The largely manual clone-picking process at AGI has been improved in cooperation with CSHL by automating the data pipeline and setting up visualization methods using the existing GMOD tools CMap and GBrowse (http://www.gmod.org/node/).

Annotation is also a responsibility of the bioinformatics teams, and the first year of this endeavor has focused on developing the protocols that will form the basis of the annotation pipeline.  While some analysis protocols can be adopted directly from the Gramene Project (www.gramene.org) with little modification, new approaches are required to address characteristics that are specific to the maize genome, such as repetitive sequences.  Repeat classification is also essential for understanding how proliferation of different transposon classes contributed to the expansion of the maize genome.  To classify repeat sequences, we are utilizing the MIPS Repeat Element Catalog (MIPS-REcat; http://mips.gsf.de/proj/plant/webapp/recat/RecatTreeFrameset.jsp).  Of course, most users will be interested in the gene space between the repetitive regions.  We are using both ab initio gene prediction and evidence-based gene-build approaches to define protein-coding genes.  We are also working with Brad Barbazuk (Donald Danforth Plant Science Center) to adopt TWINSCAN software, which is being trained to annotate the maize genome.

The staffs of CSHL and GSC have worked closely to develop a standardized GenBank submission record that can be accurately parsed as primary annotation for the sequenced clone.  The Maize Project differs from other clone-by-clone sequencing projects in that most clones will not be sequenced beyond the Phase I level.  Thus most clones will be represented as multiple contigs.  Information on how these contigs are oriented and associated into scaffolds will be essential to users of the genome browser.  We have therefore established methods to encode this information within the GenBank record submitted by GSC, so that the CSHL annotation team can convey it to the user.  The ISU team has been developing a scaffolding approach that makes use of retrotransposons.  Preliminary experiments conducted on maize BACs have been promising.  Applied to 257 "random" BACs that were finished by the GSC and downloaded from GenBank, this approach conservatively yielded 1.5 additional scaffolds per BAC.

The first release of the MaizeSequence Browser at CSHL (www.maizesequence.org) went live on 28 September 2006.  The browser and database infrastructure are powered by Ensembl (http://www.ensembl.org/index.html), which has proven itself highly robust and flexible in the service of genome projects.  The interface provides convenient entry points to the genome, both by searching and browsing, and displays salient features, including predicted genes, markers, repeats, and expressed and conserved regions.  A high priority was to make searching and browsing easy for all members of the maize community, whether geneticist, breeder, or molecular biologist.  Thus entry is currently possible by sequence accession, physical position, genetic position, and conserved synteny with rice (http://www.maizesequence.org).  The BLAST search engine will be added in December 2006, and the ability to view annotated BAC clones for the maize project in ContigView will be available in January 2007.

Future plans include an automated notification system that allows end users to "subscribe" to specific regions of the maize genome.  The system, leveraged by the annotation pipeline, will notify users when a region of interest has an updated sequence or marker alignment.  Another goal is to use Ensembl's Distributed Annotation System (DAS) infrastructure to provide alignments of procured data sets (such as mRNAs and full-length cDNAs).  A third feature is the visual integration of the larger-scale FPC view with the more targeted, sequence-based BAC view, which will provide a uniform browsing context.
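The core of a region-based notification system of the kind planned here is an interval-overlap test between each subscribed region and each updated region.  The sketch below shows one way this could work; it does not represent the design of the MaizeSequence system, and the Region record, the RegionSubscriptions class, the user names, and the coordinates are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Region(NamedTuple):
    """A genomic interval; field names and coordinates are hypothetical."""
    chromosome: str
    start: int
    end: int


def overlaps(a: Region, b: Region) -> bool:
    """True if two closed intervals on the same chromosome share a base."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


class RegionSubscriptions:
    """Stores (region, user) pairs, grouped by chromosome."""

    def __init__(self) -> None:
        self._by_chromosome: Dict[str, List[Tuple[Region, str]]] = defaultdict(list)

    def subscribe(self, user: str, region: Region) -> None:
        self._by_chromosome[region.chromosome].append((region, user))

    def users_to_notify(self, updated: Region) -> Set[str]:
        """Users whose subscribed region overlaps the updated region."""
        subscribed = self._by_chromosome.get(updated.chromosome, [])
        return {user for region, user in subscribed if overlaps(region, updated)}


# Hypothetical subscriptions and one updated sequence region.
subscriptions = RegionSubscriptions()
subscriptions.subscribe("alice", Region("chr1", 1000, 5000))
subscriptions.subscribe("bob", Region("chr1", 6000, 9000))
subscriptions.subscribe("carol", Region("chr2", 1000, 5000))

print(sorted(subscriptions.users_to_notify(Region("chr1", 4500, 6500))))
```

A linear scan per chromosome is adequate for a small number of subscriptions; an interval tree would be the usual replacement if the subscription list grew large.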

 

Outreach is an important part of the project.  In the past year, outreach activities to the maize community included soliciting preliminary requirements for the maize genome sequence site from maize researchers, coordinating with the maize community database MaizeGDB, establishing appropriate contacts with existing and future maize research initiatives, and attending annual meetings.  To establish browser requirements, we solicited feedback on the existing Gramene browser and on specific maize requirements from maize researchers at CSHL, Missouri, Iowa, and the Plant Gene Expression Center.  This was done via phone calls, in-person meetings, and email exchanges.  To enhance the existing working relationship with MaizeGDB, Dr. Lawrence and her group spent a day at CSHL reviewing the browser and discussing mechanisms for linking between the project sites, data exchange, and a working model for feedback between the groups.  To make the broader maize community aware of ongoing efforts, Drs. Lawrence and Ware coauthored "MGSC: Gramene and MaizeGDB cooperate to provide access to sequences and related data," published in the Maize Genetics Cooperation Newsletter, volume 80, which describes efforts to coordinate the delivery of the maize sequence to the community.  To build upon existing resources, we have worked closely with several community members on data integration.  These collaborations include annotation of maize retrotransposon elements with Drs. Phillip San Miguel and Jeff Bennetzen, the maize optical map with Dr. Schwartz's group, and gene predictions with Dr. Brad Barbazuk.

 

The outreach program at CSHL is largely focused on developing a website with 3D graphics for high school students and the general public.  The outreach team has been waiting for enough data to support this activity.  Now that the first large "super contig" is nearing completion, they will begin building the website and incorporating these data.

 

A "standard presentation" has been developed to describe the project to other scientists at meetings.  Dr. Wilson and co-PIs have made several presentations to the plant biology community concerning the maize genome sequencing project:

 

1.  The National Corn Growers Association Meeting Action Team, Chesterfield MO, December 2006.

2.  Plant Genomics European Meeting, Venice, Italy, October 2006.

3.  National Academy of Science Workshop on Agricultural Biotechnology for the Global Public Good, Chennai, India, October 2006.

4.  Plant Genomics in China VII, Harbin, China, August 2006.

5.  China Agricultural University, China, August 2006.

6.  Monsanto, St. Louis, May 2006.

7.  Biology of Genomes, Cold Spring Harbor, May 2006.

8.  Maize Genetics, Asilomar, March 2006.

9.  Advances in Genome Biology, February 2006.

10.  Plant and Animal Genomes, San Diego, January 2006.

 

The GSC maintains a web site where current progress can be viewed: http://genome.wustl.edu/genome.cgi?GENOME=Zea%20mays%20mays%20cv.%20B73&SECTION=research

Other related links can be found at http://genome.wustl.edu/genome.cgi?GENOME=Zea%20mays%20mays%20cv.%20B73&SECTION=links