Background: Plant databases are rapidly expanding in number, size and complexity. These information-rich databases face the challenge of accurately and consistently documenting features such as gene structures, products and functions, phenotypes, traits, developmental stages and anatomical parts, among other information. Inter-database queries across these plant-based databases will become increasingly desirable as a way to exploit comparative genomic strategies and elucidate functional aspects of plant biology. However, the terms used to describe comparable objects in each database are sometimes quite variable, which limits the ability to query information accurately and successfully within and across databases. One solution to this problem involves the development and application of ontologies of structured controlled vocabularies.
What is an ontology? An ontology is a classification methodology for formalizing a subject’s knowledge in a structured way (typically for consumption by an electronic database). Dictionaries and encyclopedias are examples of ontologies, as are many web-based entities, such as Yahoo and Excite, and so is the schema for a database. A more formal definition of ontology is available from: http://www-ksl.stanford.edu/onto-std/mailarchive/0136.html. In the world of structured information, ontologies, comprising controlled vocabularies, play a very important role in facilitating information retrieval. Furthermore, the definitions that accompany the controlled vocabulary terms facilitate the consistent use of the controlled vocabulary terms in database curation.
In biology-based ontologies, the controlled vocabulary terms are arranged so that their placement reflects the known or putative biological associations between the objects they represent. Consequently, considerable effort must be invested in compiling the controlled vocabulary terms, writing their definitions, and correctly designing the ontologies that comprise these controlled vocabularies. If an ontology represents the relationships between the terms incorrectly, the information retrieved via a database search is likely to be incorrect or useless. The converse is also likely to be true. The same applies to the controlled vocabulary terms themselves, although the use of synonyms can overcome difficulties that arise from local terms lacking wider or international recognition.
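The role of synonyms can be illustrated with a minimal Python sketch. The term names and synonym sets below are illustrative assumptions, not entries from any released Plant Ontology vocabulary, and the function names are hypothetical.

```python
# Hypothetical controlled vocabulary: canonical term -> locally used synonyms.
# These entries are illustrative assumptions only.
SYNONYMS = {
    "caryopsis": {"kernel", "grain"},
    "inflorescence": {"flower cluster"},
}


def build_lookup(synonyms):
    """Map every canonical term and every synonym (lower-cased) to its canonical term."""
    lookup = {}
    for canonical, alternatives in synonyms.items():
        lookup[canonical.lower()] = canonical
        for alternative in alternatives:
            lookup[alternative.lower()] = canonical
    return lookup


def normalize(term, lookup):
    """Return the canonical term for a curator-supplied term, or None if unknown."""
    return lookup.get(term.strip().lower())


LOOKUP = build_lookup(SYNONYMS)
```

With such a lookup, a curator entering the local term "Kernel" and another entering "caryopsis" would both end up annotating with the same controlled vocabulary term, while an unrecognized term returns None and can be flagged for review.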
While it is relatively easy to design an ontology based on concrete facts such as names, birthdates etc., it is considerably more difficult to design an ontology based on knowledge that as yet does not have unanimous support or which is not yet well understood. However, the Plant Ontology™ Consortium is attempting to develop various plant ontologies that will represent our current and future understanding of relationships amongst various plant-based knowledge domains. These ontologies would provide shared, common vocabularies of defined terms to describe various knowledge domains in plant-based databases. The relationships between elements (represented by controlled vocabulary terms) within and between ontologies would be represented by the use of Directed Acyclic Graphs (DAGs). A DAG is similar to a hierarchical structure but with the ability to have more than one ‘parent’ for an element in the hierarchy. DAGs are able to represent biological relationships more readily than typical hierarchical structures.
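The difference between a DAG and a strict hierarchy can be sketched in a few lines of Python. This is a minimal illustration rather than the Consortium's data model: the class name OntologyDAG, its methods, and the term "staminate structure" used as a second parent are all assumptions made for the example.

```python
from collections import defaultdict


class OntologyDAG:
    """Minimal directed acyclic graph of controlled vocabulary terms (hypothetical sketch)."""

    def __init__(self):
        # term -> set of direct parent terms; a term may have several parents
        self.parents = defaultdict(set)

    def ancestors(self, term):
        """Return every term reachable by following parent links upward from `term`."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add_relation(self, child, parent):
        """Add a child -> parent edge, refusing any edge that would create a cycle."""
        if child == parent or child in self.ancestors(parent):
            raise ValueError(f"{child!r} -> {parent!r} would create a cycle")
        self.parents[child].add(parent)


# Illustrative terms only; the second parent of "tassel" is an assumption.
dag = OntologyDAG()
dag.add_relation("inflorescence", "plant structure")
dag.add_relation("staminate structure", "plant structure")
dag.add_relation("tassel", "inflorescence")
dag.add_relation("tassel", "staminate structure")  # second parent: not possible in a tree
dag.add_relation("ear", "inflorescence")
```

Here "tassel" has two direct parents, so a search starting from either parent would reach it. The cycle check in add_relation is what keeps the graph acyclic.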
Some of the structured controlled vocabularies being developed would be generic enough to facilitate inter-database queries for related organisms (e.g. monocots and dicots).
Other ontologies would be taxon-specific but could still be interrogated for inter-taxon comparisons. For example, an inter-database query for phenotypes involving the inflorescence should produce 'tassel' and 'ear' phenotypes in maize (Zea mays) and comparable inflorescence phenotypes in Arabidopsis. The relevant associated genomic information can then be obtained from each database for further analysis.
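The inflorescence example can be sketched as follows. The sketch assumes that each database annotates its phenotype records with a term from a shared ontology; the record identifiers, the in-memory "databases" and the function names are hypothetical placeholders, not real MaizeDB or TAIR content.

```python
# Hypothetical shared ontology fragment: term -> set of direct parent terms.
PARENTS = {
    "inflorescence": set(),
    "tassel": {"inflorescence"},
    "ear": {"inflorescence"},
    "leaf": set(),
}

# Hypothetical phenotype records, each annotated with one ontology term.
DATABASES = {
    "maize": [
        {"record": "maize-example-1", "term": "tassel"},
        {"record": "maize-example-2", "term": "ear"},
        {"record": "maize-example-3", "term": "leaf"},
    ],
    "arabidopsis": [
        {"record": "arabidopsis-example-1", "term": "inflorescence"},
        {"record": "arabidopsis-example-2", "term": "leaf"},
    ],
}


def term_and_descendants(term, parents):
    """Return `term` plus every term that has it as a direct or indirect parent."""
    result = {term}
    changed = True
    while changed:
        changed = False
        for child, direct_parents in parents.items():
            if child not in result and direct_parents & result:
                result.add(child)
                changed = True
    return result


def query_all(term, databases, parents):
    """Collect, per database, the records annotated with `term` or any of its descendants."""
    wanted = term_and_descendants(term, parents)
    return {
        name: [record for record in records if record["term"] in wanted]
        for name, records in databases.items()
    }


hits = query_all("inflorescence", DATABASES, PARENTS)
```

Because the query is expanded to the descendants of "inflorescence" before each database is searched, the taxon-specific maize terms and the generic Arabidopsis annotation are retrieved by the same request.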
There is little doubt that the plant sciences community, world-wide, could benefit from having controlled vocabularies of terms arranged in ontologies. Furthermore, the use of these controlled vocabularies would contribute towards consistent data curation and so help meet the information management needs of the plant sciences.
The Plant Ontology™ (PO) Consortium is extending a paradigm developed by the Gene Ontology™ Consortium. The Gene Ontology™ (GO) Consortium (http://www.geneontology.org) has been developing ontologies and associated controlled vocabularies for several years. The objective of the GO consortium has been the development of ontologies and controlled vocabularies for three knowledge domains: the molecular function, biological process and cellular component of gene products. These ontologies are being developed for a generic eukaryotic cell. Several research groups have been annotating their databases according to the controlled vocabularies contained in the ontologies produced by the GO consortium.
The Plant Ontology™ Consortium is a collaboration between representatives of model organism databases and currently comprises the following participants: Gramene (A Comparative Mapping Resource for Grains - http://www.gramene.org/); the International Rice Research Institute (IRRI - http://www.irri.org/Index.htm) associated with The International Crop Information System (ICIS) database (http://www.cgiar.org/icis/); MaizeDB (http://www.agron.missouri.edu/) and the Maize Mapping Project (http://www.cafnr.missouri.edu/mmp/); and The Arabidopsis Information Resource (TAIR - http://www.arabidopsis.org/). These collaborations are focusing on using and extending the GO paradigm to meet the very pressing need for ontology and controlled vocabulary development for plant-based databases. The GO paradigm is effectively described in the General Documentation at http://www.geneontology.org/doc/GO.doc.html and in "Gene Ontology: tool for the unification of biology", The Gene Ontology Consortium (2000) Nature Genet. 25:25-29 (http://www.geneontology.org/GO_nature_genetics_2000.pdf). Representatives of other plant-based databases (e.g. Medicago truncatula, soybean (Glycine max), Phaseolus vulgaris, cassava (Manihot esculenta)) will be invited to become involved in the collaborative efforts of the Plant Ontology™ Consortium.
A website for the Plant Ontology™ Consortium is to be developed. The URL will be: www.plantontology.org.
Relevant entities in MaizeDB (http://www.agron.missouri.edu/) will be annotated with Plant Ontology™ I.D. numbers (PO:id) as these become available and annotation is implemented.
Further information on the Plant Ontology™ Consortium can be obtained from Leszek Vincent (Leszek@missouri.edu).