A novel paradigm for cell and molecule interaction ontology: from the CMM model to IMGT-ONTOLOGY

Background
Biology is moving fast toward the virtuous circle of other disciplines: from data to quantitative modeling and back to data. Models are usually developed by mathematicians, physicists, and computer scientists to translate qualitative or semi-quantitative biological knowledge into a quantitative approach. To eliminate semantic confusion between biology and other disciplines, it is necessary to have a coherently defined list of the most important and frequently used concepts.

Results
We propose a novel paradigm for generating new concepts for an ontology, starting from a model rather than from the development of a database. We apply that approach to generate concepts for cell and molecule interaction starting from an agent based model. This effort provides a solid infrastructure that helps to overcome the semantic ambiguities that arise between biologists and mathematicians, physicists, and computer scientists when they interact in a multidisciplinary field.

Conclusions
This effort represents the first attempt at linking molecule ontology with cell ontology in IMGT-ONTOLOGY, the well established ontology in immunogenetics and immunoinformatics, and a paradigm for life science biology. With the increasing use of models in biology and medicine, the need to link different levels, from molecules to cells to tissues and organs, is increasingly important.


Introduction
Biology is a knowledge-based discipline. Many predictions and interpretations of biological data are made by comparing the data against existing knowledge. Traditionally, the knowledge base in biology has resided within the heads of experienced scientists who have devoted much study to their particular domain and become experts in it. This approach worked well in the past, when considerable effort was needed to tease a few new data out of biological experiments. However, this situation is changing rapidly, and biology is moving fast toward the virtuous circle of other disciplines: from data to quantitative modeling and back to data. Models are usually developed by mathematicians, physicists, and computer scientists to translate qualitative or semi-quantitative biological knowledge into a quantitative approach [1].
To eliminate semantic confusion between biology and other disciplines, it is necessary to have a coherently defined list of the most important and frequently used concepts. Those involved can then use such a set of definitions to create new models and software, to provide an exact semantic specification of the concepts used in an existing schema, and to curate and annotate existing database entries consistently. It is important to understand that semantic ambiguities can also arise between human experts. However, in the course of a conversation, enough background knowledge and context is usually available that semantic ambiguities are most often resolved faster than they are even consciously recognized. This is possible because of human intelligence, which computers, programs and databases will still fall short of, at least for the near future.
An ontology describes basic concepts in a domain and defines relations among them. Basic building blocks of ontology design include concepts and their instances; properties of each concept describing various features and attributes of the concept (slots, sometimes called roles or properties); and restrictions on slots (facets, sometimes called role restrictions). An ontology provides a common vocabulary for researchers who need to share information in the domain and makes it possible to build knowledge databases. Ontologies are widely used in biology and medicine and several important ontology systems have been established. They contribute to a precise and exhaustive way to access bio-information and define concepts in a precise and rigorous way [2][3][4][5][6][7][8][9]. Interestingly, despite or because of the complexity of the immune response, IMGT-ONTOLOGY, the first ontology for immunogenetics and immunoinformatics, is also conceptually one of the more advanced biological ontologies [2][3][4][5][6]. IMGT®, the international ImMunoGeneTics information system® (http://www.imgt.org) [10], has been built on it.
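The building blocks named above (concepts, instances, slots and facets) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the class names `Concept` and `Slot`, the example concept `Cell` and the slot `life_status` are hypothetical and are not taken from IMGT-ONTOLOGY or from any ontology tool.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """A property of a concept; allowed_values plays the role of a facet."""
    name: str
    allowed_values: frozenset


@dataclass
class Concept:
    """A concept with its slots and its instances (all names hypothetical)."""
    name: str
    slots: dict = field(default_factory=dict)      # slot name -> Slot
    instances: dict = field(default_factory=dict)  # instance name -> slot values

    def add_slot(self, name, allowed_values):
        self.slots[name] = Slot(name, frozenset(allowed_values))

    def add_instance(self, name, **values):
        # Check every supplied value against the facet of its slot.
        for slot_name, value in values.items():
            slot = self.slots.get(slot_name)
            if slot is None:
                raise KeyError(f"unknown slot: {slot_name}")
            if value not in slot.allowed_values:
                raise ValueError(f"{value!r} violates the facet of slot {slot_name!r}")
        self.instances[name] = dict(values)


# Hypothetical example: a concept with one restricted slot and one instance.
cell = Concept("Cell")
cell.add_slot("life_status", {"alive", "dying", "dead"})
cell.add_instance("cell_1", life_status="alive")
```

The facet is modeled here as a simple set of allowed values; real ontology languages offer richer restrictions (cardinality, value types), which this sketch does not attempt to cover.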
Other important efforts are underway to link models at different scales by means of markup languages (e.g. XML). The CellML project (http://www.cellml.org) is one of them. The CellML language is an open standard based on the XML markup language. CellML is being developed by the Auckland Bioengineering Institute at the University of Auckland and affiliated research groups. The purpose of CellML is to store and exchange computer-based mathematical models. CellML allows scientists to share models even if they are using different modeling tools. It also enables them to reuse components from one model in another, thus accelerating model development.
Whereas usually ontologies led to knowledge databases, in what follows, we adopted another approach in which concepts for an ontology of cell and molecule interaction were generated starting from an agent based model (ABM), the Catania Mouse Model (CMM for short) and its computer implementation, the SimTriplex simulator [11,12]. SimTriplex simulates the immune system response elicited by the Triplex vaccine [13,14] against mammary carcinoma. This effort provides a solid infrastructure that is useful to overcome the semantic ambiguities that arise between biologists and mathematicians, physicists, and computer scientists, when they interact in such a multidisciplinary field.
The development of ontologies for molecular and cellular biology information, and the sharing of those ontologies within the bioinformatics community, are central problems in bioinformatics. If the bioinformatics community is to share ontologies effectively, ontologies must be exchanged in a form that uses standardized syntax and semantics. For this reason, while the initial motivation of our study was to present an ontology for the CMM, the paradigm we show here has wider applications as it bridges the molecule ontology with cell ontology (Figure 1). This is achieved by defining, at the same time, interactions in terms of cellular and molecular components of a biological system.

Implementation
The Catania Mouse Model (CMM) has been developed using the Unified Modeling Language (UML, http://www.uml.org/). UML is a diagramming language or notation to specify, visualize and document different types of models and object oriented software systems. UML helps in visualizing design and in communication. We used Umbrello UML Modeller (http://uml.sourceforge.net), an open source tool for creating and managing UML based models. UML was selected because it is a widely used system for the representation of objects and their relationships. Moreover, the Umbrello tool was used to export the CMM classes into Extensible Markup Language/Resource Description Framework (XML/RDF), in order to create the concepts of CMM-Ontology. XML was developed by the W3C (http://www.w3.org). The current standard for the XML Schema Language is controlled by the XML Schema Working Group of the W3C. XML is a good candidate for sharing ontologies because of the significance of the Web and Web-based applications [15][16][17]. It is clear that the Web is rapidly becoming the primary method for the exchange of information and data, and that XML is currently the leading candidate for a generic language for the exchange of semi-structured objects.
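To give an idea of what a class hierarchy looks like once serialized as XML/RDF, the following Python sketch writes a small set of classes with the standard library. It is a hand-written illustration, not the output of the Umbrello export: the base URI `http://example.org/cmm-ontology#` and the class names `Cell` and `BLymphocyte` are assumptions made for the example, while the RDF and RDFS namespace URIs are the standard W3C ones.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
BASE = "http://example.org/cmm-ontology#"  # hypothetical base URI

ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)


def classes_to_rdf(classes):
    """Serialize {class name: parent class name or None} as an RDF/XML string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for name, parent in classes.items():
        # One rdfs:Class element per model class, identified by rdf:about.
        node = ET.SubElement(root, f"{{{RDFS}}}Class", {f"{{{RDF}}}about": BASE + name})
        label = ET.SubElement(node, f"{{{RDFS}}}label")
        label.text = name
        if parent is not None:
            # A UML generalization becomes an rdfs:subClassOf statement.
            ET.SubElement(node, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": BASE + parent})
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-class hierarchy.
rdf_xml = classes_to_rdf({"Cell": None, "BLymphocyte": "Cell"})
print(rdf_xml)
```

Only generalization is mapped in this sketch; associations and aggregations would need additional RDF properties.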

Distribution
CMM-Ontology main concepts, with controlled vocabularies and rules, are publicly available from the Computational Immunology and Immunomics Group homepage at http://www.immunomics.eu. Available formats are XML and XMI. The concepts of identification for cellular components and the concepts of interaction have been added to IMGT-ONTOLOGY (http://www.imgt.org).

From the model to the CMM-Ontology main concepts
CMM-Ontology concepts were generated from the model CMM. They provide a semantic standardization of the knowledge in the biological modeling field. They are used to identify the main biological entities used in the model as well as their interactions. We focus on two main types of concept: the concepts of identification and the concepts of interaction. These concepts bridge the gap between molecular component ontology and cellular component ontology. It is expected that they will allow scientists to easily identify the main biological entities they use, to model any given biological scenario.

Concepts of Identification
Concepts of identification for molecular components have been analyzed extensively in IMGT-ONTOLOGY [3][4][5][6]. We therefore focus, in CMM and CMM-Ontology, on the identification of the cellular entities involved in modeling the competition between cancer and the immune system with or without exogenous stimulation with a cancer vaccine. In CMM, and in accordance with UML, the classes have been developed as a "class diagram" (these classes correspond to concepts in CMM-Ontology). A class defines the attributes and the methods of a set of objects. All objects of a given class (instances of this class) share the same behavior, and have the same set of attributes (each object has its own set). In UML, classes are represented by rectangles, with the name of the class, and can also show the attributes and operations of the class in two other "compartments" inside the rectangle. Interfaces are abstract classes, which means that instances cannot be directly created from them. They can contain operations but not attributes. An association represents a relationship between classes, and gives the common semantics and structure for many types of "connections" between objects. Associations are the mechanism that allows objects to communicate with each other. In UML, associations are represented as lines connecting the classes participating in the relationship. Aggregations are a special type of association in which the two participating classes do not have an equal status, but make a "whole-part" relationship. An aggregation describes how the class that takes the role of the whole is composed of (has) other classes, which take the role of the parts. In UML, aggregations are represented by an association line that ends in a diamond on the side of the whole. A generalization association between two classes puts them in a hierarchy representing the concept of inheritance of a derived class from a base class. In UML, generalizations are represented by a line connecting the two classes, with an ending arrow on the side of the base class. Containment associations represent an operation implementation. In UML, containments are represented by a line with a circle.

Figure 1 Relations between the Molecule ontology of IMGT-ONTOLOGY that comprises concepts of identification, description and numerotation [3][4][5][6] and the CMM-Ontology that made it possible to define concepts of interactions, a novel component of IMGT-ONTOLOGY (this paper). Whereas the concepts of identification, description and numerotation were defined to manage data in IMGT databases and tools and therefore immunogenetic knowledge in normal and pathological conditions (cancer, allergy, autoimmune disease...) [6,10], the concepts of identification were defined from mathematical and statistical models related to vaccination protocol [11,12].

In CMM-Ontology, the concepts of identification and their instances, generated from the model, are the following:
• CellFunction. This concept identifies the functions that a cell can perform. Instances of this concept are: helper (helper 1, helper 2), regulatory, cytotoxic, antigen presenting cell (APC).
• CellDifferentiationStatus. This concept identifies the differentiation status of a cell. Instances of this concept are, for example, memory cell, plasma cell.
• CellLifeStatus. This concept identifies the life status of a cell and was specifically included for modeling purposes. Instances of this concept are: alive (i.e. a cell is performing its own job), dead (a cell to be removed from the system) or dying (a cell that has started an apoptosis process and is expected to perform some other actions before it dies, e.g. releasing antigens or some other cell product).
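The three identification concepts listed above map naturally onto enumerations attached to a cell class. The sketch below is one possible Python rendering, assuming a hypothetical `Cell` class with a free-text `kind` field and hypothetical method names; it is not the SimTriplex implementation. The instances of CellDifferentiationStatus are limited to the two examples given in the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CellFunction(Enum):
    HELPER_1 = "helper 1"
    HELPER_2 = "helper 2"
    REGULATORY = "regulatory"
    CYTOTOXIC = "cytotoxic"
    APC = "antigen presenting cell"


class CellDifferentiationStatus(Enum):
    MEMORY = "memory cell"
    PLASMA = "plasma cell"


class CellLifeStatus(Enum):
    ALIVE = "alive"
    DYING = "dying"
    DEAD = "dead"


@dataclass
class Cell:
    """Hypothetical cell agent carrying the three identification concepts."""
    kind: str  # free-text cell type, e.g. "T lymphocyte" (assumption)
    life_status: CellLifeStatus = CellLifeStatus.ALIVE
    function: Optional[CellFunction] = None
    differentiation_status: Optional[CellDifferentiationStatus] = None

    def label(self) -> str:
        # Same bracket notation as in instance names such as
        # "T lymphocyte [cytotoxic]_Cancer cell".
        if self.function is None:
            return self.kind
        return f"{self.kind} [{self.function.value}]"

    def start_apoptosis(self) -> None:
        # alive -> dying; the cell may still act before it is removed.
        if self.life_status is CellLifeStatus.ALIVE:
            self.life_status = CellLifeStatus.DYING

    def die(self) -> None:
        # dying (or alive) -> dead; the cell is to be removed from the system.
        self.life_status = CellLifeStatus.DEAD


t_cell = Cell("T lymphocyte", function=CellFunction.CYTOTOXIC)
t_cell.start_apoptosis()
```

Keeping the life status separate from function and differentiation status mirrors the text: the life status exists for modeling purposes, whereas the other two describe the biology of the cell.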

Concepts of interaction
An interaction between two entities is a complex action which eventually ends in a status change of one or both entities. In the immune system, interactions can be specific (adaptive immunity) or non specific (innate immunity). Specific interactions characterize the adaptive immune response and comprise a specific recognition phase between two entities, the antigen receptor and an antigen. These interactions involve the recognition of an antigen by:
• either an immunoglobulin (IG) specific for that antigen (in CMM, native p185 antigen). The antigen can be either soluble or adsorbed at the surface of a follicular dendritic cell in the lymph node;
• or a T cell receptor (TR) specific for a peptide/MHC. The peptide (p) resulting from antigen processing by a cell is presented at the surface of that cell in the groove of a MHC protein of class I or II (MHC-I or MHC-II) [18]. A TR is specific for a pMHC-I or pMHC-II (in CMM, peptides processed from p185 and presented by MHC-I or MHC-II).
Immune recognition can possibly be enhanced by adjuvants.
In CMM, the interactions have been modeled using component diagrams. They show the components (either component technologies or sections of the system which are clearly distinguishable) and the artifacts they are made of, such as source code files, or relational database tables. Components can have interfaces (i.e. abstract classes with operations) that allow associations between components.
In CMM-Ontology, the concepts of interaction and their instances, generated from the model, are the following (Figure 3):
1. The "Molecule_Molecule_Interaction" concept. If the Molecule is a soluble immunoglobulin (IG) specific for an antigen, and if the other Molecule encountered is that antigen (Ag), IG binds to Ag and forms an immunocomplex (that can be captured by a macrophage). That instance of the "Molecule_Molecule_Interaction" concept is
• Immunoglobulin_Antigen. In CMM, Ag is the native p185 antigen (Figure 4).
2. The "Cell_Molecule_Interaction" concept. If the Cell is a B lymphocyte, a macrophage or a dendritic cell, and if the Molecule is an antigen, the cell can internalize the native antigen, process it and present it as peptide bound to MHC-II (pMHC-II) protein at the cell surface. The cell becomes a professional antigen presenting cell (or APC). Three instances can therefore be defined:
• B lymphocyte_Antigen (Figure 5). If, in a lymph node, a naive B lymphocyte expresses at the cell surface a membrane IG which is specific for the native antigen (in CMM, p185 antigen), B lymphocyte internalizes the membrane IG and the bound Ag and processes the IG-Ag complex into peptides which are then presented by MHC-II (pMHC-II) at the B lymphocyte surface. B lymphocyte becomes an APC.
• Macrophage_Antigen. If a macrophage encounters a native antigen (in CMM-Ontology, p185 antigen) or an immunocomplex, the macrophage internalizes the antigen or the immunocomplex and processes it into peptides which are then presented by MHC-II (pMHC-II) at the macrophage cell surface. Macrophage becomes an APC.
• Dendritic cell_Antigen. If a naive dendritic cell encounters a native antigen (in CMM, p185 antigen) or an immunocomplex, the dendritic cell internalizes the antigen or the immunocomplex and processes it into peptides which are then presented by MHC-II (pMHC-II) at the dendritic cell surface. Dendritic cell becomes an APC.
3. The "Cell_Cell_Interaction" concept.
• T lymphocyte [cytotoxic]_Cancer cell (Figure 7A).
• T lymphocyte [cytotoxic]_Vaccine cell (Figure 7B).
4. The "Molecule_Cell_Interaction" concept. If the Molecule is a specific soluble immunoglobulin (IG) and the Cell is a cancer cell (or a vaccine cell) that expresses the antigen at its cell surface (in CMM, p185), the soluble IG recognizes specifically the antigen (Figure 8). The opsonized cell (cell with bound IG on its surface) may be killed by complement dependent cytotoxicity (CDC) or by antibody dependent cell cytotoxicity (ADCC). At the molecular level, the first interaction is the recognition by the IG of the antigen expressed at the surface of the cancer cell or vaccine cell. The two corresponding instances of the "Molecule_Cell_Interaction" concept are:
• Immunoglobulin_Cancer cell.
The consequence of the "Molecule_Cell_Interaction" in these two instances, that results in the killing (by CDC or ADCC) of the opsonized cancer cell or vaccine cell involves new instances of concepts defined above:
• "Molecule_Cell_Interaction", for CDC (with the instance Complement_Opsonized cell), or
• "Cell_Cell_Interaction", for ADCC (with the instance Natural killer_Opsonized cell).
In CDC, complement (C1q) binds to the Fc of IG at the surface of the opsonized cell (cancer cell or vaccine cell), the complement cascade is activated, the membrane attack complex (MAC) is formed in the cell membrane and the cell is killed. In ADCC, Fc receptor gamma of natural killer cell (NK) binds to the Fc of IG at the surface of the opsonized cell (cancer cell or vaccine cell), NK kills the cell (cancer cell or vaccine cell).

Figure 4 Immunoglobulin_Antigen as an instance of the "Molecule_Molecule_Interaction" concept. This instance is shown as IG_Ag in Fig. 2. In CMM, Ag is represented by p185.

Figure 5 B lymphocyte_Antigen as an instance of the "Cell_Molecule_Interaction" concept. This instance is shown as B_Ag in Fig. 2. In CMM, Ag is represented by p185. Note that the interaction between the membrane IG, on the surface of the B lymphocyte, and the antigen is also an instance of the "Molecule_Molecule_Interaction". This example illustrates (as the following figures) how the IMGT-ONTOLOGY concepts of interactions allow to bridge the gap between the cell and molecule levels.
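The naming scheme used for the four interaction concepts and their instances can be sketched in a few lines of Python. The sketch assumes that a concept name is derived from the kinds (Molecule or Cell) of the two participants, in order, and that an instance name joins the two participant names with an underscore, as in the list above. The `Entity` class and the two helper functions are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    MOLECULE = "Molecule"
    CELL = "Cell"


@dataclass(frozen=True)
class Entity:
    """A participant in an interaction (hypothetical helper class)."""
    name: str
    kind: Kind


def interaction_concept(first: Entity, second: Entity) -> str:
    # e.g. Molecule + Cell -> "Molecule_Cell_Interaction"
    return f"{first.kind.value}_{second.kind.value}_Interaction"


def interaction_instance(first: Entity, second: Entity) -> str:
    # e.g. "Immunoglobulin" + "Antigen" -> "Immunoglobulin_Antigen"
    return f"{first.name}_{second.name}"


def describe(chain):
    """Return (concept, instance) pairs for a succession of interactions."""
    return [(interaction_concept(a, b), interaction_instance(a, b)) for a, b in chain]


ig = Entity("Immunoglobulin", Kind.MOLECULE)
ag = Entity("Antigen", Kind.MOLECULE)
complement = Entity("Complement", Kind.MOLECULE)
b_cell = Entity("B lymphocyte", Kind.CELL)
cancer_cell = Entity("Cancer cell", Kind.CELL)
nk = Entity("Natural killer", Kind.CELL)
opsonized = Entity("Opsonized cell", Kind.CELL)

# Killing of an opsonized cancer cell as a succession of two interactions.
adcc = describe([(ig, cancer_cell), (nk, opsonized)])
cdc = describe([(ig, cancer_cell), (complement, opsonized)])

for concept, instance in adcc + cdc:
    print(f"{concept}: {instance}")
```

Representing ADCC and CDC as ordered lists reflects the point made in the text that the first interaction (recognition of the antigen by IG) leads to a second interaction of a different type.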

Discussion
Figure 8 Immunoglobulin_Cancer cell as an instance of the "Molecule_Cell_Interaction" concept in models related to immunotherapy, as defined in CMM. This instance is shown as IG_CC in Fig. 2. The interaction between the immunoglobulin and the antigen (Ag), on the surface of the cancer cell (in CMM, Ag is represented by p185), is the starting event that leads to effector interactions: (i) binding, at the cell level, of natural killer (NK) to cancer cell ("Cell_Cell_Interaction") and, at the molecule level, of the Fc receptor (FcR), on the surface of NK, to the IG that has recognized Ag on the cancer cell surface ("Molecule_Molecule_Interaction") and (ii) binding of the complement (Cpt) to the IG that has recognized Ag on the cancer cell surface ("Molecule_Molecule_Interaction"). These interactions by NK or complement lead to the killing of the cancer cell by 'antibody dependent cell cytotoxicity' (ADCC) or 'complement dependent cytotoxicity' (CDC), respectively (IMGT Education > IMGT Lexique, ADCC and CDC, http://www.imgt.org).

An interaction described at the "Cell_Cell_Interaction" level focuses on the activity, activation status, function, differentiation status and/or life status of the cells that interact. If needed, the "Cell_Cell_Interaction" can also be identified at the "Molecule_Cell_Interaction", or "Cell_Molecule_Interaction", or "Molecule_Molecule_Interaction" levels. The level of granularity will depend on the model and on the kind of data that are available for the modeling. The four types of interactions can be used to identify, not only a given interaction, but also a complex succession of interactions, as described above for the killing by CDC or ADCC. Finally we have shown that these concepts of interactions identified in CMM-Ontology are general for the immune response and, for that reason, they have been added to IMGT-ONTOLOGY [6], the reference in immunogenetics and immunoinformatics. This effort represents, as far as we know, the first attempt at linking the well established molecular ontology with cellular ontology in cancer immunology. With the increasing use of models in biology and medicine, the need to link different levels, from molecules to cells to tissues and organs, is increasingly important. Indeed, only a unified ontology framework will make it possible to link models at different scales.
Attempts in that direction are, at the moment, based on the use of markup languages [15][16][17], e.g. XML, but a general framework is still to come. In modeling other pathologies [19][20][21], we found that using an ontology driven approach, itself generated from a model, sped up the process of model construction and clarified the biologists' needs regarding model definition. We believe that this is a powerful methodology.
A model is a formal description of biological knowledge and its quantitative formulation using mathematical or computational tools. An ontology based description of these tools would clarify to biologists the value of the model results. However, very few attempts in this direction have been made for modeling in physics and engineering [22]. We expect that the increasing interest in mathematical modeling in life science will lead to an increasing interest in this aspect in the near future.

Conclusion
In this paper we presented a first attempt, generated from a model, at defining an integrated molecular-cellular ontology to be used in modeling biological problems. As the overall goal of this approach is to describe biological entities in a standardized way, we plan to adapt this methodology in the future to the most widely used software tool in this field, i.e. Protégé (http://protege.stanford.edu). Work in this direction is in progress and results will be published in due course.