A large number of pathway databases are now accessible over the Internet. They store information that biologists often need to support their research, but using them together is complicated: each database has its own representation of the same concepts, and each tends to focus on the specific information needed to solve its own research problems. As a result, scientists lose a great deal of time, and integrating these data into a single repository becomes almost impossible.
The proposed solution is to define a standard format for representing pathway data. Some work has been done in this area, for example the PSI-MI format, but none of these efforts can represent all of the most frequently used pathway data types. To meet this challenge, BioPax defines a data exchange format for pathway data that aims to enable integration, exchange, visualization and analysis of data from a wide range of popular pathway databases. Integration is thus reduced to a mapping between the data model of each source and the data model of BioPax. The BioPax format, currently implemented in OWL, is used by many important pathway databases, such as Reactome and BioCyc, to represent their information, and it is a major step forward for the bioinformatics community.
Now that researchers have pathway information in a common format in which concepts follow the same representation, many problems disappear. But what happens when scientists want to work with OWL files? A biologist who works with this kind of file runs into complications that make the format less useful. Searches are time-consuming and the relationships between the entities in the file are very difficult to interpret, so the expert spends too much time on this work and ends up back at the initial problem.
Furthermore, every file the expert works with must first be downloaded to the expert's own machine.
Because of these disadvantages, we propose the development of an information system (IS) that integrates all pathway information, using model-driven development (MDD) and conceptual-model-based methodologies to implement a relational database.
Conceptual modeling is widely used in the IS field because it helps developers describe and understand the problem before implementation, and it can also help manage future development. These techniques have been used in many different domains for a long time, and they also help ensure the quality of the final software product through transformations between models. Once the description is complete, a database can be implemented following current trends in MDD, which suggest generating the database automatically from its conceptual model. We propose a relational database because this kind of database is widely used, has mature technology for managing huge amounts of data, provides a structured and non-redundant organization of data, and uses the standard SQL language. A relational database can therefore solve the problems of the BioPax format described above. So why not use these methodologies to unify information from various repositories and make the work of biologists easier? We think that applying these techniques to a challenging domain such as pathways in bioinformatics is a fascinating task.


Using the BioPax Level 3 structure, we represent pathway data with conceptual modeling techniques in the widely used UML language. Once the conceptual model is established, the corresponding database is created using an MDD tool. This database is intended to act as a unified repository of integrated information that lets experts retrieve data efficiently, makes the connections between data easier to follow, and removes the need to download the files they work with to their own machines. The tool selected to work directly with the database is Hibernate, which makes it possible to automate the storage and retrieval of information. A real application of this approach has been carried out in the laboratory by taking files in BioPax format from a set of databases and storing their information in the implemented database. Finally, a pathway viewer has been developed to show the researcher the relationships between elements visually.

BioPax conceptual model

In this section, we explain the correspondences between the BioPax language and its UML representation, as well as the criteria that we followed to model it. To do so, we took the main elements of a UML representation and associated each of them with its corresponding representation in the BioPax description document. The main elements of a UML representation are classes, properties, relationships, relationship cardinalities and hierarchy.
  • Classes: a concept in the BioPax description document becomes a UML class when it is an entity that is important in the domain and needs to be described. In addition, the BioPax description indicates that the names of all described entities start with an uppercase letter (except kPrime) and are written in blue bold.
  • Properties: each class in the BioPax document has a list of specific properties that define it. These properties appear in lowercase and in italics.
  1. Attributes of a class: the BioPax properties that we treat as attributes in the UML design are those with a primitive type, e.g. String, Integer, etc. Normally, the type and cardinality of these attributes are given implicitly in the BioPax description text.
  2. Association relationships: the same list of properties also shows the relationships between classes. The structure “0...1 object:ClassName”, “0...* object:ClassName” or “1...* object:ClassName” denotes an object relation between the class that owns the property and ClassName, although sometimes this information is only implicit in the BioPax description text. The description gives the cardinality of these relationships in one direction only, which leaves open the cardinality in the other direction. We have assumed that, when this information is not provided, the cardinality in the opposite direction is unconstrained and can take any value; in other words, we assign it the least restrictive cardinality. To make the relationships easier to understand, the BioPax document also includes a graphical representation of them. The relationships between entities are drawn as blue arcs. Each arc carries the name of the relationship and, frequently, when the cardinality is 0...* or 1...*, the name is followed by an asterisk.
  • Inheritance relationships: the parent of each class is indicated in its description by the literal “Parent Class”.
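
The association rule above can be illustrated with a minimal sketch in Python. It is not the procedure we actually followed, which was a manual mapping; it only shows how a declaration in the quoted notation could be turned into a UML-style association. The names `Association` and `parse_association`, and the class and property names in the usage note, are hypothetical and chosen for illustration. The opposite direction is set to 0...*, which is one reading of the "least restrictive cardinality" assumption described above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the object-property notation quoted in the text,
# e.g. "0...1 object:ClassName" or "1...* object:ClassName".
RANGE_PATTERN = re.compile(r"^(\d+)\.\.\.(\d+|\*)\s+object:(\w+)$")


@dataclass(frozen=True)
class Association:
    """UML-style association derived from one property declaration."""
    source: str            # class that owns the property
    name: str              # property name
    target: str            # class named after "object:"
    lower: int             # minimum cardinality, source -> target
    upper: Optional[int]   # maximum cardinality; None stands for "*"
    # Assumption: the description gives no cardinality for the opposite
    # direction, so it is left unconstrained (read here as 0...*).
    reverse_lower: int = 0
    reverse_upper: Optional[int] = None


def parse_association(source: str, name: str, declaration: str) -> Association:
    """Parse a declaration such as '0...* object:ClassName'."""
    match = RANGE_PATTERN.match(declaration.strip())
    if match is None:
        raise ValueError(f"not an object property declaration: {declaration!r}")
    lower, upper, target = match.groups()
    return Association(
        source=source,
        name=name,
        target=target,
        lower=int(lower),
        upper=None if upper == "*" else int(upper),
    )
```

For example, `parse_association("Pathway", "pathwayComponent", "0...* object:Interaction")` yields an association from `Pathway` to `Interaction` with cardinality 0...* in the stated direction and the assumed 0...* in the opposite one. Declarations whose cardinality is only implicit in the description text would still have to be mapped by hand.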

More details about description: here
Once the mapping is complete, we obtain the model. The complete model is attached as an image in the Documents section.

BioPax SQL database

From this conceptual model, a database can be implemented. Given the number of classes generated in the UML representation and the continuous evolution of the pathway domain, we considered implementing the database automatically with an MDD tool. We selected Moskitt as the development tool and used it to create the database. With the database in place, we selected a set of files in BioPax format from some pathway databases and stored their information in the database, which also let us check that the database was well defined. We studied two ways of storing this information: the first was to develop the functionality manually, and the second was to use a tool that simplifies the work. We finally selected Hibernate. Hibernate also makes it possible to retrieve information from the database automatically, but the information returned this way is rather basic, so we decided to build another layer on top of Hibernate to handle more complex selections.
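
To give an idea of what storing a many-valued association relationally and running a "more complex selection" can look like, here is a small self-contained sketch. It is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module instead of the Moskitt-generated schema and Hibernate, and the table names (`pathway`, `interaction`, `pathway_component`), the sample rows and the helper functions are all hypothetical.

```python
import sqlite3

# Hypothetical fragment of a schema: two classes and one 0...* association.
SCHEMA = """
CREATE TABLE pathway (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE interaction (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- A many-valued association is stored in a join table.
CREATE TABLE pathway_component (
    pathway_id     INTEGER NOT NULL REFERENCES pathway(id),
    interaction_id INTEGER NOT NULL REFERENCES interaction(id),
    PRIMARY KEY (pathway_id, interaction_id)
);
"""


def build_demo_database() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO pathway (id, name) VALUES (1, 'example pathway')")
    conn.executemany(
        "INSERT INTO interaction (id, name) VALUES (?, ?)",
        [(1, "step A"), (2, "step B")],
    )
    conn.executemany(
        "INSERT INTO pathway_component (pathway_id, interaction_id) VALUES (?, ?)",
        [(1, 1), (1, 2)],
    )
    conn.commit()
    return conn


def components_of(conn: sqlite3.Connection, pathway_name: str) -> list:
    """Return the names of the interactions linked to the given pathway."""
    rows = conn.execute(
        """
        SELECT i.name
        FROM interaction AS i
        JOIN pathway_component AS pc ON pc.interaction_id = i.id
        JOIN pathway AS p ON p.id = pc.pathway_id
        WHERE p.name = ?
        ORDER BY i.name
        """,
        (pathway_name,),
    )
    return [row[0] for row in rows]
```

Calling `components_of(build_demo_database(), "example pathway")` follows the association through the join table with a single SQL query, which is the kind of relationship lookup that is hard to do directly on OWL files.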

BioPax viewer

Finally, with this information stored in the database, we have developed a pathway viewer so that the user can interact with the pathways visually.