Javier Santoyo, 05/31/2010 08:54 am


SNOW methods

SNOW stands for “Studying Networks in the Omic World”. SNOW extracts and evaluates the cooperative behavior of lists of proteins/genes in terms of protein-protein interactions (ppis). The complete set of protein-protein interactions in the cell is called the interactome.

SNOW identifies hubs in the list of proteins/genes (nodes) and evaluates the global degree of connections, centrality and neighborhood aggregation of the list by comparing the distributions of the nodes' connections degree, betweenness centrality and clustering coefficient, respectively, against the complete distributions of these parameters in the interactome of reference. Furthermore, SNOW extracts the minimum network that connects the proteins/genes in the list. A user-fixed number of external proteins may be introduced to connect nodes in the list. The topology of this network is evaluated by comparing the distributions of its node, edge and graph parameters against pre-calculated distributions obtained from a set of 10,000 random lists in the same size range. Thus, SNOW reports whether the network represented by the list has more hubs, is more connected or has a more regular distribution of connections than a random network. A comparison of two lists is also implemented.

SNOW also provides an interactive visualization of the network and a complete description of the interactomic and local network parameters of each protein/gene in the list, as well as of the external nodes introduced by the program. This information, together with the provided functional annotation, will guide the user in identifying the important nodes within, or even outside, the list and in evaluating the modular functionality of the list as an entity.

Protein-Protein interactions (ppi) databases

Currently SNOW uses human ppi datasets downloaded from the five main public databases:

  • HPRD (release 7 downloaded 31/03/2009)
  • IntAct (downloaded 31/03/2009)
  • BIND (release 2007-05-10)
  • DIP (release Hsapi20090126)
  • MINT (release 2009/02/05)

Entries in the databases were mapped to Ensembl transcripts and Ensembl genes using Ensembl v44.

All those collections of ppi data are used to generate two different types of interactomes for both transcripts and genes:

  • A non-filtered interactome - all available ppis.
  • A filtered interactome - restricted by methodology (only ppis detected by at least two different methodologies).

To build the filtered interactome, the six top categories of experimental methods described in the Molecular Interaction (MI) Ontology, plus the categories in vivo and in vitro from HPRD, were taken as reference. Every ppi in each of the datasets was annotated with these categories. Ppis verified by at least two of these methods were included in the filtered interactome.
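The filtering rule can be sketched as follows. This is a minimal illustration, not SNOW's actual code: the function name `build_filtered_interactome`, the record layout and the method labels in the example are assumptions made for the sketch.

```python
from collections import defaultdict

def build_filtered_interactome(records):
    """Keep only the ppis supported by at least two different method categories.

    records: iterable of (protein_a, protein_b, method_category) tuples
    (hypothetical layout, chosen for this sketch).
    """
    methods_by_pair = defaultdict(set)
    for a, b, method in records:
        pair = tuple(sorted((a, b)))  # interactions are treated as undirected
        methods_by_pair[pair].add(method)
    return {pair for pair, methods in methods_by_pair.items() if len(methods) >= 2}

# Illustrative records; the method labels are placeholders.
records = [
    ("P1", "P2", "biochemical"),
    ("P2", "P1", "in vivo"),        # same pair, second method: kept
    ("P2", "P3", "biophysical"),
    ("P2", "P3", "biophysical"),    # same method reported twice: not kept
]
filtered = build_filtered_interactome(records)
```

Repeated reports of the same method for a pair count once, because the methods are stored in a set.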

Software

SNOW uses the Boost Graph Library as the software core for calculating graph parameters.

Network parameters

  • Connections (connections degree) - the number of edges, interaction events, for a node.
  • Shortest path - Dijkstra's algorithm was used to calculate shortest paths between nodes.
  • Betweenness - A measure of centrality. It is calculated as the number of shortest paths that pass through a node divided by the total number of shortest paths in the network.
  • Clustering coefficient - A measure of how interconnected the neighbours of a node are. It is the number of links between the vertices within the node's neighbourhood divided by the number of links that could possibly exist between them.
  • Component - a group of nodes in which every node can be reached from every other node.
  • Bicomponent - a group of nodes that remains connected after any single one of its nodes is removed.
  • Articulation point - a node that joins two bicomponents in a graph; removing it disconnects them.
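As an illustration of two of the node parameters listed above, the sketch below computes the connections degree and the clustering coefficient on a small undirected graph in plain Python. It is not the Boost-based code used by SNOW; the helper names and the toy edges are assumptions, and nodes with fewer than two neighbours are given a coefficient of 0 by convention.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree(adj, node):
    """Connections degree: number of edges of the node."""
    return len(adj[node])

def clustering_coefficient(adj, node):
    """Links among the node's neighbours divided by the possible links."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention used in this sketch
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

# Toy network: A is linked to B, C and D; only B and C are linked to each other.
adj = build_adjacency([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])
```

Here node A has degree 3, and one of the three possible links among its neighbours exists, giving a clustering coefficient of 1/3.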

Evaluating lists by their interactome parameters

To evaluate the importance of a list of proteins/genes (nodes) in an interactome, the nodes are mapped onto the interactome and their network parameters are extracted. For each measurement (connections degree, betweenness centrality and clustering coefficient), the distribution of the values for the nodes in the list is compared against the distribution of that parameter in the interactome of reference using the Kolmogorov-Smirnov test.
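The statistic behind this comparison can be sketched as below. This is a minimal pure-Python version of the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical cumulative distributions); it does not compute a p-value, and the function name is an assumption. In practice a statistics library routine, for example SciPy's `ks_2samp`, would normally be used instead.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D (no p-value)."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Hypothetical degree values: nodes in a list versus a reference sample.
list_degrees = [1, 2, 3, 4]
reference_degrees = [3, 4, 5, 6]
d = ks_statistic(list_degrees, reference_degrees)
```

A value of D near 0 means the two distributions are similar; a value near 1 means they barely overlap.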

Evaluating Minimal Connected Network

The Minimal Connected Network (MCN) of a set of nodes is the minimal network that connects them. It is calculated by obtaining the shortest path between each pair of nodes in the interactome of reference. Paths that connect two nodes either directly or through up to a given number of non-listed nodes are introduced in the MCN.
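One possible reading of this construction is sketched below. It is an illustration under stated assumptions, not SNOW's implementation: breadth-first search stands in for Dijkstra's algorithm (on an unweighted graph both give paths of the same length), only one shortest path per pair is considered, and the function names and toy interactome are made up for the example.

```python
from collections import deque
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def shortest_path(adj, source, target):
    """Breadth-first search; returns one shortest path as a node list, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):  # sorted for reproducible results
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

def minimal_connected_network(adj, listed, max_external=1):
    """Edges of the shortest paths between listed nodes that use at most
    max_external non-listed intermediate nodes."""
    listed = set(listed)
    mcn_edges = set()
    for a, b in combinations(sorted(listed), 2):
        if a not in adj or b not in adj:
            continue  # node not present in the interactome
        path = shortest_path(adj, a, b)
        if path is None:
            continue
        external = [n for n in path[1:-1] if n not in listed]
        if len(external) <= max_external:
            mcn_edges.update(tuple(sorted(edge)) for edge in zip(path, path[1:]))
    return mcn_edges

# Toy interactome: A-X-B-C-Y-Z-D, where X, Y and Z are not in the list.
adj = build_adjacency([("A", "X"), ("X", "B"), ("B", "C"),
                       ("C", "Y"), ("Y", "Z"), ("Z", "D")])
mcn = minimal_connected_network(adj, ["A", "B", "C", "D"], max_external=1)
```

With one external node allowed, A and B are joined through X and B and C are joined directly, while D stays out of the network because reaching it needs two non-listed nodes.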

SNOW calculates the topological parameters of the MCN under study. These are parameters related to a single node (connections degree, betweenness centrality and clustering coefficient) and parameters related to the whole network (number of components).

The distribution of each topological parameter (betweenness centrality, connections degree and clustering coefficient) in the studied MCN is compared, using the Kolmogorov-Smirnov test, against an empirical distribution of that parameter generated from 10,000 MCNs calculated from same-sized random lists of proteins.

These empirical distributions are generated as follows:

  1. A set of list-size ranges is selected: 3-10, 10-20, 20-50, 50-100, 100-150, 150-200, 200-250, 250-300, 300-350, 350-400, 400-450, 450-500.
  2. For each size range, we generate 10,000 lists of proteins of random size (within the limits of the range) and random composition (random lists of proteins).
  3. For each list of random proteins, we calculate 4 MCNs, allowing 0, 1, 2 and 3 non-listed proteins in the shortest paths. This yields 4 sets of MCNs.
  4. For each MCN in each of the sets from step 3, we select a random node and extract its topological parameters (betweenness centrality, connections degree and clustering coefficient).
  5. The 10,000 values of each topological parameter obtained in step 4 constitute the empirical distribution of that parameter for an MCN (characterised by the number of non-listed proteins allowed in the shortest paths) generated from a list of random proteins of a given size.
  6. We also store the number of components of each MCN in each of the sets generated in step 3. These values constitute the empirical distribution of the number of components of an MCN (characterised by the number of non-listed proteins allowed in the shortest paths) generated from a list of random proteins of a given size.
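The sampling loop in steps 2 to 5 can be sketched as follows. This is a hedged illustration only: `empirical_distribution`, its parameters and the toy ring network are assumptions, and the `induced_subgraph` function is a simple stand-in for the real MCN construction, used so that the example stays short and runnable.

```python
import random

def empirical_distribution(universe, size_range, build_network, node_parameter,
                           n_lists=10000, seed=None):
    """Draw random lists, build one network per list and record the
    parameter of one randomly chosen node per network."""
    rng = random.Random(seed)
    low, high = size_range
    values = []
    for _ in range(n_lists):
        size = rng.randint(low, high)             # random size within the range
        random_list = rng.sample(universe, size)  # random composition
        network = build_network(random_list)      # adjacency map of the network
        if not network:
            continue
        node = rng.choice(sorted(network))        # one random node per network
        values.append(node_parameter(network, node))
    return values

# Toy reference network: 20 proteins arranged in a ring (illustrative only).
NODES = [f"P{i}" for i in range(20)]
RING = {n: {NODES[(i - 1) % 20], NODES[(i + 1) % 20]} for i, n in enumerate(NODES)}

def induced_subgraph(listed):
    """Stand-in for the MCN: keeps only the links between listed proteins."""
    keep = set(listed)
    return {n: RING[n] & keep for n in keep}

def degree(network, node):
    return len(network[node])

values = empirical_distribution(NODES, (3, 10), induced_subgraph, degree,
                                n_lists=200, seed=1)
```

Running the same loop once per size range and per number of allowed non-listed proteins would give one empirical distribution for each combination, as described in step 5.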

For the number of components of the studied MCN, we provide a 95% confidence interval of the number of components, computed from the set of MCNs generated from the 10,000 same-sized lists of random proteins/genes.
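A minimal sketch of this step is given below. It counts the components of a network and derives an interval from the empirical values by taking symmetric percentiles. The two-sided percentile rule, the function names and the example values are assumptions; the source does not state how the interval is computed.

```python
import math

def count_components(nodes, edges):
    """Number of connected components; isolated nodes count as one each."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return components

def empirical_interval(values, coverage=0.95):
    """Symmetric percentile interval of the empirical values (assumed rule)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    tail = (1 - coverage) / 2
    last = len(ordered) - 1
    lower = ordered[math.floor(tail * last)]
    upper = ordered[math.ceil((1 - tail) * last)]
    return lower, upper

# Illustrative values only: component counts 1..100 standing in for random MCNs.
interval = empirical_interval(range(1, 101))
observed = count_components(["A", "B", "C", "D", "E"], [("A", "B"), ("C", "D")])
```

An observed number of components outside the interval would suggest that the studied MCN is more, or less, fragmented than MCNs built from random lists of the same size.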

Two lists comparison

In a two-list comparison, both the interactome parameters and the MCN node parameters of the nodes in the two lists are compared using the Kolmogorov-Smirnov test.

References

  1. Minguez P, Götz S, Montaner D, Al-Shahrour F, Dopazo J. SNOW, a web-based tool for the statistical analysis of protein-protein interaction networks. Nucleic Acids Res 2009 Jul;37(Web Server issue):W109-114.