
NetAligner

fast and accurate alignment of protein interaction networks

NetAligner features fast complex to interactome, pathway to interactome and interactome to interactome alignment, respecting one-to-many and many-to-many homology relationships. For full functionality of the webserver, please enable JavaScript and Flash support in your browser.

Help page overview


NetAligner strategy

  1. We collect all pairs of homologous proteins between the two input networks. Each of those pairs is represented by a vertex, and the user can choose to exclude distant homologs by setting a threshold on the vertex probability, which is calculated as the posterior probability of the respective proteins being real homologs given their BLAST E-value. In the alignment visualization, vertex probabilities are indicated by different shades of blue, ranging from 0 (white) to 1 (blue).
  2. The initial alignment graph is constructed by drawing edges between vertices that are involved in a conserved interaction (green). Optionally, the user can choose to predict likely conserved interactions for all pairs of homologs with an interaction in at least one of the input networks (yellow or blue), based on the difference between the evolutionary distances of the two pairs of homologous proteins involved. This is motivated by the notion that the more similar the evolutionary pressures acting upon two interacting proteins, the higher the probability that their interaction is conserved. Edges with a probability below the given edge probability threshold are filtered out.
  3. To identify alignment solution seeds, we search for connected components in the initial alignment graph (red ellipses), representing conserved core-complexes or -subnetworks.
  4. The alignment graph is then extended by connecting vertices of different seeds through gap (orange or magenta) or mismatch edges (red) if the given homologs are connected by indirect interactions in one or both input networks, respectively. This serves to identify conserved complexes, pathways or subnetworks, allowing a certain extent of network rewiring during evolution. Again, the edge probability threshold is used to filter out false positives.
  5. Lastly, we search for connected components in the extended alignment graph, which represent the final alignment solutions (red ellipses). All solutions are scored and ranked, and their statistical significance is determined using a Monte-Carlo permutation test. The score of each alignment solution, represented by the graph $G$, is computed as the weighted sum over all its vertex scores $S_v$ and edge scores $S_e$, with the user-defined weight $\alpha$ specifying the vertex to edge score balance. Individual vertex and edge scores are calculated as the logarithm of the respective vertex probability $P_v^{A/A'}$ or edge probability $P_e^{A/A',B/B'}$, each increased by 1 before taking the logarithm so that solutions can be ranked by decreasing score, with A/A' and B/B' denoting pairs of homologous proteins.
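Steps 1 to 3 can be illustrated with a minimal Python sketch, which is not the NetAligner implementation. It assumes that the vertex probabilities of homolog pairs are already known (the BLAST-based posterior probability is not reproduced), considers conserved interactions only, and takes the edge probability as the product of the two interaction reliabilities, a hypothetical stand-in for NetAligner's actual edge probability. The function names `build_alignment_graph` and `find_seeds` are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def build_alignment_graph(homolog_probs, query_edges, target_edges,
                          vertex_threshold, edge_threshold):
    """Steps 1-2, restricted to conserved interactions.

    homolog_probs: {(a, a_prime): vertex probability}
    query_edges, target_edges: {frozenset({x, y}): interaction reliability}
    """
    # Step 1: keep homolog pairs whose vertex probability exceeds the threshold.
    vertices = {pair: p for pair, p in homolog_probs.items()
                if p > vertex_threshold}

    # Step 2: connect A/A' and B/B' if A-B interacts in the query network
    # and A'-B' interacts in the target network (a conserved interaction).
    edges = {}
    for u, v in combinations(vertices, 2):
        (a, a_prime), (b, b_prime) = u, v
        r_query = query_edges.get(frozenset((a, b)))
        r_target = target_edges.get(frozenset((a_prime, b_prime)))
        if r_query is None or r_target is None:
            continue
        # HYPOTHETICAL: edge probability = product of the two reliabilities.
        p_edge = r_query * r_target
        if p_edge > edge_threshold:
            edges[frozenset((u, v))] = p_edge
    return vertices, edges


def find_seeds(edges):
    """Step 3: connected components of the initial alignment graph."""
    adjacency = defaultdict(set)
    for edge in edges:
        u, v = tuple(edge)
        adjacency[u].add(v)
        adjacency[v].add(u)

    seeds, seen = [], set()
    # Vertices without edges never enter `adjacency`, so they form no seed here.
    for start in list(adjacency):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        seeds.append(component)
    return seeds


# Toy example: D/D1 falls below the vertex threshold, A/A1-B/B1-C/C1 form one seed.
vertices, edges = build_alignment_graph(
    {("A", "A1"): 0.9, ("B", "B1"): 0.8, ("C", "C1"): 0.95, ("D", "D1"): 0.2},
    {frozenset(("A", "B")): 0.9, frozenset(("B", "C")): 0.3},
    {frozenset(("A1", "B1")): 0.9, frozenset(("B1", "C1")): 0.9},
    vertex_threshold=0.5,
    edge_threshold=0.1,
)
seeds = find_seeds(edges)
```

Comparing all vertex pairs is quadratic in the number of homolog pairs; a larger-scale version would rather iterate over the interactions of each network.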

[Figure: the NetAligner strategy]

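$$S(G) = \alpha \sum_{v \in V(G)} S_v + (1 - \alpha) \sum_{e \in E(G)} S_e$$

$$S_v = \log\left(P_v^{A/A'} + 1\right) \qquad S_e = \log\left(P_e^{A/A',B/B'} + 1\right)$$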
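A minimal Python sketch of the scoring scheme, assuming the natural logarithm (the base is not stated on this page) and reading "adding 1" as adding 1 to each probability before taking the logarithm. `solution_score` and its arguments are hypothetical names.

```python
import math


def vertex_score(p_vertex):
    # ASSUMPTION: natural log of (probability + 1), non-negative for p in [0, 1].
    return math.log(p_vertex + 1)


def edge_score(p_edge):
    # Same assumed transformation as for vertices.
    return math.log(p_edge + 1)


def solution_score(vertex_probs, edge_probs, alpha):
    """S(G) = alpha * sum of vertex scores + (1 - alpha) * sum of edge scores."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (alpha * sum(vertex_score(p) for p in vertex_probs)
            + (1 - alpha) * sum(edge_score(p) for p in edge_probs))


# Two certain homolog pairs joined by one certain edge, equal weighting.
example = solution_score([1.0, 1.0], [1.0], alpha=0.5)
```

With this reading, every vertex and edge adds a non-negative amount, so a solution that gains vertices or edges never loses score; alpha = 1 ignores edge scores entirely and alpha = 0 ignores vertex scores.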

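The edge types of steps 2 and 4 (see also the gap and mismatch definitions in the results section) can be summarized as a decision rule over shortest-path lengths. In this hypothetical Python sketch, `d_query` is the shortest-path length between A and B in the query network and `d_target` the one between A' and B' in the target network, so a path of length d corresponds to d - 1 inserted proteins. It does not check that gap and mismatch edges connect different seeds, and it does not model how NetAligner resolves cases that could be both a gap and a likely conserved interaction.

```python
from typing import Optional


def classify_edge(d_query: Optional[int], d_target: Optional[int],
                  max_insertions: int) -> Optional[str]:
    """Classify the connection between vertices A/A' and B/B'.

    None means the two proteins are not connected in that network.
    """
    def kind(d: Optional[int]) -> Optional[str]:
        if d == 1:
            return "direct"
        if d is not None and 2 <= d <= max_insertions + 1:
            return "indirect"
        return None  # unconnected, or too many insertions needed

    q, t = kind(d_query), kind(d_target)
    if q == "direct" and t == "direct":
        return "conserved interaction"
    if q == "direct" and t == "indirect":
        return "gap in the query"
    if q == "indirect" and t == "direct":
        return "gap in the target"
    if q == "indirect" and t == "indirect":
        return "mismatch"
    if q == "direct" or t == "direct":
        # Only a candidate: the edge is kept if prediction is enabled and the
        # predicted conservation probability passes the edge threshold.
        return "likely conserved candidate"
    return None
```

For example, with a max insertion length of 2, `classify_edge(1, 3, 2)` describes a direct query interaction matched by a two-insertion path in the target, i.e. a gap in the query.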


Data sources

The following data sources are used by the NetAligner webserver to perform the alignments:

Proteomes: For each organism, we collected all its protein sequences from the UniProt database, including splice variants but excluding cDNAs and fragments. We then clustered the sequences using UniRef100 to remove redundancy and kept only those with experimental evidence at the protein or transcript level.
Interactomes: To construct species interactomes, we collected all experimentally determined interactions from the public databases IntAct, MINT, HPRD, BioGRID, MatrixDB, MPIDB, InnateDB and DIP. To transform protein complexes into binary interactions, we used the 'spoke' model whenever the bait of the given affinity purification was specified, and the 'matrix' model otherwise. We then applied the same UniRef100 protein mapping as above to remove redundancy. Finally, we discarded all interactions involving proteins not present in the respective species proteome, as well as all interactions that could not be traced back to a publication or that were marked as 'weak' by the authors. We defined interaction reliabilities based on the number of publications reporting a given interaction: 0.1 for interactions reported by only one paper, 0.3 for those reported by two papers, and 0.9 for those found in at least three papers.
Homology data: To identify homologous proteins, we performed proteome-wide all vs. all reciprocal BLAST searches for all species combinations. We then defined homologs as reciprocal top 10 hits with an E-value < 10^-10.
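The two interactome-processing rules above can be sketched in Python. In the spoke model, the bait is linked to every other member of the purified complex; in the matrix model, every pair of members is linked. The function names are hypothetical, and the sketch assumes protein identifiers have already been mapped through UniRef100.

```python
from itertools import combinations


def complex_to_binary(members, bait=None):
    """Expand one affinity-purification complex into binary interactions."""
    proteins = sorted(set(members))
    if bait is not None:
        # Spoke model: the bait interacts with each of the other members.
        return {frozenset((bait, p)) for p in proteins if p != bait}
    # Matrix model: every member interacts with every other member.
    return {frozenset(pair) for pair in combinations(proteins, 2)}


def interaction_reliability(n_publications):
    """Reliability from the number of publications reporting an interaction."""
    if n_publications >= 3:
        return 0.9
    if n_publications == 2:
        return 0.3
    if n_publications == 1:
        return 0.1
    # Interactions that cannot be traced back to a publication are discarded.
    raise ValueError("interaction must be reported by at least one publication")
```

For a complex of n members, the spoke model yields n - 1 interactions and the matrix model n(n - 1)/2, which is why the matrix model is only the fallback when no bait is specified.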

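The reciprocal top 10 criterion can be sketched in Python, assuming the BLAST results have already been parsed into dictionaries mapping each protein to its (hit, E-value) pairs; parsing BLAST output is not shown, and the function names are hypothetical. A pair A/A' is kept when A' is among the top 10 hits of A below the E-value cutoff and A is among the top 10 hits of A' below the cutoff. Ties in E-value are broken by identifier here, which is an arbitrary choice.

```python
def top_hits(hits, k=10, max_evalue=1e-10):
    """{query: [(subject, evalue), ...]} -> {query: set of top-k subjects}."""
    top = {}
    for query, subject_hits in hits.items():
        # Keep hits below the cutoff, best (smallest) E-value first.
        passing = sorted((e, s) for s, e in subject_hits if e < max_evalue)
        top[query] = {s for _, s in passing[:k]}
    return top


def reciprocal_homologs(hits_ab, hits_ba, k=10, max_evalue=1e-10):
    """Pairs (a, b) where each protein is among the other's top-k hits."""
    top_ab = top_hits(hits_ab, k, max_evalue)
    top_ba = top_hits(hits_ba, k, max_evalue)
    return {(a, b) for a, subjects in top_ab.items() for b in subjects
            if a in top_ba.get(b, set())}


# Toy BLAST results between species 1 (A, B) and species 2 (X, Y).
hits_ab = {"A": [("X", 1e-50), ("Y", 1e-5)], "B": [("Y", 1e-30)]}
hits_ba = {"X": [("A", 1e-45)], "Y": [("B", 1e-20), ("A", 1e-12)]}
pairs = reciprocal_homologs(hits_ab, hits_ba)
```

Because the hits are sorted by E-value, filtering by the cutoff before or after taking the top 10 gives the same set.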

Complex to interactome alignment

On the Complex to interactome page you can align a query complex of one species to the whole interactome of either the same species (to search for alternative complexes or subcomplexes that can fulfill the same or a similar function within the cell) or another species (to identify conserved protein complexes across the two organisms). As an example, alignment of the human DNA polymerase alpha-primase complex to the yeast interactome reveals a similar topology of the complex in the two organisms and hints at potential cross-talk between DNA polymerases alpha and delta in yeast (please see our PLoS ONE paper for details). After selecting the query and target species, you can enter the list of protein components of the query complex and then either use the default NetAligner parameters for this alignment task (by clicking the 'Submit' button) or change them according to your own needs. In our benchmarks for complex to interactome alignment, selecting the option to predict likely conserved interactions significantly increased performance in recovering known conserved complexes. Since NetAligner expects two input networks, the server creates an induced complex network from the given list of protein components, using interaction data from the respective species interactome.
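The induced complex network is the subgraph of the species interactome restricted to the given components: only interactions whose two partners are both components are kept. A minimal Python sketch, with the interactome represented, as an assumption, as a list of `(protein_x, protein_y, reliability)` tuples and `induced_network` as a hypothetical name:

```python
def induced_network(components, interactome):
    """Keep only interactions between two members of the query complex."""
    members = set(components)
    return [(x, y, r) for x, y, r in interactome
            if x in members and y in members]


interactome = [("P1", "P2", 0.9), ("P2", "P3", 0.3), ("P3", "P4", 0.1)]
complex_network = induced_network(["P1", "P2", "P3"], interactome)
```

Here the P3-P4 interaction is dropped because P4 is not a listed component.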



Pathway to interactome alignment

On the Pathway to interactome page you can align a query pathway of one species to the whole interactome of either the same species (to search for alternative signalling routes, backup circuits or pathway cross-talk) or another species (to identify conserved pathways across the two organisms). As an example, alignment of the fly PI3K-AKT-IKK signalling pathway to the human interactome predicts the evolution of an IKKB homomultimer into an IKKA/IKKB heteromultimer and uncovers different interaction patterns of IKK with the three AKT isoforms in humans, indicating different roles in cellular signalling events (please see our PLoS ONE paper for details). After selecting the query and target species, you can enter the list of protein interactions of the query pathway and then either use the default NetAligner parameters for this alignment task (by clicking the 'Submit' button) or change them according to your own needs. In our benchmarks for pathway to interactome alignment, selecting the option to predict likely conserved interactions significantly increased performance in recovering known conserved pathways.



Interactome to interactome alignment

On the Interactome to interactome page you can align the whole interactomes of two species to search for conserved protein complexes or subnetworks. This is especially useful when the protein complexes of both species are unknown, which precludes complex to interactome alignment. As an example, alignment of the yeast to the human interactome indicates that the COP9 signalosome (CSN) might be able to substitute for the lid subcomplex of the proteasome and suggests a functional role of the CSN in cell-cycle control through interaction with cyclins and cyclin-dependent kinases (please see our PLoS ONE paper for details). After selecting the query and target species, you can either use the default NetAligner parameters for finding conserved protein complexes (by clicking the 'Submit' button) or change them according to your own needs. In our benchmarks for interactome to interactome alignment, selecting the option to predict likely conserved interactions led to the recovery of conserved subnetworks in which different protein complexes are connected into higher-order assemblies, while deselecting it led to the identification of more tightly connected protein complexes. Because the default parameters for this alignment task aim at finding conserved protein complexes rather than larger conserved subnetworks, they are relatively strict, which can lead to few results for certain species pairs. In this case, we recommend increasing the max insertion length and/or selecting the option to predict likely conserved interactions.



Parameter choices

We determined the default parameters for the complex to interactome and interactome to interactome alignment tasks based on a benchmark set of 71 conserved human/yeast complex pairs, consisting of 64 non-redundant human and 52 non-redundant yeast complexes from the manually curated CORUM and MPACT databases, respectively. For pathway to interactome alignment, we determined the default parameters based on a benchmark set of 19 human/fly, 32 human/yeast and 13 fly/yeast conserved pathway pairs from the KEGG database. All details on how we performed the benchmarks and determined the default parameters are given in the Materials and Methods section of our PLoS ONE paper describing the NetAligner algorithm. The complex and pathway benchmark sets are available in the Supplementary Information of that paper, and the stand-alone version of NetAligner can be downloaded for Unix and Windows. Users can also fine-tune the following parameters to fit their particular needs:

Predict likely conserved interactions: Select this parameter to include the prediction of likely conserved interactions in the network alignment. A likely conserved interaction is a direct interaction between two proteins in one of the input networks that is predicted to be conserved between the corresponding pair of homologs in the other network (see above). We provide this option to counter the high number of false negatives in current interactome networks: many more interactions exist than are currently known, and a large fraction of them are probably conserved across species. Selecting this parameter significantly increased alignment performance in some of our benchmarks (see the alignment task descriptions above), and it is therefore selected by default for the corresponding alignment tasks.
Vertex probability threshold: Only alignment graph vertices (i.e. pairs of homologous proteins) with a probability above this threshold are considered in the alignment procedure. The threshold can thus be used to filter out distant homologs with little sequence similarity.
Edge probability threshold: Only alignment graph edges with a total probability above this threshold are considered in the alignment procedure. The total probability of each edge is calculated from the respective interaction conservation probability and the reliabilities of the individual interactions involved. The threshold can thus be used to filter out interactions with low reliability or a low probability of being conserved. Because interactome to interactome alignment requires extensive computational resources at low edge probability thresholds, thresholds below the default value are not allowed for this alignment task.
Max insertion length: This parameter sets the maximum length of gaps and mismatches, measured as the number of proteins that must be inserted in the query or target network to connect a given pair of proteins through an indirect interaction. Although this parameter can be set to any number, we suggest not setting it higher than three because of the small-world property of current interactome networks (the average path length between any two proteins is very small compared to the size of the network), so larger values would connect a large fraction of all protein pairs through indirect interactions.
Vertex to edge score balance: This parameter is a weight α in the range [0,1] that determines the extent to which the vertex and edge scores contribute to the final score of an alignment solution (see above). A value near 0 gives more weight to the edge scores, while a value near 1 gives more weight to the vertex scores.
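The max insertion length can be checked with a breadth-first search that stops expanding once a path would exceed max insertions + 1 interactions. This Python sketch is illustrative only: `insertions_needed` is a hypothetical name, and the network is assumed to be an undirected adjacency dictionary. It returns 0 for a direct interaction, the number of inserted proteins for an indirect one within the limit, and `None` otherwise.

```python
from collections import deque


def insertions_needed(adjacency, source, target, max_insertions):
    """Number of proteins inserted on a shortest path from source to target.

    Returns 0 for a direct interaction, k for a path of k + 1 interactions
    with k <= max_insertions, and None if no such path exists.
    adjacency: {protein: set of interaction partners}, assumed undirected.
    """
    if source == target:
        return None  # self-pairs are not considered in this sketch
    max_length = max_insertions + 1  # k insertions -> k + 1 interactions
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if distance[node] == max_length:
            continue  # expanding further would exceed the allowed length
        for partner in adjacency.get(node, ()):
            if partner in distance:
                continue
            distance[partner] = distance[node] + 1
            if partner == target:
                return distance[partner] - 1
            queue.append(partner)
    return None


# A linear chain A-B-C-D-E.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"},
         "D": {"C", "E"}, "E": {"D"}}
```

Bounding the search keeps it local: with a limit of three insertions, only proteins at most four interactions away from the source are ever visited.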


Alignment results

After you submit a network alignment, a results page opens, which shows the alignment results summary once NetAligner has finished. Until then, a message indicates whether NetAligner is queueing or running, and the page reloads automatically every 10 seconds. The address of this page contains a session ID that is unambiguously linked to your alignment submission. You can bookmark the page to check back on your results later or, if cookies are enabled in your browser, consult the 'My jobs' page, which lists all jobs submitted during the last 48 hours. The alignment results of each session remain available for 48 hours before they are deleted from our server. If you have a specific question about your alignment results, you can therefore email us your session ID within that time frame and we can check your case.

The results summary consists of two tables. The first one displays the type of alignment task performed, whether the prediction of likely conserved interactions was enabled, the selected query and target species, the total number of alignment solutions found and the number of statistically significant alignment solutions. Note that statistically insignificant alignment solutions (e.g. due to low edge probabilities) may still be biologically significant, because current interactome data are incomplete and the reliability of certain interactions might be underestimated. The second table provides an overview of all alignment solutions with the following information:

Rank: Alignment solutions are ranked by increasing p-value, with ties broken by decreasing score.
Score: The score of each alignment solution is calculated as the weighted sum over all vertex and edge scores (see above).
P-value: The p-value represents the statistical significance of the alignment solution and is calculated using a Monte-Carlo permutation test. Alignment solutions that are not significant at the standard p-value threshold of 0.05 are marked in gray.
Vertices: Number of vertices (i.e. pairs of homologous proteins) in the alignment solution.
Edges: Number of edges (i.e. conserved and likely conserved interactions, gaps and mismatches) in the alignment solution.
Conserved interactions: Number of conserved interactions.
Interactions likely conserved in the query: Number of interactions of the target network that are likely conserved in the query species.
Interactions likely conserved in the target: Number of interactions of the query network that are likely conserved in the target species.
Gaps in the query: Number of gaps in the query network (i.e. pairs of homologous proteins connected through a direct interaction in the query network and an indirect interaction in the target network).
Gaps in the target: Number of gaps in the target network (i.e. pairs of homologous proteins connected through a direct interaction in the target network and an indirect interaction in the query network).
Mismatches: Number of mismatches in the alignment solution (i.e. pairs of homologous proteins connected through indirect interactions in both input networks).
Alignment visualization: Visualization of the alignment solution using Cytoscape Web (requires Flash support to be enabled in your browser). Clicking on the Cytoscape Web logo opens an interactive visualization pane with a network representation of the alignment solution. For large alignment solutions, the network layout can take a few seconds to finish. You can then select individual vertices and edges to display annotation data, and zoom in and out using the control panel.
Download: Links for downloading the alignment solution either as an XGMML file that can be loaded into Cytoscape or as a tab-delimited file. At the bottom of the page, you can find a link to download the node attributes file, which lists all proteins mapped during the alignment (across all alignment solutions), together with a short description of their cellular function and a probability representing their overall sequence similarity based on BLAST E-values.
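A Monte-Carlo permutation p-value is commonly estimated as the fraction of scores obtained from randomized data that are at least as high as the observed score. The Python sketch below uses the usual +1 correction so that the estimate is never zero; this page does not describe how NetAligner permutes the networks or whether it applies this correction, so the randomized scores (`null_scores`) are simply assumed to be given, and `monte_carlo_p_value` is a hypothetical name.

```python
def monte_carlo_p_value(observed, null_scores):
    """Empirical p-value of an observed score against randomized scores.

    Counts randomized scores >= observed and applies the +1 correction,
    so the smallest attainable p-value is 1 / (len(null_scores) + 1).
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


p_small = monte_carlo_p_value(10.0, [1.0, 2.0, 12.0, 3.0])
p_strong = monte_carlo_p_value(10.0, [0.0] * 99)
```

The correction also shows why the number of permutations matters: with 99 randomizations, no p-value below 0.01 can be reported.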

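The ranking rule from the results table (increasing p-value, then decreasing score) and the gray marking of insignificant solutions can be sketched in Python. The `Solution` class and `rank_solutions` are hypothetical, and treating p < 0.05 (rather than p ≤ 0.05) as significant is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    name: str
    score: float
    p_value: float


def rank_solutions(solutions, p_threshold=0.05):
    """Return (rank, solution, is_significant) tuples.

    Sorted by increasing p-value; ties are broken by decreasing score.
    """
    ranked = sorted(solutions, key=lambda s: (s.p_value, -s.score))
    return [(rank, s, s.p_value < p_threshold)
            for rank, s in enumerate(ranked, start=1)]


ranked = rank_solutions([
    Solution("s1", score=5.0, p_value=0.01),
    Solution("s2", score=7.0, p_value=0.01),
    Solution("s3", score=9.0, p_value=0.2),
])
```

Although s3 has the highest score, its p-value places it last, and it would be the one shown in gray.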

How to cite

Please cite the following publication:

Pache RA, Aloy P.
A Novel Framework for the Comparative Analysis of Biological Networks.
PLoS ONE. 2012 February;7(2):e31220.
PMID: 22363585
