What is Cytoscape?
Cytoscape is an open-source platform for visualizing complex networks and integrating them with attribute data. It is widely used in systems biology to analyze molecular interaction networks alongside gene expression profiles and other state data.
Key Concepts and Terms
Network (or Graph): A collection of nodes (vertices) and edges (links) that connect pairs of nodes. In biological networks, nodes can represent genes, proteins, or other molecules, while edges represent interactions between them.
Node: An individual entity in a network, such as a gene, protein, or other biomolecule.
Edge: A connection or interaction between two nodes in a network.
Attribute Data: Additional data associated with nodes or edges, such as gene expression levels or interaction confidence scores.
Network Visualization: The process of representing a network visually, where nodes and edges are arranged in a two-dimensional or three-dimensional space.
Layout Algorithm: A computational method for arranging nodes and edges in a network to optimize readability and interpretability.
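The terms above can be made concrete with a small data-structure sketch. This is not how Cytoscape stores networks internally; it is a minimal illustration in plain Python, and the gene names, attribute names, and values are hypothetical.

```python
# Toy network: all names and values are illustrative, not taken from any dataset.
nodes = {
    "geneA": {"expression": 2.4},   # node attribute data
    "geneB": {"expression": -1.1},
    "geneC": {"expression": 0.3},
}
edges = [
    ("geneA", "geneB", {"confidence": 0.9}),  # edge attribute data
    ("geneB", "geneC", {"confidence": 0.4}),
]

def build_adjacency(nodes, edges):
    """Return {node: set of neighbors} for an undirected network."""
    adjacency = {name: set() for name in nodes}
    for source, target, _attrs in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency

adjacency = build_adjacency(nodes, edges)
print(sorted(adjacency["geneB"]))  # ['geneA', 'geneC']
```

Keeping attributes in dictionaries keyed by node or edge mirrors the idea of the Node Table and Edge Table: the graph structure and the data attached to it are stored separately and joined by identifier.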
Installing and Launching Cytoscape
To begin using Cytoscape, you need to download and install it from the official Cytoscape website. Follow the installation instructions for your operating system.
Launching Cytoscape
After installing Cytoscape, launch the application. The main interface includes several key components:
Menu Bar: Provides access to all of Cytoscape's functions.
Tool Bar: Contains quick-access buttons for commonly used functions.
Control Panel: Located on the left, it includes tabs for managing networks, styles, filters, and annotations.
Network Panel: The Network tab of the Control Panel; it lists the currently loaded network(s).
Table Panel: Located at the bottom, it shows attribute data for nodes and edges.
Main Network View: The central area where networks are displayed and visualized.
Loading a Sample Network
Go to the home page.
Select the sample network Yeast Gene Interactions.
The network will load and appear in the Main Network View. Nodes represent proteins or genes, and edges represent interactions between them.
Navigating the Network
Zoom In/Out: Use the mouse wheel or the zoom buttons in the Tool Bar.
Pan: Click and drag the background to move around the network.
Select Nodes/Edges: Click on a node or edge to select it. Hold Shift to select multiple nodes or edges.
Understanding the Network
Node Attributes: Click on a node to see its attributes in the Node Table at the bottom.
Edge Attributes: Click on an edge to see its attributes in the Edge Table.
Customizing Network Visualization
Changing Node and Edge Styles
Node Styles:
Go to the Style tab in the Control Panel.
Click on the Node section to expand it.
Change the node color, size, and shape based on different attributes.
Edge Styles:
Click on the Edge section in the Style tab.
Change the edge color, width, and style based on different attributes.
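Styling "based on attributes" usually means mapping a numeric column to a visual property. The sketch below shows one way such a continuous mapping could work, using linear interpolation between two colors. The function names, the default colors, and the degree values are assumptions made for illustration; Cytoscape's own mapping editor is configured through the Style tab, not through this code.

```python
def lerp_color(low_rgb, high_rgb, t):
    """Linearly interpolate between two RGB colors; t is clamped to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return tuple(round(lo + (hi - lo) * t) for lo, hi in zip(low_rgb, high_rgb))

def continuous_mapping(value, vmin, vmax,
                       low_rgb=(255, 255, 255), high_rgb=(255, 0, 0)):
    """Map a numeric attribute value to a hex color between low_rgb and high_rgb."""
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    return "#{:02X}{:02X}{:02X}".format(*lerp_color(low_rgb, high_rgb, t))

# Hypothetical degree values for three nodes.
degrees = {"a": 2, "b": 7, "c": 12}
lo, hi = min(degrees.values()), max(degrees.values())
colors = {node: continuous_mapping(d, lo, hi) for node, d in degrees.items()}
print(colors["a"], colors["c"])  # #FFFFFF #FF0000
```

The same pattern applies to node size or edge width: replace the color interpolation with interpolation between a minimum and maximum size.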
Applying Layouts
Go to Layout > Apply Preferred Layout.
Experiment with different layouts, such as Force-Directed, Circular, and Grid.
Layouts help to arrange the nodes and edges in a visually appealing and informative manner.
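To give a sense of what a layout algorithm computes, here is a minimal sketch of a circular layout: each node is assigned an (x, y) position evenly spaced around a circle. It is a simplified illustration, not Cytoscape's implementation, and the function name and radius are assumptions.

```python
import math

def circular_layout(node_ids, radius=100.0):
    """Place nodes evenly on a circle centered at the origin."""
    n = len(node_ids)
    positions = {}
    for i, node in enumerate(node_ids):
        angle = 2 * math.pi * i / n  # equal angular spacing between nodes
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions

positions = circular_layout(["a", "b", "c", "d"])
for node, (x, y) in positions.items():
    print(f"{node}: ({x:.1f}, {y:.1f})")
```

Force-directed layouts follow the same contract (node in, coordinates out) but compute the coordinates iteratively by simulating attraction along edges and repulsion between nodes.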
Network Analysis
Network Analyzer
Go to Tools > Analyze Network.
Choose Analyze as Directed or Analyze as Undirected based on your network type.
Review the network analysis report, which includes metrics such as node degree, clustering coefficient, and shortest path length.
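Two of the metrics in the report, node degree and clustering coefficient, depend only on a node's immediate neighborhood. The sketch below computes both on a toy undirected graph. It is an illustration of the standard definitions, not Network Analyzer's code; returning 0 for nodes with fewer than two neighbors is a convention assumed here.

```python
from itertools import combinations

def degree(adjacency, node):
    """Number of edges attached to the node (no self-loops or multi-edges here)."""
    return len(adjacency[node])

def clustering_coefficient(adjacency, node):
    """Fraction of pairs of the node's neighbors that are themselves connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: undefined cases are reported as 0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2 * links / (k * (k - 1))

# Toy graph: a triangle a-b-c with one extra node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(degree(graph, "c"))                  # 3
print(clustering_coefficient(graph, "a"))  # 1.0
```

Node c has three neighbors but only one of the three possible neighbor pairs (a and b) is connected, so its clustering coefficient is 1/3.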
What do these metrics mean? Before looking at the data, let's define the key terms.
Columns and Key Terms:
AverageShortestPathLength:
Definition: The average length of the shortest path between a node and all other nodes in the network.
Importance: Indicates how close a node is to all other nodes, affecting the efficiency of information or signal flow.
BetweennessCentrality:
Definition: A measure of how often a node appears on the shortest paths between other nodes.
Importance: Nodes with high betweenness centrality control the flow of information and can act as bottlenecks.
ClosenessCentrality:
Definition: The inverse of the average shortest path length from a node to all other nodes.
Importance: Nodes with high closeness centrality can quickly interact with all other nodes.
Cluster:
Definition: A grouping of nodes that are more densely connected to each other than to the rest of the network.
Importance: Helps identify modules or communities within the network.
ClusteringCoefficient:
Definition: For a given node, the fraction of pairs of its neighbors that are themselves connected; it measures the degree to which nodes tend to cluster together.
Importance: Indicates the presence of tightly knit groups in the network.
Degree:
Definition: The number of edges connected to a node.
Importance: Reflects the immediate connectivity of a node.
Eccentricity:
Definition: The maximum shortest path length from a node to any other node in the network.
Importance: Shows how far a node is from the most distant node in the network; nodes with low eccentricity are close to everything else.
IsSingleNode:
Definition: Indicates if the node is isolated or not connected to any other nodes.
Importance: Isolated nodes may be of less interest in network analysis.
level2, level3, ...:
Definition: Custom attributes that could represent hierarchical levels or categories specific to the dataset.
Opacity, Opacity2:
Definition: Visual attributes used for rendering the network, indicating the transparency level of nodes or edges.
Importance: Affects how nodes and edges are displayed in visualizations.
ORF (Open Reading Frame):
Definition: The sequence of DNA that can be translated into a protein.
Importance: Identifies genes or gene products in the network.
PartnerOfMultiEdgedNodePairs:
Definition: The number of neighbors to which the node is connected by more than one edge.
Importance: Shows redundancy or multiple interaction types between nodes.
Radiality:
Definition: A centrality measure derived from a node's average shortest path length relative to the network diameter; higher values mean the node is closer to the rest of the network.
Importance: Similar to closeness centrality, affects information dissemination speed.
Selected:
Definition: Indicates if the node is currently selected in the network visualization.
Importance: Useful for interactive analysis and highlighting specific nodes.
SelfLoops:
Definition: The number of edges that connect a node to itself.
Importance: Self-loops can represent self-regulatory processes.
Shared Name:
Definition: A common identifier for the node, often used for labeling.
Importance: Helps in identifying nodes in visualizations and analysis.
Stress:
Definition: A measure of the total number of shortest paths that pass through a node.
Importance: Indicates the importance of a node in maintaining the shortest paths within the network.
TopologicalCoefficient:
Definition: A measure of the extent to which a node shares neighbors with other nodes.
Importance: Reflects the redundancy or robustness of node connections.
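The path-based definitions above (AverageShortestPathLength, ClosenessCentrality, Eccentricity, BetweennessCentrality) can be sketched with breadth-first search on an unweighted, undirected graph. This is a minimal illustration, not Network Analyzer's implementation: the function names are hypothetical, distances are measured only to reachable nodes, and betweenness is left unnormalized, whereas the values in the report are normalized to the range 0 to 1.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_metrics(adjacency, node):
    """Average shortest path length, closeness and eccentricity of one node."""
    dist = bfs_distances(adjacency, node)
    others = [d for n, d in dist.items() if n != node]
    if not others:  # isolated node: reported as 0 here by assumption
        return {"aspl": 0.0, "closeness": 0.0, "eccentricity": 0}
    aspl = sum(others) / len(others)
    return {"aspl": aspl, "closeness": 1 / aspl, "eccentricity": max(others)}

def betweenness(adjacency):
    """Unnormalized betweenness via Brandes' algorithm (undirected, unweighted)."""
    score = {n: 0.0 for n in adjacency}
    for s in adjacency:
        order = []                          # nodes in order of discovery
        preds = {n: [] for n in adjacency}  # predecessors on shortest paths
        sigma = {n: 0 for n in adjacency}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {n: 0.0 for n in adjacency}
        for w in reversed(order):           # accumulate dependencies back to s
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {n: b / 2 for n, b in score.items()}  # each pair was counted twice

# Toy path graph: a - b - c - d
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(path_metrics(path, "a"))  # {'aspl': 2.0, 'closeness': 0.5, 'eccentricity': 3}
print(betweenness(path)["b"])   # 2.0
```

In the path graph, the end node a is farthest from everything (eccentricity 3), while the inner nodes b and c each lie on two shortest paths between other node pairs.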
The following sample analysis interprets values from the network analysis table for four nodes.
Node 0
AverageShortestPathLength: 4.524937
Explanation: On average, it takes about 4.52 steps along shortest paths to reach any other node from this node. This is the lowest value among the four nodes examined here, so it is the closest of them to the rest of the network.
BetweennessCentrality: 0.000784
Explanation: A betweenness centrality of 0.000784 suggests this node is not frequently found on the shortest paths between other nodes. It is not a major conduit for information flow within the network.
ClosenessCentrality: 0.220998
Explanation: A closeness centrality of 0.220998 is the reciprocal of the node's average shortest path length (1 / 4.524937). Higher values indicate greater centrality; this is the highest of the four nodes examined here, although other nodes in the network may be more central.
Degree: 11
Explanation: The degree of 11 means this node has 11 direct connections (edges) to other nodes, making it a hub with a moderate number of interactions.
Node 1
AverageShortestPathLength: 4.633501
Explanation: An average shortest path length of 4.633501 means this node is slightly less central compared to Node 0, taking on average 4.63 steps to reach other nodes.
BetweennessCentrality: 0.001820
Explanation: With a betweenness centrality of 0.001820, this node is somewhat more involved in the shortest paths between other nodes than Node 0, but still not a key player in information flow.
ClosenessCentrality: 0.215820
Explanation: A closeness centrality of 0.215820 shows this node is less central than Node 0, taking slightly longer to reach other nodes on average.
Degree: 12
Explanation: The degree of 12 means this node has one more connection than Node 0, indicating it is a slightly more connected hub.
Node 3
AverageShortestPathLength: 5.400252
Explanation: This node has an average shortest path length of 5.400252, meaning it is less central and takes longer (about 5.4 steps) to reach other nodes.
BetweennessCentrality: 0.000504
Explanation: A betweenness centrality of 0.000504 indicates this node plays a minimal role in connecting other nodes, acting infrequently as a bridge.
ClosenessCentrality: 0.185177
Explanation: A closeness centrality of 0.185177 suggests this node is less central, reflecting its longer average path length to other nodes.
Degree: 4
Explanation: With a degree of 4, this node has fewer connections, indicating it is less of a hub compared to Nodes 0 and 1.
Node 4
AverageShortestPathLength: 4.939043
Explanation: An average shortest path length of 4.939043 means this node is moderately central, with paths to other nodes taking about 4.94 steps on average.
BetweennessCentrality: 0.000000
Explanation: A betweenness centrality of 0.000000 indicates this node is never on the shortest path between other nodes, meaning it does not facilitate indirect connections.
ClosenessCentrality: 0.202468
Explanation: A closeness centrality of 0.202468 shows this node is moderately central but not among the most central nodes in the network.
Degree: 2
Explanation: The degree of 2 indicates this node has only 2 connections, making it a peripheral node with limited interactions.
Summary
Node 0 is moderately well-connected with 11 direct interactions and relatively central.
Node 1 has a slightly higher degree (12) and betweenness centrality than Node 0, but slightly lower closeness centrality.
Node 3 is less connected with only 4 direct interactions and lower centrality metrics.
Node 4 is peripheral with only 2 connections and zero betweenness centrality.
These numbers help us understand the roles different nodes play within the network, identifying key hubs and peripheral nodes, and assessing how information or signals might flow through the network. This analysis is crucial for understanding the structure and function of biological networks.
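As a quick consistency check on the values reported above, closeness centrality should equal the reciprocal of the average shortest path length for each node. The short sketch below verifies this for the four nodes and ranks them by closeness; the dictionary layout is just a convenience chosen for this example.

```python
import math

# (AverageShortestPathLength, ClosenessCentrality, Degree) as reported above.
reported = {
    "Node 0": (4.524937, 0.220998, 11),
    "Node 1": (4.633501, 0.215820, 12),
    "Node 3": (5.400252, 0.185177, 4),
    "Node 4": (4.939043, 0.202468, 2),
}

for name, (aspl, closeness, _degree) in reported.items():
    # The table values are rounded, so compare with a small tolerance.
    assert math.isclose(1 / aspl, closeness, abs_tol=5e-6), name

# Rank nodes from most to least central by closeness.
ranked = sorted(reported, key=lambda n: reported[n][1], reverse=True)
print(ranked)  # ['Node 0', 'Node 1', 'Node 4', 'Node 3']
```

Note that the ranking by closeness differs from the ranking by degree: Node 1 has the most connections, but Node 0 is marginally closer to the rest of the network.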
Cluster Analysis
Go to Apps > App Manager.
Install the clusterMaker2 app if it's not already installed.
Go to Apps > clusterMaker2 and select a clustering algorithm (e.g., MCL or k-means).
Run the clustering algorithm and visualize the resulting clusters in the network.
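To illustrate what a clustering algorithm such as MCL does, here is a compact sketch of the basic Markov Cluster procedure: add self-loops, normalize columns, then alternate expansion (matrix squaring) and inflation (element-wise power followed by renormalization) until the matrix stops changing. This is a teaching sketch under simplifying assumptions (dense matrices, unweighted edges, hypothetical function name and defaults), not the clusterMaker2 implementation.

```python
def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Basic Markov Cluster algorithm on an undirected, unweighted graph."""
    nodes = sorted(adjacency)
    idx = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)

    # Adjacency matrix with self-loops; column j describes steps out of node j.
    m = [[0.0] * n for _ in range(n)]
    for u in nodes:
        m[idx[u]][idx[u]] = 1.0
        for v in adjacency[u]:
            m[idx[v]][idx[u]] = 1.0

    def normalize(mat):
        """Scale every column so that it sums to 1."""
        for j in range(n):
            total = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= total
        return mat

    m = normalize(m)
    for _ in range(max_iter):
        # Expansion: take two random-walk steps (matrix squared).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: strengthen strong flows, weaken weak ones.
        inflated = normalize([[x ** inflation for x in row] for row in expanded])
        converged = all(abs(inflated[i][j] - m[i][j]) < tol
                        for i in range(n) for j in range(n))
        m = inflated
        if converged:
            break

    # Rows with a non-zero diagonal are attractors; their non-zero columns
    # are the members of the cluster they attract.
    clusters = []
    for i in range(n):
        if m[i][i] > tol:
            members = frozenset(nodes[j] for j in range(n) if m[i][j] > tol)
            if members not in clusters:
                clusters.append(members)
    return [sorted(c) for c in clusters]

# Two triangles joined by a single bridging edge (c - d).
bridged = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(mcl(bridged))
```

The inflation parameter controls granularity: larger values tend to produce more, smaller clusters, which is worth experimenting with when running the algorithm on a real network.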
Case Study: Analyzing a Sample Network
In this case study, we will analyze the sample network Yeast Gene Interactions to identify key genes and their interactions.
Loading and Exploring the Network
Load the sample network Yeast Gene Interactions as described in the Loading a Sample Network section above.
Explore the network by examining node and edge attributes.
Customizing the Visualization
Use the Style tab to color nodes based on the Degree attribute.
Apply the Force-Directed layout for better visualization of network clusters.
Performing Network Analysis
Analyze the network using Network Analyzer to identify key metrics.
Run clusterMaker2 to identify clusters of highly interconnected proteins.
Integrating Gene Expression Data
Import gene expression data from a CSV file (File > Import > Table from File).
Visualize the expression levels using node color and size attributes.
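Conceptually, importing expression data means joining a table to the node table on a shared key column. The sketch below shows that join in plain Python. The column names, identifiers, and expression values are hypothetical and exist only to make the example run; in Cytoscape the equivalent step is choosing the key column in the import dialog.

```python
import csv
import io

# Hypothetical expression file contents; in practice this would be read from disk.
EXPRESSION_CSV = """gene,expression
YAL001C,1.8
YBR020W,-0.7
"""

def load_expression(text, key_column="gene", value_column="expression"):
    """Parse CSV text into {gene identifier: expression value}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key_column]: float(row[value_column]) for row in reader}

def attach_expression(node_table, expression):
    """Return a copy of the node table with an added 'expression' column.

    Nodes without a matching row get None, much like an empty table cell.
    """
    return {name: {**attrs, "expression": expression.get(name)}
            for name, attrs in node_table.items()}

# Illustrative node table keyed by the same identifier as the CSV.
node_table = {
    "YAL001C": {"Degree": 3},
    "YBR020W": {"Degree": 1},
    "YCL030C": {"Degree": 2},
}
merged = attach_expression(node_table, load_expression(EXPRESSION_CSV))
print(merged["YAL001C"])  # {'Degree': 3, 'expression': 1.8}
```

Once the expression column is attached, it can drive node color or size in the same way as any other numeric attribute.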
Saving and Exporting Results
Save the Cytoscape session.
Export the network visualization as an image for inclusion in a research report.
Export the node and edge tables for further statistical analysis.
Conclusion
This tutorial provides a comprehensive introduction to using Cytoscape for systems biology and computational biology. By following the steps outlined, you will be able to load, customize, analyze, and integrate data into biological networks effectively. Cytoscape's flexibility and extensive app ecosystem make it a powerful tool for a wide range of network analysis tasks.
A screenshot of the Yeast Gene Interactions network along with the Node Table (highlighted in blue).