Network Portal Help
Please visit the screencast tutorials page to quickly learn how to use Gaggle and the Gaggle Workspace.
Discover Network Portal
What is Network Portal
The Network Portal is a database of gene transcription regulatory networks and enables exploration, annotation and comparative analysis.
Deciphering the complexity of biological systems requires a systems-level view of regulatory players and their interactions. However, inference of these complex interactions is challenging, and further, visualizing and analyzing these interactions often requires advanced computational expertise, tools and resources.
Network Portal provides analysis and visualization tools for selected gene regulatory networks to aid researchers in biological discovery and hypothesis development. It provides a user-friendly interface, extensive search capabilities and a number of visualization tools. We are constantly adding regulatory networks for new species, and the growing number of network models will enable comparative network biology. We are also developing tools for comparing network models, which will be integrated into the portal as soon as they are available.
The Network Portal also acts as a gateway for the Gaggle Workspace. Gaggle Workspace makes the life of researchers easier by taking over the burden of repetitive analysis tasks. Integration with Network Portal provides full access to automated analysis of network models and associated biological information.
Navigating Gene Pages
If the gene is associated with one or more regulons, its connections to those regulons, along with the other regulon members, are shown as a network using CytoscapeWeb. In this view, green circular nodes represent regulon member genes, purple diamonds represent regulon motifs and red triangles represent regulators. Each node is connected to its regulon(s) (biclusters) via edges. This representation provides a quick overview of all genes, regulators and motifs for the regulons, and shows which genes, motifs and regulators are shared among different regulons. The network representation is interactive: you can zoom in and out and move nodes and edges around. Clicking on a node opens a window with more details. The information shown for genes includes the locus tag, organism, genomic coordinates, NCBI gene ID, whether it is a transcription factor and any associated functional information. For regulators, the number of regulons is shown in addition to gene details. For motifs, the e-value, consensus sequence and sequence logo are shown. For regulons, the expression profile plot, motif information, functional associations and motif locations for each member of the regulon are shown. You can pin information boxes by using the button in the box title and open additional ones on the same screen for comparative analysis.
The regulation tab for each gene lists regulatory influences, such as environmental factors, transcription factors or combinations of them, identified by regulatory network inference algorithms. If the gene is a member of a regulon, regulators influencing that regulon are also considered to regulate the gene. The regulators table lists the total number of regulatory influences, the regulators, the regulons and the type of each influence. A description of each regulator appears in a tooltip when you mouse over it. In certain cases the regulatory influence is predicted to result from the combination of two influences. These are indicated as a combiner in the column labeled "Operator". For transcription factors, an additional table is shown next to the regulators table. This table shows the regulons that are influenced by the transcription factor.
The network inference algorithm uses de novo motif prediction for assigning genes to regulons. If any motifs are identified in the upstream region of a gene, they are shown here. For each motif, the sequence logo, consensus and e-value are shown.
Functional annotations for a given gene can help connect regulatory information with the physiology of the cell. Therefore, functional annotations for each gene collected from KEGG, Gene Ontology (GO), Clusters of Orthologous Groups (COG) and TIGR Roles are shown here. The function name is linked to its explanation and more details.
Regulon Members Tab
The identity of the gene members in a regulon may help to identify potential interactions between different functional modules. Therefore, neighbor genes that share the same regulon(s) with the gene under consideration are shown here. For each member, the gene name, description and the regulons that contain it are listed.
More general help can be accessed by clicking the help menu in the main navigation bar.
Network Portal is designed to promote collaboration through social interactions. Interested researchers can therefore share information, questions and updates for a particular gene. Users can obtain login information by registering on our website. Alternatively, they can use their Facebook or Google accounts to connect to this page. Each regulon and gene page includes a comments tab that lists the history of interactions for that gene. You can browse the history, make updates, raise questions and share these activities within the social web. In future releases of the Network Portal, we plan to create a personal space for each user that contains all the analysis steps performed by the user along with relevant information. It will be possible to share these personal spaces with other users.
CircVis
Our circular regulon explorer is adapted from visquick, which was originally developed by Dick Kreisberg of the Ilya Shmulevich lab at ISB for The Cancer Genome Atlas. We use a simplified version of visquick to display the distribution of regulon members and their interactions across the genome. This view provides a summary of regulation information for a gene. The main components are:
- 1. All genomic elements for the organism are represented as a circle and each element is separated by black tick marks. In this example "chromosome" and "pDV" represent the main chromosome and plasmid for D. vulgaris Hildenborough.
- 2. Source gene
- 3. Target genes (other regulon members)
- 4. Interactions between source and target genes for a particular regulon
- 5. Regulon(s) that the source gene and target genes belong to
- 6. Visualization legend
Navigating Module Pages
A network view of the module is created using CytoscapeWeb and enables dynamic, interactive exploration of the module properties. In this view, module member genes, motifs, and regulatory influences are represented as peripheral nodes connected to the core module node via edges. Module members are green circles, regulators are red triangles and motifs are blue diamonds. Selecting a node opens a pop-up window with detailed information; the window can be dragged and pinned to compare multiple selections. Selecting a module member shows information about the gene, such as name, species and functions. Selecting a motif shows the motif logo image and e-values. Selecting the bicluster shows the expression profile and summary statistics for the module.
For each module, single or AND-logic-connected regulatory influences are listed under the regulators tab. These regulatory influences are identified by Inferelator. The table shows the name of each regulator and its type: tf (transcription factor), ef (environmental factor) or combiner (combinatorial influence of a tf or an ef through a logic gate). The table can be sorted by clicking on the arrows next to the column headers.
Transcription factor binding motifs help to elucidate regulatory mechanisms. cMonkey integrates powerful de novo motif detection to identify conditionally co-regulated sets of genes. De novo predicted motifs for each module are listed on the module page as motif logo images along with associated prediction statistics (e-values). The main module page also shows the location of these motifs within the upstream sequences of the module member genes.
Motifs of interest can be broadcast to RegPredict (currently only available for Desulfovibrio vulgaris Hildenborough) in order to compare conservation in similar species. This integrated motif prediction and comparative analysis provides an additional checkpoint for regulatory motif prediction confidence.
Biological networks contain sets of regulatory units called functional modules that together play a role in regulation of specific functional processes. Connections between different modules in the network can help identify regulatory relationships such as hierarchy and epistasis. In addition, associating functions with modules enables putative assignment of functions to hypothetical genes. It is therefore essential to identify functional enrichment of modules within the regulatory network.
Functional annotations from single sources are often either not available or not complete. Therefore, we integrated KEGG pathway, Gene Ontology, TIGRFam and COG information as references for functional enrichment analysis.
We use hypergeometric p-values to identify significant overlaps between co-regulated module members and genes assigned to a particular functional annotation category. P-values are corrected for multiple comparisons by using Benjamini-Hochberg correction and filtered for p-values ≤ 0.05.
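The enrichment test described above can be sketched in a few lines of Python. This is a minimal illustration, not the portal's actual code: the function names and the gene counts in the example are hypothetical, and the p-value is taken as the upper tail of the hypergeometric distribution (the chance of seeing at least the observed overlap).

```python
from math import comb


def hypergeom_pvalue(overlap, module_size, category_size, genome_size):
    """Probability of at least `overlap` annotated genes in a module.

    The module is treated as `module_size` genes drawn without replacement
    from `genome_size` genes, `category_size` of which carry the annotation.
    """
    upper = min(module_size, category_size)
    total = comb(genome_size, module_size)
    tail = sum(
        comb(category_size, k) * comb(genome_size - category_size, module_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (genes in module with the annotation, category size)
# for a 20-gene module in a 3000-gene genome.
categories = {"category A": (8, 40), "category B": (1, 200), "category C": (3, 25)}
raw = [hypergeom_pvalue(k, 20, size, 3000) for k, size in categories.values()]
corrected = benjamini_hochberg(raw)
significant = [name for name, p in zip(categories, corrected) if p <= 0.05]
```

Here `significant` holds the categories that pass the corrected 0.05 threshold mentioned in the text.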
Network Portal presents functional ontologies from KEGG, GO, TIGRFAM, and COG as separate tables that include the function name, type, corrected and uncorrected hypergeometric p-values, and the number of genes assigned to this category out of the total number of genes in the module.
The gene member table shows all the genes included in the module. The listed attributes are:
- Name: Gene name or Locus tag
- Common Name: Gene short name
- Type: Type of the feature, usually CDS.
- Gene ID: Link to NCBI Gene ID
- Chromosome: Chromosome name from annotation file
- Start/End: Feature start and end coordinates
- Strand: Strand of the gene
- Description: Description of the gene from annotation file
- TF: Whether the gene is a transcription factor
If you are browsing the Network Portal using Gaggle/Firegoose, the Firegoose plugin will capture the NameList of the gene members. Captured names can be saved into your Workspace by clicking "Capture" in the Firegoose toolbar, or can be sent directly to other desktop and web resources by using the "Broadcast" option.
Residual: A measure of bicluster quality. The mean bicluster residual is smaller when the expression profiles of the genes in the module are "tighter", so smaller residuals usually indicate better bicluster quality.
Expression Profile: A preview of the expression profiles of all the genes under the subset of conditions included in the module. Tighter expression profiles usually indicate better bicluster quality.
Motif e-value: cMonkey tries to identify two motifs per module in the upstream sequences of the module member genes. The motif e-value is an indicator of motif co-occurrence among the members of the module. Smaller e-values indicate more significant sequence motifs. In our experience, e-values smaller than 10 generally indicate significant motifs.
Genes: Number of genes included in the module.
Functions: We identify functional enrichment of each module by comparing it to different functional categories such as KEGG, COG and GO using the hypergeometric function. If the module is significantly enriched for any function, this column lists a few of these functions as an overview. The full list of functions is available on the module page under the Functions tab.
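To make the idea of a "tight" bicluster concrete, the sketch below computes one common residual, the mean squared residue of Cheng and Church, for a small genes-by-conditions matrix. This page does not state the exact formula the portal reports, so treat the definition, the function name and the example values as assumptions for illustration only.

```python
def mean_squared_residue(matrix):
    """Mean squared residue of a genes x conditions expression matrix.

    Each value is compared with what its row mean, column mean and the
    overall mean would predict; genes that rise and fall together across
    the conditions leave small residues.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [
        sum(matrix[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)
    ]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = matrix[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


# Hypothetical expression values: two genes that move together, and two that do not.
tight = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
loose = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
tight_score = mean_squared_residue(tight)
loose_score = mean_squared_residue(loose)
```

Under this definition the co-varying pair scores lower than the opposing pair, which matches the rule of thumb that smaller residuals indicate better biclusters.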
What is Gaggle?
The practice of systems biology depends upon many software tools, operating on many kinds of data from many different sources. Each of these tools typically excels at one (or a few) types of analysis with one (or a few) types of data. A crucial challenge, therefore, is to combine the capabilities of these and other, forthcoming tools to create a data exploration and analysis environment which can do justice to the variety and complexity of systems biology.
The Gaggle is a simple, open-source Java software environment which solves this problem. Guided by the classic software engineering strategy of separation of concerns and a policy of semantic flexibility, it combines existing popular programs and web resources into a user-friendly, rich, and easily extended environment in which to do systems biology.
We currently support a number of geese -- our name for any open source software which is adapted to run in the Gaggle. This adaptation generally requires only a small amount of programming work. Once gaggled, the program can broadcast and receive any of a small number of data types which together constitute an adequate basis for exploratory analysis in systems biology. These data types include:
- Name list (i.e., these genes are interesting)
- Bicluster: Name list combined with a condition list (i.e., these genes are interesting in these conditions)
- Tuple (replaces HashMap): a collection of name/value pairs
- Matrix: rows and columns, each named, containing numerical data
- Network: a collection of nodes and edges, with arbitrary tuples associated with each
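The data types listed above are deliberately simple. As a rough illustration of their shape, here is a Python sketch using dataclasses. Gaggle itself is written in Java, so these class and field names are hypothetical and are not the Gaggle API; they only mirror the descriptions in the list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

Value = Union[str, int, float]


@dataclass
class NameList:
    """A list of names, e.g. genes of interest."""
    names: List[str]


@dataclass
class Bicluster:
    """A name list combined with a condition list."""
    names: List[str]
    conditions: List[str]


@dataclass
class GaggleTuple:
    """A collection of name/value pairs."""
    pairs: Dict[str, Value] = field(default_factory=dict)


@dataclass
class Matrix:
    """Named rows and columns containing numerical data."""
    row_names: List[str]
    column_names: List[str]
    values: List[List[float]]

    def __post_init__(self):
        # Reject matrices whose data does not match the row/column names.
        if len(self.values) != len(self.row_names):
            raise ValueError("one row of values is needed per row name")
        if any(len(row) != len(self.column_names) for row in self.values):
            raise ValueError("each row needs one value per column name")


@dataclass
class Network:
    """Nodes and edges, each with an arbitrary tuple attached."""
    nodes: Dict[str, GaggleTuple] = field(default_factory=dict)
    edges: Dict[Tuple[str, str], GaggleTuple] = field(default_factory=dict)


# Example: two genes and one edge between them (the edge attribute is made up).
genes = NameList(["DVU0692", "DVU0693"])
net = Network(
    nodes={name: GaggleTuple() for name in genes.names},
    edges={("DVU0692", "DVU0693"): GaggleTuple({"interaction": "co-regulated"})},
)
```

Because the types carry no biological meaning of their own, the same `NameList` could equally hold gene, protein or condition names; interpretation is left to the receiving application.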
The currently supported geese are:
- Gaggle Boss. Workflow Enabled
- Annotation. Search for functional annotations.
- Bioinformatics Resource Manager. A general purpose data management, analysis and integration environment for systems biology.
- Cytoscape. Network visualization. Workflow Enabled
- DMV. Datamatrix Viewer.
- Firegoose. Connect the Gaggle to the Web. Workflow Enabled
- Genome Browser. View data in genomic context.
- MatGoose. Exchange data between MatLab and the Gaggle.
- MeV. MultiExperiment Viewer. Cluster and visualize microarray data. Workflow Enabled
- R Goose. Exchange data between the R statistical environment and the Gaggle.
- Translator. Translate identifiers between naming systems or (by orthology) between organisms.
- Script Goose. A goose with an embedded HTTP server and command-line client, allowing programs written in any language to communicate with the Gaggle.
Gaggle is designed to make it easy to connect programs and databases with a minimum of effort. There are several third party Gaggle-connected applications.
Gaggle Development Principles
- The core components of the Gaggle are the Boss and the Goose interfaces. Because all Geese depend on these interfaces, changes to them require changes to all Geese. Therefore, changes to these interfaces must be carefully controlled to maintain compatibility.
- The Boss and Goose interfaces should be revised rarely and carefully.
- The right to change core components is reserved by the Baliga Lab at ISB.
- Feature requests for core components will be carefully considered.
- The Gaggle is designed as a system of loosely coupled components:
- Almost any desired functionality can be implemented as a Goose.
- Existing Data types:
- name list
- A few general data types are preferred over special-purpose data types.
- Almost all biological data can be represented as one of the existing types.
- New data types should be introduced only when necessary.
- Since the Gaggle is merely a conduit for transferring data between applications, it does not need to assign specific meaning or roles to the data.
- Interpreting the data is left up to the user and the applications.
- The Gaggle is open source.
- The source repository for the core components of the Gaggle will be maintained at ISB.
- Permission to commit to the source repository will be granted on a case-by-case basis. Reading the source repository is open to all.
- Contributions of code, especially new Geese, are welcome.
The Gaggle Paper
These principles are more fully explained in the Gaggle paper.
Gaggle Workspace offers a comprehensive data analysis environment allowing users to:
- Capture and organize data from websites and local files;
- Manipulate data from the Workspace and via automatically opened applications;
- Create workflows automatically or manually which include multiple applications, websites, and datasets;
- Save the state of an analysis session across all open websites and desktop applications and reload the state later from any machine;
- Automatically generate summary reports;
- Keep a log of recent activities.
1. What is the Dataspace tab? How can I upload data to it?
The Dataspace is a central place to store and manipulate the data.
There are two types of data. First, the system generated data, which is in the "Network Portal Files" table of the Dataspace tab.
Second, user captured data, which is displayed in the "User Files" table. You can use the "Capture" button of the Firegoose to capture data from any web page (See this video). You can also upload your file to the workspace (this video explains how). You can save these data so you can reuse them later.
To capture data from various geese to the workspace, simply broadcast the data from the goose to Firegoose, and then use the "Capture" button to save the data to the Workspace.
2. What can I do with the data in the Dataspace? What applications are supported by the Gaggle Workspace?
You can open the data by selecting "Open" from the dropdown list on each row of the tables. Gaggle will try to automatically start the Boss and the goose associated with the data. For example, we will automatically start MeV if the file is a "MeV Analysis File". If we cannot decide which goose to open, a dialog will pop up and you can select which one to use to open the data. This video explains how to open data from the workspace. Once the geese are open, you can do analysis and broadcast data between them. For now we support three geese: Firegoose, Cytoscape, and MeV. Other geese will be added later.
3. What is the Workflow tab? How can I build a workflow?
The Gaggle Workspace supports the concept of a workflow. Biologists often need to perform repetitive analysis tasks in which the same or different inputs are analyzed with the same or different tools multiple times. In addition, several analysis tools may be needed to analyze the same set of inputs. A workflow automates these repetitive tasks by linking the inputs and outputs of various desktop and web-based analysis tools.
There are two ways to create a workflow. First, the system automatically records the geese you open, together with the source and target of each broadcast, and displays them as a workflow on the "Workflow" page. Second, you can create workflows manually. To do this, drag a rectangle with the goose name from the "Modules" column and drop it onto the Workflow Canvas. A box will appear containing the execution path of the goose (you can change it if you know the installation path). A tutorial video can be found here. More details about the workflow can be found here.
4. What is save/load states? How can I use this feature?
Save/load state lets you preserve an analysis session and resume it later. Go to the "Saved States" tab and click the "Save State" button. The Workspace will contact each goose and tell it to save its current state. The state files are saved on the server, so this takes some time. Once it is done, you will see a notification and a new row will appear in the saved states table. You can reload the state later from any machine. Note that currently only Firegoose, Cytoscape, and MeV support saving state. We are actively working to enable more geese. This video explains save/load session states.
5. What are session reports?
Session reports are automatically generated for workflows. A report consists of snapshots of each application in the workflow, taken after the workflow has finished. This video explains session reports.
6. How to enable/disable more web handlers in Firegoose?
Open Firefox and, in the Firegoose toolbar, click the "Target" dropdown menu and select the "More" menu item. An "Enable Website Handlers" dialog will pop up. Checking or unchecking the checkbox beside a web handler enables or disables it.
7. How to add custom web handlers in Firegoose?
Open Firefox and, in the Firegoose toolbar, click the "Target" dropdown menu and select the "Custom" menu item. An "Add New Website Target" dialog will pop up. Enter a descriptive "Name" for the target and fill in the URL. There are three mechanisms by which a target can receive broadcast data.
- Empty textbox awaiting to be filled: Enter the id of the website's DOM object into the textbox
- Expects arguments embedded in the requested URL: Enter the name of the parameter to be used in the URL into the textbox
For example, to add "Network Portal" as a target:
- New target's name: Network Portal
- New target's URL: http://networks.systemsbiology.net/search
- Select: Expects arguments embedded in the requested URL
- Enter "q" in the textbox.
When you broadcast your gene list from Firegoose to Network Portal, a new tab will open with search results using your gene list as the query. The URL will look like this: http://networks.systemsbiology.net/search?q=DVU0692%3BDVU0693
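The example URL shows how a broadcast name list ends up in the query string: the names are joined with a semicolon, which is percent-encoded as %3B. The Python sketch below reproduces that URL with the standard library. The helper name `build_target_url` is hypothetical, and the semicolon separator is inferred from the example URL rather than from Firegoose documentation.

```python
from urllib.parse import urlencode


def build_target_url(base_url, param_name, names, separator=";"):
    """Embed a name list in a URL as a single query parameter."""
    # urlencode percent-encodes reserved characters such as ';'.
    query = urlencode({param_name: separator.join(names)})
    return f"{base_url}?{query}"


url = build_target_url(
    "http://networks.systemsbiology.net/search",
    "q",
    ["DVU0692", "DVU0693"],
)
print(url)
```

The parameter name passed here ("q") plays the role of the value entered in the textbox of the "Add New Website Target" dialog.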
A series of video tutorials can be found here.
Researchers often come across repetitive analysis tasks in which the same or different inputs are analyzed with the same or different tools multiple times. In addition, several analysis tools may be needed to analyze the same set of inputs. A workflow automates these repetitive tasks by linking the inputs and outputs of various desktop and web-based analysis tools.
The Workflow builds on our extensive experience with Gaggle in exchanging data between independently developed software tools and databases to enable interactive exploration of systems biology data. Carrying the "boss" and "goose" concept over into the workflow, each goose becomes a module in the workflow. These modules can be added to a workflow sequentially or in parallel and can accept output from other modules in the workflow. Once the workflow is built, it can be saved into a personal workspace and recalled at any time to invoke the same analysis steps.
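One way to picture a workflow of sequential and parallel modules is as a directed graph in which each module lists the modules whose output it receives. The Python sketch below (Python 3.9 or later, for `graphlib`) groups modules into stages that could run in parallel. It illustrates the concept only: the dictionary layout and the function name are hypothetical and do not reflect how the Gaggle Workspace stores or runs workflows.

```python
from graphlib import TopologicalSorter


def execution_stages(upstream):
    """Group workflow modules into stages.

    `upstream` maps each module to the modules it receives data from.
    Modules in the same stage do not depend on each other, so they can
    run in parallel; each stage runs after the one before it.
    """
    sorter = TopologicalSorter(upstream)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Hypothetical workflow: MeV broadcasts to Firegoose and Cytoscape.
workflow = {
    "MeV": [],
    "Firegoose": ["MeV"],
    "Cytoscape": ["MeV"],
}
stages = execution_stages(workflow)
```

In this example MeV forms the first stage, and Firegoose and Cytoscape share the second stage because both only need MeV's output.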
Case 1 - Discovering Detailed Regulon Information
This exercise starts with a regulon in Network Portal. We will use a very simple workflow that broadcasts data via Firegoose to multiple web resources simultaneously. We will broadcast a Namelist of regulon members to DAVID for functional enrichment analysis and to Entrez to access NCBI tools. We will also broadcast motif PSSM information to STAMP in order to identify matches to known motifs.
STEPS:
- Start the Gaggle Boss
- First we will collect regulon information from a website by adding a Firegoose module. The path to Firefox should be detected automatically. Leave the subaction unchanged. Enter the data URL, or the web resource from which to get regulon information. In this example we will use a regulon from the Methanococcus maripaludis GRN. (http://baliga.systemsbiology.net/cmonkey/enigma/mmp/cmonkey_4.8.8_mmp_1661x58_11_Oct_11_16:14:07/htmls/cluster0114.html)
- Add a Firegoose module for DAVID.
Select the "DAVID" option from the subaction menu and connect it to the first Firegoose module.
Manual: You need to convert the Namelist to a DAVID Id list once it has been submitted to DAVID.
- Add a Firegoose module for Entrez. Select the "Entrez Protein" subaction and connect it to the first Firegoose module.
- Last, we will add another Firegoose module for STAMP. Select the "STAMP" subaction and connect it to the first Firegoose module.
- Optional: Save your workflow
- Run your workflow by using the "Run" button in the control panel.
Manual: You need to select an appropriate database for motif searching on the STAMP site.
Case 2 - Analyzing gene expression data
The second exercise includes both desktop applications and web resources. The aim of this workflow is to automate the analysis of gene expression data and identify functional associations, functional enrichments and associated regulatory network modules. We will start by opening a tab-delimited gene expression file in MeV. After performing statistical analyses, we will broadcast selected genes via Firegoose to KEGG (metabolic pathway enrichment) and EMBL Strings (functional associations). In parallel, we will also open a network model in Cytoscape and broadcast our gene list to the network in order to select network modules that include these genes.
STEPS:
- Start the Gaggle Boss
- Add a MeV module to open a gene expression file. Leave the subaction unchanged. If you are opening the gene expression file from the web, enter the URL; otherwise, enter the path to the file on your computer. In this example we will use gene expression data from a study of energy metabolism in Desulfovibrio vulgaris.
We will compare gene expression upon growth on Pyruvate+Sulfate versus Pyruvate/Fermentation.
Manual: After your workflow has started, MeV will load the gene expression file. You can perform SAM analyses, select genes that are upregulated in Pyruvate/Fermentation conditions, and then broadcast them to the workflow.
- Add a Firegoose module for EMBL Strings. Select the "EMBL Strings" subaction and connect it to the MeV module. The EMBL Strings website will use the Namelist for our selected genes and search against the Strings database to identify functional associations. The resulting functional association network will be shown and captured in the workflow report.
- Last, we will add a Cytoscape module to open a network file. (For this example we will use the GRN model for D. vulgaris.) Connect the Cytoscape module to the MeV module and enter the path to your Cytoscape network file (web or desktop). When the workflow starts, Cytoscape will open your network file and capture the broadcast list from MeV. Upregulated genes will be selected in the network for further network manipulations.
- Optional: Save your workflow
- Run your workflow by using the "Run" button in the control panel.
Case 3 - Assigning functions to hypothetical proteins
Network Portal Screencasts
For more screencasts visit our YouTube channel
One of the key benefits of the Gaggle and Gaggle Workspace is that anyone can implement their own Geese. We hope to promote interoperability among bioinformatics applications by making this as easy as possible. Have an application you'd like to see included on this page? Please contact us by email (sturkars at systemsbiology.org) or through the discussion groups.
We strongly encourage developers to take advantage of the easy implementation features in Gaggle and Network Portal. If you are a developer who would like to integrate desktop or web applications with Gaggle/Network Portal, we provide all the support you need.
Gaggle Subversion Repository
The Gaggle's source code is all open source, licensed under the LGPL, and served using the Subversion source code control system.
The current development branch of Gaggle is located in: http://gaggle.systemsbiology.net/svn/gaggle/gaggle/trunk
- To use the Subversion archive hosted at the Institute for Systems Biology, take the following steps. (These instructions use Unix-style commands; Windows users should translate accordingly.)
- Locate or install a Subversion client. (Below we use the command-line client, svn.)
- For read-only access, check out the trunk:
svn checkout http://gaggle.systemsbiology.net/svn/gaggle/gaggle/trunk yourInstallationDirectory
- For authenticated access (allowing commits) to the Subversion archive, a password is required:
svn checkout --username username --password password http://gaggle.systemsbiology.net/svn/gaggle/gaggle/trunk yourInstallationDirectory
Firegoose Source code
- Firegoose source is here: http://gaggle.systemsbiology.net/svn/gaggle/FireGoose/trunk/
Network Portal Source code
- Network Portal source is here: https://github.com/baliga-lab/network_portal
Gaggle Workspace API
- Workspace API documentation is here: API Documents