Frequently Asked Questions (FAQs)

    Data Input

  1. What are accepted data inputs for NetworkAnalyst?
  2. How to label a gene expression table?
  3. How to format a gene expression table?
  4. What if my microarray platform or organism is not supported?
  5. How to prepare network files for NetworkAnalyst?
  6. What are the advantages of registering an account?

    Data Processing

  7. What does gene-level summarization mean?
  8. How should I choose a suitable normalization procedure?

    Differential Expression Analysis

  9. My data contains multiple metadata, how should I choose a proper method for differential analysis?
  10. I received the error message "no residual degrees of freedom", what should I do?
  11. What are the differences between pair-wise comparisons and time-series comparisons?
  12. What is nested comparison?

    Meta-analysis

  13. How do I minimize study specific, platform specific bias of my gene expression datasets?
  14. How does ComBat work for batch effect adjustment?
  15. What are the differences between combining p-values, metaGSEA and direct merging approaches?
  16. Why are there grey cells in the meta gene sets heatmap?

    Network Construction

  17. How are the networks generated from my data?
  18. Which database is used for creating the PPI network?
  19. How many nodes can be visualized (network size limit)?
  20. How do I construct a network composed only of nodes connecting the seed proteins?
  21. What if the network is very small (< 100)?
  22. Can I construct a network for a large number (> 1000) of significant genes?
  23. How does the Trim function work?

    Network Analysis

  24. How do I identify important nodes using degree and betweenness?
  25. What is a module and how are modules identified in NetworkAnalyst?
  26. How do I interpret the p value of a module?
  27. Can I perform GO or pathway analysis on my significant genes alone?
  28. Can I test enriched functions for highlighted selection?
  29. How is enrichment analysis performed?

    Network Visualization

  30. How do I interpret the node colors and sizes in the default Topo View?
  31. Can I view my queries in the network?
  32. How can I create a 300 dpi high-resolution network for publication?
  33. Can I change the background color of the network?
  34. Can I change node color, size or shape?
  35. Can I change the position of a node?
  36. Can I change the position of a node cluster?
  37. How can I label the nodes in the network?
  38. Can I hide all the node labels in the network?
  39. Can I delete nodes from the network?
  40. How do I manually highlight any arbitrary sections of the network?
  41. Can I extract a module or the highlighted section from the network?
  42. How do I proceed to 3D and/or VR view of the current network?

    Functional Enrichment Analysis

  43. Can I view the expression profile of a pathway or a GO category?
  44. What is the difference between the enriched groups from heatmap clustering, network analysis, GSEA and GlobalTest?
  45. What is Gene Set Enrichment Analysis (GSEA)?
  46. Which metric is used for gene ranking before performing GSEA?
  47. What is Enrichment Score?
  48. What is GlobalTest?

  1. What are accepted data inputs for NetworkAnalyst?

    1. Gene or protein list data: a list of gene or protein IDs with optional expression profiles (i.e. fold changes). Each gene should be in a row. Please refer to our example data for more details.
    2. Short-read RNA-Seq data: users can upload single-end or paired-end RNA-Seq FASTQ files and perform quality checking, trimming and mapping using a well-established Galaxy pipeline. Please note that because this task cannot be completed in real time, (free) registration is required: users need to provide a valid email address in order to retrieve the results later.
    3. A single gene expression data table: a data table containing expression values (i.e. gene/probe intensities from microarray or counts from RNA-seq) saved as a tab-delimited text file (.txt) with rows for features (genes/probes) and columns for samples. The tab-delimited file can be generated from any spreadsheet program. More details are provided in the following sections.
    4. Multiple gene expression tables: these are required for meta-analysis, when users want to integrate gene expression data from multiple studies collected under similar conditions.
  2. How to label a gene expression table?

    It is critical to properly label your data so that they can be recognized and compared. The following common gene and probe IDs are supported:

    1. Gene ID: Entrez ID, Ensembl Gene ID, GenBank Accession ID, RefSeq ID, Ensembl Transcript ID, and official Gene Symbol
    2. Probe ID (for human, mouse and rat only): popular microarray platforms from Affymetrix, Agilent and Illumina.

    The gene expression data should also contain sample names in the first line. The class labels of experimental conditions should be in a new line beginning with "#CLASS". Multiple class labels can be indicated by adding a colon and a name to each (for example, "#CLASS:cancer_type" and "#CLASS:stage"). For meta-analysis, the same set of labels must be used for ALL datasets.

  3. How to format a gene expression table?

    Here is a good tutorial on how to generate tab-delimited text files from the Excel spreadsheet program. When you open your data in any text editor (for example, WordPad), it should look like the following:

    • Sample name, one class label (one missing value)
      #NAME	Sample1	Sample2	Sample3	Sample4	Sample5	Sample6	Sample7	Sample8
      #CLASS	case	case	case	case	control	control	control	control
      Gene1	-3.06	-2.25	-1.15	-6.64	0.4	1.08	1.22	1.02
      Gene2	-1.36	-0.67	-0.17	-0.97	-2.32	-5.06	0.28	1.32
      Gene3	1.61	-0.27	0.71	-0.62	0.14		0.11	0.98
      Gene4	0.93	1.29	-0.23	-0.74	-2	-1.25	1.07	1.27    
    • Sample name, two class labels (cancer type and gender)
      #NAME	Sample1	Sample2	Sample3	Sample4	Sample5	Sample6	Sample7	Sample8
      #CLASS:CANCER	case	case	case	case	control	control	control	control
      #CLASS:SEX	F	F	M	M	F	M	F	M
      Gene1	-3.06	-2.25	-1.15	-6.64	0.4	1.08	1.22	1.02
      Gene2	-1.36	-0.67	-0.17	-0.97	-2.32	-5.06	0.28	1.32
      Gene3	1.61	-0.27	0.71	-0.62	0.14		0.11	0.98
      Gene4	0.93	1.29	-0.23	-0.74	-2	-1.25	1.07	1.27
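
    As an illustration of the layout above, here is a minimal Python sketch that writes such a tab-delimited table. The function name write_expression_table and its arguments are hypothetical and are not part of NetworkAnalyst; the sketch only assumes the "#NAME" / "#CLASS" conventions described in this FAQ, with a missing value written as an empty field.

```python
import csv
import io


def write_expression_table(handle, samples, classes, rows):
    """Write a tab-delimited table with #NAME and #CLASS header lines.

    classes maps a label name ("" for a single unnamed label) to one value
    per sample; rows maps a gene ID to its expression values, where None
    marks a missing value.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["#NAME"] + list(samples))
    for label, values in classes.items():
        if len(values) != len(samples):
            raise ValueError("each class line needs one value per sample")
        tag = "#CLASS:" + label if label else "#CLASS"
        writer.writerow([tag] + list(values))
    for gene, values in rows.items():
        # A missing value becomes an empty field between two tabs.
        writer.writerow([gene] + ["" if v is None else v for v in values])


buffer = io.StringIO()
write_expression_table(
    buffer,
    samples=["Sample1", "Sample2", "Sample3", "Sample4"],
    classes={"CANCER": ["case", "case", "control", "control"],
             "SEX": ["F", "M", "F", "M"]},
    rows={"Gene1": [-3.06, -2.25, 0.4, 1.08],
          "Gene3": [1.61, None, 0.14, 0.98]},
)
table_text = buffer.getvalue()
print(table_text)
```

    To write to disk instead, pass a file opened with open("table.txt", "w", newline="") in place of the in-memory buffer.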
  4. Which network files are supported by NetworkAnalyst?

    NetworkAnalyst supports four different types of files: .sif, .txt (edge list), .graphml and .json.
    Please click on the following links to see example files supported:

  5. What are the advantages of registering an account?

    Registering on NetworkAnalyst allows you to save up to 10 projects which will be stored in the system for 10 months. You will be able to reload the work state of previous projects.

  6. What if my microarray platform or organism is not supported?

    For other microarray platforms for human, mouse, C. elegans and D. melanogaster, you can first annotate your probe IDs to gene IDs supported by NetworkAnalyst using the corresponding annotation file of the platform. It is possible to add support for other model organisms/platforms based on user requests. Feel free to send us your suggestions (jeff.xia [at]

  7. What does gene-level summarization mean?

    Microarray data provides probe-level expression measurements, and RNA-seq data provides exon-level or transcript-level (i.e. different isoforms of the same gene) expression measurements. However, current functional annotations are mainly assigned at the gene or protein level. Therefore, it is desirable to first map the probe-level or transcript-level measurements to the corresponding gene-level measurements.

    When multiple probes or transcripts are mapped to the same gene, they need to be summarized into a single value for the corresponding gene. At the Gene Annotation step, users can choose to use the averages or medians of multiple probe intensities (microarray), or sums of counts from multiple transcripts (RNA-seq) to perform gene-level summarization.
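
    The summarization step can be sketched as follows. This is a minimal illustration, not NetworkAnalyst's own code: the function name summarize_to_genes, the input dictionaries and the choice to drop unmapped probes are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_to_genes(probe_values, probe_to_gene, method="mean"):
    """Collapse probe-level (or transcript-level) rows into one row per gene.

    method is "mean" or "median" (e.g. microarray intensities) or "sum"
    (e.g. RNA-seq counts from several transcripts of one gene).
    """
    reducers = {"mean": mean, "median": median, "sum": sum}
    reduce_column = reducers[method]
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # assumption: probes without a mapping are dropped
            grouped[gene].append(values)
    # Combine the rows of each gene sample by sample (column-wise).
    return {gene: [reduce_column(column) for column in zip(*rows)]
            for gene, rows in grouped.items()}


probes = {"p1": [2.0, 4.0], "p2": [4.0, 8.0], "p3": [1.0, 1.0]}
mapping = {"p1": "GeneA", "p2": "GeneA", "p3": "GeneB"}
print(summarize_to_genes(probes, mapping, "mean"))
print(summarize_to_genes(probes, mapping, "sum"))
```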

  8. How should I choose a suitable normalization procedure?

    Log transformation is recommended if the data is not already log transformed. This is mainly because the program uses a linear model (Limma) for differential expression analysis. It is generally considered that differences in expression exist on a multiplicative scale: log transformation brings them onto an additive scale, where a linear model (i.e. Limma) may apply. Log transformation can usually make the distribution more symmetric and Gaussian-like, allowing many additional statistical analyses to be applied.

    In order to perform log transformation, the data must not contain zero or negative values. To deal with this issue, we provide three versions. Log_simple replaces only these values with a small positive value (i.e. detection limit). Log_vsn_max adds large values to all data values, with some adjustments based on the actual values. Log_vsn_min is similar to Log_vsn_max, but the values added are very small so as to stay close to the original scale. The underlying R code is given below. Note, if you are performing meta-analysis, you should NOT use different log normalizations for different datasets, because the transformation has a very large impact on the data. The first one is simple and easy to understand (suitable if the data contains a small portion of negative values). The last one gives more desirable statistical properties (suitable for data with a large amount of negative values).

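    # Log_simple: replace zero and negative values with a small positive value
    min.val <- min(data[data>0], na.rm=TRUE)/10;
    data[data<=0] <- min.val;
    data <- log2(data);

    # Log_vsn_min: add a very small value to all data values
    min.val <- min(data[data>0], na.rm=TRUE)/10;
    data <- log2((data + sqrt(data^2 + min.val^2))/2);

    # Log_vsn_max: add a large value to all data values
    max.val <- max(data[data>0], na.rm=TRUE)*10;
    data <- log2(data + sqrt(data^2 + max.val));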

    If you are not sure whether the data is already log transformed, you can easily figure this out by visualizing the data (i.e. boxplot). For microarray data, log transformed data values are usually less than 16. For count data, even a count of 1 million gives log2(1,000,000), which is less than 20. Therefore, if all data values are below 20, it is reasonable to assume that the data has already been log transformed.
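
    The rule of thumb above can be written as a short check. This is only a heuristic sketch in Python; the function name looks_log_transformed is hypothetical, and the threshold of 20 is the one suggested in this answer rather than a guarantee.

```python
def looks_log_transformed(values, threshold=20.0):
    """Return True when every observed value lies below the threshold.

    None entries stand for missing values and are ignored.
    """
    observed = [v for v in values if v is not None]
    return bool(observed) and max(observed) < threshold


log_scale = [3.2, 11.5, None, 15.9]
raw_counts = [12, 5300, 88, 0]
print(looks_log_transformed(log_scale))
print(looks_log_transformed(raw_counts))
```

    A boxplot remains the more reliable check, since a dataset with uniformly low raw values would also pass this test.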

  9. My data contains multiple metadata, how should I choose a proper method for differential analysis?

    The answer depends on your biological questions. Here are several suggestions:

    • Do a simple analysis first using the primary metadata of interest;
    • If you want to include secondary metadata, you need to decide whether this metadata is of interest by itself, or is included because it potentially affects the results of the primary metadata (e.g. studies where multiple samples are collected from the same subjects including paired samples, tissue types, or any potential batch effect). In the first case, a two-factor analysis is appropriate (i.e. you are interested in two independent metadata and their interactions). In the second case, the second metadata is a blocking factor. NetworkAnalyst will conduct comparisons within the block, which typically improves the accuracy of the result;

  10. I received the error message "no residual degrees of freedom", what should I do?

    This means you do not have enough samples to perform the analysis you specified. This usually happens when you want to combine two metadata for an independent two-factor analysis (i.e. the second metadata is not specified as a blocking factor). In this case, the total number of groups will be the product of the group numbers in each metadata (i.e. if the primary metadata contains 3 groups, and the secondary metadata contains 4 groups, the total groups will be 3 * 4 = 12 for the combined analysis). We recommend a minimum of 3 samples per group, therefore at least 36 samples are required in order to perform the analysis.

    In this case, you should focus on a single primary metadata, leave the secondary metadata as "Not available", and perform differential analysis with regard to individual metadata. You can then choose the other metadata as the primary metadata and perform the analysis again. If there are no or very few significant genes identified, it is most likely that incorporating the metadata into the analysis will not affect the result.
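
    The group arithmetic behind this error can be sketched as follows. The function name two_factor_requirements is hypothetical. The sketch assumes a full two-factor design in which one mean is estimated per group combination, so the residual degrees of freedom are the number of samples minus the number of groups.

```python
from itertools import product


def two_factor_requirements(primary_levels, secondary_levels, n_samples,
                            min_per_group=3):
    """Count groups in a two-factor design and the samples it needs."""
    groups = list(product(primary_levels, secondary_levels))
    n_groups = len(groups)
    return {
        "groups": n_groups,
        "recommended_samples": n_groups * min_per_group,
        # One mean is estimated per group; the remainder estimates the variance.
        "residual_df": n_samples - n_groups,
    }


summary = two_factor_requirements(
    primary_levels=["A", "B", "C"],
    secondary_levels=["w", "x", "y", "z"],
    n_samples=12,
)
print(summary)
```

    With 3 x 4 = 12 groups and only 12 samples, nothing is left to estimate the variance, which is the situation the error message reports.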

  11. What are the differences between pair-wise comparisons and time-series comparisons?

    The time-series comparison is only a subset of "all pairwise" comparisons. A time-series comparison only compares two groups that are directly neighbouring each other. For instance, take three groups A, B, and C. The "all pairwise" comparison will be A-B, A-C, and B-C; however, the time-series analysis will only compare A-B and B-C.
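
    The two comparison schemes can be enumerated with a few lines of Python. The function names are illustrative only; groups are assumed to be listed in time order.

```python
from itertools import combinations


def all_pairwise(groups):
    """Every pair of groups, in the order given."""
    return list(combinations(groups, 2))


def time_series(groups):
    """Only pairs of groups that directly neighbour each other."""
    return list(zip(groups, groups[1:]))


groups = ["A", "B", "C"]
print(all_pairwise(groups))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(time_series(groups))   # [('A', 'B'), ('B', 'C')]
```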

  12. What is nested comparison?

    In nested comparisons, the results from two differential expression analyses are compared and combined. For example, assume there are four conditions: A, B, C, D. If you choose the nested comparison as (B-A) versus (D-C), then the final significant genes (full model) are from three different sources:

    1. Genes significant in analysis B-A;
    2. Genes significant in analysis D-C;
    3. Interactions: genes significant in the overall comparison (i.e. genes that respond differently in B-A vs D-C);
    Note, you can choose to return significant genes from the interaction only.

  13. How do I minimize study specific, platform specific bias of my gene expression datasets?

    To normalize the gene expression values across datasets, we recommend using the ComBat algorithm. On the quality check page, after the datasets have been uploaded, you can first visualize the PCA clustering of samples from different datasets. If obvious batch effects are observed, select the checkbox located below the summary table to perform ComBat.

  14. How does ComBat work for batch effect adjustment?

    The method uses an empirical Bayes approach for adjusting batch effects in microarray and RNA-seq expression data. The algorithm can be summarized in three main steps:

    1. Genes are standardized to have similar overall mean and variance;
    2. Information is pooled across genes from a batch to estimate batch effects (increased level of expression, high variability, etc.);
    3. The batch effects are adjusted to obtain normalized data.
    For more information, refer to the original publication: Adjusting batch effects in microarray expression data using empirical Bayes methods

  15. What is the difference between combining p-values, metaGSEA approach and direct merging approaches?

    • The combining p-values approach is similar to its equivalent in gene-level meta-analysis: it first identifies enriched gene sets/pathways in each individual dataset (using the GSEA method in this case) and then combines the p-values of these gene sets across datasets. This approach is preferred when there is little overlap between the genes contained in each dataset.
    • The metaGSEA approach first performs gene-level meta-analysis (combining effect sizes in this case) to obtain a list of genes coupled with gene-level statistics (effect size and p-value). GSEA is then performed on this list of genes.
    • The direct merging approach merges the individual data matrices into one large data matrix composed of the genes shared by all datasets. It then performs differential expression analysis on that merged data matrix. GSEA is then performed using the result of the DE analysis.
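
    To make the idea of combining p-values concrete, here is a small Python sketch using Fisher's method. This FAQ does not state which combination method NetworkAnalyst applies, so Fisher's method is an assumption chosen for illustration, and the function name fisher_combined_p is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values (each > 0) with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null hypothesis; for even degrees of
    freedom the upper tail has the closed form used below.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# The same gene set tested in three datasets.
print(fisher_combined_p([0.04, 0.10, 0.03]))
```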

  16. Why are there grey cells in the meta gene sets heatmap?

    This can happen if the combining p-values or vote count approach is used for gene set meta-analysis. The grey cells mean that the corresponding dataset does not have expression values for this particular gene.

  17. How are the subnetworks generated from my data?

    The networks are generated by first mapping the significant genes/proteins to the underlying PPI database. A search algorithm is then performed to identify first-order neighbours (proteins that directly interact with a given protein) for each of these mapped proteins ("seeds"). The resulting nodes and their interaction partners are returned to build the subnetworks.

    The above approach will typically return one giant subnetwork ("continent") with multiple smaller ones ("islands"). Most subsequent analyses are performed on the continent. Note, networks with fewer than 3 nodes will be excluded.
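
    The procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not NetworkAnalyst's implementation: the PPI database is represented as a plain list of edges, only interactions that touch at least one seed are kept, and the function name first_order_subnetworks is hypothetical.

```python
from collections import defaultdict


def first_order_subnetworks(ppi_edges, seeds, min_nodes=3):
    """Return connected subnetworks of seeds and their direct interactors."""
    seeds = set(seeds)
    adjacency = defaultdict(set)
    for a, b in ppi_edges:
        if a in seeds or b in seeds:  # keep interactions that touch a seed
            adjacency[a].add(b)
            adjacency[b].add(a)
    components, seen = [], set()
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over one connected component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) >= min_nodes:  # drop networks with < 3 nodes
            components.append(component)
    # Largest subnetwork ("continent") first, then the "islands".
    return sorted(components, key=len, reverse=True)


edges = [("S1", "X"), ("X", "S2"), ("S2", "Y"), ("S3", "Z"), ("P", "Q")]
subnetworks = first_order_subnetworks(edges, seeds=["S1", "S2", "S3"])
print(subnetworks)
```

    In this toy example the pair S3-Z forms a subnetwork of only two nodes and is therefore excluded, and the edge P-Q is never considered because it touches no seed.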

  18. Which databases are used for creating the PPI network?

    NetworkAnalyst uses a comprehensive high-quality protein-protein interaction (PPI) database based on InnateDB. The database contains manually curated protein interaction data from published literature as well as experimental data from several PPI databases including IntAct, MINT, DIP, BIND, and BioGRID. The database currently contains 14755 proteins and 145955 interactions for human, and 5657 proteins and 14491 interactions for mouse. For C. elegans and D. melanogaster the PPI data is from the iRefWeb.

    Unless otherwise specified, PPI data added recently for new organisms were downloaded from the STRING database (version 10). The database contains information from numerous sources (including experimental data, computational prediction methods and public text collections), and is probably the only resource for less well-studied organisms.

  19. How many genes can be visualized (network size limit)?

    The visualization is limited by the performance of the user's computer and its screen resolution. Too many nodes will make the network too dense to visualize and the computer slow to respond. We recommend keeping the total number of nodes between 200 and 2000 for the best experience. For very large networks, please make sure you have a decent computer equipped with a modern browser (we recommend the latest Google Chrome).

  20. How do I construct a network composed only of nodes connecting the seed proteins (minimum interaction network)?

    You need to first create a large network connecting most of the seed proteins/genes. For instance, if the largest subnetwork from the default first-order interaction does not include most of the seed proteins, you can try to expand the network first. Note, some nodes may never connect to the main network due to the incomplete coverage of the PPI database. When you are satisfied with the result, trim the network to its minimum. Note, if there are many seed proteins, the procedure can take a while to compute.

  21. What if the network is very small (< 100)?

    NetworkAnalyst allows you to increase your network during the network construction step. You can either increase the input gene number or expand your search of the PPI database to higher-order interactors (i.e. including both friends and friends of friends).

  22. Can I construct a network for a large number (> 1000) of significant genes?

    When there are a large number of significant genes or seed proteins, the resulting networks will be too large and complex to be visualized or interpreted. There are three possible solutions here:

    1. To reduce the networks using direct connections between seed proteins (zero-order interactors);
    2. To trim the networks to keep only seeds and their connecting nodes;
    3. To reduce the input genes by using larger fold change and/or smaller p value cutoffs;
    The above approaches aim to reduce the network size and complexity, and to retain the most relevant information for downstream functional analysis.

  23. How does the Trim function work?

    The "Trim" function is designed for cases when the first-order subnetwork is too large or too dense to be visualized effectively. The goal is to extract a minimally connected subgraph containing all the seed genes from this "big and dense" subnetwork. This is a well-known Steiner tree problem and the exact solution is far too slow to use on the public server. NetworkAnalyst implements an approximate approach based on shortest paths: we compute pair-wise shortest paths between all seed nodes, and remove the nodes that are not on the shortest paths. Some optimizations have also been applied to improve its performance when there are large numbers of nodes.
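
    The shortest-path idea can be sketched as follows. This is a simplified Python illustration and not the server's optimized code: it assumes an unweighted network stored as a dictionary of neighbour sets, keeps one shortest path per seed pair, and uses the hypothetical function names shortest_path and trim.

```python
from collections import deque
from itertools import combinations


def shortest_path(adjacency, source, target):
    """Breadth-first search; returns one shortest path or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in adjacency.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def trim(adjacency, seeds):
    """Keep seeds plus the nodes lying on shortest paths between seed pairs."""
    present_seeds = sorted(set(seeds) & set(adjacency))
    keep = set(present_seeds)
    for a, b in combinations(present_seeds, 2):
        path = shortest_path(adjacency, a, b)
        if path:
            keep.update(path)
    # Restrict every neighbour set to the retained nodes.
    return {node: adjacency[node] & keep for node in keep}


network = {
    "S1": {"A", "B"}, "A": {"S1", "S2"}, "S2": {"A"},
    "B": {"S1", "C"}, "C": {"B"},
}
trimmed = trim(network, seeds=["S1", "S2"])
print(sorted(trimmed))  # ['A', 'S1', 'S2']
```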

  24. How do I identify important nodes using degree and betweenness?

    Important nodes can be identified based on their position within the network. The assumption is that changes in the key positions of a network will have more impact on the network than changes on marginal or relatively isolated positions. NetworkAnalyst provides two well-established node centrality measures to estimate node importance - degree centrality and betweenness centrality. In a graph network, the degree of a node is the number of connections it has to other nodes. Nodes with higher node degree act as hubs in a network. The betweenness centrality measures the number of shortest paths going through the node. It takes into consideration the global network structure. For example, nodes that occur between two dense clusters will have a high betweenness centrality even if their degree centrality values are not high. Note, you can sort the node table based on either degree or betweenness values by double clicking the corresponding column header.
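
    Both measures can be computed on a small example to see how they differ. The sketch below uses Brandes' algorithm for betweenness on an unweighted, undirected network stored as a dictionary of neighbour sets; this is a standard method chosen for illustration, and this FAQ does not specify how NetworkAnalyst computes the values internally.

```python
from collections import deque


def degree(adjacency):
    """Number of connections of each node."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def betweenness(adjacency):
    """Shortest paths passing through each node (Brandes' algorithm)."""
    score = dict.fromkeys(adjacency, 0.0)
    for source in adjacency:
        order = []
        preds = {node: [] for node in adjacency}
        sigma = dict.fromkeys(adjacency, 0)   # number of shortest paths
        dist = dict.fromkeys(adjacency, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adjacency, 0.0)
        for w in reversed(order):  # accumulate from the farthest nodes back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}


# Two triangles (a, b, c) and (d, e, f) joined through the bridge node x.
network = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "x"},
    "x": {"c", "d"},
    "d": {"x", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(degree(network)["x"], betweenness(network)["x"])
```

    In this toy network the bridge node x has a degree of only 2, yet every shortest path between the two triangles passes through it, so it has the highest betweenness.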

  25. What is a module and how are modules identified in NetworkAnalyst?

    Modules are tightly clustered subnetworks with more internal connections than expected randomly in the whole network. They are considered to be relatively independent components in a graph. Members within a module are likely to work collectively to perform a biological function. The biological functions of a module can be revealed by functional enrichment analysis as described below.

    NetworkAnalyst currently uses a random walk based approach known as the Walktrap Algorithm for module detection. The general idea is that if you perform random walks on the graph, then the walks are more likely to stay within the same module because there are only a few edges that lead outside a given module. The Walktrap algorithm runs multiple short random walks and uses the results of these random walks to merge separate modules in a bottom-up manner.

    NetworkAnalyst also integrates the gene expression values as edge weights during module searches. Weights are calculated as the square of the mean absolute log fold changes of the two adjacent nodes. Larger weights mean closer connections during random walks. To avoid zero-weight errors for non-seed proteins during program run, pseudo-expression values are given to non-seed proteins of 1/10 of the minimal absolute log fold changes of the seed proteins. By giving larger weights to seed proteins, the program encourages detecting modules containing more seed proteins (shorter distances).
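
    The weighting scheme can be illustrated with a short Python sketch. It reads "the square of the mean absolute log fold changes of the two adjacent nodes" as ((|a| + |b|) / 2) squared, which is one interpretation of the sentence above; the function name edge_weights and the input format are hypothetical.

```python
import math


def edge_weights(edges, seed_log_fc):
    """Weight each edge from the absolute log fold changes of its two nodes.

    Non-seed nodes receive a pseudo-expression value equal to 1/10 of the
    smallest absolute log fold change among the seeds.
    """
    pseudo = min(abs(value) for value in seed_log_fc.values()) / 10

    def level(node):
        return abs(seed_log_fc[node]) if node in seed_log_fc else pseudo

    return {(a, b): ((level(a) + level(b)) / 2) ** 2 for a, b in edges}


weights = edge_weights(
    edges=[("S1", "S2"), ("S1", "X")],
    seed_log_fc={"S1": 2.0, "S2": -1.0},
)
print(weights)
```

    The seed-to-seed edge S1-S2 receives a larger weight than the edge from S1 to the non-seed node X, which is how the module search is encouraged to group seed proteins together.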

  26. How do I interpret the p value of a module?

    Let's call the edges within a module "internal" and the edges connecting the nodes of a module with the rest of the graph "external". Then the p value of a given module can be calculated using a Wilcoxon rank-sum test of the "internal" and "external" degrees. The null hypothesis of the test is that there is no difference between the number of "internal" and "external" connections to a given node in the module. Significantly more internal than external edges indicates that the module is significant. Note, the p values are calculated solely based on connectivity. Users should also consider whether the modules are 'active' under the experimental conditions, by taking into account the number of seed proteins, their average fold changes, as well as enriched functions, as displayed in the Module Explorer table.

  27. Can I perform GO or pathway analysis on my significant genes alone?

    Yes, you can test enriched gene ontologies or pathways (KEGG/Reactome) for only your query genes. To do so, first select and highlight query genes using the Highlight Color toolbar on the top left (you may have to highlight twice, for upregulated and downregulated genes respectively); or you can use the Hub Explorer and select queries from the node table. After that, select a functional category from the Function Explorer section, and click the Submit button.

  28. Can I test the enriched functions of my highlighted selection?

    Yes. Users can perform enrichment tests on currently highlighted nodes in the network.

    • Module highlight: automatic: first perform module detection, then click on a module; manual: Set Scope to "including dependents", double click a node in the network to highlight the node together with its direct neighbours. Repeat the process to select more nodes.
    • Node highlight: manual: select nodes from the node table on the left or by double clicking on a node (Single Mode); automatic: using Hub Highlighting or Data Highlighting to select nodes based on degree or betweenness values.
    After you have selected the nodes or modules, click the Perform Enrichment Analysis button. The result table will be displayed in the panel below. Note, enrichment analyses are performed on ALL currently highlighted nodes. To ensure only your current selections are being used, first Reset the network, then perform highlighting/selections before performing the enrichment analysis.

  29. How is enrichment analysis performed?

    The enrichment analysis tests whether any functional modules (gene sets) from the user-selected library are significantly enriched among the currently highlighted nodes within the network. NetworkAnalyst's network viewer uses hypergeometric tests to compute the enrichment p values.
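
    A hypergeometric enrichment p value can be computed as in the following Python sketch. The function name and argument names are hypothetical, and the choice of gene universe (for example, all annotated genes) is an assumption that this FAQ does not specify.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, selected_size, overlap):
    """Probability of seeing at least `overlap` gene-set members by chance.

    universe_size: number of genes in the background
    set_size:      number of genes in the gene set or pathway
    selected_size: number of highlighted nodes
    overlap:       highlighted nodes that belong to the gene set
    """
    total = comb(universe_size, selected_size)
    upper = min(set_size, selected_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, selected_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# 40 highlighted nodes, 8 of which fall in a pathway of 100 genes,
# against a background of 20000 genes.
print(hypergeom_enrichment_p(20000, 100, 40, 8))
```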

  30. How do I interpret the differences in node colors and sizes in the default network?

    In the default network generated by NetworkAnalyst, the size of each node is based on its degree value, with larger nodes for larger degree values. The color of each node is proportional to its betweenness centrality value. When the user switches to Expression View, the colors will be based on expression values (if available).

  31. Can I view my queries in the network?

    Yes. To view your query genes or proteins, use the color palette on the top-left corner of the network viewer to set a highlight color. From the "Display Options" on the top right panel, click "Highlight". Select "Upregulated nodes" or "Downregulated nodes", then click the Submit button. You may also want to increase their node sizes by using the Size function under Node Options. Nodes will be labeled automatically when their size increases above a certain level.

  32. How can I create a 300 dpi high-resolution network for publication?

    Please use the Download option and choose "SVG Format" to save the current network view (tested using Chrome or Firefox, known issue with Safari). SVG is a vector-based graphic format, and you can then export it into a static image (i.e. png) at any resolution using a suitable graphic tool, for example, Adobe Illustrator or the free tool Inkscape. Note, it is best to save the SVG with a white background, as the default background color in Inkscape is white. If your SVG is saved with a black background, after opening the SVG in Inkscape, set the Background color to black (hex code: #222222) using the Document Properties menu.

  33. Can I change the background color of the network?

    Yes. To switch the background color, click the pull-down menu next to Background on the toolbar at the top of the screen. From the dropdown menu list, select either White, Black or Custom. Selecting Custom will prompt a dialog in which you can freely choose the color you want.

  34. Can I change node color, size or shape?

    You can change the color and size of a node. The shape cannot be changed in the current implementation.
    To change the node color, you need to first choose the color using the Color Palette for the next selection, then select the node you want to change by clicking on it. The node color will be changed to your specification. You can also change the whole color spectrum of the network. Click the Node dropdown menu located on the top toolbar and click on the Color option. A pop-up dialog will appear in which you are free to choose among a selection of color spectrums. Note that the nodes are colored based on their degree property (number of links to other nodes).
    To change node size, you can keep clicking it (double-clicking) to increase its size. You can also use the Node Size functions to increase or decrease the node size. Currently, the node shapes are all circles. Other node shapes are not supported.

  35. Can I change the position of a node?

    Yes. You can simply put your mouse cursor over the node. When its label shows up, left click and drag the node to a position. Release the mouse.

  36. Can I change the position of a node cluster?

    Yes. First use the Scope option on the top menu bar to make sure that the option "including dependents" is selected. Then drag the central node of the node cluster to a new position. Note, only dependent nodes (nodes that are only connected with the central node, but not to any other nodes) will be affected. If you also want to adjust the position of the non-dependent nodes, switch the Scope to "Current node", and then drag these nodes individually to the new position.

  37. How can I label the nodes in the network?

    Nodes will be automatically labeled when their sizes reach a certain threshold. Therefore, you can simply increase node size to label any node. To do so:

    • Label a single node:
      • Set to single node mode, and repeatedly click a node to increase its size until the label appears; or
      • Right click the node of interest and click on the Add Label option in the context menu. It will increase the size of the node so that the label appears.
    • Label all highlighted nodes: use the Node tab in the Display Options panel on the top right, select "Highlighted nodes" and "Increase ++", then keep clicking the Submit button to increase the size until labels show up.
    • Label all nodes in the current network: perform the same steps as above, except that you choose "All nodes" instead of "Highlighted nodes".

  38. Can I hide all node labels in the network?

    Yes, it is possible to hide them. Click on the Nodes dropdown list located on the top menu and select the Label option. Click on the Display tab and select the "Hide" option.

  39. Can I delete nodes from the network?

    Yes. You can delete nodes (with their associated edges) from the current network. First you need to select the nodes from the Node Table in the left pane. Then click the Delete button at the top of the node table. A confirmation dialog will appear asking if you really want to delete these nodes. Note, this action will trigger network re-arrangement, especially if hub nodes are removed. In addition, "orphan" nodes may be produced due to removal. These nodes will also be excluded during re-arrangement.

  40. How do I manually highlight any arbitrary sections of the network?

    There are two basic steps in the network highlighting - setting the highlight color and making selections. Use the Color Palette to set the color for the Next selection. You also need to choose among two different Scopes:

    • Current node: for highlighting the node being clicked only;
    • Including-dependents: for highlighting the node and its direct neighbours;
    Now, double click on nodes to make your selections. Note, you can repeat the steps above to change colors and scope to make different effects.

  41. Can I extract a module or a highlighted section from the network?

    Yes. To do this, first select or highlight a section of the network, then click the Extract icon on the left tool bar in the network view window. Note, the operation is expensive, and you have to wait ~20 seconds for the extracted network to return. The returned network will be named "moduleX" and is available in the "Network Explorer" panel on the top-left of the page for future reference.

  42. How do I proceed to 3D and/or VR view of the current network?

    To view the current network in 3D, click on the 3D button in the toolbar at the top left corner of the network viewer.
    To view the network in VR, please make sure that you are in 3D view and click on the VR button located in the same toolbar. Click on the VR goggle icon located at the bottom right corner of the new window. Make sure that you have a VR device connected to your computer.

  43. Can I view all the gene members of a pathway or a GO category within the current graph?

    Yes, after you have performed functional enrichment analysis, the over-represented themes will be displayed in the table below. By double clicking on a pathway name, all gene members of the pathway will be displayed on the focus view (heatmap analysis), or as highlighted nodes within the current network (network analysis), or as highlighted chords (chord diagrams analysis).

  44. What is the difference between the functional enrichment analysis methods from heatmap clustering, network analysis, venn diagram, GSEA and GlobalTest?

    Heatmap clustering, the network viewer and the Venn diagram employ overrepresentation analysis to identify gene sets or pathways enriched with significant genes (and neighbouring genes in interaction networks) detected with differential expression analysis. Hypergeometric tests are used to compute the p-values.
    Gene Set Enrichment Analysis (GSEA) and GlobalTest, on the other hand, are cut-off free methods that look at whether the gene sets or pathways are associated with a certain type of expression pattern over the whole gene expression data. In contrast to overrepresentation analysis, GSEA and GlobalTest can detect weak coordinated changes of gene expression in sets of functionally related genes and do not lose information by setting a threshold.

  45. What is Gene Set Enrichment Analysis (GSEA)?

    GSEA is a statistical method that determines whether a predefined gene set (GO, KEGG, etc.) demonstrates a statistically significant difference between two groups. Taking as input a list of ranked genes and a gene set, it looks at whether the genes from the gene set are randomly distributed in the ranked list or significantly enriched at the top and bottom extremities of the ranked list. In the following schema, gene set A is significantly enriched, whereas gene set B represents a case where the genes are more randomly distributed.

  46. Which metric is used for gene ranking before performing GSEA?

    Ranking the list of genes to be analyzed by GSEA is a critical step that has large consequences for the result. There is no consensus on which metric is best to use. NetworkAnalyst uses the log2 fold change multiplied by the -log10 of the p-value obtained from differential expression analysis.
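
    The ranking metric can be written as a short Python sketch. The function name rank_genes and the input format (gene mapped to a pair of log2 fold change and p-value) are hypothetical; p-values are assumed to be greater than zero.

```python
import math


def rank_genes(results):
    """Rank genes by log2 fold change multiplied by -log10(p-value)."""
    scores = {gene: log_fc * -math.log10(p)
              for gene, (log_fc, p) in results.items()}
    # Strongly upregulated genes first, strongly downregulated genes last.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


de_results = {
    "Gene1": (2.0, 0.001),    # up, highly significant
    "Gene2": (-1.5, 0.01),    # down, significant
    "Gene3": (0.2, 0.5),      # small change, not significant
}
ranked = rank_genes(de_results)
print(ranked)
```

    The sign of the score follows the direction of change, so significant upregulated and downregulated genes end up at opposite extremities of the list.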

  47. What is Enrichment Score?

    The enrichment score is the main output of the GSEA method. It is the maximum deviation from zero encountered during the random walk that goes through the ranked list. It represents the degree to which the genes in the gene set are over-represented at either extremity of the list (up and down regulated expression).
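
    The running-sum idea can be sketched as follows. This Python example uses the unweighted form of the statistic, in which every gene-set member adds the same step; whether NetworkAnalyst weights the steps by the ranking metric is not stated here, so treat this as a simplified illustration with a hypothetical function name.

```python
import math


def enrichment_score(ranked_genes, gene_set):
    """Maximum deviation from zero of an unweighted running sum."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(hits) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("need genes both inside and outside the gene set")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up for a gene-set member, step down for any other gene.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


ranked_list = ["A", "B", "C", "D", "E", "F", "G", "H"]
print(enrichment_score(ranked_list, {"A", "B"}))  # members at the top
print(enrichment_score(ranked_list, {"G", "H"}))  # members at the bottom
```

    A gene set concentrated at the top of the list gives a score near +1, one concentrated at the bottom gives a score near -1, and a set scattered through the list stays closer to zero.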

  48. What is GlobalTest?

    GlobalTest is a statistical method for testing whether sets of genes are significantly associated with a response variable. Similar to GSEA, it considers the whole gene expression profile and is not limited to a list of significantly differentially expressed genes. It uses the Q-statistic to summarize the gene-level statistics and tests the null hypothesis that there are no differentially expressed genes in the gene set.
