Example Datasets
NetworkAnalyst provides multiple example datasets that illustrate the supported data formats and can be used for testing. You can select and test these datasets directly from the Try Examples dialog in each module, or download them below.
  1. A 99 gene signature of human endotoxin tolerance, ID type: Entrez. (download here)
  2. A 96 gene signature of mouse Trem-1 activation, ID type: Entrez. (download here)
  3. Gene expression data (eight samples) of a breast cancer cell line. Affymetrix Human Genome U95 GeneChip data, normalized, log2 scale. Meta-data: Time (hours): 10, 48; Estrogen Receptor (ER): present, absent. (download here)
  4. Gene expression data (12 samples) in human PBMC using LPS as inducer. Illumina BeadArrays, RefSeq ID, normalized, log2 scale. Meta-data: Treatment: Control, LPS, LPS_LPS; Donor: 21, 46, 86, 92. (download here)
  5. Three example datasets created from three colon cancer gene expression GEO datasets (GSE13067, GSE13294 and GSE4554). A random selection of 500 genes is included for testing purposes. Affymetrix Human Genome U133 Plus 2.0 Array. All datasets have been normalized and log2-transformed. Meta-data: group 1: microsatellite instable; group 2: microsatellite stable. (download here)
  6. RNA-seq data (8 samples) in mouse bone marrow-derived macrophages (BMDM) infected with Salmonella Typhimurium. Illumina HiSeq, Ensembl Gene ID, raw count table. Meta-data: CLASS: Infected, Control. (download here)
  7. An example session file created using the above three colon cancer gene expression datasets (dataset1, dataset2 and dataset3) from the Multiple gene expression datasets module. (download here)

Data Format Overview

  • Gene or protein list data: a list of gene or protein IDs, one per row, with optional expression values (e.g., fold changes). Please refer to our example data for more details.
  • Gene expression data: a table of expression values (e.g., gene/probe intensities from microarrays or counts from RNA-seq), saved as a tab-delimited text file (.txt) with features (genes/probes) in rows and samples in columns. The tab-delimited file can be generated from any spreadsheet program. More details are provided in the following sections.
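To illustrate the gene list format, the Python sketch below reads one ID per row with an optional second column treated as a fold change. It is a minimal illustration, not NetworkAnalyst's own parser: the function name `read_gene_list`, splitting on any whitespace, and skipping lines that start with `#` are assumptions made for this example. The IDs in the usage line are illustrative Entrez IDs.

```python
import io
from typing import Dict, Optional, TextIO


def read_gene_list(handle: TextIO) -> Dict[str, Optional[float]]:
    """Read one gene/protein ID per row, plus an optional fold change.

    Hypothetical helper: columns are assumed to be separated by tabs or
    spaces, and '#' lines are assumed to be comments.
    """
    genes: Dict[str, Optional[float]] = {}
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and assumed comment lines
        fields = line.split()
        gene_id = fields[0]
        # The second column, if present, is treated as the fold change
        genes[gene_id] = float(fields[1]) if len(fields) > 1 else None
    return genes


example = io.StringIO("3569\t2.5\n7124\t-1.8\n3553\n")
print(read_gene_list(example))  # {'3569': 2.5, '7124': -1.8, '3553': None}
```

IDs without a value are kept with `None`, since the expression column is optional.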

How to label a dataset

It is critical to properly label your data so that they can be recognized and compared. The following common gene and probe IDs are supported:

  1. Gene ID: Entrez ID, Ensembl Gene ID, GenBank Accession ID, RefSeq ID, Ensembl Transcript ID, and official Gene Symbol
  2. Probe ID (for human and mouse only): 37 popular microarray platforms from Affymetrix, Agilent, and Illumina

The gene expression data should also contain sample names in the first line, which begins with "#NAME". The class labels of the experimental conditions should be placed in a new line beginning with "#CLASS". Multiple class labels can be specified by appending a colon and the label name (for example, "#CLASS:cancer_type" and "#CLASS:stage"). The same set of labels must be used for ALL datasets included in a meta-analysis.
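Because a meta-analysis requires the same labels across datasets, a quick check before uploading can help. The Python sketch below is hypothetical (the function names `parse_class_rows` and `labels_consistent` are illustrative, not part of NetworkAnalyst), and it assumes that "the same set of labels" means identical `#CLASS` row names and identical sets of label values, which is one reasonable reading of the rule above.

```python
from typing import Dict, List, Set


def parse_class_rows(lines: List[str]) -> Dict[str, List[str]]:
    """Collect #CLASS rows: "#CLASS" -> key "CLASS", "#CLASS:stage" -> "CLASS:stage"."""
    classes: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0].startswith("#CLASS"):
            classes[fields[0][1:]] = fields[1:]
    return classes


def labels_consistent(datasets: List[List[str]]) -> bool:
    """Return True if all datasets share the same #CLASS rows and label values.

    Hypothetical helper; "same set of labels" is interpreted here as
    identical row names and identical sets of values (order ignored).
    """
    summaries: List[Dict[str, Set[str]]] = [
        {name: set(values) for name, values in parse_class_rows(lines).items()}
        for lines in datasets
    ]
    return all(s == summaries[0] for s in summaries[1:])


ds1 = ["#NAME\tS1\tS2", "#CLASS\tcase\tcontrol", "Gene1\t1.2\t0.3"]
ds2 = ["#NAME\tS3\tS4\tS5", "#CLASS\tcontrol\tcase\tcase", "Gene1\t0.8\t1.1\t0.9"]
ds3 = ["#NAME\tS6\tS7", "#CLASS\ttumor\tnormal", "Gene1\t0.5\t0.7"]
print(labels_consistent([ds1, ds2]))  # True
print(labels_consistent([ds1, ds3]))  # False
```

Here `ds3` fails because it uses "tumor"/"normal" instead of "case"/"control"; renaming its labels would make it consistent with the other two.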

How to format a dataset

Here is a good tutorial on how to generate tab-delimited text files from Microsoft Excel. When you open your data in any text editor (for example, WordPad), it should look like the following:

  • Sample name, one class label (one missing value)
    #NAME	Sample1	Sample2	Sample3	Sample4	Sample5	Sample6	Sample7	Sample8
    #CLASS	case	case	case	case	control	control	control	control
    Gene1	-3.06	-2.25	-1.15	-6.64	0.4	1.08	1.22	1.02
    Gene2	-1.36	-0.67	-0.17	-0.97	-2.32	-5.06	0.28	1.32
    Gene3	1.61	-0.27	0.71	-0.62	0.14		0.11	0.98
    Gene4	0.93	1.29	-0.23	-0.74	-2	-1.25	1.07	1.27
                            
  • Sample name, two class labels (cancer type and sex)
    #NAME	Sample1	Sample2	Sample3	Sample4	Sample5	Sample6	Sample7	Sample8
    #CLASS:CANCER	case	case	case	case	control	control	control	control
    #CLASS:SEX	F	F	M	M	F	M	F	M
    Gene1	-3.06	-2.25	-1.15	-6.64	0.4	1.08	1.22	1.02
    Gene2	-1.36	-0.67	-0.17	-0.97	-2.32	-5.06	0.28	1.32
    Gene3	1.61	-0.27	0.71	-0.62	0.14		0.11	0.98
    Gene4	0.93	1.29	-0.23	-0.74	-2	-1.25	1.07	1.27
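If you prefer to generate the file programmatically rather than from a spreadsheet, Python's standard `csv` module can write and read the layout shown above. This is a sketch under stated assumptions: the helper names `write_expression_table` and `read_expression_table` are hypothetical, missing values are written as empty fields (as for Gene3 above), and the reader's check that every row has one value per sample is our own sanity check rather than a documented NetworkAnalyst rule.

```python
import csv
import io
from typing import Dict, List, Optional, TextIO, Tuple

Values = List[Optional[float]]


def write_expression_table(handle: TextIO, samples: List[str],
                           classes: Dict[str, List[str]],
                           data: Dict[str, Values]) -> None:
    """Write the #NAME row, the #CLASS rows, then one tab-delimited row per gene."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["#NAME"] + samples)
    for name, labels in classes.items():  # e.g. "CLASS" or "CLASS:SEX"
        writer.writerow(["#" + name] + labels)
    for gene, values in data.items():
        # None is written as an empty field, i.e. a missing value
        writer.writerow([gene] + ["" if v is None else v for v in values])


def read_expression_table(
    handle: TextIO,
) -> Tuple[List[str], Dict[str, List[str]], Dict[str, Values]]:
    """Parse the layout back into sample names, class rows and values."""
    samples: List[str] = []
    classes: Dict[str, List[str]] = {}
    data: Dict[str, Values] = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        key = fields[0].strip()
        if key == "#NAME":
            samples = fields[1:]
        elif key.startswith("#CLASS"):
            classes[key[1:]] = fields[1:]
        else:
            values = [float(v) if v.strip() else None for v in fields[1:]]
            # Sanity check (our assumption): exactly one value per sample
            if len(values) != len(samples):
                raise ValueError(
                    f"{key}: expected {len(samples)} values, got {len(values)}")
            data[key] = values
    return samples, classes, data


buf = io.StringIO()
write_expression_table(
    buf,
    ["Sample1", "Sample2", "Sample3"],
    {"CLASS:CANCER": ["case", "case", "control"], "CLASS:SEX": ["F", "M", "F"]},
    {"Gene1": [-3.06, -2.25, 0.4], "Gene3": [1.61, None, 0.11]},
)
buf.seek(0)
samples, classes, data = read_expression_table(buf)
print(data["Gene3"])  # [1.61, None, 0.11]
```

Writing with `delimiter="\t"` guarantees real tab characters between columns, which avoids the common problem of a text editor or copy-paste silently replacing tabs with spaces.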
                            