Tutorial for the Cluster My Genes tool

What is Cluster My Genes?

The Cluster My Genes tool lets users retrieve and cluster all publicly available gene expression data in PortEco for a given set of Escherichia coli genes. Samples are chosen by one of four selection criteria: Experimental Condition, Mutant, Strain, or Publication. Each public sample in PortEco is annotated with the experimental condition studied and is associated with a publication. When appropriate and available, each sample is also annotated with strain and mutant information. Before clustering, Cluster My Genes also lets the user narrow the selection to the most significantly expressed genes (upregulated, downregulated, or both) based on z-score significance.

How to access Cluster My Genes?

The Cluster My Genes tool can be launched from the main menu: under Expression Data, choose the Cluster My Genes option.

PortEco --> Expression Data --> Cluster My Genes


How to use Cluster My Genes?

First, click the appropriate tab ("Condition", "Mutant", "Strain", or "Publication") to choose how you want to select samples. Each tab lists the available conditions, mutants, strains, or publications, each with its own checkbox. You can check one or more boxes; the number of samples matching your checked criteria is displayed. Next, enter the gene names or symbols you are interested in into the open box as a comma- or space-separated list. To retrieve data for all genes, leave this box blank. After you select the samples and enter the gene list, you have the following options to retrieve and cluster data.

The options are:

  • 1) Go To Gene Profiles: This option retrieves gene expression data from the selected samples for the gene list provided (or for all genes when the box is left blank) and proceeds to clustering. You are presented with a cluster image to explore, similar to the one shown in the Gene Profiles tool.
  • 2) Show Most Significant Genes: This option lets you select the most significantly expressed (upregulated and downregulated) genes from the selected samples. Results are presented in an intermediate table for review. You can then cluster the gene expression data by clicking the "Go To Gene Profiles" button at the bottom of the intermediate results table.
  • 3) Show Most Significant Upregulated Genes: This option lets you select only the significantly upregulated genes before proceeding to clustering.
  • 4) Show Most Significant Downregulated Genes: This option lets you select only the significantly downregulated genes before proceeding to clustering.
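
As an illustration of the gene list input described above (a comma- or space-separated list, with a blank box meaning all genes), the following is a minimal sketch of how such input could be interpreted. It is not PortEco's actual code; the function name parse_gene_list and the use of None to stand for "all genes" are assumptions made for this example.

```python
import re


def parse_gene_list(text):
    """Split a comma- or space-separated gene list into individual names.

    Hypothetical helper: returns None for blank input, which this sketch
    uses to mean "retrieve data for all genes".
    """
    # Split on any run of commas and/or whitespace, dropping empty pieces.
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return tokens or None


# Mixed separators are accepted; a blank box selects all genes.
genes = parse_gene_list("lacZ, lacY  araC")
all_genes = parse_gene_list("   ")
```

Here genes holds the three names as separate entries, and all_genes is None.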

How are the most significant upregulated or downregulated genes determined?

We use the same z-scores that are calculated and used for the Samples and Conditions tool. To generate the list of the most significantly expressed genes, we apply the following filtering and selection criteria. For each gene, we exclude samples in which the z-score lies between -1 and +1, as these are not considered significant. For the remaining samples, the median z-score is calculated separately for the positive z-scores (upregulated) and for the negative z-scores (downregulated). For a gene to be considered significantly expressed, its z-score must be greater than +2 (upregulated) or less than -2 (downregulated) in a given sample. This yields two separate lists, one of significantly upregulated genes and one of significantly downregulated genes, which are then combined to produce the list of the most significantly expressed genes.
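
The filtering steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PortEco's implementation: the function name significant_genes, the parameter names, and the example z-scores are all made up. The description does not say exactly how the median and the +2/-2 threshold combine, so this sketch assumes a gene qualifies when at least one sample passes the threshold and reports the median of the retained z-scores alongside it.

```python
from statistics import median


def significant_genes(zscores, noise=1.0, cutoff=2.0):
    """Build up- and downregulated gene lists from per-sample z-scores.

    zscores maps a gene name to its z-scores across the selected samples.
    Assumption: a gene qualifies if any single sample exceeds the cutoff.
    """
    up, down = {}, {}
    for gene, values in zscores.items():
        # Exclude samples whose z-score lies between -noise and +noise.
        kept = [z for z in values if abs(z) > noise]
        pos = [z for z in kept if z > 0]
        neg = [z for z in kept if z < 0]
        # Medians are computed separately for positive and negative z-scores.
        if pos and max(pos) > cutoff:
            up[gene] = median(pos)
        if neg and min(neg) < -cutoff:
            down[gene] = median(neg)
    # The two lists are combined into one list of significant genes.
    combined = sorted(set(up) | set(down))
    return up, down, combined


# Made-up z-scores for three placeholder genes.
example = {
    "geneA": [0.5, 2.5, 3.1, -0.2],
    "geneB": [-2.4, -1.5, 0.9],
    "geneC": [0.3, -0.8, 1.4],
}
up, down, combined = significant_genes(example)
```

In this example geneA ends up in the upregulated list, geneB in the downregulated list, and geneC in neither, because none of its z-scores passes the threshold.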


