FatiGO takes two lists of genes (ideally a group of interest and the rest of the genes in the experiment, although any two groups, formed in any way, can be tested against each other) and converts them into two lists of GO terms using the corresponding gene-GO association table. A Fisher's exact test for 2×2 contingency tables is then used to check for significant over-representation of GO terms in one of the sets with respect to the other. A multiple-testing correction, to account for the multiple hypotheses tested (one for each GO term), is applied as previously described.
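The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FatiGO's actual implementation: the function names, the gene labels (`g1` … `g6`), the term labels (`term_A`, `term_B`) and the choice of a two-sided test are all hypothetical. For each term, a 2×2 table is built (rows: list 1 and list 2; columns: genes annotated with the term and genes not annotated with it) and Fisher's exact test is computed from the hypergeometric distribution.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x annotated genes in list 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def term_pvalues(list1, list2, annotation):
    """Return {term: p-value} comparing two gene lists.

    `annotation` maps gene -> set of terms (a stand-in for the gene-GO
    association table). Assumption: genes missing from it are dropped.
    """
    genes1 = [g for g in list1 if g in annotation]
    genes2 = [g for g in list2 if g in annotation]
    terms = set().union(*(annotation[g] for g in genes1 + genes2))
    result = {}
    for term in sorted(terms):
        a = sum(term in annotation[g] for g in genes1)  # list 1, annotated
        c = sum(term in annotation[g] for g in genes2)  # list 2, annotated
        result[term] = fisher_exact_two_sided(a, len(genes1) - a, c, len(genes2) - c)
    return result


# Hypothetical toy data, for illustration only
annotation = {
    "g1": {"term_A"}, "g2": {"term_A"}, "g3": {"term_A", "term_B"},
    "g4": {"term_B"}, "g5": set(), "g6": {"term_B"},
}
pvalues = term_pvalues(["g1", "g2", "g3"], ["g4", "g5", "g6"], annotation)
```

The resulting p-values are unadjusted; with one test per term, they still need the multiple-testing correction mentioned above before being interpreted.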

In addition to Gene Ontology (Ashburner et al., 2000) terms, it can simultaneously test for KEGG pathways (Kanehisa et al., 2004), InterPro motifs (Mulder et al., 2003), Swissprot keywords (Boeckmann et al., 2003), microRNA (Griffiths-Jones et al., 2006), TFBSs (Wingender et al., 2000), cisRED motifs (Robertson et al., 2006) and BioCarta pathways. The distribution of any combination (or all) of these terms between two groups of genes can be tested simultaneously by means of Fisher's exact test. All p-values are adjusted by FDR. The functionality of the old modules FatiWise and TransFat (Al-Shahrour et al., 2005) and FatiGO+ (Al-Shahrour et al., 2007) has been completely included here and, consequently, these modules have been discontinued.
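As an illustration of FDR adjustment, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of p-values. Which FDR procedure FatiGO actually uses is not stated here, so the choice of Benjamini-Hochberg, the function name and the example p-values are assumptions made for this example only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, one per tested term
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

Each adjusted value is the smallest FDR level at which the corresponding term would be declared significant, so terms can be filtered with a single threshold on the adjusted p-values.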

Data and format

FatiGO supports many gene identifiers for each organism (HGNC symbol, EMBL acc, UniProt/Swiss-Prot, UniProtKB/TrEMBL, Ensembl IDs, RefSeq, EntrezGene, Affymetrix, Agilent, PDB, Protein Id, IPI…); the supported identifiers can be checked in the ID converter. These identifiers must be annotated in Ensembl: any gene not annotated in Ensembl will be lost in the analysis (please see the Ensembl documentation).

The input is a list or a plain text file with one gene or protein identifier per line. See an example of a Saccharomyces cerevisiae identifier list:



