Jose Carbonell, 01/15/2010 01:35 pm


FatiGO

FatiGO takes two lists of genes (ideally a group of interest and the rest of the genes in the experiment, although any two groups, formed in any way, can be tested against each other) and converts them into two lists of GO terms using the corresponding gene-GO association table. A Fisher's exact test for 2×2 contingency tables is then used to check for significant over-representation of GO terms in one of the sets with respect to the other. Multiple test correction is applied, as previously described, to account for the multiple hypotheses tested (one for each GO term).
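The procedure described above can be illustrated with a minimal sketch. It is not FatiGO's actual implementation: the function names, the toy gene identifiers (`g1` to `g8`) and the term `GO:A` are hypothetical, and the two-sided p-value is computed directly from the hypergeometric distribution with the Python standard library.

```python
from math import comb


def term_table(list1, list2, annotations, term):
    """Build the 2x2 table (a, b, c, d) for one term.

    a, b: genes of list 1 with / without the term
    c, d: genes of list 2 with / without the term
    """
    a = sum(1 for g in list1 if term in annotations.get(g, ()))
    c = sum(1 for g in list2 if term in annotations.get(g, ()))
    return a, len(list1) - a, c, len(list2) - c


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(x):
        # Number of ways to get x annotated genes in list 1 with fixed margins
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is no more probable than the observed one
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / total


# Hypothetical gene-GO association table and gene lists
annotations = {
    "g1": {"GO:A"}, "g2": {"GO:A"}, "g3": {"GO:A"}, "g4": set(),
    "g5": {"GO:A"}, "g6": set(), "g7": set(), "g8": set(),
}
list1 = ["g1", "g2", "g3", "g4"]
list2 = ["g5", "g6", "g7", "g8"]

table = term_table(list1, list2, annotations, "GO:A")
print(table)                                      # (3, 1, 1, 3)
print(round(fisher_exact_two_sided(*table), 4))   # 0.4857
```

With only eight genes the difference (3 of 4 against 1 of 4) is not significant, which shows why the test, and not the raw percentages, decides whether a term is reported.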

In addition to Gene Ontology (Ashburner et al., 2000) terms, it can simultaneously test for KEGG pathways (Kanehisa et al., 2004), InterPro motifs (Mulder et al., 2003), Swissprot keywords (Boeckmann et al., 2003), microRNA (Griffiths-Jones et al., 2006), TFBSs (Wingender et al., 2000), cisRED motifs (Robertson et al., 2006) and BioCarta pathways. The distribution of any combination (or all) of these terms between two groups of genes can be tested simultaneously by means of a Fisher's exact test. All the p-values are adjusted by FDR. The functionality of the old modules FatiWise and TransFat (Al-Shahrour et al., 2005) and FatiGO+ (Al-Shahrour et al., 2007) has been completely included here and, consequently, these modules have been discontinued.
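The text does not say which FDR procedure is used. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment; the function name `bh_adjust` is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# One raw p-value per tested term (made-up numbers)
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 3) for p in bh_adjust(raw)])   # [0.02, 0.04, 0.04, 0.02]
```

Each term keeps its own raw p-value; only the adjusted value is compared against the chosen significance threshold.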

Typical genome-scale experiments are annotated in two steps: first, genes of interest are selected (because they co-express in a cluster, or because they are significantly over- or under-expressed when two classes of experiments are compared, etc.), and then the enrichment of any type of biologically relevant label in these genes is compared to the corresponding distribution of the label in the background (typically the rest of the genes). With FatiGO you can study the differential distribution of GO terms between two sets of genes using a Fisher's exact test, and you will obtain the functional terms that are over- or under-represented in your lists.

Data and format

FatiGO supports many gene identifiers for each organism (HGNC symbol, EMBL acc, UniProt/Swiss-Prot, UniProtKB/TrEMBL, Ensembl IDs, RefSeq, EntrezGene, Affymetrix, Agilent, PDB, Protein Id, IPI…); the supported identifiers can be checked in the ID converter. These identifiers must be annotated in Ensembl: any gene not annotated in Ensembl will be lost in the analysis (please see the Ensembl documentation).

The input is a list, or a plain text file, with one gene or protein identifier per line. Here is an example list of Saccharomyces cerevisiae identifiers:

YAL011W
GAL83
YDR116C
YGL104C
KNS1
ECM2
YHL018W
CDC45
YHL010C
YHR199C
SNO2
YJR141W
YOR059C
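For users who prepare such lists programmatically, a file in this format can be read with a few lines of code. The helper below is a hypothetical sketch, not part of FatiGO; dropping blank lines and repeated identifiers is a convenience assumed here, not documented FatiGO behaviour.

```python
def read_gene_list(text):
    """Return the identifiers found in `text`, one per line, in input order."""
    seen = set()
    genes = []
    for line in text.splitlines():
        gene = line.strip()
        # Skip empty lines and identifiers that were already read
        if gene and gene not in seen:
            seen.add(gene)
            genes.append(gene)
    return genes


example = "YAL011W\nGAL83\n\n  KNS1 \nGAL83\n"
print(read_gene_list(example))   # ['YAL011W', 'GAL83', 'KNS1']
```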

Worked examples

How the functional profiling should never be done

It is not uncommon to find the following assertion in papers and talks: “then we examined our set of genes selected in this way (whatever) and we discovered that 65% of them were related to metabolism, so we can conclude that our experiment activates metabolism genes”. This may or may not be true, depending on the relative abundance of this term. If you look at the rest of the genes, those not activated in the experiment, and the proportion of them related to metabolism is, let's say, 10%, then you are right. Conversely, if the proportion is, let's say, 61%, then the experiment probably has nothing to do with metabolism. The comparison is compulsory to support such assertions.
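The two scenarios can be made concrete with a small calculation. The percentages come from the paragraph above, but the gene counts (100 selected genes, 1000 remaining genes) are hypothetical, as is the function name; the p-value is the one-sided (over-representation) tail of Fisher's exact test.

```python
from math import comb


def over_representation_p(hits_sel, n_sel, hits_rest, n_rest):
    """One-sided Fisher's exact p-value: probability of seeing at least
    `hits_sel` annotated genes in the selected set by chance."""
    hits = hits_sel + hits_rest
    total = comb(n_sel + n_rest, hits)
    upper = min(n_sel, hits)
    tail = sum(
        comb(n_sel, x) * comb(n_rest, hits - x)
        for x in range(hits_sel, upper + 1)
    )
    return tail / total


# 65% of the selected genes are "metabolism" in both scenarios
p_background_10 = over_representation_p(65, 100, 100, 1000)   # rest: 10%
p_background_61 = over_representation_p(65, 100, 610, 1000)   # rest: 61%

print(f"rest at 10%: p = {p_background_10:.3g}")
print(f"rest at 61%: p = {p_background_61:.3g}")
```

Under these assumed counts the first p-value is vanishingly small while the second is far above any usual threshold, even though the selected set shows the same 65% in both cases.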

Comparing two lists of genes

There are many situations in which comparing two lists of genes answers a relevant biological question. In fact, a large number of problems can be addressed in this way. For example, one might be interested in knowing whether a group of genes that co-express are functionally related. Typically this implies comparing a set of genes that clustered together (by any clustering method) to the rest of the genes. Another commonly addressed question is whether the genes differentially expressed between two experimental conditions are functionally related. Many other similar questions are commonly asked when analysing microarray data or, in general, genomic data. FatiGO has been specifically designed to answer these questions.

Exercise 1 Exploring differences in GO terms with FatiGO

The simplest use of the tool is to take a quick look at the biological processes, molecular functions or subcellular locations (according to the corresponding GO categories) of a set of genes. The submitted list of genes is analysed against the rest of the genome to assess the significance of the abundance of GO terms or of terms from other databases.

  1. Here you can find the corresponding files for this worked example, containing two lists of human genes (example.motor, example.apoptosis). Save these files to your desktop or a local directory
  2. Create a new project (e.g. workedExample1) and start a new FatiGO (Compare tab) process in the Functional enrichment analysis section of the tools.
  3. Choose the organism database Saccharomyces cerevisiae
  4. Upload the file fatigo_example.1 you have downloaded as List of genes #1 (or open it, select all, then copy and paste its contents into the text box)
  5. Check the box Rest of genome in the List of genes #2
  6. In the Database section, check GO - biological process
  7. In the statistics section, choose the option Over-represented terms of list 1 (recommended in genome comparison)
  8. Give a name to the new job (e.g. defaultBioProcess)
  9. Maintain the rest of the parameters as default
  10. Submit the job (press the run button)

You will get a table of significant GO terms associated with the genes. The terms in the table are initially grouped by database and sorted by adjusted p-value. The table can then be re-sorted by the difference between the percentages of genes annotated with each GO term in the two lists, by the p-value or by the adjusted p-value. It is shown along with a graphical distribution of the term frequencies (see below). As you can see, the terms found are enriched only in the submitted list (#1, red color), because we chose the Over-represented terms of list 1 option. In this example the significant terms are quite general, as they belong to levels 3 to 6.

References

  • Al-Shahrour, F., Minguez, P., Tárraga, J., Medina, I., Alloza, E., Montaner, D., & Dopazo, J. (2007). FatiGO+: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments. Nucleic Acids Research 35 (Web Server issue): W91-96
  • Al-Shahrour, F., Minguez, P., Vaquerizas, J.M., Conde, L. & Dopazo, J. (2005). BABELOMICS: a suite of web-tools for functional annotation and analysis of group of genes in high-throughput experiments. Nucleic Acids Research, 33 (Web Server issue): W460-W464
  • Al-Shahrour, F., Díaz-Uriarte, R. & Dopazo, J. (2004). FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20: 578-580
  • Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G. (2000) Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25-29.
  • Boeckmann B., Bairoch A., Apweiler R., Blatter M.-C., Estreicher A., Gasteiger E., Martin M.J., Michoud K., O'Donovan C., Phan I., Pilbout S. & Schneider M., The Swiss-Prot Protein Knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31:365-370(2003)
  • Griffiths-Jones S., Grocock R.J., van Dongen S., Bateman A. & Enright A.J. (2006). miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Research, 34 (Database Issue): D140-D144
  • Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res.32:D277-280
  • Mulder N.J., Apweiler R., Attwood T.K., Bairoch A., Barrell D., Bateman A., Binns D., Biswas M., Bradley P., Bork P., Bucher P., Copley R.R., Courcelle E., Das U., Durbin R., Falquet L., Fleischmann W., Griffiths-Jones S., Haft D., Harte N., Hulo N., Kahn D., Kanapin A., Krestyaninova M., Lopez R., Letunic I., Lonsdale D., Silventoinen V., Orchard S.E., Pagni M., Peyruc D., Ponting C.P., Selengut J.D., Servant F., Sigrist C.J.A., Vaughan R, Zdobnov E.M. (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucl. Acids. Res. 31: 315-318.
  • Robertson, G., Bilenky, M., Lin, K., He, A., Yuen, W., Dagpinar, M., Varhol, R., Teague, K., Griffith, O.L., Zhang, X. et al. (2006) cisRED: a database system for genome-scale computational discovery of regulatory elements. Nucleic Acids Res, 34, D68-73
  • Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V., Meinhardt, T., Prüß, M., Reuter, I. and Schacherer, F. (2000). TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res. 28, 316-319

Publications

  • Al-Shahrour, F., Minguez, P., Tárraga, J., Medina, I., Alloza, E., Montaner, D., & Dopazo, J. (2007). FatiGO+: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments. Nucleic Acids Research 35 (Web Server issue): W91-96
  • Al-Shahrour, F., Minguez, P., Tárraga, J., Montaner, D., Alloza, E., Vaquerizas, J.M., Conde, L., Blaschke, C., Vera, J. & Dopazo, J. (2006). BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments. Nucleic Acids Research (Web Server issue) 34: W472-W476
  • Al-Shahrour, F., Minguez, P., Vaquerizas, J.M., Conde, L. & Dopazo, J. (2005). BABELOMICS: a suite of web-tools for functional annotation and analysis of group of genes in high-throughput experiments. Nucleic Acids Research, 33 (Web Server issue): W460-W464
  • Al-Shahrour, F., Díaz-Uriarte, R. & Dopazo, J. (2004). FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20: 578-580

example.motor (753 Bytes) Jose Carbonell, 01/15/2010 01:36 pm

example.apoptosis (628 Bytes) Jose Carbonell, 01/15/2010 01:36 pm

example.overinteracting_05.txt - yeast.overinteracting.proteins (8.1 kB) Ana Conesa, 05/12/2010 12:29 am

example.underinteracting_05.txt - yeast.underinteracting.proteins (10.1 kB) Ana Conesa, 05/12/2010 12:30 am