Predictors Input Form

Online examples

Here you can load small datasets from our server. You can use them to run examples and see how the tool works. Click on the links to load the data.

Select your data

Here you can select the dataset you want to use to build the predictor. It must have been uploaded previously through the Upload Menu in Babelomics and tagged with the "Data matrix - Expression" data type.

Required input data

The input data matrix must be a plain-text, tab-separated file like the following:

# some comments
# more comments
#NAMES Cond1 Cond2 Cond3 Cond4 Cond5 Cond6
gen1    -3.06    -2.25    -1.15    -6.64    0.40    1.08
gen2    -1.36    -0.67    -0.17    -0.97    -2.32    -5.06
gen3    -0.17    0.48    1.23    1.52    1.11    
gen4        1.61    -0.27    0.71    -0.62    0.14
gen5    2.09    2.12    2.62    1.95    1.04    2.18
gen6    0.20    -3.06    -0.03    0.64    0.84    
gen7    -2.00    -0.64    -0.29    0.08    -1.00    
gen8    0.93    1.29    -0.23    -0.74    -2.00    -1.25
gen9    0.88    0.31    -0.22    3.25        
gen10    0.71    1.03    -0.25        1.03    
Things to take into consideration:
  • Matrix rows correspond to genes and matrix columns correspond to samples (arrays).
  • All data items must be separated by tab characters.
  • Missing values are not allowed here (note that the example above contains some, e.g. the empty fields in gen3, gen4 and gen9). Use the preprocessor to either remove them or impute them before running the predictor.
  • A line starting with the #NAMES tag, followed by the sample names, is mandatory.
  • Lines beginning with #VARIABLE can be selected in the form as the variable used to train the predictor.
  • All other lines beginning with "#" are treated as comments.
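The rules above can be checked before uploading. The following is a minimal sketch of a reader for this format, not Babelomics' own parser; it assumes that the #NAMES tag and each sample name are separated by tabs, and it keeps #VARIABLE lines verbatim because their internal syntax is not described on this page.

```python
def read_expression_matrix(text):
    """Parse a tab-separated expression matrix (sketch, not Babelomics' parser).

    Returns (sample_names, variable_lines, rows), where rows maps each
    gene name to its list of float values (one per sample/column).
    Raises ValueError if #NAMES is absent or any value is missing.
    """
    names = None
    variable_lines = []
    raw_rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("#NAMES"):
            # Assumption: the tag and each sample name are tab-separated.
            names = line.split("\t")[1:]
        elif line.startswith("#VARIABLE"):
            # Kept verbatim; the text after the tag is not interpreted here.
            variable_lines.append(line)
        elif line.startswith("#"):
            continue  # every other '#' line is a comment
        else:
            raw_rows.append((lineno, line.split("\t")))
    if names is None:
        raise ValueError("the mandatory #NAMES line is missing")
    rows = {}
    for lineno, fields in raw_rows:
        gene, values = fields[0], fields[1:]
        # Missing values show up as empty fields or as too few columns.
        if len(values) != len(names) or any(v.strip() == "" for v in values):
            raise ValueError(f"line {lineno}: {gene} has missing or extra values")
        rows[gene] = [float(v) for v in values]
    return names, variable_lines, rows
```

Run on the example matrix above, this sketch would stop at gen3 with a ValueError, which is the situation the preprocessor is meant to resolve.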


Choose one of the three prediction methods: SVM, KNN or Random Forest.

  • Support Vector Machines (SVM)
  • K-nearest neighbours (KNN)
  • Random Forest (RF)
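To illustrate how one of these methods assigns a class, here is a minimal, dependency-free sketch of K-nearest neighbours classification using Euclidean distance and a majority vote. It is a generic textbook version, not the implementation Babelomics uses; the function names and the tie-breaking rule are illustrative choices. Note that each sample is a column of the input matrix, so sample profiles are obtained by transposing the gene rows (e.g. `list(zip(*rows.values()))`).

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, x, k=3):
    """Predict the class of profile x by majority vote among its k nearest
    training profiles; ties are broken in favour of the closer neighbour."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Indices of training samples ordered by distance to x.
    ranked = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], x))
    nearest = [train_y[i] for i in ranked[:k]]
    counts = Counter(nearest)
    best = max(counts.values())
    # Walk neighbours from nearest to farthest; return the first tied winner.
    for label in nearest:
        if counts[label] == best:
            return label
```

SVM and Random Forest learn a decision function from the training set instead of storing it, but they are evaluated with the same cross-validation schemes described below.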

Error estimation

Choose the cross-validation method used to estimate the predictor's error.

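  • Leave-one-out: a single observation from the original sample is used as the validation data, and the remaining observations as the training data. This is repeated so that each observation is used once for validation.
  • KFold: the original sample is randomly partitioned into K subsamples. A single subsample is retained as the validation data for testing the model, and the remaining K-1 subsamples are used as training data. This is repeated K times, so that each subsample is used exactly once for validation.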
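Both schemes can be expressed as lists of (training, validation) index splits over the samples. The sketch below is a generic illustration, not Babelomics' code: `fit_predict` is a hypothetical callback standing for any of the three classifiers, and the error is reported as the fraction of held-out samples that are misclassified.

```python
import random


def kfold_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) for K-fold CV: indices are shuffled,
    split into k near-equal folds, and each fold is the test set once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield sorted(train), sorted(test)


def leave_one_out_splits(n_samples):
    """Leave-one-out: each sample is held out alone (K-fold with K = n)."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]


def cv_error(splits, fit_predict, X, y):
    """Fraction of held-out samples misclassified over all splits.

    fit_predict(train_X, train_y, test_X) is a hypothetical callback that
    trains a classifier and returns one predicted label per test sample.
    """
    wrong = total = 0
    for train, test in splits:
        preds = fit_predict([X[j] for j in train], [y[j] for j in train],
                            [X[j] for j in test])
        wrong += sum(p != y[j] for p, j in zip(preds, test))
        total += len(test)
    return wrong / total
```

Leave-one-out trains n models and has no random component, while K-fold trains only K models at the cost of depending on the random partition (fixed here by `seed`).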

Gene subset selection

Choose the method used to select the subset of genes on which the predictor is computed.

  • Correlation-based Feature Selection (CFS)
  • Principal Components Analysis (PCA)
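As an illustration of the idea behind CFS, the sketch below scores a gene subset with the standard CFS merit, k·r̄cf / sqrt(k + k(k-1)·r̄ff), which rewards genes correlated with the class and penalises genes correlated with each other, and grows the subset greedily. Several simplifications are assumptions, not necessarily what Babelomics does: absolute Pearson correlation is used for both terms, the class is a two-class variable coded as 0/1, and the search is plain forward selection. Each gene's values are one row of the input matrix.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0 if one is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def cfs_merit(genes, subset, target):
    """Merit = k * mean|r(gene, class)| / sqrt(k + k(k-1) * mean|r(gene, gene)|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(genes[g], target)) for g in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(genes[a], genes[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(genes, target):
    """Greedy forward search: add the gene that most raises the merit,
    and stop as soon as no addition improves it."""
    selected, best = [], 0.0
    while True:
        candidates = [g for g in genes if g not in selected]
        if not candidates:
            return selected
        merit, gene = max((cfs_merit(genes, selected + [g], target), g)
                          for g in candidates)
        if merit <= best:
            return selected
        selected.append(gene)
        best = merit
```

PCA works differently: rather than keeping a subset of the original genes, it replaces them with a few linear combinations (principal components) that capture most of the variance.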


Enter a name for the job and, optionally, a description.

predictores.png (57.2 kB) Martina Marba, 01/15/2010 02:47 pm