Cristina Gonzalez, 12/04/2012 10:37 am


HPG Variant VCF Tools

Biologists often receive so much data that they must spend considerable time cleaning it before they can work with the subset they are actually interested in. HPG Variant VCF Tools is a set of tools for preprocessing, filtering and manipulating VCF files, intended to reduce the time spent on these tedious preprocessing tasks.

Supported input formats

Splitting a VCF file

A VCF file can be split into a set of smaller files according to a criterion. Each output file is itself a fully valid VCF file.

The most basic command line for invoking this tool is:

hpg-var-vcf split -v your_vcf_file.vcf --criterion chromosome

Currently available criteria are:

  • By chromosome: Each output file will be named chromosome_N_your_vcf_file.vcf (N being the chromosome name) and will contain the entries from a single chromosome.
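To illustrate what a split by chromosome involves, here is a minimal Python sketch. It is not the tool's implementation. The function names split_by_chromosome and output_name are hypothetical, and the sketch assumes that copying the complete header into every output is what keeps each file a valid VCF.

```python
from collections import defaultdict


def split_by_chromosome(vcf_lines):
    """Group VCF records by chromosome (hypothetical helper).

    Returns a dict mapping chromosome name -> list of lines. Every list
    starts with a full copy of the header (assumption: this is what makes
    each output a valid VCF on its own).
    """
    header = []
    records = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)              # meta-information and column header
        elif line.strip():
            chrom = line.split("\t", 1)[0]   # CHROM is the first column
            records[chrom].append(line)
    return {chrom: header + body for chrom, body in records.items()}


def output_name(chrom, input_name):
    """Build the output file name following the naming scheme described above."""
    return f"chromosome_{chrom}_{input_name}"
```

Writing each list to the path returned by output_name would give one file per chromosome.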

Filtering entries from a VCF file

  • By regions
  • By positions corresponding to a SNP
  • By a minimum quality
  • By a minimum coverage
  • By number of alleles
  • By minor allele frequency (MAF)
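Some of the criteria above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's code. The names passes_filters and filter_vcf and their parameters are hypothetical, regions are assumed to be inclusive 1-based (chromosome, start, end) tuples, and the number of alleles is assumed to count REF plus every ALT allele.

```python
def passes_filters(line, min_quality=None, region=None, num_alleles=None):
    """Return True if a VCF record satisfies every criterion that is set."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, alt, qual = fields[0], int(fields[1]), fields[4], fields[5]

    if min_quality is not None:
        # Records with a missing QUAL (".") are rejected (assumption).
        if qual == "." or float(qual) < min_quality:
            return False

    if region is not None:
        r_chrom, r_start, r_end = region
        if chrom != r_chrom or not (r_start <= pos <= r_end):
            return False

    if num_alleles is not None:
        # REF counts as one allele; each comma-separated ALT adds one more.
        n = 1 + (0 if alt == "." else len(alt.split(",")))
        if n != num_alleles:
            return False

    return True


def filter_vcf(lines, **criteria):
    """Keep all header lines plus the records that pass the criteria."""
    return [l for l in lines if l.startswith("#") or passes_filters(l, **criteria)]
```

For example, filter_vcf(lines, min_quality=30, region=("1", 1, 150)) would keep only records on chromosome 1, positions 1 to 150, with QUAL of at least 30.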

Getting statistics from a VCF file

General stats

  • Number of variants
  • Number of samples
  • Number of bi-allelic sites
  • Number of multi-allelic sites
  • Number of SNPs
  • Number of indels
  • Number of transitions
  • Number of transversions
  • Ti/Tv ratio
  • Percentage of variants with PASS filter status
  • Average quality in the VCF
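The following Python sketch shows one way these general statistics could be computed. It is an illustration under stated assumptions, not the tool's implementation. The name general_stats is hypothetical. A record is counted as a SNP when REF and every ALT are single bases, as an indel when any ALT differs in length from REF, and transitions and transversions are counted once per ALT allele.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def general_stats(lines):
    """Compute file-wide statistics from VCF lines (hypothetical helper)."""
    s = dict(variants=0, samples=0, biallelic=0, multiallelic=0,
             snps=0, indels=0, transitions=0, transversions=0)
    n_pass, quals = 0, []
    for line in lines:
        if line.startswith("#CHROM"):
            # Sample columns follow the 9 fixed columns (CHROM..FORMAT).
            s["samples"] = max(0, len(line.rstrip("\n").split("\t")) - 9)
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        ref, alts, qual, flt = f[3].upper(), f[4].upper().split(","), f[5], f[6]
        s["variants"] += 1
        s["biallelic" if len(alts) == 1 else "multiallelic"] += 1
        is_base = lambda a: len(a) == 1 and a in "ACGT"
        if is_base(ref) and all(is_base(a) for a in alts):
            s["snps"] += 1
            for a in alts:  # one Ti or Tv per ALT allele (assumption)
                key = "transitions" if (ref, a) in TRANSITIONS else "transversions"
                s[key] += 1
        elif any(len(a) != len(ref) for a in alts if a != "."):
            # Simplification: symbolic alleles such as <DEL> also land here.
            s["indels"] += 1
        if flt == "PASS":
            n_pass += 1
        if qual != ".":
            quals.append(float(qual))
    n, tv = s["variants"], s["transversions"]
    s["ti_tv"] = s["transitions"] / tv if tv else None
    s["pass_pct"] = 100.0 * n_pass / n if n else 0.0
    s["mean_qual"] = sum(quals) / len(quals) if quals else None
    return s
```

Other conventions are possible, for instance classifying multi-allelic sites per allele rather than per record, and they would change the SNP and indel counts.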

Statistics per variant

  • Allelic and genotypic counts and frequencies per variant
  • Number of missing alleles and genotypes
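As a rough illustration of the per-variant statistics, the Python sketch below counts alleles and genotypes from the GT field of one record. It is a sketch, not the tool's code. The name variant_stats is hypothetical, alleles are reported by index (0 for REF, 1 for the first ALT, and so on), phase is ignored, and a genotype is assumed to count as missing when any of its alleles is missing.

```python
import re
from collections import Counter


def variant_stats(line):
    """Allele/genotype counts and frequencies for one VCF record."""
    f = line.rstrip("\n").split("\t")
    gt_idx = f[8].split(":").index("GT")     # position of GT within FORMAT
    allele_counts, genotype_counts = Counter(), Counter()
    missing_alleles = missing_genotypes = 0

    for sample in f[9:]:
        gt = sample.split(":")[gt_idx]
        alleles = re.split(r"[/|]", gt)      # "/" unphased, "|" phased
        called = [int(a) for a in alleles if a != "."]
        missing_alleles += len(alleles) - len(called)
        allele_counts.update(called)
        if len(called) < len(alleles):
            missing_genotypes += 1           # any missing allele (assumption)
        else:
            genotype_counts["/".join(map(str, sorted(called)))] += 1

    total_alleles = sum(allele_counts.values())
    total_genotypes = sum(genotype_counts.values())
    return {
        "allele_counts": dict(allele_counts),
        "allele_freqs": {a: c / total_alleles for a, c in allele_counts.items()},
        "genotype_counts": dict(genotype_counts),
        "genotype_freqs": {g: c / total_genotypes
                           for g, c in genotype_counts.items()},
        "missing_alleles": missing_alleles,
        "missing_genotypes": missing_genotypes,
    }
```

Frequencies here are computed over called alleles and complete genotypes only, so missing data does not dilute them.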

Statistics per sample

Merging multiple VCF files

Feature plan

The split, filter and statistics tools will gain more options over time. For more information, see the detailed feature plan.

Other tool suites