Installing Ensembl VEP
The Ensembl VEP is used to annotate the variant mapping file. Below are the steps used to install it; for full details, please see the Ensembl VEP website.
Install requirements
First, we will install the Perl modules required by VEP, along with the optional modules that can give some performance enhancements.
This requires cpanm. The easiest way to get it is to install it locally and avoid CPAN. If you do not have a local bin directory, create one and ensure it is in your PATH in your ~/.bashrc or ~/.bash_profile, for example:
export PATH="$PATH:$HOME/bin"
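If the startup file is sourced more than once, the export above appends the same directory each time. A minimal sketch of one way around that, assuming a POSIX-compatible shell; the helper name add_to_path is hypothetical and not part of any tool used here:

```shell
# Hypothetical helper: append a directory to PATH unless it is already listed.
add_to_path() {
    case ":${PATH}:" in
        *":$1:"*) ;;                  # already present, leave PATH unchanged
        *) PATH="${PATH}:$1" ;;       # not present, append it
    esac
}

add_to_path "${HOME}/bin"
export PATH
```

Calling add_to_path a second time with the same directory leaves PATH as it was.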
Then download cpanm as an executable:
cd ~/bin
curl -L https://cpanmin.us/ -o cpanm
chmod +x cpanm
We set up a local cpanm directory to install the required modules into. First, create the directory:
mkdir -p ${HOME}/Software/cpanm
Then add that directory's lib/perl5 subdirectory to PERL5LIB in your ~/.bashrc or ~/.bash_profile (depending on which one you use):
export PERL5LIB=$PERL5LIB:$HOME/Software/cpanm/lib/perl5
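When PERL5LIB is not set beforehand, the export above leaves a stray leading colon in the value. A small sketch of an empty-safe way to build it, assuming a POSIX-compatible shell; join_path is a hypothetical helper name:

```shell
# Hypothetical helper: join an existing colon-separated value ($1, possibly
# empty) with a new entry ($2) without producing a leading colon.
join_path() {
    printf '%s\n' "${1:+$1:}$2"
}

PERL5LIB="$(join_path "${PERL5LIB:-}" "${HOME}/Software/cpanm/lib/perl5")"
export PERL5LIB
```

The `${1:+$1:}` expansion only emits the old value and the separator when the old value is non-empty.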
Now source the updated ~/.bashrc or ~/.bash_profile (. ~/.bashrc), or open a fresh terminal and continue in that.
Install the required modules. You must also have at least Perl v5.10 and gcc/g++/make installed:
cpanm -l ${HOME}/Software/cpanm Archive::Zip
cpanm -l ${HOME}/Software/cpanm DBD::mysql
cpanm -l ${HOME}/Software/cpanm DBI
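To confirm that the modules are actually visible to Perl through PERL5LIB, something like the following could be used. This is only a sketch: missing_perl_modules is a hypothetical helper, and it simply tries to load each module with perl -M.

```shell
# Hypothetical helper: print the name of every module that the current perl
# cannot load; returns non-zero if at least one module is missing.
missing_perl_modules() {
    rc=0
    for module in "$@"; do
        if ! perl -M"${module}" -e 1 >/dev/null 2>&1; then
            echo "${module}"
            rc=1
        fi
    done
    return "${rc}"
}

# Usage (not run here):
#   missing_perl_modules Archive::Zip DBD::mysql DBI
```

An empty output means all of the listed modules were found.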
Install the additional (optional) modules:
cpanm -l ~/Software/cpanm Set::IntervalTree
cpanm -l ~/Software/cpanm JSON
git clone https://github.com/Ensembl/ensembl-xs
cd ensembl-xs/
perl Makefile.PL INSTALL_BASE=~/Software/cpanm/
make
make install
I didn’t install the Bio::DB::BigFile module as it looked too painful and I am not really going to use the BigWig format.
Installing VEP
Download the source code:
git clone https://github.com/Ensembl/ensembl-vep.git
cd ensembl-vep
I had an error when installing VEP (although it still seemed to work): "! Bio::Root::Version is not installed". This error has been seen before. The thread recommended a manual install of BioPerl, which I did, and then I reinstalled. I still got the error, but everything seemed to work.
wget https://cpan.metacpan.org/authors/id/C/CJ/CJFIELDS/BioPerl-1.6.924.tar.gz
tar -xzvf BioPerl-1.6.924.tar.gz
export PERL5LIB="$PERL5LIB:$HOME/Software/cpanm/lib/perl5:${HOME}/src/bioperl-1.6.924"
I installed giving an alternative cache and plugin directory:
perl INSTALL.pl -a p --PLUGINS all --CACHEDIR /data/vep_cache
Downloading the CADD data
VEP has a CADD plugin which will annotate CADD scores for overlapping variants. For this to work, the CADD data sets will need to be downloaded:
wget https://kircherlab.bihealth.org/download/CADD/v1.6/GRCh38/whole_genome_SNVs.tsv.gz
wget https://kircherlab.bihealth.org/download/CADD/v1.6/GRCh38/gnomad.genomes.r3.0.indel.tsv.gz
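The CADD plugin reads these files through tabix, so each data file also needs its index next to it. A minimal sketch of a check, assuming the usual convention that the index has the same name with a .tbi suffix; both function names are hypothetical:

```shell
# Succeed only if both the data file ($1) and its tabix index ($1.tbi) exist.
has_tabix_index() {
    [ -f "$1" ] && [ -f "$1.tbi" ]
}

# Report every *.tsv.gz file in directory $1 that has no index beside it.
report_unindexed() {
    for f in "$1"/*.tsv.gz; do
        [ -e "$f" ] || continue          # directory has no matching files
        has_tabix_index "$f" || echo "missing index: $f"
    done
}

# Usage (not run here; the directory is the one used in the VEP command):
#   report_unindexed /data/cadd
```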
Running VEP
VEP is not that quick, which is kind of understandable given the lookups it has to do. I could never get near 1M variants in 10 minutes (not using a single fork, anyway), so I run it in parallel on the cluster. However, the basic command I use is below:
vep -i "/scratch/test.vcf.gz" \
    --sift s --polyphen s \
    --plugin "CADD,/data/cadd/gnomad.genomes.r3.0.indel.tsv.gz" \
    --fork 1 \
    --assembly "GRCh38" \
    -o "/tmp/vep-test2.vcf.gz" \
    --format "vcf" \
    --cache --dir_cache "/dev/shm" \
    --dir_plugins /data/vep_cache/Plugins \
    --offline --vcf --no_stats --force_overwrite \
    --compress_output bgzip \
    --fields "Allele,Consequence,Gene,Feature_type,Feature,SIFT,PolyPhen,CADD_PHRED,CADD_RAW"
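How the work is split across the cluster is not described here, so the following is only one possible approach: a sketch that prints one VEP command per chromosome using VEP's --chr option. The function name, the output file naming and the reduced option set are all assumptions, and nothing is executed; the printed lines would still need to be handed to whatever scheduler is in use.

```shell
# Print one VEP command per chromosome (dry run); nothing is executed here.
# $1: input VCF, $2: output directory, remaining arguments: chromosome names.
print_vep_jobs() {
    input="$1"
    outdir="$2"
    shift 2
    for chrom in "$@"; do
        printf 'vep -i "%s" --chr "%s" -o "%s/vep-chr%s.vcf.gz" --cache --offline --vcf --no_stats --force_overwrite --compress_output bgzip\n' \
            "${input}" "${chrom}" "${outdir}" "${chrom}"
    done
}

print_vep_jobs /scratch/test.vcf.gz /tmp 1 2 X
```

The remaining options from the basic command (plugins, cache directories, fields) would be added to the format string in the same way.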
Updating Ensembl VEP
To update your VEP version, assuming you installed from git, you will need to download the updated caches to match your VEP version. In this example, we are updating to version 105 and have Bio::DB::HTS installed, so we will download the indexed versions of the cache for both GRCh37 and GRCh38.
$ wget http://ftp.ensembl.org/pub/release-105/variation/indexed_vep_cache/homo_sapiens_vep_105_GRCh37.tar.gz
$ wget http://ftp.ensembl.org/pub/release-105/variation/indexed_vep_cache/homo_sapiens_vep_105_GRCh38.tar.gz
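The two URLs differ only in release number and assembly, so they can be generated. The sketch below assumes the same pattern holds for other releases, which is not guaranteed. vep_cache_url is a hypothetical helper, and unpacking into the cache directory passed to the installer is also an assumption. The loop only prints commands.

```shell
# Build the indexed-cache URL for a release ($1) and assembly ($2),
# following the pattern of the two release-105 URLs.
vep_cache_url() {
    printf 'http://ftp.ensembl.org/pub/release-%s/variation/indexed_vep_cache/homo_sapiens_vep_%s_%s.tar.gz\n' \
        "$1" "$1" "$2"
}

# Print (do not run) the download and unpack commands for both assemblies.
for assembly in GRCh37 GRCh38; do
    url="$(vep_cache_url 105 "${assembly}")"
    echo "wget ${url}"
    echo "tar -xzf ${url##*/} -C /lustre/projects/DTAdb/resources/vep"
done
```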
Then, to update the source code, go to the directory where you originally installed VEP (in this case ~/software/ensembl-vep) and do:
$ git fetch
$ git checkout release/105
$ perl INSTALL.pl -a p --PLUGINS all --CACHEDIR /lustre/projects/DTAdb/resources/vep
WARNING: The following plugins have not been found: all
Available plugins: AncestralAllele,Blosum62,CADD,CSN,Carol,ClinPred,Condel,Conservation,DisGeNET,Downstream,Draw,ExAC,ExACpLI,FATHMM,FATHMM_MKL,FlagLRG,FunMotifs,G2P,GO,GeneSplicer,Gwava,HGVSIntronOffset,LD,LOVD,LoF,LoFtool,LocalID,MPC,MTR,Mastermind,MaxEntScan,NMD,NearestExonJB,NearestGene,PON_P2,Phenotypes,PostGAP,PrimateAI,ProteinSeqs,REVEL,ReferenceQuality,SameCodon,SingleLetterAA,SpliceAI,SpliceRegion,StructuralVariantOverlap,SubsetVCF,TSSDistance,dbNSFP,dbscSNV,gnomADc,miRNA,neXtProt,satMutMPRA
- installing "AncestralAllele"
AncestralAllele already installed; overwriting
- This plugin requires data
- See /lustre/projects/DTAdb/resources/vep/Plugins/AncestralAllele.pm for details
- OK
- installing "Blosum62"
Blosum62 already installed; overwriting
- add "--plugin Blosum62" to your VEP command to use this plugin
- OK
- installing "CADD"
CADD already installed; overwriting
- This plugin requires data
- See /lustre/projects/DTAdb/resources/vep/Plugins/CADD.pm for details
- OK
- installing "CSN"
CSN already installed; overwriting
- add "--plugin CSN" to your VEP command to use this plugin
- OK
- installing "Carol"
Carol already installed; overwriting
- This plugin requires installation
- See /lustre/projects/DTAdb/resources/vep/Plugins/Carol.pm for details
- OK
- installing "ClinPred"
- This plugin requires data
- See /lustre/projects/DTAdb/resources/vep/Plugins/ClinPred.pm for details
- OK
- installing "Condel"
Condel already installed; overwriting
- This plugin requires installation
- See /lustre/projects/DTAdb/resources/vep/Plugins/Condel.pm for details
- OK
- installing "Conservation"
Conservation already installed; overwriting
- add "--plugin Conservation,[options]" to your VEP command to use this plugin
- OK
- installing "DisGeNET"
DisGeNET already installed; overwriting
- add "--plugin DisGeNET" to your VEP command to use this plugin
- OK
- installing "Downstream"
Downstream already installed; overwriting
- add "--plugin Downstream" to your VEP command to use this plugin
- OK
- installing "Draw"
Draw already installed; overwriting
- This plugin requires installation
- See /lustre/projects/DTAdb/resources/vep/Plugins/Draw.pm for details
- OK
- installing "ExAC"
ExAC already installed; overwriting
- This plugin requires data
- See /lustre/projects/DTAdb/resources/vep/Plugins/ExAC.pm for details
- OK
- installing "ExACpLI"
ExACpLI already installed; overwriting
- This plugin requires data
- See /lustre/projects/DTAdb/resources/vep/Plugins/ExACpLI.pm for details
- OK
- installing "FATHMM"
FATHMM already installed; overwriting
- add "--plugin FATHMM" to your VEP command to use this plugin
- OK
- installing "FATHMM_MKL"
FATHMM_MKL already installed; overwriting
- This plugin requires data
- See /lustre/projects/DTAdb/resources/vep/Plugins/FATHMM_MKL.pm for details
- OK
- installing "FlagLRG"
FlagLRG already installed; overwriting
- add "--plugin FlagLRG" to your VEP command to use this plugin
- OK
- installing "FunMotifs"
FunMotifs already installed; overwriting
- add "--plugin FunMotifs" to your VEP command to use this plugin
- OK
- installing "G2P"
G2P already installed; overwriting
- This plugin requires data
- See /lustre/projects/DTAdb/resources/vep/Plugins/G2P.pm for details
- OK
- installing "GO"
GO already installed; overwriting
- add "--plugin GO" to your VEP command to use this plugin
- OK
- installing "GeneSplicer"
GeneSplicer already installed; overwriting
- This plugin requires installation
- See /lustre/projects/DTAdb/resources/vep/Plugins/GeneSplicer.pm for details
- OK
- installing "Gwava"
Gwava already installed; overwriting
- This plugin requires data
- See /lustre/projects/DTAdb/resources/vep/Plugins/Gwava.pm for details
- OK
- installing "HGVSIntronOffset"
HGVSIntronOffset already installed; overwriting
- add "--plugin HGVSIntronOffset" to your VEP command to use this plugin
- OK
- installing "LD"
LD already installed; overwriting
- add "--plugin LD,[options]" to your VEP command to use this plugin
- OK
- installing "LOVD"
LOVD already installed; overwriting
- add "--plugin LOVD" to your VEP command to use this plugin
- OK
- installing "LoF"
LoF already installed; overwriting
- This plugin requires installation
- This plugin requires data
- See /lustre/projects/DTAdb/resources/vep/Plugins/LoF.pm for details
- OK
- installing "LoFtool"
LoFtool already installed; overwriting
- This plugin requires data
- See /lustre/projects/DTAdb/resources/vep/Plugins/LoFtool.pm for details
- OK
- installing "LocalID"
LocalID already installed; overwriting
- add "--plugin LocalID" to your VEP command to use this plugin
- OK
- installing "MPC"
MPC already installed; overwriting
- This plugin requires data
- See /lustre/projects/DTAdb/resources/vep/Plugins/MPC.pm for details
- OK
- installing "MTR"
MTR already installed; overwriting
- This plugin requires data
- See /lustre/projects/DTAdb/resources/vep/Plugins/MTR.pm for details
- OK
- installing "Mastermind"
Mastermind already installed; overwriting
- add "--plugin Mastermind" to your VEP command to use this plugin
- OK
- installing "MaxEntScan"
MaxEntScan already installed; overwriting
- This plugin requires installation
- See /lustre/projects/DTAdb/resources/vep/Plugins/MaxEntScan.pm for details
- OK
- installing "NMD"
- add "--plugin NMD" to your VEP command to use this plugin
- OK
- installing "NearestExonJB"
NearestExonJB already installed; overwriting
- add "--plugin NearestExonJB" to your VEP command to use this plugin
- OK
- installing "NearestGene"
NearestGene already installed; overwriting
- add "--plugin NearestGene" to your VEP command to use this plugin
- OK
- installing "PON_P2"
PON_P2 already installed; overwriting
- add "--plugin PON_P2" to your VEP command to use this plugin
- OK
- installing "Phenotypes"
Phenotypes already installed; overwriting
- This plugin requires data
- See /lustre/projects/DTAdb/resources/vep/Plugins/Phenotypes.pm for details
- OK
- installing "PostGAP"
PostGAP already installed; overwriting
- add "--plugin PostGAP" to your VEP command to use this plugin
- OK
- installing "PrimateAI"
- This plugin requires data
- See /lustre/projects/DTAdb/resources/vep/Plugins/PrimateAI.pm for details
- OK
- installing "ProteinSeqs"
ProteinSeqs already installed; overwriting
- add "--plugin ProteinSeqs" to your VEP command to use this plugin
- OK
- installing "REVEL"
REVEL already installed; overwriting
- This plugin requires data
- See /lustre/projects/DTAdb/resources/vep/Plugins/REVEL.pm for details
- OK
- installing "ReferenceQuality"
ReferenceQuality already installed; overwriting
- add "--plugin ReferenceQuality" to your VEP command to use this plugin
- OK
- installing "SameCodon"
SameCodon already installed; overwriting
- add "--plugin SameCodon" to your VEP command to use this plugin
- OK
- installing "SingleLetterAA"
SingleLetterAA already installed; overwriting
- add "--plugin SingleLetterAA" to your VEP command to use this plugin
- OK
- installing "SpliceAI"
SpliceAI already installed; overwriting
- add "--plugin SpliceAI" to your VEP command to use this plugin
- OK
- installing "SpliceRegion"
SpliceRegion already installed; overwriting
- add "--plugin SpliceRegion" to your VEP command to use this plugin
- OK
- installing "StructuralVariantOverlap"
StructuralVariantOverlap already installed; overwriting
- add "--plugin StructuralVariantOverlap" to your VEP command to use this plugin
- OK
- installing "SubsetVCF"
SubsetVCF already installed; overwriting
- add "--plugin SubsetVCF" to your VEP command to use this plugin
- OK
- installing "TSSDistance"
TSSDistance already installed; overwriting
- add "--plugin TSSDistance" to your VEP command to use this plugin
- OK
- installing "dbNSFP"
dbNSFP already installed; overwriting
- This plugin requires installation
- This plugin requires data
- See /lustre/projects/DTAdb/resources/vep/Plugins/dbNSFP.pm for details
- OK
- installing "dbscSNV"
dbscSNV already installed; overwriting
- This plugin requires installation
- This plugin requires data
- See /lustre/projects/DTAdb/resources/vep/Plugins/dbscSNV.pm for details
- OK
- installing "gnomADc"
gnomADc already installed; overwriting
- add "--plugin gnomADc" to your VEP command to use this plugin
- OK
- installing "miRNA"
miRNA already installed; overwriting
- add "--plugin miRNA" to your VEP command to use this plugin
- OK
- installing "neXtProt"
neXtProt already installed; overwriting
- add "--plugin neXtProt" to your VEP command to use this plugin
- OK
- installing "satMutMPRA"
satMutMPRA already installed; overwriting
- add "--plugin satMutMPRA" to your VEP command to use this plugin
- OK
NB: One or more plugins that you have installed will not work without installation or downloading data; see logs above
All done
For some reason, the previous install command did not actually re-install anything except the plugins (most likely because -a p restricts the automated install to plugins). As a result, the Ensembl APIs and VEP were out of sync (see below):
$ vep --help
#----------------------------------#
# ENSEMBL VARIANT EFFECT PREDICTOR #
#----------------------------------#
Versions:
ensembl : 102.347f9ed
ensembl-funcgen : 102.6bd93a0
ensembl-io : 102.ff1cf96
ensembl-variation : 102.2716d2e
ensembl-vep : 105.0
Help: dev@ensembl.org , helpdesk@ensembl.org
Twitter: @ensembl
http://www.ensembl.org/info/docs/tools/vep/script/index.html
Usage:
./vep [--cache|--offline|--database] [arguments]
Basic options
=============
--help Display this message and quit
-i | --input_file Input file
-o | --output_file Output file
--force_overwrite Force overwriting of output file
--species [species] Species to use [default: "human"]
--everything Shortcut switch to turn on commonly used options. See web
documentation for details [default: off]
--fork [num_forks] Use forking to improve script runtime
For full option documentation see:
http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html
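Spotting the mismatch by eye is easy to miss, so the check could be scripted. A rough sketch, assuming the "Versions:" block keeps the layout shown above; vep_versions_in_sync is a hypothetical helper that reads the help text on stdin:

```shell
# Read `vep --help` output on stdin and succeed only if every component
# listed under "Versions:" reports the same major release number.
vep_versions_in_sync() {
    awk '
        /^[[:space:]]*ensembl[-a-z]*[[:space:]]*:/ {
            split($NF, parts, ".")      # "102.347f9ed" -> parts[1] = "102"
            seen[parts[1]] = 1
        }
        END {
            n = 0
            for (v in seen) n++
            exit (n == 1 ? 0 : 1)       # exactly one distinct release expected
        }
    '
}

# Usage (not run here):
#   vep --help | vep_versions_in_sync || echo "API and VEP versions differ"
```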
So I tried the install again with no arguments this time and it seemed to work:
$ perl INSTALL.pl
Hello! This installer is configured to install v105 of the Ensembl API for use by the VEP.
It will not affect any existing installations of the Ensembl API that you may have.
It will also download and install cache files from Ensembl's FTP server.
Checking for installed versions of the Ensembl API...done
It looks like you have an older version (102) of the API installed.
This installer will install a limited set of the API v105 for use by the VEP only
Skip to the next step (n) to install cache files
Do you want to continue installing the API (y/n)? y
Setting up directories
Destination directory ./Bio already exists.
Do you want to overwrite it (if updating VEP this is probably OK) (y/n)? y
- fetching BioPerl
- unpacking ./Bio/tmp/release-1-6-924.zip
- moving files
Attempting to install Bio::DB::HTS and htslib.
>>> If this fails, try re-running with --NO_HTSLIB
- checking out HTSLib
fatal: destination path 'htslib' already exists and is not an empty directory.
- building HTSLIB in ./htslib
In /lustre/home/rmjdcfi/software/ensembl-vep/htslib
make: Nothing to be done for `all'.
- unpacking ./Bio/tmp/biodbhts.zip to ./Bio/tmp/
./Bio/tmp/Bio-DB-HTS-2.11 - moving files to ./biodbhts
- making Bio::DB:HTS
icc: command line warning #10121: overriding '-fstack-protector' with '-fstack-protector-strong'
Created MYMETA.yml and MYMETA.json
Creating new 'Build' script for 'Bio-DB-HTS' version '2.11'
Building Bio-DB-HTS
icc -I/lustre/home/rmjdcfi/software/ensembl-vep/htslib -I/usr/lib64/perl5/CORE -DVERSION="2.11" -DXS_VERSION="2.11" -fPIC -D_IOLIB=2 -D_FILE_OFFSET_BITS=64 -Wno-error -Wno-unused-result -c -D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -o lib/Bio/DB/HTS.o lib/Bio/DB/HTS.c
icc: command line warning #10006: ignoring unknown option '-Wno-unused-result'
icc: command line warning #10121: overriding '-fstack-protector' with '-fstack-protector-strong'
lib/Bio/DB/HTS.xs(351): warning #2218: result of call is not used
sam_format1(header, b, &str);
^
lib/Bio/DB/HTS.xs(1205): warning #2218: result of call is not used
bgzf_seek(hfp->fp.bgzf,0,0);
^
lib/Bio/DB/HTS.xs(1770): warning #1786: function "bcf_hdr_fmt_text" (declared at line 439 of "/lustre/home/rmjdcfi/software/ensembl-vep/htslib/htslib/vcf.h") was declared deprecated ("use bcf_hdr_format() instead")
RETVAL = newSVpv(bcf_hdr_fmt_text(header, is_bcf, &len), 0);
^
ExtUtils::Mkbootstrap::Mkbootstrap('blib/arch/auto/Bio/DB/HTS/HTS.bs')
gcc -shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Wl,-z,relro -o blib/arch/auto/Bio/DB/HTS/HTS.so lib/Bio/DB/HTS.o -L/lustre/home/rmjdcfi/software/ensembl-vep/htslib -Wl,-rpath,/lustre/home/rmjdcfi/software/ensembl-vep/htslib -lhts -lpthread -lz
icc -I/lustre/home/rmjdcfi/software/ensembl-vep/htslib -I/usr/lib64/perl5/CORE -DVERSION="2.11" -DXS_VERSION="2.11" -fPIC -D_IOLIB=2 -D_FILE_OFFSET_BITS=64 -Wno-error -Wno-unused-result -c -D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -o lib/Bio/DB/HTS/Faidx.o lib/Bio/DB/HTS/Faidx.c
icc: command line warning #10006: ignoring unknown option '-Wno-unused-result'
icc: command line warning #10121: overriding '-fstack-protector' with '-fstack-protector-strong'
ExtUtils::Mkbootstrap::Mkbootstrap('blib/arch/auto/Bio/DB/HTS/Faidx/Faidx.bs')
gcc -shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Wl,-z,relro -o blib/arch/auto/Bio/DB/HTS/Faidx/Faidx.so lib/Bio/DB/HTS/Faidx.o -L/lustre/home/rmjdcfi/software/ensembl-vep/htslib -Wl,-rpath,/lustre/home/rmjdcfi/software/ensembl-vep/htslib -lhts -lpthread -lz
Downloading required Ensembl API files
- fetching ensembl
- unpacking ./Bio/tmp/ensembl.zip
- moving files
- getting version information
- fetching ensembl-variation
- unpacking ./Bio/tmp/ensembl-variation.zip
- moving files
- getting version information
- fetching ensembl-funcgen
- unpacking ./Bio/tmp/ensembl-funcgen.zip
- moving files
- getting version information
- fetching ensembl-io
- unpacking ./Bio/tmp/ensembl-io.zip
- moving files
- getting version information
Testing VEP installation
- OK!
The VEP can either connect to remote or local databases, or use local cache files.
Using local cache files is the fastest and most efficient way to run the VEP
Cache files will be stored in /home/rmjdcfi/.vep
Do you want to install any cache files (y/n)? n
Skipping cache installation
The VEP can use FASTA files to retrieve sequence data for HGVS notations and reference sequence checks.
FASTA files will be stored in /home/rmjdcfi/.vep
Do you want to install any FASTA files (y/n)? n
Skipping FASTA installation - Exiting
The VEP can use plugins to add functionality and data.
Plugins will be installed in /home/rmjdcfi/.vep/Plugins
Do you want to install any plugins (y/n)? n
Skipping plugin installation
All done
Now all in sync:
$ vep --help
#----------------------------------#
# ENSEMBL VARIANT EFFECT PREDICTOR #
#----------------------------------#
Versions:
ensembl : 105.525fbcb
ensembl-funcgen : 105.660df8f
ensembl-io : 105.2a0a40c
ensembl-variation : 105.ac8178e
ensembl-vep : 105.0
Help: dev@ensembl.org , helpdesk@ensembl.org
Twitter: @ensembl
http://www.ensembl.org/info/docs/tools/vep/script/index.html
Usage:
./vep [--cache|--offline|--database] [arguments]
Basic options
=============
--help Display this message and quit
-i | --input_file Input file
-o | --output_file Output file
--force_overwrite Force overwriting of output file
--species [species] Species to use [default: "human"]
--everything Shortcut switch to turn on commonly used options. See web
documentation for details [default: off]
--fork [num_forks] Use forking to improve script runtime
For full option documentation see:
http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html
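Putting the update steps together, the two installer passes could probably be collapsed into one non-interactive run by asking for both the API and the plugins (-a ap). I have not verified this end to end, so treat it as a sketch: run and update_vep are hypothetical helpers, and with DRY_RUN=1 the commands are only printed.

```shell
# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: release number, $2: cache directory.
# Intended to be called from inside the ensembl-vep clone.
update_vep() {
    run git fetch &&
    run git checkout "release/$1" &&
    run perl INSTALL.pl -a ap --PLUGINS all --CACHEDIR "$2"
}

# Usage (dry run, prints the three commands without executing them):
#   DRY_RUN=1 update_vep 105 /lustre/projects/DTAdb/resources/vep
```

Checking vep --help afterwards should show whether the API versions now match the VEP version.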