VCF File Annotation Using GeneBe API
The GeneBe API can be used to annotate VCF files. There are currently two ways to do so. The recommended way is to use GeneBeClient from the GeneBe CLI project, which requires Java. Alternatively, you can use the Python client pygenebe (from the pygenebe repository) in your own Python scripts; pygenebe works well with pandas.
Regardless of which client you choose, remember to use your API key to avoid exceeding the request limit.
Starting with version 0.1.0, GeneBeClient supports annotations from the GeneBe Hub. You can read more on the GeneBe Hub Quick Start Guide page. With GeneBeClient you can get:
- State-of-the-art (SOTA) automatic ACMG variant classifications
- Up-to-date annotations from gnomAD, ClinVar, REVEL, and many more sources; check the GeneBe Hub to see what is ready to use.
GeneBeClient (genebe-cli)
This is the recommended method for annotating VCF files.
Requirements and Installation
You need Java 21 or higher. To install GeneBeClient, download the most recent .jar file from the GeneBe CLI releases page. Run it like any other .jar file from the command line:
java -jar GeneBeClient.jar
To get help, run:
java -jar GeneBeClient.jar --help
For help on a specific command, run:
java -jar GeneBeClient.jar vcf --help
Running GeneBeClient
GeneBeClient is a classic command-line application: you run it by invoking commands with arguments. By default it uses --output-mode=human; pass --output-mode=json if you prefer JSON output.
Here is an example of running GeneBeClient:
java -jar GeneBeClient-0.2.0-a.27.jar \
vcf annotate \
--input-vcf myfile.vcf.gz \
--output-vcf output.vcf \
--genome hg38 \
--annotations \
"@genebe/gnomad_exomes_depth:0.0.1-4.1.0" \
"@genebe/gnomad3_genomes_depth:0.0.1-3.0.1" \
"@genebe/alpha_missense:0.2.2" \
"@genebe/bayesdel_noaf:0.0.1" \
"@genebe/cadd_hg38:0.0.2-1.7.0" \
"@genebe/cardioboost_arrhythmias_hg38:0.0.1" \
"@genebe/cardioboost_cardiomyopathies_hg38:0.0.1" \
"@genebe/dann_hg38:0.0.1" \
"@genebe/gnomad_exomes4:0.0.4-4.1.0" \
"@genebe/gnomad_genomes4:0.0.4-4.1.0" \
"@genebe/gnomad_mnv_coding_hg38:0.0.2-2.1.1" \
"@genebe/primateai3d:0.0.1" \
"@genebe/promoterai:0.0.3" \
"@genebe/revel:0.0.1" \
"@genebe/spliceai:0.0.1" \
"@genebe/phylop100_hg38:0.0.1" \
--api-key ak-YOUR_API_KEY \
--username YOUR-EMAIL@YOUR-INSTITUTION
You can get your API key at https://genebe.net/profile.
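If you drive GeneBeClient from your own Python scripts, a small wrapper can assemble the same vcf annotate call shown above. This is a sketch that uses only the flags from that example. The default jar path, the GENEBE_API_KEY and GENEBE_USERNAME environment variable names, and the annotation-ID pattern (inferred from the example IDs such as "@genebe/revel:0.0.1") are assumptions, not part of GeneBeClient.

```python
import os
import re
import shlex

# Pattern inferred from the example IDs, e.g. "@genebe/cadd_hg38:0.0.2-1.7.0" (an assumption).
ANNOTATION_ID = re.compile(r"^@[\w.-]+/[\w.-]+:[\w.-]+$")


def build_annotate_command(input_vcf, output_vcf, annotations, genome="hg38",
                           jar="GeneBeClient.jar", api_key=None, username=None):
    """Return the argument list for `java -jar <jar> vcf annotate ...`."""
    for ann in annotations:
        if not ANNOTATION_ID.match(ann):
            raise ValueError(f"unexpected annotation id: {ann!r}")
    # Fall back to hypothetical environment variables so the key is not hard-coded.
    api_key = api_key or os.environ.get("GENEBE_API_KEY")
    username = username or os.environ.get("GENEBE_USERNAME")
    cmd = ["java", "-jar", jar, "vcf", "annotate",
           "--input-vcf", input_vcf,
           "--output-vcf", output_vcf,
           "--genome", genome]
    if annotations:
        cmd += ["--annotations", *annotations]
    # Credentials are only added when both are known.
    if api_key and username:
        cmd += ["--api-key", api_key, "--username", username]
    return cmd


if __name__ == "__main__":
    cmd = build_annotate_command(
        "myfile.vcf.gz", "output.vcf",
        ["@genebe/revel:0.0.1", "@genebe/spliceai:0.0.1"],
    )
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Reading the key from an environment variable instead of typing it on the command line keeps it out of your shell history.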
Features
- Genome recognition: It is recommended that you provide the genome version used to create the VCF. If you don't, the client will try to determine whether the VCF file uses the hg19 or hg38 reference genome.
- Splitting multiallelic sites: If your VCF has multiple ALT entries in a single row, GeneBeClient splits and normalizes them into biallelic sites before annotation. This should work fine, but it is still recommended to convert multiallelic VCF files to biallelic ones before using GeneBeClient, for example with bcftools norm -m -any.
- Automatic liftover: The GeneBe API annotates variants using hg38 databases. If you provide a VCF with hg19 coordinates, each variant is automatically lifted over to hg38 before annotation.
- .netrc file: GeneBeClient supports the .netrc file for automatic login, which is useful if you don't want to provide your API key each time you run a command.
- Multiple output formats: Annotations can be written in several formats, including VCF, .XLSX, .MDB (MS Access), .TSV (tab-separated values), and .parquet. See the command help (--help) for details.
- Support for annotations from the GeneBe Hub: Gives you access to a library of prepared, popular annotations and allows you to create your own. Read more in the GeneBe Hub Quick Start Guide.
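The genome recognition feature does not say how the build is detected. One common heuristic, shown here as a sketch and not necessarily GeneBeClient's method, is to read the ##contig header lines: chromosome 1 is 249,250,621 bp long in hg19 (GRCh37) and 248,956,422 bp long in hg38 (GRCh38). The function names are hypothetical.

```python
import gzip
import re

# Length of chromosome 1 in each build.
CHR1_LENGTHS = {249250621: "hg19", 248956422: "hg38"}


def guess_build(header_lines):
    """Return "hg19", "hg38", or None, based on the chr1 ##contig length."""
    for line in header_lines:
        if line.startswith("#CHROM"):
            break  # end of the meta-information header
        if not line.startswith("##contig=<"):
            continue
        # Match ID=1 or ID=chr1 exactly (not chr10, chr11, ...).
        id_match = re.search(r"[<,]ID=(?:chr)?1[,>]", line)
        len_match = re.search(r"[<,]length=(\d+)[,>]", line)
        if id_match and len_match:
            return CHR1_LENGTHS.get(int(len_match.group(1)))
    return None


def guess_build_from_file(path):
    """Read the header of a plain or gzip/bgzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return guess_build(handle)
```

Files without ##contig lines return None and would need another strategy, such as checking REF alleles against each reference sequence.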
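To illustrate what splitting and normalizing a multiallelic site means for a single VCF data line, here is a simplified, hypothetical helper. It is not GeneBeClient's code and not a replacement for bcftools norm -m -any: it keeps only the first eight VCF columns, does not split per-allele INFO fields or genotypes, and does not left-align indels, which requires the reference sequence.

```python
def trim_alleles(pos, ref, alt):
    """Remove bases shared by REF and ALT, keeping at least one base in each."""
    # Trim the common suffix first.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim the common prefix, shifting POS right for each removed base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return pos, ref, alt


def split_multiallelic(line):
    """Return one tab-separated line per ALT allele (first 8 VCF columns only)."""
    chrom, pos, vid, ref, alts, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    records = []
    for alt in alts.split(","):
        new_pos, new_ref, new_alt = trim_alleles(int(pos), ref, alt)
        records.append("\t".join(
            [chrom, str(new_pos), vid, new_ref, new_alt, qual, flt, info]))
    return records
```

For example, a row with REF=ATG and ALT=A,ATGTG becomes two rows at the same position: ATG>A and A>ATG. For real data, the bcftools command mentioned above is the safer choice.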
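The exact .netrc entry GeneBeClient reads is not spelled out here, so the sketch below only illustrates the general .netrc format and how to read it with Python's standard netrc module. The machine name api.genebe.net, and the mapping of login to your e-mail and password to your API key, are assumptions; check GeneBeClient's --help output for the entry it actually expects.

```python
import netrc
import os
import stat
import tempfile

HOST = "api.genebe.net"  # hypothetical machine name, check the GeneBeClient docs

# One .netrc entry: machine <host> login <username> password <api key>.
EXAMPLE = f"machine {HOST} login YOUR-EMAIL@YOUR-INSTITUTION password ak-YOUR_API_KEY\n"


def read_credentials(path, host=HOST):
    """Return (username, api_key) for `host`, or None if there is no entry."""
    entry = netrc.netrc(path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, ".netrc")
        with open(path, "w") as fh:
            fh.write(EXAMPLE)
        # A .netrc file holds a secret, so keep it readable by the owner only.
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
        print(read_credentials(path))
```

In normal use the file lives at ~/.netrc with permissions 600 (chmod 600 ~/.netrc).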