
VCF File Annotation Using GeneBe API

The GeneBe API can be used to annotate a VCF file. There are currently two ways to do so. The recommended way is GeneBeClient from the GeneBe CLI project, which requires Java. The alternative is the Python client pygenebe (from the pygenebe repository) combined with your own Python scripts; pygenebe works well with pandas.

Regardless of which client you choose, remember to use your API key to avoid exceeding the request limit.

Starting from version 0.1.0, GeneBeClient supports annotations from the GeneBe Hub; you can read more on the GeneBe Hub Quick Start Guide page. With GeneBeClient you can therefore get:

  • State-of-the-art (SOTA) automatic ACMG variant classifications
  • Up-to-date annotations from gnomAD, ClinVar, REVEL and many other sources; check the GeneBe Hub to see what is ready to use.

GeneBeClient (genebe-cli)

This is the recommended method for annotating VCF files.

Requirements and Installation

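GeneBeClient requires Java 21 or higher. To install it, download the most recent .jar file from the GeneBe CLI releases page. The file name includes the version number (for example, GeneBeClient-0.2.0-a.27.jar), so substitute the name of the file you downloaded in the commands on this page. Run it like any other jar file from the command line: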

java -jar GeneBeClient.jar

To get help, run:

java -jar GeneBeClient.jar --help

For help on a specific command, run:

java -jar GeneBeClient.jar vcf --help
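Because GeneBeClient needs Java 21 or higher, it can help to check the installed version before the first run. The following is a minimal sketch, not part of GeneBeClient: the helper names `java_major_version` and `installed_java_major` are hypothetical. It relies on the fact that `java -version` prints to stderr, and it handles both the legacy "1.8.0_…" numbering (major version 8) and the modern "21.0.2" numbering.

```python
import re
import subprocess


def java_major_version(version_text: str) -> int:
    """Extract the Java major version from `java -version` output."""
    match = re.search(r'version "([^"]+)"', version_text)
    if match is None:
        raise ValueError("could not find a version string")
    parts = match.group(1).split(".")
    # Legacy releases report "1.x.y"; the major version is the second field.
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])
    # Modern releases report e.g. "21.0.2", "17" or "21-ea"; keep the leading digits.
    leading = re.match(r"\d+", parts[0])
    if leading is None:
        raise ValueError(f"unexpected version string: {match.group(1)}")
    return int(leading.group(0))


def installed_java_major() -> int:
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    return java_major_version(result.stderr)
```

For example, `installed_java_major() >= 21` tells you whether the local Java installation is recent enough for GeneBeClient.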

Running GeneBeClient

GeneBeClient is a classic command-line application: you run it by invoking a command with arguments. By default it uses --output-mode=human; pass --output-mode=json if you prefer JSON output.

Here is an example of running GeneBeClient:

java -jar GeneBeClient-0.2.0-a.27.jar \
vcf annotate \
--input-vcf myfile.vcf.gz \
--output-vcf output.vcf \
--genome hg38 \
--annotations \
"@genebe/gnomad_exomes_depth:0.0.1-4.1.0" \
"@genebe/gnomad3_genomes_depth:0.0.1-3.0.1" \
"@genebe/alpha_missense:0.2.2" \
"@genebe/bayesdel_noaf:0.0.1" \
"@genebe/cadd_hg38:0.0.2-1.7.0" \
"@genebe/cardioboost_arrhythmias_hg38:0.0.1" \
"@genebe/cardioboost_cardiomyopathies_hg38:0.0.1" \
"@genebe/dann_hg38:0.0.1" \
"@genebe/gnomad_exomes4:0.0.4-4.1.0" \
"@genebe/gnomad_genomes4:0.0.4-4.1.0" \
"@genebe/gnomad_mnv_coding_hg38:0.0.2-2.1.1" \
"@genebe/primateai3d:0.0.1" \
"@genebe/promoterai:0.0.3" \
"@genebe/revel:0.0.1" \
"@genebe/spliceai:0.0.1" \
"@genebe/phylop100_hg38:0.0.1" \
--api-key ak-YOUR_API_KEY \
--username YOUR-EMAIL@YOUR-INSTITUTION

You can get your API key at https://genebe.net/profile.
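If you drive GeneBeClient from a Python pipeline, a small wrapper can assemble the same command as the shell example above and run it with `subprocess`. This is a minimal sketch under stated assumptions: the function names and the `GENEBE_API_KEY` environment variable are hypothetical choices for illustration, and only the options that appear in the shell example are used.

```python
import os
import subprocess
from typing import List, Sequence


def build_annotate_command(
    jar: str,
    input_vcf: str,
    output_vcf: str,
    genome: str,
    annotations: Sequence[str],
    api_key: str,
    username: str,
) -> List[str]:
    """Assemble the argument list for `vcf annotate`, using only the
    options that appear in the shell example."""
    cmd = [
        "java", "-jar", jar,
        "vcf", "annotate",
        "--input-vcf", input_vcf,
        "--output-vcf", output_vcf,
        "--genome", genome,
    ]
    if annotations:
        # Each GeneBe Hub annotation id becomes its own argument after
        # --annotations, as in the multi-line shell example.
        cmd += ["--annotations", *annotations]
    cmd += ["--api-key", api_key, "--username", username]
    return cmd


def run_annotation(jar, input_vcf, output_vcf, genome, annotations, username):
    # GENEBE_API_KEY is a hypothetical variable name chosen for this sketch;
    # GeneBeClient itself is not assumed to read it.
    api_key = os.environ["GENEBE_API_KEY"]
    cmd = build_annotate_command(
        jar, input_vcf, output_vcf, genome, annotations, api_key, username
    )
    # check=True raises CalledProcessError if GeneBeClient exits non-zero.
    subprocess.run(cmd, check=True)
```

Passing the command as a list hands each annotation id such as "@genebe/revel:0.0.1" to Java unchanged, so the shell quoting used in the example is not needed.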

Features

  • Genome recognition: It is recommended that you specify the genome version used to create the VCF with --genome. If you don't, the client will try to detect whether the VCF file uses hg19 or hg38 as its reference genome.
  • Splitting multiallelic sites: If a row in your VCF has multiple ALT alleles, GeneBeClient splits and normalizes it into biallelic sites before annotation, which should work fine. Still, it is recommended to convert multiallelic VCF files to biallelic ones before running GeneBeClient, for example with bcftools norm -m -any.
  • Automatic liftover: The GeneBe API annotates variants using hg38 databases. If you provide a VCF with hg19 coordinates, each variant is automatically lifted over to hg38 before annotation.
  • .netrc file: GeneBeClient supports the .netrc file for automatic login, so you don't have to provide your API key every time you run a command.
  • Multiple output formats: Annotations can be written as VCF, .XLSX, .MDB (MS Access), .TSV (tab-separated values) or .parquet. See the command-line help for details.
  • Support for GeneBe Hub annotations: Gives you access to a library of prepared, popular annotations and lets you create your own. Read more in the GeneBe Hub Quick Start Guide.
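How GeneBeClient recognizes the genome is not described here. As one illustration of how hg19 and hg38 can be told apart, the sketch below reads the length of chromosome 1 declared in the VCF header: 249,250,621 bp in GRCh37/hg19 versus 248,956,422 bp in GRCh38/hg38. It is a hypothetical heuristic (the name `guess_genome` is invented for this example), not GeneBeClient's algorithm. It assumes the header carries a ##contig line for "1" or "chr1" without commas inside quoted values, and it returns None otherwise; in that case, pass --genome explicitly.

```python
import re
from typing import Iterable, Optional

# Length of chromosome 1 in each assembly.
CHR1_LENGTHS = {249250621: "hg19", 248956422: "hg38"}

CONTIG_RE = re.compile(r"^##contig=<(.*)>$")


def guess_genome(header_lines: Iterable[str]) -> Optional[str]:
    """Guess hg19 vs hg38 from the ##contig length of chromosome 1.

    Returns None if no chr1 contig line is found or its length is unknown.
    """
    for raw in header_lines:
        line = raw.strip()
        # The #CHROM line ends the header; stop before reading variant rows.
        if line.startswith("#CHROM"):
            break
        match = CONTIG_RE.match(line)
        if match is None:
            continue
        # Turn "ID=chr1,length=248956422,assembly=..." into a dict.
        fields = dict(
            item.split("=", 1) for item in match.group(1).split(",") if "=" in item
        )
        if fields.get("ID") in ("1", "chr1") and "length" in fields:
            return CHR1_LENGTHS.get(int(fields["length"]))
    return None
```

For a compressed file, the header lines can be streamed with `gzip.open(path, "rt")` and passed directly to `guess_genome`.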