Usage Guide for pygenebe

pygenebe is a Python client designed to integrate with the GeneBe platform, offering efficient annotation of genetic variants via its API. It supports pandas DataFrames, VCF files, HGVS parsing, and more, making it a versatile tool for genetic research. Below are detailed usage instructions and examples.

Basic Setup

After installing pygenebe (see the installation guide), import the library to start annotating genetic variants.

import pygenebe

Annotating Variants

Annotating a Single Variant

To annotate a single genetic variant, use the annotate_variant function. Provide the chromosome, position, reference allele, and alternate allele:

result = pygenebe.annotate_variant(chr="6", pos=160585140, ref="T", alt="G")
print(result)

This returns a JSON-like response with annotations such as gene function, population frequencies, and pathogenicity scores.

Example with Additional Options

You can specify the genome build (e.g., GRCh38 or GRCh37) and request specific fields:

result = pygenebe.annotate_variant(
    chr="6",
    pos=160585140,
    ref="T",
    alt="G",
    genome="GRCh38",
    fields=["variant_id", "gene", "consequence"]
)
print(result)

Annotating Variants in a Pandas DataFrame

For batch processing, pygenebe supports pandas DataFrames. Create a DataFrame with columns for chromosome, position, reference, and alternate alleles:

import pandas as pd

data = pd.DataFrame({
    "chr": ["1", "6"],
    "pos": [10020, 160585140],
    "ref": ["A", "T"],
    "alt": ["G", "G"]
})

annotated = pygenebe.annotate_dataframe(data)
print(annotated)

Customizing DataFrame Annotation

You can customize the annotation by specifying the genome build and desired fields:

annotated = pygenebe.annotate_dataframe(
    df=data,
    genome="GRCh37",
    fields=["variant_id", "gnomad_af", "pathogenicity"]
)
print(annotated)

The result is a DataFrame with additional annotation columns appended.
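When the variants live in a delimited text file rather than in Python lists, they first have to be reshaped into the four columns described above. The following is a minimal sketch using only the standard library; the helper name read_variant_columns and the sample text are illustrative assumptions and not part of pygenebe. The resulting dict has the same shape as the one passed to pd.DataFrame earlier in this section.

```python
import csv
import io


def read_variant_columns(handle):
    """Read chr,pos,ref,alt rows from CSV text and return a dict of columns.

    Hypothetical helper for illustration; not part of pygenebe.
    """
    columns = {"chr": [], "pos": [], "ref": [], "alt": []}
    for row in csv.DictReader(handle):
        columns["chr"].append(row["chr"].strip())
        columns["pos"].append(int(row["pos"]))  # positions are integers
        columns["ref"].append(row["ref"].strip().upper())
        columns["alt"].append(row["alt"].strip().upper())
    return columns


# Sample input mirroring the two variants used in the DataFrame example.
sample = io.StringIO("chr,pos,ref,alt\n1,10020,A,G\n6,160585140,T,G\n")
columns = read_variant_columns(sample)
print(columns)
```

For a real file, open it with open(path, newline="") and pass the handle to the same function.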

Annotating Variants from HGVS Notation

pygenebe can parse HGVS (Human Genome Variation Society) notation for variant annotation:

result = pygenebe.annotate_hgvs("NM_000546.5:c.215C>G")
print(result)

This returns annotations for the specified HGVS variant, such as its genomic coordinates and effects.
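To make the structure of such an HGVS string concrete, the sketch below splits a simple coding substitution into its parts with a regular expression. It is only an illustration: the pattern and the function name split_hgvs_substitution are assumptions, they cover single-base substitutions only, and they say nothing about how pygenebe or the GeneBe backend parse HGVS internally.

```python
import re

# Matches only simple coding substitutions such as NM_000546.5:c.215C>G.
# Deletions, insertions, protein-level notation, etc. are NOT covered.
HGVS_SUBSTITUTION = re.compile(
    r"^(?P<transcript>[A-Z]{2}_\d+\.\d+):c\.(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def split_hgvs_substitution(hgvs):
    """Return the parts of a simple c. substitution, or None if it does not match."""
    match = HGVS_SUBSTITUTION.match(hgvs.strip())
    if match is None:
        return None
    parts = match.groupdict()
    parts["position"] = int(parts["position"])
    return parts


print(split_hgvs_substitution("NM_000546.5:c.215C>G"))
```

A quick local check like this can catch obviously malformed strings before they are sent to the API, but the backend remains the authority on whether an expression is valid.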

Annotating a VCF File (CLI)

To annotate variants from a VCF file, use the command-line interface (CLI). The input VCF must be single-allelic (split multi-allelic entries with bcftools if needed):

genebe annotate --input input.vcf.gz --output output.vcf.gz

The output VCF includes additional annotation fields. VCF annotation requires the cyvcf2 package (pip install cyvcf2).
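Before annotating, it can be useful to confirm that a VCF really is single-allelic. The sketch below scans VCF text and reports records whose ALT column lists more than one allele. The function name find_multiallelic and the sample records are assumptions for illustration; it only detects such records and does not split them, which is the job of bcftools.

```python
import io


def find_multiallelic(handle):
    """Return (chrom, pos, alt) for records whose ALT column holds several alleles."""
    hits = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        if "," in alt:  # a comma in ALT separates multiple alternate alleles
            hits.append((chrom, pos, alt))
    return hits


# Small in-memory VCF used only for demonstration.
sample_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "7\t92754574\t.\tA\tC,G,T\t.\t.\t.\n"
    "6\t160585140\t.\tT\tG\t.\t.\t.\n"
)
multiallelic = find_multiallelic(sample_vcf)
print(multiallelic)
```

For a compressed file such as input.vcf.gz, gzip.open(path, "rt") from the standard library yields a text handle that can be passed to the same function.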

VCF Example with Specific Fields

Request only specific annotation fields:

genebe annotate --input input.vcf.gz --output output.vcf.gz --fields variant_id,gene,consequence

Handling Large Datasets with API Key

For large datasets (e.g., over 10,000 variants), request limits may apply. Create a GeneBe account, generate an API key, and use it to increase your limit.
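One common way to stay within request limits is to send variants in batches rather than all at once. The sketch below shows the batching step only; the helper name batched, the batch size of 10, and the placeholder variants are assumptions chosen for illustration, and the appropriate batch size depends on the limits of your GeneBe account.

```python
def batched(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder variants (chr, pos, ref, alt); the positions are synthetic.
variants = [("1", 10020 + i, "A", "G") for i in range(25)]

batch_sizes = [len(batch) for batch in batched(variants, 10)]
print(batch_sizes)  # prints [10, 10, 5]
```

Each batch could then be turned into a DataFrame and annotated separately, with the results concatenated afterwards.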

CLI with API Key

genebe annotate --input input.vcf.gz --output output.vcf.gz --username your_username --api-key your_api_key

Python with API Key

Set credentials before making requests:

pygenebe.set_credentials(username="your_username", api_key="your_api_key")
result = pygenebe.annotate_variant(chr="6", pos=160585140, ref="T", alt="G")
print(result)
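To avoid hard-coding credentials in scripts, they can be read from environment variables. This is a minimal sketch: the variable names GENEBE_USERNAME and GENEBE_API_KEY and the helper load_credentials are assumptions made for this example, not names defined by pygenebe.

```python
import os


def load_credentials(env=os.environ):
    """Return a dict with username and api_key, or None if either is missing.

    The environment variable names are hypothetical, chosen for this example.
    """
    username = env.get("GENEBE_USERNAME")
    api_key = env.get("GENEBE_API_KEY")
    if not username or not api_key:
        return None
    return {"username": username, "api_key": api_key}


# Demonstration with an explicit mapping instead of the real environment.
creds = load_credentials(
    {"GENEBE_USERNAME": "your_username", "GENEBE_API_KEY": "your_api_key"}
)
print(creds is not None)  # prints True
```

The returned dict uses the same keyword names as the set_credentials call shown above, so it could be passed as set_credentials(**creds) when it is not None.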

To check your account limits, run the following from the command line:

genebe account

Using Docker

For a pre-configured environment, use the Docker image:

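docker run -v "$(pwd)/input.vcf:/tmp/input.vcf" --rm genebe/pygenebe:0.0.14 genebe annotate --input /tmp/input.vcf --output /dev/stdout

The -v option mounts your VCF file into the container; Docker needs an absolute host path here, otherwise it treats the name as a named volume. The annotated output is written to standard output.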
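When the Docker command is launched from a Python script, the host path can be resolved to an absolute path programmatically, which Docker bind mounts require. The sketch below only builds the argument list; the function name docker_annotate_command is an assumption for illustration, and the image tag and genebe options are copied from the command above.

```python
import shlex
from pathlib import Path


def docker_annotate_command(vcf_path, image="genebe/pygenebe:0.0.14"):
    """Build a `docker run` argument list that mounts one VCF file."""
    host_path = Path(vcf_path).resolve()  # bind mounts need an absolute path
    container_path = "/tmp/" + host_path.name
    return [
        "docker", "run", "--rm",
        "-v", f"{host_path}:{container_path}",
        image,
        "genebe", "annotate",
        "--input", container_path,
        "--output", "/dev/stdout",
    ]


cmd = docker_annotate_command("input.vcf")
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) to execute it; building a list rather than a shell string avoids quoting problems with unusual file names.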

Additional Usage Examples

Exploring Variant Details

To explore detailed annotations for a variant:

result = pygenebe.annotate_variant(chr="17", pos=7674220, ref="G", alt="A")
print(result["gene"])  # Access specific annotation fields
print(result["gnomad_af"])  # Population allele frequency

Parsing Variants

The parse_variants function translates variants expressed in several forms, in particular:

as HGVS notation (e.g. NM_000546.5:c.215C>G)
as a dbSNP ID (e.g. rs11)
as a gene with an amino acid change (e.g. AGT M259T)

The examples below assume the library has been imported under the alias gnb:

res = gnb.parse_variants(
    ["NM_000546.5:c.215C>G", "AGT M259T", "rs10"],
    multiple=True,
)
print(res)

If multiple is True, parse_variants returns all possible values for each variant (the result is a List[List[str]]). Otherwise, it returns one value per variant:

res = gnb.parse_variants(
    ["NM_000546.5:c.215C>G", "AGT M259T", "rs10"],
    multiple=False,
)
print(res)

To use a pandas DataFrame as both input and output:

df = pd.DataFrame({"variant": ["rs10", "rs11", "rs12"]})
res = gnb.parse_variants_df(
    df,
    multiple=True,
)
print(res)

Example output:

  variant                                   parsed_variants
0    rs10  [7-92754574-A-C, 7-92754574-A-G, 7-92754574-A-T]
1    rs11                                  [7-11324574-C-T]
2    rs12                                  [7-11297537-A-C]

When parsing many variants from an external source, some of them may be invalid, in which case the backend raises an error. To skip invalid entries and continue, pass ignore_errors=True. In the example below, DHFR:p.N51I is an invalid HGVS expression (the reference amino acid is incorrect).

df = pd.DataFrame({"variant": ["DHFR:p.N51I", "rs10"]})

# endpoint_url points to a local server in this example
res = gnb.parse_variants_df(
    df,
    endpoint_url="http://localhost:7180/cloud/api-public/v1/convert",
    multiple=True,
    ignore_errors=True,
)
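The general idea behind an ignore-errors switch can be illustrated client-side: failures are either re-raised or replaced by a placeholder so the remaining items are still processed. The sketch below is an assumption-laden illustration, not pygenebe's implementation: parse_all and toy_parser are hypothetical, and using None as the placeholder for a failed item is a choice made for this example only.

```python
def parse_all(variants, parse_one, ignore_errors=False):
    """Apply parse_one to each variant; optionally replace failures with None."""
    results = []
    for variant in variants:
        try:
            results.append(parse_one(variant))
        except ValueError:
            if not ignore_errors:
                raise  # default behaviour: the first invalid item stops the run
            results.append(None)  # placeholder keeps results aligned with input
    return results


def toy_parser(text):
    """Stand-in parser for the demo: accepts only dbSNP-style identifiers."""
    if text.startswith("rs") and text[2:].isdigit():
        return text
    raise ValueError(f"cannot parse {text!r}")


print(parse_all(["DHFR:p.N51I", "rs10"], toy_parser, ignore_errors=True))
# prints [None, 'rs10']
```

Keeping one result per input, even for failures, makes it straightforward to see afterwards which entries were rejected.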

Error Handling

If a variant is invalid, pygenebe raises an exception:

try:
    result = pygenebe.annotate_variant(chr="X", pos=-1, ref="A", alt="T")
except ValueError as e:
    print(f"Error: {e}")
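Obvious input mistakes can also be caught locally before any request is made. The sketch below performs a few basic checks on the four variant components; the function name check_variant and the specific rules (positive integer position, alleles made of A, C, G, T or N) are assumptions for illustration and are not the validation pygenebe or the GeneBe backend performs.

```python
import re

# Plain nucleotide alleles only; symbolic alleles are out of scope for this sketch.
VALID_ALLELE = re.compile(r"^[ACGTN]+$")


def check_variant(chrom, pos, ref, alt):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if not str(chrom).strip():
        problems.append("chromosome is empty")
    if not isinstance(pos, int) or pos < 1:
        problems.append("position must be a positive integer")
    for name, allele in (("ref", ref), ("alt", alt)):
        if not VALID_ALLELE.match(str(allele).upper()):
            problems.append(f"{name} allele must contain only A, C, G, T or N")
    return problems


print(check_variant("X", -1, "A", "T"))
# prints ['position must be a positive integer']
print(check_variant("6", 160585140, "T", "G"))
# prints []
```

Returning a list of problems rather than raising on the first one lets a batch job report every issue in a single pass.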

Notes and Limitations

VCF Format: Ensure VCF files are single-allelic. Use bcftools norm -m - to split multi-allelic variants.
Request Limits: Free usage has limits to prevent server overload. API key holders get higher limits (tens of thousands of requests daily). Contact GeneBe support for custom limits.
Documentation: For more details, see the official documentation and GitHub examples.

Troubleshooting

Installation Issues: Ensure cyvcf2 is installed for VCF support (pip install cyvcf2).
API Errors: Verify your API key and network connection. Check GitHub issues for help or to report problems.

pygenebe simplifies genetic variant annotation with flexible options for researchers and bioinformaticians. Dive into its features and enhance your genetic analysis workflows!
