Variant annotation API

This page describes the variant annotation API in detail. If you want to annotate variants in Python, remember to visit the "pandas" chapter, where the Python library is introduced. If you want to use the API from any other well-known language, consider generating a client from the OpenAPI definition published at https://api.genebe.net/cloud/gb-api-doc/swagger-ui/index.html . The API is also convenient to use directly, without any wrapper.

The examples on this page use curl and, for GET queries, plain browser links.

Example of GET endpoint

The GET endpoint is intended for testing only. If you want to annotate multiple variants, please batch them using the POST endpoint described below. For now, here is the GET endpoint:

curl -X 'GET' \
'https://api.genebe.net/cloud/api-public/v1/variant?chr=7&pos=140753336&ref=A&alt=T&allGenes=False&genome=hg38&useEnsembl=False' \
-H 'accept: */*'

See the results in your browser by opening:

https://api.genebe.net/cloud/api-public/v1/variant?chr=7&pos=140753336&ref=A&alt=T&allGenes=False&genome=hg38&useEnsembl=False

{
"variants": [
{
"chr": "7",
"pos": 140753336,
"ref": "A",
"alt": "T",
"effect": "missense_variant",
"transcript": "NM_001374258.1",
"consequences": [
{
"aa_ref": "V",
"aa_alt": "E",
"canonical": false,
"protein_coding": true,
"consequences": [
"missense_variant"
],
"exon_rank": 16,
"exon_count": 20,
"gene_symbol": "BRAF",
"gene_hgnc_id": 1097,
"hgvs_c": "c.1919T>A",
"hgvs_p": "p.Val640Glu",
"transcript": "NM_001374258.1",
"protein_id": "NP_001361187.1",
"aa_start": 640,
"aa_length": 807,
"cds_start": 1919,
"cds_length": 2424,
"cdna_start": 2145,
"cdna_length": 9807,
"mane_plus": "ENST00000644969.2"
},
[...]
],
"gene_symbol": "BRAF",
"gene_hgnc_id": 1097,
"dbsnp": "rs113488022",
"frequency_reference_population": 0.0000013692834,
"hom_count_reference_population": 0,
"allele_count_reference_population": 2,
"gnomad_exomes_af": 0.000001369279971186188,
"gnomad_genomes_af": null,
"gnomad_exomes_ac": 2,
"gnomad_genomes_ac": null,
"gnomad_exomes_homalt": 0,
"gnomad_genomes_homalt": null,
"gnomad_mito_homoplasmic": null,
"gnomad_mito_heteroplasmic": null,
"computational_score_selected": 29.799999237060547,
"computational_prediction_selected": "Pathogenic",
"computational_source_selected": "Cadd",
"splice_score_selected": 0.0,
"splice_prediction_selected": "Benign",
"splice_source_selected": "max_spliceai",
"revel_score": 0.9309999942779541,
"revel_prediction": "Pathogenic",
"alphamissense_score": 0.9926999807357788,
"alphamissense_prediction": "Pathogenic",
"bayesdelnoaf_score": 0.3400000035762787,
"bayesdelnoaf_prediction": "Pathogenic",
"phylop100way_score": 9.236000061035156,
"phylop100way_prediction": "Pathogenic",
"spliceai_max_score": 0.0,
"spliceai_max_prediction": "Benign",
"dbscsnv_ada_score": null,
"dbscsnv_ada_prediction": null,
"apogee2_score": null,
"apogee2_prediction": null,
"mitotip_score": null,
"mitotip_prediction": null,
"acmg_score": 21,
"acmg_classification": "Pathogenic",
"acmg_criteria": "PS1,PM1,PM2,PM5,PP2,PP3_Moderate,PP5_Very_Strong",
"acmg_by_gene": [

],
"clinvar_disease": "Carcinoma of colon,Papillary thyroid carcinoma,Astrocytoma, low-grade, somatic,Nongerminomatous germ cell tumor,Non-small cell lung carcinoma,not provided,Melanoma,Cardio-facio-cutaneous syndrome,Malignant melanoma of skin,Glioblastoma,Squamous cell carcinoma of the head and neck,Colonic neoplasm,Ovarian neoplasm,Brainstem glioma,Lung adenocarcinoma,Multiple myeloma,Neoplasm of the large intestine,Lung carcinoma,Neoplasm of brain,Papillary renal cell carcinoma, sporadic,Gastrointestinal stromal tumor,Neoplasm,Cystic epithelial invagination containing papillae lined by columnar epithelium,Cerebral arteriovenous malformation,Nephroblastoma,Colorectal cancer,Malignant neoplastic disease,Lymphangioma,Vascular malformation,Cardiovascular phenotype",
"clinvar_classification": "Pathogenic/Likely pathogenic",
"phenotype_combined": null,
"pathogenicity_classification_combined": null,
"custom_annotations": null
}
],
"message": null
}

Important notices:

  • To make the output more readable, some consequences were removed from the listing.
  • You may see consequences_ensembl and consequences_refseq in your response. These fields are deprecated and will be removed soon. Please use the consequences field instead.
  • The request explicitly asks NOT to add Ensembl consequences (useEnsembl=False).
  • A null value indicates no data.
  • acmg_by_gene is populated only if you set allGenes to true in the query.
  • custom_annotations is populated only if customAnnotations is given. customAnnotations is a comma-delimited list of custom annotations. If it is used, new columns are added to the output, straight from our internal database. More documentation on available fields will be added soon.
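
The notices above can be illustrated with a short Python sketch. It is not part of the official client: the helper names selected_consequence and missing_fields are hypothetical, the sample is a trimmed copy of the response shown above, and it assumes the top-level transcript also appears in the consequences array, as it does in that sample.

```python
import json

# Trimmed copy of the response shown above; only a few fields are kept.
SAMPLE_RESPONSE = """
{
  "variants": [
    {
      "chr": "7", "pos": 140753336, "ref": "A", "alt": "T",
      "transcript": "NM_001374258.1",
      "consequences": [
        {"transcript": "NM_001374258.1", "gene_symbol": "BRAF",
         "hgvs_c": "c.1919T>A", "hgvs_p": "p.Val640Glu"}
      ],
      "gnomad_exomes_af": 0.000001369279971186188,
      "gnomad_genomes_af": null,
      "acmg_by_gene": [],
      "custom_annotations": null
    }
  ],
  "message": null
}
"""


def selected_consequence(variant):
    """Return the consequence whose transcript matches the selected one, or None."""
    for csq in variant.get("consequences") or []:
        if csq.get("transcript") == variant.get("transcript"):
            return csq
    return None


def missing_fields(variant):
    """Names of top-level fields that came back as null (None in Python)."""
    return sorted(key for key, value in variant.items() if value is None)


response = json.loads(SAMPLE_RESPONSE)
variant = response["variants"][0]
csq = selected_consequence(variant)
print(csq["hgvs_p"])            # p.Val640Glu
print(missing_fields(variant))  # ['custom_annotations', 'gnomad_genomes_af']
```

Note that acmg_by_gene comes back as an empty list rather than null when allGenes is not set, so it does not show up among the missing fields.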

Input

Variant description

| Name | Default | Description | Required |
|---|---|---|---|
| chr | | Chromosome | Required |
| pos | | Position of the change, as in a VCF file | Required |
| ref | | Reference bases, only [ACGT]+ allowed | Required |
| alt | | Alternate bases, only [ACGT]+ allowed | Required |
| transcript | | Transcript to use for the ACMG score; if not specified, usually the MANE transcript is selected | Optional |
| gene_symbol | | Gene to use for the ACMG score; if not specified, usually the most affected gene is selected | Optional |
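
The constraints in this table can be checked on the client side before a request is sent. The following is a minimal sketch, not the API's own validation: make_variant is a hypothetical helper, and the check on pos (a positive integer) is an assumption based on VCF conventions.

```python
import re

# ref and alt may only contain the bases A, C, G and T (one or more).
_BASES = re.compile(r"[ACGT]+")


def make_variant(chrom, pos, ref, alt, transcript=None, gene_symbol=None):
    """Build a variant dict in the shape the API expects, rejecting bad input early."""
    if not isinstance(pos, int) or pos < 1:
        raise ValueError(f"pos must be a positive integer, got {pos!r}")
    for name, bases in (("ref", ref), ("alt", alt)):
        if not _BASES.fullmatch(bases):
            raise ValueError(f"{name} must match [ACGT]+, got {bases!r}")
    variant = {"chr": str(chrom), "pos": pos, "ref": ref, "alt": alt}
    # The optional fields are only included when given.
    if transcript is not None:
        variant["transcript"] = transcript
    if gene_symbol is not None:
        variant["gene_symbol"] = gene_symbol
    return variant


print(make_variant(7, 140753336, "A", "T"))
# {'chr': '7', 'pos': 140753336, 'ref': 'A', 'alt': 'T'}
```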

Parameters

| Name | Default | Description | Required |
|---|---|---|---|
| genome | hg38 | You can use hg38 or hg19 here. If hg19 is used, your queries will be lifted to hg38 before annotation. | Required |
| useRefseq | true | Use transcripts from Refseq for consequences field. | Optional |
| useEnsembl | true | Use transcripts from Ensembl for consequences field. | Optional |
| omitAcmg | false | Don't add ACMG scores in the output. Set to true if you don't need them. | Optional |
| omitCsq | false | Don't add consequences in the output. | Optional |
| omitBasic | false | Don't add basic annotations (GnomAD frequencies etc) in the output. | Optional |
| omitAdvanced | false | Don't add advanced annotations (ClinVar frequencies etc) in the output. | Optional |
| omitNormalization | false | Don't normalize variants. Use only if you are sure they are normalized already. | Optional |
| allGenes | false | Compute ACMG score for all genes in this region. | Optional |
| customAnnotations | empty | Comma delimited list of custom annotations to be applied. Consult with documentation for recognized values. | Optional |
| annotator | snpeff | Which annotator to use. Please leave empty for now. | Optional |
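
To show how the variant description and these parameters combine into a GET query, here is a minimal Python sketch using only the standard library. The function name build_variant_url is hypothetical; boolean options are rendered as True/False, matching the curl example above.

```python
from urllib.parse import urlencode

# Endpoint taken from the GET example on this page.
GET_URL = "https://api.genebe.net/cloud/api-public/v1/variant"


def build_variant_url(chrom, pos, ref, alt, genome="hg38", **options):
    """Return the GET URL for a single variant; options are passed through as-is."""
    query = {"chr": chrom, "pos": pos, "ref": ref, "alt": alt, "genome": genome}
    query.update(options)  # e.g. allGenes=False, useEnsembl=False
    return GET_URL + "?" + urlencode(query)


print(build_variant_url("7", 140753336, "A", "T", allGenes=False, useEnsembl=False))
```

The parameter order differs from the curl example, which should not matter for a query string.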

Output

| Field | Description |
|---|---|
| chr | Chromosome where the variant is located. If lifting was required, this represents the new location. |
| pos | Position of the variant on the chromosome. If lifting was required, this represents the new location. |
| ref | Reference allele, i.e., the base found in the reference genome. This may differ from your query if lifting was required. |
| alt | Alternate allele, i.e., the base differing from the reference genome. This may differ from your query if lifting was required. |
| effect | Selected effect of the variant (e.g., missense_variant), typically computed for the most relevant transcript, usually the MANE transcript. |
| transcript | Selected transcript ID (e.g., RefSeq or Ensembl). Typically, this is the MANE transcript of the most affected gene. |
| consequences | An array of computed possible consequences. |
| consequences.aa_ref | Reference amino acid before the mutation. |
| consequences.aa_alt | Alternate amino acid after the mutation. |
| consequences.canonical | Indicates whether the transcript is the canonical (main) transcript for the gene (true or false). Not always populated. |
| consequences.protein_coding | Indicates if the transcript is protein-coding (true or false). |
| consequences.consequences | List of predicted biological consequences of the variant on the protein (e.g., missense_variant). Uses Sequence Ontology terms. |
| consequences.exon_rank | The exon number where the variant is located. |
| consequences.exon_count | Total number of exons in the transcript. |
| consequences.gene_symbol | The symbol of the gene where the variant is located (e.g., BRAF). |
| consequences.gene_hgnc_id | HGNC ID for the gene. |
| consequences.hgvs_c | HGVS notation describing the variant at the cDNA level. |
| consequences.hgvs_p | HGVS notation describing the variant at the protein level. |
| consequences.transcript | Transcript ID for this consequence. |
| consequences.protein_id | Protein ID linked to the transcript. |
| consequences.aa_start | Start position of the affected amino acid in the protein sequence. |
| consequences.aa_length | Total length of the protein sequence. |
| consequences.cds_start | Start position of the coding sequence (CDS) affected by the variant. |
| consequences.cds_length | Total length of the coding sequence. |
| consequences.cdna_start | Start position of the variant in the cDNA sequence. |
| consequences.cdna_length | Total length of the cDNA sequence. |
| consequences.mane_plus | MANE Plus Clinical transcript ID (a reference transcript for clinical reporting). |
| gene_symbol | Selected gene symbol where the variant occurs. |
| gene_hgnc_id | Selected HGNC ID for the gene. |
| dbsnp | dbSNP ID for the variant (if present). |
| frequency_reference_population | Aggregated frequency of the variant in various population databases (currently GnomAD Genomes and Exomes). May be null if no reliable data is available (e.g., due to low coverage or filtering). |
| hom_count_reference_population | Total number of homozygous individuals for this variant in population databases (currently GnomAD Genomes and Exomes). |
| allele_count_reference_population | Total allele count for the variant across all individuals in population databases (currently GnomAD Genomes and Exomes). |
| gnomad_exomes_af | Allele frequency in gnomAD exome data. |
| gnomad_genomes_af | Allele frequency in gnomAD genome data (may be null if unavailable). |
| gnomad_exomes_ac | Allele count in gnomAD exome data. |
| gnomad_genomes_ac | Allele count in gnomAD genome data (may be null if unavailable). |
| gnomad_exomes_homalt | Homozygous alternate count in gnomAD exome data. |
| gnomad_genomes_homalt | Homozygous alternate count in gnomAD genome data (may be null if unavailable). |
| gnomad_mito_homoplasmic | Homoplasmic variant count in mitochondrial data from gnomAD (if applicable). |
| gnomad_mito_heteroplasmic | Heteroplasmic variant count in mitochondrial data from gnomAD (if applicable). |
| computational_score_selected | Computational prediction score from the most reliable tool for variant pathogenicity (e.g., CADD, REVEL). |
| computational_prediction_selected | Prediction label based on the computational score (e.g., "Pathogenic", "Benign"). |
| computational_source_selected | Source of the computational prediction (e.g., CADD, REVEL). |
| splice_score_selected | Maximum splice effect prediction score for the variant, predicted by the most reliable tool. |
| splice_prediction_selected | Prediction of whether the variant affects splicing (e.g., "Benign", "Pathogenic"). |
| splice_source_selected | Source of the splicing prediction (e.g., SpliceAI). |
| revel_score | REVEL score for variant pathogenicity prediction. |
| revel_prediction | REVEL prediction label (e.g., "Pathogenic"). |
| alphamissense_score | AlphaMissense score for missense variant pathogenicity. |
| alphamissense_prediction | AlphaMissense prediction label (e.g., "Pathogenic"). |
| bayesdelnoaf_score | BayesDelNoAF score for variant pathogenicity prediction. |
| bayesdelnoaf_prediction | BayesDelNoAF prediction label (e.g., "Pathogenic"). |
| phylop100way_score | PhyloP score for evolutionary conservation at the variant position (higher scores suggest greater conservation). |
| phylop100way_prediction | PhyloP prediction label (e.g., "Pathogenic"). |
| spliceai_max_score | Maximum SpliceAI score for splicing impact prediction. This is the highest value from AL, DL, AG, and DG scores. |
| spliceai_max_prediction | SpliceAI prediction label (e.g., "Benign"). |
| dbscsnv_ada_score | ADA score from dbscSNV for splicing impact prediction (if available). |
| dbscsnv_ada_prediction | ADA prediction label (if available). |
| acmg_score | ACMG (American College of Medical Genetics) score for the variant, automatically evaluated based on GeneBe implementation. |
| acmg_classification | ACMG classification (e.g., "Pathogenic", "Likely Pathogenic"). |
| acmg_criteria | Specific ACMG criteria met by the variant (e.g., PS1, PM1), comma-separated. |
| clinvar_disease | List of diseases associated with the variant in ClinVar. |
| clinvar_classification | ClinVar classification for the variant (e.g., "Pathogenic", "Likely Pathogenic"). |

In addition, the top level of the response contains a message field that may carry an important message. It is usually null.
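
As a rough illustration of reading these fields, the sketch below pulls the ACMG classification out of a decoded response, splits the comma-separated acmg_criteria string, and returns the top-level message alongside. The function name acmg_summary and the chr-pos-ref-alt key format are illustrative choices, not part of the API.

```python
def acmg_summary(response):
    """Return (message, rows) with one row per annotated variant."""
    rows = []
    for v in response.get("variants") or []:
        raw = v.get("acmg_criteria")  # e.g. "PS1,PM1,PM2" or None
        rows.append({
            "variant": f'{v["chr"]}-{v["pos"]}-{v["ref"]}-{v["alt"]}',
            "classification": v.get("acmg_classification"),
            "criteria": raw.split(",") if raw else [],
        })
    return response.get("message"), rows


# A few fields copied from the sample response on this page.
sample = {
    "variants": [{
        "chr": "7", "pos": 140753336, "ref": "A", "alt": "T",
        "acmg_score": 21,
        "acmg_classification": "Pathogenic",
        "acmg_criteria": "PS1,PM1,PM2,PM5,PP2,PP3_Moderate,PP5_Very_Strong",
    }],
    "message": None,
}

message, rows = acmg_summary(sample)
print(message)              # None
print(rows[0]["criteria"])  # ['PS1', 'PM1', 'PM2', 'PM5', 'PP2', 'PP3_Moderate', 'PP5_Very_Strong']
```

The same split is not safe for clinvar_disease: some disease names in the sample output contain commas themselves (for example "Astrocytoma, low-grade, somatic").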

Example of POST endpoint

It is very similar to the GET endpoint, but lets you annotate multiple entries at once. You can send up to 1,000 variants in one request, but it is usually better to send them in smaller chunks, so that a more computationally intensive request does not time out. Try, for example, batches of 500 variants. For the parameters and the description of the output, please read the GET documentation above.
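
Splitting a long variant list into batches could look like the following minimal sketch. The helper name chunked is hypothetical, the default of 500 follows the suggestion above, and the positions in the usage example are placeholders rather than real variants.

```python
MAX_BATCH = 1000  # upper limit per request stated above


def chunked(variants, size=500):
    """Yield consecutive slices of at most `size` variants."""
    if not 1 <= size <= MAX_BATCH:
        raise ValueError(f"size must be between 1 and {MAX_BATCH}")
    for start in range(0, len(variants), size):
        yield variants[start:start + size]


# Placeholder input: 1,200 made-up variants, just to show the batch sizes.
variants = [{"chr": "7", "pos": i + 1, "ref": "A", "alt": "T"} for i in range(1200)]
print([len(batch) for batch in chunked(variants)])  # [500, 500, 200]
```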

The body of the POST request is a JSON list of variants:

[
{
"chr": "string",
"pos": 0,
"ref": "string",
"alt": "string",
"transcript": "string",
"gene_symbol": "string"
}
]

where transcript and gene_symbol are optional (and rarely used). Take a look at the table in the GET documentation for more information.

To continue the BRAF V600E example from the GET documentation above, let's create a body and send it to the API with curl:


curl -X 'POST' \
'https://api.genebe.net/cloud/api-public/v1/variants?useRefseq=True&useEnsembl=True&omitAcmg=False&omitCsq=False&omitBasic=False&omitAdvanced=False&omitNormalization=False&allGenes=False&genome=hg38' \
-H 'accept: */*' \
-H 'Content-Type: application/json' \
-d '[
{
"chr": "7",
"pos": 140753336,
"ref": "A",
"alt": "T"
}
]'

And again we get:


{
"variants": [
{
"chr": "7",
"pos": 140753336,
"ref": "A",
"alt": "T",
"effect": "missense_variant",
"transcript": "NM_001374258.1",
"consequences": [
{
"aa_ref": "V",
"aa_alt": "E",
"canonical": false,
"protein_coding": true,
"consequences": [
"missense_variant"
],
"exon_rank": 16,
"exon_count": 20,
"gene_symbol": "BRAF",
"gene_hgnc_id": 1097,
"hgvs_c": "c.1919T>A",
"hgvs_p": "p.Val640Glu",
"transcript": "NM_001374258.1",
"protein_id": "NP_001361187.1",
"aa_start": 640,
"aa_length": 807,
"cds_start": 1919,
"cds_length": 2424,
"cdna_start": 2145,
"cdna_length": 9807,
"mane_plus": "ENST00000644969.2"
},
...
],
"gene_symbol": "BRAF",
"gene_hgnc_id": null,
"dbsnp": "113488022",
"frequency_reference_population": 0.0000013692834,
"hom_count_reference_population": 0,
"allele_count_reference_population": 2,
"gnomad_exomes_af": 0.000001369279971186188,
"gnomad_genomes_af": null,
"gnomad_exomes_ac": 2,
"gnomad_genomes_ac": null,
"gnomad_exomes_homalt": 0,
"gnomad_genomes_homalt": null,
"gnomad_mito_homoplasmic": null,
"gnomad_mito_heteroplasmic": null,
"computational_prediction_selected": "Pathogenic",
"splice_prediction_selected": "Benign",
"revel_score": 0.9309999942779541,
"revel_prediction": "Pathogenic",
"alphamissense_score": 0.9926999807357788,
"alphamissense_prediction": "Pathogenic",
"bayesdelnoaf_score": 0.3400000035762787,
"bayesdelnoaf_prediction": "Pathogenic",
"phylop100way_score": 9.236000061035156,
"phylop100way_prediction": "Pathogenic",
"spliceai_max_score": 0,
"spliceai_max_prediction": "Benign",
"dbscsnv_ada_score": null,
"dbscsnv_ada_prediction": null,
"apogee2_score": null,
"apogee2_prediction": null,
"mitotip_score": null,
"mitotip_prediction": null,
"acmg_score": 21,
"acmg_classification": "Pathogenic",
"acmg_criteria": "PS1,PM1,PM2,PM5,PP2,PP3_Moderate,PP5_Very_Strong",
"acmg_by_gene": [],
"clinvar_disease": "Carcinoma of colon,Papillary thyroid carcinoma,Astrocytoma, low-grade, somatic,Nongerminomatous germ cell tumor,Non-small cell lung carcinoma,not provided,Melanoma,Cardio-facio-cutaneous syndrome,Malignant melanoma of skin,Glioblastoma,Squamous cell carcinoma of the head and neck,Colonic neoplasm,Ovarian neoplasm,Brainstem glioma,Lung adenocarcinoma,Multiple myeloma,Neoplasm of the large intestine,Lung carcinoma,Neoplasm of brain,Papillary renal cell carcinoma, sporadic,Gastrointestinal stromal tumor,Neoplasm,Cystic epithelial invagination containing papillae lined by columnar epithelium,Cerebral arteriovenous malformation,Nephroblastoma,Colorectal cancer,Malignant neoplastic disease,Lymphangioma,Vascular malformation,Cardiovascular phenotype",
"clinvar_classification": "Pathogenic/Likely pathogenic",
"phenotype_combined": null,
"pathogenicity_classification_combined": null,
"custom_annotations": null
}
],
"message": null
}
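
For readers who prefer Python to curl, the POST call above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the official client: build_post_request and annotate are hypothetical names, the 60-second timeout is an arbitrary choice, and parameters that are not passed are left to their server-side defaults.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the POST curl example on this page.
POST_URL = "https://api.genebe.net/cloud/api-public/v1/variants"


def build_post_request(variants, genome="hg38", **options):
    """Prepare (but do not send) the POST request for a list of variant dicts."""
    query = urlencode({"genome": genome, **options})
    return Request(
        POST_URL + "?" + query,
        data=json.dumps(variants).encode("utf-8"),
        headers={"accept": "*/*", "Content-Type": "application/json"},
        method="POST",
    )


def annotate(variants, **options):
    """Send the request and return the list under the "variants" key (network call)."""
    with urlopen(build_post_request(variants, **options), timeout=60) as resp:
        return json.load(resp)["variants"]


request = build_post_request([{"chr": "7", "pos": 140753336, "ref": "A", "alt": "T"}])
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it possible to inspect the URL and body without touching the network; annotate is the only function that performs a call.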