Gene

LaminDB provides access to public gene ontologies through bionty.

Here we show how to access and search gene ontologies.

# pip install lamindb
!lamin init --storage ./test-public-ontologies --modules bionty
 initialized lamindb: testuser1/test-public-ontologies
import bionty as bt
import pandas as pd
 connected lamindb: testuser1/test-public-ontologies

PublicOntology objects

Let us create a PublicOntology object with public(), which links a default public ontology source from Source:

public = bt.Gene.public(organism="human")
public
PublicOntology
Entity: Gene
Organism: human
Source: ensembl, release-114
#terms: 91673

Just like you can with registries, you can export the PublicOntology object as a DataFrame:

df = public.to_dataframe()
df.head()
ensembl_gene_id symbol ncbi_gene_id biotype description synonyms
0 ENSG00000000003 TSPAN6 7105 protein_coding tetraspanin 6 T245|TSPAN-6|TM4SF6
1 ENSG00000000005 TNMD 64102 protein_coding tenomodulin TEM|MYODULIN|TENDIN|CHM1L|BRICD4
2 ENSG00000000419 DPM1 8813 protein_coding dolichyl-phosphate mannosyltransferase subunit... CDGIE|MPDS
3 ENSG00000000457 SCYL3 57147 protein_coding SCY1 like pseudokinase 3 PACE-1|PACE1
4 ENSG00000000460 FIRRM 55732 protein_coding FIGNL1 interacting regulator of recombination ... MEICA1|APOLO1|FLIP|FLJ10706|C1ORF112

Unlike registries, you can also export it as a Pronto object via public.to_pronto().

Look up terms

As for registries, terms can be looked up with auto-complete:

lookup = public.lookup()

The . accessor provides normalized terms (lower case, only contains alphanumeric characters and underscores):

lookup.tcf7
Gene(ensembl_gene_id='ENSG00000081059', symbol='TCF7', ncbi_gene_id='6932', biotype='protein_coding', description='transcription factor 7 ', synonyms='TCF-1')

To look up the exact original strings, convert the lookup object to dict and use the [] accessor:

lookup_dict = lookup.dict()
lookup_dict["TCF7"]
Gene(ensembl_gene_id='ENSG00000081059', symbol='TCF7', ncbi_gene_id='6932', biotype='protein_coding', description='transcription factor 7 ', synonyms='TCF-1')

By default, the symbol field is used to generate lookup keys for genes. You can specify another field to look up:

lookup = public.lookup(public.ncbi_gene_id)

If multiple entries are matched, they are returned as a list:

lookup.bt_100126572
Gene(ensembl_gene_id='ENSG00000203733', symbol='GJE1', ncbi_gene_id='100126572', biotype='protein_coding', description='gap junction protein epsilon 1 ', synonyms='CX23')
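
The key rules described above (lower case, only alphanumeric characters and underscores) and the bt_ prefix seen on the numeric NCBI ID can be sketched in plain Python. This is a minimal illustration, not bionty's implementation: to_lookup_key and build_lookup are hypothetical helpers, and the prefix rule for keys that do not start with a letter is an assumption inferred from the lookup.bt_100126572 example.

```python
import re
from collections import defaultdict


def to_lookup_key(term: str, prefix: str = "bt") -> str:
    """Normalize a term into an attribute-style key (hypothetical helper)."""
    # Replace every run of characters other than letters, digits, or underscores.
    key = re.sub(r"[^0-9a-zA-Z_]+", "_", term.strip()).lower()
    # Assumption: keys that do not start with a letter get a prefix,
    # because Python attribute names cannot start with a digit.
    if not key or not key[0].isalpha():
        key = f"{prefix}_{key}"
    return key


def build_lookup(records: list, field: str) -> dict:
    """Group records by normalized key; several matches become a list."""
    grouped = defaultdict(list)
    for record in records:
        grouped[to_lookup_key(record[field])].append(record)
    return {k: v[0] if len(v) == 1 else v for k, v in grouped.items()}


# Two records taken from the outputs shown above.
records = [
    {"symbol": "TCF7", "ncbi_gene_id": "6932"},
    {"symbol": "GJE1", "ncbi_gene_id": "100126572"},
]
by_symbol = build_lookup(records, "symbol")
by_ncbi = build_lookup(records, "ncbi_gene_id")
print(by_symbol["tcf7"]["ncbi_gene_id"])
print(by_ncbi["bt_100126572"]["symbol"])
```

Keeping single matches as a bare record and collapsing collisions into a list mirrors the behavior described in the text; the real lookup object additionally exposes the keys as attributes for auto-complete.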

Search terms

Search behaves in the same way as it does for registries:

public.search("TP53").head(3)
ensembl_gene_id symbol ncbi_gene_id biotype description synonyms
8172 ENSG00000141510 TP53 7157 protein_coding tumor protein p53 LFS1|P53
15296 ENSG00000182165 TP53TG1 None lncRNA TP53 target 1 LINC00096|TP53LC12|H_RG012D21.9|TP53AP1
15655 ENSG00000183632 TP53TG3 24150 protein_coding TP53 target 3 P53TG3|TP53TG3A

By default, search also covers synonyms and all other fields containing strings:

public.search("PDL1").head(3)
ensembl_gene_id symbol ncbi_gene_id biotype description synonyms
5052 ENSG00000120217 CD274 29126 protein_coding CD274 molecule B7H1|PDL1|B7-H1|B7-H|PD-L1|PDCD1LG1
566 ENSG00000040275 SPDL1 54908 protein_coding spindle apparatus coiled-coil protein 1 FLJ20364|HSPINDLY|CCDC99
31300 ENSG00000229570 GAPDHP58 None processed_pseudogene glyceraldehyde 3 phosphate dehydrogenase pseud... GAPDL1|GAPDHL1

You can restrict the search to symbols by passing field="symbol":

public.search("PDL1", field="symbol").head(3)
ensembl_gene_id symbol ncbi_gene_id biotype description synonyms
566 ENSG00000040275 SPDL1 54908 protein_coding spindle apparatus coiled-coil protein 1 FLJ20364|HSPINDLY|CCDC99
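
The difference between searching all string fields and searching a single field can be sketched with a toy substring search. This is only an illustration under stated assumptions: the search helper below is hypothetical, it does plain case-insensitive substring matching, and it does not rank results the way public.search() does. The three records are copied from the outputs above.

```python
def search(records, query, field=None):
    """Return records whose searched fields contain `query` (case-insensitive).

    Hypothetical helper: with field=None every string-valued field is
    searched; otherwise only the named field is searched.
    """
    needle = query.lower()
    hits = []
    for record in records:
        if field is None:
            fields = [k for k, v in record.items() if isinstance(v, str)]
        else:
            fields = [field]
        # Missing or None values are treated as empty strings.
        if any(needle in (record.get(f) or "").lower() for f in fields):
            hits.append(record)
    return hits


genes = [
    {"symbol": "CD274", "description": "CD274 molecule",
     "synonyms": "B7H1|PDL1|B7-H1|B7-H|PD-L1|PDCD1LG1"},
    {"symbol": "SPDL1", "description": "spindle apparatus coiled-coil protein 1",
     "synonyms": "FLJ20364|HSPINDLY|CCDC99"},
    {"symbol": "TP53", "description": "tumor protein p53",
     "synonyms": "LFS1|P53"},
]

all_fields = [g["symbol"] for g in search(genes, "PDL1")]
symbol_only = [g["symbol"] for g in search(genes, "PDL1", field="symbol")]
print(all_fields)
print(symbol_only)
```

CD274 matches only through its synonyms, so it drops out once the search is limited to the symbol field, which is the same effect as in the two outputs above.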

You can also search another specific field, such as the description, by passing it as field:

public.search("tumor protein p53", field=public.description).head()
ensembl_gene_id symbol ncbi_gene_id biotype description synonyms
1013 ENSG00000067369 TP53BP1 7158 protein_coding tumor protein p53 binding protein 1 P202|TDRD30|53BP1
1444 ENSG00000078804 TP53INP2 58476 protein_coding tumor protein p53 inducible nuclear protein 2 DKFZP434O0827|FLJ21759|C20ORF110|DJ1181N3.1|FL...
4395 ENSG00000115129 TP53I3 9540 protein_coding tumor protein p53 inducible protein 3 PIG3
5087 ENSG00000120471 TP53AIP1 63970 protein_coding tumor protein p53 regulated apoptosis inducing... P53AIP1
8172 ENSG00000141510 TP53 7157 protein_coding tumor protein p53 LFS1|P53

Standardize gene identifiers

Let us generate a DataFrame that stores a number of gene identifiers, some of which are corrupted:

data = {
    "gene symbol": ["A1CF", "A1BG", "FANCD1", "corrupted"],
    "ncbi id": ["29974", "1", "5133", "corrupted"],
    "ensembl_gene_id": [
        "ENSG00000148584",
        "ENSG00000121410",
        "ENSG00000188389",
        "ENSGcorrupted",
    ],
}
df_orig = pd.DataFrame(data).set_index("ensembl_gene_id")
df_orig
gene symbol ncbi id
ensembl_gene_id
ENSG00000148584 A1CF 29974
ENSG00000121410 A1BG 1
ENSG00000188389 FANCD1 5133
ENSGcorrupted corrupted corrupted

First, we check which of the Ensembl gene IDs in the index validate against the ontology reference:

validated = public.validate(df_orig.index, public.ensembl_gene_id)
df_orig.index[~validated]
! 1 unique term (25.00%) is not validated: 'ENSGcorrupted'
Index(['ENSGcorrupted'], dtype='object', name='ensembl_gene_id')

Next, we validate the NCBI gene IDs and the gene symbols against the ontology:

# based on NCBI gene ID
public.validate(df_orig["ncbi id"], public.ncbi_gene_id)
! 1 unique term (25.00%) is not validated: 'corrupted'
array([ True,  True,  True, False])
# based on Gene symbols
validated_symbols = public.validate(df_orig["gene symbol"], public.symbol)
df_orig["gene symbol"][~validated_symbols]
! 2 unique terms (50.00%) are not validated: 'FANCD1', 'corrupted'
ensembl_gene_id
ENSG00000188389       FANCD1
ENSGcorrupted      corrupted
Name: gene symbol, dtype: object
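
Conceptually, validation is a membership test of each value against one field of the reference. A minimal sketch follows, assuming exact, case-sensitive matching; the validate function and the three-symbol reference excerpt are hypothetical stand-ins, not bionty's implementation.

```python
def validate(values, reference):
    """Return one boolean per value: True if it occurs in the reference."""
    reference_set = set(reference)  # set membership is O(1) per lookup
    return [value in reference_set for value in values]


# Toy excerpt of reference symbols; the real ontology has far more terms.
reference_symbols = ["A1CF", "A1BG", "BRCA2"]
symbols = ["A1CF", "A1BG", "FANCD1", "corrupted"]

validated = validate(symbols, reference_symbols)
not_validated = [s for s, ok in zip(symbols, validated) if not ok]
print(validated)
print(not_validated)
```

FANCD1 fails this test because it is not a standard symbol in the reference, even though it refers to a real gene; that is the case handled by synonym mapping.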

Two of the gene symbols are not validated. Use inspect() to find out why:

public.inspect(df_orig["gene symbol"], public.symbol);
! 2 unique terms (50.00%) are not validated for symbol: 'FANCD1', 'corrupted'
   detected 1 unique terms with synonym: FANCD1
→  standardize terms via .standardize()

The log suggests using .standardize():

mapped_symbol_synonyms = public.standardize(df_orig["gene symbol"])
mapped_symbol_synonyms
['A1CF', 'A1BG', 'BRCA2', 'corrupted']

Optionally, you can return a mapper in the form of {synonym1: standardized_name1, ...}:

public.standardize(df_orig["gene symbol"], return_mapper=True)
{'FANCD1': 'BRCA2'}
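
The synonym mapping can be sketched as a dictionary built from the pipe-separated synonyms column. This is a minimal sketch under assumptions, not bionty's implementation: build_synonym_map and standardize are hypothetical helpers, and the BRCA2 record lists only the one synonym that the output above confirms.

```python
def build_synonym_map(records, field="symbol", synonyms_field="synonyms", sep="|"):
    """Map every synonym to the standard name of its record."""
    mapping = {}
    for record in records:
        for synonym in (record.get(synonyms_field) or "").split(sep):
            if synonym:
                mapping[synonym] = record[field]
    return mapping


def standardize(values, records, return_mapper=False):
    """Replace known synonyms by standard names; leave other values as-is."""
    synonym_map = build_synonym_map(records)
    standard_names = {record["symbol"] for record in records}
    # Only values that are not already standard and have a known synonym are mapped.
    mapper = {
        v: synonym_map[v]
        for v in values
        if v not in standard_names and v in synonym_map
    }
    if return_mapper:
        return mapper
    return [mapper.get(v, v) for v in values]


# Toy reference: synonyms for A1CF and A1BG are left empty for brevity.
reference = [
    {"symbol": "A1CF", "synonyms": ""},
    {"symbol": "A1BG", "synonyms": ""},
    {"symbol": "BRCA2", "synonyms": "FANCD1"},
    {"symbol": "TSPAN6", "synonyms": "T245|TSPAN-6|TM4SF6"},
]
symbols = ["A1CF", "A1BG", "FANCD1", "corrupted"]

print(standardize(symbols, reference))
print(standardize(symbols, reference, return_mapper=True))
```

Values with no known synonym, such as 'corrupted', pass through unchanged, so the result always has the same length and order as the input.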

We can use the standardized symbols as the new standardized index:

df_curated = df_orig.reset_index()
df_curated.index = mapped_symbol_synonyms
df_curated
ensembl_gene_id gene symbol ncbi id
A1CF ENSG00000148584 A1CF 29974
A1BG ENSG00000121410 A1BG 1
BRCA2 ENSG00000188389 FANCD1 5133
corrupted ENSGcorrupted corrupted corrupted

You can convert identifiers by passing return_field to standardize():

public.standardize(
    df_curated.index,
    field=public.symbol,
    return_field=public.ensembl_gene_id,
)
['ENSG00000148584', 'ENSG00000121410', 'ENSG00000139618', 'corrupted']

And return mappable identifiers as a dict:

public.standardize(
    df_curated.index,
    field=public.symbol,
    return_field=public.ensembl_gene_id,
    return_mapper=True,
)
{'A1BG': 'ENSG00000121410',
 'BRCA2': 'ENSG00000139618',
 'A1CF': 'ENSG00000148584'}
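
Identifier conversion amounts to a lookup from one column of the reference to another. The sketch below assumes a hypothetical convert helper and uses only the three symbol and Ensembl ID pairs shown in the output above; it is not how bionty implements return_field.

```python
def convert(values, records, field, return_field, return_mapper=False):
    """Translate values from `field` to `return_field`; keep unmapped values."""
    table = {record[field]: record[return_field] for record in records}
    mapper = {v: table[v] for v in values if v in table}
    if return_mapper:
        return mapper
    return [mapper.get(v, v) for v in values]


# Pairs copied from the standardize() output above.
reference = [
    {"symbol": "A1CF", "ensembl_gene_id": "ENSG00000148584"},
    {"symbol": "A1BG", "ensembl_gene_id": "ENSG00000121410"},
    {"symbol": "BRCA2", "ensembl_gene_id": "ENSG00000139618"},
]
symbols = ["A1CF", "A1BG", "BRCA2", "corrupted"]

ids = convert(symbols, reference, "symbol", "ensembl_gene_id")
id_mapper = convert(symbols, reference, "symbol", "ensembl_gene_id", return_mapper=True)
print(ids)
print(id_mapper)
```

The mapper form contains only the values that could be converted, which makes it easy to spot entries such as 'corrupted' that have no counterpart in the target field.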

Ontology source versions

For any given entity, we can choose from a number of versions:

bt.Source.filter(entity="bionty.Gene").to_dataframe()
uid entity organism name version in_db currently_used description url md5 source_website is_locked created_at branch_id space_id created_by_id run_id dataframe_artifact_id
id
42 7SwZGnr2 bionty.Gene saccharomyces cerevisiae ensembl release-114 False True Ensembl s3://bionty-assets/df_saccharomyces cerevisiae... None https://www.ensembl.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
41 2wv9SRzv bionty.Gene mouse ensembl release-114 False True Ensembl s3://bionty-assets/df_mouse__ensembl__release-... None https://www.ensembl.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
40 2w43l1YS bionty.Gene human ensembl release-114 False True Ensembl s3://bionty-assets/df_human__ensembl__release-... None https://www.ensembl.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
# only lists the sources that are currently used
bt.Source.filter(entity="bionty.Gene", currently_used=True).to_dataframe()
uid entity organism name version in_db currently_used description url md5 source_website is_locked created_at branch_id space_id created_by_id run_id dataframe_artifact_id
id
42 7SwZGnr2 bionty.Gene saccharomyces cerevisiae ensembl release-114 False True Ensembl s3://bionty-assets/df_saccharomyces cerevisiae... None https://www.ensembl.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
41 2wv9SRzv bionty.Gene mouse ensembl release-114 False True Ensembl s3://bionty-assets/df_mouse__ensembl__release-... None https://www.ensembl.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
40 2w43l1YS bionty.Gene human ensembl release-114 False True Ensembl s3://bionty-assets/df_human__ensembl__release-... None https://www.ensembl.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None

When creating a PublicOntology object, we can choose a specific source or version:

source = bt.Source.get(name="ensembl", version="release-114", organism="human")
public = bt.Gene.public(source=source)
public
PublicOntology
Entity: Gene
Organism: human
Source: ensembl, release-114
#terms: 91673

The currently used ontologies can be displayed using:

bt.Source.filter(currently_used=True).to_dataframe()
uid entity organism name version in_db currently_used description url md5 source_website is_locked created_at branch_id space_id created_by_id run_id dataframe_artifact_id
id
67 5JnVODh4 BioSample all ncbi 2023-09 False True NCBI BioSample attributes s3://bionty-assets/df_all__ncbi__2023-09__BioS... None https://www.ncbi.nlm.nih.gov/biosample/docs/at... False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
66 7au3ZQrD bionty.Ethnicity human hancestro 2025-10-14 False True Human Ancestry Ontology http://purl.obolibrary.org/obo/hancestro/relea... None https://github.com/EBISPOT/hancestro False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
65 6na9vRls bionty.DevelopmentalStage mouse mmusdv 2025-01-23 False True Mouse Developmental Stages https://github.com/obophenotype/developmental-... None https://github.com/obophenotype/developmental-... False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
64 7JO1x6p1 bionty.DevelopmentalStage human hsapdv 2025-01-23 False True Human Developmental Stages https://github.com/obophenotype/developmental-... None https://github.com/obophenotype/developmental-... False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
62 ugaIoIlj Drug all dron 2024-08-05 False True Drug Ontology http://purl.obolibrary.org/obo/dron/releases/2... None https://bioportal.bioontology.org/ontologies/DRON False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
61 3rm9aOzL BFXPipeline all lamin 1.0.0 False True Bioinformatics Pipeline s3://bionty-assets/df_all__lamin__1.0.0__BFXpi... None https://lamin.ai False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
59 2UZHts8n bionty.Pathway all go 2025-10-10 False True Gene Ontology http://purl.obolibrary.org/obo/go/releases/202... None http://geneontology.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
57 h5EFbQNJ bionty.Phenotype human hp 2026-01-08 False True Human Phenotype Ontology https://github.com/obophenotype/human-phenotyp... None https://hpo.jax.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
56 2rMQe2ZH bionty.Phenotype all pato 2025-05-14 False True Phenotype And Trait Ontology http://purl.obolibrary.org/obo/pato/releases/2... None https://github.com/pato-ontology/pato False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
55 7DFdvM5S bionty.ExperimentalFactor all efo 3.85.0 False True The Experimental Factor Ontology http://www.ebi.ac.uk/efo/releases/v3.85.0/efo.owl None https://bioportal.bioontology.org/ontologies/EFO False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
53 5pzW1FWn bionty.Disease human doid 2025-12-23 False True Human Disease Ontology http://purl.obolibrary.org/obo/doid/releases/2... None https://disease-ontology.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
52 1gZ2spLp bionty.Disease all mondo 2026-01-06 False True Mondo Disease Ontology http://purl.obolibrary.org/obo/mondo/releases/... None https://mondo.monarchinitiative.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
51 2zRjWH6J bionty.Tissue all uberon 2025-12-04 False True Uberon multi-species anatomy ontology http://purl.obolibrary.org/obo/uberon/releases... None http://obophenotype.github.io/uberon False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
50 6Z0wRdof bionty.CellType all cl 2025-12-17 False True Cell Ontology http://purl.obolibrary.org/obo/cl/releases/202... None https://obophenotype.github.io/cell-ontology False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
47 5kJm0APo bionty.CellLine all cellosaurus 53.0 False True Cellosaurus s3://bionty-assets/df_all__cellosaurus__53.0__... None https://www.cellosaurus.org/ False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
46 7bV5uJo3 bionty.CellMarker mouse cellmarker 2.0 False True CellMarker s3://bionty-assets/mouse_cellmarker_2.0_CellMa... None http://bio-bigdata.hrbmu.edu.cn/CellMarker False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
45 3kDh8qAX bionty.CellMarker human cellmarker 2.0 False True CellMarker s3://bionty-assets/human_cellmarker_2.0_CellMa... None http://bio-bigdata.hrbmu.edu.cn/CellMarker False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
44 01RWXN2V bionty.Protein mouse uniprot 2024-03 False True Uniprot s3://bionty-assets/df_mouse__uniprot__2024-03_... None https://www.uniprot.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
43 3EYyGRYN bionty.Protein human uniprot 2024-03 False True Uniprot s3://bionty-assets/df_human__uniprot__2024-03_... None https://www.uniprot.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
42 7SwZGnr2 bionty.Gene saccharomyces cerevisiae ensembl release-114 False True Ensembl s3://bionty-assets/df_saccharomyces cerevisiae... None https://www.ensembl.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
41 2wv9SRzv bionty.Gene mouse ensembl release-114 False True Ensembl s3://bionty-assets/df_mouse__ensembl__release-... None https://www.ensembl.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
40 2w43l1YS bionty.Gene human ensembl release-114 False True Ensembl s3://bionty-assets/df_human__ensembl__release-... None https://www.ensembl.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
39 3wWO3xfZ bionty.Organism all ncbitaxon 2025-12-03 False True NCBItaxon Ontology http://purl.obolibrary.org/obo/ncbitaxon/2025-... None https://github.com/obophenotype/ncbitaxon False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
38 7GPHh16S bionty.Organism plants ensembl release-57 False True Ensembl https://ftp.ensemblgenomes.ebi.ac.uk/pub/plant... None https://www.ensembl.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
37 2PmTrc8x bionty.Organism metazoa ensembl release-57 False True Ensembl https://ftp.ensemblgenomes.ebi.ac.uk/pub/metaz... None https://www.ensembl.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
36 6s9nV6xh bionty.Organism fungi ensembl release-57 False True Ensembl https://ftp.ensemblgenomes.ebi.ac.uk/pub/fungi... None https://www.ensembl.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
35 6bbVUTCS bionty.Organism bacteria ensembl release-57 False True Ensembl https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacte... None https://www.ensembl.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None
34 6o9usTh3 bionty.Organism vertebrates ensembl release-114 False True Ensembl https://ftp.ensembl.org/pub/release-114/specie... None https://www.ensembl.org False 2026-01-27 17:31:29.036000+00:00 1 1 3 None None