Gene Ontology (GO)

In this notebook, we manage a pathway registry based on the “2025 GO Biological Process” ontology. We’ll walk you through registering pathways and linking them to genes.

In the Cell type annotation and pathway analysis notebook, we’ll demonstrate how to perform a pathway enrichment analysis and track the dataset with LaminDB.

# pip install lamindb gseapy
!lamin init --storage ./use-cases-registries --modules bionty
 initialized lamindb: testuser1/use-cases-registries
import lamindb as ln
import bionty as bt
import gseapy as gp
 connected lamindb: testuser1/use-cases-registries

Fetch GO pathways annotated with human genes using Enrichr

First, we fetch the "GO_Biological_Process_2025" pathways for humans using GSEApy, which wraps GSEA and Enrichr.

go_bp = gp.get_library(name="GO_Biological_Process_2025", organism="Human")
print(f"Number of pathways {len(go_bp)}")
Number of pathways 5341
go_bp["ATF6-mediated Unfolded Protein Response (GO:0036500)"]
['MBTPS1', 'MBTPS2', 'XBP1', 'ATF6B', 'MANF', 'DDIT3', 'CREBZF']

Parse the ontology_id out of each key and convert the library into the format {ontology_id: (name, genes)}.

def parse_ontology_id_from_keys(key):
    """Parse out the ontology id.

    "ATF6-mediated Unfolded Protein Response (GO:0036500)" -> ("GO:0036500", "ATF6-mediated Unfolded Protein Response")
    """
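    # Split on the last " (" so that parentheses inside the name are preserved
    name, ontology_id = key.rsplit(" (", 1)
    return ontology_id.rstrip(")"), name
# Parse each key once and build {ontology_id: (name, genes)}
go_bp_parsed = {}
for key, genes in go_bp.items():
    ontology_id, name = parse_ontology_id_from_keys(key)
    go_bp_parsed[ontology_id] = (name, genes)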
go_bp_parsed["GO:0036500"]
('ATF6-mediated Unfolded Protein Response',
 ['MBTPS1', 'MBTPS2', 'XBP1', 'ATF6B', 'MANF', 'DDIT3', 'CREBZF'])
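
The parsing step above relies on every key ending in a `(GO:...)` suffix. As a minimal sketch, that assumption can be checked with a regular expression so that an unexpected key raises an error instead of being split silently. The helper `split_go_key`, the pattern `GO_KEY_PATTERN`, and the toy library are hypothetical names used only for illustration; they are not part of GSEApy or LaminDB, and the seven-digit ID format is an assumption about the Enrichr keys.

```python
import re

# Assumption: keys look like "<name> (GO:<7 digits>)", as in the example above.
GO_KEY_PATTERN = re.compile(r"^(?P<name>.+) \((?P<ontology_id>GO:\d{7})\)$")


def split_go_key(key: str) -> tuple[str, str]:
    """Return (ontology_id, name) or raise if the key has an unexpected shape."""
    match = GO_KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"unexpected key format: {key!r}")
    return match["ontology_id"], match["name"]


# Toy stand-in for the library returned by gp.get_library
toy_library = {
    "ATF6-mediated Unfolded Protein Response (GO:0036500)": ["MBTPS1", "MBTPS2", "XBP1"],
}

parsed = {}
for key, genes in toy_library.items():
    ontology_id, name = split_go_key(key)
    parsed[ontology_id] = (name, genes)
```

Failing loudly on a malformed key is a design choice: it surfaces format changes in the upstream library before any records are registered.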

Register pathway ontology in LaminDB

source = bt.Source.get(name="go")
source
Source(uid='2UZHts8n', entity='bionty.Pathway', organism='all', name='go', version='2025-10-10', in_db=False, currently_used=True, description='Gene Ontology', url='http://purl.obolibrary.org/obo/go/releases/2025-10-10/extensions/go-plus.owl', md5=None, source_website='http://geneontology.org', branch_id=1, space_id=1, created_by_id=3, run_id=None, dataframe_artifact_id=None, created_at=2026-01-27 17:31:50 UTC, is_locked=False)
bionty = bt.Pathway.public(source=source)
bionty
PublicOntology
Entity: Pathway
Organism: all
Source: go, 2025-10-10
#terms: 80453

Next, we register all the pathways and genes in LaminDB to finally link pathways to genes.

Register pathway terms

To register the pathways, we use .from_values to create Pathway records directly from the annotated GO ontology IDs.

pathways = bt.Pathway.from_values(go_bp_parsed.keys(), bt.Pathway.ontology_id).save()
! ontology ID BFO:0000015 not found in DataFrame
 starting creation of 10454 Pathway_parents records in batches of 10000

Register gene symbols

Similarly, we use .from_values to register all pathway-associated genes in LaminDB.

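# Flatten the per-pathway gene lists into a single list of symbols
all_symbols = [symbol for symbols in go_bp.values() for symbol in symbols]
all_genes = bt.Gene.standardize(all_symbols, organism="human")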
genes = bt.Gene.from_values(all_genes, organism="human").save()
! found 35 synonyms in public source (output truncated): [np.str_('C17ORF99'), np.str_('C6ORF89'), np.str_('C9ORF78'), np.str_('C15ORF62'), np.str_('C2ORF69'), np.str_('C19ORF12'), np.str_('CPAP'), np.str_('C9ORF72'), np.str_('C12ORF57'), np.str_('HEMK2'), '...']
  please add corresponding Gene records via (output truncated): `.from_values([np.str_('C17ORF99'), np.str_('C6ORF89'), np.str_('C9ORF78'), np.str_('C15ORF62'), np.str_('C2ORF69'), np.str_('C19ORF12'), np.str_('CPAP'), np.str_('C9ORF72'), np.str_('C12ORF57'), np.str_('HEMK2'), '...'])`
! ambiguous validation in Bionty for 1006 records: 'GART', 'HSPA1L', 'HSPA1A', 'HSPA1B', 'CCT8', 'KGD4', 'ATAT1', 'TRIM71', 'DHX36', 'CMTR2', 'PKLR', 'LDHA', 'SLC25A24', 'ATF6B', 'MAGEL2', 'TRIM27', 'PTPRC', 'GPS2', 'AKAP17A', 'SLC39A7', ...
 starting creation of 16187 Gene records in batches of 10000
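
Because a gene usually belongs to several pathways, the flattened symbol list contains many repeats. The sketch below shows one way to flatten and deduplicate such a mapping while keeping first-seen order. It runs on a toy dictionary with made-up pathway keys rather than the real library, and the variable names are illustrative only.

```python
from itertools import chain

# Toy stand-in for go_bp: pathway key -> list of gene symbols
toy_library = {
    "pathway A (GO:0000001)": ["XBP1", "ATF6B", "DDIT3"],
    "pathway B (GO:0000002)": ["DDIT3", "XBP1", "MANF"],
}

# Flatten all gene lists in a single pass
all_symbols = list(chain.from_iterable(toy_library.values()))

# dict.fromkeys keeps the first occurrence of each symbol and preserves order
unique_symbols = list(dict.fromkeys(all_symbols))

print(f"{len(all_symbols)} symbols, {len(unique_symbols)} unique")
```

Deduplicating up front is optional here, but it keeps the list passed on to later steps small.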

Manually register the 32 non-validated symbols:

inspect_result = bt.Gene.inspect(all_genes, organism="human")
organism = bt.Organism.get(name="human")

# Create one Gene record per symbol that could not be validated
nonval_genes = [
    bt.Gene(symbol=g, organism=organism) for g in inspect_result.non_validated
]

ln.save(nonval_genes)
! received 14217 unique terms, 154953 empty/duplicated terms are ignored
! 32 unique terms (0.20%) are not validated for symbol: 'LOC112694756', 'LOC102724971', 'IGL', 'LOC102723407', 'APOBEC3A_B', 'LOC102724560', 'TNFAIP8L2-SCNM1', 'TRA', 'LOC124905743', 'CCL4L1', ...
   couldn't validate 32 terms: 'LOC102724652', 'DUX1', 'TRA', 'DNAAF19', 'CCL3L1', 'LOC102724971', 'LOC112268384', 'RBMY1C', 'LOC100533997', 'LOC102724560', 'TMEM278', 'LOC102725023', 'FSAF1', 'LOC124905743', 'VMA22', 'CHLSN', 'LOC102723407', 'LOC107987479', 'LOC112694756', 'SLC67A1', ...
→  if you are sure, create new records via Gene() and save to your registry
! you are trying to create a record with name='IGL' but records with similar symbols exist: 'IGLL5', 'IGLL1', 'PIGL'. Did you mean to load one of them?
! you are trying to create a record with name='TRA' but records with similar symbols exist: 'TRAF5', 'TRAF6', 'TRAPPC2L'. Did you mean to load one of them?
! you are trying to create a record with name='CCL4L1' but records with similar symbols exist: 'CCL4', 'CCL4', 'CCL4'. Did you mean to load one of them?
! you are trying to create a record with name='CCL3L1' but records with similar symbols exist: 'CCL3', 'CCL3', 'CCL3'. Did you mean to load one of them?
! you are trying to create a record with name='DNAAF19' but records with similar symbols exist: 'DNAAF4', 'DNAAF3', 'DNAAF2'. Did you mean to load one of them?
! you are trying to create a record with name='TMEM278' but records with similar symbols exist: 'TMEM230', 'TMEM203', 'TMEM231'. Did you mean to load one of them?
! you are trying to create a record with name='RBMY1C' but records with similar symbols exist: 'RBMY1B', 'RBMY1F', 'RBMY1E'. Did you mean to load one of them?
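
The warnings above suggest that the non-validated symbols fall into different groups: many are `LOC`-prefixed placeholder identifiers, while others resemble existing symbols. As a rough sketch, the list could be split before creating records so that the second group can be reviewed by hand. The list below copies a few symbols from the log output; the pattern and variable names are hypothetical and not part of bionty.

```python
import re

# Assumption: placeholder loci follow the "LOC<digits>" naming seen in the log
LOC_PATTERN = re.compile(r"^LOC\d+$")

# A few of the non-validated symbols reported above
non_validated = [
    "LOC112694756",
    "LOC102724971",
    "IGL",
    "APOBEC3A_B",
    "TNFAIP8L2-SCNM1",
    "TRA",
    "CCL4L1",
]

# Split into placeholder loci and symbols worth a manual look
placeholders = [s for s in non_validated if LOC_PATTERN.match(s)]
needs_review = [s for s in non_validated if not LOC_PATTERN.match(s)]

print("placeholder loci:", placeholders)
print("review manually:", needs_review)
```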