Gene Ontology (GO) ¶

In this notebook we manage a pathway registry based on “2023 GO Biological Process” ontology. We’ll walk you through the steps of registering pathways and link them to genes.

In the Cell type annotation and pathway analysis notebook, we’ll demonstrate how to perform a pathway enrichment analysis and track the dataset with LaminDB.

# pip install lamindb gseapy
!lamin init --storage ./use-cases-registries --modules bionty

import lamindb as ln
import bionty as bt
import gseapy as gp

Fetch GO pathways annotated with human genes using Enrichr¶

First we fetch the "GO_Biological_Process_2023" pathways for humans using GSEApy which wraps GSEA and Enrichr.

go_bp = gp.get_library(name="GO_Biological_Process_2025", organism="Human")
print(f"Number of pathways {len(go_bp)}")

go_bp["ATF6-mediated Unfolded Protein Response (GO:0036500)"]

Parse out the ontology_id from keys, convert into the format of {ontology_id: (name, genes)}

def parse_ontology_id_from_keys(key):
    """Parse out the ontology id.

    "ATF6-mediated Unfolded Protein Response (GO:0036500)" -> ("GO:0036500", "ATF6-mediated Unfolded Protein Response")
    """
    name, id = key.rsplit(" (", 1)
    return id.rstrip(")"), name

go_bp_parsed = {
    parse_ontology_id_from_keys(k)[0]: (parse_ontology_id_from_keys(k)[1], v)
    for k, v in go_bp.items()
}

go_bp_parsed["GO:0036500"]

Register pathway ontology in LaminDB¶

source = bt.Source.get(name="go")
source

bionty = bt.Pathway.public(source=source)
bionty

Next, we register all the pathways and genes in LaminDB to finally link pathways to genes.

Register pathway terms¶

To register the pathways we make use of .from_values to directly parse the annotated GO pathway ontology IDs into LaminDB.

pathways = bt.Pathway.from_values(go_bp_parsed.keys(), bt.Pathway.ontology_id).save()

Register gene symbols¶

Similarly, we use .from_values for all Pathway associated genes to register them with LaminDB.

all_genes = bt.Gene.standardize(sum(go_bp.values(), []), organism="human")
genes = bt.Gene.from_values(all_genes, organism="human").save()

Manually register the 32 non-validated symbols:

inspect_result = bt.Gene.inspect(all_genes, organism="human")
organism = bt.Organism.get(name="human")

nonval_genes = []
for g in inspect_result.non_validated:
    nonval_genes.append(bt.Gene(symbol=g, organism=organism))

ln.save(nonval_genes)

Link pathway to genes¶

Now that we are tracking all pathways and genes records, we can link both of them to make the pathways even more queryable.

symbols_genes = {record.symbol: record for record in genes}

for pathway in pathways:
    pathway_genes = go_bp_parsed.get(pathway.ontology_id)[1]
    pathway_genes_records = [symbols_genes.get(gene) for gene in pathway_genes]
    pathway.genes.set(pathway_genes_records)

Now genes are linked to pathways:

pathway.genes.to_list("symbol")

pathway.genes.to_list("ensembl_gene_id")