Cell type annotation and pathway analysis

Run the GO Ontology notebook before this one so that the CellType and Pathway registries are populated.

!lamin connect use-cases-registries
 connected lamindb: testuser1/use-cases-registries
 to map a local dev directory, call: lamin settings set dev-dir .
# pip install lamindb celltypist gseapy
import lamindb as ln
import bionty as bt
from lamin_usecases import datasets as ds
import scanpy as sc
import matplotlib.pyplot as plt
import celltypist
import gseapy as gp
 connected lamindb: testuser1/use-cases-registries
/opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/celltypist/classifier.py:11: FutureWarning: `__version__` is deprecated, use `importlib.metadata.version('scanpy')` instead
  from scanpy import __version__ as scv

An interferon-beta treated dataset

This is a small peripheral blood mononuclear cell (PBMC) dataset split into control and stimulated groups; the stimulated group was treated with interferon beta. Let’s load the dataset and perform some preprocessing:

adata = ds.anndata_seurat_ifnb(preprocess=False, populate_registries=True)
adata
AnnData object with n_obs × n_vars = 13999 × 9731
    obs: 'stim'
    var: 'symbol'
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
sc.pp.pca(adata, n_comps=20)
sc.pp.neighbors(adata, n_pcs=10)
sc.tl.umap(adata)
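
To make the first two preprocessing steps concrete, here is a minimal NumPy sketch of library-size normalization followed by a log transform on a toy count matrix. The matrix values and the `normalize_and_log` helper are hypothetical and only illustrate the arithmetic; Scanpy's own implementation handles sparse matrices and further options.

```python
import numpy as np


def normalize_and_log(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Guard against empty cells so we never divide by zero.
    totals[totals == 0] = 1.0
    scaled = counts / totals * target_sum
    return np.log1p(scaled)


# Hypothetical toy matrix: 3 cells x 4 genes of raw counts.
toy_counts = np.array([[1, 0, 3, 6], [0, 0, 5, 5], [2, 2, 2, 2]])
toy_log = normalize_and_log(toy_counts)

# Undoing the log shows that every cell now sums to the target.
row_sums = np.expm1(toy_log).sum(axis=1)
print(row_sums)
```

After this step, cells with very different sequencing depths become comparable, which is why it precedes the selection of highly variable genes and PCA.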

Analysis: Cell type annotation using CellTypist

model = celltypist.models.Model.load(model="Immune_All_Low.pkl")
🔎 No available models. Downloading...
📜 Retrieving model list from server https://celltypist.cog.sanger.ac.uk/models/models.json
📚 Total models in list: 60
📂 Storing models in /home/runner/.celltypist/data/models
💾 Downloading model [1/60]: Immune_All_Low.pkl
💾 Downloading model [2/60]: Immune_All_High.pkl
💾 Downloading model [3/60]: Adult_COVID19_PBMC.pkl
💾 Downloading model [4/60]: Adult_CynomolgusMacaque_Hippocampus.pkl
💾 Downloading model [5/60]: Adult_Human_MTG.pkl
💾 Downloading model [6/60]: Adult_Human_PancreaticIslet.pkl
💾 Downloading model [7/60]: Adult_Human_PrefrontalCortex.pkl
💾 Downloading model [8/60]: Adult_Human_Skin.pkl
💾 Downloading model [9/60]: Adult_Human_Vascular.pkl
💾 Downloading model [10/60]: Adult_Mouse_Gut.pkl
💾 Downloading model [11/60]: Adult_Mouse_OlfactoryBulb.pkl
💾 Downloading model [12/60]: Adult_Pig_Hippocampus.pkl
💾 Downloading model [13/60]: Adult_RhesusMacaque_Hippocampus.pkl
💾 Downloading model [14/60]: Adult_cHSPCs_Illumina.pkl
💾 Downloading model [15/60]: Adult_cHSPCs_Ultima.pkl
💾 Downloading model [16/60]: Autopsy_COVID19_Lung.pkl
💾 Downloading model [17/60]: COVID19_HumanChallenge_Blood.pkl
💾 Downloading model [18/60]: COVID19_Immune_Landscape.pkl
💾 Downloading model [19/60]: Cells_Adult_Breast.pkl
💾 Downloading model [20/60]: Cells_Fetal_Lung.pkl
💾 Downloading model [21/60]: Cells_Human_Tonsil.pkl
💾 Downloading model [22/60]: Cells_Intestinal_Tract.pkl
💾 Downloading model [23/60]: Cells_Lung_Airway.pkl
💾 Downloading model [24/60]: Developing_Human_Brain.pkl
💾 Downloading model [25/60]: Developing_Human_Gonads.pkl
💾 Downloading model [26/60]: Developing_Human_Hippocampus.pkl
💾 Downloading model [27/60]: Developing_Human_Organs.pkl
💾 Downloading model [28/60]: Developing_Human_Thymus.pkl
💾 Downloading model [29/60]: Developing_Mouse_Brain.pkl
💾 Downloading model [30/60]: Developing_Mouse_Hippocampus.pkl
💾 Downloading model [31/60]: Fetal_Human_AdrenalGlands.pkl
💾 Downloading model [32/60]: Fetal_Human_Pancreas.pkl
💾 Downloading model [33/60]: Fetal_Human_Pituitary.pkl
💾 Downloading model [34/60]: Fetal_Human_Retina.pkl
💾 Downloading model [35/60]: Fetal_Human_Skin.pkl
💾 Downloading model [36/60]: Healthy_Adult_Heart.pkl
💾 Downloading model [37/60]: Healthy_COVID19_PBMC.pkl
💾 Downloading model [38/60]: Healthy_Human_Liver.pkl
💾 Downloading model [39/60]: Healthy_Mouse_Liver.pkl
💾 Downloading model [40/60]: Human_AdultAged_Hippocampus.pkl
💾 Downloading model [41/60]: Human_Colorectal_Cancer.pkl
💾 Downloading model [42/60]: Human_Developmental_Retina.pkl
💾 Downloading model [43/60]: Human_Embryonic_YolkSac.pkl
💾 Downloading model [44/60]: Human_Endometrium_Atlas.pkl
💾 Downloading model [45/60]: Human_IPF_Lung.pkl
💾 Downloading model [46/60]: Human_Longitudinal_Hippocampus.pkl
💾 Downloading model [47/60]: Human_Lung_Atlas.pkl
💾 Downloading model [48/60]: Human_PF_Lung.pkl
💾 Downloading model [49/60]: Human_Placenta_Decidua.pkl
💾 Downloading model [50/60]: Lethal_COVID19_Lung.pkl
💾 Downloading model [51/60]: Mouse_Dendritic_Subtypes.pkl
💾 Downloading model [52/60]: Mouse_Dentate_Gyrus.pkl
💾 Downloading model [53/60]: Mouse_Isocortex_Hippocampus.pkl
💾 Downloading model [54/60]: Mouse_Postnatal_DentateGyrus.pkl
💾 Downloading model [55/60]: Mouse_Whole_Brain.pkl
💾 Downloading model [56/60]: Nuclei_Human_InnerEar.pkl
💾 Downloading model [57/60]: Nuclei_Lung_Airway.pkl
💾 Downloading model [58/60]: PaediatricAdult_COVID19_Airway.pkl
💾 Downloading model [59/60]: PaediatricAdult_COVID19_PBMC.pkl
💾 Downloading model [60/60]: Pan_Fetal_Human.pkl
predictions = celltypist.annotate(
    adata, model="Immune_All_Low.pkl", majority_voting=True
)
adata.obs["cell_type_celltypist"] = predictions.predicted_labels.majority_voting
🔬 Input data has 13999 cells and 9731 genes
🔗 Matching reference genes in the model
🧬 3641 features used for prediction
⚖️ Scaling input data
🖋️ Predicting labels
✅ Prediction done!
👀 Detected a neighborhood graph in the input object, will run over-clustering on the basis of it
⛓️ Over-clustering input data with resolution set to 10
🗳️ Majority voting the predictions
✅ Majority voting done!
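
The log above mentions over-clustering followed by majority voting. As a rough sketch of the idea (not CellTypist's actual code), each cell can be relabeled with the most common prediction inside its cluster. The `majority_vote` helper and the toy clusters and labels are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(cluster_ids, predicted_labels):
    """Assign every cell the most common predicted label within its cluster."""
    labels_by_cluster = defaultdict(list)
    for cluster, label in zip(cluster_ids, predicted_labels):
        labels_by_cluster[cluster].append(label)
    # Pick the dominant label per cluster.
    winner = {
        cluster: Counter(labels).most_common(1)[0][0]
        for cluster, labels in labels_by_cluster.items()
    }
    return [winner[cluster] for cluster in cluster_ids]


# Hypothetical per-cell cluster assignments and raw per-cell predictions.
clusters = [0, 0, 0, 1, 1, 1, 1]
raw = ["B cell", "B cell", "T cell", "NK cell", "T cell", "T cell", "T cell"]
voted = majority_vote(clusters, raw)
print(voted)
```

Smoothing predictions this way trades some per-cell resolution for labels that are consistent within a local neighborhood.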
adata.obs["cell_type_celltypist"] = bt.CellType.standardize(
    adata.obs["cell_type_celltypist"]
)
sc.pl.umap(
    adata,
    color=["cell_type_celltypist", "stim"],
    frameon=False,
    legend_fontsize=10,
    wspace=0.4,
)
... storing 'cell_type_celltypist' as categorical
[Figure: UMAP embeddings colored by cell_type_celltypist and stim]

Analysis: Pathway enrichment analysis using Enrichr

This analysis is based on the GSEApy scRNA-seq Example.

First, we compute differentially expressed genes using a Wilcoxon test between stimulated and control cells.

# compute differentially expressed genes
sc.tl.rank_genes_groups(
    adata,
    groupby="stim",
    use_raw=False,
    method="wilcoxon",
    groups=["STIM"],
    reference="CTRL",
)

rank_genes_groups_df = sc.get.rank_genes_groups_df(adata, "STIM")
rank_genes_groups_df.head()
   names     scores  logfoldchanges  pvals  pvals_adj
0  ISG15  99.317345        7.148426    0.0        0.0
1  ISG20  96.655273        5.100947    0.0        0.0
2   IFI6  94.771217        5.828259    0.0        0.0
3  IFIT3  92.449509        7.450851    0.0        0.0
4  IFIT1  90.678734        8.078267    0.0        0.0
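
The `pvals_adj` column holds p-values corrected for multiple testing. Assuming the Benjamini-Hochberg procedure (Scanpy's default correction for this function, as far as we know), the adjustment can be sketched in plain Python as follows. The `benjamini_hochberg` helper and the toy p-values are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
toy_pvals = [0.01, 0.04, 0.03, 0.20]
toy_adj = benjamini_hochberg(toy_pvals)
print(toy_adj)
```

Each adjusted value is at least as large as the raw one, so a threshold applied to `pvals_adj` is stricter than the same threshold on `pvals`.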

Next, we select the significantly up- and down-regulated genes (adjusted p-value < 0.05):

degs_up = rank_genes_groups_df[
    (rank_genes_groups_df["logfoldchanges"] > 0)
    & (rank_genes_groups_df["pvals_adj"] < 0.05)
]
degs_dw = rank_genes_groups_df[
    (rank_genes_groups_df["logfoldchanges"] < 0)
    & (rank_genes_groups_df["pvals_adj"] < 0.05)
]
degs_up.shape, degs_dw.shape
((535, 5), (939, 5))

Run pathway enrichment analysis on the DEGs and plot the top 10 pathways:

enr_up = gp.enrichr(degs_up.names, gene_sets="GO_Biological_Process_2023").res2d
gp.dotplot(enr_up, figsize=(2, 3), title="Up", cmap=plt.cm.autumn_r);
[Figure: dot plot of enriched GO Biological Process terms for the up-regulated genes, titled "Up"]
enr_dw = gp.enrichr(degs_dw.names, gene_sets="GO_Biological_Process_2023").res2d
gp.dotplot(enr_dw, figsize=(2, 3), title="Down", cmap=plt.cm.winter_r);
[Figure: dot plot of enriched GO Biological Process terms for the down-regulated genes, titled "Down"]
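
Over-representation tools such as Enrichr typically score the overlap between a gene list and a pathway with a Fisher's exact or hypergeometric test. The sketch below computes the upper-tail hypergeometric probability for toy numbers; the `hypergeom_enrichment_pvalue` helper and all sizes are hypothetical, and Enrichr's reported statistics may include further adjustments.

```python
from math import comb


def hypergeom_enrichment_pvalue(overlap, gene_list_size, pathway_size, universe_size):
    """P(X >= overlap) when drawing `gene_list_size` genes without replacement."""
    max_overlap = min(gene_list_size, pathway_size)
    total = comb(universe_size, gene_list_size)
    # Sum the probabilities of seeing `overlap` or more pathway genes by chance.
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, gene_list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical toy case: 20 genes overall, a 5-gene pathway, a 4-gene list, 2 shared.
p_value = hypergeom_enrichment_pvalue(
    overlap=2, gene_list_size=4, pathway_size=5, universe_size=20
)
print(p_value)
```

A small p-value means the observed overlap would be unlikely if the gene list were a random draw from the universe.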

Annotate & save dataset

Register new features and labels (check out more details here):

ln.Feature(name="cell_type_celltypist", dtype=bt.CellType).save()
ln.Feature(name="stim", dtype=str).save()
obs_schema = ln.Schema(
    name="celltype_obs_schema",
    features=[
        ln.Feature(name="cell_type_celltypist", dtype=bt.CellType).save(),
        ln.Feature(name="stim", dtype=str).save(),
    ],
).save()
var_schema = ln.Schema(
    name="gene_var_schema",
    itype=bt.Gene,
).save()

schema = ln.Schema(
    name="anndata_seurat_ifnb_schema",
    slots={"obs": obs_schema, "var.T": var_schema},
    otype="AnnData",
).save()
 returning feature with same name: 'cell_type_celltypist'
 returning feature with same name: 'stim'

Register dataset using an Artifact object:

artifact = ln.Artifact.from_anndata(
    adata,
    description="seurat_ifnb_activated_Bcells",
    schema=schema,
).save()
artifact.describe()
! no run & transform got linked, call `ln.track()` & re-run
 writing the in-memory object into cache
 loading artifact into memory for validation
 returning schema with same hash: Schema(uid='9vqWMLrBO9mi9sIo', is_type=False, name='celltype_obs_schema', description=None, n_members=2, coerce=None, flexible=False, itype='Feature', otype=None, hash='7p_v8-GHwatOLM5xmlmoAg', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=3, run_id=None, type_id=None, created_at=2026-01-27 17:36:45 UTC, is_locked=False)
 not annotating with 11053 features for slot var.T as it exceeds 1000 (ln.settings.annotation.n_max_records)
Artifact:  (0000)
|   description: seurat_ifnb_activated_Bcells
├── uid: DIhydOnoFqt2wCsI0000            run:                 
kind: dataset                        otype: AnnData       
hash: ati52pj8HYV83unTto3jE_         size: 202.0 MB       
branch: main                         space: all           
created_at: 2026-01-27 17:36:47 UTC  created_by: testuser1
n_observations: 13999                                     
├── storage/path: 
/home/runner/work/lamin-usecases/lamin-usecases/docs/use-cases-registries/.lamindb/DIhydOnoFqt2wCsI0000.h5ad
├── Dataset features
├── obs (2)                                                                                                    
│   cell_type_celltypist           bionty.CellType                      B cell, central memory CD4-positive, a…
│   stim                           str                                                                         
└── var.T (11053 bionty.Gene.sym…                                                                              
└── Labels
    └── .cell_types                    bionty.CellType                      B cell, dendritic cell, human, natural…

Manage pathway objects

Let’s create two schemas (two feature sets) for degs_up and degs_dw:

schema_degs_up = ln.Schema.from_values(
    degs_up.names,
    bt.Gene.symbol,
    name="Up-regulated DEGs STIM vs CTRL",
    organism="human",
).save()
schema_degs_dw = ln.Schema.from_values(
    degs_dw.names,
    bt.Gene.symbol,
    name="Down-regulated DEGs STIM vs CTRL",
    organism="human",
).save()

Link the top 10 pathways to the corresponding differentially expressed genes:

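def parse_ontology_id_from_enrichr_results(key):
    """Parse the ontology id and name out of an Enrichr term.

    "ATF6-mediated Unfolded Protein Response (GO:0036500)" -> ("GO:0036500", "ATF6-mediated Unfolded Protein Response")
    """
    # the id is the last space-separated token, wrapped in parentheses
    ontology_id = key.split(" ")[-1].replace("(", "").replace(")", "")
    # strip the trailing " (<id>)" to recover the pathway name
    name = key.replace(f" ({ontology_id})", "")
    return (ontology_id, name)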


# get ontology ids for the top 10 pathways
enr_up_top10 = [
    pw_id[0]
    for pw_id in enr_up.head(10).Term.apply(parse_ontology_id_from_enrichr_results)
]
enr_dw_top10 = [
    pw_id[0]
    for pw_id in enr_dw.head(10).Term.apply(parse_ontology_id_from_enrichr_results)
]

# get pathway objects
enr_up_top10_pathways = bt.Pathway.from_values(enr_up_top10, bt.Pathway.ontology_id)
enr_dw_top10_pathways = bt.Pathway.from_values(enr_dw_top10, bt.Pathway.ontology_id)
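
As an alternative to splitting on spaces, the term parsing can be sketched with a regular expression that also reports terms without a GO id instead of returning a malformed id. The `split_enrichr_term` helper and the pattern are hypothetical, and the sketch assumes terms end in an id of the form "(GO:<digits>)".

```python
import re

# Matches a trailing "(GO:1234567)" and captures the name before it.
TERM_PATTERN = re.compile(r"^(?P<name>.+?)\s*\((?P<id>GO:\d+)\)$")


def split_enrichr_term(term):
    """Return (ontology_id, name), or (None, term) if no GO id is found."""
    match = TERM_PATTERN.match(term.strip())
    if match is None:
        return (None, term)
    return (match.group("id"), match.group("name"))


parsed = split_enrichr_term("ATF6-mediated Unfolded Protein Response (GO:0036500)")
fallback = split_enrichr_term("Unlabelled Term")
print(parsed, fallback)
```

Returning `None` for the id makes it easy to filter out unparseable terms before looking them up in a registry.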

Associate the pathways to the differentially expressed genes:

schema_degs_up.pathways.set(enr_up_top10_pathways)
schema_degs_dw.pathways.set(enr_dw_top10_pathways)
schema_degs_up.pathways.to_list("name")
['cellular response to cytokine stimulus',
 'defense response to symbiont',
 'defense response to virus',
 'negative regulation of viral genome replication',
 'negative regulation of viral process',
 'positive regulation of cytokine production',
 'regulation of viral genome replication',
 'response to cytokine',
 'response to interferon-beta',
 'response to type II interferon']

With this, the results of the differential expression analysis are stored as schema objects, each linking a gene set to the set of pathways enriched in it.

This allows queries along the lines below.
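
To illustrate the shape of this link, here is a small in-memory sketch in which a stand-in schema holds a gene set and its enriched pathway ids, and a reverse lookup finds the gene sets behind a pathway. `ToySchema`, `schemas_for_pathway`, and the ids and gene names other than those shown above are hypothetical; the registries implement this with database relations rather than Python sets.

```python
from dataclasses import dataclass, field


@dataclass
class ToySchema:
    """Hypothetical stand-in for a schema: a named gene set plus linked pathways."""

    name: str
    genes: set = field(default_factory=set)
    pathways: set = field(default_factory=set)


def schemas_for_pathway(schemas, pathway_id):
    """Find every gene set that was linked to the given pathway id."""
    return [s.name for s in schemas if pathway_id in s.pathways]


# Placeholder gene names and the id "GO:9999999" are made up for illustration.
up = ToySchema("Up-regulated DEGs", {"ISG15", "IFIT1"}, {"GO:0035456", "GO:0051607"})
down = ToySchema("Down-regulated DEGs", {"GENE_A", "GENE_B"}, {"GO:9999999"})
hits = schemas_for_pathway([up, down], "GO:0035456")
print(hits)
```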

Query pathways

Query for pathways whose name contains “interferon-beta”:

bt.Pathway.filter(name__contains="interferon-beta").to_dataframe()
uid name ontology_id abbr synonyms description is_locked created_at branch_id space_id created_by_id run_id source_id
id
4885 3VZq4dMeKlygVU response to interferon-beta GO:0035456 None response to fibroblast interferon|response to ... Any Process That Results In A Change In State ... False 2026-01-27 17:32:15.538000+00:00 1 1 3 None 59
4277 54R2a0elVmUTQY regulation of interferon-beta production GO:0032648 None regulation of IFN-beta production Any Process That Modulates The Frequency, Rate... False 2026-01-27 17:32:15.483000+00:00 1 1 3 None 59
3121 3x0xmK1yUYa5Uk positive regulation of interferon-beta production GO:0032728 None up-regulation of interferon-beta production|po... Any Process That Activates Or Increases The Fr... False 2026-01-27 17:32:15.388000+00:00 1 1 3 None 59
2150 1NzHDJDiVT0xDa negative regulation of interferon-beta production GO:0032688 None downregulation of interferon-beta production|n... Any Process That Stops, Prevents, Or Reduces T... False 2026-01-27 17:32:15.305000+00:00 1 1 3 None 59
689 1l4z0v8WGwcxuN cellular response to interferon-beta GO:0035458 None cellular response to fiblaferon|cellular respo... Any Process That Results In A Change In State ... False 2026-01-27 17:32:15.167000+00:00 1 1 3 None 59

Query pathways from a gene:

bt.Pathway.filter(genes__symbol="IFITM1").to_dataframe()
uid name ontology_id abbr synonyms description is_locked created_at branch_id space_id created_by_id run_id source_id
id
5251 3dRO41YW68H1ME type I interferon-mediated signaling pathway GO:0060337 None type I interferon-activated signaling pathway|... The Series Of Molecular Signals Initiated By T... False 2026-01-27 17:32:15.567000+00:00 1 1 3 None 59
4924 7m7ayBKAWfxEoI response to type II interferon GO:0034341 None response to immune interferon Any Process That Results In A Change In State ... False 2026-01-27 17:32:15.538000+00:00 1 1 3 None 59
4885 3VZq4dMeKlygVU response to interferon-beta GO:0035456 None response to fibroblast interferon|response to ... Any Process That Results In A Change In State ... False 2026-01-27 17:32:15.538000+00:00 1 1 3 None 59
4884 1Mkdeon3xT6DKg response to interferon-alpha GO:0035455 None response to lymphoblastoid interferon|response... Any Process That Results In A Change In State ... False 2026-01-27 17:32:15.532000+00:00 1 1 3 None 59
4859 6WvSjRrrPb7dFT response to cytokine GO:0034097 None response to cytokine stimulus Any Process That Results In A Change In State ... False 2026-01-27 17:32:15.532000+00:00 1 1 3 None 59
4792 41BNWUg3sxpyoS regulation of viral genome replication GO:0045069 None None Any Process That Modulates The Frequency, Rate... False 2026-01-27 17:32:15.526000+00:00 1 1 3 None 59
4498 2qCl1QmE5rYFHy regulation of osteoblast differentiation GO:0045667 None None Any Process That Modulates The Frequency, Rate... False 2026-01-27 17:32:15.504000+00:00 1 1 3 None 59
4050 1co8167Esy6IID regulation of cell population proliferation GO:0042127 None None Any Process That Modulates The Frequency, Rate... False 2026-01-27 17:32:15.466000+00:00 1 1 3 None 59
4046 4LqrUcXAKER15Q regulation of cell migration GO:0030334 None None Any Process That Modulates The Frequency, Rate... False 2026-01-27 17:32:15.466000+00:00 1 1 3 None 59
3300 66TxxRHcFq47cP positive regulation of osteoblast differentiation GO:0045669 None up-regulation of osteoblast differentiation|up... Any Process That Activates Or Increases The Fr... False 2026-01-27 17:32:15.399000+00:00 1 1 3 None 59
2945 2xtOQMxdMWpZOC positive regulation of cell differentiation GO:0045597 None up-regulation of cell differentiation|up regul... Any Process That Activates Or Increases The Fr... False 2026-01-27 17:32:15.372000+00:00 1 1 3 None 59
2442 2DyRG9whXU3xJB negative regulation of viral process GO:0048525 None None Any Process That Stops, Prevents, Or Reduces T... False 2026-01-27 17:32:15.327000+00:00 1 1 3 None 59
2440 KduBJgyedVzK2X negative regulation of viral genome replication GO:0045071 None down-regulation of viral genome replication|do... Any Process That Stops, Prevents, Or Reduces T... False 2026-01-27 17:32:15.327000+00:00 1 1 3 None 59
2034 3rjhK2RNOFztVV negative regulation of cellular process GO:0048523 None downregulation of cellular process|down-regula... Any Process That Stops, Prevents, Or Reduces T... False 2026-01-27 17:32:15.294000+00:00 1 1 3 None 59
2024 701eu1FQnCjB3N negative regulation of cell population prolife... GO:0008285 None down-regulation of cell proliferation|downregu... Any Process That Stops, Prevents Or Reduces Th... False 2026-01-27 17:32:15.294000+00:00 1 1 3 None 59
2023 51PBemXLpN96fT negative regulation of cell motility GO:2000146 None negative regulation of cell locomotion|negativ... Any Process That Stops, Prevents, Or Reduces T... False 2026-01-27 17:32:15.294000+00:00 1 1 3 None 59
2021 5dL3FOu3nuaoMl negative regulation of cell migration GO:0030336 None downregulation of cell migration|down regulati... Any Process That Stops, Prevents, Or Reduces T... False 2026-01-27 17:32:15.294000+00:00 1 1 3 None 59
1402 2QMDbGJZh3R1bF interferon-mediated signaling pathway GO:0140888 None interferon signaling pathway|interferon-activa... The Series Of Molecular Signals Initiated By T... False 2026-01-27 17:32:15.225000+00:00 1 1 3 None 59
1344 36S4bCrBw6ebbo host-mediated suppression of symbiont invasion GO:0046597 None negative regulation of viral penetration into ... A Process In Which A Host Inhibits Or Disrupts... False 2026-01-27 17:32:15.220000+00:00 1 1 3 None 59
880 43JHYr10qy8F6o defense response to virus GO:0051607 None defense response to viruses|antiviral response... Reactions Triggered In Response To The Presenc... False 2026-01-27 17:32:15.182000+00:00 1 1 3 None 59
749 1pWhXgT75NiXXf cellular response to type I interferon GO:0071357 None cellular response to type I IFN Any Process That Results In A Change In State ... False 2026-01-27 17:32:15.172000+00:00 1 1 3 None 59

Query artifacts from a pathway:

ln.Artifact.filter(feature_sets__pathways__name__icontains="interferon-beta").first()

Query schemas from a pathway to learn from which gene set this pathway was computed:

pathway = bt.Pathway.get(ontology_id="GO:0035456")
pathway
Pathway(uid='3VZq4dMeKlygVU', name='response to interferon-beta', ontology_id='GO:0035456', abbr=None, synonyms='response to fibroblast interferon|response to interferon beta|response to fiblaferon', description='Any Process That Results In A Change In State Or Activity Of A Cell Or An Organism (In Terms Of Movement, Secretion, Enzyme Production, Gene Expression, Etc.) As A Result Of An Interferon-Beta Stimulus. Interferon-Beta Is A Type I Interferon.', branch_id=1, space_id=1, created_by_id=3, run_id=None, source_id=59, created_at=2026-01-27 17:32:15 UTC, is_locked=False)
degs = ln.Schema.get(pathways__ontology_id=pathway.ontology_id)

Get the list of genes that are differentially expressed and belong to this pathway:

contributing_genes = pathway.genes.all() & degs.genes.all()
contributing_genes.to_list("symbol")
['MNDA',
 'AIM2',
 'PNPT1',
 'PLSCR1',
 'OAS1',
 'IRF1',
 'CALM1',
 'STAT1',
 'IFITM2',
 'IFI16',
 'IFITM3',
 'IFITM1',
 'BST2',
 'SHFL',
 'XAF1']
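
The `&` operator on the two query sets keeps only records present in both, which is analogous to a set intersection. A minimal sketch with plain Python sets, using hypothetical gene memberships rather than the registry contents:

```python
# Hypothetical gene symbols; in the notebook these come from the registries.
pathway_genes = {"STAT1", "IRF1", "OAS1", "IFNB1", "IFNAR1"}
deg_genes = {"ISG15", "STAT1", "OAS1", "IRF1", "IFIT1"}

# Genes that are both annotated to the pathway and differentially expressed.
contributing = sorted(pathway_genes & deg_genes)
print(contributing)
```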
# clean up test instance
!rm -r ./use-cases-registries
!lamin delete --force use-cases-registries