Cell type annotation and pathway analysis
Run the GO Ontology notebook before this one so that your CellType and Pathway registries are populated.
!lamin connect use-cases-registries
→ connected lamindb: testuser1/use-cases-registries
• to map a local dev directory, call: lamin settings set dev-dir .
# pip install lamindb celltypist gseapy
import lamindb as ln
import bionty as bt
from lamin_usecases import datasets as ds
import scanpy as sc
import matplotlib.pyplot as plt
import celltypist
import gseapy as gp
→ connected lamindb: testuser1/use-cases-registries
/opt/hostedtoolcache/Python/3.12.12/x64/lib/python3.12/site-packages/celltypist/classifier.py:11: FutureWarning: `__version__` is deprecated, use `importlib.metadata.version('scanpy')` instead
from scanpy import __version__ as scv
An interferon-beta treated dataset
This is a small peripheral blood mononuclear cell (PBMC) dataset that is split into control and stimulated groups; the stimulated group was treated with interferon beta. Let’s load the dataset and perform some preprocessing:
adata = ds.anndata_seurat_ifnb(preprocess=False, populate_registries=True)
adata
AnnData object with n_obs × n_vars = 13999 × 9731
obs: 'stim'
var: 'symbol'
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
sc.pp.pca(adata, n_comps=20)
sc.pp.neighbors(adata, n_pcs=10)
sc.tl.umap(adata)
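To make the first two preprocessing steps concrete, here is a minimal pure-Python sketch of what total-count normalization followed by a log1p transform does to a single cell. The function name `normalize_and_log` and the toy counts are illustrative assumptions, not part of scanpy, which operates on the whole matrix at once.

```python
import math


def normalize_and_log(counts, target_sum=1e4):
    """Scale one cell's counts so they sum to `target_sum`, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        # an empty cell stays all-zero instead of dividing by zero
        return [0.0 for _ in counts]
    scale = target_sum / total
    return [math.log1p(c * scale) for c in counts]


# hypothetical raw counts of four genes in one cell
cell = [0, 3, 1, 6]
normalized = normalize_and_log(cell)
print(normalized)
```

Undoing the log transform with `math.expm1` recovers values that sum to `target_sum`, which is a quick sanity check that cells with different sequencing depths become comparable.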
Analysis: Cell type annotation using CellTypist
model = celltypist.models.Model.load(model="Immune_All_Low.pkl")
🔎 No available models. Downloading...
📜 Retrieving model list from server https://celltypist.cog.sanger.ac.uk/models/models.json
📚 Total models in list: 60
📂 Storing models in /home/runner/.celltypist/data/models
💾 Downloading model [1/60]: Immune_All_Low.pkl
💾 Downloading model [2/60]: Immune_All_High.pkl
💾 Downloading model [3/60]: Adult_COVID19_PBMC.pkl
💾 Downloading model [4/60]: Adult_CynomolgusMacaque_Hippocampus.pkl
💾 Downloading model [5/60]: Adult_Human_MTG.pkl
💾 Downloading model [6/60]: Adult_Human_PancreaticIslet.pkl
💾 Downloading model [7/60]: Adult_Human_PrefrontalCortex.pkl
💾 Downloading model [8/60]: Adult_Human_Skin.pkl
💾 Downloading model [9/60]: Adult_Human_Vascular.pkl
💾 Downloading model [10/60]: Adult_Mouse_Gut.pkl
💾 Downloading model [11/60]: Adult_Mouse_OlfactoryBulb.pkl
💾 Downloading model [12/60]: Adult_Pig_Hippocampus.pkl
💾 Downloading model [13/60]: Adult_RhesusMacaque_Hippocampus.pkl
💾 Downloading model [14/60]: Adult_cHSPCs_Illumina.pkl
💾 Downloading model [15/60]: Adult_cHSPCs_Ultima.pkl
💾 Downloading model [16/60]: Autopsy_COVID19_Lung.pkl
💾 Downloading model [17/60]: COVID19_HumanChallenge_Blood.pkl
💾 Downloading model [18/60]: COVID19_Immune_Landscape.pkl
💾 Downloading model [19/60]: Cells_Adult_Breast.pkl
💾 Downloading model [20/60]: Cells_Fetal_Lung.pkl
💾 Downloading model [21/60]: Cells_Human_Tonsil.pkl
💾 Downloading model [22/60]: Cells_Intestinal_Tract.pkl
💾 Downloading model [23/60]: Cells_Lung_Airway.pkl
💾 Downloading model [24/60]: Developing_Human_Brain.pkl
💾 Downloading model [25/60]: Developing_Human_Gonads.pkl
💾 Downloading model [26/60]: Developing_Human_Hippocampus.pkl
💾 Downloading model [27/60]: Developing_Human_Organs.pkl
💾 Downloading model [28/60]: Developing_Human_Thymus.pkl
💾 Downloading model [29/60]: Developing_Mouse_Brain.pkl
💾 Downloading model [30/60]: Developing_Mouse_Hippocampus.pkl
💾 Downloading model [31/60]: Fetal_Human_AdrenalGlands.pkl
💾 Downloading model [32/60]: Fetal_Human_Pancreas.pkl
💾 Downloading model [33/60]: Fetal_Human_Pituitary.pkl
💾 Downloading model [34/60]: Fetal_Human_Retina.pkl
💾 Downloading model [35/60]: Fetal_Human_Skin.pkl
💾 Downloading model [36/60]: Healthy_Adult_Heart.pkl
💾 Downloading model [37/60]: Healthy_COVID19_PBMC.pkl
💾 Downloading model [38/60]: Healthy_Human_Liver.pkl
💾 Downloading model [39/60]: Healthy_Mouse_Liver.pkl
💾 Downloading model [40/60]: Human_AdultAged_Hippocampus.pkl
💾 Downloading model [41/60]: Human_Colorectal_Cancer.pkl
💾 Downloading model [42/60]: Human_Developmental_Retina.pkl
💾 Downloading model [43/60]: Human_Embryonic_YolkSac.pkl
💾 Downloading model [44/60]: Human_Endometrium_Atlas.pkl
💾 Downloading model [45/60]: Human_IPF_Lung.pkl
💾 Downloading model [46/60]: Human_Longitudinal_Hippocampus.pkl
💾 Downloading model [47/60]: Human_Lung_Atlas.pkl
💾 Downloading model [48/60]: Human_PF_Lung.pkl
💾 Downloading model [49/60]: Human_Placenta_Decidua.pkl
💾 Downloading model [50/60]: Lethal_COVID19_Lung.pkl
💾 Downloading model [51/60]: Mouse_Dendritic_Subtypes.pkl
💾 Downloading model [52/60]: Mouse_Dentate_Gyrus.pkl
💾 Downloading model [53/60]: Mouse_Isocortex_Hippocampus.pkl
💾 Downloading model [54/60]: Mouse_Postnatal_DentateGyrus.pkl
💾 Downloading model [55/60]: Mouse_Whole_Brain.pkl
💾 Downloading model [56/60]: Nuclei_Human_InnerEar.pkl
💾 Downloading model [57/60]: Nuclei_Lung_Airway.pkl
💾 Downloading model [58/60]: PaediatricAdult_COVID19_Airway.pkl
💾 Downloading model [59/60]: PaediatricAdult_COVID19_PBMC.pkl
💾 Downloading model [60/60]: Pan_Fetal_Human.pkl
predictions = celltypist.annotate(
adata, model="Immune_All_Low.pkl", majority_voting=True
)
adata.obs["cell_type_celltypist"] = predictions.predicted_labels.majority_voting
🔬 Input data has 13999 cells and 9731 genes
🔗 Matching reference genes in the model
🧬 3641 features used for prediction
⚖️ Scaling input data
🖋️ Predicting labels
✅ Prediction done!
👀 Detected a neighborhood graph in the input object, will run over-clustering on the basis of it
⛓️ Over-clustering input data with resolution set to 10
🗳️ Majority voting the predictions
✅ Majority voting done!
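The log above mentions over-clustering followed by majority voting. The idea can be sketched as follows: every cell in a cluster receives the label that is most frequent among that cluster's per-cell predictions. This is a simplified illustration with made-up labels and cluster ids; CellTypist's actual implementation may apply additional thresholds that are not modeled here.

```python
from collections import Counter


def majority_vote(labels, clusters):
    """Replace each cell's label by the most common label in its cluster."""
    counts_by_cluster = {}
    for label, cluster in zip(labels, clusters):
        counts_by_cluster.setdefault(cluster, Counter())[label] += 1
    winners = {
        cluster: counts.most_common(1)[0][0]
        for cluster, counts in counts_by_cluster.items()
    }
    return [winners[cluster] for cluster in clusters]


# hypothetical per-cell predictions and over-clustering assignments
labels = ["B cells", "B cells", "T cells", "T cells", "T cells", "B cells"]
clusters = [0, 0, 0, 1, 1, 1]
voted = majority_vote(labels, clusters)
print(voted)
```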
adata.obs["cell_type_celltypist"] = bt.CellType.standardize(
adata.obs["cell_type_celltypist"]
)
sc.pl.umap(
adata,
color=["cell_type_celltypist", "stim"],
frameon=False,
legend_fontsize=10,
wspace=0.4,
)
... storing 'cell_type_celltypist' as categorical
Analysis: Pathway enrichment analysis using Enrichr
This analysis is based on the GSEApy scRNA-seq Example.
First, we compute differentially expressed genes using a Wilcoxon test between stimulated and control cells.
# compute differentially expressed genes
sc.tl.rank_genes_groups(
adata,
groupby="stim",
use_raw=False,
method="wilcoxon",
groups=["STIM"],
reference="CTRL",
)
rank_genes_groups_df = sc.get.rank_genes_groups_df(adata, "STIM")
rank_genes_groups_df.head()
| | names | scores | logfoldchanges | pvals | pvals_adj |
|---|---|---|---|---|---|
| 0 | ISG15 | 99.317345 | 7.148426 | 0.0 | 0.0 |
| 1 | ISG20 | 96.655273 | 5.100947 | 0.0 | 0.0 |
| 2 | IFI6 | 94.771217 | 5.828259 | 0.0 | 0.0 |
| 3 | IFIT3 | 92.449509 | 7.450851 | 0.0 | 0.0 |
| 4 | IFIT1 | 90.678734 | 8.078267 | 0.0 | 0.0 |
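For intuition about what a Wilcoxon rank-sum comparison measures, the sketch below ranks the pooled values of one gene across both groups and converts the rank sum of the first group into a z-score under the normal approximation. The helper names and toy values are assumptions for illustration; this version applies no tie correction to the variance, so it should not be read as scanpy's exact computation.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def rank_sum_z(group, reference):
    """z-score of the rank sum of `group` against `reference`."""
    n1, n2 = len(group), len(reference)
    ranks = average_ranks(list(group) + list(reference))
    rank_sum = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (rank_sum - expected) / std


# hypothetical expression values of one gene in stimulated vs control cells
z = rank_sum_z([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
print(round(z, 3))
```

A large positive value means the gene tends to rank higher in the first group, which matches the direction of the interferon-stimulated genes at the top of the table.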
Next, we select the significantly up- and down-regulated genes (adjusted p-value below 0.05):
degs_up = rank_genes_groups_df[
(rank_genes_groups_df["logfoldchanges"] > 0)
& (rank_genes_groups_df["pvals_adj"] < 0.05)
]
degs_dw = rank_genes_groups_df[
(rank_genes_groups_df["logfoldchanges"] < 0)
& (rank_genes_groups_df["pvals_adj"] < 0.05)
]
degs_up.shape, degs_dw.shape
((535, 5), (939, 5))
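The filter above relies on `pvals_adj`, a p-value corrected for multiple testing. To our understanding, scanpy's default correction is Benjamini-Hochberg; the following pure-Python sketch shows that procedure on made-up p-values. The function name is an illustrative assumption.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # walk from the largest p-value down so monotonicity can be enforced
    for offset, index in enumerate(reversed(order)):
        rank = n - offset  # 1-based rank of this p-value
        running_min = min(running_min, pvals[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# hypothetical raw p-values for four genes
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print(adjusted)
```

Adjusted values are never smaller than the raw ones, so thresholding on them yields a more conservative gene list than thresholding on `pvals`.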
Run pathway enrichment analysis on DEGs and plot top 10 pathways:
enr_up = gp.enrichr(degs_up.names, gene_sets="GO_Biological_Process_2023").res2d
gp.dotplot(enr_up, figsize=(2, 3), title="Up", cmap=plt.cm.autumn_r);
enr_dw = gp.enrichr(degs_dw.names, gene_sets="GO_Biological_Process_2023").res2d
gp.dotplot(enr_dw, figsize=(2, 3), title="Down", cmap=plt.cm.winter_r);
Annotate & save dataset
Register new features and labels (check out more details here):
ln.Feature(name="cell_type_celltypist", dtype=bt.CellType).save()
ln.Feature(name="stim", dtype=str).save()
obs_schema = ln.Schema(
name="celltype_obs_schema",
features=[
ln.Feature(name="cell_type_celltypist", dtype=bt.CellType).save(),
ln.Feature(name="stim", dtype=str).save(),
],
).save()
var_schema = ln.Schema(
name="gene_var_schema",
itype=bt.Gene,
).save()
schema = ln.Schema(
name="anndata_seurat_ifnb_schema",
slots={"obs": obs_schema, "var.T": var_schema},
otype="AnnData",
).save()
→ returning feature with same name: 'cell_type_celltypist'
→ returning feature with same name: 'stim'
Register dataset using an Artifact object:
artifact = ln.Artifact.from_anndata(
adata,
description="seurat_ifnb_activated_Bcells",
schema=schema,
).save()
artifact.describe()
! no run & transform got linked, call `ln.track()` & re-run
→ writing the in-memory object into cache
→ loading artifact into memory for validation
→ returning schema with same hash: Schema(uid='9vqWMLrBO9mi9sIo', is_type=False, name='celltype_obs_schema', description=None, n_members=2, coerce=None, flexible=False, itype='Feature', otype=None, hash='7p_v8-GHwatOLM5xmlmoAg', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=3, run_id=None, type_id=None, created_at=2026-01-27 17:36:45 UTC, is_locked=False)
→ not annotating with 11053 features for slot var.T as it exceeds 1000 (ln.settings.annotation.n_max_records)
Artifact: (0000) | description: seurat_ifnb_activated_Bcells
├── uid: DIhydOnoFqt2wCsI0000            run:
│   kind: dataset                        otype: AnnData
│   hash: ati52pj8HYV83unTto3jE_         size: 202.0 MB
│   branch: main                         space: all
│   created_at: 2026-01-27 17:36:47 UTC  created_by: testuser1
│   n_observations: 13999
├── storage/path:
│   /home/runner/work/lamin-usecases/lamin-usecases/docs/use-cases-registries/.lamindb/DIhydOnoFqt2wCsI0000.h5ad
├── Dataset features
│   ├── obs (2)
│   │   cell_type_celltypist  bionty.CellType  B cell, central memory CD4-positive, a…
│   │   stim                  str
│   └── var.T (11053 bionty.Gene.sym…
└── Labels
    └── .cell_types  bionty.CellType  B cell, dendritic cell, human, natural…
Manage pathway objects
Let’s create two schemas (two feature sets) for degs_up and degs_dw:
schema_degs_up = ln.Schema.from_values(
degs_up.names,
bt.Gene.symbol,
name="Up-regulated DEGs STIM vs CTRL",
organism="human",
).save()
schema_degs_dw = ln.Schema.from_values(
degs_dw.names,
bt.Gene.symbol,
name="Down-regulated DEGs STIM vs CTRL",
organism="human",
).save()
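To link the top 10 pathways to the corresponding differentially expressed genes, first parse their ontology IDs from the Enrichr results and look up the matching Pathway records: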
def parse_ontology_id_from_enrichr_results(key):
    """Parse out the ontology id.

    "ATF6-mediated Unfolded Protein Response (GO:0036500)" -> ("GO:0036500", "ATF6-mediated Unfolded Protein Response")
    """
    # the ontology id is the last space-separated token, wrapped in parentheses
    ontology_id = key.split(" ")[-1].replace("(", "").replace(")", "")
    name = key.replace(f" ({ontology_id})", "")
    return (ontology_id, name)
# get ontology ids for the top 10 pathways
enr_up_top10 = [
pw_id[0]
for pw_id in enr_up.head(10).Term.apply(parse_ontology_id_from_enrichr_results)
]
enr_dw_top10 = [
pw_id[0]
for pw_id in enr_dw.head(10).Term.apply(parse_ontology_id_from_enrichr_results)
]
# get pathway objects
enr_up_top10_pathways = bt.Pathway.from_values(enr_up_top10, bt.Pathway.ontology_id)
enr_dw_top10_pathways = bt.Pathway.from_values(enr_dw_top10, bt.Pathway.ontology_id)
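The term parsing used above relies on string splitting. As a hedged alternative, a regular expression can make the expected `name (GO:id)` shape explicit and fail loudly on anything else. The pattern and the function name `parse_enrichr_term` are assumptions for illustration, and the second example term is made up.

```python
import re

# assumed shape of an Enrichr GO term: "<name> (GO:<digits>)"
TERM_PATTERN = re.compile(r"^(?P<name>.+) \((?P<ontology_id>GO:\d+)\)$")


def parse_enrichr_term(term):
    """Split an Enrichr term into (ontology_id, name), raising on unexpected input."""
    match = TERM_PATTERN.match(term)
    if match is None:
        raise ValueError(f"unexpected Enrichr term format: {term!r}")
    return match.group("ontology_id"), match.group("name")


parsed = parse_enrichr_term("ATF6-mediated Unfolded Protein Response (GO:0036500)")
print(parsed)

# a made-up term whose name itself contains parentheses
nested = parse_enrichr_term("Example Process (Variant A) (GO:0000001)")
print(nested)
```

Raising on malformed terms avoids silently passing a non-GO string on to the registry lookup.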
Associate the pathways with the differentially expressed genes:
schema_degs_up.pathways.set(enr_up_top10_pathways)
schema_degs_dw.pathways.set(enr_dw_top10_pathways)
schema_degs_up.pathways.to_list("name")
['cellular response to cytokine stimulus',
'defense response to symbiont',
'defense response to virus',
'negative regulation of viral genome replication',
'negative regulation of viral process',
'positive regulation of cytokine production',
'regulation of viral genome replication',
'response to cytokine',
'response to interferon-beta',
'response to type II interferon']
With this, we have stored the results of the differential expression analysis as schema objects: each schema links a gene set to the pathways enriched in it.
This allows queries along the lines below.
Query pathways
Query for pathways that contain “interferon-beta” in their name:
bt.Pathway.filter(name__contains="interferon-beta").to_dataframe()
| id | uid | name | ontology_id | abbr | synonyms | description | is_locked | created_at | branch_id | space_id | created_by_id | run_id | source_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4885 | 3VZq4dMeKlygVU | response to interferon-beta | GO:0035456 | None | response to fibroblast interferon|response to ... | Any Process That Results In A Change In State ... | False | 2026-01-27 17:32:15.538000+00:00 | 1 | 1 | 3 | None | 59 |
| 4277 | 54R2a0elVmUTQY | regulation of interferon-beta production | GO:0032648 | None | regulation of IFN-beta production | Any Process That Modulates The Frequency, Rate... | False | 2026-01-27 17:32:15.483000+00:00 | 1 | 1 | 3 | None | 59 |
| 3121 | 3x0xmK1yUYa5Uk | positive regulation of interferon-beta production | GO:0032728 | None | up-regulation of interferon-beta production|po... | Any Process That Activates Or Increases The Fr... | False | 2026-01-27 17:32:15.388000+00:00 | 1 | 1 | 3 | None | 59 |
| 2150 | 1NzHDJDiVT0xDa | negative regulation of interferon-beta production | GO:0032688 | None | downregulation of interferon-beta production|n... | Any Process That Stops, Prevents, Or Reduces T... | False | 2026-01-27 17:32:15.305000+00:00 | 1 | 1 | 3 | None | 59 |
| 689 | 1l4z0v8WGwcxuN | cellular response to interferon-beta | GO:0035458 | None | cellular response to fiblaferon|cellular respo... | Any Process That Results In A Change In State ... | False | 2026-01-27 17:32:15.167000+00:00 | 1 | 1 | 3 | None | 59 |
Query pathways from a gene:
bt.Pathway.filter(genes__symbol="IFITM1").to_dataframe()
| id | uid | name | ontology_id | abbr | synonyms | description | is_locked | created_at | branch_id | space_id | created_by_id | run_id | source_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5251 | 3dRO41YW68H1ME | type I interferon-mediated signaling pathway | GO:0060337 | None | type I interferon-activated signaling pathway|... | The Series Of Molecular Signals Initiated By T... | False | 2026-01-27 17:32:15.567000+00:00 | 1 | 1 | 3 | None | 59 |
| 4924 | 7m7ayBKAWfxEoI | response to type II interferon | GO:0034341 | None | response to immune interferon | Any Process That Results In A Change In State ... | False | 2026-01-27 17:32:15.538000+00:00 | 1 | 1 | 3 | None | 59 |
| 4885 | 3VZq4dMeKlygVU | response to interferon-beta | GO:0035456 | None | response to fibroblast interferon|response to ... | Any Process That Results In A Change In State ... | False | 2026-01-27 17:32:15.538000+00:00 | 1 | 1 | 3 | None | 59 |
| 4884 | 1Mkdeon3xT6DKg | response to interferon-alpha | GO:0035455 | None | response to lymphoblastoid interferon|response... | Any Process That Results In A Change In State ... | False | 2026-01-27 17:32:15.532000+00:00 | 1 | 1 | 3 | None | 59 |
| 4859 | 6WvSjRrrPb7dFT | response to cytokine | GO:0034097 | None | response to cytokine stimulus | Any Process That Results In A Change In State ... | False | 2026-01-27 17:32:15.532000+00:00 | 1 | 1 | 3 | None | 59 |
| 4792 | 41BNWUg3sxpyoS | regulation of viral genome replication | GO:0045069 | None | None | Any Process That Modulates The Frequency, Rate... | False | 2026-01-27 17:32:15.526000+00:00 | 1 | 1 | 3 | None | 59 |
| 4498 | 2qCl1QmE5rYFHy | regulation of osteoblast differentiation | GO:0045667 | None | None | Any Process That Modulates The Frequency, Rate... | False | 2026-01-27 17:32:15.504000+00:00 | 1 | 1 | 3 | None | 59 |
| 4050 | 1co8167Esy6IID | regulation of cell population proliferation | GO:0042127 | None | None | Any Process That Modulates The Frequency, Rate... | False | 2026-01-27 17:32:15.466000+00:00 | 1 | 1 | 3 | None | 59 |
| 4046 | 4LqrUcXAKER15Q | regulation of cell migration | GO:0030334 | None | None | Any Process That Modulates The Frequency, Rate... | False | 2026-01-27 17:32:15.466000+00:00 | 1 | 1 | 3 | None | 59 |
| 3300 | 66TxxRHcFq47cP | positive regulation of osteoblast differentiation | GO:0045669 | None | up-regulation of osteoblast differentiation|up... | Any Process That Activates Or Increases The Fr... | False | 2026-01-27 17:32:15.399000+00:00 | 1 | 1 | 3 | None | 59 |
| 2945 | 2xtOQMxdMWpZOC | positive regulation of cell differentiation | GO:0045597 | None | up-regulation of cell differentiation|up regul... | Any Process That Activates Or Increases The Fr... | False | 2026-01-27 17:32:15.372000+00:00 | 1 | 1 | 3 | None | 59 |
| 2442 | 2DyRG9whXU3xJB | negative regulation of viral process | GO:0048525 | None | None | Any Process That Stops, Prevents, Or Reduces T... | False | 2026-01-27 17:32:15.327000+00:00 | 1 | 1 | 3 | None | 59 |
| 2440 | KduBJgyedVzK2X | negative regulation of viral genome replication | GO:0045071 | None | down-regulation of viral genome replication|do... | Any Process That Stops, Prevents, Or Reduces T... | False | 2026-01-27 17:32:15.327000+00:00 | 1 | 1 | 3 | None | 59 |
| 2034 | 3rjhK2RNOFztVV | negative regulation of cellular process | GO:0048523 | None | downregulation of cellular process|down-regula... | Any Process That Stops, Prevents, Or Reduces T... | False | 2026-01-27 17:32:15.294000+00:00 | 1 | 1 | 3 | None | 59 |
| 2024 | 701eu1FQnCjB3N | negative regulation of cell population prolife... | GO:0008285 | None | down-regulation of cell proliferation|downregu... | Any Process That Stops, Prevents Or Reduces Th... | False | 2026-01-27 17:32:15.294000+00:00 | 1 | 1 | 3 | None | 59 |
| 2023 | 51PBemXLpN96fT | negative regulation of cell motility | GO:2000146 | None | negative regulation of cell locomotion|negativ... | Any Process That Stops, Prevents, Or Reduces T... | False | 2026-01-27 17:32:15.294000+00:00 | 1 | 1 | 3 | None | 59 |
| 2021 | 5dL3FOu3nuaoMl | negative regulation of cell migration | GO:0030336 | None | downregulation of cell migration|down regulati... | Any Process That Stops, Prevents, Or Reduces T... | False | 2026-01-27 17:32:15.294000+00:00 | 1 | 1 | 3 | None | 59 |
| 1402 | 2QMDbGJZh3R1bF | interferon-mediated signaling pathway | GO:0140888 | None | interferon signaling pathway|interferon-activa... | The Series Of Molecular Signals Initiated By T... | False | 2026-01-27 17:32:15.225000+00:00 | 1 | 1 | 3 | None | 59 |
| 1344 | 36S4bCrBw6ebbo | host-mediated suppression of symbiont invasion | GO:0046597 | None | negative regulation of viral penetration into ... | A Process In Which A Host Inhibits Or Disrupts... | False | 2026-01-27 17:32:15.220000+00:00 | 1 | 1 | 3 | None | 59 |
| 880 | 43JHYr10qy8F6o | defense response to virus | GO:0051607 | None | defense response to viruses|antiviral response... | Reactions Triggered In Response To The Presenc... | False | 2026-01-27 17:32:15.182000+00:00 | 1 | 1 | 3 | None | 59 |
| 749 | 1pWhXgT75NiXXf | cellular response to type I interferon | GO:0071357 | None | cellular response to type I IFN | Any Process That Results In A Change In State ... | False | 2026-01-27 17:32:15.172000+00:00 | 1 | 1 | 3 | None | 59 |
Query artifacts from a pathway:
ln.Artifact.filter(feature_sets__pathways__name__icontains="interferon-beta").first()
Query schemas from a pathway to learn which gene set this pathway was computed from:
pathway = bt.Pathway.get(ontology_id="GO:0035456")
pathway
Pathway(uid='3VZq4dMeKlygVU', name='response to interferon-beta', ontology_id='GO:0035456', abbr=None, synonyms='response to fibroblast interferon|response to interferon beta|response to fiblaferon', description='Any Process That Results In A Change In State Or Activity Of A Cell Or An Organism (In Terms Of Movement, Secretion, Enzyme Production, Gene Expression, Etc.) As A Result Of An Interferon-Beta Stimulus. Interferon-Beta Is A Type I Interferon.', branch_id=1, space_id=1, created_by_id=3, run_id=None, source_id=59, created_at=2026-01-27 17:32:15 UTC, is_locked=False)
degs = ln.Schema.get(pathways__ontology_id=pathway.ontology_id)
Get the list of genes that are differentially expressed and belong to this pathway:
contributing_genes = pathway.genes.all() & degs.genes.all()
contributing_genes.to_list("symbol")
['MNDA',
'AIM2',
'PNPT1',
'PLSCR1',
'OAS1',
'IRF1',
'CALM1',
'STAT1',
'IFITM2',
'IFI16',
'IFITM3',
'IFITM1',
'BST2',
'SHFL',
'XAF1']
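Conceptually, the query above is a set intersection between the genes annotated to the pathway and the genes in the DEG schema. The toy sketch below shows the same logic with plain Python sets; the gene memberships are illustrative assumptions (`GENE_A` is a placeholder), not the registry contents.

```python
# illustrative memberships only, not taken from the registries
pathway_genes = {"STAT1", "IRF1", "OAS1", "XAF1", "GENE_A"}
deg_genes = {"STAT1", "IRF1", "OAS1", "XAF1", "ISG15"}

# genes that are both annotated to the pathway and differentially expressed
contributing = sorted(pathway_genes & deg_genes)
print(contributing)
```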
# clean up test instance
!rm -r ./use-cases-registries
!lamin delete --force use-cases-registries