Bulk RNA-seq
# !pip install 'lamindb[jupyter,bionty]'
!lamin init --storage test-bulkrna --modules bionty
→ initialized lamindb: testuser1/test-bulkrna
import lamindb as ln
import bionty as bt
import pandas as pd
import anndata as ad
from pathlib import Path
→ connected lamindb: testuser1/test-bulkrna
Ingest data
Access
We start by simulating an nf-core RNA-seq run, which yields a count matrix artifact.
(See Nextflow for running this with Nextflow.)
# pretend we're running a bulk RNA-seq pipeline
ln.track(
transform=ln.Transform(name="nf-core RNA-seq", reference="https://nf-co.re/rnaseq")
)
# create a directory for its output
Path("./test-bulkrna/output_dir").mkdir(exist_ok=True)
# get the count matrix
path = ln.core.datasets.file_tsv_rnaseq_nfcore_salmon_merged_gene_counts(
populate_registries=True
)
# move the count matrix into the output directory
path = path.rename(f"./test-bulkrna/output_dir/{path.name}")
# register the count matrix
ln.Artifact(path, description="Merged Bulk RNA counts").save()
/tmp/ipykernel_3833/1244583888.py:3: FutureWarning: `name` will be removed soon, please pass 'nf-core RNA-seq' to `key` instead
transform=ln.Transform(name="nf-core RNA-seq", reference="https://nf-co.re/rnaseq")
→ created Transform('AGTzE5YhMBnw0000'), started new Run('yoSk6VTH...') at 2025-07-21 11:25:11 UTC
• recommendation: to identify the script across renames, pass the uid: ln.track("AGTzE5YhMBnw")
/opt/hostedtoolcache/Python/3.12.11/x64/lib/python3.12/site-packages/bionty/base/dev/_io.py:131: FutureWarning: Use synchronize_to instead of synchronize_to, synchronize_to will be removed in the future.
remote_path.synchronize(localpath, error_no_origin=False, print_progress=True)
Artifact(uid='OZlm5wVhZn8vW5pD0000', is_latest=True, key='output_dir/salmon.merged.gene_counts.tsv', description='Merged Bulk RNA counts', suffix='.tsv', size=3787, hash='xxw0k3au3KtxFcgtbEr4eQ', branch_id=1, space_id=1, storage_id=1, run_id=1, created_by_id=1, created_at=2025-07-21 11:25:13 UTC)
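The move step above relies on `Path.rename` returning the new path, which is why its result can be reassigned to `path` before registering the artifact. The following is a minimal standard-library sketch of that pattern; the file name and content are hypothetical stand-ins, and `parents=True` is an optional safeguard that the original cell does not use.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # stand-in for the downloaded count matrix (hypothetical name and content)
    src = root / "counts.tsv"
    src.write_text("gene_id\tSAMPLE_A\nYAL001C\t55\n")

    # parents=True also creates missing parent directories
    out_dir = root / "test-bulkrna" / "output_dir"
    out_dir.mkdir(parents=True, exist_ok=True)

    # Path.rename moves the file and returns the new path (Python 3.8+)
    moved = src.rename(out_dir / src.name)

    moved_name = moved.name
    moved_exists = moved.exists()
    source_still_exists = src.exists()
```

After the move, the old location no longer exists, so any later code should use the returned path.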
Transform
ln.track("s5V0dNMVwL9i0000")
Let’s query the artifact:
artifact = ln.Artifact.get(description="Merged Bulk RNA counts")
df = artifact.load()
Looking at it, we see that it deviates from the tidy data standard [Wickham14], from the conventions of statistics and machine learning [Hastie09, Murphy12], and from the major Python and R data packages.
Variables are not in columns and observations are not in rows:
df
| | gene_id | gene_name | RAP1_IAA_30M_REP1 | RAP1_UNINDUCED_REP1 | RAP1_UNINDUCED_REP2 | WT_REP1 | WT_REP2 |
---|---|---|---|---|---|---|---|
0 | Gfp_transgene_gene | Gfp_transgene_gene | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 |
1 | HRA1 | HRA1 | 0.0 | 8.572 | 0.0 | 0.0 | 0.0 |
2 | snR18 | snR18 | 3.0 | 8.000 | 4.0 | 8.0 | 8.0 |
3 | tA(UGC)A | TGA1 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 |
4 | tL(CAA)A | SUP56 | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... |
120 | YAR064W | YAR064W | 0.0 | 2.000 | 0.0 | 0.0 | 0.0 |
121 | YAR066W | YAR066W | 3.0 | 13.000 | 8.0 | 5.0 | 11.0 |
122 | YAR068W | YAR068W | 9.0 | 28.000 | 24.0 | 5.0 | 7.0 |
123 | YAR069C | YAR069C | 0.0 | 0.000 | 0.0 | 0.0 | 1.0 |
124 | YAR070C | YAR070C | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 |
125 rows × 7 columns
Let’s change that and move observations into rows:
df = df.T
df
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gene_id | Gfp_transgene_gene | HRA1 | snR18 | tA(UGC)A | tL(CAA)A | tP(UGG)A | tS(AGA)A | YAL001C | YAL002W | YAL003W | YAL004W | YAL005C | YAL007C | YAL008W | YAL009W | YAL010C | YAL011W | YAL012W | YAL013W | YAL014C | YAL015C | YAL016C-A | YAL016C-B | YAL016W | YAL017W | YAL018C | YAL019W | YAL019W-A | YAL020C | YAL021C | YAL022C | YAL023C | YAL024C | YAL025C | YAL026C | YAL026C-A | YAL027W | YAL028W | YAL029C | YAL030W | YAL031C | YAL031W-A | YAL032C | YAL033W | YAL034C | YAL034C-B | YAL034W-A | YAL035W | YAL036C | YAL037C-A | YAL037C-B | YAL037W | YAL038W | YAL039C | YAL040C | YAL041W | YAL042C-A | YAL042W | YAL043C | YAL044C | YAL044W-A | YAL045C | YAL046C | YAL047C | YAL047W-A | YAL048C | YAL049C | YAL051W | YAL053W | YAL054C | YAL055W | YAL056C-A | YAL056W | YAL058W | YAL059C-A | YAL059W | YAL060W | YAL061W | YAL062W | YAL063C | YAL063C-A | YAL064C-A | YAL064W | YAL064W-B | YAL065C | YAL066W | YAL067C | YAL067W-A | YAL068C | YAL068W-A | YAL069W | YAR002C-A | YAR002W | YAR003W | YAR007C | YAR008W | YAR009C | YAR010C | YAR014C | YAR015W | YAR018C | YAR019C | YAR019W-A | YAR020C | YAR023C | YAR027W | YAR028W | YAR029W | YAR030C | YAR031W | YAR033W | YAR035C-A | YAR035W | YAR042W | YAR047C | YAR050W | YAR053W | YAR060C | YAR061W | YAR062W | YAR064W | YAR066W | YAR068W | YAR069C | YAR070C |
gene_name | Gfp_transgene_gene | HRA1 | snR18 | TGA1 | SUP56 | TRN1 | tS(AGA)A | TFC3 | VPS8 | EFB1 | YAL004W | SSA1 | ERP2 | FUN14 | SPO7 | MDM10 | SWC3 | CYS3 | DEP1 | SYN8 | NTG1 | YAL016C-A | YAL016C-B | TPD3 | PSK1 | LDS1 | FUN30 | YAL019W-A | ATS1 | CCR4 | FUN26 | PMT2 | LTE1 | MAK16 | DRS2 | YAL026C-A | SAW1 | FRT2 | MYO4 | SNC1 | GIP4 | YAL031W-A | PRP45 | POP5 | FUN19 | YAL034C-B | MTW1 | FUN12 | RBG1 | YAL037C-A | YAL037C-B | YAL037W | CDC19 | CYC3 | CLN3 | CDC24 | YAL042C-A | ERV46 | PTA1 | GCV3 | YAL044W-A | YAL045C | AIM1 | SPC72 | YAL047W-A | GEM1 | AIM2 | OAF1 | FLC2 | ACS1 | PEX22 | YAL056C-A | GPB2 | CNE1 | YAL059C-A | ECM1 | BDH1 | BDH2 | GDH3 | FLO9 | YAL063C-A | TDA8 | YAL064W | YAL064W-B | YAL065C | YAL066W | SEO1 | YAL067W-A | PAU8 | YAL068W-A | YAL069W | ERP1 | NUP60 | SWD1 | RFA1 | SEN34 | YAR009C | YAR010C | BUD14 | ADE1 | KIN3 | CDC15 | YAR019W-A | PAU7 | YAR023C | UIP3 | YAR028W | YAR029W | YAR030C | PRM9 | MST28 | YAR035C-A | YAT1 | SWH1 | YAR047C | FLO1 | YAR053W | YAR060C | YAR061W | YAR062W | YAR064W | YAR066W | YAR068W | YAR069C | YAR070C |
RAP1_IAA_30M_REP1 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 1.0 | 55.0 | 36.0 | 632.0 | 1.0 | 6174.0 | 46.0 | 14.0 | 14.0 | 11.0 | 10.0 | 247.0 | 8.0 | 16.0 | 12.0 | 0.0 | 0.0 | 148.0 | 100.0 | 0.0 | 101.0 | 0.0 | 12.0 | 105.0 | 42.0 | 302.0 | 49.0 | 19.0 | 122.0 | 12.0 | 10.0 | 9.0 | 178.0 | 14.0 | 38.0 | 0.0 | 14.0 | 13.0 | 16.0 | 0.0 | 8.0 | 409.0 | 49.0 | 0.0 | 1.0 | 4.0 | 5710.0 | 9.0 | 34.0 | 61.0 | 0.0 | 141.0 | 63.0 | 33.0 | 5.0 | 0.0 | 2.0 | 20.0 | 0.0 | 18.0 | 26.0 | 38.0 | 116.0 | 3.0 | 6.0 | 0.0 | 61.0 | 23.0 | 0.0 | 17.0 | 49.0 | 30.0 | 5.0 | 4.643 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 1.0 | 2.0 | 64.0 | 57.0 | 27.0 | 56.0 | 8.0 | 16523.0 | 5741.0 | 55.0 | 53.0 | 20.0 | 21.0 | 0.0 | 0.0 | 1.0 | 25.0 | 18.0 | 1.0 | 2.0 | 17.0 | 1.0 | 0.0 | 2.0 | 104.0 | 0.0 | 4.357 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 3.0 | 9.0 | 0.0 | 0.0 |
RAP1_UNINDUCED_REP1 | 0.0 | 8.572 | 8.0 | 0.0 | 0.0 | 0.0 | 0.0 | 72.0 | 33.0 | 810.0 | 345.089 | 6000.911 | 56.0 | 17.0 | 15.0 | 12.0 | 25.0 | 232.0 | 18.0 | 20.0 | 12.0 | 13.999 | 1.0 | 154.001 | 114.0 | 0.0 | 111.0 | 2.901 | 5.099 | 102.0 | 36.0 | 323.0 | 52.0 | 29.0 | 159.428 | 12.0 | 11.0 | 8.0 | 177.0 | 14.0 | 40.0 | 0.0 | 14.0 | 12.0 | 16.0 | 0.0 | 2.0 | 482.0 | 59.0 | 0.0 | 264.768 | 5.0 | 6162.232 | 23.0 | 44.0 | 65.0 | 12.533 | 133.467 | 67.0 | 58.0 | 0.0 | 3.0 | 3.0 | 23.0 | 1.0 | 20.0 | 37.0 | 60.0 | 129.0 | 13.0 | 2.0 | 1.0 | 49.0 | 27.0 | 0.0 | 23.0 | 53.0 | 19.0 | 27.0 | 23.28 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 3.0 | 0.0 | 5.0 | 0.0 | 1.0 | 78.0 | 60.0 | 17.0 | 67.0 | 14.0 | 17154.0 | 6178.0 | 61.0 | 63.0 | 14.0 | 35.0 | 0.0 | 0.0 | 1.0 | 34.0 | 13.0 | 2.0 | 0.0 | 15.0 | 17.0 | 0.0 | 5.0 | 105.0 | 0.0 | 15.72 | 0.0 | 0.0 | 0.0 | 3.0 | 2.0 | 13.0 | 28.0 | 0.0 | 0.0 |
RAP1_UNINDUCED_REP2 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 115.0 | 82.0 | 1693.0 | 1.0 | 13355.0 | 132.0 | 24.0 | 19.0 | 36.0 | 36.0 | 536.0 | 35.0 | 43.0 | 28.0 | 0.0 | 0.0 | 326.0 | 210.0 | 0.0 | 238.0 | 0.0 | 18.0 | 203.0 | 86.0 | 659.0 | 99.0 | 56.0 | 314.989 | 19.011 | 20.0 | 19.0 | 359.0 | 40.0 | 72.0 | 0.0 | 37.0 | 24.0 | 44.0 | 0.0 | 25.0 | 872.0 | 147.0 | 0.0 | 3.0 | 10.0 | 13457.0 | 39.0 | 92.0 | 140.0 | 0.0 | 291.0 | 135.0 | 123.0 | 14.0 | 1.0 | 7.0 | 40.0 | 0.0 | 46.0 | 65.0 | 119.0 | 262.0 | 7.0 | 5.0 | 0.0 | 133.0 | 49.0 | 0.0 | 42.0 | 114.0 | 82.0 | 37.0 | 13.228 | 1.0 | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 | 4.0 | 0.0 | 15.0 | 1.0 | 1.0 | 156.0 | 121.0 | 35.0 | 126.0 | 18.0 | 33244.0 | 11826.0 | 168.0 | 151.0 | 28.0 | 72.0 | 0.0 | 0.0 | 5.0 | 74.0 | 56.0 | 4.0 | 2.0 | 42.0 | 20.0 | 0.0 | 4.0 | 198.0 | 2.0 | 13.772 | 0.0 | 4.0 | 0.0 | 2.0 | 0.0 | 8.0 | 24.0 | 0.0 | 0.0 |
WT_REP1 | 0.0 | 0.0 | 8.0 | 0.0 | 0.0 | 1.0 | 0.0 | 60.0 | 63.0 | 1115.0 | 0.0 | 8218.0 | 61.0 | 10.0 | 9.0 | 30.0 | 19.0 | 385.0 | 21.0 | 28.0 | 8.0 | 0.0 | 0.0 | 194.0 | 101.0 | 0.0 | 186.0 | 0.0 | 12.0 | 136.0 | 54.0 | 432.0 | 69.0 | 50.0 | 180.41 | 14.59 | 5.0 | 4.0 | 241.0 | 15.0 | 36.0 | 0.0 | 18.0 | 18.0 | 11.0 | 0.0 | 5.0 | 760.0 | 82.0 | 0.0 | 0.0 | 4.0 | 10313.0 | 12.0 | 50.0 | 89.0 | 0.0 | 168.0 | 77.0 | 47.0 | 7.0 | 0.0 | 8.0 | 19.0 | 0.0 | 26.0 | 34.0 | 82.0 | 133.0 | 5.0 | 1.0 | 0.0 | 72.0 | 39.0 | 0.0 | 33.0 | 43.0 | 9.0 | 13.0 | 4.535 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 3.0 | 0.0 | 0.0 | 85.0 | 61.0 | 22.0 | 95.0 | 4.0 | 36435.0 | 13470.0 | 68.0 | 91.0 | 25.0 | 26.0 | 0.0 | 0.0 | 2.0 | 26.0 | 16.0 | 0.0 | 1.0 | 12.0 | 5.0 | 0.0 | 12.0 | 127.0 | 1.0 | 13.465 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 5.0 | 5.0 | 0.0 | 0.0 |
WT_REP2 | 0.0 | 0.0 | 8.0 | 0.0 | 0.0 | 0.0 | 0.0 | 30.0 | 25.0 | 704.0 | 1.0 | 4279.0 | 44.0 | 3.0 | 5.0 | 10.0 | 17.0 | 230.0 | 10.0 | 17.0 | 6.0 | 0.0 | 0.0 | 104.0 | 60.0 | 0.0 | 84.0 | 0.0 | 2.0 | 79.0 | 27.0 | 244.0 | 46.0 | 21.0 | 123.638 | 5.362 | 2.0 | 4.0 | 139.0 | 8.0 | 15.0 | 0.0 | 13.0 | 7.0 | 6.0 | 0.0 | 6.0 | 390.0 | 48.0 | 0.0 | 0.0 | 1.0 | 6339.0 | 5.0 | 36.0 | 50.0 | 0.0 | 102.0 | 44.0 | 25.0 | 3.0 | 0.0 | 6.0 | 14.0 | 0.0 | 10.0 | 12.0 | 39.0 | 68.0 | 1.0 | 2.0 | 0.0 | 38.0 | 27.0 | 0.0 | 20.0 | 29.0 | 5.0 | 4.0 | 1.109 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 64.0 | 40.0 | 14.0 | 45.0 | 5.0 | 17184.0 | 6132.0 | 50.0 | 39.0 | 14.0 | 18.0 | 0.0 | 0.0 | 0.0 | 19.0 | 11.0 | 0.0 | 0.0 | 9.0 | 4.0 | 0.0 | 2.0 | 75.0 | 0.0 | 6.891 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 11.0 | 7.0 | 1.0 | 0.0 |
Now it’s clear that the first two rows are not observations but descriptions of the variables (or features) themselves.
Let’s create an AnnData object to model this. First, create a dataframe for the variables:
var = pd.DataFrame({"gene_name": df.loc["gene_name"].values}, index=df.loc["gene_id"])
var.head()
gene_id | gene_name |
---|---|
Gfp_transgene_gene | Gfp_transgene_gene |
HRA1 | HRA1 |
snR18 | snR18 |
tA(UGC)A | TGA1 |
tL(CAA)A | SUP56 |
Now, let’s create an AnnData object:
# we're also fixing the datatype here, which was string in the tsv
adata = ad.AnnData(df.iloc[2:].astype("float32"), var=var)
adata
AnnData object with n_obs × n_vars = 5 × 125
var: 'gene_name'
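The need for the explicit cast can be illustrated on a toy table. This is a minimal sketch under the assumption that the loaded table mixes text columns (`gene_id`, `gene_name`) with numeric count columns; the sample names `SAMPLE_A` and `SAMPLE_B` are hypothetical. Transposing such a frame leaves no numeric columns, so the count rows have to be converted back after the two annotation rows are split off.

```python
import pandas as pd

# toy gene-by-sample table in the same layout as the merged counts
toy = pd.DataFrame(
    {
        "gene_id": ["YAL001C", "YAL002W", "YAL003W"],
        "gene_name": ["TFC3", "VPS8", "EFB1"],
        "SAMPLE_A": [55.0, 36.0, 632.0],
        "SAMPLE_B": [72.0, 33.0, 810.0],
    }
)

# after transposing, each column mixes text and numbers,
# so pandas falls back to the generic object dtype
toy_t = toy.T

# the first two rows annotate the variables (genes)
var = pd.DataFrame(
    {"gene_name": toy_t.loc["gene_name"].values},
    index=toy_t.loc["gene_id"],
)

# the remaining rows are observations; cast them back to numbers
counts = toy_t.iloc[2:].astype("float32")
```

Here `counts` has one row per sample and one column per gene, and `var` carries the gene annotations, which mirrors the arguments passed to `ad.AnnData` above.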
The AnnData object is in tidy form and complies with conventions of statistics and machine learning:
adata.to_df()
gene_id | Gfp_transgene_gene | HRA1 | snR18 | tA(UGC)A | tL(CAA)A | tP(UGG)A | tS(AGA)A | YAL001C | YAL002W | YAL003W | YAL004W | YAL005C | YAL007C | YAL008W | YAL009W | YAL010C | YAL011W | YAL012W | YAL013W | YAL014C | YAL015C | YAL016C-A | YAL016C-B | YAL016W | YAL017W | YAL018C | YAL019W | YAL019W-A | YAL020C | YAL021C | YAL022C | YAL023C | YAL024C | YAL025C | YAL026C | YAL026C-A | YAL027W | YAL028W | YAL029C | YAL030W | YAL031C | YAL031W-A | YAL032C | YAL033W | YAL034C | YAL034C-B | YAL034W-A | YAL035W | YAL036C | YAL037C-A | YAL037C-B | YAL037W | YAL038W | YAL039C | YAL040C | YAL041W | YAL042C-A | YAL042W | YAL043C | YAL044C | YAL044W-A | YAL045C | YAL046C | YAL047C | YAL047W-A | YAL048C | YAL049C | YAL051W | YAL053W | YAL054C | YAL055W | YAL056C-A | YAL056W | YAL058W | YAL059C-A | YAL059W | YAL060W | YAL061W | YAL062W | YAL063C | YAL063C-A | YAL064C-A | YAL064W | YAL064W-B | YAL065C | YAL066W | YAL067C | YAL067W-A | YAL068C | YAL068W-A | YAL069W | YAR002C-A | YAR002W | YAR003W | YAR007C | YAR008W | YAR009C | YAR010C | YAR014C | YAR015W | YAR018C | YAR019C | YAR019W-A | YAR020C | YAR023C | YAR027W | YAR028W | YAR029W | YAR030C | YAR031W | YAR033W | YAR035C-A | YAR035W | YAR042W | YAR047C | YAR050W | YAR053W | YAR060C | YAR061W | YAR062W | YAR064W | YAR066W | YAR068W | YAR069C | YAR070C |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
RAP1_IAA_30M_REP1 | 0.0 | 0.000 | 3.0 | 0.0 | 0.0 | 0.0 | 1.0 | 55.0 | 36.0 | 632.0 | 1.000000 | 6174.000000 | 46.0 | 14.0 | 14.0 | 11.0 | 10.0 | 247.0 | 8.0 | 16.0 | 12.0 | 0.000 | 0.0 | 148.000000 | 100.0 | 0.0 | 101.0 | 0.000 | 12.000 | 105.0 | 42.0 | 302.0 | 49.0 | 19.0 | 122.000000 | 12.000 | 10.0 | 9.0 | 178.0 | 14.0 | 38.0 | 0.0 | 14.0 | 13.0 | 16.0 | 0.0 | 8.0 | 409.0 | 49.0 | 0.0 | 1.000000 | 4.0 | 5710.000000 | 9.0 | 34.0 | 61.0 | 0.000 | 141.000000 | 63.0 | 33.0 | 5.0 | 0.0 | 2.0 | 20.0 | 0.0 | 18.0 | 26.0 | 38.0 | 116.0 | 3.0 | 6.0 | 0.0 | 61.0 | 23.0 | 0.0 | 17.0 | 49.0 | 30.0 | 5.0 | 4.643000 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 2.0 | 1.0 | 2.0 | 64.0 | 57.0 | 27.0 | 56.0 | 8.0 | 16523.0 | 5741.0 | 55.0 | 53.0 | 20.0 | 21.0 | 0.0 | 0.0 | 1.0 | 25.0 | 18.0 | 1.0 | 2.0 | 17.0 | 1.0 | 0.0 | 2.0 | 104.0 | 0.0 | 4.357 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 3.0 | 9.0 | 0.0 | 0.0 |
RAP1_UNINDUCED_REP1 | 0.0 | 8.572 | 8.0 | 0.0 | 0.0 | 0.0 | 0.0 | 72.0 | 33.0 | 810.0 | 345.088989 | 6000.911133 | 56.0 | 17.0 | 15.0 | 12.0 | 25.0 | 232.0 | 18.0 | 20.0 | 12.0 | 13.999 | 1.0 | 154.001007 | 114.0 | 0.0 | 111.0 | 2.901 | 5.099 | 102.0 | 36.0 | 323.0 | 52.0 | 29.0 | 159.427994 | 12.000 | 11.0 | 8.0 | 177.0 | 14.0 | 40.0 | 0.0 | 14.0 | 12.0 | 16.0 | 0.0 | 2.0 | 482.0 | 59.0 | 0.0 | 264.768005 | 5.0 | 6162.231934 | 23.0 | 44.0 | 65.0 | 12.533 | 133.466995 | 67.0 | 58.0 | 0.0 | 3.0 | 3.0 | 23.0 | 1.0 | 20.0 | 37.0 | 60.0 | 129.0 | 13.0 | 2.0 | 1.0 | 49.0 | 27.0 | 0.0 | 23.0 | 53.0 | 19.0 | 27.0 | 23.280001 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 3.0 | 0.0 | 5.0 | 0.0 | 1.0 | 78.0 | 60.0 | 17.0 | 67.0 | 14.0 | 17154.0 | 6178.0 | 61.0 | 63.0 | 14.0 | 35.0 | 0.0 | 0.0 | 1.0 | 34.0 | 13.0 | 2.0 | 0.0 | 15.0 | 17.0 | 0.0 | 5.0 | 105.0 | 0.0 | 15.720 | 0.0 | 0.0 | 0.0 | 3.0 | 2.0 | 13.0 | 28.0 | 0.0 | 0.0 |
RAP1_UNINDUCED_REP2 | 0.0 | 0.000 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 115.0 | 82.0 | 1693.0 | 1.000000 | 13355.000000 | 132.0 | 24.0 | 19.0 | 36.0 | 36.0 | 536.0 | 35.0 | 43.0 | 28.0 | 0.000 | 0.0 | 326.000000 | 210.0 | 0.0 | 238.0 | 0.000 | 18.000 | 203.0 | 86.0 | 659.0 | 99.0 | 56.0 | 314.989014 | 19.011 | 20.0 | 19.0 | 359.0 | 40.0 | 72.0 | 0.0 | 37.0 | 24.0 | 44.0 | 0.0 | 25.0 | 872.0 | 147.0 | 0.0 | 3.000000 | 10.0 | 13457.000000 | 39.0 | 92.0 | 140.0 | 0.000 | 291.000000 | 135.0 | 123.0 | 14.0 | 1.0 | 7.0 | 40.0 | 0.0 | 46.0 | 65.0 | 119.0 | 262.0 | 7.0 | 5.0 | 0.0 | 133.0 | 49.0 | 0.0 | 42.0 | 114.0 | 82.0 | 37.0 | 13.228000 | 1.0 | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 | 4.0 | 0.0 | 15.0 | 1.0 | 1.0 | 156.0 | 121.0 | 35.0 | 126.0 | 18.0 | 33244.0 | 11826.0 | 168.0 | 151.0 | 28.0 | 72.0 | 0.0 | 0.0 | 5.0 | 74.0 | 56.0 | 4.0 | 2.0 | 42.0 | 20.0 | 0.0 | 4.0 | 198.0 | 2.0 | 13.772 | 0.0 | 4.0 | 0.0 | 2.0 | 0.0 | 8.0 | 24.0 | 0.0 | 0.0 |
WT_REP1 | 0.0 | 0.000 | 8.0 | 0.0 | 0.0 | 1.0 | 0.0 | 60.0 | 63.0 | 1115.0 | 0.000000 | 8218.000000 | 61.0 | 10.0 | 9.0 | 30.0 | 19.0 | 385.0 | 21.0 | 28.0 | 8.0 | 0.000 | 0.0 | 194.000000 | 101.0 | 0.0 | 186.0 | 0.000 | 12.000 | 136.0 | 54.0 | 432.0 | 69.0 | 50.0 | 180.410004 | 14.590 | 5.0 | 4.0 | 241.0 | 15.0 | 36.0 | 0.0 | 18.0 | 18.0 | 11.0 | 0.0 | 5.0 | 760.0 | 82.0 | 0.0 | 0.000000 | 4.0 | 10313.000000 | 12.0 | 50.0 | 89.0 | 0.000 | 168.000000 | 77.0 | 47.0 | 7.0 | 0.0 | 8.0 | 19.0 | 0.0 | 26.0 | 34.0 | 82.0 | 133.0 | 5.0 | 1.0 | 0.0 | 72.0 | 39.0 | 0.0 | 33.0 | 43.0 | 9.0 | 13.0 | 4.535000 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 3.0 | 0.0 | 0.0 | 85.0 | 61.0 | 22.0 | 95.0 | 4.0 | 36435.0 | 13470.0 | 68.0 | 91.0 | 25.0 | 26.0 | 0.0 | 0.0 | 2.0 | 26.0 | 16.0 | 0.0 | 1.0 | 12.0 | 5.0 | 0.0 | 12.0 | 127.0 | 1.0 | 13.465 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 5.0 | 5.0 | 0.0 | 0.0 |
WT_REP2 | 0.0 | 0.000 | 8.0 | 0.0 | 0.0 | 0.0 | 0.0 | 30.0 | 25.0 | 704.0 | 1.000000 | 4279.000000 | 44.0 | 3.0 | 5.0 | 10.0 | 17.0 | 230.0 | 10.0 | 17.0 | 6.0 | 0.000 | 0.0 | 104.000000 | 60.0 | 0.0 | 84.0 | 0.000 | 2.000 | 79.0 | 27.0 | 244.0 | 46.0 | 21.0 | 123.638000 | 5.362 | 2.0 | 4.0 | 139.0 | 8.0 | 15.0 | 0.0 | 13.0 | 7.0 | 6.0 | 0.0 | 6.0 | 390.0 | 48.0 | 0.0 | 0.000000 | 1.0 | 6339.000000 | 5.0 | 36.0 | 50.0 | 0.000 | 102.000000 | 44.0 | 25.0 | 3.0 | 0.0 | 6.0 | 14.0 | 0.0 | 10.0 | 12.0 | 39.0 | 68.0 | 1.0 | 2.0 | 0.0 | 38.0 | 27.0 | 0.0 | 20.0 | 29.0 | 5.0 | 4.0 | 1.109000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 64.0 | 40.0 | 14.0 | 45.0 | 5.0 | 17184.0 | 6132.0 | 50.0 | 39.0 | 14.0 | 18.0 | 0.0 | 0.0 | 0.0 | 19.0 | 11.0 | 0.0 | 0.0 | 9.0 | 4.0 | 0.0 | 2.0 | 75.0 | 0.0 | 6.891 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 11.0 | 7.0 | 1.0 | 0.0 |
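Some values in this table differ slightly from the TSV, for example 345.088989 instead of 345.089 for YAL004W in RAP1_UNINDUCED_REP1. This is most likely the rounding introduced by the `float32` cast rather than a change in the data. A minimal standard-library sketch that reproduces the effect (the helper `to_float32` is hypothetical and not part of any library used here):

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


original = 345.089  # value as it appears in the TSV
stored = to_float32(original)
error = abs(stored - original)

print(f"{stored:.6f}")  # prints 345.088989
```

The absolute error is on the order of 1e-5, which is negligible for count data; whole-number counts of this magnitude are represented exactly.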
Curate
We define a simple Schema for bulk RNA datasets that expects only genes identified by stable IDs. Later, we can add further metadata to the curated dataset, such as the assay or the organism.
bulk_schema = ln.Schema(itype=bt.Gene.stable_id, otype="AnnData").save()
# set the organism to map to saccharomyces cerevisiae genes
bt.settings.organism = "saccharomyces cerevisiae"
curator = ln.curators.AnnDataCurator(adata, bulk_schema)
curator.validate()
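Conceptually, validating against a schema whose identifier type is a gene stable ID amounts to checking every entry of the variable index against a reference registry. The following plain-Python sketch illustrates that idea only; it is not LaminDB's implementation, and the function name, the reference set, and the unknown identifier are hypothetical.

```python
# hypothetical reference set; a real registry would hold all stable gene IDs
# of the chosen organism
known_ids = {"YAL001C", "YAL002W", "YAL003W"}


def split_validated(ids, reference):
    """Partition identifiers into those found in the reference and the rest."""
    validated = [i for i in ids if i in reference]
    non_validated = [i for i in ids if i not in reference]
    return validated, non_validated


# hypothetical variable index containing one identifier unknown to the reference
var_index = ["YAL001C", "YAL003W", "my_custom_construct"]
validated, non_validated = split_validated(var_index, known_ids)
```

Identifiers that end up in `non_validated` are the ones a curation step would need to fix, map, or register before the dataset counts as validated.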
Let’s create and save the artifact:
curated_af = curator.save_artifact(description="Curated bulk RNA counts")
Link additional metadata records:
efs = bt.ExperimentalFactor.lookup()
organism = bt.Organism.lookup()
features = ln.Feature.lookup()
curated_af.labels.add(efs.rna_seq, features.assay)
curated_af.labels.add(organism.saccharomyces_cerevisiae, features.organism)
curated_af.describe()
Artifact .h5ad · AnnData · dataset
├── General
│   ├── description: Curated bulk RNA counts
│   ├── uid: nUIzhbwMxkamDAS70000          hash: 6bieh8XjOCCz6bJToN4u1g
│   ├── size: 27.5 KB                      transform: bulkrna.ipynb
│   ├── space: all                         branch: all
│   ├── created_by: testuser1              created_at: 2025-07-21 11:25:14
│   ├── n_observations: 5
│   └── storage path: /home/runner/work/lamin-usecases/lamin-usecases/docs/test-bulkrna/nUIzhbwMxkamDAS70000
├── Linked features
│   └── assay                   cat[bionty.ExperimentalFactor]   RNA-Seq
│       organism                cat[bionty.Organism]             saccharomyces cerevisiae
└── Labels
    └── .organisms              bionty.Organism                  saccharomyces cerevisiae
        .experimental_factors   bionty.ExperimentalFactor        RNA-Seq
Query data
The artifact registry now holds two artifacts, the raw count matrix and the curated AnnData object:
ln.Artifact.df()
id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | branch_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | nUIzhbwMxkamDAS70000 | None | Curated bulk RNA counts | .h5ad | dataset | AnnData | 28180 | 6bieh8XjOCCz6bJToN4u1g | None | 5.0 | md5 | True | False | 1 | 1 | 1.0 | None | True | 2 | 2025-07-21 11:25:14.393000+00:00 | 1 | {'af': {'0': True}} | 1 |
1 | OZlm5wVhZn8vW5pD0000 | output_dir/salmon.merged.gene_counts.tsv | Merged Bulk RNA counts | .tsv | None | None | 3787 | xxw0k3au3KtxFcgtbEr4eQ | None | NaN | md5 | False | False | 1 | 1 | NaN | None | True | 1 | 2025-07-21 11:25:13.104000+00:00 | 1 | None | 1 |
curated_af.view_lineage()
# clean up test instance
!rm -r test-bulkrna
!lamin delete --force test-bulkrna