Transfer data
This guide shows how to transfer data from a source database into the currently connected database.
# pip install 'lamindb[jupyter,bionty]'
!lamin init --storage ./test-transfer --modules bionty
! using anonymous user (to identify, call: lamin login)
→ initialized lamindb: anonymous/test-transfer
import lamindb as ln
ln.track("ITeOtm7bhtdq")
→ connected lamindb: anonymous/test-transfer
→ created Transform('ITeOtm7bhtdq0000'), started new Run('1tVIrqFR...') at 2025-07-21 11:33:29 UTC
→ notebook imports: lamindb==1.9.0
Query all artifacts in the laminlabs/lamindata instance and filter them to their latest versions.
# query all latest artifact versions
artifacts = ln.Artifact.using("laminlabs/lamindata").filter(is_latest=True)
# convert the QuerySet to a DataFrame and show the first 5 rows
artifacts.df().head()
id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | branch_id
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1282 | WQtsc0CQZKB9GEst0000 | None | Example R cars dataset | .parquet | dataset | DataFrame | 2402.0 | eIk8NXNiwMoGmhhjrMILbg | NaN | NaN | md5 | True | False | 1 | 2 | NaN | None | True | 460.0 | 2025-01-15 14:22:51.192955+00:00 | 30 | None | 1
1349 | 9KD0HE9lVveLpvuI0000 | data/prep_adata | None | .h5ad | None | AnnData | 124511524.0 | gnwU_GFFN_xIhtncrxu-tv | NaN | NaN | sha1-fl | True | False | 1 | 2 | NaN | None | True | NaN | 2025-03-03 23:24:56.184549+00:00 | 35 | None | 1
1451 | cGi8QjXNQQfZzL4n0000 | simple-lineage/figures/pca_all.pdf | None |  | None | None | 4707.0 | QexvSEBGMa80m0pV5KXd4w | NaN | NaN | md5 | True | False | 1 | 2 | NaN | None | True | 569.0 | 2025-04-01 11:33:47.714024+00:00 | 9 | None | 1
1699 | 2qBNr2ICBnMS8JSC0000 | mini_text_files/file32.txt | None | .txt | None | None | 2.0 | Y2TT8PSVtqudz407XG4LAQ | NaN | NaN | md5 | False | False | 1 | 2 | NaN | None | True | 669.0 | 2025-05-05 14:15:55.974243+00:00 | 9 | None | 1
1742 | FoaS7BF8AZpt0Va80000 | mini_text_files/file64.txt | None | .txt | None | None | 2.0 | 6l0vHEYIIy4H06o9mY5RNQ | NaN | NaN | md5 | False | False | 1 | 2 | NaN | None | True | 669.0 | 2025-05-05 14:16:01.479023+00:00 | 9 | None | 1
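`filter(is_latest=True)` relies on a flag that LaminDB stores on each artifact. To make the idea of a version family concrete, here is a minimal plain-Python sketch, assuming (as the 20-character uids above suggest) that the last four characters of a uid are a version suffix and the first 16 identify the family. `Record`, `stem_uid`, and `latest_versions` are hypothetical names, not LaminDB API, and the example uids are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for an artifact row (not lamindb's Artifact)."""

    uid: str
    is_latest: bool


def stem_uid(uid: str) -> str:
    # Assumption: the last 4 characters of a 20-character uid are the version suffix.
    return uid[:-4]


def latest_versions(records: list[Record]) -> list[Record]:
    """Plain-Python analogue of `.filter(is_latest=True)`."""
    latest = [r for r in records if r.is_latest]
    # Sanity check: each version family should contribute at most one latest record.
    stems = [stem_uid(r.uid) for r in latest]
    if len(stems) != len(set(stems)):
        raise ValueError("found more than one latest version in a family")
    return latest


# Made-up uids: two versions of one family, one version of another.
records = [
    Record("aaaaaaaaaaaaaaaa0000", is_latest=False),
    Record("aaaaaaaaaaaaaaaa0001", is_latest=True),
    Record("bbbbbbbbbbbbbbbb0000", is_latest=True),
]
print([r.uid for r in latest_versions(records)])
```

The sanity check encodes the expectation that a family has exactly one latest member; in LaminDB itself the flag is maintained for you.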
You can now further subset or search the QuerySet. Here, we filter by whether the description contains "Tabula Sapiens".
artifact = artifacts.filter(description__contains="Tabula Sapiens").first()
artifact.describe()
Artifact .h5ad
├── General
│   ├── key: tabula_sapiens_lung.h5ad
│   ├── description: Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.
│   ├── uid: dPraor9rU1EofcFb6Wph            hash: 8mB1KK2wd51F6HQdvqipcQ
│   ├── size: 3.6 GB                         transform: ux-session-tb-lung
│   ├── space: all                           branch: main
│   ├── created_by: main                     created_at: 2023-07-14 19:00:30
│   └── storage path: s3://lamindata/tabula_sapiens_lung.h5ad
└── Labels
    └── .ulabels                 ULabel                      TSP1, TSP2, TSP14
        .tissues                 bionty.Tissue               lung
        .cell_types              bionty.CellType             type I pneumocyte, adventitial cell, ba…
        .experimental_factors    bionty.ExperimentalFactor   anoxya, stroke
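LaminDB queries use Django's field-lookup syntax, so `description__contains` is a substring match that is case-sensitive on PostgreSQL (Django documents that SQLite treats `contains` like `icontains`). When case shouldn't matter, Django's `__icontains` lookup is the usual choice, e.g. `artifacts.filter(description__icontains="tabula sapiens")`. The plain-Python helpers below only mimic those two semantics for illustration; they are hypothetical and not part of LaminDB.

```python
from __future__ import annotations


def contains(value: str | None, term: str) -> bool:
    """Mimic Django's `__contains` lookup: case-sensitive substring match."""
    return value is not None and term in value


def icontains(value: str | None, term: str) -> bool:
    """Mimic Django's `__icontains` lookup: case-insensitive substring match."""
    return value is not None and term.lower() in value.lower()


description = "Part of Tabula Sapiens, a benchmark, first-draft human cell atlas."
print(contains(description, "Tabula Sapiens"))   # True
print(contains(description, "tabula sapiens"))   # False
print(icontains(description, "tabula sapiens"))  # True
print(contains(None, "Tabula Sapiens"))          # False: a missing description never matches
```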
By saving the artifact record that’s currently attached to the source database instance, you transfer it to the default database instance.
artifact.save()
→ transferred: Artifact(uid='dPraor9rU1EofcFb6Wph'), Storage(uid='D9BilDV2')
Artifact(uid='dPraor9rU1EofcFb6Wph', is_latest=True, key='tabula_sapiens_lung.h5ad', description='Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.', suffix='.h5ad', size=3899435772, hash='8mB1KK2wd51F6HQdvqipcQ', branch_id=1, space_id=1, storage_id=2, run_id=2, created_by_id=1, created_at=2023-07-14 19:00:30 UTC)
How do I know if a record is saved in the default database instance or not?
Every record has an attribute ._state.db, which can take the following values:
None: the record has not yet been saved to any database
"default": the record is saved on the default database instance
"account/name": the record is saved on a non-default database instance referenced by account/name (e.g., laminlabs/lamindata)
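The three cases can be captured in a small helper. `db_location` is a hypothetical function, not LaminDB API; in a session you would pass it `artifact._state.db`. Going by the rules above, the artifact queried with `.using("laminlabs/lamindata")` should report "laminlabs/lamindata" before `save()` and "default" afterwards.

```python
from __future__ import annotations


def db_location(db: str | None) -> str:
    """Interpret a record's `._state.db` value (hypothetical helper)."""
    if db is None:
        return "unsaved"
    if db == "default":
        return "default instance"
    # Any other value is the account/name slug of a non-default instance.
    return f"non-default instance {db}"


print(db_location(None))                   # unsaved
print(db_location("default"))              # default instance
print(db_location("laminlabs/lamindata"))  # non-default instance laminlabs/lamindata
```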
The artifact record has been transferred to the current database without feature & label annotations, but with updated data lineage.
artifact.describe()
Artifact .h5ad
└── General
    ├── key: tabula_sapiens_lung.h5ad
    ├── description: Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.
    ├── uid: dPraor9rU1EofcFb6Wph            hash: 8mB1KK2wd51F6HQdvqipcQ
    ├── size: 3.6 GB                         transform: __lamindb_transfer__/4XIuR0tvaiXM
    ├── space: all                           branch: all
    ├── created_by: anonymous                created_at: 2023-07-14 19:00:30
    └── storage path: s3://lamindata/tabula_sapiens_lung.h5ad
The data itself remained in its original storage location, s3://lamindata, which has been registered in the current instance as a read-only storage location. You can tell it is read-only because its instance_uid doesn't match the uid of the current instance.
ln.Storage.df()
id | uid | root | description | type | region | instance_uid | space_id | run_id | created_at | created_by_id | _aux | branch_id
---|---|---|---|---|---|---|---|---|---|---|---|---
1 | qRqXIfWYSC46 | /home/runner/work/lamindb/lamindb/docs/test-tr... | None | local | None | 1FHu5eE0uxm4 | 1 | NaN | 2025-07-21 11:33:26.069000+00:00 | 1 | None | 1
2 | D9BilDV2 | s3://lamindata | None | s3 | us-east-1 | 4XIuR0tvaiXM | 1 | 2.0 | 2023-04-22 05:50:06.537267+00:00 | 1 | None | 1
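The read-only check amounts to comparing each storage location's `instance_uid` with the uid of the current instance. The sketch below does that on plain records copied from the table, assuming the current instance's uid is the one on its own local storage location (`1FHu5eE0uxm4`). `StorageRow` and `read_only_roots` are hypothetical names, and the local root is replaced by the relative path passed to `lamin init` because the table truncates it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StorageRow:
    """Hypothetical stand-in for one row of ln.Storage.df()."""

    uid: str
    root: str
    instance_uid: str


def read_only_roots(storages: list[StorageRow], current_instance_uid: str) -> list[str]:
    """Return roots of storage locations managed by a different instance."""
    return [s.root for s in storages if s.instance_uid != current_instance_uid]


storages = [
    # Root simplified to the path given to `lamin init`; the real root is absolute.
    StorageRow("qRqXIfWYSC46", "./test-transfer", "1FHu5eE0uxm4"),
    StorageRow("D9BilDV2", "s3://lamindata", "4XIuR0tvaiXM"),
]
print(read_only_roots(storages, current_instance_uid="1FHu5eE0uxm4"))
# ['s3://lamindata']
```

Note that the foreign storage's instance_uid, 4XIuR0tvaiXM, is the same uid that appears in the transfer transform's key further down.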
See the state of the database.
ln.view()
****************
* module: core *
****************
Artifact
id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | branch_id
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 | dPraor9rU1EofcFb6Wph | tabula_sapiens_lung.h5ad | Part of Tabula Sapiens, a benchmark, first-dra... | .h5ad | None | None | 3899435772 | 8mB1KK2wd51F6HQdvqipcQ | None | None | sha1-fl | False | False | 1 | 2 | None | None | True | 2 | 2023-07-14 19:00:30.621330+00:00 | 1 | None | 1
Run
id | uid | name | started_at | finished_at | reference | reference_type | _is_consecutive | _status_code | space_id | transform_id | report_id | _logfile_id | environment_id | initiated_by_run_id | created_at | created_by_id | _aux | branch_id
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 | 1tVIrqFRWK5CZaIZ | None | 2025-07-21 11:33:29.603830+00:00 | None | None | None | None | -1.0 | 1 | 1 | None | None | None | NaN | 2025-07-21 11:33:29.604000+00:00 | 1 | None | 1
2 | MyNiWAb8lpARJZaj | None | 2025-07-21 11:33:32.034000+00:00 | None | None | None | None | NaN | 1 | 2 | None | None | None | 1.0 | 2025-07-21 11:33:32.034000+00:00 | 1 | None | 1
Storage
id | uid | root | description | type | region | instance_uid | space_id | run_id | created_at | created_by_id | _aux | branch_id
---|---|---|---|---|---|---|---|---|---|---|---|---
1 | qRqXIfWYSC46 | /home/runner/work/lamindb/lamindb/docs/test-tr... | None | local | None | 1FHu5eE0uxm4 | 1 | NaN | 2025-07-21 11:33:26.069000+00:00 | 1 | None | 1
2 | D9BilDV2 | s3://lamindata | None | s3 | us-east-1 | 4XIuR0tvaiXM | 1 | 2.0 | 2023-04-22 05:50:06.537267+00:00 | 1 | None | 1
Transform
id | uid | key | description | type | source_code | hash | reference | reference_type | space_id | _template_id | version | is_latest | created_at | created_by_id | _aux | branch_id
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
2 | 4XIuR0tvaiXM0000 | __lamindb_transfer__/4XIuR0tvaiXM | Transfer from `laminlabs/lamindata` | function | None | None | None | None | 1 | None | None | True | 2025-07-21 11:33:32.027000+00:00 | 1 | None | 1
1 | ITeOtm7bhtdq0000 | transfer.ipynb | Transfer data | notebook | None | None | None | None | 1 | None | None | True | 2025-07-21 11:33:29.592000+00:00 | 1 | None | 1
******************
* module: bionty *
******************
Source
id | uid | entity | organism | name | in_db | currently_used | description | url | md5 | source_website | space_id | dataframe_artifact_id | version | run_id | created_at | created_by_id | _aux | branch_id
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 | 33TUF039 | bionty.Organism | vertebrates | ensembl | False | True | Ensembl | https://ftp.ensembl.org/pub/release-112/specie... | None | https://www.ensembl.org | 1 | None | release-112 | None | 2025-07-21 11:33:26.175000+00:00 | 1 | None | 1
2 | 6bbVUTCS | bionty.Organism | bacteria | ensembl | False | True | Ensembl | https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacte... | None | https://www.ensembl.org | 1 | None | release-57 | None | 2025-07-21 11:33:26.175000+00:00 | 1 | None | 1
3 | 6s9nV6xh | bionty.Organism | fungi | ensembl | False | True | Ensembl | https://ftp.ensemblgenomes.ebi.ac.uk/pub/fungi... | None | https://www.ensembl.org | 1 | None | release-57 | None | 2025-07-21 11:33:26.175000+00:00 | 1 | None | 1
4 | 2PmTrc8x | bionty.Organism | metazoa | ensembl | False | True | Ensembl | https://ftp.ensemblgenomes.ebi.ac.uk/pub/metaz... | None | https://www.ensembl.org | 1 | None | release-57 | None | 2025-07-21 11:33:26.175000+00:00 | 1 | None | 1
5 | 7GPHh16S | bionty.Organism | plants | ensembl | False | True | Ensembl | https://ftp.ensemblgenomes.ebi.ac.uk/pub/plant... | None | https://www.ensembl.org | 1 | None | release-57 | None | 2025-07-21 11:33:26.175000+00:00 | 1 | None | 1
6 | 4tsksCMX | bionty.Organism | all | ncbitaxon | False | True | NCBItaxon Ontology | http://purl.obolibrary.org/obo/ncbitaxon/2023-... | None | https://github.com/obophenotype/ncbitaxon | 1 | None | 2023-06-20 | None | 2025-07-21 11:33:26.175000+00:00 | 1 | None | 1
7 | 4UGNz3fr | bionty.Gene | human | ensembl | False | True | Ensembl | s3://bionty-assets/df_human__ensembl__release-... | None | https://www.ensembl.org | 1 | None | release-112 | None | 2025-07-21 11:33:26.175000+00:00 | 1 | None | 1
View lineage:
artifact.view_lineage()
! calling anonymously, will miss private instances
The transferred dataset is linked to a special type of transform that stores the slug and uid of the source instance:
artifact.transform.description
'Transfer from `laminlabs/lamindata`'
The transform key has the form f"__lamindb_transfer__/{source_instance.uid}":
artifact.transform.key
'__lamindb_transfer__/4XIuR0tvaiXM'
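Because the key follows a fixed pattern, the source instance uid can be read back from it. A minimal sketch with hypothetical helper names that builds and parses keys of the form shown above:

```python
from __future__ import annotations

TRANSFER_PREFIX = "__lamindb_transfer__/"


def transfer_transform_key(source_instance_uid: str) -> str:
    """Build a key of the form f"__lamindb_transfer__/{source_instance.uid}"."""
    return f"{TRANSFER_PREFIX}{source_instance_uid}"


def source_uid_from_key(key: str) -> str | None:
    """Return the source instance uid, or None for a non-transfer transform."""
    if not key.startswith(TRANSFER_PREFIX):
        return None
    return key[len(TRANSFER_PREFIX):]


print(transfer_transform_key("4XIuR0tvaiXM"))  # __lamindb_transfer__/4XIuR0tvaiXM
print(source_uid_from_key("__lamindb_transfer__/4XIuR0tvaiXM"))  # 4XIuR0tvaiXM
print(source_uid_from_key("transfer.ipynb"))  # None
```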
The current notebook run is linked as the initiated_by_run of the “transfer run”:
artifact.run.initiated_by_run.transform
Transform(uid='ITeOtm7bhtdq0000', is_latest=True, key='transfer.ipynb', description='Transfer data', type='notebook', branch_id=1, space_id=1, created_by_id=1, created_at=2025-07-21 11:33:29 UTC)
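The lineage here is a two-level chain: the transfer run points to the notebook run through `initiated_by_run`. The sketch below models that relationship with hypothetical dataclasses (not LaminDB's `Run` and `Transform` models) and walks the chain upward, generalizing the single hop in `artifact.run.initiated_by_run.transform`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Transform:
    """Hypothetical stand-in for a transform (only the key is modeled)."""

    key: str


@dataclass
class Run:
    """Hypothetical stand-in for a run and the run that initiated it."""

    transform: Transform
    initiated_by_run: Run | None = None


def run_chain(run: Run) -> list[str]:
    """Follow initiated_by_run links upward, collecting transform keys."""
    keys: list[str] = []
    current: Run | None = run
    while current is not None:
        keys.append(current.transform.key)
        current = current.initiated_by_run
    return keys


notebook_run = Run(Transform("transfer.ipynb"))
transfer_run = Run(
    Transform("__lamindb_transfer__/4XIuR0tvaiXM"), initiated_by_run=notebook_run
)
print(run_chain(transfer_run))
# ['__lamindb_transfer__/4XIuR0tvaiXM', 'transfer.ipynb']
```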
When you re-transfer a record, LaminDB detects that it already exists in the target database and maps it to the existing record instead of creating a duplicate.
artifact = artifacts.filter(description__contains="Tabula Sapiens").first()
artifact.save()
→ mapped: Artifact(uid='dPraor9rU1EofcFb6Wph')
Artifact(uid='dPraor9rU1EofcFb6Wph', is_latest=True, key='tabula_sapiens_lung.h5ad', description='Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.', suffix='.h5ad', size=3899435772, hash='8mB1KK2wd51F6HQdvqipcQ', branch_id=1, space_id=1, storage_id=2, run_id=2, created_by_id=1, created_at=2023-07-14 19:00:30 UTC)
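The log messages `→ transferred` and `→ mapped` suggest an idempotent save. The sketch below reproduces that behavior on a plain dict under the simplifying assumption that records are matched by uid alone; `save_into` is hypothetical and does not claim to reproduce LaminDB's actual matching rules.

```python
from __future__ import annotations


def save_into(target: dict[str, dict], record: dict) -> str:
    """Hypothetical uid-keyed, idempotent save mirroring the log messages."""
    uid = record["uid"]
    if uid in target:
        # Already present: reuse the existing row instead of inserting a duplicate.
        return "mapped"
    target[uid] = dict(record)
    return "transferred"


target: dict[str, dict] = {}
record = {"uid": "dPraor9rU1EofcFb6Wph", "key": "tabula_sapiens_lung.h5ad"}
print(save_into(target, record))  # transferred
print(save_into(target, record))  # mapped
print(len(target))                # 1
```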
If you also want to transfer the artifact's annotations, pass transfer="annotations" to save(). Note that this might populate your target database with metadata that doesn't match the conventions you want to enforce.
artifact = artifacts.filter(description__contains="Tabula Sapiens").first()
artifact.save(transfer="annotations")
→ mapped: Artifact(uid='dPraor9rU1EofcFb6Wph'), Tissue(uid='7Tt4iEKc'), CellType(uid='5tiBvp96'), CellType(uid='7Crr32HI'), CellType(uid='6dzoXJ3Y'), CellType(uid='01NqvhnI'), CellType(uid='5NceZTYm'), CellType(uid='4PSMdO3I'), CellType(uid='3JO0EdVd'), CellType(uid='6rfrjhvo'), CellType(uid='37mWPv6o'), CellType(uid='5Z76sCep'), CellType(uid='2OWUH6Z1'), CellType(uid='5TU8SFt5'), CellType(uid='ryEtgi1y'), CellType(uid='1lMgAPE8'), CellType(uid='7m6Ruz32'), CellType(uid='42qbvc90'), CellType(uid='puGNwNrs'), CellType(uid='1T8bGe2I'), CellType(uid='6IC9NGJE'), CellType(uid='6ujMwy7s'), CellType(uid='3eecYgWR'), CellType(uid='zQ4dyjEs'), CellType(uid='7mNqzyFE'), CellType(uid='5A9EFjNB'), CellType(uid='3lsrLTv6'), CellType(uid='1HYtHpIc'), CellType(uid='6UmKFrzn'), CellType(uid='7eZArDpo'), CellType(uid='2KCFdGIk'), CellType(uid='1V5wVqK5'), CellType(uid='5i19XYug'), CellType(uid='2nPA0h4F'), CellType(uid='5Xi2OLvZ'), CellType(uid='3kaL3W1c'), ExperimentalFactor(uid='5YDCOg0V'), ExperimentalFactor(uid='7R1OhRJ7')
→ transferred: CellType(uid='4mZaXZQg'), CellType(uid='5rVn0X39'), CellType(uid='EWy46Sey'), CellType(uid='4yqLzwwm'), ULabel(uid='vfLXaHgD'), ULabel(uid='ZaVLDCZE'), ULabel(uid='gk6w8qC5'), ULabel(uid='tZCTk48f')
Artifact(uid='dPraor9rU1EofcFb6Wph', is_latest=True, key='tabula_sapiens_lung.h5ad', description='Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.', suffix='.h5ad', size=3899435772, hash='8mB1KK2wd51F6HQdvqipcQ', branch_id=1, space_id=1, storage_id=2, run_id=2, created_by_id=1, created_at=2023-07-14 19:00:30 UTC)
The artifact is now annotated.
artifact.describe()
Artifact .h5ad
├── General
│   ├── key: tabula_sapiens_lung.h5ad
│   ├── description: Part of Tabula Sapiens, a benchmark, first-draft human cell atlas.
│   ├── uid: dPraor9rU1EofcFb6Wph            hash: 8mB1KK2wd51F6HQdvqipcQ
│   ├── size: 3.6 GB                         transform: __lamindb_transfer__/4XIuR0tvaiXM
│   ├── space: all                           branch: all
│   ├── created_by: anonymous                created_at: 2023-07-14 19:00:30
│   └── storage path: s3://lamindata/tabula_sapiens_lung.h5ad
└── Labels
    └── .tissues                 bionty.Tissue               lung
        .cell_types              bionty.CellType             pulmonary alveolar type 1 cell, adventi…
        .experimental_factors    bionty.ExperimentalFactor   anoxya, stroke
        .ulabels                 ULabel                      TSP1, TSP2, TSP14
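To act on the caution about conventions, you could compare the label names an artifact carries (visible via `describe()` on the source artifact before saving) with the names your target instance already uses, and only pass `transfer="annotations"` if the differences are acceptable. The helper and vocabulary below are hypothetical; the sketch only shows the comparison step, using the experimental-factor names from the output above.

```python
from __future__ import annotations


def split_labels(incoming: list[str], vocabulary: set[str]) -> tuple[list[str], list[str]]:
    """Partition label names into those already in the vocabulary and new ones."""
    known = [name for name in incoming if name in vocabulary]
    new = [name for name in incoming if name not in vocabulary]
    return known, new


# Incoming names as listed under .experimental_factors above;
# the vocabulary is a made-up example of target-instance conventions.
vocabulary = {"stroke", "hypoxia"}
known, new = split_labels(["anoxya", "stroke"], vocabulary)
print(known)  # ['stroke']
print(new)    # ['anoxya']
```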
# test the transform and run cells above
assert artifact.transform.description == "Transfer from `laminlabs/lamindata`"
assert artifact.transform.key == "__lamindb_transfer__/4XIuR0tvaiXM"
assert artifact.transform.uid == "4XIuR0tvaiXM0000"
assert artifact.run.initiated_by_run.transform.description == "Transfer data"
# clean up test instance
!lamin delete --force test-transfer
! calling anonymously, will miss private instances
• deleting instance anonymous/test-transfer