RDF export & SPARQL queries
SPARQL is a query language used to retrieve and manipulate data stored in Resource Description Framework (RDF) format. In this tutorial, we demonstrate how lamindb registries can be queried with SPARQL.
import warnings
warnings.filterwarnings("ignore")
# pip install 'lamindb[bionty]' rdflib
!lamin connect laminlabs/lamindata
→ connected lamindb: laminlabs/lamindata
• to map a local dev directory, call: lamin settings set dev-dir .
import bionty as bt
from rdflib import Graph, Literal, RDF, URIRef
→ connected lamindb: laminlabs/lamindata
Generally, we need to build a directed RDF graph composed of triple statements. Each statement is represented by:
a node for the subject
an arc for the predicate, pointing from the subject to the object
a node for the object.
Each of the three parts can be identified by a URI; the object can alternatively be a literal value such as a string.
We can use the DataFrame representation of lamindb registries to build an RDF graph.
Building an RDF graph
diseases = bt.Disease.to_dataframe()
diseases.head()
! truncated query result to limit=100 Disease objects
| id | uid | name | ontology_id | abbr | synonyms | description | is_locked | created_at | branch_id | created_on_id | space_id | created_by_id | run_id | source_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 182 | 3lEY9l5KHXHvI6 | precancerous condition | MONDO:0021074 | None | precancerous condition|premalignant condition|... | A Pathological Process With Signs Indicating I... | False | 2026-04-13 15:09:54.889610+00:00 | 1 | 1 | 1 | 9 | 6070.0 | 73 |
| 181 | 3UbujjzPMgLW9n | breast ductal adenocarcinoma | MONDO:0005590 | None | ductal carcinoma of breast|mammary duct adenoc... | A Breast Carcinoma Arising From The Ducts. Whi... | False | 2026-04-13 15:09:54.889610+00:00 | 1 | 1 | 1 | 9 | 6070.0 | 73 |
| 180 | 1yX8SwADeAa956 | breast carcinoma | MONDO:0004989 | None | mammary carcinoma|carcinoma of the breast|brea... | A Carcinoma That Arises From Epithelial Cells ... | False | 2026-04-13 15:09:54.889610+00:00 | 1 | 1 | 1 | 9 | 6070.0 | 73 |
| 179 | 5BMsSAHuNlNkVR | breast adenocarcinoma | MONDO:0004988 | None | adenocarcinoma of breast|adenocarcinoma of the... | A Carcinoma That Arises From Glandular Epithel... | False | 2026-04-13 15:09:54.889610+00:00 | 1 | 1 | 1 | 9 | 6070.0 | 73 |
| 178 | 1vrnOyXFQ3Nd9W | breast carcinoma in situ | MONDO:0004658 | None | stage 0 carcinoma of breast|non-infiltrating b... | A In Situ Carcinoma That Involves The Breast. | False | 2026-04-13 15:09:54.889610+00:00 | 1 | 1 | 1 | 9 | 6070.0 | 73 |
We convert the DataFrame to RDF by generating triples.
rdf_graph = Graph()
namespace = URIRef("http://sparql-example.org/")
for _, row in diseases.iterrows():
    # One subject node per disease, identified by its ontology ID
    subject = URIRef(namespace + str(row["ontology_id"]))
    # Type the subject as a Disease, then attach name and description as literals
    rdf_graph.add((subject, RDF.type, URIRef(namespace + "Disease")))
    rdf_graph.add((subject, URIRef(namespace + "name"), Literal(row["name"])))
    rdf_graph.add(
        (subject, URIRef(namespace + "description"), Literal(row["description"]))
    )
rdf_graph
<Graph identifier=N0fcee6021b9c40b2a9aab3d826508d7e (<class 'rdflib.graph.Graph'>)>
Now we can query the RDF graph using SPARQL for the name and associated description:
query = """
SELECT ?name ?description
WHERE {
    ?disease a <http://sparql-example.org/Disease> .
    ?disease <http://sparql-example.org/name> ?name .
    ?disease <http://sparql-example.org/description> ?description .
}
LIMIT 5
"""
for row in rdf_graph.query(query):
    print(f"Name: {row.name}, Description: {row.description}")
Name: precancerous condition, Description: A Pathological Process With Signs Indicating It May Become Cancerous. Representative Examples Include Leukoplakia, Dysplastic Nevus, Actinic Keratosis, Xeroderma Pigmentosum, And Intraepithelial Neoplasia.
Name: breast ductal adenocarcinoma, Description: A Breast Carcinoma Arising From The Ducts. While Ductal Carcinomas Can Arise At Other Sites, This Term Is Universally Used To Refer To Carcinomas Of The Breast. Ductal Carcinomas Account For About Two Thirds Of All Breast Cancers. Two Types Of Ductal Carcinomas Have Been Described: Ductal Carcinoma In Situ (Dcis) And Invasive Ductal Carcinoma. The Latter Often Spreads To The Axillary Lymph Nodes And Other Anatomic Sites. The Two Forms Of Ductal Carcinoma Often Coexist.
Name: breast carcinoma, Description: A Carcinoma That Arises From Epithelial Cells Of The Breast
Name: breast adenocarcinoma, Description: A Carcinoma That Arises From Glandular Epithelial Cells Of The Breast
Name: breast carcinoma in situ, Description: A In Situ Carcinoma That Involves The Breast.