RDF export & SPARQL queries

SPARQL is a query language used to retrieve and manipulate data stored in the Resource Description Framework (RDF) format. In this tutorial, we export a lamindb registry to an RDF graph and query that graph with SPARQL.

import warnings

warnings.filterwarnings("ignore")
# pip install 'lamindb[bionty]' rdflib
!lamin connect laminlabs/lamindata
 connected lamindb: laminlabs/lamindata
 to map a local dev directory, call: lamin settings set dev-dir .
import bionty as bt

from rdflib import Graph, Literal, RDF, URIRef

To use SPARQL, we first need a directed RDF graph composed of triple statements (triples). Each triple is represented by:

  1. a node for the subject

  2. an arc from the subject to the object for the predicate

  3. a node for the object.

Each of the three parts can be identified by a URI; the object can also be a literal value, such as a string.

We can use the DataFrame representation of lamindb registries to build an RDF graph.

Building an RDF graph

diseases = bt.Disease.to_dataframe()
diseases.head()
! truncated query result to limit=100 Disease objects
| id | uid | name | ontology_id | abbr | synonyms | description | is_locked | created_at | branch_id | created_on_id | space_id | created_by_id | run_id | source_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 182 | 3lEY9l5KHXHvI6 | precancerous condition | MONDO:0021074 | None | precancerous condition\|premalignant condition\|... | A Pathological Process With Signs Indicating I... | False | 2026-04-13 15:09:54.889610+00:00 | 1 | 1 | 1 | 9 | 6070.0 | 73 |
| 181 | 3UbujjzPMgLW9n | breast ductal adenocarcinoma | MONDO:0005590 | None | ductal carcinoma of breast\|mammary duct adenoc... | A Breast Carcinoma Arising From The Ducts. Whi... | False | 2026-04-13 15:09:54.889610+00:00 | 1 | 1 | 1 | 9 | 6070.0 | 73 |
| 180 | 1yX8SwADeAa956 | breast carcinoma | MONDO:0004989 | None | mammary carcinoma\|carcinoma of the breast\|brea... | A Carcinoma That Arises From Epithelial Cells ... | False | 2026-04-13 15:09:54.889610+00:00 | 1 | 1 | 1 | 9 | 6070.0 | 73 |
| 179 | 5BMsSAHuNlNkVR | breast adenocarcinoma | MONDO:0004988 | None | adenocarcinoma of breast\|adenocarcinoma of the... | A Carcinoma That Arises From Glandular Epithel... | False | 2026-04-13 15:09:54.889610+00:00 | 1 | 1 | 1 | 9 | 6070.0 | 73 |
| 178 | 1vrnOyXFQ3Nd9W | breast carcinoma in situ | MONDO:0004658 | None | stage 0 carcinoma of breast\|non-infiltrating b... | A In Situ Carcinoma That Involves The Breast. | False | 2026-04-13 15:09:54.889610+00:00 | 1 | 1 | 1 | 9 | 6070.0 | 73 |

We convert the DataFrame to RDF by generating triples for each disease: one for its type, one for its name, and one for its description.

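import pandas as pd

rdf_graph = Graph()

# Base IRI for the example subjects, class, and predicates
namespace = URIRef("http://sparql-example.org/")

for _, row in diseases.iterrows():
    # One subject IRI per disease, derived from its ontology ID
    subject = URIRef(namespace + str(row["ontology_id"]))
    rdf_graph.add((subject, RDF.type, URIRef(namespace + "Disease")))
    rdf_graph.add((subject, URIRef(namespace + "name"), Literal(row["name"])))
    # Skip missing descriptions rather than turning them into literals
    if pd.notna(row["description"]):
        rdf_graph.add(
            (subject, URIRef(namespace + "description"), Literal(row["description"]))
        )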

rdf_graph
<Graph identifier=N0fcee6021b9c40b2a9aab3d826508d7e (<class 'rdflib.graph.Graph'>)>

Now we can query the RDF graph with SPARQL to retrieve the name and description of each disease:

query = """
SELECT ?name ?description
WHERE {
  ?disease a <http://sparql-example.org/Disease> .
  ?disease <http://sparql-example.org/name> ?name .
  ?disease <http://sparql-example.org/description> ?description .
}
LIMIT 5
"""

for row in rdf_graph.query(query):
    print(f"Name: {row.name}, Description: {row.description}")
Name: precancerous condition, Description: A Pathological Process With Signs Indicating It May Become Cancerous. Representative Examples Include Leukoplakia, Dysplastic Nevus, Actinic Keratosis, Xeroderma Pigmentosum, And Intraepithelial Neoplasia.
Name: breast ductal adenocarcinoma, Description: A Breast Carcinoma Arising From The Ducts. While Ductal Carcinomas Can Arise At Other Sites, This Term Is Universally Used To Refer To Carcinomas Of The Breast. Ductal Carcinomas Account For About Two Thirds Of All Breast Cancers. Two Types Of Ductal Carcinomas Have Been Described: Ductal Carcinoma In Situ (Dcis) And Invasive Ductal Carcinoma. The Latter Often Spreads To The Axillary Lymph Nodes And Other Anatomic Sites. The Two Forms Of Ductal Carcinoma Often Coexist.
Name: breast carcinoma, Description: A Carcinoma That Arises From Epithelial Cells Of The Breast
Name: breast adenocarcinoma, Description: A Carcinoma That Arises From Glandular Epithelial Cells Of The Breast
Name: breast carcinoma in situ, Description: A In Situ Carcinoma That Involves The Breast.