## Lightning

This guide offers more context on the
"lamindb.integrations.lightning.Checkpoint" callback. For end-to-end
examples, see the following guides:

* ClearML

* Weights & Biases

* MLflow

### Quickstart

Pass "ll.Checkpoint" and a logger into "Trainer". The logger is what
gives checkpoints meaningful, namespaced artifact keys — without it,
keys fall back to a bare "checkpoints/" prefix (or just the run UID
when "ln.track()" is active).

Any logger implementing Lightning's "Logger" interface works
("TensorBoardLogger", "WandbLogger", "MLFlowLogger", "CSVLogger",
etc.). We use "TensorBoardLogger" in the examples below.

```python
import lamindb as ln
import lightning.pytorch as pl
from lightning.pytorch.loggers import TensorBoardLogger
from lamindb.integrations import lightning as ll

ln.track()

logger = TensorBoardLogger(save_dir="logs")
checkpoint = ll.Checkpoint(monitor="val_loss", mode="min", save_top_k=3)

trainer = pl.Trainer(
    max_epochs=10,
    callbacks=[checkpoint],
    logger=logger,
)
# model is your LightningModule, datamodule your LightningDataModule
trainer.fit(model, datamodule=datamodule)
```

After training, each saved checkpoint file is a LaminDB artifact:

```python
checkpoint.last_checkpoint_artifact
checkpoint.last_checkpoint_artifact.key
# e.g. "logs/lightning_logs/2r5pIRnK7z0q/checkpoints/epoch=0-step=100.ckpt"

checkpoint.checkpoint_key_prefix
# e.g. "logs/lightning_logs/2r5pIRnK7z0q/checkpoints"
```

### How is a run organized?

A Lightning "Trainer" coordinates three concerns during training:

1. **Logger** — writes metrics (loss curves, learning rate, etc.) to a
 dashboard directory. The logger determines the local directory
 layout: "{save_dir}/{name}/{version}/".

2. **ModelCheckpoint** — saves model snapshots (".ckpt" files) into a
 "checkpoints/" subdirectory underneath the logger's directory.

3. **SaveConfigCallback** — when using "LightningCLI", writes the
 fully resolved "config.yaml" into the logger's directory so you can
 reproduce exactly which hyperparameters were used.

All three share the same directory tree. The logger creates it, the
checkpoint callback writes into it, and the config callback stores
the config alongside it:

```text
logs/                          # logger save_dir
  lightning_logs/              # logger name
    version_0/                 # logger version (local filesystem)
      events.out.tfevents.*    # ← logger output (TensorBoard)
      config.yaml              # ← SaveConfigCallback
      checkpoints/
        epoch=0-step=100.ckpt  # ← ModelCheckpoint
        epoch=1-step=200.ckpt
        hparams.yaml           # ← auto-generated by Lightning
```

LaminDB's integration replaces "ModelCheckpoint" with "ll.Checkpoint"
and Lightning's "SaveConfigCallback" with "ll.SaveConfigCallback".
Checkpoint files, the config, and "hparams.yaml" become
"lamindb.Artifact" records with lineage tracking and optional feature
annotations.

Note that artifact keys in LaminDB do **not** mirror the local
directory layout exactly: by default, the callback uses the LaminDB
run UID instead of Lightning's auto-incrementing "version_N"
directory. See *How are artifact keys derived?* below for details.

### Which kinds of artifacts?

"Checkpoint" saves three kinds of artifacts:

| Kind | Example key | When |
| --- | --- | --- |
| "checkpoint" | "…/checkpoints/epoch=0-step=100.ckpt" | Every time Lightning writes a checkpoint |
| "config" | "…/config.yaml" | When using "ll.SaveConfigCallback" |
| "hparams" | "…/checkpoints/hparams.yaml" | When Lightning generates it |

Checkpoints and "hparams.yaml" live under the "checkpoints/"
subdirectory, while the config sits directly under the base prefix.

The callback tracks the latest artifact of each kind, plus the most
recent artifact event:

```python
checkpoint.last_checkpoint_artifact
checkpoint.last_config_artifact
checkpoint.last_hparams_artifact
checkpoint.last_artifact_event
```

### How is data lineage tracked?

When a run is being tracked with "ln.track()":

* "checkpoint" artifacts are recorded as **run outputs** — they are
  produced by the training run.

* "config" artifacts are recorded as **run inputs** — the resolved
  config is part of the run specification.

* "hparams.yaml" is saved as an artifact but not linked as a run
  input.
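
After training under "ln.track()", you can inspect this lineage from
the finished run. A minimal sketch, assuming the run record exposes
the usual "input_artifacts" and "output_artifacts" accessors:

```python
run = ln.context.run  # the run created by ln.track()

# checkpoints appear among the run's outputs
run.output_artifacts.df()

# the resolved config appears among the run's inputs
run.input_artifacts.df()
```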

### How are artifact keys derived?

LaminDB artifact keys are **not** necessarily a mirror of the local
filesystem layout. Lightning uses auto-incrementing version
directories ("version_0", "version_1", …) on disk, but these are
meaningless as artifact identifiers — they depend on what already
exists locally and cannot reliably distinguish runs across machines.

Instead, when "ln.track()" is active, the callback uses the **LaminDB
run UID** as the version segment by default
("run_uid_is_version=True"). This guarantees that every tracked run
produces unique artifact keys regardless of local state.

The base prefix is determined by priority:

| Scenario | Base prefix |
| --- | --- |
| "dirpath" set (± logger) | "{dirpath}/{run_uid}" |
| No "dirpath" + logger | "{save_dir_basename}/{name}/{run_uid}" |
| No "dirpath" + no logger | "{run_uid}" |

"run_uid" above refers to the active LaminDB run UID (from
"ln.context.run.uid"). When no run is tracked or
"run_uid_is_version=False", the callback falls back to the logger's
own version (e.g. "version_0") or omits the segment entirely.

**Checkpoint & hparams keys:**

| Scenario | LaminDB key pattern |
| --- | --- |
| Logger present (recommended) | "{save_dir_basename}/{name}/{run_uid}/checkpoints/{filename}" |
| No logger, explicit "dirpath" | "{dirpath}/{run_uid}/checkpoints/{filename}" |
| No logger, no "dirpath" | "{run_uid}/checkpoints/{filename}" |

**Config keys:**

| Scenario | Key pattern |
| --- | --- |
| Logger present | "{save_dir_basename}/{name}/{run_uid}/config.yaml" |
| No logger, explicit "dirpath" | "{dirpath}/{run_uid}/config.yaml" |
| No logger, no "dirpath" | "{run_uid}/config.yaml" |

For example, with "TensorBoardLogger(save_dir="logs")" and a tracked
run:

```text
logs/lightning_logs/2r5pIRnK7z0q/  # base prefix ({save_dir_basename}/{name}/{run_uid})
  config.yaml                      # ← config artifact
  checkpoints/
    epoch=0-step=100.ckpt          # ← checkpoint artifact
    hparams.yaml                   # ← hparams artifact
```
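
With an explicit "dirpath" and no logger, the keys instead follow the
second row of the tables above. A sketch (the run UID is
illustrative):

```python
checkpoint = ll.Checkpoint(monitor="val_loss", dirpath="ckpts")

checkpoint.checkpoint_key_prefix
# e.g. "ckpts/2r5pIRnK7z0q/checkpoints"
```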

#### Opting out of run UID keys

Pass "run_uid_is_version=False" to fall back to the logger-managed
version directory, matching Lightning's local layout more closely:

```python
checkpoint = ll.Checkpoint(
    monitor="val_loss",
    run_uid_is_version=False,
)
```

With this setting, the key uses the logger's version ("version_0",
etc.) instead of the run UID. This is mainly useful when you don't
call "ln.track()" or when you want artifact keys that exactly mirror
the local directory tree.
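
With the "TensorBoardLogger" from the quickstart, the prefix then
mirrors the local layout; a sketch of the expected value:

```python
checkpoint.checkpoint_key_prefix
# e.g. "logs/lightning_logs/version_0/checkpoints"
```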

#### Why run UIDs instead of "version_N"?

Lightning's auto-incrementing "version_N" depends on what directories
already exist at "save_dir". Two runs on different machines — or the
same machine after clearing "logs/" — can both produce "version_0".
With "run_uid_is_version=True" (the default), each tracked run gets a
unique prefix derived from the Lamin run, so artifact keys never
collide.

### Use with the Lightning CLI

The Lightning CLI resolves a YAML config into concrete model and data
module arguments. To also store that resolved config as a LaminDB
artifact, pass "ll.SaveConfigCallback" in your training script and
declare the trainer, logger, callbacks, model, and data in a config
file.

**"config.yaml"**

```yaml
trainer:
  max_epochs: 10

  logger:
    class_path: lightning.pytorch.loggers.TensorBoardLogger
    init_args:
      save_dir: logs

  callbacks:
    - class_path: lamindb.integrations.lightning.Checkpoint
      init_args:
        monitor: val/loss
        mode: min
        save_top_k: 3

model:
  learning_rate: 1.0e-3

data:
  batch_size: 64
```

**"train.py"**

```python
import lamindb as ln
from lightning.pytorch.cli import LightningCLI
from lamindb.integrations.lightning import SaveConfigCallback

ln.track()


def cli_main() -> None:
    # MyModel / MyDataModule are your LightningModule / LightningDataModule classes
    LightningCLI(
        model_class=MyModel,
        datamodule_class=MyDataModule,
        save_config_callback=SaveConfigCallback,
    )


if __name__ == "__main__":
    cli_main()
```

```bash
python train.py fit --config config.yaml
```

"ll.SaveConfigCallback" extends Lightning's built-in version: it
writes the local file as usual and then delegates to whichever
"ArtifactPublishingModelCheckpoint" is registered on the trainer to
persist the config as an artifact.
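
Because the CLI constructs the callback for you, you typically
retrieve the config artifact afterwards by key rather than from a
callback instance. A sketch, assuming the key patterns from the
tables above:

```python
import lamindb as ln

# the resolved config is stored under "{base_prefix}/config.yaml"
config_artifact = (
    ln.Artifact.filter(key__endswith="config.yaml")
    .order_by("-created_at")
    .first()
)
config_artifact.cache()  # local path to the resolved config
```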

### Annotating with features

Attach custom run-level and artifact-level feature values through
"features=":

```python
logger = TensorBoardLogger(save_dir="logs")
checkpoint = ll.Checkpoint(
    monitor="val_loss",
    features={
        "run": {"training_framework": "lightning"},
        "artifact": {"dataset_version": "2026-03"},
    },
)

trainer = pl.Trainer(callbacks=[checkpoint], logger=logger)
```

Feature names must already be registered in your LaminDB instance.
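
For the example above, that means creating both features once before
training; a minimal sketch using "ln.Feature":

```python
import lamindb as ln

# register the features referenced in features= (run once)
ln.Feature(name="training_framework", dtype="str").save()
ln.Feature(name="dataset_version", dtype="str").save()
```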

The callback can also auto-track standard Lightning fields. Create the
corresponding LaminDB features once:

```python
ll.save_lightning_features()
```

This enables the following auto-tracked features:

* Artifact-level: "is_best_model", "is_last_model", "score",
  "model_rank", "save_weights_only", "monitor", "mode"

* Run-level: "logger_name", "logger_version", "max_epochs",
  "max_steps", "precision", "accumulate_grad_batches",
  "gradient_clip_val", "monitor", "mode"

### Extending the callback

#### Subclass "Checkpoint"

Subclass when you want to keep LaminDB persistence and additionally
notify an external system after each artifact is saved:

```python
import lightning.pytorch as pl
from lightning.pytorch.loggers import TensorBoardLogger

from lamindb.integrations import lightning as ll
from my_model_registry import ModelRegistry


class ModelRegistryCheckpoint(ll.Checkpoint):
    """Register each checkpoint in an external model registry."""

    def __init__(self, *args, registry_project: str, **kwargs):
        super().__init__(*args, **kwargs)
        self.registry_project = registry_project
        self.model_registry = ModelRegistry()

    def on_artifact_saved(self, event: ll.ArtifactSavedEvent) -> None:
        if event.kind == "checkpoint":
            # register the model in your external system
            self.model_registry.register(
                project=self.registry_project,
                model_uri=event.storage_uri,
                metadata={"lamin_key": event.key},
            )


logger = TensorBoardLogger(save_dir="logs")
checkpoint = ModelRegistryCheckpoint(
    registry_project="my-project",
    monitor="val_loss",
    save_top_k=3,
)
trainer = pl.Trainer(callbacks=[checkpoint], logger=logger)
trainer.fit(model, datamodule=datamodule)
```

Each event gives you:

* "event.kind": ""checkpoint"", ""config"", or ""hparams""

* "event.artifact": the persisted LaminDB artifact

* "event.key": the LaminDB artifact key

* "event.local_path": the local file path Lightning wrote

* "event.storage_uri": the stable storage URI for downstream systems

#### Attach an observer

Observers are useful when you want composition instead of inheritance:

```python
import lightning.pytorch as pl
from lightning.pytorch.loggers import TensorBoardLogger

from lamindb.integrations import lightning as ll


class ArtifactLogger:
    def on_artifact_saved(self, event: ll.ArtifactSavedEvent) -> None:
        print(event.kind, event.storage_uri)

    def on_artifact_removed(self, event: ll.ArtifactRemovedEvent) -> None:
        print("removed", event.key)


logger = TensorBoardLogger(save_dir="logs")
checkpoint = ll.Checkpoint(
    monitor="val_loss",
    artifact_observers=[ArtifactLogger()],
)

trainer = pl.Trainer(callbacks=[checkpoint], logger=logger)
trainer.fit(model, datamodule=datamodule)
```

Observers receive the same events that subclasses see.

### Integrating other systems

To register checkpoints in another system (e.g. ClearML, Weights &
Biases, MLflow, Neptune, or Comet), use the artifact lifecycle events
rather than re-deriving paths from Lightning internals.

The key hand-off value is "event.storage_uri", which resolves to the
persisted artifact location. "event.artifact" gives you the full
LaminDB record when you need metadata beyond the URI.
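
As one illustration, an observer could tag the active MLflow run with
each checkpoint's storage URI. A minimal sketch, assuming an MLflow
run is active and using only "mlflow.set_tag"; the tag names are made
up:

```python
import mlflow

from lamindb.integrations import lightning as ll


class MLflowCheckpointObserver:
    """Record each persisted checkpoint's location on the active MLflow run."""

    def on_artifact_saved(self, event: ll.ArtifactSavedEvent) -> None:
        if event.kind != "checkpoint":
            return
        # hypothetical tag names; point MLflow users at the LaminDB artifact
        mlflow.set_tag("lamindb_checkpoint_uri", event.storage_uri)
        mlflow.set_tag("lamindb_checkpoint_key", event.key)


checkpoint = ll.Checkpoint(
    monitor="val_loss",
    artifact_observers=[MLflowCheckpointObserver()],
)
```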