##### Single-cell imaging

Here, you will learn how to structure and featurize a large imaging
collection and make it queryable for large-scale machine learning:

1. Load and annotate a "Collection" of microscopy images
   (sc-imaging1/4)

2. Generate single-cell images (sc-imaging2/4)

3. Featurize single-cell images (sc-imaging3/4)

4. Train a model to identify autophagy-positive cells (sc-imaging4/4)

First, we load and annotate a collection of microscopy images in TIFF
format that was previously uploaded.

The images used here were acquired as part of a study on autophagy, a
cellular process during which cells recycle their components in
autophagosomes. The study tracked genetic determinants of autophagy
through fluorescence microscopy of human U2OS cells.

 # pip install 'lamindb[jupyter,bionty]'
 !lamin init --storage ./test-sc-imaging --modules bionty

 import lamindb as ln
 import bionty as bt
 from tifffile import imread
 import matplotlib.pyplot as plt

 ln.track()

All image metadata is stored in an already ingested ".csv" file on the
"scportrait/examples" instance.

 metadata_files = (
     ln.Artifact.connect("scportrait/examples")
     .get(key="input_data_imaging_usecase/metadata_files.csv")
     .load()
 )

 metadata_files.head(2)

 metadata_files.apply(lambda col: col.unique())

#### Curating artifacts

All images feature the U2OS cell line, captured using an Opera Phenix
microscope at 20X magnification.

To induce autophagy, cells were treated under two conditions:

* Treated: Exposed to "Torin-1" (a starvation-mimicking small
  molecule) for 14 hours

* Control: Left untreated

The U2OS cells were genetically engineered with fluorescently tagged
proteins to visualize the process of autophagosome formation:

* "LC3B" -> Autophagosome marker (visible in mCherry channel)

* "LckLip" -> Membrane-targeted fluorescent protein for cell boundary
  visualization (visible in Alexa488 channel)

* "Hoechst" -> DNA stain for nucleus identification (visible in DAPI
  channel)

Each image contains three separate channels:

| Channel | Imaged Structure | Fluorescent Marker |
| --- | --- | --- |
| 1 | DNA | "Hoechst" (DAPI) |
| 2 | Autophagosomes | "LC3B" (mCherry) |
| 3 | Plasma Membrane | "LckLip" (Alexa488) |
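
The channel layout above can be captured in a small lookup structure, so that downstream code can refer to a channel by the structure it images rather than by a bare index. The sketch below is illustrative only: the `Channel` class, the `CHANNELS` mapping, and `channel_for_structure` are hypothetical names, not part of lamindb or of the original study code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """One imaging channel (hypothetical helper, not a lamindb class)."""

    index: int
    structure: str
    marker: str
    filter_channel: str


# Channel layout as listed in the table above.
CHANNELS = {
    1: Channel(1, "DNA", "Hoechst", "DAPI"),
    2: Channel(2, "Autophagosomes", "LC3B", "mCherry"),
    3: Channel(3, "Plasma Membrane", "LckLip", "Alexa488"),
}


def channel_for_structure(structure: str) -> Channel:
    """Return the channel that images the given structure (case-insensitive)."""
    for channel in CHANNELS.values():
        if channel.structure.lower() == structure.lower():
            return channel
    raise KeyError(f"no channel images {structure!r}")
```

For instance, `channel_for_structure("autophagosomes").index` gives the channel number to read when only the LC3B signal is of interest.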

Two genotypes were analyzed:

* WT (Wild-type cells)

* EI24KO ("EI24" gene knockout cells)

For each genotype, two different clonal cell lines were studied, with
multiple fields of view (FOVs) captured per experimental condition.
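
The design described above is a cross of genotype, clone, and stimulation, imaged in three channels, so the number of images expected for one FOV index can be worked out directly. The sketch below enumerates the combinations. It assumes that every clone was imaged under both conditions, and the clone and stimulation labels are placeholders, since the actual label strings are only given in the metadata file.

```python
from itertools import product

genotypes = ["WT", "EI24KO"]
clones_per_genotype = 2  # two clonal lines per genotype
stimulations = ["treated", "control"]  # placeholder labels: Torin-1 vs. untreated
channels = [1, 2, 3]

# Placeholder clone names such as "WT_clone1"; the real names live in the metadata.
clones = [
    f"{genotype}_clone{i}"
    for genotype in genotypes
    for i in range(1, clones_per_genotype + 1)
]

# One single-channel image per (clone, stimulation, channel) combination.
images_per_fov = list(product(clones, stimulations, channels))

print(len(clones))          # 4 clonal lines in total
print(len(images_per_fov))  # 24 images for each FOV index
```

Comparing such an expected count against the number of rows in the metadata is a cheap way to spot missing or duplicated images before curation.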

All images are annotated with corresponding metadata to enable
efficient querying and analysis.

###### Define a schema

We define a "Schema" to curate metadata.

 ulabel_names = [
     "genotype",
     "stimulation",
     "cell_line_clone",
     "channel",
     "FOV",
     "magnification",
     "microscope",
     "imaged structure",
 ]

 autophagy_imaging_schema = ln.Schema(
     name="Autophagy imaging schema",
     features=[
         *[ln.Feature(name=name, dtype=ln.ULabel.name).save() for name in ulabel_names],
         ln.Feature(name="image_path", dtype=str, description="image path").save(),
         ln.Feature(name="cell_line", dtype=bt.CellLine.name).save(),
         ln.Feature(
             name="resolution", dtype=float, description="conversion factor for px to µm"
         ).save(),
     ],
     coerce_dtype=True,
 ).save()

###### Curate the dataset

 curator = ln.curators.DataFrameCurator(metadata_files, autophagy_imaging_schema)

 try:
     curator.validate()
 except ln.core.exceptions.ValidationError as e:
     print(e)

Standardize known synonyms, then add the remaining missing terms:

 curator.cat.standardize("cell_line")

 for key in curator.cat.non_validated.keys():
     curator.cat.add_new_from(key)

 curator.validate()
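
Conceptually, this curation step checks each categorical value against a registry of known terms, maps synonyms to their canonical names, and registers whatever is still unknown. The plain-Python sketch below illustrates that idea only. It is not how lamindb implements validation, standardization, or term creation, and the registry contents and synonym table are made up for the example.

```python
def non_validated(values, registry):
    """Return the distinct values that are missing from the registry, in order."""
    missing = []
    for value in values:
        if value not in registry and value not in missing:
            missing.append(value)
    return missing


def standardize(values, synonyms):
    """Replace known synonyms with their canonical term; leave other values as-is."""
    return [synonyms.get(value, value) for value in values]


# Hypothetical registry and synonym table, for illustration only.
cell_line_registry = {"U-2 OS cell"}
cell_line_synonyms = {"U2OS": "U-2 OS cell"}

column = ["U2OS", "U2OS", "U2OS"]
print(non_validated(column, cell_line_registry))  # ['U2OS']

column = standardize(column, cell_line_synonyms)
print(non_validated(column, cell_line_registry))  # []

# Terms without a synonym are added to the registry instead of being renamed.
genotype_registry = set()
genotypes = ["WT", "EI24KO", "WT"]
genotype_registry.update(non_validated(genotypes, genotype_registry))
print(sorted(genotype_registry))  # ['EI24KO', 'WT']
```

The order matters in the same way as in the cells above: standardizing first avoids registering a synonym as a new term when a canonical entry already exists.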

###### Annotate images with metadata

We add images to our "lamindb" instance and annotate them with their
metadata.

 # Create study feature and associated label
 ln.Feature(name="study", dtype=ln.ULabel).save()
 ln.ULabel(name="autophagy imaging").save()

 artifacts = []

 for _, row in metadata_files.iterrows():
     artifact = (
         ln.Artifact.connect("scportrait/examples")
         .filter(key__icontains=row["image_path"])
         .one()
     )
     artifact.save()
     artifact.cell_lines.add(bt.CellLine.filter(name=row.cell_line).one())

     artifact.features.add_values(
         {
             col: row[col]
             for col in [
                 "genotype",
                 "stimulation",
                 "cell_line_clone",
                 "channel",
                 "FOV",
                 "magnification",
                 "microscope",
                 "resolution",
             ]
         }
         | {"imaged structure": row["imaged structure"], "study": "autophagy imaging"}
     )

     artifacts.append(artifact)

 artifacts[0].describe()

In addition, we create a "Collection" to hold all "Artifact" objects
that belong to this specific imaging study.

 collection = ln.Collection(
     artifacts,
     key="Annotated autophagy imaging raw images",
     description="annotated microscopy images of cells stained for autophagy markers",
 ).save()

Let's look at some example images where we match images from the same
clone, stimulation condition, and FOV to ensure correct channel
alignment.

 def plot_example_images(df, n_images=3, title_prefix=""):
     """Plot example images from dataframe."""
     fig, axs = plt.subplots(1, n_images, figsize=(15, 5))
     if n_images == 1:
         axs = [axs]
     # idx doubles as the axis position, so df needs a 0..n_images-1 index
     for idx, row in df.iterrows():
         path = (
             ln.Artifact.connect("scportrait/examples")
             .get(key=row["image_path"])
             .cache()
         )
         image = imread(path)
         axs[idx].imshow(image)
         axs[idx].set_title(f"{title_prefix}{row['imaged structure']}")
         axs[idx].axis("off")
     return fig, axs

 sorted_metadata = metadata_files.sort_values(
     by=["cell_line_clone", "stimulation", "FOV"]
 )

 # Plot first 3 and last 3
 plot_example_images(sorted_metadata.head(3).reset_index(drop=True))
 plot_example_images(sorted_metadata.tail(3).reset_index(drop=True));
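
Sorting by clone, stimulation, and FOV places the channels of one acquisition next to each other, which is why the first and last three rows are expected to form matched sets. A more explicit check is to group rows by those keys and confirm that every group holds the full set of channels. The sketch below does this on hand-written rows using only the standard library. The row values and the channel labels are invented; only the column names come from the metadata, and `incomplete_groups` is a hypothetical helper.

```python
from collections import defaultdict

# Invented rows; only the column names mirror the metadata file.
rows = [
    {"cell_line_clone": "clone_A", "stimulation": "treated", "FOV": "FOV1", "channel": "Alexa488"},
    {"cell_line_clone": "clone_A", "stimulation": "treated", "FOV": "FOV1", "channel": "DAPI"},
    {"cell_line_clone": "clone_A", "stimulation": "treated", "FOV": "FOV1", "channel": "mCherry"},
    {"cell_line_clone": "clone_B", "stimulation": "control", "FOV": "FOV1", "channel": "DAPI"},
    {"cell_line_clone": "clone_B", "stimulation": "control", "FOV": "FOV1", "channel": "mCherry"},
]

EXPECTED_CHANNELS = {"DAPI", "mCherry", "Alexa488"}  # assumed channel labels


def incomplete_groups(rows, expected=EXPECTED_CHANNELS):
    """Map each (clone, stimulation, FOV) key to its missing channels, if any."""
    channels_by_key = defaultdict(set)
    for row in rows:
        key = (row["cell_line_clone"], row["stimulation"], row["FOV"])
        channels_by_key[key].add(row["channel"])
    return {
        key: expected - found
        for key, found in channels_by_key.items()
        if found != expected
    }


print(incomplete_groups(rows))
# {('clone_B', 'control', 'FOV1'): {'Alexa488'}}
```

An empty result means every group is complete, so taking consecutive triples from the sorted table yields aligned channel sets.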

 ln.finish()