Point Cluster Context¶

The PointClusterContext class is used to cluster cells of a given type and delineating the cells belonging to the clusters. Here we will use it to cluster immune cells to get a hang of how the immune cells are clustered within the stroma.

After fititng the context, The PointClusterContext.context attribute will contain a nested dict where each value is a dict contain the following key-value pairs:

roi_area - the alpha shape of the cluster (gpd.GeoDataFrame)
roi_cells - the cells inside the the alpha shape of the cluster. (gpd.GeoDataFrame)
roi_grid - the grids fitted on top of the cluster areas. (gpd.GeoDataFrame)
roi_network - the network fitted on the roi_cells. (libpysal.weights.W)

Fitting the Context¶

In [1]:

Copied!





from cellseg_gsontools.data import cervix_cells
from cellseg_gsontools.spatial_context import PointClusterContext

# Create a WithinContext object
pcsp = PointClusterContext(
    cell_gdf=cervix_cells(),
    labels="inflammatory",  # Cluster the inflammatory cells
    cluster_method="dbscan",  # Use DBSCAN for clustering
    min_area_size=50000,  # Discard clusters that are smaller than this
    parallel=False,  # Whether to run in parallel
    num_processes=1,  # Number of processes to use
    graph_type="distband",  # Use a distance band graph (distance thresholded KNN-graph)
    dist_thresh=75,  # Distance threshold for the graph (in pixels)
    backend="geopandas",  # Use geopandas for fitting
    grid_type="hex",  # Use a hexagonal grid
    resolution=10,  # Resolution of the grid,
    min_samples=60,  # Minimum number of samples for DBSCAN
)
pcsp.fit(fit_grid=True, fit_graph=True)
from cellseg_gsontools.data import cervix_cells
from cellseg_gsontools.spatial_context import PointClusterContext

# Create a WithinContext object
pcsp = PointClusterContext(
    cell_gdf=cervix_cells(),
    labels="inflammatory",  # Cluster the inflammatory cells
    cluster_method="dbscan",  # Use DBSCAN for clustering
    min_area_size=50000,  # Discard clusters that are smaller than this
    parallel=False,  # Whether to run in parallel
    num_processes=1,  # Number of processes to use
    graph_type="distband",  # Use a distance band graph (distance thresholded KNN-graph)
    dist_thresh=75,  # Distance threshold for the graph (in pixels)
    backend="geopandas",  # Use geopandas for fitting
    grid_type="hex",  # Use a hexagonal grid
    resolution=10,  # Resolution of the grid,
    min_samples=60,  # Minimum number of samples for DBSCAN
)
pcsp.fit(fit_grid=True, fit_graph=True)

Processing roi area: 2: 100%|██████████| 3/3 [00:00<00:00,  6.89it/s]

In [2]:

Copied!

# Cells of the first cluster
pcsp.context[0]["roi_cells"].head()
# Cells of the first cluster
pcsp.context[0]["roi_cells"].head()

Out[2]:

	type	geometry	class_name	global_id
global_id
6631	Feature	POLYGON ((6924.01182 48092.00490, 6924.01182 4...	connective	6631
6630	Feature	POLYGON ((6943.01182 48092.00490, 6943.01182 4...	connective	6630
6638	Feature	POLYGON ((6910.01182 48130.99510, 6912.00490 4...	connective	6638
6651	Feature	POLYGON ((6875.01182 48148.00490, 6875.01188 4...	inflammatory	6651
8364	Feature	POLYGON ((7031.01182 48176.99510, 7032.00490 4...	connective	8364

Plotting the clusters¶

Let's plot the immune cell clusters.

In [3]:

Copied!

pcsp.plot("roi_area", figsize=(12, 6))
pcsp.plot("roi_area", figsize=(12, 6))

Out[3]:

<Axes: >

No description has been provided for this image

In [4]:

Copied!





pcsp.plot(
    "roi_area",
    network_key="roi_network",
    figsize=(12, 6),
    edge_kws={"linewidth": 0.5}
)
pcsp.plot(
    "roi_area",
    network_key="roi_network",
    figsize=(12, 6),
    edge_kws={"linewidth": 0.5}
)

Out[4]:

<Axes: >

In [5]:

Copied!

pcsp.plot("roi_area", grid_key="roi_grid", figsize=(12, 6))
pcsp.plot("roi_area", grid_key="roi_grid", figsize=(12, 6))

Out[5]:

<Axes: >

$No description has been provided for this image$

Downstream Analysis¶

Now that we've fitted the context, we can use the clusters in the context class attribute to compute more cell level features for the clustered cells. Here, we will showcase some lightweight downstream analyses for the clustered cells.

Example 1: Immune Cell Density¶

Now, if we wanted to compute the density of the inflammatory cells within one of the clusters, we could use the roi_grid key of the context and compute the cell count of the inflammatory cells within the grid cells.

In [6]:

Copied!





import geopandas as gpd
import mapclassify

from cellseg_gsontools.grid import grid_classify
from cellseg_gsontools.plotting import plot_all


# helper function to replace legend items
def replace_legend_items(legend, mapping):
    for txt in legend.texts:
        for k, v in mapping.items():
            if txt.get_text() == str(k):
                txt.set_text(v)


# Immune cell cnt heuristic to classify the grid cells into two classes
def get_immune_cell_cnt(gdf: gpd.GeoDataFrame, **kwargs) -> int:
    try:
        cnt = gdf.class_name.value_counts()["inflammatory"]
    except KeyError:
        cnt = 0

    return int(cnt)


# Count the immune cells within the grid cells with the cell cnt heuristic
tumor_grid = grid_classify(
    grid=pcsp.context[0]["roi_grid"],
    objs=pcsp.context[0]["roi_cells"],
    metric_func=get_immune_cell_cnt,
    predicate="intersects",
    new_col_names="immune_cnt",
    parallel=False,
)

# bin the grid cells into two classes ("has immune cells" and "no immune cells")
col = "immune_cnt"
bins = mapclassify.Quantiles(tumor_grid[col], k=10)
tumor_grid["immune_density_level"] = bins.yb

immune_density_plot = plot_all(
    pcsp.context[0]["roi_area"],
    pcsp.context[0]["roi_cells"],
    grid_gdf=tumor_grid.copy(),
    figsize=(10, 10),
    grid_col="immune_density_level",
    grid_cmap="turbo",
    grid_n_bins=bins.k,
    show_legends=True,
)
mapping = dict([(i, s) for i, s in enumerate(bins.get_legend_classes())])
replace_legend_items(immune_density_plot.get_legend(), mapping)
immune_density_plot
import geopandas as gpd
import mapclassify

from cellseg_gsontools.grid import grid_classify
from cellseg_gsontools.plotting import plot_all


# helper function to replace legend items
def replace_legend_items(legend, mapping):
    for txt in legend.texts:
        for k, v in mapping.items():
            if txt.get_text() == str(k):
                txt.set_text(v)


# Immune cell cnt heuristic to classify the grid cells into two classes
def get_immune_cell_cnt(gdf: gpd.GeoDataFrame, **kwargs) -> int:
    try:
        cnt = gdf.class_name.value_counts()["inflammatory"]
    except KeyError:
        cnt = 0

    return int(cnt)


# Count the immune cells within the grid cells with the cell cnt heuristic
tumor_grid = grid_classify(
    grid=pcsp.context[0]["roi_grid"],
    objs=pcsp.context[0]["roi_cells"],
    metric_func=get_immune_cell_cnt,
    predicate="intersects",
    new_col_names="immune_cnt",
    parallel=False,
)

# bin the grid cells into two classes ("has immune cells" and "no immune cells")
col = "immune_cnt"
bins = mapclassify.Quantiles(tumor_grid[col], k=10)
tumor_grid["immune_density_level"] = bins.yb

immune_density_plot = plot_all(
    pcsp.context[0]["roi_area"],
    pcsp.context[0]["roi_cells"],
    grid_gdf=tumor_grid.copy(),
    figsize=(10, 10),
    grid_col="immune_density_level",
    grid_cmap="turbo",
    grid_n_bins=bins.k,
    show_legends=True,
)
mapping = dict([(i, s) for i, s in enumerate(bins.get_legend_classes())])
replace_legend_items(immune_density_plot.get_legend(), mapping)
immune_density_plot

Out[6]:

<Axes: >

We can see that the density of the inflammatory cells is higher in the upper part of the cluster. This could be avoided by adjusting the dbscan parameters (especially min_samples).

Example 2: Comparing Morphological Features of the Immune Cells Between Clusters¶

Next, we will compute some basic morphological metrics for the inflammatory cell shapes in all of the clusters.

In [7]:

Copied!





from cellseg_gsontools.geometry import shape_metric

# first join the cluster labels to the cells
gdf = pcsp.context2gdf("roi_cells")
gdf = gdf.sjoin(
    pcsp.context2gdf("roi_area"), how="inner", predicate="intersects"
)
gdf.rename(columns={
    "class_name_left": "class_name", 
    "label_right": "label",
    "global_id_right": "global_id"
    }, 
    inplace=True
)

immune_cells = gdf.loc[gdf.class_name == "inflammatory"]  # only neoplastic cells

# morphological metrics to compute
metrics = [
    "area",
    "eccentricity",
    "solidity",
    "circularity",
    "sphericity",
    "compactness",
    "elongation",
    "fractal_dimension",
    "shape_index",
]
immune_cells = shape_metric(
    immune_cells,
    metrics=metrics,
    parallel=True,
)

cols = [
    "geometry",
    "global_id",
    "class_name",
    "area",
    "eccentricity",
    "solidity",
    "circularity",
    "sphericity",
    "compactness",
    "elongation",
    "fractal_dimension",
    "shape_index",
    "label" # cluster label
]
immune_cells = immune_cells[cols]
immune_cells.head()
from cellseg_gsontools.geometry import shape_metric

# first join the cluster labels to the cells
gdf = pcsp.context2gdf("roi_cells")
gdf = gdf.sjoin(
    pcsp.context2gdf("roi_area"), how="inner", predicate="intersects"
)
gdf.rename(columns={
    "class_name_left": "class_name", 
    "label_right": "label",
    "global_id_right": "global_id"
    }, 
    inplace=True
)

immune_cells = gdf.loc[gdf.class_name == "inflammatory"]  # only neoplastic cells

# morphological metrics to compute
metrics = [
    "area",
    "eccentricity",
    "solidity",
    "circularity",
    "sphericity",
    "compactness",
    "elongation",
    "fractal_dimension",
    "shape_index",
]
immune_cells = shape_metric(
    immune_cells,
    metrics=metrics,
    parallel=True,
)

cols = [
    "geometry",
    "global_id",
    "class_name",
    "area",
    "eccentricity",
    "solidity",
    "circularity",
    "sphericity",
    "compactness",
    "elongation",
    "fractal_dimension",
    "shape_index",
    "label" # cluster label
]
immune_cells = immune_cells[cols]
immune_cells.head()

Out[7]:

	geometry	class_name	area	eccentricity	solidity	circularity	sphericity	compactness	elongation	fractal_dimension	shape_index
3	POLYGON ((6875.01182 48148.00490, 6875.01188 4...	inflammatory	301.427924	0.575238	0.995752	0.876845	0.559229	0.876266	0.817986	0.575238	0.782543
5	POLYGON ((7097.01182 48173.00490, 7097.01182 4...	inflammatory	364.240603	0.298218	0.994756	0.938544	0.672787	0.936714	0.954498	0.298218	0.853750
13	POLYGON ((6743.01182 48255.00490, 6743.01182 4...	inflammatory	525.026189	0.754449	0.983428	0.835840	0.503512	0.830843	0.656357	0.754449	0.723241
20	POLYGON ((6974.01182 48228.00490, 6974.01188 4...	inflammatory	311.929175	0.635046	0.997471	0.930487	0.612086	0.929395	0.772477	0.635046	0.871300
24	POLYGON ((6983.01182 48284.00490, 6983.01182 4...	inflammatory	314.988442	0.303277	0.971783	0.867348	0.560565	0.858898	0.952894	0.303277	0.801863

Let's Now tidy up the data (convert from wide format to long format) to compare the morphological features between the clusters.

In [8]:

Copied!





m = list(metrics)
tidy = immune_cells.reset_index().set_index("label")
tidy = tidy[m]
tidy = tidy.stack()
tidy = tidy.reset_index()
tidy = tidy.rename(
    columns={"label": "Label", "level_1": "Attribute", 0: "Values"}
)
tidy
m = list(metrics)
tidy = immune_cells.reset_index().set_index("label")
tidy = tidy[m]
tidy = tidy.stack()
tidy = tidy.reset_index()
tidy = tidy.rename(
    columns={"label": "Label", "level_1": "Attribute", 0: "Values"}
)
tidy

Out[8]:

	Label	Attribute	Values
0	0	area	301.427924
1	0	eccentricity	0.575238
2	0	solidity	0.995752
3	0	circularity	0.876845
4	0	sphericity	0.559229
...	...	...	...
19588	2	sphericity	0.549432
19589	2	compactness	0.811198
19590	2	elongation	0.821277
19591	2	fractal_dimension	0.356534
19592	2	shape_index	0.811585

19593 rows × 3 columns

In [9]:

Copied!

# Install seaborn for plotting in the next cell
# !pip install seaborn
# Install seaborn for plotting in the next cell
# !pip install seaborn

In [10]:

Copied!





import seaborn

seaborn.set_style("whitegrid")
seaborn.set(font_scale=1.5)

# Setup the facets
facets = seaborn.FacetGrid(
    data=tidy,
    col="Attribute",
    hue="Label",
    sharey=False,
    sharex=False,
    aspect=2,
    col_wrap=3,
)

# Build the plot from `sns.kdeplot`
kde_ax = facets.map(seaborn.kdeplot, "Values", fill=True).add_legend()
import seaborn

seaborn.set_style("whitegrid")
seaborn.set(font_scale=1.5)

# Setup the facets
facets = seaborn.FacetGrid(
    data=tidy,
    col="Attribute",
    hue="Label",
    sharey=False,
    sharex=False,
    aspect=2,
    col_wrap=3,
)

# Build the plot from `sns.kdeplot`
kde_ax = facets.map(seaborn.kdeplot, "Values", fill=True).add_legend()

We can see that the morphological features are distributed quite similarly between the clusters. The only feature that seems to be different is the area feature. This is likely due to the different sizes and the number of immune cells contained in the clusters.