Point Cluster Context¶
The PointClusterContext class is used to cluster cells of a given type and to delineate the cells belonging to each cluster. Here we will use it to cluster immune cells to get a sense of how they are clustered within the stroma.
After fitting the context, the PointClusterContext.context attribute will contain a nested dict where each value is a dict containing the following key-value pairs:
roi_area - the alpha shape of the cluster (gpd.GeoDataFrame)
roi_cells - the cells inside the alpha shape of the cluster (gpd.GeoDataFrame)
roi_grid - the grid fitted on top of the cluster area (gpd.GeoDataFrame)
roi_network - the network fitted on the roi_cells (libpysal.weights.W)
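As a rough sketch of how such a nested dict can be traversed, the snippet below loops over a stand-in dictionary and reports how many entries sit under each key. Both `mock_context` and the `summarize_context` helper are hypothetical: they only mimic the layout described above with plain lists, and `roi_network` is left out because it is a graph object rather than a table. With a fitted object one would pass its `context` attribute instead, assuming it behaves like a regular dict keyed by cluster index.

```python
def summarize_context(context, keys=("roi_area", "roi_cells", "roi_grid")):
    """Return the number of entries stored under each requested key, per cluster."""
    summary = {}
    for cluster_idx, cluster in context.items():
        # len() gives the list length here; for a GeoDataFrame it gives the row count
        summary[cluster_idx] = {key: len(cluster[key]) for key in keys}
    return summary


# Hypothetical stand-in for a fitted context: plain lists instead of GeoDataFrames
mock_context = {
    0: {"roi_area": ["shape"], "roi_cells": ["c1", "c2", "c3"], "roi_grid": ["g1", "g2"]},
    1: {"roi_area": ["shape"], "roi_cells": ["c4"], "roi_grid": ["g3"]},
}

summary = summarize_context(mock_context)
print(summary[0]["roi_cells"])  # 3
```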
Fitting the Context¶
from cellseg_gsontools.data import cervix_cells
from cellseg_gsontools.spatial_context import PointClusterContext

# Create a PointClusterContext object
pcsp = PointClusterContext(
    cell_gdf=cervix_cells(),
    labels="inflammatory",  # Cluster the inflammatory cells
    cluster_method="dbscan",  # Use DBSCAN for clustering
    min_area_size=50000,  # Discard clusters that are smaller than this
    parallel=False,  # Whether to run in parallel
    num_processes=1,  # Number of processes to use
    graph_type="distband",  # Use a distance band graph (cells within dist_thresh are linked)
    dist_thresh=75,  # Distance threshold for the graph (in pixels)
    backend="geopandas",  # Use geopandas for fitting
    grid_type="hex",  # Use a hexagonal grid
    resolution=10,  # Resolution of the grid
    min_samples=60,  # Minimum number of samples for DBSCAN
)
pcsp.fit(fit_grid=True, fit_graph=True)
Processing roi area: 2: 100%|██████████| 3/3 [00:00<00:00, 6.89it/s]
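To make the graph_type="distband" and dist_thresh settings above more concrete, here is a minimal pure-Python sketch of a distance band graph: two points are linked whenever they lie within the threshold distance of each other. The `distance_band_neighbors` function and the toy coordinates are hypothetical illustrations and not part of cellseg_gsontools, whose fitted networks are libpysal.weights.W objects.

```python
from math import dist


def distance_band_neighbors(points, threshold):
    """Map each point index to the indices of all other points within `threshold`."""
    neighbors = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= threshold:
                # Links are symmetric, so record the edge in both directions
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


# Toy centroids in pixel coordinates (made up for illustration)
centroids = [(0, 0), (50, 0), (120, 0), (400, 400)]
graph = distance_band_neighbors(centroids, threshold=75)
print(graph)  # {0: [1], 1: [0, 2], 2: [1], 3: []}
```

The last point has no neighbors because nothing lies within 75 pixels of it. A larger threshold yields a denser graph, a smaller one yields more isolated cells.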
# Cells of the first cluster
pcsp.context[0]["roi_cells"].head()
global_id (index) | type | geometry | class_name | global_id |
---|---|---|---|---|
6631 | Feature | POLYGON ((6924.01182 48092.00490, 6924.01182 4... | connective | 6631 |
6630 | Feature | POLYGON ((6943.01182 48092.00490, 6943.01182 4... | connective | 6630 |
6638 | Feature | POLYGON ((6910.01182 48130.99510, 6912.00490 4... | connective | 6638 |
6651 | Feature | POLYGON ((6875.01182 48148.00490, 6875.01188 4... | inflammatory | 6651 |
8364 | Feature | POLYGON ((7031.01182 48176.99510, 7032.00490 4... | connective | 8364 |
Plotting the clusters¶
Let's plot the immune cell clusters.
pcsp.plot("roi_area", figsize=(12, 6))
<Axes: >
pcsp.plot(
    "roi_area",
    network_key="roi_network",
    figsize=(12, 6),
    edge_kws={"linewidth": 0.5},
)
<Axes: >
pcsp.plot("roi_area", grid_key="roi_grid", figsize=(12, 6))
<Axes: >
Downstream Analysis¶
Now that we've fitted the context, we can use the clusters in the context attribute to compute more cell-level features for the clustered cells. Here, we will showcase some lightweight downstream analyses for the clustered cells.
Example 1: Immune Cell Density¶
To compute the density of the inflammatory cells within one of the clusters, we can use the roi_grid key of the context and count the inflammatory cells within each grid cell.
import geopandas as gpd
import mapclassify
from cellseg_gsontools.grid import grid_classify
from cellseg_gsontools.plotting import plot_all

# helper function to replace legend items
def replace_legend_items(legend, mapping):
    for txt in legend.texts:
        for k, v in mapping.items():
            if txt.get_text() == str(k):
                txt.set_text(v)


# Heuristic: count the inflammatory cells among the cells of one grid cell
def get_immune_cell_cnt(gdf: gpd.GeoDataFrame, **kwargs) -> int:
    try:
        cnt = gdf.class_name.value_counts()["inflammatory"]
    except KeyError:
        # no inflammatory cells in this grid cell
        cnt = 0
    return int(cnt)

# Count the immune cells within the grid cells with the cell cnt heuristic
tumor_grid = grid_classify(
    grid=pcsp.context[0]["roi_grid"],
    objs=pcsp.context[0]["roi_cells"],
    metric_func=get_immune_cell_cnt,
    predicate="intersects",
    new_col_names="immune_cnt",
    parallel=False,
)
# bin the grid cells into (up to) 10 quantile classes by immune cell count
col = "immune_cnt"
bins = mapclassify.Quantiles(tumor_grid[col], k=10)
tumor_grid["immune_density_level"] = bins.yb

immune_density_plot = plot_all(
    pcsp.context[0]["roi_area"],
    pcsp.context[0]["roi_cells"],
    grid_gdf=tumor_grid.copy(),
    figsize=(10, 10),
    grid_col="immune_density_level",
    grid_cmap="turbo",
    grid_n_bins=bins.k,
    show_legends=True,
)

# replace the numeric bin labels in the legend with the bin intervals
mapping = dict(enumerate(bins.get_legend_classes()))
replace_legend_items(immune_density_plot.get_legend(), mapping)
immune_density_plot
<Axes: >
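The two steps behind this plot, counting cells per grid cell and binning the counts into quantile classes, can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not what grid_classify or mapclassify.Quantiles do internally: a square grid stands in for the hexagonal one, cells are reduced to made-up centroid points, and the `count_per_square` and `quantile_bin` helpers are hypothetical names.

```python
from bisect import bisect_left
from collections import Counter
from statistics import quantiles


def count_per_square(points, cell_size):
    """Count points per square grid cell, keyed by (column, row) index."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)


def quantile_bin(values, k):
    """Assign each value to one of k quantile bins (0 = lowest)."""
    cuts = quantiles(values, n=k, method="inclusive")  # k - 1 cut points
    # A value equal to a cut point falls into the lower bin
    return [bisect_left(cuts, v) for v in values]


# Made-up centroids of inflammatory cells
points = [(5, 5), (7, 8), (15, 5), (25, 25), (26, 27), (28, 21)]
counts = count_per_square(points, cell_size=10)
print(counts[(0, 0)], counts[(1, 0)], counts[(2, 2)])  # 2 1 3

# Made-up per-grid-cell counts, split into four quantile classes
levels = quantile_bin([0, 1, 2, 3, 4, 5, 6, 7], k=4)
print(levels)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

When many grid cells share the same count, several cut points coincide and some bins stay empty in this sketch, which is one reason a legend can show fewer distinct classes than the requested number.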
We can see that the density of the inflammatory cells is higher in the upper part of the cluster. The sparser regions could be left out of the cluster by adjusting the DBSCAN parameters (especially min_samples).
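To illustrate why min_samples matters, the following is a brute-force sketch of the DBSCAN idea in plain Python. It is a teaching aid under stated assumptions, not the implementation the library relies on: the `dbscan` function, the `eps` radius, and the toy points are all hypothetical, and a point counts itself among its neighbors.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id, or -1 for noise (brute-force DBSCAN)."""

    def region(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# A dense blob of five points and a sparser group of three (made up)
toy_points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10), (11, 10), (10, 11)]
loose = dbscan(toy_points, eps=1.5, min_samples=3)
strict = dbscan(toy_points, eps=1.5, min_samples=5)
print(loose)   # [0, 0, 0, 0, 0, 1, 1, 1]
print(strict)  # [0, 0, 0, 0, 0, -1, -1, -1]
```

Raising min_samples from 3 to 5 turns the sparse group into noise while the dense blob survives, which is the same lever that can be used to restrict the immune clusters to their densest parts.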
Example 2: Comparing Morphological Features of the Immune Cells Between Clusters¶
Next, we will compute some basic morphological metrics for the inflammatory cell shapes in all of the clusters.
from cellseg_gsontools.geometry import shape_metric
# first join the cluster labels to the cells
gdf = pcsp.context2gdf("roi_cells")
gdf = gdf.sjoin(
    pcsp.context2gdf("roi_area"), how="inner", predicate="intersects"
)
gdf.rename(
    columns={
        "class_name_left": "class_name",
        "label_right": "label",
        "global_id_right": "global_id",
    },
    inplace=True,
)
immune_cells = gdf.loc[gdf.class_name == "inflammatory"]  # keep only the inflammatory cells
# morphological metrics to compute
metrics = [
    "area",
    "eccentricity",
    "solidity",
    "circularity",
    "sphericity",
    "compactness",
    "elongation",
    "fractal_dimension",
    "shape_index",
]
immune_cells = shape_metric(
    immune_cells,
    metrics=metrics,
    parallel=True,
)
cols = [
    "geometry",
    "global_id",
    "class_name",
    "area",
    "eccentricity",
    "solidity",
    "circularity",
    "sphericity",
    "compactness",
    "elongation",
    "fractal_dimension",
    "shape_index",
    "label",  # cluster label
]
immune_cells = immune_cells[cols]
immune_cells.head()
index | geometry | global_id | class_name | area | eccentricity | solidity | circularity | sphericity | compactness | elongation | fractal_dimension | shape_index | label |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | POLYGON ((6875.01182 48148.00490, 6875.01188 4... | 0 | inflammatory | 301.427924 | 0.575238 | 0.995752 | 0.876845 | 0.559229 | 0.876266 | 0.817986 | 0.575238 | 0.782543 | 0 |
5 | POLYGON ((7097.01182 48173.00490, 7097.01182 4... | 0 | inflammatory | 364.240603 | 0.298218 | 0.994756 | 0.938544 | 0.672787 | 0.936714 | 0.954498 | 0.298218 | 0.853750 | 0 |
13 | POLYGON ((6743.01182 48255.00490, 6743.01182 4... | 0 | inflammatory | 525.026189 | 0.754449 | 0.983428 | 0.835840 | 0.503512 | 0.830843 | 0.656357 | 0.754449 | 0.723241 | 0 |
20 | POLYGON ((6974.01182 48228.00490, 6974.01188 4... | 0 | inflammatory | 311.929175 | 0.635046 | 0.997471 | 0.930487 | 0.612086 | 0.929395 | 0.772477 | 0.635046 | 0.871300 | 0 |
24 | POLYGON ((6983.01182 48284.00490, 6983.01182 4... | 0 | inflammatory | 314.988442 | 0.303277 | 0.971783 | 0.867348 | 0.560565 | 0.858898 | 0.952894 | 0.303277 | 0.801863 | 0 |
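To give a feel for what shape descriptors like those in the table above measure, here is a small standard-library sketch of one widely used roundness measure, the isoperimetric quotient 4πA/P². It is offered as an illustration only: the function names are hypothetical, and the definitions that shape_metric uses for its own metrics (for example circularity or compactness) may differ from this one.

```python
from math import dist, pi


def polygon_area(coords):
    """Area of a simple polygon via the shoelace formula."""
    n = len(coords)
    twice_area = sum(
        coords[i][0] * coords[(i + 1) % n][1] - coords[(i + 1) % n][0] * coords[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2


def polygon_perimeter(coords):
    """Sum of the edge lengths, closing the ring back to the first vertex."""
    n = len(coords)
    return sum(dist(coords[i], coords[(i + 1) % n]) for i in range(n))


def isoperimetric_quotient(coords):
    """4 * pi * area / perimeter**2: approaches 1 for a circle, smaller for elongated shapes."""
    return 4 * pi * polygon_area(coords) / polygon_perimeter(coords) ** 2


# Two made-up outlines with the same area but very different roundness
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
sliver = [(0, 0), (8, 0), (8, 0.5), (0, 0.5)]
print(polygon_area(square), polygon_area(sliver))  # 4.0 4.0
print(isoperimetric_quotient(square) > isoperimetric_quotient(sliver))  # True
```

Both outlines cover the same area, yet the elongated one scores far lower, which is why such ratios are useful for telling round cells from stretched ones independently of their size.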
Let's now tidy up the data (convert it from wide format to long format) to compare the morphological features between the clusters.
m = list(metrics)
tidy = immune_cells.reset_index().set_index("label")
tidy = tidy[m]
tidy = tidy.stack()
tidy = tidy.reset_index()
tidy = tidy.rename(
    columns={"label": "Label", "level_1": "Attribute", 0: "Values"}
)
tidy
index | Label | Attribute | Values |
---|---|---|---|
0 | 0 | area | 301.427924 |
1 | 0 | eccentricity | 0.575238 |
2 | 0 | solidity | 0.995752 |
3 | 0 | circularity | 0.876845 |
4 | 0 | sphericity | 0.559229 |
... | ... | ... | ... |
19588 | 2 | sphericity | 0.549432 |
19589 | 2 | compactness | 0.811198 |
19590 | 2 | elongation | 0.821277 |
19591 | 2 | fractal_dimension | 0.356534 |
19592 | 2 | shape_index | 0.811585 |
19593 rows × 3 columns
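For readers less familiar with the wide-to-long reshaping, the idea can be shown without pandas: every cell contributes one row per metric. The `wide_to_long` helper and the two records below are hypothetical and made up for illustration. The 19593 rows above are consistent with 2177 cells times 9 metrics, assuming no values were dropped during stacking.

```python
def wide_to_long(rows, id_key, value_keys):
    """Turn one record per cell into one record per (cell, attribute) pair."""
    return [
        {"Label": row[id_key], "Attribute": key, "Values": row[key]}
        for row in rows
        for key in value_keys
    ]


# Two made-up cells in wide format: one column per metric
wide = [
    {"label": 0, "area": 300.0, "solidity": 0.99},
    {"label": 2, "area": 410.0, "solidity": 0.97},
]
long_rows = wide_to_long(wide, id_key="label", value_keys=["area", "solidity"])
print(len(long_rows))  # 4
print(long_rows[0])  # {'Label': 0, 'Attribute': 'area', 'Values': 300.0}
```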
# Install seaborn for plotting in the next cell
# !pip install seaborn
import seaborn

# set() also resets the style, so choose the whitegrid style afterwards
seaborn.set(font_scale=1.5)
seaborn.set_style("whitegrid")

# Setup the facets
facets = seaborn.FacetGrid(
    data=tidy,
    col="Attribute",
    hue="Label",
    sharey=False,
    sharex=False,
    aspect=2,
    col_wrap=3,
)
# Build the plot from `seaborn.kdeplot`
kde_ax = facets.map(seaborn.kdeplot, "Values", fill=True).add_legend()
We can see that the morphological features are distributed quite similarly between the clusters. The only feature that seems to differ is area. This is likely due to the different sizes of the clusters and the different numbers of immune cells they contain.
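A visual impression like this can be backed with a simple per-cluster summary. The sketch below computes the median of one attribute per cluster label from long-format records. The `median_by_label` helper and the toy numbers are hypothetical and do not come from the dataset. With the real data, the records could be obtained with tidy.to_dict("records").

```python
from collections import defaultdict
from statistics import median


def median_by_label(records, attribute):
    """Median of one attribute for each cluster label in long-format records."""
    grouped = defaultdict(list)
    for rec in records:
        if rec["Attribute"] == attribute:
            grouped[rec["Label"]].append(rec["Values"])
    return {label: median(values) for label, values in grouped.items()}


# Made-up long-format records (same column names as the tidy table)
toy = [
    {"Label": 0, "Attribute": "area", "Values": 300.0},
    {"Label": 0, "Attribute": "area", "Values": 360.0},
    {"Label": 0, "Attribute": "area", "Values": 520.0},
    {"Label": 2, "Attribute": "area", "Values": 250.0},
    {"Label": 2, "Attribute": "area", "Values": 270.0},
    {"Label": 0, "Attribute": "solidity", "Values": 0.99},
]
area_medians = median_by_label(toy, "area")
print(area_medians)  # {0: 360.0, 2: 260.0}
```

The median is used here rather than the mean because cell areas tend to be skewed, so a handful of very large cells would otherwise dominate the comparison.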