Cell/Nuclei Neighborhoods¶
Cell neighborhoods are a useful concept in spatial analysis. They are defined as the set of cells that are adjacent to a given cell. In network science terms, the cells are nodes and adjacent cells are nodes that are connected to the given cell via an edge. The edges between two cells are determined by a connectivity graph algorithm. There are several algorithms to do this and cellseg_gsontools
provides a few of them through the libpysal
package. In our (spatial single cell analysis) domain, there is usually some sort of distance criteria that determines if there will be an edge between two cells. For example, if the distance between two cells is less than 50 microns, then there will be an edge between them.
This notebook is a quick tutorial on how to extract the cell neighborhoods from your data with cellseg_gsontools
and how to use them to get other related neighborhood attributes.
To get the neighborhoods neatly extracted in your dataframe, we will be utilizing the gdf_apply
and cellseg_gsontools.neighbors
module. The neighbors
module provides functions that operate on a contiuity graph and a GeoDataFrame
. These functions are designed to be used together with the gdf_apply
function which support parallelized operations on a GeoDataFrame
.
The Data¶
We'll be taking a look at some glandular epithelial cells and their shapes espically. cellseg_gsontools
provides a small example segmentation mask that contains a lot glandular epithelial cells.
from cellseg_gsontools.data import gland_cells
gc = gland_cells()
gc.plot(column="class_name", figsize=(10,10), legend=True)
<Axes: >
gc
type | geometry | class_name | |
---|---|---|---|
0 | Feature | POLYGON ((92.000 1083.984, 95.990 1079.993, 99... | glandular_epithel |
1 | Feature | POLYGON ((61.005 1085.988, 64.755 1086.000, 68... | glandular_epithel |
2 | Feature | POLYGON ((138.997 1093.988, 142.992 1092.988, ... | glandular_epithel |
3 | Feature | POLYGON ((109.005 1092.988, 115.995 1092.988, ... | glandular_epithel |
4 | Feature | POLYGON ((69.012 1091.997, 70.011 1094.994, 72... | glandular_epithel |
... | ... | ... | ... |
1354 | Feature | POLYGON ((1932.004 1671.988, 1936.755 1672.000... | glandular_epithel |
1355 | Feature | POLYGON ((1944.005 1691.988, 1951.996 1691.988... | glandular_epithel |
1356 | Feature | POLYGON ((1981.005 1723.988, 1985.996 1723.988... | glandular_epithel |
1357 | Feature | POLYGON ((1962.007 1726.990, 1965.005 1728.988... | glandular_epithel |
1358 | Feature | POLYGON ((1991.012 1739.997, 1992.011 1742.994... | glandular_epithel |
1359 rows × 3 columns
Spatial Weights¶
First, we need to fit a connectivity graph to the GeoDataFrame
to get the neighborhoods of the cells (sometimes it is called spatial weights in geospatial analysis jargon). cellseg_gsontools
provides a fit_graph
function which can be used to do that. The actual fitting is done with the libpysal
package and the fit_graph
-function is basically a wrapper around different graph fitting methods. The allowed spatial weights are:
knn
: k-nearest neighborsdelaunay
- Delaunay triangulationdistband
- Distance band i.e. a distance thresholded knn graphrelative_nhood
- Relative neighborhood graph
We will be using the delaunay
method in this example, however, note that for large data the delaunay
method can get quite slow and for example the distband
method is a lot faster. Here, we will set a distance threshold for the neighbors to be within 50 microns of the cell centroid. The distance unit in the example data is in pixels so 50 microns in pixels of 20x magnified segmentation mask is around 50*2 = 100 pixels.
from cellseg_gsontools.graphs import fit_graph
from cellseg_gsontools.utils import set_uid
# To fit the delaunay graph, we need to set a unique id for each cell first
gc = set_uid(gc, id_col="uid")
w = fit_graph(gc, type="delaunay", thresh=100, id_col="uid")
w
<libpysal.weights.weights.W at 0x7f4268c1abf0>
# let's convert the graph to a dataframe and plot it
from cellseg_gsontools.links import weights2gdf
wdf = weights2gdf(gc, w)
ax = gc.plot(column="class_name", figsize=(10,10), legend=True)
ax = wdf.plot(
ax=ax,
linewidth=0.5,
column="class_name",
cmap="Set1_r",
legend=True,
legend_kwds={
"loc": "center left",
"bbox_to_anchor": (1.0, 0.91)
}
)
ax.set_title("Delaunay Graph Fitted on the Cells")
Text(0.5, 1.0, 'Delaunay Graph Fitted on the Cells')
Extracting the Neighborhood Indices¶
So now we have the connectivity graph (libpysal.weights.W
) and we can get down to business. That is, extracting the neighborhoods and their attributes. Let's start by extracting just the neighbor indices.
from cellseg_gsontools.apply import gdf_apply
from cellseg_gsontools.neighbors import neighborhood
from functools import partial
# Get the neihgboring nodes of the graph
func = partial(neighborhood, spatial_weights=w)
gc["nhood"] = gdf_apply(gc, func, columns=["uid"])
gc.head(5)
type | geometry | class_name | uid | nhood | |
---|---|---|---|---|---|
uid | |||||
0 | Feature | POLYGON ((92.000 1083.984, 95.990 1079.993, 99... | glandular_epithel | 0 | [0, 1, 3, 4, 483, 484] |
1 | Feature | POLYGON ((61.005 1085.988, 64.755 1086.000, 68... | glandular_epithel | 1 | [1, 0, 4, 482, 483, 487] |
2 | Feature | POLYGON ((138.997 1093.988, 142.992 1092.988, ... | glandular_epithel | 2 | [2, 3, 5, 6, 484, 493] |
3 | Feature | POLYGON ((109.005 1092.988, 115.995 1092.988, ... | glandular_epithel | 3 | [3, 0, 2, 4, 5, 7, 484] |
4 | Feature | POLYGON ((69.012 1091.997, 70.011 1094.994, 72... | glandular_epithel | 4 | [4, 0, 1, 3, 7, 487, 488] |
Easy!, Now we have the neighbor indices neatly in our dataframe. Let's then extract some attributes of the neighbors. We will extract the areas and class names of the neighbors.
from cellseg_gsontools.neighbors import nhood_vals
# # compute the areas
# gc["area"] = gc.area
# get the class values of the neighbors
func = partial(nhood_vals, values=gc.class_name)
gc["neighbor_classes"] = gdf_apply(
gc,
func=func,
parallel=True,
columns=["nhood"],
)
# get the area values of the neighbors
func = partial(nhood_vals, values=gc.area.round(2))
gc["neighbor_areas"] = gdf_apply(
gc,
func=func,
parallel=True,
columns=["nhood"],
)
gc.head(5)
type | geometry | class_name | uid | nhood | neighbor_classes | neighbor_areas | |
---|---|---|---|---|---|---|---|
uid | |||||||
0 | Feature | POLYGON ((92.000 1083.984, 95.990 1079.993, 99... | glandular_epithel | 0 | [0, 1, 3, 4, 483, 484] | [glandular_epithel, glandular_epithel, glandul... | [520.24, 565.58, 435.91, 302.26, 241.85, 418.02] |
1 | Feature | POLYGON ((61.005 1085.988, 64.755 1086.000, 68... | glandular_epithel | 1 | [1, 0, 4, 482, 483, 487] | [glandular_epithel, glandular_epithel, glandul... | [565.58, 520.24, 302.26, 318.15, 241.85, 485.71] |
2 | Feature | POLYGON ((138.997 1093.988, 142.992 1092.988, ... | glandular_epithel | 2 | [2, 3, 5, 6, 484, 493] | [glandular_epithel, glandular_epithel, glandul... | [721.5, 435.91, 556.05, 466.96, 418.02, 678.35] |
3 | Feature | POLYGON ((109.005 1092.988, 115.995 1092.988, ... | glandular_epithel | 3 | [3, 0, 2, 4, 5, 7, 484] | [glandular_epithel, glandular_epithel, glandul... | [435.91, 520.24, 721.5, 302.26, 556.05, 655.42... |
4 | Feature | POLYGON ((69.012 1091.997, 70.011 1094.994, 72... | glandular_epithel | 4 | [4, 0, 1, 3, 7, 487, 488] | [glandular_epithel, glandular_epithel, glandul... | [302.26, 520.24, 565.58, 435.91, 655.42, 485.7... |
That was easy! Now we have the neighbor areas and class names neatly in our dataframe. Let's then extract the cell counts of glandular epithelial cells in each neighborhood.
from cellseg_gsontools.neighbors import nhood_type_count
func = partial(nhood_type_count, cls="glandular_epithel", frac=False)
gc["n_gland_neighbors"] = gdf_apply(
gc,
func=func,
parallel=True,
columns=["neighbor_classes"],
)
gc.head(5)
type | geometry | class_name | uid | nhood | neighbor_classes | neighbor_areas | n_gland_neighbors | |
---|---|---|---|---|---|---|---|---|
uid | ||||||||
0 | Feature | POLYGON ((92.000 1083.984, 95.990 1079.993, 99... | glandular_epithel | 0 | [0, 1, 3, 4, 483, 484] | [glandular_epithel, glandular_epithel, glandul... | [520.24, 565.58, 435.91, 302.26, 241.85, 418.02] | 6.0 |
1 | Feature | POLYGON ((61.005 1085.988, 64.755 1086.000, 68... | glandular_epithel | 1 | [1, 0, 4, 482, 483, 487] | [glandular_epithel, glandular_epithel, glandul... | [565.58, 520.24, 302.26, 318.15, 241.85, 485.71] | 6.0 |
2 | Feature | POLYGON ((138.997 1093.988, 142.992 1092.988, ... | glandular_epithel | 2 | [2, 3, 5, 6, 484, 493] | [glandular_epithel, glandular_epithel, glandul... | [721.5, 435.91, 556.05, 466.96, 418.02, 678.35] | 6.0 |
3 | Feature | POLYGON ((109.005 1092.988, 115.995 1092.988, ... | glandular_epithel | 3 | [3, 0, 2, 4, 5, 7, 484] | [glandular_epithel, glandular_epithel, glandul... | [435.91, 520.24, 721.5, 302.26, 556.05, 655.42... | 7.0 |
4 | Feature | POLYGON ((69.012 1091.997, 70.011 1094.994, 72... | glandular_epithel | 4 | [4, 0, 1, 3, 7, 487, 488] | [glandular_epithel, glandular_epithel, glandul... | [302.26, 520.24, 565.58, 435.91, 655.42, 485.7... | 7.0 |
Again, easy! Next, we'll bin the cell areas and get the counts of cells in each area bin.
import mapclassify
from cellseg_gsontools.neighbors import nhood_counts
bins = mapclassify.Quantiles(gc.area, k=5)
func = partial(nhood_counts, values=gc.area, bins=bins.bins)
gc["area_bins"] = gdf_apply(
gc,
func,
columns=["nhood"],
)
gc.head(5)
type | geometry | class_name | uid | nhood | neighbor_classes | neighbor_areas | n_gland_neighbors | area_bins | |
---|---|---|---|---|---|---|---|---|---|
uid | |||||||||
0 | Feature | POLYGON ((92.000 1083.984, 95.990 1079.993, 99... | glandular_epithel | 0 | [0, 1, 3, 4, 483, 484] | [glandular_epithel, glandular_epithel, glandul... | [520.24, 565.58, 435.91, 302.26, 241.85, 418.02] | 6.0 | [0, 2, 0, 3, 1] |
1 | Feature | POLYGON ((61.005 1085.988, 64.755 1086.000, 68... | glandular_epithel | 1 | [1, 0, 4, 482, 483, 487] | [glandular_epithel, glandular_epithel, glandul... | [565.58, 520.24, 302.26, 318.15, 241.85, 485.71] | 6.0 | [0, 2, 1, 2, 1] |
2 | Feature | POLYGON ((138.997 1093.988, 142.992 1092.988, ... | glandular_epithel | 2 | [2, 3, 5, 6, 484, 493] | [glandular_epithel, glandular_epithel, glandul... | [721.5, 435.91, 556.05, 466.96, 418.02, 678.35] | 6.0 | [0, 0, 0, 3, 3] |
3 | Feature | POLYGON ((109.005 1092.988, 115.995 1092.988, ... | glandular_epithel | 3 | [3, 0, 2, 4, 5, 7, 484] | [glandular_epithel, glandular_epithel, glandul... | [435.91, 520.24, 721.5, 302.26, 556.05, 655.42... | 7.0 | [0, 1, 0, 3, 3] |
4 | Feature | POLYGON ((69.012 1091.997, 70.011 1094.994, 72... | glandular_epithel | 4 | [4, 0, 1, 3, 7, 487, 488] | [glandular_epithel, glandular_epithel, glandul... | [302.26, 520.24, 565.58, 435.91, 655.42, 485.7... | 7.0 | [0, 1, 0, 3, 3] |
Nice! Now we have the counts of cells in each area bin. Finally, let's get the neighborhood distances from the cell centroid for each cell.
from cellseg_gsontools.neighbors import nhood_dists
func = partial(nhood_dists, centroids=gc.centroid)
gc["nhood_dists"] = gdf_apply(
gc,
func,
columns=["nhood"],
)
gc.head(5)
type | geometry | class_name | uid | nhood | neighbor_classes | neighbor_areas | n_gland_neighbors | area_bins | nhood_dists | |
---|---|---|---|---|---|---|---|---|---|---|
uid | ||||||||||
0 | Feature | POLYGON ((92.000 1083.984, 95.990 1079.993, 99... | glandular_epithel | 0 | [0, 1, 3, 4, 483, 484] | [glandular_epithel, glandular_epithel, glandul... | [520.24, 565.58, 435.91, 302.26, 241.85, 418.02] | 6.0 | [0, 2, 0, 3, 1] | [0.0, 26.675, 24.786, 30.068, 30.228, 41.284] |
1 | Feature | POLYGON ((61.005 1085.988, 64.755 1086.000, 68... | glandular_epithel | 1 | [1, 0, 4, 482, 483, 487] | [glandular_epithel, glandular_epithel, glandul... | [565.58, 520.24, 302.26, 318.15, 241.85, 485.71] | 6.0 | [0, 2, 1, 2, 1] | [0.0, 26.675, 23.428, 42.962, 39.039, 23.949] |
2 | Feature | POLYGON ((138.997 1093.988, 142.992 1092.988, ... | glandular_epithel | 2 | [2, 3, 5, 6, 484, 493] | [glandular_epithel, glandular_epithel, glandul... | [721.5, 435.91, 556.05, 466.96, 418.02, 678.35] | 6.0 | [0, 0, 0, 3, 3] | [0.0, 25.577, 39.348, 46.097, 34.309, 29.478] |
3 | Feature | POLYGON ((109.005 1092.988, 115.995 1092.988, ... | glandular_epithel | 3 | [3, 0, 2, 4, 5, 7, 484] | [glandular_epithel, glandular_epithel, glandul... | [435.91, 520.24, 721.5, 302.26, 556.05, 655.42... | 7.0 | [0, 1, 0, 3, 3] | [0.0, 24.786, 25.577, 39.574, 37.829, 47.16, 3... |
4 | Feature | POLYGON ((69.012 1091.997, 70.011 1094.994, 72... | glandular_epithel | 4 | [4, 0, 1, 3, 7, 487, 488] | [glandular_epithel, glandular_epithel, glandul... | [302.26, 520.24, 565.58, 435.91, 655.42, 485.7... | 7.0 | [0, 1, 0, 3, 3] | [0.0, 30.068, 23.428, 39.574, 29.225, 36.337, ... |
And now we have the neighborhood distances as well!
All of these kinds of neighborhood arrays that were computed in this noteook are used in many different spatial metrics. We will be using them exhaustively in the coming notebooks as well.