cluster_cells

cellseg_gsontools.clustering.cluster_cells(cells, cell_type='inflammatory', graph_type='distband', dist_thresh=100, min_size=10, seed=42, spatial_weights=None)

Cluster the cells of the given type.

Uses Local Moran analysis to find the LISA clusters of the cells.

Note

LISA is short for local indicator of spatial association. For background, see, for example, https://geodacenter.github.io/workbook/6a_local_auto/lab6a.html#lisa-principle. The LISA clusters are calculated with a local Moran analysis, and the cluster labels are HH, LL, LH, and HL (a high or low value surrounded by high or low neighboring values).
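
To make the four labels concrete, the following is a minimal pure-Python sketch of how a Moran scatterplot quadrant can be assigned: each value and the mean of its neighbors' values (the spatial lag) are compared with the global mean. The helper `moran_quadrants` and the toy data are illustrative assumptions, not part of cellseg_gsontools, and the sketch leaves out the permutation-based significance test that a full local Moran analysis performs.

```python
from statistics import mean


def moran_quadrants(values, neighbors):
    """Assign HH/LL/LH/HL per observation (hypothetical helper, not library code)."""
    mu = mean(values)
    labels = []
    for i, v in enumerate(values):
        nbrs = neighbors.get(i, [])
        # Spatial lag: mean of the neighbor values (row-standardized weights).
        lag = mean([values[j] for j in nbrs]) if nbrs else mu
        own = "H" if v > mu else "L"        # the observation itself
        around = "H" if lag > mu else "L"   # its neighborhood
        labels.append(own + around)
    return labels


# Toy data: three separate pairs of neighboring observations.
values = [9.0, 8.0, 1.0, 2.0, 9.0, 1.0]
neighbors = {0: [1], 1: [0], 2: [3], 3: [2], 4: [5], 5: [4]}
labels = moran_quadrants(values, neighbors)
print(labels)
```

The first letter describes the observation and the second its neighborhood, so HH marks a high value among high neighbors.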

Note

In this function, the local statistic used to form the clusters is the fraction of objects of type cell_type in the neighborhood times the absolute number of objects of type cell_type in the neighborhood. Because the local Moran analysis draws random permutations, the clustering results may vary marginally between runs when a different seed is used.
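
The local statistic described in this note can be sketched as follows. This is an illustration under stated assumptions: the helper `neighborhood_statistic` is hypothetical, the neighborhood is taken to exclude the cell itself, and the library's actual computation may differ in such details.

```python
def neighborhood_statistic(class_names, neighbors, cell_type):
    """Fraction times count of `cell_type` cells in each neighborhood.

    Hypothetical helper; assumes a neighborhood excludes the cell itself.
    """
    stats = []
    for i in range(len(class_names)):
        nbrs = neighbors.get(i, [])
        count = sum(1 for j in nbrs if class_names[j] == cell_type)
        fraction = count / len(nbrs) if nbrs else 0.0
        stats.append(fraction * count)
    return stats


# Toy data: four cells and their neighbor lists.
class_names = ["inflammatory", "inflammatory", "stromal", "inflammatory"]
neighbors = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
stats = neighbornood_stats = neighborhood_statistic(
    class_names, neighbors, cell_type="inflammatory"
)
print(stats)
```

Multiplying the fraction by the count favors neighborhoods that are both dominated by the cell type and densely populated with it.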

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| cells | GeoDataFrame | The GeoDataFrame with the cells. | required |
| cell_type | str | The class name of the cells to cluster. | 'inflammatory' |
| graph_type | str | The type of graph to fit. Options are "delaunay", "knn" and "distband". | 'distband' |
| dist_thresh | int | The distance threshold to use for the graph. | 100 |
| min_size | int | The minimum size of the cluster to assign a label. | 10 |
| seed | int | The random seed to use in the Moran_Local analysis. | 42 |
| spatial_weights | W | The spatial weights object to use in the analysis. If None, the spatial weights are calculated. | None |
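
As a rough illustration of what a distance-band graph is, the sketch below links two points whenever they lie within `dist_thresh` of each other. The helper `distance_band_neighbors` is hypothetical, and the use of cell centroids as the points is an assumption; the library builds its graph with its own routines, which are not shown on this page.

```python
from math import dist


def distance_band_neighbors(points, dist_thresh):
    """Link every pair of points at most `dist_thresh` apart (hypothetical helper)."""
    neighbors = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= dist_thresh:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


# Toy centroids (assumed to be in the same units as dist_thresh).
centroids = [(0.0, 0.0), (60.0, 0.0), (60.0, 90.0), (500.0, 500.0)]
neighbors = distance_band_neighbors(centroids, dist_thresh=100)
print(neighbors)
```

A larger threshold produces larger neighborhoods, which in turn changes the local statistic each cell receives.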

Returns:

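| Name | Type | Description |
| --- | --- | --- |
| clustered_cells | GeoDataFrame | The cells of type cell_type that carry the HH LISA label and belong to a connected cluster of at least min_size cells, with the added columns lisa_label and label (the cluster id). The lisa_label column is also written to the input cells. |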

Examples:

Cluster the inflammatory cells in a GeoDataFrame.

>>> from cellseg_gsontools.clustering import cluster_cells
>>> from cellseg_gsontools.utils import read_gdf
>>> cells = read_gdf("cells.geojson")
>>> clustered_cells = cluster_cells(cells, cell_type="inflammatory", seed=42)
>>> clustered_cells
     class_name    geometry                             lisa_label  label
uid
0    inflammatory  POLYGON ((64.00 115.020, 69.010 ...  HH          0
1    inflammatory  POLYGON ((65.00 15.020, 61.010 ...   HH          0
2    inflammatory  POLYGON ((66.00 110.020, 69.010 ...  HH          2
Source code in cellseg_gsontools/clustering.py
def cluster_cells(
    cells: gpd.GeoDataFrame,
    cell_type: str = "inflammatory",
    graph_type: str = "distband",
    dist_thresh: int = 100,
    min_size: int = 10,
    seed: int = 42,
    spatial_weights: W = None,
) -> gpd.GeoDataFrame:
    """Cluster the cells of the given type.

    Uses Local Moran analysis to find the LISA clusters of the cells.

    Note:
        LISA is short for local indicator of spatial association. You can read more,
        for example, from:
        - https://geodacenter.github.io/workbook/6a_local_auto/lab6a.html#lisa-principle.
        The LISA clusters are calculated using the local Moran analysis. The cluster
        labels are set to HH, LL, LH, HL.

    Note:
        In this function, the local statistic used to form the clusters is the fraction
        of objects of type `cell_type` in the neighborhood times the absolute number of
        objects of type `cell_type` in the neighborhood. Because the local Moran
        analysis draws random permutations, the clustering results may vary marginally
        between runs when a different seed is used.

    Parameters:
        cells (gpd.GeoDataFrame):
            The GeoDataFrame with the cells.
        cell_type (str):
            The class name of the cells to cluster.
        graph_type (str):
            The type of graph to fit. Options are "delaunay", "knn" and "distband".
        dist_thresh (int):
            The distance threshold to use for the graph.
        min_size (int):
            The minimum size of the cluster to assign a label.
        seed (int):
            The random seed to use in the Moran_Local analysis.
        spatial_weights (W):
            The spatial weights object to use in the analysis.
            If None, the spatial weights are calculated.

    Returns:
        clustered_cells (gpd.GeoDataFrame):
            The GeoDataFrame with the clustered cells.

    Examples:
        Cluster the inflammatory cells in a GeoDataFrame.

        >>> from cellseg_gsontools.clustering import cluster_cells
        >>> from cellseg_gsontools.utils import read_gdf
        >>> cells = read_gdf("cells.geojson")
        >>> clustered_cells = cluster_cells(cells, cell_type="inflammatory", seed=42)
        >>> clustered_cells
             class_name    geometry                             lisa_label  label
        uid
        0    inflammatory  POLYGON ((64.00 115.020, 69.010 ...  HH          0
        1    inflammatory  POLYGON ((65.00 15.020, 61.010 ...   HH          0
        2    inflammatory  POLYGON ((66.00 110.020, 69.010 ...  HH          2

    """
    # Find the LISA clusters
    lisa_labels, w = find_lisa_clusters(
        cells,
        label=cell_type,
        graph_type=graph_type,
        dist_thresh=dist_thresh,
        seed=seed,
        spatial_weights=spatial_weights,
    )
    cells["lisa_label"] = lisa_labels

    # Select the HH clusters
    clustered_cells = cells.loc[
        (cells["class_name"] == cell_type) & (cells["lisa_label"] == "HH")
    ]
    clustered_cells = clustered_cells.assign(label=-1)

    # Get the connected components
    sub_graphs = get_connected_components(clustered_cells, w)
    clustered_cells = label_connected_components(
        clustered_cells, sub_graphs, "label", min_size=min_size
    )
    clustered_cells.set_crs(4328, inplace=True, allow_override=True)

    # drop too small clusters
    clustered_cells = clustered_cells.loc[clustered_cells["label"] != -1]

    return clustered_cells
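
The final steps of the function (labeling the connected components of the HH cells and dropping components smaller than `min_size`) can be sketched in pure Python. The helper `label_components` is a hypothetical stand-in for `get_connected_components` and `label_connected_components`, whose implementations are not shown here; treating `min_size` as an inclusive lower bound and numbering only the kept clusters are assumptions, so the real label numbering may differ.

```python
from collections import deque


def label_components(nodes, neighbors, min_size):
    """Label connected components among `nodes`; small ones keep -1 (hypothetical)."""
    node_set = set(nodes)
    labels = {n: -1 for n in nodes}
    seen = set()
    next_label = 0
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search restricted to the selected nodes.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            n = queue.popleft()
            component.append(n)
            for m in neighbors.get(n, []):
                if m in node_set and m not in seen:
                    seen.add(m)
                    queue.append(m)
        # Assumption: a component needs at least `min_size` members to get a label.
        if len(component) >= min_size:
            for n in component:
                labels[n] = next_label
            next_label += 1
    return labels


# Toy data: ids of HH cells and a neighbor graph over all cells.
hh_cells = [0, 1, 2, 5, 6, 9]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 5: [6], 6: [5], 9: []}
labels = label_components(hh_cells, neighbors, min_size=2)
# Drop the cells whose component was too small, as the function does.
kept = {n: lab for n, lab in labels.items() if lab != -1}
print(kept)
```

Cell 3 is ignored because it is not among the HH cells, and cell 9 is dropped because its component has a single member.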