find_lisa_clusters

cellseg_gsontools.clustering.find_lisa_clusters(gdf, label, graph_type='distband', dist_thresh=100, permutations=100, seed=42, spatial_weights=None)

Calculate LISA clusters of objects with class_name=label.

Note

LISA is short for local indicator of spatial association. For background, see https://geodacenter.github.io/workbook/6a_local_auto/lab6a.html#lisa-principle. The LISA clusters are calculated using the local Moran analysis. The cluster labels are set to HH, LL, LH, HL; the returned list also uses the label "ns" for objects that fall into none of these clusters.
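To make the cluster labels concrete, here is a minimal sketch of how a Moran scatterplot quadrant can be turned into a label. It is not the library's code: `lisa_label` is a hypothetical helper, the inputs are made up, and the 0.05 significance cutoff is an assumption rather than something this page states.

```python
def lisa_label(z, lag, p_value, alpha=0.05):
    """Label one object from its centered value, spatial lag and pseudo p-value.

    `alpha` is an assumed cutoff used only for this illustration.
    """
    if p_value >= alpha:
        return "ns"  # not assigned to any cluster
    if z > 0:
        # High value: surrounded by high (HH) or low (HL) neighbors
        return "HH" if lag > 0 else "HL"
    # Low value: surrounded by high (LH) or low (LL) neighbors
    return "LH" if lag > 0 else "LL"


# Made-up (z, lag, p_value) triples, one per quadrant plus a non-significant one
examples = [
    (1.2, 0.8, 0.01),
    (-0.5, 0.9, 0.02),
    (-0.7, -0.4, 0.03),
    (1.1, -0.6, 0.04),
    (1.2, 0.8, 0.40),
]
labels = [lisa_label(z, lag, p) for z, lag, p in examples]
print(labels)  # ['HH', 'LH', 'LL', 'HL', 'ns']
```

The first letter describes the object itself and the second its neighborhood, so HH and LL mark clusters of similar values while LH and HL mark spatial outliers.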

Note

In this function, the local statistic used to form the clusters is the fraction of objects of type label in the neighborhood multiplied by the absolute number of objects of type label in the neighborhood. Because the local Moran analysis draws random permutations, the clustering results may vary marginally between runs if seed is changed.
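The local statistic described in this note can be worked through on toy data. The sketch below uses plain Python lists instead of a GeoDataFrame; `label_index` and the neighborhoods are hypothetical and only mirror the fraction-times-count definition and the mean-centering step that appear in the source listing.

```python
def label_index(nhood_classes, label):
    """Fraction of `label` in the neighborhood times its absolute count."""
    count = sum(1 for c in nhood_classes if c == label)
    frac = count / len(nhood_classes) if nhood_classes else 0.0
    return frac * count


# Hypothetical neighborhoods, each a list of class names
nhoods = [
    ["inflammatory"],
    ["inflammatory", "inflammatory", "epithelial", "epithelial"],
    ["inflammatory"] * 8 + ["epithelial"] * 2,
]
index = [label_index(n, "inflammatory") for n in nhoods]

# Mean-center the index, as done before the local Moran analysis
mean = sum(index) / len(index)
centered = [v - mean for v in index]

print(index)
print(centered)
```

A lone cell whose neighborhood fraction is 1 scores 1.0, the same as a half-filled neighborhood of four, while the dense neighborhood of eight out of ten scores 6.4. Multiplying by the count is what keeps tiny neighborhoods from dominating the statistic.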

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| gdf | GeoDataFrame | The GeoDataFrame with the objects to calculate the LISA clusters of. | required |
| label | str | The class name to calculate the LISA clusters of. | required |
| graph_type | str | The type of graph to fit. Options are "delaunay", "knn" and "distband". | 'distband' |
| dist_thresh | int | The distance threshold to use for the graph. | 100 |
| permutations | int | The number of permutations to use in the Moran_Local analysis. | 100 |
| seed | int | The random seed to use in the Moran_Local analysis. | 42 |
| spatial_weights | W | The spatial weights object to use in the analysis. If None, the spatial weights are calculated. | None |
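The roles of permutations and seed are easier to see in a small permutation test. The following is a simplified, one-sided sketch written for illustration only; it is not how esda implements Moran_Local, and `pseudo_p_value`, the values in `z` and the neighbor lists are all hypothetical.

```python
import random


def pseudo_p_value(z, i, neighbors, permutations=100, seed=42):
    """One-sided pseudo p-value for object i under conditional permutation.

    z[i] is held fixed while the remaining values are reshuffled into
    the neighbor slots of object i.
    """
    rng = random.Random(seed)
    k = len(neighbors)
    observed = z[i] * sum(z[j] for j in neighbors) / k
    others = [z[j] for j in range(len(z)) if j != i]

    as_extreme = 0
    for _ in range(permutations):
        sample = rng.sample(others, k)  # random stand-in neighbors
        simulated = z[i] * sum(sample) / k
        if simulated >= observed:
            as_extreme += 1
    return (as_extreme + 1) / (permutations + 1)


# Made-up centered values: objects 0-2 are high, the rest are low
z = [3.0, 2.5, 2.8, -1.0, -1.2, -0.9, -1.1, -1.3, -1.4, -1.4]

p_first = pseudo_p_value(z, i=0, neighbors=[1, 2], permutations=100, seed=42)
p_again = pseudo_p_value(z, i=0, neighbors=[1, 2], permutations=100, seed=42)
print(p_first == p_again)  # same seed, same permutations, same p-value
```

With this formula the smallest attainable p-value is 1 / (permutations + 1), so 100 permutations cannot resolve anything below roughly 0.0099. Raising permutations gives finer p-values at the cost of run time, and fixing seed makes the result repeatable.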

Returns:

| Name | Type | Description |
| --- | --- | --- |
| labels | List[str] | The cluster labels of the objects ("ns", "HH", "LH", "LL" or "HL"), one per row of gdf. |
| w | W | The spatial weights object used in the analysis. |
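Since the returned labels are plain strings, they can be summarized with the standard library. The list below is made-up example output, not the result of a real run.

```python
from collections import Counter

# Hypothetical return value: one label per object, in row order
labels = ["HH", "ns", "ns", "LL", "HH", "LH", "ns", "HL", "HH"]

# How many objects fall into each cluster type
counts = Counter(labels)

# Row positions of the objects inside high-high clusters
hot_spot_idx = [i for i, lab in enumerate(labels) if lab == "HH"]

print(counts["HH"], counts["ns"])  # 3 3
print(hot_spot_idx)                # [0, 4, 8]
```

Because the list follows the row order of gdf, it can also be stored as a new column for plotting or filtering. A column name such as "lisa_label" would be the caller's own choice, not something the function creates.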

Examples:

Find the LISA clusters of inflammatory cells in a GeoDataFrame.

>>> from cellseg_gsontools.clustering import find_lisa_clusters
>>> from cellseg_gsontools.utils import read_gdf
>>> cells = read_gdf("cells.geojson")
>>> labels, w = find_lisa_clusters(cells, label="inflammatory", seed=42)
Source code in cellseg_gsontools/clustering.py
def find_lisa_clusters(
    gdf: gpd.GeoDataFrame,
    label: str,
    graph_type: str = "distband",
    dist_thresh: int = 100,
    permutations: int = 100,
    seed: int = 42,
    spatial_weights: W = None,
) -> Tuple[List[str], W]:
    """Calculate LISA clusters of objects with `class_name=label`.

    Note:
        LISA is short for local indicator of spatial association. You can read more,
        for example, from:
        - https://geodacenter.github.io/workbook/6a_local_auto/lab6a.html#lisa-principle.
        The LISA clusters are calculated using the local Moran analysis. The cluster
        labels are set to HH, LL, LH, HL, or `ns` for objects in none of these clusters.

    Note:
        In this function, the local statistic used to form the clusters is the fraction
        of objects of type `label` in the neighborhood times the absolute number of the
        objects of type `label` in the neighborhood. Because the local Moran analysis
        draws random permutations, the clustering results may vary marginally between
        runs if `seed` is changed.

    Parameters:
        gdf (gpd.GeoDataFrame):
            The GeoDataFrame with the objects to calculate the LISA clusters of.
        label (str):
            The class name to calculate the LISA clusters of.
        graph_type (str):
            The type of graph to fit. Options are "delaunay", "knn" and "distband".
        dist_thresh (int):
            The distance threshold to use for the graph.
        permutations (int):
            The number of permutations to use in the Moran_Local analysis.
        seed (int):
            The random seed to use in the Moran_Local analysis.
        spatial_weights (W):
            The spatial weights object to use in the analysis.
            If None, the spatial weights are calculated.

    Returns:
        labels (List[str]):
            The cluster labels of the objects.
        w (W):
            The spatial weights object used in the analysis.

    Examples:
        Find the LISA clusters of inflammatory cells in a GeoDataFrame.
        >>> from cellseg_gsontools.clustering import find_lisa_clusters
        >>> from cellseg_gsontools.utils import read_gdf
        >>> cells = read_gdf("cells.geojson")
        >>> labels, w = find_lisa_clusters(cells, label="inflammatory", seed=42)
    """
    try:
        import esda
    except ImportError:
        raise ImportError(
            "This function requires the esda package to be installed. "
            "Install it with: pip install esda"
        )

    if spatial_weights is not None:
        w = spatial_weights
    else:
        # Fit the spatial weights graph (distband by default)
        w = fit_graph(
            gdf,
            type=graph_type,
            id_col="uid",
            thresh=dist_thresh,
        )

        # Row-standardized weights
        w.transform = "R"

    # Get the neighboring nodes of the graph
    func = partial(neighborhood, spatial_weights=w)
    gdf["nhood"] = gdf_apply(gdf, func, columns=["uid"])

    # Get the classes of the neighboring nodes
    func = partial(nhood_vals, values=gdf["class_name"])
    gdf["nhood_classes"] = gdf_apply(
        gdf,
        func=func,
        parallel=True,
        columns=["nhood"],
    )

    # Get the number of objects of type `label` in the neighborhood
    func = partial(nhood_type_count, cls=label, frac=False)
    gdf[f"{label}_cnt"] = gdf_apply(
        gdf,
        func=func,
        parallel=True,
        columns=["nhood_classes"],
    )

    # Get the fraction of objects of type `label` in the neighborhood
    func = partial(nhood_type_count, cls=label, frac=True)
    gdf[f"{label}_frac"] = gdf_apply(
        gdf,
        func=func,
        parallel=True,
        columns=["nhood_classes"],
    )

    # This will smooth the extremes (e.g. if there is only one cell of type label in the
    # neighborhood, the fraction will be 1)
    gdf[f"{label}_index"] = gdf[f"{label}_frac"] * gdf[f"{label}_cnt"]

    # Center the index by subtracting its mean
    gdf[f"{label}_index_normed"] = gdf[f"{label}_index"] - gdf[f"{label}_index"].mean()

    # Spatial lag of the centered index
    gdf[f"{label}_index_lag"] = lag_spatial(w, gdf[f"{label}_index_normed"].values)

    # Find lisa clusters

    lisa = esda.Moran_Local(
        gdf[f"{label}_index_normed"],
        w,
        island_weight=np.nan,
        seed=seed,
        permutations=permutations,
    )

    # Classify the objects into ns, HH, LH, LL, HL
    clusters = moran_hot_cold_spots(lisa)

    cluster_labels = ["ns", "HH", "LH", "LL", "HL"]
    labels = [cluster_labels[i] for i in clusters]

    return labels, w
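The listing above row-standardizes the weights (`w.transform = "R"`) and computes a spatial lag with `lag_spatial`. As a rough illustration of what those two steps mean, here is a dependency-free sketch on a four-node toy graph. `row_standardize`, `spatial_lag` and the data are hypothetical stand-ins, not the libpysal implementations.

```python
def row_standardize(neighbors):
    """Give every neighbor of node i the weight 1 / (number of neighbors of i)."""
    weights = {}
    for i, nbrs in neighbors.items():
        weights[i] = {j: 1.0 / len(nbrs) for j in nbrs}
    return weights


def spatial_lag(weights, values):
    """Weighted sum of neighbor values; with row-standardized weights, their mean."""
    return {
        i: sum(w_ij * values[j] for j, w_ij in row.items())
        for i, row in weights.items()
    }


# Toy graph: node 3 is an island with no neighbors
neighbors = {0: [1, 2], 1: [0], 2: [0, 1], 3: []}
values = {0: 2.0, 1: -1.0, 2: 4.0, 3: 0.5}  # made-up centered index values

weights = row_standardize(neighbors)
lag = spatial_lag(weights, values)
print(lag)
```

In this sketch the island simply receives a lag of 0. The listing passes `island_weight=np.nan` to `esda.Moran_Local`, so isolated objects may be treated differently there.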