shannon_index

cellseg_gsontools.diversity.shannon_index(counts)

Compute the Shannon-Wiener index/entropy on a count vector.

Note

"The Shannon index is related to the concept of uncertainty. If for example, a community has very low diversity, we can be fairly certain of the identity of an organism we might choose by random (high certainty or low uncertainty). If a community is highly diverse and we choose an organism by random, we have a greater uncertainty of which species we will choose (low certainty or high uncertainty)." - A. Wilson, N. Gownaris

Shannon index: $$ H^{\prime} = -\sum_{i=1}^n p_i \ln(p_i) $$

where \(p_i\) is the proportion of species \(i\) and \(n\) is the number of species.
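As a quick worked example (the values are illustrative, not from the library docs): for a count vector \((10, 10)\), the proportions are \(p_1 = p_2 = 0.5\), so

$$
H^{\prime} = -(0.5 \ln 0.5 + 0.5 \ln 0.5) = \ln 2 \approx 0.693,
$$

which is the maximum for two species. A community with a single species gives \(H^{\prime} = 0\): there is no uncertainty about which species a randomly chosen organism belongs to.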

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `counts` | `Sequence` | A count vector/list of shape (C, ). | *required* |

Returns:

| Type | Description |
| --- | --- |
| `float` | The computed Shannon diversity index. |

Source code in cellseg_gsontools/diversity.py
from typing import Sequence

import numpy as np

# NOTE: `SMALL` is a small module-level epsilon constant (defined elsewhere
# in cellseg_gsontools) that guards against division by zero.


def shannon_index(counts: Sequence) -> float:
    """Compute the Shannon-Wiener index/entropy on a count vector.

    Note:
        "*The Shannon index is related to the concept of uncertainty. If for example,
        a community has very low diversity, we can be fairly certain of the identity of
        an organism we might choose by random (high certainty or low uncertainty). If a
        community is highly diverse and we choose an organism by random, we have a
        greater uncertainty of which species we will choose (low certainty or high
        uncertainty).*"
        - [A. Wilson, N. Gownaris](https://bio.libretexts.org/Courses/Gettysburg_College/01%3A_Ecology_for_All/22%3A_Biodiversity/22.02%3A_Diversity_Indices)

    **Shannon index:**
    $$
    H^{\\prime} = -\\sum_{i=1}^n p_i \\ln(p_i)
    $$

    where $p_i$ is the proportion of species $i$ and $n$ is the number of species.

    Parameters:
        counts (Sequence):
            A count vector/list of shape (C, ).

    Returns:
        float:
            The computed Shannon diversity index.
    """
    N = np.sum(counts) + SMALL
    probs = [float(n) / N for n in counts]

    entropy = -np.sum([p * np.log(p) for p in probs if p != 0])

    if entropy == 0:
        return 0.0

    return entropy