Merging Segmentation Maps
Segmentation maps are usually split into multiple tiles to keep file sizes manageable. The downside is that cells located on the tile edges are split into two or more parts. This is not ideal for WSI-level downstream analysis, so it is often necessary to merge the nuclei or tissue segmentation maps from multiple tiles into a single map. In this tutorial, we will look at how to merge adjacent segmentation tiles into a single segmentation map.
Adjacent Nuclei Segmentation Tiles
Let's look at some nuclei segmentation files that are provided by the cellseg_gsontools package. These segmentation maps are adjacent to each other.
from cellseg_gsontools.data import cell_merge_dir
# The cell_merge_dir() is a path to a dir that contains adjacent segmentation tiles
tiles = sorted(cell_merge_dir().glob("*"))
for f in tiles:
    print(f.name)
x-41000_y-87000_cells.feather
x-41000_y-88000_cells.feather
x-41000_y-89000_cells.feather
x-42000_y-86000_cells.feather
x-42000_y-87000_cells.feather
x-42000_y-88000_cells.feather
x-43000_y-86000_cells.feather
x-43000_y-87000_cells.feather
x-43000_y-88000_cells.feather
As you can see, the starting x- and y-coordinates of each tile are encoded in its filename. These coordinates are required for merging the tiles.
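To make the naming convention concrete, here is a minimal sketch of how the starting coordinates could be read back from a filename. The regular expression and the helper name parse_tile_origin are assumptions inferred from the listing above; they are not part of the cellseg_gsontools API.

```python
import re
from pathlib import Path

# Assumed naming pattern, inferred from the listing above: x-<int>_y-<int>_...
TILE_NAME = re.compile(r"x-(\d+)_y-(\d+)")


def parse_tile_origin(path):
    """Return the (x, y) starting coordinates encoded in a tile filename."""
    match = TILE_NAME.search(Path(path).name)
    if match is None:
        raise ValueError(f"no x/y coordinates found in {path!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_tile_origin("x-41000_y-87000_cells.feather"))  # (41000, 87000)
```

A filename without the x-/y- fields raises a ValueError instead of silently returning a wrong origin.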
Let's visualize what two of the segmentation maps look like when we naively just concatenate them together.
import pandas as pd
from cellseg_gsontools.utils import read_gdf
# Let's see what the data looks like if we don't merge the tiles
gdfs = []
for f in tiles[::3][-2:]:  # read only two tiles for demo
    gdf = read_gdf(f)
    gdfs.append(gdf)
gdf_not_merged = pd.concat(gdfs)
gdf_not_merged.head()
| | type | geometry | class_name |
|---|---|---|---|
| 0 | Feature | POLYGON ((42769.000 86000.000, 42769.000 86005... | glandular_epithel |
| 1 | Feature | POLYGON ((42944.000 86000.000, 42944.000 86002... | connective |
| 2 | Feature | POLYGON ((42877.000 86003.000, 42876.000 86004... | connective |
| 3 | Feature | POLYGON ((42818.000 86005.000, 42815.000 86008... | inflammatory |
| 4 | Feature | POLYGON ((42913.000 86008.000, 42911.000 86010... | inflammatory |
gdf_not_merged.plot(
    figsize=(14, 8),
    column="class_name",
    legend=True,
    edgecolor="red",
)
<Axes: >
If you look closely along the x-coordinate 43000, where the two tiles meet, you can see that some of the nuclei are split in two (zoom in).
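The split nuclei can also be spotted programmatically. The sketch below flags polygons that have a vertex on the shared tile edge. It uses hypothetical toy polygons given as plain vertex lists rather than the real tile data, and it assumes that cut nuclei have vertices on (or within a tolerance of) the boundary. The helper touches_vertical_line is illustrative only.

```python
def touches_vertical_line(polygon, x_line, tol=0.0):
    """Return True if any vertex lies within `tol` of the line x = x_line."""
    return any(abs(x - x_line) <= tol for x, _y in polygon)


# Hypothetical toy polygons (lists of (x, y) vertices), not real tile data.
polygons = {
    "left_half": [(42990.0, 86010.0), (43000.0, 86010.0),
                  (43000.0, 86020.0), (42990.0, 86020.0)],
    "right_half": [(43000.0, 86010.0), (43008.0, 86010.0),
                   (43008.0, 86020.0), (43000.0, 86020.0)],
    "interior": [(42769.0, 86000.0), (42769.0, 86005.0),
                 (42775.0, 86005.0), (42775.0, 86000.0)],
}

boundary_x = 43000.0  # shared edge between the x-42000 and x-43000 tiles
candidates = [
    name for name, poly in polygons.items()
    if touches_vertical_line(poly, boundary_x)
]
print(candidates)  # ['left_half', 'right_half']
```

Only the two halves that meet at the boundary are reported; the polygon in the tile interior is left alone.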
Merging the Tiles
cellseg_gsontools provides tools for merging adjacent tiles of either instance segmentation (cells/nuclei) or semantic segmentation (tissue) maps. The merging is handled by the classes CellMerger and AreaMerger. These are initialized with a path to a directory containing the segmentation maps of the tiles. The filenames of the tiles are assumed to contain the starting x and y coordinates, and the tiles are assumed to be of the same size. The merger classes have a merge_dir() method that merges the tiles, caches the result in a geopandas.GeoDataFrame, and saves it in either .geojson, .feather or .parquet format.
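The two assumptions (starting coordinates in the filenames, equal tile sizes) are what make it possible to tell which tiles share an edge. The sketch below illustrates the idea with plain tuples. It is not the CellMerger or AreaMerger implementation, and the function name adjacent_pairs is hypothetical.

```python
def adjacent_pairs(origins, tile_size):
    """Return pairs of tile origins that share a vertical or horizontal edge.

    Assumes every tile has the same (width, height) and that `origins`
    holds the starting (x, y) coordinate of each tile.
    """
    width, height = tile_size
    known = set(origins)
    pairs = []
    for x, y in sorted(known):
        right = (x + width, y)    # neighbor sharing the vertical edge
        next_y = (x, y + height)  # neighbor sharing the horizontal edge
        if right in known:
            pairs.append(((x, y), right))
        if next_y in known:
            pairs.append(((x, y), next_y))
    return pairs


# Starting coordinates taken from the nine tile filenames listed earlier.
origins = [
    (41000, 87000), (41000, 88000), (41000, 89000),
    (42000, 86000), (42000, 87000), (42000, 88000),
    (43000, 86000), (43000, 87000), (43000, 88000),
]
pairs = adjacent_pairs(origins, tile_size=(1000, 1000))
print(len(pairs))  # 11
```

With a wrong tile size no neighbors would be found at all, since the lookup relies on the neighbor's origin lying exactly one tile width or height away.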
from cellseg_gsontools.merging import CellMerger
# tile size needs to be specified when merging cell segmentation patches
merger = CellMerger(cell_merge_dir(), tile_size=(1000, 1000))
# MERGE CELLS
merger.merge_dir(verbose=True)
Processing file: x-43000_y-88000_cells.feather: 100%|██████████| 9/9 [00:03<00:00, 2.54it/s]
Saving the merged geojson file: None to `self.annots`
Let's visualize the merged segmentation map. All of the cells that were split in two or more parts are now merged.
merger.annots.plot(figsize=(10, 12), column="class_name", legend=True)
<Axes: >