# Cityscapes Semantic Segmentation Training Script (U-Net & DeepLabV3+)

## Introduction

This repository provides a complete pipeline for semantic segmentation on the Cityscapes dataset using TensorFlow/Keras. It supports two widely used architectures:

- U-Net (with optional ImageNet-pretrained backbone)
- DeepLabV3+ (with a ResNet50 backbone)

The code is fully modular, covering data loading, preprocessing, model building, training, and inference. Segmentation masks are returned as base64-encoded PNGs, making the pipeline API-ready (e.g., for Flask or FastAPI deployments).
## Features

- **Data Preprocessing:** Extracts Cityscapes images and ground-truth annotations, remaps the 30 fine-grained classes into 8 broad categories (flat, construction, object, nature, sky, human, vehicle, void), and provides a configurable data generator with augmentation.
- **Model Architectures:**
  - U-Net: Classic encoder–decoder with skip connections; supports training from scratch or with a pretrained MobileNetV2 encoder.
  - DeepLabV3+: Modern atrous-convolution model with ASPP and decoder refinement, built on a ResNet50 backbone.
- **Training Pipeline:** Implements data generators, compiles models with `SparseCategoricalCrossentropy` loss and a custom Mean IoU metric (see the sketch after this list), and uses callbacks (`ModelCheckpoint`, `EarlyStopping`).
- **Inference Module:** Runs segmentation on arbitrary input images and returns color-coded masks encoded as base64 PNGs for seamless API integration.
- **Modularity & Best Practices:** Organized into reusable functions (`load_data`, `build_unet`, `build_deeplabv3p`, `train_model`, `infer_image`), with no cloud-specific dependencies.
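For reference, here is a minimal sketch of how such a Mean IoU metric can be implemented (the class name and details are assumptions; the repository's own version lives in `scripts/train.py`). Keras' built-in `tf.keras.metrics.MeanIoU` expects predicted class indices, while the models here output one score per class, so the wrapper takes an argmax first:

```python
import tensorflow as tf

class SparseMeanIoU(tf.keras.metrics.MeanIoU):
    """Mean IoU for sparse integer labels and per-class model outputs.

    The built-in MeanIoU expects predicted class indices, so we argmax
    over the class axis before updating the confusion matrix.
    """
    def update_state(self, y_true, y_pred, sample_weight=None):
        y_pred = tf.argmax(y_pred, axis=-1)
        return super().update_state(y_true, y_pred, sample_weight)
```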
## Repository Structure

```
├── data/
│   ├── leftImg8bit/      # Extracted Cityscapes images (train/val/test)
│   └── gtFine/           # Extracted Cityscapes annotations (train/val/test)
├── notebooks/            # Optional Jupyter notebooks
├── scripts/
│   ├── data_utils.py     # Data extraction & generator
│   ├── models.py         # U-Net & DeepLabV3+ definitions
│   ├── train.py          # Training entry point
│   └── infer.py          # Inference & API integration
├── requirements.txt      # Python dependencies
└── README.md
```
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/cityscapes-segmentation.git
   cd cityscapes-segmentation
   ```

2. Install dependencies (tested on Python 3.8+):

   ```bash
   pip install -r requirements.txt
   ```

3. Download the Cityscapes dataset zip (`P8_Cityscapes_gtFine_trainvaltest.zip`) and extract it into the `data/` directory:

   ```bash
   unzip P8_Cityscapes_gtFine_trainvaltest.zip -d data/
   ```
## Usage

### 1. Prepare the Data

The scripts automatically locate images under `data/leftImg8bit/` and annotations under `data/gtFine/`, and remap label IDs into the 8 categories via a configurable mapping in `scripts/data_utils.py`, sketched below.
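For orientation, such a mapping can look like the following (names and the exact grouping here are assumptions based on the official cityscapesScripts label IDs; the authoritative table is in `scripts/data_utils.py`):

```python
import numpy as np

# Hypothetical grouping of Cityscapes labelIds into the 8 broad categories.
CATEGORY_RANGES = {
    0: range(0, 7),    # void (unlabeled, ego vehicle, ground, ...)
    1: range(7, 11),   # flat (road, sidewalk, parking, rail track)
    2: range(11, 17),  # construction (building, wall, fence, ...)
    3: range(17, 21),  # object (pole, traffic light, traffic sign, ...)
    4: range(21, 23),  # nature (vegetation, terrain)
    5: range(23, 24),  # sky
    6: range(24, 26),  # human (person, rider)
    7: range(26, 34),  # vehicle (car, truck, bus, train, ...)
}

def remap_labels(mask: np.ndarray) -> np.ndarray:
    """Map an (H, W) Cityscapes labelIds mask to 8-category indices."""
    lut = np.zeros(256, dtype=np.uint8)  # unknown IDs default to void (0)
    for category, ids in CATEGORY_RANGES.items():
        lut[list(ids)] = category
    return lut[mask]
```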
### 2. Training

Train the model of your choice (U-Net or DeepLabV3+):

```bash
# U-Net from scratch
python scripts/train.py --arch unet --pretrained False --batch_size 4 --epochs 50

# DeepLabV3+ with pretrained ResNet50
python scripts/train.py --arch deeplabv3p --pretrained True --batch_size 4 --epochs 50
```

Checkpoints and logs will be saved to the working directory. The best model (by validation Mean IoU) is stored as `best_model.h5`, via callbacks along the lines sketched below.
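A minimal sketch of that checkpointing setup (the metric/monitor names, the `build_unet` signature, and the `train_gen`/`val_gen` variables are assumptions; `SparseMeanIoU` is the metric sketched under Features):

```python
import tensorflow as tf
from scripts.models import build_unet  # or build_deeplabv3p

model = build_unet(num_classes=8)  # signature assumed
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=[SparseMeanIoU(num_classes=8, name="mean_iou")],
)
callbacks = [
    # Keep only the weights achieving the best validation Mean IoU.
    tf.keras.callbacks.ModelCheckpoint(
        "best_model.h5", monitor="val_mean_iou", mode="max", save_best_only=True
    ),
    # Stop training once validation Mean IoU stops improving.
    tf.keras.callbacks.EarlyStopping(monitor="val_mean_iou", mode="max", patience=10),
]
# train_gen / val_gen come from the data generator in scripts/data_utils.py.
model.fit(train_gen, validation_data=val_gen, epochs=50, callbacks=callbacks)
```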
### 3. Inference

Generate a segmentation mask for a new image:

```bash
python scripts/infer.py --model_path best_model.h5 --input_image path/to/image.png --output_base64
```

This prints a base64-encoded PNG string representing the color-coded segmentation mask. You can save it with:

```python
import base64

with open("mask.png", "wb") as f:
    f.write(base64.b64decode(your_base64_string))
```
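For reference, producing such a string from a class-index mask can be done along these lines (a sketch with an assumed color palette; the actual palette and helper live in `scripts/infer.py`):

```python
import base64
import io

import numpy as np
from PIL import Image

# Hypothetical per-category RGB palette (void, flat, construction, object,
# nature, sky, human, vehicle), loosely following the Cityscapes colors.
PALETTE = np.array([
    [0, 0, 0], [128, 64, 128], [70, 70, 70], [153, 153, 153],
    [107, 142, 35], [70, 130, 180], [220, 20, 60], [0, 0, 142],
], dtype=np.uint8)

def mask_to_base64_png(mask: np.ndarray) -> str:
    """Encode an (H, W) class-index mask as a base64 PNG string."""
    rgb = PALETTE[mask]  # (H, W, 3) color-coded image
    buffer = io.BytesIO()
    Image.fromarray(rgb).save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("ascii")
```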
## API Integration Example (FastAPI)

```python
from fastapi import FastAPI, File, UploadFile

from scripts.infer import infer_image, load_model

app = FastAPI()
model = load_model("best_model.h5")

@app.post("/segment")
async def segment_image(file: UploadFile = File(...)):
    img_bytes = await file.read()
    # Temporarily save or convert the raw bytes as needed; infer_image
    # returns the color-coded mask as a base64-encoded PNG string.
    mask_b64 = infer_image(model, img_bytes)
    return {"segmentation_mask": mask_b64}
```
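With the app running (e.g., via `uvicorn main:app`, module name assumed), a client can call the endpoint along these lines:

```python
import base64

import requests

# Hypothetical client for the /segment endpoint above.
with open("path/to/image.png", "rb") as f:
    resp = requests.post("http://localhost:8000/segment", files={"file": f})
resp.raise_for_status()

with open("mask.png", "wb") as out:
    out.write(base64.b64decode(resp.json()["segmentation_mask"]))
```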
## Configuration

All hyperparameters (input size, batch size, learning rate, augmentation) can be adjusted in `scripts/train.py` and `scripts/data_utils.py`. Model definitions in `scripts/models.py` expose flags for pretrained encoders; the CLI sketch below shows the kind of flags involved.
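A minimal sketch of how such flags can be exposed (flag names and defaults are assumptions mirroring the training commands above):

```python
import argparse

parser = argparse.ArgumentParser(description="Train a Cityscapes segmentation model.")
parser.add_argument("--arch", choices=["unet", "deeplabv3p"], default="unet")
# argparse's type=bool treats any non-empty string (even "False") as True,
# so parse the flag's string value explicitly.
parser.add_argument("--pretrained", type=lambda s: s.lower() == "true", default=False)
parser.add_argument("--batch_size", type=int, default=4)
parser.add_argument("--epochs", type=int, default=50)
parser.add_argument("--learning_rate", type=float, default=1e-3)
parser.add_argument("--input_size", type=int, default=256)
args = parser.parse_args()
```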
## Contributing

Contributions are welcome! Please open issues for bug reports or feature requests, and submit pull requests for fixes and enhancements.

## License

This project is licensed under the MIT License. See the LICENSE file for details.
## References

- Cordts et al., "The Cityscapes Dataset for Semantic Urban Scene Understanding" (CVPR 2016)
- Ronneberger et al., "U-Net: Convolutional Networks for Biomedical Image Segmentation" (MICCAI 2015)
- Chen et al., "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation" (DeepLabV3+, ECCV 2018)