Cityscapes Semantic Segmentation Training Script (U-Net & DeepLabV3+)

Introduction

This repository provides a complete pipeline for semantic segmentation on the Cityscapes dataset using TensorFlow/Keras. It supports two well-established architectures:

  • U-Net (with optional ImageNet-pretrained backbone)
  • DeepLabV3+ (with ResNet50 backbone)

The code is fully modular, covering data loading, preprocessing, model building, training, and inference. Segmentation masks are returned as base64-encoded PNGs, making the pipeline API-ready (e.g., for Flask or FastAPI deployments).

Features

  • Data Preprocessing: Extracts Cityscapes images and ground-truth annotations, remaps the 30 fine-grained classes into 8 broad categories (flat, construction, object, nature, sky, human, vehicle, void), and provides a configurable data generator with augmentation.

  • Model Architectures:

    • U-Net: Classic encoder–decoder with skip connections; supports training from scratch or with a pretrained MobileNetV2 encoder.
    • DeepLabV3+: Modern atrous convolutional model with ASPP and decoder refinement, built on a ResNet50 backbone.
  • Training Pipeline: Implements data generators, compiles models with SparseCategoricalCrossentropy loss and a custom Mean IoU metric, and uses callbacks (ModelCheckpoint, EarlyStopping).

  • Inference Module: Runs segmentation on arbitrary input images and returns color-coded masks encoded as base64 PNGs for seamless API integration.

  • Modularity & Best Practices: Organized into reusable functions (load_data, build_unet, build_deeplabv3p, train_model, infer_image), with no cloud-specific dependencies.
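The custom Mean IoU metric mentioned above averages, over the classes present in a sample, the intersection-over-union between the predicted and ground-truth masks. The repository's Keras implementation operates on model logits during training; as an illustration of the underlying computation, here is a minimal NumPy sketch (function name and signature are illustrative, not the repository's API):

```python
import numpy as np

def mean_iou(y_true, y_pred, num_classes):
    """Mean IoU over classes, given integer label masks of identical shape."""
    ious = []
    for c in range(num_classes):
        gt = (y_true == c)
        pr = (y_pred == c)
        union = np.logical_or(gt, pr).sum()
        if union == 0:
            continue  # class absent from both masks; do not penalize it
        intersection = np.logical_and(gt, pr).sum()
        ious.append(intersection / union)
    return float(np.mean(ious)) if ious else 0.0

# Tiny example with two classes on a 2x2 mask:
t = np.array([[0, 0], [1, 1]])
p = np.array([[0, 1], [1, 1]])
# class 0: intersection 1, union 2 -> 0.5 ; class 1: intersection 2, union 3 -> 2/3
print(mean_iou(t, p, 2))  # ≈ 0.5833
```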

Repository Structure

├── data/
│   ├── leftImg8bit/         # Extracted Cityscapes images (train/val/test)
│   └── gtFine/              # Extracted Cityscapes annotations (train/val/test)
├── notebooks/               # Optional Jupyter notebooks
├── scripts/
│   ├── data_utils.py        # Data extraction & generator
│   ├── models.py            # U-Net & DeepLabV3+ definitions
│   ├── train.py             # Training entry point
│   └── infer.py             # Inference & API integration
├── requirements.txt         # Python dependencies
└── README.md

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/cityscapes-segmentation.git
    cd cityscapes-segmentation
  2. Install dependencies (tested on Python 3.8+):

    pip install -r requirements.txt
  3. Download and extract the Cityscapes dataset zip (P8_Cityscapes_gtFine_trainvaltest.zip) into the data/ directory:

    unzip P8_Cityscapes_gtFine_trainvaltest.zip -d data/

Usage

1. Prepare the Data

Scripts will automatically locate images under data/leftImg8bit/ and annotations under data/gtFine/. They remap label IDs into 8 categories via a configurable mapping in scripts/data_utils.py.
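As a concrete illustration, the remapping can be implemented as a flat lookup table over raw Cityscapes label IDs. The grouping below follows the official Cityscapes category definitions; the actual table lives in scripts/data_utils.py and its exact contents may differ:

```python
import numpy as np

# Official Cityscapes category grouping: raw labelId -> 8 broad categories.
CATEGORY_IDS = {
    "void": list(range(0, 7)),            # unlabeled, ego vehicle, ..., ground
    "flat": [7, 8, 9, 10],                # road, sidewalk, parking, rail track
    "construction": [11, 12, 13, 14, 15, 16],
    "object": [17, 18, 19, 20],           # pole(s), traffic light, traffic sign
    "nature": [21, 22],                   # vegetation, terrain
    "sky": [23],
    "human": [24, 25],                    # person, rider
    "vehicle": list(range(26, 34)),       # car, truck, ..., bicycle
}

# Flat lookup table: raw labelId (0..33) -> category index (0..7).
LOOKUP = np.zeros(34, dtype=np.uint8)
for cat_idx, ids in enumerate(CATEGORY_IDS.values()):
    LOOKUP[ids] = cat_idx

def remap_mask(mask):
    """Map a raw Cityscapes labelId mask to the 8-category mask."""
    return LOOKUP[mask]
```

For example, `remap_mask(np.array([[7, 26], [23, 0]]))` yields `[[1, 7], [5, 0]]` (flat, vehicle, sky, void). Using a vectorized lookup table rather than a per-pixel dictionary keeps the remapping fast enough to run inside the data generator.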

2. Training

Train the model of your choice (U-Net or DeepLabV3+):

# U-Net from scratch
python scripts/train.py --arch unet --pretrained False --batch_size 4 --epochs 50
 
# DeepLabV3+ with pretrained ResNet50
python scripts/train.py --arch deeplabv3p --pretrained True --batch_size 4 --epochs 50

Checkpoints and logs will be saved to the working directory. The best model (by validation Mean IoU) is stored as best_model.h5.

3. Inference

Generate a segmentation mask for a new image:

python scripts/infer.py --model_path best_model.h5 --input_image path/to/image.png --output_base64

This prints a base64-encoded PNG string representing the color-coded segmentation mask. You can save it with:

import base64
with open("mask.png", "wb") as f:
    f.write(base64.b64decode(your_base64_string))
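For reference, the encoding side of this flow (inside infer.py) can colorize a predicted class-index mask with a palette and serialize it to a base64 PNG roughly as follows. This is a sketch only: the palette values and the helper name are illustrative, not the repository's actual API:

```python
import base64
import io

import numpy as np
from PIL import Image

# Illustrative palette: one RGB color per category, in the order
# void, flat, construction, object, nature, sky, human, vehicle.
PALETTE = np.array([
    [0, 0, 0],        # void
    [128, 64, 128],   # flat
    [70, 70, 70],     # construction
    [153, 153, 153],  # object
    [107, 142, 35],   # nature
    [70, 130, 180],   # sky
    [220, 20, 60],    # human
    [0, 0, 142],      # vehicle
], dtype=np.uint8)

def mask_to_base64_png(mask):
    """Color-code an (H, W) class-index mask and return a base64 PNG string."""
    rgb = PALETTE[mask]          # fancy indexing -> (H, W, 3) uint8 image
    buf = io.BytesIO()
    Image.fromarray(rgb).save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode("ascii")
```

Because PNG compression is lossless, decoding the string (as shown above) recovers the exact color-coded mask pixel for pixel.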

API Integration Example (FastAPI)

from fastapi import FastAPI, File, UploadFile
from scripts.infer import infer_image, load_model
 
app = FastAPI()
model = load_model("best_model.h5")
 
@app.post("/segment")
async def segment_image(file: UploadFile = File(...)):
    img_bytes = await file.read()
    # infer_image decodes the uploaded bytes and returns a base64-encoded PNG mask
    mask_b64 = infer_image(model, img_bytes)
    return {"segmentation_mask": mask_b64}

Configuration

All hyperparameters (input size, batch size, learning rate, augmentation) can be adjusted in scripts/train.py and scripts/data_utils.py. Model definitions in scripts/models.py expose flags for pretrained encoders.

Contributing

Contributions are welcome! Please open issues for bug reports or feature requests, and submit pull requests for fixes and enhancements.

License

This project is licensed under the MIT License. See the LICENSE file for details.

References

  1. Cordts et al., "The Cityscapes Dataset for Semantic Urban Scene Understanding" (2016)
  2. Ronneberger et al., "U-Net: Convolutional Networks for Biomedical Image Segmentation" (2015)
  3. Chen et al., "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation" (DeepLabV3+) (2018)