Chapter 04  ·  Computer Vision · Person Re-ID · Transformers

EyeWitness

Cross-camera identity matching with transformer attention that remembers the whole person, not just local patches.

Concept Note

Learn robust person embeddings that remain stable under viewpoint shift and partial occlusion.

Use global transformer self-attention to preserve long-range appearance cues missed by CNNs.

Connect this baseline to the broader occluded-ReID research pipeline with diffusion-guided inpainting.

The Project Brief

A transformer-based framework for person re-identification (ReID) — the task of matching pedestrian identities across non-overlapping camera views, a core challenge in intelligent surveillance and autonomous systems.

Problem: CNN-based ReID methods struggle with long-range spatial dependencies, occlusion, and significant viewpoint variation. Transformer architectures, with their global self-attention mechanism, are well-suited to capture holistic appearance representations.
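To make the global self-attention idea concrete, here is a minimal PyTorch sketch (illustrative shapes only, not the project's implementation): a 224×224 person crop split into 16×16 patches yields 196 tokens, and every token attends to every other, so a cue at the shoes can directly inform a patch at the shoulders.

```python
import torch
import torch.nn as nn

# Illustrative sketch: global self-attention over person-patch tokens.
# Shapes follow a ViT-B/16-style setup (196 patch tokens, 768-dim embeddings).
patches = torch.randn(2, 196, 768)              # (batch, tokens, embed_dim)
attn = nn.MultiheadAttention(embed_dim=768, num_heads=8, batch_first=True)

out, weights = attn(patches, patches, patches)  # self-attention: q = k = v
emb = out.mean(dim=1)                           # pool tokens into one embedding

print(emb.shape)      # torch.Size([2, 768]) -- one global vector per crop
print(weights.shape)  # torch.Size([2, 196, 196]) -- every token sees all others
```

A CNN with small kernels needs many stacked layers before distant patches interact; here the (196, 196) attention map couples every pair of patches in a single layer.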

Approach:

  • Vision Transformer (ViT) backbone pre-trained on large-scale image datasets, fine-tuned for the ReID objective.
  • Triplet loss training with hard negative mining for discriminative embedding learning.
  • Evaluated on standard ReID benchmarks for cross-view matching accuracy.
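The triplet objective above can be sketched as a batch-hard loss (a common ReID mining recipe; the project's exact strategy may differ): each anchor is pulled toward its farthest same-identity sample and pushed away from its closest different-identity sample within the batch.

```python
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(emb, labels, margin=0.3):
    """Batch-hard mining sketch: hardest positive and hardest negative
    per anchor, both taken from the current batch."""
    dist = torch.cdist(emb, emb)                           # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)      # same-identity mask
    diag = torch.eye(len(labels), dtype=torch.bool)
    hardest_pos = (dist * (same & ~diag)).max(dim=1).values          # farthest positive
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values  # closest negative
    return F.relu(hardest_pos - hardest_neg + margin).mean()

# Toy PK batch: 4 identities x 2 crops each, 128-dim unit embeddings.
emb = F.normalize(torch.randn(8, 128), dim=1)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
loss = batch_hard_triplet_loss(emb, labels)
print(loss.item() >= 0)  # True (hinge loss is non-negative)
```

The margin of 0.3 is a typical choice in the ReID literature, not a documented hyperparameter of this repo.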

Relation to broader research: This project directly informs ongoing work on occluded person re-identification using latent diffusion-guided feature inpainting.

Stack: Python · PyTorch · Vision Transformer (ViT) · CUDA

Techniques

Vision Transformer · Triplet Loss · Hard Negative Mining · Cross-view Embeddings · Person Re-identification

Key Components

  • 👁️ ViT Backbone: Global self-attention over person patches
  • ⚖️ Triplet Loss: Hard negative mining for tight embeddings
  • 📷 Cross-Camera: Non-overlapping view matching
  • 🔍 Re-ID Retrieval: Rank-1 / mAP gallery evaluation
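The Rank-1 / mAP gallery evaluation can be sketched in NumPy as follows (simplified: the standard same-camera junk filtering of the full benchmark protocol is omitted):

```python
import numpy as np

def rank1_and_map(qf, gf, q_ids, g_ids):
    """Rank the gallery by L2 distance per query; score Rank-1 and mean AP."""
    dist = np.linalg.norm(qf[:, None] - gf[None], axis=2)   # (Q, G) distances
    order = dist.argsort(axis=1)                            # nearest first
    matches = g_ids[order] == q_ids[:, None]                # per-rank hit mask
    rank1 = matches[:, 0].mean()                            # top-1 hit rate
    aps = []
    for row in matches:
        hits = np.flatnonzero(row)
        if hits.size:                                       # AP over hit ranks
            aps.append(((np.arange(hits.size) + 1) / (hits + 1)).mean())
    return float(rank1), float(np.mean(aps))

# Toy setup: 2 queries against a 4-image gallery (IDs are illustrative).
qf = np.array([[0.0, 0.0], [1.0, 1.0]])
gf = np.array([[0.0, 0.1], [1.0, 0.9], [5.0, 5.0], [0.0, 0.2]])
r1, mAP = rank1_and_map(qf, gf, np.array([0, 1]), np.array([0, 1, 2, 0]))
print(r1, mAP)  # 1.0 1.0 -- both queries retrieve a correct match at rank 1
```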

Project Highlights

1. Repository Scope: Transformer-based person ReID implementation, positioned as a foundation for occluded and diffusion-augmented ReID research.

2. Primary Direction: Focus on transformer features for cross-camera matching stability under occlusion and viewpoint change.

3. Planned Expansion: Dataset protocol, mAP/Rank-1 metrics tables, and visual retrieval gallery outputs.

Quickstart

1. Clone the repo and prepare the environment from the dependency file or notebook imports.
2. Train or fine-tune the transformer encoder with a triplet-style objective on a standard ReID dataset.
3. Evaluate rank-based retrieval metrics (Rank-1, mAP) and inspect qualitative retrieval panels.

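Step 2 can be sketched as a minimal fine-tuning loop (the encoder below is a stand-in linear model; the real project would plug in the pretrained ViT and a PK-sampled ReID dataloader):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in encoder for 64x32 person crops; replace with the pretrained ViT.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 32, 128))
opt = torch.optim.AdamW(encoder.parameters(), lr=3e-4)

def triplet_step(anchor, positive, negative, margin=0.3):
    """One optimization step on a triplet batch with L2-normalized embeddings."""
    a, p, n = (F.normalize(encoder(x), dim=1) for x in (anchor, positive, negative))
    loss = F.triplet_margin_loss(a, p, n, margin=margin)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy triplet batch standing in for (anchor, same-ID, different-ID) crops.
a, p, n = (torch.randn(8, 3, 64, 32) for _ in range(3))
print(triplet_step(a, p, n) >= 0)  # True
```

In practice the triplets come from a PK sampler with in-batch hard mining rather than pre-formed (anchor, positive, negative) tensors.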
"In a sea of strangers, the right architecture never loses sight of who it's looking for."