EyeWitness
Cross-camera identity matching with transformer attention that remembers the whole person, not just local patches.
Learn robust person embeddings that remain stable under viewpoint shift and partial occlusion.
Use global transformer self-attention to preserve long-range appearance cues missed by CNNs.
Connect this baseline to the broader occluded-ReID research pipeline with diffusion-guided inpainting.
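The global-attention claim above can be made concrete with a minimal single-head self-attention pass over ViT-style patch tokens. This is an illustrative numpy sketch with random matrices standing in for learned projections, not the project's actual model code:

```python
import numpy as np

def self_attention(tokens, d_k):
    """Single-head self-attention: every patch token attends to every
    other token, so distant body parts can inform each output embedding
    (the long-range cues a local CNN receptive field can miss)."""
    rng = np.random.default_rng(0)
    d = tokens.shape[-1]
    # Random projections stand in for learned query/key/value weights.
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(d_k)                  # (N, N) all-pairs affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # each row mixes all N patches

patches = np.random.default_rng(1).standard_normal((196, 64))  # 14x14 patch grid
out = self_attention(patches, d_k=32)
print(out.shape)  # (196, 32)
```

Each output row is a weighted mixture of all 196 patch tokens, which is what lets the model "remember the whole person" rather than only a local patch neighborhood.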
The Project Brief
A transformer-based framework for person re-identification (ReID) — the task of matching pedestrian identities across non-overlapping camera views, a core challenge in intelligent surveillance and autonomous systems.
Problem: CNN-based ReID methods struggle with long-range spatial dependencies, occlusion, and large viewpoint variation. Transformer architectures, with their global self-attention mechanism, are well suited to capturing holistic appearance representations.
Approach:
- Vision Transformer (ViT) backbone pre-trained on large-scale image datasets, fine-tuned for the ReID objective.
- Triplet loss training with hard negative mining for discriminative embedding learning.
- Evaluated on standard ReID benchmarks for cross-view matching accuracy.
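The triplet objective in the second bullet is commonly implemented with batch-hard mining: for each anchor, take the farthest in-batch positive and the nearest in-batch negative. A minimal numpy sketch of that loss (the project itself trains in PyTorch; embeddings and margin here are illustrative):

```python
import numpy as np

def batch_hard_triplet_loss(emb, labels, margin=0.3):
    """Batch-hard triplet loss: hinge on the gap between each anchor's
    hardest positive distance and hardest negative distance."""
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)        # pairwise Euclidean
    same = labels[:, None] == labels[None, :]          # same-identity mask
    pos = np.where(same, dist, -np.inf).max(axis=1)    # farthest positive
    neg = np.where(~same, dist, np.inf).min(axis=1)    # nearest negative
    return np.maximum(pos - neg + margin, 0.0).mean()

emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, 0, 1, 1])
print(batch_hard_triplet_loss(emb, labels))  # 0.0 — classes already separated
```

Mining only the hardest pairs inside each batch keeps the loss focused on the confusable identity pairs that actually drive cross-camera errors.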
Relation to broader research: This project directly informs ongoing work on occluded person re-identification using latent diffusion-guided feature inpainting.
Stack: Python · PyTorch · Vision Transformer (ViT) · CUDA
Key Components
Project Highlights
Transformer-based person ReID implementation, positioned as a foundation for occluded and diffusion-augmented ReID research.
Focus on transformer features for cross-camera matching stability under occlusion and viewpoint change.
Includes the dataset protocol, mAP/Rank-1 metric tables, and visual retrieval-gallery outputs.
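The mAP and Rank-1 numbers follow the standard retrieval protocol: rank the gallery by distance to each query, then score the ranked identity list. A minimal sketch (hypothetical embeddings, no per-camera filtering):

```python
import numpy as np

def evaluate(query, q_ids, gallery, g_ids):
    """Rank-1 and mAP for ReID retrieval over a single gallery."""
    hits, aps = [], []
    for q, qid in zip(query, q_ids):
        order = np.argsort(np.linalg.norm(gallery - q, axis=1))
        matches = g_ids[order] == qid
        hits.append(matches[0])                   # Rank-1: is the top hit correct?
        ranks = np.flatnonzero(matches) + 1       # 1-based ranks of true matches
        precisions = np.arange(1, len(ranks) + 1) / ranks
        aps.append(precisions.mean())             # average precision per query
    return float(np.mean(hits)), float(np.mean(aps))

gallery = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
g_ids = np.array([0, 0, 1])
query = np.array([[0.1, 0.0], [5.1, 5.0]])
q_ids = np.array([0, 1])
r1, mAP = evaluate(query, q_ids, gallery, g_ids)
print(r1, mAP)  # 1.0 1.0 — both queries retrieve a correct identity first
```

Benchmark evaluation additionally excludes same-camera, same-identity gallery images per query; that filtering is omitted here for brevity.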
Quickstart
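As a stand-in quickstart, here is the matching step in miniature: decide whether two camera crops show the same person by thresholding cosine similarity of their embeddings. All names and the threshold are illustrative assumptions, not the project's actual API, and the random vectors below stand in for real ViT outputs:

```python
import numpy as np

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_identity(emb_a, emb_b, threshold=0.7):
    """Declare a cross-camera match when the cosine similarity of the
    two ReID embeddings clears a tuned threshold."""
    return cosine_sim(emb_a, emb_b) >= threshold

rng = np.random.default_rng(0)
person = rng.standard_normal(768)                   # ViT-Base embedding width
cam_a = person + 0.05 * rng.standard_normal(768)    # same person, mild view shift
cam_b = rng.standard_normal(768)                    # a different person
print(same_identity(cam_a, person))  # True
print(same_identity(cam_b, person))  # False
```

In a real deployment the threshold would be calibrated on a validation split, since it trades false matches against missed matches.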
"In a sea of strangers, the right architecture never loses sight of who it's looking for."