Remember To Play
When memory enters the Q-network, the agent stops guessing and starts planning.
Treat single-frame Atari as partially observable instead of fully observable — a more realistic framing.
Compare memoryless Q-learning versus recurrent Q-learning under controlled partial observability.
Evaluate exactly where temporal memory changes policy quality across different game dynamics.
The Project Brief
A systematic empirical study comparing Deep Q-Networks (DQN) and Deep Recurrent Q-Networks (DRQN) across standard and partially observable reinforcement learning environments.
Motivation: Atari games are typically treated as fully observable MDPs, but restricting the agent to a single frame induces partial observability — turning the problem into a POMDP. We hypothesize that LSTM-augmented networks can recover missing temporal context and outperform memoryless DQN under this constraint.
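The single-frame constraint can be made concrete with a small observation filter. The sketch below is a hypothetical illustration (not the project's actual wrapper): for CartPole, zeroing the two velocity components leaves only position and angle, so a single observation no longer determines the underlying state.

```python
# Hypothetical observation filter inducing partial observability in CartPole.
# CartPole-v1 observations are [cart position, cart velocity,
# pole angle, pole angular velocity]; zeroing indices 1 and 3 removes
# all velocity information, so the agent must infer motion from memory.
VELOCITY_INDICES = (1, 3)

def mask_velocities(obs):
    """Return a copy of `obs` with the velocity components zeroed."""
    return [0.0 if i in VELOCITY_INDICES else x for i, x in enumerate(obs)]
```

In a Gymnasium pipeline this logic would typically live in a `gymnasium.ObservationWrapper` subclass that applies the mask in its `observation()` method.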
Contributions:
- Clean, reproducible PyTorch implementations of DQN (3-layer CNN) and DRQN (CNN + LSTM).
- Systematic comparison across Atari environments (Assault, Breakout) and CartPole under single-frame observation.
- Ablation over replay memory strategies: standard experience replay vs. episode-based sequential replay for DRQN.
- Quantitative evidence that recurrent memory provides measurable gains in partially observable settings.
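The episode-based sequential replay mentioned in the ablation above can be sketched as follows. `EpisodeReplayBuffer` is a hypothetical name, and sampling fixed-length contiguous subsequences from stored episodes is one common scheme for training recurrent Q-networks, not necessarily the exact strategy used in this project.

```python
import random
from collections import deque

class EpisodeReplayBuffer:
    """Episode-based sequential replay for recurrent Q-networks.

    Stores whole episodes and samples fixed-length contiguous transition
    subsequences, so the LSTM can be unrolled over temporally ordered
    experience (unlike the i.i.d. single-step sampling of standard
    DQN experience replay).
    """

    def __init__(self, capacity, seq_len):
        # Oldest episodes are evicted once `capacity` is reached.
        self.episodes = deque(maxlen=capacity)
        self.seq_len = seq_len

    def add_episode(self, transitions):
        # Keep only episodes long enough to yield a full training sequence.
        if len(transitions) >= self.seq_len:
            self.episodes.append(list(transitions))

    def sample(self, batch_size):
        # Each sampled item is a contiguous run of `seq_len` transitions.
        batch = []
        for _ in range(batch_size):
            ep = random.choice(self.episodes)
            start = random.randrange(len(ep) - self.seq_len + 1)
            batch.append(ep[start:start + self.seq_len])
        return batch
```

Standard DQN replay would instead flatten all transitions into one buffer and sample them independently; the comparison between the two is exactly what the ablation measures.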
Stack: Python · PyTorch · OpenAI Gymnasium · CUDA
Project Highlights
The README frames the study as a POMDP testbed, asking whether recurrent memory improves decision quality when observations are limited.
CartPole-v1, Assault-v5, and Breakout-v5 are evaluated with wrappers that induce partial observability.
Reported results show DRQN benefits in scenarios where temporal context is critical for high reward.
"Memory is not a luxury for an agent — it is the difference between reacting and reasoning."