Complete reinforcement learning pipeline with Q-learning agent and real-time visualization

Timeline
2024
Category
Reinforcement Learning
Platform
Desktop
Overview
This project implements a complete reinforcement learning pipeline combining algorithm development, environment simulation, and interactive visualization. The agent employs Q-learning with an epsilon-greedy exploration strategy to learn optimal navigation policies in a dynamic grid world.
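The epsilon-greedy strategy described above balances exploration and exploitation: with probability epsilon the agent tries a random action, otherwise it takes the action with the highest learned Q-value. A minimal sketch of that selection step, assuming a four-action grid world (the function and parameter names here are illustrative, not the project's actual API):

```cpp
#include <array>
#include <random>

// Epsilon-greedy action selection over a 4-action grid world.
// With probability epsilon the agent explores (uniform random action);
// otherwise it exploits the action with the highest Q-value.
int selectAction(const std::array<double, 4>& qValues, double epsilon,
                 std::mt19937& rng) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    if (coin(rng) < epsilon) {
        std::uniform_int_distribution<int> pick(0, 3);
        return pick(rng);  // explore: random action
    }
    int best = 0;          // exploit: argmax over Q-values
    for (int a = 1; a < 4; ++a)
        if (qValues[a] > qValues[best]) best = a;
    return best;
}
```

Decaying epsilon over episodes is a common refinement, shifting the agent from exploration early in training toward exploitation once the Q-table stabilizes.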
Key Features
- Q-Learning Algorithm: Implements an epsilon-greedy exploration strategy for optimal policy learning
- Reward Shaping: Proximity-based bonuses and out-of-bounds penalties guide learning
- Real-Time Visualization: SFML-based UI displays grid layout, agent trajectory, and reward heatmaps
- Training Analytics: Live monitoring of Q-value distributions and episode metrics
- Interactive Controls: Pause/resume training, manual agent control, and adjustable training speed
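The reward-shaping feature combines a proximity bonus with an out-of-bounds penalty. One way to sketch that idea, with illustrative constants rather than the project's actual values:

```cpp
#include <cmath>

// Shaped reward sketch: a fixed penalty for leaving the grid, a terminal
// reward at the goal, and a proximity bonus that grows as the agent nears
// the goal. Constants and names here are illustrative assumptions.
double shapedReward(int x, int y, int goalX, int goalY, int gridSize) {
    if (x < 0 || y < 0 || x >= gridSize || y >= gridSize)
        return -10.0;                       // out-of-bounds penalty
    if (x == goalX && y == goalY)
        return 10.0;                        // terminal goal reward
    double dist = std::abs(x - goalX) + std::abs(y - goalY);  // Manhattan distance
    return -0.1 + 1.0 / (1.0 + dist);      // small step cost + proximity bonus
}
```

The small per-step cost discourages wandering, while the proximity term gives the agent a denser learning signal than a sparse goal-only reward.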
Technical Implementation
Built with C++ and SFML for high-performance rendering and real-time interaction. The Q-learning implementation uses temporal difference learning with customizable hyperparameters, and the visualization system shows how the agent's policy evolves across episodes.
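The temporal-difference update at the heart of tabular Q-learning is Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). A minimal sketch of that update, again assuming four actions; the learning rate alpha and discount factor gamma are the customizable hyperparameters mentioned above:

```cpp
#include <array>

// Tabular Q-learning (temporal-difference) update for one transition.
// qState holds Q-values for the current state s, qNext for the next state s'.
// Function and parameter names are illustrative, not the project's actual API.
void qUpdate(std::array<double, 4>& qState,
             const std::array<double, 4>& qNext,
             int action, double reward,
             double alpha, double gamma) {
    double maxNext = qNext[0];              // max_a' Q(s', a')
    for (int a = 1; a < 4; ++a)
        if (qNext[a] > maxNext) maxNext = qNext[a];
    double tdTarget = reward + gamma * maxNext;
    qState[action] += alpha * (tdTarget - qState[action]);
}
```

Because the target uses the max over next-state actions regardless of which action the agent actually takes next, this is an off-policy update, which is what lets Q-learning learn the greedy policy while exploring with epsilon-greedy.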
Learning Outcomes
This integrated system demonstrates practical applications of Q-learning in a visually engaging environment, making reinforcement learning concepts tangible through interactive exploration and real-time feedback.