Blackjack Reinforcement Learning

I built a Blackjack-playing AI that learns entirely from experience by playing thousands of simulated games. Instead of following fixed rules, the agent learns when to hit or stick by observing which actions lead to better outcomes over time. As training continues, its strategy comes to closely match optimal Blackjack play, even in harder situations such as a strong dealer card. By the end of training, the agent's win rate stabilizes close to the theoretical best, showing how reinforcement learning can train an AI to make effective decisions.

Overview

An AI agent learns to play Blackjack through self-learning
Learns its strategy from thousands of simulated games
Follows no fixed rules and discovers its behavior independently

Implementation

Uses Q-learning to estimate state-action values
Selects actions using a visit-based exploration strategy
Learns when to hit or stick based on long-term rewards
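
The three points above can be sketched as a small tabular agent. This is a minimal illustration, not the project's actual code: the class name QAgent, the hyperparameter values, and the exact exploration rule (epsilon = n0 / (n0 + visits to the state)) are all assumptions, since the write-up only says the exploration is visit-based. States are assumed to be the Blackjack-v1 observation tuples (player sum, dealer card, usable ace).

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # Blackjack-v1 convention: 0 = stick, 1 = hit


class QAgent:
    """Tabular Q-learning with a visit-count-based epsilon (illustrative sketch)."""

    def __init__(self, alpha=0.1, gamma=1.0, n0=100.0, seed=None):
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.n0 = n0        # exploration constant (assumed value)
        self.q = defaultdict(float)     # (state, action) -> estimated value
        self.visits = defaultdict(int)  # state -> number of times acted in
        self.rng = random.Random(seed)

    def epsilon(self, state):
        # Explore heavily in rarely visited states, less in familiar ones.
        return self.n0 / (self.n0 + self.visits[state])

    def greedy(self, state):
        # Action with the highest current value estimate (ties go to stick).
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def act(self, state):
        eps = self.epsilon(state)
        self.visits[state] += 1
        if self.rng.random() < eps:
            return self.rng.choice(ACTIONS)
        return self.greedy(state)

    def update(self, state, action, reward, next_state, done):
        # Q-learning target: reward plus the best estimated value of the
        # next state, or just the reward if the hand is over.
        best_next = 0.0 if done else max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        key = (state, action)
        self.q[key] += self.alpha * (target - self.q[key])
```

In a training loop, the agent would call act on each observation, pass the chosen action to the environment, and call update with the resulting reward and next observation. Because a Blackjack hand only pays out at the end, the max over next-state values is what carries the final reward back to earlier hit-or-stick decisions.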

Results

Reached a win rate close to the theoretical optimum
Handles soft hands and strong dealer cards better as training progresses
Behavior becomes stable after enough training episodes
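
One way a win rate like this could be measured is to play many hands with exploration switched off and count the wins. The helper below is a hedged sketch rather than the project's evaluation code: the name win_rate and the default episode count are assumptions, and it only relies on the Gymnasium-style reset() and step() interface.

```python
def win_rate(env, policy, episodes=10_000):
    """Fraction of episodes won when following `policy` with no exploration.

    `env` is any object with Gymnasium-style reset() and step() methods;
    `policy` maps an observation to an action (0 = stick, 1 = hit).
    """
    wins = 0
    for _ in range(episodes):
        state, _info = env.reset()
        done = False
        reward = 0.0
        while not done:
            state, reward, terminated, truncated, _info = env.step(policy(state))
            done = terminated or truncated
        # Blackjack-v1 gives a positive final reward only for a win
        # (0 for a draw, negative for a loss).
        wins += reward > 0
    return wins / episodes
```

With Gymnasium installed, env could be created with gymnasium.make("Blackjack-v1") and policy could be any function that returns the learned greedy action for an observation. Calling this at intervals during training is one way to see when the behavior stops changing.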

What I Learned

Reinforcement learning fundamentals
Exploration versus exploitation tradeoffs
Analyzing training patterns and agent performance

Built With:

Python, Gymnasium (Blackjack-v1 environment)
