NFL Win Probability Neural Network
A neural network built from scratch using NumPy to predict the probability that the home team wins an NFL game, given a snapshot of the game state. Implemented without PyTorch, TensorFlow, or autograd.

Overview
This project demonstrates a deep understanding of neural network fundamentals by implementing forward propagation, backpropagation, and gradient descent entirely from scratch. The model predicts NFL game outcomes based on real-time game state features.
What I Learned
- Implementing a neural network from scratch without high-level frameworks
- Understanding forward and backward propagation in detail
- Manual gradient computation using the chain rule
- Implementing gradient descent optimization
- Preventing data leakage in machine learning datasets
- Building a complete ML pipeline from data preprocessing to model evaluation
- Validating backpropagation with numerical gradient checking
Problem Definition
- Task: Predict the probability that the home team wins the game.
- Input: A snapshot of the game at a single moment in time.
- Output: A probability in the range [0, 1].
- Label: home_win = 1 if the home team eventually wins the game, otherwise 0.
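For example, a single (hypothetical) training example might look like the following; the field values here are illustrative, not taken from the dataset:

```python
# Hypothetical snapshot: home team leads by 3 with two minutes left in the
# 4th quarter, and the away team faces 3rd-and-7 at midfield.
snapshot = {
    "score_diff": 3,
    "seconds_remaining": 120,
    "quarter": 4,
    "down": 3,
    "yards_to_go": 7,
    "yardline_100": 50,
    "possession_is_home": 0,
}
home_win = 1  # the home team went on to win this (hypothetical) game
```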
Dataset
The dataset is derived from NFL play-by-play data sourced from nflfastR via nflreadr. Each row represents a single game-state snapshot from which a prediction is made; a construction sketch follows the table.
| Feature | Description |
|---|---|
| score_diff | home_score minus away_score |
| seconds_remaining | Seconds remaining in the game |
| quarter | Game quarter (1 to 4) |
| down | Down (1 to 4) |
| yards_to_go | Yards needed for a first down |
| yardline_100 | Yards to the opponent's end zone |
| possession_is_home | 1 if home team has possession, else 0 |
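A minimal sketch of how these features could be assembled with pandas, assuming the usual nflfastR play-by-play column names (total_home_score, total_away_score, game_seconds_remaining, qtr, down, ydstogo, yardline_100, posteam, home_team, result, game_id); the preprocessing in this repo may differ in detail:

```python
import numpy as np
import pandas as pd

FEATURES = ["score_diff", "seconds_remaining", "quarter", "down",
            "yards_to_go", "yardline_100", "possession_is_home"]

def build_dataset(pbp: pd.DataFrame) -> tuple[np.ndarray, np.ndarray]:
    """Turn play-by-play rows into a feature matrix X and label vector y."""
    df = pbp.dropna(subset=["down", "yardline_100", "result"]).copy()
    df["score_diff"] = df["total_home_score"] - df["total_away_score"]
    df["seconds_remaining"] = df["game_seconds_remaining"]
    df["quarter"] = df["qtr"]
    df["yards_to_go"] = df["ydstogo"]
    df["possession_is_home"] = (df["posteam"] == df["home_team"]).astype(int)
    # Label: did the home team eventually win? (result = final home margin)
    df["home_win"] = (df["result"] > 0).astype(int)
    # To avoid leakage, split train/test by game_id rather than by row, so
    # snapshots from the same game never land on both sides of the split.
    X = df[FEATURES].to_numpy(dtype=np.float64)
    y = df["home_win"].to_numpy(dtype=np.float64).reshape(-1, 1)
    return X, y
```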
Neural Network Architecture
This project uses a single hidden layer neural network with the following architecture:
X (N, D)
→ z1 = X @ W1 + b1
→ h = tanh(z1)
→ z2 = h @ W2 + b2
→ yhat = sigmoid(z2)
→ loss = binary_cross_entropy(yhat, y)
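A minimal NumPy sketch of this forward pass (parameter names, initialization, and the hidden size H are illustrative; the actual implementation may differ):

```python
import numpy as np

def init_params(D, H, seed=0):
    """Small random weights, zero biases."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(D, H)),
        "b1": np.zeros((1, H)),
        "W2": rng.normal(0.0, 0.1, size=(H, 1)),
        "b2": np.zeros((1, 1)),
    }

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    z1 = X @ params["W1"] + params["b1"]    # (N, H)
    h = np.tanh(z1)                         # (N, H)
    z2 = h @ params["W2"] + params["b2"]    # (N, 1)
    yhat = sigmoid(z2)                      # (N, 1) = P(home win)
    cache = (X, h, yhat)                    # saved for backprop
    return yhat, cache

def bce_loss(yhat, y, eps=1e-12):
    """Mean binary cross-entropy, clipped for numerical stability."""
    yhat = np.clip(yhat, eps, 1.0 - eps)
    return -np.mean(y * np.log(yhat) + (1.0 - y) * np.log(1.0 - yhat))
```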
Why tanh and sigmoid?
- tanh introduces nonlinearity and has a simple derivative
- sigmoid maps logits to probabilities
- binary cross-entropy pairs naturally with sigmoid: their combined gradient with respect to the logits simplifies to yhat - y (see the sketch below)
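Concretely, these choices keep the backward pass compact. A sketch of the two derivative pieces, using the same notation as above:

```python
def dtanh(h):
    """Derivative of tanh written in terms of its own output h = tanh(z1)."""
    return 1.0 - h ** 2

def dloss_dz2(yhat, y):
    """Combined sigmoid + binary cross-entropy gradient w.r.t. the logits z2.

    The log in the loss cancels the exponential in the sigmoid, leaving
    dL/dz2 = (yhat - y) / N for the mean loss over N examples.
    """
    return (yhat - y) / y.shape[0]
```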
Backpropagation
Backpropagation is implemented manually using the chain rule. Every gradient is computed explicitly without relying on automatic differentiation libraries.
Key identity used:
dL/dz2 = (yhat - y) / N
From there, gradients flow backward to W2 and b2, then through tanh using (1 - h²), and finally to W1 and b1.
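A minimal sketch of that backward pass, a plain gradient-descent update, and a spot-check against numerical gradients, reusing forward() and bce_loss() from the forward-pass sketch above (names are illustrative, not necessarily this repo's API):

```python
def backward(params, cache, y):
    """Manual backprop through sigmoid + BCE, the linear layers, and tanh."""
    X, h, yhat = cache
    N = X.shape[0]

    dz2 = (yhat - y) / N                     # dL/dz2, shape (N, 1)
    dW2 = h.T @ dz2                          # (H, 1)
    db2 = dz2.sum(axis=0, keepdims=True)     # (1, 1)

    dh = dz2 @ params["W2"].T                # (N, H)
    dz1 = dh * (1.0 - h ** 2)                # back through tanh, (N, H)
    dW1 = X.T @ dz1                          # (D, H)
    db1 = dz1.sum(axis=0, keepdims=True)     # (1, H)

    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

def sgd_step(params, grads, lr=0.1):
    """In-place vanilla gradient descent update."""
    for k in params:
        params[k] -= lr * grads[k]

def grad_check(params, X, y, key, i, j, eps=1e-5):
    """Compare one analytic gradient entry against central differences."""
    yhat, cache = forward(X, params)
    analytic = backward(params, cache, y)[key][i, j]
    params[key][i, j] += eps
    loss_plus = bce_loss(forward(X, params)[0], y)
    params[key][i, j] -= 2 * eps
    loss_minus = bce_loss(forward(X, params)[0], y)
    params[key][i, j] += eps                 # restore the parameter
    return analytic, (loss_plus - loss_minus) / (2 * eps)
```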