
World Cup: Match Outcome Predictor

  • 1 Devlogs
  • 0 Total hours

Predicts winners of FIFA World Cup matches using data from last year's World Cup (ELO ratings, recent form, goals scored, and home advantage) and a logistic regression model trained on the past 10 years of data.


18m 51s logged

Building a World Cup Match Outcome Predictor
The idea is simple: give it two national teams and get back a win probability for each side, based on actual math.

The model pulls from four inputs:

  1. ELO ratings (a numerical strength score that updates after every match)
  2. Recent form (average points earned across the last 5 games)
  3. Goal differential (how many goals a team scores vs. concedes on average)
  4. Venue advantage (neutral ground vs. true home)
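
To make the first two inputs concrete, here is a minimal sketch of the standard ELO update and a last-5 form average. The function names, the K-factor of 20, and the 3/1/0 points scheme are assumptions for illustration, not values taken from this project.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of team A against team B under the standard ELO formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return both teams' ratings after one match.

    score_a is 1.0 for a team A win, 0.5 for a draw, 0.0 for a loss.
    k=20 is an assumed K-factor, not a value from this project.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def recent_form(results: list[str], window: int = 5) -> float:
    """Average points over the last `window` results, scoring W/D/L as 3/1/0."""
    points = {"W": 3, "D": 1, "L": 0}
    last = results[-window:]
    if not last:
        return 0.0  # no match history yet
    return sum(points[r] for r in last) / len(last)


# Two evenly rated teams; team A wins and gains what team B loses.
new_a, new_b = update_elo(1500.0, 1500.0, score_a=1.0)
form = recent_form(["L", "W", "W", "D", "L", "W"])  # only the last 5 count
```

An upset moves ratings more than an expected result, because the change is proportional to the gap between the actual and expected score.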

All of that gets fed into a logistic regression model trained on 10 years of international match data.
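
As a rough sketch of what that model step could look like in scikit-learn: the data below is synthetic, and the feature layout (team A minus team B differences plus a home flag) and the coefficients used to generate labels are made up for the demo. It also treats the outcome as binary (team A wins or not), which lumps draws in with losses; the real model may handle draws differently.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for real match data: each column is (team A minus team B).
elo_diff = rng.normal(0, 150, n)
form_diff = rng.normal(0, 1, n)
goal_diff = rng.normal(0, 1, n)
is_home = rng.integers(0, 2, n)  # 1 if team A plays at home, 0 if neutral ground
X = np.column_stack([elo_diff, form_diff, goal_diff, is_home])

# Made-up "true" relationship, used only to generate labels for this demo.
logit = 0.01 * elo_diff + 0.5 * form_diff + 0.3 * goal_diff + 0.3 * is_home
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)  # 1 = team A wins

# Scale the features, then fit a plain logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# Two hypothetical fixtures on neutral ground: A is 200 ELO points stronger, then weaker.
fixtures = np.array([[200.0, 0.0, 0.0, 0.0], [-200.0, 0.0, 0.0, 0.0]])
proba = model.predict_proba(fixtures)  # columns: P(A does not win), P(A wins)
p_strong, p_weak = proba[0, 1], proba[1, 1]
print(f"P(A wins | +200 ELO) = {p_strong:.2f}, P(A wins | -200 ELO) = {p_weak:.2f}")
```

Each row of `predict_proba` sums to 1, so the second side's probability in this binary setup is just the complement of the first.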
The stack is pandas for cleaning and merging datasets, scikit-learn for the model, requests for pulling live match data, and seaborn for charts and heatmaps.
Right now I’m in the data pipeline phase. The trickiest part so far is merging ELO scores to matches by date: the model must only see ELO ratings that existed before each match was played; otherwise you’re leaking future information into the training data.
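
One way to do that merge in pandas is `merge_asof`, which joins each match to the most recent earlier rating. This is only a sketch: the column names and the tiny tables are invented, and it assumes a long format with one row per team per match, where a rating dated on match day already includes that match's result.

```python
import pandas as pd

# Hypothetical ELO history: each row is a team's rating as of that date.
elo = pd.DataFrame({
    "date": pd.to_datetime(["2022-11-01", "2022-11-24", "2022-11-28",
                            "2022-11-01", "2022-11-24"]),
    "team": ["Brazil", "Brazil", "Brazil", "Serbia", "Serbia"],
    "elo": [2100, 2110, 2120, 1800, 1790],
})

# Hypothetical matches, one row per team per match.
matches = pd.DataFrame({
    "date": pd.to_datetime(["2022-11-24", "2022-11-24", "2022-11-28"]),
    "team": ["Brazil", "Serbia", "Brazil"],
})

# merge_asof requires both frames to be sorted on the "on" key.
elo = elo.sort_values(["date", "team"])
matches = matches.sort_values(["date", "team"])

merged = pd.merge_asof(
    matches,
    elo,
    on="date",
    by="team",                  # only match ratings for the same team
    direction="backward",       # look back in time, never forward
    allow_exact_matches=False,  # exclude same-day ratings (already post-match)
).rename(columns={"elo": "pre_match_elo"})

print(merged)
```

With `allow_exact_matches=False`, Brazil's 2022-11-24 match picks up the 2022-11-01 rating rather than the one published on match day, which is the leak the paragraph above describes.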
Next up is feature engineering, then model training, then actual predictions.

