Comparison of different models on the Bundesliga dataset

Author

Oliver Dürr

The experiments take some time to run; we therefore used the R script https://github.com/oduerr/da/blob/master/website/Euro24/eval_performance_runner.R to produce the results.

Loading the data

Code
  library(dplyr)   # for mutate() and the pipe
  library(knitr)   # for kable()
  library(ggplot2) # for the plots below

  df <- read.csv('~/Documents/GitHub/da/website/Euro24/eval_performance_bundesliga_20.csv')
  df <- mutate(df, name = sub("^.*/", "", name))  # keep only the file-name part of 'name'

  df_raw <- read.csv('~/Documents/GitHub/da/website/Euro24/bundesliga2000.csv')
  head(df_raw) %>% kable()
Div Date HomeTeam AwayTeam FTHG FTAG FTR HTHG HTAG HTR Attendance Referee HS AS HST AST HHW AHW HC AC HF AF HO AO HY AY HR AR HBP ABP GBH GBD GBA IWH IWD IWA LBH LBD LBA SBH SBD SBA WHH WHD WHA
D1 11/08/00 Dortmund Hansa Rostock 1 0 H 0 0 D 61000 Jörg Kessler 17 5 7 2 0 0 7 3 25 19 4 8 1 5 0 0 10 50 1.5 3.4 5.0 1.45 3.5 5.0 NA NA NA 1.50 3.50 6.00 1.44 3.6 6.5
D1 12/08/00 Bayern Munich Hertha 4 1 H 1 0 H 57000 Markus Merk 14 11 6 5 1 0 4 9 13 12 3 3 1 0 0 0 10 0 1.3 4.5 6.0 1.45 3.5 5.0 NA NA NA 1.40 3.75 7.00 1.44 3.6 6.5
D1 12/08/00 Freiburg Stuttgart 4 0 H 2 0 H 22500 Helmut Krug 15 18 7 5 0 0 4 7 22 17 0 0 1 1 0 0 10 10 2.4 3.1 2.5 2.30 2.9 2.5 NA NA NA 2.60 3.25 2.38 2.40 3.2 2.5
D1 12/08/00 Hamburg Munich 1860 2 2 D 2 2 D 35000 Herbert Fandel 18 9 5 7 1 0 5 3 0 0 0 0 2 2 0 1 20 45 1.8 3.3 3.8 1.80 3.0 3.5 NA NA NA 1.75 3.30 4.00 1.66 3.3 4.5
D1 12/08/00 Kaiserslautern Bochum 0 1 A 0 0 D 38000 Helmut Fleischer 11 5 2 2 0 0 5 5 9 8 0 0 1 0 0 0 10 0 1.5 3.4 4.6 1.45 3.5 5.0 NA NA NA 1.44 3.80 6.00 1.50 3.6 5.5
D1 12/08/00 Leverkusen Wolfsburg 2 0 H 2 0 H 22500 Edgar Steinborn 18 5 5 5 0 0 6 4 25 22 0 0 0 2 0 0 0 20 1.4 4.0 5.5 1.55 3.3 4.5 NA NA NA 1.44 3.80 6.00 1.44 3.6 6.5

Model Comparisons

We use the negative log-likelihood (NLL) as a measure of the predictive performance of the models: the lower the NLL, the better the model. Strictly speaking, it is the negative log posterior predictive density (divided by \(n\)), evaluated at the \(n\) games following the training data \(\mathcal{D}\):

\[ \text{NLL} = -\frac{1}{n}\sum_{i=1}^n \log p(y_i \mid x_i, \mathcal{D}) \]
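
To make the definition concrete, here is a minimal sketch. The matrix `log_lik_pred` is a simulated stand-in for the pointwise log-likelihood draws a Stan fit would produce (its name and dimensions are assumptions, not taken from the runner script):

```r
# Hypothetical example: posterior predictive NLL from pointwise log-likelihoods.
# log_lik_pred: S x n matrix, one row per posterior draw, one column per game
# after the training data (simulated here in place of real Stan output).
set.seed(1)
S <- 1000; n <- 5
log_lik_pred <- matrix(rnorm(S * n, mean = -1, sd = 0.1), nrow = S)

# Average the *density* (not the log density) over draws, then take the log.
# For real fits, matrixStats::colLogSumExps is the numerically safer choice.
log_ppd <- log(colMeans(exp(log_lik_pred)))
nll <- -mean(log_ppd)
```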

Code
  # Assuming df is your data frame
  df %>% filter(type == 'NLL_PRED') %>% 
    ggplot(aes(x = ntrain, y = res, color = name)) + 
    geom_line() + 
    geom_point() + 
    theme_minimal() + 
    labs(
      title = 'Comparison of different models for the Bundesliga 2000 dataset', 
      x = 'Number of training data', 
      y = 'Negative Log Likelihood'
    ) + 
    theme(legend.position = "bottom") +
    # coord_cartesian zooms without dropping data (unlike ylim, which removes
    # points outside the limits); clip = "off" lets lines extend past the panel
    coord_cartesian(ylim = c(2.5, 4), clip = "off")

Observations

  • Especially for small training sets, the hierarchical model performs better than the non-hierarchical one.

  • The correlated model performs slightly better than the non-correlated one.

  • There is practically no difference in predictive performance between the model with and without the Cholesky decomposition.

  • The negative binomial model performs comparably to the Poisson model.

Comparison of predicted NLL vs. PSIS-LOO
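
NLL_PSIS estimates the same predictive quantity from the training fit alone via PSIS-LOO. A minimal sketch using the `loo` package, with a simulated log-likelihood array standing in for a model's `log_lik` generated quantity (the dimensions are illustrative, not those of the actual fits):

```r
library(loo)

set.seed(1)
# draws x chains x observations, as loo::extract_log_lik() would return it
log_lik <- array(rnorm(500 * 2 * 20, mean = -1, sd = 0.2), dim = c(500, 2, 20))

r_eff    <- relative_eff(exp(log_lik))   # relative effective sample sizes
loo_res  <- loo(log_lik, r_eff = r_eff)  # PSIS-LOO estimate
nll_psis <- -loo_res$estimates["elpd_loo", "Estimate"] / 20  # per-game NLL
```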

Code
  df %>% filter(type %in% c('NLL_PRED', 'NLL_PSIS', 'NLL_PRED_STAN')) %>%
    ggplot(aes(x = ntrain, y = res, color = type)) + 
    geom_line(aes(linetype = type)) + 
    geom_point() + 
    theme_minimal() + 
    labs(
      title =  'Comparison of different models for the Bundesliga 2000 dataset',
      x = 'Number of training data', 
      y = 'Negative Log Likelihood'
    ) + 
    facet_wrap(~name) +
    theme(legend.position = "bottom") + 
    # coord_cartesian zooms without dropping data; clip = "off" lets lines
    # extend past the panel
    coord_cartesian(ylim = c(2.5, 4), clip = "off")

Observations

NLL for wins, draws, and losses

Code
  df %>% 
    filter(type %in% c('NLL_RESULTS', 'NLL_BOOKIE')) %>%
    ggplot(aes(x = ntrain, y = res, color = type)) + 
    geom_line(aes(linetype = name)) + 
    geom_point() + 
    theme_minimal() + 
    labs(
      title = 'Comparison of different models for the Bundesliga 2000 dataset',
      x = 'Number of training data', 
      y = 'Negative Log Likelihood'
    ) + 
    theme(legend.position = "bottom") + 
    # coord_cartesian zooms without dropping data; clip = "off" lets lines
    # extend past the panel
    coord_cartesian(ylim = c(0.75, 1.5), clip = "off")

Observations

  • The NLL of the bookie's odds is always lower (better) than the NLL of the models, so we should not bet.
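
For context, the bookie's NLL is computed from decimal odds. A common convention, assumed here for illustration (not verified against the runner script), is to invert the odds and renormalise away the bookmaker's margin (the overround). A sketch using the odds from the first row of the raw data above:

```r
# Decimal home/draw/away odds for Dortmund vs. Hansa Rostock (first row above)
odds <- c(H = 1.50, D = 3.40, A = 5.00)

raw       <- 1 / odds        # implied probabilities; sum > 1 due to the margin
overround <- sum(raw) - 1    # the bookmaker's margin (~16% here)
p_bookie  <- raw / sum(raw)  # normalised probabilities usable in an NLL
```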

Betting Returns

Code
  df %>% 
    filter(type %in% c('BET_RETURN')) %>%
    ggplot(aes(x = ntrain, y = res, color = name)) + 
    geom_line(aes(linetype = name)) +  
    geom_point() + 
    theme_minimal() + 
    labs(
      title = 'Comparison of different models for the Bundesliga 2000 dataset',
      x = 'Number of training data', 
      y = 'Betting Returns'
    ) + 
    #ylim(0.75, 1.5) +
    theme(legend.position = "bottom") + 
    coord_cartesian(clip = "off") # Allow lines to go outside the plot area

Observations

We see quite some fluctuation in the betting returns. Since the NLL shows that the bookie's odds are always better than the models' predictions, we should not bet.
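
The return series itself comes from the runner script. As an illustration, the sketch below implements the usual value-betting rule (stake one unit whenever model probability times odds exceeds 1); the function and its payoff bookkeeping are assumptions, not the script's implementation:

```r
# Hypothetical value-betting rule: bet one unit on every outcome whose
# expected value p_model * odds exceeds 1; return profit per unit staked.
bet_return <- function(p_model, odds, outcome) {
  ev   <- p_model * odds            # expected value of a unit bet per outcome
  bets <- which(ev > 1)             # outcomes the model sees as value bets
  if (length(bets) == 0) return(0)  # no bet placed, no return
  profit <- sum(ifelse(bets == outcome, odds[bets] - 1, -1))
  profit / length(bets)
}

bet_return(p_model = c(0.60, 0.25, 0.15),
           odds    = c(1.90, 3.40, 5.00),
           outcome = 1)   # home win realised: profit of 0.9 per unit
```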

Technical Details

Code
  df %>% 
    filter(type %in% c('MIN_SUM_PROB')) %>%
    ggplot(aes(x = ntrain, y = res, color = name)) + 
    geom_line(aes(linetype = name)) + 
    geom_point() + 
    theme_minimal() + 
    labs(
      title = 'Comparison of different models for the Bundesliga 2000 dataset',
      x = 'Number of training data', 
      y = 'Sum of Probabilities from 0 to 10 goals (should be 1)'
    ) + 
    theme(legend.position = "bottom") + 
    # coord_cartesian zooms without dropping data; clip = "off" lets lines
    # extend past the panel
    coord_cartesian(ylim = c(0.75, 1.01), clip = "off")
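
The quantity above measures how much probability mass the score models place on 0 to 10 goals. For a Poisson score model, the tail mass lost by truncating at 10 goals can be checked directly:

```r
# Probability mass a Poisson score model assigns to 0..10 goals per team.
mass_0_to_10 <- function(lambda) sum(dpois(0:10, lambda))

mass_0_to_10(1.5)  # typical Bundesliga scoring rate: essentially all mass covered
mass_0_to_10(6.0)  # an extreme rate: truncation at 10 goals starts to matter
```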

Code
  df %>% 
    filter(type %in% c('num_divergent')) %>%
    ggplot(aes(x = ntrain, y = res, color = name)) + 
    geom_line(aes(linetype = name)) + 
    geom_point() + 
    theme_minimal() + 
    labs(
      title = 'Comparison of different models for the Bundesliga 2000 dataset',
      x = 'Number of training data', 
      y = 'Number of Divergent Transitions (sqrt scale)'
    ) + 
    theme(legend.position = "bottom") + 
    scale_y_sqrt() + 
    coord_cartesian(clip = "off") # Allow lines to go outside the plot area

Code
df %>% 
  filter(type %in% c('ebfmi')) %>%
  ggplot(aes(x = ntrain, y = res, color = name)) + 
  geom_line(aes(linetype = name)) + 
  geom_point() + 
  theme_minimal() + 
  labs(
    title = 'Comparison of different models for the Bundesliga 2000 dataset',
    x = 'Number of training data', 
    y = 'E-BFMI' 
  ) + 
  theme(legend.position = "bottom") + 
  geom_hline(yintercept = 0.3, linetype = "dashed", color = "red") + 
  annotate("text", x = Inf, y = 0.33, label = "Acceptable", hjust = 1.1, color = "red") + 
  annotate("text", x = Inf, y = 0.27, label = "Non-Acceptable", hjust = 1.1, color = "red") +
  coord_cartesian(clip = "off") # Allow lines to go outside the plot area