Comparison of different models on the Bundesliga dataset

Author

Oliver Dürr

The experiments take some time to run; we therefore used the R script https://github.com/oduerr/da/blob/master/website/Euro24/eval_performance_runner.R to produce the results.

Loading the data

Code
  library(dplyr)   # for mutate() and the pipe
  library(knitr)   # for kable()
  library(ggplot2) # for the plots below

  df <- read.csv('~/Documents/GitHub/da/website/Euro24/eval_performance_bundesliga_20.csv')
  df <- mutate(df, name = sub("^.*/", "", name))  # keep only the file-name part of 'name'

  df_raw <- read.csv('~/Documents/GitHub/da/website/Euro24/bundesliga2000.csv')
  head(df_raw) %>% kable()
Div Date HomeTeam AwayTeam FTHG FTAG FTR HTHG HTAG HTR Attendance Referee HS AS HST AST HHW AHW HC AC HF AF HO AO HY AY HR AR HBP ABP GBH GBD GBA IWH IWD IWA LBH LBD LBA SBH SBD SBA WHH WHD WHA
D1 11/08/00 Dortmund Hansa Rostock 1 0 H 0 0 D 61000 Jörg Kessler 17 5 7 2 0 0 7 3 25 19 4 8 1 5 0 0 10 50 1.5 3.4 5.0 1.45 3.5 5.0 NA NA NA 1.50 3.50 6.00 1.44 3.6 6.5
D1 12/08/00 Bayern Munich Hertha 4 1 H 1 0 H 57000 Markus Merk 14 11 6 5 1 0 4 9 13 12 3 3 1 0 0 0 10 0 1.3 4.5 6.0 1.45 3.5 5.0 NA NA NA 1.40 3.75 7.00 1.44 3.6 6.5
D1 12/08/00 Freiburg Stuttgart 4 0 H 2 0 H 22500 Helmut Krug 15 18 7 5 0 0 4 7 22 17 0 0 1 1 0 0 10 10 2.4 3.1 2.5 2.30 2.9 2.5 NA NA NA 2.60 3.25 2.38 2.40 3.2 2.5
D1 12/08/00 Hamburg Munich 1860 2 2 D 2 2 D 35000 Herbert Fandel 18 9 5 7 1 0 5 3 0 0 0 0 2 2 0 1 20 45 1.8 3.3 3.8 1.80 3.0 3.5 NA NA NA 1.75 3.30 4.00 1.66 3.3 4.5
D1 12/08/00 Kaiserslautern Bochum 0 1 A 0 0 D 38000 Helmut Fleischer 11 5 2 2 0 0 5 5 9 8 0 0 1 0 0 0 10 0 1.5 3.4 4.6 1.45 3.5 5.0 NA NA NA 1.44 3.80 6.00 1.50 3.6 5.5
D1 12/08/00 Leverkusen Wolfsburg 2 0 H 2 0 H 22500 Edgar Steinborn 18 5 5 5 0 0 6 4 25 22 0 0 0 2 0 0 0 20 1.4 4.0 5.5 1.55 3.3 4.5 NA NA NA 1.44 3.80 6.00 1.44 3.6 6.5

Model Comparisons

We use the negative log-likelihood (NLL) as a measure of the predictive performance of the models: the lower the NLL, the better the model. Strictly speaking, it is the negative log posterior predictive density (divided by \(n\)), evaluated at the \(n\) games following the training data \(\mathcal{D}\):

\[ \text{NLL} = -\frac{1}{n}\sum_{i=1}^n \log p(y_i \mid x_i, \mathcal{D}) \]
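
To make the definition concrete, here is a minimal sketch. The matrix `log_lik_pred` is a simulated stand-in for the pointwise log-likelihood draws a Stan fit would produce (its name and dimensions are assumptions, not taken from the runner script):

```r
# Hypothetical example: posterior predictive NLL from pointwise log-likelihoods.
# log_lik_pred: S x n matrix, one row per posterior draw, one column per game
# after the training data (simulated here in place of real Stan output).
set.seed(1)
S <- 1000; n <- 5
log_lik_pred <- matrix(rnorm(S * n, mean = -1, sd = 0.1), nrow = S)

# Average the *density* (not the log density) over draws, then take the log.
# For real fits, matrixStats::colLogSumExps is the numerically safer choice.
log_ppd <- log(colMeans(exp(log_lik_pred)))
nll <- -mean(log_ppd)
```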

Code
  # Assuming df is your data frame
  df %>% filter(type == 'NLL_PRED') %>% 
    ggplot(aes(x = ntrain, y = res, color = name)) + 
    geom_line() + 
    geom_point() + 
    theme_minimal() + 
    labs(
      title = 'Comparison of different models for the Bundesliga 2000 dataset', 
      x = 'Number of training data', 
      y = 'Negative Log Likelihood'
    ) + 
    theme(legend.position = "bottom") +
    # coord_cartesian zooms without dropping data (unlike ylim, which removes
    # points outside the limits); clip = "off" lets lines extend past the panel
    coord_cartesian(ylim = c(2.5, 4), clip = "off")

Observations

  • Especially for small training sets, the hierarchical model performs better than the non-hierarchical one.

  • The correlated model performs slightly better than the non-correlated one.

  • There is practically no difference in predictive performance between the model with and without the Cholesky decomposition.

  • The negative binomial model performs comparably to the Poisson model.

Comparison of predicted NLL vs. PSIS-LOO
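
NLL_PSIS estimates the same predictive quantity from the training fit alone via PSIS-LOO. A minimal sketch using the `loo` package, with a simulated log-likelihood array standing in for a model's `log_lik` generated quantity (the dimensions are illustrative, not those of the actual fits):

```r
library(loo)

set.seed(1)
# draws x chains x observations, as loo::extract_log_lik() would return it
log_lik <- array(rnorm(500 * 2 * 20, mean = -1, sd = 0.2), dim = c(500, 2, 20))

r_eff    <- relative_eff(exp(log_lik))   # relative effective sample sizes
loo_res  <- loo(log_lik, r_eff = r_eff)  # PSIS-LOO estimate
nll_psis <- -loo_res$estimates["elpd_loo", "Estimate"] / 20  # per-game NLL
```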

Code
  df %>% filter(type %in% c('NLL_PRED', 'NLL_PSIS', 'NLL_PRED_STAN')) %>%
    ggplot(aes(x = ntrain, y = res, color = type)) + 
    geom_line(aes(linetype = type)) + 
    geom_point() + 
    theme_minimal() + 
    labs(
      title =  'Comparison of different models for the Bundesliga 2000 dataset',
      x = 'Number of training data', 
      y = 'Negative Log Likelihood'
    ) + 
    facet_wrap(~name) +
    theme(legend.position = "bottom") + 
    # coord_cartesian zooms without dropping data; clip = "off" lets lines
    # extend past the panel
    coord_cartesian(ylim = c(2.5, 4), clip = "off")

Observations

NLL for wins, draws, and losses

Code
  df %>% 
    filter(type %in% c('NLL_RESULTS', 'NLL_BOOKIE')) %>%
    ggplot(aes(x = ntrain, y = res, color = type)) + 
    geom_line(aes(linetype = name)) + 
    geom_point() + 
    theme_minimal() + 
    labs(
      title = 'Comparison of different models for the Bundesliga 2000 dataset',
      x = 'Number of training data', 
      y = 'Negative Log Likelihood'
    ) + 
    theme(legend.position = "bottom") + 
    # coord_cartesian zooms without dropping data; clip = "off" lets lines
    # extend past the panel
    coord_cartesian(ylim = c(0.75, 1.5), clip = "off")

Observations

  • The NLL of the bookie's odds is always lower (better) than the NLL of the models, so we should not bet.
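
For context, the bookie's NLL is computed from decimal odds. A common convention, assumed here for illustration (not verified against the runner script), is to invert the odds and renormalise away the bookmaker's margin (the overround). A sketch using the odds from the first row of the raw data above:

```r
# Decimal home/draw/away odds for Dortmund vs. Hansa Rostock (first row above)
odds <- c(H = 1.50, D = 3.40, A = 5.00)

raw       <- 1 / odds        # implied probabilities; sum > 1 due to the margin
overround <- sum(raw) - 1    # the bookmaker's margin (~16% here)
p_bookie  <- raw / sum(raw)  # normalised probabilities usable in an NLL
```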

Betting Returns

Code
  df %>% 
    filter(type %in% c('BET_RETURN')) %>%
    ggplot(aes(x = ntrain, y = res, color = name)) + 
    geom_line(aes(linetype = name)) +  
    geom_point() + 
    theme_minimal() + 
    labs(
      title = 'Comparison of different models for the Bundesliga 2000 dataset',
      x = 'Number of training data', 
      y = 'Betting Returns'
    ) + 
    #ylim(0.75, 1.5) +
    theme(legend.position = "bottom") + 
    coord_cartesian(clip = "off") # Allow lines to go outside the plot area

Observations

We see quite some fluctuation in the betting returns. Since the NLL shows that the bookie's odds are always better than the models' predictions, we should not bet.
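
The return series itself comes from the runner script. As an illustration, the sketch below implements the usual value-betting rule (stake one unit whenever model probability times odds exceeds 1); the function and its payoff bookkeeping are assumptions, not the script's implementation:

```r
# Hypothetical value-betting rule: bet one unit on every outcome whose
# expected value p_model * odds exceeds 1; return profit per unit staked.
bet_return <- function(p_model, odds, outcome) {
  ev   <- p_model * odds            # expected value of a unit bet per outcome
  bets <- which(ev > 1)             # outcomes the model sees as value bets
  if (length(bets) == 0) return(0)  # no bet placed, no return
  profit <- sum(ifelse(bets == outcome, odds[bets] - 1, -1))
  profit / length(bets)
}

bet_return(p_model = c(0.60, 0.25, 0.15),
           odds    = c(1.90, 3.40, 5.00),
           outcome = 1)   # home win realised: profit of 0.9 per unit
```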

Technical Details

Code
  df %>% 
    filter(type %in% c('MIN_SUM_PROB')) %>%
    ggplot(aes(x = ntrain, y = res, color = name)) + 
    geom_line(aes(linetype = name)) + 
    geom_point() + 
    theme_minimal() + 
    labs(
      title = 'Comparison of different models for the Bundesliga 2000 dataset',
      x = 'Number of training data', 
      y = 'Sum of Probabilities from 0 to 10 goals (should be 1)'
    ) + 
    theme(legend.position = "bottom") + 
    # coord_cartesian zooms without dropping data; clip = "off" lets lines
    # extend past the panel
    coord_cartesian(ylim = c(0.75, 1.01), clip = "off")
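
The quantity above measures how much probability mass the score models place on 0 to 10 goals. For a Poisson score model, the tail mass lost by truncating at 10 goals can be checked directly:

```r
# Probability mass a Poisson score model assigns to 0..10 goals per team.
mass_0_to_10 <- function(lambda) sum(dpois(0:10, lambda))

mass_0_to_10(1.5)  # typical Bundesliga scoring rate: essentially all mass covered
mass_0_to_10(6.0)  # an extreme rate: truncation at 10 goals starts to matter
```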

Code
  df %>% 
    filter(type %in% c('num_divergent')) %>%
    ggplot(aes(x = ntrain, y = res, color = name)) + 
    geom_line(aes(linetype = name)) + 
    geom_point() + 
    theme_minimal() + 
    labs(
      title = 'Comparison of different models for the Bundesliga 2000 dataset',
      x = 'Number of training data', 
      y = 'Number of Divergent Transitions (sqrt scale)'
    ) + 
    theme(legend.position = "bottom") + 
    scale_y_sqrt() + 
    coord_cartesian(clip = "off") # Allow lines to go outside the plot area

Code
df %>% 
  filter(type %in% c('ebfmi')) %>%
  ggplot(aes(x = ntrain, y = res, color = name)) + 
  geom_line(aes(linetype = name)) + 
  geom_point() + 
  theme_minimal() + 
  labs(
    title = 'Comparison of different models for the Bundesliga 2000 dataset',
    x = 'Number of training data', 
    y = 'E-BFMI' 
  ) + 
  theme(legend.position = "bottom") + 
  geom_hline(yintercept = 0.3, linetype = "dashed", color = "red") + 
  annotate("text", x = Inf, y = 0.33, label = "Acceptable", hjust = 1.1, color = "red") + 
  annotate("text", x = Inf, y = 0.27, label = "Non-Acceptable", hjust = 1.1, color = "red") +
  coord_cartesian(clip = "off") # Allow lines to go outside the plot area