Introduction to Betting Data Modelling with R
Dr Alun Owen (Alun.J.Owen@gmail.com)
Smartersig Conference September 2012

• Whilst you are waiting, please download the conference documents from: www.smartersig.com/conference/seminardocs.html
• Save these to a folder on your PC
• Then start R and open the script file named: Part 1.R

Overview
• Part 1: Saturday morning - use the aiplus data for
  – data exploration
  – model development and model assessment
• Part 2: Saturday afternoon and Sunday morning - use the horse data to
  – practise what we have learned
  – deploy a model for future race betting

What Do We Want to Model?
• We want to predict/model the probability of winning
• We observe whether a horse wins or not
  – this is a binary DEPENDENT variable
  – often recorded as 1 (win) or 0 (did not win)
• We are interested in how this depends on:
  – the horse's Sire SR, past performance of horse/jockey/trainer, days since LTO (last time out), age, etc.
  – these are our PREDICTOR variables
• Binary logistic regression is a simple approach that allows us to do this

Binary Logistic Regression Model
• This model can be stated as:
$$\log\left(\frac{p}{1-p}\right) = b_0 + b_1 x$$
• x is a predictor such as Sire.SR
• p is the probability a horse will win
• p/(1 - p) is the odds a horse will win
• log(p/(1 - p)) is the logarithm of the odds

Binary Logistic Regression Model
• This model can also be stated as:
$$p = \frac{\exp(b_0 + b_1 x)}{1 + \exp(b_0 + b_1 x)}$$
• Here p is modelled as a curve
• This curve is typically S-shaped
• It is often called the logistic curve, hence the name logistic regression!
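A quick way to convince yourself that the two forms of the model are the same is to evaluate both in R. This is a minimal sketch: the coefficients b0 = -3.5 and b1 = 0.12 are made-up values for illustration, not estimates from the conference data. `plogis()` and `qlogis()` are base R's logistic and log-odds (logit) functions.

```r
# Made-up coefficients for illustration only (not fitted values)
b0 <- -3.5
b1 <- 0.12
x  <- c(0, 4, 8, 12)                         # e.g. some Sire.SR values

eta      <- b0 + b1 * x                      # log odds: log(p / (1 - p))
p.manual <- exp(eta) / (1 + exp(eta))        # probability form of the model
p.plogis <- plogis(eta)                      # same curve via base R's logistic function
log.odds <- log(p.manual / (1 - p.manual))   # back from probability to log odds

round(p.manual, 4)
```

Because b1 is positive here, p rises with x along the S-shaped curve; a negative b1 would give a falling curve.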

Typical Logistic Curve
[Figure: a typical S-shaped logistic curve of p against x]

Compare with a cd (Conditional Density) Plot for Sire.SR
[Figure: cd plot of winning against Sire.SR]
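A cd plot like this can be drawn with base R's `cdplot()`, which needs the response as a factor. The sketch below uses a small simulated data frame called `races` (hypothetical, standing in for the conference data, with an assumed "true" win probability) so that it runs on its own.

```r
set.seed(1)
n <- 2000
# Hypothetical simulated data: Sire.SR in % and a 0/1 win indicator
sire.SR <- runif(n, 0, 20)
races <- data.frame(
  sire.SR = sire.SR,
  win     = factor(rbinom(n, 1, plogis(-3.5 + 0.12 * pmin(sire.SR, 12))))
)

# Conditional density of win/not-win across the range of Sire.SR;
# cdplot() also returns (invisibly) the estimated conditional density functions
res <- cdplot(win ~ sire.SR, data = races)
```

If the shape of the cd plot resembles part of a logistic curve, that is a rough visual hint that a logistic regression on this predictor is a reasonable starting point.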

Simple Model Involving Sire.SR
• We can use R to fit the model, i.e. determine values for the parameters $b_0$ and $b_1$ that give the curve that "best fits" our data (glm() does this by maximum likelihood):
    win <- ifelse(position == 1, 1, 0)                     # 1 = won, 0 = did not win
    sire.SR.trunc <- ifelse(sire.SR <= 12, sire.SR, 12)   # cap Sire.SR at 12%
    glm(win ~ sire.SR.trunc, family = binomial)           # binary logistic regression
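The three lines above assume `position` and `sire.SR` already exist in the workspace. A self-contained version, sketched under the assumption that we simulate those two variables (the "true" coefficients -3.5 and 0.12 are made up), looks like this:

```r
set.seed(42)
n <- 5000
# Hypothetical data mimicking the variables on the slide
sire.SR       <- runif(n, 0, 20)
sire.SR.trunc <- ifelse(sire.SR <= 12, sire.SR, 12)     # cap at 12%
true.p        <- plogis(-3.5 + 0.12 * sire.SR.trunc)    # assumed "true" model
position      <- ifelse(rbinom(n, 1, true.p) == 1, 1,
                        sample(2:10, n, replace = TRUE))

win <- ifelse(position == 1, 1, 0)
ourmodel <- glm(win ~ sire.SR.trunc, family = binomial)
coef(ourmodel)   # estimates of b0 and b1
```

With a few thousand runners the estimates should land reasonably close to the simulated values, which is a handy sanity check before moving on to real data.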

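R Output
[Figure: R output from glm() for the Sire.SR model]
• Hence the fitted model is $\log\left(\frac{p}{1-p}\right) = b_0 + b_1 \times \text{sire.SR.trunc}$, with $b_0$ and $b_1$ replaced by the estimates in the output above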

Fitted Curve
[Figure: fitted logistic curve of win probability against sire.SR.trunc]
• See the notes in the Word document and the R code in the script file for how to produce this!
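One way to draw a fitted curve, sketched here on simulated data (the data and its coefficients are hypothetical, and this is not necessarily the code in the conference script file), is to predict over a grid of Sire.SR values and plot the result:

```r
set.seed(42)
n <- 5000
# Hypothetical data with an assumed "true" logistic relationship
sire.SR.trunc <- pmin(runif(n, 0, 20), 12)
win <- rbinom(n, 1, plogis(-3.5 + 0.12 * sire.SR.trunc))
ourmodel <- glm(win ~ sire.SR.trunc, family = binomial)

# Predicted win probability across the range of the predictor
grid <- data.frame(sire.SR.trunc = seq(0, 12, by = 0.1))
grid$p <- predict(ourmodel, newdata = grid, type = "response")

plot(grid$sire.SR.trunc, grid$p, type = "l",
     xlab = "Sire.SR (truncated at 12%)", ylab = "Fitted win probability")
```

Using `type = "response"` returns probabilities rather than log odds, so the curve is on the same scale as the cd plot.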

Prediction Using the Model
• What is the model-predicted win probability (and “fair” odds) for a horse with a Sire.SR of 10%?
• Substituting sire.SR.trunc = 10 into the fitted model gives log odds of -2.35535
• So odds = exp(-2.35535) = 0.095

Prediction Using the Model Cont.
• So the “fair” odds are 1 to 0.095, i.e. 10.5 to 1 (or 11.5 if we include the stake)
• Hence win probability = 1/11.5 = 0.087
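The chain of arithmetic on these two slides can be checked in a few lines of R, starting from the log odds of -2.35535 quoted above. The variable names are illustrative only.

```r
log.odds     <- -2.35535          # log odds for sire.SR.trunc = 10 (from the slide)
odds.on      <- exp(log.odds)     # odds of winning, about 0.095
odds.against <- 1 / odds.on       # "fair" odds against, about 10.5 to 1
decimal.odds <- odds.against + 1  # including the stake, about 11.5
prob         <- 1 / decimal.odds  # win probability, about 0.087

round(c(odds.on, odds.against, decimal.odds, prob), 3)
```

The last step is the same as applying the logistic function directly, so `plogis(log.odds)` gives the same probability in one call.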

Predicting Probabilities for Several Horses in a Future Race
    ourmodel <- glm(win ~ sire.SR.trunc, family = binomial)
    new.sire.SR <- c(4, 8, 10, 12)                               # Sire.SR of each runner
    newrace.sire.SR <- data.frame(sire.SR.trunc = new.sire.SR)   # column name must match the model
    prob <- predict(ourmodel, newdata = newrace.sire.SR, type = "response")   # win probabilities
    odds <- 1/prob                                               # "fair" odds including the stake
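Here is a runnable version of the same steps, sketched on simulated historical data (hypothetical, with made-up coefficients). It also shows that `type = "link"` returns the log odds, from which the probabilities follow via the logistic function.

```r
set.seed(42)
n <- 5000
# Hypothetical historical data standing in for the conference data
sire.SR.trunc <- pmin(runif(n, 0, 20), 12)
win <- rbinom(n, 1, plogis(-3.5 + 0.12 * sire.SR.trunc))
ourmodel <- glm(win ~ sire.SR.trunc, family = binomial)

# Four runners in a hypothetical future race
newrace.sire.SR <- data.frame(sire.SR.trunc = c(4, 8, 10, 12))
prob <- predict(ourmodel, newdata = newrace.sire.SR, type = "response")
eta  <- predict(ourmodel, newdata = newrace.sire.SR, type = "link")  # log odds
odds <- 1 / prob   # "fair" odds including the stake

data.frame(newrace.sire.SR, prob = round(prob, 3), odds = round(odds, 1))
```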

Model Assessment
    ourmodel <- glm(win ~ sire.SR.trunc, family = binomial)
    summary(ourmodel)   # estimates, standard errors, z tests, deviances and AIC
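Beyond reading `summary()`, a few standard checks can be pulled out programmatically. This sketch uses simulated data (hypothetical) and compares the model with an intercept-only model using a likelihood-ratio (deviance) test via `anova()`.

```r
set.seed(42)
n <- 5000
# Hypothetical data with an assumed "true" logistic relationship
sire.SR.trunc <- pmin(runif(n, 0, 20), 12)
win <- rbinom(n, 1, plogis(-3.5 + 0.12 * sire.SR.trunc))
ourmodel <- glm(win ~ sire.SR.trunc, family = binomial)

tab <- coef(summary(ourmodel))   # Estimate, Std. Error, z value, Pr(>|z|)
tab

# Does Sire.SR reduce the deviance compared with no predictors at all?
null.model <- glm(win ~ 1, family = binomial)
anova(null.model, ourmodel, test = "Chisq")
AIC(null.model, ourmodel)        # lower AIC is preferred
```

A small p-value for sire.SR.trunc and a clear drop in deviance both suggest the predictor carries real information, though neither says how well the probabilities are calibrated for betting.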

Extending Our Model
• This model can be extended as:
$$\log\left(\frac{p}{1-p}\right) = b_0 + b_1\,\text{sire.SR.trunc} + b_2\,\text{days}$$
• R code to fit this model:
    ourmodel <- glm(win ~ sire.SR.trunc + days, family = binomial)
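Whether adding `days` is worthwhile can be judged by comparing the two nested models. In this sketch the data are simulated and the effect of days (-0.02 per day) is a made-up assumption, not a finding from the conference data.

```r
set.seed(7)
n <- 5000
# Hypothetical data: capped Sire.SR and days since last run
sire.SR.trunc <- pmin(runif(n, 0, 20), 12)
days <- rpois(n, 25) + 1
win  <- rbinom(n, 1, plogis(-3 + 0.12 * sire.SR.trunc - 0.02 * days))

model1 <- glm(win ~ sire.SR.trunc, family = binomial)
model2 <- glm(win ~ sire.SR.trunc + days, family = binomial)

coef(model2)   # b0, b1 (sire.SR.trunc) and b2 (days)
# Likelihood-ratio test: does adding 'days' significantly reduce the deviance?
anova(model1, model2, test = "Chisq")
```

The larger model can never have a higher deviance than the smaller one, so the question is whether the drop is bigger than chance alone would produce.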

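Extending Our Model Further
• Our model can be extended further to:
$$\log\left(\frac{p}{1-p}\right) = b_0 + b_1\,\text{sire.SR.trunc} + b_2\,\text{days} + b_3\,\text{pos1}$$
• But... we need to treat position1 (finishing position last time out) as a factor
• R code to fit this model:
    pos1 <- factor(position1)   # treat last-time-out position as categorical
    ourmodel <- glm(win ~ sire.SR.trunc + days + pos1, family = binomial)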

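Extending Our Model Further
• Treating position1 as a factor means the model actually looks like:
$$\log\left(\frac{p}{1-p}\right) = b_0 + b_1\,\text{sire.SR.trunc} + b_2\,\text{days} + b_{31}\,\text{pos11} + b_{32}\,\text{pos12} + b_{33}\,\text{pos13} + b_{34}\,\text{pos14}$$
• pos11 = 1 if position1 = 1, otherwise pos11 = 0
• pos12 = 1 if position1 = 2, otherwise pos12 = 0
• etc.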
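You can see exactly which 0/1 indicator variables R creates from a factor with `model.matrix()`. The coding of `position1` below is an assumption for illustration (0 = unplaced, 1 to 4 = finishing position); with R's default treatment contrasts the first level ("0") is the baseline and gets no column of its own.

```r
# Hypothetical last-time-out positions: 0 = unplaced, 1-4 = finishing position
position1 <- c(1, 2, 0, 3, 4, 0, 1)
pos1 <- factor(position1)
levels(pos1)            # levels in sorted order; "0" is the baseline

X <- model.matrix(~ pos1)
X                       # columns: (Intercept), pos11, pos12, pos13, pos14
```

R names each indicator by pasting the variable name and the level, which is where names like pos11 and pos12 come from. A horse unplaced last time out has all four indicators equal to 0.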

Our Final(?) Model

Prediction Using the Final Model
• What is the model-predicted win probability (and “fair” odds) for a horse with a Sire.SR of 10% that last ran 5 days ago, won its last race, was 3rd two races ago and was unplaced three races ago?
• sire.SR = 10, days = 5
• pos11 = 1, pos12 = 0, pos13 = 0, pos14 = 0
• pos21 = 0, pos22 = 0, pos23 = 1, pos24 = 0
• pos31 = 0, pos32 = 0, pos33 = 0, pos34 = 0
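In practice you would not set the pos indicators by hand; instead you pass a one-row data frame to `predict()` and let R build them. This sketch assumes a final model with factors `pos1`, `pos2` and `pos3` (positions one, two and three runs ago, coded 0 = unplaced, 1 to 4 = placed) and fits it to simulated, hypothetical data. The key detail is giving each new factor value the same levels as in the fitted data.

```r
set.seed(3)
n <- 5000
# Hypothetical data; positions coded 0 = unplaced, 1-4 = placed
sire.SR.trunc <- pmin(runif(n, 0, 20), 12)
days <- rpois(n, 25) + 1
pos1 <- factor(sample(0:4, n, replace = TRUE))
pos2 <- factor(sample(0:4, n, replace = TRUE))
pos3 <- factor(sample(0:4, n, replace = TRUE))
win  <- rbinom(n, 1, plogis(-3 + 0.1 * sire.SR.trunc - 0.01 * days +
                              0.5 * (pos1 == "1")))

final <- glm(win ~ sire.SR.trunc + days + pos1 + pos2 + pos3, family = binomial)

# The runner on the slide: won last time, 3rd two runs ago, unplaced three runs ago
newhorse <- data.frame(sire.SR.trunc = 10, days = 5,
                       pos1 = factor(1, levels = levels(pos1)),
                       pos2 = factor(3, levels = levels(pos2)),
                       pos3 = factor(0, levels = levels(pos3)))

eta  <- predict(final, newdata = newhorse)                     # log odds
prob <- predict(final, newdata = newhorse, type = "response")  # win probability
c(log.odds = unname(eta), prob = unname(prob), decimal.odds = unname(1 / prob))
```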

Prediction Using the Final Model Cont.
• Substituting these values into the final model gives log odds of -1.777
• Hence odds = exp(-1.777) = 0.169, i.e. “fair” odds of 1 to 0.169
• i.e. 5.9 to 1, or 6.9 if we include the stake
• Hence prob = 1/6.9 = 0.145

More on Model Assessment
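One common way to assess a win-probability model, offered here as an assumption rather than as the method on these slides, is to compare the predicted and actual number of winners within bands of predicted probability. The sketch uses simulated data and five equal-width bands from `cut()`.

```r
set.seed(42)
n <- 5000
# Hypothetical data with an assumed "true" logistic relationship
sire.SR.trunc <- pmin(runif(n, 0, 20), 12)
win <- rbinom(n, 1, plogis(-3.5 + 0.12 * sire.SR.trunc))
ourmodel <- glm(win ~ sire.SR.trunc, family = binomial)
p.hat <- fitted(ourmodel)        # predicted win probability for each runner

# Five equal-width bands of predicted probability
band <- cut(p.hat, breaks = 5)
calib <- data.frame(
  band     = levels(band),
  expected = as.vector(tapply(p.hat, band, sum)),  # predicted number of winners
  actual   = as.vector(tapply(win, band, sum)),    # observed number of winners
  runners  = as.vector(table(band))
)
calib
```

If the model is well calibrated, expected and actual should be close in every band, not just in total (the totals always match for a logistic regression with an intercept).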


Problems with Binary Logistic Regression in Horse Race Modelling
• Nothing constrains the win probabilities to sum to one across a race!
• The model makes no use of the information in the structure of the data regarding finishing positions
  – i.e. it makes no use of the fact that the 2nd-placed horse beat the 3rd-placed horse, and so on
• Multinomial logistic regression is better but much more complicated!
• So that is for another conference!?
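The first problem is easy to see, and a crude patch is to rescale each race's probabilities so they sum to one. This is a simple ad hoc adjustment, not the multinomial approach mentioned above, and the probabilities below are made-up values for two hypothetical races.

```r
# Hypothetical model probabilities for the runners in two races
runners <- data.frame(
  race = c(1, 1, 1, 1, 2, 2, 2),
  prob = c(0.087, 0.145, 0.060, 0.110, 0.20, 0.15, 0.05)
)
tapply(runners$prob, runners$race, sum)   # raw totals need not equal 1

# Ad hoc fix: divide by the race total so each race sums to 1
runners$prob.norm <- ave(runners$prob, runners$race,
                         FUN = function(p) p / sum(p))
runners$fair.odds <- 1 / runners$prob.norm   # decimal odds, stake included
tapply(runners$prob.norm, runners$race, sum)
```

Rescaling keeps each horse's probability in proportion to the model's estimate, but it still ignores the finishing-order information that a multinomial model would use.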