Calibration from Probabilistic Classification Dr. Oscar Olmedo
Outline • Why calibrate ML probabilities • How to calibrate probabilities – Platt’s method – Isotonic Regression – Histogram binning
What is Calibration About • Many ML algorithms produce predicted probabilities that do not match empirical probabilities • Learning well-calibrated models has not been researched as extensively as learning models that discriminate well Naeini, 2016
Why Calibrate • Calibration is useful when the probabilities of predictions are critical • Reduced bias for model comparison • Useful for people with asymmetric misclassification costs • Examples: – Finance – Marketing • Calibration may not always be necessary – If only interested in rank ordering of predictions – If only interested in an optimal split to get classes Naeini, 2016
ML algorithms and Calibration • Known to produce well-calibrated probabilities – Discriminant analysis – Logistic regression • Not so well-calibrated probabilities – Naïve Bayes – SVM – Tree methods – Boosting – Neural networks
How to calibrate • Calibration is a post-processing task • Should not affect the rank ordering of predictions, only the numerical probability • In a nutshell 1. Split data into train and test 2. Train ML model 3. Calibrate on test set (3 methods discussed later) • Final model to get probabilities is composed of the ML model and the calibration model
Platt’s method • This method fits a sigmoid to the predicted values
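As a rough illustration of the sigmoid fit, the sketch below uses the common parametrization p = 1 / (1 + exp(A*s + B)) and fits A and B by plain gradient descent on the log loss. The function names, learning rate, and iteration count are assumptions made for this example; Platt's original procedure also smooths the 0/1 targets and uses a more robust optimizer, both of which are omitted here.

```python
import math


def platt_predict(a, b, score):
    """Map a raw classifier score to a probability with the fitted sigmoid."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit p = 1 / (1 + exp(A*s + B)) by gradient descent on the mean log loss.

    scores: raw classifier outputs on held-out data (hypothetical input).
    labels: 0/1 class labels for the same examples.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            p = platt_predict(a, b, s)
            # With z = A*s + B, the derivative of the log loss w.r.t. z is (y - p).
            grad_a += (y - p) * s
            grad_b += y - p
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b
```

With this parametrization a negative A means that higher scores map to higher probabilities, so the rank ordering of the original predictions is preserved.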
Isotonic Regression • Fits a piecewise function constrained to be monotonically increasing (piecewise constant from the basic algorithm; often linearly interpolated in practice)
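One standard way to compute an isotonic fit is the pool-adjacent-violators (PAV) algorithm. The sketch below is a minimal version that assumes distinct scores and returns a step function; the function names are hypothetical, and it is not presented as the implementation behind the plots in these slides.

```python
from bisect import bisect_right


def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit of labels against scores.

    Returns the sorted scores and the non-decreasing fitted value for each one.
    Assumes the scores are distinct (ties are not pooled beforehand).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    xs = [scores[i] for i in order]
    # Each block stores [sum of labels, count]; its mean is the fitted value.
    blocks = []
    for i in order:
        blocks.append([float(labels[i]), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return xs, fitted


def predict_isotonic(xs, fitted, score):
    """Step-function lookup: use the fitted value of the nearest score at or below."""
    idx = bisect_right(xs, score) - 1
    return fitted[max(idx, 0)]
```

Because neighbouring scores can be pooled into one block, the calibrated output can contain ties even where the original scores were distinct.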
Histogram binning Naeini, 2016
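A minimal sketch of histogram binning, assuming equal-width bins on [0, 1]: each bin's calibrated probability is the observed fraction of positives among the held-out predictions that fall into it. The function names, the default of 10 bins, and the midpoint fallback for empty bins are choices made for this illustration, not details taken from the slide.

```python
def fit_histogram_binning(probs, labels, n_bins=10):
    """Return one calibrated probability per equal-width bin on [0, 1]."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        sums[b] += y
        counts[b] += 1
    # Empty bins fall back to the bin midpoint (an assumption of this sketch).
    return [sums[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
            for b in range(n_bins)]


def predict_histogram_binning(bin_values, prob):
    """Replace an uncalibrated probability with its bin's calibrated value."""
    n_bins = len(bin_values)
    return bin_values[min(int(prob * n_bins), n_bins - 1)]
```

Unlike the sigmoid and isotonic fits, nothing here forces the bin values to increase from one bin to the next, so the output is not guaranteed to be monotone in the input.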
Effects of boosting Niculescu-Mizil & Caruana 2005
Comparison of methods Niculescu-Mizil & Caruana 2005
Platt’s method Niculescu-Mizil & Caruana 2005
Isotonic Regression Niculescu-Mizil & Caruana 2005
Visualizing Probabilities • LetterRecognition dataset – Available in R in the mlbench library • Predict the letter “Z” – 16 attributes based on pixels • Reliability plot
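The points of a reliability plot can be computed by binning the predicted probabilities and comparing each bin's mean prediction with its observed fraction of positives. The sketch below assumes equal-width bins; the function name and the bin count are illustrative and are not taken from the R analysis shown on the slide.

```python
def reliability_curve(probs, labels, n_bins=10):
    """Return (mean predicted probability, observed positive rate, count) per non-empty bin."""
    sum_p = [0.0] * n_bins
    sum_y = [0.0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        sum_p[b] += p
        sum_y[b] += y
        counts[b] += 1
    return [(sum_p[b] / counts[b], sum_y[b] / counts[b], counts[b])
            for b in range(n_bins) if counts[b]]
```

For a well-calibrated model, plotting the second value against the first gives points close to the diagonal; systematic departures from it show where the model is over- or under-confident.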
Applying Isotonic Regression After calibration
Future Work • Research into multi-class calibration methods • Research into non-equal-size (or dynamic) histogram binning methods • Research into ML methods that produce well-calibrated predictions
References • Mahdi Pakdaman Naeini. "Obtaining Accurate Probabilities Using Classifier Calibration." Diss. University of Pittsburgh, 2017. • Alexandru Niculescu-Mizil and Rich Caruana. "Predicting Good Probabilities with Supervised Learning." Proceedings of the 22nd International Conference on Machine Learning. ACM, 2005. • Alexandru Niculescu-Mizil and Rich Caruana. "Obtaining Calibrated Probabilities from Boosting." UAI. 2005.
Part Two: Careers in Data Science
Marketing yourself 1. Networking – Meetups. There are a number ongoing in the DC area: Data Science DC, Spark, … • Make business cards to hand out to people you meet • Set up a LinkedIn account for an online presence – This is where recruiters will look • Post your resume to online sites such as: – indeed.com – monster.com • Follow up with recruiters
Tools and Expectations • Knowledge of – Statistics – Machine learning • Tools – *SQL – Python – R – Java – Scala – Spark, an open source library written in Scala for distributed computing • Online courses are a good resource • While a student, take electives to build your bag of tools