Third Generation Machine Intelligence
Christopher M. Bishop, Microsoft Research, Cambridge
Microsoft Research Summer School 2009
First Generation: “Artificial Intelligence” (GOFAI)
“Within a generation ... the problem of creating ‘artificial intelligence’ will largely be solved” – Marvin Minsky (1967)
– Expert systems: rules devised by humans
– Combinatorial explosion
General theme: hand-crafted rules
Second Generation
– Neural networks, support vector machines
– Difficult to incorporate complex domain knowledge
General theme: black-box statistical models
Third Generation
General theme: deep integration of domain knowledge and statistical learning
Probabilistic graphical models
– Bayesian framework
– fast inference using local message-passing
Origins: Bayesian networks, decision theory, HMMs, Kalman filters, MRFs, mean field theory, ...
Bayesian Learning
– Consistent use of probability to quantify uncertainty
– Predictions involve marginalisation, e.g. the predictive distribution shown below
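The equation on this slide did not survive extraction; a standard instance of the marginalisation meant here (notation assumed) is the Bayesian predictive distribution, which averages the model's predictions over the posterior on the parameters w:

$$p(y \mid x, \mathcal{D}) = \int p(y \mid x, \mathbf{w}) \, p(\mathbf{w} \mid \mathcal{D}) \, \mathrm{d}\mathbf{w}$$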
Why is prior knowledge important?
[Figure: x–y plot of data with a query point marked “?”, illustrating prediction at an unseen input]
Probabilistic Graphical Models
Probability theory + graphs:
1. New insights into existing models
2. Framework for designing new models
3. Graph-based algorithms for calculation and computation (cf. Feynman diagrams in physics)
4. Efficient software implementation
– Directed graphs to specify the model
– Factor graphs for inference and learning
Directed Graphs
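As a sketch of what a directed graph specifies (standard notation, not from the slide): the joint distribution factorises into one conditional per node, conditioned on that node's parents pa_k:

$$p(x_1, \ldots, x_K) = \prod_{k=1}^{K} p(x_k \mid \mathrm{pa}_k)$$

For example, a fully connected three-node graph gives p(a, b, c) = p(a) p(b | a) p(c | a, b).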
Example: Time Series Modelling
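The graph for this example did not survive extraction, but the standard directed model for time series (covering both HMMs and Kalman filters, depending on whether the latent states z_n are discrete or Gaussian) factorises as:

$$p(x_1, \ldots, x_N, z_1, \ldots, z_N) = p(z_1) \prod_{n=2}^{N} p(z_n \mid z_{n-1}) \prod_{n=1}^{N} p(x_n \mid z_n)$$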
Manchester Asthma and Allergies Study
Chris Bishop, Iain Buchan, Markus Svensén, Vincent Tan, John Winn
Factor Graphs
From Directed Graph to Factor Graph
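A minimal sketch of the conversion (the standard construction, assumed here): each conditional in the directed factorisation becomes one factor node connected to the variables it mentions, so the joint becomes a product of factors over variable subsets:

$$p(\mathbf{x}) = \prod_s f_s(\mathbf{x}_s), \qquad \text{e.g.} \quad f_a(x_1) = p(x_1), \quad f_b(x_1, x_2, x_3) = p(x_3 \mid x_1, x_2)$$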
Local message-passing
Efficient inference by exploiting factorization, as in the chain example below:
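One way to see the saving (a standard chain example, assumed rather than recovered from the slide): for a chain of N discrete variables with K states each and pairwise factors ψ_n, the marginal of the last variable is obtained by pushing each sum inside the product,

$$p(x_N) \propto \sum_{x_{N-1}} \psi_{N-1}(x_{N-1}, x_N) \cdots \sum_{x_1} \psi_1(x_1, x_2),$$

which costs O(N K²) instead of the O(K^N) of naive summation over the full joint.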
Factor Trees: Separation
[Figure: factor tree over variables v, w, x, y, z with factors f1(v, w), f2(w, x), f3(x, y), f4(x, z)]
Messages: From Factors To Variables
[Figure: the same factor tree, showing messages sent from the factors f2(w, x), f3(x, y), f4(x, z) towards variable x]
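The standard sum-product form of this message (notation assumed): a factor f with neighbouring variables x, x_1, ..., x_M sends to x the factor summed over all its other variables, weighted by their incoming messages:

$$\mu_{f \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f(x, x_1, \ldots, x_M) \prod_{m=1}^{M} \mu_{x_m \to f}(x_m)$$

In the tree above, for instance, f3 sends μ_{f3→x}(x) = Σ_y f3(x, y) μ_{y→f3}(y).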
Messages: From Variables To Factors
[Figure: the same factor tree, showing messages sent from the variables towards their neighbouring factors]
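The corresponding sum-product rule (notation assumed): a variable x forwards to a factor f the product of the messages arriving from its other neighbouring factors:

$$\mu_{x \to f}(x) = \prod_{g \in \mathrm{ne}(x) \setminus \{f\}} \mu_{g \to x}(x)$$

To make both rules concrete, here is a minimal NumPy sketch, not the talk's implementation: the variable and factor names follow the tree above, while K = 3 states and the random factor tables are assumptions for illustration. It computes the marginal of x and checks it against brute-force enumeration.

import numpy as np

# Discrete sum-product on the factor tree f1(v,w), f2(w,x), f3(x,y), f4(x,z).
K = 3
rng = np.random.default_rng(0)
f1 = rng.random((K, K))  # table indexed [v, w]
f2 = rng.random((K, K))  # table indexed [w, x]
f3 = rng.random((K, K))  # table indexed [x, y]
f4 = rng.random((K, K))  # table indexed [x, z]

# Leaf variables v, y, z have no other neighbours, so they send messages of ones.
mu_v_f1 = np.ones(K)
mu_y_f3 = np.ones(K)
mu_z_f4 = np.ones(K)

# Factor-to-variable: sum out every variable except the recipient,
# weighting by the incoming variable-to-factor messages.
mu_f1_w = f1.T @ mu_v_f1   # sum_v f1(v, w) * mu_{v->f1}(v)
mu_w_f2 = mu_f1_w          # variable-to-factor: product over w's other neighbours (just f1)
mu_f2_x = f2.T @ mu_w_f2   # sum_w f2(w, x) * mu_{w->f2}(w)
mu_f3_x = f3 @ mu_y_f3     # sum_y f3(x, y) * mu_{y->f3}(y)
mu_f4_x = f4 @ mu_z_f4     # sum_z f4(x, z) * mu_{z->f4}(z)

# Marginal of x: product of all incoming messages, normalised.
p_x = mu_f2_x * mu_f3_x * mu_f4_x
p_x /= p_x.sum()

# Sanity check against brute-force enumeration of the full joint.
joint = np.einsum('vw,wx,xy,xz->vwxyz', f1, f2, f3, f4)
p_x_brute = joint.sum(axis=(0, 1, 3, 4))
p_x_brute /= p_x_brute.sum()
assert np.allclose(p_x, p_x_brute)
print("p(x) =", p_x)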
What if marginalisations are not tractable?
Approximate the true distribution with one of:
– Monte Carlo
– Variational Bayes
– Loopy belief propagation
– Expectation propagation
Illustration: Bayesian Ranking
Ralf Herbrich, Tom Minka, Thore Graepel
Two Player Match Outcome Model
[Figure: factor graph over player skills s1, s2 and match outcome y12]
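The published model behind this graph (Herbrich, Minka & Graepel, 2007; the intermediate performance variables p_i belong to that model even though they are not legible in the extracted slide): each skill has a Gaussian belief, each performance is a noisy draw around the skill, and the outcome thresholds the performance difference:

$$s_i \sim \mathcal{N}(\mu_i, \sigma_i^2), \qquad p_i \sim \mathcal{N}(s_i, \beta^2), \qquad y_{12} = \mathrm{sign}(p_1 - p_2)$$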
Two Team Match Outcome Model
[Figure: factor graph over player skills s1–s4, team performances t1, t2, and outcome y12]
Multiple Team Match Outcome Model
[Figure: factor graph over player skills s1–s4, team performances t1, t2, t3, and pairwise outcomes y12, y23]
Efficient Approximate Inference
[Figure: the multiple-team factor graph, annotated with Gaussian prior factors over the skills s1–s4 and ranking likelihood factors over the outcomes y12, y23]
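For the two-player, no-draw case this approximate inference collapses to a closed-form moment-matching update (Herbrich, Minka & Graepel, 2007). The sketch below is an illustrative Python reimplementation, not Microsoft's code; the defaults μ = 25, σ = 25/3, β = 25/6 are the commonly cited TrueSkill™ starting values.

import math

def _pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def _cdf(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def trueskill_update(mu_w, sigma_w, mu_l, sigma_l, beta=25.0 / 6.0):
    """Skill update after the first player (winner) beats the second (loser),
    ignoring draws. Moment-matching (EP-style) update; beta is the
    performance noise."""
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # additive correction from the truncated Gaussian
    w = v * (v + t)         # multiplicative correction for the variances
    mu_w_new = mu_w + sigma_w ** 2 / c * v
    mu_l_new = mu_l - sigma_l ** 2 / c * v
    sigma_w_new = sigma_w * math.sqrt(max(1.0 - sigma_w ** 2 / c ** 2 * w, 1e-12))
    sigma_l_new = sigma_l * math.sqrt(max(1.0 - sigma_l ** 2 / c ** 2 * w, 1e-12))
    return mu_w_new, sigma_w_new, mu_l_new, sigma_l_new

# Two new players with the standard prior; player 1 wins one game:
# the winner's mean rises, the loser's falls, and both uncertainties shrink.
print(trueskill_update(25.0, 25.0 / 3.0, 25.0, 25.0 / 3.0))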
Convergence
[Plot: Level (0–40) against Number of Games (0–400) for players char and SQLWildman, each tracked under TrueSkill™ and under Elo]
TrueSkill™
John Winn, Chris Bishop
research.microsoft.com/infernet
Tom Minka, John Winn, John Guiver, Anitha Kannan
Summary
New paradigm for machine intelligence built on:
– a Bayesian formulation
– probabilistic graphical models
– fast inference using local message-passing
Deep integration of domain knowledge and statistical learning
Large-scale application: TrueSkill™
Toolkit: Infer.NET
http://research.microsoft.com/~cmbishop