
STATISTICAL RELATIONAL LEARNING Joint Work with Sriraam Natarajan, Kristian Kersting, Jude Shavlik

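BAYESIAN NETWORKS
Nodes: Burglary (b), Earthquake (e), Alarm (a), JohnCalls, MaryCalls
Value shown next to b and e: 0.01
Conditional probability table for Alarm given e and b:
e  b  P(a)
0  0  0.1
0  1  0.8
1  0  0.6
1  1  0.9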
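To make the alarm table concrete, the sketch below enumerates the joint distribution over e, b and a and derives two quantities from it. It assumes the 0.01 on the slide is the prior for both Burglary and Earthquake, and that the unlabeled 0.8 entry belongs to the row e=0, b=1; both readings are assumptions, and the function names are illustrative only.

```python
from itertools import product

# Assumption: the single 0.01 on the slide is read as P(b=1) = P(e=1) = 0.01.
P_B = 0.01
P_E = 0.01

# Alarm CPT from the slide, keyed by (e, b). The (0, 1) row label is inferred.
P_ALARM = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.6, (1, 1): 0.9}


def bernoulli(p, value):
    """Probability that a binary variable with P(1) = p takes `value`."""
    return p if value == 1 else 1.0 - p


def joint(e, b, a):
    """P(e, b, a) = P(e) * P(b) * P(a | e, b), the network factorization."""
    return bernoulli(P_E, e) * bernoulli(P_B, b) * bernoulli(P_ALARM[(e, b)], a)


def marginal_alarm():
    """P(a=1), obtained by summing the joint over e and b."""
    return sum(joint(e, b, 1) for e, b in product((0, 1), repeat=2))


def posterior_burglary_given_alarm():
    """P(b=1 | a=1) by Bayes' rule on the enumerated joint."""
    numerator = sum(joint(e, 1, 1) for e in (0, 1))
    return numerator / marginal_alarm()
```

Enumeration is only practical for a network this small; it is used here because it mirrors the factorization directly.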

BAYESIAN NETWORK FOR A CITY
[Figure: the Burglary, Earthquake, Alarm network repeated for houses H1 to H5, each copy with its own Earthquake node, plus Calls nodes for neighboring houses, Calls(H1) through Calls(H6).]

SHARED VARIABLES
[Figure: a single shared Earthquake(BL) node; Burglary(H1) to Burglary(H4); Alarm(H1) to Alarm(H4); Calls(H1) to Calls(H5).]

FIRST ORDER LOGIC
Predicates: Earthquake(city), Burglary(house), HouseInCity(house, city), Alarm(house), Neighbor(house, nhouse), Calls(nhouse)
Rule: Alarm(house) :- HouseInCity(house, city), Earthquake(city), Burglary(house)
Conditional probability table attached to the rule:
e  b  P(a)
0  0  0.1
0  1  0.8
1  0  0.6
1  1  0.9
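One way to read the Alarm rule is as a template: every house that satisfies HouseInCity yields one ground Alarm node, and all of those nodes reuse the same table. The sketch below illustrates that grounding step. The three HouseInCity facts are hypothetical, and ground_alarm_rule is an illustrative helper, not part of any SRL system.

```python
# Hypothetical facts: three houses in the city "BL".
HOUSE_IN_CITY = {"H1": "BL", "H2": "BL", "H3": "BL"}

# One CPT shared by every grounding of the rule, keyed by (e, b).
# The (0, 1) row label is inferred from the slide.
ALARM_CPT = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.6, (1, 1): 0.9}


def ground_alarm_rule(house_in_city):
    """Instantiate Alarm(house) :- HouseInCity(house, city),
    Earthquake(city), Burglary(house) once per house.

    Returns a list of (child, parents, cpt) triples, i.e. one factor of
    the ground Bayesian network per house.
    """
    factors = []
    for house, city in sorted(house_in_city.items()):
        child = f"Alarm({house})"
        parents = (f"Earthquake({city})", f"Burglary({house})")
        factors.append((child, parents, ALARM_CPT))
    return factors


if __name__ == "__main__":
    for child, parents, _ in ground_alarm_rule(HOUSE_IN_CITY):
        print(child, "<-", ", ".join(parents))
```

Because every house is in the same city, all ground Alarm nodes share the parent Earthquake(BL), which is the shared-variable picture from the earlier slide.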

LOGIC + PROBABILITY = STATISTICAL RELATIONAL LEARNING MODELS
Start from logic and add probabilities, or start from probabilities and add relations; both routes lead to Statistical Relational Learning (SRL).
[Figure: example model over Diff, CRating, PRating.]

ALPHABETIC SOUP
• Knowledge-based model construction [Wellman et al., 1992]
• PRISM [Sato & Kameya, 1997]
• Stochastic logic programs [Muggleton, 1996]
• Probabilistic relational models [Friedman et al., 1999]
• Bayesian logic programs [Kersting & De Raedt, 2001]
• Bayesian logic [Milch et al., 2005]
• Markov logic [Richardson & Domingos, 2006]
• Relational dependency networks [Neville & Jensen, 2007]
• ProbLog [De Raedt et al., 2007]
• And many others!

RELATIONAL DATABASE
Tables and their columns:
• Prof, Level
• Student, IQ, Satisfaction
• Prof, Course, Rating
• Student, Course, Grade
• Course, Diff

FIRST ORDER LOGIC
Each table maps to predicates:
• Prof, Level: Prof(P), Level(P, L)
• Student, IQ, Satisfaction: Student(S), IQ(S, I), satis(S, B)
• Prof, Course, Rating: taughtBy(P, C), ratings(P, C, R)
• Student, Course, Grade: takes(S, C), grade(S, C, G)
• Course, Diff: Course(C), Diff(C)

GRAPHICAL MODEL
Nodes: grades(S, C, G), Diff(S, C, D), avgDiff(S, D), avgGrade(S, G), satisfaction(S, B)
Conditional distribution: P(satisfaction(S, B) | avgGrade(S, G), avgDiff(S, D))
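The avgGrade and avgDiff nodes act as aggregators: a student may have any number of grade facts, and averaging collapses them into a single parent value for satisfaction. A minimal sketch of that aggregation follows. The facts and the numeric encoding of grades and difficulty are hypothetical, since the slides give no concrete values.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ground facts: (student, course, numeric grade).
GRADE_FACTS = [("s1", "c1", 4.0), ("s1", "c2", 3.0), ("s2", "c1", 2.0)]

# Hypothetical numeric difficulty per course.
COURSE_DIFF = {"c1": 2.0, "c2": 3.0}


def avg_grade(grade_facts):
    """Aggregate grade facts into one average grade per student."""
    per_student = defaultdict(list)
    for student, _course, grade in grade_facts:
        per_student[student].append(grade)
    return {s: mean(values) for s, values in per_student.items()}


def avg_diff(grade_facts, course_diff):
    """Average difficulty of the courses each student has a grade in."""
    per_student = defaultdict(list)
    for student, course, _grade in grade_facts:
        per_student[student].append(course_diff[course])
    return {s: mean(values) for s, values in per_student.items()}
```

With the aggregates in hand, the conditional distribution for satisfaction has a fixed number of parents regardless of how many courses a student took.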

RELATIONAL DECISION TREE
Data:
Name   Speed  Job         Fine
Bob    120    Teacher     N
Alice  150    Writer      N
John   180    Politician  N
Mary   160    Student     Y
Mike   140    Engineer    Y
Knows relation (Person 1, Person 2): (Alice, John), (Mary, Mike), (Mary, Alice), (Bob, Mike), (Bob, Mary)
Tree:
speed(X, S), S > 120
    no: N
    yes: job(X, politician)
        yes: N
        no: knows(X, Y)
            no: Y
            yes: job(Y, politician)
                yes: N
                no: Y

RELATIONAL DECISION TREE (classifying Alice, step by step, on the same data and tree)
1. speed(Alice, 150), 150 > 120: yes
2. job(Alice, politician): no
3. knows(Alice, John): yes
4. job(John, politician): yes, so the leaf is N
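The walk through the tree can be written as a short function. The sketch below hard-codes the slide's data and tree and checks every person, not only Alice. Two readings are assumed: the Person 1 / Person 2 columns pair up row by row, and the knows(X, Y) branch followed by job(Y, politician) is treated as an existential test ("X knows some politician").

```python
# Data from the slide: name -> (speed, job).
PEOPLE = {
    "Bob": (120, "teacher"),
    "Alice": (150, "writer"),
    "John": (180, "politician"),
    "Mary": (160, "student"),
    "Mike": (140, "engineer"),
}

# knows(Person1, Person2); the row-wise pairing is an assumption.
KNOWS = [("Alice", "John"), ("Mary", "Mike"), ("Mary", "Alice"),
         ("Bob", "Mike"), ("Bob", "Mary")]

# Fine column from the slide, used only to check the tree.
FINE = {"Bob": "N", "Alice": "N", "John": "N", "Mary": "Y", "Mike": "Y"}


def classify(x):
    """Follow the relational decision tree for person x."""
    speed, job = PEOPLE[x]
    if not speed > 120:            # speed(X, S), S > 120 -> no
        return "N"
    if job == "politician":        # job(X, politician) -> yes
        return "N"
    known = [y for (p, y) in KNOWS if p == x]
    if not known:                  # knows(X, Y) -> no
        return "Y"
    # job(Y, politician) for some Y that X knows (existential reading).
    if any(PEOPLE[y][1] == "politician" for y in known):
        return "N"
    return "Y"


if __name__ == "__main__":
    for name in PEOPLE:
        print(name, classify(name), "expected", FINE[name])
```

The difference from a propositional tree is the knows(X, Y) test, which introduces a new variable and lets a later node inspect attributes of a related person.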

RELATIONAL PROBABILITY TREES
• Use probabilities on the leaves
• Can be used to represent the conditional distributions
• Can use regression values on leaves to represent regression functions
[Figure: the tree with tests speed(X, S), S > 120; job(X, politician); knows(X, Y); job(Y, politician), and leaf values 0.1, 0.2, 0.4, 0.8, 0.8.]

STRUCTURE LEARNING PROBLEM
• Learn the structure of the conditional distributions
• Find the parents and the distribution for the target concept
[Figure: nodes avgGrade(S, G), avgDiff(S, D), IQ(S, I), level(P, L) and satisfaction(S, B).]

RELATIONAL TREE LEARNING
Tables on the slide: student(X); adviser(X) with a gradient Δ per example; paper(X, Y) with rows (x1, y1), (x1, y2), (x3, y1)
X   Δ
x1  0.7
x2  -0.2
x3  -0.9
Regression tree:
student(X) = T   [x1: 0.7, x2: -0.2; value 0.25]
    paper(X, Y) = T: 0.7    [x1: 0.7]
    paper(X, Y) = F: -0.2   [x2: -0.2]
student(X) = F: -0.9        [x3: -0.9]
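The numbers on this slide are consistent with each node's value being the mean gradient of the examples that reach it: 0.25 is the mean of 0.7 and -0.2. The sketch below reproduces the split and the node values under that reading. The membership of student(X) is inferred from the tree (x1 and x2 true, x3 false), and the helper names are illustrative.

```python
from statistics import mean

# Gradients per example, from the slide.
GRADIENT = {"x1": 0.7, "x2": -0.2, "x3": -0.9}

# student(X) facts, inferred from which examples sit under student(X) = T.
STUDENT = {"x1", "x2"}

# paper(X, Y) facts, from the slide.
PAPER = {("x1", "y1"), ("x1", "y2"), ("x3", "y1")}


def is_student(x):
    return x in STUDENT


def has_paper(x):
    """True if paper(x, Y) holds for at least one Y."""
    return any(author == x for author, _ in PAPER)


def split(examples, test):
    """Partition examples into those that pass the test and those that fail."""
    passed = [x for x in examples if test(x)]
    failed = [x for x in examples if not test(x)]
    return passed, failed


def node_value(examples):
    """Regression value of a node: mean gradient of its examples."""
    return mean(GRADIENT[x] for x in examples)


students, non_students = split(sorted(GRADIENT), is_student)
with_paper, without_paper = split(students, has_paper)
```

Here node_value(students) is approximately 0.25, and the three leaves come out at 0.7, -0.2 and -0.9, matching the slide.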

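FUNCTIONAL GRADIENT BOOSTING
Sequentially learn models where each subsequent model corrects the previous model.
[Figure: Data minus the predictions of the initial model gives the residues; a model ψm is induced from the residues and added; the process iterates. Final model = initial model + the sum of the induced models.]
Natarajan et al., MLJ '12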

BOOSTING ALGORITHM
For each gradient step m = 1 to M
    For each query predicate, P
        Generate trainset using previous model, F_{m-1}
            For each example, x
                Compute gradient for x
                Add <x, gradient(x)> to trainset
        Learn a regression function, T_{m,p}
        Add T_{m,p} to the model, F_m
    Set F_m as current model
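The loop above can be sketched in a few lines for a single query predicate. This is a propositional simplification, not the relational learner from the talk: examples are tuples of binary features, the regression function is a one-split stump instead of a relational regression tree, and the gradient is taken to be y minus the current predicted probability under a sigmoid link. All names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(trainset, n_features):
    """Fit a one-split regression stump to (x, gradient) pairs.

    Picks the binary feature whose split gives the lowest squared error
    and predicts the mean gradient on each side.
    """
    best = None
    for j in range(n_features):
        left = [g for x, g in trainset if x[j] == 0]
        right = [g for x, g in trainset if x[j] == 1]
        lv = sum(left) / len(left) if left else 0.0
        rv = sum(right) / len(right) if right else 0.0
        err = sum((g - lv) ** 2 for g in left) + sum((g - rv) ** 2 for g in right)
        if best is None or err < best[0]:
            best = (err, j, lv, rv)
    _, j, lv, rv = best
    return lambda x: rv if x[j] == 1 else lv


def boost(examples, labels, n_features, n_steps=10):
    """Functional gradient boosting for one query predicate.

    The model F_m is the sum of all regression functions learned so far
    (F_0 = 0); the returned function maps x to P(y = 1 | x).
    """
    model = []

    def F(x):
        return sum(tree(x) for tree in model)

    for _ in range(n_steps):
        # Generate the trainset using the previous model F_{m-1}.
        trainset = []
        for x, y in zip(examples, labels):
            gradient = y - sigmoid(F(x))
            trainset.append((x, gradient))
        # Learn a regression function on the gradients and add it to the model.
        model.append(fit_stump(trainset, n_features))

    return lambda x: sigmoid(F(x))
```

For example, with examples [(0, 0), (0, 1), (1, 0), (1, 1)] and labels [0, 0, 1, 1], each step fits the remaining residual on the first feature, so the predicted probability moves toward the label with every added stump.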

UW-CSE
• Predict advisedBy relation
• Given student, professor, courseTA, courseProf, etc. relations
• 5-fold cross validation
UW-CSE     AUC-ROC  AUC-PR  Likelihood  Training Time
Boosting   0.96     0.93    0.81        9 s
RDN        0.88     0.78    0.80        1 s
Alchemy    0.53     0.62    0.73        93 hrs
http://pages.cs.wisc.edu/~tushar/rdnboost/index.html

CARDIA
• Data: family history, medical history, physical activity, nutrient intake, obesity questions, psychosocial, pulmonary function, etc.
• Goal: identify risk factors in early adulthood that cause serious cardiovascular issues in older adults
• Extremely rich dataset with 25 years of information
S. Natarajan, J. Carr

RESULTS

IMITATION LEARNING
• Expert agent performs actions (trajectories)
• Goal: learn a policy from these trajectories to suggest actions based on the current state
Natarajan et al., IJCAI '11
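As a rough illustration of learning a policy from expert trajectories, the sketch below uses a much simpler technique than the boosted relational approach in the cited work: a tabular majority vote that suggests, for each state, the action the expert took there most often. The trajectory format and the gridworld states are hypothetical.

```python
from collections import Counter, defaultdict


def learn_policy(trajectories):
    """Majority-vote policy from expert trajectories.

    Each trajectory is a list of (state, action) pairs. The returned dict
    maps every observed state to the expert's most frequent action there.
    """
    counts = defaultdict(Counter)
    for trajectory in trajectories:
        for state, action in trajectory:
            counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Hypothetical gridworld trajectories; states are (row, col) cells.
EXPERT_TRAJECTORIES = [
    [((0, 0), "right"), ((0, 1), "right"), ((0, 2), "down")],
    [((0, 0), "right"), ((0, 1), "down"), ((1, 1), "right")],
    [((0, 0), "down"), ((1, 0), "right"), ((0, 1), "right")],
]

POLICY = learn_policy(EXPERT_TRAJECTORIES)
```

A table like this cannot suggest an action for a state the expert never visited, which is one motivation for learning a relational model that generalizes across states.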

[Figures: Gridworld domain; Robocup domain]

ALZHEIMER'S RESEARCH
• AD: progressive neurodegenerative condition resulting in loss of cognitive abilities and memory
• MRI: neuroimaging method; visualization of brain anatomy
• Humans are not very good at identifying people with AD, especially before cognitive decline
• MRI data: major source for distinguishing AD vs CN (cognitively normal) or MCI (mild cognitive impairment) vs CN
Natarajan et al., under review

PROPOSITIONAL MODELS (WITH AAL)

CONCLUSION
• Statistical Relational Learning combines first-order logic with probabilistic models
• Relational trees are used to represent conditional distributions
• Boosting trees can be used to efficiently learn the structure of SRL models