STATISTICAL RELATIONAL LEARNING
Joint work with Sriraam Natarajan, Kristian Kersting, Jude Shavlik

BAYESIAN NETWORKS
Nodes: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
Prior (b, e): 0.01
CPT for Alarm, P(a | e, b):
  e   b   P(a)
  0   0   0.1
  0   1   0.8
  1   0   0.6
  1   1   0.9
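
The CPT on this slide can be used to marginalize out the parents of Alarm. The sketch below assumes the flattened table reads as rows (e, b) = (0,0), (0,1), (1,0), (1,1), and it assumes a prior of 0.01 for both Earthquake and Burglary; the slide shows only a single 0.01, so the second prior is a placeholder, not a value from the talk.

```python
from itertools import product

# CPT for Alarm, read from the slide as (e, b) -> P(a = 1 | e, b).
P_ALARM = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.6, (1, 1): 0.9}

# Assumed priors (hypothetical): the slide shows only one value, 0.01.
P_EARTHQUAKE = 0.01
P_BURGLARY = 0.01


def bernoulli(p, value):
    """Probability that a binary variable with P(1) = p takes `value`."""
    return p if value == 1 else 1.0 - p


def prob_alarm():
    """P(a = 1), summing over all joint states of Earthquake and Burglary."""
    total = 0.0
    for e, b in product((0, 1), repeat=2):
        weight = bernoulli(P_EARTHQUAKE, e) * bernoulli(P_BURGLARY, b)
        total += weight * P_ALARM[(e, b)]
    return total


print(round(prob_alarm(), 5))
```

With these assumed priors, almost all of the probability mass comes from the (0, 0) row, which is why the marginal stays close to 0.1.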

BAYESIAN NETWORK FOR A CITY
[Figure: the alarm network copied once per house (H1 to H5), each copy with its own Burglary, Earthquake, and Alarm nodes, together with Calls(H1) through Calls(H6) nodes.]

SHARED VARIABLES
[Figure: a single Earthquake(BL) node shared by all houses, with Burglary(H1) to Burglary(H4), Alarm(H1) to Alarm(H4), and Calls(H1) to Calls(H5).]

FIRST ORDER LOGIC
Predicates: Earthquake(city), Burglary(house), HouseInCity(house, city), Alarm(house), Neighbor(house, nhouse), Calls(nhouse)
Rule: Alarm(house) :- HouseInCity(house, city), Earthquake(city), Burglary(house)
CPT attached to the rule, P(a | e, b):
  e   b   P(a)
  0   0   0.1
  0   1   0.8
  1   0   0.6
  1   1   0.9
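
One way to read the Alarm rule is as a template: each HouseInCity fact produces one ground Alarm node whose parents are the city's Earthquake node and the house's Burglary node, and every ground node reuses the same CPT. The sketch below illustrates that reading. The facts and the function name `ground_alarm_rule` are hypothetical; only the predicate names come from the slide.

```python
# Hypothetical HouseInCity(house, city) facts, for illustration only.
HOUSE_IN_CITY = [("H1", "BL"), ("H2", "BL"), ("H3", "BL")]

# One CPT shared by every ground instance of the rule: (e, b) -> P(a = 1).
ALARM_CPT = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.6, (1, 1): 0.9}


def ground_alarm_rule(facts):
    """Map each ground Alarm atom to its parents (Earthquake, Burglary)."""
    parents = {}
    for house, city in facts:
        parents[f"Alarm({house})"] = (f"Earthquake({city})", f"Burglary({house})")
    return parents


network = ground_alarm_rule(HOUSE_IN_CITY)
for node, node_parents in network.items():
    print(node, "<-", node_parents)
```

All three ground Alarm nodes point at the same Earthquake(BL) parent, which is the sharing shown on the SHARED VARIABLES slide.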

LOGIC + PROBABILITY = STATISTICAL RELATIONAL LEARNING MODELS
Logic -> add probabilities -> Statistical Relational Learning (SRL)
Probabilities -> add relations -> Statistical Relational Learning (SRL)
[Figure: example probabilistic model over Diff, CRating, PRating.]

ALPHABETIC SOUP
  • Knowledge-based model construction [Wellman et al., 1992]
  • PRISM [Sato & Kameya, 1997]
  • Stochastic logic programs [Muggleton, 1996]
  • Probabilistic relational models [Friedman et al., 1999]
  • Bayesian logic programs [Kersting & De Raedt, 2001]
  • Bayesian logic [Milch et al., 2005]
  • Markov logic [Richardson & Domingos, 2006]
  • Relational dependency networks [Neville & Jensen, 2007]
  • ProbLog [De Raedt et al., 2007]
  • And many others!

RELATIONAL DATABASE
Tables (columns):
  • Prof, Level
  • Student, IQ, Satisfaction
  • Prof, Course, Rating
  • Student, Course, Grade
  • Course, Diff

FIRST ORDER LOGIC
Each table maps to predicates:
  • Prof, Level: Prof(P), Level(P, L)
  • Student, IQ, Satisfaction: Student(S), IQ(S, I), satis(S, B)
  • Prof, Course, Rating: taughtBy(P, C), ratings(P, C, R)
  • Student, Course, Grade: takes(S, C), grade(S, C, G)
  • Course, Diff: Course(C), Diff(C)

GRAPHICAL MODEL
Nodes: grades(S, C, G), Diff(S, C, D), avgGrade(S, G), avgDiff(S, D), satisfaction(S, B)
Target distribution: P(satisfaction(S, B) | avgGrade(S, G), avgDiff(S, D))
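
The avgGrade and avgDiff nodes suggest that the per-course facts are aggregated per student before they condition satisfaction. A minimal sketch of such an aggregator follows, assuming numeric grades and difficulties. The facts and the function name `average_by_student` are hypothetical and are not taken from the talk.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ground facts: grades(S, C, G) and Diff(S, C, D).
GRADES = [("s1", "c1", 4.0), ("s1", "c2", 3.0), ("s2", "c1", 2.0)]
DIFFS = [("s1", "c1", 2.0), ("s1", "c2", 3.0), ("s2", "c1", 2.0)]


def average_by_student(facts):
    """Aggregate (student, course, value) facts into one mean per student."""
    values = defaultdict(list)
    for student, _course, value in facts:
        values[student].append(value)
    return {student: mean(vals) for student, vals in values.items()}


avg_grade = average_by_student(GRADES)  # plays the role of avgGrade(S, G)
avg_diff = average_by_student(DIFFS)    # plays the role of avgDiff(S, D)
print(avg_grade, avg_diff)
```

Aggregation gives every student a fixed number of parents for satisfaction, no matter how many courses that student took.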

RELATIONAL DECISION TREE
  Name    Speed   Job          Fine
  Bob     120     Teacher      N
  Alice   150     Writer       N
  John    180     Politician   N
  Mary    160     Student      Y
  Mike    140     Engineer     Y

  Person 1   Person 2
  Alice      John
  Mary       Mike
  Mary       Alice
  Bob        Mike
  Bob        Mary

  speed(X, S), S>120
    no:  N
    yes: job(X, politician)
      yes: N
      no:  knows(X, Y)
        no:  Y
        yes: job(Y, politician)
          yes: N
          no:  Y

RELATIONAL DECISION TREE
Classifying Alice, one test at a time:
  • speed(Alice, 150), 150>120: yes
  • job(Alice, politician): no
  • knows(Alice, John): yes
  • job(John, politician): yes, so the leaf is N
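
The tree can be checked against the table with a few lines of code. The sketch below hard-codes the slide's data and one reading of the flattened tree, in which the job(Y, politician) test succeeds if any known person is a politician. That existential reading is an assumption about the tree's semantics, and the function name `classify` is illustrative.

```python
# Data from the slide: speed, job, and observed Fine label per person.
SPEED = {"Bob": 120, "Alice": 150, "John": 180, "Mary": 160, "Mike": 140}
JOB = {"Bob": "teacher", "Alice": "writer", "John": "politician",
       "Mary": "student", "Mike": "engineer"}
FINE = {"Bob": "N", "Alice": "N", "John": "N", "Mary": "Y", "Mike": "Y"}

# (Person 1, Person 2) pairs from the slide, read as knows(X, Y).
KNOWS = [("Alice", "John"), ("Mary", "Mike"), ("Mary", "Alice"),
         ("Bob", "Mike"), ("Bob", "Mary")]


def classify(person):
    """Walk the relational decision tree for one person and return Y or N."""
    if not SPEED[person] > 120:            # speed(X, S), S > 120
        return "N"
    if JOB[person] == "politician":        # job(X, politician)
        return "N"
    known = [y for x, y in KNOWS if x == person]
    if not known:                          # knows(X, Y) fails
        return "Y"
    if any(JOB[y] == "politician" for y in known):  # job(Y, politician)
        return "N"
    return "Y"


predictions = {person: classify(person) for person in SPEED}
print(predictions)
```

Under this reading the tree reproduces the Fine column for all five people.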

RELATIONAL PROBABILITY TREES
  • Use probabilities on the leaves
  • Can be used to represent the conditional distributions
  • Can use regression values on leaves to represent regression functions
Tree tests: speed(X, S), S>120; job(X, politician); knows(X, Y); job(Y, politician)
Leaf values: 0.1, 0.2, 0.4, 0.8, 0.8

STRUCTURE LEARNING PROBLEM
Learn the structure of the conditional distributions.
Candidate parents: avgGrade(S, G), avgDiff(S, D), IQ(S, I), level(P, L)
Target: satisfaction(S, B)
Find the parents and the distribution for the target concept.

RELATIONAL TREE LEARNING
Predicates: student(X), adviser(X), paper(X, Y)
Gradients per example:
  X     Δ
  x1    0.7
  x2   -0.2
  x3   -0.9
paper(X, Y): (x1, y1), (x1, y2), (x3, y1)
Split on student(X):
  student(X) = T: x1 (0.7), x2 (-0.2), mean 0.25
  student(X) = F: x3 (-0.9), leaf -0.9
Split the student(X) = T branch on paper(X, Y):
  paper(X, Y) = T: x1 (0.7), leaf 0.7
  paper(X, Y) = F: x2 (-0.2), leaf -0.2
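
The leaf values on this slide are consistent with taking the mean gradient of the examples that reach each leaf. The sketch below reproduces that arithmetic. It assumes student(X) holds for x1 and x2 only, which is inferred from the split shown on the slide, and the helper names are illustrative.

```python
from statistics import mean

# Gradients per example, from the slide.
GRADIENTS = {"x1": 0.7, "x2": -0.2, "x3": -0.9}

# Assumed facts: student(X) inferred from the split; paper(X, Y) from the slide.
STUDENT = {"x1", "x2"}
PAPER = {("x1", "y1"), ("x1", "y2"), ("x3", "y1")}


def is_student(x):
    return x in STUDENT


def has_paper(x):
    """True if paper(x, Y) holds for at least one Y."""
    return any(author == x for author, _ in PAPER)


def leaf_values(examples, test):
    """Mean gradient on the true and false sides of a test (None if empty)."""
    yes = [GRADIENTS[x] for x in examples if test(x)]
    no = [GRADIENTS[x] for x in examples if not test(x)]
    return (mean(yes) if yes else None, mean(no) if no else None)


root_split = leaf_values(list(GRADIENTS), is_student)
student_branch = [x for x in GRADIENTS if is_student(x)]
second_split = leaf_values(student_branch, has_paper)
print(root_split, second_split)
```

The first split gives a mean near 0.25 for students and -0.9 otherwise; refining the student branch on paper(X, Y) separates x1 (0.7) from x2 (-0.2).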

FUNCTIONAL GRADIENT BOOSTING
Sequentially learn models where each subsequent model corrects the previous model.
[Figure: data minus the predictions of the initial model gives the residuals; a regression function ψm is induced from the residuals and added to the model; iterate. The final model is the initial model plus the sum of the induced functions.]
Natarajan et al., MLJ'12

BOOSTING ALGORITHM
For each gradient step m = 1 to M
    For each query predicate P
        Generate trainset using previous model, F_{m-1}
            For each example x
                Compute gradient for x
                Add <x, gradient(x)> to trainset
        Learn a regression function, T_{m,p}
        Add T_{m,p} to the model, F_m
    Set F_m as current model
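
The loop above can be sketched on a propositional toy problem. This is a simplification, not the relational learner from the talk: it uses a single numeric feature, one-split regression stumps in place of relational regression trees, a sigmoid link, and the gradient I(y = 1) - P(y = 1 | x). The data and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, targets):
    """Least-squares regression stump on one numeric feature."""
    best = None
    for threshold in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # no valid split: fall back to a constant
        constant = sum(targets) / len(targets)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, steps=10):
    """Functional gradient boosting: each stump fits the current gradients."""
    stumps = []

    def psi(x):
        # Current model: sum of all regression functions learned so far.
        return sum(stump(x) for stump in stumps)

    for _ in range(steps):
        # Gradient per example: I(y = 1) - P(y = 1 | x) under the current model.
        gradients = [y - sigmoid(psi(x)) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, gradients))
    return lambda x: sigmoid(psi(x))


# Hypothetical toy data: label is 1 when the feature exceeds 3.
model = boost([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(model(2), model(5))
```

Each new stump is trained on what the current model still gets wrong, so the predicted probabilities move toward the labels a little further with every step.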

UW-CSE
  • Predict advisedBy relation
  • Given student, professor, courseTA, courseProf, etc. relations
  • 5-fold cross validation

  UW-CSE     AUC-ROC   AUC-PR   Likelihood   Training Time
  Boosting   0.96      0.93     0.81         9 s
  RDN        0.88      0.78     0.80         1 s
  Alchemy    0.53      0.62     0.73         93 hrs

http://pages.cs.wisc.edu/~tushar/rdnboost/index.html

CARDIA
  • Family history, medical history, physical activity, nutrient intake, obesity questions, psychosocial, pulmonary function, etc.
  • Goal is to identify risk factors in early adulthood that cause serious cardiovascular issues in older adults
  • Extremely rich dataset with 25 years of information
S. Natarajan, J. Carr

RESULTS


IMITATION LEARNING
  • Expert agent performs actions (trajectories)
  • Goal: learn a policy from these trajectories to suggest actions based on the current state
Natarajan et al., IJCAI'11

Gridworld domain Robocup domain


ALZHEIMER'S RESEARCH
  • AD: progressive neurodegenerative condition resulting in loss of cognitive abilities and memory
  • MRI: neuroimaging method; visualization of brain anatomy
  • Humans are not very good at identifying people with AD, especially before cognitive decline
  • MRI data: major source for distinguishing AD vs CN (cognitively normal) or MCI vs CN
Natarajan et al., under review

PROPOSITIONAL MODELS (WITH AAL)


CONCLUSION
  • Statistical Relational Learning combines first-order logic with probabilistic models
  • Relational trees are used to represent conditional distributions
  • Boosting trees can be used to efficiently learn the structure of SRL models