Machine Learning: Version Spaces Learning

Introduction
• Very many approaches to machine learning:
  - Neural net approaches
  - Symbolic approaches:
      version spaces, decision trees, knowledge discovery, data mining,
      speed-up learning, inductive learning, ...

Version Spaces: a concept learning technique based on refining models of the world.

Concept Learning
• Example: a student has the following observations about having an allergic reaction after meals:

      Restaurant   Meal        Day       Cost        Reaction
      Alma 3       breakfast   Friday    cheap       Yes   +
      De Moete     lunch       Friday    expensive   No    -
      Alma 3       lunch       Saturday  cheap       Yes   +
      Sedes        breakfast   Sunday    cheap       No    -
      Alma 3       breakfast   Sunday    expensive   No    -

• Concept to learn: under which circumstances do I get an allergic reaction after meals?

In general
• There is a set of all possible events.
  - Example: Restaurant x Meal x Day x Cost: 3 x 3 x 7 x 2 = 126 possible events
• There is a boolean function (implicitly) defined on this set.
  - Example: Reaction: Restaurant x Meal x Day x Cost --> Bool
• We have the value of this function for SOME examples only.
• Find an inductive 'guess' of the concept that covers all the examples!

Pictured
(figure: the set of all possible events, with some events marked + and some marked -)
• Given some examples, find a concept that covers all the positive examples and none of the negative ones!

Non-determinism
(figure: several different concepts in the set of all possible events, each covering the positive examples and none of the negative ones)
• There are many different ways to solve this!
• How to choose?

An obvious, but bad choice
• Simply take the disjunction of the positive observations: the concept IS
  - (Alma 3 and breakfast and Friday and cheap), OR
  - (Alma 3 and lunch and Saturday and cheap).
• This does NOT generalize the examples at all!

Pictured
(figure: only the positive example points themselves are labeled positive)
• Only the positive examples are positive!

Equally bad is
• The concept is anything EXCEPT:
  - De Moete and lunch and Friday and expensive, AND
  - Sedes and breakfast and Sunday and cheap, AND
  - Alma 3 and breakfast and Sunday and expensive.

Pictured
(figure: the whole set of possible events minus the negative example points)
• Everything except the negative examples is positive.

Solution: fix a language of hypotheses
• We introduce a fixed language of concept descriptions: the hypothesis space.
• The concept can only be identified as being one of the hypotheses in this language:
  - avoids the problem of having 'useless' conclusions
  - forces some generalization/induction, to cover more than just the given examples

Reaction example
• Every hypothesis is a 4-tuple:
  - most general hypothesis: [?, ?, ?, ?]
  - maximally specific hypotheses, e.g.: [Sedes, lunch, Monday, cheap]
  - combinations of ? and values are allowed, e.g.: [De Moete, ?, ?, expensive] or [?, lunch, ?, ?]
• One more hypothesis: ⊥ (bottom: covers no example)

Hypotheses relate to sets of possible events
• Events:
    x1 = <Alma 3, lunch, Monday, expensive>
    x2 = <Sedes, lunch, Sunday, cheap>
• Hypotheses:
    h1 = [?, lunch, Monday, ?]   (covers x1)
    h2 = [?, lunch, ?, cheap]    (covers x2)
    h3 = [?, lunch, ?, ?]        (covers both x1 and x2)
• General <-> specific: h3 is more general than h1 and h2.

Expressive power of this hypothesis language
• Conjunctions of explicit, individual properties, e.g.:
  - [?, lunch, Monday, ?]   means  Meal = lunch AND Day = Monday
  - [?, lunch, ?, cheap]    means  Meal = lunch AND Cost = cheap
  - [?, lunch, ?, ?]        means  Meal = lunch
• In addition, the 2 special hypotheses:
  - [?, ?, ?, ?]  :  anything
  - ⊥             :  nothing
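To make the tuple notation concrete, here is a minimal Python sketch of this hypothesis language; the encoding and the names covers, TOP and BOTTOM are illustrative choices, not part of the slides.

```python
# Sketch (illustrative): a hypothesis is a 4-tuple whose entries are either
# an attribute value or '?', plus the special bottom hypothesis (None here).

TOP = ('?', '?', '?', '?')   # "anything"
BOTTOM = None                # "nothing"

def covers(h, x):
    """True if hypothesis h covers event x (a 4-tuple of attribute values)."""
    if h is BOTTOM:
        return False
    return all(hv in ('?', xv) for hv, xv in zip(h, x))

# [?, lunch, Monday, ?] requires Meal = lunch and Day = Monday:
print(covers(('?', 'lunch', 'Monday', '?'),
             ('Alma 3', 'lunch', 'Monday', 'expensive')))   # True
print(covers(('?', 'lunch', '?', 'cheap'),
             ('Alma 3', 'lunch', 'Monday', 'expensive')))   # False
```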

Other languages of hypotheses are allowed
• Example: identify the color of a given sequence of colored objects.
  - the examples: red, purple and blue objects, each labeled + or -
• A useful language of hypotheses is then a taxonomy:
  - any_color
      - mono_color: red, blue, green, orange, purple
      - polyo_color

Important about hypothesis languages
• They should have a specific <-> general ordering, corresponding to the set-inclusion of the events they cover.
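A sketch of that ordering for the conjunctive 4-tuple language, under the same illustrative encoding as above (hypotheses as tuples, bottom as None). For this language the set-inclusion test can be done attribute-wise, without enumerating events.

```python
# Sketch (illustrative): h1 is more general than or equal to h2 iff every
# event covered by h2 is also covered by h1.

def more_general_or_equal(h1, h2):
    """True if hypothesis h1 covers at least the events that h2 covers."""
    if h2 is None:            # bottom is more specific than everything
        return True
    if h1 is None:            # bottom is only >= bottom
        return False
    return all(a in ('?', b) for a, b in zip(h1, h2))

# [?, lunch, ?, ?] is more general than [?, lunch, ?, cheap]:
print(more_general_or_equal(('?', 'lunch', '?', '?'),
                            ('?', 'lunch', '?', 'cheap')))   # True
print(more_general_or_equal(('?', 'lunch', '?', 'cheap'),
                            ('?', 'lunch', '?', '?')))       # False
```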

Defining Concept Learning
• Given:
  - a set X of possible events
      e.g. Eat-events: <Restaurant, Meal, Day, Cost>
  - an (unknown) target function c: X -> {-, +}
      e.g. Reaction: Eat-events -> {-, +}
  - a language of hypotheses H
      e.g. conjunctions such as [?, lunch, Monday, ?]
  - a set of training examples D, with their value under c
      e.g. (<Alma 3, breakfast, Friday, cheap>, +), ...
• Find: a hypothesis h in H such that for all x in D:
      x is covered by h  <=>  c(x) = +
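For use in the sketches that follow, the five observations of the running Reaction example can be written down as plain Python data; the tuple/label encoding is an illustrative choice, not prescribed by the slides.

```python
# The training set D of the running example: (event, label) pairs.
D = [
    (('Alma 3',   'breakfast', 'Friday',   'cheap'),     '+'),
    (('De Moete', 'lunch',     'Friday',   'expensive'), '-'),
    (('Alma 3',   'lunch',     'Saturday', 'cheap'),     '+'),
    (('Sedes',    'breakfast', 'Sunday',   'cheap'),     '-'),
    (('Alma 3',   'breakfast', 'Sunday',   'expensive'), '-'),
]
```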

The inductive learning hypothesis
• If a hypothesis approximates the target function well over a sufficiently large number of examples, then it will also approximate the target function well on unobserved examples.

Find-S: a naïve algorithm

  Initialize: h := ⊥
  For each positive training example x in D:
      If h does not cover x:
          Replace h by a minimal generalization of h that covers x
  Return h
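A runnable sketch of Find-S for the conjunctive 4-tuple language, under the illustrative encoding used above (bottom as None; the minimal generalization of bottom is the example itself, otherwise mismatching attributes become '?').

```python
# Find-S sketch (illustrative names and encoding, not from the slides).

def covers(h, x):
    return h is not None and all(hv in ('?', xv) for hv, xv in zip(h, x))

def minimal_generalization(h, x):
    if h is None:                      # generalize bottom to the example itself
        return tuple(x)
    return tuple(hv if hv == xv else '?' for hv, xv in zip(h, x))

def find_s(examples):
    h = None                           # start with the bottom hypothesis
    for x, label in examples:
        if label == '+' and not covers(h, x):
            h = minimal_generalization(h, x)
    return h

D = [
    (('Alma 3', 'breakfast', 'Friday', 'cheap'), '+'),
    (('De Moete', 'lunch', 'Friday', 'expensive'), '-'),
    (('Alma 3', 'lunch', 'Saturday', 'cheap'), '+'),
    (('Sedes', 'breakfast', 'Sunday', 'cheap'), '-'),
    (('Alma 3', 'breakfast', 'Sunday', 'expensive'), '-'),
]
print(find_s(D))   # ('Alma 3', '?', '?', 'cheap')
```

For this particular language the minimal generalization is unique, so the sketch is deterministic; the non-determinism discussed on the next slides shows up in richer hypothesis languages.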

Reaction example (Find-S)
• Initially: h = ⊥
• Example 1, <Alma 3, breakfast, Friday, cheap> +:
    the minimal generalizations of ⊥ are the individual events, so h = [Alma 3, breakfast, Friday, cheap]
• Example 2, <Alma 3, lunch, Saturday, cheap> +:
    generalization = replace the mismatching values by ?, so h = [Alma 3, ?, ?, cheap]
• No more positive examples: return h

Properties of Find-S
• Non-deterministic:
  - depending on H, there may be several minimal generalizations
  - (figure: a hypothesis space H with concepts Hacker, Scientist and Footballplayer, containing the examples Beth, Jo and Alex)
  - Beth can be minimally generalized in 2 ways to include a new example Jo.

Properties of Find-S (2)
• May pick an incorrect hypothesis (w.r.t. the negative examples):
  - D: Beth +, Alex -, Jo +
  - (figure: in the space with Hacker, Scientist and Footballplayer, the generalization of Beth that also covers the negative example Alex is a wrong solution)

Properties of Find-S (3)
• Cannot detect inconsistency of the training data:
  - D: Beth +, Jo +, Beth -
• Nor the inability of the language H to learn the concept:
  - D: Beth +, Alex -, Jo +
  - (figure: H contains only the concept Scientist, which covers Beth, Jo and Alex)

Nice about Find-S
• It doesn't have to remember previous examples!
  - If the previous h already covered all previous examples, then a minimal generalization h' will cover them too.
  - (figure: in X, the region covered by h' contains the region covered by h)
  - If h already covered the first 20 examples, then h' will as well.

Dual Find-S

  Initialize: h := [?, ?, ?, ?]
  For each negative training example x in D:
      If h does cover x:
          Replace h by a minimal specialization of h that does not cover x
  Return h
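A corresponding sketch of Dual Find-S, again with the illustrative tuple encoding; the attribute domains are taken from the running example, and the choice among the minimal specializations is arbitrary (hence non-deterministic).

```python
# Dual Find-S sketch (illustrative).  A minimal specialization replaces one
# '?' of h by a concrete value; any one that excludes the negative example
# is a legal choice.

DOMAINS = [
    ['Alma 3', 'De Moete', 'Sedes'],                                   # Restaurant
    ['breakfast', 'lunch', 'dinner'],                                  # Meal
    ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
     'Friday', 'Saturday', 'Sunday'],                                  # Day
    ['cheap', 'expensive'],                                            # Cost
]

def covers(h, x):
    return h is not None and all(hv in ('?', xv) for hv, xv in zip(h, x))

def minimal_specializations(h):
    """All hypotheses obtained by fixing one '?' of h to a concrete value."""
    return [h[:i] + (v,) + h[i + 1:]
            for i, hv in enumerate(h) if hv == '?'
            for v in DOMAINS[i]]

def dual_find_s(examples):
    h = ('?', '?', '?', '?')
    for x, label in examples:
        if label == '-' and covers(h, x):
            # pick some minimal specialization that excludes the negative example
            h = next(s for s in minimal_specializations(h) if not covers(s, x))
    return h

D = [
    (('Alma 3', 'breakfast', 'Friday', 'cheap'), '+'),
    (('De Moete', 'lunch', 'Friday', 'expensive'), '-'),
    (('Alma 3', 'lunch', 'Saturday', 'cheap'), '+'),
    (('Sedes', 'breakfast', 'Sunday', 'cheap'), '-'),
    (('Alma 3', 'breakfast', 'Sunday', 'expensive'), '-'),
]
print(dual_find_s(D))   # ('Alma 3', 'lunch', '?', '?') with this enumeration
                        # order; other minimal choices (e.g. the trace on the
                        # next slide) are equally valid
```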

Reaction example (Dual Find-S)
• Initially: h = [?, ?, ?, ?]
• Example 1, <De Moete, lunch, Friday, expensive> -:   h = [?, breakfast, ?, ?]
• Example 2, <Sedes, breakfast, Sunday, cheap> -:      h = [Alma 3, breakfast, ?, ?]
• Example 3, <Alma 3, breakfast, Sunday, expensive> -: h = [Alma 3, breakfast, ?, cheap]

Version Spaces: the idea
• Perform both Find-S and Dual Find-S:
  - Find-S deals with the positive examples
  - Dual Find-S deals with the negative examples
• BUT: do NOT select 1 minimal generalization or specialization at each step; keep track of ALL minimal generalizations and specializations.

Version spaces: initialization
    G = { [?, ?, ?, ?] }
    S = { ⊥ }
• The boundary sets G and S are initialized to the most general hypothesis only and to the most specific hypothesis (⊥) only.

Negative examples
    G = { [?, ?, ?, ?] }  becomes  G = {h1, h2, ..., hn}
    S = { ⊥ }
• Replace the top hypothesis by ALL minimal specializations that do NOT cover the negative example.
• Invariant: only hypotheses more specific than the ones in G are still possible: they don't cover the negative example.

Positive examples
    G = {h1, h2, ..., hn}
    S = { ⊥ }  becomes  S = {h1', h2', ..., hm'}
• Replace the bottom hypothesis by ALL minimal generalizations that DO cover the positive example.
• Invariant: only hypotheses more general than the ones in S are still possible: they do cover the positive example.

Later: negative examples
    G = {h1, h2, ..., hn}  becomes  G = {h1, h21, h22, ..., hn}
    S = {h1', h2', ..., hm'}
• Replace all hypotheses in G that cover the new negative example by ALL their minimal specializations that do NOT cover it.
• Invariant: only hypotheses more specific than the ones in G are still possible: they don't cover the negative example.

Later: positive examples
    S = {h1', h2', ..., hm'}  becomes  S = {h11', h12', h13', h2', ..., hm'}
    G = {h1, h21, h22, ..., hn}
• Replace all hypotheses in S that do not cover the new positive example by ALL their minimal generalizations that DO cover it.
• Invariant: only hypotheses more general than the ones in S are still possible: they do cover the positive example.

Optimization: negative examples
• Only consider specializations of elements in G that are still more general than some specific hypothesis in S.
• This uses the invariant on S: the new G-elements must remain more general than S.

Optimization: positive examples
• Only consider generalizations of elements in S that are still more specific than some general hypothesis in G.
• This uses the invariant on G: the new S-elements must remain more specific than G.

Pruning: negative examples
• The new negative example can also be used to prune away all the S-hypotheses that cover it.
• Needed because the invariant only accounts for the previous examples, not for the latest one.

Pruning: positive examples
• The new positive example can also be used to prune away all the G-hypotheses that do not cover it.

Eliminate redundant hypotheses
• If a hypothesis in G is more specific than another hypothesis in G: eliminate it!
  - Reason: the invariant acts as a wave front: anything more general than G is not allowed, so the most general elements of G define the real boundary.
• Obviously the same holds, dually, for S.

Convergence
• Eventually G and S may come to contain a common element: then the version space has converged to a single solution.
• Remaining examples should still be checked against this solution.

Reaction example
• Initialization:
    G = { [?, ?, ?, ?] }   (most general)
    S = { ⊥ }              (most specific)

Alma 3, breakfast, Friday, cheap: +
• Positive example: replace ⊥ by its minimal generalization, the example itself.
    G = { [?, ?, ?, ?] }
    S = { [Alma 3, breakfast, Friday, cheap] }

De Moete, lunch, Friday, expensive: -
• Negative example: minimal specializations of [?, ?, ?, ?]: 15 possible specializations!
  - Rejected because they cover the negative example:
      [De Moete, ?, ?, ?], [?, lunch, ?, ?], [?, ?, Friday, ?], [?, ?, ?, expensive]
  - Rejected because they do not generalize the specific model [Alma 3, breakfast, Friday, cheap]:
      [Sedes, ?, ?, ?], [?, dinner, ?, ?], [?, ?, Monday, ?], [?, ?, Tuesday, ?],
      [?, ?, Wednesday, ?], [?, ?, Thursday, ?], [?, ?, Saturday, ?], [?, ?, Sunday, ?]
  - Remain:
      [Alma 3, ?, ?, ?], [?, breakfast, ?, ?], [?, ?, ?, cheap]
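The enumeration on this slide can be reproduced with a short sketch (illustrative encoding and domain lists as before): generate the 15 candidate specializations of the top hypothesis, then keep those that exclude the negative example and still generalize the current S element.

```python
# Sketch (illustrative): filtering the 15 minimal specializations.

DOMAINS = [
    ['Alma 3', 'De Moete', 'Sedes'],
    ['breakfast', 'lunch', 'dinner'],
    ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'],
    ['cheap', 'expensive'],
]

def covers(h, x):
    return all(hv in ('?', xv) for hv, xv in zip(h, x))

top = ('?', '?', '?', '?')
negative = ('De Moete', 'lunch', 'Friday', 'expensive')
specific = ('Alma 3', 'breakfast', 'Friday', 'cheap')     # the current element of S

candidates = [top[:i] + (v,) + top[i + 1:]
              for i in range(4) for v in DOMAINS[i]]
surviving = [h for h in candidates
             if not covers(h, negative)    # must exclude the negative example
             and covers(h, specific)]      # must still generalize the S element

print(len(candidates))   # 15
print(surviving)
# [('Alma 3', '?', '?', '?'), ('?', 'breakfast', '?', '?'), ('?', '?', '?', 'cheap')]
```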

Result after example 2
    G = { [Alma 3, ?, ?, ?], [?, breakfast, ?, ?], [?, ?, ?, cheap] }
    S = { [Alma 3, breakfast, Friday, cheap] }

Alma 3, lunch, Saturday, cheap: +
• Positive example: the minimal generalization of [Alma 3, breakfast, Friday, cheap] is [Alma 3, ?, ?, cheap].
• [?, breakfast, ?, ?] does not match the new example and is pruned from G.
    G = { [Alma 3, ?, ?, ?], [?, ?, ?, cheap] }
    S = { [Alma 3, ?, ?, cheap] }

Sedes, breakfast, Sunday, cheap: -
• Negative example: minimal specialization of the general models [Alma 3, ?, ?, ?] and [?, ?, ?, cheap]
  - [Alma 3, ?, ?, ?] does not cover the example and stays.
  - [?, ?, ?, cheap] covers it; the only specialization that is introduced, [Alma 3, ?, ?, cheap], is pruned because it is more specific than another general hypothesis.
    G = { [Alma 3, ?, ?, ?] }
    S = { [Alma 3, ?, ?, cheap] }

Alma 3, breakfast, Sunday, expensive: -
• Negative example: minimal specialization of [Alma 3, ?, ?, ?]
  - The only surviving specialization is [Alma 3, ?, ?, cheap]: the same hypothesis as in S!
    G = S = { [Alma 3, ?, ?, cheap] }: the version space has converged.
• Cheap food at Alma 3 produces the allergy!

Version Space Algorithm

  Initially:
      G := { the hypothesis that covers everything }
      S := { ⊥ }
  For each new positive example:
      Generalize all hypotheses in S that do not cover the example yet, but ensure the following:
        - only introduce minimal changes on the hypotheses,
        - each new specific hypothesis is a specialization of some general hypothesis,
        - no new specific hypothesis is a generalization of some other specific hypothesis.
      Prune away all hypotheses in G that do not cover the example.

Version Space Algorithm (2)

  For each new negative example:
      Specialize all hypotheses in G that cover the example, but ensure the following:
        - only introduce minimal changes on the hypotheses,
        - each new general hypothesis is a generalization of some specific hypothesis,
        - no new general hypothesis is a specialization of some other general hypothesis.
      Prune away all hypotheses in S that cover the example.

  Until there are no more examples: report S and G,
  or S or G becomes empty: report failure.
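Putting the two halves of the algorithm together, here is a compact, runnable sketch of candidate elimination for the conjunctive 4-tuple language; the names, encoding and domain lists are illustrative choices, not part of the slides. On the five training examples of the running Reaction example it converges to [Alma 3, ?, ?, cheap], as on the earlier slides.

```python
# Version-space (candidate-elimination) sketch, illustrative encoding.

DOMAINS = [
    ['Alma 3', 'De Moete', 'Sedes'],
    ['breakfast', 'lunch', 'dinner'],
    ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'],
    ['cheap', 'expensive'],
]
TOP = ('?', '?', '?', '?')
BOTTOM = None

def covers(h, x):
    return h is not None and all(hv in ('?', xv) for hv, xv in zip(h, x))

def more_general_or_equal(h1, h2):
    if h2 is None:
        return True
    if h1 is None:
        return False
    return all(a in ('?', b) for a, b in zip(h1, h2))

def minimal_generalizations(h, x):
    """Least generalizations of h that cover the positive example x."""
    if h is None:
        return [tuple(x)]
    return [tuple(hv if hv == xv else '?' for hv, xv in zip(h, x))]

def minimal_specializations(h, x):
    """Least specializations of h that exclude the negative example x."""
    out = []
    for i, hv in enumerate(h):
        if hv == '?':
            out.extend(h[:i] + (v,) + h[i + 1:] for v in DOMAINS[i] if v != x[i])
    return out

def candidate_elimination(examples):
    G, S = [TOP], [BOTTOM]
    for x, label in examples:
        if label == '+':
            G = [g for g in G if covers(g, x)]                 # prune G
            new_S = []
            for s in S:
                if covers(s, x):
                    new_S.append(s)
                else:
                    new_S.extend(h for h in minimal_generalizations(s, x)
                                 if any(more_general_or_equal(g, h) for g in G))
            # drop S elements that are generalizations of another S element
            S = [h for h in new_S
                 if not any(h != h2 and more_general_or_equal(h, h2) for h2 in new_S)]
        else:
            S = [s for s in S if not covers(s, x)]             # prune S
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                else:
                    new_G.extend(h for h in minimal_specializations(g, x)
                                 if any(more_general_or_equal(h, s) for s in S))
            # drop G elements that are specializations of another G element
            G = [h for h in new_G
                 if not any(h != h2 and more_general_or_equal(h2, h) for h2 in new_G)]
    return G, S

D = [
    (('Alma 3', 'breakfast', 'Friday', 'cheap'), '+'),
    (('De Moete', 'lunch', 'Friday', 'expensive'), '-'),
    (('Alma 3', 'lunch', 'Saturday', 'cheap'), '+'),
    (('Sedes', 'breakfast', 'Sunday', 'cheap'), '-'),
    (('Alma 3', 'breakfast', 'Sunday', 'expensive'), '-'),
]
G, S = candidate_elimination(D)
print(G)   # [('Alma 3', '?', '?', 'cheap')]
print(S)   # [('Alma 3', '?', '?', 'cheap')]
```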

Properties of VS
• Symmetry: positive and negative examples are dealt with in a completely dual way.
• It does not need to remember previous examples.
• Noise: VS cannot deal with noise!
  - If a positive example is wrongly given as negative, then VS eliminates the desired hypothesis from the version space!

Termination
• If it terminates because there are no more examples:
  - Example (version space on termination):
        G = { [Alma 3, ?, ?, ?], [?, ?, Monday, ?] }
        intermediate: [Alma 3, ?, ?, cheap], [Alma 3, ?, Monday, ?], [?, ?, Monday, cheap]
        S = { [Alma 3, ?, Monday, cheap] }
  - Then all these hypotheses, and all intermediate hypotheses, are still correct descriptions covering the training data.
  - VS makes NO unnecessary choices!

Termination (2)
• If it terminates because S or G becomes empty, then either:
  - the data is inconsistent (noise?), or
  - the target concept cannot be represented in the hypothesis language H.
• Example:
  - target concept: [Alma 3, breakfast, ?, cheap] OR [Alma 3, lunch, ?, cheap]
  - given examples like
        <Alma 3, dinner, Sunday, cheap>      -
        <Alma 3, breakfast, Sunday, cheap>   +
        <Alma 3, lunch, Sunday, cheap>       +
    this concept cannot be learned in our language H.

Which example next?
• VS can decide itself which example would be most useful next: it can 'query' the user for the most relevant additional classification!
• Example, with the current version space
        G = { [Alma 3, ?, ?, ?], [?, ?, Monday, ?] }
        intermediate: [Alma 3, ?, ?, cheap], [Alma 3, ?, Monday, ?], [?, ?, Monday, cheap]
        S = { [Alma 3, ?, Monday, cheap] }
  the event <Alma 3, lunch, Monday, expensive> is classified positive by 3 hypotheses and negative by 3 hypotheses: it is a most informative new example.
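One way to sketch this query selection (illustrative, not prescribed by the slides): score every event by how evenly the remaining hypotheses split on it and ask about an event with the most balanced split.

```python
# Query-selection sketch (illustrative).  The six hypotheses are the ones
# from the partially learned version space of this slide.
from itertools import product

DOMAINS = [
    ['Alma 3', 'De Moete', 'Sedes'],
    ['breakfast', 'lunch', 'dinner'],
    ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'],
    ['cheap', 'expensive'],
]

def covers(h, x):
    return all(hv in ('?', xv) for hv, xv in zip(h, x))

version_space = [
    ('Alma 3', '?', '?', '?'),      ('?', '?', 'Monday', '?'),           # G
    ('Alma 3', '?', '?', 'cheap'),  ('Alma 3', '?', 'Monday', '?'),      # intermediate
    ('?', '?', 'Monday', 'cheap'),  ('Alma 3', '?', 'Monday', 'cheap'),  # S
]

def query_value(x):
    """Best (largest) when half of the hypotheses cover x and half do not."""
    positive = sum(covers(h, x) for h in version_space)
    return min(positive, len(version_space) - positive)

best = max(product(*DOMAINS), key=query_value)
print(best, query_value(best))
# ('Alma 3', 'breakfast', 'Monday', 'expensive') 3  -- a 3/3 split; the slide's
# <Alma 3, lunch, Monday, expensive> is equally informative.
```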

Use of partially learned concepts
• Same version space as above: G = { [Alma 3, ?, ?, ?], [?, ?, Monday, ?] }, intermediate hypotheses [Alma 3, ?, ?, cheap], [Alma 3, ?, Monday, ?], [?, ?, Monday, cheap], and S = { [Alma 3, ?, Monday, cheap] }.
• Example: <Alma 3, lunch, Monday, cheap> can be classified as positive:
  - it is covered by all remaining hypotheses!
  - it is enough to check that it is covered by the hypotheses in S (all others generalize these).

Use of partially learned concepts (2)
• Example: <Sedes, lunch, Sunday, cheap> can be classified as negative:
  - it is not covered by any remaining hypothesis!
  - it is enough to check that it is not covered by any hypothesis in G (all others specialize these).

Use of partially learned concepts (3)
• Example: <Alma 3, lunch, Monday, expensive> cannot be classified:
  - it is covered by 3 hypotheses and not covered by 3 hypotheses: no conclusion.

Use of partially learned concepts (4)
• Example: <Sedes, lunch, Monday, expensive> can only be classified with a certain degree of confidence:
  - it is covered by 1 hypothesis and not covered by 5: it probably does not belong to the concept (ratio: 1/6).
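The classification rules of the last four slides can be summarized in a small sketch (illustrative encoding; the partially learned version space is the one shown above).

```python
# Classifying with a partially learned concept (illustrative sketch):
# covered by every hypothesis in S -> positive; covered by no hypothesis in
# G -> negative; otherwise report the fraction of hypotheses that cover it.

def covers(h, x):
    return all(hv in ('?', xv) for hv, xv in zip(h, x))

G = [('Alma 3', '?', '?', '?'), ('?', '?', 'Monday', '?')]
S = [('Alma 3', '?', 'Monday', 'cheap')]
intermediate = [('Alma 3', '?', '?', 'cheap'), ('Alma 3', '?', 'Monday', '?'),
                ('?', '?', 'Monday', 'cheap')]

def classify(x):
    if all(covers(s, x) for s in S):
        return '+'
    if not any(covers(g, x) for g in G):
        return '-'
    space = G + intermediate + S
    ratio = sum(covers(h, x) for h in space) / len(space)
    return f'unknown (covered by {ratio:.0%} of the remaining hypotheses)'

print(classify(('Alma 3', 'lunch', 'Monday', 'cheap')))      # +
print(classify(('Sedes', 'lunch', 'Sunday', 'cheap')))       # -
print(classify(('Sedes', 'lunch', 'Monday', 'expensive')))   # unknown (covered by 17% ...)
```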

The relevance of inductive BIAS: choosing H
• Our hypothesis language H fails to learn some concepts.
  - See the previous example: [Alma 3, breakfast, ?, cheap] OR [Alma 3, lunch, ?, cheap]
• What about choosing a more expressive language H'?
• Assume H':
  - allows conjunctions (as before)
  - allows disjunction and negation too!
      Example: (Restaurant = Alma 3) OR NOT (Day = Monday)

Inductive BIAS (2)
• This language H' allows us to represent ANY subset of the complete set of all events X.
• But X has 126 elements (Restaurant x Meal x Day x Cost: 3 x 3 x 7 x 2 = 126), so we can now express 2^126 different hypotheses!

Inductive BIAS (3)
• Version Spaces using H', on the same five training examples:
  - S grows into the disjunction of the positive examples:
        { [Alma 3, breakfast, Friday, cheap] }
        then { [Alma 3, breakfast, Friday, cheap] OR [Alma 3, lunch, Saturday, cheap] }
  - G shrinks into the conjoined negations of the negative examples:
        { NOT [De Moete, lunch, Friday, expensive] }
        then { NOT [De Moete, lunch, Friday, expensive] AND NOT [Sedes, breakfast, Sunday, cheap] AND NOT [Alma 3, breakfast, Sunday, expensive] }

Inductive BIAS (4)
• Resulting version space:
      G = { NOT [De Moete, lunch, Friday, expensive] AND NOT [Sedes, breakfast, Sunday, cheap] AND NOT [Alma 3, breakfast, Sunday, expensive] }
      S = { [Alma 3, breakfast, Friday, cheap] OR [Alma 3, lunch, Saturday, cheap] }
• We haven't learned anything: we have merely restated our positive and negative examples!
• In general: in order to be able to learn, we need an inductive BIAS (= an assumption), for example:
  - "The desired concept CAN be described as a conjunction of features."

Shift of Bias
• Practical approach to the bias problem:
    Start VS with a very weak hypothesis language.
    If the concept is learned: OK.
    Else: refine your language and restart VS.
• Avoids the choice of bias up front.
• Gives the most general concept that can be learned.