Machine Learning Version Spaces Learning
Introduction
- Very many approaches to machine learning:
  - Neural Net approaches
  - Symbolic approaches:
    - version spaces
    - decision trees
    - knowledge discovery
    - data mining
    - speed-up learning
    - inductive learning
    - ...
Version Spaces: a concept learning technique based on refining models of the world.
Concept Learning
- Example: a student has the following observations about having an allergic reaction after meals:

  Restaurant   Meal        Day        Cost        Reaction
  Alma 3       breakfast   Friday     cheap       Yes  (+)
  De Moete     lunch       Friday     expensive   No   (-)
  Alma 3       lunch       Saturday   cheap       Yes  (+)
  Sedes        breakfast   Sunday     cheap       No   (-)
  Alma 3       breakfast   Sunday     expensive   No   (-)

- Concept to learn: under which circumstances do I get an allergic reaction after meals?
In general
- There is a set X of all possible events.
  - Example: Restaurant x Meal x Day x Cost = 3 x 3 x 7 x 2 = 126 events
- There is a boolean function (implicitly) defined on this set.
  - Example: Reaction: Restaurant x Meal x Day x Cost -> Bool
- We have the value of this function for SOME examples only.
- Find an inductive 'guess' of the concept that covers all the examples!
Pictured
[figure: the set of all possible events, with a few + and - examples scattered in it]
- Given some examples, find a concept that covers all the positive examples and none of the negative ones!
Non-determinism
[figure: several different concepts all cover the + examples and exclude the - ones]
- Many different ways to solve this!
- How to choose?
An obvious, but bad choice
  Alma 3     breakfast   Friday     cheap       +
  De Moete   lunch       Friday     expensive   -
  Alma 3     lunch       Saturday   cheap       +
  Sedes      breakfast   Sunday     cheap       -
  Alma 3     breakfast   Sunday     expensive   -
- The concept IS:
  - (Alma 3 and breakfast and Friday and cheap)
  - OR (Alma 3 and lunch and Saturday and cheap)
- Does NOT generalize the examples at all!!
Pictured
- Only the given positive examples are classified positive!
[figure: the concept covers exactly the + points in the set of all possible events]
Equally bad is
  Alma 3     breakfast   Friday     cheap       +
  De Moete   lunch       Friday     expensive   -
  Alma 3     lunch       Saturday   cheap       +
  Sedes      breakfast   Sunday     cheap       -
  Alma 3     breakfast   Sunday     expensive   -
- The concept is anything EXCEPT:
  - (De Moete and lunch and Friday and expensive)
  - AND (Sedes and breakfast and Sunday and cheap)
  - AND (Alma 3 and breakfast and Sunday and expensive)
Pictured
- Everything except the negative examples is classified positive:
[figure: the concept covers the whole set of all possible events minus the - points]
Solution: fix a language of hypotheses
- We introduce a fixed language of concept descriptions: the hypothesis space.
- The concept can only be identified as being one of the hypotheses in this language
  - avoids the problem of having 'useless' conclusions
  - forces some generalization/induction to cover more than just the given examples.
Reaction example
  Alma 3     breakfast   Friday     cheap       +
  De Moete   lunch       Friday     expensive   -
  Alma 3     lunch       Saturday   cheap       +
  Sedes      breakfast   Sunday     cheap       -
  Alma 3     breakfast   Sunday     expensive   -
- Every hypothesis is a 4-tuple:
  - most general hypothesis: [?, ?, ?, ?]
  - maximally specific: e.g. [Sedes, lunch, Monday, cheap]
  - combinations of ? and values are allowed: e.g. [De Moete, ?, ?, expensive] or [?, lunch, ?, ?]
- One more hypothesis: ⊥ (bottom: denotes that no example is covered)
Hypotheses relate to sets of possible events
[figure: events on the left, hypotheses ordered from specific to general on the right]
- Events:
  x1 = <Alma 3, lunch, Monday, expensive>
  x2 = <Sedes, lunch, Sunday, cheap>
- Hypotheses (specific -> general):
  h1 = [?, lunch, Monday, ?]
  h2 = [?, lunch, ?, cheap]
  h3 = [?, lunch, ?, ?]
Expressive power of this hypothesis language
- Conjunctions of explicit, individual properties
- Examples:
  - [?, lunch, Monday, ?]: Meal = lunch AND Day = Monday
  - [?, lunch, ?, cheap]: Meal = lunch AND Cost = cheap
  - [?, lunch, ?, ?]: Meal = lunch
- In addition, the 2 special hypotheses:
  - [?, ?, ?, ?]: anything
  - ⊥: nothing
Other languages of hypotheses are allowed
- Example: identify the color of a given sequence of colored objects.
  - the examples: red: +, purple: +, blue: -
- A useful language of hypotheses:
  any_color
    poly_color
    mono_color: red, blue, green, orange, purple
Important about hypothesis languages
- They should have a specific <-> general ordering, corresponding to the set-inclusion of the events they cover.
Defining Concept Learning
- Given:
  - A set X of possible events
    - Ex.: Eat-events: <Restaurant, Meal, Day, Cost>
  - An (unknown) target function c: X -> {-, +}
    - Ex.: Reaction: Eat-events -> {-, +}
  - A language of hypotheses H
    - Ex.: conjunctions: [?, lunch, Monday, ?]
  - A set of training examples D, with their value under c
    - Ex.: (<Alma 3, breakfast, Friday, cheap>, +), ...
- Find: a hypothesis h in H such that for all x in D: x is covered by h <=> c(x) = +
The inductive learning hypothesis: If a hypothesis approximates the target function well over a sufficiently large number of examples, then the hypothesis will also approximate the target function well on other unobserved examples. 19
Find-S: a naïve algorithm

  Initialize: h := ⊥
  For each positive training example x in D:
      If h does not cover x:
          Replace h by a minimal generalization of h that covers x
  Return h
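The loop above can be sketched for the slides' conjunctive 4-tuple language. The helper names (`covers`, `minimal_generalization`) and the representation of ⊥ as `None` are illustrative assumptions, not part of the slides:

```python
ANY = "?"      # wildcard attribute value
BOTTOM = None  # the bottom hypothesis: covers nothing

def covers(h, x):
    # A conjunctive hypothesis covers an event if every slot is "?" or matches.
    if h is BOTTOM:
        return False
    return all(a == ANY or a == b for a, b in zip(h, x))

def minimal_generalization(h, x):
    # Smallest change to h that makes it cover x.
    if h is BOTTOM:
        return tuple(x)  # minimal generalization of bottom = the event itself
    return tuple(a if a == b else ANY for a, b in zip(h, x))

def find_s(examples):
    h = BOTTOM
    for x, label in examples:
        if label == "+" and not covers(h, x):
            h = minimal_generalization(h, x)
    return h

# The reaction observations from the slides (negatives are simply ignored):
D = [
    (("Alma 3", "breakfast", "Friday", "cheap"), "+"),
    (("De Moete", "lunch", "Friday", "expensive"), "-"),
    (("Alma 3", "lunch", "Saturday", "cheap"), "+"),
]
print(find_s(D))  # → ('Alma 3', '?', '?', 'cheap')
```

Note that in this particular language the minimal generalization is unique, so the non-determinism discussed on the next slides does not show up here.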
Reaction example
  Alma 3     breakfast   Friday     cheap       +
  De Moete   lunch       Friday     expensive   -
  Alma 3     lunch       Saturday   cheap       +
  Sedes      breakfast   Sunday     cheap       -
  Alma 3     breakfast   Sunday     expensive   -
- Initially: h = ⊥  (minimal generalizations of ⊥ = the individual events)
- Example 1 (+): h = [Alma 3, breakfast, Friday, cheap]
- Example 2 (+): generalization = replace differing values by ?: h = [Alma 3, ?, ?, cheap]
- No more positive examples: return h
Properties of Find-S
- Non-deterministic: depending on H, there may be several minimal generalizations:
  [figure: hierarchy H with Hacker, Scientist and Footballplayer above the individuals Beth, Jo and Alex]
- Beth can be minimally generalized in 2 ways to include a new example Jo.
Properties of Find-S (2)
- May pick an incorrect hypothesis (w.r.t. the negative examples):
  - D: Beth +, Alex -, Jo +
  - If Find-S generalizes Beth towards a hypothesis that also covers the negative example Alex, it returns a wrong solution.
Properties of Find-S (3)
- Cannot detect inconsistency of the training data:
  - D: Beth +, Jo +, Beth -
- Nor the inability of the language H to learn the concept:
  - D: Beth +, Alex -, Jo +, while every hypothesis in H that covers Beth and Jo (e.g. Scientist) also covers Alex.
Nice about Find-S
- It doesn't have to remember previous examples!
- If the previous h already covered all previous examples, then a minimal generalization h' will too!
  - If h already covered the first 20 examples, then h' will as well.
Dual Find-S

  Initialize: h := [?, ?, ?, ?]
  For each negative training example x in D:
      If h does cover x:
          Replace h by a minimal specialization of h that does not cover x
  Return h
Reaction example
  Alma 3     breakfast   Friday     cheap       +
  De Moete   lunch       Friday     expensive   -
  Alma 3     lunch       Saturday   cheap       +
  Sedes      breakfast   Sunday     cheap       -
  Alma 3     breakfast   Sunday     expensive   -
- Initially: h = [?, ?, ?, ?]
- Negative example 1 (De Moete, lunch, Friday, expensive): h = [?, breakfast, ?, ?]
- Negative example 2 (Sedes, breakfast, Sunday, cheap): h = [Alma 3, breakfast, ?, ?]
- Negative example 3 (Alma 3, breakfast, Sunday, expensive): h = [Alma 3, breakfast, ?, cheap]
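A matching sketch of Dual Find-S, under the same illustrative representation as the Find-S sketch. Attribute domains are needed to enumerate specializations and are an assumption here; the choice among minimal specializations is genuinely non-deterministic, so this sketch (which just takes the first candidate) may produce a different but equally valid h than the trace above:

```python
ANY = "?"
DOMAINS = [  # attribute domains, needed to enumerate specializations
    ["Alma 3", "De Moete", "Sedes"],
    ["breakfast", "lunch", "dinner"],
    ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"],
    ["cheap", "expensive"],
]

def covers(h, x):
    return all(a == ANY or a == b for a, b in zip(h, x))

def minimal_specializations(h, x):
    # Fill one "?" slot with any value different from x's value there:
    # each result is a minimal specialization of h that no longer covers x.
    for i, a in enumerate(h):
        if a == ANY:
            for v in DOMAINS[i]:
                if v != x[i]:
                    yield h[:i] + (v,) + h[i + 1:]

def dual_find_s(examples):
    h = (ANY,) * 4
    for x, label in examples:
        if label == "-" and covers(h, x):
            h = next(minimal_specializations(h, x))  # arbitrary choice
    return h

negatives = [
    ("De Moete", "lunch", "Friday", "expensive"),
    ("Sedes", "breakfast", "Sunday", "cheap"),
    ("Alma 3", "breakfast", "Sunday", "expensive"),
]
h = dual_find_s([(x, "-") for x in negatives])
print(h, all(not covers(h, x) for x in negatives))  # final h rejects every negative
```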
Version Spaces: the idea
- Perform both Find-S and Dual Find-S:
  - Find-S deals with the positive examples
  - Dual Find-S deals with the negative examples
- BUT: do NOT select 1 minimal generalization or specialization at each step; keep track of ALL minimal generalizations or specializations.
Version spaces: initialization

  G = { [?, ?, ?, ?] }
  S = { ⊥ }

- The boundary sets G and S are initialized to contain only the most general and the most specific hypothesis, respectively.
Negative examples
- Replace the top hypothesis by ALL minimal specializations that do NOT cover the negative example:
  G = { [?, ?, ?, ?] }  becomes  G = {h1, h2, ..., hn}
- Invariant: only the hypotheses more specific than the ones of G are still possible: they don't cover the negative example.
Positive examples
- Replace the bottom hypothesis by ALL minimal generalizations that DO cover the positive example:
  S = { ⊥ }  becomes  S = {h1', h2', ..., hm'}
- Invariant: only the hypotheses more general than the ones of S are still possible: they do cover the positive example.
Later: negative examples
- Replace all hypotheses in G that cover a new negative example by ALL their minimal specializations that do NOT cover it:
  G = {h1, h2, ..., hn}  becomes  G = {h1, h21, h22, ..., hn}
- Invariant: only the hypotheses more specific than the ones of G are still possible: they don't cover the negative example.
Later: positive examples
- Replace all hypotheses in S that do not cover a new positive example by ALL their minimal generalizations that DO cover it:
  S = {h1', h2', ..., hm'}  becomes  S = {h11', h12', h13', h2', ..., hm'}
- Invariant: only the hypotheses more general than the ones of S are still possible: they do cover the positive example.
Optimization: negative examples
- Only consider specializations of elements in G that are still more general than some specific hypothesis in S.
- This uses the invariant on S!
Optimization: positive examples
- Only consider generalizations of elements in S that are still more specific than some general hypothesis in G.
- This uses the invariant on G!
Pruning: negative examples
- The new negative example can also be used to prune all the S-hypotheses that cover it.
- The invariant only holds for the previous examples, not for the new one: S-hypotheses that cover the new negative example must be removed.
Pruning: positive examples
- The new positive example can also be used to prune all the G-hypotheses that do not cover it.
Eliminate redundant hypotheses
- If a hypothesis in G is more specific than another hypothesis in G: eliminate it! (Obviously, the dual holds for S.)
- Reason: the invariant acts as a wave front: anything more general than G is not allowed. The most general elements of G define the real boundary.
Convergence
- Eventually G and S may get a common element: then Version Spaces has converged to a solution.
- Remaining examples are used to verify the solution.
Reaction example
- Initialization:
  Most general:  G = { [?, ?, ?, ?] }
  Most specific: S = { ⊥ }
Alma 3, breakfast, Friday, cheap: +
- Positive example: minimal generalization of ⊥:
  G = { [?, ?, ?, ?] }
  S = { [Alma 3, breakfast, Friday, cheap] }
De Moete, lunch, Friday, expensive: -
- Negative example: minimal specialization of [?, ?, ?, ?]
- 15 possible specializations!! Specific model: [Alma 3, breakfast, Friday, cheap]
  - [Alma 3, ?, ?, ?]        remains!
  - [De Moete, ?, ?, ?]      X  matches the negative example
  - [Sedes, ?, ?, ?]         X  does not generalize the specific model
  - [?, breakfast, ?, ?]     remains!
  - [?, lunch, ?, ?]         X  matches the negative example
  - [?, dinner, ?, ?]        X  does not generalize the specific model
  - [?, ?, Monday, ?] ... [?, ?, Thursday, ?], [?, ?, Saturday, ?], [?, ?, Sunday, ?]
                             X  do not generalize the specific model
  - [?, ?, Friday, ?]        X  matches the negative example
  - [?, ?, ?, cheap]         remains!
  - [?, ?, ?, expensive]     X  matches the negative example
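The filtering on this slide can be reproduced mechanically. A sketch, with the attribute domains and helper names as assumptions: enumerate all 3 + 3 + 7 + 2 = 15 single-attribute specializations of [?, ?, ?, ?], then keep only those that reject the negative example and still generalize the specific model:

```python
ANY = "?"
DOMAINS = [
    ["Alma 3", "De Moete", "Sedes"],
    ["breakfast", "lunch", "dinner"],
    ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"],
    ["cheap", "expensive"],
]

def covers(h, x):
    return all(a == ANY or a == b for a, b in zip(h, x))

def more_general(h1, h2):
    # h1 covers at least everything h2 covers.
    return all(a == ANY or a == b for a, b in zip(h1, h2))

top = (ANY,) * 4
negative = ("De Moete", "lunch", "Friday", "expensive")
specific = ("Alma 3", "breakfast", "Friday", "cheap")  # current specific model

# All 15 minimal specializations: fill exactly one slot with one value.
candidates = [top[:i] + (v,) + top[i + 1:]
              for i in range(4) for v in DOMAINS[i]]
kept = [h for h in candidates
        if not covers(h, negative)      # must reject the negative example
        and more_general(h, specific)]  # must still generalize the specific model
print(len(candidates), kept)  # 15 candidates; the three surviving hypotheses
```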
Result after example 2:
  G = { [Alma 3, ?, ?, ?], [?, breakfast, ?, ?], [?, ?, ?, cheap] }
  S = { [Alma 3, breakfast, Friday, cheap] }
Alma 3, lunch, Saturday, cheap: +
- Positive example: [?, breakfast, ?, ?] does not match the new example: pruned from G.
- Minimal generalization of [Alma 3, breakfast, Friday, cheap]:
  G = { [Alma 3, ?, ?, ?], [?, ?, ?, cheap] }
  S = { [Alma 3, ?, ?, cheap] }
Sedes, breakfast, Sunday, cheap: -
- Negative example: minimal specialization of the general models [Alma 3, ?, ?, ?] and [?, ?, ?, cheap].
- The only specialization that is introduced, [Alma 3, ?, ?, cheap], is pruned, because it is more specific than another general hypothesis:
  G = { [Alma 3, ?, ?, ?] }
  S = { [Alma 3, ?, ?, cheap] }
Alma 3, breakfast, Sunday, expensive: -
- Negative example: minimal specialization of [Alma 3, ?, ?, ?] gives [Alma 3, ?, ?, cheap]: the same hypothesis as in S!!!
  G = S = { [Alma 3, ?, ?, cheap] }
- Cheap food at Alma 3 produces the allergy!
Version Space Algorithm

  Initially: G := { the hypothesis that covers everything }
             S := { ⊥ }
  For each new positive example:
      Generalize all hypotheses in S that do not cover the example yet,
      but ensure the following:
      - Only introduce minimal changes on the hypotheses.
      - Each new specific hypothesis is a specialization of some general hypothesis.
      - No new specific hypothesis is a generalization of some other specific hypothesis.
      Prune away all hypotheses in G that do not cover the example.
Version Space Algorithm (2)

  For each new negative example:
      Specialize all hypotheses in G that cover the example,
      but ensure the following:
      - Only introduce minimal changes on the hypotheses.
      - Each new general hypothesis is a generalization of some specific hypothesis.
      - No new general hypothesis is a specialization of some other general hypothesis.
      Prune away all hypotheses in S that cover the example.
  Until there are no more examples: report S and G,
  or S or G becomes empty: report failure.
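The two loops above can be combined into one candidate-elimination sketch for the slides' conjunctive 4-tuple language. The representation choices (tuples, `None` for ⊥), the attribute domains, and the helper names are assumptions; on the reaction data the sketch converges to the same result as the worked example:

```python
ANY = "?"
DOMAINS = [
    ["Alma 3", "De Moete", "Sedes"],
    ["breakfast", "lunch", "dinner"],
    ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"],
    ["cheap", "expensive"],
]

def covers(h, x):
    if h is None:                       # bottom covers nothing
        return False
    return all(a == ANY or a == b for a, b in zip(h, x))

def more_general(h1, h2):
    """h1 covers at least everything h2 covers."""
    if h2 is None:
        return True
    if h1 is None:
        return False
    return all(a == ANY or a == b for a, b in zip(h1, h2))

def min_generalization(s, x):
    if s is None:
        return tuple(x)
    return tuple(a if a == b else ANY for a, b in zip(s, x))

def min_specializations(g, x):
    for i, a in enumerate(g):
        if a == ANY:
            for v in DOMAINS[i]:
                if v != x[i]:
                    yield g[:i] + (v,) + g[i + 1:]

def candidate_elimination(examples):
    G, S = {(ANY,) * 4}, {None}
    for x, label in examples:
        if label == "+":
            G = {g for g in G if covers(g, x)}               # prune G
            S = {s if covers(s, x) else min_generalization(s, x) for s in S}
            S = {s for s in S if any(more_general(g, s) for g in G)}
            S = {s for s in S                                 # drop redundant
                 if not any(s != t and more_general(s, t) for t in S)}
        else:
            S = {s for s in S if not covers(s, x)}            # prune S
            newG = set()
            for g in G:
                if not covers(g, x):
                    newG.add(g)
                else:
                    newG |= {h for h in min_specializations(g, x)
                             if any(more_general(h, s) for s in S)}
            G = {g for g in newG                              # drop redundant
                 if not any(g != h and more_general(h, g) for h in newG)}
    return G, S

D = [
    (("Alma 3", "breakfast", "Friday", "cheap"), "+"),
    (("De Moete", "lunch", "Friday", "expensive"), "-"),
    (("Alma 3", "lunch", "Saturday", "cheap"), "+"),
    (("Sedes", "breakfast", "Sunday", "cheap"), "-"),
    (("Alma 3", "breakfast", "Sunday", "expensive"), "-"),
]
G, S = candidate_elimination(D)
print(G, S)  # both converge to {('Alma 3', '?', '?', 'cheap')}
```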
Properties of VS
- Symmetry: positive and negative examples are dealt with in a completely dual way.
- Does not need to remember previous examples.
- Noise: VS cannot deal with noise!
  - If a positive example is wrongly given as negative, then VS eliminates the desired hypothesis from the version space!
Termination
- If it terminates because of "no more examples":
  - Example (spaces on termination):
    G = { [Alma 3, ?, ?, ?], [?, ?, Monday, ?] }
    intermediate: [Alma 3, ?, ?, cheap], [Alma 3, ?, Monday, ?], [?, ?, Monday, cheap]
    S = { [Alma 3, ?, Monday, cheap] }
- Then all these hypotheses, and all intermediate hypotheses, are still correct descriptions covering the test data.
- VS makes NO unnecessary choices!
Termination (2)
- If it terminates because S or G becomes empty, then either:
  - the data is inconsistent (noise?), or
  - the target concept cannot be represented in the hypothesis language H.
- Example:
  - target concept: [Alma 3, breakfast, ?, cheap] OR [Alma 3, lunch, ?, cheap]
  - given examples like:
    <Alma 3, dinner, Sunday, cheap>: -
    <Alma 3, breakfast, Sunday, cheap>: +
    <Alma 3, lunch, Sunday, cheap>: +
  - this cannot be learned in our language H.
Which example next?
- VS can decide itself which example would be most useful next: it can 'query' a user for the most relevant additional classification!
- Example, with version space
  VS: [Alma 3, ?, ?, ?], [?, ?, Monday, ?], [Alma 3, ?, ?, cheap], [Alma 3, ?, Monday, ?], [?, ?, Monday, cheap], [Alma 3, ?, Monday, cheap]
  <Alma 3, lunch, Monday, expensive> is classified positive by 3 hypotheses and negative by 3: it is the most informative new example.
Use of partially learned concepts
  VS: [Alma 3, ?, ?, ?], [?, ?, Monday, ?], [Alma 3, ?, ?, cheap], [Alma 3, ?, Monday, ?], [?, ?, Monday, cheap], [Alma 3, ?, Monday, cheap]
- Example: <Alma 3, lunch, Monday, cheap> can be classified as positive:
  - it is covered by all remaining hypotheses!
  - it is enough to check that it is covered by the hypotheses in S! (all others generalize these)
Use of partially learned concepts (2)
  VS: [Alma 3, ?, ?, ?], [?, ?, Monday, ?], [Alma 3, ?, ?, cheap], [Alma 3, ?, Monday, ?], [?, ?, Monday, cheap], [Alma 3, ?, Monday, cheap]
- Example: <Sedes, lunch, Sunday, cheap> can be classified as negative:
  - it is not covered by any remaining hypothesis!
  - it is enough to check that it is not covered by any hypothesis in G! (all others specialize these)
Use of partially learned concepts (3)
  VS: [Alma 3, ?, ?, ?], [?, ?, Monday, ?], [Alma 3, ?, ?, cheap], [Alma 3, ?, Monday, ?], [?, ?, Monday, cheap], [Alma 3, ?, Monday, cheap]
- Example: <Alma 3, lunch, Monday, expensive> can not be classified:
  - it is covered by 3, not covered by 3 hypotheses: no conclusion.
Use of partially learned concepts (4)
  VS: [Alma 3, ?, ?, ?], [?, ?, Monday, ?], [Alma 3, ?, ?, cheap], [Alma 3, ?, Monday, ?], [?, ?, Monday, cheap], [Alma 3, ?, Monday, cheap]
- Example: <Sedes, lunch, Monday, expensive> can only be classified with a certain degree of precision:
  - it is covered by 1, not covered by 5 hypotheses: it probably does not belong to the concept (ratio: 1/6).
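The four classification cases above amount to a vote over the remaining hypotheses. A sketch, with the hypothesis list copied from the example and `covers` as in the earlier sketches:

```python
ANY = "?"

def covers(h, x):
    return all(a == ANY or a == b for a, b in zip(h, x))

# The six remaining hypotheses of the partially learned version space:
VS = [
    ("Alma 3", ANY, ANY, ANY),
    (ANY, ANY, "Monday", ANY),
    ("Alma 3", ANY, ANY, "cheap"),
    ("Alma 3", ANY, "Monday", ANY),
    (ANY, ANY, "Monday", "cheap"),
    ("Alma 3", ANY, "Monday", "cheap"),
]

def classify(x):
    votes = sum(covers(h, x) for h in VS)
    if votes == len(VS):
        return "+"                        # covered by all hypotheses: positive
    if votes == 0:
        return "-"                        # covered by none: negative
    return "? %d/%d" % (votes, len(VS))   # undecided: only a degree of precision

print(classify(("Alma 3", "lunch", "Monday", "cheap")))      # +
print(classify(("Sedes", "lunch", "Sunday", "cheap")))       # -
print(classify(("Alma 3", "lunch", "Monday", "expensive")))  # ? 3/6
print(classify(("Sedes", "lunch", "Monday", "expensive")))   # ? 1/6
```

In practice it suffices to test against S for positive answers and against G for negative ones, as the slides note; voting over the whole space is shown here only to reproduce the counts.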
The relevance of inductive BIAS: choosing H
- Our hypothesis language H fails to learn some concepts:
  - see the example: [Alma 3, breakfast, ?, cheap] OR [Alma 3, lunch, ?, cheap]
- What about choosing a more expressive language H'?
- Assume H':
  - allows conjunctions (as before)
  - allows disjunction and negation too!
    - Example: Restaurant = Alma 3 OR NOT (Day = Monday)
Inductive BIAS (2)
- This language H' allows us to represent ANY subset of the complete set X of all events.
- But X has Restaurant x Meal x Day x Cost = 3 x 3 x 7 x 2 = 126 elements: we can express 2^126 different hypotheses now!
Inductive BIAS (3)
- Version Spaces using H':
  Alma 3     breakfast   Friday     cheap       +
  De Moete   lunch       Friday     expensive   -
  Alma 3     lunch       Saturday   cheap       +
  Sedes      breakfast   Sunday     cheap       -
  Alma 3     breakfast   Sunday     expensive   -
- G evolves as:
  { NOT [De Moete, lunch, Friday, expensive] }
  { NOT [De Moete, lunch, Friday, expensive] AND NOT [Sedes, breakfast, Sunday, cheap] AND NOT [Alma 3, breakfast, Sunday, expensive] }
- S evolves as:
  { [Alma 3, breakfast, Friday, cheap] }
  { [Alma 3, breakfast, Friday, cheap] OR [Alma 3, lunch, Saturday, cheap] }
Inductive BIAS (4)
- Resulting version spaces:
  G = { NOT [De Moete, lunch, Friday, expensive] AND NOT [Sedes, breakfast, Sunday, cheap] AND NOT [Alma 3, breakfast, Sunday, expensive] }
  S = { [Alma 3, breakfast, Friday, cheap] OR [Alma 3, lunch, Saturday, cheap] }
- We haven't learned anything: we have merely restated our positive and negative examples!
- In general: in order to be able to learn, we need an inductive BIAS (= an assumption):
  - Example: "the desired concept CAN be described as a conjunction of features"
Shift of Bias
- Practical approach to the bias problem:
  Start VS with a very weak hypothesis language.
  If the concept is learned: o.k.
  Else: refine your language and restart VS.
- Avoids committing to the choice of bias in advance.
- Gives the most general concept that can be learned.