Machine Learning meets the Real World: Successes and new research directions Andrea Pohoreckyj Danyluk Department of Computer Science Williams College, Williamstown, MA October 11, 2002

Data, data everywhere... • Scientific: data collection routinely produces gigabytes of data per day • Telecommunications: AT&T produces 275 million call records • Web: Google handles 70 million searches • Retail: Wal-Mart records 20 million sales transactions

A wealth of information • Scientific data – Detection of oil spills from satellite images – Prediction of molecular bioactivity for drug design • Telecommunications – Fraud detection to distinguish between “bad” and normal usage of cell phones

A wealth of information • Web mining – Characterize killer pages • Retail – Determine better product placement • Direct mail – Predict who is most likely to donate to a charity

Machine learning success (Machine learning is ubiquitous) • Scientific discovery – Detection of oil spills from satellite images • Telecommunications – Diagnosis of problems in the local loop • Printing – Determine causes of banding (printing cylinder problems) • Control – Self-steering vehicles

Why research in machine learning is so good today Research in machine learning benefits from • Abundant data • Interest in fielding new applications – Even more data – Push on limits of our understanding, technology, etc.

Plan for this talk Original • Discuss success stories and failures • Failures help identify new areas of research New plan • One success story in detail • Lesson learned: can identify new areas of research even when we succeed

Induction of decision trees • Not the only (or even the most “hot”) algorithms • Have been used in many contexts • Important for understanding our success story: local-loop network diagnosis

Inductive learning Given a collection of observations of the form (<x>, f<x>) Find g<x> that approximates f<x>

Sample data

Predictive model: i.e., g<x>

Learning objectives • Learn a tree that is correct • Learn a tree that is compact • At every level in the tree, select a test that best differentiates examples of one class from another

TDIDT (top-down induction of decision trees) • If all examples are from the same class – The tree is a leaf with that class name • Else – Pick a test to make – Construct one edge for each possible test outcome – Partition the examples by test outcome – Build subtrees recursively
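The recursion on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the talk: the tree representation (a class name for a leaf, an `(attribute, branches)` tuple for an internal node), the `choose` hook, the majority-class fallback when no tests remain, and the attribute names in the sample data are all assumptions made for the example.

```python
from collections import Counter


def tdidt(examples, attributes, choose=None):
    """Build a tree from (features_dict, label) pairs.

    A leaf is a class name; an internal node is (attribute, {value: subtree}).
    """
    labels = [label for _, label in examples]
    # If all examples are from the same class, the tree is a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # Assumption (not on the slide): with no tests left, use the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Pick a test. Placeholder default: the first remaining attribute.
    attr = choose(examples, attributes) if choose else attributes[0]
    # Partition the examples by test outcome.
    partitions = {}
    for features, label in examples:
        partitions.setdefault(features[attr], []).append((features, label))
    remaining = [a for a in attributes if a != attr]
    # One edge per observed outcome; build subtrees recursively.
    return (attr, {value: tdidt(subset, remaining, choose)
                   for value, subset in partitions.items()})


def classify(tree, features):
    """Follow edges until a leaf (a class name) is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[features[attr]]
    return tree


# Hypothetical trouble reports; attribute names are invented for illustration.
examples = [
    ({"line_test": "fail", "weather": "dry"}, "dispatch"),
    ({"line_test": "fail", "weather": "wet"}, "dispatch"),
    ({"line_test": "pass", "weather": "dry"}, "no_dispatch"),
    ({"line_test": "pass", "weather": "wet"}, "no_dispatch"),
]
tree = tdidt(examples, ["line_test", "weather"])
```

The `choose` parameter is where a split-selection rule would plug in; the placeholder default only keeps the sketch self-contained.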

Which is better?

The Gain Criterion • Measure the information of the collection • Measure the information of each possible split • Choose the split with greatest information gain

Information (Entropy) • Let $T$ be a set of examples • Let $C_1, C_2, \ldots, C_n$ be class labels • $\mathrm{freq}(C_i, T)$ = number of examples in $T$ that belong to class $C_i$ • $|T|$ = number of examples in $T$ • Select an example and announce its class: $\mathrm{info} = -\log_2\left(\mathrm{freq}(C_i, T)/|T|\right)$ bits

Information (Entropy) • Let $T$ be a set of examples • $\mathrm{info}(T) = -\sum_{i=1}^{n} \frac{\mathrm{freq}(C_i, T)}{|T|} \log_2 \frac{\mathrm{freq}(C_i, T)}{|T|}$
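A small worked example may help fix the definition. The counts are invented for illustration (they are not from the talk): suppose $T$ holds 14 examples, 9 in class $C_1$ and 5 in class $C_2$.

```latex
\begin{align*}
\mathrm{info}(T)
  &= -\sum_{i=1}^{2} \frac{\mathrm{freq}(C_i, T)}{|T|}
      \log_2 \frac{\mathrm{freq}(C_i, T)}{|T|} \\
  &= -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \\
  &\approx 0.410 + 0.530 \approx 0.940 \text{ bits}
\end{align*}
```

A set with a single class has entropy 0, and a two-class set split evenly has entropy 1 bit, so 0.940 indicates a collection that is close to maximally mixed.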

Entropy after a split • Let $X$ be an attribute with $n$ possible values • Let $T_j$ be the examples that have the value $j$ for attribute $X$ • Average entropy that results from making a split on $X$: $\mathrm{info}_X(T) = \sum_{j=1}^{n} \frac{|T_j|}{|T|}\,\mathrm{info}(T_j)$

Information Gain • Compute $\mathrm{info}_X(T)$ for every attribute $X$ • Select the attribute that maximizes $\mathrm{info}(T) - \mathrm{info}_X(T)$
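The gain criterion can be sketched directly from these definitions. The function names (`info`, `info_x`, `gain`, `best_attribute`), the row-as-dictionary data layout, and the sample data are assumptions made for this illustration; C4.5 itself adds refinements, such as the gain ratio, that are not shown here.

```python
from collections import Counter
from math import log2


def info(labels):
    """Entropy of a collection of class labels, in bits."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())


def info_x(rows, labels, attribute):
    """Average entropy after splitting on `attribute`, weighted by subset size."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    total = len(labels)
    return sum(len(subset) / total * info(subset)
               for subset in subsets.values())


def gain(rows, labels, attribute):
    """Information gain: entropy before the split minus entropy after it."""
    return info(labels) - info_x(rows, labels, attribute)


def best_attribute(rows, labels, attributes):
    """Select the attribute that maximizes information gain."""
    return max(attributes, key=lambda a: gain(rows, labels, a))


# Hypothetical data: attribute "a" separates the classes, "b" does not.
rows = [
    {"a": "x", "b": "p"},
    {"a": "x", "b": "q"},
    {"a": "y", "b": "p"},
    {"a": "y", "b": "q"},
]
labels = ["dispatch", "dispatch", "no_dispatch", "no_dispatch"]
chosen = best_attribute(rows, labels, ["a", "b"])
```

In this toy data, splitting on `a` leaves two pure subsets (gain of 1 bit), while splitting on `b` leaves two evenly mixed subsets (gain of 0), so `a` is selected.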

Which is better?

Scrubber (the success story) • Diagnoses problems in the local loop • Problem may be due to trouble in: – Customer premise equipment – Facilities connecting customer to cable – Central office • Millions of “troubles” reported annually

MAX, 1990 • Acts as Maintenance Administrator (MA) • Sequence of action: – Customer calls – Rep takes information; initiates tests – Trouble report sent to MA – MA puts trouble in dispatch queue for specific type of technician

Scrubber 2 • Performed a task at a later point in the pipeline • Survey dispatch queues to determine whether dispatch appropriate – Dispatch not immediate – Many problems resolved exogenously

Scrubber 3 • Scrubber 2 for new application platform • Centralized knowledge server • Cover twice as large a network

Implementation difficulties • Original expert system shell no longer supported • Knowledge base evolved into opacity – Many tweaks over a decade – Many knowledge engineers – Most not available to work on Scrubber 3

Requirements • Level of performance at least as good as prior system – Overall accuracy – False positives and false negatives in range • Comprehensible – For understanding and acceptance by experts
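One way to check the performance requirement is to compute the three quantities it names. The sketch below is an assumption about how such a check could look, not the evaluation used in the project: it treats "dispatch" as the positive class, and the function name, label strings, and sample data are invented.

```python
def rates(predicted, actual, positive="dispatch"):
    """Return accuracy, false positive rate, and false negative rate."""
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted a dispatch that was not needed
        elif a == positive:
            fn += 1  # missed a dispatch that was needed
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }


# Hypothetical predictions from a new model against reference diagnoses.
predicted = ["dispatch", "dispatch", "no", "no", "dispatch"]
actual = ["dispatch", "no", "no", "dispatch", "dispatch"]
result = rates(predicted, actual)
```

Reporting the two error rates separately matters here because an unnecessary dispatch and a missed dispatch carry different costs, which overall accuracy alone would hide.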

Additional requirements (ours) • Improved performance • Improved extensibility

Phase I: Modeling Scrubber 2 • Applied a decision tree learning algorithm • Input data: – Trouble reports – Scrubber 2 diagnoses

Data • 26,000 trouble reports • 40 attributes (half continuous, half symbolic) • Two classes – Dispatch – Don’t dispatch (i.e., call customer to verify OK)

Background knowledge • C4.5 selected • 17 of 40 attributes used

Phase I results • Decision trees with predictive accuracy of 0.99, with as few as 10,000 examples • Less than two days of work (easy!)

Phase II: Acceptance • Comprehensibility Readability – Need to observe rationality in learned knowledge – Original trees on the order of 1,000 nodes • The simpler the model, the better it can be understood • Comprehensibility = Readability + Simplicity + Fidelity

Trading off simplicity and correctness • Pruning nodes sacrifices correctness • Appropriate when comprehensibility an issue • Langley and Schwabacher, 2001 • Note: not pruning to avoid overfitting

Phase II results • Used only two most prominent attributes • New decision trees created • Still fell into acceptable zone

Phase III: Working toward extensibility • Hoped to gain flexibility for – Local modifiability – Additional attribute values • Moved toward probabilistic decision tree – Leaves labeled with probability estimates, not decisions – Stubby trees easy to represent in tabular form
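A stubby probabilistic tree over two attributes is equivalent to a lookup table keyed by the pair of attribute values. The sketch below illustrates that idea only: the slides do not name the two attributes or say how the probabilities were estimated, so the attribute names, the sample reports, and the use of raw relative frequencies are all assumptions.

```python
from collections import defaultdict


def probability_table(reports, attrs, positive="dispatch"):
    """Map each combination of attribute values to P(positive | values).

    reports: iterable of (features_dict, label) pairs.
    Estimates are raw relative frequencies (an assumption for this sketch).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [positives, total]
    for features, label in reports:
        key = tuple(features[a] for a in attrs)
        counts[key][1] += 1
        if label == positive:
            counts[key][0] += 1
    return {key: pos / total for key, (pos, total) in counts.items()}


# Hypothetical two-attribute trouble reports.
reports = [
    ({"line_test": "fail", "fault_location": "loop"}, "dispatch"),
    ({"line_test": "fail", "fault_location": "loop"}, "dispatch"),
    ({"line_test": "fail", "fault_location": "loop"}, "dispatch"),
    ({"line_test": "fail", "fault_location": "loop"}, "no_dispatch"),
    ({"line_test": "pass", "fault_location": "none"}, "no_dispatch"),
    ({"line_test": "pass", "fault_location": "none"}, "no_dispatch"),
]
table = probability_table(reports, ["line_test", "fault_location"])
```

Because each row of the table is independent, a single entry can be adjusted or a new attribute value added without relearning the rest, which is the kind of local modifiability the slide describes. Rows backed by few reports give unreliable estimates, so the count behind each row is worth keeping alongside the probability.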

Phase IIIb: More data • Focus on two attributes gave us access to an extensive data set – Many more trouble reports – Abridged (two-attribute) form had not been considered useful earlier

Phase III results • Simple diagnostic model • Greater empirical confidence: important due to the small-disjunct problem – “Big” general rules cover approximately 50% of the data – Remaining 50% covered by small disjuncts

Summarizing the success story • C4.5 applied to induce Scrubber 2 model • Pruned model for comprehensibility/simplicity • Converted new model into probabilistic one • Used newly gained data for additional tuning and confidence • Small(?), simple model in very short time

Lessons can be learned from success Lesson 1: the importance of comprehensibility – Rationality – Readability – Simplicity

Lessons can be learned from success Lesson 2: the need for algorithms to handle small data sets – Creative ways to engineer interesting features from few – Openness to alternative sources of data – Algorithms specifically tuned to handle small data sets Langley has noted this to be an issue of scientific data -- but true for industrial data as well

Lessons can be learned from success Lesson 3: the need to think about systematic error – Locally systematic errors only look like noise with enough data – Clearly related to the problem of small data sets – How do our algorithms hold up?

Lessons can be learned from success Lesson 4: the need to think about the future – Learning results put into practice will be modified and extended – Must new models be learned? – Can improvement be incremental?

Lessons can be learned from success Lesson 5: creative uses of the technology – Learning for the purposes of re-engineering isn’t “standard” – New applications will serve to fuel new research

Further reading and acknowledgements • Carla Brodley et al., American Scientist, Jan./Feb. ’99 • Pat Langley, various publications • Thanks to Foster Provost and many others at Nynex / Bell Atlantic