Diagnosis and Interpretation

Diagnosis and Interpretation
• We concentrate on diagnosis and interpretation because historically they are significant problems that AI has addressed
  – And there are numerous and varied solutions, providing us with an interesting cross-section of AI techniques to examine
• Diagnosis is the process of determining whether the behavior of a system is correct
  – If incorrect, which part(s) of the system is (are) failing
  – We often refer to the result of a diagnosis as one or more malfunctions
• The system being diagnosed can be an artificial (man-made) system or a natural system (e.g., the human body, an ecology)
  – Artificial systems are easier to diagnose because we understand them thoroughly
• Interpretation is a related problem: it is the process of explaining the meaning of some object of attention

Data Driven Processes
• While both diagnosis and interpretation have the goal of “seeking to explain”, the processes are triggered by data
  – We use the data (symptoms, manifestations, observations) to trigger possible reasons for why those data have arisen
• Thus, these problems are distinct from goal-driven problems (planning, design, control)
  – Control encompasses planning, interpretation, diagnosis and possibly prediction
• One way to view diagnosis/interpretation is: given data, explain why the data have arisen
  – an explanation that attempts to describe why we have the resulting behavior (malfunctions or observations)

Forms of Interpretation
• The idea behind interpretation is that we are trying to understand why something has happened
  – Diagnosis is a form of interpretation in that we are trying to understand a system’s deviation from the norm
    • what caused the system to deviate? what components have broken down? why?
• Diagnosis is one form of interpretation, but there are others
  – Data analysis – what phenomenon caused the data to arise, e.g., studying astronomical phenomena by looking at radio signals, or looking at blood clots and deciding on blood types
  – Object identification – given a description (in some form, whether visual or data) of an object, what is the object?
  – Speech recognition – interpret the acoustic signal in terms of words/meanings
  – Communication – what is the meaning behind a given message? This can be carried over to the analysis of artwork
  – Evidence analysis – trying to decipher the data from a crime scene to determine what happened, who committed the crime, and why
  – Social behavior – explaining why someone acted in a particular way

The Diagnostic Task
• Data trigger causes (hypotheses of malfunctions, or potential diagnoses), typically an associational form of knowledge
• Hypotheses must be confirmed through additional testing and inspection of the situation
• Hypotheses should be as specific as possible, so they need to be refined (e.g., given a general class of disease, find the most specific subclass)

First Interpretation System
• The system Dendral, from 1966, was given mass spectrogram data and inferred the chemical composition from that data
  – The input would be the mass of the substance along with other experimental lab data
  – Dendral would apply knowledge of atomic masses, valence rules and connectivity among atoms to determine combinations and connections of the atoms in the unknown compound
    • The number of combinations grows exponentially with the size (mass) of the unknown compound
  – Dendral used a plan-generate-test process
    • First, constraints would be generated based on heuristic knowledge of what molecules might appear given the initial input and any knowledge presented about the unknown compound

Dendral Continued
• The planning step would constrain the generate step
  – At this step, graphical representations of possible molecules would be generated
  – The constraints are necessary to reduce the number of possible graphs generated
• The final step, testing, attempts to eliminate all but the correct representations
  – Each remaining graph is scored by examining the candidate molecular structure and comparing it against mass spectrometry rules and reaction chemistry rules
  – Structures are discarded if they are inconsistent with the spectrum or known reactions
  – Any remaining structures are presented to the operator
    • At this point, the operator can input additional heuristic rules that can be applied to this case to prune away incorrect structures
  – These rules are added to the heuristics, so Dendral “learns”
  – A thorough examination is presented in http://profiles.nlm.nih.gov/BB/A/B/O/M/_/bbabom.pdf

Mycin
• Mycin was the next important step in the evolution of AI expert systems and AI in medicine
  – The first well-known and well-received expert system, it also presented a generic solution to reasoning through rules
  – It provided uncertainty handling in the form of certainty factors
  – After creating Mycin, some of the researchers developed the rule-based language E-Mycin (Essential or Empty Mycin) so that others could develop their own rule-based expert systems
• Mycin had the ability to explain its conclusions by showing the matching rules that it used in its chain of logic
• Mycin outperformed the infectious disease experts when tested, coming to an “acceptable” therapy in 69% of its cases
  – A spinoff of Mycin was a teaching tool called GUIDON, which is based on the Mycin knowledge base
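When two rules lend positive support to the same conclusion, Mycin combines their certainty factors with the standard combination rule; a minimal sketch (the CF values below are illustrative, not taken from Mycin's knowledge base):

```python
def combine_cf(cf1: float, cf2: float) -> float:
    """Combine two positive certainty factors supporting the same hypothesis
    (Mycin's combination rule for confirming evidence)."""
    return cf1 + cf2 * (1 - cf1)

# two rules weakly supporting the same organism category
print(combine_cf(0.3, 0.5))  # ≈ 0.65: stronger than either rule alone
```

Note that the result never exceeds 1.0 no matter how many positive pieces of evidence are combined.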

The Importance of Explanation
• The Dendral system presented an answer but did not explain how it arrived at its conclusions
• Mycin could easily generate an explanation by outputting the rules that matched in the final chain of logic
  – E.g., rule 12 & rule 15 → rule 119 → rule 351
  – A user can ask questions like “why was rule 351 selected?” to which Mycin responds by showing the rule’s conditions (LHS) and why those conditions were true
  – The reason a rule is true is usually based on previous rules being true, leading to conclusions that made the given rule true
• By being able to see the explanation, one can feel more confident in the system’s answers
  – But it is also a great tool to help debug and develop the knowledge base

Mycin Sample Rules

RULE 116
IF:   1) the identity of ORGANISM-1 is not known
    **2) the gram stain of ORGANISM-1 is not known**
      3) the morphology of ORGANISM-1 is not known
      4) the site of CULTURE-1 is csf
      5) the infection is meningitis
      6) the age (in years) of the patient is less than or equal to .17
THEN: There is weakly suggestive evidence (.3) that the category of ORGANISM-1 is enterobacteriaceae

RULE 050
IF:   1) the morphology of ORGANISM-1 is rod
      2) the gram stain of ORGANISM-1 is gramneg
      3) the aerobicity of ORGANISM-1 is facultative
    **4) the infection with ORGANISM-1 was acquired while the patient was hospitalized**
THEN: There is evidence that the category of ORGANISM-1 is enterobacteriaceae

A Mycin rule is described at several levels

Abstract Level:
  Rule-scheme: Meningitis.coverfor.clinical
  Rule-model: Coverfor-is-model
  Key-factor: Burned
  Dual: D-Rule 577

Performance Level (MYCIN Knowledge Base):
  D-Rule 577
  IF:   1) the infection which requires therapy is meningitis, and
        2) organisms were not seen on the stain of the culture, and
        3) the type of infection is bacterial, and
        4) the patient has been seriously burned
  THEN: There is suggestive evidence (.5) that pseudomonas-aeruginosa is one of the organisms (other than those seen on cultures or smears) which might be causing the infection

Support Level:
  Mechanism-frame: body-infraction.wounds
  Justification: For a very brief period of time after a severe burn …
  Literature: MacMillan BG: Ecology of Bacteria Colonizing…
  Author: Dr. Victor Yu
  Last-Change: Sept 8, 1976

GUIDON (MYCIN-based Medical Tutorial)

Systems Generated From Emycin
• SACON – Structural Analysis CONsultant
  IF:   1) The material composing the sub-structure is one of the metals, and
        2) The analysis error that is tolerable is between 5% and 30%, and
        3) The non-dimensional stress of the sub-structure > .9, and
        4) The number of cycles the loading is to be applied is between 1000 and 10000
  THEN: It is definite (1.0) that fatigue is one of the stress behavior phenomena in the sub-structure
• Puff – pulmonary disorders – originally implemented in Emycin before being reimplemented as an OO system
  IF:   1) The mmf/mmf-predicted ratio is [35..45] & the fvc/fvc-predicted ratio > 88
        2) The mmf/mmf-predicted ratio is [25..35] & the fvc/fvc-predicted ratio < 88
  THEN: There is suggestive evidence (.5) that the degree of obstructive airways disease as indicated by the MMF is moderate, and it is definite (1.0) that the following is one of the findings about the diagnosis of obstructive airways disease: Reduced mid-expiratory flow indicates moderate airway obstruction.

Analyzing Mycin’s Process
• A thorough analysis of Mycin was performed, and it was discovered that the rule-based approach of Mycin was actually carrying out three specific tasks
  – Data are first translated using data abstraction from specific values to values that may be of more use (e.g., changing a real value into a qualitative value)
  – The disease(s) is then classified
  – The hypothesis is refined into more detail
• Considering the diagnostic process as three related but different tasks allows one to more clearly understand the process
  – With that knowledge, it becomes easier to see how to solve a diagnostic task – use classification
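The three tasks can be sketched as a small pipeline; the thresholds, field names, and findings below are illustrative assumptions, not Mycin's actual rules:

```python
def data_abstraction(raw):
    """Stage 1: translate specific values into more useful qualitative ones."""
    return {"fever": "high" if raw["temperature"] >= 101 else "none",
            "wbc": "elevated" if raw["white_cell_count"] > 11000 else "normal"}

def classify(findings):
    """Stage 2: establish a general disease class from abstracted findings."""
    if findings["fever"] == "high" and findings["wbc"] == "elevated":
        return "infection"
    return "unknown"

def refine(disease_class, findings):
    """Stage 3: refine the class toward a more specific hypothesis (stubbed)."""
    return "bacterial infection" if disease_class == "infection" else disease_class

raw = {"temperature": 102, "white_cell_count": 14000}
findings = data_abstraction(raw)
print(refine(classify(findings), findings))  # bacterial infection
```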

Classification as a Task
• One can organize the space of diagnostic conclusions (malfunctions) into a taxonomy
  – The diagnostic task is then one of searching the taxonomy
    • Coined hierarchical classification
  – The task can be solved by establish-refine
    • Attempt to establish a node in the hierarchy
    • If found relevant, refine it by recursively trying to establish any of the node’s children
    • If found non-relevant, prune that portion of the hierarchy away and thus reduce the complexity of the search
• How does one establish a node as relevant?
  – Here, we can employ any number of possible approaches including rules
    • Think of the node as a “specialist” in identifying that particular hypothesis
    • Encode any relevant knowledge to recognize (establish) that hypothesis in the node itself
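Establish-refine can be sketched as a recursion over a taxonomy; the toy automotive hierarchy and its establishing rules here are hypothetical:

```python
def establish(node, findings):
    """Return a confidence that this node's hypothesis is relevant.
    Each node acts as a 'specialist' supplying its own establishing rule."""
    return node["rule"](findings)

def establish_refine(node, findings, threshold=0.5):
    """Hierarchical classification: establish a node, then recursively refine
    into its children; prune subtrees whose root fails to establish."""
    if establish(node, findings) < threshold:
        return []                       # prune this whole subtree
    children = node.get("children", [])
    if not children:
        return [node["name"]]           # leaf: a specific diagnosis
    results = []
    for child in children:
        results += establish_refine(child, findings, threshold)
    return results or [node["name"]]    # fall back to the general class

# hypothetical toy taxonomy for an automotive domain
taxonomy = {
    "name": "engine-problem",
    "rule": lambda f: 1.0 if f.get("wont_start") else 0.0,
    "children": [
        {"name": "fuel-system", "rule": lambda f: 0.9 if f.get("empty_tank") else 0.1},
        {"name": "electrical", "rule": lambda f: 0.9 if f.get("dead_battery") else 0.1},
    ],
}
findings = {"wont_start": True, "empty_tank": False, "dead_battery": True}
print(establish_refine(taxonomy, findings))  # ['electrical']
```

Because the fuel-system node fails to establish, its entire subtree is pruned without further work, which is the source of the search savings.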

Example 1: Automotive Diagnosis

Example 2: Syntactic Debugging

Ex 3: Linux User Classification

Supporting Classification
• The establish knowledge can take on any number of different forms
  – Rules (possibly using fuzzy logic, certainty factors, or other)
  – Feature-based pattern matching
  – Bayesian probabilities or HMMs
  – Neural network activation strength
  – Genetic algorithm fitness function
• In nearly every case, what we are seeking is a set of predetermined features
  – Which features are present? Which are absent?
  – How strongly do we believe in a given feature?
• If the feature is not found in the database, how do we acquire it?
  – By asking the user? By asking for a test result? By performing additional inference?
  – Notice that in the neural network case, features are inputs, whereas in most of the other cases they are conditions usually found on the LHS of rules

Feature-based Pattern Matching
• A simple way to encode associational knowledge to support a hypothesis is to enumerate the features (observations, symptoms) we expect to find if the hypothesis is true
  – We can then enumerate patterns that provide a confidence value we might have if we saw the given collection of features
• Consider that for hypothesis H we expect features F1 and F2, possibly F3 and F4, but not F5, where F1 is essential and F2 is somewhat less essential (“?” means “don’t care”):

    F1    F2    F3    F4    F5    Result
    yes   yes   yes   yes   no    confirmed
    yes   yes   ?     ?     no    likely
    yes   ?     ?     ?     no    somewhat likely
    ?     ?     ?     ?     ?     neutral/unsure
    ?     ?     ?     ?     yes   ruled out

• We return the result from the first pattern to match, so this is in essence a nested if-else statement
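First-match pattern evaluation can be sketched directly; the feature names F1–F5 follow the slide, while the specific pattern rows are illustrative assumptions ordered most-specific first so the nested if-else semantics hold:

```python
# Omitting a feature from a pattern means "don't care" (the '?' of the table).
PATTERNS = [
    ({"F1": "yes", "F2": "yes", "F3": "yes", "F4": "yes", "F5": "no"}, "confirmed"),
    ({"F1": "yes", "F2": "yes", "F5": "no"}, "likely"),
    ({"F1": "yes", "F5": "no"}, "somewhat likely"),
    ({"F5": "yes"}, "ruled out"),
    ({}, "neutral/unsure"),              # empty pattern matches anything left
]

def evaluate(features):
    """Return the verdict of the first pattern whose listed features all match."""
    for pattern, verdict in PATTERNS:
        if all(features.get(f) == v for f, v in pattern.items()):
            return verdict

print(evaluate({"F1": "yes", "F2": "yes", "F3": "yes", "F4": "yes", "F5": "no"}))  # confirmed
print(evaluate({"F1": "yes", "F5": "yes"}))  # ruled out
```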

Data Inferencing
• In Mycin, many rules were provided to perform data abstraction
  – In a pattern matching approach, we might have a feature of interest that is not directly evident from the data, but the data might be abstracted to provide us with the answer
    • example: Was the patient anesthetized in the last 6 months?
    • no datum indicates this, but we see that the patient had surgery 2 months ago, and so we can infer that the patient was anesthetized
• Data inferencing will be domain specific
  – We have to codify each inference as shown above
• Some inferencing may be domain independent
  – Such as temporal reasoning or spatial reasoning
• Other forms of inference are generalization (abstraction), such as abstracting away from a specific value to a qualitative value (temperature 102 becomes “high fever”)
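The surgery-to-anesthesia inference can be codified directly; the patient record fields are assumptions, and the 6-month window is approximated with 30-day months:

```python
from datetime import date, timedelta

def was_anesthetized_recently(patient, today, months=6):
    """Domain-specific data inference: no datum records 'anesthetized',
    but a surgery inside the window lets us infer it."""
    surgery = patient.get("last_surgery")
    return surgery is not None and (today - surgery) <= timedelta(days=30 * months)

patient = {"last_surgery": date(2024, 3, 1)}             # surgery 2 months ago
print(was_anesthetized_recently(patient, date(2024, 5, 1)))  # True
```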

Lack of Differentiation
• Notice that through the use of simple (hierarchical) classification, one does not differentiate among possible hypotheses
  – If two hypotheses are found to be relevant, we do not have additional knowledge to select one
    • what if X and Y are both established with X being more certain than Y – which should we select?
    • what if X and Y have some form of association with each other, such as being mutually incompatible or jointly likely?
    • what if there are multiple faults that are (causally) related to each other?
• We would like to employ a process that contains such knowledge, to let us select only the most likely hypothesis(es) given the data
  – In a neural network, we would only select the most likely node; similarly for an HMM, the most likely path

Abduction
• This leads us to abduction, a form of inference first named by the philosopher Charles Peirce
  – Peirce saw abduction as the following:
    • Deduction says that
      – If we have the rule A → B
      – And given that A is true
      – Then we can conclude B
    • But abduction says that
      – If we have the rule A → B
      – And given that B is true
      – Then we can conclude A
  – Notice that deduction is truth preserving but abduction is not
  – We can expand the idea of abduction as follows:
    • If A1 ∨ A2 ∨ A3 ∨ … ∨ An → B
    • And given that B is true
    • And if Ai is more likely than any other Aj (1 ≤ j ≤ n), then we can infer that Ai is true
      – for this to work, we need a way to determine which is most likely
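A toy rendering of this idea: given that B is observed and a set of rules Ai → B, pick the most likely Ai (the rules and likelihoods here are invented for illustration):

```python
def abduce(observation, rules, likelihood):
    """Abduction: given observation B and rules (A, B), infer the most likely A.
    Unlike deduction this is not truth-preserving; it is a plausible guess."""
    candidates = [a for (a, b) in rules if b == observation]
    return max(candidates, key=likelihood.get, default=None)

rules = [("flu", "fever"), ("infection", "fever"), ("sunburn", "red skin")]
prior = {"flu": 0.6, "infection": 0.3, "sunburn": 0.8}
print(abduce("fever", rules, prior))  # flu
```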

Inference to the Best Explanation
• Another way to view abduction is as follows:
  – D is a collection of data (facts, observations, symptoms) to explain
  – H explains D (if H is true, then H can explain why D has appeared)
  – No other hypothesis explains D as well as H does
  – Therefore H is probably correct
• Although the problem can be viewed similarly to classification – we need to locate an H that accounts for D
  – We now need additional knowledge: explanatory knowledge
    • What data can H explain?
    • How well can H explain the data?
    • Is there some way to evaluate H given D?
  – Additionally, we will want to know if
    • H is consistent
    • we considered all H’s in our domain
• What complicates generating a best explanation is that H and D are probably not singletons but sets

Continued
• Assume H is a set of hypotheses (any of which can be components of an explanation)
  – H = {H1, H2, H3, …, Hn}
• D is a collection of data to be explained
  – D = {d1, d2, d3, …, dm}
• A given hypothesis Hi can account for (explain) some subset of the data
  – If we have ranked all elements of H with some scoring algorithm (Bayesian probability, neural network strength of activation, feature-based pattern matching, etc.) we can assemble a best explanation
  – What does best mean?
• (Figure: lines indicate explanatory power (coverage); a dotted line indicates incompatible hypotheses)

Ways to View “Best”
• We will call a set of hypotheses that can explain the data a composite hypothesis
• The best composite hypothesis should have these features
  – Complete – explains all data (or as much as is possible)
  – Consistent – there are no incompatibilities among the hypotheses
  – Parsimonious – the composite has no superfluous parts
  – Simplest – all things considered, the composite should have as few individual hypotheses as possible
  – Most likely – this might be the most likely composite or the composite with the most likely hypotheses (how do we compute this?)
• In addition, we might want to include additional factors
  – Cheapest costing (if applicable) – the composite that would be the least expensive to believe
  – Generated with a reasonable amount of effort – generating the composite in a non-intractable way (abduction is generally an NP-complete problem)
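The first three criteria can be checked mechanically for a candidate composite; a sketch in which the hypothesis names and coverages are invented:

```python
def is_good_composite(composite, data, explains, incompatible):
    """Check three 'best explanation' criteria for a composite hypothesis:
    complete (covers all data), consistent (no incompatible pair included),
    parsimonious (removing any single hypothesis breaks completeness)."""
    def coverage(hyps):
        return set().union(*(explains[h] for h in hyps)) if hyps else set()
    complete = coverage(composite) >= set(data)
    consistent = not any((a, b) in incompatible or (b, a) in incompatible
                         for a in composite for b in composite if a != b)
    parsimonious = all(not coverage(composite - {h}) >= set(data)
                       for h in composite)
    return complete and consistent and parsimonious

explains = {"H1": {"d1", "d2"}, "H2": {"d3"}, "H3": {"d2", "d3"}}
data = {"d1", "d2", "d3"}
print(is_good_composite({"H1", "H2"}, data, explains, set()))        # True
print(is_good_composite({"H1", "H2", "H3"}, data, explains, set()))  # False: H3 is superfluous
```

Finding the best such composite, rather than merely checking one, is where the NP-completeness arises.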

Internist – Rule-based Abduction
• One of the earliest expert systems to apply abduction was Internist, used to diagnose internal diseases
  – Internist was largely a rule-based system
  – The abduction process worked as follows
    • Data trigger rules of possible diseases
    • For each disease triggered, determine what other symptoms are expected by that disease, which are present and which are absent
      – Generate a score for that disease hypothesis
    • Now compare disease hypotheses to differentiate them
      – If one hypothesis is more likely, try to confirm it
      – If many possible hypotheses, try to rule some out
      – If a few hypotheses remain, try to differentiate between them by seeking data (e.g., test results) that one expects and the others do not
  – The diagnostic conclusions are those hypotheses that still remain at the end, each of which explains some of the data

Neural Network Approach
• Paul Thagard developed ECHO, a system to learn explanatory coherence
  – ECHO was developed as a neural network where nodes represent hypotheses and data
  – links represent potential explanations between hypotheses and data
  – and hypothesis relationships (mutual incompatibilities, mutual support, analogy)
• Unlike a normal neural network, nodes here represent specific concepts
  – weights are learned from the strength of the relationships found in test data
  – In fact, the approach is far more like a Bayesian network, with edge weights representing conditional probabilities (counts of how often a hypothesis supports a datum)
• When data are introduced, a propagation algorithm runs over the present data until the hypothesis nodes and data nodes have reached a stable state (similar to a Hopfield net); the best explanation is then those hypothesis nodes whose probabilities are above a preset threshold

Ex: Evolution (DH) vs Creationism (CH)

Probabilistic Approach(es)
• Pearl’s belief networks and the generic idea behind the HMM are thought of as abductive problem solving techniques
  – Notice that there is no explicit coverage of hypotheses to data; for instance, we do not select a datum and ask “what will explain this?”
  – In the HMM, we could say that each state explains the corresponding datum that was used to compute the emission probability
• The typical Bayesian approach contains probabilities of a hypothesis (state) being true, of a hypothesis transitioning to another hypothesis, and of an output being seen from a given hypothesis
  – But there is no apparent mechanism to encode hypothesis incompatibilities or analogies

The Peirce Algorithm
• The previous strategies assume that knowledge is available in either a rule-based or probabilistic format
• The Peirce algorithm instead uses generic tasks
  – The algorithm has evolved over the course of constructing several knowledge-based systems
• The basic idea is
  – Generate hypotheses
    • this might be through hierarchical classification, neural network activity, or other means
  – Instantiate the generated hypotheses
    • for each hypothesis, determine its explanatory power (what it can explain from the data), hypothesis interactions (for the other generated hypotheses, are they compatible, incompatible, etc.) and some form of ranking
  – Assemble the best explanation
    • see the next slide

The Assembly Algorithm
• Examine all data and see if there are any data that can only be explained by a single hypothesis (such a hypothesis is called essential)
  – Include all essential hypotheses in the composite
  – Propagate the effects of including these hypotheses (see next slide)
  – Remove all data that can be explained by these hypotheses
  – Start from the top (this may have created new essentials)
• Examine the remaining data and see if there are any data that can only be explained by a superior hypothesis (one that is clearly better than all competitors, say because it has a much higher ranking)
  – Include all superior hypotheses in the composite, propagate and remove
  – Start from the top (this may have created new essentials)
• Examine the remaining data and see if there are any data that can only be explained by a better hypothesis
  – Include all better hypotheses in the composite, propagate and remove
  – Start from the top (this may have created new essentials)
• If there are still data to explain, either guess or quit with unexplained data
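A simplified sketch of the assembly loop, collapsing the essential/superior/better distinction into "essentials first, then the best-scoring explainer" (the data, coverages, and scores are invented):

```python
def assemble(data, explains, score):
    """Abductive assembly sketch: repeatedly include an 'essential' hypothesis
    (the sole explainer of some datum); when none exists, fall back to the
    best-scoring hypothesis that still explains something. Stops when all
    data are covered, or leftover data have no explainer at all."""
    remaining, composite = set(data), []
    while remaining:
        chosen = None
        for d in sorted(remaining):                      # look for an essential
            explainers = [h for h, cov in explains.items()
                          if d in cov and h not in composite]
            if len(explainers) == 1:
                chosen = explainers[0]
                break
        if chosen is None:                               # no essential: take the best
            candidates = [h for h, cov in explains.items()
                          if h not in composite and cov & remaining]
            if not candidates:
                break                                    # unexplainable data remain
            chosen = max(candidates, key=score.get)
        composite.append(chosen)
        remaining -= explains[chosen]                    # these data are now covered
    return composite, remaining

explains = {"H1": {"d1", "d2"}, "H2": {"d2", "d3"}, "H3": {"d3"}}
score = {"H1": 0.9, "H2": 0.6, "H3": 0.4}
print(assemble({"d1", "d2", "d3"}, explains, score))  # (['H1', 'H2'], set())
```

In the example, H1 is essential (only it explains d1), and including it leaves only d3, which the higher-scoring H2 then covers.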

Propagation
• The idea behind the Peirce algorithm is to build on islands of certainty
  – If a hypothesis is essential, it is the only way to explain something, so it MUST be part of the best explanation
• If a hypothesis is included in the composite, we can leverage knowledge of how that hypothesis relates to others
  – If the hypothesis, say H1, is incompatible with H2, then since we believe H1 is true, H2 must be false – discard it
  – If hypothesis H1 is very unlikely to appear with H2, we can downgrade H2’s ranking
  – If hypothesis H1 is likely to appear with H2, we can either reconsider H2 or just bump up its ranking
  – If hypothesis H2 can be inferred from H1 by analogy, we can include H2
• Since H1 was included because it was the only (or best) way to explain some data, we build upon that island of certainty, perhaps creating new essentials because H1 is incompatible with other hypotheses

Red Blood Cell Identification System

MDX2
• A medical diagnostic system that combines classification, feature-based pattern matching, functional reasoning (covered later), data inference and abductive assembly

Comments About Peirce
• Good complexity: O(n^2)
  – We add at least 1 hypothesis to the composite in each iteration, so we will have no more than n iterations
  – In each iteration, we have to look at up to n hypotheses and compare them to the newly included hypothesis for propagation
  – note: abduction in general is an O(2^n) problem (set covering)
• Easy to generate an explanation for our conclusion (how we came about our solution)
• Can determine how good an explanation is based on how much superior one hypothesis was to another
  – We can control this as well by not allowing a hypothesis into the composite if it is not clearly superior, something that BNs and HMMs cannot do
• We can generate a hypothesis that a specific datum is noise
  – We do not have to accept hypotheses to explain spurious data

Layered Abduction
• For some problems, a single data-to-hypothesis mapping is insufficient
  – Either because we have more knowledge to bring to bear on the problem or because we want an explanation at a higher level of reasoning
    • in speech recognition, we wouldn’t want to just generate an explanation of the acoustic signal as a sequence of phonetic units
• We map the output of one level into another
  – The explanation at one layer becomes the input of the next layer, to be explained by hypotheses at a different level of abstraction or reasoning
    • we explain the phonetic unit output as a sequence of syllables, the syllables as a sequence of words, and then the sequence of words as a meaningful statement
  – We can use partially formed hypotheses at a higher level to generate expectations for a lower layer, thus giving us some top-down guidance

Example: Handwritten Character Recognition (CHREC)

Overall Architecture
• The system has a search space of hypotheses – the characters that can be recognized
  – this may be organized hierarchically, but here it’s just a flat space – a list of the characters
  – each character has at least one recognizer
    • some have multiple recognizers if there are multiple ways to write the character, like 0, which may or may not have a diagonal line from right to left
• After character hypotheses are generated for each character in the input, the abductive assembler selects the best ones to account for the input

Explaining a Character
• The features (data) to be explained for this character are three horizontal lines and two curves
• While both the E and F characters were highly rated, “E” can explain all of the features while “F” cannot, so “E” is the better explanation

Top-down Guidance
• One benefit of this approach is that, by using domain-dependent knowledge
  – the abductive assembler can increase or decrease individual character hypothesis beliefs based on partially formed explanations
  – for instance, in the postal mail domain, if the assembler detects that it is working on the zip code (because it already found the city and state on one line), then it can rule out any letters that it thinks it found
    • since we know we are looking at Saint James, NY, the following five characters must be numbers, so “I” (for one of the 1’s), “B” (for the 8), and “O” (for the 0) can all be ruled out (or at least scored less highly)

Full Example in a Natural Language Domain

Uses of Abduction/Layered Abduction
• In essence, any form of causal reasoning can be considered abduction
  – Diagnosis
  – Sensor interpretation
  – Handwritten character recognition
  – Speech recognition
  – Story understanding
  – Theory evaluation
  – Legal reasoning
  – Automated planning
  – Belief revision
• Approaches have included those already mentioned, Bayesian networks, HMMs, first-order predicate calculus, and rule-based approaches

Model-based Diagnosis: Functional
• In all of our previous examples of diagnosis and interpretation, our knowledge was associational
  – We associate these symptoms/data with these diseases/malfunctions
• This is fine when we do not have a complete understanding of the system
  – Medical diagnosis
  – Speech recognition
  – Vision understanding
• What if we do understand the system?
  – E.g., a human-made artifact
  – If this is the case, we should be able to provide knowledge in the form of the function that a given component provides in the system and how that function is achieved through its behavior (process)
• Debugging can be performed by simulating performance with various components not working

The Clapper Buzzer
• This mechanical device works as follows:
  – When you press the button (not shown), it completes the circuit, causing current to flow to the coil
  – When the magnetic coil charges, it pulls the clapper hand toward it
  – When the clapper hand moves, it disconnects the circuit, causing the coil to stop pulling the hand; the hand then falls back, hitting a bell (not shown) and causing the ringing sound
  – This also reconnects the circuit, and so the process repeats until the button is no longer pressed

Generating a Diagnosis
• Given a functional representation, we can reason over whether a function can be achieved or not
  – Hypothetical or “what would happen if” reasoning
    • What would happen if the coil was not working?
    • What would happen if the battery was not charged?
    • What would happen if the clapper arm were blocked?
  – We can also use the behavior and test results to find out what function(s) was not being achieved
    • With the switch pressed, we measure current at the coil, so the coil is being charged
    • We measure a magnetic attraction to show that the coil is working
    • We do not hear a clapping sound, so the magnetic attraction is either not working, or the acoustic law is not being fulfilled
      – Why not? Perhaps the arm is not magnetic? Perhaps there is something on the arm so that when it hits the bell, no sound is emitted
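"What would happen if" reasoning can be run as a simulation of the buzzer's causal chain; the component names below are a simplified, assumed model of the device:

```python
def simulate(components):
    """Run the buzzer's causal chain with some components disabled and
    report which observable functions are still achieved."""
    circuit = components["button"] and components["wiring"]   # circuit completes
    magnet  = circuit and components["coil"]                  # coil charges, pulls arm
    strike  = magnet and components["clapper_arm"]            # arm moves, hits bell
    sound   = strike and components["bell"]                   # ringing is produced
    return {"current_at_coil": circuit, "magnetic_pull": magnet, "ringing": sound}

ok = dict(button=True, wiring=True, coil=True, clapper_arm=True, bell=True)
print(simulate({**ok, "coil": False}))
# {'current_at_coil': True, 'magnetic_pull': False, 'ringing': False}
# -> current reaches the coil but nothing rings: the coil is implicated
```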

Model-based Diagnosis: Probabilistic
• While a functional representation can be useful for diagnosis, it is somewhat problem independent
  – FRs can be used for prediction (WWHI, “what would happen if”, reasoning), diagnosis, planning and redesign, etc.
• Diagnosis typically is more focused, so we can create a model of system components and their performance and enhance the system with probabilities
  – Failure rates can be used for prior probabilities
  – Evidential probabilities can be used to denote the likelihood of seeing a particular output from a component given that it has failed
• Bayesian probabilities can then be easily computed
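For a single component, the Bayes computation is one line: the failure rate serves as the prior, and the evidential probabilities as likelihoods (the numbers below are invented):

```python
def posterior_failure(prior_fail, p_obs_given_fail, p_obs_given_ok):
    """P(failed | observation) via Bayes' rule, from the component's
    failure rate (prior) and the likelihood of the observation."""
    p_obs = p_obs_given_fail * prior_fail + p_obs_given_ok * (1 - prior_fail)
    return p_obs_given_fail * prior_fail / p_obs

# hypothetical numbers: 1% failure rate; a wrong output appears 90% of the
# time when failed, 5% of the time when healthy
print(posterior_failure(0.01, 0.9, 0.05))  # ≈ 0.15
```

Even with strong evidence, the low prior keeps the posterior modest, which is why failure rates matter in ranking contending malfunction hypotheses.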

Example
• The device consists of 3 multipliers (M1, M2, M3) and 2 adders (A1, A2)
• F computes A*C + B*D
• G computes B*D + C*E
  – Given the inputs, F should output 12 but computes 10
  – Given the inputs, G should output 12 and does
• We use the model to compute the diagnosis
  – Possible malfunctions are with M1, M2 or A1, but not M3 or A2
  – note: it could be a multiple component failure
• We can employ probabilities of component failure rates and the likelihood of seeing particular values given the input to compute the most likely cause
• If we can probe the inside of the machine, we can obtain values for X, Y and Z to remove some of the contending malfunction hypotheses
• If we have a model of the multiplier and adder, we can also use that knowledge to assist in diagnosis
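A minimal sketch of this structural reasoning: simulate the model and suspect every component upstream of a discrepant output (the concrete input values are assumptions chosen so that F and G should both be 12):

```python
def diagnose(inputs, observed):
    """Model-based diagnosis of the multiplier/adder device: components
    feeding a wrong output are suspects; since multiple faults are
    allowed, components feeding correct outputs are not exonerated."""
    a, b, c, d, e = inputs
    predicted = {
        "F": a * c + b * d,   # via M1, M2, A1
        "G": b * d + c * e,   # via M2, M3, A2
    }
    upstream = {"F": {"M1", "M2", "A1"}, "G": {"M2", "M3", "A2"}}
    suspects = set()
    for out, comps in upstream.items():
        if predicted[out] != observed[out]:
            suspects |= comps
    return suspects

# hypothetical inputs with A*C = B*D = C*E = 6; F reads 10 (wrong), G reads 12
print(sorted(diagnose((3, 2, 2, 3, 3), {"F": 10, "G": 12})))  # ['A1', 'M1', 'M2']
```

This reproduces the slide's conclusion: M1, M2 and A1 are possible malfunctions while M3 and A2 are not.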


Neural Network Approach • Recall that neural networks, while trainable to perform recognition tasks, are knowledge-poor – Therefore, they seem unsuitable for diagnosis • However, many diagnostic tasks or subtasks revolve around – Data interpretation – Visual understanding and classification • And neural networks might contribute to diagnosis by solving these lower-level tasks • NNs have been applied to assist in – Congestive heart failure prediction based on patient background and habits – Medical imaging interpretation for lung cancer and breast cancer (MRI, chest X-ray, CAT scan, radioactive isotope, etc.) – Interpreting forms of acidosis based on blood work analysis
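The simplest trainable unit of this kind is a single threshold neuron. The sketch below is a toy stand-in for the low-level classification subtasks described above, using the classic perceptron learning rule on invented, linearly separable data; it is not any of the medical systems mentioned.

```python
# Toy perceptron: learn to separate two classes of 2-D feature vectors.
data = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0),
        ((1.0, 1.0), 1), ((2.0, 1.0), 1), ((1.0, 2.0), 1)]

w = [0.0, 0.0]  # weights
b = 0.0         # bias

def predict(x):
    """Fire (class 1) iff the weighted sum exceeds the threshold."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Perceptron learning rule: nudge weights toward misclassified examples.
for _ in range(50):
    for x, y in data:
        err = y - predict(x)
        w[0] += err * x[0]
        w[1] += err * x[1]
        b += err

print(all(predict(x) == y for x, y in data))  # True once converged
```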


Case-Based Diagnosis • Case-based reasoning is most applicable when – There is a sufficiently large number of cases – There is knowledge of how to manipulate a previous case to fit the current situation • Such manipulation is most commonly done in planning/design, not diagnosis – So for diagnosis, we need a different approach • Retrieve all cases that are deemed relevant to the current input • Recommend the cases that match closely, combining common diagnoses through a weighted voting scheme • Supply a confidence based on the strength of the votes • If deemed useful, retain the new case to provide the system with a mechanism for “learning” from new situations – This approach has been employed by GE for diagnosing gas engine turbine problems
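The retrieve/vote/confidence cycle above can be sketched in a few lines. The case base, feature names, and similarity measure below are invented for illustration (loosely themed on turbine diagnosis); they are not GE's actual system.

```python
# Hypothetical case base: (feature vector, diagnosis) pairs.
cases = [
    ({"temp": 540, "vibration": 0.8}, "bearing wear"),
    ({"temp": 545, "vibration": 0.7}, "bearing wear"),
    ({"temp": 480, "vibration": 0.2}, "fuel fault"),
]

def similarity(a, b):
    """Inverse-distance similarity over shared numeric features."""
    dist = sum((a[k] - b[k]) ** 2 for k in a) ** 0.5
    return 1.0 / (1.0 + dist)

def diagnose(query, k=3):
    """Retrieve the k closest cases, take a similarity-weighted vote,
    and report the winning diagnosis with a confidence score."""
    retrieved = sorted(cases, key=lambda c: -similarity(query, c[0]))[:k]
    votes = {}
    for feats, dx in retrieved:
        votes[dx] = votes.get(dx, 0.0) + similarity(query, feats)
    best = max(votes, key=votes.get)
    confidence = votes[best] / sum(votes.values())
    return best, confidence

dx, conf = diagnose({"temp": 542, "vibration": 0.75})
print(dx, round(conf, 2))
```

Retaining the query as a new case (the last step on the slide) would simply append `(query, dx)` to `cases` once the diagnosis is confirmed.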


AI in Medicine • The term (abbreviated as AIM) was first coined in 1959, although actual usage didn’t occur until the 1970s with Mycin – Surprisingly, using AI for medical diagnosis has largely not caught on in spite of all of the research systems developed, in part because • expert systems impose changes to the way a clinician performs their task (for instance, the need to have certain tests ordered when needed by the system, not when the clinician would normally order such a test) • the problem(s) solved by the expert system is not a particular issue needing solving (either because the clinician can solve the problem adequately, or the problem is too narrow in scope) • the cost of developing and testing the system is prohibitive


AIM Today • So while AI diagnosis still plays a role in AIM, it is a small role, much smaller than those in the 1980s would have predicted • Today, AIM performs a variety of other tasks – Aiding with laboratory experiments – Enhancing medical education – Running alongside other medical software (e.g., databases) to determine if inconsistent data or knowledge has been entered • for instance, a doctor prescribing medication that the patient is known to be allergic to – Generating alerts and reminders about specific patients for nurses, doctors or the patients themselves – Diagnostic assistance – rather than performing the diagnosis, they help the medical expert when the particular problem is a rare case – Therapy critiquing and planning, for instance by finding omissions or inconsistencies in a treatment – Image interpretation of X-rays, CAT scans, MRI, etc.


AI Systems in Use • Puff – interpretation of pulmonary function tests; has been sold to hundreds of sites world-wide starting as early as 1977 • GermWatcher – used in hospitals to detect in-patient acquired infections by monitoring lab culture data • PEIRS – pathology expert interpretive reporting system; similarly, it generates 80-100 reports daily with an accuracy of about 95%, providing reports on such things as thyroid function tests, arterial blood gases, urine and plasma catecholamines, glucose test results and more • KARDIO – a decision tree learning system that interprets ECG test results • Athena – a decision support system that implements guidelines for hypertension patients to instruct them on how to be more healthy, in use since 2002 in clinics in NC and northern CA


Continued • PERFEX – an expert rule-based system to assist with medical image analysis for heart disease patients • Orthoplanner – plans orthodonture treatments using rule-based forward and backward chaining and fuzzy logic, in use in the UK since 1994 • PharmAde and DoseChecker – expert systems that evaluate drug therapy prescriptions against the patient’s background for inaccuracies, negative interactions, and adjustments, in use in many hospitals starting in 1996/1994 • IPROB – an intelligent clinical management system that keeps track of obstetrics/gynecology patient records and cases, performs risk reduction, and offers decision support through distributed databases and rules based on hospital guidelines, practices, etc., in use since 1995


Which Approach to Use? • Is the domain well understood? – e.g., a mechanical system versus a medical domain • How much knowledge is needed to solve the problem? – Medical diagnosis involves too much knowledge to easily build into a knowledge-based system; a Bayesian approach might be better • Is there useful training data to construct a system? – For instance, to train a neural network or SVM • Is computational complexity a concern? – Exact inference in HMMs and Bayesian networks can be intractable, which means problems with large numbers of hypotheses make these approaches impractical • Do we want to obtain an explanation from the problem-solving system? • Are multiple faults possible?