Understanding of data using Computational Intelligence methods
Włodzisław Duch, Dept. of Informatics, Nicholas Copernicus University, Toruń, Poland
http://www.phys.uni.torun.pl/~duch
IEA/AIE, Cairns, 17-20.06.2002

What am I going to say
• Data and CI.
• What we hope for.
• Forms of understanding.
• Visualization.
• Prototypes.
• Logical rules.
• Some knowledge discovered.
• Expert system for psychometry.
• Conclusions, or why am I saying this?

Types of Data
Data was precious! Now it is overwhelming...
• Statistical data – clean, numerical, controlled experiments, vector space model.
• Relational data – marketing, finances.
• Textual data – Web, NLP, search.
• Complex structures – chemistry, economics.
• Sequence data – bioinformatics.
• Multimedia data – images, video.
• Signals – dynamic data, biosignals.
• AI data – logical problems, games, behavior…

Computational Intelligence: Data => Knowledge
[Diagram: Computational Intelligence spans neural networks, evolutionary algorithms, fuzzy logic, soft computing, pattern recognition, expert systems, machine learning, probabilistic methods, visualization and multivariate statistics; Artificial Intelligence is one of its parts.]

CI & AI definition
• Computational Intelligence is concerned with solving effectively non-algorithmic problems. This corresponds to all cognitive processes, including low-level ones (perception).
• Artificial Intelligence is a part of CI concerned with solving effectively non-algorithmic problems requiring systematic reasoning and symbolic knowledge representation. Roughly this corresponds to high-level cognitive processes.

Turning data into knowledge
What should CI methods do?
• Provide descriptive and predictive nonparametric models of data.
• Allow us to classify, approximate, associate, correlate, and complete patterns.
• Allow us to discover new categories and interesting patterns.
• Help to visualize multi-dimensional relationships among data samples.
• Allow us to understand the data in some way.
• Facilitate the creation of expert systems and reasoning.

Forms of useful knowledge
AI/Machine Learning camp: Neural nets are black boxes. Unacceptable! Symbolic rules forever.
But... knowledge accessible to humans is in:
• symbols,
• similarity to prototypes,
• images, visual representations.
What type of explanation is satisfactory? An interesting question for cognitive scientists, with different answers in different fields.

Data understanding
• Humans remember examples of each category and refer to such examples – as similarity-based or nearest-neighbor methods do.
• Humans create prototypes out of many examples – as Gaussian classifiers, RBF networks, and neurofuzzy systems do.
• Logical rules are the highest form of summarization of knowledge.
Types of explanation:
• visualization-based: maps, diagrams, relations...
• exemplar-based: prototypes and similarity;
• logic-based: symbols and rules.

Visualization: dendrograms
All projections (cuboids) on 2D subspaces are identical; dendrograms do not show the structure. Normal and malignant lymphocytes.
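
As an illustration of how such dendrograms are produced, here is a minimal sketch using SciPy's agglomerative clustering on synthetic two-cluster data (the cluster parameters are invented; this is not the lymphocyte dataset):

```python
# Minimal dendrogram sketch on synthetic data (not the lymphocyte set).
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 5)),    # one synthetic class
               rng.normal(3, 1, (20, 5))])   # a second, shifted class

Z = linkage(X, method="ward")   # agglomerative clustering, Ward criterion
dendrogram(Z)
plt.title("Hierarchical clustering dendrogram")
plt.show()
```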

Visualization: 2D projections
All projections (cuboids) on 2D subspaces are identical; dendrograms do not show the structure. 3-bit parity + all 5-bit combinations.

Visualization: MDS mapping
Results of pure MDS mapping + centers of hierarchical clusters connected. 3-bit parity + all 5-bit combinations.

Visualization: 3D projections
Only age is continuous; other values are binary. Fine Needle Aspirate of Breast Lesions, red = malignant, green = benign. A. J. Walker, S. S. Cross, R. F. Harrison, Lancet 1999, 354, 1518-1521.

Visualization: MDS mappings
Try to preserve all distances in a 2D nonlinear mapping. MDS for large sets using LVQ + relative mapping.
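
A minimal MDS sketch, assuming scikit-learn and random toy data (plain stress-minimizing MDS on a precomputed distance matrix, not the LVQ + relative mapping variant mentioned above):

```python
# Sketch: map high-dimensional points to 2D while preserving distances.
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))     # toy high-dimensional samples
D = pairwise_distances(X)         # all pairwise distances to preserve

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
Y = mds.fit_transform(D)          # 2D coordinates minimizing the stress
print(Y.shape, "final stress:", mds.stress_)
```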

Prototype-based rules
C-rules (crisp) are a special case of F-rules (fuzzy rules), which are a special case of P-rules (prototype rules). P-rules have the form:
IF P = arg min_R D(X, R) THEN Class(X) = Class(P)
D(X, R) is a dissimilarity (distance) function, determining decision borders around prototype P.
P-rules are easy to interpret!
IF you (X) are most similar to the prototype P = Superman THEN you are in the Super-league.
IF you (X) are most similar to the prototype P = Weakling THEN you are in the Failed-league.
"Similar" may involve different features or different D(X, P).
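
A P-rule is simply a nearest-prototype decision. A minimal sketch (the prototypes, classes and distance function below are invented for illustration):

```python
# Sketch: IF P = argmin_R D(X, R) THEN Class(X) = Class(P).
import numpy as np

prototypes = np.array([[0.0, 0.0], [5.0, 5.0]])   # prototype vectors P
proto_class = ["Failed-league", "Super-league"]   # Class(P)

def p_rule(x, D=lambda a, b: np.linalg.norm(a - b)):
    dists = [D(x, p) for p in prototypes]         # dissimilarity to each P
    return proto_class[int(np.argmin(dists))]     # class of the nearest one

print(p_rule(np.array([4.0, 6.0])))               # -> "Super-league"
```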

P-rules
Euclidean distance leads to Gaussian fuzzy membership functions with product as the T-norm. Manhattan distance => m(X; P) = exp{-|X-P|}. Various distance functions lead to different membership functions, e.g. data-dependent distance functions for symbolic data.
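
A short sketch of how the choice of distance induces the membership function; both forms follow the formulas above (feature weights omitted):

```python
# Sketch: distance functions -> membership functions.
import numpy as np

def mf_gaussian(x, p):
    # squared Euclidean distance in exp{} = product of 1D Gaussians,
    # i.e. Gaussian membership functions combined with the product T-norm
    return np.exp(-np.sum((x - p) ** 2))

def mf_manhattan(x, p):
    # Manhattan distance gives m(X; P) = exp{-|X-P|}
    return np.exp(-np.sum(np.abs(x - p)))

x, p = np.array([1.0, 2.0]), np.array([0.0, 0.0])
print(mf_gaussian(x, p), mf_manhattan(x, p))
```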

Crisp P-rules
New distance functions from information theory => interesting membership functions. Membership functions => new distance functions, with local D(X, R) for each cluster.
Crisp logic rules: use the L∞ (Chebyshev) norm:
D∞(X, P) = ||X-P||∞ = max_i W_i |X_i - P_i|
D∞(X, P) = const => rectangular contours.
The L∞ distance with threshold θP:
IF D∞(X, P) ≤ θP THEN C(X) = C(P)
is equivalent to a conjunctive crisp rule:
IF X1 ∈ [P1 - θP/W1, P1 + θP/W1] ∧ … ∧ XN ∈ [PN - θP/WN, PN + θP/WN] THEN C(X) = C(P)
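
The equivalence of the thresholded L∞ rule and the conjunctive interval rule can be checked directly; a small sketch with invented P, W and θ:

```python
# Sketch: thresholded Chebyshev (L_inf) distance == box-shaped crisp rule.
import numpy as np

P = np.array([2.0, 3.0])     # prototype
W = np.array([1.0, 2.0])     # feature weights
theta = 1.0                  # threshold

def by_distance(x):
    return np.max(W * np.abs(x - P)) <= theta     # D_inf(X, P) <= theta

def by_intervals(x):
    lo, hi = P - theta / W, P + theta / W         # per-feature intervals
    return np.all((lo <= x) & (x <= hi))          # conjunction of conditions

x = np.array([2.4, 3.3])
assert by_distance(x) == by_intervals(x)          # same decision region
```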

Decision borders
D(P, X) = const and decision borders D(P, X) = D(Q, X). Euclidean distance from 3 prototypes, one per class. Minkowski (α = 20) distance from 3 prototypes.

P-rules for Wine
L∞ distance (crisp rules): 15 prototypes kept, 5 errors; features f2, f8, f10 removed.
Euclidean distance: 11 prototypes kept, 7 errors.
Manhattan distance: 6 prototypes kept, 4 errors; f2 removed.
Many other solutions. Prototypes: support vectors & cluster centers.

Complex objects
The vector space concept is not sufficient for complex objects; a common set of features is meaningless. AI: complex objects, states, subproblems.
General approach: it is sufficient to evaluate similarity. To compare D(Oi, Oj), define a transformation T connecting the pair of objects Oi and Oj, built from elementary operators Wk, e.g. substring substitutions. Cost of a transformation = sum of the Wk costs. Many such transformations exist; similarity = lowest transformation cost.
Bioinformatics: sophisticated similarity functions for sequences; dynamic programming finds similarities in reasonable time. Use adaptive costs and the general framework of SBM (similarity-based methods).
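
A minimal sketch of the transformation-cost idea: the classic dynamic-programming edit distance, with operator costs left as tunable (here arbitrary) parameters:

```python
# Sketch: similarity as the lowest total cost of elementary operations.
def edit_distance(a, b, sub_cost=1.0, indel_cost=1.0):
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else sub_cost
            d[i][j] = min(d[i - 1][j - 1] + sub,       # substitution
                          d[i - 1][j] + indel_cost,    # deletion
                          d[i][j - 1] + indel_cost)    # insertion
    return d[n][m]

print(edit_distance("tactag", "tagtag"))   # -> 1.0 (one substitution)
```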

Promoters
DNA strings, 57 nucleotides, 53 + and 53 - samples, e.g.:
tactagcaatacgcttgcgttcggtggttaagtataatgcgcgggcttgtcgt
Euclidean distance: symbolic s = a, c, t, g replaced by x = 1, 2, 3, 4.
PDF distance: symbolic s = a, c, t, g replaced by p(s|+).
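
A sketch of the two encodings (the p(s|+) values below are placeholders, not frequencies estimated from the promoter data):

```python
# Sketch: symbolic nucleotides -> numbers, then an ordinary distance.
import numpy as np

numeric = {"a": 1, "c": 2, "t": 3, "g": 4}                 # arbitrary codes
pdf_plus = {"a": 0.30, "c": 0.20, "t": 0.35, "g": 0.15}    # placeholder p(s|+)

def encode(s, table):
    return np.array([table[ch] for ch in s])

s1, s2 = "tacta", "tagta"
for table in (numeric, pdf_plus):
    print(np.linalg.norm(encode(s1, table) - encode(s2, table)))
```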

Connection of CI with AI
The AI/CI division is harmful for science! GOFAI: operators, state transformations and search techniques are basic tools in AI for solving problems requiring systematic reasoning. CI methods may provide useful heuristics for AI and define metric relations between states, problems or complex objects. Example: combinatorial productivity in AI systems and FSM. Later: a decision tree for complex structures.

Electric circuit example
Answering questions in complex domains requires reasoning. Qualitative behavior of an electric circuit: 7 variables, but Ohm's law V = I*R and Kirchhoff's law Vt = V1 + V2 hold. Train a neurofuzzy system on Ohm's and Kirchhoff's laws. Without solving equations, answer questions of the type: if R2 grows while R1 & Vt are constant, what will happen to the current I and voltages V1, V2? (Taken from the PDP book: McClelland, Rumelhart & Hinton.)

Electric circuit search
AI: create a search tree; CI: provide guiding intuition. Any law of the form A = B*C or A = B+C, e.g. V = I*R, has 13 true facts and 14 false facts, and may be learned by a neurofuzzy system. Geometrical representation: + increasing, - decreasing, 0 constant. Find a combination of Vt, Rt, I, V1, V2, R1, R2 for which all 5 constraints are fulfilled. For 111 cases out of 3^7 = 2187, search and check if X can be +, 0, -; the laws are not satisfied if F(Vt=0, Rt, I, V1, V2, R1=0, R2=+) = 0.
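
The count of 13 true and 14 false facts for a law A = B*C can be reproduced by enumerating sign triples; a small sketch using standard qualitative-calculus rules for positive quantities:

```python
# Sketch: qualitative facts for A = B*C. dA, dB, dC in {-1, 0, +1} denote
# decrease/constant/increase of positive quantities.
import itertools

def possible_dA(dB, dC):
    if dB == dC:
        return {dB}           # both increase, both decrease, or both constant
    if 0 in (dB, dC):
        return {dB + dC}      # one constant: the product follows the other
    return {-1, 0, 1}         # opposite changes: the sign is undetermined

facts = [(dA, dB, dC)
         for dB, dC in itertools.product((-1, 0, 1), repeat=2)
         for dA in possible_dA(dB, dC)]
print(len(facts), "true facts,", 27 - len(facts), "false facts")  # 13 and 14
```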

Heuristic search
If R2 grows while R1 & Vt are constant, what will happen to the current I and voltages V1, V2? We know that: R2 = +, R1 = 0, Vt = 0, V1 = ?, V2 = ?, Rt = ?, I = ?
Take V1 = + and check if F(Vt=0, Rt=?, I=?, V1=+, V2=?, R1=0, R2=+) > 0. Since for all V1 = +, 0 and - the function is F() > 0, take a variable that leads to a unique answer: Rt. A single search path solves the problem. Useful also in approximate reasoning, where only some conditions are fulfilled.

Logical rules
Crisp logic rules: for continuous x use linguistic variables (predicate functions):
sk(x) ≡ True[Xk ≤ x ≤ X'k], for example:
small(x) = True{x | x < 1}
medium(x) = True{x | x ∈ [1, 2]}
large(x) = True{x | x > 2}
Linguistic variables are used in crisp (propositional, Boolean) logic rules:
IF small-height(X) AND has-hat(X) AND has-beard(X) THEN (X is a Brownie) ELSE IF... ELSE...
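
Linguistic variables are ordinary predicate functions; a direct sketch of the example above:

```python
# Sketch: linguistic variables as predicates used in a crisp rule.
def small(x):
    return x < 1

def medium(x):
    return 1 <= x <= 2

def large(x):
    return x > 2

def is_brownie(height, has_hat, has_beard):
    # IF small-height(X) AND has-hat(X) AND has-beard(X) THEN Brownie
    return small(height) and has_hat and has_beard

print(is_brownie(0.5, True, True))   # -> True
```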

Crisp logic decisions
Crisp logic is based on rectangular membership functions: True/False values jump from 0 to 1. Step functions are used for partitioning of the feature space. Very simple hyper-rectangular decision borders. A severe limitation on the expressive power of crisp logical rules!

DT decision borders
Decision trees lead to specific decision borders. SSV tree on Wine data: proline + flavanoids content. Decision tree forests: many decision trees of similar accuracy, but different selectivity and specificity.

Logical rules - advantages
Logical rules, if simple enough, are preferable.
• Rules may expose limitations of black box solutions.
• Only relevant features are used in rules.
• Rules may sometimes be more accurate than NN and other CI methods.
• Overfitting is easy to control; rules usually have a small number of parameters.
• Rules forever!?
A logical rule about logical rules:
IF the number of rules is relatively small AND the accuracy is sufficiently high THEN rules may be an optimal choice.

Logical rules - limitations
Logical rules are preferred, but...
• Only one class is predicted: p(Ci|X, M) = 0 or 1. A black-and-white picture may be inappropriate in many applications.
• A discontinuous cost function allows only non-gradient optimization.
• Sets of rules are unstable: a small change in the dataset leads to a large change in the structure of complex sets of rules.
• Reliable crisp rules may reject some cases as unclassified.
• Interpretation of crisp rules may be misleading.
• Fuzzy rules are not so comprehensible.

Rules - choices
Simplicity vs. accuracy; confidence vs. rejection rate. p++ is a hit; p-+ a false alarm; p+- a miss.
Accuracy (overall): A(M) = p++ + p--
Error rate: L(M) = p+- + p-+
Rejection rate: R(M) = p+r + p-r = 1 - L(M) - A(M)
Sensitivity: S+(M) = p+|+ = p++ / p+
Specificity: S-(M) = p-|- = p-- / p-
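
A sketch computing these quantities from raw counts (n_pp = true + assigned +, n_pm = true + assigned -, n_pr = true + rejected, etc.; the example numbers are invented):

```python
# Sketch: rule quality measures from confusion-matrix counts.
def rule_quality(n_pp, n_pm, n_mp, n_mm, n_pr=0, n_mr=0):
    total = n_pp + n_pm + n_mp + n_mm + n_pr + n_mr
    A = (n_pp + n_mm) / total                # accuracy A(M)
    L = (n_pm + n_mp) / total                # error rate L(M)
    R = (n_pr + n_mr) / total                # rejection rate, R = 1 - A - L
    S_plus = n_pp / (n_pp + n_pm + n_pr)     # sensitivity p(+|+)
    S_minus = n_mm / (n_mm + n_mp + n_mr)    # specificity p(-|-)
    return A, L, R, S_plus, S_minus

print(rule_quality(40, 5, 3, 50, n_pr=2))
```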

Neural networks and rules
[Network diagram: inputs (sex, age, smoking, pain intensity, pain duration, ECG ST elevation) combined through output weights estimate p(MI|X) ≈ 0.7 for Myocardial Infarction.]

Knowledge from networks
Simplify networks: force most weights to 0, quantize the remaining parameters, be constructive!
• Regularization: a mathematical technique improving the predictive abilities of the network.
• Result: MLP2LN neural networks that are equivalent to logical rules.

MLP2LN
Converts MLP neural networks into a network performing logical operations (LN).
[Diagram: input layer => linguistic units (windows, filters: better features) => rule units (threshold logic) => aggregation => output, one node per class.]

Learning dynamics
Decision regions shown every 200 training epochs in x3, x4 coordinates; borders are optimally placed, with wide margins.

Neurofuzzy systems
Fuzzy logic: m(x) = 0, 1 (no/yes) replaced by a degree m(x) ∈ [0, 1]. Triangular, trapezoidal, Gaussian... membership functions. Membership functions in many dimensions: the Feature Space Mapping (FSM) neurofuzzy system. Neural adaptation, estimation of the probability density distribution (PDF) using a single hidden layer network (RBF-like) with nodes realizing separable functions.
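
A sketch of a single FSM-style node: a separable transfer function, here a product of one-dimensional Gaussians (centers and widths are the adaptive parameters; the values below are arbitrary):

```python
# Sketch: separable node value = product of 1D membership functions.
import numpy as np

def separable_gaussian(x, centers, widths):
    # prod_i exp{-((x_i - c_i)/w_i)^2}, one factor per feature
    return np.prod(np.exp(-((x - centers) / widths) ** 2))

x = np.array([1.0, 2.0, 0.5])
print(separable_gaussian(x, centers=np.zeros(3), widths=np.ones(3)))
```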

Heterogeneous systems
Homogeneous systems: one type of "building block", same type of decision borders. Ex: neural networks, SVMs, decision trees, kNNs... Committees combine many models together, but lead to complex models that are difficult to understand. Discovering the simplest class structure (the right inductive bias) requires heterogeneous adaptive systems (HAS). Ockham's razor: simpler systems are better.
HAS examples:
• NN with many types of neuron transfer functions.
• k-NN with different distance functions.
• DT with different types of test criteria.

GhostMiner Philosophy
GhostMiner: data mining tools from our lab, http://www.fqspl.com.pl/ghostminer/
• Separate the process of model building and knowledge discovery from model use => GhostMiner Developer & GhostMiner Analyzer.
• There is no free lunch – provide different types of tools for knowledge discovery: decision tree, neural, neurofuzzy, similarity-based, committees.
• Provide tools for visualization of data.
• Support the process of knowledge discovery/model building and evaluation, organizing it into projects.

Recurrence of breast cancer
Data from: Institute of Oncology, University Medical Center, Ljubljana, Yugoslavia. 286 cases: 201 no-recurrence (70.3%), 85 recurrence (29.7%).
Example record: no-recurrence-events, 40-49, premeno, 25-29, 0-2, ?, 2, left, right_low, yes
9 nominal features: age (9 bins), menopause, tumor-size (12 bins), nodes involved (13 bins), node-caps, degree-malignant (1, 2, 3), breast quadrant, radiation.

Recurrence of breast cancer
Data from: Institute of Oncology, University Medical Center, Ljubljana, Yugoslavia. Many systems used, 65-78% accuracy reported.
Single rule: IF (nodes-involved ∉ [0, 2]) AND (degree-malignant = 3) THEN recurrence, ELSE no-recurrence
76.2% accuracy; only trivial knowledge in the data: "Highly malignant breast cancer involving many nodes is likely to strike back."
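
The single rule as code: a sketch that assumes nodes-involved is the raw node count and that the rule fires outside the [0, 2] interval, as the verbal summary suggests:

```python
# Sketch of the single recurrence rule (feature encoding assumed).
def predict_recurrence(nodes_involved, degree_malignant):
    # IF (nodes-involved not in [0, 2]) AND (degree-malignant = 3)
    # THEN recurrence ELSE no-recurrence
    if nodes_involved > 2 and degree_malignant == 3:
        return "recurrence"
    return "no-recurrence"

print(predict_recurrence(5, 3))   # -> "recurrence"
```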

Recurrence - comparison
Method | 10xCV accuracy
MLP2LN, 1 rule | 76.2
SSV DT, stable rules | 75.7 ± 1.0
k-NN, k=10, Canberra | 74.1 ± 1.2
MLP + backprop | 73.5 ± 9.4 (Zarndt)
CART DT | 71.4 ± 5.0 (Zarndt)
FSM, Gaussian nodes | 71.7 ± 6.8
Naive Bayes | 69.3 ± 10.0 (Zarndt)
Other decision trees | < 70.0

Breast cancer diagnosis
Data from University of Wisconsin Hospital, Madison, collected by Dr. W. H. Wolberg. 699 cases, 9 cell features quantized from 1 to 10: clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, mitoses. Task: distinguish benign from malignant cases.

Breast cancer rules
Data from University of Wisconsin Hospital, Madison, collected by Dr. W. H. Wolberg.
Simplest rule from MLP2LN, large regularization:
IF uniformity of cell size < 3 THEN benign ELSE malignant
Sensitivity = 0.97, Specificity = 0.85.
More complex solutions (3 rules) give, in 10-fold CV: Sensitivity = 0.95, Specificity = 0.96, Accuracy = 0.96.

Breast cancer comparison
Method | 10xCV accuracy
k-NN, k=3, Manhattan | 97.0 ± 2.1 (GM)
FSM, neurofuzzy | 96.9 ± 1.4 (GM)
Fisher LDA | 96.8
MLP + backprop | 96.7 (Ster, Dobnikar)
LVQ | 96.6 (Ster, Dobnikar)
IncNet (neural) | 96.4 ± 2.1 (GM)
Naive Bayes | 96.4
SSV DT, 3 crisp rules | 96.0 ± 2.9 (GM)
LDA (linear discriminant) | 96.0
Various decision trees | 93.5 - 95.6

SSV HAS Wisconsin
A heterogeneous decision tree that searches not only for logical rules but also for prototype-based rules. A single P-rule gives the simplest known description of this data:
IF ||X - R303|| < 20.27 THEN malignant ELSE benign
18 errors, 97.4% accuracy. A good prototype for the malignant class! Simple thresholds, that's what MDs like the most!
Best L1O error: 98.3% (FSM), 97.5% (Naïve Bayes + kernel, SVM); best 10CV: 96.4 ± 2.1% (SSV without distances), C4.5 gives 94.7 ± 2.0%. Several simple rules of similar accuracy exist in CV tests.

Melanoma skin cancer
• Collected in the Outpatient Center of Dermatology in Rzeszów, Poland.
• Four types of melanoma: benign, blue, suspicious, or malignant.
• 250 cases, with almost equal class distribution.
• Each record in the database has 13 attributes: asymmetry, border, color (6), diversity (5).
• TDS (Total Dermatoscopy Score) – a single index.
• Goal: hardware scanner for preliminary diagnosis.

Melanoma results
Method | Rules | Training % | Test %
MLP2LN, crisp rules | 4 | 98.0 (all) | 100
SSV Tree, crisp rules | 4 | 97.5 ± 0.3 | 100
FSM, rectangular f. | 7 | 95.5 ± 1.0 | 100
kNN + prototype selection | 13 | 97.5 ± 0.0 | 100
FSM, Gaussian f. | 15 | 93.7 ± 1.0 | 95 ± 3.6
kNN, k=1, Manhattan, 2 features | -- | 97.4 ± 0.3 | 100
LERS, rough rules | 21 | -- | 96.2

Antibiotic activity of pyrimidine compounds
Pyrimidines: which compound has stronger antibiotic activity? Common template, substitutions added at 3 positions: R3, R4 and R5. 27 features are taken into account: polarity, size, hydrogen-bond donor or acceptor, pi-donor or acceptor, polarizability, sigma effect. Pairs of chemicals (54 features) are compared: which one has higher activity? 2788 cases, 5-fold cross-validation tests.

Antibiotic activity - results
Pyrimidines: which compound has stronger antibiotic activity? Mean Spearman's rank correlation coefficient used: -1 < rs < +1.
Method | Rank correlation
FSM, 41 Gaussian rules | 0.77 ± 0.03
Golem (ILP) | 0.68
Linear regression | 0.65
CART (decision tree) | 0.50

Thyroid screening
Garavan Institute, Sydney, Australia. 15 binary, 6 continuous features. Training: 93+191+3488; validation: 73+177+3178.
• Determine important clinical factors.
• Calculate the probability of each diagnosis.
[Network diagram: clinical findings (age, sex, ...) and lab tests (TSH, T4U, T3, TT4, TBG) feed hidden units; final diagnoses: normal, hypothyroid, hyperthyroid.]

Thyroid – some results
Accuracy of diagnoses obtained with different systems.
Method | Rules/Features | Training % | Test %
MLP2LN optimized | 4/6 | 99.9 | 99.36
CART/SSV decision trees | 3/5 | 99.8 | 99.33
Best backprop MLP | -/21 | 100 | 98.5
Naïve Bayes | -/- | 97.0 | 96.1
k-nearest neighbors | -/- | - | 93.8

Psychometry
MMPI (Minnesota Multiphasic Personality Inventory) psychometric test. Printed forms are scanned, or a computerized version of the test is used.
• Raw data: 550 questions, e.g.: "I am getting tired quickly: Yes - Don't know - No".
• Results are combined into 10 clinical scales and 4 validity scales using fixed coefficients.
• Each scale measures tendencies towards hypochondria, schizophrenia, psychopathic deviations, depression, hysteria, paranoia, etc.

Psychometry
• There is no simple correlation between single values and the final diagnosis.
• Results are displayed in the form of a histogram, called 'a psychogram'. Interpretation depends on the experience and skill of an expert and takes into account correlations between peaks.
Goal: an expert system providing evaluation and interpretation of MMPI tests at an expert level.
Problem: experts agree only about 70% of the time; alternative diagnoses and personality changes over time are important.

Psychometric data
1600 cases for women, the same number for men. 27 classes: norm, psychopathic, schizophrenia, paranoia, neurosis, mania, simulation, alcoholism, drug addiction, criminal tendencies, abnormal behavior due to...
Extraction of logical rules: 14 scales = features. Define linguistic variables and use FSM, MLP2LN, SSV – giving about 2-3 rules/class.

Psychometric data
Method | Data | N. rules | Accuracy | +Gx
C4.5 | ♀ | 55 | 93.0 | 93.7
C4.5 | ♂ | 61 | 92.5 | 93.1
FSM | ♀ | 69 | 95.4 | 97.6
FSM | ♂ | 98 | 95.9 | 96.9
10-fold CV accuracy for FSM is 82-85%, for C4.5 79-84%. Input uncertainty ±Gx of around 1.5% (best ROC) improves FSM results to 90-92%.

Psychometric Expert
Probabilities for different classes; for greater uncertainties more classes are predicted. Fitting the rules to the conditions: typically 3-5 conditions per rule; Gaussian distributions around measured values that fall into the rule interval are shown in green. Verbal interpretation of each case, rule- and scale-dependent.

Visualization
Probability of classes versus input uncertainty. Detailed input probabilities around the measured values vs. changes in a single scale; changes over time define the 'patient's trajectory'. Interactive multidimensional scaling: zooming in on the new case to inspect its similarity to other cases.

Conclusions
Data understanding is a challenging problem.
• Classification rules are frequently only the first step and may not be the best solution.
• Visualization is always helpful.
• P-rules may be competitive if complex decision borders are required, providing different types of rules.
• Understanding of complex objects is possible, although difficult, using adaptive costs and distance as the least expensive transformation (action principles in physics).
• Great applications are coming!

Challenges
Fully automatic universal data analysis systems: press the button and wait for the truth...
• Discovery of theories rather than data models.
• Integration with image/signal analysis.
• Integration with reasoning in complex domains.
• Combining expert systems with neural networks.
• ...
We are slowly getting there. More & more computational intelligence tools (including our own) are available.