Possibility Theory and its applications a retrospective and

Possibility Theory and its applications: a retrospective and prospective view D. Dubois, H. Prade IRIT -CNRS, Université Paul Sabatier 31062 TOULOUSE FRANCE

Outline • • Basic definitions Pioneers Qualitative possibility theory Quantitative possibility theory

Possibility theory is an uncertainty theory devoted to the handling of incomplete information. • similar to probability theory because it is based on setfunctions. • differs by the use of a pair of dual set functions (possibility and necessity measures) instead of only one. • it is not additive and makes sense on ordinal structures. The name "Theory of Possibility" was coined by Zadeh in 1978

The concept of possibility • Feasibility: It is possible to do something (physical) • Plausibility: It is possible that something occurs (epistemic) • Consistency : Compatible with what is known (logical) • Permission: It is allowed to do something (deontic)

POSSIBILITY DISTRIBUTIONS (uncertainty) • S: frame of discernment (set of "states of the world") • x : ill-known description of the current state of affairs taking its value on S • L: Plausibility scale: totally ordered set of plausibility levels ([0, 1], finite chain, integers, . . . ) • A possibility distribution πx attached to x is a mapping from S to L : s, πx(s) L, such that s, πx(s) = 1 (normalization) • Conventions: πx(s) = 0 iff x = s is impossible, totally excluded πx(s) = 1 iff x = s is normal, fully plausible, unsurprizing

EXAMPLE : x = AGE OF PRESIDENT • If I do not know the age of the president, I may have statistics on presidents ages… but generally not, or they may be irrelevant. • partial ignorance : – 70 ≤ x ≤ 80 (sets, intervals) a uniform possibility distribution π(x) = 1 x [70, 80] = 0 otherwise • partial ignorance with preferences : May have reasons to believe that 72 > 71 73 > 70 74 > 75 > 76 > 77

EXAMPLE : x = AGE OF PRESIDENT • Linguistic information described by fuzzy sets: “ he is old ” : π = µOLD • If I bet on president's age: I may come up with a subjective probability ! But this result is enforced by the setting of exchangeable bets (Dutch book argument). Actual information is often poorer.

A possibility distribution is the representation of a state of knowledge: a description of how we think the state of affairs is. • π' more specific than π in the wide sense if and only if π' ≤ π In other words: any value possible for π' should be at least as possible for π that is, π' is more informative than π • COMPLETE KNOWLEDGE : The most specific ones • π(s 0) = 1 ; π(s) = 0 otherwise • IGNORANCE : π(s) = 1, s S

POSSIBILITY AND NECESSITY OF AN EVENT • A possibility distribution on S (the normal values of x) • an event A How confident are we that x A S ? (A) = maxu A π(s); The degree of possibility that x A N(A) = 1 – (Ac)=min u A 1 – π(s) The degree of certainty (necessity) that x A

Comparing the value of a quantity x to a threshold when the value of x is only known to belong to an interval [a, b]. • In this example, the available knowledge is modeled by (x) = 1 if x [a, b], 0 otherwise. • Proposition p = "x > " to be checked • i) a > : then x > is certainly true : N(x > ) = 1. • ii) b < : then x > is certainly false ; N(x > ) = 0. • iii) a ≤ ≤ b: then x > is possibly true or false; N(x > ) = 0; (x > ) = 1.

Basic properties (A) = to what extent at least one element in A is consistent with π (= possible) N(A) = 1 – (Ac) = to what extent no element outside A is possible = to what extent π implies A (A B) = max( (A), (B)); N(A B) = min(N(A), N(B)). Mind that most of the time : (A B) < min( (A), (B)); N(A B) > max(N(A), N(B) Corollary N(A) > 0 (A) = 1

Pioneers of possibility theory • In the 1950’s, G. L. S. Shackle called "degree of potential surprize" of an event its degree of impossibility. • Potential surprize is valued on a disbelief scale, namely a positive interval of the form [0, y*], where y* denotes the absolute rejection of the event to which it is assigned. • The degree of surprize of an event is the degree of surprize of its least surprizing realization. • He introduces a notion of conditional possibility

Pioneers of possibility theory • In his 1973 book, the philosopher David Lewis considers a relation between possible worlds he calls "comparative possibility". • He relates this concept of possibility to a notion of similarity between possible worlds for defining the truth conditions of counterfactual statements. • for events A, B, C, A B C A C B. • The ones and only ordinal counterparts to possibility measures

Pioneers of possibility theory • The philosopher L. J. Cohen considered the problem of • • legal reasoning (1977). "Baconian probabilities" understood as degrees of provability. It is hard to prove someone guilty at the court of law by means of pure statistical arguments. A hypothesis and its negation cannot both have positive "provability" Such degrees of provability coincide with necessity measures.

Pioneers of possibility theory • Zadeh (1978) proposed an interpretation of membership functions of fuzzy sets as possibility distributions encoding flexible constraints induced by natural language statements. • relationship between possibility and probability: what is probable must preliminarily be possible. • refers to the idea of graded feasibility ("degrees of ease") rather than to the epistemic notion of plausibility. • the key axiom of "maxitivity" for possibility measures is highlighted (also for fuzzy events).

Qualitative vs. quantitative possibility theories • Qualitative: – comparative: A complete pre-ordering ≥π on U A well-ordered partition of U: E 1 > E 2 > … > En – absolute: πx(s) L = finite chain, complete lattice. . . • Quantitative: πx(s) [0, 1], integers. . . One must indicate where the numbers come from. All theories agree on the fundamental maxitivity axiom (A B) = max( (A), (B)) Theories diverge on the conditioning operation

Ordinal possibilistic conditioning • A Bayesian-like equation: A) = min( A), A) is the maximal solution to this equation. (B | A) = 1 if A, B ≠ Ø, (A) = (A B) > 0 = (A B) if (A) > (A B) N(B | A) = 1 – (Bc| A) • Independence (B | A) = (B) implies A) = min( ), Not the converse!!!!

QUALITATIVE POSSIBILISTIC REASONING • The set of states of affairs is partitioned via π into a totally ordered set of clusters of equally plausible states E 1 (normal worlds) > E 2 >. . . En+1 (impossible worlds) • ASSUMPTION: the current situation is normal. By default the state of affairs is in E 1 • N(A) > 0 iff (A) > (Ac) iff A is true in all the normal situations Then, A is accepted as an expected truth • Accepted events are closed under deduction

A CALCULUS OF PLAUSIBLE INFERENCE (B) ≥ (C) means « Comparing propositions on the basis of their most normal models » • ASSUMPTION for computing (B): the current situation is the most normal where B is true. • PLAUSIBLE REASONING = “ reasoning as if the current situation were normal” and jumping to accepted conclusions obtained from the normality assumption. • DIFFERENT FROM PROBABILISTIC REASONING BASED ON AVERAGING

ACCEPTANCE IS DEFEASIBLE • If B is learned to be true, then the normal situations become the most plausible ones in B, and the accepted beliefs are revised accordingly • Accepting A in the context where B is true: (A B) > (Ac B) iff N(A | B) > 0 (conditioning) • One may have N(A) > 0 , N(Ac | B) > 0 : non-monotony

PLAUSIBLE INFERENCE WITH A POSSIBILITY DISTRIBUTION Given a non-dogmatic possibility distribution π on S (π(s) > 0, s) Propositions A, and B • A =π B iff (A B) > (A Bc) It means that B is true in the most plausible worlds where A is true • This is a form of inference first proposed by Shoham in nonmonotonic reasoning

PLAUSIBLE INFERENCE WITH A POSSIBILITY DISTRIBUTION (in A)

Example (continued) • Pieces of knowledge like ∆ = {b f, p b, p ¬f} can be expressed by constraints (b f) > ( b ¬f) (p b) > (p ¬b) (p ¬f) > (p f) • the minimally specific π* ranks normal situations first: ¬p b f, ¬p ¬b • then abnormal situations: ¬f b • Last, totally absurd situations f p , ¬b p

Example (back to possibilistic logic) = material implication • Ranking of rules: b f has less priority that others according to *: N*(b f ) = N*(p b) > N*(b f) • Possibilistic base : K = {(b f ), (p b ), (p ¬f )}, with <

Applications of qualitative possibility theory • Exception-tolerant Reasoning in rule bases • Belief revision and inconsistency handling in deductive knowledge bases • Handling priority in constraint-based reasoning • Decision-making under uncertainty with qualitative criteria (scheduling) • Abductive reasoning for diagnosis under poor causal knowledge (satellite faults, car engine testbenches)

ABSOLUTE APPROACH TO QUALITATIVE DECISION • • A set of states S; A set of consequences X. A decision = a mapping f from S to X f(s) is the consequence of decision f when the state is known to be s. • Problem : rank-order the set of decisions in XS when the state is ill-known and there is a utility function on X. • This is SAVAGE framework.

ABSOLUTE APPROACH TO QUALITATIVE DECISION • Uncertainty on states is possibilistic a function π: S L L is a totally ordered plausibility scale • Preference on consequences: a qualitative utility function µ: X U – µ(x) = 0 – µ(y) > µ(x) – µ(x) = 1 totally rejected consequence y preferred to x preferred consequence

Possibilistic decision criteria • Qualitative pessimistic utility (Whalen): UPES(f) = mins S max(n(π(s)), µ(f(s))) where n is the order-reversing map of V – Low utility : plausible state with bad consequences • Qualitative optimistic utility (Yager): UOPT(f) = maxs S min(π(s), µ(f(s))) – High utility: plausible states with good consequences

The pessimistic and optimistic utilities are well-known fuzzy pattern-matching indices • in fuzzy expert systems: – µ = membership function of rule condition – π = imprecision of input fact • in fuzzy databases – µ = membership function of query – π = distribution of stored imprecise data • in pattern recognition – µ = membership function of attribute template – π = distribution of an ill-known object attribute

Assumption: plausibility and preference scales L and U are commensurate • There exists a common scale V that contains both L and U, so that confidence and uncertainty levels can be compared. – (certainty equivalent of a lottery) • If only a subset E of plausible states is known – π = E – UPES(f) = mins E µ(f(s)) (utility of the worst consequence in E) criterion of Wald under ignorance – UOPT(f)= maxs E µ(f(s))

On a linear state space

Pessimistic qualitative utility of binary acts x. Ay, with µ(x) > µ(y): • x. Ay (s) = x if A occurs = y if its complement Ac occurs UPES(x. Ay) = median {µ(x), N(A), µ(y)} • Interpretation: If the agent is sure enough of A, it is as if the consequence is x: UPES(f) = µF(x) If he is not sure about A it is as if the consequence is y: UPES(f) = µF(y) Otherwise, utility reflects certainty: UPES(f) = N(A) • WITH UOPT(f) : replace N(A) by (A)

Representation theorem for pessimistic possibilistic criteria • Suppose the preference relation a on acts obeys the following properties: • • • (XS, a) is a complete preorder. there are two acts such that f a g. A, f, x, y constant, x a y x. Af y. Af if f >a h and g >a h imply f g >a h if x is constant, h >a x and h >a g imply h >a x g then there exists a finite chain L, an L-valued necessity measure on S and an L-valued utility function u, such that a is representable by the pessimistic possibilistic criterion UPES(f).

Merits and limitations of qualitative decision theory • Provides a foundation for possibility theory • Possibility theory is justified by observing how a decision-maker ranks acts • Applies to one-shot decisions (no compensations/ accumulation effects in repeated decision steps) • Presupposes that consecutive qualitative value levels are distant from each other (negligibility effects)

Quantitative possibility theory • Membership functions of fuzzy sets – Natural language descriptions pertaining to numerical universes (fuzzy numbers) – Results of fuzzy clustering Semantics: metrics, proximity to prototypes • Upper probability bound – Random experiments with imprecise outcomes – Consonant approximations of convex probability sets Semantics: frequentist, subjectivist (gambles). . .

Quantitative possibility theory • Orders of magnitude of very small probabilities degrees of impossibility k(A) ranging on integers k(A) = n iff P(A) = en • Likelihood functions (P(A| x), where x varies) behave like possibility distributions P(A| B) ≤ maxx B P(A| x)

POSSIBILITY AS UPPER PROBABILITY • Given a numerical possibility distribution , define A} P( ) = {Probabilities P | P(A) ≤ (A) for all • Then, generally it holds that (A) = sup {P(A) | P P( )} N(A) = inf {P(A) | P P( )} • So is a faithful representation of a family of probability measures.

From confidence sets to possibility distributions Consider a nested family of sets E 1 E 2 … En a set of positive numbers a 1 …an in [0, 1] and the family of probability functions P = {P | P(Ei) ≥ ai for all i}. P is always representable by means of a possibility measure. Its possibility distribution is precisely πx = mini max(µEi, 1 – ai)

Random set view • Let mi = i – i+1 then m 1 +… + mn = 1 A basic probability assignment (SHAFER) • π(s) = ∑i: s Ai mi (one point-coverage function) • Only in the consonant case can m be recalculated from π

CONDITIONAL POSSIBILITY MEASURES • A Coxian axiom (A C) = (A |C)* (C), with * = product Then: (A |C) = (A C)/ (C) N(A| C) = 1 – (Ac | C) Dempster rule of conditioning (preserves s-maxitivity) For the revision of possibility distributions: minimal change of when N(C) = 1. It improves the state of information (reduction of focal elements)

Bayesian possibilistic conditioning (A |b C) = sup{P(A|C), P ≤ , P(C) > 0} N(A |b C) = inf{P(A|C), P ≤ , P(C) > 0} It is still a possibility measure π(s |b C) = π(s) max(1, 1 /( π(s) + N(C))) It can be shown that: (A |b C) = (A C)/ ( (A C) + N(Ac C)) N(A|b C) = N(A C) / (N(A C) + (Ac C)) = 1 – (Ac |b C) For inference from generic knowledge based on observations

Possibility-Probability transformations • Why ? – fusion of heterogeneous data – decision-making : betting according to a possibility distribution leads to probability. – Extraction of a representative value – Simplified non-parametric imprecise probabilistic models

Elementary forms of probability-possibility transformations exist for a long time • POSS PROB: Laplace indifference principle “ All that is equipossible is equiprobable ” = changing a uniform possibility distribution into a uniform probability distribution • PROB POSS: Confidence intervals Replacing a probability distribution by an interval A with a confidence level c. – It defines a possibility distribution – π(x) = 1 if x A, = 1 – c if x A

Possibility-Probability transformations : BASIC PRINCIPLES • • • Possibility probability consistency: P ≤ Preserving the ordering of events : P(A) ≥ P(B) (A) ≥ (B) or elementary events only (x) > (x') if and only if p(x) > p(x') (order preservation) Informational criteria: from to P: Preservation of symmetries (Shapley value rather than maximal entropy) from P to : optimize information content (Maximization or minimisation of specificity

From OBJECTIVE probability to possibility : • Rationale : given a probability p, try and preserve as much information as possible • Select a most specific element of the set PI(P) = { : ≥ P} of possibility measures PI dominating P such that (x) > (x') iff p(x) > p(x') • may be weakened into : p(x) > p(x') implies (x) > (x') • The result is i = j=i, …n pi (case of no ties)

From probability to possibility : Continuous case • The possibility distribution obtained by transforming p encodes then family of confidence intervals around the mode of p. • The -cut of is the (1 - )-confidence interval of p • The optimal symmetric transform of the uniform probability distribution is the triangular fuzzy number • The symmetric triangular fuzzy number (STFN) is a covering approximation of any probability with unimodal symmetric density p with the same mode. • In other words the -cut of a STFN contains the (1 - )confidence interval of any such p.

From probability to possibility : Continuous case • IL = {x, p(x) ≥ } = [a. L, a. L+ L] is the interval of length L with maximal probability • The most specific possibility distribution dominating p is π such that L > 0, π(a. L) = π(a. L+ L) = 1 – P(IL).

Possibilistic view of probabilistic inequalities • Chebyshev inequality defines a possibility distribution that dominates any density with given mean and variance. • The symmetric triangular fuzzy number (STFN) defines a possibility distribution that optimally dominates any symmetric density with given mode and bounded support.

From possibility to probability • Idea (Kaufmann, Yager, Chanas): –Pick a number in [0, 1] at random –Pick an element at random in the -cut of π. a generalized Laplacean indifference principle : change alpha-cuts into uniform probability distributions. • Rationale : minimise arbitrariness by preserving the symmetry properties of the representation. • The resulting probability distribution is: • The centre of gravity of the polyhedron P( • The pignistic transformation of belief functions (Smets) • The Shapley value of the unanimity game N in game theory.

SUBJECTIVE POSSIBILITY DISTRIBUTIONS • Starting point : exploit the betting approach to subjective probability • A critique: The agent is forced to be additive by the rules of exchangeable bets. – For instance, the agent provides a uniform probability distribution on a finite set whether (s)he knows nothing about the concerned phenomenon, or if (s)he knows the concerned phenomenon is purely random. • Idea : It is assumed that a subjective probability supplied by an agent is only a trace of the agent's belief.

SUBJECTIVE POSSIBILITY DISTRIBUTIONS • Assumption 1: Beliefs can be modelled by belief functions – (masses m(A) summing to 1 assigned to subsets A). • Assumption 2: The agent uses a probability function induced by his or her beliefs, using the pignistic transformation (Smets, 1990) or Shapley value. • Method : reconstruct the underlying belief function from the probability provided by the agent by choosing among the isopignistic ones.

SUBJECTIVE POSSIBILITY DISTRIBUTIONS – There are clearly several belief functions with a prescribed Shapley value. • Consider the least informative of those, in the sense of a non-specificity index (expected cardinality of the random set) I(m) = ∑ m(A) card(A). • RESULT : The least informative belief function whose Shapley value is p is unique and consonant.

SUBJECTIVE POSSIBILITY DISTRIBUTIONS • The least specific belief function in the sense of maximizing I(m) is characterized by i = j=1, n min(pj, pi). • It is a probability-possibility transformation, previously suggested in 1983: This is the unique possibility distribution whose Shapley value is p. • It gives results that are less specific than the confidence interval approach to objective probability.

Applications of quantitative possibility • Representing incomplete probabilistic data for uncertainty propagation in computations • (but fuzzy interval analysis based on the extension principle differs from conservative probabilistic risk analysis) • Systematizing some statistical methods (confidence intervals, likelihood functions, probabilistic inequalities) • Defuzzification based on Choquet integral (linear with fuzzy number addition)

Applications of quantitative possibility • Uncertain reasoning : Possibilistic nets are a counterpart to Bayesian nets that copes with incomplete data. Similar algorithmic properties under Dempster conditioning (Kruse team) • Data fusion : well suited for merging heterogeneous information on numerical data (linguistic, statistics, confidence intervals) (Bloch) • Risk analysis : uncertainty propagation using fuzzy arithmetics, and random interval arithmetics when statistical data is incomplete (Lodwick, Ferson) • Non-parametric conservative modelling of imprecision in measurements (Mauris)

Perspectives Quantitative possibility is not as well understood as probability theory. • Objective vs. subjective possibility (a la De Finetti) • How to use possibilistic conditioning in inference tasks ? • Bridge the gap with statistics and the confidence interval literature (Fisher, likelihood reasoning) • Higher-order modes of fuzzy intervals (variance, …) and links with fuzzy random variables • Quantitative possibilistic expectations : decision-theoretic characterisation ?

Conclusion • Possibility theory is a simple and versatile tool for modeling uncertainty • A unifying framework for modeling and merging linguistic knowledge and statistical data • Useful to account for missing information in reasoning tasks and risk analysis • A bridge between logic-based AI and probabilistic reasoning

Properties of inference |= • A |=π A if A ≠ Ø (restricted reflexivity) • if A ≠ Ø, then A |=π Ø never holds (consistency preservation) • The set {B: A |= π B} is deductively closed -If A B and C |=π A then C |=π B (right weakening rule RW) -If A |=π B and A |=π Cthen A |=π B C (Right AND)

REPRESENTATION THEOREM FOR POSSIBILISTIC ENTAILMENT • Let |= be a consequence relation on 2 S x 2 S • Define an induced partial relation on subsets as A > B iff A B |= Bc for A ≠ • Theorem: If |= satisfies restricted reflexivity, right weakening, rational monotony, Right AND and Left OR, then A > B is the strict part of a possibility relation on events. So a consequence relation satisfying the above properties is representable by possibilistic inference, and induces a complete plausibility preordering on the states.

A POSSIBILISTIC APPROACH TO MODELING RULES • A generic rule « if A then B » is modelled by (A B) > (Ac B). • This is a constraint that delimits a set of possibility distributions on the set of interpretations of the language • Applying the minimal specificity principle: B). (A B) = (A Bc ) = (Ac Bc ) > (Ac

MODELLING A SET OF DEFAULT RULES as a POSSIBILITY DISTRIBUTION • ∆ = {Ai Bi, i = 1, n} • ∆ defines a set of constraints on possibility distributions (Ai Bi) > (Ai ¬Bi), i = 1, …n • (∆) = set of feasible π's with respect to ∆ • One may compute * : the least specific possibility distribution in (∆)

Plausible inference from a set of default rules What « ∆ implies A B » means • Cautious inference ∆ = A B iff For all (∆), (A B) > (Ac B). • Possibilistic inference ∆ =* A B iff *(A B) > *(Ac B) for the least specific possibility measure in (∆). Leads to a stratification of ∆ according to N*(Ac B)

Possibilistic logic • A possibilistic knowledge base is an ordered set of propositional or 1 st order formulas pi • K = {(pi i), i = 1, n} where i > 0 is the level of priority or validity of pi i = 1 means certainty. i = 0 means ignorance • Captures the idea of uncertain knowledge in an ordinal setting

Possibilistic logic • Axiomatization: All axioms of classical logic with weight 1 Weighted modus ponens {(p ), (¬p q )} |- (q min( , )) OLD! Goes back to Aristotle school Idea: the validity of a chain of uncertain deductions is the validity of its weakest link Syntactic inference K |-(p ) is well-defined

Possibilistic logic • Inconsistency becomes a graded notion inc(K) = sup{ , K |- ( , )} • Refutation and resolution methods extend K |- (p ) iff K {( p 1)} |- ( , ) • Inference with a partially inconsistent knowledge base becomes non-trivial and nonmonotonic K |-nt p iff K |- (p ) and > inc(K)

Semantics of possibilistic logic • A weighted formula has a fuzzy set of models. • If A = [p] is the set of models of p (subset of S), • |-(p ) means N(A) ≥ The least specific possibility distribution induced by |-(p ) is: π(p )(s) = max(µA(s), 1 – ) = 1 if p is true in state s = 1 – if p is false in state s

Semantics of possibilistic logic • The fuzzy set of models of K is the intersection of the fuzzy sets of models of {(pi i), i = 1, n} • πK(s)= mini=1, n {1 – i | s [pi]} determined by the highest priority formula violated by s • The p. d. πK is the least informed state of partial knowledge compatible with K

Soundness and completeness • Monotonic semantic entailment follows Zadeh’s entailment principle K |= (p, ) stands for πK ≤ π(p a) Theorem: K |- (p, ) iff K |= (p ) • For the non-trivial inference under inconsistency: {(p 1)} K |-nt q iff (q p) > (¬q p)

Possibilistic vs. fuzzy logics • Possibilistic logic – Formulas are Boolean – Truth is 2 -valued – Weighted formulas have fuzzy sets of models – Validity is many-valued – degrees of validity are not compositional except for conjunctions – Represents uncertainty • Fuzzy logic (Pavelka) – Formulas are non-Boolean – Truth is many-valued – Weighted formulas have crisp sets of models (cuts) – Validity is Boolean – degrees of truth are compositional – represents real functions by means of logical formulas

Example: IF BIRD THEN FLY; IF PENGUIN THEN BIRD; IF PENGUIN THEN NOT-FLY • K = {b f, p b, p ¬f} = material implication • K {b} |- f; K {p} |- contradiction • using possibilistic logic: < min( , ) K = {(b f ), (p b ), (p ¬f )} then K {(b, 1)} |- (f ) and K {(b, 1)} |-nt f • Inc(K {(p, 1), (b, 1)} = • K {(p, 1), (b, 1)} |- (¬f, min( , )) • Hence K {(p, 1), (b, 1)} |-nt ¬f