Statistical Decision Theory
Abraham Wald (1902-1950)
• Wald’s test
• Rigorous proof of the consistency of the MLE: “Note on the consistency of the maximum likelihood estimate”, Ann. Math. Statist., 20, 595-601.

Statistical Decision Theory
Classical statistics is directed only towards using sampling information to make inferences about unknown numerical quantities. Decision theory instead attempts to combine the sampling information with knowledge of the consequences of our decisions. A major use of statistical inference is decision making under uncertainty, for example parameter estimation and hypothesis testing.

Three elements in SDT
• State of nature θ: some unknown quantities, say parameters, taking values in Θ.
• Decision space D: the space of all possible decisions/actions/rules/estimators.
• Loss function L(θ, d(X)):
– a non-negative function on Θ × D;
– a measure of how much we lose by choosing action d when θ is the true state of nature;
– in estimation, a measure of the accuracy of an estimator d of θ.

For example,
θ = 0 means “nuclear warhead is NOT headed to UBC”
θ = 1 means “nuclear warhead is headed to UBC”
D = {0, 1} = {Stay in Vancouver, Leave}
L(θ, d):
L(0, 0) = 0
L(0, 1) = cost of moving
L(1, 1) = cost of moving + cost of belongings we cannot move
L(1, 0) = loss of belongings +

Common loss functions
• Univariate
– L1 = |θ – d(x)| (absolute error loss)
– L2 = (θ – d(x))² (squared error loss)
• Multivariate
– (Generalized) Euclidean norm: [θ – d(x)]ᵀ Q [θ – d(x)], where Q is positive definite
• More generally,
– Non-decreasing functions of L1 or the Euclidean norm
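The loss functions listed above can be written down directly. The following is a minimal sketch in plain Python; the function names and the example matrix Q are illustrative choices, not part of the slides.

```python
def absolute_error_loss(theta, d):
    """L1 loss: |theta - d|."""
    return abs(theta - d)


def squared_error_loss(theta, d):
    """L2 loss: (theta - d)^2."""
    return (theta - d) ** 2


def generalized_euclidean_loss(theta, d, Q):
    """Quadratic form [theta - d]^T Q [theta - d] for vectors given as lists.

    Q is a list of rows and is assumed (not checked) to be positive definite.
    """
    diff = [t - e for t, e in zip(theta, d)]
    size = len(diff)
    return sum(diff[i] * Q[i][j] * diff[j] for i in range(size) for j in range(size))


print(absolute_error_loss(2.0, 0.5))  # 1.5
print(squared_error_loss(2.0, 0.5))   # 2.25
Q = [[2.0, 0.0], [0.0, 1.0]]          # hypothetical weight matrix
print(generalized_euclidean_loss([1.0, 2.0], [0.0, 0.0], Q))  # 2*1 + 1*4 = 6.0
```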

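The loss function L(θ, d(X)) is random, so we take its expectation:
• Frequentist: the risk R(θ, d) = Eθ[L(θ, d(X))], averaging over X for fixed θ.
• Bayesian: the posterior risk E[L(Θ, d(x)) | x], averaging over Θ given the observed x.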

Estimator Comparison
The risk principle: the estimator d1(X) is better than another estimator d2(X) in the sense of risk if R(θ, d1) ≤ R(θ, d2) for all θ, with strict inequality for some θ.
Best estimator (uniformly minimum risk estimator): d*(X) = arg min_d R(θ, d(X)) for all θ.
However, in general, it does not exist.
[Figure: the class of all estimators.]
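As a rough illustration of the risk principle, the sketch below computes exact squared-error risks in a binomial model and checks domination on a grid of θ values. The model, the two estimators and the helper names `binomial_risk` and `dominates` are assumptions made for this example. A grid check is only numerical evidence, not a proof for all θ.

```python
from math import comb


def binomial_risk(estimator, theta, n):
    """Exact risk E_theta[(d(X) - theta)^2] for X ~ Binomial(n, theta)."""
    return sum(
        comb(n, x) * theta**x * (1 - theta) ** (n - x) * (estimator(x) - theta) ** 2
        for x in range(n + 1)
    )


def dominates(d1, d2, n, grid):
    """Grid check: d1 is never worse than d2 and strictly better somewhere."""
    r1 = [binomial_risk(d1, t, n) for t in grid]
    r2 = [binomial_risk(d2, t, n) for t in grid]
    return all(a <= b for a, b in zip(r1, r2)) and any(a < b for a, b in zip(r1, r2))


n = 10
grid = [i / 100 for i in range(101)]
sample_proportion = lambda x: x / n   # d1(X) = X/n
shifted = lambda x: x / n + 0.1       # d2(X) = X/n + 0.1, deliberately biased

print(dominates(sample_proportion, shifted, n, grid))  # True
print(dominates(shifted, sample_proportion, n, grid))  # False
```

Here the shifted estimator adds a squared bias of 0.01 at every θ, so X/n is better in the sense of risk.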

The class of all estimators is too large. Shrink the class of estimators, then find the best estimator in this smaller class.
For instance, only consider mean-unbiased estimators. In particular, the UMVUE is the best unbiased estimator when L2 is used.
[Figure: a smaller class of estimators inside the larger class of estimators; its best member says “I am the best!!”]

Notice that the risk depends on θ, so the risks of two estimators often cross each other. This is another reason why a best estimator may not exist.
Weaken the optimality criterion by considering the maximum value of the risk over all θ, then choose the estimator with the smallest maximum risk. The best estimator according to this minimax principle is called the minimax estimator.
[Figure: R(θ, d) against θ for two estimators; d1 is the winner, d2 the loser.]
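The minimax comparison can be sketched numerically. The example below assumes a binomial model with squared error loss and three hypothetical candidate estimators. It approximates each maximum risk on a grid of θ values and picks the smallest. The winner is only minimax within this small candidate set, not among all estimators.

```python
from math import comb


def binomial_risk(estimator, theta, n):
    """Exact risk E_theta[(d(X) - theta)^2] for X ~ Binomial(n, theta)."""
    return sum(
        comb(n, x) * theta**x * (1 - theta) ** (n - x) * (estimator(x) - theta) ** 2
        for x in range(n + 1)
    )


n = 16
grid = [i / 100 for i in range(101)]
candidates = {
    "sample_proportion": lambda x: x / n,
    "add_one_shrinkage": lambda x: (x + 1) / (n + 2),
    "always_half": lambda x: 0.5,
}

# Maximum risk over the grid for each candidate, then the smallest maximum.
max_risk = {
    name: max(binomial_risk(d, t, n) for t in grid) for name, d in candidates.items()
}
minimax_name = min(max_risk, key=max_risk.get)
print(minimax_name)  # add_one_shrinkage

# The risks cross: X/n wins near the boundary and loses in the middle.
risk_at_0 = {name: binomial_risk(d, 0.0, n) for name, d in candidates.items()}
risk_at_half = {name: binomial_risk(d, 0.5, n) for name, d in candidates.items()}
print(risk_at_0["sample_proportion"] < risk_at_0["add_one_shrinkage"])        # True
print(risk_at_half["sample_proportion"] > risk_at_half["add_one_shrinkage"])  # True
```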

Alternatively, in the Bayesian framework we can find the best estimator by minimizing the average risk with respect to a prior π of θ.
Given a prior π of θ, the average risk of the estimator d(X), defined by
rπ(d) = Eπ[R(Θ, d)] = ∫ R(θ, d) π(θ) dθ,
is called the Bayes risk. The estimator having the smallest Bayes risk for a specific prior π is called the Bayes estimator (with respect to π).
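To make the averaging concrete, the sketch below approximates the Bayes risk by integrating the risk against the prior density with a midpoint rule. The binomial model, the uniform prior and the helper name `bayes_risk` are assumptions for this example. The second estimator, (X + 1)/(n + 2), is the posterior mean under the uniform prior, which is the standard Bayes estimator for squared error loss.

```python
from math import comb


def binomial_risk(estimator, theta, n):
    """Exact risk E_theta[(d(X) - theta)^2] for X ~ Binomial(n, theta)."""
    return sum(
        comb(n, x) * theta**x * (1 - theta) ** (n - x) * (estimator(x) - theta) ** 2
        for x in range(n + 1)
    )


def bayes_risk(estimator, n, prior_pdf, steps=2000):
    """Midpoint-rule approximation of the integral of R(theta, d) * pi(theta) over (0, 1)."""
    h = 1.0 / steps
    return sum(
        binomial_risk(estimator, (k + 0.5) * h, n) * prior_pdf((k + 0.5) * h) * h
        for k in range(steps)
    )


n = 10
uniform_prior = lambda theta: 1.0  # hypothetical prior: Uniform(0, 1)

r_sample_proportion = bayes_risk(lambda x: x / n, n, uniform_prior)
r_posterior_mean = bayes_risk(lambda x: (x + 1) / (n + 2), n, uniform_prior)
print(r_sample_proportion)  # close to 1/60
print(r_posterior_mean)     # close to 1/72, the smaller of the two
```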

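Under the Bayes risk principle
[Figure: risk curves R(θ, d1) and R(θ, d2) against θ, drawn together with the prior π(θ); one estimator is labeled Winner and the other Loser.]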

In general, it is not easy to find the Bayes estimator by minimizing the Bayes risk directly. However, if the Bayes risk of the Bayes estimator is finite, then the estimator that minimizes the posterior risk E[L(Θ, d) | x] for each observed x is the Bayes estimator.

Some examples for finding the Bayes estimator
(2) Absolute error loss: min_d E[|Θ - d| | x]. The minimizer is med[Θ | x], i.e. the posterior median.
(3) Linear error loss: L(θ, d) = K0(θ - d) if θ - d ≥ 0, and L(θ, d) = K1(d - θ) if θ - d < 0. The K0/(K0 + K1)-th quantile of the posterior is the Bayes estimator of θ.
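Both minimizers can be checked numerically on draws from a posterior. The sketch below assumes a hypothetical Beta(8, 4) posterior and estimates the posterior risk by averaging the loss over the draws. The helper names and the constants K0 = 3, K1 = 1 are illustrative. Because the estimated risk is piecewise linear and convex in d, it is enough to search over the draws themselves.

```python
import math
import random
import statistics


def linear_loss(theta, d, k0, k1):
    """L(theta, d) = k0*(theta - d) if theta - d >= 0, else k1*(d - theta)."""
    return k0 * (theta - d) if theta - d >= 0 else k1 * (d - theta)


def posterior_risk(d, draws, loss):
    """Monte Carlo estimate of E[L(Theta, d) | x] from posterior draws."""
    return sum(loss(t, d) for t in draws) / len(draws)


random.seed(0)
# Hypothetical posterior: Beta(8, 4), e.g. 7 successes in 10 trials with a uniform prior.
draws = [random.betavariate(8, 4) for _ in range(501)]

k0, k1 = 3.0, 1.0
best_abs = min(draws, key=lambda d: posterior_risk(d, draws, lambda t, e: abs(t - e)))
best_lin = min(
    draws, key=lambda d: posterior_risk(d, draws, lambda t, e: linear_loss(t, e, k0, k1))
)

ordered = sorted(draws)
p = k0 / (k0 + k1)                                   # quantile level K0/(K0 + K1)
sample_median = statistics.median(draws)
sample_quantile = ordered[math.ceil(len(draws) * p) - 1]

print(best_abs, sample_median)    # the two values agree
print(best_lin, sample_quantile)  # the two values agree
```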

Relationship between minimax and Bayes estimators
Denote by dπ the Bayes estimator with respect to π. If the Bayes risk of dπ is equal to the maximum risk of dπ, i.e.
rπ(dπ) = sup_θ R(θ, dπ),
then the Bayes estimator dπ is minimax. In particular, if the Bayes estimator has a constant risk, then it is minimax.
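A classical instance of the constant-risk case is the binomial model with squared error loss. Under a Beta(√n/2, √n/2) prior, the posterior mean (X + √n/2)/(n + √n) has a risk that does not depend on θ. The model and prior are assumptions chosen for this sketch; the code only verifies the constant risk numerically on a grid.

```python
from math import comb, sqrt


def binomial_risk(estimator, theta, n):
    """Exact risk E_theta[(d(X) - theta)^2] for X ~ Binomial(n, theta)."""
    return sum(
        comb(n, x) * theta**x * (1 - theta) ** (n - x) * (estimator(x) - theta) ** 2
        for x in range(n + 1)
    )


n = 16
a = sqrt(n) / 2  # Beta(a, a) prior; its posterior mean is (x + a) / (n + 2a)
bayes_estimator = lambda x: (x + a) / (n + 2 * a)

grid = [i / 100 for i in range(101)]
risks = [binomial_risk(bayes_estimator, t, n) for t in grid]
constant_value = 1 / (4 * (1 + sqrt(n)) ** 2)

print(max(risks) - min(risks))    # essentially zero: the risk is constant in theta
print(risks[0], constant_value)   # both close to 0.01

# For comparison, the maximum risk of X/n over the grid is larger.
max_risk_sample_proportion = max(binomial_risk(lambda x: x / n, t, n) for t in grid)
print(max_risk_sample_proportion)  # about 1/(4n) = 0.015625
```

Since the risk is constant, averaging it against any prior gives the same value, so the Bayes risk equals the maximum risk and the condition above holds.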

Problems for the risk measure:
• The risk measure is too sensitive to the choice of loss function.
• All estimators are assumed to have finite risks.
So, in general, the risk measure cannot be used in problems with heavy tails or outliers.

Other measures:
(1) Pitman measure of closeness, PMC: d1 is Pitman-closer to θ than d2 if
Pθ( |d1(X) - θ| ≤ |d2(X) - θ| ) ≥ 1/2
holds for all θ.
[Figure: θ together with the estimates d1 and d2.]
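The Pitman closeness probability can be computed exactly in a small discrete model. The sketch below assumes that closeness is measured by |d(X) - θ| with a non-strict inequality, and uses a binomial model with two hypothetical estimators. The helper name `pitman_closeness` is illustrative.

```python
from math import comb


def pitman_closeness(d1, d2, theta, n):
    """Exact P_theta(|d1(X) - theta| <= |d2(X) - theta|) for X ~ Binomial(n, theta)."""
    return sum(
        comb(n, x) * theta**x * (1 - theta) ** (n - x)
        for x in range(n + 1)
        if abs(d1(x) - theta) <= abs(d2(x) - theta)
    )


n = 16
d1 = lambda x: x / n              # sample proportion
d2 = lambda x: (x + 1) / (n + 2)  # shrinks towards 1/2

for theta in (0.0, 0.1, 0.5):
    print(theta, pitman_closeness(d1, d2, theta, n))
```

At θ = 0 the probability is 1, while at θ = 0.5 it is only P(X = 8), which is below 1/2. So in this toy setup the condition does not hold for all θ, and X/n is not Pitman-closer than (X + 1)/(n + 2).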

Other measures:
(2) Universal domination, u.d.: d1(X) is said to universally dominate d2(X) if, for all nondecreasing functions h and all θ,
Eθ[h(||d1(X) - θ||Q)] ≤ Eθ[h(||d2(X) - θ||Q)].
(3) Stochastic domination, s.d.: d1(X) is said to stochastically dominate d2(X) if, for every c > 0 and all θ,
Pθ[||d1(X) - θ||Q ≤ c] ≥ Pθ[||d2(X) - θ||Q ≤ c].
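A simple case where stochastic domination holds can be checked in closed form. The sketch below assumes univariate observations from N(θ, 1) with Q = 1, and compares the mean of four observations (d1) with a single observation (d2). For a normal error with standard deviation σ, P(|d(X) - θ| ≤ c) = erf(c / (σ√2)), which does not depend on θ. The helper name `coverage` and the grid of c values are illustrative.

```python
from math import erf, sqrt


def coverage(c, sigma):
    """P(|d(X) - theta| <= c) when d(X) - theta ~ N(0, sigma^2)."""
    return erf(c / (sigma * sqrt(2)))


sigma_mean_of_4 = 1 / sqrt(4)  # d1: mean of 4 observations from N(theta, 1)
sigma_single = 1.0             # d2: a single observation

cs = [0.1 * k for k in range(1, 41)]
d1_dominates = all(
    coverage(c, sigma_mean_of_4) >= coverage(c, sigma_single) for c in cs
)
print(d1_dominates)  # True
```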

Problems:
(1) Pitman measure of closeness, PMC:
[Figure: θ together with three estimators d1, d2 and d3.]
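One well-known difficulty with PMC is that it need not be transitive. Assuming that this is the problem the three-estimator picture refers to, the toy example below shows a cycle. The model (three equally likely values of X, θ = 0, and the table of estimates) is entirely hypothetical and chosen only to make the cycle visible.

```python
from fractions import Fraction

theta = 0
# Hypothetical toy model: X takes three equally likely values, and each row gives
# the estimates (d1(X), d2(X), d3(X)) for one value of X.
outcomes = [(1, 2, 3), (2, 3, 1), (3, 1, 2)]


def prob_closer(i, j):
    """P(|d_i(X) - theta| < |d_j(X) - theta|) under the uniform distribution on outcomes."""
    hits = sum(1 for row in outcomes if abs(row[i] - theta) < abs(row[j] - theta))
    return Fraction(hits, len(outcomes))


print(prob_closer(0, 1))  # 2/3: d1 is closer than d2
print(prob_closer(1, 2))  # 2/3: d2 is closer than d3
print(prob_closer(2, 0))  # 2/3: yet d3 is closer than d1
```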

Problems:
(2) Universal domination, u.d.: d1(X) is said to universally dominate d2(X) if, for all nondecreasing functions h and all θ,
Eθ[h(||d1(X) - θ||Q)] ≤ Eθ[h(||d2(X) - θ||Q)].
Expectation is a linear operator, so it commutes with h only when h is linear: h(t) = at + b, a > 0.

Replace the expectation E with an operator T: for all nondecreasing functions h, consider
Tθ[h(||d1(X) - θ||Q)],
where T has the property that T[h(y)] = h[T(y)].
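The median is one operator with this commuting property. The source does not name T, so taking T to be the median is an assumption here. The sketch below checks the property on a small, odd-sized sample of hypothetical error values and contrasts it with the mean, which does not commute with a nonlinear h.

```python
import statistics


def h(t):
    """A nondecreasing function on t >= 0 (hypothetical choice)."""
    return t * t


errors = [1.0, 2.0, 6.0]  # hypothetical values of ||d(X) - theta||
transformed = [h(e) for e in errors]

median_of_h = statistics.median(transformed)
h_of_median = h(statistics.median(errors))
print(median_of_h, h_of_median)  # 4.0 4.0

mean_of_h = statistics.mean(transformed)
h_of_mean = h(statistics.mean(errors))
print(mean_of_h, h_of_mean)      # about 13.67 versus 9.0
```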

Thank You!!