Generalized pairwise comparisons of prioritized outcomes Marc Buyse

Marc Buyse, ScD
marc.buyse@iddi.com

Outline
• The Wilcoxon test, and generalizations
• Generalized pairwise comparisons
• Universal measure of treatment effect
• An example
• Conclusions

General Setup
Eligible subjects are randomized (R) to Treatment (T) or Control (C).
Let $X_i$ be the continuous outcome of the $i$-th subject in T ($i = 1, \dots, n$).
Let $Y_j$ be the continuous outcome of the $j$-th subject in C ($j = 1, \dots, m$).

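The Mann-Whitney form of the Wilcoxon test
The Wilcoxon test statistic can be derived from all possible pairs of subjects, one from T and one from C. Let
$$U_{ij} = \begin{cases} +1 & \text{if } X_i > Y_j \\ -1 & \text{if } X_i < Y_j \\ 0 & \text{if } X_i = Y_j \end{cases}$$
Then
$$U = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} U_{ij}$$
is the Wilcoxon-Mann-Whitney test statistic.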
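The pairwise construction can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual scoring of +1 when the treated subject has the larger outcome, -1 when it has the smaller one, and 0 for a tie; the function names `pair_score` and `mann_whitney_u` and the data are hypothetical, not taken from the slides.

```python
from itertools import product


def pair_score(x, y):
    """Score one (treatment, control) pair: +1 if x > y, -1 if x < y, 0 if tied."""
    return (x > y) - (x < y)


def mann_whitney_u(treatment, control):
    """Average the pair scores over all n * m treatment-control pairs."""
    scores = [pair_score(x, y) for x, y in product(treatment, control)]
    return sum(scores) / len(scores)


# Hypothetical data, for illustration only.
treatment = [5.1, 6.3, 7.0]
control = [4.8, 6.3]
u = mann_whitney_u(treatment, control)  # 4 favorable, 1 unfavorable, 1 tied pair
```

Dividing by the number of pairs puts the statistic on a scale from -1 to +1, which is the scale used for the treatment effect measure later in the talk.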

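Gehan generalized the Wilcoxon test
The Wilcoxon test can be generalized to the case of censored outcomes. Letting $\tilde{X}_i$ and $\tilde{Y}_j$ denote censored observations, the pairwise comparison indicator is now
$$U_{ij} = \begin{cases} +1 & \text{if } X_i > Y_j \text{ or } \tilde{X}_i \ge Y_j \\ -1 & \text{if } X_i < Y_j \text{ or } X_i \le \tilde{Y}_j \\ 0 & \text{otherwise} \end{cases}$$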

First, generalize the test further for a single outcome measure
Now let $X_i$ and $Y_j$ be observed outcomes for ANY outcome measure (continuous, time to event, binary, categorical, …). The pairwise comparison of $X_i$ with $Y_j$ has one of four results:
• favors T (favorable)
• favors C (unfavorable)
• neutral
• uninformative

Binary outcome measure
Pairwise comparison                              Pair is
$X_i = 1$, $Y_j = 0$                             favorable
$X_i = 1$, $Y_j = 1$ or $X_i = 0$, $Y_j = 0$     neutral
$X_i = 0$, $Y_j = 1$                             unfavorable
$X_i$ or $Y_j$ missing                           uninformative

Continuous outcome measure
Pairwise comparison            Pair is
$X_i - Y_j > \tau$ *           favorable
$|X_i - Y_j| \le \tau$ *       neutral
$X_i - Y_j < -\tau$ *          unfavorable
$X_i$ or $Y_j$ missing         uninformative
* $\tau$ chosen to reflect clinical relevance; $\tau = 0$ is the Wilcoxon test
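The threshold rule for a continuous outcome can be written as a small function. The sketch below assumes that larger values are better for the patient and that a missing value is coded as `None`; the name `classify_continuous` and the parameter `tau` (the clinical relevance threshold) are illustrative choices.

```python
def classify_continuous(x, y, tau=0.0):
    """Classify a pair on a continuous outcome with threshold tau >= 0.

    x is the treated subject's value and y the control's value.
    None marks a missing value (an assumption of this sketch).
    """
    if x is None or y is None:
        return "uninformative"
    diff = x - y
    if diff > tau:
        return "favorable"
    if diff < -tau:
        return "unfavorable"
    return "neutral"  # the difference lies within [-tau, tau]


# Hypothetical values: a difference of 3 exceeds a threshold of 2.
label = classify_continuous(10, 7, tau=2)
```

With `tau=0` only exact ties are neutral, which is the scoring of the Wilcoxon test.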

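Time to event outcome measure
Pairwise comparison                                          Pair is
$X_i - Y_j > \tau$ * or $\tilde{X}_i - Y_j > \tau$ *         favorable
$|X_i - Y_j| \le \tau$ *                                     neutral
$X_i - Y_j < -\tau$ * or $X_i - \tilde{Y}_j < -\tau$ *       unfavorable
otherwise                                                    uninformative
$\tilde{X}_i$ and $\tilde{Y}_j$ denote censored observations.
* $\tau$ chosen to reflect clinical relevance; $\tau = 0$ is the Gehan test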
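For a time-to-event outcome, censoring decides whether a pair can be classified at all. The sketch below is one possible reading of the rule, assuming that each observation is a `(time, event_observed)` tuple, that longer times are better, and that censoring is on the right; the function name and data layout are hypothetical.

```python
def classify_time_to_event(x, y, tau=0.0):
    """Classify a pair of (time, event_observed) tuples; longer times are better.

    x belongs to the treated subject and y to the control. event_observed is
    True when the event was seen and False when the time is right-censored.
    """
    (tx, x_event), (ty, y_event) = x, y
    diff = tx - ty
    if x_event and y_event:
        if diff > tau:
            return "favorable"
        if diff < -tau:
            return "unfavorable"
        return "neutral"
    # Control event seen; treated subject still event-free more than tau later.
    if y_event and not x_event and diff > tau:
        return "favorable"
    # Treated event seen; control still event-free more than tau later.
    if x_event and not y_event and diff < -tau:
        return "unfavorable"
    # Censoring hides which subject did better.
    return "uninformative"


# Hypothetical pair: treated subject censored at 12, control event at 8.
label = classify_time_to_event((12, False), (8, True), tau=2)
```

A pair in which the earlier time is censored stays uninformative, because the censored subject may or may not have outlasted the other one.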

Generalized pairwise comparisons
Let $X_i$ and $Y_j$ be VECTORS of observed outcomes for any number of occasions of a single outcome measure, or any number of outcome measures. We assume that the occasions and/or the outcome measures can be prioritized.

Next, generalize the test to prioritized repeated observations of a single outcome measure…
Occasion with higher priority    Occasion with lower priority    Pair is
favorable                        ignored                         favorable
unfavorable                      ignored                         unfavorable
neutral                          ignored                         neutral
uninformative                    favorable                       favorable
uninformative                    unfavorable                     unfavorable
uninformative                    neutral                         neutral
uninformative                    uninformative                   uninformative

Last, generalize the test to several prioritized outcome measures…
Outcome with higher priority     Outcome with lower priority     Pair is
favorable                        ignored                         favorable
unfavorable                      ignored                         unfavorable
neutral                          ignored                         neutral
uninformative                    favorable                       favorable
uninformative                    unfavorable                     unfavorable
uninformative                    neutral                         neutral
uninformative                    uninformative                   uninformative
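The prioritization rule in the table above amounts to taking the first result that is not uninformative. A minimal sketch, assuming the per-outcome labels have already been computed and are listed from highest to lowest priority (the name `classify_prioritized` is illustrative):

```python
def classify_prioritized(labels):
    """Combine per-outcome labels, listed from highest to lowest priority.

    Only an uninformative result passes the comparison on to the next
    outcome; favorable, unfavorable and neutral results settle the pair.
    """
    for label in labels:
        if label != "uninformative":
            return label
    return "uninformative"


# Hypothetical pair: uninformative on the first outcome, unfavorable on the second.
result = classify_prioritized(["uninformative", "unfavorable"])
```

The same function handles repeated occasions of a single outcome measure, since the rule is identical once the occasions are ordered by priority.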

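A general measure of treatment effect
Extend the previous definition of $U_{ij}$:
$$U_{ij} = \begin{cases} +1 & \text{if the pair } (X_i, Y_j) \text{ is favorable} \\ -1 & \text{if the pair } (X_i, Y_j) \text{ is unfavorable} \\ 0 & \text{otherwise} \end{cases}$$
$$U = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} U_{ij}$$
$U$ is the difference between the proportion of favorable pairs and the proportion of unfavorable pairs. We call this general measure of treatment effect the « proportion in favor of treatment » ($\Delta$).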

The proportion in favor of treatment ($\Delta$) is a linear transformation of the probabilistic index, $P(X > Y)$: when ties are absent, $\Delta = 2\,P(X > Y) - 1$.
Situation                      $P(X > Y)$    $\Delta$
T uniformly worse than C       0             −1
T no different from C          0.5           0
T uniformly better than C      1             +1

The proportion in favor of treatment ($\Delta$)
• For a binary variable, $\Delta$ is equal to the difference in proportions
• For a continuous variable, $\Delta$ is related to the effect size $d$
• For a time-to-event variable, $\Delta$ is related to the hazard ratio and the proportion of informative pairs $f$
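These three relationships can be illustrated numerically. The sketch below checks the binary case directly from the pairs. The other two functions use textbook relationships that are assumptions of this sketch rather than formulas given on the slide: normal outcomes with equal variances, for which $P(X > Y) = \Phi(d/\sqrt{2})$, and proportional hazards with the hazard ratio defined as treatment over control, for which $P(X > Y) = 1/(1 + HR)$ without censoring. Scaling by $f$ is treated here as an approximation. All names are hypothetical.

```python
from itertools import product
from math import erf


def delta_from_pairs(treatment, control):
    """Proportion of favorable pairs minus proportion of unfavorable pairs."""
    scores = [(x > y) - (x < y) for x, y in product(treatment, control)]
    return sum(scores) / len(scores)


def delta_from_effect_size(d):
    """Assumes normal outcomes with equal variances.

    2 * Phi(d / sqrt(2)) - 1 simplifies to erf(d / 2).
    """
    return erf(d / 2)


def delta_from_hazard_ratio(hr, f=1.0):
    """Assumes proportional hazards with hr = hazard(T) / hazard(C).

    f is the proportion of informative pairs (1.0 means no censoring).
    """
    return f * (1 - hr) / (1 + hr)


# Hypothetical binary data: response rates of 75% (T) and 25% (C).
treatment = [1, 1, 1, 0]
control = [1, 0, 0, 0]
delta_binary = delta_from_pairs(treatment, control)
difference = sum(treatment) / len(treatment) - sum(control) / len(control)
```

In the binary example, `delta_binary` and `difference` coincide, since the pairs in which both subjects respond or both fail are neutral and cancel out of the comparison.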

A re-randomization test for $\Delta$
The test statistic $U$ (or $\Delta$) no longer has known expectation and variance. An empirical distribution of $\Delta$ can be obtained through re-randomization. Tests of significance and confidence intervals are then derived from this empirical distribution.
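A re-randomization test can be sketched as follows. This is an illustration for a single outcome scored by simple ordering, not the procedure used for the trial analysis; the function names, the number of resamples, the seed, and the data are all assumptions of the sketch.

```python
import random
from itertools import product


def delta(treatment, control):
    """Proportion of favorable pairs minus proportion of unfavorable pairs."""
    scores = [(x > y) - (x < y) for x, y in product(treatment, control)]
    return sum(scores) / len(scores)


def rerandomization_test(treatment, control, n_resamples=2000, seed=0):
    """Two-sided p-value for delta obtained by re-randomizing the group labels."""
    rng = random.Random(seed)
    observed = delta(treatment, control)
    pooled = list(treatment) + list(control)
    n = len(treatment)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign subjects to the two arms at random
        if abs(delta(pooled[:n], pooled[n:])) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return observed, (extreme + 1) / (n_resamples + 1)


# Hypothetical data with complete separation between the two arms.
treatment = [10, 11, 12, 13, 14, 15]
control = [1, 2, 3, 4, 5, 6]
observed, p_value = rerandomization_test(treatment, control)
```

For prioritized outcomes the same loop applies, with the pair classification recomputed on each re-randomized data set.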

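Cumulative proportions for prioritized outcomes
The proportion in favor of treatment for the $l$-th prioritized outcome ($l = 1, \dots, L$) is given by
$$\Delta_l = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} U_{ij}(l)$$
where $U_{ij}(l)$ is $+1$ or $-1$ if the pair is found favorable or unfavorable on the $l$-th outcome, and $0$ otherwise, and the cumulative proportion is
$$\Delta(l) = \sum_{k=1}^{l} \Delta_k$$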
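The per-outcome and cumulative proportions can be computed in one pass over the pairs. The sketch below assumes that each pair already carries one label per outcome, ordered from highest to lowest priority, and that only an uninformative label passes the pair on to the next outcome; the function name and the toy labels are hypothetical.

```python
def cumulative_deltas(pair_labels, n_outcomes):
    """Per-outcome and cumulative proportions in favor of treatment.

    pair_labels holds one list per pair, with that pair's label on each
    outcome ordered from highest to lowest priority.
    """
    n_pairs = len(pair_labels)
    wins = [0] * n_outcomes
    losses = [0] * n_outcomes
    for labels in pair_labels:
        for level, label in enumerate(labels):
            if label == "uninformative":
                continue  # pass the comparison on to the next outcome
            if label == "favorable":
                wins[level] += 1
            elif label == "unfavorable":
                losses[level] += 1
            break  # favorable, unfavorable or neutral settles the pair
    deltas = [(w - l) / n_pairs for w, l in zip(wins, losses)]
    cumulative = []
    running = 0.0
    for d in deltas:
        running += d
        cumulative.append(running)
    return deltas, cumulative


# Hypothetical labels for four pairs on two prioritized outcomes.
pairs = [
    ["favorable", "unfavorable"],
    ["uninformative", "favorable"],
    ["uninformative", "favorable"],
    ["neutral", "favorable"],
]
deltas, cumulative = cumulative_deltas(pairs, n_outcomes=2)
```

Each pair contributes to at most one outcome, so the cumulative value at the last outcome is the overall proportion in favor of treatment.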

Early breast cancer
3,222 patients after curative resection of HER2+ breast cancer, randomized (R) to:
• 1,075 Taxotere Carboplatin Herceptin (TCH)
• 1,074 Adriamycin Cyclophosphamide Taxotere Herceptin (ACTH)
• 1,073 Adriamycin Cyclophosphamide Taxotere (ACT)
TCH and ACTH: two combination chemotherapies plus Herceptin
ACT: standard chemotherapy
Main efficacy endpoints: disease recurrence or death
Main safety endpoint: congestive heart failure

Disease-free survival
[Figure: disease-free survival curves]

Prioritized outcomes
Priority    Outcome
1           Time to death from any cause
2           Time to second malignancy
3           Time to distant metastases
4           Time to locoregional relapse
5           Time to congestive heart failure

Prioritized outcomes
GENERALIZED PAIRWISE COMPARISONS: ACTH vs. ACT
Difference in            ACTH better    ACT better    Cumulative $\Delta$    P-value *
Time to death            4.97%          2.87%         2.09%                  0.006
Time to second tumor     1.20%          1.21%         2.08%                  0.022
Time to distant mets     7.03%          3.46%         5.66%                  < 0.001
Time to relapse          1.82%          1.01%         6.47%                  < 0.001
Time to CHF              0.62%          1.83%         5.25%                  < 0.001
* Unadjusted for multiplicity
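The cumulative column can be checked as a running sum of the difference between the two "better" columns. The short script below does this for the ACTH vs. ACT table above; the variable names are illustrative, and small discrepancies are expected because the published percentages are rounded.

```python
# Percentages transcribed from the ACTH vs. ACT table above.
acth_better = [4.97, 1.20, 7.03, 1.82, 0.62]
act_better = [2.87, 1.21, 3.46, 1.01, 1.83]
reported_cumulative = [2.09, 2.08, 5.66, 6.47, 5.25]

running = 0.0
recomputed = []
for win, loss in zip(acth_better, act_better):
    running += win - loss  # net contribution of this outcome
    recomputed.append(round(running, 2))

# Largest gap between the recomputed and the reported cumulative values.
max_gap = max(abs(r - c) for r, c in zip(recomputed, reported_cumulative))
```

The recomputed values agree with the reported ones to within rounding. The last row shows the safety outcome (time to CHF) lowering the cumulative proportion, because ACT is better than ACTH on that outcome.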

Prioritized outcomes
GENERALIZED PAIRWISE COMPARISONS: TCH vs. ACT
Difference in            TCH better     ACT better    Cumulative $\Delta$    P-value *
Time to death            5.05%          3.49%         1.56%                  0.059
Time to second tumor     1.22%          0.72%         2.05%                  0.029
Time to distant mets     7.18%          3.96%         5.26%                  < 0.001
Time to relapse          1.75%          1.47%         5.55%                  < 0.001
Time to CHF              0.63%          0.71%         5.47%                  < 0.001
* Unadjusted for multiplicity

Prioritized outcomes
GENERALIZED PAIRWISE COMPARISONS: TCH vs. ACTH
Difference in            TCH better     ACTH better   Cumulative $\Delta$    P-value *
Time to death            3.04%          3.57%         -0.53%                 0.46
Time to second tumor     1.29%          0.74%         0.02%                  0.98
Time to distant mets     3.84%          4.36%         -0.50%                 0.68
Time to relapse          1.04%          1.63%         -1.09%                 0.40
Time to CHF              1.97%          0.74%         0.14%                  0.93
* Unadjusted for multiplicity

Generalized Pairwise Comparisons
1. are equivalent to well-known non-parametric tests in simple cases
2. allow testing for differences thought to be clinically relevant
3. allow any number of prioritized outcomes of any type to be analyzed simultaneously
4. naturally lead to a universal measure of treatment effect, $\Delta$, which is directly related to classical measures of treatment effect (difference in proportions, effect size or hazard ratio)

References
Buyse M. Generalized pairwise comparisons for prioritized outcomes in the two-sample problem. Statistics in Medicine 29: 3245-57, 2010.
Buyse M. Reformulating the hazard ratio to enhance communication with clinical investigators. Clinical Trials 5: 641-2, 2008.