Detecting sparse signals and sparse connectivity in scale-space

Detecting sparse signals and sparse connectivity in scale-space, with applications to the 'bubbles' task in an fMRI experiment. Keith Worsley, Nicholas Chamandy, McGill; Jonathan Taylor, Stanford and Université de Montréal; Robert Adler, Technion; Philippe Schyns, Fraser Smith, Glasgow; Frédéric Gosselin, Université de Montréal; Arnaud Charil, Montreal Neurological Institute.

Astrophysics

Sloan Digital Sky Survey, data release 6, Aug. '07

What is ‘bubbles’?

Nature (2005)

Subject is shown one of 40 faces chosen at random: Happy, Sad, Fearful, Neutral.

… but the face is only revealed through random 'bubbles'.
• First trial: "Sad" expression
• 75 random bubble centres, smoothed by a Gaussian 'bubble', give what the subject sees
• The subject is asked the expression. Response: "Neutral". Incorrect

Your turn … Trial 2. Subject response: "Fearful". CORRECT

Your turn … Trial 3. Subject response: "Happy". INCORRECT (Fearful)

Your turn … Trial 4. Subject response: "Happy". CORRECT

Your turn … Trial 5. Subject response: "Fearful". CORRECT

Your turn … Trial 6. Subject response: "Sad". CORRECT

Your turn … Trial 7. Subject response: "Happy". CORRECT

Your turn … Trial 8. Subject response: "Neutral". CORRECT

Your turn … Trial 9. Subject response: "Happy". CORRECT

Your turn … Trial 3000. Subject response: "Happy". INCORRECT (Fearful)

Bubbles analysis. E.g. Fearful (3000/4 = 750 trials): Trial 1 + Trial 2 + Trial 3 + Trial 4 + Trial 5 + Trial 6 + Trial 7 + … + Trial 750 = sum over correct trials. Proportion of correct bubbles = (sum of correct bubbles) / (sum of all bubbles). Threshold at the proportion of correct trials = 0.68, scale to [0, 1], and use this as a bubble mask.

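A minimal sketch of this masking step (not the authors' code; the array names and shapes are assumed): given per-trial smoothed bubble images and 0/1 correctness responses, form the proportion-of-correct-bubbles image and threshold it.

```python
import numpy as np

def bubble_mask(bubbles, correct, threshold=0.68):
    """Proportion-of-correct-bubbles mask for one expression.

    bubbles   : (n_trials, ny, nx) array of smoothed bubble images
    correct   : (n_trials,) array of 1 (correct) / 0 (incorrect) responses
    threshold : proportion of correct trials, e.g. 0.68
    """
    sum_correct = bubbles[correct == 1].sum(axis=0)   # sum of correct bubbles
    sum_all = bubbles.sum(axis=0)                     # sum of all bubbles
    prop = sum_correct / np.maximum(sum_all, 1e-12)   # avoid division by zero
    mask = np.clip(prop - threshold, 0.0, None)       # threshold ...
    if mask.max() > 0:
        mask /= mask.max()                            # ... and scale to [0, 1]
    return mask
```
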
Results: mask, average face for Happy, Sad, Fearful, Neutral. But are these features real or just noise? Need statistics …

Statistical analysis. Correlate bubbles with the response (correct = 1, incorrect = 0), separately for each expression. This is equivalent to a 2-sample Z-statistic for correct vs. incorrect bubbles, e.g. Fearful:

Trial:    1  2  3  4  5  6  7  …  750
Response: 0  1  0  1  1  1  1  …  1

Very similar to the proportion of correct bubbles. Z ~ N(0, 1) statistic.

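Under the same assumed arrays, the per-pixel statistic is an ordinary two-sample Z comparing bubble values on correct vs. incorrect trials; a sketch:

```python
import numpy as np

def bubble_z(bubbles, correct):
    """Two-sample Z-statistic image: bubbles on correct vs. incorrect trials."""
    g1 = bubbles[correct == 1]                    # correct trials
    g0 = bubbles[correct == 0]                    # incorrect trials
    diff = g1.mean(axis=0) - g0.mean(axis=0)
    se = np.sqrt(g1.var(axis=0, ddof=1) / len(g1)
                 + g0.var(axis=0, ddof=1) / len(g0))
    return diff / se                              # ~ N(0, 1) pixel by pixel
```
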
Results, thresholded at Z = 1.64 (P = 0.05), shown on the average face: Happy, Sad, Fearful, Neutral. Z ~ N(0, 1) statistic. Multiple comparisons correction? Need random field theory …

Euler Characteristic Heuristic. Euler characteristic (EC) = #blobs − #holes (in 2D). Excursion set X_t = {s : Z(s) ≥ t}, e.g. for the neutral face.
[Figure: excursion sets at increasing thresholds, with EC values such as 0, −7, −11, 13, 14, 9, 1, 0; plot of observed vs. expected EC(X_t) against threshold t from −4 to 4.]
Heuristic: at high thresholds t the holes disappear, EC ~ 1 or 0, and E(EC) ~ P(max Z ≥ t).
• There is an exact expression for E(EC) for all thresholds.
• E(EC) ~ P(max Z ≥ t) is extremely accurate.

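The heuristic is easy to check numerically. A sketch (assuming SciPy is available; connected components stand in for blobs and holes):

```python
import numpy as np
from scipy import ndimage

def euler_char_2d(excursion):
    """EC = #blobs - #holes for a boolean 2-D excursion set."""
    _, n_blobs = ndimage.label(excursion)
    # pad so the unbounded background is one component; all others are holes
    _, n_bg = ndimage.label(np.pad(~excursion, 1, constant_values=True))
    return n_blobs - (n_bg - 1)

# smooth white noise, then check the heuristic: EC ~ 1 or 0 at high thresholds
rng = np.random.default_rng(0)
Z = ndimage.gaussian_filter(rng.standard_normal((256, 256)), 5)
Z /= Z.std()
for t in (1.0, 2.0, 3.0):
    print(t, euler_char_2d(Z >= t))
```
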
The result. If Z(s) ~ N(0, 1) is an isotropic Gaussian random field, s ∈ ℝ², with Var(∂Z/∂s) = λ² I_{2×2}, then

$$P\Big(\max_{s\in S} Z(s)\ge t\Big)\;\approx\; E\big(EC\{s\in S: Z(s)\ge t\}\big) = \mathcal{L}_0(S)\,\rho_0(Z\ge t) + \mathcal{L}_1(S)\,\rho_1(Z\ge t) + \mathcal{L}_2(S)\,\rho_2(Z\ge t),$$

where the Lipschitz-Killing curvatures of S (= Resels(S) × c) are

$$\mathcal{L}_0(S) = EC(S),\qquad \mathcal{L}_1(S) = \lambda\,\tfrac12\,\mathrm{Perimeter}(S),\qquad \mathcal{L}_2(S) = \lambda^2\,\mathrm{Area}(S),$$

and the EC densities of Z above t are

$$\rho_0(Z\ge t) = \int_t^\infty \frac{e^{-z^2/2}}{(2\pi)^{1/2}}\,dz,\qquad \rho_1(Z\ge t) = \frac{e^{-t^2/2}}{2\pi},\qquad \rho_2(Z\ge t) = \frac{t\,e^{-t^2/2}}{(2\pi)^{3/2}}.$$

If Z(s) is white noise convolved with an isotropic Gaussian filter of Full Width at Half Maximum FWHM (Z = white noise * filter), then λ = √(4 log 2) / FWHM.

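As code, the approximation is a three-term sum; a sketch with my own helper names (the filter width below is a placeholder, not the talk's value):

```python
from math import erfc, exp, log, pi, sqrt

def rft_pvalue_2d(t, ec, perimeter, area, fwhm):
    """P(max Z >= t) ~ E(EC) for an isotropic Gaussian field on a 2-D region."""
    lam = sqrt(4 * log(2)) / fwhm               # lambda from the filter FWHM
    rho0 = 0.5 * erfc(t / sqrt(2))              # P(Z >= t)
    rho1 = exp(-t**2 / 2) / (2 * pi)
    rho2 = t * exp(-t**2 / 2) / (2 * pi)**1.5
    return ec * rho0 + lam * (perimeter / 2) * rho1 + lam**2 * area * rho2

# solve for the P = 0.05 threshold on a 240 x 380 image (FWHM is a placeholder)
from scipy.optimize import brentq
print(brentq(lambda t: rft_pvalue_2d(t, ec=1, perimeter=2 * (240 + 380),
                                     area=240 * 380, fwhm=10) - 0.05, 2, 8))
```
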
Results, corrected for search. Random field theory threshold: Z = 3.92 (P = 0.05), shown on the average face: Happy, Sad, Fearful, Neutral. Saddle-point approximation (2007): Z = 3.82, 3.80, 3.81, 3.80 (one per expression, P = 0.05). Bonferroni: Z = 4.87 (P = 0.05), which detects nothing. Z ~ N(0, 1) statistic.

Theorem (1981, 1995). Let T(s), s ∈ S ⊂ ℝ^D, be a smooth isotropic random field. Let X_t = {s : T(s) ≥ t} be the excursion set inside S, and let R_t = {z : f(z) ≥ t} be the rejection region of T. Then

$$E\big(EC(S\cap X_t)\big) = \sum_{d=0}^{D} \mathcal{L}_d(S)\,\rho_d(R_t).$$

$$E\big(EC(S\cap X_t)\big) = \sum_{d=0}^{D} \mathcal{L}_d(S)\,\rho_d(R_t)$$

Lipschitz-Killing curvature L_d(S): Steiner-Weyl Tube Formula (1930). Put a tube of radius r about the search region λS, where λ = sd(∂Z/∂s). Find its volume, expand as a power series in r, and pull off the coefficients:

$$\big|\mathrm{Tube}(\lambda S, r)\big| = \sum_{d=0}^{D} \frac{\pi^{d/2}}{\Gamma(d/2+1)}\,\mathcal{L}_{D-d}(S)\,r^d.$$

EC density ρ_d(R_t): Morse theory method (1981, 1995). The EC has a point-set representation,

$$EC(S\cap X_t) = \sum_{s:\,\partial T/\partial s=0} \mathbf{1}\{T(s)\ge t\}\;\mathrm{sign}\det\Big(-\frac{\partial^2 T}{\partial s\,\partial s'}\Big) + \text{boundary terms},$$

which gives

$$\rho_D(R_t) = E\Big(\mathbf{1}\{T\ge t\}\det\Big(-\frac{\partial^2 T}{\partial s\,\partial s'}\Big)\,\Big|\,\frac{\partial T}{\partial s}=0\Big)\;p_{\partial T/\partial s}(0).$$

For a Gaussian random field,

$$\rho_d(Z\ge t) = \Big(\frac{-1}{(2\pi)^{1/2}}\frac{\partial}{\partial t}\Big)^{d} P(Z\ge t).$$

Lipschitz-Killing curvature L_d(S) of a triangle, with λ = sd(∂Z/∂s). Steiner-Weyl Volume of Tubes Formula (1930):

$$\mathrm{Area}\big(\mathrm{Tube}(\lambda S, r)\big) = \sum_{d=0}^{D}\frac{\pi^{d/2}}{\Gamma(d/2+1)}\,\mathcal{L}_{D-d}(S)\,r^d = \mathcal{L}_2(S) + 2\,\mathcal{L}_1(S)\,r + \pi\,\mathcal{L}_0(S)\,r^2$$
$$= \mathrm{Area}(\lambda S) + \mathrm{Perimeter}(\lambda S)\,r + EC(\lambda S)\,\pi r^2,$$

so

$$\mathcal{L}_0(S) = EC(\lambda S),\qquad \mathcal{L}_1(S) = \tfrac12\,\mathrm{Perimeter}(\lambda S),\qquad \mathcal{L}_2(S) = \mathrm{Area}(\lambda S).$$

Lipschitz-Killing curvatures are just 'intrinsic volumes' or 'Minkowski functionals' in the (Riemannian) metric of the variance of the derivative of the process.

Lipschitz-Killing curvature L_d(S) of any set S: triangulate S and multiply each edge length by λ = sd(∂Z/∂s).
• For the pieces: a point P has L_0(P) = 1; an edge E has L_0(E) = 1 and L_1(E) = edge length; a triangle Δ has L_0(Δ) = 1, L_1(Δ) = ½ perimeter, L_2(Δ) = area.
• For the union of triangles, by inclusion-exclusion (see the sketch below):
L_0(S) = Σ_points L_0 − Σ_edges L_0 + Σ_triangles L_0
L_1(S) = Σ_edges L_1 − Σ_triangles L_1
L_2(S) = Σ_triangles L_2

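A sketch of the inclusion-exclusion computation on a hypothetical triangulated region (vertex coordinates assumed already scaled by λ):

```python
import numpy as np

def lkc_triangulation(xy, edges, tris):
    """Lipschitz-Killing curvatures of a union of triangles.

    xy    : (n_vertices, 2) vertex coordinates, already scaled by lambda
    edges : (n_edges, 2) vertex indices of the unique edges
    tris  : (n_tris, 3) vertex indices of the triangles
    """
    d = lambda i, j: np.linalg.norm(xy[i] - xy[j], axis=1)
    edge_len = d(edges[:, 0], edges[:, 1])
    perim = (d(tris[:, 0], tris[:, 1]) + d(tris[:, 1], tris[:, 2])
             + d(tris[:, 2], tris[:, 0]))
    u, v = xy[tris[:, 1]] - xy[tris[:, 0]], xy[tris[:, 2]] - xy[tris[:, 0]]
    area = 0.5 * np.abs(u[:, 0] * v[:, 1] - u[:, 1] * v[:, 0])
    L0 = len(xy) - len(edges) + len(tris)       # points - edges + triangles
    L1 = edge_len.sum() - 0.5 * perim.sum()     # edges - half-perimeters
    L2 = area.sum()                             # total area
    return L0, L1, L2
```
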
Non-isotropic data? Now λ(s) = sd(∂Z/∂s) varies with the location s = (s₁, s₂).
• Can we warp the data to isotropy, i.e. multiply edge lengths by λ? Globally no, but locally yes, though we may need extra dimensions.
• Nash Embedding Theorem: #dimensions ≤ D + D(D+1)/2; for D = 2, #dimensions ≤ 5.
• Better idea: replace Euclidean distance by the variogram, d(s₁, s₂)² = Var(Z(s₁) − Z(s₂)).

Non-isotropic data: multiply each edge length by the local λ(s) = sd(∂Z/∂s). The triangle and union-of-triangles formulas for L_0 (Euler characteristic), L_1 (edge lengths and half-perimeters) and L_2 (areas) are unchanged, now evaluated in the warped metric.

Estimating Lipschitz-Killing curvature L_d(S). We need independent and identically distributed random fields, e.g. residuals from a linear model: Z₁, Z₂, …, Z_n. Replace the coordinates of the triangles in ℝ² by the normalised residuals Z/‖Z‖ ∈ ℝⁿ, where Z = (Z₁, …, Z_n); the triangle and union-of-triangles formulas for L_0, L_1, L_2 then apply with edge lengths measured between these normalised residual vectors (see the sketch below).

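A sketch of the estimated metric under an assumed data layout (residual fields stacked along the first axis): the length of an edge between two pixels becomes the distance between their normalised residual vectors.

```python
import numpy as np

def normalised_residuals(resid):
    """resid: (n, ny, nx) i.i.d. residual fields -> unit vectors in R^n per pixel."""
    return resid / np.linalg.norm(resid, axis=0, keepdims=True)

def estimated_edge_length(u, p, q):
    """Edge length between pixels p = (i, j) and q in the residual metric."""
    return np.linalg.norm(u[:, p[0], p[1]] - u[:, q[0], q[1]])
```
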
Scale space: smooth Z(s) with a range of filter widths w (= continuous wavelet transform). This adds an extra dimension to the random field: Z(s, w).
[Figure: Z(s, w) with w = FWHM (mm, on a log scale from 6.8 to 34) against s (mm, −60 to 60); one panel with no signal, one with a single 15 mm signal.]
A 15 mm signal is best detected with a 15 mm smoothing filter.

Matched Filter Theorem (= Gauss-Markov Theorem): "to best detect a signal in white noise, the filter should match the signal."
[Figure: Z(s, w) for 10 mm and 23 mm signals, and for two 10 mm signals 20 mm apart.]
But if the signals are too close together, they are detected as a single signal halfway between them.

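A 1-D sketch of the scale-space field on synthetic data (assumptions: SciPy's gaussian_filter1d for smoothing, and a continuous-kernel variance normalisation that keeps Z(s, w) approximately N(0, 1) at every scale):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def scale_space(data, fwhms):
    """Stack of smoothed, approximately unit-variance fields Z(s, w)."""
    rows = []
    for fwhm in fwhms:
        sigma = fwhm / np.sqrt(8 * np.log(2))   # FWHM -> Gaussian sd
        smooth = gaussian_filter1d(data, sigma)
        # white noise smoothed by a unit-integral Gaussian has
        # variance ~ 1 / (2 * sqrt(pi) * sigma), so rescale to unit variance
        rows.append(smooth * np.sqrt(2 * np.sqrt(np.pi) * sigma))
    return np.array(rows)

rng = np.random.default_rng(1)
s = np.arange(-60, 61)
sd15 = 15 / np.sqrt(8 * np.log(2))              # sd of a 15 mm FWHM bump
Z = scale_space(4 * np.exp(-s**2 / (2 * sd15**2)) + rng.standard_normal(s.size),
                fwhms=[7, 10, 15, 23, 34])
print(Z.max(axis=1).round(1))  # the peak Z should occur near the 15 mm filter
```
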
Scale space can even separate two signals at the same location!
[Figure: Z(s, w) for 8 mm and 150 mm signals at the same location; w = FWHM (mm, on a log scale from 6.8 to 170) against s (mm, −60 to 60); two distinct peaks appear, one at each scale.]

Scale space Lipschitz-Killing curvatures. Suppose f is a kernel with ∫ f² = 1 and B is a Brownian sheet. Then the scale-space random field is

$$Z(s, w) = w^{-D/2}\int f\Big(\frac{s-h}{w}\Big)\,dB(h) \sim N(0, 1).$$

The Lipschitz-Killing curvatures of the scale-space search region, L_d(S × [w₁, w₂]), are linear combinations of L_d(S) and the lower-order L_{d+2j−1}(S), j = 0, 1, …, with coefficients involving the endpoint filter widths (through powers w₁^{−(d+2j−1)} and w₂^{−(d+2j−1)}), factorials, powers of 4π, and the kernel constant

$$\kappa = \int \Big(\frac{D}{2}\,f(s) + s'\,\frac{\partial f(s)}{\partial s}\Big)^2 ds;$$

for a Gaussian kernel, κ = D/2 (the exact coefficients are in the scale-space papers, 1995, 2001). Then

$$P\Big(\max_{s\in S,\;w_1\le w\le w_2} Z(s, w)\ge t\Big) \approx \sum_{d=0}^{D+1} \mathcal{L}_d\big(S\times[w_1, w_2]\big)\,\rho_d(R_t).$$

Rotation space: try all rotated elliptical filters.
[Figure: unsmoothed data; data thresholded at Z = 5.25 (P = 0.05); the maximum filter.]

Beautiful symmetry:

$$E\big(EC(S\cap X_t)\big) = \sum_{d=0}^{D} \mathcal{L}_d(S)\,\rho_d(R_t)$$

Lipschitz-Killing curvature L_d(S): Steiner-Weyl Tube Formula (1930). Put a tube of radius r about the search region λS, λ = sd(∂Z/∂s). Find its volume, expand as a power series in r, and pull off the coefficients:

$$\big|\mathrm{Tube}(\lambda S, r)\big| = \sum_{d=0}^{D}\frac{\pi^{d/2}}{\Gamma(d/2+1)}\,\mathcal{L}_{D-d}(S)\,r^d.$$

EC density ρ_d(R_t): Taylor's Gaussian Tube Formula (2003). Put a tube of radius r about the rejection region R_t in the space of independent Gaussians (Z₁, Z₂) ~ N(0, 1). Find its probability, expand as a power series in r, and pull off the coefficients:

$$P\big(\mathrm{Tube}(R_t, r)\big) = \sum_{d=0}^{\infty}\frac{(2\pi)^{d/2}}{d!}\,\rho_d(R_t)\,r^d.$$

EC density ρ_d of the χ̄ statistic. With independent Gaussian fields Z₁, Z₂ ~ N(0, 1), let

$$\bar\chi(s) = \max_{0\le\theta\le\pi/2}\; Z_1(s)\cos\theta + Z_2(s)\sin\theta,$$

with rejection region R_t. Taylor's Gaussian Tube Formula (2003) gives

$$P\big(\mathrm{Tube}(R_t, r)\big) = \sum_{d=0}^{\infty}\frac{(2\pi)^{d/2}}{d!}\,\rho_d(\bar\chi\ge t)\,r^d = \rho_0(\bar\chi\ge t) + (2\pi)^{1/2}\rho_1(\bar\chi\ge t)\,r + (2\pi)\,\rho_2(\bar\chi\ge t)\,r^2/2 + \cdots$$

Here Tube(R_t, r) = R_{t−r}, so

$$P\big(\mathrm{Tube}(R_t, r)\big) = \int_{t-r}^{\infty}\frac{e^{-z^2/2}}{(2\pi)^{1/2}}\,dz + e^{-(t-r)^2/2}/4,$$

and matching powers of r gives

$$\rho_0(\bar\chi\ge t) = \int_t^\infty \frac{e^{-z^2/2}}{(2\pi)^{1/2}}\,dz + \frac{e^{-t^2/2}}{4},$$
$$\rho_1(\bar\chi\ge t) = (2\pi)^{-1}e^{-t^2/2} + (2\pi)^{-1/2}\,e^{-t^2/2}\,t/4,$$
$$\rho_2(\bar\chi\ge t) = (2\pi)^{-3/2}\,e^{-t^2/2}\,t + (2\pi)^{-1}\,e^{-t^2/2}\,(t^2-1)/8,$$

and so on.

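A quick Monte Carlo sanity check of ρ₀ (my own check, not from the talk): at a single point, ρ₀(χ̄ ≥ t) is just the null tail probability of χ̄.

```python
import numpy as np
from math import erfc, exp, sqrt

rng = np.random.default_rng(2)
z1, z2 = rng.standard_normal((2, 1_000_000))
# pointwise chi-bar: max projection of (z1, z2) onto the quarter circle
chibar = np.where((z1 > 0) & (z2 > 0), np.hypot(z1, z2), np.maximum(z1, z2))
t = 2.5
mc = (chibar >= t).mean()
exact = 0.5 * erfc(t / sqrt(2)) + exp(-t**2 / 2) / 4   # rho_0 formula above
print(mc, exact)   # should agree to about three decimals
```
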
EC densities for some standard test statistics.
Using the Morse theory method (1981, 1995):
• T, χ², F (1994)
• Scale space (1995, 2001)
• Hotelling's T² (1999)
• Correlation (1999)
• Roy's maximum root, maximum canonical correlation (2007)
• Wilks' Lambda (2007) (approximation only)
Using the Gaussian Kinematic Formula:
• T, χ², F are now one line …
• Likelihood ratio tests for cone alternatives (e.g. chi-bar, beta-bar) and non-negative least squares (2007)
• …

Accuracy of the P-value approximation. If Z(s) ~ N(0, 1) is an isotropic Gaussian random field, s ∈ ℝ², with Var(∂Z/∂s) = λ² I_{2×2}, then

$$E\big(EC\{s\in S: Z(s)\ge t\}\big) = EC(S)\int_t^\infty \frac{e^{-z^2/2}}{(2\pi)^{1/2}}\,dz + \lambda\,\frac{\mathrm{Perimeter}(S)}{2}\,\frac{e^{-t^2/2}}{2\pi} + \lambda^2\,\mathrm{Area}(S)\,\frac{t\,e^{-t^2/2}}{(2\pi)^{3/2}}$$
$$= c_0\int_t^\infty \frac{e^{-z^2/2}}{(2\pi)^{1/2}}\,dz + \big(c_1 + c_2 t + \cdots + c_D t^{D-1}\big)\,e^{-t^2/2},$$

and

$$\Big|P\Big(\max_{s\in S}Z(s)\ge t\Big) - E\big(EC\{s\in S: Z(s)\ge t\}\big)\Big| = O\big(e^{-\alpha t^2/2}\big),\quad \alpha > 1.$$

The expected EC gives all the polynomial terms in the expansion for the P-value.

Bubbles task in the fMRI scanner. Correlate bubbles with BOLD at every voxel, across trials 1, 2, 3, …, 3000. Calculate Z for each pair (bubble pixel, fMRI voxel): a 5D "image" of Z statistics …

Thresholding? Cross-correlation random field. Correlation between two fields at two different locations, searched over all pairs of locations, one in S, one in T:

$$P\Big(\max_{s\in S,\,t\in T} C(s,t)\ge c\Big) \approx E\big(EC\{s\in S,\,t\in T : C(s,t)\ge c\}\big) = \sum_{i=0}^{\dim S}\sum_{j=0}^{\dim T} \mathcal{L}_i(S)\,\mathcal{L}_j(T)\,\rho_{ij}(C\ge c),$$

where, with h = i + j and n the sample size,

$$\rho_{ij}(C\ge c) = \frac{2^{n-2-h}\,(i-1)!\,j!}{\pi^{h/2+1}} \sum_{k=0}^{\lfloor (h-1)/2\rfloor} (-1)^k\,c^{h-1-2k}\,(1-c^2)^{(n-1-h)/2+k} \sum_{l=0}^{k}\sum_{m=0}^{k-l} \frac{\Gamma\big(\tfrac{n-i}{2}+l\big)\,\Gamma\big(\tfrac{n-j}{2}+m\big)}{l!\,m!\,(k-l-m)!\,(n-1-h+l+m+k)!\,(i-1-k-l+m)!\,(j-k-m+l)!}.$$

Cao & Worsley, Annals of Applied Probability (1999). Bubbles data: P = 0.05, n = 3000, c = 0.113, T = 6.22.

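The correlation threshold c converts to the equivalent T-statistic (df = n − 2) by the standard one-to-one map; a sketch:

```python
from math import sqrt

def corr_to_t(c, n):
    """Correlation threshold -> equivalent T-statistic with df = n - 2."""
    return c * sqrt((n - 2) / (1 - c**2))

print(round(corr_to_t(0.113, 3000), 2))  # ~6.2, consistent with c = 0.113, T = 6.22
```
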
Discussion: modelling
• The random response is Y = 1 (correct) or 0 (incorrect), or Y = fMRI.
• The regressors are Xj = bubble mask at pixel j, j = 1, …, 240 × 380 = 91,200 (!)
• Logistic regression or ordinary regression: logit(E(Y)) or E(Y) = b0 + X1 b1 + … + X91200 b91200.
• But there are only n = 3000 observations (trials) …
• Instead, since the regressors are independent, fit them one at a time: logit(E(Y)) or E(Y) = b0 + Xj bj.
• However, the regressors (bubbles) are random with a simple known distribution, so turn the problem around and condition on Y: E(Xj) = c0 + Y cj (see the sketch below).
• This is equivalent to conditional logistic regression (Cox, 1962), which gives exact inference for b1 conditional on sufficient statistics for b0.
• Cox also suggested using saddle-point approximations to improve the accuracy of inference …
• Interactions? logit(E(Y)) or E(Y) = b0 + X1 b1 + … + X91200 b91200 + X1 X2 b1,2 + …
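A sketch of the reversed, one-pixel-at-a-time fit E(Xj) = c0 + Y cj under the earlier assumed arrays: with binary Y, the least-squares slope is just the difference of group means, which ties back to the two-sample Z above.

```python
import numpy as np

def conditional_fit(bubbles, y):
    """Per-pixel least squares of X_j on Y: E(X_j) = c0 + Y * c_j.

    With binary Y, the slope estimate is mean(X_j | Y=1) - mean(X_j | Y=0).
    """
    x1 = bubbles[y == 1].mean(axis=0)
    x0 = bubbles[y == 0].mean(axis=0)
    return x0, x1 - x0          # (c0 estimate, c_j estimate) at every pixel
```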