Post-Hoc Scoring of Topic 102

Topic 102
• Documents referring to marketing or advertising restrictions proposed for inclusion in, or actually included in, the Master Settlement Agreement (“MSA”), including, but not limited to, restrictions on advertising on billboards, stadiums, arenas, shopping malls, buses, taxis, or any other outdoor advertising.

Topic 102 Sampling

Stratum  Clearwell  Pitt  Ad Hoc  Total Docs  Sampled  Assessable  Relevant
1        R          R     R            2,015      215         214       203
2        R          R     N                1        1           1         1
3        R          N     R           10,608    1,041       1,038       666
4        R          N     N            1,071      108         102        17
5        N          R     R            2,435      249         246       199
6        N          R     N               54       11          11         4
7        N          N     R          531,068    1,125       1,110       352
8        N          N     N        6,362,940    1,750       1,713       106
TOTAL                              6,910,192    4,500       4,435     1,548
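
The 562,402 estimate on the Estimating Recall slide can be reproduced from this table with a stratified expansion estimator: each stratum's relevant count is scaled by Total Docs / Sampled (not / Assessable), and the strata are summed. A minimal Python sketch, assuming that reading; the stratum 1 and 2 sample counts (215, and 1/1/1) are inferred from the column totals (4,500 sampled, 4,435 assessable, 1,548 relevant), and the function name is illustrative.

```python
# Topic 102 strata: total documents N_h, sampled n_h, judged relevant r_h.
# Strata 1 and 2 sample counts are inferred from the column totals (assumption).
TOTAL = {1: 2015, 2: 1, 3: 10608, 4: 1071, 5: 2435, 6: 54, 7: 531068, 8: 6362940}
SAMPLED = {1: 215, 2: 1, 3: 1041, 4: 108, 5: 249, 6: 11, 7: 1125, 8: 1750}
RELEVANT = {1: 203, 2: 1, 3: 666, 4: 17, 5: 199, 6: 4, 7: 352, 8: 106}


def estimate_relevant(total, sampled, relevant):
    """Scale each stratum's relevant count by N_h / n_h, then sum over strata."""
    per_stratum = {h: total[h] * relevant[h] / sampled[h] for h in total}
    return per_stratum, sum(per_stratum.values())


per_stratum, est_total = estimate_relevant(TOTAL, SAMPLED, RELEVANT)
print(f"Estimated total relevant: {est_total:,.0f}")      # 562,402
print(f"From stratum 8 alone:     {per_stratum[8]:,.0f}")  # 385,412
```

Using Assessable instead of Sampled as the denominator gives a noticeably larger stratum 8 share, which is why the sketch divides by the sample size.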

Judged (Adjudicated) Samples Exist for 8 of the 107 Submitted Documents
• Arlene, Danielle, Susan, Chad, Danielle, Onaona
• TID: apb14c00, dfr72c00, kzj82a00, qkg70d00, ewf62d00, bol70d00, jzg45a00, efe50d00
• Rel: Y, N, Y, Y, Y
• Stratum: 1, 1, 3, 3, 3, 5
• P: 0.1067 (stratum 1), 0.0981 (stratum 3), 0.1023 (stratum 5)
• 1/P: 9.4 (stratum 1), 10.2 (stratum 3), 9.8 (stratum 5)
• Rank: 1, 8, 4, 4, 7, 9, 8, 8
• Precision ≈ 7/8 = 0.875
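
The P values match Sampled / Total Docs for strata 1, 3 and 5, i.e. the chance that a document in that stratum was drawn for assessment; 1/P is how many documents each judged sample stands for. A short Python check, assuming the stratum 1 sample size of 215 inferred from the table totals; `inclusion_probability` is a hypothetical helper name.

```python
# Stratum sizes N_h and sample sizes n_h for the strata holding the judged documents.
TOTAL = {1: 2015, 3: 10608, 5: 2435}
SAMPLED = {1: 215, 3: 1041, 5: 249}  # stratum 1 sample size inferred from column totals


def inclusion_probability(total, sampled):
    """P = n_h / N_h: probability that a given document in stratum h was assessed."""
    return {h: sampled[h] / total[h] for h in total}


probs = inclusion_probability(TOTAL, SAMPLED)
for h, p in probs.items():
    print(f"stratum {h}: P = {p:.4f}, 1/P = {1 / p:.1f}")
# stratum 1: P = 0.1067, 1/P = 9.4
# stratum 3: P = 0.0981, 1/P = 10.2
# stratum 5: P = 0.1023, 1/P = 9.8

# Precision over the judged documents: 7 of the 8 were judged relevant.
precision = 7 / 8
print(f"precision = {precision:.3f}")  # precision = 0.875
```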

Estimating Recall
• Estimated total relevant: 562,402
  – Plus or minus 100,000! (at 95% confidence)
  – 385,000 of these are from Stratum 8 (NNN)!
• Estimate 68.5 relevant documents found
  – The 7 documents assessed as relevant, each scaled by 1/P
  – This is practical because none were in stratum 8, where 1/P ≈ 3,636
• Recall = 68.5/562,402 = 0.000122 ≈ 0
  – Ignoring stratum 8, Recall = 68.5/176,990 = 0.00039
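
The recall arithmetic is inverse-probability weighting: each relevant judged hit counts for 1/P = N_h / n_h documents, and that sum is divided by the estimated total relevant. The slide does not say which stratum each of the 7 relevant hits came from, so `hits` below is a hypothetical assignment that happens to give about 68.5; the function names are illustrative, and the strata 1 and 2 counts are inferred from the table totals.

```python
# Topic 102 stratum sizes, sample sizes and relevant counts
# (strata 1 and 2 partly inferred from the table's column totals, an assumption).
TOTAL = {1: 2015, 2: 1, 3: 10608, 4: 1071, 5: 2435, 6: 54, 7: 531068, 8: 6362940}
SAMPLED = {1: 215, 2: 1, 3: 1041, 4: 108, 5: 249, 6: 11, 7: 1125, 8: 1750}
RELEVANT = {1: 203, 2: 1, 3: 666, 4: 17, 5: 199, 6: 4, 7: 352, 8: 106}


def estimated_found(hit_strata, total, sampled):
    """Each relevant judged hit from stratum h stands for 1/P = N_h / n_h documents."""
    return sum(total[h] / sampled[h] for h in hit_strata)


def estimated_total_relevant(total, sampled, relevant, skip=frozenset()):
    """Stratified expansion estimate of all relevant documents, optionally skipping strata."""
    return sum(total[h] * relevant[h] / sampled[h] for h in total if h not in skip)


# Hypothetical stratum for each of the 7 relevant hits (not given on the slide).
hits = [1, 1, 3, 3, 5, 5, 5]

found = estimated_found(hits, TOTAL, SAMPLED)
recall = found / estimated_total_relevant(TOTAL, SAMPLED, RELEVANT)
recall_without_8 = found / estimated_total_relevant(TOTAL, SAMPLED, RELEVANT, skip={8})
print(f"found ≈ {found:.1f}, recall ≈ {recall:.6f}, ignoring stratum 8 ≈ {recall_without_8:.5f}")
# found ≈ 68.5, recall ≈ 0.000122, ignoring stratum 8 ≈ 0.00039
```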

Takeaway Messages
• You can get high precision manually
  – But achieving high recall requires automation
• You can measure precision easily
  – But measuring recall requires sampling
• Post-hoc sampling computes relative recall
  – Sampling gets too sparse in very large collections
    • Assessor errors are greatly magnified
    • Systems will rarely hit on an assessed document
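
The two sparse-sampling effects can be put in numbers with the stratum 8 figures (1,750 of 6,362,940 documents sampled). A sketch assuming simple random sampling within the stratum; the 1,000-document result set and both helper names are hypothetical.

```python
def expected_assessed_hits(retrieved, sampled, stratum_size):
    """Expected number of assessed documents among `retrieved` documents from one stratum,
    assuming the assessed documents were a simple random sample of that stratum."""
    return retrieved * sampled / stratum_size


def weight_per_judgment(sampled, stratum_size):
    """1/P: how many documents a single assessed document stands for in the estimate."""
    return stratum_size / sampled


N8, n8 = 6362940, 1750  # stratum 8 (NNN) from the Topic 102 table
print(f"{expected_assessed_hits(1000, n8, N8):.2f}")  # 0.28 assessed docs per 1,000 retrieved
print(f"{weight_per_judgment(n8, N8):,.0f}")          # 3,636 docs moved by one assessor error
```

A system would need to retrieve several thousand stratum-8 documents before it is expected to touch even one judged document, and any single wrong judgment there shifts the estimated total by thousands.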

Structured Evaluation Exercise
• Two systems
  – Boolean Queries
  – Interactive result set
• Draw 200 samples from the four strata
  – Undersample the NN stratum
• Compute Precision and Recall as above
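
One way to set up the exercise, as a hedged Python sketch: the stratum sizes, the 60/50/50/40 allocation, and the judged-relevant counts are all hypothetical, and precision and recall use the same per-stratum scaling (relevant count times stratum size over sample size) that reproduces the Topic 102 estimate. Strata are labeled by two letters: Boolean result first, interactive result second.

```python
import random

# Hypothetical exercise data: stratum sizes and a 200-document allocation with NN undersampled.
SIZES = {"RR": 500, "RN": 2000, "NR": 1500, "NN": 100000}
ALLOCATION = {"RR": 60, "RN": 50, "NR": 50, "NN": 40}
RETRIEVED_BY = {"boolean": ["RR", "RN"], "interactive": ["RR", "NR"]}


def draw_stratified_sample(sizes, allocation, seed=0):
    """Simple random sample, without replacement, of allocation[h] document indices per stratum."""
    rng = random.Random(seed)
    return {h: rng.sample(range(sizes[h]), allocation[h]) for h in sizes}


def precision_recall(sizes, sampled, relevant, retrieved_by):
    """Expand per-stratum relevant counts by N_h / n_h, then score each system."""
    est = {h: sizes[h] * relevant[h] / sampled[h] for h in sizes}
    total_relevant = sum(est.values())
    scores = {}
    for system, strata in retrieved_by.items():
        retrieved = sum(sizes[h] for h in strata)
        found = sum(est[h] for h in strata)
        scores[system] = (found / retrieved, found / total_relevant)
    return scores


sample = draw_stratified_sample(SIZES, ALLOCATION)
# Hypothetical assessor judgments: relevant documents found in each stratum's sample.
judged_relevant = {"RR": 45, "RN": 20, "NR": 25, "NN": 2}
sampled = {h: len(docs) for h, docs in sample.items()}
for system, (p, r) in precision_recall(SIZES, sampled, judged_relevant, RETRIEVED_BY).items():
    print(f"{system}: precision = {p:.3f}, recall = {r:.3f}")
```

Undersampling NN keeps most judgments where the systems actually retrieved documents, at the cost of a noisier total-relevant estimate: here each NN judgment stands for 100,000/40 = 2,500 documents.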