Netflix Prize and Heritage Health Prize
Philip Chan
- Slides: 43
Cash Prizes to Stimulate Research
- Ansari X Prize for Private Spaceflight (2004) [$10M]
  - 100 km above Earth twice within 2 weeks
- DARPA Grand Challenge (2005) [$2M]
  - autonomous vehicle: 131 miles in 10 hours
- Archon X Prize for Genomics (2006) [$10M]
  - map 100 human genomes in 10 days
- Netflix Prize (2006) [$1M]
  - recommend movies with 10% improvement
- Heritage Health Prize (2011) [$3M]
  - predict days in hospital next year with at most 0.4 error
Netflix Prize (netflixprize.com)
Netflix Prize
- Task
  - Given customer ratings on some movies
  - Predict customer ratings on other movies
  - If John rates
    - “Mission Impossible” a 5,
    - “Over the Hedge” a 3, and
    - “Back to the Future” a 4,
  - how would he rate “Harry Potter”, …?
- Performance
  - Error rate (accuracy)
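To make the task concrete, here is a minimal sketch of a very naive predictor: it guesses that John will give an unseen movie his own average rating, then scores the guess with root mean squared error (RMSE), the error measure the Netflix Prize used. This is not the method of any contestant. The function names and the held-out ratings are invented for illustration.

```python
from math import sqrt

# John's known ratings, taken from the example above.
john_ratings = {
    "Mission Impossible": 5,
    "Over the Hedge": 3,
    "Back to the Future": 4,
}


def predict_user_mean(ratings):
    """Predict an unseen movie's rating as the user's average rating."""
    return sum(ratings.values()) / len(ratings)


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared) / len(squared))


guess = predict_user_mean(john_ratings)

# Hypothetical held-out ratings, invented only to show how a guess is scored.
held_out = [5, 4]
error = rmse([guess] * len(held_out), held_out)
print(f"predicted rating: {guess:.1f}, RMSE on held-out ratings: {error:.3f}")
```

A competitive entry would replace `predict_user_mean` with a model that also uses other customers' ratings; the scoring step stays the same.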
Cash Award
- Grand Prize
  - $1M
  - 10% improvement
  - by 2011 (in 5 years)
- Progress Prize
  - $50K per year
  - 1% improvement
Intellectual Property
- Netflix has a non-exclusive license to the algorithm
- Authors tell the world what the algorithm is
Participation
- 51K contestants
- 41K teams
- 186 countries
Leader Board
- Started on Oct 2, 2006
- Improvement by the top algorithm
  - after a week: ~0.9%
  - after two weeks: ~4.5%
  - after a month: ~5%
  - after a year: ~8.4%
  - after two years: ~9.4%
  - July 26, 2009 (less than 3 years): 10%
Winner
- BellKor’s Pragmatic Chaos
  - 7 members
- Merger of 3 teams
  - BellKor
    - AT&T Labs, USA & Yahoo! Research, Israel
  - PragmaticTheory
    - telecommunications, Canada
  - BigChaos
    - started a company, Austria
- A combination of different algorithms
Runner-up
- The Ensemble
  - ~30 members
  - “last-minute” merger
    - teams had 30 days to beat the first team that crossed the 10% threshold
- same accuracy as the winner
- behind by 20 minutes!
Heritage Health Prize (heritagehealthprize.com)
Health Care
- 71M individuals admitted to US hospitals each year
- Unnecessary admissions cost $30B
Heritage Provider Network
- Has a network of doctors in California
- Can we identify earlier those most at risk and ensure they get the treatment they need?
- Can we reduce unnecessary hospitalizations?
Heritage Health Prize
- Launch
  - http://www.youtube.com/watch?v=GuZ8nkpygAs
- Given patient data
- Predict how many days a patient will spend in a hospital in the next year
- The prediction helps develop strategies to reduce emergencies and hence hospitalizations
Grand Prize
- $3M
- At most 0.4 in error (~0.5 day)
- By Apr 4, 2013 [2 years]
- $500K Consolation Prize
  - if the best error does not reach the 0.4 threshold
Milestone Prizes
- top 2 performers at each milestone
- Aug 31, 2011
  - $30K, $20K
- Feb 13, 2012
  - $50K, $30K
  - http://www.youtube.com/watch?v=pkmkNnGyihY
- Sep 4, 2012
  - $60K, $40K
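Performance of Algorithms
- Prediction Error Rate (RMSLE, root mean squared logarithmic error)
  - RMSLE = sqrt( (1/n) * sum over the n members of (prediction - real)^2 )
  - where
    - real = log(actual # of days + 1)
    - prediction = log(predicted # of days + 1)
- Prediction error threshold = 0.4 (~0.5 day)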
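The error measure above can be sketched in a few lines of Python. This is a minimal illustration, assuming natural logarithms (which is consistent with 0.4 corresponding to roughly half a day); the function name `rmsle` and the sample values are not from the competition.

```python
from math import exp, log1p, sqrt


def rmsle(predicted_days, actual_days):
    """Root mean squared error between log(predicted + 1) and log(actual + 1)."""
    if len(predicted_days) != len(actual_days) or not actual_days:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        (log1p(p) - log1p(a)) ** 2  # log1p(x) computes log(x + 1)
        for p, a in zip(predicted_days, actual_days)
    ]
    return sqrt(sum(squared) / len(squared))


# Predicting 1 day for a member who spent 0 days gives an error of log(2).
one_day_off = rmsle([1], [0])

# For a member with 0 actual days, a log-scale error of 0.4 corresponds to
# predicting exp(0.4) - 1 days, which is where "~0.5 day" comes from.
days_equivalent = exp(0.4) - 1

print(f"error for predicting 1 instead of 0: {one_day_off:.4f}")
print(f"0.4 on the log scale is about {days_equivalent:.2f} day")
```

Because the error is taken on the log scale, being off by one day matters more for a member with zero days than for a member with ten.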
Intellectual Property
- Exclusive license to Sponsor
  - and participant’s own use
- Algorithms not previously published
- Use of data sets is for the competition only
  - written consent for other purposes
Data Sets
- Training and validation data sets
  - For participants to design algorithms
- Feedback data set
  - For calculating standings on Leaderboard
- Scoring data set
  - For determining winners for prizes
- http://www.heritagehealthprize.com/c/hhp/Data
Data (in CSV format)
- Members Data (113K members)
- Claims Data (2.7M claims)
- Drug Count Data (818K prescriptions)
- Lab Count Data (361K labs)
- Outcome Data (76K in Y2, 71K in Y3)
- Target (71K in Y4 for prediction)
- Total ~264 MB (including other files)
Members Data
- MemberID
- AgeAtFirstClaim
- Sex
Claims Data
- MemberID
- ProviderID
- Vendor ID
- PCP (primary care physician) ID
- Year
- Specialty (of physician/vendor?)
- PlaceSvc (place of service)
  - office, outpatient hospital, inpatient hospital, …
- PayDelay (between service and payment)
- LengthOfStay (in hospital)
- DSFS (days since first claim)
- PrimaryConditionGroup (diagnostic categories)
- CharlsonIndex (effect of diseases on illness)
- ProcedureGroup (intervention categories)
- SupLOS (supplement to LengthOfStay)
  - 1 if LengthOfStay is NULL because of de-identification
Drug Count Data
- MemberID
- Year
- DSFS (days since first service)
- DrugCount (unique prescription drugs)
Lab Count Data
- MemberID
- Year
- DSFS (days since first service)
- LabCount (unique lab or pathology tests)
Outcome Data
- MemberID
- DaysInHospital_Y2 (claims in Y1)
  - i.e., predict Y2 based on Y1
- DaysInHospital_Y3 (claims in Y2)
- ClaimsTruncated
  - 1 if the member has “truncated” claims
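The pairing of Y1 claims with the Y2 outcome can be sketched as follows. This is only an illustration of the data layout using Python's standard `csv` module: the rows are invented, only a few of the columns listed above appear, and the per-member claim count is just one hypothetical feature a team might derive.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the column layout described above; not competition data.
claims_csv = """MemberID,Year,PlaceSvc,PayDelay
101,Y1,Office,30
101,Y1,Inpatient Hospital,45
102,Y1,Office,20
"""

outcome_csv = """MemberID,DaysInHospital_Y2
101,2
102,0
"""


def count_claims(claims_text, year):
    """Count the claims each member filed in the given year."""
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(claims_text)):
        if row["Year"] == year:
            counts[row["MemberID"]] += 1
    return counts


def build_examples(claims_text, outcome_text):
    """Pair each member's Y1 claim count with the Y2 days in hospital."""
    claim_counts = count_claims(claims_text, "Y1")
    examples = []
    for row in csv.DictReader(io.StringIO(outcome_text)):
        member = row["MemberID"]
        target = int(row["DaysInHospital_Y2"])
        examples.append((member, claim_counts.get(member, 0), target))
    return examples


examples = build_examples(claims_csv, outcome_csv)
for member, n_claims, days in examples:
    print(f"member {member}: {n_claims} claims in Y1, {days} days in hospital in Y2")
```

The same pairing applies one year later: Y2 claims with the Y3 outcome for training, and Y3 claims for the Y4 target.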
Using Other Data?
- Yes
  - Freely available to anyone (public source)
  - URL needs to be published to the forum
- Except for
  - demographic, socioeconomic or clinical information about the members
Naive Algorithms
- For predicting the number of Days in Hospital in the next year
- Posted as “benchmarks” on the Leaderboard
Always Predict 15 (max)
- Everyone goes to the hospital for at least 15 days
- RMSLE = 2.628062
- 550+% over threshold
Always Predict Zero
- no one goes to the hospital
- RMSLE = 0.522226
- 31% over threshold
Predict Random Values
- between 0 and 15
- RMSLE = 0.752297
- 88% over threshold
Always Predict Average
- Average ~= 0.209179
- RMSLE = 0.486459
- 22% over threshold
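The constant-prediction benchmarks can be reproduced in spirit with a short sketch. The outcomes below are invented toy values (mostly zeros, as most members are not hospitalized), so the scores will not match the Leaderboard numbers. The last baseline uses the mean taken on the log scale, which is the constant that minimizes RMSLE; the slides do not say how the benchmark average of 0.209179 was computed, so treat that link as an assumption.

```python
from math import expm1, log1p, sqrt


def rmsle(predicted_days, actual_days):
    """Root mean squared error between log(predicted + 1) and log(actual + 1)."""
    squared = [
        (log1p(p) - log1p(a)) ** 2
        for p, a in zip(predicted_days, actual_days)
    ]
    return sqrt(sum(squared) / len(squared))


def constant_baseline(value, actual_days):
    """Score the strategy of predicting the same value for every member."""
    return rmsle([value] * len(actual_days), actual_days)


# Invented toy outcomes; not the competition data.
actual = [0, 0, 0, 0, 0, 0, 0, 1, 2, 15]

# Mean on the log scale, converted back to days with expm1(x) = exp(x) - 1.
log_mean = sum(log1p(a) for a in actual) / len(actual)
best_constant = expm1(log_mean)

scores = {
    "always 15": constant_baseline(15, actual),
    "always 0": constant_baseline(0, actual),
    "arithmetic mean": constant_baseline(sum(actual) / len(actual), actual),
    "log-scale mean": constant_baseline(best_constant, actual),
}

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:>16}: RMSLE = {score:.4f}")
```

On data like this, where zeros dominate, predicting the maximum is by far the worst constant and a small positive constant beats predicting zero, which mirrors the ordering of the benchmarks above.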
Leader Board
- Competition started on Apr 4, 2011 with partial data
- All data were released on June 4, 2011
- Sep 9, 2011
  - RMSLE: 0.456384
  - ~14.1% over threshold
- Aug 29, 2012
  - RMSLE: 0.450426
  - ~12.6% over threshold
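The "over threshold" percentages quoted for the benchmarks and the Leaderboard follow from simple arithmetic against the 0.4 target. A small sketch, with the helper name `percent_over_threshold` chosen here for illustration:

```python
THRESHOLD = 0.4  # Grand Prize error threshold from the slides


def percent_over_threshold(rmsle_score, threshold=THRESHOLD):
    """How far a score sits above the threshold, as a percentage of it."""
    return (rmsle_score - threshold) / threshold * 100


# RMSLE values as reported on the slides.
reported = {
    "always predict 15": 2.628062,
    "random values": 0.752297,
    "always predict zero": 0.522226,
    "always predict average": 0.486459,
    "leader, Sep 9, 2011": 0.456384,
    "leader, Aug 29, 2012": 0.450426,
}

for name, score in reported.items():
    print(f"{name:>24}: {percent_over_threshold(score):6.1f}% over threshold")
```

Read this way, the leading entry closed roughly 1.5 percentage points of the gap in just under a year, with about 12.6% still to go.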
Teams
- Sep 9, 2011
  - 914 teams
  - 6021 entries
- Aug 29, 2012
  - 1292 teams
Considerations
- Accurate Prediction
  - algorithms
- Efficiency
  - time
  - space
Teams
- Form your own teams
  - www.heritagehealthprize.com
- Join my team
  - CSE 4403 Independent Study
  - CSE 5801 Independent Research
THANK YOU
www.heritagehealthprize.com