Optimizing Update Frequencies for Decaying Information Simon Razniewski
Optimizing Update Frequencies for Decaying Information Simon Razniewski Free University of Bozen-Bolzano, Italy
2 Motivation • Addr. Inc. collects addresses • Sells them to pharmaceutical/medical technology companies • Example records: ▫ Dr. John, 5 Main St., 38274 Hampton ▫ Dr. Miller, 17 Hill St., 45192 Fordham ▫ Dr. Higgs, 9 West St., 82077 Chatham
3 Main activities of Addr. Inc. 1. Discover new addresses 2. Check correctness of existing addresses ▫ Doctors relocate occasionally
4 How to check the correctness of existing addresses? • Check done by web search or phone calls ▫ Online directories for doctors ▫ Hospital webpages ▫ Homepages of private doctors
5 How many resources to provide to update addresses?
6 Outline 1. The problem 2. Information decay 3. Formula for optimal update frequency 4. Application to caching and crawling 5. Validation
7 1. The problem • The more employees, the more frequent updates ▫ But how often should we update each entity? • What is the optimal update frequency for each entity? ▫ Optimize income (benefit minus cost) • Cost ▫ Work time per update ▫ E.g. 15 minutes at $20/hr -> $5 per update • Benefit ▫ $20 per year per up-to-date address ▫ What is the benefit of updating?
8 2. Information decay • Information value of data gets lost over time • Similar to radioactive decay
9 Shape of decay curves • Linear, exponential, geometric, … • Exponential decay for all processes that follow a Poisson distribution [Cho and Molina, TOIT 2003] ▫ Empirically found to apply to website updates • Below: Soccer player relocation behaviour [Figure: decay curves for Manchester United and Bayern München]
10 Benefit of updating • Benefit per entity depends on average correctness ▫ $20 per year per up-to-date entity ▫ 70% average correctness in the 1st year -> $14 benefit ▫ 30% average correctness in the 2nd year -> $6 benefit ▫ … • Benefit of updating is a certain average correctness [Figure; label: Updates]
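To make the link between decay and benefit concrete, here is a minimal Python sketch. It assumes exponential decay with a hypothetical yearly relocation rate (`RATE = 0.7`, chosen only for illustration and not taken from the slides) and computes the average correctness of an entity over a time window since its last update. The function name `average_correctness` is likewise an assumption; only the $20 yearly benefit comes from the slide.

```python
import math


def average_correctness(rate: float, start: float, end: float) -> float:
    """Mean of exp(-rate * t) for t in [start, end] (years since last update).

    Assumes exponential decay: the probability that an entity is still
    correct t years after its last update is exp(-rate * t).
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    if rate == 0:
        return 1.0  # no decay: the entity stays correct forever
    # Closed-form integral of exp(-rate * t), divided by the window length.
    return (math.exp(-rate * start) - math.exp(-rate * end)) / (rate * (end - start))


BENEFIT_PER_YEAR = 20.0  # $ per up-to-date entity per year (from the slide)
RATE = 0.7               # hypothetical decay rate, for illustration only

for year in (1, 2):
    avg = average_correctness(RATE, year - 1, year)
    print(f"year {year}: avg correctness {avg:.2f}, benefit ${BENEFIT_PER_YEAR * avg:.2f}")
```

Without an update, each later year yields a lower average correctness and therefore a lower benefit. An update resets the curve to full correctness.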
11 3. The (simple) core formula •
12 Examples for Addr. Inc. [Plot: yearly income in $ vs. update frequency (years)] • Parameters: C=$5, B=$20/year, exponential decay, relocation frequencies taken from Californian taxpayers
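The trade-off behind such a plot can be sketched numerically. The Python example below is an illustration under stated assumptions, not necessarily the formula from the talk: it assumes exponential decay with rate `rate`, assumes that every update restores full correctness, and models yearly income for an update interval of T years as B * (average correctness over T) - C / T. The function names and the decay rate of 0.1 per year are hypothetical. C=$5 and B=$20/year are the values from the slide.

```python
import math


def yearly_income(interval: float, rate: float,
                  benefit: float = 20.0, cost: float = 5.0) -> float:
    """Yearly income per entity when it is updated every `interval` years.

    Assumed model: average correctness under exponential decay (rate > 0)
    times the yearly benefit, minus the update cost spread over the interval.
    """
    avg_correct = (1 - math.exp(-rate * interval)) / (rate * interval)
    return benefit * avg_correct - cost / interval


def best_interval(rate: float, benefit: float = 20.0, cost: float = 5.0,
                  step: float = 0.01, max_years: float = 20.0) -> float:
    """Grid search for the update interval (in years) with the highest income."""
    candidates = [step * k for k in range(1, int(max_years / step) + 1)]
    return max(candidates, key=lambda t: yearly_income(t, rate, benefit, cost))


if __name__ == "__main__":
    rate = 0.1  # hypothetical yearly relocation rate
    t_opt = best_interval(rate)
    print(f"best interval: {t_opt:.2f} years, "
          f"income: ${yearly_income(t_opt, rate):.2f} per year")
```

Updating too often wastes money on update costs, while updating too rarely loses benefit to outdated entries. The grid search simply picks the interval between those extremes with the highest income. A closed-form or root-finding solution would be faster, but the grid keeps the sketch independent of the decay function's shape.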
13 Extensions in the paper • Bulk updates • Cost of outdated entities • Different costs for checking and updating an address
14 4. Other applications • Caching • Web crawling • Difference to classical work there: ▫ Focus on optimal update frequency ▫ Classical work focuses on best distribution of a fixed update budget (“1000 pages, 500 crawls/minute, …”) • Our approach is more relevant now given scalable cloud resources
15 Caching and crawling • Cost of an update ▫ Bandwidth or compute time ▫ Crawling: ~0.003 ct/crawl (2015) • Benefit of an update ▫ Caching: Avoiding repeated computation or lower delay ▫ Crawling: Better search quality ▫ Challenge: How to express in money?
16 5. Validation • 1. Is it easy to get decay rates? 2. Do they differ? ▫ [Figure: yearly relocation probability for Ph.D. students, academic researchers, industrial researchers, and professors] • 3. Which decay function applies? ▫ Exponential decay for soccer player affiliations • 4. What can we gain? ▫ Up to 6.5% in a use case of academic advertisement ▫ Up to 31.8% in a web crawling use case [Cho and Molina, TOIT 2003]
17 Summary • Framework for finding optimal update frequency for decaying information ▫ Independent of actual decay function • Focus on address data, but also relevant for crawling and caching ▫ Challenge: Modelling benefit • Also interesting ▫ Data mining: How to identify relevant attributes (age, profession, …) that make it possible to predict decay rates