Data Mining Disasters: A Report. Mary McGlohon
Data Mining Disasters: A Report. Mary McGlohon, SIGBOVIK Commission for Workplace Safety
Data Mining Safety • Data mining disasters are a hazard to the progress of scientific research. • We will review some common data mining disasters and make recommendations for prevention.
Numeric Overflow “In 2007, numeric floods were responsible for over $600 million in property damages.” - Department of Made-Up Statistics
Numeric Overflow ERROR::NUMERICOVERFLOW Nobody expected the breach of the levees.
Numeric Overflow • Also caused the loss of several hundred nerd-hours. • 1 nerd-hour = 1 grad-student-hour = 0.25 faculty-hours = 6 undergrad-hours
Numeric Overflow • Recommendation: A drowning researcher’s best bet is to grab onto a floating log.
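Taken literally, the floating log is a logarithm: keeping quantities in log space is a common way to stay above water when products or exponentials get too large for a float. The block below is a minimal sketch of the log-sum-exp trick; the function name `log_sum_exp` and the sample values are illustrative assumptions, not part of the original slides.

```python
import math

def log_sum_exp(log_values):
    """Return log(sum(exp(v) for v in log_values)) without overflowing.

    Subtracting the maximum first means every exponent is <= 0,
    so math.exp never sees an argument large enough to overflow.
    """
    m = max(log_values)
    if math.isinf(m):
        # Either every term is -inf (the sum is 0) or some term is +inf.
        return m
    return m + math.log(sum(math.exp(v - m) for v in log_values))

# Hypothetical log-scale values; math.exp(1000.0) on its own raises OverflowError.
flood = [1000.0, 1000.0]
print(log_sum_exp(flood))  # about 1000.693, i.e. 1000 + log(2)
```

The same idea applies to long products of probabilities: sum the logs instead of multiplying the raw values.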
Power Law Failures • Occurs when confusing heavy-tailed distributions such as: • Power law (incl. Pareto, Zipf) • Lognormal • Weibull • Burr • Log-gamma • Log-Log-Mushroom-Mushroom
Power Law Failures • Many natural phenomena have heavy tails: • Magnitude of earthquakes • Size of human settlements • Degree distribution of “real” graphs • Time-to-response in CS professors' email • Your mom • However, confusing heavy-tailed
Power Law Failures • Related danger: Statisticians, computer scientists, and physicists wasting valuable nerd-hours in religious arguments over which heavy-tailed distribution is being followed.
Power Law Failures • Statisticians get mean when they get religious. (SIGBOVIK 07) • Recommendation: Calm the hell down.
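One calmer alternative to a religious argument is to fit the candidate distributions and compare how well each explains the data. The sketch below fits a Pareto and a lognormal model by maximum likelihood and compares their log-likelihoods. The helper names, the synthetic sample, and the choice of the sample minimum as the Pareto scale are all assumptions made for illustration; a higher likelihood on one sample is evidence, not a verdict.

```python
import math
import random

def fit_pareto(xs):
    """MLE for a Pareto (type I) model, using the sample minimum as x_min."""
    n = len(xs)
    x_min = min(xs)
    alpha = n / sum(math.log(x / x_min) for x in xs)
    # log f(x) = log(alpha) + alpha*log(x_min) - (alpha + 1)*log(x)
    loglik = (n * math.log(alpha) + n * alpha * math.log(x_min)
              - (alpha + 1) * sum(math.log(x) for x in xs))
    return alpha, loglik

def fit_lognormal(xs):
    """MLE for a lognormal model: fit a normal to log(x)."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n
    # At the MLE the squared-error term collapses to n/2.
    loglik = -sum(logs) - 0.5 * n * math.log(2 * math.pi * var) - 0.5 * n
    return mu, math.sqrt(var), loglik

# Synthetic heavy-tailed data (an assumption): Pareto with shape 2.5.
rng = random.Random(0)
sample = [rng.paretovariate(2.5) for _ in range(5000)]

alpha, ll_pareto = fit_pareto(sample)
mu, sigma, ll_lognormal = fit_lognormal(sample)
print(f"Pareto shape estimate: {alpha:.2f}")
print(f"log-likelihood, Pareto: {ll_pareto:.1f}, lognormal: {ll_lognormal:.1f}")
```

On real data the two fits are often much closer than on this synthetic sample, which is roughly why the arguments consume so many nerd-hours.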
Decision Tree Forest Fires • Pruning is used to prevent overfitting. • When overpruning occurs, trees are burned to stumps. • This spreads, torching entire forests. ☹ (Aww...)
Decision Tree Forest Fires • Recommendation: Researchers should obtain a burning permit before pruning with fire. • Smoking while researching is not recommended; if you choose to do so, make sure your “butts are out”.
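The overpruning hazard can be shown on a toy tree. The sketch below uses a simplified bottom-up rule in the spirit of cost-complexity pruning: a subtree is collapsed into a leaf when the training errors it saves do not justify its extra leaves at penalty `alpha`. The `Node` class, the class counts, and the penalty values are hypothetical, chosen only to show a tree going from healthy to stump as `alpha` grows.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Node:
    counts: Dict[str, int]            # training examples per class at this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

def leaf_error(node: Node) -> int:
    """Misclassified examples if this node predicted its majority class."""
    return sum(node.counts.values()) - max(node.counts.values())

def subtree_error(node: Node) -> int:
    if node.is_leaf():
        return leaf_error(node)
    return subtree_error(node.left) + subtree_error(node.right)

def count_leaves(node: Node) -> int:
    if node.is_leaf():
        return 1
    return count_leaves(node.left) + count_leaves(node.right)

def prune(node: Node, alpha: float) -> Node:
    """Return a pruned copy; assumes every internal node has two children."""
    if node.is_leaf():
        return Node(dict(node.counts))
    kept = Node(dict(node.counts), prune(node.left, alpha), prune(node.right, alpha))
    errors_saved = leaf_error(kept) - subtree_error(kept)
    if errors_saved <= alpha * (count_leaves(kept) - 1):
        return Node(dict(node.counts))  # collapse the subtree into a leaf
    return kept

def example_tree() -> Node:
    # Hypothetical counts for a two-class problem with 10 training examples.
    left = Node({"A": 4, "B": 1}, Node({"A": 4, "B": 0}), Node({"A": 0, "B": 1}))
    right = Node({"A": 1, "B": 4})
    return Node({"A": 5, "B": 5}, left, right)

for alpha in (0.5, 1.0, 10.0):
    print(alpha, count_leaves(prune(example_tree(), alpha)))
```

With the largest penalty the whole tree collapses to a single leaf, which is the stump the slides warn about.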
Voting Fraud by One-Armed Bandits • Cascading failures from other fields may cause disasters in data mining. • Fatal mistake: combining the related subfields of voting mechanisms and one-armed bandit problems.
Voting Fraud by One-Armed Bandits • One-armed bandits commit voting fraud by: • Impersonating real voting machines. • Cramming cake into voting machines. • (The cake is a lie.)
Other safety measures • Cool mining helmets
Conclusion • The Commission for Workplace Safety hopes this has raised awareness of potential data mining disasters. • When faced with data mining disasters: • Remain calm. ☺ • Blame it on one-off errors, lack of rigor in proofs of correctness, or whatever government agency is funding the project.