DATA MINING, MACHINE LEARNING, DATA SCIENCE, BIG DATA, ... OH MY! (AN OVERVIEW FOR BUSINESS PEOPLE) 11/09/2018 Prof. Eduardo Bezerra ebezerra@cefet-rj.br CEFET/RJ
Overview: Introduction; Types of Learning, Applications; The Data Scientist; Final Remarks
Introduction
Hype is in the air: Data Mining, Machine Learning, Data Science, Big Data, Data Analytics, Business Intelligence, Predictive Analytics, Artificial Intelligence, ...
Hype is in the air: be careful!
Machine Learning (to play Checkers): Arthur Samuel, 1959. Machine Learning: “Field of study that gives computers the ability to learn without being explicitly programmed.” On the checkers program: “it will learn to play a better game of checkers than can be played by the person who wrote the program.” Techniques: search tree, minimax search, alpha-beta pruning, scoring functions, TD-learning.
Machine Learning: “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.” Tom Mitchell, 1997
Machine Learning: Suppose a bank records the history of the loans it has granted, the customers' data, and whether each loan was repaid or not, and, based on that history, learns to identify good and bad payers. Components: T: learn how to classify customers; E: historical loan data; P: percentage of customers classified correctly.
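The bank example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the single feature (a debt-to-income ratio), the ten loan records, and the one-threshold rule are all hypothetical and are not taken from the slides. The sketch maps directly onto the three components: E is the loan history, T is learning the classification rule, and P is the fraction of customers classified correctly.

```python
# E: hypothetical loan history as (debt-to-income ratio, loan repaid?) pairs.
# The feature and all values are made up for illustration.
history = [
    (0.10, True), (0.15, True), (0.20, False), (0.25, True),
    (0.30, True), (0.35, False), (0.40, False), (0.45, False),
    (0.50, False), (0.60, False),
]


def accuracy(threshold, examples):
    """P: fraction of customers classified correctly by the rule
    'predict repaid when ratio < threshold'."""
    correct = sum((ratio < threshold) == repaid for ratio, repaid in examples)
    return correct / len(examples)


def learn_threshold(examples):
    """T: learn a classifier by trying each observed ratio as the
    threshold and keeping the one with the highest accuracy."""
    candidates = sorted(ratio for ratio, _ in examples)
    return max(candidates, key=lambda t: accuracy(t, examples))


best = learn_threshold(history)
print("learned threshold:", best)
print("accuracy on the history:", accuracy(best, history))
```

A real system would use many features and would measure P on loans that were held out from training, since accuracy on the training history tends to be optimistic.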
Terminology: training dataset, features, model, learning algorithm
Data availability is key to Data Mining. KianSing Ng & Huan Liu, Customer Retention via Data Mining, 2000
Features (text, audio, visual)
Machine Learning Systems (http://www.nosimpler.me/machine-learning/)
Types of Learning, Applications
Types of machine learning: supervised learning, unsupervised learning (https://www.quora.com/What-is-the-difference-between-supervised-and-unsupervised-learning-algorithms)
Supervised Learning
Supervised Learning: The learner (machine) receives the correct answer for each example in the training dataset. Goal: build a predictive model. Two subtypes (tasks): classification and regression.
Classification: predict a discrete value (https://www.quora.com/What-is-the-main-difference-between-classification-problems-and-regression-problems-in-machine-learning)
Regression: predict a continuous value (https://www.quora.com/What-is-the-main-difference-between-classification-problems-and-regression-problems-in-machine-learning)
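To make "predict a continuous value" concrete, here is a minimal sketch of regression using ordinary least squares on one feature. The data (advertising spend versus sales) and the function names are hypothetical, chosen only for illustration. The learner receives the correct answer (sales) for each training example and fits a straight line that can then predict sales for an unseen spend.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: advertising spend and resulting sales.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

model = fit_line(ad_spend, sales)
print("slope, intercept:", model)
print("predicted sales for a spend of 6.0:", predict(model, 6.0))
```

A classifier would instead map the same kind of input to one of a fixed set of labels, for example "high sales" versus "low sales".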
Applications - Classification: Sentiment Analysis
Applications - Classification: Customer Retention (aka Churn Prediction) (https://othot.com/predictive-intelligence-customer-acquisition/)
Applications - Classification: Fraud Detection. Input: activity in customers' accounts. Classes: {fraud, legitimate}
Applications - Classification: other applications. OCR (input: images, classes: characters); medical diagnosis (input: symptoms, classes: diseases); news routing; … many more. Classification is a commercially important technology!
Applications - Regression: Sales Forecasting (https://stats.stackexchange.com/questions/43264/regression-model-for-predicting-sales)
Unsupervised Learning
Applications: Market Segmentation (CRM). Clustering: “organizing objects into groups whose members are somewhat similar” (https://www.quora.com/What-is-clustering)
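One common way to organize objects into groups of similar members is k-means clustering. The sketch below is a minimal, pure-Python version under assumptions of my own: the slides do not name an algorithm, the six customers and their two numeric features are hypothetical, and the centroids are initialized naively from the first k points to keep the run deterministic.

```python
import math


def kmeans(points, k, iterations=10):
    """Minimal k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = list(points[:k])  # naive initialization: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # keep old centroid if cluster is empty
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers described by two numeric features each.
customers = [(1, 2), (9, 8), (2, 1), (8, 9), (1, 1), (9, 9)]

centroids, segments = kmeans(customers, k=2)
for centroid, segment in zip(centroids, segments):
    print("segment around", centroid, "->", sorted(segment))
```

No labels are given to the algorithm; the two segments emerge only from the similarity between customers, which is what distinguishes this from the supervised tasks.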
Applications: Analysis of Networks
Applications: Outlier Detection (https://stats.stackexchange.com/questions/160260/anomaly-detection-based-on-clustering)
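As a rough illustration of outlier detection, the sketch below flags values that lie far from the median, measured in units of the median absolute deviation (MAD). This is a simple stand-in technique, not the clustering-based approach discussed at the link; the transaction amounts and the cutoff of 5 are hypothetical choices.

```python
from statistics import median


def find_outliers(values, cutoff=5.0):
    """Flag values whose distance from the median exceeds `cutoff`
    times the median absolute deviation (MAD)."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # At least half the values equal the median; flag anything different.
        return [v for v in values if v != center]
    return [v for v in values if abs(v - center) / mad > cutoff]


# Hypothetical transaction amounts with one suspicious entry.
amounts = [20, 22, 19, 21, 23, 20, 500]
print("outliers:", find_outliers(amounts))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value barely shifts them, so the outlier cannot hide itself by distorting the baseline.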
Recommendation System (http://thedatascientist.com/right-way-recommender-system-startup/)
The Data Scientist
Data Science
Data Scientist: “[P]rofessional responsible for collecting, analyzing and interpreting large amounts of data to identify ways to help a business improve operations and gain a competitive edge over rivals.” (https://searchenterpriseai.techtarget.com/definition/data-scientist)
Data Scientist - education: Bachelor's degree in one of: statistics, computer science, mathematics
Data Scientist - skills: database systems; programming languages (R, Python, ...); machine learning; statistics; knowledge of Big Data platforms (Spark, MapReduce, Hadoop, ...)
Data Scientist - related jobs: data analyst, software engineer, business expert
Final remarks
Machine Learning is useful: How much / how many? Which category does something belong to? Which groups exist in a dataset? Is something weird? Which action should you take?
ML & Big Data: It is currently relatively easy to get datasets with ~10^6 examples: e-commerce portals; Kaggle (https://www.kaggle.com); IoT; Google Datasets; ...
Unsupervised Learning: Current models are hungry for labeled data. Today's main approach is supervised learning. “The Revolution Will Not Be Supervised.” - Yann LeCun
Social Issues
Privacy/Ethical Issues: “[D]ynamic flows of data and storage of databases in clouds with diminished attention given to national borders has led to improvements in human health; it has led to increased risk of privacy-related harms.” Ethics and Big Data in health (https://www.sciencedirect.com/science/article/pii/S2452310017300264)
PPCIC - CEFET/RJ: Programa de Pós-Graduação em Ciência da Computação (Graduate Program in Computer Science; research focused on Data Science) http://eic.cefet-rj.br/ppcic
THANK YOU!