Algorithms for Data Analytics
B. Ramamurthy
Data Analytics (Data Science)
[Figure: diagram relating data, EDA, big-data analytics, statistics, algorithms, intuition/understanding, discoveries/intelligence, statistical inference, and decisions/answers/results]
Three Types of Data Science Algorithms
• Pipelines to prepare data
• Three types:
1. Data preparation algorithms, such as sorting and workflows
2. Optimization algorithms, such as stochastic gradient descent and least squares…
3. Machine learning algorithms…
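To make the optimization category concrete, here is a minimal Python sketch (an illustration, not from the slides) that fits a line in two ways: the closed-form least-squares solution and plain stochastic gradient descent on the squared error. The noise-free data on y = 2x + 1, the learning rate `lr`, the epoch count, and the function names are all assumptions chosen for the example.

```python
import random

# Hypothetical noise-free data on the line y = 2x + 1 (illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]


def least_squares(xs, ys):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sgd(xs, ys, lr=0.01, epochs=2000, seed=0):
    """Stochastic gradient descent on the per-point loss 0.5 * (w*x + b - y)**2."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the points in a fresh random order each epoch
        for i in order:
            err = (w * xs[i] + b) - ys[i]
            w -= lr * err * xs[i]  # d(loss)/dw = err * x
            b -= lr * err          # d(loss)/db = err
    return w, b


print("least squares:", least_squares(xs, ys))
print("SGD:          ", sgd(xs, ys))
```

On this noise-free data both approaches should land on a slope of about 2 and an intercept of about 1. Least squares solves the problem in one step; SGD only approaches the answer, and how closely depends on `lr` and `epochs`.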
Machine Learning Algorithms
• Come from artificial intelligence
• No underlying generative process is assumed
• Built to predict or classify something
• Three basic algorithms: linear regression, k-NN, k-means
• We have already looked at linear regression as a case study for R/RStudio
• We will start with k-means…
K-means
• K-means is unsupervised: there is no prior knowledge of the “right answer”
• The goal of the algorithm is to discover what the right answer is by finding clusters in the data
• Typical data: satisfaction surveys, incident reports
• Assume data {age, gender, income, state, household size}; your goal is to segment the users
• K-means is one of the simplest clustering algorithms
• Let's understand k-means using an example
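One common way to compute k-means is Lloyd's iteration: assign every point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until nothing changes. The Python sketch below assumes numeric features, Euclidean distance, and caller-supplied starting centroids (real tools typically pick them randomly or with k-means++); the function name `kmeans` and the (age, income) points are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (enough for finding the nearest centroid)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], initial: List[Point], max_iter: int = 100):
    """Lloyd-style k-means with caller-supplied starting centroids."""
    centroids = [tuple(float(v) for v in c) for c in initial]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (ties go to the lower index).
        labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of the points assigned to it.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(old)  # leave an empty cluster's centroid unchanged
        if updated == centroids:
            break  # no centroid moved, so the assignments are stable
        centroids = updated
    return centroids, labels


# Hypothetical users described by (age, income in thousands).
users = [(22, 30), (25, 35), (24, 32), (45, 90), (50, 95), (48, 100)]
centroids, labels = kmeans(users, initial=[users[0], users[3]])
print(labels)     # cluster index per user
print(centroids)  # final cluster means
```

Categorical attributes such as gender or state would need a numeric encoding before this kind of distance makes sense, and unscaled features with large ranges (income here) dominate the distance.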
Let's examine an example
• Data attributes: {age, income range, education, skills, social, paid work}
• Let's take just the age: {23, 25, 24, 23, 21, 32, 30, 31, 30, 37, 35, 38, 37, 39, 42, 43, 45}
• Cluster this data using k-means
• Let's assume K = 3, i.e., 3 groups
• Can you guess the centroids? Let's set the initial centroids to {21, 30, 40}
• First let's calculate by hand, and then use RStudio
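The hand calculation can be mirrored in a short Python sketch of the same 1-D k-means steps, starting from the slide's centroids {21, 30, 40} and printing each pass. It assumes absolute difference as the distance and sends ties to the lower-numbered centroid; it is not a reproduction of R's `kmeans()`, which defaults to the Hartigan-Wong algorithm.

```python
# Ages from the slide and the slide's initial centroids.
ages = [23, 25, 24, 23, 21, 32, 30, 31, 30, 37, 35, 38, 37, 39, 42, 43, 45]
centroids = [21.0, 30.0, 40.0]


def assign(data, cents):
    """Put each value into the cluster of its nearest centroid."""
    clusters = [[] for _ in cents]
    for x in data:
        # min() keeps the first (lower) index when two centroids are equally close
        nearest = min(range(len(cents)), key=lambda i: abs(x - cents[i]))
        clusters[nearest].append(x)
    return clusters


iteration = 0
while True:
    iteration += 1
    clusters = assign(ages, centroids)
    # New centroid = mean of its cluster (an empty cluster keeps its old centroid).
    new_centroids = [sum(c) / len(c) if c else old for c, old in zip(clusters, centroids)]
    print(f"pass {iteration}: {clusters} -> centroids {[round(c, 2) for c in new_centroids]}")
    if new_centroids == centroids:
        break  # assignments no longer change, so the centroids are final
    centroids = new_centroids
```

Working through the arithmetic, the first pass gives clusters {23, 25, 24, 23, 21}, {32, 30, 31, 30, 35} and {37, 38, 37, 39, 42, 43, 45}, with centroids 23.2, 31.6 and about 40.14; the second pass reproduces the same clusters, so the loop stops. Age 35 is equidistant from 30 and 40 in the first pass, but sending it to the 40 cluster instead still ends at the same centroids after one extra pass.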
k-NN
• Supervised ML: you know the “right answers”, or at least you have data that is “labeled”
• A set of objects has already been classified or labeled (training set)
• Another set of objects is yet to be labeled or classified (test set)
• Your goal is to automate the process of labeling the test set
• The intuition behind k-NN: find the most similar items (similarity defined by their attributes), look at their existing labels, and assign the object a label, typically by majority vote
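A minimal k-NN classifier fits in a few lines of Python. This sketch assumes Euclidean distance on numeric features and a simple majority vote; the function name `knn_classify` and the toy points are hypothetical, and `math.dist` needs Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """Label `query` by majority vote among its k nearest training examples.

    train: list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled training set with two well-separated groups.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_classify(train, (2, 2), k=3))  # A
print(knn_classify(train, (7, 8), k=3))  # B
```

With two classes, choosing an odd K avoids tied votes.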
Let's look at an example

Age   Loan (×1000)   Default
25    40             N
35    60             N
45    80             N
20    20             N
35    120            N
52    18             Y
23    95             Y
40    62             Y
60    100            Y
48    220            Y
33    150            Y
Determine using k-NN
• With K = 3, decide whether you can lend money to a person aged 48 requesting a loan of 142K
• Repeat with K = 5
• In practice, k-NN needs a lot more data than this
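A k-NN vote over the loan table can be sketched in Python to check the exercise. This sketch assumes plain Euclidean distance on the raw (age, loan in thousands) values with no feature scaling, and a simple majority vote; `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter

# (age, loan in thousands, default) rows from the example table.
loans = [
    (25, 40, "N"), (35, 60, "N"), (45, 80, "N"), (20, 20, "N"), (35, 120, "N"),
    (52, 18, "Y"), (23, 95, "Y"), (40, 62, "Y"), (60, 100, "Y"), (48, 220, "Y"),
    (33, 150, "Y"),
]


def knn_predict(query, k):
    """Majority label among the k rows closest to `query` (raw Euclidean distance)."""
    ranked = sorted(loans, key=lambda row: math.dist(row[:2], query))
    neighbours = ranked[:k]
    votes = Counter(row[2] for row in neighbours)
    return votes.most_common(1)[0][0], neighbours


applicant = (48, 142)  # age 48, requesting 142K
for k in (3, 5):
    label, neighbours = knn_predict(applicant, k)
    print(f"K={k}: predicted default = {label}, neighbours = {neighbours}")
```

Working the distances by hand gives the same neighbours: the closest rows are (33, 150, Y), (35, 120, N) and (60, 100, Y), so K = 3 votes Y two to one; adding (23, 95, Y) and (45, 80, N) for K = 5 gives Y three to two. Either way the applicant resembles past defaulters. Because the loan column spans a far wider numeric range than age, it dominates this unscaled distance; standardizing both features first is a common alternative and could change the neighbours.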