CSCI 528: Advanced Data Analytics

Qingchen Zhang
qzhang@stfx.ca
http://cs.stfx.ca/~qzhang/

Advanced Data Analytics
  • What is data analytics?
  • Some applications of data analytics
  • Data analysis process
  • Data analytic techniques
  • What will we learn in this course?

What is data analytics?
  • Two terms: data and information
    – Data: everything stored in computers can be called data.
    – Information: data that has been processed to be useful and meaningful.
  • Data alone is not useful; we need information. How can we obtain information from data?

What is data analytics?
  • Many definitions
    – Data analytics refers to the process of examining datasets to draw conclusions about the information they contain. Data analytic techniques let you take raw data and uncover patterns to extract valuable insights from it.
    – Data analytics is the science of analyzing raw data in order to draw conclusions about that information. Data analytics techniques can reveal trends and metrics that would otherwise be lost in the mass of information.

What is data analytics?
  • Types of data analytics
    – Descriptive analytics: the objective is to derive patterns (correlations, trends, clusters, trajectories, and anomalies) that summarize the underlying relationships in data.
    – Predictive analytics: moves to what is likely to happen in the near term. What happened to sales the last time we had a hot summer? How many weather models predict a hot summer this year?
    – Prescriptive analytics: suggests a course of action. If the likelihood of a hot summer, measured as the average of five weather models, is above 58%, we should add an evening shift to the brewery and rent an additional tank to increase output.
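The brewery example can be sketched as a simple decision rule. This is only an illustration: the five model probabilities below are made-up values, and the function name `recommend_action` is hypothetical; only the 58% threshold and the two actions come from the slide.

```python
from statistics import mean

# Hypothetical probabilities of a hot summer from five weather models (made-up values).
model_probabilities = [0.62, 0.55, 0.70, 0.48, 0.66]

HOT_SUMMER_THRESHOLD = 0.58  # the 58% threshold mentioned on the slide


def recommend_action(probabilities, threshold=HOT_SUMMER_THRESHOLD):
    """Suggest an action based on the average predicted likelihood of a hot summer."""
    likelihood = mean(probabilities)
    if likelihood > threshold:
        return "add an evening shift and rent an additional tank"
    return "keep the current production plan"


print(recommend_action(model_probabilities))
```

The prediction step (the averaged likelihood) and the prescription step (the action chosen from it) are kept separate here, which mirrors the distinction between predictive and prescriptive analytics.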

Applications of data analytics
  • Fraud and risk detection
  • Healthcare
  • Internet search
  • Targeted advertising
  • Airline route planning

Applications of data analytics
  • Fraud and risk detection
    – This is regarded as one of the earliest applications of data science, and it came out of finance. Many organizations had suffered heavy losses from bad debt. Because they already held the data collected when customers applied for loans, they applied data analytics to it, which helped them recover from the losses they had incurred.
    – Banks learned to segment the data in their customers’ profiles, recent expenditure, and other significant information available to them. This made it easier to analyze and infer the probability of a customer defaulting.

Applications of data analytics
  • Healthcare
    – The healthcare sector in particular benefits greatly from data analytics.
    – Medical image analysis: data analytic techniques can help detect tumors, Alzheimer's disease, and so on.

Applications of data analytics
  • Healthcare: drug development
    – The drug discovery process is highly complicated and involves many disciplines. Even the best ideas are often constrained by the billions of tests required and by huge financial and time costs. On average, it takes twelve years to make an official submission.
    – Data analytics and machine learning algorithms simplify and shorten this process, adding a perspective to each step, from the initial screening of drug compounds to predicting the success rate based on biological factors. Such algorithms can forecast how a compound will act in the body using advanced mathematical modeling and simulations instead of lab experiments.
    – The idea behind computational drug discovery is to build computer model simulations of a biologically relevant network, which simplifies predicting future outcomes with high accuracy.

Applications of data analytics
  • Healthcare: virtual assistance for patients and customer support
    – Optimizing the clinical process builds on the idea that in many cases patients do not actually need to visit a doctor in person. A mobile application can offer a more effective solution by bringing the doctor to the patient instead.
    – Mobile apps powered by AI and data analytics can provide basic healthcare support, usually as chatbots. You describe your symptoms or ask questions, and then receive key information about your medical condition, derived from a wide network linking symptoms to causes. Apps can remind you to take your medicine on time and, if necessary, book an appointment with a doctor.
    – This approach promotes a healthy lifestyle by encouraging patients to make healthy decisions, saves them time waiting in line for an appointment, and allows doctors to focus on more critical cases.

Applications of data analytics
  • Internet search
    – This is probably the first thing that comes to mind when you think of data analytics applications. When we speak of search, we think of Google, but there are many other search engines, such as Yahoo, Bing, Ask, and AOL.
    – All of these search engines (including Google) use data science algorithms to deliver the best results for a query in a fraction of a second.
    – Google processes more than 20 petabytes of data every day. Without data science, Google would not be the ‘Google’ we know today.

Applications of data analytics
  • Targeted advertising
    – If you thought search was the biggest of all data analytics applications, here is a challenger: the entire digital marketing spectrum. From display banners on websites to digital billboards at airports, almost all placements are decided using data science algorithms.
    – This is why digital ads achieve a much higher CTR (click-through rate) than traditional advertisements: they can be targeted based on a user’s past behavior.

Applications of data analytics
  • Airline route planning
    – The airline industry worldwide is known to bear heavy losses. Except for a few carriers, companies struggle to maintain their occupancy ratios and operating profits. Rising fuel prices and the need to offer heavy discounts to customers have made the situation worse.
    – It was not long before airlines started using data science to identify strategic areas for improvement. Using data science, airlines can now:
      – Predict flight delays
      – Decide which class of airplanes to buy
      – Decide whether to fly directly to the destination or make a stop in between (for example, a flight from New Delhi to New York can take a direct route or stop over in another country)
      – Drive customer loyalty programs effectively
    – Southwest Airlines and Alaska Airlines are among the top companies that have embraced data analytics to change the way they work.

Applications of data analytics
  • Other applications
    – Policing/Security
    – Manage Risk
    – Delivery Logistics
    – Web Provision
    – Customer Interactions
    – Energy Management
    – Gaming
    – Speech Recognition
    – ……

Data analysis process
  • The data analysis process is the gathering of information using a suitable application or tool that lets you explore the data and find patterns in it. Based on those patterns, you can make decisions or draw conclusions.
  • Data analysis consists of the following phases:
    – Data requirement gathering
    – Data collection
    – Data cleaning
    – Data analysis
    – Data interpretation
    – Data visualization

Data analysis process
  • Data requirement gathering
    – First, ask why you want to do this data analysis: identify its purpose or aim, and decide which type of data analysis you want to perform.
    – In this phase, you decide what to analyze and how to measure it, so you need to understand why you are investigating and which measures you will use.

Data analysis process
  • Data collection
    – After requirement gathering, you have a clear idea of what you need to measure and what your findings should address. Now it is time to collect your data based on those requirements.
    – Remember that the collected data must be processed or organized for analysis.
    – Because the data comes from various sources, keep a log recording the collection date and the source of the data.
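One possible way to keep such a collection log is a small CSV file with one row per dataset. The sketch below is an assumption about how this could look, not something the slides prescribe: the field names, the `log_collection` helper, and the example dataset are all hypothetical, and an in-memory buffer stands in for a real log file.

```python
import csv
import io
from datetime import date

# Hypothetical log columns: which dataset, where it came from, and when it was collected.
LOG_FIELDS = ["dataset", "source", "collected_on"]


def log_collection(log_file, dataset, source, collected_on):
    """Append one row describing where and when a dataset was collected."""
    writer = csv.DictWriter(log_file, fieldnames=LOG_FIELDS)
    writer.writerow(
        {"dataset": dataset, "source": source, "collected_on": collected_on.isoformat()}
    )


# An in-memory buffer stands in for a real log file in this sketch.
log = io.StringIO()
csv.DictWriter(log, fieldnames=LOG_FIELDS).writeheader()
log_collection(log, "sales_2023.csv", "point-of-sale export", date(2024, 1, 15))
print(log.getvalue())
```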

Data analysis process
  • Data cleaning
    – Not all collected data will be useful or relevant to the aim of your analysis, so it should be cleaned.
    – Collected data may contain duplicate records, white space, or errors. After cleaning, the data should be error free.
    – This phase must be done before analysis, because with cleaned data the output of your analysis will be closer to your expected outcome.
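A minimal sketch of the three cleaning problems named above (duplicate records, white space, and errors), using only the Python standard library. The records are made-up, the helper `clean_records` is hypothetical, and "error" is narrowed here to mean an empty field; real cleaning rules depend on the dataset.

```python
# Hypothetical raw records of (customer name, city); all values are made up.
raw_records = [
    ("  Alice ", "Halifax"),
    ("Bob", " Antigonish "),
    ("Alice", "Halifax"),  # duplicate of the first record once white space is stripped
    ("", "Sydney"),        # treated as an error here: the name is missing
]


def clean_records(records):
    """Strip white space, drop records with an empty field, and remove duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        stripped = tuple(field.strip() for field in record)
        if not all(stripped):   # at least one field is empty after stripping
            continue
        if stripped in seen:    # an identical record was already kept
            continue
        seen.add(stripped)
        cleaned.append(stripped)
    return cleaned


print(clean_records(raw_records))
```

Stripping white space before checking for duplicates matters: the first and third records only become identical after stripping.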

Data analysis process
  • Data analysis
    – Once the data is collected, cleaned, and processed, it is ready for analysis. As you manipulate the data, you may find that you have exactly the information you need, or that you need to collect more data.
    – During this phase, you can use data analysis tools and software to help you understand, interpret, and derive conclusions based on the requirements.

Data analysis process
  • Data interpretation
    – After analyzing your data, it is finally time to interpret your results. You can choose how to express or communicate your analysis: simply in words, or with a table or chart.
    – Then use the results of your data analysis process to decide on your best course of action.

Data analysis process
  • Data visualization
    – Data visualizations are very common in day-to-day life; they often appear as charts and graphs. In other words, data is shown graphically so that it is easier for the human brain to understand and process.
    – Data visualization is often used to discover unknown facts and trends. By observing relationships and comparing datasets, you can uncover meaningful information.

Data analytic techniques
  • Artificial intelligence: artificial neural networks and deep learning, reinforcement learning, support vector machines, genetic algorithms, ……
  • Statistical techniques: resampling, standard deviation, multiple regression analysis, ……
  • Data mining: clustering, classification, association analysis, and anomaly detection

What will we learn in this course?
  • Data
  • Classification
  • Association analysis
  • Clustering
  • Anomaly detection

What will we learn in this course?
  • Data
    – Types of data
    – Data quality
    – Measures of similarity and dissimilarity
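As a preview of measures of similarity and dissimilarity, here is a small sketch of two common ones: Euclidean distance (a dissimilarity) and cosine similarity. The slides do not say which measures the course covers, so this choice is an assumption, and the vectors are made-up.

```python
import math


def euclidean_distance(x, y):
    """Dissimilarity: straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Similarity: cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


print(euclidean_distance([3.0, 4.0], [0.0, 0.0]))  # 5.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0 (orthogonal vectors)
```

Distance grows as objects become less alike, while cosine similarity grows as they become more alike, which is why the two are called dissimilarity and similarity measures respectively.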

What will we study in this course?
  • Classification (assignment and project)
    – Decision tree
    – Nearest-neighbor classifiers
    – Bayesian classifiers
    – Deep learning
    – Ensemble methods
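To give a feel for one of the listed methods, here is a minimal sketch of a nearest-neighbor classifier: a query point receives the majority label among its k closest training points. The data, the labels, the function name `knn_predict`, and the use of Euclidean distance are illustrative assumptions rather than course material.

```python
import math
from collections import Counter


def knn_predict(training_points, training_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(training_points, training_labels)
    ]
    distances.sort(key=lambda pair: pair[0])  # closest training points first
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up two-dimensional training data with two classes.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, (1.1, 1.0)))
```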

What will we study in this course?
  • Clustering (assignment and/or project)
    – K-means
    – DBSCAN
    – Fuzzy c-means
    – Subspace clustering
    – Spectral clustering
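As an illustration of the first method in the list, the sketch below runs the standard k-means iteration (often called Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The data and the hand-picked initial centroids are made-up assumptions, and a fixed iteration count replaces a proper convergence test.

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Alternate between assigning points to centroids and recomputing the centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(point, centroids[i])
            )
            clusters[nearest].append(point)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Made-up data with two visually separate groups.
data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(data, initial_centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids)
print(clusters)
```

In practice the initial centroids are usually chosen randomly or with a seeding heuristic, and the result can depend on that choice.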

What will we study in this course?
  • Association analysis (assignment)
    – Apriori
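The sketch below shows the frequent-itemset part of Apriori in a simplified form: count candidate itemsets level by level, keep those that meet a minimum support count, and build larger candidates only from smaller frequent ones. The transactions and the threshold are made-up, and rule generation (confidence) is left out.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {
            c
            for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Made-up market-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
frequent_itemsets = apriori(baskets, min_support=2)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step relies on the property that every subset of a frequent itemset must itself be frequent, which is what lets Apriori skip most of the possible itemsets.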

What will we study in this course?
  • Anomaly detection
    – Proximity-based approaches
    – Clustering-based approaches
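One simple proximity-based approach scores each point by its distance to its k-th nearest neighbor, so that isolated points receive high scores. The sketch below assumes that scoring rule, Euclidean distance, and made-up data; the slides do not specify which proximity-based method the course uses.

```python
import math


def knn_distance_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor (larger = more anomalous)."""
    scores = []
    for i, point in enumerate(points):
        distances = sorted(
            math.dist(point, other) for j, other in enumerate(points) if j != i
        )
        scores.append(distances[k - 1])
    return scores


# Made-up data: four points close together and one far away.
observations = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (6.0, 6.0)]
scores = knn_distance_scores(observations, k=2)
outlier = observations[scores.index(max(scores))]
print(outlier)
```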