Unit V Big Data Visualization
- Introduction to data visualization
- Challenges to big data visualization
- Conventional data visualization tools
- Techniques for visual data representation
- Types of data visualization
- Visualizing big data
- Tools used in data visualization; proprietary data visualization tools
- Open source data visualization tools
- Analytical techniques used in big data visualization
- Data visualization with Tableau
- Introduction to: Pentaho, Flare, Jasper Reports, Dygraphs, Datameer Analytics Solution and Cloudera, Platfora, NodeBox, Gephi, Google Chart API, Flot, D3, Visual.ly
Introduction to Data Visualization
"Data visualization is the technique used to communicate data by representing information with visual graphic objects such as points, lines, or bars."
Objectives of data visualization:
- To illuminate the data and see it in context.
- To solve problems or suggest solutions.
- To understand and explore the data clearly, which helps in making sound decisions.
- To illustrate or hide data.
- To find patterns or relationships in the data.
- To compare statistical data.
Visualizing Big Data
- The amount of data generated by organizations grows year after year, largely through internet activity; this data is called Big Data.
- The main problem is that the collected data should be useful data only.
- Big data visualization is the "front end" of big data.
- Data visualization represents data as visual objects such as tables, diagrams, and images.
Challenges to Big Data Visualization
Problems in big data visualization:
1. Visual noise (too many closely related data points; the user cannot separate them)
2. Information loss (reducing the data set helps, but information may be lost)
3. Large image perception (limited by aspect ratio and screen resolution)
4. High rate of image change (the user can only view the data, not change it)
5. High performance requirement (otherwise visualization speed drops)
Challenges to Big Data Visualization
Solutions in big data visualization:
1. Speeding up the process (use faster hardware and more memory)
2. Understanding the data (draw on domain expertise)
3. Addressing data quality (assure quality through an information management process)
4. Displaying meaningful results (make the visualization effective by clustering)
5. Dealing with outliers (remove outliers)
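The slides do not say how outliers should be removed. As one possible illustration, the sketch below applies the interquartile-range (Tukey) rule, which is a common choice but an assumption here; the function name, the factor `k=1.5`, and the sample readings are all hypothetical.

```python
from statistics import quantiles

def remove_outliers_iqr(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical sensor readings with one obvious outlier (95).
readings = [10, 12, 11, 13, 12, 11, 95, 10, 12]
cleaned = remove_outliers_iqr(readings)
print(cleaned)
```

Dropping the extreme value keeps the axis scale of a chart readable, at the cost of hiding a point that may itself be interesting, so the removed values are usually worth reporting separately.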
Types of Data Visualization
1. Tables
2. Histogram
3. Scatter plot
4. Various charts
5. Timeline
6. Various diagrams
1. Tables
- A collection of rows and columns that presents data in a structured form.
- The smallest unit is the cell, addressed by its row and column, e.g. [4 (row), 2 (column)].
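To make the row and column addressing concrete, here is a minimal sketch that stores a table as a list of rows and looks up a cell by a 1-based [row, column] pair. It assumes the slide's notation refers to a cell address; the table contents and the `cell` helper are made up for illustration.

```python
# A table as a list of rows; each row is a list of cells (hypothetical data).
table = [
    ["Region", "Units"],
    ["North", 120],
    ["South", 95],
    ["East", 143],
    ["West", 88],
]

def cell(table, row, column):
    """Return the cell at a 1-based (row, column) position."""
    return table[row - 1][column - 1]

print(cell(table, 4, 2))  # the cell in row 4, column 2
```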
2. Histogram
- Drawn as a vertical bar chart.
- Represents the distribution of a data set over continuous intervals.
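The counting step behind a histogram can be sketched as follows: each value is assigned to a fixed-width interval and the bar height is the number of values in that interval. The function name, the bin width, and the sample ages are assumptions for illustration only.

```python
def histogram(values, bin_width, start=0):
    """Count how many values fall into each interval [lo, lo + bin_width)."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)  # which interval v belongs to
        lo = start + index * bin_width         # lower edge of that interval
        counts[lo] = counts.get(lo, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical ages, grouped into 10-year intervals starting at 20.
ages = [21, 25, 34, 37, 38, 41, 45, 52, 29, 33]
bins = histogram(ages, bin_width=10, start=20)
for lo, count in bins.items():
    print(f"{lo}-{lo + 10}: {'#' * count}")  # text bars, one '#' per value
```

The choice of bin width changes the shape of the picture: bins that are too wide hide structure, and bins that are too narrow add noise.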
3. Scatter plot
- Also known as an X-Y plot, scatter graph, point graph, or scattergram.
- Shows the relationship between two variables, which may or may not be correlated.
Correlation
1. Positive
2. Negative
3. Null
4. Linear
5. Exponential
6. U-shape
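Whether the points in a scatter plot show positive, negative, or null correlation can be checked numerically. The sketch below uses the Pearson correlation coefficient, which the slides do not name, so treat it as one possible measure; the 0.3 threshold in `describe` and the sample data are arbitrary assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient r, between -1 and 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def describe(r, threshold=0.3):
    """Label r as positive, negative, or null (threshold is an assumption)."""
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "null"

xs = [1, 2, 3, 4, 5]
print(describe(pearson(xs, [2, 4, 6, 8, 10])))   # rising points
print(describe(pearson(xs, [10, 8, 6, 4, 2])))   # falling points

# A U-shaped relationship: y depends on x, yet r is zero,
# because Pearson's r only measures linear association.
u_xs = [-2, -1, 0, 1, 2]
u_ys = [4, 1, 0, 1, 4]
print(pearson(u_xs, u_ys))
```

This is why the scatter plot itself remains useful: exponential and U-shaped patterns are visible in the picture even when a single linear coefficient misses them.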
4. Charts
Types of chart:
1. Line chart
2. Bar chart
3. Pie chart
4. Area chart
5. Flow chart
6. Bubble chart
[Figure: one example each of a line chart, bar chart, pie chart, area chart, flow chart, and bubble chart]
5. Timeline
A pictorial representation of events in chronological sequence, drawn along a straight line.
Types of timeline:
1. Linear timeline
2. Comparative timeline
6. Various Diagrams
1. Venn diagram
2. Data flow diagram
3. Entity relationship diagram
[Figure: one example each of a Venn diagram, data flow diagram, and entity relationship diagram]
Conventional Data Visualization Tools
The methods and ideas organizations use for visualizing data.
1. Selection points on which interactive visualization is based
   1. Size and volume of data (to make the right choice, consider the size and volume of the data to be visualized)
   2. Cardinality (consider the cardinality of the data to be visualized)
   3. Portion of data to be conveyed (visualize the point or portion of the data that the user wants to convey)
   4. Audience (to whom the user wants to convey it)
   5. Type of visual (which type of visualization the user should use)
2. Interactive visualization approaches
   1. Zooming, i.e. zoom in and zoom out (lets the user change the scale of the interface area as desired)
   2. Overview + Detail (multiple views are used simultaneously)
   3. Focus + Context, or fisheye (the focus area shows detail about part of the information)
3. Steps used to perform interactive visualization
   1. Interactive selection of data objects (the data entities, subset, or part of the whole are selected for visualization according to the user)
   2. Linking data objects with each other (used for connecting multiple views)
   3. Filtering information (only valuable data is kept in focus; unrelated data is removed)
   4. Rearranging or remapping (the data is rearranged)
Techniques for Visual Data Representation
Different authors classify data visualization techniques differently.
Visualization techniques/methods:
1. Data visualization (represents quantitative data in diagrammatic form, with or without axes, e.g. table, line chart, pie chart)
2. Information visualization (adds interactivity to the data to increase cognition, e.g. tree map, clustering, Venn diagram)
3. Concept visualization (explains ideas, plans, and concepts in detail so they are easy to analyse, e.g. decision tree)
4. Strategic visualization (represents an organization's strategies for development, formulation, and implementation, e.g. organizational chart, failure tree, strategy map)
5. Metaphor visualization (organizes and structures information graphically and expresses insight into it, e.g. metro map, tree)
6. Compound visualization (merges different graphic formats into a single schema, e.g. cartoon)
Data Visualization Tools
Various tools are used to visualize data sets in 2D and 3D.
This part on visualization tools is divided into two parts:
1. Multidimensional visualization
2. Multidimensional visualization tools
1. Multidimensional Visualization
There are two categories of multidimensional visualization.
- The first type examines category properties or category counts. Examples: pie chart, bar chart, histogram, tree map.
- The second type examines the relationships among variables. Examples: scatter plot, line chart, area chart, tabular comparison.
2. Multidimensional Visualization Tools
Google Charts
- Displays live data on a website.
- The Google Charts site provides an Introduction, a Quick Start, and a Chart Gallery for ideas.
Many Eyes
- A research project by IBM Research and the IBM Cognos software group.
- Developed using Java and Flash; open source.
- A public website where users upload data and the site generates interactive visualizations of it.
Tableau Public
- A very popular tool, developed by the US company Tableau Software.
- According to its website, it "Brings Data to Life".
Weave
- Web-based Analysis and Visualization Environment.
- Can handle different data types because it offers a large array of options for working with various data.
Wordle
- Takes text as input from the user and generates "word clouds".
- The clouds give greater prominence to words that occur more frequently in the source text.
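The core of a word cloud is a word-frequency count: the more often a word occurs, the larger it is drawn. The sketch below shows only that counting step and is not Wordle's actual implementation; the function name, the tokenizing pattern, and the sample sentence are assumptions.

```python
import re
from collections import Counter

def word_weights(text, top=5):
    """Return the `top` most frequent words with their counts."""
    words = re.findall(r"[a-z']+", text.lower())  # crude tokenizer
    return Counter(words).most_common(top)

sample = "Big data needs big tools and big ideas; data tools help"
weights = word_weights(sample, top=3)
for word, count in weights:
    print(word, count)  # a renderer would scale font size by count
```

A real word cloud tool would usually also drop very common words such as "and" or "the" before counting.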
Open Source Data Visualization Tools
1. Datawrapper
2. Chart.js
3. Raw
4. Charted
5. Timeline
6. Leaflet
1. Datawrapper
- Open source; produced in Europe by a journalism organization.
- Designed to create data visualizations for news organizations.
- A graph can be created in four steps:
  1. Click the "New Chart" link in the top menu bar.
  2. Paste your data into the text area.
  3. The tool analyses the data and shows a preview.
  4. If everything is fine, publish the chart.
2. Chart.js
- Open source, with a clean charting library.
- Gives users control over the look and feel of their charts.
- Before creating a chart, the library must be included in the front-end code.
- Then add the chart and assign values to it.
3. Raw
- Open-source, web-based tool built on the D3.js library.
- Simple, ready-to-use tool for non-programmers.
4. Charted
- Open source; created by the Product Science team at Medium.
- To visualize data, paste a link to a Google spreadsheet or a .csv file as input.
- It checks at a regular interval (30 min) whether the data is up to date.
5. Timeline
- Displays a set of events in sequence.
- Requires the data to be properly formatted in a Google spreadsheet.
6. Leaflet
- Lightweight, mobile-friendly JavaScript library used to create interactive maps.
- Takes advantage of HTML5 and CSS3.
- Well documented and easy to use, with a clean API and readable source code.
Analytical Techniques Used in Big Data Visualization
Analytical methods:
1. Classification
2. Regression
3. Clustering
4. Association rules
1. Classification
Supervised learning
- Supervised learning (SL) is where you have input variables (X) and an output variable (Y).
- An algorithm learns the mapping function from input to output, Y = f(X).
- The goal is that, given new input data (X), you can predict the output variable (Y).
- For instance, suppose you are given a basket filled with fruit. The first step is to train the machine with different fruits:
  - If the object is round, has a depression at the top, and is red, it is labelled Apple.
  - If the object is a long curving cylinder and is green-yellow, it is labelled Banana.
i) Classification
- A classification problem is one where the output variable is a category, such as "red" or "blue".
- A classification model attempts to draw a conclusion from observed values.
- Given one or more inputs, the model tries to predict the value of one or more outcomes.
- Example: filtering emails as "spam" or "not spam".
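As a rough illustration of learning Y = f(X) from labelled examples, the sketch below classifies fruit with a 1-nearest-neighbour rule. The slides do not name an algorithm, so this choice is an assumption, as are the numeric features (length and width in cm) and the training values.

```python
from math import dist

# Labelled training data: ((length_cm, width_cm), label). Values are made up.
training = [
    ((7.0, 7.0), "Apple"),
    ((8.0, 7.5), "Apple"),
    ((7.5, 8.0), "Apple"),
    ((18.0, 3.5), "Banana"),
    ((20.0, 4.0), "Banana"),
    ((17.0, 3.0), "Banana"),
]

def predict(training, features):
    """1-nearest-neighbour: return the label of the closest training example."""
    _, label = min(training, key=lambda example: dist(example[0], features))
    return label

print(predict(training, (19.0, 3.8)))  # long and thin
print(predict(training, (7.2, 7.1)))   # roughly round
```

The output is always one of the categories seen during training, which is what distinguishes classification from regression.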
ii) Regression
- A regression problem is one where the output variable is a real or continuous value, such as "salary" or "weight".
- Difference between the two: classification predicts whether something will happen, whereas regression predicts how much of it will happen.
- Regression analysis answers questions such as:
  1) What is a person's expected income? (linear regression)
  2) What is the probability that an applicant will fail to repay a loan? (logistic regression)
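The expected-income question can be sketched with ordinary least squares on a single input. The data below (years of experience against income in thousands) is invented, and `fit_line` is a hypothetical helper. The `logistic` function shows only the curve that logistic regression uses to turn a score into a probability, not how such a model is fitted.

```python
from math import exp

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def logistic(z):
    """Map any real score z to a value between 0 and 1."""
    return 1 / (1 + exp(-z))

years = [1, 2, 3, 4, 5]          # hypothetical years of experience
income = [30, 35, 40, 45, 50]    # hypothetical income, in thousands
slope, intercept = fit_line(years, income)
print(slope * 6 + intercept)     # predicted income for 6 years
print(logistic(0))               # a score of 0 maps to probability 0.5
```

Linear regression returns a quantity on the original scale, while the logistic curve keeps the answer inside the range of a probability.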
Unsupervised learning
- Hidden structure is discovered from unlabeled data.
- Unsupervised learning trains a machine using information that is neither classified nor labeled.
- Unlike supervised learning, no teacher is provided, i.e. the machine is given no labeled training examples.
- The machine's task is to group unsorted information according to similarities, patterns, and differences, without prior training on labeled data.
For instance, suppose the machine is given an image containing both dogs and cats that it has never seen before. It has no idea of the features of dogs and cats, so it cannot label them as dogs and cats. But it can group them according to their similarities, patterns, and differences.
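Grouping by similarity without labels can be sketched with k-means, one common clustering algorithm that the slides do not specifically name. The points stand in for two hypothetical measured features per animal; the naive initialization (the first k points) and the fixed iteration count are simplifying assumptions.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly moving centroids."""
    centroids = list(points[:k])  # naive start: the first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return clusters

# Hypothetical feature pairs; no labels are given.
points = [(1, 1), (1.5, 2), (8, 8), (1, 0.5), (9, 9.5), (8.5, 9)]
for group in k_means(points, k=2):
    print(group)
```

The algorithm returns groups, not names: deciding that one group is "cats" and the other "dogs" still requires a person or a labelled example.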
i) Clustering
- An unsupervised technique for grouping similar objects.
- It makes no prediction; it finds similarities between objects and groups them into clusters.
ii) Association rules
- An unsupervised technique.
- It makes no prediction; instead it finds notable relationships among items that are hidden in a large data set.
- The discovered relationships are expressed as rules.
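Association rules are usually scored with support and confidence. These two measures are standard but are not mentioned on the slides, so they are introduced here as an assumption, and the shopping baskets are invented. The sketch evaluates the rule "bread => milk".

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions with the antecedent, the share that also
    contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
print(support(baskets, {"bread", "milk"}))       # how common the pair is
print(confidence(baskets, {"bread"}, {"milk"}))  # how reliable the rule is
```

Here three of the five baskets contain both items, and three of the four baskets with bread also contain milk; a rule is typically kept only if both measures exceed chosen minimums.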
Data Visualization with Tableau
- Tableau is a business intelligence (BI) software tool for data visualization.
- It has its own in-memory data engine, which helps speed up visualization.
- Tableau works with Hadoop, using Hive.
Features:
1. Quick and easy data acquisition
2. Publication of interactive graphics
3. Data are public
4. Three main products:
   i) Tableau Desktop
   ii) Tableau Server
   iii) Tableau Public
Introduction to:
- Pentaho: provides data analysis, designing, monitoring, data mining, and integration features.
- Flare: an ActionScript library that runs in Adobe Flash Player.
- Jasper Reports: open-source Java reporting tool; reports are defined in XML format.
- Dygraphs: fast, flexible, open-source JavaScript charting library.
- Datameer Analytics Solution and Cloudera: allow the entire data set to be stored in Hadoop.
- Platfora: built on Hadoop and Spark.
- NodeBox: node-based software used for creating 2D graphics.
- Gephi: open-source graph visualization tool written in Java and OpenGL.
- Google Chart API: provides simple visualizations through an online tool.
- Flot: jQuery library for line and bar charts.
- D3.js: Data-Driven Documents (HTML + CSS).
- Visual.ly: provides templates; popular for infographics.