Understanding Big Data
Mr. Sriram
Email: hadoopsrirama@gmail.com
Objectives
- What is Big Data?
- Applications of Big Data
- Challenges in handling Big Data
- Big Data Analytics
- Applications of BDA in the real world
- More use cases for Big Data
- Analyze limitations and solutions of existing data analytics architecture
- What is a Distributed Data File System?
What is Big Data
- What is Big Data?
- Applications of Big Data
- Challenges in handling Big Data
- What comes under Big Data
- Characteristics of Big Data / Types of Data
- Unstructured data is exploding heavily
- Big Data Analytics
- Applications of BDA in the real world
- More use cases for Big Data
- Customers
What is Big Data?
- Lots of data / huge data (more than 1 petabyte).
- Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications.
- Big data is a general term used to describe the voluminous amount of unstructured and semi-structured data a company creates.
- The size of big data is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.
- The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
How big is Big Data?
- Lots of data / huge data (more than 1 petabyte)
  - 1 petabyte (PB) = 1,000 terabytes (TB)
  - 1 zettabyte (ZB) = 1 billion terabytes
  - 1 terabyte (TB) = 1,000 GB

Facebook
- Facebook Insights provides developers and website owners with access to real-time analytics related to Facebook activity across websites with social plugins, Facebook Pages, and Facebook Ads. Using anonymized data, Facebook surfaces activity such as impressions, click-through rates, and website visits, i.e., 30+ PB of data per day.
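The unit relationships above can be checked with a small sketch. It assumes decimal (SI) prefixes, where each step up is a factor of 1,000; the `UNITS` table and the `convert` helper are illustrative names, not part of any library.

```python
# Decimal (SI) storage units expressed in bytes (assumption: powers of 1000).
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}

def convert(value, from_unit, to_unit):
    """Convert a storage quantity from one decimal unit to another."""
    return value * UNITS[from_unit] / UNITS[to_unit]

print(convert(1, "TB", "GB"))   # 1000.0
print(convert(1, "PB", "TB"))   # 1000.0
print(convert(1, "ZB", "TB"))   # 1000000000.0 (1 billion TB)
```

With these units, 30 PB per day is `convert(30, "PB", "TB")`, or 30,000 TB per day.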
Applications of Big Data
- A primary goal of looking at big data is to discover repeatable business patterns. Every day, systems and enterprises around the world generate huge amounts of data, from terabytes to petabytes of information.

Big data examples:
- Google processes about 24 petabytes of data per day.
- The experiments in the Large Hadron Collider produce about 15 petabytes of data per year.
- The 2009 movie Avatar is reported to have taken over 1 petabyte of local storage at Weta Digital for the rendering of the 3D CGI effects.
- Amazon handles clickstream data from 20 million customer clicks per day to recommend products.
- The stock market generates about one terabyte of new trade data per day, which is used for stock trading analytics to determine trends for optimal trades.
- 300 billion emails are sent every day. Services analyze this data to find spam.
Challenges in handling Big Data
- Difficulties: capture, storage, search, sharing, analytics, visualizing data
- Data storage: physical storage, acquisition, space and power costs
- Data management: skills, people, time
- Data processing (information and content management)
What comes under Big Data?
Big data involves the data produced by different devices and applications. Given below are some of the fields that come under the umbrella of Big Data.
- Black box data: The black box is a component of helicopters, airplanes, jets, etc. It captures the voices of the flight crew, recordings of microphones and earphones, and the performance information of the aircraft.
- Social media data: Social media such as Facebook and Twitter hold information and the views posted by millions of people across the globe.
- Stock exchange data: Stock exchange data holds information about the 'buy' and 'sell' decisions that customers make on shares of different companies.
- Power grid data: Power grid data holds information about the power consumed by a particular node with respect to a base station.
- Transport data: Transport data includes the model, capacity, distance, and availability of a vehicle.
- Search engine data: Search engines retrieve lots of data from different databases.
Characteristics of Big Data
1. Volume: size of data
2. Velocity: speed of data
3. Variety: different types of data
4. Veracity: trustworthiness of data
5. Value: turning data into business value (money)
6. Variability: data is not constant
7. Visualization: report generation

Categories / Types of Data
- Structured data: data from enterprise systems such as ERP and CRM. E.g., tables, relational data
- Semi-structured data: XML files, email body. E.g., XML, documents with tables
- Unstructured data: audio, video, images, archived documents. E.g., raw text, images, audio, and video
- Streaming data: e.g., YouTube, tweets
- Temporal data: e.g., OLAP/OLTP data, including trends and activities in time
- Geospatial data: e.g., regions, tracks, shapes
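To make the first three data types concrete, here is a minimal sketch using only the Python standard library. All sample values (names, cities, the order record, the note text) are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, as in a relational table (hypothetical rows).
table = io.StringIO("id,name,city\n1,Asha,Chennai\n2,Ravi,Pune\n")
rows = list(csv.DictReader(table))

# Semi-structured: tags describe the data, but fields can vary per record.
doc = ET.fromstring("<orders><order id='7'><item>book</item></order></orders>")
items = [item.text for item in doc.iter("item")]

# Unstructured: free text with no schema; any structure has to be inferred.
note = "Customer called about a late delivery and asked for a refund."
words = note.lower().rstrip(".").split()

print(rows[0]["city"])
print(items)
print(len(words))
```

The structured rows can be queried by column name directly, the XML needs a parser that walks its tags, and the free text offers nothing beyond its words until further analysis is applied.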
Unstructured Data is Exploding Heavily
- 90% of the world's data was generated in the last few years.
- Due to the advent of new technologies, devices, and communication means such as social networking sites, the amount of data produced by mankind is growing rapidly every year.
- IDC (International Data Corporation) predicts that by 2020 the total will have reached 40,000 exabytes (EB), or 40 zettabytes (ZB).
- The world's information is doubling every two years. By 2020, there will be 5,200 GB of data for every person on earth.
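The two 2020 figures can be cross-checked against each other, and the "doubling every two years" claim can be written as a simple growth formula. This is a rough sketch assuming decimal units and taking the prediction as 40 ZB (40,000 EB); `projected_volume` is an illustrative helper, not an IDC model.

```python
ZB = 10**21  # bytes, decimal units (assumption)
GB = 10**9

total_2020 = 40 * ZB       # predicted total: 40 ZB = 40,000 EB
per_person = 5200 * GB     # predicted data per person

# Population implied by dividing the total by the per-person figure.
implied_population = total_2020 / per_person
print(f"{implied_population / 1e9:.2f} billion people")

def projected_volume(start_volume, years, doubling_period=2):
    """Volume after `years` if it doubles every `doubling_period` years."""
    return start_volume * 2 ** (years / doubling_period)

# Ten years of doubling every two years is five doublings: a 32x increase.
print(projected_volume(1, 10))
```

The implied population of roughly 7.7 billion is close to the world population around 2020, so the two predictions are consistent with each other.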
Big Data Analytics
In Big Data Analytics (BDA), the user is typically trying to discover new facts that no one in the enterprise knew before. BDA helps in enterprise information management and decision making.

The characteristics common to the technologies identified with BDA:
- The perception that traditional data warehousing processes are too slow and limited in scalability
- The ability to converge data from multiple data sources, both structured and unstructured
- The realization that time to information is critical to extract value from data sources that include mobile devices, RFID, sensors, etc.
Applications of Big Data Analytics in the real world
More Use Cases for Big Data

Research & Development
- Use customer insights to eliminate unnecessarily costly features and add features that have a higher value for the customer.
- Improve gross margins.

After-Sales Support
- Obtain real-time input on emerging defects and adjust the production process immediately.
- R&D operations could use these data for redesign and new product development.

Police departments
- Target crime hotspots and prevent crime waves.

Public utilities
- Use data from sensors on water and sewer usage.
- Detect leaks and reduce water consumption.

Electric power utilities
- Use smart meters to better manage resources and avoid blackouts.
Big Data Customers
Hidden Treasure
Analyze Limitations and Solutions of Existing Data Analytics Architecture
- Limitations
- Solutions
Limitations of Existing Data Analytics Architecture
Solutions of Existing Data Analytics Architecture
Distributed Data File System
What is a Distributed Data File System?

Reading 1 TB from 1 machine (4 I/O channels, 100 MB/s each):
  1 TB = 1024 * 1024 MB = 1,048,576 MB
  1,048,576 MB / 4 channels = 262,144 MB per channel
  262,144 MB / 100 MB/s = 2,621.44 seconds
  2,621.44 seconds / 60 = 43.69 minutes to read

Reading 1 TB from 10 machines:
  43.69 minutes / 10 = 4.369 minutes to read

To speed up reading, put 1/10th of the data on each of 10 commodity machines, each with 4 I/O channels at 100 MB/s.
Result: the data is processed in 1/10th of the time.
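The same arithmetic can be written as a short function. This is a simplified sketch that assumes the data is split evenly, every channel reads in parallel at full speed, and there is no network or coordination overhead; `read_time_minutes` is an illustrative name.

```python
def read_time_minutes(data_mb, machines=1, channels_per_machine=4, mb_per_sec=100):
    """Idealized time to read `data_mb` megabytes, split evenly across machines."""
    # Aggregate throughput of all channels on all machines, in MB/s.
    throughput = machines * channels_per_machine * mb_per_sec
    return data_mb / throughput / 60

ONE_TB_MB = 1024 * 1024  # 1 TB expressed in MB, as in the slide

print(round(read_time_minutes(ONE_TB_MB), 2))               # 43.69
print(round(read_time_minutes(ONE_TB_MB, machines=10), 3))  # 4.369
```

Under these assumptions the read time scales inversely with the number of machines, which is the motivation for spreading data across a cluster.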
File System
Types of file system in Hadoop:
- LFS: local file system (local)
- DFS: distributed file system (server)
- HDFS: Hadoop Distributed File System (cluster)
Thank You!