Big Data Overview Dr. Anil Maheshwari
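Small Data vs BIG Data (figure): the four Vs of Volume, Velocity, Variety, Veracity
- Volume: Gigabytes (small data) vs Petabytes, Exabytes (big data)
- Variety: Alphanumeric (small data) vs Video, Audio, Logs (big data)
- Data organization: Relational (small data) vs Pre-relational (big data)
- Cost: Commercial (small data) vs Open-source (big data)
- Big data technologies: Hadoop, MapReduce, HDFS, HBase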
Small Data vs Big Data
| Feature | Small Data | Big Data |
|---|---|---|
| Representative animal | Cat | Tiger |
| Purpose | Manage business activities | Communicate, monitor |
| Source | Business transactions | User- and machine-generated data, social media |
| Quantity | Gigabytes | Exabytes |
| Structure | Well-structured | Semi-structured or unstructured |
| Content | Alphanumeric | Audio, video, graphs |
| Speed | Batch level | Real-time, torrential |
| Storage medium | Large SAN managed using one large machine | Hadoop clusters with commodity disks |
| Data organization | Relational | Pre-relational |
| Data manipulation | Conventional programming languages, SQL | MapReduce using parallel processing algorithms, SQL |
| Processing speed required | Overnight | Near real-time |
| Database cost | Commercial | Open-source |
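To make the "Structure" and "Data manipulation" rows more concrete, here is a minimal Python sketch built on a made-up sales example. Well-structured small data fits a fixed relational schema and can be queried with SQL (here through the standard-library `sqlite3` module). Semi-structured records, such as social-media posts, arrive as JSON whose fields vary from record to record, so each one has to be inspected. The table name, fields, and values are all hypothetical.

```python
import json
import sqlite3

# Small data: well-structured rows with a fixed schema, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, amount REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 45.5)],
)
# Each result row is a (store, total) pair, so the cursor converts directly to a dict.
totals = dict(conn.execute("SELECT store, SUM(amount) FROM sales GROUP BY store"))
conn.close()

# Big data: semi-structured records (e.g. hypothetical social-media posts) whose fields vary.
raw_posts = [
    '{"user": "a", "text": "love the new store", "tags": ["retail"]}',
    '{"user": "b", "text": "long queue today"}',                  # no "tags" field
    '{"user": "c", "video_url": "http://example.com/v.mp4"}',     # no "text" field at all
]
posts = [json.loads(line) for line in raw_posts]
# Missing fields are handled explicitly instead of being guaranteed by a schema.
texts = [p.get("text", "") for p in posts]
tagged_users = [p["user"] for p in posts if "tags" in p]

print(totals)
print(texts)
print(tagged_users)
```

The contrast is the point of the sketch. The SQL query relies on every row having `store` and `amount`, while the JSON loop cannot assume any field is present.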
Big Data defined
A collection of data sets so large and complex that they become difficult to process using on-hand database management tools. Challenges include capture, curation, storage, search, sharing, analysis, and visualization.
Analyzing a single large set of related data yields additional information compared with analyzing separate smaller sets, because correlations can be found across the linked data. Such analysis is used to spot business trends, determine the quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions.
The ‘Big Data’ Picture - Business
“Big data will disrupt your business. Your actions will determine whether these disruptions are positive or negative.” (Source: “Big Data Disruptions Tamed With Enterprise Architecture,” Gartner, 2012)
Business issues:
- How to use generated data as a strategic asset in real time, to identify opportunities, thwart threats, and achieve operational efficiencies
- How to organize the business so that it does not get buried in the high volume, velocity, and variety of data
- How to design a ‘Digital Business Strategy’ around digital assets and capabilities
The ‘Big Data’ Picture - Technology
“Big data” forces organizations to address the variety of information assets and how fast these new asset types are changing information management demands. (Source: “‘Big Data’ and Content Will Challenge IT Across the Board,” Gartner, 2012)
IT issues:
- How IT professionals who integrate big data structured assets with content can strengthen their skills in identifying business requirements
- How IT support teams can support end-user-deployed big data solutions
- How to redesign enterprise data warehouses to address big data issues
- How to help design a ‘Digital Business Strategy’ around digital assets and capabilities
Films on Big Data
- Digital Nation, PBS (90 min): http://video.pbs.org/video/1402987791/ig
- BBC documentary on Big Data (60 min): https://www.youtube.com/watch?v=tIxxEur07Q
Big Data: the exponential growth of business data
This growth has been made possible largely by advances in technology. This graph shows the growth in average disk drive capacity, from 1 MB in 1980 to 1 TB in 2010.
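Taking the two endpoints on the slide at face value, we can estimate the implied average doubling period: divide the 30 elapsed years by log2 of the capacity growth factor. This Python sketch assumes decimal units (1 TB = 1,000,000 MB) and smooth exponential growth between the two points.

```python
import math

# Endpoints from the slide; decimal units assumed (1 TB = 1,000,000 MB).
start_year, start_mb = 1980, 1
end_year, end_mb = 2010, 1_000_000

growth_factor = end_mb / start_mb       # a millionfold increase
doublings = math.log2(growth_factor)    # number of doublings needed to reach it
years_per_doubling = (end_year - start_year) / doublings

print(f"{doublings:.1f} doublings, one every {years_per_doubling:.2f} years")
```

The result is about 19.9 doublings, or one roughly every 18 months. That is comparable to the 12-18 month doubling period this deck cites for data growth.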
Big Data characteristics
- Variety: many types, sources, and levels of quality
- Velocity: mobile + social media = velocity
- Volume: autonomous data streams of video, audio, text, data, etc.
Sources of data:
- People: Google searches, Facebook posts, Tweets, YouTube videos, other social media, blogs, emails
- Machines: RFID, telematics, connected devices, mobile location, surveillance, etc.
- Metadata: web crawlers, bots
Source: http://hortonworks.com/wp-content/uploads/2012/05/bigdata_diagram.png
Kendall & Kendall Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall 13 -11
Innovation with Big Data
http://blogs.gartner.com/doug-laney/
Use of Big Data
- Discover new insights: what experts think is ordinary can be very interesting for ordinary folks, and vice versa
- Create new models of reality: refuting what is taken for granted is interesting; reality seems to be X, but in actuality it is X’
Key findings highlighted in IBM Analytics study
- Across all industries, the business case for big data is strongly focused on addressing customer-centric objectives.
- A scalable and extensible information management foundation is a prerequisite for big data advancement.
- Organizations are beginning their pilots and implementations by using existing and newly accessible internal sources of data.
- Advanced analytic capabilities are required, yet lacking, for organizations to get the most value from big data.
- As awareness of and involvement in big data grow, four key stages of big data adoption emerge along a continuum.
http://www.ibmbigdatahub.com/blog/new-study-details-how-real-world-enterprises-are-using-big-data
Insights into Big Data
- The faster you analyze the data, the greater its predictive value
- Maintain one copy of your data, not multiple copies
- Use more diverse data, not just more data
- Data has value beyond what you initially anticipate
- Plan for exponential growth
- Solve a real pain point
- Put humans and data together to get the most insight
- Big data is transforming business, just as IT did
http://www.forbes.com/sites/davefeinleib/2012/07/24/big-data-trends/
Recommendations for Big Data
- Educate business and IT leaders about the business benefits of sophisticated analytics
- Analyze, identify, and remove cultural roadblocks to data sharing
- Identify the talent pool and skill gaps for interdisciplinary roles such as data scientists and chief data officers
- Create an enterprise architecture and acquire the toolsets to manage and process data through its life cycle
- Start experiments with MapReduce clusters and business personalization themes
- Organize and consolidate data with minimum information loss
- Process in near real time, or the advantage is lost
- Dynamically keep looking for new patterns
Big Data imperatives
- Organize and consolidate data with minimum information loss
- Process in near real time, or the advantage is lost
- Dynamically keep looking for new patterns
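One common way to act on data in near real time is to keep a running aggregate over a sliding time window. The aggregate is updated as each event arrives, rather than computed in an overnight batch. The Python sketch below is a hypothetical illustration: the class name, the 60-second window, and the event timestamps are assumptions, and it is not a production stream processor.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen in the last `window_seconds` (hypothetical helper)."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self.timestamps = deque()  # event times currently in the window, oldest first

    def add(self, ts: float) -> int:
        """Record an event at time `ts` (assumed non-decreasing); return the current count."""
        self.timestamps.append(ts)
        # Evict events that have fallen out of the window as of time `ts`.
        while self.timestamps and self.timestamps[0] <= ts - self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps)


counter = SlidingWindowCounter(window_seconds=60)
for t in [0, 10, 30, 65, 70, 200]:
    print(t, counter.add(t))
```

Each arrival costs only a few deque operations, so an up-to-date answer is available at any moment. That immediacy is what "process in near real time" asks for.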
Big Data growth drivers
Many new sources of data:
- People: Google searches, Facebook posts, Tweets, other social media, blogs, emails
- Machines: RFID, telematics, connected devices, mobile location, surveillance, etc.
- Metadata: web crawlers, bots
- High density: high-definition videos
Big Data economics
- Data is growing faster than Moore’s law, doubling every 12-18 months
- More data is generated in 1 second today than existed on the whole web in 1990
- McKinsey 2011 report on the Big Data opportunity: a large business opportunity, 160,000 new jobs
- Davenport books/articles, 2006-12: Competing on Analytics; ‘Data scientist’: the sexiest job of the future
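To see what "doubling every 12-18 months" implies, note that the growth factor after n years, with a doubling period of T years, is 2^(n/T). This Python sketch assumes a 10-year horizon. For comparison it includes Moore's law, which is commonly cited as a doubling roughly every two years.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplicative growth after `years` when volume doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


HORIZON_YEARS = 10  # assumed planning horizon
periods = {
    "data, doubling every 12 months": 1.0,
    "data, doubling every 18 months": 1.5,
    "Moore's law, doubling about every 24 months": 2.0,
}
for label, period in periods.items():
    print(f"{label}: x{growth_factor(HORIZON_YEARS, period):,.0f} in {HORIZON_YEARS} years")
```

Over ten years this gives growth factors of about 1,024, 102, and 32. Even the slower end of the quoted data-growth range outpaces Moore's law by roughly a factor of three over the decade.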
Big Data technology
- Non-relational data structures: Hadoop and the open source stack; Google BigFile, Semantic Web, etc.
- Massively parallel computing: MapReduce algorithms
- Unstructured Information Management Architecture (UIMA): the ‘sauce’ behind IBM’s Watson system; natural language processing
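The MapReduce pattern named on this slide can be sketched in plain Python with the classic word-count example. A map step emits (word, 1) pairs for each document, a shuffle step groups the pairs by key, and a reduce step sums each group. This single-process sketch only mimics the data flow. In Hadoop these phases run in parallel across a cluster, and the function names here are hypothetical, not Hadoop's API.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(doc: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one input document.
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all emitted values by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine the values for one key; for word count this is a simple sum.
    return key, sum(values)


documents = ["big data needs big clusters", "data flows fast"]
mapped = (pair for doc in documents for pair in map_phase(doc))
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Map calls are independent per document, and reduce calls are independent per key. That independence is what lets a framework spread both phases over many commodity machines.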
Lots of Data Scientist Jobs: WordCloud
http://blogs.gartner.com/doug-laney/
Big Data perspectives
- Economist magazine: http://www.economist.com/blogs/dailychart/2011/11/big-data-0
- Churchill Club seminar: http://www.youtube.com/watch?v=KD_g6byn83s
- BBC Horizon on Big Data: https://www.youtube.com/watch?v=tIxxEur07Q
Write down 2-4 main points.