Begin with the Data Sources and Sizing

Using this Presentation Template

This presentation is designed to provide an adaptable format for presenting information about Big Data projects to your organization. On its own, it provides instruction on the basic concepts of Big Data and how they are related. We encourage you to use the format and expand each section to support the individual needs of your organization.

Understand the Data

Ask the following questions about your organization:
- What data does the organization use?
- Where does the data come from?
- How much data is there?
- How much data is actually used?
- How much data is discarded without analysis?
- How much data is duplicated?
- How much data are you liable for?
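
The sizing questions above can be recorded per data source and rolled up into organization-wide shares. The following is a minimal sketch, not a prescribed method: the class name `SourceSizing`, its field names, and the choice of gigabytes as the unit are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SourceSizing:
    """Answers to the sizing questions for one data source.

    All names and units here are hypothetical, chosen for illustration.
    """
    name: str             # what data the organization uses
    origin: str           # where the data comes from
    total_gb: float       # how much data there is
    used_gb: float        # how much is actually used
    discarded_gb: float   # how much is discarded without analysis
    duplicated_gb: float  # how much is duplicated


def summarize(sources: List[SourceSizing]) -> Dict[str, float]:
    """Return the total volume and the used, discarded, and duplicated shares."""
    total = sum(s.total_gb for s in sources)
    if total == 0:
        # Avoid dividing by zero when nothing has been inventoried yet.
        return {"total_gb": 0.0, "used": 0.0, "discarded": 0.0, "duplicated": 0.0}
    return {
        "total_gb": total,
        "used": sum(s.used_gb for s in sources) / total,
        "discarded": sum(s.discarded_gb for s in sources) / total,
        "duplicated": sum(s.duplicated_gb for s in sources) / total,
    }
```

Calling `summarize` on the inventoried sources gives a single view of how much of the total volume is used, discarded, or duplicated. Liability is left out of the sketch because it usually depends on legal and regulatory judgment rather than on volume alone.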

The Human Analogy

What if data processing by an organization were similar to data processing by a human being?

Defining the Data

Data is often defined in terms of:
- Structured/Unstructured
- Permanent/Temporary
- Text/Rich-Text or Non-Text (Images, Video, Audio)

Defining the Data (Part II)

From the Federal Enterprise Architecture Data Reference Model, define all usable data in the following format:
- Data Description
- Data Context
- Data Sharing
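
The three-part format above can be captured as one record per data asset. This is only a sketch under stated assumptions: the class `DataAssetRecord`, its field types, and the `missing_areas` helper are hypothetical and are not defined by the Data Reference Model itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataAssetRecord:
    """One data asset described along the three areas named in the slide.

    Field names and types are illustrative assumptions.
    """
    name: str
    description: str = ""                              # Data Description
    context: List[str] = field(default_factory=list)   # Data Context
    sharing: List[str] = field(default_factory=list)   # Data Sharing

    def missing_areas(self) -> List[str]:
        """List the areas that have not been filled in yet."""
        missing = []
        if not self.description.strip():
            missing.append("Data Description")
        if not self.context:
            missing.append("Data Context")
        if not self.sharing:
            missing.append("Data Sharing")
        return missing
```

A record whose `missing_areas()` result is empty has been defined in all three areas. Anything else flags the part of the definition that still needs work.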

Sources of Data

Where does the data come from?

Internal sources:
- Documentation
- Databases
- Operational logs
- Transactions

External sources:
- Research Studies
- Stock Markets
- Regulatory Agencies
- Customer Feedback

[Diagram label: METADATA]

Data Usage

[Diagram: DATA, with the labels Big Data Usage and Traditional Data Usage]

Data Duplication

Why is data duplicated?
- Support Backup and Recovery
- Regulatory Requirements
- Improve Performance
- Employee Preferences
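
One possible way to estimate how much data is duplicated is to group items by a hash of their content. The slides do not prescribe this technique; it is an assumption introduced here for illustration, as are the function names `find_duplicates` and `duplicated_bytes`. The sketch works on in-memory byte strings keyed by a name and uses SHA-256 from the Python standard library.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def find_duplicates(items: Dict[str, bytes]) -> List[List[str]]:
    """Group item names whose content is byte-for-byte identical."""
    groups = defaultdict(list)
    for name, content in items.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(name)
    # Keep only the groups that hold more than one copy.
    return [sorted(names) for names in groups.values() if len(names) > 1]


def duplicated_bytes(items: Dict[str, bytes]) -> int:
    """Count the bytes held in copies beyond the first of each distinct content."""
    seen = set()
    extra = 0
    for content in items.values():
        digest = hashlib.sha256(content).digest()
        if digest in seen:
            extra += len(content)
        else:
            seen.add(digest)
    return extra
```

This only detects exact copies. Data duplicated for the reasons listed above, such as backups or performance copies, may be stored in a transformed or compressed form that a content hash will not match.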

Liability

[Diagram, inspired by the IDC 2010 study: Consumer Generated Data, Enterprise Liability]

Moving Forward

Understanding the characteristics of data within your organization provides a clear definition of the scope and context for any data-related solution. The next area of concern is storing the data, particularly planning for exponential increases in data over time. This topic is explored more fully in the presentation Data Storage, which also introduces the emerging technologies related to virtualization and cloud computing.
- Slides: 11