MIS 2502 Data and Analytics The Information Architecture

MIS 2502: Data and Analytics
The Information Architecture of an Organization
JaeHwuen Jung
jaejung@temple.edu
http://community.mis.temple.edu/jaejung

Two types of data
Transactional
• Captures data to support operations
• Data describing an event
• Real-time
Analytical
• Captures data to support analysis and reporting
• An aggregated view of the business
• Historical

Components of an information infrastructure
Transactional Database
• Captures data describing an event
• Supports management of an organization’s data
• For everyday transactions
• This is what is commonly thought of as “database management”
Analytical Data Store
• Extracted from transactional data
• Supports managerial decision-making
• Used in analysis and reporting
• This is the foundation for “advanced data analytics”

The Information Architecture of an Organization
Data entry → Transactional Database → Data extraction → Analytical Data Store → Data analysis
• Transactional Database: stores real-time transactional data in a relational or NoSQL database
• Analytical Data Store: stores historical transactional and summary data

The Transactional Database
• Definition of Transaction
  – In business, a transaction is the exchange of information, goods, or services.
  – For databases, a transaction is an action performed in a database management system.
  – Transactional databases deal with both: they store information about business transactions using database transactions.
• Examples of transactions
  – Purchase a product
  – Enroll in a course
  – Hire an employee
• Data is in real-time
  – Reflects current state
  – How things are “now”
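
To make the two meanings of "transaction" concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It stores one business transaction (enrolling in a course) as one database transaction. The table name, column names, and the `enroll` helper are hypothetical and are not taken from the slides.

```python
import sqlite3

# Hypothetical schema for illustration only; the names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollment ("
    "student_id INTEGER, class_id INTEGER, "
    "UNIQUE (student_id, class_id))"
)

def enroll(conn, student_id, class_id):
    """Record one business transaction (an enrollment) as one database transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO enrollment VALUES (?, ?)",
                (student_id, class_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate enrollment: nothing was written

first = enroll(conn, 1, 1234)
second = enroll(conn, 1, 1234)  # the same enrollment again is rejected
rows = conn.execute("SELECT student_id, class_id FROM enrollment").fetchall()
print(first, second, rows)
```

After both calls the table holds a single row, so it reflects the current state of enrollments, which is how things are "now".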

Relational Database (RDBMS)
• The Relational Paradigm:
  – How transactional data is collected and stored
• Primary Goal: Minimize redundancy (normalization)
  – Reduce errors
  – Less space required
  Which of these do you think is more important today?
• Most database management systems are based on the relational paradigm
  – MySQL, Oracle, Microsoft Access, SQL Server

Relational Database
Student-Class enrollment Example
This is good because:
• Data is entered and stored once
• Minimizes redundancy

The Relational Database
Student-Class enrollment Example
• A series of tables with logical associations between them
• The associations (relationships) allow the data to be combined
Tables in the diagram:
  – Student: StudentID, Name, Major, GPA
  – Class: ClassID, Name
  – Student-Class: StudentID, ClassID
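
A short sketch of how the associations let the tables be combined, using Python's standard-library sqlite3 module. The three tables follow the Student, Class, and Student-Class layout of the example. The snake_case column names and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY, name TEXT, major TEXT, gpa REAL
);
CREATE TABLE class (
    class_id INTEGER PRIMARY KEY, name TEXT
);
-- The association table holds only keys, so student and class
-- details are each stored exactly once.
CREATE TABLE student_class (
    student_id INTEGER REFERENCES student(student_id),
    class_id   INTEGER REFERENCES class(class_id)
);
-- Sample rows are made up for this sketch.
INSERT INTO student VALUES (1, 'Wells', 'MIS', 3.0), (2, 'Kendall', 'FIN', 3.5);
INSERT INTO class VALUES (1234, 'MIS 2101'), (1235, 'MIS 2502');
INSERT INTO student_class VALUES (1, 1234), (2, 1234), (2, 1235);
""")

# Combine the three tables through their shared keys (a join).
roster = conn.execute("""
    SELECT s.name, c.name
    FROM student AS s
    JOIN student_class AS sc ON sc.student_id = s.student_id
    JOIN class AS c ON c.class_id = sc.class_id
    ORDER BY s.name, c.name
""").fetchall()
print(roster)
```

Renaming a class means updating one row in `class`. Every enrollment then shows the new name, which is the practical payoff of minimizing redundancy.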

NoSQL Database
• Stands for “Not only SQL”
  – Supports semi-structured and unstructured data
• Primary Goal: flexibility and scalability
  – Schema-less and nested data
  – Requires less management?
• Better fit for companies dealing with big data and real-time web applications
  – Facebook, Airbnb, Netflix, LinkedIn, …

NoSQL Database
Schema-less and embedded documents
Example document fields (in the order extracted from the diagram): LastName: “WELLS”, GPA: 3.0, LastName: “NORBERT”, LastName: “KENDALL”, Major: “MIS”, Major: “FIN”, GPA: 3.5
Class: {ClassID: 1234, ClassName: “MIS 2101”}
Class: [{ClassID: 1235, ClassName: “MIS 2502”, …} …]
Class: [{ClassID: 1234, ClassName: “MIS 2101”} …]
This is good because:
• More flexible – easily insert/delete data
• Faster – requires less merging (join)
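
The schema-less, embedded-document idea can be sketched with plain Python dictionaries, with no particular NoSQL product implied. The field values reuse those in the example. Which fields belong to which student is an assumption, because the diagram does not survive in the text. The `class_names` helper is hypothetical.

```python
import json

# Each document carries its own set of fields (schema-less), and class
# details are embedded in the student document instead of being joined in.
# The pairing of fields to students below is assumed for illustration.
students = [
    {"LastName": "WELLS", "GPA": 3.0,
     "Class": {"ClassID": 1234, "ClassName": "MIS 2101"}},
    {"LastName": "NORBERT", "Major": "MIS",
     "Class": [{"ClassID": 1235, "ClassName": "MIS 2502"}]},
    {"LastName": "KENDALL", "Major": "FIN", "GPA": 3.5,
     "Class": [{"ClassID": 1234, "ClassName": "MIS 2101"}]},
]

def class_names(doc):
    """Return embedded class names, whether stored as one object or a list."""
    classes = doc.get("Class", [])
    if isinstance(classes, dict):
        classes = [classes]
    return [c["ClassName"] for c in classes]

# No join is needed: each document already contains its classes.
taking_2101 = [d["LastName"] for d in students if "MIS 2101" in class_names(d)]
print(taking_2101)
print(json.dumps(students[0], indent=2))
```

The trade-off is that "MIS 2101" is now stored in more than one document, so the flexibility comes at the cost of the redundancy the relational model avoids.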

From structured to unstructured data

The Analytical Data Store
• Stores historical and summarized data
  – “Historical” means we keep everything
• Data is extracted from the transactional database and reformatted for the analytical data store
Transactional Database → Extract (query) → Transform (data conversion) → Load (query) → Analytical Data Store
We’ll discuss this in much more detail later in the course!
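
The extract, transform, and load steps can be sketched end to end with two in-memory SQLite databases standing in for the transactional database and the analytical data store. The `sale` and `monthly_sales` tables and the sample rows are hypothetical. A real pipeline would use dedicated tooling, so treat this only as an illustration of the flow.

```python
import sqlite3

# Transactional side: one row per event (hypothetical schema and data).
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE sale (sale_date TEXT, product TEXT, amount REAL)")
oltp.executemany("INSERT INTO sale VALUES (?, ?, ?)", [
    ("2024-01-05", "book", 20.0),
    ("2024-01-20", "book", 30.0),
    ("2024-02-02", "pen", 5.0),
])

# Extract: query the transactional database.
rows = oltp.execute("SELECT sale_date, product, amount FROM sale").fetchall()

# Transform: convert each date to its month and total the amounts.
totals = {}
for sale_date, product, amount in rows:
    key = (sale_date[:7], product)  # 'YYYY-MM'
    totals[key] = totals.get(key, 0.0) + amount

# Load: write the summarized rows into the analytical data store.
olap = sqlite3.connect(":memory:")
olap.execute("CREATE TABLE monthly_sales (month TEXT, product TEXT, total REAL)")
olap.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?, ?)",
    [(month, product, total) for (month, product), total in sorted(totals.items())],
)
olap.commit()

summary = olap.execute(
    "SELECT month, product, total FROM monthly_sales ORDER BY month, product"
).fetchall()
print(summary)
```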

Three Advanced Data Analytic Techniques
• Decision Trees
  – Used to classify data according to a pre-defined outcome
• Clustering
  – Used to determine distinct groups of data
• Association Rules
  – Used to find out which events predict the occurrence of other events
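
As a small taste of the third technique, the sketch below computes the two standard association-rule measures, support and confidence, for the rule "bread predicts milk". The shopping baskets are made-up data, and the function names are chosen for this example only.

```python
# Each basket is one event: the set of items bought together (made-up data).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

s = support({"bread", "milk"})       # 2 of 4 baskets
c = confidence({"bread"}, {"milk"})  # 2 of the 3 bread baskets
print(f"support={s:.2f} confidence={c:.2f}")
```

A rule is usually considered interesting only when both measures clear thresholds chosen by the analyst.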

The agenda for the course
