Where to with NoSQL
Patrick Thompson
Patrick.thompson@channelintelligence.com – SVP of new stuff
Actually mostly Hadoop

Outline
● High-level stuff
● Lower level stuff
● Specifics
● Other stuff

What Goes Where
[Chart: Relative Sizes, scale 0 to 90; labels: Events, Requests, UI]
● Edge Data Warehouse: high volume in, low volume out; unpredictable content; scan, scrub & grep
● Edge Data Mart: simple key lookup; no joins; no scans
● Analytic DM: cubes & spreadsheets
● Analytic DW: joins, inequalities, complex fixed structure
True for CI – your experience may vary

Origins
Functional programming: map & fold, e.g. Python’s map, filter & reduce
Google paper, 2004
Nutch (crawler & indexer), 2002-2004
DFS & MapReduce added to Nutch
Apache OSS, 2006-2008
Qizmt, Sept 2009 (C# Hadoop out of MySpace)
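As a reminder of the functional roots mentioned on this slide, here is a minimal sketch of Python's `filter`, `map` and `reduce` (a left fold) chained together. The `orders` data and the price threshold are made up for illustration.

```python
from functools import reduce

# Hypothetical sample data: (product, price) pairs.
orders = [("book", 12.0), ("pen", 1.5), ("laptop", 900.0), ("mug", 8.0)]

# filter: keep only the records we care about.
cheap = filter(lambda order: order[1] < 100, orders)

# map: transform each record independently of the others.
prices = map(lambda order: order[1], cheap)

# reduce (a left fold): combine all values into a single result.
total = reduce(lambda acc, price: acc + price, prices, 0.0)

print(total)
```

Because `map` and `filter` touch each record independently, they are the parts that can be spread across machines; the fold is where records have to be brought together.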

Map-Reduce
[Diagram: Input (input 0, input 1, input 2, input 3, … input n) -> Map Phase -> Intermediate files on local disks -> Sort -> Reduce Phase -> Output (output 0, output 1)]

Map-Reduce
Input: records
Map Phase: provides 0 or more records, each with a key
Intermediate files on local disks
Sort: records sorted by key
Reduce Phase: key + all records with that key in; provides 1 or more records
Output
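The three phases can be imitated in a single process. The sketch below is a word count, not Hadoop code: `map_phase`, `reduce_phase` and `run_job` are hypothetical names, and an in-memory list stands in for the intermediate files on local disks.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(line):
    """Emit zero or more (key, value) records for one input record."""
    for word in line.lower().split():
        yield (word, 1)

def reduce_phase(key, values):
    """Receive a key plus all values for that key; emit output records."""
    yield (key, sum(values))

def run_job(lines):
    # Map: every input record is processed independently.
    intermediate = [record for line in lines for record in map_phase(line)]
    # Sort: bring records with the same key next to each other.
    intermediate.sort(key=itemgetter(0))
    # Reduce: called once per key with all of that key's records.
    output = []
    for key, group in groupby(intermediate, key=itemgetter(0)):
        output.extend(reduce_phase(key, (value for _, value in group)))
    return output

result = run_job(["the cat sat", "the dog sat", ""])
print(result)
```

Note that the empty input line produces no map output at all, which is the "0 or more records" case.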

Terminology
● DFS/HDFS
  ● Distributed file system
● Cluster
  ● A group of machines with a common DFS
● Function
  ● Map: filters and defines keys. Can be paired with a reduce
  ● Reduce: computation over a set of records with a common key
● Job step
  ● Map-reduce
● Job
  ● Sequence of job steps
● DFS File
  ● A file in the DFS, potentially distributed across the DFS cluster

What does it look like?

SQL vs Hadoop
Performance

Demo – sort of
http://www.bpib.com/illustrat/whrobin9.gif

SQL vs Hadoop
Design, Develop, Test and Deploy
SQL:
• Normal forms, dependency graphs
• SSMS + SQL DDL, SQL, TSQL
• TestDB
• Script deployment
Hadoop:
• Filter, sort, merge, filter, aggregate
• Eclipse, Hadoop, Linux + Java
• Try it and see
• Stop testing
Some places are very sophisticated – unless you’re a hard-core Unix shop it will be pretty rough to start with

SQL vs Hadoop
Backup and Recovery
SQL:
• Backup + Logs
• Restore
Hadoop:
• Reprocess and/or another cluster – Reprocess from when?
If you accept two inconsistent statements any conclusion follows

SQL vs Hadoop
Transactions
• SQL: It’s all about consistency
• Hadoop: It’s all about throughput and asynchronous and ftp (or whatever) access
Don’t allow anyone to read a partial file and you’re in good shape
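One common way to keep readers away from partial files is to write under a temporary name and rename into place only once the file is complete. The sketch below does this on a local filesystem with the standard library; it is not HDFS code, `publish_file` is a hypothetical helper, and it assumes readers ignore names ending in `.tmp`.

```python
import os
import tempfile

def publish_file(final_path, lines):
    """Write to a temporary name, then rename into place in one step."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # Keep the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            for line in lines:
                handle.write(line + "\n")
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace is atomic on POSIX when source and target share a filesystem.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Never leave a half-written file behind for a reader to pick up.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A reader that lists the directory sees either no file or the whole file, never a half-written one.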

SQL vs Hadoop
Stream processing
• SQL: Probably not
• Hadoop: Definitely not
I want to know when Fred carts something and doesn’t buy within 5 minutes
• Esper (http://esper.codehaus.org/)
• SQL Server’s StreamInsight
• Twitter’s Storm
• StreamBase: Stonebraker’s latest toy; if you have to ask the price, you probably can’t afford it
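To make the "Fred carts something and doesn't buy within 5 minutes" question concrete, here is a minimal single-process sketch. It is not how Esper, StreamInsight, Storm or StreamBase express it; the `(timestamp_seconds, user, action)` event layout and the `abandoned_carts` name are assumptions, and the clock only advances when a new event arrives.

```python
WINDOW_SECONDS = 5 * 60

def abandoned_carts(events, window=WINDOW_SECONDS):
    """Yield (user, cart_time) for carts not followed by a buy within the window.

    `events` is an iterable of (timestamp_seconds, user, action) tuples in
    timestamp order; this layout is an assumption made for the sketch.
    """
    pending = {}  # user -> time of the earliest cart still waiting for a buy
    for now, user, action in events:
        # Event time drives the clock: expire every cart whose window has passed.
        for waiting_user, carted_at in list(pending.items()):
            if now - carted_at > window:
                del pending[waiting_user]
                yield (waiting_user, carted_at)
        if action == "cart":
            pending.setdefault(user, now)
        elif action == "buy":
            pending.pop(user, None)
    # Carts still pending at end of input have not timed out yet, so no alert.

events = [
    (0, "fred", "cart"),
    (60, "wilma", "cart"),
    (120, "wilma", "buy"),
    (400, "barney", "cart"),
    (500, "barney", "buy"),
]
alerts = list(abandoned_carts(events))
print(alerts)
```

The answer depends on state carried between events and on the passage of time, which is why a batch job over files is a poor fit for this kind of question.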

Hive versus SQL
• SQL: SQL & tables – maybe even a few views
• Hive: Truncated SQL & columns that might correspond to the contents of a file (or might not)
TrSQL perhaps

Hadoop vs SQL
Drilling Down
• SQL: AKA joins…
• Hadoop: Run-a-job or maybe TrSQL over Hive
Which it turns out is also run-a-job
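A join as "run-a-job" might look roughly like the reduce-side join sketched below: both inputs are mapped to the join key with a source tag, and the reducer pairs them up. Everything here is illustrative; the record layouts and function names are assumptions, and a dictionary stands in for the sort between the phases.

```python
from collections import defaultdict

def map_orders(order):
    # order: (order_id, customer_id, amount); key on the join column.
    order_id, customer_id, amount = order
    yield (customer_id, ("order", (order_id, amount)))

def map_customers(customer):
    # customer: (customer_id, name)
    customer_id, name = customer
    yield (customer_id, ("customer", name))

def reduce_join(key, tagged_values):
    # All records sharing a key arrive together; split them by source tag.
    names = [v for tag, v in tagged_values if tag == "customer"]
    orders = [v for tag, v in tagged_values if tag == "order"]
    for name in names:
        for order_id, amount in orders:
            yield (key, name, order_id, amount)

def run_join(orders, customers):
    groups = defaultdict(list)  # stands in for the sort between the phases
    for order in orders:
        for key, value in map_orders(order):
            groups[key].append(value)
    for customer in customers:
        for key, value in map_customers(customer):
            groups[key].append(value)
    result = []
    for key in sorted(groups):
        result.extend(reduce_join(key, groups[key]))
    return result

customers = [(1, "Fred"), (2, "Wilma")]
orders = [(100, 1, 9.5), (101, 2, 20.0), (102, 1, 3.0), (103, 3, 1.0)]
joined = run_join(orders, customers)
print(joined)
```

Order 103 has no matching customer and drops out, so this behaves like an inner join.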

Sharded SQL versus HDFS
• Sharded SQL: Joins? Indexes? Connectivity? Transactions? Other ugly stuff
• HDFS: Replicate & repeat
There’s a reason why there’s no “D” in “SQL”

SSIS Versus Hadoop
• ETL++ or maybe ELT
SQL Server connector for Hadoop (2008 only…)
http://www.microsoft.com/download/en/details.aspx?id=27194

SSAS Versus …
• OLAP, MDX etc
• …
Something other than Hadoop

SQL versus Hadoop – Lookup Data
• SQL: OK
• Hadoop: Really not OK – Cluster lookups
Other things come to mind

SQL versus Hadoop – Unstructured Data
SQL:
• First Normal Form
• However SQL does blobs, XML & full-text indexing
Hadoop:
• No form to speak of
Without structure – what’re you gonna do with it? Grep perhaps?
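"Grep perhaps?" maps naturally onto a map-only step: each record is either emitted or dropped, and no reduce is needed. A minimal in-process sketch, with hypothetical log lines and function names:

```python
import re

def grep_map(record, pattern):
    """Map-only step: emit the record if it matches, otherwise emit nothing."""
    if pattern.search(record):
        yield record

def grep_job(records, regex):
    pattern = re.compile(regex)
    return [hit for record in records for hit in grep_map(record, pattern)]

# Hypothetical unstructured log lines.
log_lines = [
    "2011-10-01 GET /cart?user=fred",
    "random noise with no structure",
    "2011-10-01 GET /buy?user=wilma",
]
hits = grep_job(log_lines, r"user=(\w+)")
print(hits)
```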

SQL versus Hadoop – Images & Blobs
• SQL: Blobs in SQL don’t make much sense
• Hadoop: Some forms of esoteric image processing – commercially not very interesting
You can use SQL Server to access NTFS if you really want to

SQL vs Hadoop
Support
SQL:
• 1.3 million SQL Server professionals (some of whom have been at it for over 30 years)
• Very active community
Hadoop:
• Cloudera’s not bad

Other Stuff
Parallel C#
See http://www.parallelcsharp.com/

Other Stuff
Dryad
From http://research.microsoft.com/en-us/projects/dryad/

Other Stuff
DryadLINQ
From http://research.microsoft.com/en-us/projects/dryadlinq/default.aspx

Questions?