Where to with NoSQL, Patrick Thompson
- Slides: 27
Where to with NoSQL Patrick Thompson patrick.thompson@channelintelligence.com – SVP of new stuff. Actually mostly Hadoop
Outline ● High-level stuff ● Lower level stuff ● Specifics ● Other Stuff
What Goes Where [Chart: “Relative Sizes”, vertical axis 0 to 90; labels: Events, Requests, UI, Edge, Data Mart, DW (Data Warehouse), Analytic DM] ● High volume in, low volume out; unpredictable content; scan, scrub & grep ● Simple key lookup; no joins; no scans ● Cubes & spreadsheets ● Analytic joins, inequalities, complex fixed structure ● True for CI – your experience may vary
Origins ● Functional programming: map & fold, e.g. Python’s map, filter & reduce ● Google paper 2004 ● Nutch (crawler & indexer) 2002-2004 ● DFS & MapReduce added to Nutch ● Apache OSS 2006-2008 ● Qizmt Sept 2009 (C# Hadoop out of MySpace)
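The functional roots mentioned on this slide can be seen in a few lines of Python. This is only an illustrative sketch: the `amounts` list is made-up sample data, not something from the talk.

```python
from functools import reduce

# Hypothetical sample data (not from the slides): a handful of order amounts.
amounts = [12, 7, 30, 5, 18]

# map: apply a function to every element.
doubled = list(map(lambda x: x * 2, amounts))

# filter: keep only the elements that satisfy a predicate.
large = list(filter(lambda x: x >= 10, amounts))

# reduce (a fold): combine all elements into a single value, starting from 0.
total = reduce(lambda acc, x: acc + x, amounts, 0)

print(doubled, large, total)
```

Map-reduce borrows the same shape: a map-like step that works on one record at a time and a fold-like step that combines records.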
Map-Reduce [Diagram: Input (input 0, input 1, input 2, input 3, … input n) → Map Phase → intermediate files on local disks → Sort → Reduce Phase → Output (output 0, output 1)]
Map-Reduce ● Input: provides 1 or more records ● Map Phase: provides 0 or more records, each with a key ● Intermediate files on local disks ● Sort: records sorted by key ● Reduce Phase: key + all records with that key ● Output: records
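The map, sort and reduce flow on this slide can be imitated in memory. The sketch below is not Hadoop code; it assumes everything fits in one Python process, and the word-count functions (`word_mapper`, `count_reducer`) and the sample `lines` are hypothetical names chosen for illustration.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records, mapper):
    """Run the mapper over each input record; every call may yield 0 or more (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def reduce_phase(pairs, reducer):
    """Sort the intermediate pairs by key, then pass each key with all of its values to the reducer."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield from reducer(key, [value for _, value in group])

# Hypothetical example functions: a word count.
def word_mapper(line):
    for word in line.lower().split():
        yield word, 1

def count_reducer(word, counts):
    yield word, sum(counts)

lines = ["the quick fox", "the lazy dog", ""]  # the empty line maps to 0 records
result = list(reduce_phase(map_phase(lines, word_mapper), count_reducer))
print(result)
```

On a real cluster the intermediate pairs would be spilled to local disks and shuffled between machines; here `sorted` stands in for all of that.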
Terminology ● DFS/HDFS: distributed file system ● Cluster: a group of machines with a common DFS ● Function: Map filters and defines keys, and can be paired with a reduce; Reduce is a computation over a set of records with a common key ● Job step: a map-reduce ● Job: a sequence of job steps ● DFS File: a file in the DFS, potentially distributed across the DFS cluster
What does it look like?
SQL vs. Hadoop: Performance
Demo – sort of http://www.bpib.com/illustrat/whrobin9.gif
SQL vs. Hadoop: Design, Develop, Test and Deploy • SQL: design with normal forms and dependency graphs; develop with SSMS + SQL DDL, SQL, T-SQL; test on TestDB; deploy by script • Hadoop: design as filter, sort, merge, filter, aggregate; develop with Eclipse, Hadoop, Linux + Java; test by “try it and see”; deploy by “stop testing” • Some places are very sophisticated – unless you’re a hard-core Unix shop it will be pretty rough to start with
SQL vs. Hadoop: Backup and Recovery • SQL: backup + logs; restore • Hadoop: reprocess and/or another cluster – reprocess from when? • If you accept two inconsistent statements, any conclusion follows
SQL vs. Hadoop: Transactions • SQL: it’s all about consistency • Hadoop: it’s all about throughput and asynchronous and FTP (or whatever) access • Don’t allow anyone to read a partial file and you’re in good shape
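One common way to keep readers away from partial files is to write under a temporary name and rename only when the file is complete. The sketch below shows that pattern on a local filesystem, which stands in for the DFS here; `publish_atomically` and the file name `part-00000` are hypothetical, and readers are assumed to ignore `*.tmp` files.

```python
import os
import tempfile

def publish_atomically(path, data):
    """Write data to a temporary file next to path, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before publishing
        # The rename is atomic on POSIX when both names are on the same filesystem,
        # so a reader sees either no file or the complete file.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

# Usage: publish one file into a scratch directory and read it back.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "part-00000")  # hypothetical output name
    publish_atomically(target, b"complete file\n")
    with open(target, "rb") as f:
        content = f.read()
    leftovers = [name for name in os.listdir(workdir) if name.endswith(".tmp")]
```

The design choice is that consistency comes from the rename, not from locks: writers never touch the published name until the content is final.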
SQL vs. Hadoop: Stream Processing • SQL: probably not • Hadoop: definitely not • Example: I want to know when Fred carts something and doesn’t buy within 5 minutes • Esper (http://esper.codehaus.org/) • SQL Server’s StreamInsight • Twitter’s Storm • StreamBase (Stonebraker’s latest toy; if you have to ask the price, you probably can’t afford it)
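To make the “carts something and doesn’t buy within 5 minutes” question concrete, here is a rough sketch of the rule in plain Python. It is not how Esper, StreamInsight, Storm or StreamBase express it; the event layout `(timestamp_seconds, user, action)` and the function name `abandoned_carts` are assumptions for illustration. A real stream engine would fire on a timer, whereas this version only notices a timeout when a later event arrives or at the end of the input.

```python
def abandoned_carts(events, window_seconds=300):
    """Return (user, cart_time) pairs where no buy by that user followed within the window.

    events: iterable of (timestamp_seconds, user, action), action being "cart" or "buy".
    """
    ordered = sorted(events, key=lambda e: e[0])
    pending = {}    # user -> timestamp of the cart event still waiting for a buy
    abandoned = []
    for ts, user, action in ordered:
        cart_ts = pending.get(user)
        # An older cart for this user has timed out: report it and forget it.
        if cart_ts is not None and ts - cart_ts > window_seconds:
            abandoned.append((user, cart_ts))
            del pending[user]
            cart_ts = None
        if action == "cart" and cart_ts is None:
            pending[user] = ts
        elif action == "buy" and cart_ts is not None:
            del pending[user]  # bought in time
    # Anything still pending that is older than the window at the last event also counts.
    if ordered:
        end = ordered[-1][0]
        for user, cart_ts in pending.items():
            if end - cart_ts > window_seconds:
                abandoned.append((user, cart_ts))
    return abandoned

# Hypothetical events: Fred never buys, Wilma buys after 1 minute, Barney buys after 10.
events = [
    (0, "fred", "cart"),
    (60, "wilma", "cart"),
    (120, "wilma", "buy"),
    (400, "barney", "cart"),
    (1000, "barney", "buy"),
]
late = sorted(abandoned_carts(events))
print(late)
```

The per-user state and the dependence on arrival time are what make this awkward as either a SQL query or a batch job.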
Hive versus SQL • SQL: SQL & tables – maybe even a few views • Hive: TrSQL perhaps, that is, truncated SQL & columns that might correspond to the contents of a file (or might not)
Hadoop vs. SQL: Drilling Down • SQL: AKA joins… • Hadoop: run-a-job, or maybe TrSQL over Hive, which it turns out is also run-a-job
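Why a join turns into “run-a-job” is easier to see with a sketch of a reduce-side join, one common way to join in map-reduce. This is an in-memory imitation, not Hadoop or Hive code; `reduce_side_join` and the `customers` and `orders` sample records are hypothetical.

```python
from collections import defaultdict

def reduce_side_join(left, right, left_key, right_key):
    """Inner-join two lists of dicts the way a reduce-side join would: tag, group by key, pair up."""
    # Map phase: emit (join key, (source tag, record)) for every record of both inputs.
    tagged = [(rec[left_key], ("L", rec)) for rec in left]
    tagged += [(rec[right_key], ("R", rec)) for rec in right]

    # Shuffle/sort: bring all records that share a key together.
    groups = defaultdict(list)
    for key, tagged_record in tagged:
        groups[key].append(tagged_record)

    # Reduce phase: within each key, combine every left record with every right record.
    joined = []
    for key in sorted(groups):
        lefts = [rec for tag, rec in groups[key] if tag == "L"]
        rights = [rec for tag, rec in groups[key] if tag == "R"]
        for l_rec in lefts:
            for r_rec in rights:
                joined.append({**l_rec, **r_rec})
    return joined

# Hypothetical sample data.
customers = [{"cust_id": 1, "name": "Fred"}, {"cust_id": 2, "name": "Wilma"}]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 1},
    {"order_id": 12, "cust_id": 3},
]
rows = reduce_side_join(customers, orders, "cust_id", "cust_id")
print(rows)
```

Both inputs are read in full and shuffled by key, which is why even a small drill-down costs a whole job when there is no index to lean on.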
Sharded SQL versus HDFS • Sharded SQL: Joins? Indexes? Connectivity? Transactions? Other ugly stuff • HDFS: replicate & repeat • There’s a reason why there’s no “D” in “SQL”
SSIS versus Hadoop • ETL++ or maybe ELT • SQL Server connector for Hadoop (2008 only…): http://www.microsoft.com/download/en/details.aspx?id=27194
SSAS versus … • OLAP, MDX etc. • … Something other than Hadoop
SQL versus Hadoop – Lookup Data • SQL: OK • Hadoop: really not OK – cluster lookups • Other things come to mind
SQL versus Hadoop – Unstructured Data • SQL: First Normal Form; however, SQL does blobs, XML & full-text indexing • Hadoop: no form to speak of • Without structure – what’re you gonna do with it? Grep perhaps?
SQL versus Hadoop – Images & Blobs • SQL: blobs in SQL don’t make much sense • Hadoop: some forms of esoteric image processing – commercially not very interesting • You can use SQL Server to access NTFS if you really want to
SQL vs. Hadoop: Support • 1.3 million SQL Server professionals (some of whom have been at it for over 30 years) • Very active community • Cloudera’s not bad
Other Stuff: Parallel C#. See http://www.parallelcsharp.com/
Other Stuff: Dryad. From http://research.microsoft.com/en-us/projects/dryad/
Other Stuff: DryadLINQ. From http://research.microsoft.com/en-us/projects/dryadlinq/default.aspx
Questions?