
Data Mining: Scratching the surface
And now you have…
• Datameer
• RapidMiner
• Windows Azure Marketplace
by Prateek Burman

Datameer
• Integrate, Analyze, Visualize
• Targeted at Hadoop users
• Around since 2009
• Scalable
• Secured access
• Excel-like interface

Datameer cont’d.

Integration
• Oracle, DB2, MS SQL, MySQL
• Teradata, Greenplum
• XML, JSON, CSV
• HBase, Cassandra
• Twitter, Facebook, LinkedIn
• Email
• Log files
• SaaS – CRM, GitHub, JIRA

Analytics
• Time series analysis
• Clustering
• Decision trees
• Built-in recommendation engine
• Column dependencies
• Predictive analysis with R, PMML

Datameer cont’d.

Visualization
• Graphs
• Shapes
• Dashboard
• Maps
• Tables
• HTML5
• Visualization apps from apps market

RapidMiner – Yet Another Learning Environment (YALE)
• Around since 2001
• Open source (older versions)
• Client/server model, with the server offered as SaaS
• Most popular for data analytics
• GUI based – no need to write code

• Predictive analysis
• Text mining
• Sentiment analysis
• Direct marketing
• Predictive maintenance

RapidMiner cont’d…
• LabVIEW-type layout
• No coding – min. likelihood of error
• One operator's output is another operator’s input
• Only structured datasets
• 3D graphics & interactive dashboards

Windows Azure Marketplace
• Launched in 2010
• Hundreds of apps
• Thousands of subscriptions
• Trillions of data points
• Scalable – load balance
• No need to move data

Data
• GitHub/svn of data
• Point of discoverability
• Clean, ready-to-use data
• An economic model for broad access
• OData standard
• Excel, SQL Server, Office
• Deliver using RESTful web-service access

Marketplaces
• Infochimps
• Factual
• Datamarket
• Gnip
• Datasift
• Kasabi
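
The "OData standard" and "RESTful web-service access" points can be illustrated with a rough sketch of how an OData query URL is put together. The service root `https://example.invalid/data`, the entity set `Cities`, and the field names are hypothetical placeholders, not a real Marketplace feed; only the `$filter`, `$select`, and `$top` query options come from the OData standard itself.

```python
from urllib.parse import quote, urlencode


def build_odata_url(service_root, entity_set, *, filter_expr=None, select=None, top=None):
    """Build an OData query URL from a service root and standard query options.

    service_root and entity_set are illustrative; a real feed publishes its own.
    """
    options = {}
    if filter_expr:
        options["$filter"] = filter_expr          # row filter, e.g. "Year ge 2010"
    if select:
        options["$select"] = ",".join(select)     # columns to return
    if top is not None:
        options["$top"] = str(top)                # maximum number of rows

    url = f"{service_root.rstrip('/')}/{entity_set}"
    if options:
        # Percent-encode values, but keep "$", "," and "'" readable.
        url += "?" + urlencode(options, quote_via=quote, safe="$,'")
    return url


if __name__ == "__main__":
    # Hypothetical feed and fields, for illustration only.
    print(build_odata_url(
        "https://example.invalid/data",
        "Cities",
        filter_expr="Year ge 2010",
        select=["City", "Year"],
        top=5,
    ))
```

A plain HTTP GET on such a URL is all a client needs, which is why tools like Excel can consume the same feed without moving the data first.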

R, RapidMiner
• Cutting edge
• Algorithms
• Learning curve
• Need to import data
• Slow

Datameer
• Point & click
• Excel-like interface
• Extensible to R, Python etc.
• Need to import data
• Supports many Hadoop distributions
• Optimized for Hadoop
• Business infographics & dashboards
• HTML5 – view anywhere
• Intuitive
• Can execute R scripts
• Can be extended using Java or Ruby scripts
• Pretty graphics

Azure Marketplace
• Need to import data
• Known tools like Cron scheduler, Excel
• Data readily available
• Cleaner data
• Other Windows services