
Marketing Analytics Technology: Statistical Analysis Software & R
R Basics
Stephan Sorger, www.stephansorger.com
Disclaimer:
• All images such as logos, photos, etc. used in this presentation are the property of their respective copyright owners and are used here for educational purposes only
• Some material adapted from: Sorger, “Marketing Analytics: Strategic Models and Metrics”
© Stephan Sorger 2015: www.stephansorger.com; Marketing Analytics; R Basics: 1


Statistical Analysis Software: Introduction
• Definition: Software designed for in-depth statistical analysis, unlike MS Excel (a general-purpose spreadsheet)
• Origins: SAS, conceived in 1966 by Anthony J. Barr, placed statistical procedures in a formatted file framework
• Uses: Advanced statistical techniques, such as nonlinear functions, multiple regression, and conjoint analysis
• Advantages: Powerful; accurate; specialized tools
• Disadvantages: Command line interface with a steep learning curve; very expensive


Statistical Analysis Software: Major Suppliers
Criteria compared: Market, Focus, User, Origins, Learning, Cost, UI, Database, Graphics, Analogy
• SAS: Fortune 500; Power user; Industry; Difficult; $86,600/yr+; Command Line; 32,768 var.; SAS/Graph; Microsoft
• SPSS: Universities; Ease of use; Student; Education; Moderate; $16,000/yr+; Point & Click; 1 file at a time; High quality; Apple
• R: Universities; Price-sensitive; Open Source; Moderate; Free; Command Line; Different packages; Linux
Sources:
• UCLA, Statistical Software Packages Comparison, ats.ucla.edu: http://www.ats.ucla.edu/stat/mult_pkg/compare_packages.htm
• MineQuest Business Analytics, “Cost of Licensing WPS 3.0 vs. SAS 9.3.” February 2013. http://www.minequest.com/downloads/Pricing_Comparisons_Between_WPS_and_SAS.pdf
• IBM SPSS Statistics website, “Buy IBM SPSS Statistics Now”: http://www-01.ibm.com/software/analytics/spss/products/statistics/buy-now.html


R: Introduction
• Description: Free statistical computing and graphics software package, widely used among statisticians and data miners; its popularity increased from 2010 on
• History: Started in 1993 as an implementation of the S programming language (1976); developed by Ross Ihaka and Robert Gentleman. The name “R” comes from Ross and Robert, and is also a play on “S”
• Functions: R includes many functions, and more can be added through packages
• Data: Can handle multiple data sets simultaneously, unlike Excel. Data types: scalars, vectors, matrices, data frames, and lists. Vectors can be numeric, character, or logical
• Commercial: Revolution Analytics offers an enterprise version ($)
References:
1. Venables, W. N., Smith, D. M., “An Introduction to R.” Version 3.0.1. May 16, 2013. http://www.cran.r-project.org/doc/manuals/R-intro.pdf
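The data types listed above can be tried directly at the R prompt. A minimal sketch; the object names and values are made up for illustration. Note that R has no separate scalar type: a single number is stored as a vector of length 1.

```r
# Illustrative objects for each data type (names and values are hypothetical)
s   <- 42                                    # "scalar": a numeric vector of length 1
num <- c(1.5, 2.5, 3.5)                      # numeric vector
chr <- c("north", "south")                   # character vector
lgl <- c(TRUE, FALSE, TRUE)                  # logical vector
m   <- matrix(1:6, nrow = 2)                 # 2 x 3 matrix
df  <- data.frame(region = chr,
                  sales  = c(100, 250))      # data frame: columns of equal length
lst <- list(name = "campaign", budget = 5000,
            channels = chr)                  # list: elements of mixed types

length(s)    # [1] 1
class(num)   # [1] "numeric"
class(chr)   # [1] "character"
class(lgl)   # [1] "logical"
dim(m)       # [1] 2 3
class(df)    # [1] "data.frame"
class(lst)   # [1] "list"
```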


R: Basics
• Commands: Command line in the UNIX tradition; case sensitive. Commands are separated by “;” or by a newline <CR>
• Comments: A hash sign (#) starts a comment; R ignores the rest of the line
• Prompt: > means the system is waiting for you to type something. The traditional version is not menu-driven, unlike consumer software
• Arithmetic: > 5+4 returns [1] 9, the sum of 5 + 4
• Assignment (<-): > x <- 3 assigns the number 3 to the object x; similar to the “=” sign
• Help: Two ways to get help, for example with the read.csv command: ?read.csv or help(read.csv)
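The basics above, gathered into one snippet that can be pasted at the > prompt. The extra objects y and X are hypothetical, added only to show the “;” separator and case sensitivity:

```r
5 + 4              # [1] 9
x <- 3; y <- 10    # two commands on one line, separated by ";"
x + y              # [1] 13

# R is case sensitive: X and x are different objects
X <- 100
x == X             # [1] FALSE

# Help opens the documentation viewer, so it is commented out here:
# ?read.csv
# help(read.csv)
```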


• Functions: R features a rich set of functions, for example:
  c(): combines values into a vector
  Statistics functions: mean(x); median(x); range(x); etc.
  Arithmetic: 4^2; log(10) (natural logarithm); sqrt(16)
• Vector: > x <- c(1, 2, 3) # assign a vector of numbers to the object x
• Matrix: > y <- matrix(c(1, 2, 3, 4, 5, 6), 2, 3) # create a 2 x 3 matrix
• Print: Ask R to print out the numbers inside an object, such as a vector:
  > print(x) # ask R to print out x
  > x # or just type the variable name and press Return
• Plot: Ask R to plot a dataset: > plot(data)
• Small subset: R is a large, complex language; this class covers only a small part of it
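The functions above, run together with the vector built by c(). Expected values are shown as comments; plot() is commented out because it opens a graphics window:

```r
x <- c(1, 2, 3)                           # c() combines values into a vector
mean(x)                                   # [1] 2
median(x)                                 # [1] 2
range(x)                                  # [1] 1 3  (minimum and maximum)

4^2                                       # [1] 16
sqrt(16)                                  # [1] 4
log(10)                                   # natural log: [1] 2.302585
log10(10)                                 # base-10 log: [1] 1

y <- matrix(c(1, 2, 3, 4, 5, 6), 2, 3)    # 2 rows, 3 columns, filled column by column
print(y)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# plot(x)                                 # would draw the three points
```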


R: Getting Started
• Download R: Windows: http://cran.r-project.org/bin/windows/base/ ; Mac: http://cran.r-project.org/bin/macosx/
• Launch R: Double-click to launch; the > prompt appears in the “R Console” window
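After launching R, two base R commands give a quick check of the setup. A small sketch; the exact version string depends on the release you installed:

```r
R.version.string   # the installed R version, as a string beginning "R version"
getwd()            # the current working directory, where R looks for files by default
```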


R Editor and R Console: Layout
• R Console: Traditional R interface; command line interface
• R Editor: An attempt at an easier-to-use user interface
• New Script: To open the Editor, select File > New Script. Arrange the Editor window on the left and the Console on the right
• Run Line: To execute (run) a line, highlight it in the R Editor and click “Run Line”
[Screenshot: RGui toolbar icons (Open Script, Save Script, Run Line, Return focus to Console, Print); “Untitled - R Editor” containing vector<-c(2, 4, 6, 8), and the R Console showing > vector<-c(2, 4, 6, 8)]
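Run Line sends one line from the R Editor to the Console. To run a whole saved script at once, base R also provides source(). A minimal sketch that writes a tiny hypothetical script to a temporary file and runs it with echo = TRUE, so each command and its result appear in the Console as if typed:

```r
# Write a two-line script to a temporary file (stands in for a saved R Editor script)
script <- tempfile(fileext = ".R")
writeLines(c("vector <- c(2, 4, 6, 8)",
             "mean(vector)"), script)

# Run every line of the script; echo = TRUE prints each command and its output
source(script, echo = TRUE)
```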


R Editor: Typical Usage
• Statistics: Type each command in the R Editor, then select Run Line:
  mean(vector) (mean)
  var(vector) (variance)
  sd(vector) (standard deviation)
• Run Line: Select the Run Line icon to send the line to the R Console and execute it
R Console output:
  > vector <- c(2, 4, 6, 8)
  > mean(vector)
  [1] 5
  > var(vector)
  [1] 6.666667
  > sd(vector)
  [1] 2.581989
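var() and sd() use the sample formulas, with n - 1 in the denominator. This sketch recomputes the variance and standard deviation of the same vector by hand and checks them against the built-in functions:

```r
vector <- c(2, 4, 6, 8)
n <- length(vector)
m <- sum(vector) / n                     # mean: 20 / 4 = 5
v <- sum((vector - m)^2) / (n - 1)       # sample variance: (9 + 1 + 1 + 9) / 3
s <- sqrt(v)                             # standard deviation

all.equal(c(m, v, s), c(mean(vector), var(vector), sd(vector)))  # [1] TRUE
```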


Sample R Session: Regression Analysis
1. Preparation: Remove introductory content so that the first line holds the data headers. Save the Excel file as Comma Separated Values (CSV)
2. Directory: Optional: set up a working directory for the dataset, which allows shorter file paths. Windows: see “Windows Explorer help” for more info; Mac: see “Finder help” for more info
3. Filename: R needs the complete filename, for example “C:\My Documents\Folder A\Filename.csv”
   Alternative 1: Right-click the file to see its filename
   Alternative 2: Find the filename in Windows Explorer (Windows) or Finder (Mac)
   Alternative 3: Drag the CSV file and drop it into the R Console, which shows the filename
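A Windows path like the one in step 3 cannot be pasted into an R string unchanged, because a backslash starts an escape sequence inside R strings. A sketch of the safe forms; the folder and file names are the placeholders from step 3, not real files:

```r
# Two equivalent ways to write the same Windows path inside an R string
p1 <- "C:\\My Documents\\Folder A\\Filename.csv"   # doubled backslashes
p2 <- "C:/My Documents/Folder A/Filename.csv"      # forward slashes also work on Windows

# file.path() joins the pieces with "/", so no escaping is needed
p3 <- file.path("C:", "My Documents", "Folder A", "Filename.csv")
p3                     # [1] "C:/My Documents/Folder A/Filename.csv"

# Check that the file is really there before calling read.csv()
file.exists(p2)        # FALSE unless this file exists on your machine
```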


4. Read CSV data: Datafile <- read.csv("C:/My Documents/Desktop/Filename.csv", header = TRUE)
   Inside an R string, write the path with forward slashes or doubled backslashes; a single backslash starts an escape sequence and causes an error
5. Check data: Print out the dataset to make sure it loaded correctly
   print(Datafile): prints the entire datafile; OK for small datasets
   str(Datafile): shows the structure of Datafile, e.g. “'data.frame': 4 obs. of 4 variables”
   summary(Datafile): shows summary statistics for each column: Min, quartiles, Median, Mean, Max
6. Run regression: lm (Linear Model) performs regression analysis in R
   lm(Dependent ~ Independent + Independent, Dataset)
7. Interpret results: Compare the results obtained with R with those from Microsoft Excel
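Steps 4 to 6 can be tried end to end without an Excel file. This sketch writes a small hypothetical dataset to a temporary CSV file (the column names y, x1, x2 and all values are made up), reads it back, checks it, and fits a linear model. Because y is built exactly as 1 + 2*x1 + 3*x2, lm() should recover those three numbers:

```r
# Hypothetical data with a known linear relationship
example <- data.frame(x1 = c(1, 2, 3, 4, 5),
                      x2 = c(2, 1, 4, 3, 5))
example$y <- 1 + 2 * example$x1 + 3 * example$x2

path <- tempfile(fileext = ".csv")
write.csv(example, path, row.names = FALSE)   # stands in for "Save As CSV" in Excel

# Step 4: read the CSV; header = TRUE takes column names from the first row
Datafile <- read.csv(path, header = TRUE)

# Step 5: check the data
str(Datafile)        # reports 5 obs. of 3 variables
summary(Datafile)    # min, quartiles, median, mean, max for each column

# Step 6: regress y on x1 and x2
fit <- lm(y ~ x1 + x2, Datafile)
coef(fit)            # approximately (Intercept) 1, x1 2, x2 3
```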


Example: Causal Analysis Forecast for Real Estate
1. Preparation: Remove introductory information; the first row must be the header row; “Save As” CSV


2. Directory: Optional: set up a working directory. In R, select File > Change dir…, then select the folder where you want to keep your R files
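The menu choice File > Change dir… corresponds to the base R function setwd(). Once the working directory is set, read.csv() needs only the file name. A sketch that uses a temporary folder and a hypothetical file, Example.csv, so it runs on any machine:

```r
old <- getwd()                    # remember the current working directory
folder <- tempdir()               # stands in for the folder chosen in File > Change dir...
setwd(folder)

write.csv(data.frame(a = 1:3), "Example.csv", row.names = FALSE)  # hypothetical file
Datafile <- read.csv("Example.csv", header = TRUE)   # short name works: the file is in the working directory
print(Datafile)

setwd(old)                        # switch back when done
```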


3. Filename: “C:\Users\user\Desktop\Real.Data.csv”
   Windows: Right-click the file and open its properties; the folder path appears under “Location” (add the file name to get the full filename)
   Mac: Check Finder to find the full filename
   Or: Drag the file into the R Console


4. Read: Datafile <- read.csv("C:/Users/user/Desktop/Real.Data.csv", header = TRUE)
   Use forward slashes in the R string: with backslashes, the “\U” in “C:\Users” is read as the start of a Unicode escape and R reports an error
   Alternative: Set up a working directory, then read the file by name alone: read.csv("Real.Data.csv", header = TRUE)
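To practice step 4 without the original Real.Data.csv, read.csv() can also read CSV text directly through its text argument. The rows below are invented placeholders that only mimic the column layout used in this example (Price, House, Lot); they are not the course dataset:

```r
# Hypothetical CSV content; the first row is the header row, as required in step 1
csv_text <- "Price,House,Lot
210,1.5,6
250,1.8,7
305,2.2,9
340,2.6,8
390,3.0,10"

Datafile <- read.csv(text = csv_text, header = TRUE)
names(Datafile)   # [1] "Price" "House" "Lot"
```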


5. Check Data: print(Datafile); check whether the dataset looks OK
   For large datasets, ask R for summary data, e.g. summary(Datafile), instead of printing out the entire dataset
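For a large dataset, printing every row is impractical. A sketch of lighter checks on a hypothetical 1,000-row data frame, generated with a fixed random seed so the run is reproducible; the column names follow this example, but the values are simulated:

```r
set.seed(1)                                   # reproducible hypothetical data
Datafile <- data.frame(House = runif(1000, 1, 4),
                       Lot   = runif(1000, 5, 20))
Datafile$Price <- 50 + 100 * Datafile$House + 2 * Datafile$Lot + rnorm(1000, sd = 10)

dim(Datafile)       # 1000 rows, 3 columns
head(Datafile)      # first 6 rows only
str(Datafile)       # column names, types, and first few values
summary(Datafile)   # min, quartiles, median, mean, max per column
```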


6. Run Regression: lm(Dependent ~ Independent + Independent, Dataset)
   Dependent variable: Price; independent variables: House, Lot
   Equation: $\text{Price} = c_1 + c_2 \cdot (\text{House Size}) + c_3 \cdot (\text{Lot Size})$
   Real.Regression <- lm(Price ~ House + Lot, Datafile)
   The tilde symbol “~” is at the upper left of the keyboard, to the left of the number “1” (on US keyboards)
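Step 6 in code, on hypothetical data built so that the true relationship is known (Price = 20 + 100 House + 5 Lot, with no noise). Real data will give different coefficients; the point is only to show how the formula Price ~ House + Lot maps onto c1, c2, c3:

```r
# Hypothetical data, not the course dataset
Datafile <- data.frame(House = c(1.5, 1.8, 2.2, 2.6, 3.0),
                       Lot   = c(6, 7, 9, 8, 10))
Datafile$Price <- 20 + 100 * Datafile$House + 5 * Datafile$Lot

Real.Regression <- lm(Price ~ House + Lot, Datafile)

coef(Real.Regression)
# (Intercept) corresponds to c1, House to c2, Lot to c3; here approximately 20, 100, 5
```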


7. Interpret Results: Compare the results from R with those from Excel
   Method   Coefficient   House Size
   Excel    -0.554        +0.646
   R        -0.55415      +0.64680
   Lot Size: +0.02763
   R results agree well with those of Excel
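The comparison can also be done in R. This sketch uses only the Coefficient and House Size values reported for Excel and R on this slide, and checks that they agree to within 0.001, a tolerance chosen here purely for illustration:

```r
excel <- c(-0.554, 0.646)          # Excel: Coefficient, House Size
r_fit <- c(-0.55415, 0.64680)      # R: Coefficient, House Size

d <- r_fit - excel                 # differences between the two tools
all(abs(d) < 0.001)                # [1] TRUE: every difference is below 0.001
```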