The profile of the management (data) scientist: Potential scenarios and skills for B/SMD-based Management research
Juan Mateos-Garcia, Nesta P&R
NEMODE PDW, BAM Conference, 9-11 September 2014

Organisational + personal context
• Nesta: The UK’s innovation foundation, with a mission to help people and organisations bring great ideas to life.
• Doing research on data skills for BIS data capability strategy in partnership with RSS and Creative Skillset
• Doing some ‘big’ data work myself
• I used to do management research (CENTRIM).
Draw on all this to reflect on the implications of big data for management research, focusing on skills.

1. Definitions
• More online activity, digital processes, better hardware.
• More varieties of data and larger volumes of data, generated at faster velocities.
• New applications: Data-driven (automated, personalised) products, processes and services. New formats for data communication.

More complexity

New opportunities for researchers
• Coverage: Large samples
• Revelation: Make the invisible visible, reveal preferences, run experiments.
• Granularity: High level of resolution (temporal + dimensional).
• Cheap! £££

3. MOR examples
I looked at abstracts of 103 papers in the last three issues of [1] AOMJ, [2] BJM, [3] Management Science. No ‘big data’ papers in [1] and [2]. 11 in MS (8 in a ‘Business Analytics’ special issue).
• Aral + Walker. Data source: Facebook (proprietary). Topic/use: Use RCTs to study social influence. Large samples and high levels of granularity allow them to consider how social influence interacts with tie embeddedness and tie strength.
• Bao + Datta. Data source: SEC (open). Topic/use: Use unsupervised learning to identify and quantify risk types in ~14,000 annual reports, benchmark them against other methods for classification, and develop an interactive platform to explore the findings.
• Ghose + Han. Data source: App Store + Google Play (open). Topic/use: Scrape App Store and Google Play data to create a sales panel they use to estimate consumer demand and how it is affected by app features, including pricing model.
• Tambe. Data source: LinkedIn (proprietary). Topic/use: Quantify business big data capabilities and measure inter-company recruitment networks to estimate inter-company skill investment spillovers.

Data Pipeline: Technical skills required, or the profile of the management data scientist
Access data
• Get data: Web scraping/API programming skills
• Run experiments: Experimental designs
• Manage and process the data: Database management
• Clean the data: ‘wrangling’ (and patience).
Model data
• Initial visualisation: Exploratory data analysis
• Dimension reduction: Cluster analysis, PCA.
• Model selection, estimation, evaluation: Econometrics/statistics/machine learning
Present findings
• Display findings visually + interactively: Data visualisation
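To make the ‘wrangling’ step concrete, here is a minimal Python sketch of cleaning scraped records before any modelling. The records, field names and cleaning rules are all hypothetical, invented for illustration; a real scrape would need its own rules, and this is not taken from any of the papers mentioned in these slides.

```python
import statistics

# Hypothetical raw records, as they might come back from a scraper or an API.
raw_records = [
    {"app": " PhotoFix ", "price": "£0.99", "downloads": "1,200"},
    # Same app again with different casing and no padding: a duplicate.
    {"app": "photofix", "price": "£0.99", "downloads": "1,200"},
    {"app": "TaskPad", "price": "Free", "downloads": "15,000"},
    # Missing download count: an incomplete row.
    {"app": "MapRun", "price": "£2.49", "downloads": ""},
]


def clean_record(record):
    """Normalise one record; return None if a required field is missing."""
    name = record["app"].strip().lower()
    price_text = record["price"].strip().lower()
    price = 0.0 if price_text == "free" else float(price_text.lstrip("£"))
    downloads_text = record["downloads"].replace(",", "").strip()
    if not name or not downloads_text:
        return None
    return {"app": name, "price": price, "downloads": int(downloads_text)}


def clean(records):
    """Clean all records, dropping incomplete rows and exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        row = clean_record(record)
        if row is None:
            continue
        key = (row["app"], row["price"], row["downloads"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


rows = clean(raw_records)
# A first exploratory summary before any modelling.
mean_downloads = statistics.mean(row["downloads"] for row in rows)
print(rows)
print(mean_downloads)
```

Dropping incomplete rows is only one possible choice. Whether to drop, impute or flag them depends on how the data were generated.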

Data Pipeline: Challenges (not all technical)
Access data
• Obtain proprietary data
• Manage anonymity and ethical issues (including experimental research, cf. Facebook's infamous RCT).
Model data
• Ask the right questions: “The best dimension reduction tool that there is.”
• Be careful with biases: N = All? Rarely. It is important to understand the (administrative and organisational) processes that generated the data.
Present findings
• Dealing with false positives, which are bound to happen with large samples and multiple tests.
• Encouraging consilience through reproducibility and relating findings to wider bodies of knowledge
Requires theory and domain knowledge
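The false-positive point can be illustrated with a little arithmetic. The Python sketch below computes the chance of at least one false positive across many independent tests, then applies the Benjamini-Hochberg procedure as one common correction. The slides do not name a specific correction, so that choice is an assumption, and the p-values are hypothetical.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** num_tests


def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans, True where the hypothesis is rejected
    under the Benjamini-Hochberg procedure at the given FDR level."""
    m = len(p_values)
    # Sort indices by p-value, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten tests run on the same large sample.
p_values = [0.001, 0.008, 0.012, 0.03, 0.04, 0.045, 0.2, 0.5, 0.7, 0.9]
naive = [p <= 0.05 for p in p_values]
adjusted = benjamini_hochberg(p_values, fdr=0.05)

print(round(familywise_error_rate(20), 4))
print(sum(naive), sum(adjusted))
```

With 20 independent tests at the 5% level, the chance of at least one spurious ‘finding’ is roughly 64%. In this made-up example the naive threshold flags six results and the corrected procedure keeps three.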

Institutional solutions
• People with technical skills and domain knowledge are rare -> Unicorns.
• Supply push + Demand pull to increase MOR big data capabilities.
• Internal dialogue within the discipline and with other disciplines (Computer Science, Information Systems)
• Acknowledge big data limitations for looking at important issues (power, perceptions, structural change).

THANK YOU
Juan.mateos-garcia@nesta.org.uk
@JMateosGarcia