Building Data Science capability across Government Data Analytics
Building Data Science capability across Government – Data Analytics Apprenticeship 12th July 2017 Alexis Fernquest and Gareth Jones
Data Analytics Apprenticeship
• The first of its kind in the UK.
• Over 130 applicants for 8 positions.
• 8 in the Data Science Campus from December 2016.
• 6 across Government departments in Wales.
• 6 to start in September 2017 in the Data Science Campus.
“Apprenticeships are essential to creating the workforce of the future” John Manzoni
The structure of the apprenticeship
01 EXTERNAL COURSES
02 ON THE JOB TRAINING
03 TRAINING PROVIDER ASSESSMENTS
External Training
40 days of external training, including:
1. Building professionalism and development
2. Programming in R and Python
3. Information governance and data management
4. Statistical analysis and data analysis
5. Data science methods
6. Advanced data representations and manipulation for IT
On the job training
• R and Python application
• Regression modelling
• GitHub
• Data Ethics
• Overview of and work across ONS business areas
• Project participation across ONS and in collaboration with other government departments
Sustainable Development Goal Project
• Completed from January to March 2017
• Group One – Goal 7 – sustainable energy sources – energy demand impacts.
• Group Two – Goal 3 – non-communicable diseases – lung cancer & mortality rates.
Group One – Goal 7 Clean Energy
• Energy forecast using an ARIMA model.
• Trend in energy reduction.
Source: "G.B. National Grid Status", gridwatch.templar.co.uk
Group One – Goal 7 Clean Energy
• Smart meter installations increasing.
• Energy demand decreasing.
Sources: the Department for Business, Energy & Industrial Strategy and Gridwatch
Group Two – Goal 3 Good Health and Wellbeing
By 2030, reduce by one third premature mortality from non-communicable diseases through prevention and treatment and promote mental health and well-being.
[Chart: Survival rate for each stage of diagnosis per gender. Source: Cancer Research Website]
• On average, women are about 3.9% more likely to survive.
• Women have a higher rate of survival at every stage.
• The biggest difference in survival rates comes in stage 4, where women outsurvive men by 4.7%.
[Chart: Number of deaths per age and gender. Axes: age bands by gender (male, female), from 15-19 to 85+, against number of deaths. Source: Cancer Research Website]
• Female deaths steadily increase throughout the age bands.
• Increases slow down from age 60 onwards.
• More men die in every age bracket, especially between the ages of 60 and 80, where male deaths are much higher.
Group Two – Goal 3 Good Health and Wellbeing
• Completed an augmented Dickey-Fuller test to check whether the data is stationary.
• ACF suggested ARIMA(1, 0, 1)(2, 0, 0)[12] should be used.
• Decomposition completed to find trend and seasonality.
• Seasonality taken out of the data to complete the forecast.
• ARIMA forecast used to calculate the next 24 months.
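The decomposition and seasonal-adjustment steps can be illustrated with a minimal sketch. The slides do not say which tool or decomposition method was used, so this assumes a classical additive decomposition with a 12-month period, written in plain Python on a synthetic series (not the project data). The function names are hypothetical, and the Dickey-Fuller test and ARIMA fit are not reproduced here.

```python
import math

def centred_moving_average(series, period):
    """Estimate the trend with a centred moving average.

    For an even period (e.g. 12 months) the two end points of the
    window get half weight, the usual "2 x period" moving average.
    """
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend

def seasonal_indices(series, trend, period):
    """Average the detrended values for each position in the cycle."""
    buckets = [[] for _ in range(period)]
    for t, (value, level) in enumerate(zip(series, trend)):
        if level is not None:
            buckets[t % period].append(value - level)
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period  # centre the indices on zero
    return [m - offset for m in means]

def seasonally_adjust(series, period=12):
    """Return (trend, seasonal indices, seasonally adjusted series)."""
    trend = centred_moving_average(series, period)
    seasonal = seasonal_indices(series, trend, period)
    adjusted = [value - seasonal[t % period] for t, value in enumerate(series)]
    return trend, seasonal, adjusted

# Synthetic monthly series (an assumption for illustration only):
# a linear trend plus a fixed 12-month pattern.
pattern = [10 * math.sin(2 * math.pi * m / 12) for m in range(12)]
series = [100 + 0.5 * t + pattern[t % 12] for t in range(48)]

trend, seasonal, adjusted = seasonally_adjust(series, period=12)
```

On this synthetic input the adjusted series is just the underlying linear trend, which is the kind of series that would then be passed to a forecasting model.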
Personas in collaboration with the Cabinet Office
WHAT IS THE PROJECT? Project undertaken by the Data Science Campus for the Policy Lab within the Cabinet Office.
AIM To see if personas can be developed from survey data.
SOLUTION? Cluster analysis on the Labour Force Survey, forming clusters that can be translated into personas.
Personas in collaboration with the Cabinet Office
• 14 variables used from the Labour Force Survey
• R package PCAmix used to combine categorical and numerical variables
• Variable influence plotted across 5 dimensions
• k-means algorithm used to form 4 clusters
• Dimension influence on each cluster
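As a rough illustration of the clustering step, the sketch below runs a plain-Python k-means with 4 clusters. It is not the project's code: the project used R, and its input would have been the scores on the PCAmix dimensions, whereas this example assumes synthetic 2-D points. The farthest-first initialisation is a choice made here to keep the example deterministic, and all function names are hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_centres(points, k):
    """Pick k starting centres: the first point, then repeatedly the
    point farthest from all centres chosen so far."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centres))
        )
    return centres

def kmeans(points, k, max_iter=100):
    """Alternate assignment and centre-update steps until labels stop changing."""
    centres = farthest_first_centres(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centres[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centres

# Synthetic data (an assumption): four well-separated groups of 25 points.
rng = random.Random(0)
true_centres = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in true_centres
    for _ in range(25)
]

labels, centres = kmeans(points, k=4)
```

Each cluster centre can then be read as the "typical" member of a group, which is the sense in which clusters translate into personas.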
Question Bank
WHAT IS THE PROJECT? Capture all business survey questions in a machine-readable format, including survey, form type, question number, wording, response type and related guidance.
AIM To have all the question types digitally represented in one place.
SOLUTION? Capture business surveys in a format which can be used to perform text analytics.
Question Bank
THE BENEFITS
• HARMONISATION: Enable departments to harmonise questions within their surveys.
• RESOURCE: Offer a resource to the wider ONS.
• ADMIN DATA: Enable the use of admin data to supplement the survey.
• INPUT TO I.T SYSTEM: I.T development input. Offer input to DMP (Data Management Platform) / KMS (Knowledge Management System) / Authoring Tool.
NEXT STEPS
• Import data into Python from JSON files
• Start data cleaning on imports
• Start data manipulation and match duplicate questions
• Visualise the results
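The first three next steps (import from JSON, clean, match duplicate questions) might look something like the sketch below. The slides do not give the JSON schema, so the field names `survey`, `question_number` and `wording`, the sample records, and the exact-match-after-normalisation rule are all assumptions made for illustration.

```python
import json
import re
from collections import defaultdict

def normalise(wording):
    """Lower-case, drop punctuation and collapse whitespace so that
    trivially different wordings compare equal."""
    text = re.sub(r"[^a-z0-9\s]", "", wording.lower())
    return " ".join(text.split())

def find_duplicates(questions):
    """Group questions by normalised wording and keep groups with more
    than one member."""
    groups = defaultdict(list)
    for q in questions:
        groups[normalise(q["wording"])].append((q["survey"], q["question_number"]))
    return {text: refs for text, refs in groups.items() if len(refs) > 1}

# Hypothetical records standing in for the contents of a JSON file.
raw = """[
  {"survey": "Survey A", "question_number": "1", "wording": "What was your total turnover?"},
  {"survey": "Survey B", "question_number": "4", "wording": "What was your total  turnover"},
  {"survey": "Survey B", "question_number": "5", "wording": "How many employees did you have?"}
]"""

questions = json.loads(raw)
duplicates = find_duplicates(questions)
```

Reading from a file would use `json.load` on an open file handle instead of `json.loads` on a string. Matching on normalised text only catches near-identical wordings; looser matches would need a text-similarity measure.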
Skills We Have Learnt
Other initiatives we are involved in
• Career talks and Data Science awareness days with local schools
• Blog posts
• Local STEM ambassadors
Other initiatives we are involved in
• The Data Science Campus Launch
• Intruder testing on surveys
• Charitable endeavours
Learning pathways and other opportunities
• There are many other pathways in the Learning Academy. If you would like more information, please contact:
• Learning.Academy@ons.gov.uk
Any Questions? Do get in touch! datasciencecampus@ons.gov.uk