Introduction to Weather Data Cleaning

Weather data cleaning is fundamental to the provision of high-quality weather data. Speedwell Weather offers extensive packs of cleaned historical weather data for sites around the world, as well as cleaned weather data feeds. We believe we offer a high-quality product. This document shows how we approach the task.

Data Cleaning

Problem: Weather data is not perfect
  • Missing values
  • Erroneous observations
  • Consistency problems
  • Multiple data sets claiming to be for the same weather station

Never purchase data from anyone unless they can provide satisfactory answers to these questions:
  - What is the original source of the data?
  - What is the observation convention?
  - What are the attributes (lat, lon, elevation)?
  - What has been done to the data?

Solution (1): Ignore the problem
  • Erroneous observations will lead to inaccurate pricing

Solution (2): Use only “good” stations
  • Difficult to determine what is good without cleaning it
  • Greatly limits your ability to trade

Solution (3): Clean / fill the data
  • Fill missing values
  • Detect and replace erroneous observations
  • Confirm the consistency of the data

Data Cleaning

  • The quality of meteorological observations varies significantly
  • Missing / erroneous observations are commonplace
  • To safeguard against data problems, use cleaned data where available

Fundamentals of a proper data cleaning
  (1) Organization
  (2) Redundancy
  (3) Flexibility
  (4) Human interaction
  (5) Transparency

Fundamental to satisfying the above is the implementation of software systems infrastructure... but data cleaning cannot and SHOULD not be FULLY automated (see 4).

[Figure: Part of the Speedwell data cleaning process diagram]

Data Cleaning: Organization

Fundamentals of a proper data cleaning
  (1) Organization
      - logical flow
      - data management
  (2) Redundancy
  (3) Flexibility
  (4) Human interaction
  (5) Transparency

Process stages, in order:
  - Data preparation
  - Initial Review
  - In-depth analysis / data filling
  - Manual Review
  - Data delivery

Speedwell data quality types

Data Cleaning: Redundancy

Fundamentals of a proper data cleaning
  (1) Organization
  (2) Redundancy
      - data sources
      - testing
      - estimates
      - delivery
  (3) Flexibility
  (4) Human interaction
  (5) Transparency

Data sources: bring in as much as possible and keep what is useful. Typical processing includes climate data (daily / hourly), synoptic data, METAR, ECMWF forecast data and climatology. If one source fails there are others.

Testing: no one test is applicable for all situations. Why have one when you can have many?
  - comparison against itself
  - physical consistency
  - statistical probability
  - comparison against neighbors: observations are compared against the median of a basket of proxies and the MAD (median absolute deviation). If the observation is statistically different from the surrounding stations it is sent to the filling process.

Estimates (filling): useful for more in-depth manual analysis.

Data delivery:
  - Multiple FTP deliveries
  - 24-hour support
  - logging of all deliveries
  - Description of data quality and type

A fundamental prerequisite for effective data cleaning is access to a library of weather data covering nearby sites, which allows plausibility testing for the site being cleaned.

Speedwell Weather maintains a very large inventory of weather data for over 50 different weather elements. This is all warehoused by us in a manner that fully respects differing data types (Synoptic/Climate, Cleaned/Raw etc.) with a full audit trail. This allows us to document data point changes which may occur when national met offices change data records to reflect their internal QC procedures.

[Figure: Example of weather variables stored for a single site]
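The comparison against neighbors can be illustrated with a minimal Python sketch. It is not Speedwell's implementation: the function name `mad_outlier_check`, the threshold of 3.5, the 1.4826 scaling factor (which makes the MAD comparable to a standard deviation for normally distributed data) and the `min_sigma` floor are all assumptions chosen for illustration.

```python
from statistics import median


def mad_outlier_check(observation, proxy_values, threshold=3.5, min_sigma=0.1):
    """Compare one observation against a basket of proxy (neighbor) values.

    Returns (is_suspect, score). All parameter defaults are illustrative
    assumptions, not values taken from the presentation.
    """
    if len(proxy_values) < 3:
        raise ValueError("need at least three proxy values")

    center = median(proxy_values)
    # Median absolute deviation of the proxies around their own median.
    mad = median([abs(v - center) for v in proxy_values])
    # Scale the MAD to a robust sigma; the floor avoids division by zero
    # when all proxies report (almost) the same value.
    robust_sigma = max(1.4826 * mad, min_sigma)

    score = abs(observation - center) / robust_sigma
    return score > threshold, score


# Five neighboring stations report about 12 degrees; the target reports 19.5.
neighbors = [12.1, 11.8, 12.4, 12.0, 11.9]
suspect, score = mad_outlier_check(19.5, neighbors)
if suspect:
    print("observation flagged: send to the filling process")
```

Using the median and MAD rather than the mean and standard deviation keeps the test robust when one of the proxies is itself erroneous.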

Data Cleaning: Flexibility

Fundamentals of a proper data cleaning
  (1) Organization
  (2) Redundancy
  (3) Flexibility
      - consider the situation
      - appropriateness of tests
  (4) Human interaction
  (5) Transparency

Estimate #1: Surrounding station regression using deseasonalized data
Estimate #2: Estimates of daily observations from hourly observations (curve fitting)
Estimate #3: Estimates of daily observations by manipulating other data types (Synoptic, METAR, ½ hourly)
Estimate #4: Day +1 forecasts can actually be very good…
Estimate #5: Climatology (worst case scenario)
Estimate #6, #7, #8, …: Flexibility allows you to add any appropriate estimates. The possibilities are unlimited.
  - satellite derived values
  - installed stations
  - reanalysis
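A regression estimate on deseasonalized data (Estimate #1) can be sketched as below. This is a simplified illustration with a single neighboring station and ordinary least squares; the slides do not specify the regression form or how many stations are used. The function names and the use of a daily climatological normal as the seasonal component are assumptions.

```python
from statistics import mean


def deseasonalize(values, normals):
    """Subtract the climatological normal from each value, leaving anomalies."""
    return [v - n for v, n in zip(values, normals)]


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("neighbor anomalies have no variance")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def regression_estimate(target_hist, target_normals,
                        neighbor_hist, neighbor_normals,
                        neighbor_today, neighbor_normal_today,
                        target_normal_today):
    """Estimate a missing target value from one neighbor (hypothetical helper)."""
    # Fit the relationship between anomalies, not raw values, so the shared
    # seasonal cycle does not dominate the regression.
    x = deseasonalize(neighbor_hist, neighbor_normals)
    y = deseasonalize(target_hist, target_normals)
    a, b = fit_line(x, y)
    anomaly = a + b * (neighbor_today - neighbor_normal_today)
    # Add the target's own seasonal normal back to get the filled value.
    return target_normal_today + anomaly


estimate = regression_estimate(
    target_hist=[7.5, 4.5, 11.5, 8.5], target_normals=[5, 6, 7, 8],
    neighbor_hist=[11, 10, 14, 13], neighbor_normals=[10, 11, 12, 13],
    neighbor_today=16, neighbor_normal_today=14, target_normal_today=9,
)
print(round(estimate, 2))
```

In practice several such estimates would be produced per missing value, with climatology as the fallback when nothing better is available.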

Data Cleaning: the Human Element and Transparency

Fundamentals of a proper data cleaning
  (1) Organization
  (2) Redundancy
  (3) Flexibility
  (4) Human interaction
      - meteorology is complicated
      - introduction of non-automated information
  (5) Transparency
      - explanation of the process
      - share what has been cleaned
      - no-one likes “black boxes”

Contact Us

Regarding world-wide weather data and forecast matters please see www.speedwellweather.com or contact:
  Phil Hayes: phil.hayes@SpeedwellWeather.com
  David Whitehead (U.S.): david.whitehead@SpeedwellWeather.com
  Telephone: UK office: +44 (0) 1582 465 551; US office: +1 (0) 703 535 8801
  Address UK: Mardall House, Vaughan Rd, Harpenden, Herts, AL5 4HU
  Address USA: 101 N Columbus Street, Second Floor, Alexandria VA 22314 USA

Regarding software and consultancy services please see www.speedwellweather.com or contact:
  Stephen Doherty: stephen.doherty@SpeedwellWeather.com
  Dr Michael Moreno: michael.moreno@SpeedwellWeather.com
  David Whitehead (U.S.): david.whitehead@SpeedwellWeather.com
  Telephone: UK office: +44 (0) 1582 465 569; US office: +1 (0) 703 535 8800

Speedwell Weather Derivatives Limited is authorised and regulated by the Financial Services Authority. Registered Offices: Mardall House, 9-11 Vaughan Road, Harpenden, Herts AL5 4HU, UK. Company No 3790989.