Using Python to Retrieve Data from the CUAHSI HIS Web Services – Part 1
Jeffery S. Horsburgh
Hydroinformatics, Fall 2016
Slides adapted from the original version by Jon Goodall, University of Virginia, Hydroinformatics, Fall 2014
This work was funded by National Science Foundation Grants EPS 1135482 and EPS 1208732

Objectives
• Discover and access data from major hydrologic data sources
• Create reproducible data visualizations
• Write and execute computer code to automate difficult and repetitive data related tasks
• Manipulate data and transform it across file systems, flat files, databases, programming languages, etc.
• Retrieve and use data from Web services

Class Plan
• Introduction (setting this material within context)
• Set up
  – Software requirements
• In class demos
  – Example 1: Getting the site name for a USGS NWIS station using the CUAHSI HIS Web Services
  – Example 2: Getting the minimum streamflow over the past 5 days for a USGS NWIS station using the CUAHSI HIS Web Services
  – Example 3: Making a plot of the streamflow data for the past 5 days
• Challenge problems
• Wrap up

Big Picture Context
1. Data life cycle
2. Data modeling
3. Database design
4. Database implementation and ODM
5. SQL Querying of an ODM database
6. Python programming against an ODM database
7. Sharing data from an ODM using CUAHSI HIS Web services (WaterOneFlow, WaterML, and HydroServer)
8. Accessing CUAHSI HIS Web services using HydroDesktop and the CUAHSI Web Client (http://data.cuahsi.org)
9. This week: Accessing CUAHSI HIS Web services using Python

Set up …
Required Packages
• “suds” – a package for making requests to SOAP web services (https://fedorahosted.org/suds/)
• “pandas” – a data analysis library with high performance data structures (http://pandas.pydata.org/)
• “matplotlib” – a package for scientific plotting (http://matplotlib.org/)

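Before running the examples, it can help to confirm that the three packages are importable. The snippet below is a minimal sketch that only checks whether each package can be found; it does not install anything. One assumption to be aware of: the original suds project targets Python 2, so on Python 3 a fork (for example one published as suds-community) may be what provides the `suds` module.

```python
import importlib.util

# Packages named on the "Required Packages" slide.
REQUIRED = ("suds", "pandas", "matplotlib")

# find_spec() locates a package without importing it, so a missing
# package is reported instead of raising ImportError.
status = {name: importlib.util.find_spec(name) is not None for name in REQUIRED}

for name, found in status.items():
    print(name, "installed" if found else "MISSING")
```

Any package reported as MISSING can be added through the PyCharm Package Manager or pip.
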
What is the “suds” package?
• “Suds is a lightweight SOAP Python client for consuming Web Services.” https://fedorahosted.org/suds/
• SOAP and WSDL are standards for creating web services. You don’t need to know the details behind these standards, but if you are interested, Wikipedia has a good summary of both:
  – SOAP: http://en.wikipedia.org/wiki/SOAP
  – WSDL: http://en.wikipedia.org/wiki/Web_Services_Description_Language

What is the “pandas” package?
• pandas is an open source library providing high performance data structures
• Some pandas data structures you may be interested in:
  – Series: a one-dimensional, labeled array capable of holding any data type; axis labels are collectively referred to as the “index”
  – DataFrame: a 2-dimensional data structure with columns of potentially different types (essentially a high performance table object)

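The two data structures described above can be illustrated with a short sketch. The timestamps, flow values, and the `quality` column are made up for illustration and are not taken from any web service.

```python
import pandas as pd

# Hypothetical streamflow readings; the numbers are illustrative only.
times = pd.to_datetime(["2016-10-20 00:00", "2016-10-20 00:15", "2016-10-20 00:30"])
flow = pd.Series([101.0, 99.5, 100.2], index=times, name="streamflow")

# The axis labels (here, timestamps) form the Series "index".
print(flow.index[0])
print(flow.loc[pd.Timestamp("2016-10-20 00:15")])

# A DataFrame is a table whose columns can have different types.
table = pd.DataFrame(
    {"streamflow": flow.values, "quality": ["A", "A", "P"]},
    index=times,
)
print(table)
```

A time-indexed Series like `flow` is the structure used to hold the streamflow values in Examples 2 and 3.
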
In class examples …
Example 1: Get the site name for a USGS NWIS gage station using the CUAHSI HIS Web Services
• Use the GetSiteInfoObject method on the CUAHSI HIS WaterOneFlow USGS Unit Values web service:
  – http://hydroportal.cuahsi.org/nwisuv/cuahsi_1_1.asmx
• Use the suds Client object to call the web service method
  – We will use siteCode = “NWISUV:10109000”
  – Suds will automatically parse the WaterML response from the web service call.
  – We will need to find the siteName property in the response and print it to the console.
• The answer for “NWISUV:10109000” is: LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT

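The steps in Example 1 can be sketched roughly as follows. This is not the in-class demo script. Several details are assumptions: the `?WSDL` suffix on the service URL, the empty authToken argument, and the attribute path `site[0].siteInfo.siteName` in the parsed response. The function names `fetch_site_name` and `extract_site_name` are hypothetical. A stand-in object is used at the end so the sketch runs without a network connection or suds installed.

```python
from types import SimpleNamespace

# Assumed WSDL location: the service URL from the slide plus "?WSDL".
WSDL_URL = "http://hydroportal.cuahsi.org/nwisuv/cuahsi_1_1.asmx?WSDL"


def extract_site_name(response):
    """Return siteName from a parsed GetSiteInfoObject response.

    Assumes the parsed WaterML exposes site[0].siteInfo.siteName.
    """
    return response.site[0].siteInfo.siteName


def fetch_site_name(site_code, wsdl_url=WSDL_URL):
    """Call GetSiteInfoObject and return the site name (needs network access)."""
    from suds.client import Client  # imported lazily so the rest runs without suds

    client = Client(wsdl_url)
    # Second argument is the authToken, assumed to be accepted as empty.
    response = client.service.GetSiteInfoObject(site_code, "")
    return extract_site_name(response)


# Offline stand-in that mimics the assumed response layout.
fake_response = SimpleNamespace(
    site=[SimpleNamespace(siteInfo=SimpleNamespace(siteName="EXAMPLE SITE NAME"))]
)
print(extract_site_name(fake_response))
```

With an internet connection and suds available, `print(fetch_site_name("NWISUV:10109000"))` would be the call corresponding to the slide. If the attribute path differs, printing the raw response shows the actual structure.
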
Example 2: Getting the minimum streamflow over the past five days for a USGS NWIS station using the CUAHSI HIS Web Services
• Use the GetValuesObject method on the CUAHSI HIS WaterOneFlow web services
  – http://hydroportal.cuahsi.org/nwisuv/cuahsi_1_1.asmx
• Like example 1, use the suds Client object to call the web service method
• We will use siteCode = “NWISUV:10109000” and ParameterCode = “NWISUV:00060”. DateTimes should be in the format “YYYY-MM-DD”.
  – Example GetValuesObject web service call in a browser that returns a WaterML file: http://hydroportal.cuahsi.org/nwisuv/cuahsi_1_1.asmx/GetValuesObject?location=NWISUV:10109000&variable=NWISUV:00060&startDate=2015-10-20&endDate=2015-10-31&authToken=
• We will extract the values and dateTimes and create a pandas Series object to store the time series
• We will use the min() and idxmin() methods on the Series object to get the minimum streamflow and the datetime when the minimum streamflow occurred.
  – http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.min.html
• Note: I tested this with Pandas version 0.19.0. If you get an error, check your version of Pandas in the PyCharm Package Manager and upgrade the version of Pandas if needed.

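A rough sketch of Example 2 follows. It is not the in-class script. The response path `timeSeries[0].values[0].value`, the `_dateTime` attribute on each reading, the `?WSDL` suffix, and the empty authToken are assumptions about how suds parses the WaterML. The helper names are hypothetical, and the sample readings are made up so the min()/idxmin() step runs offline.

```python
import datetime as dt

import pandas as pd

WSDL_URL = "http://hydroportal.cuahsi.org/nwisuv/cuahsi_1_1.asmx?WSDL"  # assumed


def past_days(days, today=None):
    """Return (startDate, endDate) as YYYY-MM-DD strings."""
    today = today or dt.date.today()
    return (today - dt.timedelta(days=days)).isoformat(), today.isoformat()


def fetch_pairs(site_code, variable_code, start, end, wsdl_url=WSDL_URL):
    """Call GetValuesObject; return (dateTime, value) pairs (needs network)."""
    from suds.client import Client  # lazy import: the rest runs without suds

    client = Client(wsdl_url)
    response = client.service.GetValuesObject(site_code, variable_code, start, end, "")
    readings = response.timeSeries[0].values[0].value  # assumed layout
    return [(r._dateTime, r.value) for r in readings]


def to_series(pairs):
    """Build a time-indexed pandas Series from (dateTime, value) pairs."""
    index = pd.to_datetime([t for t, _ in pairs])
    return pd.Series([float(v) for _, v in pairs], index=index)


# Made-up readings standing in for the web service response.
sample = [
    ("2016-10-20T00:00:00", "101"),
    ("2016-10-20T00:15:00", "98.6"),
    ("2016-10-20T00:30:00", "99.9"),
]
flow = to_series(sample)
print("Minimum streamflow:", flow.min())
print("Occurred at:", flow.idxmin())
```

For live data, the sample list would be replaced by `fetch_pairs("NWISUV:10109000", "NWISUV:00060", *past_days(5))`.
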
Example 3: Create a Time Series Plot of Streamflow Values for the Past 5 Days
• Use the GetValuesObject method on the CUAHSI HIS WaterOneFlow web services
  – http://hydroportal.cuahsi.org/nwisuv/cuahsi_1_1.asmx
• Like examples 1 and 2, use the suds Client object to call the web service method
• We will use SiteCode = NWISUV:10109000 and ParameterCode = NWISUV:00060. DateTimes should be in the format YYYY-MM-DD.
• We will extract the values and dateTimes and create a Pandas Series object to store the time series.
• We will create a “figure” object within which we can create our time series plot
  – http://matplotlib.org/api/figure_api.html
• We will use the plot() method on the pandas Series object to plot the time series.
  – http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.plot.html
• Note: I tested this with Pandas version 0.19.0. If you get an error, check your version of Pandas in the PyCharm Package Manager and upgrade the version of Pandas if needed.

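The plotting step of Example 3 might look like the sketch below. The Series here is filled with made-up values; in the real example it would be built from the GetValuesObject response as in Example 2. The non-interactive "Agg" backend is an assumption made so the sketch runs without a display. The axis label assumes USGS parameter 00060 (discharge in cubic feet per second).

```python
import matplotlib

matplotlib.use("Agg")  # assumption: no display available; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Made-up 15-minute streamflow values standing in for web service data.
times = pd.date_range("2016-10-20", periods=8, freq="15min")
flow = pd.Series([101.0, 100.4, 99.8, 99.5, 99.9, 100.6, 101.3, 101.1], index=times)

# Create the figure object, then draw the Series onto one set of axes.
fig = plt.figure(figsize=(8, 4))
ax = fig.add_subplot(1, 1, 1)
flow.plot(ax=ax)

ax.set_xlabel("Date and time")
ax.set_ylabel("Streamflow (cfs)")
ax.set_title("Streamflow at NWISUV:10109000 (illustrative values)")
fig.tight_layout()
```

In an interactive session, removing the `matplotlib.use("Agg")` line and calling `plt.show()` displays the plot; `fig.savefig(...)` writes it to a file instead.
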
Using these principles for other WaterOneFlow Web services
• CUAHSI HIS Central lists other available web services that can be accessed in a similar way
  – http://hiscentral.cuahsi.org/
  – There are over 400 billion observations available through HIS Central!
• The HydroServers you created last week with Dr. Ames can also be accessed using this same approach

Challenge problems …
Coding Challenges
• For the same station (NWISUV:10109000), modify the example 2 script so that it prints the daily min, max, and average streamflow for the past 5 days.
• (Time permitting) Modify your script so that it prints out the min, max, and average streamflow for EACH DAY during the past 5 days. This is part of what you will need to do for Assignment 8.

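One possible approach to these challenges, sketched with made-up data, is to aggregate the Series once over the whole period and once per calendar day. Using `resample("D")` is only one option (a groupby on the date would also work), and this is not presented as the expected assignment solution.

```python
import pandas as pd

# Made-up readings spanning three days; real data would come from GetValuesObject.
times = pd.to_datetime([
    "2016-10-20 00:00", "2016-10-20 12:00",
    "2016-10-21 00:00", "2016-10-21 12:00",
    "2016-10-22 00:00", "2016-10-22 12:00",
])
flow = pd.Series([100.0, 104.0, 98.0, 102.0, 97.0, 99.0], index=times)

# Min, max, and average over the whole period.
overall = flow.agg(["min", "max", "mean"])
print(overall)

# Min, max, and average for each calendar day.
daily = flow.resample("D").agg(["min", "max", "mean"])
print(daily)
```

`daily` is a DataFrame with one row per day and one column per statistic, which makes it easy to loop over and print.
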
Wrap up …
Pros and Cons of using Web Services vs. a local ODM database
• Pros:
  – Access to the entire database on which the service is based without the need to store the data locally
  – No need to keep a local copy in sync with the USGS version
• Cons:
  – Requires Internet connection
  – Speed: Getting data via web services will almost certainly be slower than getting data from your own database
  – Data is outside your control: breaking changes, unavailable services, etc.
  – (Some of these could be improved with a data caching strategy)

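The caching idea mentioned in the last bullet could be sketched as follows. This is a minimal illustration under stated assumptions, not something covered in class: results are stored as JSON files keyed by a string, `cached_fetch` and `fake_fetch` are hypothetical names, and there is no expiry logic, so stale data would have to be cleared by hand.

```python
import json
import os
import tempfile


def cached_fetch(key, fetch, cache_dir):
    """Return data for key from the cache, calling fetch() only on a miss."""
    path = os.path.join(cache_dir, key + ".json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = fetch()
    with open(path, "w") as f:
        json.dump(data, f)
    return data


# Stand-in for a slow web service call; counts how often it is invoked.
calls = []


def fake_fetch():
    calls.append(1)
    return [["2016-10-20T00:00:00", 101.0], ["2016-10-20T00:15:00", 98.6]]


cache_dir = tempfile.mkdtemp()
first = cached_fetch("NWISUV_10109000_00060", fake_fetch, cache_dir)
second = cached_fetch("NWISUV_10109000_00060", fake_fetch, cache_dir)
print("Service calls made:", len(calls))
```

The second request is served from the local file, which addresses the speed and availability concerns at the cost of possibly outdated data.
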
Summary
• You can use Python to automate the retrieval of hydrologic data via web services
• The “suds” package enables you to retrieve data as Python objects
• “pandas” has some nice data structures that make analysis and visualization easier
• “matplotlib” allows you to make nice plots

Thursday’s Class
• Introduce Assignment 8
• Work on the assignment in class