Using Python to Retrieve and Visualize Data Part

  • Slides: 27
Download presentation
Using Python to Retrieve and Visualize Data (Part 1 of 2) Jeff Horsburgh Hydroinformatics

Using Python to Retrieve and Visualize Data (Part 1 of 2) Jeff Horsburgh Hydroinformatics Fall 2017 This work was funded by National Science Foundation Grants EPS 1135482 and EPS 1208732

Announcements • Assignment 4 due on Thursday • Final Project progress report due Thursday

Announcements • Assignment 4 due on Thursday • Final Project progress report due Thursday

Quick Review • What we have learned so far – Describing data with effective

Quick Review • What we have learned so far – Describing data with effective metadata – Datalogger programming and data collection – Data model design – organizing your data/metadata – Database implementation – creating a database and loading data – SQL – slicing and dicing your data – Introduction to Python programming Today: Creating reproducible, reusable, and sharable visualizations from data in your database.

Learning Objectives • Create reproducible data visualizations • Write and execute computer code to

Learning Objectives • Create reproducible data visualizations • Write and execute computer code to automate difficult and repetitive data related tasks • Manipulate data and transform it across file systems, flat files, databases, programming languages, etc.

Reproducible Results “A script is a record of the entire modeling or data analysis

Reproducible Results “A script is a record of the entire modeling or data analysis process. By simply running the script you reprocess the entire model or analysis. How about that for reproducible results? ” Bakker, M. (2014), Python Scripting: The Return to Programming. Groundwater. http: //doi. org/10. 1111/gwat. 12269

Reproducible Results “In order to maximize reproducibility, everything needed to re-create the output should

Reproducible Results “In order to maximize reproducibility, everything needed to re-create the output should be recorded automatically in a format that other programs can read. ” Wilson G, Aruliah DA, Brown CT, Chue Hong NP, Davis M, et al. (2014) Best Practices for Scientific Computing. PLo. S Biol 12(1): e 1001745. http: //doi. org/10. 1371/journal. pbio. 1001745

Python Packages • A collection of Python modules packaged together that provide useful, reusable

Python Packages • A collection of Python modules packaged together that provide useful, reusable functionality by importing into a script • Packages vs. modules: http: //stackoverflow. com/questions/7948494 /whats-the-difference-between-a-pythonmodule-and-a-python-package

Py. Charm Package Manager • Lists all packages installed in current environment • Enables

Py. Charm Package Manager • Lists all packages installed in current environment • Enables you to install new packages and update existing

Thoughts on Coding Style • Having ‘readable code’ is just as important as having

Thoughts on Coding Style • Having ‘readable code’ is just as important as having working code. Remember the code needs to be read by humans as well as computers. • When formatting and commenting your code, do your best to follow a style guide (Py. Charm will help you with this): – The ‘official’ guide: Guido van Rossum’s Python style guide: http: //legacy. python. org/dev/peps/pep-0008/ – Google’s Python style guide: https: //google. github. io/styleguide/pyguide. html

Demo: How to Create a Times Series Plot using Python • We will use

Demo: How to Create a Times Series Plot using Python • We will use the following Python packages: – For plotting: matplotlib http: //matplotlib. org – For connecting to your My. SQL database: Py. My. SQL https: //github. com/Py. My. SQL • I encourage you to look through the matplotlib documentation, especially the gallery of example plots with code: http: //matplotlib. org/gallery. html

Demo Steps 1. Connect to database 2. Determine the SQL statement needed to extract

Demo Steps 1. Connect to database 2. Determine the SQL statement needed to extract a time series for the ODM database 3. Execute the needed SQL statement on the database 4. Restructure the data so that it can be plotted 5. Plot the data 6. Set properties of the plot (axis labels, title, etc. )

Connect to database • Use the Py. My. SQL package to establish a connection

Connect to database • Use the Py. My. SQL package to establish a connection to your database • This uses the connection object in Py. My. SQL import pymysql # Create a connection to database conn = pymysql. connect(host='localhost', port=3306, user='root', passwd='horsburgh', db='loganriverodm') • Note that I am using the larger version of the database, which I named loganriverodm. Your database name may differ. • Also note my super-secret root password!!

Determine the SQL statement needed to extract a time series for the ODM database

Determine the SQL statement needed to extract a time series for the ODM database • A time series is a collection of observations made over time at one site and for one variable. We also need to include in the WHERE clause a statement to only include Quality. Control. Level. ID = 1 (quality controlled data) AND filter out No. Data values. # Construct the query to get the data sql_statement = 'SELECT Local. Date. Time, Data. Value ' 'FROM Data. Values ' 'WHERE Site. ID = 1 AND Variable. ID = 57 ' 'AND Quality. Control. Level. ID = 1 ' 'AND Data. Value <> -9999 ' 'ORDER BY Local. Date. Time'

Execute the needed SQL statement on the database • Create a cursor object from

Execute the needed SQL statement on the database • Create a cursor object from the connection object you created earlier # Create the cursor object from the database connection cursor = conn. cursor() • Use the cursor object’s execute method to execute the SQL statement # Execute the SQL query cursor. execute(sql_statement) • Use the cursor object’s fetchall method to get all resulting rows at once # Get the resulting rows = cursor. fetchall()

Restructure the data for plotting • Use Python’s built-in zip function to transform the

Restructure the data for plotting • Use Python’s built-in zip function to transform the result from the query into two lists, one storing the values from the Local. Date. Time field and the second storing the values from the Data. Value field # Use the zip function to transform the # query result into two lists # one storing the data values, and one # storing their date/time values local. Date. Times, data. Values = zip(*rows) • An explanation of the * before rows is here.

Getting ready to plot • Set the font size and style before you plot

Getting ready to plot • Set the font size and style before you plot • NOTE: this has to come before the creation of the plot # Set the default font size for the plot font = {'family': 'normal', 'weight': 'normal', 'size': 12} rc('font', **font)

Create a plot • Import matplotlib’s pyplot (MATLAB-like) plotting framework import matplotlib. pyplot as

Create a plot • Import matplotlib’s pyplot (MATLAB-like) plotting framework import matplotlib. pyplot as plt • Create a plot of the local. Date. Times and data. Values lists. Have a solid grey line with no markers # Create a plot of the Local. Date. Time and # Data. Value lists. Make it a solid grey # line with no markers. plt. plot(local. Date. Times, data. Values, color='grey', linestyle='solid', markersize=0) • Show the plot plt. show()

If you execute what we have so far, you should see the following Good

If you execute what we have so far, you should see the following Good start, but we need to clean some things up (axis labels and axis tick labels, for example)

Set properties of the plot (axis labels, grid, etc. ) • Set the ylabel

Set properties of the plot (axis labels, grid, etc. ) • Set the ylabel and xlabel properties • Do this before plt. show() # Get the current axis of the plot and # set the x and y-axis labels ax = plt. gca() ax. set_ylabel('Temperature ($^circ$C)') ax. set_xlabel('Date/Time') ax. grid(True)

Add a title • Set the title # Set the title ax. set_title('Water temperature

Add a title • Set the title # Set the title ax. set_title('Water temperature in the Logan ' 'River n at the Utah Water Research ' 'Laboratory')

Print figure to image file • You can save the figure directly to an

Print figure to image file • You can save the figure directly to an image file for inclusion in a presentation, report, paper, etc. • Again – do this before plt. show() # Save the plot to a file plt. savefig('Example_Plot. png')

Final Product

Final Product

Challenge Question 1 Can you modify the code so that a user can set

Challenge Question 1 Can you modify the code so that a user can set site. ID and variable. ID as variables at the beginning of the script and the script produces a plot for the provided site. ID and variable. ID? Share your code: https: //codeshare. io/G 6 m 6 yo

Challenge Question 2 Can you modify the code so that the user can set

Challenge Question 2 Can you modify the code so that the user can set a Start. Date. Time and End. Date. Time at the beginning of the script, and the script produces a plot for just that time range? Share your code: https: //codeshare. io/G 6 m 6 yo

Summary • You learned how to create a time series plot that reads data

Summary • You learned how to create a time series plot that reads data directly from an ODM My. SQL database • If you completed the challenge questions, you were able to generalize the script to work for any Site. ID, Variable. ID, Start. Date. Time, and End. Date. Time • Next class we will talk about – Creating a figure with multiple subplots – Organizing your code into functions and/or classes – Using the Pandas library for data analysis in Python

Resources for Learning Python • Google’s Python Class – https: //developers. google. com/edu/python/ •

Resources for Learning Python • Google’s Python Class – https: //developers. google. com/edu/python/ • MIT’s Python Course – http: //ocw. mit. edu/courses/electrical-engineering -and-computer-science/6 -00 -introduction-tocomputer-science-and-programming-fall-2008/ • Many other books, sites, course, etc. online

Credits • Many of these slides and ideas are from Jon Goodall at the

Credits • Many of these slides and ideas are from Jon Goodall at the University of Virginia