Authoritative Geospatial Data Validation via Machine Learning Algorithms

CW3 Augustus Wright
GEOG 596B, Penn State University
Advisor: Dr. Gregory Thomas
Image Source: (ExtremeTech, 2016)
As of 7 May 2017

Community Feedback
• Please specify your main reason for attending this presentation:
• Which aspect were you most interested in?
• What is the most beneficial aspect of the project?
• What is the least beneficial aspect of the project?
• What impact do you think the project will have on Army Geospatial, if any?
• List any questions or concerns you have that were not answered or addressed:
Email: augustus.wright.mil@mail.mil
MGIS

Presentation Objectives
• Restate the problem
• Research questions & approach
• Share results
• Analysis methods
• Video demo
• Impact | Meaning
Image Source: (WACOM Digitizing, 2016)

Background: Current Workflow
1. Check out map sheet
2. Heads-up digitize
3. Edge match & topology
4. Geospatial Analysis Integrity Tool (GAIT)
5. Enterprise holdings
[Figure: current workflow diagram linking the Sheet Index, Silver dataset, Digitizer(s), Data Steward, GAIT, and the Authoritative Gold dataset]

Limitations
• Experience disparities
• Duplicated effort
• Subjectivity
• Compounded man-hours
Image Source: (GISCommons.org, 2016)
[Figure: workflow diagram highlighting the Silver dataset, Digitizer(s), and Data Steward steps]

Augment the Workflow
• Standardization
• Less duplicated effort
• Less subjectivity
• Fewer compounded man-hours
[Figure: Example neural network (Pintado, 2016); workflow diagram adding a Machine Learning step to the Silver dataset, Digitizer(s), and Data Steward flow]

Research Questions
• How can ML best be implemented in a GIS extraction environment?
• Which ML algorithm(s) best support the desired result?
• Are the results repeatable on an adjacent sheet?
Scope management: tree canopies and open water ONLY
[Figure: Example neural network (Nielsen, 2016)]

Anticipated Results
Training results:
• 75-80% classification accuracy, producing a new NumPy array per feature
• Potential overfitting (mitigated by trial and error)
• Pass the binary NumPy array back into Esri as a new feature class
• Convert the array with arcpy.da.NumPyArrayToFeatureClass
• Keep cells valued 1; delete NoData (0) cells
Deliverables:
• Python script (tool)
• Potentially an Esri C# add-in
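The last few bullets can be sketched without ArcGIS. arcpy.da.NumPyArrayToFeatureClass takes a structured array with coordinate fields rather than a 2-D grid. One possible bridge works under two assumptions: the prediction grid came from arcpy.RasterToNumPyArray (row 0 is the top of the raster), and its lower-left corner and cell size are known. It keeps the cells equal to 1, drops the 0/NoData cells, and emits one cell-centre point per kept cell. The helper name mask_to_point_records and the field names X and Y are hypothetical, and this is not necessarily how the project performed the conversion.

```python
import numpy as np


def mask_to_point_records(mask, x_min, y_min, cell_size):
    """Convert a binary class grid into cell-centre point records.

    Hypothetical helper. Assumes `mask` is laid out like the output of
    arcpy.RasterToNumPyArray (row 0 = top row) and that (x_min, y_min)
    is the raster's lower-left corner in map units.
    """
    mask = np.asarray(mask)
    n_rows = mask.shape[0]
    # Keep 1; cells holding 0 (or any NoData value) are dropped.
    rows, cols = np.nonzero(mask == 1)
    records = np.empty(rows.size, dtype=[("X", "f8"), ("Y", "f8")])
    records["X"] = x_min + (cols + 0.5) * cell_size
    # Flip the row index because row 0 is the northernmost row.
    records["Y"] = y_min + (n_rows - rows - 0.5) * cell_size
    return records


if __name__ == "__main__":
    grid = np.array([[0, 1],
                     [1, 0]])
    pts = mask_to_point_records(grid, x_min=0.0, y_min=0.0, cell_size=10.0)
    print(pts)
    # Inside ArcGIS the records could then be written out with, e.g.:
    # arcpy.da.NumPyArrayToFeatureClass(pts, out_fc, ("X", "Y"), spatial_ref)
```

Points are the simplest output of that function. If polygons are wanted instead, arcpy.NumPyArrayToRaster followed by the Raster To Polygon tool is a common alternative route.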

Results (a sheet classified by machine vs. a sheet digitized by a human)
[Figure panels: Machine Tree Predictions; Machine Water Predictions; legend: Correct, Error]
• Accuracy: 97% (trees), 98% (water)
• Training samples: 750 per model
• Test samples: 250 per model
Key: River Perennial, Scattered Trees, MGCP TreesA, Watrcrs (ESRI, 2017)
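Reading the sample counts as 750 training and 250 test samples per model, they form a 75/25 hold-out split (250 / 1,000 = 25% held out). The accuracy figure is then the share of held-out samples whose predicted class matches the human-digitized reference. A minimal NumPy sketch of both calculations follows. The function names are hypothetical and the toy arrays are illustrative, not the project's data.

```python
import numpy as np


def holdout_split(n_samples, test_fraction=0.25, seed=0):
    """Shuffle sample indices and split them into train / test index sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]


def overall_accuracy(predicted, reference):
    """Fraction of samples where the predicted class equals the reference class."""
    predicted = np.asarray(predicted)
    reference = np.asarray(reference)
    return float(np.mean(predicted == reference))


train_idx, test_idx = holdout_split(1000)
print(len(train_idx), len(test_idx))                 # 750 250
print(overall_accuracy([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.75
```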

Approach: Automated Extraction (AE)
IMQAV: Ingest, Model, Query, Analyze, Visualize
[Figure: IMQAV cycle around AE, drawing on remote sensing, data science, pre-processing, and computer science]
Acronym Source: (C. Kelly, 2016)

Approach: Automated Extraction (AE)
Implementing ML in a GIS extraction environment
Available ML libraries: MEGA and several others...
• Installed TensorFlow (with Keras and Theano) on Python 3.5.2, 64-bit
• Installed Anaconda
• Ran Esri tools headless (without the GUI) through Python 2.7.10 / IDLE
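TensorFlow ran in a 64-bit Python 3.5.2 (Anaconda) environment while the Esri tools ran under Python 2.7.10, so the two sides need some way to exchange arrays. One common pattern works like this: the ArcGIS-side script saves its inputs as .npy files, launches the other interpreter with subprocess, and loads the result that interpreter writes back. This is an assumption, not the project's documented method. The worker script, the file names, and run_in_other_python are hypothetical. The worker only thresholds values, where a real one would load a trained model.

```python
import os
import shutil
import subprocess
import sys
import tempfile

import numpy as np

# Hypothetical worker script for the 64-bit Python 3 (Anaconda) side.
# It only thresholds the input here; a real worker would load a trained model.
WORKER_SOURCE = """
import sys
import numpy as np

features = np.load(sys.argv[1])
np.save(sys.argv[2], (features > 0.5).astype(np.uint8))
"""


def run_in_other_python(features, python_exe):
    """Send an array to another interpreter via .npy files and load its result.

    Uses only calls that also exist in Python 2.7, the ArcGIS-side interpreter.
    """
    work_dir = tempfile.mkdtemp()
    try:
        script = os.path.join(work_dir, "worker.py")
        src = os.path.join(work_dir, "features.npy")
        dst = os.path.join(work_dir, "prediction.npy")
        with open(script, "w") as handle:
            handle.write(WORKER_SOURCE)
        np.save(src, np.asarray(features))
        subprocess.check_call([python_exe, script, src, dst])
        return np.load(dst)  # read fully into memory before cleanup
    finally:
        shutil.rmtree(work_dir)


if __name__ == "__main__":
    # sys.executable stands in for the path to the Anaconda python.exe.
    print(run_in_other_python(np.array([0.2, 0.7, 0.9]), sys.executable))
```

In practice python_exe would be the path to the Anaconda interpreter. sys.executable is used in the demo so the sketch runs anywhere.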

Approaches 1 and 2: Artificial Neural Network (ANN) and Convolutional Neural Network (CNN)
• Not enough training data
• Not enough storage
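A back-of-the-envelope calculation shows why patch-based CNN training can exhaust storage. If every labelled pixel becomes the centre of its own patch, each patch stores patch_size² × bands values. The training set is then roughly patch_size² times larger than the source imagery. The sketch below assumes uncompressed float32 values (4 bytes) and one patch per pixel. It uses illustrative sizes: eight input layers, matching the count listed under Data Prep, and a hypothetical 1,000 × 1,000 pixel area rather than the project's actual sheet dimensions.

```python
def patch_dataset_bytes(n_patches, patch_size, n_bands, bytes_per_value=4):
    """Approximate size of a training set of square image patches.

    Assumes one patch per labelled pixel, stored uncompressed
    (4 bytes per value corresponds to float32).
    """
    return n_patches * patch_size * patch_size * n_bands * bytes_per_value


# Illustrative only: a 1,000 x 1,000 pixel area, 32 x 32 patches, 8 input layers.
source_bytes = 1000 * 1000 * 8 * 4
patch_bytes = patch_dataset_bytes(1000 * 1000, 32, 8)
print(source_bytes / 1e9, "GB of imagery ->", patch_bytes / 1e9, "GB of patches")
```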

Data Prep
Layers: NIR mean segmentation, Red mean segmentation, Green mean segmentation, NDVI, NDWI, perceived luminance, DEM, nDSM, binary water, binary trees
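Several of the layers above follow standard band arithmetic:

• NDVI = (NIR - Red) / (NIR + Red)
• NDWI (Green vs. NIR form) = (Green - NIR) / (Green + NIR)
• nDSM = DSM - DEM (height above ground)

A "mean segmentation" layer replaces each pixel with the mean of its segment. The NumPy sketch below assumes the bands are on one grid and that an integer segment-ID raster exists. The function names, the choice of NDWI variant, and the DSM input are assumptions. Perceived luminance is left out because the slides do not give its formula.

```python
import numpy as np


def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0 where a + b == 0."""
    a = np.asarray(a, dtype="f8")
    b = np.asarray(b, dtype="f8")
    total = a + b
    out = np.zeros_like(total)
    np.divide(a - b, total, out=out, where=total != 0)
    return out


def ndvi(nir, red):
    return normalized_difference(nir, red)


def ndwi(green, nir):
    # Green-vs-NIR (McFeeters-style) NDWI; assumed, the slides do not say which.
    return normalized_difference(green, nir)


def ndsm(dsm, dem):
    """Normalized DSM: surface height above the bare-earth DEM."""
    return np.asarray(dsm, dtype="f8") - np.asarray(dem, dtype="f8")


def segment_mean(band, labels):
    """Replace every pixel with the mean value of its segment.

    `labels` is a hypothetical non-negative integer segment-ID raster
    on the same grid as `band`.
    """
    band = np.asarray(band, dtype="f8")
    labels = np.asarray(labels)
    sums = np.bincount(labels.ravel(), weights=band.ravel())
    counts = np.bincount(labels.ravel())
    means = sums / np.maximum(counts, 1)
    return means[labels]
```

For example, ndvi([[3, 1]], [[1, 1]]) gives 0.5 for the first pixel and 0.0 for the second.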

Tools
NumPy array: arcpy.RasterToNumPyArray
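For a single-band raster, arcpy.RasterToNumPyArray returns a 2-D array, and its nodata_to_value argument can map NoData cells to a chosen value. A per-pixel classifier needs those grids flattened into an (n_pixels, n_features) matrix, and its predictions reshaped back onto the grid before the return trip to Esri. The sketch below assumes every layer shares one grid and that NoData was mapped to -9999. The sentinel and the helper names are hypothetical.

```python
import numpy as np

NODATA = -9999.0  # hypothetical sentinel passed as nodata_to_value


def grids_to_samples(layers, nodata=NODATA):
    """Stack same-shaped 2-D layers into a (n_valid_pixels, n_layers) matrix.

    Pixels where any layer equals `nodata` are excluded; the boolean
    `valid` mask is returned so predictions can be put back on the grid.
    """
    cube = np.stack([np.asarray(layer, dtype="f8") for layer in layers], axis=-1)
    valid = np.all(cube != nodata, axis=-1)
    return cube[valid], valid


def samples_to_grid(predictions, valid, fill=0):
    """Scatter per-pixel predictions back into a 2-D grid (fill elsewhere)."""
    grid = np.full(valid.shape, fill, dtype=np.asarray(predictions).dtype)
    grid[valid] = predictions
    return grid
```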

Approach 3: Sklearn (scikit-learn) Ensemble
• Success
Image Source: (Nando de Freitas, 2013)
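The slides do not say which scikit-learn ensemble was used. As an assumption, the sketch below uses RandomForestClassifier on per-pixel feature rows. It trains on one synthetic "sheet" and scores on a second synthetic sheet, mirroring the research question about repeatability on an adjacent sheet. The synthetic data is invented for illustration and says nothing about the project's accuracy: a tree label derived from made-up NDVI and nDSM values.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def synthetic_sheet(n_pixels, seed):
    """Hypothetical per-pixel features (NDVI, nDSM) and a binary tree label."""
    rng = np.random.default_rng(seed)
    ndvi = rng.uniform(-0.2, 0.9, n_pixels)
    ndsm = rng.uniform(0.0, 25.0, n_pixels)
    # Toy rule: vegetated and taller than 3 m counts as tree canopy.
    label = ((ndvi > 0.4) & (ndsm > 3.0)).astype(int)
    features = np.column_stack([ndvi, ndsm])
    return features, label


X_train, y_train = synthetic_sheet(1000, seed=1)  # "sheet A"
X_adj, y_adj = synthetic_sheet(1000, seed=2)      # adjacent "sheet B"

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
adjacent_accuracy = accuracy_score(y_adj, model.predict(X_adj))
print("accuracy on adjacent sheet:", round(adjacent_accuracy, 3))
```

Scoring on a second sheet rather than a random hold-out from the same sheet is the stricter test. Spatially adjacent pixels within one sheet tend to be correlated, which can flatter a same-sheet score.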

Questions?
Contact me at: augustus.wright.mil@mail.mil or LinkedIn: https://www.linkedin.com/pub/augustus-wright/92/583/a6b

Summary & Way Ahead
• Automated extraction at approximately 97% accuracy
• Final report
• Condense code
• Other applicable classification problems
Image Sources: (ENGR Wikii, 2015); (Military.com, 2010)

References (Works Cited)
ACM DL. (2011). Proceedings of the 2nd International Conference on Computing for Geospatial Research & Applications. Washington, DC: ACM DL.
Brown, D. G., et al. (2000). Modeling the relationships between land use and land cover on private lands in the Upper Midwest, USA. Journal of Environmental Management. Academic Press.
Chawla, N. V., et al. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research. Tampa, FL.
Data Mining, Analytics, Big Data, and Data Science. (2016, November). USA.
H2O.ai. (2016, November 14). H2O, Sparkling Water, and Steam Documentation. USA.
Harris Visualization ENVI. (2016). ENVI Analytics Symposium Proceedings on MEGA. Boulder, CO: Harris Corporation.
Kanevski, M., et al. (2008). Machine Learning Algorithms for GeoSpatial Data. Applications and Software Tools. Institute of Geomatics and Analysis of Risk (IGAR), Faculty of Geosciences and Environment, University of Lausanne, Switzerland: University of Lausanne.
Kubat, M., et al. (1998). Machine Learning for the Detection of Oil Spills in Satellite Radar Images. School of Information Technology and Engineering, University of Ottawa. Boston: Kluwer Academic Publishers.
Nielsen, M. (2016, January 1). Neural Networks and Deep Learning. Determination Press. USA.
Ostermann, F. O. (2015). Hybrid geo-information processing: Crowdsourced supervision of geo-spatial machine learning tasks. Enschede, The Netherlands: University of Twente.
Pintado, J. H. (2016, October 1). Errors Are Imminent. Computer Science, Programming, Maths and Big Data. Somewhere, Over the Rainbow, USA.
Poulson, B. (2015, January 1). Introduction to Data Science. State College, PA, USA.
Programmer, L. (2016). Deep Learning Fundamentals in Python. Lazy Programmer.
Provost, F. (2001). Robust Classification for Imprecise Environments. The Netherlands: Kluwer Academic Publishers.
USGS. (2014, January). Using Anaconda modules from the ESRI python environment (All Users). Rolla, MO, USA.