Secrets of Unidata Software Engineers Russ Rew UCAR

Secrets of Unidata Software Engineers Russ Rew UCAR Software Engineering Assembly April 26, 2006

Unidata in a Nutshell
Mission: To provide data, tools and community leadership for enhanced Earth-system education and research
The Unidata Program Center:
• Facilitates (real-time) data access
• Provides and supports data access, analysis, and visualization tools and services
• Builds and advocates for a community of geoscience educators and researchers
UPC size: 12 developers, 12 other staff

Unidata Developers
• Tom Baltzer
• John Caron (.75)
• Steve Chiswell
• Ethan Davis
• Steve Emmerson
• Ed Hartnett (.25)
• Yuan Ho
• Robb Kambic
• Jeff McWhirter
• Don Murray (.75)
• Jen Oxelson
• Russ Rew (.25)
• Anne Wilson
• Tom Yoksas (.75)

Overview: The Mystery
• Premise: Unidata has been very successful in its software development
• Premise: Unidata’s software engineering process appears haphazard and chaotic
• Mystery: Why is Unidata’s software successful and popular when it makes little use of recognized development methodologies?
• Speculations, theories, and revelations

Some Software Successes
• Integrated Data Viewer (IDV)
• Local Data Manager (LDM)
• netCDF, netCDF Java (nj22)
• THREDDS and THREDDS Data Server (TDS)
• Units library (udunits)

IDV
• Unidata’s newest scientific analysis and visualization tool
• Freely available 100% Java framework and reference application
• Provides 2- and 3-D displays of geoscience data
• Stand-alone or networked application
• Integrates data from disparate sources
• End-to-end test for Unidata

IDV’s Success
• In use at over 80 Unidata sites and use growing rapidly
• Selected as the visualization tool for the Operations Center in TREX
• Bill Hibbard, developer of Vis5D and VisAD, calls the IDV “far better than any other environmental visualization system”

LDM
• Peer-to-peer system for reliable, event-driven data distribution using LDM-6 software
• Supports subscriptions to near real-time data feeds
• LDM protocols use persistent TCP connections, suitable for pushing a large number of small products, as well as large products
• Highly configurable: can inject, distribute, capture, filter, and process arbitrary data products
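The subscribe-and-push idea on this slide can be illustrated with a toy, in-memory sketch. It is not the LDM API or its protocol: the `ProductQueue` class, the feed names, and the product identifiers are all hypothetical, and the filter is assumed to be a regular expression matched against a product identifier.

```python
import re
from typing import Callable

# Hypothetical names throughout; this is not the LDM API or wire protocol.
Handler = Callable[[str, bytes], None]


class ProductQueue:
    """Toy in-memory stand-in for an event-driven product distributor."""

    def __init__(self) -> None:
        self._subs = []  # list of (feed, compiled regex, handler)

    def subscribe(self, feed: str, pattern: str, handler: Handler) -> None:
        # A subscriber asks for one feed, narrowed by a regex on the product ID.
        self._subs.append((feed, re.compile(pattern), handler))

    def inject(self, feed: str, product_id: str, data: bytes) -> int:
        # Push the product to every matching subscriber as soon as it arrives.
        delivered = 0
        for sub_feed, regex, handler in self._subs:
            if sub_feed == feed and regex.search(product_id):
                handler(product_id, data)
                delivered += 1
        return delivered


received = []
queue = ProductQueue()
queue.subscribe("RADAR", r"^KFTG", lambda pid, data: received.append(pid))
queue.inject("RADAR", "KFTG_20060426_1200", b"...")  # matches feed and pattern
queue.inject("RADAR", "KDEN_20060426_1200", b"...")  # right feed, wrong pattern
queue.inject("TEXT", "KFTG_bulletin", b"...")        # wrong feed
```

In this sketch the producer never polls and never knows who is listening; delivery happens as a side effect of `inject`, which is the sense in which the distribution is event-driven.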

LDM’s Success
• Unidata’s Internet Data Distribution system: near real-time data for 175 universities and research organizations
• 30 data feeds (radar, satellite, text bulletins, lightning, model forecasts, surface obs, upper air obs, ...)
• Also used by USGS, NASA, ESRL, weather services in Spain and Korea, active projects on 6 continents
• Data volume: 2.5 GB/hr, 120000 products/hr; ranks fifth in weekly Internet2 traffic (Iperf, HTTP, NNTP, SSH, LDM, ...)

More LDM Successes
• NOAA/NWS adopted for Level II radar distribution
• From 134 radars to 125 weather forecast offices, 22 universities, 10 federal organizations, 12 commercial organizations
• Will be used in THORPEX Interactive Grand Global Ensemble (TIGGE)
• Model output collection from 10 global modeling centers
• Collected at 3 archive centers (NCAR, ECMWF, Beijing)
• Test from ECMWF to NCAR sustained 17 GB/hr
• Candidate to replace WMO’s Global Telecommunications System (GTS)

NetCDF’s Niche
• Simple data model for scientific datasets
• Portable, self-describing data
• Supports direct access (unlike XML)
• Many language interfaces: C, Fortran, C++, Java, Python, Perl, Ruby, ...
• Lots of applications
• Efficient subsetting of multidimensional arrays
• Supports appending, sharing, archiving data
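Why direct access makes subsetting cheap can be shown with a small sketch. It assumes a fixed-size array stored contiguously in row-major order and uses only the Python standard library; the function names are hypothetical and this is not the netCDF API or a reader for its file format.

```python
import struct
from itertools import product


def element_offset(index, shape, item_size, data_start=0):
    """Byte offset of one element in a row-major array stored contiguously."""
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(index)
        offset = offset * n + i
    return data_start + offset * item_size


def read_subset(buf, shape, ranges, fmt="<f"):
    """Read only the requested hyperslab by jumping straight to each element."""
    size = struct.calcsize(fmt)
    return [
        struct.unpack_from(fmt, buf, element_offset(idx, shape, size))[0]
        for idx in product(*ranges)
    ]


# A 2 x 3 x 4 array of 32-bit floats holding 0.0 .. 23.0 (illustrative data).
shape = (2, 3, 4)
buf = struct.pack("<24f", *[float(v) for v in range(24)])

# Subset: second slab, first two rows, last two columns.
corner = read_subset(buf, shape, (range(1, 2), range(0, 2), range(2, 4)))
```

Because every offset is computed from the shape alone, nothing before the requested elements has to be parsed, which is the contrast with a text format such as XML that the slide draws.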

NetCDF-Java (nj22)
• 100% Java library, more advanced than C-based interfaces
• Prototype implementation of Common Data Model for access to netCDF-4, OPeNDAP, HDF5
• Provides netCDF interfaces to other formats: Grids (GRIB1, GRIB2), Radar (NEXRAD, NIDS, DORADE), Satellite (DMSP, GINI), Point Observations (BUFR (soon))
• Provides uniform coordinate systems layer
• Access to THREDDS catalogs
• Implements access through NcML

Common Data Model
[Layer diagram, top to bottom]
• Applications
• Scientific Datatypes: Point, Trajectory, Radial, Grid, Station, Swath
• Coordinate Systems
• Common Data Access Model
• THREDDS, OPeNDAP, HDF5, GRIB, netCDF, ...

Success of netCDF
• Basis for CF Conventions for climate and forecast data
• Used at LLNL/PCMDI for archiving model output for the upcoming IPCC Fourth Assessment Report: 23 models, 30 TBytes, 70000 files
• Used in various archives maintained by NOAA, NASA, USGS, DoE, NCAR, BADC, CSIRO, ...
• C and Fortran netCDF Users Guides have been translated into Japanese at Kyoto University
• Other uses in chromatography, mass spectrometry, neuroimaging, biomolecule trajectory simulations, ...
• Used in 15 commercial packages and over 50 open source packages for analysis, visualization, and data management

THREDDS
• Originally funded under NSF Digital Libraries initiative
• “Discovery and use of scientific data”
• Middleware between data providers and users
• Dataset Inventory Catalogs (XML)
• Now part of Unidata Data Collections effort
• Data Serving (pull)
• THREDDS Data Server (TDS) most recent development
• A THREDDS catalog provides a hierarchical structure for factoring inherited metadata
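The last point, factoring metadata out into a hierarchy so that nested datasets inherit it, can be sketched as follows. The XML here is a deliberately simplified, hypothetical catalog (metadata as plain attributes, every attribute inherited), not the real THREDDS catalog schema, and `effective_metadata` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical catalog; real THREDDS catalogs use a richer schema.
CATALOG = """
<catalog>
  <dataset name="Models" dataType="Grid">
    <dataset name="NAM" format="GRIB-1">
      <dataset name="NAM 2006-04-26 12Z" path="nam/20060426_1200.grib"/>
    </dataset>
    <dataset name="Station obs" dataType="Station"/>
  </dataset>
</catalog>
"""


def effective_metadata(elem, inherited=None):
    """Yield (name, metadata) per dataset; a child's values override ancestors'."""
    merged = dict(inherited or {})
    merged.update({k: v for k, v in elem.attrib.items() if k != "name"})
    if elem.tag == "dataset":
        yield elem.get("name"), merged
    for child in elem:
        yield from effective_metadata(child, merged)


resolved = dict(effective_metadata(ET.fromstring(CATALOG)))
```

In this toy, the leaf dataset states only its own path, yet resolves to a full description because `dataType` and `format` are stated once on its ancestors; the `Station obs` entry shows a child overriding an inherited value.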

TDS (THREDDS Data Server)
• Integrates data access with THREDDS catalogs and services
• Tomcat/Servlet, 100% Java, single war file
• Data input is netCDF Java 2.2 library
• Data output:
  • OPeNDAP (for accessing subsets)
  • HTTP Server (for bulk file transfer)
  • OGC Web Coverage Server (currently gridded only, subsetting supported)
• Supports dynamic generation of catalogs

Success of THREDDS
• Used in NCAR Community Data Portal, many other data archives
• TDS in use for serving IDD data from motherlode.ucar.edu, other data providers
• From “Lessons Learned: Evaluation Studies Related to Geoscience Data in THREDDS and DLESE”, Susan Lynds et al:
  • “Data providers agreed that THREDDS has made data access much easier than it used to be and enables them to reach new user communities.”

udunits
• Library for manipulating units of physical quantities
• Conversion of unit specifications between formatted and binary forms
• Arithmetic manipulation of unit specifications
• Conversion of values between compatible scales of measurement
• C, Fortran, and Java interfaces
• Required by CF conventions
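Converting values between compatible scales of measurement can be illustrated with a minimal sketch. It is not the udunits API or its units database: the `Unit` class, the four-entry registry, and `convert` are hypothetical, and each unit is assumed to relate to a base unit by a linear map (scale and offset).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A unit defined by: value_in_base = scale * value + offset."""
    dimension: str      # only units of the same dimension are compatible
    scale: float
    offset: float = 0.0


# Hypothetical mini-registry, not the udunits database.
UNITS = {
    "K": Unit("temperature", 1.0),
    "degC": Unit("temperature", 1.0, 273.15),
    "m": Unit("length", 1.0),
    "km": Unit("length", 1000.0),
}


def convert(value: float, src: str, dst: str) -> float:
    """Convert via the base unit, refusing incompatible dimensions."""
    a, b = UNITS[src], UNITS[dst]
    if a.dimension != b.dimension:
        raise ValueError(f"cannot convert {src} to {dst}")
    base = a.scale * value + a.offset
    return (base - b.offset) / b.scale
```

For example, `convert(2.5, "km", "m")` goes through the base unit of length, while `convert(1.0, "km", "K")` raises `ValueError` because the two units measure different things.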

udunits Success
• Almost as widely used as netCDF

The Unidata Development Process
Unidata’s software engineering process appears haphazard and chaotic.
• No uniform software engineering process
• No regular code reviews
• Specifications for software often missing or vague
• No enforcement of coding standards
• No measurement of programmer productivity
• No effort underway to improve software engineering methodology

What Accounts for Unidata’s Successes?
... and can other organizations benefit from the answers?
• Magic fairy dust?
• Advanced processes?
• Signing bonuses?
• Working conditions?
• Luck?

I’ll Offer Some Theories
• The identified factors are subjective
• Based on almost twenty years’ involvement in Unidata
• Discussion question: are any of these easily transferable?
• Discussion question: would we have had even better software success with application of disciplined development methodologies?

Involve Developers in Software Support
• Superior support for users of legacy applications: GEMPAK, McIDAS
• Support for software developed elsewhere: OPeNDAP, VisAD
• Every developer expected to answer user questions

GEMPAK
• Application for analysis and visualization
• In use at over 200 sites, use still growing
• Developer specialized expert in package, not process: maintaining, upgrading, testing, distributing, supporting, teaching user workshop, supporting user community, supporting data types

McIDAS
• In use at approximately 100 sites, a growing number outside the U.S.
• Developer specialized expert in package, not process: maintaining, upgrading, testing, distributing, supporting, teaching user workshop, supporting user community, supporting new data types

Unidata User Support
• Over 30 responses to user questions/day
• Searchable support archives help
• Support for legacy apps still significant
• Balance between visualization apps, data middleware
• Keeps developers close to users

Leverage User Efforts
• NetCDF users have contributed language interfaces, applications, good ideas, and bug reports: www.unidata.ucar.edu/software/netcdf/credits.html
Bob Albrecht, Ethan Alpert, Chris Anderson, Ayal Anis, Harald Anlauf, Phil Austin, Eric Bachalo, Jason Bacon, Sandy Ballard, Matthew Banta, Mike Berkley, Sherman Beus, Lorenzo Bigagli, Mark Borges, Nicola Botta, Dr. Kenneth P. Bowman, Bill Boyd, Mark Bradford, Bernward Bretthauer, Dr. Paul A. Bristow, Roy Britten, Glenn Carver, Tom Cavin, Morrell Chance, Susan C. Cherniss, Jason E. Christy, Gerardo Cisneros, Alain Coat, Carlie J. Coats, Jr., Jon Corbet, Alexandru Corlan, Jim Cowie, Arlindo da Silva, Rick Danielson, Alan Dawes, Donald W. Denbo, Charles R. Denham, Arnaud Desitter, Steve Diggs, Michael Dixon, Alastair Doherty, Bob Drach, Patrice Dumas, Frank Dzaak, Brian Eaton, Harry Edmon, Lee Elson, Ata Etemadi, Constantinos Evangelinos, John Evans, Joe Fahle, Gabor Fichtinger, Glenn Flierl, Connor J. Flynn, Anne Fouilloux, Jean-Francois Foccroulle, Mike Folk, David Forrest, David W. Forslund, Ben Foster, Masaki Fukuda, Dave Fulker, James Gallagher, Bear Giles, Tom Glaess, Peter Gleckler, André Gosselin, Gary Granger, Jonathan Gregory, Patrick Guio, Mark Hadfield, Magnus Hagdorn, Paul Hamer, Steve Hankin, Bill Hart, Kate Hedstrom, Charles Hemphill, Olaf Heudecker, Donn Hines, Konrad Hinsen, Leigh Holcombe, Tim Holt, Toshinobu Hondo, Takeshi Horinouchi, Chris Houck, Matt Huddleston, Matt Hughes, Doug Hunt, Alan Imerito, Jouk Jansen, Harry Jenter, Susan Jesuroga, Patrick Jöckel, Tomas Johannesson, Peter Gylling Jørgensen, Narita Kazumi, John Kemp, Jeff Kuehn, V. Lakshmanan, Bruce Langdon, Stephen Leak, Tom LeFebvre, Angel Li, Jianwei Li, Rick Light, Brian Lincoln, Keith Lindsay, Fei Liu, Jeffery W. Long, Dave Lucas, Valerio Luccio, Lifeng Luo, Steve Luzmoor, Lawrence Lyjak, Rich Lysakowski, Sergey Malyshev, Len Makin, Jim Mansbridge, Andreas Manschke, Chris Marquardt, Marinna Martini, William C. Mattison, Craig Mattocks, Mike McCarrick, Bill McKie, Ron Melton, Roy Mendelssohn, Pavel Michna, Barb Mihalas, Henry LeRoy Miller Jr., Philip Miller, Rakesh Mithal, Masahiro Miiyaki, Christine C. Molling, Skip Montanaro, Thomas L. Moore, Stefano Nativi, Gottfried Necker, Peter Neelin, Michael Nolta, Bill Noon, Enda O'Brien, Dave Osburn, Dan Packman, Simon Paech, Gabor Papp, Morten Pedersen, Dr. Louise Perkins, Michael D Perryman, Hartmut Peters, Ron Pfaff, David Pierce, Alexander Pletzer, Philippe Poilbarbe, Dierk Polzin, Jacob Weismann Poulsen, Ken Prada, Dave Raymond, Michael Redetzky, Rene Redler, Mark Reyes, Doug Reynolds, Mike Rilee, Mark Rivers, Randolph Roesler, Mike Romberg, Mathis Rosenhauer, Suzanne T. Rupert, Toshihiro Sakakima, Eric Salathe, Matthew H. Savoie, Marie Schall, Larry A. Schoof, Dan Schmitt, Robert B. Schmunk, Rich Schramm, William J. Schroeder, Uwe Schulzweida, Keith Searight, Guntram Seiss, Remko Scharroo, John Sheldon, Masato Shiotani, Michael Shopsin, Richard P. Signell, Steve Simpson, Joe Sirott, Greg Sjaardema, Dirk Slawinski, Cathy Smith, Neil R. Smith, Peter Paul Smolka, Nancy Soreide, Hudson Souza, Gunter Spranz, Richard Stallman, Bob Swanson, John Tanski, Karl Taylor, Jason Thaxter, Kevin W. Thomas, Philippe Tulkens, Tom Umeda, Joe VanAndel, Paul van Delst, Gerald van der Grijn, Richard van Hees, János Végh, Bernhard Wagner, Thomas Wainwright, Stephen Walker, Chris Webster, Paul Wessel, Carsten Wieczorrek, Gerry Wiener, Ralf Wildenhues, David Wilensky, Hartmut Wilhelms, Gareth Williams, David Wojtowicz, Jeff Wong, Randy Zagar, Charlie Zender, Remik Ziemlinski.

Strive for Discipline Independence
• Demand is greater than supply for useful data-oriented infrastructure for science
• Examples: netCDF, LDM, THREDDS, udunits, Common Data Model, ...

Emphasize Loose Coupling
• Data providers and data consumers should be uncoupled
• Data storage should be uncoupled from visualization and analysis applications
• Data distribution should be independent of type of data
• ...
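One way to read "data storage should be uncoupled from analysis" is that analysis code depends only on a narrow read interface, so storage formats can change underneath it. The sketch below is a generic illustration under that assumption; `DataSource`, `InMemorySource`, `CsvTextSource`, and `mean` are hypothetical names and do not correspond to any Unidata interface.

```python
from typing import Protocol


class DataSource(Protocol):
    """Anything that can hand back a named variable as a list of numbers."""

    def read(self, variable: str) -> list[float]: ...


class InMemorySource:
    def __init__(self, data: dict[str, list[float]]) -> None:
        self._data = data

    def read(self, variable: str) -> list[float]:
        return self._data[variable]


class CsvTextSource:
    """Parses 'name,v1,v2,...' lines; stands in for a different storage format."""

    def __init__(self, text: str) -> None:
        self._rows = {}
        for line in text.strip().splitlines():
            name, *values = line.split(",")
            self._rows[name] = [float(v) for v in values]

    def read(self, variable: str) -> list[float]:
        return self._rows[variable]


def mean(source: DataSource, variable: str) -> float:
    # The analysis depends only on read(), never on how the data is stored.
    values = source.read(variable)
    return sum(values) / len(values)
```

Here `mean` works unchanged with either source, and a third storage format could be added without touching the analysis code.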

Find Right Level for Abstractions
• Data
• Scientific Data
• Georeferenced Data
• Meteorological Data
• Radar Data

Improve Software Quality by Porting
• Platform-independence is important
• Achieving it seems to improve quality of software in unexpected ways
• Aiming for reasonable tradeoffs between portability and performance requires expertise
• Solving portability problems for others (e.g. providing portable data, service-oriented architectures) is a growth industry
• Java developers may ignore this

Work on Small Projects
• Unidata projects and software packages typically require only one or two developers
• Much of software engineering is about scaling to large projects with dozens of developers
• May be the #1 secret for success

Find and Exploit Tight Feedback Loops
• Develop for an active and interested user community
• Find specific users with problems important to them that your software can solve
• Exploit short iterations for incremental development
• Governance: establish and pay attention to an external Users Committee that meets regularly

Use the Software You Develop
• “Eat your own dogfood”
• The Unidata Integrated Data Viewer uses netCDF Java, THREDDS, NcML, netCDF decoders, VisAD, OPeNDAP, ADDE servers
• Provides end-to-end testing
• Prioritizes useful enhancements
• Leads to early bug identification by developers instead of users
• If taken too far, leads to NIH syndrome

Drive Development with Tests
• Test-driven development (TDD) and unit testing give developers confidence to
  • refactor code
  • try big changes
  • port to new platforms
• Example: netCDF “make check” runs over 150,000 tests
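As a minimal illustration of the unit-testing style the slide describes, the sketch below pairs a small function with tests that pin down its behavior, so it can later be refactored or ported with some confidence. The `subset` function and its tests are hypothetical examples written with Python's standard `unittest` module; they are not taken from the netCDF test suite.

```python
import unittest


def subset(values, start, stop, stride=1):
    """Return values[start:stop:stride], rejecting a non-positive stride."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return values[start:stop:stride]


class SubsetTest(unittest.TestCase):
    def test_contiguous_range(self):
        self.assertEqual(subset([0, 1, 2, 3, 4], 1, 4), [1, 2, 3])

    def test_strided_range(self):
        self.assertEqual(subset([0, 1, 2, 3, 4], 0, 5, 2), [0, 2, 4])

    def test_rejects_bad_stride(self):
        with self.assertRaises(ValueError):
            subset([0, 1, 2], 0, 3, 0)


def run_tests():
    """Run the suite programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SubsetTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In a test-first workflow the three test methods would be written before `subset` itself; calling `run_tests()` after every change plays the role that “make check” plays in the slide's example.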

Value People over Process
• Important tenet of the “Manifesto for Agile Software Development”, http://agilemanifesto.org/, to value:
  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

Arrange Long Funding Cycles
T. T. T.
Put up in a place
where it's easy to see
the cryptic admonishment
T. T. T.
When you feel how depressingly
slowly you climb,
it's well to remember that
Things Take Time.
--Piet Hein

Summary: The “Secrets”
1. Involve developers in support
2. Leverage user efforts
3. Strive for discipline-independent infrastructure
4. Emphasize loose coupling
5. Choose the right level for abstractions
6. Improve quality by porting

More “Secrets”
7. Work on small projects
8. Find good feedback loops
9. Use your own software
10. Drive development with tests
11. Value people over process
12. Arrange for long funding cycles