Industrial Data Modeling with DataModeler (Mark Kotanchek)

Industrial Data Modeling with DataModeler
Mark Kotanchek, Evolved Analytics
mark@evolved-analytics.com
Wolfram Tech Conf 2006, Evolved Analytics LLC

Nonlinear Data Modeling: The Bottom Line
• The world is nonlinear
• People time is expensive
• Computing is cheap
• Life doesn’t have to be hard
• Success has been demonstrated in the real world
The caveat here is that we are (mostly) looking at response surface analysis and modeling of numerical data.

Symbolic Regression
• Algorithmic advances in recent years have resulted in a greater than three-orders-of-magnitude speed improvement in symbolic regression via genetic programming relative to conventional GP
• This has been coupled with continuing improvements in compute hardware
• Furthermore, symbolic regression is naturally parallelizable
• Symbolic regression features most of the unique nonlinear capabilities
• The net result is that symbolic regression has moved into the forefront of nonlinear modeling technologies for us
Symbolic regression searches for both the expression structure and the associated coefficients that capture the data behavior.
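
The idea of searching over both structure and coefficients can be illustrated with a toy sketch. This is not DataModeler's algorithm and not genetic programming: it simply enumerates a small, hypothetical list of candidate structures of the assumed form y = a*f(x) + b, fits a and b by least squares for each, and keeps the structure with the lowest error. A GP-based system would instead evolve a population of expression trees.

```python
import math

# Hypothetical candidate structures: each is a transform f(x).
# The model form y = a * f(x) + b is an assumption made for this toy example.
STRUCTURES = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sin(x)": math.sin,
    "exp(x)": math.exp,
}

def fit_coefficients(f, xs, ys):
    """Least-squares fit of y = a*f(x) + b; returns (a, b, mean squared error)."""
    zs = [f(x) for x in xs]
    n = len(xs)
    z_mean = sum(zs) / n
    y_mean = sum(ys) / n
    szz = sum((z - z_mean) ** 2 for z in zs)
    szy = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
    a = szy / szz if szz > 0 else 0.0
    b = y_mean - a * z_mean
    mse = sum((a * z + b - y) ** 2 for z, y in zip(zs, ys)) / n
    return a, b, mse

def search(xs, ys):
    """Fit every candidate structure and return the one with the lowest error."""
    results = []
    for name, f in STRUCTURES.items():
        a, b, mse = fit_coefficients(f, xs, ys)
        results.append((mse, name, a, b))
    return min(results)

# Synthetic, noise-free data generated from y = 3*x^2 + 1.
xs = [i / 10 for i in range(-20, 21)]
ys = [3 * x * x + 1 for x in xs]
best_mse, best_name, best_a, best_b = search(xs, ys)
print(best_name, round(best_a, 3), round(best_b, 3))
```

Each candidate is fitted independently of the others, which is one reason this kind of search parallelizes naturally.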

What Makes DataModeler Special?
• Automatic variable selection & variable transform identification
• Ability to handle ill-conditioned data sets
• System insight
• Problem insight
• Robust & accurate models
• Trust metric
• Modeling lifecycle tools
Goal: To Dazzle & Delight
• Dazzle
  – Extract value out of data
  – Robustness of models
  – Provide insight & understanding in the process
• Delight
  – Ease & efficiency of model development
  – Model lifecycle management

Package Case Studies
• Distillation Column Quality Predictor
  – Large data set (skinny array: 6929 x 23 variables)
  – Multiple data sets (test, train, validate)
  – Ensemble of models
  – Potential pathologies
• Process Optimization Emulator
  – Working against designed data (320/275 x 10 + 5 responses)
  – Goal is to replace a 24 hour optimization
• Blown Film Process Effects
  – Interpreting research data
  – 20 x 9 inputs, 21 responses
  – Applications into combinatorial chemistry
• Emissions Inferential Sensor
  – Handling correlated data sets
  – Train/test: 251/107 x 8
• Balancing Service Price
  – Fat array (298 x 48)
  – Handle correlated data
  – Identify driving variables
  – Looking at extrapolation

Getting the Zen of the Data
Context-free analysis leads to confidently wrong answers

Evolving Models
• Models may be automatically archived
• For convenience, default option sets are defined
• Progress may be monitored at several levels

Pareto Front & Modeling Potential
[Figure: Pareto front plots. Panel labels: “Hard but potential”, “Exploratory Run”, “Useful???”. Annotation: “Where is the knee?”]
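
As a rough illustration of what a Pareto front of models means, the sketch below keeps only the models that are not dominated in both complexity and error (lower is better for each). The model names, complexity scores, and error values are hypothetical, and the complexity measure itself is an assumption; the slides do not say how DataModeler scores complexity.

```python
def pareto_front(models):
    """Return models not dominated in (complexity, error); lower is better for both."""
    front = []
    for name, complexity, error in models:
        # A model is dominated if another is at least as good on both
        # objectives and strictly better on at least one.
        dominated = any(
            (c <= complexity and e <= error) and (c < complexity or e < error)
            for _, c, e in models
        )
        if not dominated:
            front.append((name, complexity, error))
    return sorted(front, key=lambda m: m[1])

# Hypothetical (name, complexity, error) triples, for illustration only.
models = [
    ("m1", 3, 0.40),
    ("m2", 5, 0.22),
    ("m3", 5, 0.30),   # dominated by m2
    ("m4", 9, 0.10),
    ("m5", 12, 0.11),  # dominated by m4
    ("m6", 20, 0.09),
]
front = pareto_front(models)
for name, complexity, error in front:
    print(name, complexity, error)
```

In this made-up data, moving from m4 to m6 more than doubles the complexity for a small gain in error; that flattening of the curve is the kind of "knee" the slide asks about.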

Driving Variables
Notice the natural variable selection
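
One plausible way to surface driving variables from a set of evolved models is to count how often each input appears across them. The sketch below assumes each model is summarized by the set of variables it uses; that representation, the variable names, and the presence-fraction measure are all assumptions for illustration, not a description of DataModeler's own method.

```python
from collections import Counter

def variable_presence(models):
    """Fraction of models in which each variable appears."""
    counts = Counter(var for variables in models for var in set(variables))
    return {var: counts[var] / len(models) for var in counts}

# Hypothetical models, each reduced to the set of input variables it uses.
models = [
    {"x1", "x3"},
    {"x1", "x3", "x7"},
    {"x3"},
    {"x1", "x3"},
]
presence = variable_presence(models)
for var, fraction in sorted(presence.items(), key=lambda kv: -kv[1]):
    print(var, fraction)
```

Variables that never appear in any model simply do not show up in the result, which mirrors the point that selection falls out of the modeling rather than being a separate step.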

Selecting Models (ad hoc)

Potential Pathologies?

Model Performance
• Visually, our goal is minimal error with an even distribution of errors and no structural error problems as a function of variable value.

Trust via Ensembles
• Models with independent error structures may be “stacked”, with their consensus forming a trust metric
• Note that the models generally won’t be on the Pareto front
• Also note that for large data sets, the error residuals will be highly correlated (so we need a relaxed definition of uncorrelated)
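
A minimal sketch of the stacking idea, under stated assumptions: the ensemble prediction is taken as the median of the member predictions, and the spread (max minus min) serves as a crude trust indicator. Both choices, and the three hypothetical member models, are assumptions for illustration; the slides do not specify how the consensus or the trust metric is computed.

```python
import statistics

# Hypothetical ensemble: three models that agree closely on [0, 1]
# (the assumed training range) but differ in their higher-order terms.
ensemble = [
    lambda x: 1.0 + 2.0 * x,
    lambda x: 1.0 + 2.0 * x + 0.1 * x ** 2,
    lambda x: 1.0 + 2.0 * x - 0.05 * x ** 3,
]

def predict_with_trust(models, x):
    """Return (consensus prediction, spread); a large spread suggests low trust."""
    predictions = [model(x) for model in models]
    consensus = statistics.median(predictions)
    spread = max(predictions) - min(predictions)
    return consensus, spread

inside_pred, inside_spread = predict_with_trust(ensemble, 0.5)   # within range
outside_pred, outside_spread = predict_with_trust(ensemble, 5.0)  # extrapolating
print(inside_pred, inside_spread)
print(outside_pred, outside_spread)
```

The members agree closely at x = 0.5 and diverge sharply at x = 5.0, so the spread flags the second prediction as an extrapolation.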

On the (near!) Horizon
• Implement a ConvertModelForExcel[ ] function
• Complete documentation & release the package for sale

Symbolic Regression: Summary Benefits
There are many benefits to symbolic regression. These are enhanced when coupled with other analysis tools and techniques.
• Compact Nonlinear Models
  – Compact empirical models can be suitable for online implementation
  – Model(s) can be used as an emulator for coarse system optimization
• Driving Variable Selection & Identification
  – Identified driving variables may be used as inputs into other modeling tools
• Models from Pathological Data Sets
  – Appropriate models may be developed from poorly structured data sets (too many variables & not enough measurements)
• Metasensor (Variable Transform) Identification
  – Identifying variable couplings can give insight into underlying physical mechanisms
  – Identified metavariables can enable linearizing transforms to meld symbolic regression and more traditional statistical analysis
  – Metavariables can also be used as inputs into other modeling tools
• Diverse Model Ensembles
  – The independent evolutions will produce independent models. Independent (but comparable) models may be stacked into ensembles whose divergence in prediction may be an indicator of extrapolation & model trustworthiness. This is an issue in high-dimensional parameter spaces.
• Human Insight
  – The transparency of the evolved models as well as the explicit identification of the model complexity-accuracy trade-off is very compelling
  – Examining an expression can be viewed as a visualization technique for high-dimensional data
• Rapid Modeling
  – Exploitation of the Pareto front has resulted in an improvement of several orders of magnitude in symbolic regression performance relative to more traditional GP. This greatly increases the range of possible applications.
• Rapid Data Content Assessment
  – Examining the shape of the Pareto front allows us to quickly assess whether viable models can be developed from the available data