Machine Learning to Explore Fish Species Interaction in the Northern Gulf of St Lawrence
Dr Allan Tucker
Centre for Intelligent Data Analysis, Brunel University, West London, UK

Talk Outline
  • Introduce myself and research group
  • Introduce Machine Learning
  • Describe Bayesian network models
  • Document some preliminary results on fish population data
  • Conclusions

Who Am I?
  • Research Lecturer at Brunel University, West London
  • Member of Centre for IDA (est. 1994)

What is the Centre for IDA?
  • Over 25 members (academics, postdocs and PhDs) with diverse backgrounds (e.g. maths, statistics, computing, biology, engineering)
  • Over 140 journal publications and a dozen research council grants since 2001
  • Many collaborating partners in the UK, Europe, China and the USA
  • Bi-annual symposia in Europe

Some Previous Work in Machine Learning and Temporal Analysis
  • Oil Refinery Models
      • Forecasting
      • Explanation
  • Medical Data: Retinal (Visual Field)
      • Screening
      • Forecasting
  • Bioinformatics:
      • Gene Clusters
      • Gene Regulatory Networks

Part 1: What is Machine Learning?

What is Machine Learning? (And why not statistics?)
  • Data oriented
  • Extracting useful information from data
  • As automated as possible
  • Useful when there is lots of data and little theory
  • Making predictions about the future

What Can We Do with ML?
  • Classification and Clustering
  • Feature Selection
  • Prediction and Forecasting
  • Identifying Structure in Data

E.g. Classification
  • Given some labelled data (supervised)
  • Build a “model” to allow us to classify other unlabelled data
  • e.g. a doctor diagnosing a patient based upon previous cases

Classification, e.g. medical
  • Scatterplot of patients
  • 2 variables:
      • Measurement of expression of 2 genes

Classification
  • How do we classify them?
  • Nearest neighbour / linear / complex function?
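As one way to picture the nearest-neighbour option, here is a minimal sketch in Python. The patient values and labels are invented for illustration, and `nearest_neighbour` is a hypothetical helper name, not something taken from the talk.

```python
import math

def nearest_neighbour(train, query):
    """Return the label of the training point closest to `query`."""
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical patients: (expression of gene 1, expression of gene 2) -> label
patients = [
    ((1.0, 1.2), "healthy"),
    ((0.8, 0.9), "healthy"),
    ((3.1, 2.9), "diseased"),
    ((2.7, 3.3), "diseased"),
]

prediction = nearest_neighbour(patients, (2.9, 3.0))
print(prediction)
```

A linear or more complex classifier would replace the distance lookup with a fitted decision boundary; the labelled-data-in, label-out interface stays the same.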

Classification
  • Trivial case with cod and shrimp data

The Data
  • Northern Gulf (region a)
      • Two ships (Needler and Hammond) combined by normalising according to overlap year
      • Multivariate spatial time series (short)
      • Missing data

Background
  • Northern Gulf considered to be one ecosystem / fish community
  • Quite heavily fished until about 1990
  • Most fish populations have collapsed since
  • Some say that the system has moved to an alternative stable state and is unlikely to return to a cod-dominated community without some chance event beyond human control
  • Lots of speculation:
      • cold water
      • large increases in the population of predators
  • Examine the nature and strength of interactions between species in the two periods
  • Ask “what if?” questions:
      • For other parts of the community to recover, would we need cod to have X strength of interaction with Y number of other species?

ML for Northern Gulf Data
  • Network building
      • knowledge and data of interactions
  • Feature Selection for classification of species relevant to the cod collapse
  • State space / dynamic models for predicting populations
  • Hidden variable analysis

Part 2: Bayesian Networks for Machine Learning

Bayesian Networks
  • Method to model a domain using probabilities
  • Easily interpreted by non-statisticians
  • Can be used to combine existing knowledge with data
  • Essentially use independence assumptions to model the joint distribution of a domain

Bayesian Networks
  • Simple 2-variable joint distribution P(Collapse 1, Collapse 2):

                      Species 2    ¬Species 2
      Species 1         0.89          0.01
      ¬Species 1        0.03          0.07

  • Can use it to ask many useful questions
  • But requires k^N probabilities
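The kinds of questions a joint table can answer, and why its size becomes a problem, can be sketched as follows. This assumes the rows and columns mean "species collapsed" versus "not collapsed", and reads the size formula as k states per variable over N variables; `joint_table_size` is a hypothetical helper.

```python
# Joint distribution P(Collapse 1, Collapse 2) from the slide, keyed by
# (species 1 collapsed?, species 2 collapsed?) -- this reading is an assumption
joint = {
    (True, True): 0.89,
    (True, False): 0.01,
    (False, True): 0.03,
    (False, False): 0.07,
}

# Marginals: sum out the other variable
p_s1 = sum(p for (s1, _), p in joint.items() if s1)
p_s2 = sum(p for (_, s2), p in joint.items() if s2)

# Conditional: P(species 2 collapses | species 1 collapses)
p_s2_given_s1 = joint[(True, True)] / p_s1

def joint_table_size(k, n):
    """Entries in a full joint table over n variables with k states each."""
    return k ** n

print(round(p_s1, 2), round(p_s2, 2), round(p_s2_given_s1, 3))
print(joint_table_size(2, 2), joint_table_size(2, 20))
```

With two binary species the table has 4 entries; with twenty it already exceeds a million, which is the motivation for the independence assumptions a Bayesian network encodes.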

Bayesian Network for Toy Domain
  • Nodes: Species A, Species B, Species C, Species D, Species E
  • Links: A → C, B → C, C → D, C → E
  • P(A) = .001, P(B) = .002

      A   B   P(C)
      T   T   .95
      T   F   .94
      F   T   .29
      F   F   .001

      C   P(D)          C   P(E)
      T   .70           T   .90
      F   .01           F   .05
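To illustrate how such a network is queried, the sketch below computes a posterior by brute-force enumeration. It assumes the structure implied by the tables (A and B are parents of C; C is the parent of D and E) and that the B column of the P(C) table reads T, F, T, F. Function names are hypothetical; the demo tool used in the talk is not shown here.

```python
from itertools import product

# CPTs read off the slide (probability that each node is True)
P_A = 0.001
P_B = 0.002
P_C = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # keyed by (A, B)
P_D = {True: 0.70, False: 0.01}                       # keyed by C
P_E = {True: 0.90, False: 0.05}                       # keyed by C

def bern(p_true, value):
    """Probability of `value` for a binary node with P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(a, b, c, d, e):
    """P(A, B, C, D, E) = P(A) P(B) P(C | A, B) P(D | C) P(E | C)."""
    return (bern(P_A, a) * bern(P_B, b) * bern(P_C[(a, b)], c)
            * bern(P_D[c], d) * bern(P_E[c], e))

def posterior_a(d, e):
    """P(A = True | D = d, E = e), summing the joint over B and C."""
    weights = {True: 0.0, False: 0.0}
    for a, b, c in product([True, False], repeat=3):
        weights[a] += joint(a, b, c, d, e)
    return weights[True] / (weights[True] + weights[False])

print(round(posterior_a(True, True), 3))
```

The network needs only 10 numbers here, against 31 free parameters for the full joint over five binary variables.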

Bayesian Networks
  • Bayesian Network Demo [Species_Net]
  • Use algorithms to learn structure and parameters from data
  • Or build by hand (priors)
  • Also continuous nodes (density functions)

Informative Priors
  • To build BNs we can also use prior structures and probabilities
  • These are then updated with data
  • Priors are usually uniform (equal probability)
  • Informative priors are used to incorporate existing knowledge into BNs
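For a single binary probability, updating a prior with data can be sketched with a Beta prior, where Beta(1, 1) is the uniform case. The counts and the expert prior below are invented for illustration, and `posterior_mean` is a hypothetical helper rather than the method used in the talk.

```python
def posterior_mean(successes, failures, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a Bernoulli parameter under a Beta(prior_a, prior_b) prior."""
    return (successes + prior_a) / (successes + failures + prior_a + prior_b)

# Hypothetical data: a species was observed collapsed in 3 of 4 surveyed years
collapsed, not_collapsed = 3, 1

# Uniform prior, Beta(1, 1): the estimate is driven mostly by the data
uniform = posterior_mean(collapsed, not_collapsed)

# Informative prior, Beta(2, 8): an expert expects collapse about 20% of the time
informed = posterior_mean(collapsed, not_collapsed, 2.0, 8.0)

print(round(uniform, 3), round(informed, 3))
```

With so little data the informative prior pulls the estimate strongly towards the expert's view; as more years are observed, both estimates converge on the observed frequency.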

Bayesian Networks for Classification & Feature Selection
  • Node that represents the class label attached to the data

Dynamic Bayesian Networks for Forecasting
  • Nodes represent variables at distinct time slices
  • Links between nodes over time
  • Can be used to forecast into the future [Species_Dynamic_Net]
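A minimal sketch of forecasting with a two-slice model follows. It assumes two binary variables (cod and shrimp abundance, high or low), with shrimp at time t depending on both shrimp and cod at time t-1. All transition probabilities are invented for illustration and are not estimated from the survey data.

```python
from itertools import product

STATES = [True, False]  # True = high abundance, False = low abundance

# Hypothetical transition probabilities
P_COD_HIGH = {True: 0.8, False: 0.1}      # P(cod_t high | cod_{t-1})
P_SHRIMP_HIGH = {                         # keyed by (shrimp_{t-1}, cod_{t-1})
    (True, True): 0.3, (True, False): 0.9,
    (False, True): 0.05, (False, False): 0.6,
}

def bern(p_true, value):
    """Probability of `value` for a binary node with P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def step(belief):
    """Advance a joint belief over (cod, shrimp) by one time slice."""
    new_belief = {s: 0.0 for s in product(STATES, repeat=2)}
    for (cod, shrimp), p in belief.items():
        for cod_next, shrimp_next in product(STATES, repeat=2):
            new_belief[(cod_next, shrimp_next)] += (
                p * bern(P_COD_HIGH[cod], cod_next)
                * bern(P_SHRIMP_HIGH[(shrimp, cod)], shrimp_next))
    return new_belief

def forecast(belief, horizon):
    """Propagate the belief `horizon` time slices into the future."""
    for _ in range(horizon):
        belief = step(belief)
    return belief

# Start from a known state: cod high, shrimp low
start = {(True, False): 1.0, (True, True): 0.0,
         (False, True): 0.0, (False, False): 0.0}
ahead = forecast(start, 3)
p_cod_high = sum(p for (cod, _), p in ahead.items() if cod)
print(round(p_cod_high, 3))
```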

Hidden Markov Models
  • Like a DBN but with hidden nodes: hidden states H(T-1) → H(T), each emitting an observation, H(T-1) → O(T-1) and H(T) → O(T)
  • Often used to model sequences

Typical Algorithms for HMMs
  • Given an observed sequence and a model, how do we compute its probability given the model?
  • Given the observed sequence and the model, how do we choose an optimal hidden state sequence?
  • How do we adjust the model parameters to maximise the probability of the observed sequence given the model?
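The first two questions are commonly answered with the forward algorithm and the Viterbi algorithm, sketched below for a small discrete HMM. The state names, observation symbols and all probabilities are invented for illustration. The third question (parameter fitting, usually done with Baum-Welch) is not covered by this sketch.

```python
STATES = ["cod_dominated", "shrimp_dominated"]   # hypothetical hidden states

START = {"cod_dominated": 0.6, "shrimp_dominated": 0.4}
TRANS = {
    "cod_dominated": {"cod_dominated": 0.7, "shrimp_dominated": 0.3},
    "shrimp_dominated": {"cod_dominated": 0.4, "shrimp_dominated": 0.6},
}
# Emission probabilities for a hypothetical observed cod catch level
EMIT = {
    "cod_dominated": {"high": 0.5, "medium": 0.4, "low": 0.1},
    "shrimp_dominated": {"high": 0.1, "medium": 0.3, "low": 0.6},
}

def forward(obs):
    """Probability of the observed sequence under the model (forward algorithm)."""
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * TRANS[r][s] for r in STATES) * EMIT[s][o]
                 for s in STATES}
    return sum(alpha.values())

def viterbi(obs):
    """Most probable hidden state sequence and its joint probability."""
    best = {s: (START[s] * EMIT[s][obs[0]], [s]) for s in STATES}
    for o in obs[1:]:
        new_best = {}
        for s in STATES:
            # Best predecessor state for reaching s at this step
            prob, path = max(
                ((best[r][0] * TRANS[r][s], best[r][1]) for r in STATES),
                key=lambda item: item[0])
            new_best[s] = (prob * EMIT[s][o], path + [s])
        best = new_best
    prob, path = max(best.values(), key=lambda item: item[0])
    return path, prob

observed = ["high", "medium", "low"]
likelihood = forward(observed)
path, path_prob = viterbi(observed)
print(likelihood, path, path_prob)
```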

Summary
  • Different learning tasks can be used to solve real-world problems
  • Machine Learning techniques are useful when there is lots of data and lots of gaps in knowledge
  • Bayesian Networks: a probabilistic framework that can perform most key ML tasks
  • Also transparent, and can incorporate expert knowledge

Part 3: Some Preliminary Results on Northern Gulf Data

Expert Knowledge
  • Ask marine biologists to generate matrices of expected relationships
  • Can be used to compare against models learnt from data
  • Can also be used as priors to improve model quality

Results: Expert networks

Results: Data networks (BN from correlation)
  • 85% conf. imputed from 70% data
  • Network nodes (figure not reproduced): Witch Flounder (Eel pout / Ocean Sun Fish), Cod, Haddock (Silver Hake), (Lumpfish), Shrimp (Atlantic soft pout / Bristlemouths)
  • Warning: data quality, spurious relations

Example DBN
  • Let’s look at an example DBN [NGulf.Dynamic - range]
  • Structure encoded by knowledge
  • Updated by data
  • Explore with queries
  • Supported by previous knowledge:
      “In the Northern Gulf of St. Lawrence, cod (code 438) and redfish (792, 793, 794, 795, 796) collapsed to very low levels in the mid 1990s. Subsequently the shrimp (8111) increased greatly in biomass so one will see this signal in the data. It is hypothesised that these are exclusive community states where you never get high abundance of both at the same time owing to predatory interactions.”

Feature Selection
  • Given that we know the cod population collapsed from 1990
  • Can we apply Feature Selection to see which species characterise this collapse?
  • [Learn BN and apply CV]
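One simple way to picture a filter-style ranking with bootstrap resampling is sketched below. It scores each species by its absolute correlation with a before/after-collapse label, which is a stand-in chosen for brevity and not the log-likelihood score or BN wrapper used in the talk. The species names and abundance values are invented.

```python
import random
from collections import Counter
from statistics import mean, pstdev

def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either series is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return abs(cov / (sx * sy))

def bootstrap_ranking(features, labels, n_boot=200, seed=0):
    """Count how often each feature scores highest across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(labels)
    wins = Counter({name: 0 for name in features})
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        ys = [labels[i] for i in idx]
        scores = {name: abs_correlation([vals[i] for i in idx], ys)
                  for name, vals in features.items()}
        wins[max(scores, key=scores.get)] += 1
    return wins.most_common()

# Hypothetical data: 0 = year before the collapse, 1 = year after it
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
features = {
    "species_x": [9, 8, 9, 7, 8, 9, 2, 3, 1, 2, 3, 2],   # tracks the collapse
    "species_y": [5, 4, 6, 5, 4, 6, 5, 6, 4, 5, 4, 6],   # unrelated to it
}

ranking = bootstrap_ranking(features, labels)
print(ranking)
```

Repeating the ranking over resamples gives a sense of how stable a species' rank is, which matters for a short time series with missing data.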

Results 7: Feature Selection with Bootstrap
  • Filter method using log likelihood
  • Wrapper method using BNs
  • Figure label: Redfish

Results: Feature Selection
  • Change in correlation of interactions between cod and high-ranking species before and after 1990

Dynamic Models
  • Given that the data is a time series
  • Can we build dynamic models to forecast future states?
  • Can we use an HMM to classify the time series?

Multivariate Time Series
  • N Gulf is a process measured over time
  • Autocorrelation Function, ACF (here cod)
  • Cross-Correlation Function, CCF (here hake to cod)
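The two functions can be sketched with the usual sample estimators, as below. The series values are invented, and the direction convention (the first series leading the second by `lag` steps) is an assumption, since the talk does not state one.

```python
from statistics import mean

def cross_correlation(xs, ys, lag):
    """Sample cross-correlation between xs[t] and ys[t + lag], for lag >= 0."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    cov = sum((xs[t] - mx) * (ys[t + lag] - my) for t in range(n - lag)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

def autocorrelation(xs, lag):
    """Sample autocorrelation: a series cross-correlated with itself."""
    return cross_correlation(xs, xs, lag)

# Hypothetical yearly abundance indices
cod = [5.0, 7.0, 9.0, 8.0, 6.0, 4.0, 3.0, 2.0, 2.0, 1.0]
hake = [6.0, 8.0, 9.0, 7.0, 5.0, 4.0, 2.0, 2.0, 1.0, 1.0]

max_lag = 5
acf = [autocorrelation(cod, lag) for lag in range(max_lag + 1)]
ccf = [cross_correlation(hake, cod, lag) for lag in range(max_lag + 1)]
print([round(v, 2) for v in acf])
print([round(v, 2) for v in ccf])
```

A threshold on the CCF, such as the values above 0.3 or 0.5 mentioned in the results, can then be used to decide which lagged links to consider in a dynamic model.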

Results 3: Fitting Dynamic Models
  • HMM Expert with CCF > 0.3 (maxlag = 5)
  • LSS = 8.3237

Results 3: Fitting Dynamic Models
  • Learning DBN from CCF data
  • LSS = 5.0106
  • Fluctuation: early indicator of collapse?

Results 4: Examining DBN Net
  • Data only
  • Dynamic links: Hakes, Redfish, Cod, Haddock, Witch Flounder, White Hake, Shrimp, Thorny Skate

Results 5: Fitting Dynamic Models
  • Learning DBN from expert-biased CCF data
  • CCF > 0.5 (maxlag = 5)
  • LSS = 6.1326

Results 6: Examining DBN Net
  • Data Biased Expert
  • Dynamic links: Cod, Herring, Witch Flounder, Mackerel / Capelin

Results 7: Linear Dynamic System
  • Instead of a hidden state, a continuous variable
  • Plot annotations: 1984, 1987 (white fur ban), 1991, 1997 (white fur hunt)
  • Could it be interpreted as a measure of fishing? Predator population (e.g. seals)? Water temperature?
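A linear dynamic system with a single continuous hidden variable can be tracked with a scalar Kalman filter, sketched below. The model form, the parameter values and the observations are all illustrative assumptions; the talk does not specify how its model was parameterised or fitted.

```python
def kalman_filter(observations, a=1.0, q=0.1, r=1.0, x0=0.0, p0=1.0):
    """Filter a scalar linear dynamic system.

    Assumed model (all parameter values are illustrative):
        x_t = a * x_{t-1} + process noise with variance q
        y_t = x_t + observation noise with variance r
    Returns the filtered mean of the hidden variable at each time step.
    """
    x, p = x0, p0
    means = []
    for y in observations:
        # Predict the hidden variable one step ahead
        x_pred = a * x
        p_pred = a * p * a + q
        # Update the prediction with the new observation
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        means.append(x)
    return means

# Hypothetical yearly observations of some abundance index
observed = [2.0, 2.4, 1.9, 1.2, 0.8, 0.5]
hidden = kalman_filter(observed)
print([round(v, 3) for v in hidden])
```

The filtered hidden variable is what would then be compared against candidate explanations such as fishing effort, predator numbers or water temperature.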

Conclusions
  • Hopefully conveyed the broad idea of machine learning
  • Shown how it can be used to help analyse data such as fish population data
  • Potentially applicable to other data studied here at MLI

Potential Projects
  1. Spatio-Temporal Analysis: use Spatio-Temporal BNs to model fish stock data. Nodes would represent species in specific “regions”
  2. Combining Expert Knowledge and Data for Improved Prediction
  3. Looking for Un/Stable States and the factors that influence them
  4. Functional Analysis of Data from Multiple Locations

E.g. Spatial Analysis
  • Spatial Bayesian Network Analysis [NGulf.Cod.Spatial]

E.g. Functional Models to assimilate data from different oceans...

Acknowledgements: Daniel Duplisea, Panayiota Apostolaki
Any Questions?