Analysing the dynamics and relative influence of variables affecting ecosystem responses using functional PCA and boosted regression trees: A seagrass case study

Paul Pao Yen Wu, Kerrie Mengersen, M. Julian Caley, Kathryn McMahon, Michael A. Rasheed, Gary A. Kendrick

Research output: Contribution to journal (Article)

Abstract

Understanding the relative influence of variables on ecosystem responses and the dynamics of their effect is necessary for effective ecosystem monitoring and management. We develop an approach to this task, also known as causal pathways analysis, using functional Principal Components Analysis (fPCA) and machine learning within a scenario analysis framework. fPCA is used to identify the most influential variables for correlated, non-homogeneous and nonlinear time series data characteristic of complex ecosystems. Hierarchical clustering of fPCA scores reveals groups of more homogeneous scenarios and similarly influential variables. The resultant subset of variables helps to overcome model identifiability problems when analysing time-lagged effects using Boosted Regression Trees (BRT). We use simulated data generated by a Dynamic Bayesian Network (DBN) of ecological windows for seagrass ecosystems given dredging stressors; 3,024 scenarios with 75 state variables are analysed. The BRT demonstrated a high level of fit ((Formula presented.)), supporting the validity of influential variables identified by fPCA. Influential variables identified included genus, location type, light, growth and seed. Six consecutive months of positive growth and adequate light were important for predicting states of high or moderate population. Compared to traditional scenario analysis and sensitivity analysis approaches, our approach simultaneously enabled capture of n-way interactions while accounting for time correlations. Although some variables and their dynamics agreed with existing knowledge, new variables and/or time lags of their effects were identified, corresponding to opportunities for further investigation as well as informing monitoring and management. Although we demonstrate our method on state variables with DBN simulated data, it is equally applicable to general time series data.
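As a rough illustration of the fPCA step described in the abstract, the sketch below computes principal component curves and per-scenario scores for time series sampled on a common, evenly spaced grid, using an SVD of the centred data. It is not the authors' implementation: the function name `fpca_scores`, the toy "declining" and "stable" trajectories, and the choice of a discretised SVD (rather than a basis-expansion fPCA) are all assumptions made for illustration. The hierarchical clustering and BRT steps are not shown.

```python
import numpy as np


def fpca_scores(curves, n_components=2):
    """Discretised fPCA sketch (hypothetical helper, not from the paper).

    curves: array of shape (n_scenarios, n_timepoints), all sampled on the
    same evenly spaced time grid, so quadrature weights are constant and
    can be ignored.
    """
    X = np.asarray(curves, dtype=float)
    mean_curve = X.mean(axis=0)
    centred = X - mean_curve

    # Rows of Vt are orthonormal component curves evaluated on the grid.
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    components = Vt[:n_components]

    # Score of each scenario on each component curve.
    scores = centred @ components.T

    # Share of total variance carried by each retained component.
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, components, explained[:n_components]


# Toy data (assumed): 5 declining and 5 stable trajectories over 24 time steps.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 24)
declining = 1.0 - 0.8 * t
stable = np.full_like(t, 0.9)
curves = np.vstack(
    [declining + 0.02 * rng.standard_normal(t.size) for _ in range(5)]
    + [stable + 0.02 * rng.standard_normal(t.size) for _ in range(5)]
)

scores, components, explained = fpca_scores(curves, n_components=2)
print(scores[:, 0].round(2))  # first-component score per scenario
print(explained.round(3))     # variance share of the two components
```

In this toy setting the first-component scores take opposite signs for the two groups of trajectories, which is the kind of structure that a subsequent clustering of scores would pick up.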

Original language: English
Journal: Methods in Ecology and Evolution
DOI: 10.1111/2041-210X.13269
Publication status: E-pub ahead of print - 12 Aug 2019

Fingerprint

ecosystem response
seagrass
principal component analysis
case studies
ecosystems
ecosystem
time series analysis
time series
monitoring
artificial intelligence
dredging
sensitivity analysis
seed
seeds
effect
scenario analysis
methodology

Cite this

@article{1a9b0d24ce834c37b920a900f45edeee,
title = "Analysing the dynamics and relative influence of variables affecting ecosystem responses using functional PCA and boosted regression trees: A seagrass case study",
abstract = "Understanding the relative influence of variables on ecosystem responses and the dynamics of their effect is necessary for effective ecosystem monitoring and management. We develop an approach to this task, also known as causal pathways analysis, using functional Principal Components Analysis (fPCA) and machine learning within a scenario analysis framework. fPCA is used to identify the most influential variables for correlated, non-homogeneous and nonlinear time series data characteristic of complex ecosystems. Hierarchical clustering of fPCA scores reveals groups of more homogeneous scenarios and similarly influential variables. The resultant subset of variables helps to overcome model identifiability problems when analysing time-lagged effects using Boosted Regression Trees (BRT). We use simulated data generated by a Dynamic Bayesian Network (DBN) of ecological windows for seagrass ecosystems given dredging stressors; 3,024 scenarios with 75 state variables are analysed. The BRT demonstrated a high level of fit ((Formula presented.)), supporting the validity of influential variables identified by fPCA. Influential variables identified included genus, location type, light, growth and seed. Six consecutive months of positive growth and adequate light were important for predicting states of high or moderate population. Compared to traditional scenario analysis and sensitivity analysis approaches, our approach simultaneously enabled capture of n-way interactions while accounting for time correlations. Although some variables and their dynamics agreed with existing knowledge, new variables and/or time lags of their effects were identified, corresponding to opportunities for further investigation as well as informing monitoring and management. Although we demonstrate our method on state variables with DBN simulated data, it is equally applicable to general time series data.",
keywords = "complex systems, conservation and management, dynamic Bayesian networks, functional PCA, scenario analysis, time series modelling",
author = "Wu, {Paul Pao Yen} and Kerrie Mengersen and Caley, {M. Julian} and Kathryn McMahon and Rasheed, {Michael A.} and Kendrick, {Gary A.}",
year = "2019",
month = "8",
day = "12",
doi = "10.1111/2041-210X.13269",
language = "English",
journal = "Methods in Ecology and Evolution",
issn = "2041-210X",
publisher = "John Wiley \& Sons",

}

Analysing the dynamics and relative influence of variables affecting ecosystem responses using functional PCA and boosted regression trees: A seagrass case study. / Wu, Paul Pao Yen; Mengersen, Kerrie; Caley, M. Julian; McMahon, Kathryn; Rasheed, Michael A.; Kendrick, Gary A.

In: Methods in Ecology and Evolution, 12.08.2019.

TY - JOUR

T1 - Analysing the dynamics and relative influence of variables affecting ecosystem responses using functional PCA and boosted regression trees

T2 - A seagrass case study

AU - Wu, Paul Pao Yen

AU - Mengersen, Kerrie

AU - Caley, M. Julian

AU - McMahon, Kathryn

AU - Rasheed, Michael A.

AU - Kendrick, Gary A.

PY - 2019/8/12

Y1 - 2019/8/12

N2 - Understanding the relative influence of variables on ecosystem responses and the dynamics of their effect is necessary for effective ecosystem monitoring and management. We develop an approach to this task, also known as causal pathways analysis, using functional Principal Components Analysis (fPCA) and machine learning within a scenario analysis framework. fPCA is used to identify the most influential variables for correlated, non-homogeneous and nonlinear time series data characteristic of complex ecosystems. Hierarchical clustering of fPCA scores reveals groups of more homogeneous scenarios and similarly influential variables. The resultant subset of variables helps to overcome model identifiability problems when analysing time-lagged effects using Boosted Regression Trees (BRT). We use simulated data generated by a Dynamic Bayesian Network (DBN) of ecological windows for seagrass ecosystems given dredging stressors; 3,024 scenarios with 75 state variables are analysed. The BRT demonstrated a high level of fit ((Formula presented.)), supporting the validity of influential variables identified by fPCA. Influential variables identified included genus, location type, light, growth and seed. Six consecutive months of positive growth and adequate light were important for predicting states of high or moderate population. Compared to traditional scenario analysis and sensitivity analysis approaches, our approach simultaneously enabled capture of n-way interactions while accounting for time correlations. Although some variables and their dynamics agreed with existing knowledge, new variables and/or time lags of their effects were identified, corresponding to opportunities for further investigation as well as informing monitoring and management. Although we demonstrate our method on state variables with DBN simulated data, it is equally applicable to general time series data.

AB - Understanding the relative influence of variables on ecosystem responses and the dynamics of their effect is necessary for effective ecosystem monitoring and management. We develop an approach to this task, also known as causal pathways analysis, using functional Principal Components Analysis (fPCA) and machine learning within a scenario analysis framework. fPCA is used to identify the most influential variables for correlated, non-homogeneous and nonlinear time series data characteristic of complex ecosystems. Hierarchical clustering of fPCA scores reveals groups of more homogeneous scenarios and similarly influential variables. The resultant subset of variables helps to overcome model identifiability problems when analysing time-lagged effects using Boosted Regression Trees (BRT). We use simulated data generated by a Dynamic Bayesian Network (DBN) of ecological windows for seagrass ecosystems given dredging stressors; 3,024 scenarios with 75 state variables are analysed. The BRT demonstrated a high level of fit ((Formula presented.)), supporting the validity of influential variables identified by fPCA. Influential variables identified included genus, location type, light, growth and seed. Six consecutive months of positive growth and adequate light were important for predicting states of high or moderate population. Compared to traditional scenario analysis and sensitivity analysis approaches, our approach simultaneously enabled capture of n-way interactions while accounting for time correlations. Although some variables and their dynamics agreed with existing knowledge, new variables and/or time lags of their effects were identified, corresponding to opportunities for further investigation as well as informing monitoring and management. Although we demonstrate our method on state variables with DBN simulated data, it is equally applicable to general time series data.

KW - complex systems

KW - conservation and management

KW - dynamic Bayesian networks

KW - functional PCA

KW - scenario analysis

KW - time series modelling

UR - http://www.scopus.com/inward/record.url?scp=85070715203&partnerID=8YFLogxK

U2 - 10.1111/2041-210X.13269

DO - 10.1111/2041-210X.13269

M3 - Article

JO - Methods in Ecology and Evolution

JF - Methods in Ecology and Evolution

SN - 2041-210X

ER -