piecewiseSEM: Piecewise Structural Equation Modeling in R


Jonathan S. Lefcheck
2020-12-09

1. An Introduction to Structural Equation Modeling
2. An Example using piecewiseSEM
   2.1 Worked example
   2.2 Standardized coefficients
   2.3 GLMs in pSEM
   2.4 Correlated errors
   2.5 Nested models and AIC
3. Comparing Package Versions
   3.1 Introduction to Shipley (2009)
   3.2 Comparing versions in evaluating the Shipley’s SEM
   3.3 Additional functions
4. References

Structural equation modeling (SEM) is among the fastest growing statistical techniques in ecology and evolution, and provides a new way to explore and quantify ecological systems. SEM unites multiple variables in a single causal network, thereby allowing simultaneous tests of multiple hypotheses. The idea of causality is central to SEM as the technique implicitly assumes that the relationships among variables represent causal links. Because variables can be both predictors and responses, SEM is also a useful tool for quantifying both direct and indirect (cascading) effects.

Piecewise SEM (or confirmatory path analysis) expands upon traditional SEM by introducing a flexible mathematical framework that can incorporate a wide variety of model structures, distributions, and assumptions. These include: interactions and non-normal responses, random effects and hierarchical models, and alternate correlation structures (including phylogenetic, spatial, and temporal).

This release is version 2.0 of the package and contains substantial updates to both the syntax and the underlying calculations. All functions have been replaced and rewritten from the ground up.

The first part of this vignette will briefly introduce the concepts behind piecewise SEM. The second part will introduce the new syntax using a worked example. The final part will briefly compare the old and new versions of the package.

1. An Introduction to Structural Equation Modeling

Broadly, structural equation modeling (SEM) unites a suite of variables in a single network. These networks are generally presented using box-and-arrow diagrams denoting directed (causal) relationships among variables:

[Figure 1.1: Example SEM]

Those variables that exist only as predictors in the network are referred to as exogenous, and those that are predicted (at any point) as endogenous. Exogenous variables therefore only ever have arrows coming out of them, while endogenous variables have arrows coming into them (which does not preclude them from having arrows come out of them as well). This vocabulary is important when considering some special cases later.
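The distinction can be sketched in a few lines of R. The edge list below is hypothetical: it assumes a network with the four directed paths x1 → y1, y2 → y1, x1 → y2, and y1 → y3, and simply classifies each variable by whether any arrow points into it.

```r
# Hypothetical edge list: one row per directed path (assumed for illustration)
edges <- data.frame(
  from = c("x1", "y2", "x1", "y1"),
  to   = c("y1", "y1", "y2", "y3"),
  stringsAsFactors = FALSE
)

# All variables that appear anywhere in the network
vars <- union(edges$from, edges$to)

# Endogenous: predicted at any point (at least one incoming arrow)
endogenous <- unique(edges$to)

# Exogenous: only ever act as predictors (no incoming arrows)
exogenous <- setdiff(vars, endogenous)

print(exogenous)
print(endogenous)
```

Note that y1 and y2 are endogenous even though they also predict other variables.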

In traditional SEM, the relationships among variables (i.e., their linear coefficients) are estimated simultaneously in a single variance-covariance matrix. This approach is well developed but can be computationally intensive (depending on the size of the variance-covariance matrix) and additionally assumes independence and normality of errors, two assumptions that are generally violated in ecological research.

Piecewise structural equation modeling (SEM), also called confirmatory path analysis, was proposed in the early 2000s by Bill Shipley as an alternative to traditional variance-covariance based SEM. In piecewise SEM, each set of relationships is estimated independently (or locally). This process decomposes the network into the corresponding simple or multiple linear regressions for each response, each of which is evaluated separately and then combined later to generate inferences about the entire SEM. This approach has two consequences: (1) increasingly large networks can be estimated with ease compared to a single variance-covariance matrix (because the approach is modularized), and (2) specific assumptions about the distribution and covariance of the responses can be addressed using typical extensions of linear regression, such as fixed covariance structures, random effects, and other sophisticated modeling techniques.
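As a minimal sketch of this local estimation, the network can be written as one regression per response variable. The path structure (x1 → y1, y2 → y1, x1 → y2, y1 → y3), the simulated data, and the object names are all assumptions made for illustration; only base R is used here, not the package's own functions.

```r
set.seed(1)
n <- 50

# Simulated data following the assumed path structure (illustrative only)
x1 <- runif(n)
y2 <- 0.5 * x1 + rnorm(n, sd = 0.2)
y1 <- 0.4 * x1 + 0.3 * y2 + rnorm(n, sd = 0.2)
y3 <- 0.6 * y1 + rnorm(n, sd = 0.2)
dat <- data.frame(x1, y1, y2, y3)

# One regression per endogenous (response) variable, each fit separately
submodels <- list(
  y1 = lm(y1 ~ x1 + y2, data = dat),
  y2 = lm(y2 ~ x1, data = dat),
  y3 = lm(y3 ~ y1, data = dat)
)

# Collect the path coefficients (slopes, dropping intercepts) from each model
paths <- lapply(submodels, function(m) coef(m)[-1])
print(paths)
```

Because each response has its own model, any single `lm()` call could in principle be swapped for a model with a different error distribution or random effects without touching the others.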

Unlike traditional SEM, which uses a \(\chi^2\) test to compare the observed and predicted covariance matrices, the goodness-of-fit of a piecewise structural equation model is obtained using ‘tests of directed separation.’ These tests evaluate the assumption that the specific causal structure reflects the data. This is accomplished by deriving the ‘basis set,’ which is the smallest set of independence claims obtained from the SEM. These claims are relationships that are unspecified in the model, in other words paths that could have been included but were omitted because they were deemed to be biologically or mechanistically insignificant. The tests ask whether these relationships can truly be considered independent (i.e., their association is not statistically significant within some threshold of acceptable error, typically \(\alpha\)=0.05) or whether some causal relationship may exist as indicated by the data.

For instance, the preceding example SEM contains 4 specified paths (solid, black) and 2 unspecified paths (dashed, red), the latter of which constitute the basis set:

[Figure 1.2: Missing Paths]

In this case, there are two relationships that need to be evaluated: y3 and x1, and y3 and y2. However, there are additional influences on y3, specifically the directed path from y1. Thus, the claims need to be evaluated for ‘conditional independence,’ i.e., that the two variables are independent conditional on the already specified influences on both of them. This also pertains to the predictors of y2, including the potential contributions of x1. So the full claim would be: y2 | y3 (y1, x1), with the claim of interest separated by the | bar and the conditioning variable(s) following in parentheses.

As the network grows more complex, however, the independence claims only consider variables that are immediately ancestral to the primary claim (i.e., the parent nodes). For example, if there were another variable predicting x1, it would not be considered in the independence claim between y3 and y2, since it is more than one node away in the network.

The independence claims are evaluated by fitting a regression between the two variables of interest with any conditioning variables included as covariates. Thus, the claim above, y2 | y3 (y1, x1), would be modeled as y3 ~ y2 + y1 + x1. These regressions are constructed using the same assumptions about y3 as specified in the actual structural equation model. So, for instance, if y3 is a hierarchically sampled variable predicted by y1, then the same hierarchical structure would carry over to the test of directed separation of y3 predicted by y2.
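A minimal sketch of one such test in base R follows. The data are random and purely illustrative (the sample size and distributions are assumptions), and an ordinary `lm()` stands in for whatever model class y3 has in the actual SEM.

```r
set.seed(1)
n <- 50

# Illustrative random data with the four variables used in the text
dat <- data.frame(x1 = runif(n), y1 = runif(n), y2 = runif(n), y3 = runif(n))

# Claim y2 | y3 (y1, x1): regress y3 on y2, with the conditioning
# variables y1 and x1 included as covariates
claim <- lm(y3 ~ y2 + y1 + x1, data = dat)

# The P-value of interest belongs to y2, the omitted path being tested
p_y2 <- summary(claim)$coefficients["y2", "Pr(>|t|)"]
print(p_y2)
```

Only the P-value for y2 enters the goodness-of-fit calculation; the coefficients for y1 and x1 are there solely to condition the test.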

The P-values of the conditional independence tests are then combined in a single Fisher’s C statistic using the following equation:

\[C = -2\sum_{i=1}^{k}\ln(p_{i})\]

This statistic is \(\chi^2\)-distributed with 2k degrees of freedom, with k being the number of independence claims in the basis set.
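The calculation can be sketched directly in base R. The two P-values below are hypothetical placeholders for the results of k = 2 independence claims, not output from any fitted model.

```r
# Hypothetical P-values, one per independence claim in the basis set
p_values <- c(0.35, 0.62)
k <- length(p_values)

# Fisher's C: -2 times the sum of the natural logs of the P-values
C <- -2 * sum(log(p_values))

# C is chi-squared distributed with 2k degrees of freedom
df <- 2 * k
p_C <- pchisq(C, df = df, lower.tail = FALSE)

print(c(C = C, df = df, P = p_C))
```

A P-value above the chosen threshold (typically 0.05) means the independence claims are not rejected, i.e., the data give no evidence against the hypothesized causal structure.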

Shipley (2013) also showed that the C statistic can be used to compute an AIC score for the SEM, so that nested comparisons can be made in a model selection framework:

\[AIC = C + 2K\]

where K is the likelihood degrees of freedom. A further variant, \(AIC_c\), can be obtained by adding an additional penalty based on sample size:

\[AIC_c = C + 2K\frac{n}{(n - K - 1)}\]
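As a worked illustration of the two formulas, the values of C, K, and n below are hypothetical and chosen only to show the arithmetic.

```r
C <- 3.06  # hypothetical Fisher's C statistic
K <- 8     # hypothetical likelihood degrees of freedom
n <- 50    # hypothetical sample size

# AIC = C + 2K
AIC_sem <- C + 2 * K

# AICc = C + 2K * n / (n - K - 1), a small-sample correction
AICc_sem <- C + 2 * K * (n / (n - K - 1))

print(c(AIC = AIC_sem, AICc = AICc_sem))
```

Because n / (n - K - 1) is greater than 1 whenever n > K + 1, the corrected score always exceeds the uncorrected one, and the gap shrinks as the sample size grows.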

The piecewiseSEM package automates the derivation of the basis set and the tests of directed separation, as well as extraction of path coefficients based on the user-specified input.

2. An Example using piecewiseSEM

2.1 Worked example

Let’s make up some fake data corresponding to the path diagram above:

dat
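One way to generate such fake data is sketched below. The sample size of 50, the uniform distributions, and the seed are assumptions made for illustration; the only requirement taken from the text is a data frame named `dat` holding the four variables x1, y1, y2, and y3.

```r
set.seed(123)  # assumed seed, used only so the example is reproducible

# Four independent random variables matching the nodes of the path diagram
dat <- data.frame(
  x1 = runif(50),
  y1 = runif(50),
  y2 = runif(50),
  y3 = runif(50)
)

head(dat)
```

Since the columns are drawn independently, no real causal structure exists in these data, so any estimated paths should be weak.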
