I. Basic Models spdep

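NOTE: the definitions of \(\lambda\) and \(\rho\) in this sheet are the reverse of those in R. Here \(\lambda\) is the spatial lag coefficient and \(\rho\) is the spatial error coefficient, whereas R output reports the lag coefficient as rho and the error coefficient as lambda.

NOTE: in recent package releases the model-fitting functions listed below (spautolm(), errorsarlm(), lagsarlm(), sacsarlm(), GMerrorsar(), stsls(), gstsls(), impacts()) are provided by the spatialreg package rather than by spdep.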

Concept

\[ y = \lambda W y + X \beta_{(1)} + WX\beta_{(2)} + u \qquad |\lambda|<1 \\ u = \rho W u + \epsilon \qquad\qquad\qquad\qquad\qquad |\rho|<1 \]
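
As a rough illustration of the two equations above, the following base R sketch simulates one draw from the full model. The ring-shaped neighbourhood, the parameter values and all object names (`W`, `lambda`, `rho`, `beta1`, `beta2`) are assumptions made for the example, not part of the original sheet.

```r
set.seed(1)
n <- 50

# Hypothetical neighbourhood: units on a ring, each with two neighbours
nxt <- c(2:n, 1)
W <- matrix(0, n, n)
W[cbind(1:n, nxt)] <- 1
W[cbind(nxt, 1:n)] <- 1
W <- W / rowSums(W)            # row-standardise: every row sums to 1

lambda <- 0.4                  # spatial lag coefficient (sheet notation)
rho    <- 0.3                  # spatial error coefficient (sheet notation)
beta1  <- 2                    # coefficient on X
beta2  <- -1                   # coefficient on WX

X   <- rnorm(n)
WX  <- as.vector(W %*% X)
eps <- rnorm(n)
I_n <- diag(n)

# u = rho W u + eps        =>  u = (I - rho W)^{-1} eps
u <- solve(I_n - rho * W, eps)
# y = lambda W y + X b1 + WX b2 + u  =>  y = (I - lambda W)^{-1} (...)
y <- solve(I_n - lambda * W, X * beta1 + WX * beta2 + u)
```

Setting `lambda`, `rho`, `beta1` or `beta2` to zero in this sketch reproduces the special cases listed in the table below.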

Model | Condition | Estimator | Function
--- | --- | --- | ---
1 Pure spatial autoregressive | \(\beta=0; \quad \lambda=0 \text{ or } \rho=0\) | ML | spautolm()
2 Lagged independent variable model | \(\lambda=\rho=0\) | |
3 Spatial Error Model (SEM) | \(\lambda = 0; \quad \rho \neq 0\) | ML | errorsarlm()
 | | FGLS | GMerrorsar()
4 Spatial Lag Model (SLM) | \(\lambda \neq 0; \quad \rho = 0\) | ML | lagsarlm()
 | | 2SLS | stsls()
5 Complete model (SARAR) | \(\lambda \neq 0; \quad \rho \neq 0\) | ML | sacsarlm()
 | | GS2SLS | gstsls()
 | | BFGS2SLS |

1) Pure spatial autoregressive

  • Condition: \(\quad \beta=0 \quad\) & either \(\rho=0\) (case i) or \(\lambda=0\) (case ii)
    1. ML: Maximum Likelihood
      The procedure was developed by Whittle (1954).
      PSA_ML <- spautolm(Y~X, data = df, listw = W)
      \[ (i)\qquad\qquad y = \lambda W y + \epsilon \qquad\qquad |\lambda|<1 \\ (ii)\qquad\quad y = u, \quad u = \rho W u + \epsilon \qquad\quad |\rho|<1 \]

2) Lagged independent variable model

  • Condition: \(\quad \lambda=\rho=0\) \[ y = X \beta_{(1)} + WX\beta_{(2)} + u \]

3) Spatial Error Model (SEM)

  • Condition: \(\quad \lambda = 0 \quad\) & \(\quad \rho \neq 0\) \[ y = X \beta_{(1)} + WX\beta_{(2)} + u \\ u = \rho W u + \epsilon \qquad\qquad |\rho|<1 \]
  • SEM is discussed by Anselin (1988) and Arbia (2006).
    1. ML: Maximum Likelihood
      Lee (2004) proves that the ML estimator is consistent and asymptotically normal.
      SEM_ML <- errorsarlm(Y~X, data = df, listw = W)
    2. FGLS: Feasible GLS
      The procedure was developed by Kelejian and Prucha (1998).
      SEM_FGLS <- GMerrorsar(Y~X, data = df, listw = W)
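
The GLS step behind the feasible GLS estimator can be sketched in base R as follows. This is only an illustration of the idea: it assumes an estimate of \(\rho\) is already available (here the true value stands in for it), whereas GMerrorsar() obtains that estimate by the method of moments first. The ring-shaped `W`, the simulated data and all object names are assumptions.

```r
set.seed(2)
n <- 200

# Hypothetical ring neighbourhood, row-standardised
nxt <- c(2:n, 1)
W <- matrix(0, n, n)
W[cbind(1:n, nxt)] <- 1
W[cbind(nxt, 1:n)] <- 1
W <- W / rowSums(W)

rho  <- 0.5
beta <- c(1, 2)                          # intercept and slope
X    <- cbind(1, rnorm(n))
u    <- solve(diag(n) - rho * W, rnorm(n))   # u = rho W u + eps
y    <- as.vector(X %*% beta + u)

rho_hat <- rho   # ASSUMPTION: placeholder for a first-step estimate of rho

# Spatial Cochrane-Orcutt transformation: pre-multiply by (I - rho_hat W)
A      <- diag(n) - rho_hat * W
y_star <- A %*% y
X_star <- A %*% X

# OLS on the transformed data is GLS on the original data
beta_fgls <- solve(crossprod(X_star), crossprod(X_star, y_star))
```

After the transformation the error term is \(\epsilon\) rather than \(u\), which is why plain OLS on `y_star` and `X_star` is appropriate.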

4) Spatial Lag Model (SLM)

  • Condition: \(\quad \lambda \neq 0 \quad\) & \(\quad \rho = 0\) \[ y = \lambda W y + X \beta_{(1)} + WX\beta_{(2)} + u \qquad |\lambda|<1 \]
  • SLM is discussed by Anselin (1988) and Arbia (2006).
    1. ML: Maximum Likelihood
      Assumption: normality of the residuals (JB test)
      SLM_ML <- lagsarlm(Y~X, data = df, listw = W)
    2. 2SLS: Two-Stage Least Squares
      No assumption of normality of the residuals (JB test)
      SLM_2SLS <- stsls(Y~X, data = df, listw = W)
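
To make the 2SLS idea concrete, here is a minimal base R sketch in which the endogenous spatial lag \(Wy\) is instrumented with \(X\), \(WX\) and \(W^2X\). It mirrors the general approach only and is not a reimplementation of stsls(). The ring-shaped `W`, the simulated data and all object names are assumptions.

```r
set.seed(3)
n <- 500

# Hypothetical ring neighbourhood, row-standardised
nxt <- c(2:n, 1)
W <- matrix(0, n, n)
W[cbind(1:n, nxt)] <- 1
W[cbind(nxt, 1:n)] <- 1
W <- W / rowSums(W)

lambda <- 0.5
x <- rnorm(n)
# y = lambda W y + 1 + 2 x + eps
y <- solve(diag(n) - lambda * W, 1 + 2 * x + rnorm(n))

Wy  <- as.vector(W %*% y)     # endogenous regressor
Wx  <- as.vector(W %*% x)
WWx <- as.vector(W %*% Wx)

Z <- cbind(1, x, Wy)          # regressors of the structural equation
H <- cbind(1, x, Wx, WWx)     # instruments (the constant is not lagged:
                              # W %*% 1 = 1 for a row-standardised W)

# First stage: project the regressors on the instruments
Z_hat <- H %*% solve(crossprod(H), crossprod(H, Z))
# Second stage: IV estimator (Z_hat' Z)^{-1} Z_hat' y
coef_2sls <- solve(crossprod(Z_hat, Z), crossprod(Z_hat, y))
```

The third element of `coef_2sls` is the estimate of \(\lambda\). No distributional assumption on the errors is used anywhere in the computation.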

5) Complete model (SARAR)

  • Condition: \(\quad \lambda \neq 0 \quad\) & \(\quad \rho \neq 0\) \[ y = \lambda W y + X \beta_{(1)} + WX\beta_{(2)} + u \qquad |\lambda|<1 \\ u = \rho W u + \epsilon \qquad\qquad\qquad\qquad\qquad |\rho|<1 \]
  • SARAR is referred to as SARAR(1,1) by Kelejian and Prucha (1998) and as the General Spatial Model by Anselin (1988).
    1. ML: Maximum Likelihood
      Limitation: there is currently no formal proof that the estimators possess the usual optimal large-sample properties.
      Assumption: normality of the residuals (JB test)
      SARAR_ML <- sacsarlm(Y~X, data = df, listw = W)
    2. GS2SLS: Generalized Spatial Two-Stage Least Squares
      An extension of 2SLS proposed by Kelejian and Prucha (1998).
      No assumption of normality of the residuals (JB test)
      Limitation: not fully asymptotically efficient.
      SARAR_GS2SLS <- gstsls(Y~X, data = df, listw = W)
    3. BFGS2SLS: Best Feasible GS2SLS
      An extension of GS2SLS proposed by Lee (2003), also known as LIV (Lee’s Instrumental Variable).
      No assumption of normality of the residuals (JB test)
      Limitation: numerically challenging in very large samples.
    • Kelejian et al. (2004) show that BFGS2SLS and its simplified version do not differ substantially in efficiency from GS2SLS in small samples.

Model test

  • Moran I test: SEM vs SLM
    • \(H_0\): No spatial autocorrelation in the regression residuals
    • There is no explicit alternative hypothesis against which the null of no autocorrelation is contrasted.
    • The Moran I test was proposed by Moran (1950).
    1. spdep::lm.morantest(OLS, listw = W)
    2. spdep::moran.test(OLS$residuals, listw = W)
      Both return the same Moran’s I statistic but different p-values, because lm.morantest() uses the moments appropriate for regression residuals.
  • LM and RLM test
    • \(H_0\): \(\lambda=0\) for \(LM_{SLM}\), \(\rho=0\) for \(LM_{SEM}\)
    • LM stands for Lagrange Multiplier; its robust version (RLM) was proposed by Anselin et al. (1996).
      • spdep::lm.LMtests(OLS, listw = W, test = "all")
  • Wald test
    • \(H_0\): \(\lambda=\rho=0\)
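
For reference, the Moran’s I statistic reported by both calls above can be computed by hand as \(I = (n/S_0)\, e^\top W e / e^\top e\), where \(e\) are the OLS residuals and \(S_0\) is the sum of all weights. The base R sketch below uses a hypothetical ring-shaped `W` and simulated data; it computes the statistic only, not the p-value.

```r
set.seed(4)
n <- 100

# Hypothetical ring neighbourhood, row-standardised
nxt <- c(2:n, 1)
W <- matrix(0, n, n)
W[cbind(1:n, nxt)] <- 1
W[cbind(nxt, 1:n)] <- 1
W <- W / rowSums(W)

x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
OLS <- lm(y ~ x)
e <- residuals(OLS)

S0 <- sum(W)                                   # equals n when W is row-standardised
moran_I <- (n / S0) * as.numeric(t(e) %*% W %*% e) / sum(e^2)
```

With a row-standardised `W`, the factor `n / S0` equals 1, so the statistic reduces to the slope of a regression of `W %*% e` on `e` through the origin.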

Interpretation of parameters in spatial econometrics

  • LeSage and Pace (2009) introduce the impact measures
    1. ADI: average direct impact
    2. AII: average indirect impact
    3. ATI: average total impact
    4. ATIT: average total impact to an observation
    5. ATIF: average total impact from an observation
    • spdep::impacts(SLM/SARAR, listw = W)
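
The impact measures can be sketched by hand for a spatial lag model with a single regressor, where the matrix of partial derivatives is \(S = (I - \lambda W)^{-1}\beta\). The base R example below assumes a small ring-shaped `W` and illustrative values for `lambda` and `beta`; it omits the \(WX\) term and the simulation-based inference that impacts() can provide.

```r
n <- 6

# Hypothetical ring neighbourhood, row-standardised
nxt <- c(2:n, 1)
W <- matrix(0, n, n)
W[cbind(1:n, nxt)] <- 1
W[cbind(nxt, 1:n)] <- 1
W <- W / rowSums(W)

lambda <- 0.4   # spatial lag coefficient (sheet notation), illustrative
beta   <- 2     # coefficient on the single regressor, illustrative

# S[i, j] = effect on y_i of a unit change in x_j
S <- beta * solve(diag(n) - lambda * W)

ADI  <- mean(diag(S))    # average direct impact
ATI  <- sum(S) / n       # average total impact
AII  <- ATI - ADI        # average indirect impact
ATIT <- rowSums(S)       # total impact to each observation (row sums)
ATIF <- colSums(S)       # total impact from each observation (column sums)
```

For a row-standardised `W` the average total impact equals \(\beta/(1-\lambda)\), so the direct impact exceeds \(\beta\) only through feedback loops that pass through the neighbours and return.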


  • Further material
    • Arbia (2011) provides a review of spatial panel analysis.


II. Panel Data splm

Concept

  • Basic model for panel data \[ y_{it} = \alpha + X_{it} \beta + u_{it} = \alpha + X_{it} \beta + (\mu_i + \epsilon_{it}) \] where \(\mu_i\) is the individual error component and \(\epsilon_{it}\) is the idiosyncratic error component.
    1. Pooling model
      Condition: \(\mu_i\) (individual error component) is the same for all individuals i.
      OLS_pooled <- plm(Y~X, data = data.frame, model = "pooling")
    2. Fixed effects model
      Condition: \(\mu_i\) (individual error component) is correlated with the regressors.
      OLS_within <- plm(Y~X, data = data.frame, model = "within")
    3. Random effects model
      Condition: \(\mu_i\) (individual error component) is uncorrelated with the regressors.
      FGLS_random <- plm(Y~X, data = data.frame, model = "random")
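
The difference between the pooling and the fixed effects (within) estimator can be illustrated in base R without plm. In the sketch below, the simulated panel and all object names are assumptions; \(\mu_i\) is deliberately correlated with the regressor, which is the case where pooling is biased and the within transformation is not.

```r
set.seed(5)
N     <- 50                      # individuals
T_len <- 6                       # time periods
id    <- rep(1:N, each = T_len)

mu <- rnorm(N)[id]               # individual error component mu_i
x  <- rnorm(N * T_len) + mu      # regressor correlated with mu_i
y  <- 1 + 2 * x + mu + rnorm(N * T_len)

# Pooling: ignores mu_i, so the slope absorbs part of it
b_pooled <- coef(lm(y ~ x))["x"]

# Within: subtract individual means, which removes mu_i
x_w <- x - ave(x, id)
y_w <- y - ave(y, id)
b_within <- coef(lm(y_w ~ x_w - 1))["x_w"]

# Equivalent least-squares dummy-variable (LSDV) regression
b_lsdv <- coef(lm(y ~ x + factor(id)))["x"]
```

`b_within` and `b_lsdv` coincide, which is the usual equivalence between demeaning and including one dummy per individual.
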
Model | Estimator | Function | model = | spatial.error = | lag =
--- | --- | --- | --- | --- | ---
1 SEM-RE | ML | spml() | "random" or "within" | "b" | FALSE
 KKP (SEM) | ML | spml() | "random" or "within" | "kkp" | FALSE
 | GM | spgm() | "random" or "within" | TRUE | FALSE
2 SLM-RE | ML | spml() | "random" or "within" | "none" | TRUE
 | GM | spgm() | "random" or "within" | FALSE | TRUE
3 SARAR | ML | spml() | "random" or "within" | "b" or "kkp" | TRUE
 | GM | spgm() | "random" or "within" | TRUE | TRUE

Note. 1) spml() and spgm() are used for the ML and GM estimators, respectively. 2) listw = W and data = data.frame should be specified in each model. 3) model = "random" or model = "within" should be chosen according to the Hausman test.

1) Random effects

  • SEM: Spatial Error Model
    1. SEM-RE: spatial error model with random effects \[ \epsilon_t = \rho W \epsilon_t + \eta_t \] SEM-RE applies the spatial process only to \(\epsilon_{it}\) (idiosyncratic error component).
      SEM_RE_ML <- spml(Y~X, listw = W, data = data.frame, model = "random", spatial.error = "b", lag = FALSE)

    2. KKP \[ u = \mu + \epsilon = \rho (I_T \otimes W)u + \eta \] KKP applies the spatial process to both \(\mu_i\) (individual error component) and \(\epsilon_{it}\) (idiosyncratic error component).
      KKP was proposed by Kapoor et al. (2007).
      KKP_ML <- spml(Y~X, listw = W, data = data.frame, model = "random", spatial.error = "kkp", lag = FALSE)
      KKP_GM <- spgm(Y~X, listw = W, data = data.frame, model = "random", spatial.error = TRUE, lag = FALSE)

  • SLM: Spatial Lag Model
    1. SLM-RE: spatial lag model with random effects
      SLM_RE_ML <- spml(Y~X, listw = W, data = data.frame, model = "random", spatial.error = "none", lag = TRUE)
      SLM_RE_GM <- spgm(Y~X, listw = W, data = data.frame, model = "random", spatial.error = FALSE, lag = TRUE)

2) Fixed effects

  • SEM:
    1. SEM-FE
      SEM_FE_ML <- spml(Y~X, listw = W, data = data.frame, model = "within", spatial.error = "b", lag = FALSE)
    2. KKP-FE
      KKP_FE_ML <- spml(Y~X, listw = W, data = data.frame, model = "within", spatial.error = "kkp", lag = FALSE)
      KKP_FE_GM <- spgm(Y~X, listw = W, data = data.frame, model = "within", spatial.error = TRUE, lag = FALSE)
  • SLM:
    1. SLM-FE
      SLM_FE_ML <- spml(Y~X, listw = W, data = data.frame, model = "within", spatial.error = "none", lag = TRUE)
      SLM_FE_GM <- spgm(Y~X, listw = W, data = data.frame, model = "within", spatial.error = FALSE, lag = TRUE)

Model test and estimator

  • Hausman test: Random vs Fixed
    • The Hausman test was proposed by Hausman (1978) and extended to spatial panels by Lee and Yu (2012).
    • \(H_0\): Random model is true
      splm::sphtest(Y~X, data = data.frame, listw = W, spatial.model = c("error", "lag", "sarar"), method = c("ML", "GM"))
      When "ML" is chosen for method, errors = c("BSK", "KKP") can be specified.
  • Estimator for spatial panel data: ML vs GM
    1. ML: Maximum Likelihood
      ML estimation of SEM and SLM for spatial panels was proposed by Elhorst (2003).
      \(+\) most efficient when all distributional assumptions are met
      \(-\) large computational demand
    2. GM: Generalized Moments
      \(+\) less computational demand
      \(+\) relaxed normality assumption compared with ML -> more robust

  • Further material


III. Advanced

Type | Model | Estimator | Package | Function
--- | --- | --- | --- | ---
1 Heteroscedastic model | | FGS2SLS | sphet | gstslshet()
 | | Spatial HAC | sphet | stslshac()
2 Discrete model | A-spatial | | stats | glm()
 | Spatial Probit | ML | McSpatial | spprobitml()
 | | GMM | McSpatial | gmmprobit()
 | | LGMM | McSpatial | spprobit()
3 Non-stationary | GWR | | spgwr | gwr()

1) Heteroscedastic model sphet

  • Parametric
    1. FGS2SLS: Feasible Generalized Spatial 2SLS
      A modification of the SARAR procedure, proposed by Kelejian and Prucha (2010).
      Hetero_SARAR <- gstslshet(Y~X, listw = W)
  • Non-parametric
    1. Spatial HAC: heteroscedasticity and autocorrelation consistent estimator
      The procedure was proposed by Kelejian and Prucha (2007).
      Spatial_HAC <- stslshac(Y~X, listw = W, distance = D, type = "Epanechnikov")

2) Discrete model McSpatial

  • A-spatial logit and probit
    1. Probit
      Probit <- glm(Y~X, family = binomial(link="probit"))
    2. Logit
      Logit <- glm(Y~X, family = binomial(link="logit"))
  • Spatial Probit
    1. ML (probit)
      The ML estimator has no closed-form solution and is computationally demanding.
      Probit_ML <- spprobitml(Y~X, wmat = Wdash, stdprobit = FALSE)
    2. GMM (probit)
      GMM was proposed by Pinkse and Slade (1998).
      GMM is computationally demanding in large samples.
      Starting value: rho = rho0 (\(|\rho_0|<1\))
      Probit_GMM <- gmmprobit(Y~X, wmat = Wdash, startrho = rho)
    3. LGMM (logit)
      LGMM was proposed by Klier and McMillen (2008).
      LGMM is less computationally demanding but less accurate.
      When \(\lambda<0.5\), there is no bias; otherwise, there is an upward bias.
      Probit_LGMM <- spprobit(Y~X, wmat = Wdash)
      (spprobit() fits the linearized probit; the logit counterpart in McSpatial is splogit().)
    • To create Wdash, the Spatial Probit model requires spdep::nb2mat() instead of spdep::nb2listw().
  • Logit vs Probit
    • Probit is more popular than Logit.
    • Anselin (2002) shows the error term of the spatial Logit is analytically intractable.
    • However, Smirnov (2010) shows that Probit is not easily extended to more than two alternatives.

3) Non-stationary spgwr

  • Scan statistics (parametric)
  • LWR: locally weighted regression (non-parametric)
    • LWR can be seen as a scan statistic technique to perform a regression around a point of interest using only a limited number of training data.
    • LWR was proposed by Cleveland and Devlin (1988).
    • LWR applies a kernel; thus, it produces smooth variation even though the regression estimates are calculated separately at each point.
  • GWR: geographically weighted regression (non-parametric)
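
The kernel-weighted local regression idea behind LWR and GWR can be sketched in base R as weighted least squares around a target location. The Gaussian kernel, the bandwidth `h`, the simulated coordinates and the helper name `local_fit` are all assumptions chosen for illustration; gwr() additionally handles bandwidth selection and fits at every observation.

```r
set.seed(6)
n <- 200
coords <- cbind(runif(n), runif(n))       # hypothetical point locations
x <- rnorm(n)
beta_true <- 1 + 2 * coords[, 1]          # slope drifts from west to east
y <- beta_true * x + rnorm(n, sd = 0.1)

# Weighted least squares around one target location (hypothetical helper)
local_fit <- function(target, h = 0.2) {
  d  <- sqrt((coords[, 1] - target[1])^2 + (coords[, 2] - target[2])^2)
  w  <- exp(-0.5 * (d / h)^2)             # Gaussian kernel weights
  Xd <- cbind(intercept = 1, x = x)
  # (X' W X)^{-1} X' W y with W = diag(w)
  drop(solve(crossprod(Xd, w * Xd), crossprod(Xd, w * y)))
}

west <- local_fit(c(0.1, 0.5))
east <- local_fit(c(0.9, 0.5))
```

Each call solves a separate regression, but because the weights change smoothly with the target location, the resulting local slopes vary smoothly as well.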


IV. Big data

under construction


Model test

The Model: REG <- lm(Y~X)

1) Homoscedasticity

  • BP test
    \(H_0\): Homoscedasticity (no heteroscedasticity)
    lmtest::bptest(REG)

2) Normality of the regression residual

  • JB test
    \(H_0\): The regression residuals are normally distributed
    tseries::jarque.bera.test(REG$residuals)
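
The two residual diagnostics above can be reproduced by hand in base R, which makes clear what each one measures. The sketch assumes simulated data and computes the studentised (Koenker) form of the BP statistic, \(n R^2\) from an auxiliary regression of the squared residuals on the regressors, and the JB statistic \(\frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)\) from the sample skewness \(S\) and kurtosis \(K\).

```r
set.seed(7)
n <- 300
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
REG <- lm(y ~ x)
e <- residuals(REG)

# BP (studentised version): regress squared residuals on the regressors
e2  <- e^2
aux <- lm(e2 ~ x)
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)   # df = number of regressors

# JB: based on skewness and excess kurtosis of the residuals
m2   <- mean((e - mean(e))^2)
skew <- mean((e - mean(e))^3) / m2^1.5
kurt <- mean((e - mean(e))^4) / m2^2
jb_stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
jb_p    <- pchisq(jb_stat, df = 2, lower.tail = FALSE)
```

A small `bp_p` points to heteroscedasticity, while a small `jb_p` points to non-normal residuals; the two tests answer different questions.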

3) Spatial autocorrelation of regression residuals

  • Moran I test, proposed by Moran (1950)
    \(H_0\): No spatial autocorrelation in the regression residuals
    [i] spdep::lm.morantest(REG, listw = W)
    [ii] spdep::moran.test(REG$residuals, listw = W)
    Both return the same Moran’s I statistic but different p-values.
  • LM and RLM test
    \(H_0\): \(\lambda=0\) for \(LM_{SLM}\), \(\rho=0\) for \(LM_{SEM}\)
    spdep::lm.LMtests(REG, listw = W, test = "all")

4) \(\lambda=\rho=0\)

  • Wald test
    \(H_0\): \(\lambda=\rho=0\)

5) Random vs Fixed for spatial panel

  • The Hausman test was proposed by Hausman (1978) and extended to spatial panels by Lee and Yu (2012).
    \(H_0\): Random model is true
    splm::sphtest(Y~X, data = data.frame, listw = W, spatial.model = c("error", "lag", "sarar"), method = c("ML", "GM"))
    When "ML" is chosen for method, errors = c("BSK", "KKP") can be specified.