Spatial Econometrics

I. Basic Models `spdep`

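NOTE: the definitions of $\lambda$ and $\rho$ in this sheet and in R are swapped. Here $\lambda$ is the spatial lag coefficient and $\rho$ is the spatial error coefficient, whereas the R output reports `rho` for the lag coefficient and `lambda` for the error coefficient.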

Concept

\[ y = \lambda W y + X \beta_{(1)} + WX\beta_{(2)} + u \qquad |\lambda|<1 \\ u = \rho W u + \epsilon \qquad\qquad\qquad\qquad\qquad |\rho|<1 \]
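To make the two equations concrete, here is a minimal base-R sketch that simulates one draw from the complete model. The ring-shaped weights matrix, the sample size, and all parameter values are hypothetical choices for illustration only; the notation follows this sheet ($\lambda$ for the lag, $\rho$ for the error).

```r
set.seed(1)
n <- 25

# Hypothetical weights: each unit has two neighbours on a ring, row-standardised
ring_W <- function(n) {
  W <- matrix(0, n, n)
  idx <- 1:n
  W[cbind(idx, c(n, 1:(n - 1)))] <- 1  # left neighbour
  W[cbind(idx, c(2:n, 1))] <- 1        # right neighbour
  W / rowSums(W)
}
W <- ring_W(n)

lambda <- 0.4   # spatial lag coefficient (sheet notation)
rho    <- 0.3   # spatial error coefficient (sheet notation)
beta1  <- 2
beta2  <- 0.5

X   <- rnorm(n)
WX  <- as.vector(W %*% X)
eps <- rnorm(n)

# u = rho W u + eps      =>  u = (I - rho W)^{-1} eps
u <- solve(diag(n) - rho * W, eps)
# y = lambda W y + X b1 + WX b2 + u  =>  y = (I - lambda W)^{-1} (X b1 + WX b2 + u)
y <- solve(diag(n) - lambda * W, X * beta1 + WX * beta2 + u)
```

Setting `lambda`, `rho`, or the betas to zero in this sketch reproduces the special cases listed in the table that follows.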

	Model	Condition	Estimator	Function
1	Pure spatial autoregressive	$\beta=0; \quad \lambda=0 \text{ or } \rho=0$	ML	`spautolm()`
2	Lagged independent variable model	$\lambda=\rho=0$
3	Spatial Error Model (SEM)	$\lambda = 0; \quad \rho \neq 0$	ML	`errorsarlm()`
			FGLS	`GMerrorsar()`
4	Spatial Lag Model (SLM)	$\lambda \neq 0; \quad \rho = 0$	ML	`lagsarlm()`
			2SLS	`stsls()`
5	Complete model (SARAR)	$\lambda \neq 0; \quad \rho \neq 0$	ML	`sacsarlm()`
			GS2SLS	`gstsls()`
			BFGS2SLS

1) Pure spatial autoregressive

Condition: $\quad \beta=0 \quad$ & either $\quad \lambda=0 \quad$ or $\quad \rho=0$
1. ML: Maximum Likelihood
  The procedure was developed by Whittle (1954).
  PSA_ML <- spautolm(Y~X, data = df, listw = W)
  \[ (i)\ \rho=0: \qquad y = \lambda W y + u \qquad\qquad\qquad\quad |\lambda|<1 \\ (ii)\ \lambda=0: \qquad y = u, \quad u = \rho W u + \epsilon \qquad |\rho|<1 \]

2) Lagged independent variable model

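Condition: $\quad \lambda=\rho=0$ \[ y = X \beta_{(1)} + WX\beta_{(2)} + u \] With $\rho=0$ the error reduces to $u=\epsilon$, so the model can be estimated by OLS with $WX$ as an additional regressor.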
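
As an illustration of the lagged independent variable model, the following base-R sketch builds $WX$ by hand and fits the model with `lm()`. The ring-shaped weights matrix and the simulated data are assumptions made only for this example.

```r
set.seed(2)
n <- 50

# Hypothetical row-standardised ring weights (two neighbours per unit)
ring_W <- function(n) {
  W <- matrix(0, n, n)
  idx <- 1:n
  W[cbind(idx, c(n, 1:(n - 1)))] <- 1
  W[cbind(idx, c(2:n, 1))] <- 1
  W / rowSums(W)
}
W <- ring_W(n)

X  <- rnorm(n)
WX <- as.vector(W %*% X)          # spatially lagged regressor
y  <- 1 + 2 * X + 0.5 * WX + rnorm(n, sd = 0.1)

# No lagged y and no spatial error, so plain OLS applies
SLX <- lm(y ~ X + WX)
coef(SLX)
```

The coefficient on `X` plays the role of $\beta_{(1)}$ and the coefficient on `WX` the role of $\beta_{(2)}$.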

3) Spatial Error Model (SEM)

Condition: $\quad \lambda = 0 \quad$ & $\quad \rho \neq 0$ \[ y = X \beta_{(1)} + WX\beta_{(2)} + u \\ u = \rho W u + \epsilon \qquad |\rho|<1 \]
SEM is discussed by Anselin (1988) and Arbia (2006).
1. ML: Maximum Likelihood
  Lee (2004) proves that the ML estimator is consistent and asymptotically normal.
  SEM_ML <- errorsarlm(Y~X, data = df, listw = W)
2. FGLS: Feasible GLS
  The procedure was developed by Kelejian and Prucha (1998).
  SEM_FGLS <- GMerrorsar(Y~X, data = df, listw = W)

4) Spatial Lag Model (SLM)

Condition: $\quad \lambda \neq 0 \quad$ & $\quad \rho = 0$ \[ y = \lambda W y + X \beta_{(1)} + WX\beta_{(2)} + u \qquad |\lambda|<1 \]
SLM is discussed by Anselin (1988) and Arbia (2006).
1. ML: Maximum Likelihood
  Assumption: normality of the residuals (checked with the JB test)
  SLM_ML <- lagsarlm(Y~X, data = df, listw = W)
2. 2SLS: Two-Stage Least Squares
  Does not assume normality of the residuals (so it remains usable when the JB test rejects).
  SLM_2SLS <- stsls(Y~X, data = df, listw = W)
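
To show what the two stages do, here is a hand-rolled base-R sketch of spatial 2SLS for the lag model without the $WX$ term. It is not the internal code of `stsls()`; the instrument set ($X$, $WX$, $W^2X$), the ring-shaped weights, and the simulated data are assumptions for illustration.

```r
set.seed(3)
n <- 200

# Hypothetical row-standardised ring weights
ring_W <- function(n) {
  W <- matrix(0, n, n)
  idx <- 1:n
  W[cbind(idx, c(n, 1:(n - 1)))] <- 1
  W[cbind(idx, c(2:n, 1))] <- 1
  W / rowSums(W)
}
W <- ring_W(n)

X      <- rnorm(n)
lambda <- 0.5                                   # true lag coefficient (sheet notation)
y      <- solve(diag(n) - lambda * W, 1 + 2 * X + rnorm(n))

Wy  <- as.vector(W %*% y)                       # endogenous regressor
WX  <- as.vector(W %*% X)                       # instrument
W2X <- as.vector(W %*% WX)                      # instrument

# Stage 1: project Wy on the exogenous variables and instruments
stage1 <- lm(Wy ~ X + WX + W2X)
Wy_hat <- fitted(stage1)

# Stage 2: replace Wy by its fitted values
SLM_2SLS_manual <- lm(y ~ X + Wy_hat)
coef(SLM_2SLS_manual)                           # last coefficient estimates lambda
```

The point estimates from this sketch are the 2SLS estimates, but the standard errors printed by the second `lm()` are not valid because they ignore the first stage; a dedicated routine should be used for inference.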

5) Complete model (SARAR)

Condition: $\quad \lambda \neq 0 \quad$ & $\quad \rho \neq 0$ \[ y = \lambda W y + X \beta_{(1)} + WX\beta_{(2)} + u \qquad |\lambda|<1 \\ u = \rho W u + \epsilon \qquad\qquad\qquad\qquad\qquad |\rho|<1 \]
SARAR is referred to as SARAR(1,1) by Kelejian and Prucha (1998) and as the General Spatial Model by Anselin (1988).
1. ML: Maximum Likelihood
  Limitation: there is currently no formal proof that the estimators possess the usual optimal large-sample properties.
  Assumption: normality of the residuals (checked with the JB test)
  SARAR_ML <- sacsarlm(Y~X, data = df, listw = W)
2. GS2SLS: Generalized Spatial Two-Stage Least Squares
  An extension of 2SLS proposed by Kelejian and Prucha (1998).
  Does not assume normality of the residuals (so it remains usable when the JB test rejects).
  Limitation: the estimator is not asymptotically fully efficient.
  SARAR_GS2SLS <- gstsls(Y~X, data = df, listw = W)
3. BFGS2SLS: Best Feasible GS2SLS
  An extension of GS2SLS proposed by Lee (2003), also known as LIV (Lee’s Instrumental Variable).
  Does not assume normality of the residuals (so it remains usable when the JB test rejects).
  Limitation: numerically challenging in very large samples.
- Kelejian et al. (2004) show that BFGS2SLS and its simplified version do not differ substantially from GS2SLS in terms of efficiency in small samples.

Model test

Moran's I test: SEM vs SLM
- $H_0$: No spatial autocorrelation in the regression residuals
- There is no explicit alternative hypothesis against which the null of no autocorrelation is contrasted.
- Moran's I test was proposed by Moran (1950).
1. spdep::lm.morantest(OLS, listw = W)
2. spdep::moran.test(OLS$residuals, listw = W)
  Both give the same value of the Moran’s I statistic but different p-values, because only lm.morantest() accounts for the residuals coming from a fitted regression.
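
For reference, the statistic itself is $I = \frac{n}{S_0}\frac{e'We}{e'e}$, where $e$ is the residual vector and $S_0$ is the sum of all weights. The base-R sketch below computes it by hand; the ring-shaped weights and the simulated regression are hypothetical, and the sketch does not reproduce the p-values of either spdep function.

```r
set.seed(4)
n <- 100

# Hypothetical row-standardised ring weights
ring_W <- function(n) {
  W <- matrix(0, n, n)
  idx <- 1:n
  W[cbind(idx, c(n, 1:(n - 1)))] <- 1
  W[cbind(idx, c(2:n, 1))] <- 1
  W / rowSums(W)
}
W <- ring_W(n)

X   <- rnorm(n)
y   <- 1 + 2 * X + rnorm(n)
OLS <- lm(y ~ X)

e  <- residuals(OLS)
S0 <- sum(W)                                   # equals n when W is row-standardised
moran_I <- (n / S0) * as.numeric(t(e) %*% W %*% e) / sum(e^2)
moran_I
```

With a row-standardised $W$ the factor $n/S_0$ equals one, so the statistic reduces to the slope of $We$ on $e$.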
LM and RLM test
- $H_0$: $\lambda=0$ for $LM_{SLM}$, $\rho=0$ for $LM_{SEM}$
- LM stands for Lagrange Multiplier; its robust version (RLM) was proposed by Anselin et al. (1996).
  - spdep::lm.LMtests(OLS, listw = W, test = "all")
Wald test
- $H_0$: $\lambda=\rho=0$

Interpretation of parameters in spatial econometrics

LeSage and Pace (2009) introduce the following impact measures:
1. ADI: average direct impact
2. AII: average indirect impact
3. ATI: average total impact
4. ATIT: average total impact to an observation
5. ATIF: average total impact from an observation
- spdep::impacts(SLM/SARAR, listw = W)
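
The impact measures can be illustrated by hand for a spatial lag model without the $WX$ term, where the matrix of partial derivatives of $y$ with respect to one regressor is $S(W) = (I - \lambda W)^{-1}\beta$. The base-R sketch below uses hypothetical values of $\lambda$ and $\beta$ and a hypothetical ring-shaped weights matrix; it is meant as an aid to interpretation, not as a replacement for `impacts()`.

```r
n <- 30

# Hypothetical row-standardised ring weights
ring_W <- function(n) {
  W <- matrix(0, n, n)
  idx <- 1:n
  W[cbind(idx, c(n, 1:(n - 1)))] <- 1
  W[cbind(idx, c(2:n, 1))] <- 1
  W / rowSums(W)
}
W <- ring_W(n)

lambda <- 0.4   # spatial lag coefficient (sheet notation), assumed value
beta   <- 2     # coefficient of one regressor, assumed value

# S[i, j] = effect on y_i of a unit change in x_j
S <- beta * solve(diag(n) - lambda * W)

ADI <- mean(diag(S))        # average direct impact
ATI <- sum(S) / n           # average total impact
AII <- ATI - ADI            # average indirect impact

total_to   <- rowSums(S)    # total impact to each observation (averaging gives ATIT)
total_from <- colSums(S)    # total impact from each observation (averaging gives ATIF)

c(ADI = ADI, AII = AII, ATI = ATI)
```

With a row-standardised $W$, the total impact equals $\beta/(1-\lambda)$, which is why the coefficient $\beta$ alone understates the effect of a regressor in a lag model.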

Further material
- Arbia (2011) provides a review for spatial panel analysis.

II. Panel Data `splm`

Concept

Basic panel data model: \[ y_{it} = \alpha + X_{it} \beta + u_{it} = \alpha + X_{it} \beta + (\mu_i + \epsilon_{it}) \] where $\mu_i$ is the individual error component and $\epsilon_{it}$ is the idiosyncratic error component.
1. Pooling model
  Condition: $\mu_i$ (individual error component) is the same for all individuals $i$.
  OLS_pooled <- plm(Y~X, data = df, model = "pooling")
2. Fixed effects model
  Condition: $\mu_i$ (individual error component) is correlated with the regressors.
  OLS_within <- plm(Y~X, data = df, model = "within")
3. Random effects model
  Condition: $\mu_i$ (individual error component) is uncorrelated with the regressors.
  FGLS_random <- plm(Y~X, data = df, model = "random")
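
To show why the within model is preferred when $\mu_i$ is correlated with the regressors, the following base-R sketch demeans the data by individual and compares the result with dummy-variable OLS and pooled OLS. The panel dimensions and the data-generating process are hypothetical.

```r
set.seed(5)
N  <- 10                     # individuals
T_ <- 5                      # time periods
id <- rep(1:N, each = T_)

mu <- rnorm(N)[id]           # individual effect, repeated over time
X  <- rnorm(N * T_) + mu     # regressor correlated with mu
Y  <- 1 + 2 * X + mu + rnorm(N * T_, sd = 0.1)

# Within transformation: subtract individual means
X_w <- X - ave(X, id)
Y_w <- Y - ave(Y, id)
b_within <- unname(coef(lm(Y_w ~ X_w - 1)))

# Equivalent estimate from OLS with one dummy per individual
b_lsdv <- unname(coef(lm(Y ~ X + factor(id)))["X"])

# Pooled OLS ignores mu and is biased in this setup
b_pooled <- unname(coef(lm(Y ~ X))["X"])

c(within = b_within, lsdv = b_lsdv, pooled = b_pooled)
```

The within and dummy-variable slopes coincide, while the pooled slope absorbs part of the individual effect.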

	Model	Estimator	Function	`model =`	`spatial.error =`	`lag =`
1	SEM-RE	ML	`spml()`	`"random"`, `"within"`	`"b"`	`FALSE`
	KKP (SEM)	ML	`spml()`	`"random"`, `"within"`	`"kkp"`	`FALSE`
		GM	`spgm()`	`"random"`, `"within"`	`TRUE`	`FALSE`
2	SLM-RE	ML	`spml()`	`"random"`, `"within"`	`"none"`	`TRUE`
		GM	`spgm()`	`"random"`, `"within"`	`FALSE`	`TRUE`
3	SARAR	ML	`spml()`	`"random"`, `"within"`	`"b"`, `"kkp"`	`TRUE`
		GM	`spgm()`	`"random"`, `"within"`	`TRUE`	`TRUE`

Note. 1) spml() and spgm() are used for the ML and GM estimators, respectively. 2) listw = W and data = df (a data frame) should be specified in each model. 3) model = c("random", "within") should be chosen according to the Hausman test.

1) Random effects

SEM: Spatial Error Model
1. SEM-RE: spatial error model with random effects \[ \epsilon_t = \rho W \epsilon_t + \eta_t \] SEM-RE considers only $\epsilon_{it}$ (idiosyncratic error component).
  SEM_RE_ML <- spml(Y~X, listw = W, data = df, model = "random", spatial.error = "b", lag = FALSE)
2. KKP \[ u = \mu + \epsilon = \rho (I_T \otimes W)u + \eta \] KKP considers both $\mu_i$ (individual error component) and $\epsilon_{it}$ (idiosyncratic error component).
  KKP was proposed by Kapoor et al. (2007).
  KKP_ML <- spml(Y~X, listw = W, data = df, model = "random", spatial.error = "kkp", lag = FALSE)
  KKP_GM <- spgm(Y~X, listw = W, data = df, model = "random", spatial.error = TRUE, lag = FALSE)
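
The Kronecker product in the KKP specification applies the same $W$ in every period. As a small base-R sketch, assuming observations are stacked period by period (all $N$ units for $t=1$, then $t=2$, and so on) and using a hypothetical four-unit ring for $W$:

```r
set.seed(6)
N  <- 4                      # spatial units
T_ <- 3                      # time periods

# Hypothetical row-standardised ring weights
ring_W <- function(n) {
  W <- matrix(0, n, n)
  idx <- 1:n
  W[cbind(idx, c(n, 1:(n - 1)))] <- 1
  W[cbind(idx, c(2:n, 1))] <- 1
  W / rowSums(W)
}
W <- ring_W(N)

# I_T (x) W: block-diagonal NT x NT matrix with W repeated T times
IW <- kronecker(diag(T_), W)

rho <- 0.3                   # spatial error coefficient (sheet notation), assumed value
eta <- rnorm(N * T_)

# u = rho (I_T (x) W) u + eta  =>  u = (I - rho (I_T (x) W))^{-1} eta
u <- solve(diag(N * T_) - rho * IW, eta)
dim(IW)
```

Because the matrix is block diagonal, spatial dependence in this sketch operates within each period only.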
SLM: Spatial Lag Model
1. SLM-RE: spatial lag model with random effects
  SLM_RE_ML <- spml(Y~X, listw = W, data = df, model = "random", spatial.error = "none", lag = TRUE)
  SLM_RE_GM <- spgm(Y~X, listw = W, data = df, model = "random", spatial.error = FALSE, lag = TRUE)

2) Fixed effects

SEM:
1. SEM-FE
  SEM_FE_ML <- spml(Y~X, listw = W, data = df, model = "within", spatial.error = "b", lag = FALSE)
2. KKP-FE
  KKP_FE_ML <- spml(Y~X, listw = W, data = df, model = "within", spatial.error = "kkp", lag = FALSE)
  KKP_FE_GM <- spgm(Y~X, listw = W, data = df, model = "within", spatial.error = TRUE, lag = FALSE)
SLM:
1. SLM-FE
  SLM_FE_ML <- spml(Y~X, listw = W, data = df, model = "within", spatial.error = "none", lag = TRUE)
  SLM_FE_GM <- spgm(Y~X, listw = W, data = df, model = "within", spatial.error = FALSE, lag = TRUE)

Model test and estimator

Hausman test: Random vs Fixed
- The Hausman test was proposed by Hausman (1978) and extended to spatial panels by Lee and Yu (2012).
- $H_0$: Random model is true
  splm::sphtest(Y~X, data = df, listw = W, spatial.model = c("error", "lag", "sarar"), method = c("ML", "GM"))
  When "ML" is chosen for method, errors = c("BSK", "KKP") can be specified.
Estimator for spatial panel data: ML vs GM
1. ML: Maximum Likelihood
  ML estimation of SEM and SLM for spatial panels was proposed by Elhorst (2003).
  $+$ most efficient when all distributional assumptions are met
  $-$ large computational demand
2. GM: Generalized Moments
  $+$ less computational demand
  $+$ relaxed normality assumption compared with ML -> more robust
Further material
- Baltagi and Pesaran (2007), Lee and Yu (2010), and Lee and Yu (2011) provide reviews of spatial panel analysis.
- Millo and Piras (2012) provide further detail on the splm package for spatial econometrics.

III. Advanced

	Type	Model	Estimator	Package	Function
1	Heteroscedastic model	FGS2SLS		`sphet`	`gstslshet()`
		Spatial HAC		`sphet`	`stslshac()`
2	Discrete model	a-Spatial		`stats`	`glm()`
		Spatial Probit	ML	`McSpatial`	`spprobitml()`
			GMM	`McSpatial`	`gmmprobit()`
			LGMM	`McSpatial`	`spprobit()`
3	Non-stationary	GWR		`spgwr`	`gwr()`

1) Heteroscedastic model `sphet`

Parametric
1. FGS2SLS: Feasible Generalized Spatial 2SLS
  Modifying SARAR, the procedure is proposed by Kelejian and Prucha (2010).
  Hetero_SARAR <- gstslshet(Y~X, listw = W)
Non-parametric
1. Spatial HAC: heteroscedasticity and autocorrelation consistent estimation
  The procedure was proposed by Kelejian and Prucha (2007).
  Spatial_HAC <- stslshac(Y~X, listw = W, distance = D, type = "Epanechnikov")

2) Discrete model `McSpatial`

a-spatial logit and probit
1. Probit
  Probit <- glm(Y~X, family = binomial(link="probit"))
2. Logit
  Logit <- glm(Y~X, family = binomial(link="logit"))
Spatial Probit
1. ML (probit)
  The ML estimator cannot be found analytically and is computationally demanding.
  Probit_ML <- spprobitml(Y~X, wmat = Wdash, stdprobit = FALSE)
2. GMM (probit)
  GMM is proposed by Pinkse and Slade (1998).
  GMM is computationally demanding in large samples.
  rho = rho0 ($|\rho_0|<1$), the starting value passed to startrho
  Probit_GMM <- gmmprobit(Y~X, wmat = Wdash, startrho = rho)
3. LGMM (logit)
  LGMM was proposed by Klier and McMillen (2008).
  LGMM is less computationally demanding but less accurate.
  When $\lambda<0.5$, there is no bias; otherwise, there is an upward bias.
  Probit_LGMM <- spprobit(Y~X, wmat = Wdash)
- To create Wdash, the spatial probit functions require a weights matrix from spdep::nb2mat() instead of a listw object from spdep::nb2listw().
Logit vs Probit
- Probit is more popular than Logit.
- Anselin (2002) shows the error term of the spatial Logit is analytically intractable.
- However, Smirnov (2010) shows that Probit is not easy to extend to more than two alternatives.

3) Non-stationary `spgwr`

Scan statistics (parametric)
LWR: locally weighted regression (non-parametric)
- LWR can be seen as a scan statistic technique to perform a regression around a point of interest using only a limited number of training data.
- LWR was proposed by Cleveland and Devlin (1988).
- LWR applies a kernel and thus produces smoothly varying estimates, even though each local regression is estimated separately.
GWR: geographically weighted regression (non-parametric)
- GWR is a particular case of LWR.
- GWR applies geographical space as a selection criterion.
- Articles based on GWR were published by Brunsdon et al. (1996), Fotheringham et al. (1998), Fotheringham et al. (2002), Fotheringham et al. (2007), McMillen and McDonald (1997), and McMillen and McDonald (2004).
  - Calibrate bandwidth
    bw <- gwr.sel(Y~X, coords = coords, gweight = gwr.Gauss, adapt = TRUE)
    adapt = TRUE calibrates an adaptive kernel (the proportion of observations used as neighbours); adapt = FALSE calibrates a fixed global bandwidth.
  - Regression
    GWR <- gwr(Y~X, coords = coords, adapt = bw, hatmatrix = TRUE)
  - Mapping
    plot(GWR$SDF, col = cols[findInterval(GWR$SDF$X, brks, all.inside=TRUE)])
  - Moran’s I test
    Null of no spatial autocorrelation in regression residual
    gwr.morantest(GWR, W)
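
The idea behind GWR can be sketched in base R as one weighted regression per location, with weights that decay with distance. The example below is a simplified illustration, not the implementation in spgwr: the coordinates, the data, and the fixed bandwidth of 0.2 are hypothetical, and no bandwidth calibration is performed.

```r
set.seed(7)
n <- 100
coords <- cbind(runif(n), runif(n))        # hypothetical point locations

X      <- rnorm(n)
b_true <- 1 + 2 * coords[, 1]              # slope drifts from west to east
Y      <- b_true * X + rnorm(n, sd = 0.1)

bw <- 0.2                                  # assumed fixed bandwidth

# One weighted least-squares fit per location
local_slope <- unname(sapply(1:n, function(i) {
  d <- sqrt(rowSums((coords - matrix(coords[i, ], n, 2, byrow = TRUE))^2))
  w <- exp(-0.5 * (d / bw)^2)              # Gaussian kernel weights
  coef(lm(Y ~ X, weights = w))["X"]
}))

summary(local_slope)
```

Each element of `local_slope` is the slope estimated around one point, so mapping these values shows how the relationship varies over space.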

IV. Big data

under construction

Model test

The Model: REG <- lm(Y~X)

1) Homoscedasticity

BP test
$H_0$: Homoscedasticity (constant error variance)
lmtest::bptest(REG)
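
As a rough illustration of what the test measures, the studentized (Koenker) form of the BP statistic can be computed by hand as $n R^2$ from an auxiliary regression of the squared residuals on the regressors. The simulated data below are hypothetical, with an error variance that depends on $X$ by construction.

```r
set.seed(8)
n <- 200
X <- rnorm(n)
Y <- 1 + 2 * X + rnorm(n, sd = exp(0.5 * X))   # heteroscedastic errors by design

REG <- lm(Y ~ X)

# Auxiliary regression of squared residuals on the regressors
aux <- lm(residuals(REG)^2 ~ X)
BP  <- n * summary(aux)$r.squared

# Under H0 the statistic is chi-squared with one df per regressor
p_value <- pchisq(BP, df = 1, lower.tail = FALSE)
c(BP = BP, p_value = p_value)
```

A small p-value leads to rejecting homoscedasticity.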

2) Normality of the regression residual

JB test
$H_0$: The regression residuals are normally distributed
tseries::jarque.bera.test(REG$residuals)
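
The JB statistic combines the sample skewness $S$ and kurtosis $K$ of the residuals as $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$. A minimal base-R sketch on hypothetical simulated data:

```r
set.seed(9)
n <- 150
X <- rnorm(n)
Y <- 1 + 2 * X + rnorm(n)

REG <- lm(Y ~ X)
e   <- residuals(REG)

# Central moments of the residuals
m2 <- mean((e - mean(e))^2)
m3 <- mean((e - mean(e))^3)
m4 <- mean((e - mean(e))^4)

S <- m3 / m2^1.5          # skewness
K <- m4 / m2^2            # kurtosis

JB <- n / 6 * (S^2 + (K - 3)^2 / 4)

# Under H0 the statistic is asymptotically chi-squared with 2 df
p_value <- pchisq(JB, df = 2, lower.tail = FALSE)
c(JB = JB, p_value = p_value)
```

Large values of the statistic indicate skewness or kurtosis that departs from the normal distribution.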

3) Spatial autocorrelation for regression residuals

Moran's I test, proposed by Moran (1950)
$H_0$: No spatial autocorrelation in the regression residuals
[i] spdep::lm.morantest(REG, listw = W)
[ii] spdep::moran.test(REG$residuals, listw = W)
Both give the same value of the Moran’s I statistic but different p-values.
LM and RLM test
$H_0$: $\lambda=0$ for $LM_{SLM}$, $\rho=0$ for $LM_{SEM}$
spdep::lm.LMtests(REG, listw = W, test = "all")

4) $\lambda=\rho=0$

Wald test
$H_0$: $\lambda=\rho=0$

5) Random vs Fixed for spatial panel

The Hausman test was proposed by Hausman (1978) and extended to spatial panels by Lee and Yu (2012).
$H_0$: Random model is true
splm::sphtest(Y~X, data = df, listw = W, spatial.model = c("error", "lag", "sarar"), method = c("ML", "GM"))
When "ML" is chosen for method, errors = c("BSK", "KKP") can be specified.

Spatial Econometrics

Takuya Shimamura

2020-07-06

