I. Basic Models spdep
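NOTE: this sheet and R use \(\lambda\) and \(\rho\) in opposite roles. Here \(\lambda\) is the spatial lag coefficient and \(\rho\) is the spatial error coefficient, whereas the R output reports rho for the lag and lambda for the error.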
Concept
\[
y = \lambda W y + X \beta_{(1)} + WX\beta_{(2)} + u \qquad |\lambda|<1 \\
u = \rho W u + \epsilon \qquad\qquad\qquad\qquad\qquad |\rho|<1
\]
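As a worked step, the two equations above can be combined into a single reduced form. This is a sketch that assumes \(W\) is row-standardized, so that \(I - \lambda W\) and \(I - \rho W\) are invertible whenever \(|\lambda|<1\) and \(|\rho|<1\); \(I\) denotes the identity matrix, which the sheet does not introduce explicitly.
\[
u = (I - \rho W)^{-1} \epsilon \\
y = (I - \lambda W)^{-1} \left[ X \beta_{(1)} + WX\beta_{(2)} + (I - \rho W)^{-1} \epsilon \right]
\]
Setting \(\lambda\), \(\rho\), or \(\beta\) to zero in this expression gives the special cases listed below.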
| No. | Model | Condition | Estimator | Function |
|---|---|---|---|---|
| 1 | Pure spatial autoregressive | \(\beta=0\); \(\lambda=0\) or \(\rho=0\) | ML | spautolm() |
| 2 | Lagged independent variable model | \(\lambda=\rho=0\) | | |
| 3 | Spatial Error Model (SEM) | \(\lambda = 0; \quad \rho \neq 0\) | ML | errorsarlm() |
| | | | FGLS | GMerrorsar() |
| 4 | Spatial Lag Model (SLM) | \(\lambda \neq 0; \quad \rho = 0\) | ML | lagsarlm() |
| | | | 2SLS | stsls() |
| 5 | Complete model (SARAR) | \(\lambda \neq 0; \quad \rho \neq 0\) | ML | sacsarlm() |
| | | | GS2SLS | gstsls() |
| | | | BFGS2SLS | |
1) Pure spatial autoregressive
- Condition: \(\quad \beta=0 \quad\) & either \(\lambda=0\) or \(\rho=0\)
- ML: Maximum Likelihood
The procedure was developed by Whittle (1954).
PSA_ML <- spautolm(Y~X, data = df, listw = W)
\[
(i)\qquad y = \lambda W y + \epsilon \qquad\qquad\qquad |\lambda|<1 \quad (\rho=0) \\
(ii)\qquad y = u, \quad u = \rho W u + \epsilon \qquad |\rho|<1 \quad (\lambda=0)
\]
2) Lagged independent variable model
- Condition: \(\quad \lambda=\rho=0\) \[
y = X \beta_{(1)} + WX\beta_{(2)} + u
\]
3) Spatial Error Model (SEM)
- Condition: \(\quad \lambda = 0 \quad\) & \(\quad \rho \neq 0\) \[
y = X \beta_{(1)} + WX\beta_{(2)} + u \\
u = \rho W u + \epsilon \quad\qquad\qquad\qquad |\rho|<1
\]
- SEM is discussed by Anselin (1988) and Arbia (2006).
- ML: Maximum Likelihood
Lee (2004) proves that the ML estimator is consistent and asymptotically normal.
SEM_ML <- errorsarlm(Y~X, data = df, listw = W)
- FGLS: Feasible GLS
The procedure was developed by Kelejian and Prucha (1998).
SEM_FGLS <- GMerrorsar(Y~X, data = df, listw = W)
4) Spatial Lag Model (SLM)
- Condition: \(\quad \lambda \neq 0 \quad\) & \(\quad \rho = 0\) \[
y = \lambda W y + X \beta_{(1)} + WX\beta_{(2)} + u \qquad |\lambda|<1
\]
- SLM is discussed by Anselin (1988) and Arbia (2006).
- ML: Maximum Likelihood
Assumption: normality of the residuals (JB test)
SLM_ML <- lagsarlm(Y~X, data = df, listw = W)
- 2SLS: Two-Stage Least Squares
No assumption of normality of the residuals (JB test)
SLM_2SLS <- stsls(Y~X, data = df, listw = W)
5) Complete model (SARAR)
- Condition: \(\quad \lambda \neq 0 \quad\) & \(\quad \rho \neq 0\) \[
y = \lambda W y + X \beta_{(1)} + WX\beta_{(2)} + u \qquad |\lambda|<1 \\
u = \rho W u + \epsilon \qquad\qquad\qquad\qquad\qquad |\rho|<1
\]
- SARAR is referred to as SARAR(1,1) by Kelejian and Prucha (1998) and as the General Spatial Model by Anselin (1988).
- ML: Maximum Likelihood
Limitation: there is currently no formal proof that the estimators possess the usual optimal large-sample properties.
Assumption: normality of the residuals (JB test)
SARAR_ML <- sacsarlm(Y~X, data = df, listw = W)
- GS2SLS: Generalized Spatial Two-Stage Least Squares
An extension of 2SLS proposed by Kelejian and Prucha (1998).
No assumption of normality of the residuals (JB test)
Limitation: not asymptotically fully efficient.
SARAR_GS2SLS <- gstsls(Y~X, data = df, listw = W)
- BFGS2SLS: Best Feasible GS2SLS
An extension of GS2SLS proposed by Lee (2003), also known as LIV (Lee’s Instrumental Variable).
No assumption of normality of the residuals (JB test)
Limitation: numerically challenging in very large samples.
- Kelejian et al. (2004) show that BFGS2SLS and its simplified version do not differ substantially from GS2SLS in efficiency in small samples.
Model test
- Moran I test: SEM vs SLM
- \(H_0\): No spatial autocorrelation in regression residual
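- The test has no explicit alternative hypothesis to contrast with the null of no autocorrelation.
- Moran I test is proposed by Moran (1950).
spdep::lm.morantest(OLS, listw = W)
spdep::moran.test(OLS$residuals, listw = W)
Here OLS is the model fitted with lm(Y~X). Both calls return the same Moran’s I statistic but different p-values, because lm.morantest() takes into account that the residuals come from a fitted regression.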
- LM and RLM test
- \(H_0\): \(\lambda=0\) for \(LM_{SLM}\), \(\rho=0\) for \(LM_{SEM}\)
- LM stands for Lagrange Multiplier and its robust version is proposed by Anselin et al. (1996).
spdep::lm.LMtests(OLS, listw = W, test = "all")
- Wald test
- \(H_0\): \(\lambda=\rho=0\)
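The Moran's I statistic used in the tests above can be sketched in base R as \(I = (n/S_0)\, e'We / e'e\), where \(e\) are the OLS residuals and \(S_0\) is the sum of all weights. This is only an illustration under stated assumptions: the 5 x 5 rook grid, the simulated data, and the names `grid`, `D`, `S0`, and `moran_I` are hypothetical and not part of the sheet, and no p-value is computed.

```r
# Minimal base-R sketch: Moran's I of OLS residuals on a hypothetical 5 x 5 grid.
set.seed(1)
side <- 5
n <- side^2
grid <- expand.grid(row = 1:side, col = 1:side)

# Rook contiguity: two cells are neighbours when their Manhattan distance is 1
D <- as.matrix(dist(grid, method = "manhattan"))
W <- (D == 1) * 1
W <- W / rowSums(W)          # row-standardise so that every row sums to 1

# Simulated data (assumption: no spatial structure is built in)
X <- rnorm(n)
Y <- 1 + 2 * X + rnorm(n)
e <- residuals(lm(Y ~ X))

# Moran's I = (n / S0) * (e' W e) / (e' e)
S0 <- sum(W)
moran_I <- (n / S0) * as.numeric(t(e) %*% W %*% e) / sum(e^2)
print(moran_I)
```

With a row-standardised W the factor n / S0 equals 1, so the statistic reduces to the ratio of the two quadratic forms.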
Interpretation of parameters in spatial econometrics
- LeSage and Pace (2009) introduce the impact measures
- ADI: average direct impact
- AII: average indirect impact
- ATI: average total impact
- ATIT: average total impact to an observation
- ATIF: average total impact from an observation
spdep::impacts(SLM_ML, listw = W)  # SARAR_ML can be passed in the same way
- Further material
- Arbia (2011) provides a review of spatial panel analysis.
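The impact measures of LeSage and Pace (2009) listed above can be illustrated with a small base-R sketch. It assumes a pure spatial lag model without the \(WX\) term, so that the matrix of partial derivatives is \((I - \lambda W)^{-1}\beta\). The ring-shaped neighbourhood and the values of `lambda` and `beta` are hypothetical choices made for illustration only.

```r
# Minimal base-R sketch of the impact measures for y = lambda * W y + x * beta + e.
n <- 6

# Hypothetical ring neighbourhood: each unit has its two adjacent units as neighbours
W <- matrix(0, n, n)
for (i in 1:n) {
  W[i, ifelse(i == 1, n, i - 1)] <- 1
  W[i, ifelse(i == n, 1, i + 1)] <- 1
}
W <- W / rowSums(W)                      # row-standardise

lambda <- 0.4                            # spatial lag coefficient (sheet notation)
beta   <- 2                              # coefficient of the single regressor

# S[i, j] = d y_i / d x_j
S <- solve(diag(n) - lambda * W) * beta

ADI  <- mean(diag(S))                    # average direct impact
ATI  <- sum(S) / n                       # average total impact
AII  <- ATI - ADI                        # average indirect impact
ATIT <- mean(rowSums(S))                 # average total impact to an observation
ATIF <- mean(colSums(S))                 # average total impact from an observation
print(c(ADI = ADI, AII = AII, ATI = ATI))
```

Because W is row-standardised, ATI equals beta / (1 - lambda) in this setting, which gives a quick check on the output of impacts().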
II. Panel Data splm
Concept
- Basic for panel data \[
y_{it} = \alpha + X_{it} \beta + u_{it} = \alpha + X_{it} \beta + (\mu_i + \epsilon_{it})
\] where \(\mu_i\) is the individual error component and \(\epsilon_{it}\) is the idiosyncratic error component.
- Pooling model
Condition: \(\mu_i\) (individual error component) is the same for all individuals i.
OLS_pooled <- plm(Y~X, data = df, model = "pooling")
- Fixed model
Condition: \(\mu_i\) (individual error component) is correlated with the regressors.
OLS_within <- plm(Y~X, data = df, model = "within")
- Random model
Condition: \(\mu_i\) (individual error component) is uncorrelated with the regressors.
FGLS_random <- plm(Y~X, data = df, model = "random")
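To make the difference between the pooling and fixed models above concrete, the following base-R sketch applies the within transformation by hand. It is an illustration under assumptions: the simulated panel, the true slope of 0.5, and all variable names are hypothetical, and plm itself is not called.

```r
# Minimal base-R sketch: within (fixed effects) estimator by demeaning.
set.seed(123)
n_id   <- 4                              # number of individuals
n_time <- 5                              # number of periods
id <- rep(1:n_id, each = n_time)

mu <- rnorm(n_id)[id]                    # individual error component mu_i
X  <- rnorm(n_id * n_time) + mu          # regressor correlated with mu_i
Y  <- 1 + 0.5 * X + mu + rnorm(n_id * n_time, sd = 0.1)

# Within transformation: subtract individual means, which removes mu_i
Xw <- X - ave(X, id)
Yw <- Y - ave(Y, id)

beta_within <- coef(lm(Yw ~ Xw - 1))[["Xw"]]
beta_lsdv   <- coef(lm(Y ~ X + factor(id)))[["X"]]   # dummy-variable equivalent
beta_pooled <- coef(lm(Y ~ X))[["X"]]                # ignores mu_i
print(c(within = beta_within, lsdv = beta_lsdv, pooled = beta_pooled))
```

The within and dummy-variable slopes coincide, while the pooled slope absorbs the correlation between \(\mu_i\) and the regressor.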
| No. | Model | Estimator | Function | model | spatial.error | lag |
|---|---|---|---|---|---|---|
| 1 | SEM-RE | ML | spml() | "random", "within" | "b" | FALSE |
| | KKP (SEM) | ML | spml() | "random", "within" | "kkp" | FALSE |
| | | GM | spgm() | "random", "within" | TRUE | FALSE |
| 2 | SLM-RE | ML | spml() | "random", "within" | "none" | TRUE |
| | | GM | spgm() | "random", "within" | FALSE | TRUE |
| 3 | SARAR | ML | spml() | "random", "within" | "b", "kkp" | TRUE |
| | | GM | spgm() | "random", "within" | TRUE | TRUE |
Note. 1) spml() and spgm() are used for the ML and GM estimators, respectively. 2) listw = W and data = df should be specified in each model. 3) model = "random" or model = "within" should be chosen according to the Hausman test.
1) Random effects
- SEM: Spatial Error Model
SEM-RE: spatial error model with random effects \[
\epsilon_t = \rho W \epsilon_t + \eta_t
\] SEM-RE considers only \(\epsilon_{it}\) (idiosyncratic error component).
SEM_RE_ML <- spml(Y~X, listw = W, data = df, model = "random", spatial.error = "b", lag = FALSE)
KKP \[
u = \mu + \epsilon = \rho (I_T \otimes W)u + \eta
\] KKP considers both \(\mu_i\) (individual error component) and \(\epsilon_{it}\) (idiosyncratic error component).
KKP is proposed by Kapoor et al. (2007).
KKP_ML <- spml(Y~X, listw = W, data = df, model = "random", spatial.error = "kkp", lag = FALSE)
KKP_GM <- spgm(Y~X, listw = W, data = df, model = "random", spatial.error = TRUE, lag = FALSE)
- SLM: Spatial Lag Model
- SLM-RE: spatial lag model with random effects
SLM_RE_ML <- spml(Y~X, listw = W, data = df, model = "random", spatial.error = "none", lag = TRUE)
SLM_RE_GM <- spgm(Y~X, listw = W, data = df, model = "random", spatial.error = FALSE, lag = TRUE)
2) Fixed effects
- SEM:
- SEM-FE
SEM_FE_ML <- spml(Y~X, listw = W, data = df, model = "within", spatial.error = "b", lag = FALSE)
- KKP-FE
KKP_FE_ML <- spml(Y~X, listw = W, data = df, model = "within", spatial.error = "kkp", lag = FALSE)
KKP_FE_GM <- spgm(Y~X, listw = W, data = df, model = "within", spatial.error = TRUE, lag = FALSE)
- SLM:
- SLM-FE
SLM_FE_ML <- spml(Y~X, listw = W, data = df, model = "within", spatial.error = "none", lag = TRUE)
SLM_FE_GM <- spgm(Y~X, listw = W, data = df, model = "within", spatial.error = FALSE, lag = TRUE)
Model test and estimator
- Hausman test: Random vs Fixed
- The Hausman test was proposed by Hausman (1978) and extended to spatial panels by Lee and Yu (2012).
- \(H_0\): Random model is true
splm::sphtest(Y~X, data = df, listw = W, spatial.model = "error", method = "ML")
spatial.model is one of "lag", "error", or "sarar"; method is "ML" or "GM".
When "ML" is chosen for method, errors = "BSK" or errors = "KKP" can be specified.
- Estimator for spatial panel data: ML vs GM
- ML: Maximum Likelihood
ML for SEM and SLM for spatial panels is proposed by Elhorst (2003).
\(+\) most efficient when all distributional assumptions are met
\(-\) large computational demand
- GM: Generalized Moments
\(+\) less computational demand
\(+\) relaxed normality assumption compared with ML -> more robust
- Further material
III. Advanced
| No. | Model | Method | Estimator | Package | Function |
|---|---|---|---|---|---|
| 1 | Heteroscedastic model | FGS2SLS | | sphet | gstslshet() |
| | | Spatial HAC | | sphet | stslshac() |
| 2 | Discrete model | a-Spatial | | stats | glm() |
| | | Spatial Probit | ML | McSpatial | spprobitml() |
| | | | GMM | McSpatial | gmmprobit() |
| | | | LGMM | McSpatial | spprobit() |
| 3 | Non-stationary | GWR | | spgwr | gwr() |
1) Heteroscedastic model sphet
- Parametric
- FGS2SLS: Feasible Generalized Spatial 2SLS
A modification of SARAR; the procedure is proposed by Kelejian and Prucha (2010).
Hetero_SARAR <- gstslshet(Y~X, data = df, listw = W)
- Non-parametric
- Spatial HAC: heteroscedasticity and autocorrelation consistent
The procedure is proposed by Kelejian and Prucha (2007).
Spatial_HAC <- stslshac(Y~X, data = df, listw = W, distance = D, type = "Epanechnikov")
2) Discrete model McSpatial
- a-spatial logit and probit
- Probit
Probit <- glm(Y~X, family = binomial(link="probit"))
- Logit
Logit <- glm(Y~X, family = binomial(link="logit"))
- Spatial Probit
- ML (probit)
The ML estimator has no closed-form solution and is computationally demanding.
Probit_ML <- spprobitml(Y~X, wmat = Wdash, stdprobit = FALSE)
- GMM (probit)
GMM is proposed by Pinkse and Slade (1998).
GMM is computationally demanding in large samples.
Set a starting value rho = rho0 with \(|\rho_0|<1\).
Probit_GMM <- gmmprobit(Y~X, wmat = Wdash, startrho = rho)
- LGMM (logit)
LGMM is proposed by Klier and McMillen (2008).
LGMM is less computationally demanding but less accurate.
When \(\lambda<0.5\), there is no bias; otherwise, there is an upward bias.
Logit_LGMM <- spprobit(Y~X, wmat = Wdash)
- To create Wdash, the Spatial Probit model requires spdep::nb2mat() instead of spdep::nb2listw().
- Logit vs Probit
- Probit is more popular than Logit.
- Anselin (2002) shows the error term of the spatial Logit is analytically intractable.
- but Smirnov (2010) shows that Probit is not easily extended to more than two alternatives.
3) Non-stationary spgwr
- Scan statistics (parametric)
- LWR: locally weighted regression (non-parametric)
- LWR can be seen as a scan statistic technique to perform a regression around a point of interest using only a limited number of training data.
- LWR was introduced by Cleveland and Devlin (1988).
- LWR applies a kernel and thus produces smooth variation, even though the regression estimates are calculated separately at each point.
- GWR: geographically weighted regression (non-parametric)
- GWR is a particular case of LWR.
- GWR applies geographical space as a selection criterion.
- Articles based on GWR have been published by Brunsdon et al. (1996), Fotheringham et al. (1998), Fotheringham et al. (2002), Fotheringham et al. (2007), McMillen and McDonald (1997), and McMillen and McDonald (2004).
- Calibrate bandwidth
bw <- gwr.sel(Y~X, data = df, coords = coords, gweight = gwr.Gauss, adapt = TRUE)
adapt = TRUE calibrates an adaptive kernel bandwidth; adapt = FALSE calibrates a fixed (global) bandwidth.
- Regression
GWR <- gwr(Y~X, data = df, coords = coords, adapt = bw, hatmatrix = TRUE)
- Mapping
plot(GWR$SDF, col = cols[findInterval(GWR$SDF$X, brks, all.inside=TRUE)])
- Moran’s I test
Null of no spatial autocorrelation in regression residual
gwr.morantest(GWR, lw = W)
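The idea behind LWR and GWR described above, a separate kernel-weighted regression at each location, can be sketched in base R without spgwr. This is an illustration under assumptions: the simulated coordinates, the spatially varying slope `beta_true`, and the hand-picked fixed bandwidth `h` are hypothetical, and the Gaussian weight \(\exp(-0.5 (d/h)^2)\) is used as the kernel.

```r
# Minimal base-R sketch of geographically weighted regression with a fixed bandwidth.
set.seed(7)
n <- 50
coords <- cbind(runif(n), runif(n))      # hypothetical point locations
X <- rnorm(n)
beta_true <- 1 + 2 * coords[, 1]         # slope drifts from west to east
Y <- beta_true * X + rnorm(n, sd = 0.1)

h <- 0.3                                 # fixed bandwidth, chosen by hand
D <- as.matrix(dist(coords))             # pairwise Euclidean distances

# One weighted least-squares fit per location
local_beta <- sapply(seq_len(n), function(i) {
  w <- exp(-0.5 * (D[i, ] / h)^2)        # Gaussian kernel weights around point i
  coef(lm(Y ~ X, weights = w))[["X"]]
})
print(summary(local_beta))
```

The local slopes vary smoothly over space because neighbouring fits share most of their weighted observations; gwr.sel() replaces the hand-picked `h` with a calibrated bandwidth.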
IV. Big data
under construction
Model test
The Model: REG <- lm(Y~X)
1) Homoscedasticity
- BP test
\(H_0\): Homoscedasticity
lmtest::bptest(REG)
2) Normality of the regression residual
- JB test
\(H_0\): The regression residuals are normally distributed
tseries::jarque.bera.test(REG$residuals)
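The JB statistic behind the test above combines the skewness \(S\) and kurtosis \(K\) of the residuals as \(JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)\), which is compared with a \(\chi^2\) distribution with 2 degrees of freedom. The base-R sketch below computes it by hand; the simulated data and variable names are hypothetical and only illustrate the formula.

```r
# Minimal base-R sketch: Jarque-Bera statistic for regression residuals.
set.seed(2024)
X <- rnorm(200)
Y <- 1 + 2 * X + rnorm(200)
e <- residuals(lm(Y ~ X))

n  <- length(e)
d  <- e - mean(e)
m2 <- mean(d^2)                          # second central moment
S  <- mean(d^3) / m2^1.5                 # sample skewness
K  <- mean(d^4) / m2^2                   # sample kurtosis

JB <- n / 6 * (S^2 + (K - 3)^2 / 4)
p_value <- pchisq(JB, df = 2, lower.tail = FALSE)
print(c(JB = JB, p_value = p_value))
```

A small p-value leads to rejecting normality of the residuals, which in turn casts doubt on the ML estimators that assume it.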
3) Spatial autocorrelation for regression residuals
- Moran I test, proposed by Moran (1950)
\(H_0\): No spatial autocorrelation in regression residual
[i] spdep::lm.morantest(REG, listw = W)
[ii] spdep::moran.test(REG$residuals, listw = W)
Both return the same Moran’s I statistic but different p-values.
- LM and RLM test
\(H_0\): \(\lambda=0\) for \(LM_{SLM}\), \(\rho=0\) for \(LM_{SEM}\)
spdep::lm.LMtests(REG, listw = W, test = "all")
4) \(\lambda=\rho=0\)
- Wald test
\(H_0\): \(\lambda=\rho=0\)
5) Random vs Fixed for spatial panel
- The Hausman test was proposed by Hausman (1978) and extended to spatial panels by Lee and Yu (2012).
\(H_0\): Random model is true
splm::sphtest(Y~X, data = df, listw = W, spatial.model = "error", method = "ML")
spatial.model is one of "lag", "error", or "sarar"; method is "ML" or "GM".
When "ML" is chosen for method, errors = "BSK" or errors = "KKP" can be specified.