Global analysis yields one statistic to summarize the entire study area
Local analysis runs more focused tests on smaller areas within the study region
An equivalent local measure can be calculated for most global measures!
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 28.30 30.36 30.74 30.66 31.06 31.92
require(spdep)
require(tmap)
tmap_mode('view')
orig_basemaps <- c("Esri.WorldGrayCanvas","OpenStreetMap","Esri.WorldTopoMap")
# Woah, we can change the default basemaps!!
tmap_options(basemaps = c("Stamen.Toner","Esri.WorldTopoMap","Esri.WorldImagery"))
tm_shape(uhi) + tm_polygons(
id = c("Temperature: " = "uhi_c"),
col='uhi_c',
palette='-RdBu',
style='cont',
border.col = 'white',
lwd=0.3,
title = "Evening UHI (Celsius)",
alpha = 0.7) +
tm_tiles("Stamen.TonerLabels")
Evening Urban Heat Island Effect
LISA, or Local Indicators of Spatial Association:
Global measures tell you whether the data are autocorrelated, but they don’t tell you where.
suppressPackageStartupMessages(require(spdep))
suppressPackageStartupMessages(require(sf))
suppressPackageStartupMessages(require(tidycensus))
suppressPackageStartupMessages(require(dplyr))
tract <- tidycensus::get_acs(
geography = 'tract',
variables = c("Households_" = 'B22010_001', "SNAP_" = 'B22010_002'),
state="CA",
county='Los Angeles',
geometry = TRUE,
output = 'wide') %>%
st_transform(4326) %>%
mutate(
pct_snap = SNAP_E/Households_E
# pct_snap = ifelse(is.nan(pct_snap), 0, pct_snap)
# pct_snap = ifelse(is.na(pct_snap), 0, pct_snap)
) %>%
filter(
as.vector(sf::st_area(.)) > 0,
st_coordinates(st_centroid(.))[,2] > 33.667,
!is.na(pct_snap),
!is.nan(pct_snap) # Let's remove the NA/NaN values
)
suppressMessages(tmap_mode('plot'))
tm_shape(tract) +
tm_polygons(
col='pct_snap',
palette = 'Blues',
border.col = 'white',
lwd=0.3,
style = 'quantile',
n=5,
contrast=c(0,1),
title = "Share of Households (0 to 1)") +
tm_shape(tract %>% st_union()) + # adding a nice border!
tm_polygons(col = NA, alpha = 0, border.col = 'black') +
tm_layout(main.title = "Households on Food Assistance (SNAP)")
tract <- tract %>% filter(st_coordinates(st_centroid(.))[,2] < 34.3)
tm_shape(tract) +
tm_polygons(
col='pct_snap',
palette = "Blues",
border.col = 'white',
lwd=0.3,
style = 'quantile',
contrast=c(0,1),
n=5,
title = "Share of Households (0 to 1)") +
tm_shape(tract %>% st_union()) + # adding a nice border!
tm_polygons(col = NA, alpha = 0, border.col = 'black') +
tm_layout(main.title = "Households on Food Assistance (SNAP)")
What is a cold spot?
Local Getis-Ord is the ratio of the sum of the values within distance d of point i to the sum of all values in the study area:
\[G_i(d) = \dfrac{\sum\limits_jw_{ij}x_j}{\sum\limits_jx_j}\]
where:
\(x_j\) is the value of the attribute of interest at location \(j\)
\(w_{ij}\) is the spatial weight between locations \(i\) and \(j\) (from the weights matrix)
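To make the formula concrete, here is a tiny numeric sketch (made-up values and binary weights, purely illustrative):
# Five made-up values; suppose locations 2 and 3 fall within distance d of location i
x   <- c(10, 2, 3, 8, 1)
w_i <- c(0, 1, 1, 0, 0)        # binary weights for location i
G_i <- sum(w_i * x) / sum(x)   # (2 + 3) / 24, about 0.21
G_i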
If local G returns z-scores, how do we assess significance?
First, we need to create our neighborhood and weights matrix:
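That chunk isn't shown in this excerpt; a plausible sketch (an assumption, using include.self() so each tract counts in its own neighborhood, which matches the "Recreate" note on the Local Moran's I slide):
tract_nb <- spdep::poly2nb(tract, queen = TRUE) %>%
  spdep::include.self()
tract_W <- spdep::nb2listw(tract_nb, style = "W", zero.policy = TRUE)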
tract_local_g <- spdep::localG(
x = tract$pct_snap,
listw = tract_W,
zero.policy = TRUE) %>%
as.vector()
# I made us another function!
make_g_bin <- function(v){
  case_when(
    v >= 2.58            ~ "Hotspot (>2.58)",
    v >= 1.96            ~ "Hotspot (>1.96)",
    v <= -2.58           ~ "Coldspot (<-2.58)",
    v <= -1.96           ~ "Coldspot (<-1.96)",
    v > -1.96 & v < 1.96 ~ "Insignificant",
    TRUE                 ~ "Unknown"  # catches NA z-scores
  ) %>% as.factor() %>%
    ordered(c(
      "Unknown",
      "Coldspot (<-2.58)",
      "Coldspot (<-1.96)",
      "Insignificant",
      "Hotspot (>1.96)",
      "Hotspot (>2.58)"))
}
g_palette <- c('grey','#0571B0','#92C5DE','cornsilk','#F4A582','#CA0020')
# make_g_bin(v = tract_local_g)
tract %>% mutate(G = make_g_bin(tract_local_g)) %>%
tm_shape() +
tm_polygons(
col = "G",
palette = g_palette,
border.col = 'grey80',
alpha = 0.5,
lwd=0.4,
title = "Local Getis-Ord G") +
tm_shape(st_union(tract)) +
tm_polygons(zindex = 200,lwd=2,alpha = 0,border.col = "black")+
tm_tiles(leaflet::providers$Stamen.TerrainLabels) +
tm_layout(main.title = "Households on Food Assistance (SNAP)")
Global Moran’s I
Local Moran’s I
The local Moran statistic for areal unit \(i\) is:
\[I_i = z_i\sum\limits_jw_{ij}z_j\]
where \(z_i\) and \(z_j\) are z-scores of the attribute of interest, computed over the whole data set:
\[z_i = \dfrac{x_i - \bar{x}}{SD_x}\]
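As a quick hand-check before we compute this for the tracts (made-up values, not real data):
# Hand-check of the local Moran formula for one unit
x  <- c(4, 6, 5, 9, 2)
z  <- (x - mean(x)) / sd(x)   # z-scores over the whole data set
w1 <- c(0, 0.5, 0.5, 0, 0)    # row-standardized weights for unit 1
I_1 <- z[1] * sum(w1 * z)     # local Moran's I for unit 1
I_1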
# Recreate, because we can't have "include.self()" here!
tract_nb <- poly2nb(tract,queen=T)
tract_W <- nb2listw(tract_nb,
style="W",
zero.policy = FALSE)
tract_local_i <- spdep::localmoran(
x = tract$pct_snap,
listw = tract_W) %>%
as_tibble()
tract_local_i
## # A tibble: 2,301 × 5
## Ii E.Ii Var.Ii Z.Ii `Pr(z != E(Ii))`
## <localmrn> <localmrn> <localmrn> <localmrn> <localmrn>
## 1 0.85494320 -4.911889e-04 0.37622927 1.3946345 0.16312608
## 2 0.80078768 -4.968960e-04 0.16282958 1.9857297 0.04706335
## 3 -0.02323440 -7.466797e-05 0.02447854 -0.1480269 0.88232153
## 4 0.64089642 -4.523111e-04 1.04029713 0.6288044 0.52947715
## 5 -0.20761487 -1.243103e-04 0.03166747 -1.1659816 0.24362190
## 6 0.16226049 -2.203842e-04 0.07223845 0.6045303 0.54549114
## 7 0.34636226 -1.945058e-04 0.07441626 1.2704000 0.20394219
## 8 -0.04907508 -1.140228e-04 0.03269224 -0.2707874 0.78655454
## 9 0.61518484 -4.779916e-04 0.12172306 1.7646416 0.07762402
## 10 0.49516832 -1.948956e-04 0.08951745 1.6556553 0.09779165
## # … with 2,291 more rows
Every time we perform a hypothesis test, there is a 5% chance that we are incorrectly rejecting \(H_0\) (when \(\alpha\) = 0.05).
So… what happens if we run 20+ hypothesis tests?
To account for the Multiple Comparisons Problem, we’ll adjust our p-values to be more conservative based on how many tests we ran, using a process known as the False Discovery Rate (FDR). Adjusting p-values is simple using R’s built-in p.adjust() - look for it in the next slide!
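Before that, a quick illustration of what the FDR adjustment does (made-up p-values, not the tract results):
p_raw <- c(0.01, 0.02, 0.04, 0.20)
p.adjust(p_raw, method = "fdr")
# adjusted: roughly 0.040, 0.040, 0.053, 0.200 - each p-value is pushed up according to its rank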
# Transferring values from test into our sf object
tract$lisa_I <- tract_local_i$Ii
tract$lisa_EI <- tract_local_i$E.Ii
tract$lisa_Z <- tract_local_i$Z.Ii
tract$lisa_p <- p.adjust(tract_local_i$`Pr(z != E(Ii))`,
method = "fdr")
# A function to bin adjusted p-values (significant vs. insignificant)
make_i_bin <- function(v){
case_when(
v <= 0.05 ~ "Significant",
v > 0.05 ~ "Insignificant",
is.na(v) ~ "Unknown",
TRUE ~ 'Unknown'
) %>% as.factor() %>%
ordered(c(
"Unknown",
"Insignificant",
"Significant"))
}
# DO NOT USE THIS IN YOUR LAB! THIS IS JUST AN EXAMPLE!
# You'll need to recreate the one in a few steps!
suppressMessages(tmap_mode('plot'))
tract %>% mutate(I=make_i_bin(tract$lisa_p)) %>%
tm_shape() +
tm_polygons(
col = "I",
palette = c('grey','cornsilk','lightgreen'),
border.col = 'grey60',
lwd=0.4,
title = "Local Moran's I") +
tm_layout(main.title = "Households on Food Assistance (SNAP)")
Cluster:
tract$lag_snap <- spdep::lag.listw(x = tract_W, var = tract$pct_snap)
# Does this look familiar?
require(ggplot2) # moran_plot() builds its plot with ggplot2
moran_plot = function(var, w_var, xlab, ylab, main_title = NULL, normalize = T){
if(missing(main_title)){
main_title = "Moran's I Plot"
}
if(normalize){
var = scale(var) %>% as.vector
w_var = scale(w_var) %>% as.vector()
}
plot_dat = data.frame(var,w_var,stringsAsFactors = F) %>%
mutate(quad = case_when(
var > 0 & w_var > 0 ~ "HH",
var > 0 & w_var < 0 ~ "HL",
var < 0 & w_var < 0 ~ "LL",
var < 0 & w_var > 0 ~ "LH",
TRUE ~ 'Unknown'))
plot_dat %>% ggplot() +
geom_vline(xintercept = 0) +
geom_hline(yintercept = 0) +
geom_point(aes(x=var,y=w_var,color=quad)) +
xlab(xlab) +
ylab(ylab) +
geom_smooth(
mapping = aes(x=var,y=w_var),
formula = 'y ~ x',
method = 'lm',
color="black",
size=0.5,
se=F) +
scale_color_manual(
name = "",
values = c('Unknown' = 'grey','HH'="firebrick",'HL'="brown1",'LL'="navy",'LH'="cornflowerblue")) +
ggtitle(main_title)
}
moran_plot(
var = tract$pct_snap,
w_var = tract$lag_snap,
xlab = "Households on SNAP (%)",
ylab = "Lagged Households on SNAP (%)")
# Once again, I'm being very kind and giving you an awesome function
# BUT! Do not use this version... use the one later with a 'p' argument
make_i_quads <- function(v, w_v, normalize = T){
if(normalize){
v = scale(v) %>% as.vector
w_v = scale(w_v) %>% as.vector()
}
case_when(
v < 0 & w_v < 0 ~ "Low-Low",
v < 0 & w_v > 0 ~ "Low-High",
v > 0 & w_v < 0 ~ "High-Low",
v > 0 & w_v > 0 ~ "High-High",
TRUE ~ 'Unknown'
) %>% as.factor() %>%
ordered(c(
"Unknown",
"Low-Low",
"Low-High",
"High-Low",
"High-High"))
}
i_palette <- c('grey','#0571B0','#92C5DE','#F4A582','#CA0020')
We aren’t done yet… we need to remove insignificant observations!
# I'll modify our function....
# USE THIS ONE!!!
make_i_quads <- function(v, w_v, p, normalize = T){
if(normalize){
v = scale(v) %>% as.vector
w_v = scale(w_v) %>% as.vector()
}
case_when(
p > 0.05 ~ "Insignificant", ### Make sure that 'p' has been adjusted!
v < 0 & w_v < 0 ~ "Low-Low",
v < 0 & w_v > 0 ~ "Low-High",
v > 0 & w_v < 0 ~ "High-Low",
v > 0 & w_v > 0 ~ "High-High",
TRUE ~ 'Unknown'
) %>% as.factor() %>%
ordered(c(
"Insignificant",
"Low-Low",
"Low-High",
"High-Low",
"High-High"))
}
i_palette <- c('cornsilk','#0571B0','#92C5DE','#F4A582','#CA0020')
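A static cluster map might look like this (a sketch reusing the function and palette above; the lab version may differ):
suppressMessages(tmap_mode('plot'))
tract %>%
  mutate(quad = make_i_quads(pct_snap, lag_snap, p = lisa_p)) %>%
  tm_shape() +
  tm_polygons(
    col = "quad",
    palette = i_palette,
    border.col = 'grey80',
    lwd = 0.4,
    title = "Local Moran's I") +
  tm_shape(tract %>% st_union()) + # adding a nice border!
  tm_polygons(alpha = 0, border.col = 'black') +
  tm_layout(main.title = "Households on Food Assistance (SNAP)")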
Or, in a web map:
suppressMessages(tmap_mode("view"))
tract %>%
mutate(quad = make_i_quads(pct_snap, lag_snap, p = lisa_p)) %>%
tm_shape() +
tm_polygons(
id = "quad",
col = "quad",
palette = i_palette,
border.col = 'grey80',
lwd=0.4,
alpha=0.5,
title = "Local Moran's I") +
tm_tiles("Stamen.TerrainLabels") +
tm_shape(tract %>% st_union()) + # adding a nice border!
tm_polygons(zindex = 200, alpha = 0, border.col = 'black')
Moran’s I is the correlation between \(x\) and lag(\(x\)) - the same variable, but in nearby areas… this is Univariate Moran’s I.
Bivariate Moran’s I is the correlation between \(x\) and lag(\(y\)) - a different variable in nearby areas.
Let’s look at median household income and educational attainment (% of the population 25+ years old with a bachelor’s degree or higher) in Alameda County (Oakland, etc.), CA:
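The oak object and lag_edu vector aren’t built in this excerpt; here is a sketch of how they might be created (the ACS variable codes and the bachelor’s-only shortcut are assumptions - check the tables before reusing this):
# Assumed codes: B19013_001 = median household income, B15003_001 = population 25+,
# B15003_022 = bachelor's degree (a full "or higher" measure would add B15003_023-025)
oak <- tidycensus::get_acs(
  geography = 'tract',
  variables = c("mhi_" = 'B19013_001',
                "pop25_" = 'B15003_001',
                "ba_" = 'B15003_022'),
  state = "CA",
  county = 'Alameda',
  geometry = TRUE,
  output = 'wide') %>%
  st_transform(4326) %>%
  mutate(mhi = mhi_E,
         pct_edu = ba_E / pop25_E)
# Queen-contiguity neighbors and row-standardized weights for the Alameda tracts
oak_nb <- poly2nb(oak, queen = TRUE)
oak_W <- nb2listw(oak_nb, style = "W", zero.policy = TRUE)
# Spatially lagged educational attainment (NAs allowed to pass through)
lag_edu <- lag.listw(oak_W, oak$pct_edu, NAOK = TRUE)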
moran_plot(
var = oak$mhi,
w_var = lag_edu,
ylab = "Lagged Adults 25+ with at Least a Bachelors Degree (%)",
xlab = "Median Household Income ($)",
main_title = "Bivariate Moran's I Plot")
We are observing positive spatial autocorrelation between income and lagged “education”!
Don’t worry about what’s going on here, it’s just an example:
good <- complete.cases(cbind(oak$mhi,lag_edu))
oak_nb2 <- poly2nb(pl = oak[good,], queen = TRUE)
mod <- lm(oak$mhi ~ 1, weights = lag_edu)
bivar <- as.data.frame(localmoran.sad(mod,nb = oak_nb2,zero.policy = TRUE))
oak$bivar_p <- NA
oak$bivar_p[good] <- bivar$`Pr. (Sad)`
oak$bivar_p[is.na(oak$bivar_p) | is.nan(oak$bivar_p)] <- 1
oak$bivar_p <- p.adjust(oak$bivar_p, method = 'fdr') # Adjusting p values!
Wow, that is a lot…
We are trying to explain one value with another.
Like regression?