Local Spatial Autocorrelation


GEOG 4/597: Advanced Spatial Quantitative Analysis, Winter 2023

Jackson Voelkel   |   Portland State University

Global and Local

Global Measures


  • A single value which applies to the entire data set
  • Assumes the same pattern or process occurs over the entire geographic area
  • An average for the entire area

Local Measures


  • A value calculated for each observation unit
  • Different patterns or processes may occur in different parts of the region
  • A unique number for each location


An equivalent local measure can be calculated for most global measures!

Global vs. Local


Global analysis yields one statistic to summarize the entire study area


  • Includes nearest neighbor analysis, quadrat analysis, OLS regression, Moran’s I
  • E.g., violent crime rate in Multnomah County = aspatial

Global vs. Local


Local analysis runs more focused tests on smaller areas within the study region


An equivalent local measure can be calculated for most global measures!

  • Includes geographically weighted regression, LISA (Local Indicators of Spatial Association)
  • E.g., violent crime rate in north Portland, downtown, and east of 82nd Ave = spatial

Global vs. Local

Global Example

summary(uhi$uhi_c)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   28.30   30.36   30.74   30.66   31.06   31.92
hist(uhi$uhi_c,breaks=20, main="Histogram of Evening UHI by Block Group", xlab = "Evening UHI")

require(ggplot2); require(dplyr) # for ggplot() and %>%
uhi %>% ggplot(aes(x=uhi_c)) +
  geom_histogram(color ='white', lwd=0.2) +
  xlab("Average Temperature (Evening)") +
  ylab("Frequency") +
  theme_minimal()

Local Example

require(spdep)
require(tmap)
tmap_mode('view')
orig_basemaps <- c("Esri.WorldGrayCanvas","OpenStreetMap","Esri.WorldTopoMap") # save the defaults so we can restore them later

# Whoa, we can change the default basemaps!!
tmap_options(basemaps = c("Stamen.Toner","Esri.WorldTopoMap","Esri.WorldImagery"))
tm_shape(uhi) + tm_polygons(
  id = c("Temperature: " = "uhi_c"),
  col='uhi_c',
  palette='-RdBu',
  style='cont',
  border.col = 'white',
  lwd=0.3,
  title = "Evening UHI (Celsius)",
  alpha = 0.7) +
  tm_tiles("Stamen.TonerLabels")

Evening Urban Heat Island Effect

Local Indicators of Spatial Association


LISA, or Local Indicators of Spatial Association:


  • Local Moran’s I - Most commonly used. Often called Anselin’s LISA, or just LISA
  • Local Geary’s C
  • Getis-Ord G

Local Indicators


Global measures tell you whether the data are autocorrelated, but they don’t tell you where.


  • Not effective in identifying different types of local clustering patterns (hot spots or cold spots)
  • Local indicators allow for observation-level analysis and mapping

Local Getis-Ord G Statistic

Local Getis-Ord G Statistic

suppressPackageStartupMessages(require(spdep))
suppressPackageStartupMessages(require(sf))
suppressPackageStartupMessages(require(tidycensus))
suppressPackageStartupMessages(require(dplyr))
tract <- tidycensus::get_acs(
  geography = 'tract',
  variables = c("Households_" = 'B22010_001', "SNAP_" = 'B22010_002'),
  state="CA",
  county='Los Angeles',
  geometry = TRUE,
  output = 'wide') %>% 
  st_transform(4326) %>% 
  mutate(
    pct_snap = SNAP_E/Households_E
    # pct_snap = ifelse(is.nan(pct_snap), 0, pct_snap)
    # pct_snap = ifelse(is.na(pct_snap), 0, pct_snap)
  ) %>% 
  filter(
    as.vector(sf::st_area(.)) > 0,
    st_coordinates(st_centroid(.))[,2] > 33.667,
    !is.na(pct_snap),
    !is.nan(pct_snap) # Let's remove the NA values
  )

suppressMessages(tmap_mode('plot'))
tm_shape(tract) + 
  tm_polygons(
    col='pct_snap',
    palette = 'Blues',
    border.col = 'white',
    lwd=0.3,
    style = 'quantile',
    n=5,
    contrast=c(0,1),
    title = "Share of Households (0 to 1)") +
  tm_shape(tract %>% st_union()) +  # adding a nice border!
  tm_polygons(col = NA, alpha = 0, border.col = 'black') +
  tm_layout(main.title = "Households on Food Assistance (SNAP)")

tract <- tract %>% filter(st_coordinates(st_centroid(.))[,2] < 34.3)

tm_shape(tract) + 
  tm_polygons(
    col='pct_snap',
    palette = "Blues",
    border.col = 'white',
    lwd=0.3,
    style = 'quantile',
    contrast=c(0,1),
    n=5,
    title = "Share of Households (0 to 1)") +
  tm_shape(tract %>% st_union()) +  # adding a nice border!
  tm_polygons(col = NA, alpha = 0, border.col = 'black') +
  tm_layout(main.title = "Households on Food Assistance (SNAP)")

Hot Spots and Cold Spots


What is a hot spot?


  • An area where high values cluster together

What is a cold spot?

  • An area where low values cluster together

Moran's I and Geary's C cannot distinguish them

  • They only indicate clustering
  • They cannot tell whether the clusters are hot spots, cold spots, or both

Local Getis-Ord Statistic


Local Getis-Ord is the ratio of the weighted sum of values within distance \(d\) of point \(i\) to the sum of all values in the study area:


\[G_i(d) = \dfrac{\sum\limits_jw_{ij}x_j}{\sum\limits_jx_j}\]

where:

\(x_j\) is the value of the attribute at location \(j\)

\(w_{ij}\) is the spatial weight between locations \(i\) and \(j\) (from the weights matrix)
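A toy illustration of the formula (hypothetical values, not our SNAP data; w_1 plays the role of row \(i\) of a binary weights matrix):

x   <- c(10, 20, 30, 40)       # attribute values for four areal units
w_1 <- c(0, 1, 1, 0)           # unit 1's neighbors are units 2 and 3
G_1 <- sum(w_1 * x) / sum(x)   # (20 + 30) / 100 = 0.5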

Assessing Significance


If local G returns z-scores, how do we assess significance?
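One quick answer, sketched in R: since the statistic is already a z-score, we can compare it against standard normal critical values, the same ±1.96 and ±2.58 cutoffs used in the binning function below:

qnorm(c(0.975, 0.995))   # ~1.96 (alpha = 0.05) and ~2.58 (alpha = 0.01), two-tailed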

Assessing Significance

Example: Local Gi*

First, we need to create our neighborhood and weights matrix:

tract_nb <- poly2nb(tract,queen=T) %>% 
  include.self() # This turns it into Gi*! Otherwise, it's just Gi
tract_W <- nb2listw(tract_nb, 
                    style="W",
                    zero.policy = TRUE)
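
A quick sanity check (just a sketch, not needed for the lab): after include.self(), each unit appears in its own neighbor list, which is exactly what distinguishes Gi* from Gi:

1 %in% tract_nb[[1]]   # TRUE: unit 1 now counts itself as a neighbor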

Example: Local Gi*

tract_local_g <- spdep::localG(
  x = tract$pct_snap,
  listw = tract_W, 
  zero.policy = TRUE) %>% 
  as.vector()

# I made us another function!
make_g_bin <- function(v){
  case_when(
    v > -1.96 & v < 1.96 ~ "Insignificant",
    v >= 1.96 & v < 2.58 ~ "Hotspot (>1.96)",
    v >= 2.58 ~ "Hotspot (>2.58)",
    v <= -1.96 & v > -2.58 ~ 'Coldspot (<-1.96)',
    v <= -2.58 ~ 'Coldspot (<-2.58)',  # <= so a value of exactly -2.58 isn't dropped
    TRUE ~ 'Unknown'
  ) %>% as.factor() %>% 
    ordered(c(
      "Unknown",
      "Coldspot (<-2.58)",
      "Coldspot (<-1.96)",
      "Insignificant",
      "Hotspot (>1.96)",
      "Hotspot (>2.58)"))
}

g_palette <- c('grey','#0571B0','#92C5DE','cornsilk','#F4A582','#CA0020')
# make_g_bin(v = tract_local_g) 

# Apply our colors on the fly to the 'col' column
tract %>% mutate(G=make_g_bin(tract_local_g)) %>% 
  tm_shape() +
  tm_polygons(
    col = "G",
    palette = g_palette,
    border.col = 'grey80',
    lwd=0.4,
    title = "Local Getis-Ord G") +
  tm_layout(main.title = "Households on Food Assistance (SNAP)")

x = tract %>% mutate(G=make_g_bin(tract_local_g)) %>% 
  tm_shape() +
  tm_polygons(
    col = "G",
    palette = g_palette,
    border.col = 'grey80',
    alpha = 0.5,
    lwd=0.4,
    title = "Local Getis-Ord G") +
  tm_shape(st_union(tract)) + 
  tm_polygons(zindex = 200,lwd=2,alpha = 0,border.col = "black")+
  tm_tiles(leaflet::providers$Stamen.TerrainLabels) +
  tm_layout(main.title = "Households on Food Assistance (SNAP)")

Local Moran’s I

Global vs Local Moran’s I


Global Moran’s I

  • What is the extent of clustering in the total study area?
  • Is this clustering significantly different from a random spatial distribution?


Local Moran’s I

  • Do local clusters (high-high or low-low) or local spatial outliers (high-low or low-high) exist?
  • Are these local clusters and spatial outliers statistically significant?

Calculating Local Moran’s I


The local Moran statistic for areal unit \(i\) is:


\[I_i = z_i\sum\limits_jw_{ij}z_j\]


Z values are z scores determined from the values of the attribute of interest for the whole data set:


\[z_i = \dfrac{x_i - \bar{x}}{SD_x}\]
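A toy sketch of both formulas together (hypothetical values; spdep::localmoran() handles this, plus expectations and variances, for real data):

x   <- c(10, 20, 30, 40)
z   <- (x - mean(x)) / sd(x)   # z-scores computed from the whole data set
w_2 <- c(0.5, 0, 0.5, 0)       # row-standardized weights for unit 2
I_2 <- z[2] * sum(w_2 * z)     # z_i times the weighted sum of its neighbors' z's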

What does it look like?


# Recreate the neighbors: we can't have "include.self()" here,
# since local Moran's I excludes the focal unit!
tract_nb <- poly2nb(tract,queen=T)
tract_W <- nb2listw(tract_nb, 
                    style="W",
                    zero.policy = FALSE)

tract_local_i <- spdep::localmoran(
  x = tract$pct_snap, 
  listw = tract_W) %>% 
  as_tibble()

tract_local_i
## # A tibble: 2,301 × 5
##    Ii          E.Ii          Var.Ii     Z.Ii       `Pr(z != E(Ii))`
##    <localmrn>  <localmrn>    <localmrn> <localmrn> <localmrn>      
##  1  0.85494320 -4.911889e-04 0.37622927  1.3946345 0.16312608      
##  2  0.80078768 -4.968960e-04 0.16282958  1.9857297 0.04706335      
##  3 -0.02323440 -7.466797e-05 0.02447854 -0.1480269 0.88232153      
##  4  0.64089642 -4.523111e-04 1.04029713  0.6288044 0.52947715      
##  5 -0.20761487 -1.243103e-04 0.03166747 -1.1659816 0.24362190      
##  6  0.16226049 -2.203842e-04 0.07223845  0.6045303 0.54549114      
##  7  0.34636226 -1.945058e-04 0.07441626  1.2704000 0.20394219      
##  8 -0.04907508 -1.140228e-04 0.03269224 -0.2707874 0.78655454      
##  9  0.61518484 -4.779916e-04 0.12172306  1.7646416 0.07762402      
## 10  0.49516832 -1.948956e-04 0.08951745  1.6556553 0.09779165      
## # … with 2,291 more rows

Assessing Significance


Multiple Comparisons Problem


Every time we perform a hypothesis test, there is a 5% chance that we are incorrectly rejecting \(H_0\) (when \(\alpha\) = 0.05).


So… what happens if we run 20+ hypothesis tests?
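
A back-of-the-envelope answer in R:

# Chance of at least one false rejection across 20 independent tests:
1 - (1 - 0.05)^20   # ~0.64, so a false "discovery" is more likely than not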

Controlling for Multiple Comparisons


False Discovery Rate (FDR)


To account for the Multiple Comparisons Problem, we’ll adjust our p-values to be more conservative based on how many tests we run, using a procedure known as controlling the False Discovery Rate (FDR). Adjusting p-values is simple with R’s built-in p.adjust() - look for it in the next slide!
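
A toy illustration with made-up p-values (note how each value is pulled upward, i.e., made more conservative):

p_raw <- c(0.001, 0.01, 0.04, 0.20)
p.adjust(p_raw, method = "fdr")   # ~0.004 0.020 0.053 0.200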

Prepping the LISA values


# Transferring values from test into our sf object
tract$lisa_I <- tract_local_i$Ii
tract$lisa_EI <- tract_local_i$E.Ii
tract$lisa_Z <- tract_local_i$Z.Ii
tract$lisa_p <- p.adjust(tract_local_i$`Pr(z != E(Ii))`,
                         method = "fdr")

# A function to bin the adjusted p-values
make_i_bin <- function(v){
  case_when(
    v <= 0.05 ~ "Significant",
    v > 0.05 ~ "Insignificant",
    is.na(v) ~ "Unknown",
    TRUE ~ 'Unknown'
  ) %>% as.factor() %>% 
    ordered(c(
      "Unknown",
      "Insignificant",
      "Significant"))
}

# DO NOT USE THIS IN YOUR LAB! THIS IS JUST AN EXAMPLE!
# You'll need to recreate the one in a few steps!
suppressMessages(tmap_mode('plot'))
tract %>% mutate(I=make_i_bin(tract$lisa_p)) %>% 
  tm_shape() +
  tm_polygons(
    col = "I",
    palette = c('grey','cornsilk','lightgreen'),
    border.col = 'grey60',
    lwd=0.4,
    title = "Local Moran's I") +
  tm_layout(main.title = "Households on Food Assistance (SNAP)")

What else should we do with this?


Cluster:


  • High-High
  • High-Low
  • Low-Low
  • Low-High

Creating a lag vector

tract$lag_snap <- spdep::lag.listw(x = tract_W, var = tract$pct_snap)
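
With style = "W" (row-standardized) weights, the lag is just the mean of each unit's neighbors. A quick sketch to verify this for the first tract (assuming it has at least one neighbor):

nb_1 <- tract_W$neighbours[[1]]   # neighbor indices for tract 1
all.equal(mean(tract$pct_snap[nb_1]), tract$lag_snap[1])   # TRUE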

# Does this look familiar?
moran_plot = function(var, w_var, xlab, ylab, main_title = NULL, normalize = T){
  if(missing(main_title)){
    main_title = "Moran's I Plot"
  }
  if(normalize){
    var = scale(var) %>% as.vector()
    w_var = scale(w_var) %>% as.vector()
  }
  plot_dat = data.frame(var,w_var,stringsAsFactors = F) %>% 
    mutate(quad = case_when(
        var > 0 & w_var > 0 ~ "HH",
        var > 0 & w_var < 0 ~ "HL",
        var < 0 & w_var < 0 ~ "LL",
        var < 0 & w_var > 0 ~ "LH",
        TRUE ~ 'Unknown'))
  plot_dat %>% ggplot() +
    geom_vline(xintercept = 0) +
    geom_hline(yintercept = 0) +
    geom_point(aes(x=var,y=w_var,color=quad)) +
    xlab(xlab) +
    ylab(ylab) +
    geom_smooth(
      mapping = aes(x=var,y=w_var),
      formula = 'y ~ x',
      method = 'lm',
      color="black",
      size=0.5,
      se=F) +
    scale_color_manual(
      name = "", 
      values = c('Unknown' = 'grey','HH'="firebrick",'HL'="brown1",'LL'="navy",'LH'="cornflowerblue")) +
    ggtitle(main_title)
}

moran_plot(
  var = tract$pct_snap, 
  w_var = tract$lag_snap, 
  xlab = "Households on SNAP (%)", 
  ylab = "Lagged Households on SNAP (%)")

tract %>% tm_shape() +
  tm_polygons(
    col = "lag_snap",
    palette = '-RdBu',
    border.col = 'grey80',
    lwd=0.4, 
    style = 'cont',
    title = "Lagged Households on SNAP (%)") +
  tm_layout(main.title = "Households on Food Assistance (SNAP)")

Classifying quadrants

# Once again, I'm being very kind and giving you an awesome function
# BUT! Do not use this version... use the one later with a 'p' argument
make_i_quads <- function(v, w_v, normalize = T){
  if(normalize){
    v = scale(v) %>% as.vector()
    w_v = scale(w_v) %>% as.vector()
  }
  case_when(
    v < 0 & w_v < 0 ~ "Low-Low",
    v < 0 & w_v > 0 ~ "Low-High",
    v > 0 & w_v < 0 ~ "High-Low",
    v > 0 & w_v > 0 ~ "High-High",
    TRUE ~ 'Unknown'
  ) %>% as.factor() %>% 
    ordered(c(
      "Unknown",
      "Low-Low",
      "Low-High",
      "High-Low",
      "High-High"))
}

i_palette <- c('grey','#0571B0','#92C5DE','#F4A582','#CA0020')

tract %>% mutate(quad = make_i_quads(v = pct_snap, w_v = lag_snap)) %>% 
  tm_shape() +
  tm_polygons(
    col = "quad",
    palette = i_palette,
    border.col = 'grey80',
    lwd=0.4,
    title = "Local Moran's I")

We aren’t done yet… we need to remove insignificant observations!

# I'll modify our function....
# USE THIS ONE!!!
make_i_quads <- function(v, w_v, p, normalize = T){
  if(normalize){
    v = scale(v) %>% as.vector()
    w_v = scale(w_v) %>% as.vector()
  }
  case_when(
    p > 0.05 ~ "Insignificant", ### Make sure that 'p' has been adjusted!
    v < 0 & w_v < 0 ~ "Low-Low",
    v < 0 & w_v > 0 ~ "Low-High",
    v > 0 & w_v < 0 ~ "High-Low",
    v > 0 & w_v > 0 ~ "High-High",
    TRUE ~ 'Unknown'
  ) %>% as.factor() %>% 
    ordered(c(
      "Insignificant",
      "Low-Low",
      "Low-High",
      "High-Low",
      "High-High"))
}

i_palette <- c('cornsilk','#0571B0','#92C5DE','#F4A582','#CA0020') 

tract %>% mutate(quad = make_i_quads(v = pct_snap, w_v = lag_snap,p = lisa_p)) %>% 
  tm_shape() +
  tm_polygons(
    col = "quad",
    palette = i_palette,
    border.col = 'grey70',
    lwd=0.4,
    title = "Local Moran's I") + 
  tm_layout(main.title = "Local Moran's I: SNAP Benefit Usage in LA County")

Or, in a web map:

suppressMessages(tmap_mode("view"))
tract %>% 
  mutate(quad = make_i_quads(v = pct_snap, w_v = lag_snap, p = lisa_p)) %>% 
  tm_shape() +
  tm_polygons(
    id = "quad",
    col = "quad",
    palette = i_palette,
    border.col = 'grey80',
    lwd=0.4,
    alpha=0.5,
    title = "Local Moran's I") +
  tm_tiles("Stamen.TerrainLabels") +
  tm_shape(tract %>% st_union()) +  # adding a nice border!
  tm_polygons(zindex = 200, alpha = 0, border.col = 'black')

How does this compare to Local G?

Moran Scatterplot


  • Local spatial autocorrelation suggests that spatial autocorrelation can vary over a geographic region
  • Where are the areas with “unusual” levels of spatial autocorrelation?
  • How do we detect outliers?
    • A lack of fit in the scatterplot could indicate important local spatial processes and associations
  • The Moran scatterplot is an effective visual diagnostic tool
  • Reminder: slope of the regression is Global Moran’s I

Bivariate LISA

Bivariate LISA


Moran’s I is the correlation between \(x\) and lag(\(x\)) - the same variable, but in nearby areas… Univariate Moran’s I

Bivariate Moran’s I is the correlation between \(x\) and lag(\(y\)) - a different variable in nearby areas
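
A rough, self-contained sketch (toy values, not our data) of one common formulation, the slope of the bivariate Moran scatterplot, i.e., lag(\(z_y\)) regressed on \(z_x\):

x <- c(1, 2, 3, 4); y <- c(2, 1, 4, 3)   # hypothetical variables
nb <- spdep::cell2nb(2, 2)               # toy neighbors on a 2x2 grid
W  <- spdep::nb2listw(nb, style = "W")
z_x <- as.vector(scale(x)); z_y <- as.vector(scale(y))
lag_zy <- spdep::lag.listw(W, z_y)       # lag of the *other* variable
sum(z_x * lag_zy) / sum(z_x^2)           # rough bivariate Moran's I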

Bivariate LISA


Let’s look at median household income and educational attainment (% of the population 25+ years old with a bachelor’s degree or higher) in Alameda County (Oakland, etc.), CA:

Creating a lag for income


oak_nb <- poly2nb(pl = oak, queen=T)
oak <- oak[card(oak_nb) != 0,]
oak_nb <- poly2nb(pl = oak, queen=T)
oak_W <- nb2listw(neighbours = oak_nb, style="W")
lag_edu <- lag.listw(x = oak_W, var = oak$edu, zero.policy = TRUE, NAOK = TRUE)

Bivariate Moran’s I Plot

moran_plot(
  var = oak$mhi,
  w_var = lag_edu,
  ylab = "Lagged Adults 25+ with at Least a Bachelors Degree (%)",
  xlab = "Median Household Income ($)",
  main_title = "Bivariate Moran's I Plot")

We are observing positive spatial association between income and lagged educational attainment!

What’s missing?


  • Is it significant?
  • Hard to tell without simulation
    • Complex to calculate
  • Some software is built to calculate it (e.g., GeoDa)

Calculating Bivariate LISA p-values in R


Don’t worry about what’s going on here, it’s just an example:

good <- complete.cases(cbind(oak$mhi,lag_edu))
oak_nb2 <- poly2nb(pl = oak[good,], queen = TRUE)
mod <- lm(oak$mhi ~ 1, weights = lag_edu)
bivar <- as.data.frame(localmoran.sad(mod,nb = oak_nb2,zero.policy = TRUE))
oak$bivar_p <- NA
oak$bivar_p[good] <- bivar$`Pr. (Sad)`
oak$bivar_p[is.na(oak$bivar_p) | is.nan(oak$bivar_p)] <- 1
oak$bivar_p <- p.adjust(oak$bivar_p, method = 'fdr') # Adjusting p values!

Wow, that is a lot…

What is Happening?


We are trying to explain one value with another.


Like regression?

Next Class

Lab 3 - Local Spatial Autocorrelation

Lecture: Spatial Regression