Local Spatial Autocorrelation


GEOG 4/597: Advanced Spatial Quantitative Analysis, Winter 2023

Jackson Voelkel   |   Portland State University

Global and Local

Global Measures


  • A single value which applies to the entire data set
  • Assumes the same pattern or process occurs over the entire geographic area
  • An average for the entire area

Local Measures


  • A value calculated for each observation unit
  • Different patterns or processes may occur in different parts of the region
  • A unique number for each location


An equivalent local measure can be calculated for most global measures!

Global vs. Local


Global analysis yields one statistic to summarize the entire study area


  • Includes nearest neighbor analysis, quadrat analysis, OLS regression, Moran’s I
  • E.g., violent crime rate in Multnomah County = aspatial

Global vs. Local


Local analysis runs more focused tests on smaller areas within the study region


An equivalent local measure can be calculated for most global measures!

  • Includes geographically weighted regression, LISA (Local Indicators of Spatial Association)
  • E.g., violent crime rate in north Portland, downtown, and east of 82nd Ave = spatial

Global vs. Local

Global Example

summary(uhi$uhi_c)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   28.30   30.36   30.74   30.66   31.06   31.92
hist(uhi$uhi_c,breaks=20, main="Histogram of Evening UHI by Block Group", xlab = "Evening UHI")

require(ggplot2); require(dplyr) # for ggplot() and %>%
uhi %>% ggplot(aes(x=uhi_c)) +
  geom_histogram(color ='white', lwd=0.2) +
  xlab("Average Temperature (Evening)") +
  ylab("Frequency") +
  theme_minimal()

Local Example

require(spdep)
require(tmap)
tmap_mode('view')
orig_basemaps <- c("Esri.WorldGrayCanvas","OpenStreetMap","Esri.WorldTopoMap") # save the defaults so we can restore them later

# Whoa, we can change the default basemaps!!
tmap_options(basemaps = c("Stamen.Toner","Esri.WorldTopoMap","Esri.WorldImagery"))
tm_shape(uhi) + tm_polygons(
  id = c("Temperature: " = "uhi_c"),
  col='uhi_c',
  palette='-RdBu',
  style='cont',
  border.col = 'white',
  lwd=0.3,
  title = "Evening UHI (Celsius)",
  alpha = 0.7) +
  tm_tiles("Stamen.TonerLabels")

Evening Urban Heat Island Effect

Local Indicators of Spatial Association


LISA, or Local Indicators of Spatial Association:


  • Local Moran’s I - Most commonly used. Often called Anselin’s LISA, or just LISA
  • Local Geary’s C
  • Getis-Ord G

Local Indicators


Global measures tell you whether the data are autocorrelated, but they don’t tell you where.


  • Not effective in identifying different types of local clustering patterns (hot spots or cold spots)
  • Local indicators allow for observation-level analysis and mapping

Local Getis-Ord G Statistic

Local Getis-Ord G Statistic

suppressPackageStartupMessages(require(spdep))
suppressPackageStartupMessages(require(sf))
suppressPackageStartupMessages(require(tidycensus))
suppressPackageStartupMessages(require(dplyr))
tract <- tidycensus::get_acs(
  geography = 'tract',
  variables = c("Households_" = 'B22010_001', "SNAP_" = 'B22010_002'),
  state="CA",
  county='Los Angeles',
  geometry = TRUE,
  output = 'wide') %>% 
  st_transform(4326) %>% 
  mutate(
    pct_snap = SNAP_E/Households_E
    # pct_snap = ifelse(is.nan(pct_snap), 0, pct_snap)
    # pct_snap = ifelse(is.na(pct_snap), 0, pct_snap)
  ) %>% 
  filter(
    as.vector(sf::st_area(.)) > 0,
    st_coordinates(st_centroid(.))[,2] > 33.667,
    !is.na(pct_snap),
    !is.nan(pct_snap) # Let's remove the NA values
  )

suppressMessages(tmap_mode('plot'))
tm_shape(tract) + 
  tm_polygons(
    col='pct_snap',
    palette = 'Blues',
    border.col = 'white',
    lwd=0.3,
    style = 'quantile',
    n=5,
    contrast=c(0,1),
    title = "Share of Households (0 to 1)") +
  tm_shape(tract %>% st_union()) +  # adding a nice border!
  tm_polygons(col = NA, alpha = 0, border.col = 'black') +
  tm_layout(main.title = "Households on Food Assistance (SNAP)")

tract <- tract %>% filter(st_coordinates(st_centroid(.))[,2] < 34.3)

tm_shape(tract) + 
  tm_polygons(
    col='pct_snap',
    palette = "Blues",
    border.col = 'white',
    lwd=0.3,
    style = 'quantile',
    contrast=c(0,1),
    n=5,
    title = "Share of Households (0 to 1)") +
  tm_shape(tract %>% st_union()) +  # adding a nice border!
  tm_polygons(col = NA, alpha = 0, border.col = 'black') +
  tm_layout(main.title = "Households on Food Assistance (SNAP)")

Hot Spots and Cold Spots


What is a hot spot?


  • An area where high values cluster together

What is a cold spot?

  • An area where low values cluster together

Moran's I and Geary's C cannot distinguish them

  • They only indicate clustering
  • They cannot tell whether the clusters are hot spots, cold spots, or both

Local Getis-Ord Statistic


Local Getis-Ord is the ratio of the weighted sum of values within distance \(d\) of point \(i\) to the sum of all values in the study area:


\[G_i(d) = \dfrac{\sum\limits_jw_{ij}x_j}{\sum\limits_jx_j}\]

where:

\(x_j\) is the value of the attribute at location \(j\)

\(w_{ij}\) is the spatial weight between locations \(i\) and \(j\) (from the weights matrix)
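A toy illustration of the formula (hypothetical values, not our SNAP data; w_1 plays the role of row \(i\) of a binary weights matrix):

x   <- c(10, 20, 30, 40)       # attribute values for four areal units
w_1 <- c(0, 1, 1, 0)           # unit 1's neighbors are units 2 and 3
G_1 <- sum(w_1 * x) / sum(x)   # (20 + 30) / 100 = 0.5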

Assessing Significance


If local G returns z-scores, how do we assess significance?
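One quick answer, sketched in R: since the statistic is already a z-score, we can compare it against standard normal critical values, the same ±1.96 and ±2.58 cutoffs used in the binning function below:

qnorm(c(0.975, 0.995))   # ~1.96 (alpha = 0.05) and ~2.58 (alpha = 0.01), two-tailed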

Assessing Significance

Example: Local Gi*

First, we need to create our neighborhood and weights matrix:

tract_nb <- poly2nb(tract,queen=T) %>% 
  include.self() # This turns it into Gi*! Otherwise, it's just Gi
tract_W <- nb2listw(tract_nb, 
                    style="W",
                    zero.policy = TRUE)
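
A quick sanity check (just a sketch, not needed for the lab): after include.self(), each unit appears in its own neighbor list, which is exactly what distinguishes Gi* from Gi:

1 %in% tract_nb[[1]]   # TRUE: unit 1 now counts itself as a neighbor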

Example: Local Gi*

tract_local_g <- spdep::localG(
  x = tract$pct_snap,
  listw = tract_W, 
  zero.policy = TRUE) %>% 
  as.vector()

# I made us another function!
make_g_bin <- function(v){
  case_when(
    v > -1.96 & v < 1.96 ~ "Insignificant",
    v >= 1.96 & v < 2.58 ~ "Hotspot (>1.96)",
    v >= 2.58 ~ "Hotspot (>2.58)",
    v <= -1.96 & v > -2.58 ~ 'Coldspot (<-1.96)',
    v <= -2.58 ~ 'Coldspot (<-2.58)',  # <= so a value of exactly -2.58 isn't dropped
    TRUE ~ 'Unknown'
  ) %>% as.factor() %>% 
    ordered(c(
      "Unknown",
      "Coldspot (<-2.58)",
      "Coldspot (<-1.96)",
      "Insignificant",
      "Hotspot (>1.96)",
      "Hotspot (>2.58)"))
}

g_palette <- c('grey','#0571B0','#92C5DE','cornsilk','#F4A582','#CA0020')
# make_g_bin(v = tract_local_g) 

# Apply our colors on the fly to the 'col' column
tract %>% mutate(G=make_g_bin(tract_local_g)) %>% 
  tm_shape() +
  tm_polygons(
    col = "G",
    palette = g_palette,
    border.col = 'grey80',
    lwd=0.4,
    title = "Local Getis-Ord G") +
  tm_layout(main.title = "Households on Food Assistance (SNAP)")

x = tract %>% mutate(G=make_g_bin(tract_local_g)) %>% 
  tm_shape() +
  tm_polygons(
    col = "G",
    palette = g_palette,
    border.col = 'grey80',
    alpha = 0.5,
    lwd=0.4,
    title = "Local Getis-Ord G") +
  tm_shape(st_union(tract)) + 
  tm_polygons(zindex = 200,lwd=2,alpha = 0,border.col = "black")+
  tm_tiles(leaflet::providers$Stamen.TerrainLabels) +
  tm_layout(main.title = "Households on Food Assistance (SNAP)")

Local Moran’s I

Global vs Local Moran’s I


Global Moran’s I

  • What is the extent of clustering in the total study area?
  • Is this clustering significantly different from a random spatial distribution?


Local Moran’s I

  • Do local clusters (high-high or low-low) or local spatial outliers (high-low or low-high) exist?
  • Are these local clusters and spatial outliers statistically significant?

Calculating Local Moran’s I


The local Moran statistic for areal unit \(i\) is:


\[I_i = z_i\sum\limits_jw_{ij}z_j\]


Z values are z scores determined from the values of the attribute of interest for the whole data set:


\[z_i = \dfrac{x_i - \bar{x}}{SD_x}\]
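A toy sketch of both formulas together (hypothetical values; spdep::localmoran() handles this, plus expectations and variances, for real data):

x   <- c(10, 20, 30, 40)
z   <- (x - mean(x)) / sd(x)   # z-scores computed from the whole data set
w_2 <- c(0.5, 0, 0.5, 0)       # row-standardized weights for unit 2
I_2 <- z[2] * sum(w_2 * z)     # z_i times the weighted sum of its neighbors' z's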

What does it look like?


# Recreate the neighbors: we can't have "include.self()" here,
# since local Moran's I excludes the focal unit!
tract_nb <- poly2nb(tract,queen=T)
tract_W <- nb2listw(tract_nb, 
                    style="W",
                    zero.policy = FALSE)

tract_local_i <- spdep::localmoran(
  x = tract$pct_snap, 
  listw = tract_W) %>% 
  as_tibble()

tract_local_i
## # A tibble: 2,301 × 5
##    Ii          E.Ii          Var.Ii     Z.Ii       `Pr(z != E(Ii))`
##    <localmrn>  <localmrn>    <localmrn> <localmrn> <localmrn>      
##  1  0.85494320 -4.911889e-04 0.37622927  1.3946345 0.16312608      
##  2  0.80078768 -4.968960e-04 0.16282958  1.9857297 0.04706335      
##  3 -0.02323440 -7.466797e-05 0.02447854 -0.1480269 0.88232153      
##  4  0.64089642 -4.523111e-04 1.04029713  0.6288044 0.52947715      
##  5 -0.20761487 -1.243103e-04 0.03166747 -1.1659816 0.24362190      
##  6  0.16226049 -2.203842e-04 0.07223845  0.6045303 0.54549114      
##  7  0.34636226 -1.945058e-04 0.07441626  1.2704000 0.20394219      
##  8 -0.04907508 -1.140228e-04 0.03269224 -0.2707874 0.78655454      
##  9  0.61518484 -4.779916e-04 0.12172306  1.7646416 0.07762402      
## 10  0.49516832 -1.948956e-04 0.08951745  1.6556553 0.09779165      
## # … with 2,291 more rows

Assessing Significance


Multiple Comparisons Problem


Every time we perform a hypothesis test, there is a 5% chance that we are incorrectly rejecting \(H_0\) (when \(\alpha\) = 0.05).


So… what happens if we run 20+ hypothesis tests?
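
A back-of-the-envelope answer in R:

# Chance of at least one false rejection across 20 independent tests:
1 - (1 - 0.05)^20   # ~0.64, so a false "discovery" is more likely than not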

Controlling for Multiple Comparisons


False Discovery Rate (FDR)


To account for the Multiple Comparisons Problem, we’ll adjust our p-values to be more conservative based on how many tests we run, using a procedure known as controlling the False Discovery Rate (FDR). Adjusting p-values is simple with R’s built-in p.adjust() - look for it in the next slide!
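
A toy illustration with made-up p-values (note how each value is pulled upward, i.e., made more conservative):

p_raw <- c(0.001, 0.01, 0.04, 0.20)
p.adjust(p_raw, method = "fdr")   # ~0.004 0.020 0.053 0.200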

Prepping the LISA values


# Transferring values from test into our sf object
tract$lisa_I <- tract_local_i$Ii
tract$lisa_EI <- tract_local_i$E.Ii
tract$lisa_Z <- tract_local_i$Z.Ii
tract$lisa_p <- p.adjust(tract_local_i$`Pr(z != E(Ii))`,
                         method = "fdr")

# A function to bin the adjusted p-values
make_i_bin <- function(v){
  case_when(
    v <= 0.05 ~ "Significant",
    v > 0.05 ~ "Insignificant",
    is.na(v) ~ "Unknown",
    TRUE ~ 'Unknown'
  ) %>% as.factor() %>% 
    ordered(c(
      "Unknown",
      "Insignificant",
      "Significant"))
}

# DO NOT USE THIS IN YOUR LAB! THIS IS JUST AN EXAMPLE!
# You'll need to recreate the one in a few steps!
suppressMessages(tmap_mode('plot'))
tract %>% mutate(I=make_i_bin(tract$lisa_p)) %>% 
  tm_shape() +
  tm_polygons(
    col = "I",
    palette = c('grey','cornsilk','lightgreen'),
    border.col = 'grey60',
    lwd=0.4,
    title = "Local Moran's I") +
  tm_layout(main.title = "Households on Food Assistance (SNAP)")

What else should we do with this?


Cluster:


  • High-High
  • High-Low
  • Low-Low
  • Low-High

Creating a lag vector

tract$lag_snap <- spdep::lag.listw(x = tract_W, var = tract$pct_snap)
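
With style = "W" (row-standardized) weights, the lag is just the mean of each unit's neighbors. A quick sketch to verify this for the first tract (assuming it has at least one neighbor):

nb_1 <- tract_W$neighbours[[1]]   # neighbor indices for tract 1
all.equal(mean(tract$pct_snap[nb_1]), tract$lag_snap[1])   # TRUE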

# Does this look familiar?
moran_plot = function(var, w_var, xlab, ylab, main_title = NULL, normalize = T){
  if(missing(main_title)){
    main_title = "Moran's I Plot"
  }
  if(normalize){
    var = scale(var) %>% as.vector()
    w_var = scale(w_var) %>% as.vector()
  }
  plot_dat = data.frame(var,w_var,stringsAsFactors = F) %>% 
    mutate(quad = case_when(
        var > 0 & w_var > 0 ~ "HH",
        var > 0 & w_var < 0 ~ "HL",
        var < 0 & w_var < 0 ~ "LL",
        var < 0 & w_var > 0 ~ "LH",
        TRUE ~ 'Unknown'))
  plot_dat %>% ggplot() +
    geom_vline(xintercept = 0) +
    geom_hline(yintercept = 0) +
    geom_point(aes(x=var,y=w_var,color=quad)) +
    xlab(xlab) +
    ylab(ylab) +
    geom_smooth(
      mapping = aes(x=var,y=w_var),
      formula = 'y ~ x',
      method = 'lm',
      color="black",
      size=0.5,
      se=F) +
    scale_color_manual(
      name = "", 
      values = c('Unknown' = 'grey','HH'="firebrick",'HL'="brown1",'LL'="navy",'LH'="cornflowerblue")) +
    ggtitle(main_title)
}

moran_plot(
  var = tract$pct_snap, 
  w_var = tract$lag_snap, 
  xlab = "Households on SNAP (%)", 
  ylab = "Lagged Households on SNAP (%)")

tract %>% tm_shape() +
  tm_polygons(
    col = "lag_snap",
    palette = '-RdBu',
    border.col = 'grey80',
    lwd=0.4, 
    style = 'cont',
    title = "Lagged Households on SNAP (%)") +
  tm_layout(main.title = "Households on Food Assistance (SNAP)")

Classifying quadrants

# Once again, I'm being very kind and giving you an awesome function
# BUT! Do not use this version... use the one later with a 'p' argument
make_i_quads <- function(v, w_v, normalize = T){
  if(normalize){
    v = scale(v) %>% as.vector()
    w_v = scale(w_v) %>% as.vector()
  }
  case_when(
    v < 0 & w_v < 0 ~ "Low-Low",
    v < 0 & w_v > 0 ~ "Low-High",
    v > 0 & w_v < 0 ~ "High-Low",
    v > 0 & w_v > 0 ~ "High-High",
    TRUE ~ 'Unknown'
  ) %>% as.factor() %>% 
    ordered(c(
      "Unknown",
      "Low-Low",
      "Low-High",
      "High-Low",
      "High-High"))
}

i_palette <- c('grey','#0571B0','#92C5DE','#F4A582','#CA0020')

tract %>% mutate(quad = make_i_quads(v = pct_snap, w_v = lag_snap)) %>% 
  tm_shape() +
  tm_polygons(
    col = "quad",
    palette = i_palette,
    border.col = 'grey80',
    lwd=0.4,
    title = "Local Moran's I")

We aren’t done yet… we need to remove insignificant observations!

# I'll modify our function....
# USE THIS ONE!!!
make_i_quads <- function(v, w_v, p, normalize = T){
  if(normalize){
    v = scale(v) %>% as.vector()
    w_v = scale(w_v) %>% as.vector()
  }
  case_when(
    p > 0.05 ~ "Insignificant", ### Make sure that 'p' has been adjusted!
    v < 0 & w_v < 0 ~ "Low-Low",
    v < 0 & w_v > 0 ~ "Low-High",
    v > 0 & w_v < 0 ~ "High-Low",
    v > 0 & w_v > 0 ~ "High-High",
    TRUE ~ 'Unknown'
  ) %>% as.factor() %>% 
    ordered(c(
      "Insignificant",
      "Low-Low",
      "Low-High",
      "High-Low",
      "High-High"))
}

i_palette <- c('cornsilk','#0571B0','#92C5DE','#F4A582','#CA0020') 

tract %>% mutate(quad = make_i_quads(v = pct_snap, w_v = lag_snap,p = lisa_p)) %>% 
  tm_shape() +
  tm_polygons(
    col = "quad",
    palette = i_palette,
    border.col = 'grey70',
    lwd=0.4,
    title = "Local Moran's I") + 
  tm_layout(main.title = "Local Moran's I: SNAP Benefit Usage in LA County")

Or, in a web map:

suppressMessages(tmap_mode("view"))
tract %>% 
  mutate(quad = make_i_quads(v = pct_snap, w_v = lag_snap, p = lisa_p)) %>% 
  tm_shape() +
  tm_polygons(
    id = "quad",
    col = "quad",
    palette = i_palette,
    border.col = 'grey80',
    lwd=0.4,
    alpha=0.5,
    title = "Local Moran's I") +
  tm_tiles("Stamen.TerrainLabels") +
  tm_shape(tract %>% st_union()) +  # adding a nice border!
  tm_polygons(zindex = 200, alpha = 0, border.col = 'black')

How does this compare to Local G?

Moran Scatterplot


  • Local spatial autocorrelation suggests that spatial autocorrelation can vary over a geographic region
  • Where are the areas with “unusual” levels of spatial autocorrelation?
  • How do we detect outliers?
    • A lack of fit in the scatterplot could indicate important local spatial processes and associations
  • The Moran scatterplot is an effective visual diagnostic tool
  • Reminder: slope of the regression is Global Moran’s I

Bivariate LISA

Bivariate LISA


Moran’s I is the correlation between \(x\) and lag(\(x\)) - the same variable, but in nearby areas… Univariate Moran’s I

Bivariate Moran’s I is the correlation between \(x\) and lag(\(y\)) - a different variable in nearby areas
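
A rough, self-contained sketch (toy values, not our data) of one common formulation, the slope of the bivariate Moran scatterplot, i.e., lag(\(z_y\)) regressed on \(z_x\):

x <- c(1, 2, 3, 4); y <- c(2, 1, 4, 3)   # hypothetical variables
nb <- spdep::cell2nb(2, 2)               # toy neighbors on a 2x2 grid
W  <- spdep::nb2listw(nb, style = "W")
z_x <- as.vector(scale(x)); z_y <- as.vector(scale(y))
lag_zy <- spdep::lag.listw(W, z_y)       # lag of the *other* variable
sum(z_x * lag_zy) / sum(z_x^2)           # rough bivariate Moran's I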

Bivariate LISA


Let’s look at median household income and educational attainment (% of the population 25+ years old with a bachelor’s degree or higher) in Alameda County (Oakland, etc.), CA:

Creating a lag for income


oak_nb <- poly2nb(pl = oak, queen=T)
oak <- oak[card(oak_nb) != 0,]
oak_nb <- poly2nb(pl = oak, queen=T)
oak_W <- nb2listw(neighbours = oak_nb, style="W")
lag_edu <- lag.listw(x = oak_W, var = oak$edu, zero.policy = TRUE, NAOK = TRUE)

Bivariate Moran’s I Plot

moran_plot(
  var = oak$mhi,
  w_var = lag_edu,
  ylab = "Lagged Adults 25+ with at Least a Bachelors Degree (%)",
  xlab = "Median Household Income ($)",
  main_title = "Bivariate Moran's I Plot")

We are observing positive spatial association between income and lagged educational attainment!

What’s missing?


  • Is it significant?
  • Hard to tell without simulation
    • Complex to calculate
  • Some software is built to calculate it (e.g., GeoDa)

Calculating Bivariate LISA p-values in R


Don’t worry about what’s going on here, it’s just an example:

good <- complete.cases(cbind(oak$mhi,lag_edu))
oak_nb2 <- poly2nb(pl = oak[good,], queen = TRUE)
mod <- lm(oak$mhi ~ 1, weights = lag_edu)
bivar <- as.data.frame(localmoran.sad(mod,nb = oak_nb2,zero.policy = TRUE))
oak$bivar_p <- NA
oak$bivar_p[good] <- bivar$`Pr. (Sad)`
oak$bivar_p[is.na(oak$bivar_p) | is.nan(oak$bivar_p)] <- 1
oak$bivar_p <- p.adjust(oak$bivar_p, method = 'fdr') # Adjusting p values!

Wow, that is a lot…

What is Happening?


We are trying to explain one value with another.


Like regression?

Next Class

Lab 3 - Local Spatial Autocorrelation

Lecture: Spatial Regression