Problems with Spatial Data


GEOG 4/597: Advanced Spatial Quantitative Analysis, Winter 2023

Jackson Voelkel   |   Portland State University

Overview


  1. How are spatial data special?
  2. Identify the pitfalls of this “specialness”
    • Spatial autocorrelation
    • Modifiable areal unit problem (MAUP)
    • Ecological fallacy
    • Scale effects
    • Non-uniformity of space
    • Edge effects

Statistical Assumptions


  1. Linear relationship (for linear analyses)
  2. Normality (of the residuals)
  3. No (or very little) multicollinearity
    • Independent variables are independent from one another
  4. No autocorrelation
    • Residuals are independent from one another
  5. Homoscedasticity
    • Error terms have constant variance

Homoscedasticity

x <- readRDS('C:/temp/model_object.RDS')
mod <- lm(bldgval ~ bldg_sqft_sum + gis_acres, data=x)
summary(mod)
## 
## Call:
## lm(formula = bldgval ~ bldg_sqft_sum + gis_acres, data = x)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -52257986    -74999      4154     80163 129886375 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -8.990e+03  1.512e+03  -5.944 2.78e-09 ***
## bldg_sqft_sum  1.040e+02  3.079e-01 337.571  < 2e-16 ***
## gis_acres      6.268e+05  2.399e+03 261.288  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 810900 on 351151 degrees of freedom
##   (2418 observations deleted due to missingness)
## Multiple R-squared:  0.3591, Adjusted R-squared:  0.3591 
## F-statistic: 9.839e+04 on 2 and 351151 DF,  p-value: < 2.2e-16

plot(mod,3)

“As building value (bldgval) increases, so does our error.” or “Our model is better at predicting home values for less expensive homes”
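The parcel data behind the model above is not included here, so the following is only a sketch on simulated data. The variable names (`sqft`, `value`, `sim_mod`) and all numbers are made up; the error is built to grow with building size so the pattern in the scale-location plot has a known cause.

```r
# Simulated stand-in for the parcel data; every value here is hypothetical.
set.seed(42)
n    <- 500
sqft <- runif(n, 500, 5000)

# Error standard deviation grows with building size (heteroscedastic by construction)
value <- 100 * sqft + rnorm(n, mean = 0, sd = 20 * sqft)

sim_mod <- lm(value ~ sqft)

# Compare the residual spread in the lower and upper halves of the fitted values
res      <- resid(sim_mod)
fit      <- fitted(sim_mod)
low_half <- fit <= median(fit)
sd_low   <- sd(res[low_half])
sd_high  <- sd(res[!low_half])
sd_ratio <- sd_high / sd_low  # above 1 when the error grows with the fitted value

# plot(sim_mod, 3) draws the same scale-location diagnostic for this model
```

A ratio near 1 would be consistent with homoscedasticity; here it is larger because the simulation forces the variance to increase.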

Spatial is Special


Tobler’s First Law of Geography

Everything is related to everything else, but near things are more related than distant things.
  • If this weren’t generally true, geography would be irrelevant
  • In quantitative geography, this is important because it means data are not independent observations

Tobler, W. R. (1970). A Computer Movie Simulating Urban Growth in the Detroit Region. Economic Geography, 46(sup1), 234-240. https://doi.org/10.2307/143141

So What?


  • Spatial autocorrelation
  • Scale effects
  • Modifiable areal unit problem (MAUP)
  • Non-uniformity of space
  • Edge effects

Spatial Autocorrelation



Follows directly from Tobler’s Law

  • There is redundancy in spatial data because nearby points are similar
  • What is important:
    • to be able to measure the autocorrelation
    • to describe the autocorrelation structure
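One common way to measure spatial autocorrelation is global Moran's I. The sketch below computes it by hand for values on a regular grid, assuming binary rook (shared-edge) neighbors. The function name `moran_i_grid` and the three example surfaces are illustrative, not part of the course materials.

```r
# Global Moran's I for a matrix of values on a regular grid,
# using binary rook (shared-edge) weights.
moran_i_grid <- function(z) {
  dev <- z - mean(z)
  nr  <- nrow(z)
  nc  <- ncol(z)
  # Cross-products of deviations for horizontal and vertical neighbor pairs
  horiz <- sum(dev[, -nc] * dev[, -1])
  vert  <- sum(dev[-nr, ] * dev[-1, ])
  # Each pair appears twice in the symmetric weights matrix
  n_pairs <- nr * (nc - 1) + nc * (nr - 1)
  s0      <- 2 * n_pairs
  (length(z) / s0) * (2 * (horiz + vert)) / sum(dev^2)
}

# A smooth gradient: near cells are similar
gradient <- outer(1:6, 1:6, "+")
# A checkerboard: near cells are as different as possible
checker <- outer(1:6, 1:6, function(i, j) (i + j) %% 2)
# The gradient values placed at random locations
set.seed(1)
shuffled <- matrix(sample(gradient), nrow = 6)

i_gradient <- moran_i_grid(gradient)
i_checker  <- moran_i_grid(checker)
i_shuffled <- moran_i_grid(shuffled)
```

Positive values indicate that neighbors are alike, negative values that neighbors differ, and values near zero suggest no spatial structure. For real polygons, a package such as spdep provides neighbor lists and significance tests.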

Spatial Autocorrelation



Figure: Actual Population -> Random Population

https://mgimond.github.io/Spatial/spatial-autocorrelation.html

Spatial Autocorrelation



Figure: Actual Elevation -> Random Elevation

https://mgimond.github.io/Spatial/spatial-autocorrelation.html

Spatial Variation



Two aspects of spatial variation:

  1. First order variation: Large-scale variation in the mean (or typical) value; variation is due to changes in the environment or background
  2. Second order variation: Local variations due to interaction effects between observations
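A rough way to separate the two is to fit a trend surface and inspect what is left over. The sketch below assumes a made-up linear trend across a 20 x 20 grid; the local component is plain random noise for brevity, whereas real second order effects would be spatially correlated.

```r
set.seed(7)
grid <- expand.grid(x = 1:20, y = 1:20)

# First order: a broad trend in the mean (hypothetical coefficients)
trend <- 2 * grid$x + 0.5 * grid$y
# Stand-in for local variation around that trend
local <- rnorm(nrow(grid), sd = 3)
grid$z <- trend + local

# Trend surface: regress the values on the coordinates
trend_fit <- lm(z ~ x + y, data = grid)

grid$first_order  <- fitted(trend_fit)  # large-scale variation in the mean
grid$second_order <- resid(trend_fit)   # what remains once the trend is removed
```

Any autocorrelation measure would then be applied to `second_order` rather than to the raw values, so that the broad trend is not mistaken for local interaction.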

First Order Variation

https://en.wikipedia.org/wiki/File:Annual_Average_Temperature_Map.jpg

Second Order Variation

Voelkel, J., & Shandas, V. (2017). Towards Systematic Prediction of Urban Heat Islands: Grounding Measurements, Assessing Modeling Techniques. Climate, 5(2), 41. https://doi.org/10.3390/cli5020041

Modifiable Areal Unit Problem (MAUP)



  • Areal units are often arbitrary, drawn for the convenience of data collection and aggregation rather than to reflect the underlying geography
  • Many standard statistical techniques are sensitive to the chosen units
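The sensitivity to the chosen units can be sketched with simulated data: the same 36 basic units are grouped into six zones in two different ways, and the correlation is recomputed on the zone means. All values and both zoning schemes are hypothetical, and the second scheme is not contiguous, which real zones normally would be.

```r
set.seed(123)
n_units <- 36
x <- rnorm(n_units)
y <- 0.3 * x + rnorm(n_units)

# Two arbitrary ways of grouping the same 36 units into 6 zones of 6
zones_a <- rep(1:6, each = 6)   # consecutive runs of units
zones_b <- rep(1:6, times = 6)  # every sixth unit

# Correlation between zone means under a given zoning scheme
zone_cor <- function(zones) {
  cor(as.vector(tapply(x, zones, mean)),
      as.vector(tapply(y, zones, mean)))
}

r_units <- cor(x, y)           # correlation at the level of the basic units
r_a     <- zone_cor(zones_a)   # same data, zoning scheme A
r_b     <- zone_cor(zones_b)   # same data, zoning scheme B
```

The underlying data never change, yet `r_units`, `r_a`, and `r_b` generally differ.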

So what?

1. We can take note of this, and include it in our analysis considerations…
2. … or we can use it for voter suppression!


Modifiable Areal Unit Problem (MAUP)

Our choice of spatial reference frame is itself a significant determinant of the statistical and other patterns we observe.

x = % of population over 62; y= % vote for Republican congressional candidates (1968)

Openshaw, S. and P.J. Taylor. 1979. “A million or so correlation coefficients: Three experiments on the modifiable areal unit problem.” Pp. 127-144 in Statistical Methods in the Spatial Sciences, edited by N. Wrigley. London: Pion.

Modifiable Areal Unit Problem (MAUP)

Openshaw, S. and P.J. Taylor. 1979. “A million or so correlation coefficients: Three experiments on the modifiable areal unit problem.” Pp. 127-144 in Statistical Methods in the Spatial Sciences, edited by N. Wrigley. London: Pion.


Ecological Fallacy



Invalid transfer of conclusions from spatially aggregated analysis to smaller areas or even to the individual level.


  • A statistical relationship observed at one scale cannot be assumed to hold at another scale
  • Extremely common fallacy
  • Closely related to MAUP: Statistical relationships may change at different levels of aggregation
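A constructed example makes the fallacy concrete. The three areas and their values below are invented so that the relationship between area means is the opposite of the relationship among individuals within each area.

```r
# Three hypothetical areas with five individuals each
area   <- rep(c("A", "B", "C"), each = 5)
center <- rep(c(10, 20, 30), each = 5)
offset <- rep(c(-2, -1, 0, 1, 2), times = 3)

x <- center + offset
y <- center - offset  # within each area, y falls as x rises

# Aggregate level: correlation between the area means
r_area <- cor(as.vector(tapply(x, area, mean)),
              as.vector(tapply(y, area, mean)))

# Individual level: correlation within each area
r_within <- sapply(split(data.frame(x, y), area),
                   function(d) cor(d$x, d$y))
```

Here `r_area` is perfectly positive while every entry of `r_within` is perfectly negative, so a conclusion drawn from the area means would be wrong for the individuals in every area.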


Shandas, et al., 2016



Assessing Exposure

Shandas, et al., 2016

Assessing Exposure


Comparing dasymetric population data to standard census geometries:


Shandas, et al., 2016

Scale Effect



Scale effects are fundamental and should be considered before spatial analysis


  • Different representations are appropriate at different scales, so representation of the entities changes depending on the scale
  • Scale is also a factor in the description of the spatial structure - in the distinction between first and second order.
  • Can resemble MAUP, even when the units are uniform (e.g. raster cells)

Dependent (original 6 × 6 grid)

87 95 72 37 44 24
40 55 55 38 88 34
41 30 26 35 38 24
14 56 37 34  8 18
49 44 51 67 17 37
55 25 33 32 59 54

Independent (original 6 × 6 grid)

72 75 85 29 58 30
50 60 49 46 84 23
21 46 22 42 45 14
19 36 48 23  8 29
38 47 52 52 22 48
58 40 46 38 35 55

Dependent (means of 2 × 2 blocks, 3 × 3 grid)

69.25 50.5  47.5
35.25 33    22
43.25 45.75 41.75

Independent (means of 2 × 2 blocks, 3 × 3 grid)

64.25 52.25 48.75
30.5  33.75 24
45.75 47    40

Dependent (means of 3 × 3 blocks, 2 × 2 grid)

55.67 40.22
40.44 36.22

Independent (means of 3 × 3 blocks, 2 × 2 grid)

53.33 41.22
42.67 34.44
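The smaller grids above match block means of the two 6 x 6 grids (2 x 2 blocks, then 3 x 3 blocks). The sketch below reproduces that aggregation and computes the correlation between the two variables at each resolution. The helper name `block_mean` is illustrative.

```r
dep <- matrix(c(87, 95, 72, 37, 44, 24,
                40, 55, 55, 38, 88, 34,
                41, 30, 26, 35, 38, 24,
                14, 56, 37, 34,  8, 18,
                49, 44, 51, 67, 17, 37,
                55, 25, 33, 32, 59, 54), nrow = 6, byrow = TRUE)

ind <- matrix(c(72, 75, 85, 29, 58, 30,
                50, 60, 49, 46, 84, 23,
                21, 46, 22, 42, 45, 14,
                19, 36, 48, 23,  8, 29,
                38, 47, 52, 52, 22, 48,
                58, 40, 46, 38, 35, 55), nrow = 6, byrow = TRUE)

# Average a matrix over non-overlapping k x k blocks of cells
block_mean <- function(m, k) {
  stopifnot(nrow(m) %% k == 0, ncol(m) %% k == 0)
  row_block <- (row(m) - 1) %/% k
  col_block <- (col(m) - 1) %/% k
  unname(tapply(m, list(row_block, col_block), mean))
}

dep_3x3 <- block_mean(dep, 2)
ind_3x3 <- block_mean(ind, 2)
dep_2x2 <- block_mean(dep, 3)
ind_2x2 <- block_mean(ind, 3)

# Correlation between the two variables at each resolution
scale_cor <- c(
  grid_6x6 = cor(as.vector(dep), as.vector(ind)),
  grid_3x3 = cor(as.vector(dep_3x3), as.vector(ind_3x3)),
  grid_2x2 = cor(as.vector(dep_2x2), as.vector(ind_2x2))
)
```

Printing `scale_cor` shows how the strength of the relationship depends on cell size, even though the underlying 36 values never change.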

Scale: Census Geographies

Median Household Income


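# library() stops with an error if a package is missing; require() only warns
library(tidycensus)
library(sf)
library(tmap)
library(dplyr)  # provides %>% and mutate(), both used below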

# Use the same ACS year at every geography so the maps are comparable
cnty <- tidycensus::get_acs(
  geography = 'county', variables = 'B19049_001E',
  state="OR", county=c('Clackamas','Multnomah'),
  year=2020, geometry = TRUE) %>% 
  st_transform(2913)

trct <- tidycensus::get_acs(
  geography = 'tract', variables = 'B19049_001E',
  state="OR", county=c('Clackamas','Multnomah'),
  year=2020, geometry = TRUE) %>% 
  st_transform(2913)

bg <- tidycensus::get_acs(
  geography = 'block group', variables = 'B19049_001E',
  state="OR", county=c('Clackamas','Multnomah'),
  year=2020, geometry = TRUE) %>% 
  st_transform(2913)

cnty %>% tm_shape() +
  tm_polygons(col='estimate', title="MHI", palette = "YlGnBu",
              border.col = 'white',lwd=0.5) +
  tm_layout(main.title = "Median Household Income: County")

trct %>% tm_shape() +
  tm_polygons(col='estimate', title="MHI", palette = "YlGnBu",
              border.col = 'white',lwd=0.5) +
  tm_layout(main.title = "Median Household Income: Tract")

bg %>% tm_shape() +
  tm_polygons(col='estimate', title="MHI", palette = "YlGnBu",
              border.col = 'white',lwd=0.3, style="jenks") +
  tm_layout(main.title = "Median Household Income: Block Group")

Scale: Census Geographies

Percent Hispanic or Latino

# Download total population (B01001_001) and Hispanic or Latino population
# (B03002_012) for one census geography and compute the percentage
get_pct_hl <- function(geography) {
  tidycensus::get_acs(
    geography = geography,
    variables = c('B01001_001E','B03002_012E'),
    state="OR",
    year=2020,
    county=c('Clackamas','Multnomah'),
    geometry = TRUE,
    output = 'wide') %>%
    mutate(
      # Multiply by 100 so the values match the "% of Pop." legend title
      pct_hl = 100 * B03002_012E / B01001_001E
    ) %>% st_transform(2913)
}

cnty <- get_pct_hl('county')
trct <- get_pct_hl('tract')
bg   <- get_pct_hl('block group')

cnty %>% tm_shape() +
  tm_polygons(col='pct_hl', title="% of Pop.", palette = "YlGnBu",
              border.col = 'white', lwd=0.5) +
  tm_layout(main.title = "Hispanic or Latino: County")



Scale and Resolution

Uncertain Geographic Context Problem (UGCoP)



Findings about the effects of area-based attributes could be affected by how contextual units or neighborhoods are geographically delineated and the extent to which these areal units deviate from the ‘true causally relevant’ geographic context

Mei-Po Kwan (2012) The Uncertain Geographic Context Problem, Annals of the Association of American Geographers, 102:5, 958-968, DOI: 10.1080/00045608.2012.687349

Uncertain Geographic Context Problem (UGCoP)


  • Different from MAUP
  • Arises not because of issues with aggregation or zoning, but rather because of uncertainty
  • Uncertainty as to what are the actual areas that exert contextual influence on the phenomenon under study
  • Individuals living in the same areal unit may experience influences from many other areal units

Uncertain Geographic Context Problem (UGCoP)


Arises because of our limited knowledge about the precise spatial and temporal configuration of each individual’s true geographic context, not because of the use of a particular scheme of areal division, zonal aggregation, or spatial scale

Kwan, Mei-Po. 2012. “How GIS can help address the uncertain geographic context problem in social science research.” Annals of GIS 18:245-255.


https://gisgeography.com/kriging-interpolation-prediction/

Non-uniformity of Space



  • Space is not uniform
  • Particularly problematic for data gathered on human geography
  • E.g., disease cases cluster where the background population density is high; so does graffiti
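A small sketch of why raw counts mislead when the background population is uneven. The three areas and their populations are hypothetical, and cases are generated at the same underlying rate everywhere.

```r
# Hypothetical areas with very different populations
areas <- data.frame(
  name       = c("Downtown", "Suburb", "Rural"),
  population = c(50000, 10000, 1000)
)

# Identical underlying risk everywhere: 2 cases per 1,000 people
areas$cases <- areas$population * 0.002

# Normalizing by population removes the apparent cluster
areas$rate_per_1000 <- 1000 * areas$cases / areas$population
```

A map of `cases` would show a cluster downtown; a map of `rate_per_1000` would be flat, because the cluster only mirrors where people live.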

Non-uniformity of Space

Source: XKCD

Non-uniformity of Space


Clift, K., Scott, L., Johnson, M., & Gonzalez, C. (2014). Leveraging geographic information systems in an integrated health care delivery organization. The Permanente journal, 18(2), 71-5.

Non-uniformity of Space


Megler, V., Banis, D., & Chang, H. (2014). Spatial analysis of graffiti in San Francisco. Applied Geography, 54, 63-73. https://doi.org/10.1016/j.apgeog.2014.06.031


Edge Effects



  • Edge effects are a commonly occurring example of non-uniformity and scale effects rolled into one
  • Entities on the edge of the study area only have neighbors in one direction
  • Can substantially alter results
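The loss of neighbors at the boundary can be counted directly. The sketch below assumes a rectangular study area made of grid cells with rook (shared-edge) neighbors. The function name `rook_neighbor_count` is illustrative.

```r
# Number of rook (shared-edge) neighbors for every cell in an nr x nc study area
rook_neighbor_count <- function(nr, nc) {
  rows <- row(matrix(0, nr, nc))
  cols <- col(matrix(0, nr, nc))
  # A cell loses one potential neighbor for each study-area boundary it touches
  4 - (rows == 1) - (rows == nr) - (cols == 1) - (cols == nc)
}

counts <- rook_neighbor_count(5, 5)

# Share of cells that are missing at least one neighbor
share_on_edge <- mean(counts < 4)
```

In this 5 x 5 example only the nine interior cells have all four neighbors, so most cells are affected by the boundary; the share shrinks as the study area grows relative to the cell size.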

Edge Effects - Model NOx for an Industrial Area


  • Causes: combustion (vehicles, factories)
  • Ameliorators: trees …

Model NOx for an Industrial Area


For our study area we’ve included information on:

  • Roads/traffic/rail
  • Industrial/manufacturing
  • Trees/vegetation
  • Docks/ports
  • Water
  • etc

Model NOx for an Industrial Area


Maybe our initial model has some issues…

What scale best fixes edge effects?



  • It depends!

Edge effects via data limitations


Summary

Spatial autocorrelation


  • Operates to the detriment of conventional statistical methods by introducing redundancy and nonindependence into data
  • Diagnostic measures are available; they are also used to describe the spatial autocorrelation structure
  • Distinction between first and second order effects

Scale


  • The distinction between first and second order effects often depends on scale

Modifiable areal unit problem


  • Related to scale
  • Remains unsolved; it is always present!

Non-uniformity of space


  • Important underlying causes are often not uniform

Edge effects


  • There are usually effects beyond what you are looking at

What does this all have in common?


Tobler’s First Law of Geography

Next class



  • Keep plugging away at DataCamp
  • An introduction to RStudio, R Markdown
  • Our first lab!