library(sfdep)
library(dplyr)

The colocation quotient evaluates if there is a spatial association between two categorical vectors $$A$$ and $$B$$. It is similar to the join count and cross-K metric. It’s unique, however, in the way that the CLQ can measure asymmetric relationships.

The CLQ is a ratio of the observed proportion of a categorical variable to the expected proportion. Values larger than 1 mean that there is more colocation than to be expected under the null hypothesis of no spatial association. When the value is smaller than 0, we interpret it to mean that there is less colocation than expected under the null.

Note that the CLQ’s intended use is with point data.

### Global Colocation

The global_colocation() quotient is used to identify if there is any colocation in a single categorical vector $$A$$.

A <- guerry[["main_city"]]
nb <- st_contiguity(sf::st_geometry(guerry))
global_colocation(A, nb, 299)
#> $CLQ #> [1] 1.02841 #> #>$p_sim
#> [1] 0.19

The result of the global CLQ is approximately 0 and insignificant. We can interpret this to mean that there is no significant colocation of “Main Cities.”

### Pairwise Colocation

The pairwise CLQ evaluates colocation between values of categorical variables $$A$$ and $$B$$ at a global level. The results are one-directional with $$A \to B$$.

We provide the 2 vectors to the pairwise_colocation() function and the results is a matrix with unique values of $$A$$ as column headers and unique values of $$B$$ as row names. There are additional columns to indicate the simulated p-values of the colocation quotients.

A <- guerry$main_city B <- guerry$region
nb <- st_knn(sf::st_geometry(guerry), 5)
#> ! Polygon provided. Using point on surface.
pairwise_colocation(B, A, nb, 199)
#>        1:Sm     2:Med      3:Lg p_sim_1:Sm p_sim_2:Med p_sim_3:Lg
#> C 0.8894118 1.1250679 0.1976471      0.415       0.015      0.000
#> E 1.8776471 0.8514027 0.9882353      0.000       0.020      0.495
#> N 0.1976471 1.0794570 1.1858824      0.000       0.060      0.220
#> S 1.5811765 0.9122172 0.8894118      0.040       0.120      0.370
#> W 0.6917647 1.0338462 0.9882353      0.170       0.190      0.460

If we look at the first row, we see that the colocation of $$C \to 3{:}Lg$$ is very significant with a value of ~ $$0.20$$. We can interpret this to mean that there is systematically less colocation of region C near large main cities.

### Local Colocation

The local CLQ is rather similar to the pairwise CLQ. The key difference is the weights list that is used in calculating the CLQ. In its introductory paper, Wang et al. (2017) emphasize that adaptive gaussian kernel weights be used. This introduces a distance decay where closer values have more heft than distant ones in calculating the local CLQ.

The result of the local CLQ returns a CLQ and a simulated p-value for each unique value of B that falls in the neighborhood.

# Identify A & B variables
A <- guerry$main_city B <- guerry$region
geo <- sf::st_geometry(guerry)
nb <- st_knn(geo, 5)
#> ! Polygon provided. Using point on surface.
wt <- st_kernel_weights(nb, geo, "gaussian", adaptive = TRUE)
#> ℹ It is recommended to include the ith observation. Consider nb = include_self(nb).
#> ! Polygon provided. Using point on surface.

res <- local_colocation(A, B, nb, wt, 49)

#>           C        E        N        S  W p_sim_C p_sim_E p_sim_N p_sim_S
#> 1 0.8461533 4.095023       NA       NA NA    0.06    0.02      NA      NA
#> 2        NA       NA 4.941176       NA NA      NA      NA    0.02      NA
#> 3 4.9411765       NA       NA       NA NA    0.02      NA      NA      NA
#> 4        NA 2.090837       NA 2.850339 NA      NA    0.20      NA    0.08
#> 5        NA 3.268800       NA 1.672377 NA      NA    0.02      NA    0.24
#> 6 1.1554807 1.878544       NA 1.907152 NA    0.26    0.30      NA    0.18
#>   p_sim_W
#> 1      NA
#> 2      NA
#> 3      NA
#> 4      NA
#> 5      NA
#> 6      NA

Note that for the first observation we only have a CLQ for C and E. This is because the neighborhood of the first observation consists of only C and E values. We can verify this like so.

# view first row
res[1,]
#>           C        E  N  S  W p_sim_C p_sim_E p_sim_N p_sim_S p_sim_W
#> 1 0.8461533 4.095023 NA NA NA    0.06    0.02      NA      NA      NA

# find xj values for the B column based on neighbors
# take first observations neighbors
sfdep::find_xj(B, nb)[[1]]
#> [1] E E C E E
#> Levels: C E N S W

It’s also worth noting that we can create the local CLQ in a dplyr pipe like so


res_tidy <- guerry |>
select(main_city, region) |>
mutate(nb = st_knn(geometry, 5),
wt = st_kernel_weights(nb, geometry, "gaussian", adaptive = TRUE),
lclq = local_colocation(main_city, region, nb, wt, 99)) |>
tidyr::unnest(lclq)
#> ! Polygon provided. Using point on surface.
#> ℹ It is recommended to include the ith observation. Consider nb = include_self(nb).
#> ! Polygon provided. Using point on surface.

slice(res_tidy, 1:5)
#> Simple feature collection with 5 features and 14 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 595532 ymin: 1858801 xmax: 975716 ymax: 2564568
#> CRS:           NA
#> # A tibble: 5 × 15
#>   main_city region                       geometry nb    wt         C     E     N
#>   <ord>     <fct>                  <MULTIPOLYGON> <nb>  <lis>  <dbl> <dbl> <dbl>
#> 1 2:Med     E      (((801150 2092615, 800669 209… <int… <dbl…  0.846  4.10 NA
#> 2 2:Med     N      (((729326 2521619, 729320 252… <int… <dbl… NA     NA     4.94
#> 3 2:Med     C      (((710830 2137350, 711746 213… <int… <dbl…  4.94  NA    NA
#> 4 1:Sm      E      (((882701 1920024, 882408 192… <int… <dbl… NA      2.09 NA
#> 5 1:Sm      E      (((886504 1922890, 885733 192… <int… <dbl… NA      3.27 NA
#> # … with 7 more variables: S <dbl>, W <dbl>, p_sim_C <dbl>, p_sim_E <dbl>,
#> #   p_sim_N <dbl>, p_sim_S <dbl>, p_sim_W <dbl>