Efficiency of spatial intersects in R

September 12, 2017

R spatial intersection

By

David Beauchesne

Kevin Cazelles

Rémi Daigle



Intersects & R

We are increasingly performing spatial analyses in R. The replicability and the efficiency of programming languages is much more appealing than using user friendly softwares like ArcGIS, even though you can still code your way through analyses when using those softwares (latter versions of QGIS do a fantastic job in that regard!). The performance of tools available for spatial analyses in R is however not completely certain.

In this post, we compare four different methods to perform spatial intersects between objects in R, from three different packages:

  1. raster::intersect
  2. rgeos::gIntersects
  3. rgeos::gIntersection
  4. sf::st_intersects
  5. sf::st_intersection

More specifically, we test how these methods fare when performing binary (TRUE/FALSE) and zonal or aerial intersects. Keep in mind, not all methods can be used for both binary and zonal intersects:


Function Binary Zonal
raster::intersect X X
rgeos::gIntersects X
rgeos::gIntersection X X
sf::st_intersects X
sf::st_intersection X X


Obviously, if you mean to perform binary intersects only, the binary functions make more sense as they are built to include less calculations. We nonetheless compare all the functions together for the sake of comparison in this post.


Initiate R

# R version
sessionInfo()[[1]]$version.string
#R> [1] "R version 3.4.2 (2017-01-27)"
#--
rm(list=ls())
library(sf)
library(rgdal)
library(sp)
library(raster)
library(rgeos)
library(magrittr)
library(knitr)
library(kableExtra)
##--
opts_chunk$set(fig.align='center')
##--
# Empty plot function
eplot <- function(x = 1, y = 1, xmin = 0, xmax = 1, ymin = 0, ymax = 1) {
  plot(x = x,y = y,bty = "n",ann = FALSE,xaxt = "n",yaxt = "n",type = "n",bg = 'grey',ylim = c(ymin,ymax),xlim = c(xmin,xmax))
}

Version of packages that have been used to build this post and are herein discussed:

packageVersion("raster")
packageVersion("rgeos")
packageVersion("sf")
#R> [1] '2.6.7'
#R> [1] '0.3.26'
#R> [1] '0.5.6'


Generate spatial objects for testing

We start be generating random spatial object in space. For the record, the area selected is within the St. Lawrence estuary in eastern Canada (see online ecology series), although the actual location really does not matter for this post!


Grid

We use a regular grid to intersect vectorized data, i.e. points and polygons for this post. This simulate the use of a grid used to extract environmental data (biotic and/or abiotic) from multiple sources to characterize a study area.


# Projection in lambers
projSpat <- "+proj=lcc +lat_1=46 +lat_2=60 +lat_0=44 +lon_0=-68.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"

# Bounding box
latmax <- 625000
latmin <- 585000
lonmax <- 150000
lonmin <- 100000

# Create a spatial bounding box for the area of interest using the 'sp' package:
bb <- cbind(c(lonmin,lonmax,lonmax,lonmin,lonmin),
                   c(latmin,latmin,latmax,latmax,latmin)) %>%
      Polygon() %>% list() %>%
      Polygons(ID = 'ID1') %>% list() %>%
      SpatialPolygons(proj4string = CRS(projSpat))


# Create spatial grid that will be used for intersects
A <- 5*1000000 # Surface of cells in m^2 (area in km^2 * 1000000)
cellsize <- sqrt(2*A)/3^(1/4) # value of the small diagonal
grid <- sp::spsample(bb,type="hexagonal",cellsize=cellsize, offset=c(0,0)) %>% # Points for a hexagonal grid
        sp::HexPoints2SpatialPolygons() # creating polygons

# We now have a grid of polygons to work with!
plot(grid)
lines(bb, col = 'blue', lwd = 2)


# We need this grid in `sp` and `sf` usable formats
gridSP <- grid
gridSF <- sf::st_as_sf(grid)
class(gridSP)
#R> [1] "SpatialPolygons"
#R> attr(,"package")
#R> [1] "sp"
class(gridSF)
#R> [1] "sf"         "data.frame"


Points and Polygons

Now we generate random points within the bounding box to test the intersects. This is done for 1, 10, 50, 100, 250, 500, 1000, 10000 points. Then, to get all data required to perform the tests, we also need to create polygons from the point data.


# Number of samples
samp <- c(1, 10, 50, 100, 250, 500, 1000, 10000)
nSamp <- length(samp)

# Names of points samples
sampNames <- paste0('Pt',samp)
sampNames

# Random points for all samples
for(i in 1:nSamp) {
    assign(x = sampNames[i],
           value = data.frame(lon = runif(n = samp[i], min = lonmin, max = lonmax),
                              lat = runif(n = samp[i], min = latmin, max = latmax)))    
}

# We now have nSamp new objects with randomly generated coordinates
ls()

# Let's now create spatial objects with those coordinates for use with the `sp` package
sampNamesSP <- paste0(sampNames, 'SP')
for(i in 1:nSamp) {
    assign(
      x = sampNamesSP[i],
      value = SpatialPoints(coords = get(sampNames[i]), proj4string=CRS(projSpat))
    )
}

# Visualize them
par(mfrow = c(3,3))
for(i in 1:nSamp) {
    par(mar = c(0,0,0,0))
    plot(bb, lwd = 2)
    points(get(sampNamesSP[i]), cex = 0.75, col = 'transparent', bg = '#1e6b7955', pch = 21)
}


# Now we generate polygons from the point data using the `rgeos::gBuffer`,
# see `sf::st_buffer` is the equivalent for the `sf` package
sampNamesPoly <- paste0('Poly',samp)
sampNamesSPPoly <- paste0(sampNamesPoly, 'SP')
for(i in 1:nSamp) {
    assign(x = sampNamesSPPoly[i],
           value = rgeos::gBuffer(get(sampNamesSP[i]), byid = T, width = 2000))
}

# Visualize them
par(mfrow = c(3,3))
for(i in 1:nSamp) {
    par(mar = c(0,0,0,0))
    plot(bb, lwd = 2)
    plot(get(sampNamesSPPoly[i]), border = 'transparent', col = '#1e6b7955', add = T)
}


# Transform points and polygons for use with the `sf` package
sampNamesSF <- paste0(sampNames, 'SF') # for points
sampNamesSFPoly <- paste0(sampNamesPoly, 'SF') # for polygons

for(i in 1:nSamp) {
    assign(x = sampNamesSF[i],
           sf::st_as_sf(get(sampNamesSP[i]))) # points
    assign(x = sampNamesSFPoly[i],
           sf::st_as_sf(get(sampNamesSPPoly[i]))) # polygons
}


Benchmarking

Now that the data is ready for use, we can perform the tests! But first, let’s take a quick look at the type of results that are returned by each function.


Points intersections

First, we will test the intersects only with the points.


res1 <- data.frame(n = samp,
                   raster_intersect = numeric(nSamp),
                   rgeos_gIntersects = numeric(nSamp),
                   rgeos_gIntersection = numeric(nSamp),
                   sf_st_intersects = numeric(nSamp),
                   sf_st_intersection = numeric(nSamp))

for(i in 1:nSamp) {
    res1$raster_intersect[i] <- system.time(raster::intersect(gridSP, get(sampNamesSP[i])))['elapsed']
    res1$rgeos_gIntersects[i] <- system.time(rgeos::gIntersects(gridSP, get(sampNamesSP[i]), byid = T))['elapsed']
    res1$rgeos_gIntersection[i] <- system.time(rgeos::gIntersection(gridSP, get(sampNamesSP[i]), byid = T))['elapsed']
    res1$sf_st_intersects[i] <- system.time(sf::st_intersects(gridSF, get(sampNamesSF[i])))['elapsed']
    res1$sf_st_intersection[i] <- system.time(sf::st_intersection(gridSF, get(sampNamesSF[i])))['elapsed']
}
# Visualize results table
knitr::kable(res1, "html", digits = 2) %>% kable_styling(full_width = F)
n raster_intersect rgeos_gIntersects rgeos_gIntersection sf_st_intersects sf_st_intersection
1 0.06 0.05 0.05 0.01 0.01
10 0.03 0.03 0.17 0.00 0.01
50 0.07 0.03 0.67 0.01 0.01
100 0.03 0.04 1.34 0.01 0.03
250 0.04 0.03 3.47 0.02 0.03
500 0.05 0.04 7.49 0.02 0.06
1000 0.06 0.05 14.55 0.01 0.12
10000 0.44 0.31 139.51 0.04 0.85

# Visualize results
cols <- c("#3fb3b2", "#484f42", "#ffdd55", "#c7254e", "#1b95e0")
layout(matrix(c(1,2), ncol =2), widths = c(5,2), heights = 3)
par(mar = c(4,4,1,1), family = "serif", las=1)
#
plot(res1$raster_intersect ~ res1$n, type = 'l', lwd = 2, col = cols[1], ylim = c(0,max(res1[,2:6])), xlab = 'Number of points', ylab = 'Time (s)')
lines(res1$rgeos_gIntersects ~ res1$n, lwd = 2, col = cols[2])
lines(res1$rgeos_gIntersection ~ res1$n, lwd = 2, col = cols[3])
lines(res1$sf_st_intersects ~ res1$n, lwd = 2, col = cols[4])
lines(res1$sf_st_intersection ~ res1$n, lwd = 2, col = cols[5])
# Legend
par(mar = c(0,0,0,0), family = "serif", las = 1)
eplot()
legend(x = 'center', legend = as.character(colnames(res1)[-1]), bty = 'n', lty = 1, col = cols, seg.len = 2, cex = 0.8, title = expression(bold('Fonctions')))


# Visualize results without gIntersection
layout(matrix(c(1,2), ncol =2), widths = c(5,2), heights = 3)
par(mar = c(4,4,1,1), family = "serif", las = 1)
#
plot(res1$raster_intersect ~ res1$n, type = 'l', lwd = 2, col = cols[1], ylim = c(0,max(res1[,c(2,3,5,6)])), xlab = 'Number of points', ylab = 'Time (s)')
lines(res1$rgeos_gIntersects ~ res1$n, lwd = 2, col = cols[2])
lines(res1$sf_st_intersects ~ res1$n, lwd = 2, col = cols[4])
lines(res1$sf_st_intersection ~ res1$n, lwd = 2, col = cols[5])
# Legend
par(mar = c(0,0,0,0), family = "serif")
eplot()
legend(x = 'center', legend = as.character(colnames(res1)[-c(1,4)]), bty = 'n', lty = 1, col = cols[-3], seg.len = 2, cex = 0.8, title = expression(bold('Fonctions')))


In this analysis rgeos::gIntersection is clearly much less efficient than the alternative options. Using raster::intersect, rgeos::gIntersects, sf::st_intersects or sf::st_intersection significantly decreases calculation time, with sf::st_intersects proving to be the most efficient option.


Polygons intersections

Now let’s take a look at intersects using polygons only.


res2 <- data.frame(n = samp,
                   raster_intersect = numeric(nSamp),
                   rgeos_gIntersects = numeric(nSamp),
                   rgeos_gIntersection = numeric(nSamp),
                   sf_st_intersects = numeric(nSamp),
                   sf_st_intersection = numeric(nSamp))

for(i in 1:nSamp) {
    res2$raster_intersect[i] <- system.time(raster::intersect(gridSP, get(sampNamesSPPoly[i])))['elapsed']
    res2$rgeos_gIntersects[i] <- system.time(rgeos::gIntersects(gridSP, get(sampNamesSP[i]), byid = T))['elapsed']
    res2$rgeos_gIntersection[i] <- system.time(rgeos::gIntersection(gridSP, get(sampNamesSPPoly[i]), byid = T))['elapsed']
    res2$sf_st_intersects[i] <- system.time(sf::st_intersects(gridSF, get(sampNamesSFPoly[i])))['elapsed']
    res2$sf_st_intersection[i] <- system.time(sf::st_intersection(gridSF, get(sampNamesSFPoly[i])))['elapsed']
}
# Visualize results table
knitr::kable(res2, "html", digits = 2) %>% kable_styling(full_width = F)
n raster_intersect rgeos_gIntersects rgeos_gIntersection sf_st_intersects sf_st_intersection
1 0.06 0.04 0.05 0.01 0.01
10 0.08 0.03 0.28 0.01 0.02
50 0.60 0.03 1.02 0.01 0.05
100 1.78 0.03 2.05 0.01 0.10
250 5.11 0.03 5.02 0.03 0.32
500 10.41 0.03 10.21 0.03 0.47
1000 20.33 0.04 21.51 0.05 0.99
10000 211.44 0.26 212.33 0.47 9.97

# Visualize results
layout(matrix(c(1,2), ncol =2), widths = c(5,2), heights = 3)
par(mar = c(4,4,1,1), las =1, family = "serif")
##
plot(res2$raster_intersect ~ res2$n, type = 'l', lwd = 2, col = cols[1], ylim = c(0,max(res2[,2:6])), xlab = 'Number of polygons', ylab = 'Time (s)')
lines(res2$rgeos_gIntersects ~ res1$n, lwd = 2, col = cols[2])
lines(res2$rgeos_gIntersection ~ res2$n, lwd = 2, col = cols[3])
lines(res2$sf_st_intersects ~ res2$n, lwd = 2, col = cols[4])
lines(res2$sf_st_intersection ~ res2$n, lwd = 2, col = cols[5])
# Legend
par(mar = c(0,0,0,0), family = "serif")
eplot()
legend(x = 'center', legend = as.character(colnames(res2)[-1]), bty = 'n', lty = 1, col = cols, seg.len = 2, cex = 0.8, title = expression(bold('Fonctions')))


# Visualize results without gIntersection
layout(matrix(c(1,2), ncol =2), widths = c(5,2), heights = 3)
par(mar = c(4,4,1,1), family = "serif", las = 1)
#
plot(res2$rgeos_gIntersects ~ res2$n, type = 'l', lwd = 2, col = cols[2], ylim = c(0,max(res2[,c(3,5,6)])), xlab = 'Number of points', ylab = 'Time (s)')
lines(res2$sf_st_intersects ~ res2$n, lwd = 2, col = cols[4])
lines(res2$sf_st_intersection ~ res2$n, lwd = 2, col = cols[5])
# Legend
par(mar = c(0,0,0,0), family = "serif")
eplot()
legend(x = 'center', legend = as.character(colnames(res1)[-c(1,2,4)]), bty = 'n', lty = 1, col = cols[-c(1,3)], seg.len = 2, cex = 0.8, title = expression(bold('Fonctions')))


We see here that rgeos::gIntersects, sf::st_intersects, sf::st_intersection are far more efficient when dealing with polygons only intersects, with rgeos::gIntersects the most efficient option. raster::intersects loses its previous efficiency, while the efficiency of rgeos::gIntersection decreases even further.


Concluding remarks

Et voilà! It is obvious from these simulations that the sf package overall provides the most efficient options to perform spatial intersects in R. rgeos is also very efficient when it comes to binary intersects, especially with polygons on polygons intersects where it edges st_intersects by decreasing calculation time in half.

Our recommendation: use sf::st_intersects for binary intersects and use sf::st_intersection for zonal intersects. However, be aware that the sf package evolves very rapidly and functions are likely to be modified, although one would hope that efficiency decrease will not be the price of further development.

If you wish to stick with the older packages, then binary intersects could be done quite efficiently, but if you need zonal intersects, we recommend that you start considering changing your ways!


Fish and tips 002 {R}

November 12, 2017

R tips

Gantt charts in R

September 20, 2017

R time management gantt

Animations in R: Time series of erythemal irradiance in the St. Lawrence

July 5, 2017

R plot animation NASA radiation St. Lawrence