We are increasingly performing spatial analyses in R. The replicability and the efficiency of programming languages is much more appealing than using user friendly softwares like ArcGIS, even though you can still code your way through analyses when using those softwares (latter versions of QGIS do a fantastic job in that regard!). The performance of tools available for spatial analyses in R is however not completely certain.
In this post, we compare four different methods to perform spatial intersects between objects in R, from three different packages:
raster::intersect
rgeos::gIntersects
rgeos::gIntersection
sf::st_intersects
sf::st_intersection
More specifically, we test how these methods fare when performing binary (TRUE/FALSE) and zonal or aerial intersects. Keep in mind, not all methods can be used for both binary and zonal intersects:
Function  Binary  Zonal 

raster::intersect

X  X 
rgeos::gIntersects

X  
rgeos::gIntersection

X  X 
sf::st_intersects

X  
sf::st_intersection

X  X 
Obviously, if you mean to perform binary intersects only, the binary functions make more sense as they are built to include less calculations. We nonetheless compare all the functions together for the sake of comparison in this post.
# R version
sessionInfo()[[1L]]$version.string
#R> [1] "R version 3.5.1 (20181212)"
#
rm(list=ls())
library(sf)
library(rgdal)
library(sp)
library(raster)
library(rgeos)
library(magrittr)
library(knitr)
library(kableExtra)
##
opts_chunk$set(fig.align='center')
##
# Empty plot function
eplot < function(x = 1, y = 1, xmin = 0, xmax = 1, ymin = 0, ymax = 1) {
plot(x = x,y = y,bty = "n",ann = FALSE,xaxt = "n",yaxt = "n",type = "n",bg = 'grey',ylim = c(ymin,ymax),xlim = c(xmin,xmax))
}
Version of packages that have been used to build this post and are herein discussed:
packageVersion("raster")
packageVersion("rgeos")
packageVersion("sf")
#R> [1] '2.8.4'
#R> [1] '0.4.2'
#R> [1] '0.7.1'
We start be generating random spatial object in space. For the record, the area selected is within the St. Lawrence estuary in eastern Canada (see online ecology series), although the actual location really does not matter for this post!
We use a regular grid to intersect vectorized data, i.e. points and polygons for this post. This simulate the use of a grid used to extract environmental data (biotic and/or abiotic) from multiple sources to characterize a study area.
# Projection in lambers
projSpat < "+proj=lcc +lat_1=46 +lat_2=60 +lat_0=44 +lon_0=68.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
# Bounding box
latmax < 625000
latmin < 585000
lonmax < 150000
lonmin < 100000
# Create a spatial bounding box for the area of interest using the 'sp' package:
bb < cbind(c(lonmin,lonmax,lonmax,lonmin,lonmin),
c(latmin,latmin,latmax,latmax,latmin)) %>%
Polygon() %>% list() %>%
Polygons(ID = 'ID1') %>% list() %>%
SpatialPolygons(proj4string = CRS(projSpat))
# Create spatial grid that will be used for intersects
A < 5*1000000 # Surface of cells in m^2 (area in km^2 * 1000000)
cellsize < sqrt(2*A)/3^(1/4) # value of the small diagonal
grid < sp::spsample(bb,type="hexagonal",cellsize=cellsize, offset=c(0,0)) %>% # Points for a hexagonal grid
sp::HexPoints2SpatialPolygons() # creating polygons
# We now have a grid of polygons to work with!
plot(grid)
lines(bb, col = 'blue', lwd = 2)
# We need this grid in `sp` and `sf` usable formats
gridSP < grid
gridSF < sf::st_as_sf(grid)
class(gridSP)
#R> [1] "SpatialPolygons"
#R> attr(,"package")
#R> [1] "sp"
class(gridSF)
#R> [1] "sf" "data.frame"
Now we generate random points within the bounding box to test the intersects.
This is done for 1, 10, 50, 100, 250, 500, 1000, 10000
points. Then, to get
all data required to perform the tests, we also need to create polygons from
the point data.
# Number of samples
samp < c(1, 10, 50, 100, 250, 500, 1000, 10000)
nSamp < length(samp)
# Names of points samples
sampNames < paste0('Pt',samp)
sampNames
# Random points for all samples
for(i in 1:nSamp) {
assign(x = sampNames[i],
value = data.frame(lon = runif(n = samp[i], min = lonmin, max = lonmax),
lat = runif(n = samp[i], min = latmin, max = latmax)))
}
# We now have nSamp new objects with randomly generated coordinates
ls()
# Let's now create spatial objects with those coordinates for use with the `sp` package
sampNamesSP < paste0(sampNames, 'SP')
for(i in 1:nSamp) {
assign(
x = sampNamesSP[i],
value = SpatialPoints(coords = get(sampNames[i]), proj4string=CRS(projSpat))
)
}
# Visualize them
par(mfrow = c(3,3))
for(i in 1:nSamp) {
par(mar = c(0,0,0,0))
plot(bb, lwd = 2)
points(get(sampNamesSP[i]), cex = 0.75, col = 'transparent', bg = '#1e6b7955', pch = 21)
}
# Now we generate polygons from the point data using the `rgeos::gBuffer`,
# see `sf::st_buffer` is the equivalent for the `sf` package
sampNamesPoly < paste0('Poly',samp)
sampNamesSPPoly < paste0(sampNamesPoly, 'SP')
for(i in 1:nSamp) {
assign(x = sampNamesSPPoly[i],
value = rgeos::gBuffer(get(sampNamesSP[i]), byid = T, width = 2000))
}
# Visualize them
par(mfrow = c(3,3))
for(i in 1:nSamp) {
par(mar = c(0,0,0,0))
plot(bb, lwd = 2)
plot(get(sampNamesSPPoly[i]), border = 'transparent', col = '#1e6b7955', add = T)
}
# Transform points and polygons for use with the `sf` package
sampNamesSF < paste0(sampNames, 'SF') # for points
sampNamesSFPoly < paste0(sampNamesPoly, 'SF') # for polygons
for(i in 1:nSamp) {
assign(x = sampNamesSF[i],
sf::st_as_sf(get(sampNamesSP[i]))) # points
assign(x = sampNamesSFPoly[i],
sf::st_as_sf(get(sampNamesSPPoly[i]))) # polygons
}
Now that the data is ready for use, we can perform the tests! But first, let’s take a quick look at the type of results that are returned by each function.
First, we will test the intersects only with the points.
res1 < data.frame(n = samp,
raster_intersect = numeric(nSamp),
rgeos_gIntersects = numeric(nSamp),
rgeos_gIntersection = numeric(nSamp),
sf_st_intersects = numeric(nSamp),
sf_st_intersection = numeric(nSamp))
for(i in 1:nSamp) {
res1$raster_intersect[i] < system.time(raster::intersect(gridSP, get(sampNamesSP[i])))['elapsed']
res1$rgeos_gIntersects[i] < system.time(rgeos::gIntersects(gridSP, get(sampNamesSP[i]), byid = T))['elapsed']
res1$rgeos_gIntersection[i] < system.time(rgeos::gIntersection(gridSP, get(sampNamesSP[i]), byid = T))['elapsed']
res1$sf_st_intersects[i] < system.time(sf::st_intersects(gridSF, get(sampNamesSF[i])))['elapsed']
res1$sf_st_intersection[i] < system.time(sf::st_intersection(gridSF, get(sampNamesSF[i])))['elapsed']
}
# Visualize results table
knitr::kable(res1, "html", digits = 2) %>% kable_styling(full_width = F)
n  raster_intersect  rgeos_gIntersects  rgeos_gIntersection  sf_st_intersects  sf_st_intersection 

1  0.06  0.05  0.05  0.01  0.01 
10  0.03  0.03  0.17  0.00  0.01 
50  0.07  0.03  0.67  0.01  0.01 
100  0.03  0.04  1.34  0.01  0.03 
250  0.04  0.03  3.47  0.02  0.03 
500  0.05  0.04  7.49  0.02  0.06 
1000  0.06  0.05  14.55  0.01  0.12 
10000  0.44  0.31  139.51  0.04  0.85 
# Visualize results
cols < c("#3fb3b2", "#484f42", "#ffdd55", "#c7254e", "#1b95e0")
layout(matrix(c(1,2), ncol =2), widths = c(5,2), heights = 3)
par(mar = c(4,4,1,1), family = "serif", las=1)
#
plot(res1$raster_intersect ~ res1$n, type = 'l', lwd = 2, col = cols[1], ylim = c(0,max(res1[,2:6])), xlab = 'Number of points', ylab = 'Time (s)')
lines(res1$rgeos_gIntersects ~ res1$n, lwd = 2, col = cols[2])
lines(res1$rgeos_gIntersection ~ res1$n, lwd = 2, col = cols[3])
lines(res1$sf_st_intersects ~ res1$n, lwd = 2, col = cols[4])
lines(res1$sf_st_intersection ~ res1$n, lwd = 2, col = cols[5])
# Legend
par(mar = c(0,0,0,0), family = "serif", las = 1)
eplot()
legend(x = 'center', legend = as.character(colnames(res1)[1]), bty = 'n', lty = 1, col = cols, seg.len = 2, cex = 0.8, title = expression(bold('Fonctions')))
# Visualize results without gIntersection
layout(matrix(c(1,2), ncol =2), widths = c(5,2), heights = 3)
par(mar = c(4,4,1,1), family = "serif", las = 1)
#
plot(res1$raster_intersect ~ res1$n, type = 'l', lwd = 2, col = cols[1], ylim = c(0,max(res1[,c(2,3,5,6)])), xlab = 'Number of points', ylab = 'Time (s)')
lines(res1$rgeos_gIntersects ~ res1$n, lwd = 2, col = cols[2])
lines(res1$sf_st_intersects ~ res1$n, lwd = 2, col = cols[4])
lines(res1$sf_st_intersection ~ res1$n, lwd = 2, col = cols[5])
# Legend
par(mar = c(0,0,0,0), family = "serif")
eplot()
legend(x = 'center', legend = as.character(colnames(res1)[c(1,4)]), bty = 'n', lty = 1, col = cols[3], seg.len = 2, cex = 0.8, title = expression(bold('Fonctions')))
In this analysis rgeos::gIntersection
is clearly much less efficient than the
alternative options. Using raster::intersect
, rgeos::gIntersects
,
sf::st_intersects
or sf::st_intersection
significantly decreases calculation
time, with sf::st_intersects
proving to be the most efficient option.
Now let’s take a look at intersects using polygons only.
res2 < data.frame(n = samp,
raster_intersect = numeric(nSamp),
rgeos_gIntersects = numeric(nSamp),
rgeos_gIntersection = numeric(nSamp),
sf_st_intersects = numeric(nSamp),
sf_st_intersection = numeric(nSamp))
for(i in 1:nSamp) {
res2$raster_intersect[i] < system.time(raster::intersect(gridSP, get(sampNamesSPPoly[i])))['elapsed']
res2$rgeos_gIntersects[i] < system.time(rgeos::gIntersects(gridSP, get(sampNamesSP[i]), byid = T))['elapsed']
res2$rgeos_gIntersection[i] < system.time(rgeos::gIntersection(gridSP, get(sampNamesSPPoly[i]), byid = T))['elapsed']
res2$sf_st_intersects[i] < system.time(sf::st_intersects(gridSF, get(sampNamesSFPoly[i])))['elapsed']
res2$sf_st_intersection[i] < system.time(sf::st_intersection(gridSF, get(sampNamesSFPoly[i])))['elapsed']
}
# Visualize results table
knitr::kable(res2, "html", digits = 2) %>% kable_styling(full_width = F)
n  raster_intersect  rgeos_gIntersects  rgeos_gIntersection  sf_st_intersects  sf_st_intersection 

1  0.06  0.04  0.05  0.01  0.01 
10  0.08  0.03  0.28  0.01  0.02 
50  0.60  0.03  1.02  0.01  0.05 
100  1.78  0.03  2.05  0.01  0.10 
250  5.11  0.03  5.02  0.03  0.32 
500  10.41  0.03  10.21  0.03  0.47 
1000  20.33  0.04  21.51  0.05  0.99 
10000  211.44  0.26  212.33  0.47  9.97 
# Visualize results
layout(matrix(c(1,2), ncol =2), widths = c(5,2), heights = 3)
par(mar = c(4,4,1,1), las =1, family = "serif")
##
plot(res2$raster_intersect ~ res2$n, type = 'l', lwd = 2, col = cols[1], ylim = c(0,max(res2[,2:6])), xlab = 'Number of polygons', ylab = 'Time (s)')
lines(res2$rgeos_gIntersects ~ res1$n, lwd = 2, col = cols[2])
lines(res2$rgeos_gIntersection ~ res2$n, lwd = 2, col = cols[3])
lines(res2$sf_st_intersects ~ res2$n, lwd = 2, col = cols[4])
lines(res2$sf_st_intersection ~ res2$n, lwd = 2, col = cols[5])
# Legend
par(mar = c(0,0,0,0), family = "serif")
eplot()
legend(x = 'center', legend = as.character(colnames(res2)[1]), bty = 'n', lty = 1, col = cols, seg.len = 2, cex = 0.8, title = expression(bold('Fonctions')))
# Visualize results without gIntersection
layout(matrix(c(1,2), ncol =2), widths = c(5,2), heights = 3)
par(mar = c(4,4,1,1), family = "serif", las = 1)
#
plot(res2$rgeos_gIntersects ~ res2$n, type = 'l', lwd = 2, col = cols[2], ylim = c(0,max(res2[,c(3,5,6)])), xlab = 'Number of points', ylab = 'Time (s)')
lines(res2$sf_st_intersects ~ res2$n, lwd = 2, col = cols[4])
lines(res2$sf_st_intersection ~ res2$n, lwd = 2, col = cols[5])
# Legend
par(mar = c(0,0,0,0), family = "serif")
eplot()
legend(x = 'center', legend = as.character(colnames(res1)[c(1,2,4)]), bty = 'n', lty = 1, col = cols[c(1,3)], seg.len = 2, cex = 0.8, title = expression(bold('Fonctions')))
We see here that rgeos::gIntersects
, sf::st_intersects
, sf::st_intersection
are far more efficient when dealing with polygons only intersects, with
rgeos::gIntersects
the most efficient option. raster::intersects
loses its
previous efficiency, while the efficiency of rgeos::gIntersection
decreases
even further.
Et voilà! It is obvious from these simulations that the sf
package overall
provides the most efficient options to perform spatial intersects in R.
rgeos
is also very efficient when it comes to binary intersects, especially
with polygons on polygons intersects where it edges st_intersects
by
decreasing calculation time in half.
Our recommendation: use sf::st_intersects
for binary intersects and
use sf::st_intersection
for zonal intersects. However, be aware that the sf
package evolves very rapidly and functions are likely to be modified,
although one would hope that efficiency decrease will not be the price of
further development.
If you wish to stick with the older packages, then binary intersects could be done quite efficiently, but if you need zonal intersects, we recommend that you start considering changing your ways!