R in Space - Attribute manipulations

April 9, 2018


David Beauchesne   Marie-Hélène Brice   Nicolas Casajus   Kevin Cazelles   Elliot Dreujou   Steve Vissault  


Spatial objects attributes manipulations

Now that we know how to import and transform different classes of spatial objects in R, we can start manipulating their attributes. In this post, we give a brief overview of some useful basic manipulations that can be performed on spatial object attributes. These examples are by no means exhaustive, but they cover manipulations that are commonly applied to spatial objects.


Vector objects

For this part, we discuss how to manipulate attributes of objects from the sf package. sf objects have the advantage of being structured like data frames, making their manipulations more intuitive than for objects of class sp. However, if the structure of sp objects is well understood, then the same principles will mostly apply.

Let’s begin by creating an sf points object, as seen in the post on Spatial objects in R:

mydata <- data.frame(
  id = 1:20,
  long = -82+2*runif(20),
  lat = 42+2*runif(20),
  var1 = rnorm(20),
  var2 = 10*runif(20)
)

library(sf)
## Linking to GEOS 3.5.0, GDAL 2.2.2, proj.4 4.8.0
spatData <- st_as_sf(mydata,
                     coords = c("long", "lat"),
                     crs = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0")

knitr::kable(head(spatData))
id var1 var2 geometry
1 0.9270980 0.6714380 c(-80.3088961294852, 43.8184660021216)
2 -1.1313136 1.5877346 c(-81.2571280682459, 42.3603247725405)
3 1.0497236 0.5693127 c(-81.119451764971, 42.3756320239045)
4 1.2614181 0.5446427 c(-81.9388353168033, 42.9843103890307)
5 -0.6287195 0.4639034 c(-81.4983005775139, 42.3819039552473)
6 0.3794677 1.9562217 c(-81.9701790879481, 43.1425468740053)
plot(spatData)
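Since the rest of this post relies on sf objects behaving like data frames, it can help to inspect that structure first. The sketch below rebuilds a small object and looks at its attribute table and geometry separately. The name `pts`, the seed and the use of `crs = 4326` (the EPSG code for WGS84 longitude/latitude) are assumptions made for illustration, not part of the example above.

```r
library(sf)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
pts <- st_as_sf(
  data.frame(id = 1:5,
             long = -82 + 2 * runif(5),
             lat = 42 + 2 * runif(5),
             var1 = rnorm(5)),
  coords = c("long", "lat"),
  crs = 4326
)

class(pts)                      # includes both "sf" and "data.frame"
names(pts)                      # attribute columns plus the geometry column
attrs <- st_drop_geometry(pts)  # attribute table as a plain data.frame
geom <- st_geometry(pts)        # geometry only (an sfc object)
```

Note that the coordinate columns are consumed by `st_as_sf()` and no longer appear among the attributes.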

Adding and removing attributes

The object we currently have has two variables, var1 and var2, in addition to its id column. Additional attributes can quickly be added to the attributes table of our object.

spatData$var3 <- runif(20)
spatData$var4 <- spatData$var1 * spatData$var2
knitr::kable(head(spatData))
id var1 var2 geometry var3 var4
1 0.9270980 0.6714380 c(-80.3088961294852, 43.8184660021216) 0.7619921 0.6224888
2 -1.1313136 1.5877346 c(-81.2571280682459, 42.3603247725405) 0.5036308 -1.7962258
3 1.0497236 0.5693127 c(-81.119451764971, 42.3756320239045) 0.4632262 0.5976210
4 1.2614181 0.5446427 c(-81.9388353168033, 42.9843103890307) 0.1061850 0.6870222
5 -0.6287195 0.4639034 c(-81.4983005775139, 42.3819039552473) 0.4507760 -0.2916651
6 0.3794677 1.9562217 c(-81.9701790879481, 43.1425468740053) 0.7198817 0.7423230
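New attributes can also be derived from the geometry itself. As a minimal sketch, assuming a small points object named `pts` (a hypothetical name, built here with an arbitrary seed), the coordinates can be copied back into the attributes table with `st_coordinates()`:

```r
library(sf)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
pts <- st_as_sf(
  data.frame(id = 1:5,
             long = -82 + 2 * runif(5),
             lat = 42 + 2 * runif(5)),
  coords = c("long", "lat"),
  crs = 4326
)

# For point geometries, st_coordinates() returns a matrix with columns "X" and "Y"
xy <- st_coordinates(pts)
pts$long <- xy[, "X"]
pts$lat <- xy[, "Y"]
names(pts)
```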


Similarly, unwanted columns can be removed.

spatData$var3 <- spatData$var4 <- NULL
knitr::kable(head(spatData))
id var1 var2 geometry
1 0.9270980 0.6714380 c(-80.3088961294852, 43.8184660021216)
2 -1.1313136 1.5877346 c(-81.2571280682459, 42.3603247725405)
3 1.0497236 0.5693127 c(-81.119451764971, 42.3756320239045)
4 1.2614181 0.5446427 c(-81.9388353168033, 42.9843103890307)
5 -0.6287195 0.4639034 c(-81.4983005775139, 42.3819039552473)
6 0.3794677 1.9562217 c(-81.9701790879481, 43.1425468740053)


However, if you have a very big dataset, you may want to remove columns without having to write all column names manually! You could do this based on the names of the columns you wish to remove or, alternatively, on the names of the attributes you wish to keep.

# Add 5 extra attributes, then remove them by name
for(i in 1:5) spatData <- cbind(spatData, varSup = runif(20))
knitr::kable(head(spatData))
id var1 var2 varSup varSup.1 varSup.2 varSup.3 varSup.4 geometry
1 0.9270980 0.6714380 0.9345434 0.3339886 0.5532299 0.9324183 0.2317484 c(-80.3088961294852, 43.8184660021216)
2 -1.1313136 1.5877346 0.9654768 0.8124204 0.6443058 0.7945957 0.7358402 c(-81.2571280682459, 42.3603247725405)
3 1.0497236 0.5693127 0.0210233 0.3770851 0.7555289 0.5885625 0.4625502 c(-81.119451764971, 42.3756320239045)
4 1.2614181 0.5446427 0.8962263 0.6051263 0.5127991 0.1748355 0.0850028 c(-81.9388353168033, 42.9843103890307)
5 -0.6287195 0.4639034 0.3161982 0.7398729 0.3810332 0.7759428 0.3961464 c(-81.4983005775139, 42.3819039552473)
6 0.3794677 1.9562217 0.8236935 0.9354838 0.1034101 0.1026595 0.9305186 c(-81.9701790879481, 43.1425468740053)
rem <- colnames(spatData)[4:8]
spatData <- spatData[, !colnames(spatData) %in% rem]
knitr::kable(head(spatData))
id var1 var2 geometry
1 0.9270980 0.6714380 c(-80.3088961294852, 43.8184660021216)
2 -1.1313136 1.5877346 c(-81.2571280682459, 42.3603247725405)
3 1.0497236 0.5693127 c(-81.119451764971, 42.3756320239045)
4 1.2614181 0.5446427 c(-81.9388353168033, 42.9843103890307)
5 -0.6287195 0.4639034 c(-81.4983005775139, 42.3819039552473)
6 0.3794677 1.9562217 c(-81.9701790879481, 43.1425468740053)
# Keep id, var1 and var2
for(i in 1:5) spatData <- cbind(spatData, varSup = runif(20))
knitr::kable(head(spatData))
id var1 var2 varSup varSup.1 varSup.2 varSup.3 varSup.4 geometry
1 0.9270980 0.6714380 0.7956143 0.4239242 0.3585274 0.1722828 0.2390317 c(-80.3088961294852, 43.8184660021216)
2 -1.1313136 1.5877346 0.9159461 0.2287298 0.5725111 0.6742238 0.1165937 c(-81.2571280682459, 42.3603247725405)
3 1.0497236 0.5693127 0.5001332 0.7603395 0.1397776 0.8450948 0.2562135 c(-81.119451764971, 42.3756320239045)
4 1.2614181 0.5446427 0.3779519 0.4266593 0.7271035 0.7542253 0.2042067 c(-81.9388353168033, 42.9843103890307)
5 -0.6287195 0.4639034 0.4032054 0.5502505 0.6024036 0.5657860 0.6090533 c(-81.4983005775139, 42.3819039552473)
6 0.3794677 1.9562217 0.0644541 0.3666157 0.5652170 0.1126947 0.0203621 c(-81.9701790879481, 43.1425468740053)
keep <- c('id','var1','var2')
spatData <- spatData[, keep]
knitr::kable(head(spatData))
id var1 var2 geometry
1 0.9270980 0.6714380 c(-80.3088961294852, 43.8184660021216)
2 -1.1313136 1.5877346 c(-81.2571280682459, 42.3603247725405)
3 1.0497236 0.5693127 c(-81.119451764971, 42.3756320239045)
4 1.2614181 0.5446427 c(-81.9388353168033, 42.9843103890307)
5 -0.6287195 0.4639034 c(-81.4983005775139, 42.3819039552473)
6 0.3794677 1.9562217 c(-81.9701790879481, 43.1425468740053)
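Two details of the examples above are worth making explicit: columns can be selected by a pattern in their names rather than by position, and the geometry column stays attached even when it is not requested. The following is only a sketch on a small hypothetical object `pts`; the `"^varSup"` pattern and the helper name `isSup` are assumptions chosen to mirror the column names used above.

```r
library(sf)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
pts <- st_as_sf(
  data.frame(id = 1:5,
             long = -82 + 2 * runif(5),
             lat = 42 + 2 * runif(5),
             var1 = rnorm(5),
             varSup = runif(5),
             varSup.1 = runif(5)),
  coords = c("long", "lat"),
  crs = 4326
)

# Drop every column whose name starts with "varSup"
isSup <- grepl("^varSup", names(pts))
pts <- pts[, !isSup]
names(pts)

# The geometry column is "sticky": it is kept even though only var1 is requested
names(pts[, "var1"])
```

If a plain table without geometry is needed, `st_drop_geometry()` returns the attributes as an ordinary data frame.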


Subsets

You may also wish to subset your object based on certain attribute values. We will begin by adding some categorical attributes to our spatial object to discuss this in more detail.

spatData$fact1 <- paste0('a', 1:2) # Categorical attribute with 2 distinct values (recycled over the 20 rows)
spatData$fact2 <- paste0('b', 1:5) # Categorical attribute with 5 distinct values (recycled over the 20 rows)
knitr::kable(head(spatData))
id var1 var2 geometry fact1 fact2
1 0.9270980 0.6714380 c(-80.3088961294852, 43.8184660021216) a1 b1
2 -1.1313136 1.5877346 c(-81.2571280682459, 42.3603247725405) a2 b2
3 1.0497236 0.5693127 c(-81.119451764971, 42.3756320239045) a1 b3
4 1.2614181 0.5446427 c(-81.9388353168033, 42.9843103890307) a2 b4
5 -0.6287195 0.4639034 c(-81.4983005775139, 42.3819039552473) a1 b5
6 0.3794677 1.9562217 c(-81.9701790879481, 43.1425468740053) a2 b1


The simplest way to subset an attributes table is to manually select the rows that we wish to view. In this instance, let’s say we only wish to use the first 10 rows of our data.

selectID <- 1:10
plot(spatData$geometry, col = '#00000055', pch = 20, cex = 1.25, main = '')
plot(spatData$geometry[selectID], col = '#000000', add = TRUE, pch = 1, cex = 2, lwd = 2)


However, creating subsets based on certain criteria, e.g. all values greater than or equal to 0, is often much more efficient than selecting rows by hand. These criteria are conditional statements, and there is a vast body of material available discussing them, so we will only present a few examples and invite you to consult other resources like StackOverflow for more specific questions.

# Select all values for var1 greater than or equal to 0
selectID <- spatData$var1 >= 0
plot(spatData$geometry, col = '#00000055', pch = 20, cex = 1.25, main = '')
plot(spatData$geometry[selectID], col = '#000000', add = TRUE, pch = 1, cex = 2, lwd = 2)

# var1 smaller than 0 and var2 smaller than 5
selectID <- spatData$var1 < 0 & spatData$var2 < 5
plot(spatData$geometry, col = '#00000055', pch = 20, cex = 1.25, main = '')
plot(spatData$geometry[selectID], col = '#000000', add = TRUE, pch = 1, cex = 2, lwd = 2)

# fact1 equal to a1
selectID <- spatData$fact1 == 'a1'
plot(spatData$geometry, col = '#00000055', pch = 20, cex = 1.25, main = '')
plot(spatData$geometry[selectID], col = '#000000', add = TRUE, pch = 1, cex = 2, lwd = 2)

# fact1 equal to a1 or var1 greater than 0
selectID <- spatData$fact1 == 'a1' | spatData$var1 > 0
plot(spatData$geometry, col = '#00000055', pch = 20, cex = 1.25, main = '')
plot(spatData$geometry[selectID], col = '#000000', add = TRUE, pch = 1, cex = 2, lwd = 2)

# fact2 equal to b3 or b4
selectID <- spatData$fact2 %in% c('b3','b4')
plot(spatData$geometry, col = '#00000055', pch = 20, cex = 1.25, main = '')
plot(spatData$geometry[selectID], col = '#000000', add = TRUE, pch = 1, cex = 2, lwd = 2)
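The examples above only subset the geometry column for plotting. The same logical vector can also be used to subset the whole object, rows and attributes together, exactly as with a data frame. Here is a minimal sketch, assuming a small hypothetical object `pts` built with an arbitrary seed:

```r
library(sf)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
pts <- st_as_sf(
  data.frame(id = 1:10,
             long = -82 + 2 * runif(10),
             lat = 42 + 2 * runif(10),
             var1 = rnorm(10),
             fact2 = paste0("b", 1:5)),
  coords = c("long", "lat"),
  crs = 4326
)

# Combine two conditions into a single logical vector
selectID <- pts$var1 >= 0 & pts$fact2 %in% c("b3", "b4")

# Row subsetting keeps the attributes and the matching geometries together
sub <- pts[selectID, ]
nrow(sub)
```

The result `sub` is still an sf object, so it can be plotted, joined or aggregated like the original.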


Join

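Joining tables based on their shared id is another common manipulation. This can be quickly accomplished using the left_join function from the dplyr package, which keeps every row of the spatial object and fills the new columns with NA wherever the second table has no matching id.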

# Create data.frame with id field similar to that in the spatial object
joinData <- data.frame(id = seq(1, 20, by = 2),
                       var3 = rnorm(10))

# Join with attributes table of spatial object
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
spatData <- left_join(spatData, joinData)
## Joining, by = "id"
knitr::kable(head(spatData))
id var1 var2 fact1 fact2 var3 geometry
1 0.9270980 0.6714380 a1 b1 0.5323896 c(-80.3088961294852, 43.8184660021216)
2 -1.1313136 1.5877346 a2 b2 NA c(-81.2571280682459, 42.3603247725405)
3 1.0497236 0.5693127 a1 b3 -1.5397219 c(-81.119451764971, 42.3756320239045)
4 1.2614181 0.5446427 a2 b4 NA c(-81.9388353168033, 42.9843103890307)
5 -0.6287195 0.4639034 a1 b5 0.0717289 c(-81.4983005775139, 42.3819039552473)
6 0.3794677 1.9562217 a2 b1 NA c(-81.9701790879481, 43.1425468740053)
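When joining, it is usually safer to name the key column explicitly and to check how many rows were left unmatched. The sketch below assumes a hypothetical points object `pts` and reuses the structure of `joinData` from above; the seed and object names are illustrative only.

```r
library(sf)
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
pts <- st_as_sf(
  data.frame(id = 1:20,
             long = -82 + 2 * runif(20),
             lat = 42 + 2 * runif(20),
             var1 = rnorm(20)),
  coords = c("long", "lat"),
  crs = 4326
)

# Table covering only the odd ids
joinData <- data.frame(id = seq(1, 20, by = 2), var3 = rnorm(10))

# Naming the key avoids relying on dplyr guessing the shared columns
joined <- left_join(pts, joinData, by = "id")

# Rows of pts without a match in joinData (here, the 10 even ids) get NA
sum(is.na(joined$var3))
```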


Aggregate

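Information contained in the attributes table can also be used to aggregate a spatial object. Here we sum var1 within each value of fact1; as the output shows, the geometries of each group are combined into a single feature.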

aggData <- aggregate(spatData['var1'], by = list(spatData$fact1), FUN = sum)
knitr::kable(head(aggData))
Group.1 var1 geometry
a1 2.1976609 c(-81.9729002397507, -81.9549671234563, -81.8189416350797, -81.6736573562957, -81.4983005775139, -81.119451764971, -80.7948883166537, -80.6914501325227, -80.598174369894, -80.3088961294852, 43.5758538492955, 42.2510251891799, 43.8191140932031, 43.5807057088241, 42.3819039552473, 42.3756320239045, 43.3794603967108, 43.4588852208108, 42.0354972006753, 43.8184660021216)
a2 0.7278787 c(-81.9701790879481, -81.9477061666548, -81.9388353168033, -81.7660956108011, -81.5866447654553, -81.2571280682459, -81.1475938907824, -80.817894898355, -80.5272745699622, -80.5205666976981, 43.1425468740053, 42.8101213369519, 42.9843103890307, 43.1564394352026, 42.5906838919036, 42.3603247725405, 42.1014717705548, 43.4995718039572, 43.6571002476849, 42.2294469410554)
plot(aggData, cex = abs(aggData$var1))
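To check what an aggregation did, we can compare the aggregated attribute with a plain `tapply()` on the original values and look at the resulting geometry type. This is only a sketch on a hypothetical object `pts`; naming the grouping list element (`fact1 = ...`) is an optional choice made here so that the output column is not called `Group.1`.

```r
library(sf)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
pts <- st_as_sf(
  data.frame(id = 1:20,
             long = -82 + 2 * runif(20),
             lat = 42 + 2 * runif(20),
             var1 = rnorm(20),
             fact1 = paste0("a", 1:2)),
  coords = c("long", "lat"),
  crs = 4326
)

# Sum var1 per group; one output feature per value of fact1
agg <- aggregate(pts["var1"], by = list(fact1 = pts$fact1), FUN = sum)

# The same sums computed on the attribute alone, for comparison
tapply(pts$var1, pts$fact1, sum)

# Geometry type of the aggregated features
st_geometry_type(agg)
```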

Raster objects

Values associated with raster objects can also be manipulated, although the data attached to a raster are typically less amenable to such changes and to carrying multiple attributes.

Setting values

Subsets