R in Space - About spatial data

April 12, 2018

  R in Space R Spatial

  raster sf sp

David Beauchesne   Marie-Hélène Brice   Nicolas Casajus   Kevin Cazelles   Elliot Dreujou   Steve Vissault  

Spatial data refer to phenomenon or information that can be observed geographically. What a definition! In short spatial data concern every information that could be reported on a map, i.e. on a two-dimensions geographical coordinates system. Two kinds of spatial data exist: vector data and raster data. Generally vectors refer to discrete locations, i.e. objects with boundaries (for instance a city, a road, a country) whereas raster data refer to continuous phenomenon that can be observed everywhere, but without natural boundaries (for instance the surface temperature). Let’s take a look at their characteristics.

Vector data

Vector data consist of the description of a spatially explicit phenomenon geometry (position and coordinates of its boundaries in a specific spatial referential). In addition to this geometry, vector data may contain variables (or attributes) with additional information about the phenomenon at each location. For instance, cities of a country are vector data (points) and attributes could be cities names, population sizes, etc. Borders of a country are also vector data (polygons) and could contain the name of the country, the population for a given year, and the mean number of crimes as attributes.

There are main symbol types for vector data: points, lines and polygons. The geometry of these data structures consists of sets of geographical coordinate pairs (longitude, latitude) in a specific Coordinate Reference System (CRS).

Points data

Points are the simplest vector object. Each point has one two-dimensional coordinates, and several associated variables. For instance, a point might represent a location where an animal was trapped, and the attributes could include the capture date, the size, the sex, and information about the physical environment.

Lines data

Lines are the second type of vector data. A line is a shape consisting of one or several segments (or paths) and each segment has two noteworthy points (or vertex): a starting point and an ending point. Note that the ending point of a segment will have the same coordinates as the starting point of the following segment in the case of a line consisting of connected segments. In other words, a line with one segment is defined by two coordinates pairs (longitude and latitude). A line with two connected segments is defined by three coordinates pairs and a line with two isolated segments is defined by four coordinates pairs. Thus, the representation of a line is very similar to that of a collection of points. The main difference is that the ordering of the points is important, because we need to know which points should be connected two-by-two.

Polygons data

Polygons are closed spatial lines where the coordinates of the starting point of the first segment is identical to the ending point of the last segment. The geometry of a polygon is very similar to that of a line but polygons have three characteristics:

  1. a polygon cannot self-intersect (whereas a line can),

  2. a polygon can contain hole (think about the state of Vatican overlapping the country of Italy but considered as a separated polygon),

  3. as a closed feature, a polygon has an inside and a border.

Vector data are generally stored in ESRI Shapefile, GeoJSON, KML or Simple Features files. In R, two main packages exist to handle spatial vector data: sp and sf. Note that the package rgdal will be used to import/export vector data (go next post to learn more).

Raster data

Raster data are commonly used to represent continuous phenomenon that can be observed everywhere, but without natural boundaries (but with artificial boundaries defined by the user). A raster consists of a grid of equally sized cells (or pixels) that all have a values (or a missing value) for one single variable.

Unlike vector data, the geometry of raster data is not explicitly stored as coordinates. Rather it is implicitly set by setting the spatial extent and the number of rows and columns of a regular grid. From this spatial information, the size of the raster cells (spatial resolution) can be computed. Working with raster data will therefore be more efficient than working with polygons data.

Raster can be used to represent a very broad range of data: continuous (temperature values), discrete (habitat classes) or even binary (species occurrence) variables.

Several file formats exist to store raster data. The most commons are: GeoTIFF, NetCDF, grd and ascii formats. Even the package rgdal can be used to import/export raster data, we will prefer the package raster.

Vector objects in R

In this section we present two packges: sp and sf. The sp package actualy defines classes for both vector and raster objects. Below, we however focus on the vector ones and so we do not detail SpatialGrid and SpatialPixels objects. Also note that sf “[…] aims at succeeding sp in the long term” (Simple Features for R, sf vignette).

Let first create a data frame mydata:

mylon <- -82+2*runif(20)
mylat <- 42+2*runif(20)
mydata <- data.frame(
  lon = mylon,
  lat = mylat,
  var1 = rnorm(20),
  var2 = 10*runif(20)

Let’s have a look at thus data frame:

lon lat var1 var2
-80.57065 43.68849 -0.7315385 7.329688
-80.40040 42.08281 0.0099957 4.949049
-81.54197 42.62232 -0.3790755 3.818837
-80.06123 43.70764 2.1879284 6.555559
-80.13286 42.48617 0.4721329 5.239393
-81.89590 42.50583 -0.4817928 9.654585

Package sp


The table below includes a description of the classes for points, lines and polygons. Basically all these classes work the same way. For instance, in order to define a SpatialPointsDataPoints object, three elements are required: a set of coordinates, a Coordinate Reference System (CRS) and an attribute table. Intermediate class are also defined (for instance points + CRS = SpatialPoints) and the name of the class is also the name of the function to be called.

Classes and functions Contents
Points list of points (set of coordinates)
SpatialPoints list of points + CRS
SpatialPointsDataPoints list of points + CRS + attribute table
Line a line (set of coordinates)
Lines list of lines
SpatialLines list of lines + CRS
SpatialLinesDataFrame list of lines + CRS + attribute table
Polygon a line (set of coordinates)
Polygons list of lines
SpatialPolygons list of lines + CRS
SpatialPolygonsDataFrame list of lines + CRS + attribute table


As an more tangible example, let’s now create a SpatialPointsDataFrame based on our data frame mydata.

mysp <- SpatialPointsDataFrame(
  coords = mydata[, 1:2],
  data = mydata[, 3:4],
  proj4string = CRS(
    "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"

Note that proj4 is used and we therefore wrote a string that describes the CRS and that proj4 understands. Below are listed some properties of the object we have defined.

#R> [1] TRUE
#R> [1] "SpatialPointsDataFrame"
#R> attr(,"package")
#R> [1] "sp"
#R> [1] "data"        "coords.nrs"  "coords"      "bbox"        "proj4string"
#R> [1] 20  2

Basically, it is a S4 object of class SpatialPointsDataFrame. All slot names refer to attribute that are accessible via and @:

#R> CRS arguments:
#R>  +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0
#R>           var1     var2
#R> 1 -0.731538531 7.329688
#R> 2  0.009995721 4.949049
#R> 3 -0.379075541 3.818837
#R> 4  2.187928401 6.555559
#R> 5  0.472132880 5.239393
#R> 6 -0.481792820 9.654586

In order to change projection, the user must call spTransform(), like so:

(mysp2 <- spTransform(mysp, CRS=CRS("+proj=merc +ellps=GRS80")))
#R> class       : SpatialPointsDataFrame 
#R> features    : 20 
#R> extent      : -9122725, -8912376, 5137114, 5426871  (xmin, xmax, ymin, ymax)
#R> coord. ref. : +proj=merc +ellps=GRS80 
#R> variables   : 2
#R> names       :              var1,              var2 
#R> min values  : -1.81620111914297, 0.444704792462289 
#R> max values  :  2.18792840079437,  9.65458554914221

Package sf

Below is a very short overview of classes in sf, the reader that requires further explanation would find more details on the first vignette of sf. Basically three classes are defined: sf, sfc and sfg.

Class sf

pts_sf <- st_as_sf(
  x = mydata,
  coords = c("lon", "lat"),
  crs = 4326)

Let’s examine its class

#R> [1] "sf"         "data.frame"

Class sfc

pts_sfc <- st_geometry(pts_sf)
#R> [1] "sfc_POINT" "sfc"

Class sfg

(x <- st_point(c(1,2)))
#R> [1] "XY"    "POINT" "sfg"

How to import a sp object

st_as_sf() can also be used to convert a sp object into a sf one.

#R> Simple feature collection with 20 features and 2 fields
#R> geometry type:  POINT
#R> dimension:      XY
#R> bbox:           xmin: -81.95083 ymin: 42.03171 xmax: -80.06123 ymax: 43.94239
#R> epsg (SRID):    4326
#R> proj4string:    +proj=longlat +datum=WGS84 +no_defs
#R> First 10 features:
#R>            var1     var2                   geometry
#R> 1  -0.731538531 7.329688 POINT (-80.57065 43.68849)
#R> 2   0.009995721 4.949049  POINT (-80.4004 42.08281)
#R> 3  -0.379075541 3.818837 POINT (-81.54197 42.62232)
#R> 4   2.187928401 6.555559 POINT (-80.06123 43.70764)
#R> 5   0.472132880 5.239393 POINT (-80.13286 42.48617)
#R> 6  -0.481792820 9.654586  POINT (-81.8959 42.50583)
#R> 7  -1.290596994 5.735338 POINT (-80.33147 42.64487)
#R> 8   2.162975539 7.793934 POINT (-80.22192 42.19699)
#R> 9   0.713897043 9.606329 POINT (-81.58998 42.04869)
#R> 10 -0.060456859 5.500665  POINT (-80.1472 43.11689)

Raster objects in R

The R package raster provides three main classes of raster object (more details here):

  1. RasterLayer imports a single-layer (variable) raster,
  2. RasterStack imports in one single object several single-layer (variable) rasters stored in one or different files,
  3. RasterBrick imports in one single object several single-layer (variable) rasters stored in one single file.

Using RasterStack and RasterBrick requires that the geometry of all raster data is equal.

Package raster define three classes of rater object we detail below.


val1 <- matrix(runif(100*100,0,10), ncol = 100, nrow = 100)
ras1 <- raster(
    crs = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"),
    xmn = -82, xmx = +80, ymn = +42, ymx = +44
#R> [1] "RasterLayer"
#R> attr(,"package")
#R> [1] "raster"
#R> [1] 100 100   1
#R> [1] 5.994056 7.130506 9.532949 0.224109 6.618390 1.629529
#R> [1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

RasterStack and RasterBrick


Let’s first create another raster (with the same CRS)

val2 <- matrix(rnorm(100*100), ncol = 100, nrow = 100)
ras2 <- raster(
    crs = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"),
    xmn = -82, xmx = +80, ymn = +42, ymx = +44
#R> [1] "RasterLayer"
#R> attr(,"package")
#R> [1] "raster"

Let’s stack() ras1 and ras2:

sta1 <- stack(ras1, ras2)
#R> [1] "RasterStack"
#R> attr(,"package")
#R> [1] "raster"

Let’s brick() them:

bri1 <- brick(ras1, ras2)
#R> [1] "RasterBrick"
#R> attr(,"package")
#R> [1] "raster"