Spatial data refer to phenomenon or information that can be observed geographically. What a definition! In short spatial data concern every information that could be reported on a map, i.e. on a two-dimensions geographical coordinates system. Two kinds of spatial data exist: vector data and raster data. Generally vectors refer to discrete locations, i.e. objects with boundaries (for instance a city, a road, a country) whereas raster data refer to continuous phenomenon that can be observed everywhere, but without natural boundaries (for instance the surface temperature). Let’s take a look at their characteristics.
Vector data consist of the description of a spatially explicit phenomenon geometry (position and coordinates of its boundaries in a specific spatial referential). In addition to this geometry, vector data may contain variables (or attributes) with additional information about the phenomenon at each location. For instance, cities of a country are vector data (points) and attributes could be cities names, population sizes, etc. Borders of a country are also vector data (polygons) and could contain the name of the country, the population for a given year, and the mean number of crimes as attributes.
There are main symbol types for vector data: points, lines and polygons. The geometry of these data structures consists of sets of geographical coordinate pairs (longitude, latitude) in a specific Coordinate Reference System (CRS).
Points are the simplest vector object. Each point has one two-dimensional coordinates, and several associated variables. For instance, a point might represent a location where an animal was trapped, and the attributes could include the capture date, the size, the sex, and information about the physical environment.
Lines are the second type of vector data. A line is a shape consisting of one or several segments (or paths) and each segment has two noteworthy points (or vertex): a starting point and an ending point. Note that the ending point of a segment will have the same coordinates as the starting point of the following segment in the case of a line consisting of connected segments. In other words, a line with one segment is defined by two coordinates pairs (longitude and latitude). A line with two connected segments is defined by three coordinates pairs and a line with two isolated segments is defined by four coordinates pairs. Thus, the representation of a line is very similar to that of a collection of points. The main difference is that the ordering of the points is important, because we need to know which points should be connected two-by-two.
Polygons are closed spatial lines where the coordinates of the starting point of the first segment is identical to the ending point of the last segment. The geometry of a polygon is very similar to that of a line but polygons have three characteristics:
a polygon cannot self-intersect (whereas a line can),
a polygon can contain hole (think about the state of Vatican overlapping the country of Italy but considered as a separated polygon),
as a closed feature, a polygon has an inside and a border.
Vector data are generally stored in ESRI Shapefile, GeoJSON, KML or Simple Features files. In R, two main packages exist to handle spatial vector data:
sf. Note that the package
rgdal will be used to import/export vector data (go next post to learn more).
Raster data are commonly used to represent continuous phenomenon that can be observed everywhere, but without natural boundaries (but with artificial boundaries defined by the user). A raster consists of a grid of equally sized cells (or pixels) that all have a values (or a missing value) for one single variable.
Unlike vector data, the geometry of raster data is not explicitly stored as coordinates. Rather it is implicitly set by setting the spatial extent and the number of rows and columns of a regular grid. From this spatial information, the size of the raster cells (spatial resolution) can be computed. Working with raster data will therefore be more efficient than working with polygons data.
Raster can be used to represent a very broad range of data: continuous (temperature values), discrete (habitat classes) or even binary (species occurrence) variables.
Several file formats exist to store raster data. The most commons are: GeoTIFF, NetCDF, grd and ascii formats. Even the package
rgdal can be used to import/export raster data, we will prefer the package
In this section we present two packges:
sp package actualy defines classes for both vector and raster objects. Below, we however focus on the vector ones and so we do not detail
SpatialPixels objects. Also note that
sf “[…] aims at succeeding
sp in the long term” (Simple Features for R,
Let first create a data frame
mylon <- -82+2*runif(20) mylat <- 42+2*runif(20) mydata <- data.frame( lon = mylon, lat = mylat, var1 = rnorm(20), var2 = 10*runif(20) )
Let’s have a look at thus data frame:
The table below includes a description of the classes for points, lines and polygons. Basically all these classes work the same way. For instance, in order to define a
SpatialPointsDataPoints object, three elements are required: a set of coordinates, a Coordinate Reference System (CRS) and an attribute table. Intermediate class are also defined (for instance points + CRS =
SpatialPoints) and the name of the class is also the name of the function to be called.
|Classes and functions||Contents|
|Points||list of points (set of coordinates)|
|SpatialPoints||list of points + CRS|
|SpatialPointsDataPoints||list of points + CRS + attribute table|
|Line||a line (set of coordinates)|
|Lines||list of lines|
|SpatialLines||list of lines + CRS|
|SpatialLinesDataFrame||list of lines + CRS + attribute table|
|Polygon||a line (set of coordinates)|
|Polygons||list of lines|
|SpatialPolygons||list of lines + CRS|
|SpatialPolygonsDataFrame||list of lines + CRS + attribute table|
As an more tangible example, let’s now create a
SpatialPointsDataFrame based on our data frame
library(sp) mysp <- SpatialPointsDataFrame( coords = mydata[, 1:2], data = mydata[, 3:4], proj4string = CRS( "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0" ) )
Note that proj4 is used and we therefore wrote a string that describes the CRS and that proj4 understands. Below are listed some properties of the object we have defined.
isS4(mysp) class(mysp) slotNames(mysp) dim(mysp) #R>  TRUE #R>  "SpatialPointsDataFrame" #R> attr(,"package") #R>  "sp" #R>  "data" "coords.nrs" "coords" "bbox" "proj4string" #R>  20 2
Basically, it is a S4 object of class
SpatialPointsDataFrame. All slot names refer to attribute that are accessible via and
mysp@proj4string #R> CRS arguments: #R> +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0 head(mysp@data) #R> var1 var2 #R> 1 -0.731538531 7.329688 #R> 2 0.009995721 4.949049 #R> 3 -0.379075541 3.818837 #R> 4 2.187928401 6.555559 #R> 5 0.472132880 5.239393 #R> 6 -0.481792820 9.654586
In order to change projection, the user must call
spTransform(), like so:
(mysp2 <- spTransform(mysp, CRS=CRS("+proj=merc +ellps=GRS80"))) #R> class : SpatialPointsDataFrame #R> features : 20 #R> extent : -9122725, -8912376, 5137114, 5426871 (xmin, xmax, ymin, ymax) #R> coord. ref. : +proj=merc +ellps=GRS80 #R> variables : 2 #R> names : var1, var2 #R> min values : -1.81620111914297, 0.444704792462289 #R> max values : 2.18792840079437, 9.65458554914221
Below is a very short overview of classes in
sf, the reader that requires further explanation would find more details on the first vignette of sf. Basically three classes are defined:
library(sf) pts_sf <- st_as_sf( x = mydata, coords = c("lon", "lat"), crs = 4326)
Let’s examine its class
class(pts_sf) #R>  "sf" "data.frame"
pts_sfc <- st_geometry(pts_sf) class(pts_sfc) #R>  "sfc_POINT" "sfc"
(x <- st_point(c(1,2))) class(x) #R>  "XY" "POINT" "sfg"
st_as_sf() can also be used to convert a
sp object into a
st_as_sf(mysp) #R> Simple feature collection with 20 features and 2 fields #R> geometry type: POINT #R> dimension: XY #R> bbox: xmin: -81.95083 ymin: 42.03171 xmax: -80.06123 ymax: 43.94239 #R> epsg (SRID): 4326 #R> proj4string: +proj=longlat +datum=WGS84 +no_defs #R> First 10 features: #R> var1 var2 geometry #R> 1 -0.731538531 7.329688 POINT (-80.57065 43.68849) #R> 2 0.009995721 4.949049 POINT (-80.4004 42.08281) #R> 3 -0.379075541 3.818837 POINT (-81.54197 42.62232) #R> 4 2.187928401 6.555559 POINT (-80.06123 43.70764) #R> 5 0.472132880 5.239393 POINT (-80.13286 42.48617) #R> 6 -0.481792820 9.654586 POINT (-81.8959 42.50583) #R> 7 -1.290596994 5.735338 POINT (-80.33147 42.64487) #R> 8 2.162975539 7.793934 POINT (-80.22192 42.19699) #R> 9 0.713897043 9.606329 POINT (-81.58998 42.04869) #R> 10 -0.060456859 5.500665 POINT (-80.1472 43.11689)
The R package
raster provides three main classes of raster object (more details here):
Using RasterStack and RasterBrick requires that the geometry of all raster data is equal.
raster define three classes of rater object we detail below.
library(raster) val1 <- matrix(runif(100*100,0,10), ncol = 100, nrow = 100) ras1 <- raster( val1, crs = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"), xmn = -82, xmx = +80, ymn = +42, ymx = +44 ) class(ras1) #R>  "RasterLayer" #R> attr(,"package") #R>  "raster" dim(ras1) #R>  100 100 1 head(values(ras1)) #R>  5.994056 7.130506 9.532949 0.224109 6.618390 1.629529 projection(ras1) #R>  "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
Let’s first create another raster (with the same CRS)
val2 <- matrix(rnorm(100*100), ncol = 100, nrow = 100) ras2 <- raster( val2, crs = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"), xmn = -82, xmx = +80, ymn = +42, ymx = +44 ) class(ras2) #R>  "RasterLayer" #R> attr(,"package") #R>  "raster"
sta1 <- stack(ras1, ras2) class(sta1) #R>  "RasterStack" #R> attr(,"package") #R>  "raster"
bri1 <- brick(ras1, ras2) class(bri1) #R>  "RasterBrick" #R> attr(,"package") #R>  "raster"