David Beauchesne Marie-Hélène Brice Nicolas Casajus Kevin Cazelles Elliot Dreujou Steve Vissault

Spatial data refer to phenomenon or information that can be observed geographically. What a definition! In short spatial data concern every information that could be reported on a map, *i.e.* on a two-dimensions geographical coordinates system. Two kinds of spatial data exist: **vector data** and **raster data**. Generally vectors refer to discrete locations, *i.e.* objects with boundaries (for instance a city, a road, a country) whereas raster data refer to continuous phenomenon that can be observed everywhere, but without natural boundaries (for instance the surface temperature). Let’s take a look at their characteristics.

Vector data consist of the description of a spatially explicit phenomenon geometry (position and coordinates of its boundaries in a specific spatial referential). In addition to this geometry, vector data may contain variables (or attributes) with additional information about the phenomenon at each location. For instance, cities of a country are vector data (points) and attributes could be cities names, population sizes, etc. Borders of a country are also vector data (polygons) and could contain the name of the country, the population for a given year, and the mean number of crimes as attributes.

There are main symbol types for vector data: **points**, **lines** and **polygons**. The geometry of these data structures consists of sets of geographical coordinate pairs (longitude, latitude) in a specific Coordinate Reference System (CRS).

Points are the simplest vector object. Each point has one two-dimensional coordinates, and several associated variables. For instance, a point might represent a location where an animal was trapped, and the attributes could include the capture date, the size, the sex, and information about the physical environment.

Lines are the second type of vector data. A line is a shape consisting of one or several segments (or paths) and each segment has two noteworthy points (or vertex): a starting point and an ending point. Note that the ending point of a segment will have the same coordinates as the starting point of the following segment in the case of a line consisting of connected segments. In other words, a line with one segment is defined by two coordinates pairs (longitude and latitude). A line with two connected segments is defined by three coordinates pairs and a line with two isolated segments is defined by four coordinates pairs. Thus, the representation of a line is very similar to that of a collection of points. The main difference is that the ordering of the points is important, because we need to know which points should be connected two-by-two.

Polygons are closed spatial lines where the coordinates of the starting point of the first segment is identical to the ending point of the last segment. The geometry of a polygon is very similar to that of a line but polygons have three characteristics:

a polygon cannot self-intersect (whereas a line can),

a polygon can contain hole (think about the state of Vatican overlapping the country of Italy but considered as a separated polygon),

as a closed feature, a polygon has an inside and a border.

Vector data are generally stored in **ESRI Shapefile**, **GeoJSON**, **KML** or **Simple Features** files. In R, two main packages exist to handle spatial vector data: `sp`

and `sf`

. Note that the package `rgdal`

will be used to import/export vector data (go next post to learn more).

Raster data are commonly used to represent *continuous* phenomenon that can be observed everywhere, but without natural boundaries (but with artificial boundaries defined by the user). A raster consists of a grid of equally sized cells (or pixels) that all have a values (or a missing value) for one single variable.

Unlike vector data, the geometry of raster data is not explicitly stored as coordinates. Rather it is implicitly set by setting the spatial extent and the number of rows and columns of a regular grid. From this spatial information, the size of the raster cells (spatial resolution) can be computed. Working with raster data will therefore be more efficient than working with polygons data.

Raster can be used to represent a very broad range of data: continuous (temperature values), discrete (habitat classes) or even binary (species occurrence) variables.

Several file formats exist to store raster data. The most commons are: **GeoTIFF**, **NetCDF**, **grd** and **ascii** formats. Even the package `rgdal`

can be used to import/export raster data, we will prefer the package `raster`

.

In this section we present two packges: `sp`

and `sf`

. The `sp`

package actualy defines classes for both vector and raster objects. Below, we however focus on the vector ones and so we do not detail `SpatialGrid`

and `SpatialPixels`

objects. Also note that `sf`

“[…] aims at succeeding `sp`

in the long term” (Simple Features for R, `sf`

vignette).

Let first create a data frame `mydata`

:

```
mylon <- -82+2*runif(20)
mylat <- 42+2*runif(20)
mydata <- data.frame(
lon = mylon,
lat = mylat,
var1 = rnorm(20),
var2 = 10*runif(20)
)
```

Let’s have a look at thus data frame:

lon | lat | var1 | var2 |
---|---|---|---|

-80.17746 | 43.69678 | -0.0167867 | 7.0482498 |

-81.22428 | 43.48041 | -1.9239083 | 0.1237473 |

-80.49486 | 43.95634 | -0.5615889 | 3.5762181 |

-81.46914 | 42.05815 | -0.5943212 | 7.4626709 |

-81.88972 | 43.90675 | 1.1032172 | 1.2491654 |

-81.32393 | 42.19640 | -0.2718432 | 3.7749343 |

`sp`

The table below includes a description of the classes for points, lines and polygons. Basically all these classes work the same way. For instance, in order to define a `SpatialPointsDataPoints`

object, three elements are required: a set of coordinates, a Coordinate Reference System (CRS) and an attribute table. Intermediate class are also defined (for instance points + CRS = `SpatialPoints`

) and the name of the class is also the name of the function to be called.

Classes and functions | Contents |
---|---|

Points | list of points (set of coordinates) |

SpatialPoints | list of points + CRS |

SpatialPointsDataPoints | list of points + CRS + attribute table |

Line | a line (set of coordinates) |

Lines | list of lines |

SpatialLines | list of lines + CRS |

SpatialLinesDataFrame | list of lines + CRS + attribute table |

Polygon | a line (set of coordinates) |

Polygons | list of lines |

SpatialPolygons | list of lines + CRS |

SpatialPolygonsDataFrame | list of lines + CRS + attribute table |

`SpatialPointsDataFrame`

As an more tangible example, let’s now create a `SpatialPointsDataFrame`

based on our data frame `mydata`

.

```
library(sp)
mysp <- SpatialPointsDataFrame(
coords = mydata[, 1:2],
data = mydata[, 3:4],
proj4string = CRS(
"+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"
)
)
```

Note that proj4 is used and we therefore wrote a string that describes the CRS and that proj4 understands. Below are listed some properties of the object we have defined.

```
isS4(mysp)
class(mysp)
slotNames(mysp)
dim(mysp)
#R> [1] TRUE
#R> [1] "SpatialPointsDataFrame"
#R> attr(,"package")
#R> [1] "sp"
#R> [1] "data" "coords.nrs" "coords" "bbox" "proj4string"
#R> [1] 20 2
```

Basically, it is a S4 object of class `SpatialPointsDataFrame`

. All slot names refer to attribute that are accessible *via* and `@`

:

```
mysp@proj4string
#R> CRS arguments:
#R> +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0
head(mysp@data)
#R> var1 var2
#R> 1 -0.01678669 7.0482498
#R> 2 -1.92390826 0.1237473
#R> 3 -0.56158892 3.5762181
#R> 4 -0.59432122 7.4626709
#R> 5 1.10321721 1.2491654
#R> 6 -0.27184318 3.7749343
```

In order to change projection, the user must call `spTransform()`

, like so:

```
(mysp2 <- spTransform(mysp, CRS=CRS("+proj=merc +ellps=GRS80")))
#R> class : SpatialPointsDataFrame
#R> features : 20
#R> extent : -9127208, -8925314, 5141063, 5430722 (xmin, xmax, ymin, ymax)
#R> coord. ref. : +proj=merc +ellps=GRS80
#R> variables : 2
#R> names : var1, var2
#R> min values : -1.92390826336492, 0.123747307807207
#R> max values : 1.10321720717945, 9.88472919445485
```

`sf`

Below is a very short overview of classes in `sf`

, the reader that requires further explanation would find more details on the first vignette of sf. Basically three classes are defined: `sf`

, `sfc`

and `sfg`

.

`sf`

```
library(sf)
pts_sf <- st_as_sf(
x = mydata,
coords = c("lon", "lat"),
crs = 4326)
```

Let’s examine its class

```
class(pts_sf)
#R> [1] "sf" "data.frame"
```

`sfc`

```
pts_sfc <- st_geometry(pts_sf)
class(pts_sfc)
#R> [1] "sfc_POINT" "sfc"
```

`sfg`

```
(x <- st_point(c(1,2)))
class(x)
#R> [1] "XY" "POINT" "sfg"
```

`sp`

object`st_as_sf()`

can also be used to convert a `sp`

object into a `sf`

one.

```
st_as_sf(mysp)
#R> Simple feature collection with 20 features and 2 fields
#R> geometry type: POINT
#R> dimension: XY
#R> bbox: xmin: -81.99111 ymin: 42.05815 xmax: -80.17746 ymax: 43.96739
#R> epsg (SRID): 4326
#R> proj4string: +proj=longlat +datum=WGS84 +no_defs
#R> First 10 features:
#R> var1 var2 geometry
#R> 1 -0.01678669 7.0482498 POINT (-80.17746 43.69678)
#R> 2 -1.92390826 0.1237473 POINT (-81.22428 43.48041)
#R> 3 -0.56158892 3.5762181 POINT (-80.49486 43.95634)
#R> 4 -0.59432122 7.4626709 POINT (-81.46914 42.05815)
#R> 5 1.10321721 1.2491654 POINT (-81.88972 43.90675)
#R> 6 -0.27184318 3.7749343 POINT (-81.32393 42.1964)
#R> 7 0.10252638 4.2445331 POINT (-81.00485 42.82219)
#R> 8 -1.59836651 1.0484629 POINT (-80.76968 43.50884)
#R> 9 -0.45704253 1.8863388 POINT (-80.44606 43.55616)
#R> 10 -0.81695852 1.5732857 POINT (-81.17272 42.46098)
```

The R package `raster`

provides three main classes of raster object (more details here):

**RasterLayer**imports a single-layer (variable) raster,**RasterStack**imports in one single object several single-layer (variable) rasters stored in one or different files,**RasterBrick**imports in one single object several single-layer (variable) rasters stored in one single file.

Using **RasterStack** and **RasterBrick** requires that the geometry of all raster data is equal.

Package `raster`

define three classes of rater object we detail below.

```
library(raster)
val1 <- matrix(runif(100*100,0,10), ncol = 100, nrow = 100)
ras1 <- raster(
val1,
crs = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"),
xmn = -82, xmx = +80, ymn = +42, ymx = +44
)
class(ras1)
#R> [1] "RasterLayer"
#R> attr(,"package")
#R> [1] "raster"
dim(ras1)
#R> [1] 100 100 1
head(values(ras1))
#R> [1] 4.1183787 9.2731285 0.3108436 1.3063270 6.5016707 5.6588556
projection(ras1)
#R> [1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
```

http://www.rspatial.org/spatial/rst/4-rasterdata.html#rasterstack-and-rasterbrick

Let’s first create another raster (with the same CRS)

```
val2 <- matrix(rnorm(100*100), ncol = 100, nrow = 100)
ras2 <- raster(
val2,
crs = CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"),
xmn = -82, xmx = +80, ymn = +42, ymx = +44
)
class(ras2)
#R> [1] "RasterLayer"
#R> attr(,"package")
#R> [1] "raster"
```

Let’s `stack()`

`ras1`

and `ras2`

:

```
sta1 <- stack(ras1, ras2)
class(sta1)
#R> [1] "RasterStack"
#R> attr(,"package")
#R> [1] "raster"
```

Let’s `brick()`

them:

```
bri1 <- brick(ras1, ras2)
class(bri1)
#R> [1] "RasterBrick"
#R> attr(,"package")
#R> [1] "raster"
```