Vector Data in R
Introduction
This chapter briefly explains the fundamental vector data model. You will become familiar with the theory behind the vector model and the disciplines in which it predominates before seeing how it is implemented in R.
A vector is the most basic data structure in R. It is a sequence of elements of the same data type. If the elements are of different data types, they are coerced to a common type that can accommodate all of them. Vectors are generally created using the c() function, widely read as concatenate, though depending on the type of vector being created, other methods may be used.
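The coercion rule described above can be seen in a minimal base R sketch. The object names mixed_num_lgl and mixed_num_chr are made up for illustration and are not part of the chapter's data.

```r
# Mixing types inside c() triggers implicit coercion to the most flexible type.
mixed_num_lgl <- c(1.5, TRUE, FALSE)   # logical values become numeric (1 and 0)
mixed_num_chr <- c(1.5, "a", TRUE)     # everything becomes character

class(mixed_num_lgl)   # "numeric"
class(mixed_num_chr)   # "character"
mixed_num_chr          # "1.5" "a" "TRUE"
```

The ordering is roughly logical, then integer, then double, then character, so a single character element is enough to turn the whole vector into text.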
Numeric Vector
We create a numeric vector using the c() function, but you can use any function that creates a sequence of numbers.
sst = c(25.4, 26, 28, 27.8, 29, 24.8, 22.3)
We can use the is.vector() function to check whether an object is a vector and class() to check its data type.
is.vector(sst); class(sst)
## [1] TRUE
## [1] "numeric"
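Besides c(), base R offers several functions that generate numeric vectors. The sketch below shows three of them; the object names and values are illustrative only.

```r
# Three base R ways to build a numeric vector without typing every value.
temps_seq <- seq(from = 22, to = 30, by = 2)   # fixed step: 22, 24, 26, 28, 30
temps_len <- seq(22, 30, length.out = 5)       # fixed length: five evenly spaced values
temps_rep <- rep(25.5, times = 3)              # the same value repeated three times

is.vector(temps_seq)   # TRUE
class(temps_seq)       # "numeric"
```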
Integer vector
Creating an integer vector is similar to creating a numeric vector, except that we need to instruct R to treat the data as integer and not as numeric (double). To make R create integers, we append the suffix L to each element.
depth = c(5L, 10L, 15L, 20L, 25L, 30L)
is.vector(depth); class(depth)
## [1] TRUE
## [1] "integer"
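The difference between integer and double storage can be checked with typeof(). The following is a small sketch with made-up object names (depth_dbl, depth_int), not part of the chapter's dataset.

```r
# Without the L suffix a number is stored as double; with it, as integer.
typeof(5)     # "double"
typeof(5L)    # "integer"
typeof(1:5)   # "integer": the colon operator yields integers here

# An existing numeric vector can be converted explicitly.
depth_dbl <- c(5, 10, 15)
depth_int <- as.integer(depth_dbl)
class(depth_int)   # "integer"

# Mixing integer and double coerces the result to double.
typeof(c(5L, 2.5))   # "double"
```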
Character vector
A character vector may contain single characters, words or groups of words. Each element must be enclosed in single or double quotation marks.
sites = c("Pemba Channel", "Zanzibar Channel", "Pemba Channel")
is.vector(sites); class(sites)
## [1] TRUE
## [1] "character"
Logical Vector
A vector of logical values contains TRUE, FALSE or both.
presence = c(TRUE, TRUE, FALSE, TRUE, FALSE)
is.vector(presence); class(presence)
## [1] TRUE
## [1] "logical"
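Logical vectors are rarely typed by hand; more often they come from a comparison. As a short sketch, the snippet below reuses the sst values from the numeric vector example and introduces a hypothetical threshold of 26 and an illustrative object name, warm.

```r
sst <- c(25.4, 26, 28, 27.8, 29, 24.8, 22.3)

# A comparison returns one logical value per element.
warm <- sst > 26
class(warm)   # "logical"

# TRUE counts as 1, so sum() gives the number of matches.
sum(warm)     # 3

# A logical vector can also be used to subset another vector.
sst[warm]     # 28.0 27.8 29.0
```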
Vector Data
The geographic vector model is based on points located within a coordinate reference system (CRS). Points can represent self-standing features (e.g., the locations where research samples were taken) or they can be linked together to form more complex geometries such as lines and polygons. Most point geometries have only two dimensions, longitude and latitude, together with the attribute information. However, three-dimensional points carry an additional \(z\) value representing a third dimension such as elevation or bathymetry.
The most widely implemented spatial format for vector data is the shapefile, a popular geospatial vector data format for geographical information system (GIS) software. It is developed and maintained by Esri. Despite what its name may imply, a “single” shapefile is actually composed of at least three files, and as many as eight. The files that make up a “shapefile” share a common filename but have different extensions. They are listed in Table 1, and each has a specific role in defining the shapefile.
Description | Extension |
---|---|
Attribute information | .dbf |
Feature geometry | .shp |
Feature geometry index | .shx |
Attribute index | .aih |
Attribute index | .ain |
Coordinate system information | .prj |
Spatial index file | .sbn |
Spatial index file | .sbx |
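Because a shapefile is a set of sibling files, it can be useful to check which components are present before reading or sharing one. The helper below, shapefile_parts(), is a hypothetical function written for this illustration (it is not part of sf or base R), and the files it inspects are empty placeholders created in a temporary directory.

```r
# Hypothetical helper: report which component files exist next to a .shp path.
shapefile_parts <- function(shp_path, exts = c("shp", "shx", "dbf", "prj")) {
  base <- tools::file_path_sans_ext(shp_path)   # strip the .shp extension
  paths <- paste0(base, ".", exts)              # build the sibling file names
  setNames(file.exists(paths), exts)            # named logical vector
}

# Demonstration with empty placeholder files (not a real shapefile).
demo_dir <- tempfile("shp_demo")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("sites.shp", "sites.shx", "sites.dbf"))))

parts <- shapefile_parts(file.path(demo_dir, "sites.shp"))
parts
```

The .shp, .shx and .dbf files are the three mandatory components; a missing .prj means the coordinate system is undefined and has to be set manually after import.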
Until recently, the shapefile was the de facto vector data format, supported by libraries such as GDAL. R has well-supported classes for storing spatial data and interfacing with the shapefile format, but it long lacked a complete implementation of simple features, making conversions at times convoluted, inefficient or incomplete [@sf].
Simple features
@sf describes simple features as a hierarchical data model that represents objects in the real world in computers, with emphasis on the spatial geometry of these objects. Of the 17 simple feature types, only the seven described in Table 2 are commonly used. sf can represent the common vector geometry types (points, lines, polygons and their respective ‘multi’ versions). sf also supports geometry collections, which can contain multiple geometry types in a single object.
Type | Description |
---|---|
Point | zero-dimensional geometry containing a single point |
Linestring | sequence of points connected by straight, non-self intersecting line pieces; one-dimensional geometry |
Polygon | geometry with a positive area (two-dimensional); sequence of points form a closed, non-self intersecting ring; the first ring denotes the exterior ring, zero or more subsequent rings denote holes in this exterior ring |
Multipoint | set of points; a MULTIPOINT is simple if no two Points in the MULTIPOINT are equal |
Multilinestring | Set of linestrings |
Multipolygon | set of polygons |
Geometrycollection | Set of geometries of any type with exception of geometrycollection |
These core geometry types are fully supported by the R package sf [@sf]. sf provides a class system for geographic vector data [@geocomputation] and supersedes sp, the earlier package of classes and methods for spatial data [@sp]. It also provides a consistent command-line interface to GEOS and GDAL, superseding the rgdal package for reading and writing data [@rgdal] and the rgeos package for spatial operations [@rgeos].
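To make the geometry types in Table 2 concrete, the sketch below builds a few of them by hand with the sf constructor functions. The coordinates are arbitrary values off the Tanzanian coast chosen for illustration, and the object names are assumptions.

```r
library(sf)

# Each constructor takes a numeric vector or matrix of coordinates (x, y).
pt   <- st_point(c(39.5, -6.8))                              # POINT
line <- st_linestring(rbind(c(39.5, -6.8), c(39.6, -6.7)))   # LINESTRING
ring <- rbind(c(39, -7), c(40, -7), c(40, -6), c(39, -6), c(39, -7))
poly <- st_polygon(list(ring))                               # POLYGON: the ring must close
mpt  <- st_multipoint(rbind(c(39.5, -6.8), c(39.6, -6.7)))   # MULTIPOINT

# Combine the geometries into a geometry column and attach a CRS.
geoms <- st_sfc(pt, line, poly, mpt, crs = 4326)
st_geometry_type(geoms)
```

A polygon ring repeats its first coordinate as its last; st_polygon() raises an error if the ring is not closed.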
Reading vector data
We will use the sf package to work with vector data in R [@sf]. Notice that when sf is loaded it reports the versions of the GEOS, GDAL and PROJ libraries it links to. The sf package has the st_read() function, which reads many vector data formats into an sf object.
library(sf)
library(tidyverse)  # provides %>%, read_csv(), glimpse() and mutate()
Reading shapefiles
The shapefile is the most widely used vector format in GIS software. The st_read() function imports any type of shapefile into R. For example, the chunk below shows how to import sampling locations stored in shapefile format as a simple feature object in R’s workspace.
location = st_read("data/simple_feature.shp", quiet = TRUE)
location
## Simple feature collection with 11 features and 4 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 39.50958 ymin: -8.425115 xmax: 42.00623 ymax: -6.414011
## Geodetic CRS: WGS 84
## First 10 features:
## id type depth sst geometry
## 1 294 marker 29 27.87999 POINT (39.50958 -6.438159)
## 2 300 marker -604 27.97999 POINT (39.6318 -6.621774)
## 3 306 marker -569 27.97999 POINT (39.65447 -6.746649)
## 4 312 marker -485 28.03999 POINT (39.62563 -6.805321)
## 5 318 marker -325 28.03999 POINT (39.58374 -6.833973)
## 6 326 marker -461 28.03999 POINT (39.66476 -6.837384)
## 7 414 marker -505 28.02999 POINT (39.95728 -7.843535)
## 8 428 marker -132 28.23999 POINT (39.67712 -8.136846)
## 9 434 marker -976 28.16999 POINT (39.74853 -8.425115)
## 10 456 marker -3311 28.33999 POINT (42.00623 -7.025368)
When we print this simple feature, it tells us that it has 11 features and 4 fields, spanning longitude 39.50958°E to 42.00623°E and latitude 8.425115°S to 6.414011°S, with the geographic coordinate system defined as WGS84.
Reading GPX file
The st_read() function can also read files from GPS devices with the .gpx extension.
track = st_read("data/Track-180911-063740.gpx", quiet = TRUE)
track
## Simple feature collection with 1 feature and 24 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 39.44527 ymin: -6.907095 xmax: 39.44527 ymax: -6.907095
## Geodetic CRS: WGS 84
## ele time magvar geoidheight name cmt
## 1 14.4 2018-09-11 07:42:07 NA NA Track Recording Stopped <NA>
## desc
## 1 Recording stopped at 33'00" because the user stopped it after 6.58km (0.50m gain).
## src link1_href link1_text link1_type link2_href link2_text link2_type sym
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## type fix sat hdop vdop pdop ageofdgpsdata dgpsid x_speed
## 1 <NA> <NA> NA NA NA NA NA NA 0.385527
## geometry
## 1 POINT (39.44527 -6.907095)
We can assess the geographical extent of the simple feature track with the st_bbox() function.
track %>% st_bbox()
## xmin ymin xmax ymax
## 39.445274 -6.907095 39.445274 -6.907095
We can also check the coordinate reference system with the st_crs() function.
track %>% st_crs()
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]]
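The full WKT description is verbose. When only the EPSG code or the type of coordinate system is needed, individual pieces can be pulled out of the CRS object. The sketch below does not read the GPX file; it builds a single stand-in point (named pts here, an assumption) from the coordinates printed above.

```r
library(sf)

# Stand-in geometry using the coordinates of the track point shown above.
pts <- st_sfc(st_point(c(39.44527, -6.907095)), crs = 4326)

crs <- st_crs(pts)
crs$epsg             # the EPSG code as a number
st_is_longlat(pts)   # TRUE when coordinates are geographic (degrees)
```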
Make simple features from tabular data
Sometimes geographical information is in tabular form and you need to convert it into a simple feature for spatial analysis and mapping. The sf package provides the st_as_sf() function, which makes a simple feature from the location information in a table. To illustrate this, let us first load the file that contains the geographical information into the workspace.
location = read_csv("data/kimbiji_kizimkazi_transect.csv")
Looking at the internal structure of the location object we loaded, we find that there are eighteen observations, each with longitude and latitude information.
location %>% glimpse()
## Rows: 18
## Columns: 2
## $ lon <dbl> 39.45336, 39.45336, 39.46751, 39.47458, 39.47812, 39.49226, 39.485~
## $ lat <dbl> -6.850945, -6.822652, -6.787286, -6.758993, -6.730700, -6.713016, ~
The file contains only the geographical information. We can add a column of station names with the mutate() function from the dplyr package. Because the station names should be numbered sequentially, the paste() function is used to generate them.
location = location %>%
  mutate(name = paste("station", 1:18))
Since the dataset contains longitude and latitude information, we can use it to make a simple feature object with the st_as_sf() function from the sf package.
location.sf = location %>%
  st_as_sf(coords = c("lon", "lat"))
location.sf
## Simple feature collection with 18 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 39.45336 ymin: -6.850945 xmax: 39.55239 ymax: -6.461915
## CRS: NA
## # A tibble: 18 x 2
## name geometry
## <chr> <POINT>
## 1 station 1 (39.45336 -6.850945)
## 2 station 2 (39.45336 -6.822652)
## 3 station 3 (39.46751 -6.787286)
## 4 station 4 (39.47458 -6.758993)
## 5 station 5 (39.47812 -6.7307)
## 6 station 6 (39.49226 -6.713016)
## 7 station 7 (39.48519 -6.695333)
## 8 station 8 (39.49226 -6.659967)
## 9 station 9 (39.50641 -6.64582)
## 10 station 10 (39.51702 -6.631674)
## 11 station 11 (39.52056 -6.61399)
## 12 station 12 (39.52763 -6.578624)
## 13 station 13 (39.52763 -6.557404)
## 14 station 14 (39.5347 -6.539721)
## 15 station 15 (39.54178 -6.518501)
## 16 station 16 (39.54531 -6.497281)
## 17 station 17 (39.54531 -6.483135)
## 18 station 18 (39.55239 -6.461915)
The coords parameter is given the names of the longitude and latitude columns, in that order (x, then y); these values locate the point associated with each record. We now have a simple feature with 18 points. However, the simple feature lacks a coordinate reference system. We can define one with the st_set_crs() function by passing it the EPSG code of WGS84 (4326).
location.sf = location.sf %>%
  st_set_crs(4326)
Let us check whether location.sf is indeed a spatial object.
location.sf
## Simple feature collection with 18 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 39.45336 ymin: -6.850945 xmax: 39.55239 ymax: -6.461915
## Geodetic CRS: WGS 84
## # A tibble: 18 x 2
## name geometry
## * <chr> <POINT [°]>
## 1 station 1 (39.45336 -6.850945)
## 2 station 2 (39.45336 -6.822652)
## 3 station 3 (39.46751 -6.787286)
## 4 station 4 (39.47458 -6.758993)
## 5 station 5 (39.47812 -6.7307)
## 6 station 6 (39.49226 -6.713016)
## 7 station 7 (39.48519 -6.695333)
## 8 station 8 (39.49226 -6.659967)
## 9 station 9 (39.50641 -6.64582)
## 10 station 10 (39.51702 -6.631674)
## 11 station 11 (39.52056 -6.61399)
## 12 station 12 (39.52763 -6.578624)
## 13 station 13 (39.52763 -6.557404)
## 14 station 14 (39.5347 -6.539721)
## 15 station 15 (39.54178 -6.518501)
## 16 station 16 (39.54531 -6.497281)
## 17 station 17 (39.54531 -6.483135)
## 18 station 18 (39.55239 -6.461915)
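Creating the geometry and defining the coordinate system can also be done in a single call, since st_as_sf() accepts a crs argument for data frames. The sketch below uses a small hand-typed data frame (stations, with the first three coordinates from the table above) instead of the CSV file, so the object names are illustrative.

```r
library(sf)

# A small stand-in table with the first three stations.
stations <- data.frame(
  name = paste("station", 1:3),
  lon  = c(39.45336, 39.45336, 39.46751),
  lat  = c(-6.850945, -6.822652, -6.787286)
)

# coords lists the x column first, then y; crs sets WGS84 in the same call.
stations_sf <- st_as_sf(stations, coords = c("lon", "lat"), crs = 4326)

st_crs(stations_sf)$epsg   # 4326
names(stations_sf)         # lon and lat are replaced by a geometry column
```

If the original coordinate columns should stay in the attribute table, passing remove = FALSE keeps them alongside the geometry.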
Let us check the class of the simple feature.
location.sf %>%
  class()
## [1] "sf" "tbl_df" "tbl" "data.frame"
Note that the object has four classes: sf, tbl_df, tbl and data.frame. The contents of the data frame were also carried over into the attribute table of the simple feature. There was only one attribute, name, other than lon and lat in the tabular data used to create this simple feature.
The printed object shows that the coordinate reference system is now WGS84. We can further transform the geographic coordinates, which are in degrees, into UTM, which is in metres. The st_transform() function from the sf package handles the transformation of coordinate systems [@sf]. The EPSG code for UTM zone 37 south is 32737, which is passed to the function.
location.utm = location.sf %>%
  st_transform(32737)
location.utm
## Simple feature collection with 18 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 550090.3 ymin: 9242705 xmax: 561079.7 ymax: 9285701
## Projected CRS: WGS 84 / UTM zone 37S
## # A tibble: 18 x 2
## name geometry
## * <chr> <POINT [m]>
## 1 station 1 (550090.3 9242705)
## 2 station 2 (550093.2 9245833)
## 3 station 3 (551660.2 9249741)
## 4 station 4 (552444.8 9252868)
## 5 station 5 (552838.7 9255995)
## 6 station 6 (554404.1 9257949)
## 7 station 7 (553624.3 9259904)
## 8 station 8 (554410 9263813)
## 9 station 9 (555975.3 9265376)
## 10 station 10 (557149.7 9266938)
## 11 station 11 (557542.7 9268893)
## 12 station 12 (558328.7 9272802)
## 13 station 13 (558331.2 9275147)
## 14 station 14 (559115.3 9277101)
## 15 station 15 (559899.8 9279446)
## 16 station 16 (560293.4 9281792)
## 17 station 17 (560295.1 9283356)
## 18 station 18 (561079.7 9285701)
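One practical reason to project to UTM is that distances then come out in metres. As a sketch, the snippet below rebuilds the first two stations by hand (the object names are assumptions) and measures the distance between them with st_distance().

```r
library(sf)

# The first two stations, typed in from the table above.
stations_sf <- st_as_sf(
  data.frame(name = c("station 1", "station 2"),
             lon  = c(39.45336, 39.45336),
             lat  = c(-6.850945, -6.822652)),
  coords = c("lon", "lat"), crs = 4326
)

# Project to UTM zone 37S so that coordinates are in metres.
stations_utm <- st_transform(stations_sf, 32737)

# st_distance() returns a matrix of pairwise distances with units attached.
d <- st_distance(stations_utm)
d[1, 2]   # roughly 3.1 km, consistent with the northing difference shown above
```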
Export simple feature as shapefile
Once the simple feature is created, you may want to export it as a shapefile for use in other GIS software such as QGIS and Esri ArcGIS. The sf package has the st_write() function, which exports a simple feature from the workspace to a shapefile. The chunk below exports the simple feature object location.sf to location.shp in the working directory, denoted by ./
location.sf %>% st_write("./location.shp")
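Running the export a second time fails by default because the file already exists. A common workaround, sketched below, is the delete_dsn argument of st_write(), followed by reading the file back to confirm the round trip. The sketch writes to a temporary file rather than the working directory, and the object names (pts_sf, out_path, back) are illustrative.

```r
library(sf)

# A small stand-in object with two stations.
pts_sf <- st_as_sf(
  data.frame(name = c("station 1", "station 2"),
             lon  = c(39.45336, 39.45336),
             lat  = c(-6.850945, -6.822652)),
  coords = c("lon", "lat"), crs = 4326
)

out_path <- tempfile(fileext = ".shp")

# First export creates the shapefile and its companion files.
st_write(pts_sf, out_path, quiet = TRUE)

# delete_dsn = TRUE removes the existing data source before writing again.
st_write(pts_sf, out_path, delete_dsn = TRUE, quiet = TRUE)

# Read the file back to confirm that the features and attributes survived.
back <- st_read(out_path, quiet = TRUE)
nrow(back)
names(back)
```

Keep in mind that the shapefile format limits field names to ten characters, so longer column names are abbreviated on export.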