Introduction

This example will utilise a file from EUMETSAT CM SAF. We will work with a NetCDF file of monthly sunshine duration over Europe for the single time point of August 2018 from the Interim Climate Data Record. This data file should be provided along with this script and is called SDUms201808010000401UD1000101UD.nc. Put this data file in the same folder as this script is in.

For further information about downloading data from EUMETSAT CM SAF you can check the webpage https://wui.cmsaf.eu/ where you can browse the product catalog. There is also an online short course about this data record offered by EUMETSAT which includes tutorials on accessing data https://training.eumetsat.int/course/view.php?id=378.

Getting set up

We begin by importing the libraries we need for working with the data. If the packages are already installed, you can load them with the library() function. If this gives an error, you need to install them first using install.packages().

It’s good practice to load the packages used in your script in one place near the top of the file. This makes it easier for someone else you might share the file with to quickly see which packages they need to run your file.

# If you have not previously installed these packages, do this first by running the line below without the # sign. You only need to do this once.
# install.packages(c("raster", "rgdal", "ncdf4", "tidync", "cmsaf", "rworldxtra", "fields", "maps"))

# 'raster' and 'rgdal' for working with and plotting NetCDF files
# 'ncdf4' and 'tidync' are for working with NetCDF files
# 'cmsaf' has the CM SAF R Toolbox and will also load `cmsafops`, which has helpful functions for working with CM SAF data.
# 'rworldxtra' for world maps to use in plotting
# 'fields' for plotting
# 'maps' for adding country features to plots
library(raster)
library(rgdal)
library(ncdf4)
library(tidync)
library(cmsaf)
library(rworldxtra)
library(fields)
library(maps)

It’s often useful to set the working directory in R. This is just the default directory that R will use for reading and writing files. Files directly in your working directory can be referred to just with their name (or relative path). You can still access files outside your working directory by using the full path.

If you keep your scripts and data files in the same folder, you do not need to provide the full paths. However, you may want to store things in different places and so it is good practice to be specific.

Change the path within setwd() below to the folder that this file is in. Remember in R you must always use / as the path separator.

You can confirm that you set this correctly by running getwd() which should return the directory you set.

# Change the path below to the folder this file is in.
setwd("C:/Users/Danny/Documents/EO Coding R scripts")
# Confirm the working directory was set correctly.
getwd()
## [1] "C:/Users/Danny/Documents/EO Coding R scripts"

Now confirm that the data file SDUms201808010000401UD1000101UD.nc exists. If you placed the file within the same folder as this script file, you will not need to change the data_path variable below because it is directly in the working directory. If it is somewhere else, you can change data_path below with the full file path. If the file exists then file.exists will return TRUE. If it doesn’t, you need to check the file is in the correct place or your path may be incorrect.

data_path <- "SDUms201808010000401UD1000101UD.nc"
file.exists(data_path)
## [1] TRUE

If this returns TRUE then we’re reading to start working with the data!

Reading a NetCDF file

We can connect to a NetCDF file using the nc_open() function from the ncdf4 package. We save the connection to the file as a variable called nc.

nc <- nc_open(data_path)

To view information/metadata about the file we can simply run nc.

nc
## File SDUms201808010000401UD1000101UD.nc (NC_FORMAT_NETCDF4):
## 
##      1 variables (excluding dimension variables):
##         float SDU[lon,lat,time]   (Chunking: [1200,54,1])  (Compression: shuffle,level 6)
##             standard_name: duration_of_sunshine
##             long_name: Sunshine Duration
##             units: h
##             _FillValue: -999
##             missing_value: -999
## 
##      3 dimensions:
##         time  Size:1   *** is unlimited ***
##             standard_name: time
##             long_name: time
##             units: days since 1983-01-01 00:00:00
##             calendar: standard
##             axis: T
##         lon  Size:1200
##             standard_name: longitude
##             long_name: longitude
##             units: degrees_east
##             axis: X
##         lat  Size:500
##             standard_name: latitude
##             long_name: latitude
##             units: degrees_north
##             axis: Y
## 
##     36 global attributes:
##         CDI: Climate Data Interface version 1.9.5 (http://mpimet.mpg.de/cdi)
##         history: Tue Nov 16 15:26:03 2021: cdo -v -L -f nc4 -z zip_6 -r -sellonlatbox,-20.00002,39.95002,34.99998,59.95002 /dev/shm/runsystem/orders/order_45043/SDUms201808010000401231000101MA.nc /dev/shm/routcm/tmp.8ps5xfTh7T/tmp.jMbCHZWKFm/tmp.7gGkDXOYPL/SDUms201808010000401UD1000101UD.nc
##         institution: EUMETSAT/CMSAF
##         Conventions: CF-1.6, ACDD-1.3
##         title: ICDR SEVIRI Radiation
##         summary: This file contains data from the CM SAF Meteosat Surface Solar Radiation operational processing (based on SARAH-2 methods)
##         product_version: 4.01
##         creator_name: DE/DWD
##         creator_email: contact.cmsaf@dwd.de
##         creator_url: http://www.cmsaf.eu/
##         creator_type: institution
##         publisher_name: EUMETSAT/CMSAF
##         publisher_email: contact.cmsaf@dwd.de
##         publisher_url: http://www.cmsaf.eu/
##         publisher_type: institution
##         references: https://wui.cmsaf.eu/safira/action/viewICDRDetails?acronym=SARAH_V002_ICDR
##         keywords_vocabulary: GCMD Science Keywords, Version 8.1
##         keywords: EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC RADIATION > INCOMING SOLAR RADIATION, EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC RADIATION > SUNSHINE
##         project: Satellite Application Facility on Climate Monitoring (CM SAF)
##         standard_name_vocabulary: Standard Name Table (v28, 07 January 2015)
##         date_created: 2018-12-20T11:03:58Z
##         geospatial_lat_units: degrees_north
##         geospatial_lon_units: degrees_east
##         geospatial_lat_resolution: 0.05 degree
##         geospatial_lon_resolution: 0.05 degree
##         platform_vocabulary: GCMD Platforms, Version 8.1
##         platform: Earth Observation Satellites > METEOSAT > METEOSAT-11
##         instrument_vocabulary: GCMD Instruments, Version 8.1
##         instrument: SEVIRI > Spinning Enhanced Visible and Infrared Imager
##         geospatial_lat_min: -65
##         geospatial_lat_max: 65
##         geospatial_lon_min: -65
##         geospatial_lon_max: 65
##         time_coverage_duration: P1M
##         time_coverage_resolution: P1M
##         CDO: Climate Data Operators version 1.9.5 (http://mpimet.mpg.de/cdo)

This gives a lot of output. We can see that there are 2 variables. sdu is the main one and it is indexed by three dimensions from the line sdu[lon,lat,time].

A more concise output can be obtained from the ncinfo function from cmsafops. For this function, we need to specify that we are providing the nc object by naming it explicitly (nc = nc). You can find the help file for any function in R by using ? e.g. ?ncinfo. We could have instead provided the path to the file.

ncinfo(nc = nc)
## The file: SDUms201808010000401UD1000101UD.nc contains: 
## 
## Variable:
## SDU 
## 
## With following dimensions:
## time with length 1 (range 2018-08-01 to 2018-08-01)
## lon with length 1200 (range -20 to 39.95)
## lat with length 500 (range 35 to 59.95)

Notice that there are 3 dimensions in this file, but the time dimension is of length 1, so essentially this can be considered as a 2 dimensional data.

Extracting data & variables

There are different packages in R that can work with NetCDF files and each has their own way of extracting data and variables from a file. The method you use should depend on what you want to do with the data.

To extract variables as a simple array, you can use ncvar_get() from the ncdf4 package. The dim functions tell us the dimensions of the array.

sdu_array <- ncvar_get(nc, "SDU")
dim(sdu_array)
## [1] 1200  500

Notice that sdu_array has just two dimensions, 1200 x 500 for the longitude and latitude values - the time variable has been dropped since it was just length 1.

We can extract values from this array using the format [ , ]. Note that arrays in R are one-based, meaning the first item is [1, 1] (not [0, 0] like in Python).

# Extract a single value.
sdu_array[10, 10]
## [1] 271.8325
# Extract a range of values
sdu_array[1:10, 1:5]
##           [,1]     [,2]     [,3]     [,4]     [,5]
##  [1,] 280.5894 278.2027 279.7673 272.7908 269.3072
##  [2,] 277.7051 278.2921 275.5980 271.0869 270.8192
##  [3,] 276.8530 273.8229 272.9657 273.3355 271.7457
##  [4,] 272.3888 270.8380 274.3143 273.9053 271.3394
##  [5,] 271.6544 274.1878 275.8790 274.0243 274.4838
##  [6,] 272.8050 273.8245 275.0791 276.4910 273.2270
##  [7,] 272.0881 273.1986 275.3747 274.7835 273.5800
##  [8,] 269.9512 273.6153 271.0646 273.8537 271.6868
##  [9,] 266.8825 267.0848 266.3067 269.9421 270.0137
## [10,] 264.1013 264.5317 267.6389 270.5815 268.8121
# Extract all values along one dimension
# (Not printed)
# sdu_array[, 1]
# sdu_array[1, ]
# sdu_array[1:5, ]

We can use the same function to extract the dimension variables too. These are just one dimensional arrays (or vectors) because it has extracted the list of latitude values.

lon_vals <- ncvar_get(nc, "lon")
lat_vals <- ncvar_get(nc, "lat")
dim(lon_vals)
## [1] 1200
dim(lat_vals)
## [1] 500
lat_vals
##   [1] 35.00 35.05 35.10 35.15 35.20 35.25 35.30 35.35 35.40 35.45 35.50 35.55
##  [13] 35.60 35.65 35.70 35.75 35.80 35.85 35.90 35.95 36.00 36.05 36.10 36.15
##  [25] 36.20 36.25 36.30 36.35 36.40 36.45 36.50 36.55 36.60 36.65 36.70 36.75
##  [37] 36.80 36.85 36.90 36.95 37.00 37.05 37.10 37.15 37.20 37.25 37.30 37.35
##  [49] 37.40 37.45 37.50 37.55 37.60 37.65 37.70 37.75 37.80 37.85 37.90 37.95
##  [61] 38.00 38.05 38.10 38.15 38.20 38.25 38.30 38.35 38.40 38.45 38.50 38.55
##  [73] 38.60 38.65 38.70 38.75 38.80 38.85 38.90 38.95 39.00 39.05 39.10 39.15
##  [85] 39.20 39.25 39.30 39.35 39.40 39.45 39.50 39.55 39.60 39.65 39.70 39.75
##  [97] 39.80 39.85 39.90 39.95 40.00 40.05 40.10 40.15 40.20 40.25 40.30 40.35
## [109] 40.40 40.45 40.50 40.55 40.60 40.65 40.70 40.75 40.80 40.85 40.90 40.95
## [121] 41.00 41.05 41.10 41.15 41.20 41.25 41.30 41.35 41.40 41.45 41.50 41.55
## [133] 41.60 41.65 41.70 41.75 41.80 41.85 41.90 41.95 42.00 42.05 42.10 42.15
## [145] 42.20 42.25 42.30 42.35 42.40 42.45 42.50 42.55 42.60 42.65 42.70 42.75
## [157] 42.80 42.85 42.90 42.95 43.00 43.05 43.10 43.15 43.20 43.25 43.30 43.35
## [169] 43.40 43.45 43.50 43.55 43.60 43.65 43.70 43.75 43.80 43.85 43.90 43.95
## [181] 44.00 44.05 44.10 44.15 44.20 44.25 44.30 44.35 44.40 44.45 44.50 44.55
## [193] 44.60 44.65 44.70 44.75 44.80 44.85 44.90 44.95 45.00 45.05 45.10 45.15
## [205] 45.20 45.25 45.30 45.35 45.40 45.45 45.50 45.55 45.60 45.65 45.70 45.75
## [217] 45.80 45.85 45.90 45.95 46.00 46.05 46.10 46.15 46.20 46.25 46.30 46.35
## [229] 46.40 46.45 46.50 46.55 46.60 46.65 46.70 46.75 46.80 46.85 46.90 46.95
## [241] 47.00 47.05 47.10 47.15 47.20 47.25 47.30 47.35 47.40 47.45 47.50 47.55
## [253] 47.60 47.65 47.70 47.75 47.80 47.85 47.90 47.95 48.00 48.05 48.10 48.15
## [265] 48.20 48.25 48.30 48.35 48.40 48.45 48.50 48.55 48.60 48.65 48.70 48.75
## [277] 48.80 48.85 48.90 48.95 49.00 49.05 49.10 49.15 49.20 49.25 49.30 49.35
## [289] 49.40 49.45 49.50 49.55 49.60 49.65 49.70 49.75 49.80 49.85 49.90 49.95
## [301] 50.00 50.05 50.10 50.15 50.20 50.25 50.30 50.35 50.40 50.45 50.50 50.55
## [313] 50.60 50.65 50.70 50.75 50.80 50.85 50.90 50.95 51.00 51.05 51.10 51.15
## [325] 51.20 51.25 51.30 51.35 51.40 51.45 51.50 51.55 51.60 51.65 51.70 51.75
## [337] 51.80 51.85 51.90 51.95 52.00 52.05 52.10 52.15 52.20 52.25 52.30 52.35
## [349] 52.40 52.45 52.50 52.55 52.60 52.65 52.70 52.75 52.80 52.85 52.90 52.95
## [361] 53.00 53.05 53.10 53.15 53.20 53.25 53.30 53.35 53.40 53.45 53.50 53.55
## [373] 53.60 53.65 53.70 53.75 53.80 53.85 53.90 53.95 54.00 54.05 54.10 54.15
## [385] 54.20 54.25 54.30 54.35 54.40 54.45 54.50 54.55 54.60 54.65 54.70 54.75
## [397] 54.80 54.85 54.90 54.95 55.00 55.05 55.10 55.15 55.20 55.25 55.30 55.35
## [409] 55.40 55.45 55.50 55.55 55.60 55.65 55.70 55.75 55.80 55.85 55.90 55.95
## [421] 56.00 56.05 56.10 56.15 56.20 56.25 56.30 56.35 56.40 56.45 56.50 56.55
## [433] 56.60 56.65 56.70 56.75 56.80 56.85 56.90 56.95 57.00 57.05 57.10 57.15
## [445] 57.20 57.25 57.30 57.35 57.40 57.45 57.50 57.55 57.60 57.65 57.70 57.75
## [457] 57.80 57.85 57.90 57.95 58.00 58.05 58.10 58.15 58.20 58.25 58.30 58.35
## [469] 58.40 58.45 58.50 58.55 58.60 58.65 58.70 58.75 58.80 58.85 58.90 58.95
## [481] 59.00 59.05 59.10 59.15 59.20 59.25 59.30 59.35 59.40 59.45 59.50 59.55
## [493] 59.60 59.65 59.70 59.75 59.80 59.85 59.90 59.95

It’s important to close the connection to the file after you’ve finished extracting data from it. This disconnects R from the file and prevents file corruption. Simply use nc_close() from ncdf4 to do this.

nc_close(nc)

These arrays are useful for doing simple calculations. For example, calculating the mean sunshine duration over the entire grid or converting the values from hours to seconds

mean(sdu_array, na.rm = TRUE)
## [1] 261.3753
sdu_seconds <- sdu_array * 60 * 60

For plotting data from a NetCDF file, we can use these arrays and also different structures to help with projections.

Plotting a file

We can plots the arrays we have just extracted using the image.plot function from the fields package.

# Make a quick plot using fields::image.plot
image.plot(sdu_array)

Now lets adapt the figure a little bit. Plot the 2D-image using image.plot from the fields package. The color scale label is adjusted by legend.line. The color scale named larry.colors from the fields package is used in reversed order using rev().

image.plot(x = lon_vals, y = lat_vals, z = sdu_array,
           xlab = "Longitude", ylab = "Latitude",
           main = "CM SAF Sunshine Duration",
           legend.lab = "Sunshine Duration (hours)",
           legend.line = -2.0,
           col = rev(larry.colors()))

# Adding a subtitle
mtext(side = 3, line = 0, "Monthly Sum August 2018")

# Add border lines using `map` from the `maps`
map("world", interior = FALSE, lwd = 1.5, col = "gray20", add = TRUE)

Another way to plot your data is to use the brick and plot functions from the raster package. brick imports the data into a “multi-layer raster object” and plot will automatically generate a nice plot. The advantage of this method is that the raster objects are spatial objects, so we can work with projections when plotting if needed.

sdu_ras <- brick(data_path, varname = "SDU")
plot(sdu_ras)

You can also add country borders by using the countriesHigh dataset from the rworldxtra package. lwd and col set the size and colour of the border lines. We use as() to convert the countriesHigh dataset from polygons to lines, since we only want the outlines.

data("countriesHigh")
plot(sdu_ras)
world_countries <- as(countriesHigh, "SpatialLines")
raster::plot(world_countries, add = TRUE, lwd = 1.5, col = "grey20")

You can also change the projection of a raster object using the projectRaster() function and setting the crs (coordinates reference system) parameter. We do this for both the sunshine duration data and the countries data.

new_proj <- "+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +datum=WGS84"
sdu_ras_new_proj <- projectRaster(sdu_ras, crs = new_proj)
plot(sdu_ras_new_proj)

world_countires_new_proj <- sp::spTransform(world_countries, CRSobj = crs(sdu_ras_new_proj))
raster::plot(world_countires_new_proj, add = TRUE, lwd = 1.5, col = "grey20")

Another data extraction method

Another way to extract data from a NetCDF file is to use the tidync package to import the file into a “tidy” data frame format. That means the data will be in a long table, with a column for the variable (sdu) and one column for each dimension (lat, lon, time). We use the functions tidync() and hyper_tibble() to read and then convert the data to a “tibble” (dataframe). The %>% is called a “pipe” and it’s just a convenient way of applying functions after another. It is the same as hyper_tibble(tidync(data_path)) but a bit more readable. You often also use this with the hyper_filter() function to extract a subset.

The head() function give us the first few rows of the data frame.

This format might be useful for manipulating and summarising the data, and if you want to use the data with the “tidyverse” packages such as ggplot2 for graphics, and dplyr for data manipulation.

sdu_df <- tidync(data_path) %>% hyper_tibble()
head(sdu_df)
## # A tibble: 6 x 4
##     SDU   lon   lat  time
##   <dbl> <dbl> <dbl> <dbl>
## 1  281. -20      35 12996
## 2  278. -20.0    35 12996
## 3  277. -19.9    35 12996
## 4  272. -19.9    35 12996
## 5  272. -19.8    35 12996
## 6  273. -19.8    35 12996