Introduction

This example will utilise a file from EUMETSAT CM SAF. We will work with a NetCDF file of monthly Direct Normalized Irradiance over Germany for the 30 year period 1983 to 2012 from the Climate Data Record SARAH 2.1. This data file should be provided along with this script and is called DNI_1983-01-01-2012-12-01.nc. Put this data file in the same folder as this script is in.

For further information about downloading data from EUMETSAT CM SAF you can check the webpage https://wui.cmsaf.eu/ where you can browse the product catalog. There is also an online short course about this data record offered by EUMETSAT which includes tutorials on accessing data https://training.eumetsat.int/course/view.php?id=378.

This tutorial will show you how to generate a time series plot for both an area and a point location as well as show how to generate basic statistics from your file.

Getting set up

We begin by importing the libraries we need for working with the data. If the packages are already installed, you can load them with the library() function. If this gives an error, you need to install them first using install.packages().

It’s good practice to load the packages used in your script in one place near the top of the file. This makes it easier for someone else you might share the file with to quickly see which packages they need to run your file.

# If you have not previously installed these packages, do this first by running the line below without the # sign. 
# You only need to do this once.
# install.packages(c("ncdf4", "ncdf4.helpers", "cmsaf", "lubridate", "ggplot2"))

# 'ncdf4', 'ncdf4.helpers' are for working with NetCDF files
# 'cmsaf' has the CM SAF R Toolbox and will also load `cmsafops`, which has helpful functions for working with CM SAF data.
# 'ggplot2' for graphics
# 'lubridate' for working with dates
library(ncdf4)
library(ncdf4.helpers)
library(cmsaf)
library(lubridate)
library(ggplot2)

It’s often useful to set the working directory in R. This is just the default directory that R will use for reading and writing files. Files directly in your working directory can be referred to just with their name (or relative path). You can still access files outside your working directory by using the full path.

If you keep your scripts and data files in the same folder, you do not need to provide the full paths. However, you may want to store things in different places and so it is good practice to be specific.

Change the path within setwd() below to the folder that this file is in. Remember in R you must always use / as the path separator.

You can confirm that you set this correctly by running getwd() which should return the directory you set.

# Change the path below to the folder this file is in.
setwd("C:/Users/Danny/Documents/EO Coding R scripts")
# Confirm the working directory was set correctly.
getwd()
## [1] "C:/Users/Danny/Documents/EO Coding R scripts"

Now confirm that the data file DNI_1983-01-01-2012-12-01.nc exists. If you placed the file within the same folder as this script file, you will not need to change the data_path variable below because it is directly in the working directory. If it is somewhere else, you can change data_path below with the full file path. If the file exists then file.exists will return TRUE. If it doesn’t, you need to check the file is in the correct place or your path may be incorrect.

data_path <- "DNI_1983-01-01-2012-12-01.nc"
file.exists(data_path)
## [1] TRUE

If this returns TRUE then we’re ready to start working with the data!

Reading a NetCDF file

Connect to the NetCDF file using the nc_open() function from the ncdf4 package. We save the connection to the file as a variable called nc.

nc <- nc_open(data_path)

First, get the list of variable names and dimension names from the file. The variables are in nc$var and dimensions in nc$dim. We use the names() function to get the names from these lists.

names(nc$var)
## [1] "DNI"
names(nc$dim)
## [1] "lon"  "lat"  "time"

We can also get this information from the ncinfo() function from cmsafops.

ncinfo(nc = nc)
## The file: DNI_1983-01-01-2012-12-01.nc contains: 
## 
## Variable:
## DNI 
## 
## With following dimensions:
## lon with length 181 (range 6 to 15)
## lat with length 161 (range 47 to 55)
## time with length 360 (range 1983-01-01 to 2012-12-01)

Notice that there are 3 dimensions in this file, and 360 time points, one for each month between 1983 and 2015.

Reading the metadata from a NetCDF file

NetCDF files are self-describing, and contain useful metadata that gives you important information about your data.

The ncatt_get() function gets the attributes, or metadata, from the file. See the help for this function by running ?ncatt_get. It shows us that we need to specify two arguments, nc (the nc object) and varid (the variable we want the metadata for).

ncatt_get(nc, "DNI")
## $units
## [1] "W m-2"
## 
## $`_FillValue`
## [1] -999
## 
## $standard_name
## [1] "undefined"
## 
## $long_name
## [1] "Surface Normalized Direct Shortwave Flux"
## 
## $cmsaf_info
## [1] "cmsaf::box_mergetime for variable DNI"

The gives us useful metadata for “DNI” such as the units of the values, the missing value code, the standard name (if defined) and a more descriptive long name. We also see that this file was generated using box_mergetime.

Notice this line from the help file:

As a special case, if varid==0, then a global (file) attribute will be read rather than a particular variable’s attribute.

Let’s try that to get the global metadata too.

ncatt_get(nc, 0)
## $Info
## [1] "Created with the CM SAF R Toolbox."
## 
## $institution
## [1] "EUMETSAT/CMSAF"
## 
## $title
## [1] "Surface Solar Radiation Climate Data Record using Heliosat, Edition 2 (SARAH-2); CM SAF"
## 
## $summary
## [1] "This file contains data from the CM SAF Surface Solar Radiation Climate Data Record using Heliosat, Edition 2 (SARAH-2)."
## 
## $id
## [1] "DOI:10.5676/EUM_SAF_CM/SARAH/V002"
## 
## $creator_name
## [1] "DE/DWD"
## 
## $creator_email
## [1] "contact.cmsaf@dwd.de"
## 
## $creator_url
## [1] "http://www.cmsaf.eu/"
## 
## $creator_type
## [1] "institution"
## 
## $publisher_name
## [1] "EUMETSAT/CMSAF"
## 
## $publisher_email
## [1] "contact.cmsaf@dwd.de"
## 
## $publisher_url
## [1] "http://www.cmsaf.eu/"
## 
## $publisher_type
## [1] "institution"
## 
## $references
## [1] "http://dx.doi.org/10.5676/EUM_SAF_CM/SARAH/V002"
## 
## $keywords_vocabulary
## [1] "GCMD Science Keywords, Version 8.1"
## 
## $keywords
## [1] "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC RADIATION > INCOMING SOLAR RADIATION"
## 
## $project
## [1] "Satellite Application Facility on Climate Monitoring (CM SAF)"
## 
## $standard_name_vocabulary
## [1] "Standard Name Table (v28, 07 January 2015)"
## 
## $geospatial_lat_units
## [1] "degrees_north"
## 
## $geospatial_lon_units
## [1] "degrees_east"
## 
## $geospatial_lat_resolution
## [1] "0.05 degree"
## 
## $geospatial_lon_resolution
## [1] "0.05 degree"
## 
## $platform_vocabulary
## [1] "GCMD Platforms, Version 8.1"
## 
## $platform
## [1] "Earth Observation Satellites > METEOSAT "
## 
## $instrument_vocabulary
## [1] "GCMD Instruments, Version 8.1"
## 
## $instrument
## [1] "MVIRI > Meteosat Visible Infra-Red Imager"
## 
## $date_created
## [1] "2017-01-24T15:08:42Z"
## 
## $product_version
## [1] "2.0"

Here we get lots of information about the dataset, its source and publisher, geographic and instrument details.

Basic Statistics

Let’s do some calculations.

First, extract the “DNI” variable as an array using ncvar_get() from ncdf4. The dim functions tell us the dimensions of the array.

dni_array <- ncvar_get(nc, "DNI")
dim(dni_array)
## [1] 181 161 360

The summary() function quickly gives us six useful summaries of all the data.

summary(dni_array)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     8.0    61.0   112.0   117.3   165.0   365.0

This gives us all the summary numbers of a boxplot - and the mean. Notice that it doesn’t mention missing values (NAs) which means there are no missing values in the data.

To look at the distribution of values, we can plot a histogram.

hist(dni_array, main = "Monthly DNI over Germany: 1983 - 2012", 
     xlab = "DNI (W m-2)")

The time variables

Let’s have a look at the time variable using ncvar_get.

nc_time_var <- ncvar_get(nc, "time")
nc_time_var
##   [1]     0    31    59    90   120   151   181   212   243   273   304   334
##  [13]   365   396   425   456   486   517   547   578   609   639   670   700
##  [25]   731   762   790   821   851   882   912   943   974  1004  1035  1065
##  [37]  1096  1127  1155  1186  1216  1247  1277  1308  1339  1369  1400  1430
##  [49]  1461  1492  1520  1551  1581  1612  1642  1673  1704  1734  1765  1795
##  [61]  1826  1857  1886  1917  1947  1978  2008  2039  2070  2100  2131  2161
##  [73]  2192  2223  2251  2282  2312  2343  2373  2404  2435  2465  2496  2526
##  [85]  2557  2588  2616  2647  2677  2708  2738  2769  2800  2830  2861  2891
##  [97]  2922  2953  2981  3012  3042  3073  3103  3134  3165  3195  3226  3256
## [109]  3287  3318  3347  3378  3408  3439  3469  3500  3531  3561  3592  3622
## [121]  3653  3684  3712  3743  3773  3804  3834  3865  3896  3926  3957  3987
## [133]  4018  4049  4077  4108  4138  4169  4199  4230  4261  4291  4322  4352
## [145]  4383  4414  4442  4473  4503  4534  4564  4595  4626  4656  4687  4717
## [157]  4748  4779  4808  4839  4869  4900  4930  4961  4992  5022  5053  5083
## [169]  5114  5145  5173  5204  5234  5265  5295  5326  5357  5387  5418  5448
## [181]  5479  5510  5538  5569  5599  5630  5660  5691  5722  5752  5783  5813
## [193]  5844  5875  5903  5934  5964  5995  6025  6056  6087  6117  6148  6178
## [205]  6209  6240  6269  6300  6330  6361  6391  6422  6453  6483  6514  6544
## [217]  6575  6606  6634  6665  6695  6726  6756  6787  6818  6848  6879  6909
## [229]  6940  6971  6999  7030  7060  7091  7121  7152  7183  7213  7244  7274
## [241]  7305  7336  7364  7395  7425  7456  7486  7517  7548  7578  7609  7639
## [253]  7670  7701  7730  7761  7791  7822  7852  7883  7914  7944  7975  8005
## [265]  8036  8067  8095  8126  8156  8187  8217  8248  8279  8309  8340  8370
## [277]  8401  8432  8460  8491  8521  8552  8582  8613  8644  8674  8705  8735
## [289]  8766  8797  8825  8856  8886  8917  8947  8978  9009  9039  9070  9100
## [301]  9131  9162  9191  9222  9252  9283  9313  9344  9375  9405  9436  9466
## [313]  9497  9528  9556  9587  9617  9648  9678  9709  9740  9770  9801  9831
## [325]  9862  9893  9921  9952  9982 10013 10043 10074 10105 10135 10166 10196
## [337] 10227 10258 10286 10317 10347 10378 10408 10439 10470 10500 10531 10561
## [349] 10592 10623 10652 10683 10713 10744 10774 10805 10836 10866 10897 10927

These are numeric values. NetCDF files usually stores dates and times as numbers, but for us it would be more useful to have these as dates. A simple way to do this is to use the nc.get.time.series() function from ncdf4.helpers.

date_time_var <- ncdf4.helpers::nc.get.time.series(nc, "DNI", "time")
# print the top and bottom values
head(date_time_var)
## [1] "1983-01-01" "1983-02-01" "1983-03-01" "1983-04-01" "1983-05-01"
## [6] "1983-06-01"
tail(date_time_var)
## [1] "2012-07-01" "2012-08-01" "2012-09-01" "2012-10-01" "2012-11-01"
## [6] "2012-12-01"

This function relies on the file having sensible metadata. If this doesn’t work, we can also do it manually ourselves.

First, get the metadata to find out how to interpret the numeric time values.

ncatt_get(nc, "time")
## $units
## [1] "days since 1983-01-01 00:00:00"
## 
## $long_name
## [1] "time"
## 
## $standard_name
## [1] "time"
## 
## $calendar
## [1] "proleptic_gregorian"

Note that the units are “days since 1983-01-01 00:00:00”.

We could use get_time from cmsafops to convert to a date-time object using this information.

units <- ncatt_get(nc, "time")$units
date_time_var <- cmsafops::get_time(units, nc_time_var)
head(date_time_var)
## [1] "1983-01-01 UTC" "1983-02-01 UTC" "1983-03-01 UTC" "1983-04-01 UTC"
## [5] "1983-05-01 UTC" "1983-06-01 UTC"
tail(date_time_var)
## [1] "2012-07-01 UTC" "2012-08-01 UTC" "2012-09-01 UTC" "2012-10-01 UTC"
## [5] "2012-11-01 UTC" "2012-12-01 UTC"

We could also convert to a date using as.Date() and specifying the origin. However, this only gives the date part, and ignores the time component. So this wouldn’t be appropriate for hourly data, for example, but is ok for our monthly data here.

date_time_var <- as.Date(nc_time_var, origin = as.Date("1983/01/01"))
head(date_time_var)
## [1] "1983-01-01" "1983-02-01" "1983-03-01" "1983-04-01" "1983-05-01"
## [6] "1983-06-01"
tail(date_time_var)
## [1] "2012-07-01" "2012-08-01" "2012-09-01" "2012-10-01" "2012-11-01"
## [6] "2012-12-01"

You can check that these give the same result by printing out the result.

Summarising over dimensions

Let’s begin by calculating the mean DNI over Germany for each month. This involves summarising our data by getting the mean of all the grid points each month. This is sometimes called a “field mean”.

We’ll show two methods of doing this.

Method 1: The “R” way

If you have experience with other programming languages you might have thought that this kind of calculation sounds like it could be done by using a “for loop”, because we are looping over each time step and doing a calculation. And you are right! And that’s method 2. But in R, many functions are set up to already work with arrays and vectors, so you can often avoid using a for loop. There’s no problem with for loops, and if you are comfortable with them you should use them, but here we show another method using the very powerful apply() function.

field_means <- apply(dni_array, 3, mean)
field_means
##   [1]  41.79290  77.87777  80.76130 106.87969 107.94609 189.29937 247.63121
##   [8] 200.58533 132.92924 100.09540  75.83250  50.43667  40.65156  84.48046
##  [15] 139.02460 175.10250 112.43087 131.74109 150.36461 170.62342  81.58165
##  [22]  91.12141  74.78721  46.19574  46.18153  75.13301  67.86088 124.64867
##  [29] 161.84125 127.24766 186.05686 168.61902 148.29783 117.07103  41.98868
##  [36]  36.94252  37.25730 101.02268  76.00069 100.72132 170.80213 210.67407
##  [43] 170.03483 155.90337 137.54700 111.49741  77.27545  43.36437  42.66556
##  [50]  50.98881 100.21441 162.08051 120.30658 108.98418 162.44899 122.30284
##  [57] 132.13109 105.21825  36.33664  38.17628  42.74146  55.37459  57.18153
##  [64] 172.94005 190.47630 128.18596 162.50201 162.42246 103.26104  80.49696
##  [71]  62.37030  27.18095  61.98857  73.22494 114.38701 103.08140 257.23486
##  [78] 191.77118 162.17443 168.68824 144.32456 104.09207  93.89407  52.07062
##  [85]  47.46234  97.50558 118.35822 139.45966 231.46182 129.84647 201.67873
##  [92] 198.71906  97.03703 125.94046  40.25703  35.20596  66.29340  83.85127
##  [99] 100.65101 160.12357 141.09433 127.71367 216.16897 193.58914 150.79057
## [106] 123.52400  48.43008  53.93092  56.20603  68.63179  97.36135 130.57853
## [113] 252.81730 201.76325 185.19701 176.14001 154.86648  77.20209  50.84417
## [120]  54.69449  62.52864  64.38612 123.83213 170.97485 194.21070 166.64747
## [127] 134.98363 170.46477 103.00896  74.21996  48.12172  27.91016  41.12127
## [134]  78.61137  79.66902 128.26327 155.40417 185.73326 259.70149 174.74798
## [141] 101.26169 117.16942  51.70104  42.21914  36.25679  60.74905 105.38753
## [148] 116.96949 178.57692 138.27267 233.71902 203.70142  98.01754 111.88840
## [155]  51.69586  36.33307  55.38815  56.11455 107.81929 179.60303 113.36392
## [162] 181.44755 163.83171 160.83748 105.92828  82.72111  40.25480  42.24773
## [169]  45.44027  77.38321 107.30136 156.08775 184.82835 170.51141 164.53495
## [176] 212.26159 182.73041 107.16279  55.42775  31.95436  55.72643  81.25339
## [183] 100.32768  99.56817 199.78340 155.79843 126.72310 176.76216  94.28441
## [190]  54.02622  43.45946  39.22961  50.30582  47.49487 100.13500 140.62929
## [197] 187.53159 174.41371 198.95577 151.03445 167.98984  99.04760  50.51549
## [204]  30.49161  44.14564  69.39981  71.15147 154.30033 207.90649 217.01524
## [211] 107.86006 189.02721 127.53063  67.97066  61.19485  48.29326  51.71233
## [218]  72.56103  62.17618 107.62530 219.74514 160.37044 203.09845 181.26149
## [225]  72.22082 103.90186  53.50057  34.38681  41.40283  76.69390 124.57881
## [232] 150.23204 154.13249 193.50613 149.46097 167.56906 143.82945  81.08840
## [239]  40.59665  39.28613  33.88919 107.83014 154.28266 192.78165 175.80159
## [246] 242.38784 185.52112 234.07344 178.94516 100.14979  67.54497  50.50383
## [253]  32.23235  64.16516 109.03922 165.61803 156.97008 162.26687 161.42720
## [260] 174.61779 161.93130  95.33578  46.71188  43.28070  45.85797  59.57236
## [267]  88.20655 167.51453 168.10559 201.89060 156.73848 147.17189 170.14608
## [274] 152.36042  56.37260  32.92269  71.59929  54.80560  77.34635 118.46127
## [281] 168.98020 217.01808 270.21166 110.85876 184.80076 103.29941  63.24769
## [288]  54.05686  41.83628  68.83501 144.07052 263.42493 183.12100 163.42267
## [295] 157.37192 160.09210 121.92715 105.46913  45.42147  46.30905  49.80660
## [302] 116.81538  94.11365 128.50184 240.91394 197.04036 176.52394 145.45163
## [309] 117.00563  93.01325  46.73556  48.13857  55.95666  46.59850  80.54878
## [316] 228.01524 188.54343 166.02152 169.01112 205.27943 153.62373  78.37905
## [323]  53.88446  37.37655  40.36584  48.47874 101.45925 193.75862  99.03040
## [330] 210.02536 225.16938 120.55581 123.41900 107.96610  38.96109  33.15003
## [337]  40.85749  79.86682 165.56213 219.41080 228.40280 169.48842 128.05545
## [344] 160.85800 160.93199 125.75135  92.32641  33.82399  51.69582  78.42212
## [351] 154.70430 132.31179 197.75159 145.42246 158.94369 194.47215 143.08147
## [358]  99.55286  51.65787  34.42572

apply() essentially does the looping for you. Let’s breakdown the arguments:

Have a look at the help with ?apply to see the possible argumens.

dni_array is the array we are calculating on.

3 is the “MARGIN” argument. This is the dimension we are looping over. Since time is the third dimension of our array we set MARGIN to 3.

mean is the “FUN” argument. This is the function we are apply()ing to each part of the data.

This is a very concise method, but you need to understand the apply() function well to make sure you get the result you want.

Method 2: The “for loop” way

If you’re more comfortable with the idea of a for loop, then here is an equivalent method.

# Initialise a numeric vector with the same length as the time variable
field_means <- vector("numeric", length(nc_time_var))
# Loop over each time point
for (i in seq_along(nc_time_var)) {
  # Calculate the mean over all lon and lat points for each time point, i
  field_means[i] <- mean(dni_array[ , , i], na.rm = TRUE)
}

Let’s break down the different parts of the code above to make sure we understand what’s going on:

First we initialise a numeric vector with the length we want, which is the length of the time variable. This just creates a vector of length 360 with default values of 0. It’s good practice to create the vector with the right length up front if we know it.

vector("numeric", length(nc_time_var))
##   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##  [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [260] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

for is for doing a loop i.e. repeating the same code a fixed number of times with a different counter variable each time.

seq_along(nc_time_var) just gives us a list from 1 to 360 to use as the counter in the for loop.

seq_along(nc_time_var)
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108
## [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
## [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
## [145] 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
## [163] 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
## [181] 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198
## [199] 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216
## [217] 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234
## [235] 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252
## [253] 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270
## [271] 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288
## [289] 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306
## [307] 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324
## [325] 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342
## [343] 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360

Finally, field_means[i] refers to the i’th value of our field_means vector.

And dni_array[ , , i] extracts all the values for time point i. We can specify three entries inside the [ ] because dni_array is a 3-dimensional array. The first two values are blank, which means take all the values for that dimension. So we get all the values over the lon and lat dimensions (the first two dimensions) and just the values for the i’th time step. So we have taken a 2-dimensional slice from the array, which we then calculate the mean of.

# e.g. this extracts a 2-dimensional slice
first_time_point <- dni_array[ , , 1]
dim(first_time_point)
## [1] 181 161

Whichever method you used, now let’s plot the results. Remember we now have a one-dimensional array (vector) of the means for each month, and we have the time points. So we can plot these as a line plot.

plot(date_time_var, field_means, type = "l", 
     main = "Mean Monthly DNI over Germany 1983 - 2012",
     xlab = "Date",
     ylab = "DNI (W m-2)")
abline(h = mean(field_means), col = "blue")

If we want to do the plot just for the first year, then we simply extract just the first 12 values from our vectors. Note that we use type = "b" to get both points and lines.

plot(date_time_var[1:12], field_means[1:12], type = "b", 
     main = "Mean Monthly DNI over Germany in 1983",
     xlab = "Month",
     ylab = "DNI (W m-2)")

Extracting a single point

We could do the same kind of plot for a single point within Germany.

We know that the extraction will be in the form dni_array[ , , ].

To extract a single point for the whole time series, we need a single number for the first and second (lon and lat) index and a blank value for the time index (so that we get all time points). So something of the form dni_array[x1, y1, ].

Let’s extract the DNI at Frankfurt, which has longitude 8.68 and latitude 50.11.

You might be tempted to try dni_array[8.68, 50.11, ], however this won’t give us what we want. This will instead extract the data at the 8th longitude value and 50th latitude value. Remember, we have to provide an index within [ ], not a value.

So first, we need to get the lon and lat arrays, and then work out which position in these array our location is.

We define the point we want to extract, then get the lon and lat arrays from the nc object. Before going further, we check that the points we want to extract are within the bounds of our grid. If these both return TRUE our point is within the bounds.

point_lon <- 8.68
point_lat <- 50.11
lon_vals <- ncvar_get(nc, "lon")
lat_vals <- ncvar_get(nc, "lat")

print("Longitude range check:")
## [1] "Longitude range check:"
point_lon >= min(lon_vals) && point_lon <= max(lon_vals)
## [1] TRUE
print("Latitude range check:")
## [1] "Latitude range check:"
point_lat >= min(lat_vals) && point_lat <= max(lat_vals)
## [1] TRUE

Now we find the index of the lon and lat points which are closest to our point.

lon_index <- which.min(abs(lon_vals - point_lon))
lat_index <- which.min(abs(lat_vals - point_lat))

lon_index
## [1] 55
lat_index
## [1] 63
print("Closest grid point:")
## [1] "Closest grid point:"
lon_vals[lon_index]
## [1] 8.7
lat_vals[lat_index]
## [1] 50.1

First we calculated the absolute value of the difference between all the lon/lat points and our point: abs(lon_vals - point_lon). The minimum of these gives us the closest point to our point. which.min gives us the index of that minimum, since we need the index for the extraction. You can print these values out by using the calculated index values to check it gives a point close to our point.

Now we have the index values we need to extract from dni_array.

point_dni <- dni_array[lon_index, lat_index, ]

And we can plot it in a similar way.

plot(date_time_var, point_dni, type = "l", main = "DNI at Frankfurt 1983 - 2012")

Extracting an area

Using similar ideas as above, we could also extract a smaller region from our data, which we could then do further calculations on.

range_lon <- c(7, 8)
range_lat <- c(50, 50.5)

print("Longitude range check:")
## [1] "Longitude range check:"
range_lon[1] <= range_lon[2] && range_lon[1] >= min(lon_vals) && range_lon[2] <= max(lon_vals)
## [1] TRUE
print("Latitude range check:")
## [1] "Latitude range check:"
range_lat[1] <= range_lat[2] && range_lat[1] >= min(lat_vals) && range_lat[2] <= max(lat_vals)
## [1] TRUE
lon_min_index = which.min(abs(lon_vals - range_lon[1]))
lon_max_index = which.min(abs(lon_vals - range_lon[2]))

lat_min_index = which.min(abs(lat_vals - range_lat[1]))
lat_max_index = which.min(abs(lat_vals - range_lat[2]))

We created the extraction ranges as vectors with 2 values, a min and a max. The c() function is for creating vectors: range_lon <- c(7, 8)

Again, we check the values are within the range of the data.

Then we calculate the four index values we need using which.min() again. This now gives us a range of index values we can use for extraction.

dni_region <- dni_array[lon_min_index:lon_max_index, lat_min_index:lat_max_index, ]
dim(dni_region)
## [1]  21  11 360

We use the format a:b to specify a range of indices to extract. Then we check the dimensions of our extraction and we see we now have 21 lon points, 11 lat points and all 360 time points.

Now try to calculate the mean DNI over this region using either of the methods shown before.

More time series analysis

There’s lots more time series analysis and graphs that can be produced from this data. If you’re doing more time series work, either for a single point or from summarising over a region (e.g. field mean) then a natural structure to be work with this data is a data frame i.e. tabular data with a number of columns of the same length, like you might use in a spreadsheet and for in-situ data.

Data frames are also the data format that the “tidyverse” set of packages use for everything from data manipulation, graphics and modelling. These are a very powerful, coherent set of R packages which is well worth learning if you’re planning to continue learning R. The “tidyverse” includes the ggplot2 R package, which is becoming the standard graphics package in R and has an incredibly powerful and flexible plotting system.

The ggplot2 and “tidyverse” system of packages takes a bit of learning, and is beyond the scope of this short course, but here is an example below showing what’s possible by using data frames and ggplot2.

# Create a data frame with 2 columns for date and DNI
field_df <- data.frame(date = date_time_var, dni = field_means)
# Create month and year columns from the date
field_df$month <- month(field_df$date, label = TRUE)
field_df$year <- year(field_df$date)
# Boxplots of DNI for each month
ggplot(field_df, aes(x = month, y = dni)) +
  geom_boxplot()

# Line plot of DNI with trend line including standard error
ggplot(field_df, aes(x = date, y = dni)) +
  geom_line() +
  geom_smooth(method = "lm")