This example will utilise a file from EUMETSAT CM SAF. We will work with a NetCDF file of monthly Direct Normalized Irradiance over Germany for the 30 year period 1983 to 2012 from the Climate Data Record SARAH 2.1. This data file should be provided along with this script and is called DNI_1983-01-01-2012-12-01.nc
. Put this data file in the same folder as this script is in.
For further information about downloading data from EUMETSAT CM SAF you can check the webpage https://wui.cmsaf.eu/ where you can browse the product catalog. There is also an online short course about this data record offered by EUMETSAT which includes tutorials on accessing data https://training.eumetsat.int/course/view.php?id=378.
This tutorial will show you how to generate a time series plot for both an area and a point location as well as show how to generate basic statistics from your file.
We begin by importing the libraries we need for working with the data. If the packages are already installed, you can load them with the library()
function. If this gives an error, you need to install them first using install.packages()
.
It’s good practice to load the packages used in your script in one place near the top of the file. This makes it easier for someone else you might share the file with to quickly see which packages they need to run your file.
# If you have not previously installed these packages, do this first by running the line below without the # sign.
# You only need to do this once.
# install.packages(c("ncdf4", "ncdf4.helpers", "cmsaf", "lubridate", "ggplot2"))
# 'ncdf4', 'ncdf4.helpers' are for working with NetCDF files
# 'cmsaf' has the CM SAF R Toolbox and will also load `cmsafops`, which has helpful functions for working with CM SAF data.
# 'ggplot2' for graphics
# 'lubridate' for working with dates
library(ncdf4)
library(ncdf4.helpers)
library(cmsaf)
library(lubridate)
library(ggplot2)
It’s often useful to set the working directory in R. This is just the default directory that R will use for reading and writing files. Files directly in your working directory can be referred to just with their name (or relative path). You can still access files outside your working directory by using the full path.
If you keep your scripts and data files in the same folder, you do not need to provide the full paths. However, you may want to store things in different places and so it is good practice to be specific.
Change the path within setwd()
below to the folder that this file is in. Remember in R you must always use /
as the path separator.
You can confirm that you set this correctly by running getwd()
which should return the directory you set.
# Change the path below to the folder this file is in.
setwd("C:/Users/Danny/Documents/EO Coding R scripts")
# Confirm the working directory was set correctly.
getwd()
## [1] "C:/Users/Danny/Documents/EO Coding R scripts"
Now confirm that the data file DNI_1983-01-01-2012-12-01.nc
exists. If you placed the file within the same folder as this script file, you will not need to change the data_path
variable below because it is directly in the working directory. If it is somewhere else, you can change data_path
below with the full file path. If the file exists then file.exists
will return TRUE
. If it doesn’t, you need to check the file is in the correct place or your path may be incorrect.
data_path <- "DNI_1983-01-01-2012-12-01.nc"
file.exists(data_path)
## [1] TRUE
If this returns TRUE
then we’re ready to start working with the data!
Connect to the NetCDF file using the nc_open()
function from the ncdf4
package. We save the connection to the file as a variable called nc
.
nc <- nc_open(data_path)
First, get the list of variable names and dimension names from the file. The variables are in nc$var
and dimensions in nc$dim
. We use the names()
function to get the names from these lists.
names(nc$var)
## [1] "DNI"
names(nc$dim)
## [1] "lon" "lat" "time"
We can also get this information from the ncinfo()
function from cmsafops
.
ncinfo(nc = nc)
## The file: DNI_1983-01-01-2012-12-01.nc contains:
##
## Variable:
## DNI
##
## With following dimensions:
## lon with length 181 (range 6 to 15)
## lat with length 161 (range 47 to 55)
## time with length 360 (range 1983-01-01 to 2012-12-01)
Notice that there are 3 dimensions in this file, and 360 time points, one for each month between 1983 and 2015.
NetCDF files are self-describing, and contain useful metadata that gives you important information about your data.
The ncatt_get()
function gets the attributes, or metadata, from the file. See the help for this function by running ?ncatt_get
. It shows us that we need to specify two arguments, nc
(the nc
object) and varid
(the variable we want the metadata for).
ncatt_get(nc, "DNI")
## $units
## [1] "W m-2"
##
## $`_FillValue`
## [1] -999
##
## $standard_name
## [1] "undefined"
##
## $long_name
## [1] "Surface Normalized Direct Shortwave Flux"
##
## $cmsaf_info
## [1] "cmsaf::box_mergetime for variable DNI"
The gives us useful metadata for “DNI” such as the units of the values, the missing value code, the standard name (if defined) and a more descriptive long name. We also see that this file was generated using box_mergetime
.
Notice this line from the help file:
As a special case, if varid==0, then a global (file) attribute will be read rather than a particular variable’s attribute.
Let’s try that to get the global metadata too.
ncatt_get(nc, 0)
## $Info
## [1] "Created with the CM SAF R Toolbox."
##
## $institution
## [1] "EUMETSAT/CMSAF"
##
## $title
## [1] "Surface Solar Radiation Climate Data Record using Heliosat, Edition 2 (SARAH-2); CM SAF"
##
## $summary
## [1] "This file contains data from the CM SAF Surface Solar Radiation Climate Data Record using Heliosat, Edition 2 (SARAH-2)."
##
## $id
## [1] "DOI:10.5676/EUM_SAF_CM/SARAH/V002"
##
## $creator_name
## [1] "DE/DWD"
##
## $creator_email
## [1] "contact.cmsaf@dwd.de"
##
## $creator_url
## [1] "http://www.cmsaf.eu/"
##
## $creator_type
## [1] "institution"
##
## $publisher_name
## [1] "EUMETSAT/CMSAF"
##
## $publisher_email
## [1] "contact.cmsaf@dwd.de"
##
## $publisher_url
## [1] "http://www.cmsaf.eu/"
##
## $publisher_type
## [1] "institution"
##
## $references
## [1] "http://dx.doi.org/10.5676/EUM_SAF_CM/SARAH/V002"
##
## $keywords_vocabulary
## [1] "GCMD Science Keywords, Version 8.1"
##
## $keywords
## [1] "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC RADIATION > INCOMING SOLAR RADIATION"
##
## $project
## [1] "Satellite Application Facility on Climate Monitoring (CM SAF)"
##
## $standard_name_vocabulary
## [1] "Standard Name Table (v28, 07 January 2015)"
##
## $geospatial_lat_units
## [1] "degrees_north"
##
## $geospatial_lon_units
## [1] "degrees_east"
##
## $geospatial_lat_resolution
## [1] "0.05 degree"
##
## $geospatial_lon_resolution
## [1] "0.05 degree"
##
## $platform_vocabulary
## [1] "GCMD Platforms, Version 8.1"
##
## $platform
## [1] "Earth Observation Satellites > METEOSAT "
##
## $instrument_vocabulary
## [1] "GCMD Instruments, Version 8.1"
##
## $instrument
## [1] "MVIRI > Meteosat Visible Infra-Red Imager"
##
## $date_created
## [1] "2017-01-24T15:08:42Z"
##
## $product_version
## [1] "2.0"
Here we get lots of information about the dataset, its source and publisher, geographic and instrument details.
Let’s do some calculations.
First, extract the “DNI” variable as an array using ncvar_get()
from ncdf4
. The dim
functions tell us the dimensions of the array.
dni_array <- ncvar_get(nc, "DNI")
dim(dni_array)
## [1] 181 161 360
The summary()
function quickly gives us six useful summaries of all the data.
summary(dni_array)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.0 61.0 112.0 117.3 165.0 365.0
This gives us all the summary numbers of a boxplot - and the mean. Notice that it doesn’t mention missing values (NAs) which means there are no missing values in the data.
To look at the distribution of values, we can plot a histogram.
hist(dni_array, main = "Monthly DNI over Germany: 1983 - 2012",
xlab = "DNI (W m-2)")
Let’s have a look at the time variable using ncvar_get
.
nc_time_var <- ncvar_get(nc, "time")
nc_time_var
## [1] 0 31 59 90 120 151 181 212 243 273 304 334
## [13] 365 396 425 456 486 517 547 578 609 639 670 700
## [25] 731 762 790 821 851 882 912 943 974 1004 1035 1065
## [37] 1096 1127 1155 1186 1216 1247 1277 1308 1339 1369 1400 1430
## [49] 1461 1492 1520 1551 1581 1612 1642 1673 1704 1734 1765 1795
## [61] 1826 1857 1886 1917 1947 1978 2008 2039 2070 2100 2131 2161
## [73] 2192 2223 2251 2282 2312 2343 2373 2404 2435 2465 2496 2526
## [85] 2557 2588 2616 2647 2677 2708 2738 2769 2800 2830 2861 2891
## [97] 2922 2953 2981 3012 3042 3073 3103 3134 3165 3195 3226 3256
## [109] 3287 3318 3347 3378 3408 3439 3469 3500 3531 3561 3592 3622
## [121] 3653 3684 3712 3743 3773 3804 3834 3865 3896 3926 3957 3987
## [133] 4018 4049 4077 4108 4138 4169 4199 4230 4261 4291 4322 4352
## [145] 4383 4414 4442 4473 4503 4534 4564 4595 4626 4656 4687 4717
## [157] 4748 4779 4808 4839 4869 4900 4930 4961 4992 5022 5053 5083
## [169] 5114 5145 5173 5204 5234 5265 5295 5326 5357 5387 5418 5448
## [181] 5479 5510 5538 5569 5599 5630 5660 5691 5722 5752 5783 5813
## [193] 5844 5875 5903 5934 5964 5995 6025 6056 6087 6117 6148 6178
## [205] 6209 6240 6269 6300 6330 6361 6391 6422 6453 6483 6514 6544
## [217] 6575 6606 6634 6665 6695 6726 6756 6787 6818 6848 6879 6909
## [229] 6940 6971 6999 7030 7060 7091 7121 7152 7183 7213 7244 7274
## [241] 7305 7336 7364 7395 7425 7456 7486 7517 7548 7578 7609 7639
## [253] 7670 7701 7730 7761 7791 7822 7852 7883 7914 7944 7975 8005
## [265] 8036 8067 8095 8126 8156 8187 8217 8248 8279 8309 8340 8370
## [277] 8401 8432 8460 8491 8521 8552 8582 8613 8644 8674 8705 8735
## [289] 8766 8797 8825 8856 8886 8917 8947 8978 9009 9039 9070 9100
## [301] 9131 9162 9191 9222 9252 9283 9313 9344 9375 9405 9436 9466
## [313] 9497 9528 9556 9587 9617 9648 9678 9709 9740 9770 9801 9831
## [325] 9862 9893 9921 9952 9982 10013 10043 10074 10105 10135 10166 10196
## [337] 10227 10258 10286 10317 10347 10378 10408 10439 10470 10500 10531 10561
## [349] 10592 10623 10652 10683 10713 10744 10774 10805 10836 10866 10897 10927
These are numeric values. NetCDF files usually stores dates and times as numbers, but for us it would be more useful to have these as dates. A simple way to do this is to use the nc.get.time.series()
function from ncdf4.helpers
.
date_time_var <- ncdf4.helpers::nc.get.time.series(nc, "DNI", "time")
# print the top and bottom values
head(date_time_var)
## [1] "1983-01-01" "1983-02-01" "1983-03-01" "1983-04-01" "1983-05-01"
## [6] "1983-06-01"
tail(date_time_var)
## [1] "2012-07-01" "2012-08-01" "2012-09-01" "2012-10-01" "2012-11-01"
## [6] "2012-12-01"
This function relies on the file having sensible metadata. If this doesn’t work, we can also do it manually ourselves.
First, get the metadata to find out how to interpret the numeric time values.
ncatt_get(nc, "time")
## $units
## [1] "days since 1983-01-01 00:00:00"
##
## $long_name
## [1] "time"
##
## $standard_name
## [1] "time"
##
## $calendar
## [1] "proleptic_gregorian"
Note that the units are “days since 1983-01-01 00:00:00”.
We could use get_time
from cmsafops
to convert to a date-time object using this information.
units <- ncatt_get(nc, "time")$units
date_time_var <- cmsafops::get_time(units, nc_time_var)
head(date_time_var)
## [1] "1983-01-01 UTC" "1983-02-01 UTC" "1983-03-01 UTC" "1983-04-01 UTC"
## [5] "1983-05-01 UTC" "1983-06-01 UTC"
tail(date_time_var)
## [1] "2012-07-01 UTC" "2012-08-01 UTC" "2012-09-01 UTC" "2012-10-01 UTC"
## [5] "2012-11-01 UTC" "2012-12-01 UTC"
We could also convert to a date using as.Date()
and specifying the origin. However, this only gives the date part, and ignores the time component. So this wouldn’t be appropriate for hourly data, for example, but is ok for our monthly data here.
date_time_var <- as.Date(nc_time_var, origin = as.Date("1983/01/01"))
head(date_time_var)
## [1] "1983-01-01" "1983-02-01" "1983-03-01" "1983-04-01" "1983-05-01"
## [6] "1983-06-01"
tail(date_time_var)
## [1] "2012-07-01" "2012-08-01" "2012-09-01" "2012-10-01" "2012-11-01"
## [6] "2012-12-01"
You can check that these give the same result by printing out the result.
Let’s begin by calculating the mean DNI over Germany for each month. This involves summarising our data by getting the mean of all the grid points each month. This is sometimes called a “field mean”.
We’ll show two methods of doing this.
If you have experience with other programming languages you might have thought that this kind of calculation sounds like it could be done by using a “for loop”, because we are looping over each time step and doing a calculation. And you are right! And that’s method 2. But in R, many functions are set up to already work with arrays and vectors, so you can often avoid using a for loop. There’s no problem with for loops, and if you are comfortable with them you should use them, but here we show another method using the very powerful apply()
function.
field_means <- apply(dni_array, 3, mean)
field_means
## [1] 41.79290 77.87777 80.76130 106.87969 107.94609 189.29937 247.63121
## [8] 200.58533 132.92924 100.09540 75.83250 50.43667 40.65156 84.48046
## [15] 139.02460 175.10250 112.43087 131.74109 150.36461 170.62342 81.58165
## [22] 91.12141 74.78721 46.19574 46.18153 75.13301 67.86088 124.64867
## [29] 161.84125 127.24766 186.05686 168.61902 148.29783 117.07103 41.98868
## [36] 36.94252 37.25730 101.02268 76.00069 100.72132 170.80213 210.67407
## [43] 170.03483 155.90337 137.54700 111.49741 77.27545 43.36437 42.66556
## [50] 50.98881 100.21441 162.08051 120.30658 108.98418 162.44899 122.30284
## [57] 132.13109 105.21825 36.33664 38.17628 42.74146 55.37459 57.18153
## [64] 172.94005 190.47630 128.18596 162.50201 162.42246 103.26104 80.49696
## [71] 62.37030 27.18095 61.98857 73.22494 114.38701 103.08140 257.23486
## [78] 191.77118 162.17443 168.68824 144.32456 104.09207 93.89407 52.07062
## [85] 47.46234 97.50558 118.35822 139.45966 231.46182 129.84647 201.67873
## [92] 198.71906 97.03703 125.94046 40.25703 35.20596 66.29340 83.85127
## [99] 100.65101 160.12357 141.09433 127.71367 216.16897 193.58914 150.79057
## [106] 123.52400 48.43008 53.93092 56.20603 68.63179 97.36135 130.57853
## [113] 252.81730 201.76325 185.19701 176.14001 154.86648 77.20209 50.84417
## [120] 54.69449 62.52864 64.38612 123.83213 170.97485 194.21070 166.64747
## [127] 134.98363 170.46477 103.00896 74.21996 48.12172 27.91016 41.12127
## [134] 78.61137 79.66902 128.26327 155.40417 185.73326 259.70149 174.74798
## [141] 101.26169 117.16942 51.70104 42.21914 36.25679 60.74905 105.38753
## [148] 116.96949 178.57692 138.27267 233.71902 203.70142 98.01754 111.88840
## [155] 51.69586 36.33307 55.38815 56.11455 107.81929 179.60303 113.36392
## [162] 181.44755 163.83171 160.83748 105.92828 82.72111 40.25480 42.24773
## [169] 45.44027 77.38321 107.30136 156.08775 184.82835 170.51141 164.53495
## [176] 212.26159 182.73041 107.16279 55.42775 31.95436 55.72643 81.25339
## [183] 100.32768 99.56817 199.78340 155.79843 126.72310 176.76216 94.28441
## [190] 54.02622 43.45946 39.22961 50.30582 47.49487 100.13500 140.62929
## [197] 187.53159 174.41371 198.95577 151.03445 167.98984 99.04760 50.51549
## [204] 30.49161 44.14564 69.39981 71.15147 154.30033 207.90649 217.01524
## [211] 107.86006 189.02721 127.53063 67.97066 61.19485 48.29326 51.71233
## [218] 72.56103 62.17618 107.62530 219.74514 160.37044 203.09845 181.26149
## [225] 72.22082 103.90186 53.50057 34.38681 41.40283 76.69390 124.57881
## [232] 150.23204 154.13249 193.50613 149.46097 167.56906 143.82945 81.08840
## [239] 40.59665 39.28613 33.88919 107.83014 154.28266 192.78165 175.80159
## [246] 242.38784 185.52112 234.07344 178.94516 100.14979 67.54497 50.50383
## [253] 32.23235 64.16516 109.03922 165.61803 156.97008 162.26687 161.42720
## [260] 174.61779 161.93130 95.33578 46.71188 43.28070 45.85797 59.57236
## [267] 88.20655 167.51453 168.10559 201.89060 156.73848 147.17189 170.14608
## [274] 152.36042 56.37260 32.92269 71.59929 54.80560 77.34635 118.46127
## [281] 168.98020 217.01808 270.21166 110.85876 184.80076 103.29941 63.24769
## [288] 54.05686 41.83628 68.83501 144.07052 263.42493 183.12100 163.42267
## [295] 157.37192 160.09210 121.92715 105.46913 45.42147 46.30905 49.80660
## [302] 116.81538 94.11365 128.50184 240.91394 197.04036 176.52394 145.45163
## [309] 117.00563 93.01325 46.73556 48.13857 55.95666 46.59850 80.54878
## [316] 228.01524 188.54343 166.02152 169.01112 205.27943 153.62373 78.37905
## [323] 53.88446 37.37655 40.36584 48.47874 101.45925 193.75862 99.03040
## [330] 210.02536 225.16938 120.55581 123.41900 107.96610 38.96109 33.15003
## [337] 40.85749 79.86682 165.56213 219.41080 228.40280 169.48842 128.05545
## [344] 160.85800 160.93199 125.75135 92.32641 33.82399 51.69582 78.42212
## [351] 154.70430 132.31179 197.75159 145.42246 158.94369 194.47215 143.08147
## [358] 99.55286 51.65787 34.42572
apply()
essentially does the looping for you. Let’s breakdown the arguments:
Have a look at the help with ?apply
to see the possible argumens.
dni_array
is the array we are calculating on.
3
is the “MARGIN” argument. This is the dimension we are looping over. Since time is the third dimension of our array we set MARGIN to 3.
mean
is the “FUN” argument. This is the function we are apply()
ing to each part of the data.
This is a very concise method, but you need to understand the apply()
function well to make sure you get the result you want.
If you’re more comfortable with the idea of a for loop, then here is an equivalent method.
# Initialise a numeric vector with the same length as the time variable
field_means <- vector("numeric", length(nc_time_var))
# Loop over each time point
for (i in seq_along(nc_time_var)) {
# Calculate the mean over all lon and lat points for each time point, i
field_means[i] <- mean(dni_array[ , , i], na.rm = TRUE)
}
Let’s break down the different parts of the code above to make sure we understand what’s going on:
First we initialise a numeric vector with the length we want, which is the length of the time variable. This just creates a vector of length 360 with default values of 0. It’s good practice to create the vector with the right length up front if we know it.
vector("numeric", length(nc_time_var))
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [260] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
for
is for doing a loop i.e. repeating the same code a fixed number of times with a different counter variable each time.
seq_along(nc_time_var)
just gives us a list from 1 to 360 to use as the counter in the for
loop.
seq_along(nc_time_var)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
## [109] 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
## [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
## [145] 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162
## [163] 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
## [181] 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198
## [199] 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216
## [217] 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234
## [235] 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252
## [253] 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270
## [271] 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288
## [289] 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306
## [307] 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324
## [325] 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342
## [343] 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360
Finally, field_means[i]
refers to the i’th value of our field_means
vector.
And dni_array[ , , i]
extracts all the values for time point i. We can specify three entries inside the [ ]
because dni_array
is a 3-dimensional array. The first two values are blank, which means take all the values for that dimension. So we get all the values over the lon
and lat
dimensions (the first two dimensions) and just the values for the i’th time step. So we have taken a 2-dimensional slice from the array, which we then calculate the mean
of.
# e.g. this extracts a 2-dimensional slice
first_time_point <- dni_array[ , , 1]
dim(first_time_point)
## [1] 181 161
Whichever method you used, now let’s plot the results. Remember we now have a one-dimensional array (vector) of the means for each month, and we have the time points. So we can plot these as a line plot.
plot(date_time_var, field_means, type = "l",
main = "Mean Monthly DNI over Germany 1983 - 2012",
xlab = "Date",
ylab = "DNI (W m-2)")
abline(h = mean(field_means), col = "blue")
If we want to do the plot just for the first year, then we simply extract just the first 12 values from our vectors. Note that we use type = "b"
to get both points and lines.
plot(date_time_var[1:12], field_means[1:12], type = "b",
main = "Mean Monthly DNI over Germany in 1983",
xlab = "Month",
ylab = "DNI (W m-2)")
We could do the same kind of plot for a single point within Germany.
We know that the extraction will be in the form dni_array[ , , ]
.
To extract a single point for the whole time series, we need a single number for the first and second (lon and lat) index and a blank value for the time index (so that we get all time points). So something of the form dni_array[x1, y1, ]
.
Let’s extract the DNI at Frankfurt, which has longitude 8.68 and latitude 50.11.
You might be tempted to try dni_array[8.68, 50.11, ]
, however this won’t give us what we want. This will instead extract the data at the 8th longitude value and 50th latitude value. Remember, we have to provide an index within [ ]
, not a value.
So first, we need to get the lon and lat arrays, and then work out which position in these array our location is.
We define the point we want to extract, then get the lon and lat arrays from the nc
object. Before going further, we check that the points we want to extract are within the bounds of our grid. If these both return TRUE
our point is within the bounds.
point_lon <- 8.68
point_lat <- 50.11
lon_vals <- ncvar_get(nc, "lon")
lat_vals <- ncvar_get(nc, "lat")
print("Longitude range check:")
## [1] "Longitude range check:"
point_lon >= min(lon_vals) && point_lon <= max(lon_vals)
## [1] TRUE
print("Latitude range check:")
## [1] "Latitude range check:"
point_lat >= min(lat_vals) && point_lat <= max(lat_vals)
## [1] TRUE
Now we find the index of the lon and lat points which are closest to our point.
lon_index <- which.min(abs(lon_vals - point_lon))
lat_index <- which.min(abs(lat_vals - point_lat))
lon_index
## [1] 55
lat_index
## [1] 63
print("Closest grid point:")
## [1] "Closest grid point:"
lon_vals[lon_index]
## [1] 8.7
lat_vals[lat_index]
## [1] 50.1
First we calculated the absolute value of the difference between all the lon/lat points and our point: abs(lon_vals - point_lon)
. The minimum of these gives us the closest point to our point. which.min
gives us the index of that minimum, since we need the index for the extraction. You can print these values out by using the calculated index values to check it gives a point close to our point.
Now we have the index values we need to extract from dni_array
.
point_dni <- dni_array[lon_index, lat_index, ]
And we can plot it in a similar way.
plot(date_time_var, point_dni, type = "l", main = "DNI at Frankfurt 1983 - 2012")
Using similar ideas as above, we could also extract a smaller region from our data, which we could then do further calculations on.
range_lon <- c(7, 8)
range_lat <- c(50, 50.5)
print("Longitude range check:")
## [1] "Longitude range check:"
range_lon[1] <= range_lon[2] && range_lon[1] >= min(lon_vals) && range_lon[2] <= max(lon_vals)
## [1] TRUE
print("Latitude range check:")
## [1] "Latitude range check:"
range_lat[1] <= range_lat[2] && range_lat[1] >= min(lat_vals) && range_lat[2] <= max(lat_vals)
## [1] TRUE
lon_min_index = which.min(abs(lon_vals - range_lon[1]))
lon_max_index = which.min(abs(lon_vals - range_lon[2]))
lat_min_index = which.min(abs(lat_vals - range_lat[1]))
lat_max_index = which.min(abs(lat_vals - range_lat[2]))
We created the extraction ranges as vectors with 2 values, a min and a max. The c()
function is for creating vectors: range_lon <- c(7, 8)
Again, we check the values are within the range of the data.
Then we calculate the four index values we need using which.min()
again. This now gives us a range of index values we can use for extraction.
dni_region <- dni_array[lon_min_index:lon_max_index, lat_min_index:lat_max_index, ]
dim(dni_region)
## [1] 21 11 360
We use the format a:b
to specify a range of indices to extract. Then we check the dimensions of our extraction and we see we now have 21 lon points, 11 lat points and all 360 time points.
Now try to calculate the mean DNI over this region using either of the methods shown before.
There’s lots more time series analysis and graphs that can be produced from this data. If you’re doing more time series work, either for a single point or from summarising over a region (e.g. field mean) then a natural structure to be work with this data is a data frame i.e. tabular data with a number of columns of the same length, like you might use in a spreadsheet and for in-situ data.
Data frames are also the data format that the “tidyverse” set of packages use for everything from data manipulation, graphics and modelling. These are a very powerful, coherent set of R packages which is well worth learning if you’re planning to continue learning R. The “tidyverse” includes the ggplot2
R package, which is becoming the standard graphics package in R and has an incredibly powerful and flexible plotting system.
The ggplot2
and “tidyverse” system of packages takes a bit of learning, and is beyond the scope of this short course, but here is an example below showing what’s possible by using data frames and ggplot2
.
# Create a data frame with 2 columns for date and DNI
field_df <- data.frame(date = date_time_var, dni = field_means)
# Create month and year columns from the date
field_df$month <- month(field_df$date, label = TRUE)
field_df$year <- year(field_df$date)
# Boxplots of DNI for each month
ggplot(field_df, aes(x = month, y = dni)) +
geom_boxplot()
# Line plot of DNI with trend line including standard error
ggplot(field_df, aes(x = date, y = dni)) +
geom_line() +
geom_smooth(method = "lm")