Data Required for the Study

Downscaling a GCM ensemble against global solar radiation observations requires two types of data. The first consists of the GCM outputs for the historical period and for the future scenarios of interest. The second consists of the ground truth observations from which the models learn the mapping function for downscaling.

GCM outputs are sourced from the Centre for Environmental Data Analysis (CEDA) online archive, which hosts the GCM output collections of the CMIP5 project [1]. Daily atmospheric model outputs for the historical, RCP4.5 and RCP8.5 runs are sourced from this archive. The models include CSIRO-BOM ACCESS1-0 (grid size \(1.25^\circ \times 1.875^\circ\)) [2], MOHC HadGEM2-CC (grid size \(1.25^\circ \times 1.875^\circ\)) [3] and MRI MRI-CGCM3 (grid size \(1.12148^\circ \times 1.125^\circ\)) [4]. The historical runs span the time range between 1950-01-01T12:00:00 and 2006-01-01T00:00:00. Both the RCP4.5 and RCP8.5 runs span the time range between 2006-01-01T00:00:00 and 2101-01-01T00:00:00. Variables output by the models are indexed by longitude, latitude and time, and are given either on atmospheric pressure levels (8 levels) or as near-surface readings. The available set of variables is listed in Table 4. Storage requirements for the raw data of each model vary between 850 GB and 1.5 TB.

| Variable | Description | Units | Spatial and Temporal Dimensions |
|---|---|---|---|
| clt | Cloud Area Fraction | \(\%\) | lon,lat,time |
| hfls | Surface Upward Latent Heat Flux | \(\mathrm{W\,m^{-2}}\) | lon,lat,time |
| hfss | Surface Upward Sensible Heat Flux | \(\mathrm{W\,m^{-2}}\) | lon,lat,time |
| hur | Relative Humidity | \(\%\) | lon,lat,plev8,time |
| hus | Specific Humidity | \(\mathrm{g\,kg^{-1}}\) | lon,lat,plev8,time |
| huss | Near-Surface Specific Humidity | \(\mathrm{g\,kg^{-1}}\) | lon,lat,time,height2m |
| pr | Precipitation | \(\mathrm{kg\,m^{-2}\,s^{-1}}\) | lon,lat,time |
| prc | Convective Precipitation | \(\mathrm{kg\,m^{-2}\,s^{-1}}\) | lon,lat,time |
| prsn | Solid Precipitation | \(\mathrm{kg\,m^{-2}\,s^{-1}}\) | lon,lat,time |
| psl | Sea Level Pressure | \(\mathrm{Pa}\) | lon,lat,time |
| rhs | Near-Surface Relative Humidity | \(\%\) | lon,lat,time,height2m |
| rhsmax | Surface Daily Max Relative Humidity | \(\%\) | lon,lat,time,height2m |
| rhsmin | Surface Daily Min Relative Humidity | \(\%\) | lon,lat,time,height2m |
| rlds | Surface Downwelling Longwave Radiation | \(\mathrm{W\,m^{-2}}\) | lon,lat,time |
| rlus | Surface Upwelling Longwave Radiation | \(\mathrm{W\,m^{-2}}\) | lon,lat,time |
| rlut | TOA Outgoing Longwave Radiation | \(\mathrm{W\,m^{-2}}\) | lon,lat,time |
| rsds | Surface Downwelling Shortwave Radiation | \(\mathrm{W\,m^{-2}}\) | lon,lat,time |
| rsus | Surface Upwelling Shortwave Radiation | \(\mathrm{W\,m^{-2}}\) | lon,lat,time |
| sfcWind | Wind Speed | \(\mathrm{m\,s^{-1}}\) | lon,lat,time |
| sfcWindmax | Daily Maximum Near-Surface Wind Speed | \(\mathrm{m\,s^{-1}}\) | lon,lat,time,height10m |
| ta | Air Temperature | \(\mathrm{K}\) | lon,lat,plev8,time |
| tas | Near-Surface Air Temperature | \(\mathrm{K}\) | lon,lat,time,height2m |
| tasmax | Daily Max Near-Surface Air Temperature | \(\mathrm{K}\) | lon,lat,time,height2m |
| tasmin | Daily Min Near-Surface Air Temperature | \(\mathrm{K}\) | lon,lat,time,height2m |
| ua | Eastward Wind | \(\mathrm{m\,s^{-1}}\) | lon,lat,plev8,time |
| uas | Eastward Near-Surface Wind | \(\mathrm{m\,s^{-1}}\) | lon,lat,time,height10m |
| va | Northward Wind | \(\mathrm{m\,s^{-1}}\) | lon,lat,plev8,time |
| vas | Northward Near-Surface Wind | \(\mathrm{m\,s^{-1}}\) | lon,lat,time,height10m |
| wap | Omega (Lagrangian Tendency of Air Pressure) | \(\mathrm{Pa\,s^{-1}}\) | lon,lat,time |
| zg | Geopotential Height | \(\mathrm{m}\) | lon,lat,time |

Table 4. List of variables output by the GCMs and used as covariates in the downscaling process.

Once obtained, the GCM data must be extracted over time at the sites of interest for each profile. This requires extracting, for each variable, several of the nearest grid locations within the region of interest. Because different network modules accept data in different formats, pre-processing reuses the same data both as lagged time series in 1-dimensional vectors and as frames of 2-dimensional matrices. The modelling architecture therefore determines the data processing that is required.
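The extraction step described above can be sketched as follows. This is a minimal illustration and not the study's actual pipeline: the function names `nearest_grid_indices` and `lagged_windows`, the toy grid, the site coordinates and the choice of four neighbours are all assumptions made for the example. Neighbours are ranked by a plain squared difference in degrees, which ignores the shrinking of longitude spacing with latitude.

```python
def nearest_grid_indices(grid_lats, grid_lons, site_lat, site_lon, k=4):
    """Return (lat_index, lon_index) pairs of the k grid cells closest to a site.

    Distance is a squared difference in degrees, a simplification that is
    only meant for ranking neighbours on a coarse regular grid.
    """
    candidates = []
    for i, lat in enumerate(grid_lats):
        for j, lon in enumerate(grid_lons):
            d2 = (lat - site_lat) ** 2 + (lon - site_lon) ** 2
            candidates.append((d2, i, j))
    candidates.sort()
    return [(i, j) for _, i, j in candidates[:k]]


def lagged_windows(series, n_lags):
    """Turn a daily series into overlapping windows of n_lags consecutive days."""
    return [series[t:t + n_lags] for t in range(len(series) - n_lags + 1)]


# Hypothetical coarse grid at roughly the 1.25 x 1.875 degree spacing.
grid_lats = [-35.0 + 1.25 * i for i in range(5)]
grid_lons = [148.125 + 1.875 * j for j in range(5)]

# Hypothetical site location (not one of the study's sites).
cells = nearest_grid_indices(grid_lats, grid_lons, site_lat=-33.9, site_lon=151.2, k=4)

# Toy daily values for one variable at one grid cell.
daily = [0.1, 0.2, 0.3, 0.4, 0.5]
windows = lagged_windows(daily, n_lags=3)
```

Each window is one 1-dimensional input vector of lagged days. Stacking the values of the selected neighbouring cells for a single day gives the corresponding 2-dimensional frame.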

Past observations of global solar radiation (\(\text{MJ}\,\text{m}^{-2}\)), including both direct and indirect radiation, are extracted for each BOM site from the SILO database. In addition, grid point data of interpolated global solar radiation estimates (\(\text{MJ}\,\text{m}^{-2}\)) are obtained for use in the 2-dimensional downscaling setting at a resolution of \(0.05^\circ \times 0.05^\circ\), or approximately 5 km per grid square [13]. This data spans from 1859 up to recent observations, so historical GCM outputs need to be aligned with observational data over the time range between 1950 and 2006. The more recent observational data also offers an advantage: it could be leveraged to assess uncertainty in the resulting downscaled projections from 2006 onwards. Where a 1-dimensional output from the model is required, 1-dimensional observations are extracted and paired with the daily sequences of GCM outputs. For the 2-dimensional downscaling setting, a 50 km bounding box is used to extract a region of \(9 \times 9\) grid points around each site of interest.
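The following sketch illustrates the two operations described above: pairing GCM and observed values on shared dates within the historical window, and cutting a \(9 \times 9\) patch of grid points around a site. The helper names (`align_by_date`, `patch_indices`, `extract_patch`), the toy grid and the toy values are hypothetical. The sketch also assumes that both series use ordinary calendar dates, which may not hold for every GCM calendar.

```python
from datetime import date


def align_by_date(gcm, obs, start, end):
    """Pair GCM and observed values on dates present in both, start <= d < end."""
    return [(d, gcm[d], obs[d]) for d in sorted(gcm) if d in obs and start <= d < end]


def patch_indices(grid_coords, site_coord, half_width=4):
    """Indices of the 2*half_width + 1 grid points centred on the nearest point."""
    centre = min(range(len(grid_coords)), key=lambda i: abs(grid_coords[i] - site_coord))
    lo, hi = centre - half_width, centre + half_width
    if lo < 0 or hi >= len(grid_coords):
        raise ValueError("site is too close to the grid edge for a full patch")
    return list(range(lo, hi + 1))


def extract_patch(field, lat_idx, lon_idx):
    """Slice a 2-D field (rows indexed [lat][lon]) down to the patch."""
    return [[field[i][j] for j in lon_idx] for i in lat_idx]


# Toy daily series keyed by date (values are made up).
gcm = {date(1949, 12, 31): 1.0, date(1950, 1, 1): 2.0, date(1950, 1, 2): 3.0}
obs = {date(1950, 1, 1): 20.0, date(1950, 1, 2): 21.0, date(2006, 1, 1): 22.0}
pairs = align_by_date(gcm, obs, start=date(1950, 1, 1), end=date(2006, 1, 1))

# Hypothetical 0.05-degree grid covering a small region.
lats = [round(-34.5 + 0.05 * i, 2) for i in range(21)]
lons = [round(150.5 + 0.05 * j, 2) for j in range(21)]

# Toy field whose value encodes its own (row, column) position.
field = [[(i, j) for j in range(len(lons))] for i in range(len(lats))]

lat_idx = patch_indices(lats, site_coord=-34.0)
lon_idx = patch_indices(lons, site_coord=151.0)
patch = extract_patch(field, lat_idx, lon_idx)
```

With `half_width=4` the patch has 9 points per side, spanning 8 grid intervals of \(0.05^\circ\), which at roughly 5 km per interval stays within a 50 km box.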

To reduce the impact of extreme values in the different measures, all data are normalised prior to training the models using min-max normalisation:

\[\operatorname{normalised}\left( d_{\cdot j} \right) = \frac{d_{\cdot j} - \min\left( d_{\cdot j} \right)}{\max\left( d_{\cdot j} \right) - \min\left( d_{\cdot j} \right)}\]

where \(d_{\cdot j}\) is the \(j^{\text{th}}\) column of the input data set. In the 2-dimensional case, each variable is normalised prior to conversion to the 2-dimensional format. The minimum and maximum values of each column must be stored, both to transform values back into their original units during evaluation and to normalise new data during model inference.
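A minimal sketch of this normalisation, assuming the data is held as a list of rows with one column per variable, is given below. The function names and the toy values are illustrative only. Mapping a constant column to 0.0 is an assumption of the sketch, made to avoid division by zero.

```python
def fit_min_max(rows):
    """Record the minimum and maximum of every column of the training data."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def normalise(rows, mins, maxs):
    """Scale each column with the stored minima and maxima.

    Constant columns (max == min) are mapped to 0.0, an assumption of this sketch.
    """
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def denormalise(rows, mins, maxs):
    """Invert the scaling to recover values in their original units."""
    return [[v * (hi - lo) + lo for v, lo, hi in zip(row, mins, maxs)] for row in rows]


# Toy training data: two columns, e.g. a temperature and a cloud fraction.
train = [[280.0, 0.0], [290.0, 50.0], [300.0, 100.0]]
mins, maxs = fit_min_max(train)

scaled = normalise(train, mins, maxs)
restored = denormalise(scaled, mins, maxs)

# New data at inference time reuses the stored training minima and maxima.
new_scaled = normalise([[285.0, 25.0]], mins, maxs)
```

Because new data is scaled with the training minima and maxima, values outside the training range fall outside \([0, 1]\); the sketch does not clip them.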


References

[1]
Centre for Environmental Data Analysis. (2020). CEDA archive. https://www.ceda.ac.uk

[2]
The Commonwealth Scientific and Industrial Research Organisation; Bureau of Meteorology. (2017). WCRP CMIP5: The CSIRO-BOM team ACCESS1-0 model output collection. http://catalogue.ceda.ac.uk/uuid/98a933094fa44e8cb886649cf3f5ba4c

[3]
Met Office Hadley Centre. (2012). WCRP CMIP5: Met Office Hadley Centre (MOHC) HadGEM2-CC model output collection. http://catalogue.ceda.ac.uk/uuid/2e4f5b3748874c61a265f58039898ea5

[4]
Meteorological Research Institute of the Korean Meteorological Administration. (2013). WCRP CMIP5: Meteorological Research Institute of KMA MRI-CGCM3 model output collection. http://catalogue.ceda.ac.uk/uuid/a1febf62fbb54c79ab73e6f9b93bc485

Creative Commons License
Downscaling Global Climate Models with Convolutional and Long-Short-Term Memory Networks for Solar Energy Applications by C.P. Davey is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.