Downscaling a GCM ensemble against global solar radiation observations requires two types of data source. The first is the GCM output for the historical period and for the future scenarios of interest. The second is the ground truth observations from which the models learn the mapping function for downscaling.
GCM outputs are sourced from the Centre for Environmental Data Analysis (CEDA) online archive, which hosts the CMIP5 project's GCM output collection [1]. Daily atmospheric model outputs for the historical, RCP4.5 and RCP8.5 runs are obtained from this archive for three models: CSIRO-BOM ACCESS1-0 (grid size \(1.25{^\circ} \times 1.875{^\circ}\)) [2], MOHC HadGEM2-CC (grid size \(1.25{^\circ} \times 1.875{^\circ}\)) [3] and MRI MRI-CGCM3 (grid size \(1.12148{^\circ} \times 1.125{^\circ}\)) [4]. The historical runs span the period from 1950-01-01T12:00:00 to 2006-01-01T00:00:00, while both the RCP4.5 and RCP8.5 runs span the period from 2006-01-01T00:00:00 to 2101-01-01T00:00:00. Every variable is indexed by longitude, latitude and time; some are additionally indexed by atmospheric pressure (8 levels) or by a near-surface height (2 m or 10 m). The available variables are listed in Table 4. The raw data for each model require between 850 GB and 1.5 TB of storage.
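Because the historical and scenario runs cover adjacent periods, each date maps to exactly one run for a chosen scenario. The sketch below encodes the run periods stated above as half-open intervals; the experiment keys (`historical`, `rcp45`, `rcp85`) and the helper `experiment_for` are hypothetical names used only for illustration.

```python
from datetime import datetime

# Run coverage as half-open intervals [start, end), as stated in the text.
# The dictionary keys are illustrative labels, not archive identifiers.
EXPERIMENTS = {
    "historical": (datetime(1950, 1, 1, 12), datetime(2006, 1, 1)),
    "rcp45": (datetime(2006, 1, 1), datetime(2101, 1, 1)),
    "rcp85": (datetime(2006, 1, 1), datetime(2101, 1, 1)),
}


def experiment_for(when: datetime, scenario: str) -> str:
    """Return the run that covers `when`, given a future scenario (rcp45 or rcp85)."""
    if scenario not in ("rcp45", "rcp85"):
        raise ValueError(f"unknown scenario: {scenario}")
    # Check the historical run first, then the chosen scenario run.
    for name in ("historical", scenario):
        start, end = EXPERIMENTS[name]
        if start <= when < end:
            return name
    raise ValueError(f"{when} is outside the available model runs")
```

Treating the end of the historical run as exclusive means 2006-01-01 is assigned to the scenario run, so the two periods do not overlap.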
| Variable | Description | Units | Dimensions |
|---|---|---|---|
| clt | Cloud Area Fraction | \(\%\) | lon, lat, time |
| hfls | Surface Upward Latent Heat Flux | \(\mathrm{W\,m^{-2}}\) | lon, lat, time |
| hfss | Surface Upward Sensible Heat Flux | \(\mathrm{W\,m^{-2}}\) | lon, lat, time |
| hur | Relative Humidity | \(\%\) | lon, lat, plev8, time |
| hus | Specific Humidity | \(\mathrm{g\,kg^{-1}}\) | lon, lat, plev8, time |
| huss | Near-Surface Specific Humidity | \(\mathrm{g\,kg^{-1}}\) | lon, lat, time, height2m |
| pr | Precipitation | \(\mathrm{kg\,m^{-2}\,s^{-1}}\) | lon, lat, time |
| prc | Convective Precipitation | \(\mathrm{kg\,m^{-2}\,s^{-1}}\) | lon, lat, time |
| prsn | Solid Precipitation | \(\mathrm{kg\,m^{-2}\,s^{-1}}\) | lon, lat, time |
| psl | Sea Level Pressure | \(\mathrm{Pa}\) | lon, lat, time |
| rhs | Near-Surface Relative Humidity | \(\%\) | lon, lat, time, height2m |
| rhsmax | Surface Daily Maximum Relative Humidity | \(\%\) | lon, lat, time, height2m |
| rhsmin | Surface Daily Minimum Relative Humidity | \(\%\) | lon, lat, time, height2m |
| rlds | Surface Downwelling Longwave Radiation | \(\mathrm{W\,m^{-2}}\) | lon, lat, time |
| rlus | Surface Upwelling Longwave Radiation | \(\mathrm{W\,m^{-2}}\) | lon, lat, time |
| rlut | TOA Outgoing Longwave Radiation | \(\mathrm{W\,m^{-2}}\) | lon, lat, time |
| rsds | Surface Downwelling Shortwave Radiation | \(\mathrm{W\,m^{-2}}\) | lon, lat, time |
| rsus | Surface Upwelling Shortwave Radiation | \(\mathrm{W\,m^{-2}}\) | lon, lat, time |
| sfcWind | Near-Surface Wind Speed | \(\mathrm{m\,s^{-1}}\) | lon, lat, time |
| sfcWindmax | Daily Maximum Near-Surface Wind Speed | \(\mathrm{m\,s^{-1}}\) | lon, lat, time, height10m |
| ta | Air Temperature | \(\mathrm{K}\) | lon, lat, plev8, time |
| tas | Near-Surface Air Temperature | \(\mathrm{K}\) | lon, lat, time, height2m |
| tasmax | Daily Maximum Near-Surface Air Temperature | \(\mathrm{K}\) | lon, lat, time, height2m |
| tasmin | Daily Minimum Near-Surface Air Temperature | \(\mathrm{K}\) | lon, lat, time, height2m |
| ua | Eastward Wind | \(\mathrm{m\,s^{-1}}\) | lon, lat, plev8, time |
| uas | Eastward Near-Surface Wind | \(\mathrm{m\,s^{-1}}\) | lon, lat, time, height10m |
| va | Northward Wind | \(\mathrm{m\,s^{-1}}\) | lon, lat, plev8, time |
| vas | Northward Near-Surface Wind | \(\mathrm{m\,s^{-1}}\) | lon, lat, time, height10m |
| wap | Omega (Lagrangian Tendency of Air Pressure) | \(\mathrm{Pa\,s^{-1}}\) | lon, lat, plev8, time |
| zg | Geopotential Height | \(\mathrm{m}\) | lon, lat, plev8, time |
Table 4. Variables output by the GCMs and used as covariates in the downscaling process.
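At a single grid point, a surface variable contributes one input column and a pressure-level variable contributes one column per level. The sketch below, a hypothetical helper named `feature_names`, expands a mapping from variable name to its dimensions (as listed in Table 4) into per-level column names. It assumes the standard CMIP5 `plev8` levels of 1000, 850, 700, 500, 250, 100, 50 and 10 hPa; the column-naming scheme is illustrative only.

```python
# CMIP5 "plev8" pressure levels in Pa.
PLEV8 = (100000, 85000, 70000, 50000, 25000, 10000, 5000, 1000)


def feature_names(var_dims: dict[str, tuple[str, ...]]) -> list[str]:
    """Expand variables into one column name per pressure level where applicable."""
    names = []
    for var, dims in var_dims.items():
        if "plev8" in dims:
            # One column per pressure level, labelled in hPa.
            names.extend(f"{var}_{p // 100}hPa" for p in PLEV8)
        else:
            # Surface or near-surface variables give a single column.
            names.append(var)
    return names
```

For example, `tas` together with `ta` yields 1 + 8 = 9 columns per grid point.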
Once obtained, the GCM data must be extracted over time for the sites of interest, for each profile. This involves extracting, for each variable, several of the grid points nearest to each site within the region of interest. Because different network modules accept input in different formats, the pre-processing reuses the same data both as lagged time series in 1-dimensional vectors and as frames of 2-dimensional matrices. The modelling architecture therefore determines the data processing that is required.
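A minimal NumPy sketch of these steps follows. It is an illustration under stated assumptions, not the pipeline used here: nearest grid points are chosen by squared distance in degrees (a simplification that ignores the convergence of meridians), and the function names, the number of neighbours `k` and the lag length `n_lags` are hypothetical.

```python
import numpy as np


def nearest_grid_points(lats, lons, site_lat, site_lon, k=4):
    """Return (i, j) indices of the k grid points closest to a site.

    Distance is squared Euclidean distance in degrees, a simplifying assumption.
    """
    lat_grid, lon_grid = np.meshgrid(lats, lons, indexing="ij")
    dist2 = (lat_grid - site_lat) ** 2 + (lon_grid - site_lon) ** 2
    flat = np.argsort(dist2, axis=None)[:k]  # indices into the flattened grid
    return [tuple(int(x) for x in np.unravel_index(f, dist2.shape)) for f in flat]


def lagged_vectors(series, n_lags):
    """1-D layout: series (time, features) -> (time - n_lags + 1, n_lags * features).

    Each row holds the n_lags most recent days up to and including day t, oldest first.
    """
    T = series.shape[0]
    return np.stack([series[t - n_lags + 1 : t + 1].ravel() for t in range(n_lags - 1, T)])


def frames(field, n_lags):
    """2-D layout: field (time, lat, lon) -> (time - n_lags + 1, n_lags, lat, lon)."""
    T = field.shape[0]
    return np.stack([field[t - n_lags + 1 : t + 1] for t in range(n_lags - 1, T)])
```

Both layouts are built from the same extracted arrays, so the choice between them can be deferred until the network module that consumes them is fixed.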
Past observations of global solar radiation (\(\mathrm{MJ\,m^{-2}}\)), including both direct and diffuse radiation, are extracted for each BOM site from the SILO database. In addition, gridded estimates of interpolated global solar radiation (\(\mathrm{MJ\,m^{-2}}\)) are obtained at a resolution of \(0.05{^\circ} \times 0.05{^\circ}\) (a grid square of approximately 5 km) [13] for use in the 2-dimensional downscaling setting. These data are available from 1889 up to recent observations, so the historical GCM outputs must be aligned with the observational data over the period 1950 to 2006. A further advantage is that the more recent observational data could be used to assess the uncertainty of the downscaled projections after 2006. For the setting in which the model produces a 1-dimensional output, the site observations are extracted and paired with the daily sequences of GCM outputs. For the 2-dimensional setting, a 50 km bounding box is used to extract a region of \(9 \times 9\) grid points around each site of interest.
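The alignment and window extraction described above can be sketched with NumPy as follows. This is a minimal sketch under several assumptions: both date arrays are sorted, unique and on a standard Gregorian calendar (some CMIP5 models, HadGEM2-CC among them, use a 360-day calendar, which would need converting first); the window is centred on the grid cell nearest the site; and the helper names `align_daily` and `site_window` are hypothetical.

```python
import numpy as np


def align_daily(gcm_dates, obs_dates, start="1950-01-01", end="2005-12-31"):
    """Return the common days within [start, end] and index arrays into each input.

    Assumes both inputs are sorted, unique and on the same (Gregorian) calendar.
    """
    gcm_days = np.asarray(gcm_dates).astype("datetime64[D]")
    obs_days = np.asarray(obs_dates).astype("datetime64[D]")
    lo, hi = np.datetime64(start, "D"), np.datetime64(end, "D")
    in_range = gcm_days[(gcm_days >= lo) & (gcm_days <= hi)]
    common = np.intersect1d(in_range, obs_days)
    # searchsorted finds each common day's position in the sorted inputs.
    return common, np.searchsorted(gcm_days, common), np.searchsorted(obs_days, common)


def site_window(grid, lats, lons, site_lat, site_lon, size=9):
    """Extract a size x size patch centred on the grid cell nearest the site.

    `grid` has the spatial axes last, e.g. (lat, lon) or (time, lat, lon).
    """
    half = size // 2
    i = int(np.abs(lats - site_lat).argmin())
    j = int(np.abs(lons - site_lon).argmin())
    if i - half < 0 or j - half < 0 or i + half >= len(lats) or j + half >= len(lons):
        raise ValueError("site too close to the grid edge for a full window")
    return grid[..., i - half : i + half + 1, j - half : j + half + 1]
```

At \(0.05{^\circ}\) spacing, a \(9 \times 9\) window spans roughly 40 to 45 km, which fits within the 50 km bounding box.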
To prevent variables with large magnitudes from dominating those with small magnitudes, all data are normalised prior to training the models using min-max normalisation:
\[\operatorname{normalised}\left(d_{\cdot j}\right) = \frac{d_{\cdot j} - \min\left(d_{\cdot j}\right)}{\max\left(d_{\cdot j}\right) - \min\left(d_{\cdot j}\right)}\]
where \(d_{\cdot j}\) is the \(j^{\text{th}}\) column of the input data set. In the 2-dimensional setting, each variable is normalised before conversion to the 2-dimensional format. The minimum and maximum values of each column must be stored, both to transform values back into original units during evaluation and to normalise new data during model inference.
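The column-wise transform and its inverse can be sketched as a small class that stores the per-column minimum and maximum. The class name `MinMaxNormaliser` is hypothetical, and the guard for constant columns is an added assumption not stated in the text; scikit-learn's `MinMaxScaler` offers equivalent `fit`, `transform` and `inverse_transform` methods.

```python
import numpy as np


class MinMaxNormaliser:
    """Column-wise min-max normalisation that keeps the statistics it was fitted on."""

    def fit(self, data: np.ndarray) -> "MinMaxNormaliser":
        # Store min and max per column (axis 0 runs over samples).
        self.min_ = data.min(axis=0)
        self.max_ = data.max(axis=0)
        return self

    def _range(self) -> np.ndarray:
        rng = self.max_ - self.min_
        # Assumption: constant columns use a range of 1 to avoid division by zero.
        return np.where(rng == 0, 1.0, rng)

    def transform(self, data: np.ndarray) -> np.ndarray:
        return (data - self.min_) / self._range()

    def inverse_transform(self, data: np.ndarray) -> np.ndarray:
        # Map normalised values back into original units.
        return data * self._range() + self.min_
```

Because new data are scaled with the training minimum and maximum, values outside the training range map outside \([0, 1]\) at inference time rather than being clipped.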