Uncertainty quantification for spatial field data using expensive computer models: refocussed Bayesian calibration with optimal projection
Salter, James Martin
Thesis or dissertation
University of Exeter
In this thesis, we present novel methodology for emulating and calibrating computer models with high-dimensional output. Computer models for complex physical systems, such as climate, are typically expensive and time-consuming to run. Due to this inability to run computer models efficiently, statistical models ('emulators') are used as fast approximations of the computer model, fitted based on a small number of runs of the expensive model, allowing more of the input parameter space to be explored. Common choices for emulators are regressions and Gaussian processes. The input parameters of the computer model that lead to output most consistent with the observations of the real-world system are generally unknown, hence computer models require careful tuning. Bayesian calibration and history matching are two methods that can be combined with emulators to search for the best input parameter setting of the computer model (calibration), or remove regions of parameter space unlikely to give output consistent with the observations, if the computer model were to be run at these settings (history matching). When calibrating computer models, it has been argued that fitting regression emulators is sufficient, due to the large, sparsely-sampled input space. We examine this for a range of examples with different features and input dimensions, and find that fitting a correlated residual term in the emulator is beneficial, in terms of more accurately removing regions of the input space, and identifying parameter settings that give output consistent with the observations. We demonstrate and advocate for multi-wave history matching followed by calibration for tuning. In order to emulate computer models with large spatial output, projection onto a low-dimensional basis is commonly used. The standard accepted method for selecting a basis is to use n runs of the computer model to compute principal components via the singular value decomposition (the SVD basis), with the coefficients given by this projection emulated. We show that when the n runs used to define the basis do not contain important patterns found in the real-world observations of the spatial field, linear combinations of the SVD basis vectors will not generally be able to represent these observations. Therefore, the results of a calibration exercise are meaningless, as we converge to incorrect parameter settings, likely assigning zero posterior probability to the correct region of input space. We show that the inadequacy of the SVD basis is very common and present in every climate model field we looked at. We develop a method for combining important patterns from the observations with signal from the model runs, developing a calibration-optimal rotation of the SVD basis that allows a search of the output space for fields consistent with the observations. We illustrate this method by performing two iterations of history matching on a climate model, CanAM4. We develop a method for beginning to assess model discrepancy for climate models, where modellers would first like to see whether the model can achieve certain accuracy, before allowing specific model structural errors to be accounted for. We show that calibrating using the basis coefficients often leads to poor results, with fields consistent with the observations ruled out in history matching. We develop a method for adjusting for basis projection when history matching, so that an efficient and more accurate implausibility bound can be derived that is consistent with history matching using the computationally prohibitive spatial field.
The majority of the results of Chapter 3 have appeared in the following: Salter, James M., and Daniel Williamson. "A comparison of statistical emulation methodologies for multiwave calibration of environmental models." Environmetrics 27.8 (2016): 507-523.
PhD in Mathematics