Uncertainty Quantification for Numerical Models with Two Regions of Solution

Kimpton, L

Abstract

Complex numerical models and simulators are essential for representing real-life physical systems so that we can make predictions and gain a better understanding of the systems themselves. For certain models, the outputs can behave very differently for some input parameters compared with others, and hence we end up with distinct bounded regions in the input space. The aim of this thesis is to develop methods for uncertainty quantification for such models. Emulators act as ‘black box’ functions to statistically represent the relationships between complex simulator inputs and outputs. It is important not to assume continuity across the output space, as there may be discontinuities between the distinct regions. Therefore, it is not possible to use a single Gaussian process (GP) emulator for the entire model. Further, model outputs can take any form and can be either qualitative or quantitative. For example, there may be computer code for a complex model that fails to run for certain input values. In such an example, the output data would correspond to separate binary outcomes of either ‘runs’ or ‘fails to run’. Classification methods can be used to split the input space into separate regions according to their associated outputs. Existing classification methods include logistic regression, which models the probability of being classified into one of two regions. However, to make classification predictions we often draw from an independent Bernoulli distribution (0 represents one region and 1 represents the other), meaning that the distance relationship between inputs is lost in the independent draws, which can result in many misclassifications. The first section of this thesis presents a new method for classification, where the model outputs are given distinct classifying labels, which are modelled using a latent Gaussian process. The latent variable is estimated using MCMC sampling, a unique likelihood and distinct prior specifications.
The classifier is then verified by calculating a misclassification rate across the input space. By modelling the labels using a latent GP, the major problems associated with logistic regression are avoided. The novel method is applied to a range of examples, including a motivating example which models the hormones associated with the reproductive system in mammals. The two labelled outputs are high and low rates of reproduction. The remainder of this thesis develops a correlated Bernoulli process to solve the independent drawing problems found when using logistic regression. When simulating chains or fields of 0’s and 1’s, it is hard to control the ‘stickiness’ of like symbols. Presented here is a novel approach for a correlated Bernoulli process to create chains of 0’s and 1’s in which like symbols cluster together. The structure is taken from de Bruijn graphs: directed graphs where, given a set of symbols, V, and a ‘word’ length, m, the nodes of the graph consist of all possible sequences of symbols from V of length m. De Bruijn graphs are a generalisation of Markov chains, where the ‘word’ length controls the number of states that each individual state is dependent on. This increases correlation over a wider area. A de Bruijn process is defined along with run length properties and inference. Ways of expanding this process to higher dimensions are also presented.
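
As an illustration of the de Bruijn graph structure described above, the following minimal Python sketch builds the graph for the symbol set V = {0, 1} and a word length m, then simulates a binary chain whose next symbol depends on the current length-m word. It is not the thesis's implementation: the function names `de_bruijn_graph` and `simulate_chain` and the transition probabilities in `p_one` are hypothetical choices made purely to show how a longer word can make like symbols cluster.

```python
import itertools
import random


def de_bruijn_graph(symbols, m):
    """Map each length-m word over `symbols` to its successor words.

    An edge goes from word w to w[1:] + s for every symbol s, i.e. the
    word shifts left by one position and a new symbol is appended.
    """
    nodes = ["".join(w) for w in itertools.product(symbols, repeat=m)]
    return {w: [w[1:] + s for s in symbols] for w in nodes}


def simulate_chain(p_one, m, n, seed=0):
    """Simulate n binary symbols; p_one[word] is P(next symbol is '1')."""
    rng = random.Random(seed)
    # Start from a uniformly random word of length m.
    word = "".join(rng.choice("01") for _ in range(m))
    chain = list(word)
    while len(chain) < n:
        s = "1" if rng.random() < p_one[word] else "0"
        chain.append(s)
        word = word[1:] + s  # move along an edge of the de Bruijn graph
    return "".join(chain)


m = 2
graph = de_bruijn_graph("01", m)

# Hypothetical 'sticky' probabilities (an assumption for illustration):
# the more 1s in the current word, the more likely the next symbol is 1.
p_one = {w: 0.1 + 0.8 * w.count("1") / m for w in graph}

chain = simulate_chain(p_one, m, 50)
print(graph)
print(chain)
```

With these assumed probabilities, words made entirely of 0’s or entirely of 1’s strongly favour repeating the same symbol, so the simulated chain tends to show longer runs than independent Bernoulli draws would.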

Doctoral Theses

Doctoral College