Modelling mould growth in domestic environments using relative humidity and temperature

Damp and high levels of relative humidity (RH), typically above 70–80%, are known to provide mould-favourable conditions. Exposure to indoor mould contamination has been associated with an increased risk of developing and/or exacerbating a range of allergic and non-allergic diseases. The VTT model is a mathematical model of indoor mould growth that was developed based on surface readings of RH and temperature on wood in a controlled laboratory chamber. The model provides a mould index based on the environmental readings. We test the generalisability of this laboratory-based model to less-controlled domestic environments across different values of model parameters. Mould indices were generated using objective measurements of RH and temperature in the air, taken from sensors in a domestic setting every 3–5 min over 1 year in the living room and bedroom across 219 homes. Mould indices were assessed against self-reports from occupants regarding the presence of visible mould growth and mouldy odour in the home. Logistic regression provided evidence for relationships between mould indices and occupant responses. Mould indices were most successful at predicting occupant responses when the model parameters encouraged higher vulnerability to mould growth compared with the original VTT model. A lower critical RH level, above which mould grows, a higher sensitivity, and larger increases in the mould index all consistently increased performance. Using moment-to-moment time-series data for temperature and RH, the model and its developments could help inform smart monitoring or control of RH, for example to counter risks associated with reduced ventilation in energy efficient homes.


Introduction
Eighty million people in Europe live in dwellings with indoor mould contamination [1], with 15% of homes in the UK affected [2]. In the UK, around 31% of homes have been found to have humidity problems (48% of which do not use air conditioning, fans or a dehumidifier) [3,4], which increases the risk of condensation. Furthermore, 23% of UK homes have been found to have visible mould and 13% a mouldy odour [3].
Mould spores are ubiquitous in outdoor environments [5–7]. Outdoor concentrations of mould spores vary seasonally and can influence the indoor environment [8]. The presence of indoor dampness (caused by water ingress, rising damp and condensation) can lead to increased mould contamination [9]. The extent of indoor dampness and resultant mould contamination increases in homes that are suffering fuel poverty [10]. Homes that receive interventions to alleviate the risk of fuel poverty (energy efficiency measures) can also suffer from condensation and mould contamination, unless there is adequate ventilation and heating [11,12].
All of these influences are important to consider because the presence of mould or a mouldy odour has an adverse effect on health for a wide range of conditions [13,14], including the development and/or exacerbation of a range of allergic and non-allergic diseases [1,15–21]. The resultant impact on health depends on the timing and extent of exposure to a range of physical, chemical and biological agents [9,22] and the variable risk of allergic diseases throughout the life course [23].
Mould growth, including its impact on health, is further modified by a complex interaction between diverse built environment and resident characteristics. These include interactions between build age, build type, architectural design, building materials, geographic location and resident behaviours such as variable heating, ventilation and maintenance patterns [9].
Mould growth is dependent on the material and species [7,24–27], modified by duration of exposure to humidity [28], and some species can survive in very dry environments [29]. However, the optimum fungal growth conditions require high levels of relative humidity (RH), typically above 70–80%. Consequently, mould prediction models are influenced by a range of factors such as fungal diffusion, fungal production and available nutrients [30]. However, to our knowledge, all models incorporate levels of RH and temperature.
The VTT model of mould growth is a deterministic dynamic mathematical model that predicts mould growth from surface RH and temperature [31,32]. The model is reported to be one of the most used (e.g., Refs. [33–35]). For example, it has been used to test different methods of insulation in a heritage building [36] and to show no increased risk of mould growth in a house designed to have low energy usage [37].
Other commonly used models of mould growth, based on RH and temperature, include the use of isopleth curves, which separate favourable and unfavourable RH and temperature steady-state conditions for mould growth [33,38,39]. Isopleth systems are often used in conjunction with different mould types and building materials to provide biohygrothermal models, which can also account for fluctuation in the conditions over time by taking into account the effect of current conditions on mould spores [5,40–43]. They have, for example, been applied to the study of mould growth on different building facades [44].
Comparisons of such models with the VTT model show some differences in mould growth predictions, in part due to different behaviours under the same starting conditions, and under conditions that fluctuate or are unfavourable for mould growth, but there is also a strong positive relationship between the outputs for the two types of models [42,45,46]. Specifically, the VTT model allows a decline in the mould level under unfavourable conditions, has a smaller growth rate when there is a low mould level, includes a maximum mould level, and to some extent takes into account the duration of mould-favourable conditions [46–48]. More generally, differences between mould growth models are attributable to the complexity of the mould germination and growth processes combined with the assumptions and simplifications in each type of model [33]. These studies and others [e.g., 35] suggest that unreliability under fluctuating conditions is due to limited experimental data when conditions are unfavourable for mould growth, and that more measurement research is required.
Models of mould are generally based on specific surface conditions, such as RH, temperature and material, because mould growth depends on these conditions at the location of the growth. These surface conditions can be affected by building construction, thermal conductance, temperature differences across the surface, heating and ventilation levels, and occupant behaviours [9,43]. The surface conditions used to develop and define models need to be specified within a certain set of parameters and ranges of values. However, these settings may not always capture the range of conditions found in a less controlled domestic setting, with noise introduced by influences of occupant behaviour, external conditions and building type.
The purpose of the current study was to test whether RH and temperature under these changeable domestic environments can be used to predict observed mould presence. The ambient measurements we use were taken from indoor sensors to capture the influences of build type, external conditions and occupant behaviours on RH and temperature. If the model is successful in predicting mould presence in the home, it would provide a basis for application to domestic monitoring in real time, to allow interactive control of the current ambient conditions.
Being the first study to model ambient conditions, we focus on one model in order to test the feasibility of generalisation to air measurements. We chose to use the VTT model of mould growth because it is suitable for time-series RH and temperature input data. It captures the influence of historical conditions on the current mould level prediction.
It also has flexibility in the rate of change for different mould levels and different durations of unfavourable growth conditions, as well as allowing for decline in growth. The previous work that has compared results from the VTT and other types of model, summarised earlier, showed broadly similar predictions [42,45,46]. It also provides a dynamic model that would allow the future development of real-time smart control of domestic air conditions.
Air readings of RH have previously resulted in underestimates of the mould level from the VTT model, partly due to practical limits in obtaining an equilibrium between air and surface RH within the test chamber, especially when conditions are fluctuating [33]. In a domestic setting, surface readings of RH that result in mould growth (model default of 80%) often arise from condensation or water intake [9]. Levels of 80% and above can be sustained on a surface by, for example, the presence of standing water, although different types of surface material will retain varying levels of RH [30]. It seems less likely that such high moisture levels would be sustained in air in a domestic environment. Therefore, we speculate that the original VTT model is likely to be less sensitive to air RH than to the surface RH for which it was developed.
We test the generalisability of the VTT model to heterogeneous time-series measurements from domestic ambient environments. Overall, we wished to identify parameter values for the mould model that provide the best performance in predicting observed mould in the home, by comparing performance across different sets of parameter values.
This unique study utilises objective sensor data from one of the largest projects of its type. To our knowledge, the current study is the first to use time-series sensor measurements of domestic air temperature and RH across a large sample size and a long duration with an application to mould contamination. Most previous studies examining indoor domestic temperature or humidity have a smaller sample size [49] or a shorter monitoring duration than the current study [50–52], or both [39]. We are aware of only one study of a similar size, which logged temperature, although not RH, every 20 min over one year in more than 600 homes, to investigate factors associated with unhealthy temperatures [53].
In the next section we present the details of the VTT model. Section 3 provides the study background and describes the data. The methods, including model parameter values and analysis methods, are presented in Section 4. Section 5 presents the results and evaluation of the model performance. In Section 6 we discuss the findings and implications, and present the limitations together with suggestions for future work and application.

The VTT model
The VTT model was developed using surface readings of RH and temperature on wood in a controlled laboratory chamber [54,55]. The model calculates a mould index, on a scale of 0–6, that represents different levels of growth from no growth to visually detected coverage of 100%. It is described by Equations (1) to (6) [31,32].

T. Menneer et al.

In these equations, dM/dt is the rate of change of the mould index, M = M(t), at each time-point t, per 24 h; T = T(t) is the temperature at time t; RH = RH(t) is the relative humidity at time t; RH_{>20} is a constant of 80% or 85% depending on the sensitivity level (see Table 1); p_T = 0.68, p_RH = 13.9 and p_C = 66.02; W and SQ are additional design parameters representing the wood type (0 = pine, 1 = spruce) and the wood surface quality (0 = sawn surface, 1 = kiln-dried quality); C_decline is a constant that adjusts the rate of decline of M when RH is below the critical level; and k_1 and k_2 moderate dM/dt depending on M and the maximum mould level, M_max, with their parameters k_11, k_12, A, B and C determined by the sensitivity level (Table 1; see footnote 1).

The direction of change depends on the current RH, RH(t). If RH(t) is equal to or greater than the critical RH value, RH_crit(t), then dM/dt is positive; otherwise dM/dt is zero or negative. The critical relative humidity at time t, RH_crit = RH_crit(t), is determined by Equation (1). When the temperature is above 20 °C, RH_crit takes the constant value RH_{>20}.

When RH(t) ≥ RH_crit(t), dM/dt is positive and depends on the current RH, the current temperature, the wood type and the surface quality, as determined by Equation (2). The values in this equation are from the original VTT model [31], and represent the change in M across 24 h. The values were determined from regression equations predicting the time for mould growth to develop and to become visible under different RH levels and temperatures [54,55].
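The switching rule described above can be illustrated with a short Python sketch. It is not the authors' implementation: the cubic used below 20 °C takes the coefficients commonly quoted for the VTT model in the literature, which are assumed here to correspond to Equation (1), and the function names `rh_crit` and `is_growing` are hypothetical.

```python
def rh_crit(temp_c: float, rh_above_20: float = 80.0) -> float:
    """Critical RH (%) at or above which the mould index can grow."""
    if temp_c > 20.0:
        # Above 20 °C the critical level is the constant RH_{>20} (80% or 85%).
        return rh_above_20
    # ASSUMPTION: cubic coefficients as commonly quoted for the VTT model.
    return -0.00267 * temp_c**3 + 0.160 * temp_c**2 - 3.13 * temp_c + 100.0


def is_growing(rh: float, temp_c: float) -> bool:
    """True when RH(t) >= RH_crit(t), i.e. when dM/dt is positive."""
    return rh >= rh_crit(temp_c)
```

Under these assumed coefficients the critical level falls from 100% at 0 °C to approximately 80% at 20 °C, so colder conditions require a higher RH before any growth is predicted.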
When RH(t) < RH_crit(t), dM/dt is zero or negative, according to Equation (3), although the process of decline is dependent on the type of surface. For wood surfaces, and in the original VTT model, the decline rate depends on the length of time for which RH < RH_crit. If RH < RH_crit holds for up to 6 h then M declines at the default rate, between 6 and 24 h there is no decline, and for more than 24 h M declines at half the default rate. These timespans were determined in laboratory-based experiments [55] in order to account for a delay in mould growth after a period of unfavourable conditions [56]. For non-wood surfaces, M declines at a constant rate [32], which is set to the first setting for wood surfaces for the current study (−0.032 per 24 h). The rate of decline is multiplied by a constant, C_decline, that represents the intensity of the decline on different materials, and ranges from 0.1 to 1 [32,57,58].
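The decline rule can be written as a small piecewise function. The following Python sketch is illustrative only; the function name, the argument names and the handling of the exact 6 h and 24 h boundaries are assumptions, while the rates are those stated in the text.

```python
DEFAULT_DECLINE = -0.032  # change in M per 24 h, as stated in the text


def decline_rate(hours_below_crit: float, c_decline: float = 1.0,
                 wood: bool = True) -> float:
    """dM/dt (per 24 h) while RH has been below RH_crit for the given time."""
    if not wood:
        # Non-wood surfaces: constant decline at the first wood setting.
        rate = DEFAULT_DECLINE
    elif hours_below_crit <= 6:
        rate = DEFAULT_DECLINE        # first 6 h: default rate
    elif hours_below_crit <= 24:
        rate = 0.0                    # between 6 and 24 h: no decline
    else:
        rate = DEFAULT_DECLINE / 2    # beyond 24 h: half the default rate
    return c_decline * rate
```

For example, `decline_rate(48, c_decline=0.1)` gives a decline of 0.0016 per 24 h, one tenth of the rate obtained with `c_decline=1.0`.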
The model also includes a sensitivity parameter, which can be set to one of four levels from 'resistant to mould growth' to 'very sensitive' [Viitanen, Ojanen, et al., 2011]. The sensitivity level determines k_1, which moderates dM/dt depending on the current value of M (Equation (4)). k_2 moderates dM/dt according to the current value of M relative to the maximum level of M, M_max, in order to prevent M exceeding M_max (Equation (5)). M_max is determined by Equation (6), using the A, B and C values for the sensitivity level.
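As a hedged illustration of how k_2 and M_max limit growth, the sketch below uses the functional forms commonly published for the VTT model, which are assumed here to correspond to Equations (5) and (6). The constants A = 1, B = 7 and C = 2 are the values usually quoted for the 'very sensitive' class and are likewise an assumption, because Table 1 is not reproduced in this text. Function names are hypothetical.

```python
import math

# ASSUMPTION: constants usually quoted for the 'very sensitive' class.
A, B, C = 1.0, 7.0, 2.0


def m_max(rh: float, rh_crit: float, a: float = A, b: float = B,
          c: float = C) -> float:
    """Maximum mould index attainable at the current RH (assumed form)."""
    ratio = (rh_crit - rh) / (rh_crit - 100.0)
    return a + b * ratio - c * ratio**2


def k2(m: float, m_maximum: float) -> float:
    """Growth moderator that shrinks to zero as M approaches M_max."""
    return max(1.0 - math.exp(2.3 * (m - m_maximum)), 0.0)
```

With these assumed constants, M_max is 6 at 100% RH and 1 when RH equals RH_crit, and k_2 is close to 1 for small M and exactly 0 once M reaches M_max.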
Berger et al. [34] showed that the sensitivity parameters in the VTT model should be identifiable by minimising the error between observed and predicted mould indices given known RH and temperature levels. However, in aiming to identify the model sensitivity parameter values using empirical data from a new material (bamboo fibreboard), they concluded that the model parameter values need to be expanded to include a higher level of sensitivity for generalisation to new materials. They also revealed large differences in results in response to small changes in the constant term (i.e., p_C = 66.02 in Equation (2)). They proposed a simplified model, in which the number of values required to prevent M exceeding the maximum is reduced, and the constant term in the model differential equation is removed. The adjusted model provided a better fit to the experimental data than the original VTT model. However, it was sensitive to the initial start value of M, and the authors note that further experimentation is required to establish an appropriate range of parameter values under different RH and temperature conditions.

Data
This novel study utilises a unique set of validated cross-sectional survey and time-series sensor data from over 300 homes recruited into the Smartline project [60–65]. The homes are owned and managed by Coastline Housing, a medium-sized not-for-profit housing association located in Cornwall, South West of England. Ethics approval was granted by the University of Exeter's Research Ethics Committee.
Table 1
Constants for each sensitivity level [32].

Footnote 1: …, which takes into account differences in time to reach different levels of mould for pine versus other materials [32], so is not applicable to the current study in which material is unknown. In other versions k_12 is a constant [34], as in Table 1.

Surveys were conducted with 330 participants, face-to-face in their homes, and included a range of topics such as resident behaviours, and health and wellbeing. For the current study, we used the self-report regarding the presence of mould and a mouldy odour in the home, which has been used in prior studies in a social housing setting [10,66]. The survey questions were "Does your home have visible mould patches?" and "Has your home suffered from a mouldy/musty odour in last 12 months?", each with response options of "Yes", "No" or "Not answered". Sensors in 280 Smartline homes, in the living room and main bedroom, provide objective measurements of ambient RH and temperature, with a maximum frequency of every 3 min, from October 2017 to August 2022. Sensors were installed by the Blue Flame company [67], and were ISL 067 radio ultra-RF (reference: QC0160) manufactured by Invisible Systems Limited [68], with an accuracy of ±0.5 °C and ±0.7% RH.
Fig. 2 presents the sensor data from the bedroom in one home. In the first set of models (using the first parameter space, P1, described below), the date range was limited to 21st August 2018 to 6th December 2018. This range was chosen in order to maximise the data we had available at the time of model implementation. For the second set of models (using P2), the date range was 1st March 2018 to 28th February 2019 to capture a full year.
Given some variation in the time interval between measurements and some periods of missing data due to interruptions to sensor power, data were linearly interpolated to every minute. Mean readings were then calculated over regular intervals of 5 min in order to capture the resolution and detail available in the data. Data at 5-min intervals require sensor and processing costs that are potentially beyond future practical real-time monitoring applications of the model. We therefore also included a resampling interval of 60 min to test whether the higher resolution is necessary or whether both intervals perform equally well.
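The two preprocessing steps (interpolation to one-minute values followed by block means) can be sketched in plain Python as follows. This is a minimal illustration rather than the processing pipeline used in the study; the function names are hypothetical and timestamps are assumed to be expressed in minutes, in increasing order.

```python
import math


def interpolate_to_minutes(times, values):
    """Linearly interpolate irregular readings onto whole minutes.

    times: increasing timestamps in minutes; values: readings at those times.
    """
    first, last = math.ceil(times[0]), math.floor(times[-1])
    out, j = [], 0
    for t in range(first, last + 1):
        # Advance to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        frac = (t - times[j]) / (times[j + 1] - times[j])
        out.append(values[j] + frac * (values[j + 1] - values[j]))
    return out


def block_means(series, width):
    """Mean of each consecutive, non-overlapping block of `width` values."""
    return [sum(series[i:i + width]) / width
            for i in range(0, len(series) - width + 1, width)]


rh_by_minute = interpolate_to_minutes([0, 3, 7, 10], [50.0, 53.0, 57.0, 60.0])
rh_5min = block_means(rh_by_minute, 5)    # 5-min resampling
```

Calling `block_means(rh_by_minute, 60)` on a longer series would give the 60-min resampling that is compared with the 5-min interval.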

Overview
The problem can be framed as an inverse problem in which we wish to determine VTT model parameter values that map from the time-series RH and temperature sensor data to the observed survey response about mould or a mouldy odour. The process is summarised in Fig. 3. The input data are time-series RH and temperature sensor data and the observed data are the survey responses, taken at a single time-point. The error to be minimised is between the predicted presence of mould or mould odour and the observed responses.
Outputs from the mould model comprise a time-series of mould indices, one for each time-point in the input data. The mean mould index was calculated to provide an overall single mould level for comparison with the binary observed response. The strength of the relationship between the overall mould level and the observed response was assessed with regression analysis. Measures of accuracy were assessed for parameter value sets that gave rise to a significant relationship.
The following sections detail the mould model parameters and outputs, analyses, and measures of accuracy performance.

Model parameter values
Model parameter values were manipulated in two stages in order to search a feasible number of combinations at each stage. In the first parameter space, P1, we searched a wide grid of parameter values, and used the findings to inform finer-grained selected ranges for the second parameter space, P2. Values for P1 and P2 are presented in Table 2. In addition to capturing a reasonable range, values were chosen for reasons specific to each parameter, described in the subsections below. When using P2 we aimed to produce a stable mould level (details are provided in Section 4.3). We therefore focussed on parameters that affect the model's sensitivity to RH and the nature of the change in M, and did not manipulate those parameters that could be considered growth rate moderators.
As reasoned in the Introduction, we expected the VTT model with original parameter values to be less sensitive to air RH than to the surface RH for which it was developed. Accordingly, we chose to manipulate parameters in such a way as to increase the model's sensitivity to RH levels, and encourage increases in M.

Starting value for M
In the absence of other information, the starting value of the mould index, M, within each home was assumed to be zero.

Sensitivity
A new sensitivity level was introduced that comprised values twice those of the highest existing level in the original model, in line with previous research reviewed earlier [34]. Sensitivity levels therefore included 'Very' and 'VeryX2'.

Default RH_crit
A range of values from 40% to 80% was used for the default RH_crit, which is 80% in the original model. We chose these lower and upper limits to reflect the possible range of RH values within the cohort of homes measured. Equation (1) was adjusted such that the function dropped from 100% to the appropriate default RH_crit value, rather than to 80%.
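The exact form of the adjustment to Equation (1) is not given here. One possible way to obtain a curve that drops from 100% to a chosen default is to rescale the original cubic, as in the Python sketch below. Both the rescaling and the cubic coefficients (those commonly quoted for the VTT model) are assumptions made for illustration, and the function names are hypothetical.

```python
def vtt_polynomial(temp_c: float) -> float:
    # ASSUMPTION: cubic commonly quoted for the VTT critical RH below 20 °C.
    return -0.00267 * temp_c**3 + 0.160 * temp_c**2 - 3.13 * temp_c + 100.0


def adjusted_rh_crit(temp_c: float, default_rh_crit: float) -> float:
    """Critical RH that falls from 100% at 0 °C to the chosen default at 20 °C.

    ASSUMPTION: linear rescaling of the drop in the original curve; the
    adjustment used in the study may differ.
    """
    if temp_c > 20.0:
        return default_rh_crit
    drop_at_t = 100.0 - vtt_polynomial(temp_c)
    full_drop = 100.0 - vtt_polynomial(20.0)
    return 100.0 - (100.0 - default_rh_crit) * drop_at_t / full_drop
```

For a default of 60%, this sketch returns 100% at 0 °C and 60% at and above 20 °C, keeping the shape of the original curve in between.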

Coefficients for T and RH, and the constant
From Equation (2), the values p_T, p_RH and p_C were manipulated. (Values in the original publication of the VTT model were 0.68, 13.9 and 66.02 respectively [31].) The values used were the original values, and the original values plus and minus half of the original value, to provide three levels for p_T, p_RH and p_C. We chose these values to test the original values against a 50% change in each direction, which seemed sufficient to capture a wide range of values but not so extreme as to be departing from the original model.
It is important to note that the three values were chosen to be of the same level (low, medium or high) within the same model. We therefore avoided the very small or very large changes in M that can occur when the parameters are from different levels, as shown in Fig. 4, and which are not particularly meaningful. Changes in M are of a reasonable magnitude when the three parameters are at the same level (e.g., 0.0851 when low, and 0.0006 when high).
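The magnitudes quoted above can be checked with a short calculation. The Python sketch below assumes that the growth term has the exponential form usually given for the VTT model, with W = SQ = 0 and k_1 = k_2 = 1, and with no additional time-scaling factor; these are assumptions made for illustration, and the names are hypothetical. At 20 °C and 70% RH, this form gives approximately 0.0851, 0.0072 and 0.0006 for the low, medium and high levels, matching the values quoted in the text for Fig. 4.

```python
import math

LEVEL_SCALE = {"low": 0.5, "medium": 1.0, "high": 1.5}


def growth_rate(temp_c: float, rh: float, scale: float = 1.0) -> float:
    """dM/dt per 24 h with all three coefficients at the same level.

    ASSUMPTION: exponential VTT growth term with W = SQ = 0, k_1 = k_2 = 1.
    """
    p_t, p_rh, p_c = (scale * v for v in (0.68, 13.9, 66.02))
    exponent = -p_t * math.log(temp_c) - p_rh * math.log(rh) + p_c
    return 1.0 / math.exp(exponent)


rates = {level: growth_rate(20.0, 70.0, s) for level, s in LEVEL_SCALE.items()}
```

The low level therefore increases M roughly twelve times faster than the original (medium) coefficients under these conditions.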

Decline method and C_decline
Both decline methods described in Section 2 (wood and non-wood) were included. C_decline ranges from 0.1 to 1 in the original VTT model. These extremes were used to capture the largest difference between values, with the aim of observing the largest effect of the manipulation.

Mould index stabilisation
For P1, the sensor data were presented to the model once. With P2, we used a full year of input data, which captures seasonal fluctuations and allows a representation of the internal conditions of the property to build up over time. We re-presented the RH and temperature sensor data to the model until convergence on a stable mould index (M) was achieved. Convergence was tested by comparing the dM/dt values for the entire cycle with zero using a one-sample t-test. The model was terminated when p > 0.05, or if convergence was not achieved by the 30th cycle.
Convergence would be expected when RH fluctuates above and below RH_crit, causing increase and decline in M. Convergence is not driven by the input sensor data because they are the same in each cycle. Instead, convergence is driven by the value of M. As M increases, k_1 increases when M ≥ 1, and k_2 decreases as M approaches M_max. The decrease in k_2 allows M to stabilise rather than continuing to increase. However, there will be circumstances under which M will not reach stability, for example if the RH is always above RH_crit.
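The stopping rule can be sketched as follows. This is an illustrative Python outline, not the authors' code: `run_cycle` is a hypothetical callable that runs the model over one full cycle of sensor data and returns the final M together with the list of dM/dt values, and the p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only for the large number of time-points in a cycle.

```python
import math
from statistics import NormalDist, mean, stdev


def converged(dm_dt, alpha=0.05):
    """One-sample test of mean(dM/dt) against zero; True when p > alpha."""
    spread = stdev(dm_dt)
    if spread == 0.0:
        return mean(dm_dt) == 0.0
    t_stat = mean(dm_dt) / (spread / math.sqrt(len(dm_dt)))
    # ASSUMPTION: normal approximation to the t distribution (large N).
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return p_value > alpha


def run_until_stable(run_cycle, m_start=0.0, max_cycles=30):
    """Re-present the same input cycle until dM/dt is centred on zero."""
    m = m_start
    for cycle in range(1, max_cycles + 1):
        m, dm_dt = run_cycle(m)
        if converged(dm_dt):
            return m, cycle, True
    return m, max_cycles, False
```

A cycle in which RH is always above RH_crit yields only positive dM/dt values, so the test never passes and the loop stops at the 30th cycle without convergence.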

Model outputs
For each home, the RH and temperature time-series data were input to the model to calculate M(t) for each time-point, t, for the living room and the bedroom. In order to test the relationship between multiple M(t) and the single observed survey response about the presence of mould or odour, we used the mean M(t) to capture a single overall mould level for the home across the date range, M_m = (1/N) Σ_t M(t), where N is the number of time-points (t). For P2, the mean was calculated over the last cycle. M_m was calculated for each home and for each combination of parameter values. Each M_m was divided by the maximum value across all homes for each room and combination of parameter values, in order to use a full range of values and maximise the spread used in the analysis below. M_m values were also scaled from 0 to 6 for consistency with the original VTT model.
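A minimal Python sketch of this summary step is given below. The function names and the dictionary keyed by home are hypothetical, and the handling of the case in which every home has a mean of zero is an assumption.

```python
def mean_index(m_series):
    """M_m: mean of M(t) over the N time-points of one home and room."""
    return sum(m_series) / len(m_series)


def scale_to_vtt_range(mean_by_home, top=6.0):
    """Divide each M_m by the largest value across homes, then scale to 0-6."""
    largest = max(mean_by_home.values())
    if largest == 0.0:
        # ASSUMPTION: if no home shows any growth, all scaled values are zero.
        return {home: 0.0 for home in mean_by_home}
    return {home: top * value / largest for home, value in mean_by_home.items()}


means = {"home_a": mean_index([0.0, 0.5, 1.0]),
         "home_b": mean_index([1.0, 1.0, 1.0]),
         "home_c": mean_index([0.0, 0.0, 0.0])}
scaled = scale_to_vtt_range(means)
```

Because the scaling is relative to the largest M_m within each room and parameter combination, scaled values are comparable across homes within a regression but not across parameter combinations.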

Analyses
Logistic regression was used to test for a relationship between M_m and the observed response about mould or mouldy odour, with M_m as the predictor and the response as the outcome. M_m was a continuous variable (0–6), and the response was a binary variable with "No" coded as 0 and "Yes" as 1.
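For readers who wish to see the form of this analysis, the sketch below fits a single-predictor logistic regression by Newton's method in plain Python. It is illustrative only: the study does not state which software or significance test was used, so the Wald test with a normal reference distribution is an assumption, and the toy data are invented for the example.

```python
import math
from statistics import NormalDist


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton's method."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the information matrix X'WX and its determinant.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # ASSUMPTION: Wald test for the slope; the final Newton step is negligible,
    # so the information matrix from the last iteration is used.
    se_b1 = math.sqrt(h00 / det)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(b1 / se_b1)))
    return b0, b1, p_value


# Invented toy data: scaled M_m (0-6) and responses coded No = 0, Yes = 1.
m_m = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
response = [0, 0, 1, 0, 1, 1]
intercept, slope, p_value = fit_logistic(m_m, response)
```

A positive slope indicates that homes with a higher M_m are more likely to report mould or odour; the fitted probabilities are what the accuracy measures later categorise as Y or N.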
A separate regression was conducted for M_m values calculated from living room and bedroom sensor data, for each survey question (observations of mould and of mouldy odour), and for each combination of parameter levels, resulting in 2 × 2 × 144 = 576 regressions for P1 and 2 × 2 × 36 = 144 regressions for P2.
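The regression counts follow from a full factorial grid, which can be enumerated as in the Python sketch below. Table 2 is not reproduced in this text, so the level lists are assumptions inferred from the surrounding description (in particular, the 5-point steps in the default RH_crit for P2 are assumed); they are chosen to be consistent with the stated totals of 144 and 36 combinations.

```python
from itertools import product

# ASSUMPTION: levels inferred from the text, not copied from Table 2.
P1 = {
    "resample_min": [5, 60],
    "sensitivity": ["Very", "VeryX2"],
    "default_rh_crit": [40, 60, 80],
    "p_level": ["low", "medium", "high"],
    "decline_method": ["W", "NW"],
    "c_decline": [0.1, 1.0],
}
P2 = {
    "resample_min": [5, 60],
    "default_rh_crit": list(range(40, 85, 5)),  # assumed 5-point steps
    "decline_method": ["W", "NW"],
}
ROOMS = ["living room", "bedroom"]
QUESTIONS = ["mould", "odour"]


def combinations(space):
    """Every combination of parameter levels as a dictionary."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*space.values())]


def n_regressions(space):
    """One regression per room, survey question and parameter combination."""
    return len(ROOMS) * len(QUESTIONS) * len(combinations(space))
```

Here `n_regressions(P1)` gives 576 and `n_regressions(P2)` gives 144, in agreement with the counts stated above.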
Homes were excluded from the analyses if any survey responses relating to mould were missing (22 homes of the 330 that responded to the survey; see footnote 2), or if the sensor data did not span the entire date range in both the living room and the bedroom (up to 107 homes of the 280 with sensors installed). There were 213 homes included in each regression for P1, and 158 at most for P2, with a total of 219 households across the two analyses (see footnote 3).

Prediction accuracy measures
Fig. 4. Simulated dM/dt using Equation (2) for different levels (low, medium, high) of p_T, p_RH and p_C, with the other two values held at a constant level (low, medium, high). For example, the fourth set of bars is labelled "low, other values medium" and the black bar represents dM/dt when p_T is low and p_RH and p_C are at the medium level. Temperature is 20 °C and RH is 70% and is assumed to be above RH_crit.

Footnote 2: 16 homes were excluded due to missing responses for mould and odour specifically, and 6 further homes were removed due to missing responses regarding heating or ventilation habits, mould or odour in specific rooms, mould size, or mould affecting health. These latter homes were excluded for consistency in anticipation of future model developments.
Footnote 3: At the time of analyses for P1, sensor data were missing from the system for 6 homes that were recouped for P2 analyses.

We consider accuracy measures for parameter sets that produced a significant relationship (p < 0.05) in the regression between M_m and the observed response. For direct comparison with the observed data, the output from the regression was categorised as Y or N as follows. The regression equation was used to predict the probability of a Y response given M_m. The probability was categorised by a threshold. This threshold was determined by maximising the overall (i.e., balanced) accuracy for the model. We report the true positive rate and the true negative rate, which are combined to give the accuracy balanced across the number of Y and N survey responses. We also report the precision, which is the proportion of Y predictions that are true Ys. We use F1 to assess performance, which combines the true positive rate and the precision, thereby accounting for observed Ys that are missed as well as correctly predicted Ys, to provide the overall accuracy of the Y predictions. Tables 3 and 4 present the formulae.
Chance-level balanced accuracy is 0.5, given that the true positive rate (TPR) and true negative rate (TNR) are weighted by the number of Y and N observed responses. Chance level for F1 is not 0.5, because the numbers of Y and N observed responses differ. Chance-level values for F1 were therefore estimated by calculating the measures for the shuffled set of predicted probabilities, and selecting the Y/N threshold that maximised F1. This process was conducted 200 times and the mean F1 was taken as the chance level. The standard deviation of the 200 F1s was used to measure how far the F1 for the regression predictions fell from chance-level F1.
It is worth noting that the chance-level F1 is based on a Y/N threshold to maximise F1, while the F1 reported for the model is based on the Y/N threshold that maximises balanced accuracy, to provide consistent Y/N predictions from the regression across all performance measures.
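The measures and the shuffling procedure can be sketched in Python as follows. The definitions are the standard ones for TPR, TNR, precision and F1, which are assumed to match Tables 3 and 4; the function names, the use of observed probabilities as candidate thresholds, and the random seed are illustrative assumptions.

```python
import random
from statistics import mean, stdev


def scores(y_true, probs, threshold):
    """Accuracy measures after categorising probabilities as Y (1) or N (0)."""
    pred = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for t, q in zip(y_true, pred) if t == 1 and q == 1)
    tn = sum(1 for t, q in zip(y_true, pred) if t == 0 and q == 0)
    fp = sum(1 for t, q in zip(y_true, pred) if t == 0 and q == 1)
    fn = sum(1 for t, q in zip(y_true, pred) if t == 1 and q == 0)
    tpr = tp / (tp + fn)          # requires at least one observed Y
    tnr = tn / (tn + fp)          # requires at least one observed N
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"tpr": tpr, "tnr": tnr, "balanced_accuracy": (tpr + tnr) / 2,
            "precision": precision, "f1": f1}


def best_threshold(y_true, probs, measure):
    """Threshold (taken from the observed probabilities) maximising a measure."""
    return max(sorted(set(probs)),
               key=lambda th: scores(y_true, probs, th)[measure])


def chance_f1(y_true, probs, repeats=200, seed=0):
    """Mean and SD of F1 when probabilities are shuffled across homes."""
    rng = random.Random(seed)
    shuffled = list(probs)
    f1s = []
    for _ in range(repeats):
        rng.shuffle(shuffled)
        th = best_threshold(y_true, shuffled, "f1")
        f1s.append(scores(y_true, shuffled, th)["f1"])
    return mean(f1s), stdev(f1s)
```

The model's F1 would be taken at `best_threshold(y_true, probs, "balanced_accuracy")` and compared with the mean returned by `chance_f1`, expressed in units of the returned standard deviation.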

Household characteristics
Of the 330 Smartline households, 320 responded to each question "Does your home have visible mould patches?" and "Has your home suffered from a mouldy/musty odour in last 12 months?", with 43.8% responding "Yes" for mould and 17.8% for odour. Of the 219 homes that were included in the analyses, and had therefore responded to both questions, 47.8% reported mould and 17.2% reported odour.
These rates of mould and odour are higher than UK rates (23% and 13% respectively). However, the Smartline rates are more similar to national rates when examining only homes rented from the local authority or a housing association (32% and 22% respectively) [4].
Descriptive statistics for other survey responses are provided in Table 5.

Parameter space P1
The p-values from the regression analyses provide a measure of the strength of the relationship between M_m and the observed response. The p-values are provided in Fig. 5, split by different parameter values, by models based on sensor data from the living room (LR) and the bedroom (BR), and by survey responses about mould and odour. Prediction accuracy is considered only for regression models that revealed M_m as a significant predictor of the observed response (p < 0.05).
For the survey response about mould, models based on the living room sensor data provided the strongest predictors (all bedroom-data models p > 0.037). For the survey response about odour, 104 of the 288 models gave p < 0.05. Only models based on bedroom data gave p < 0.01 (living room all p > 0.013). Eighteen models gave the lowest p-value (0.0001), with parameter values presented in Table 6. Table 7 provides the performance measures for the models that resulted in either the smallest p, the highest balanced accuracy, or the highest F1 for each survey response.

Evaluation of P1 and refining parameter values for P2
Fig. 5. P1 regression p-values for the observed response about mould (left panels) and odour (right panels), for M_m modelled from the sensor data in the living room (LR) and the bedroom (BR), and for each parameter (panels A to F). The dashed line is at p = 0.05.

Table 6
Combinations of parameter values in P1 that produced regression models with p = 0.0001, indicated by "Yes", using M_m from bedroom sensor data to predict the response about odour. W represents the Wood decline method and NW the Non-wood method. Only parameter levels that produced a model with p = 0.0001 are included. All are for 5-min and 60-min resampling intervals except those marked with *, which are for 60-min only.

Using the distributions of p-values (Fig. 5), the bedroom-odour regressions with p = 0.0001 (Table 6), and the best-performing regressions (Table 7), we can evaluate the differences between parameter values for each parameter in P1, as well as refine the parameter values for use in P2.
The results for P1 provide little evidence for differences in performance for resampling intervals of 5 min and 60 min, with similar distributions of p-values, few differences for highly significant bedroom-odour regressions, and identical performance for most best-performing regressions. The one difference is that the model using 5-min data provided the highest performance for the bedroom-odour regressions. However, the equivalent 60-min model also provided the lowest p-value and the second highest performance accuracy measures. Both resampling intervals were retained for P2.
Sensitivity is largely a growth rate parameter. Performance tends to be better for the VeryX2 sensitivity than for Very, with slightly lower p-values, more bedroom-odour regressions with the lowest p-value, and more regressions with the highest performance. The only uniquely high performance for the Very sensitivity level was from a regression with near chance-level F1. Only the VeryX2 sensitivity level was used for P2.
Change in RH_crit has one of the most apparent effects on performance. Performance is higher when RH_crit is lower than 80%, but the evidence is split between 40% and 60%. In P2, we therefore included more gradations in the set of RH_crit levels, and retained the 80% value from the original VTT model.
Changes in p_T, p_RH and p_C also produced a notable effect. The medium level produced one regression with high performance for balanced accuracy, but a near chance-level F1. Otherwise, the low level consistently outperformed the medium and high levels. Fig. 4 shows that the low level results in the largest change in M (Fig. 4: low = 0.0851, medium = 0.0072, high = 0.0006). This result from P1 is in line with the earlier argument that model performance benefits from a propensity towards mould growth when using air measurements. For P2 we retained only the low level.
The decline method showed slightly lower distributions of p-values for Wood than for Non-wood. Regression performance was similar for Wood and Non-wood, except that Non-wood produced three more regressions with best performance, although one gave near chance-level F1. There is therefore mixed evidence from P1 and both decline methods were retained for P2.
For C_decline, the evidence is again mixed, with higher performance from regressions using a C_decline of 1, but the distributions of p-values show a tendency for stronger relationships when C_decline is 0.1 than when it is 1. Like sensitivity, C_decline can be considered a growth rate moderator, which is not the focus in P2 given that cycles are repeated to allow M to stabilise. We therefore decided to choose only one C_decline level. Compared with the Very sensitivity level, the VeryX2 level causes a faster increase in M, and gave better performance for prediction in P1. Given that a larger C_decline would counter this increased rate, we chose to use the low C_decline value of 0.1.

Parameter space P2
Numbers of models that did not converge after 30 cycles of the input data are given in Table 8. More models failed to converge for lower default RH_crit values than for higher values, which is perhaps not surprising: RH exceeds RH_crit more frequently, so M continues to increase rather than stabilise.
More models failed to converge for the data resampled at the 5-min interval than at the 60-min interval, because the larger number of dM/dt values gives the convergence t-test greater power for the 5-min data. When power was equated by sampling every 12th dM/dt value for the 5-min interval data, the numbers of models that failed to converge were similar across the two resampling intervals. Regression analyses conducted with this equated convergence criterion revealed parameters for the best performance that are similar to those reported below, but some regressions no longer reached significance for prediction of the mould response, and performance measures were worse than those reported in almost all cases. Given the better performance, we therefore report the results for the models that converged based on all dM/dt values in the cycle. M_m values from the models that did not converge were not included in the regression analysis. Given the high numbers of models that failed to converge for the 5-min resampling interval combined with RH_crit of 40% and 45%, these parameter values were not considered when assessing performance.
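The effect of sample size on the convergence check can be illustrated with a short sketch. It assumes a one-sample test of whether the mean of the per-step changes dM/dt within a cycle differs from zero, and it substitutes a large-sample normal approximation for the t distribution. The function names, the 0.05 level and the synthetic data are hypothetical and are not taken from the study.

```python
import math
from statistics import NormalDist, mean, stdev


def changes_differ_from_zero(dm_dt, alpha=0.05):
    """Two-sided one-sample test of mean(dm_dt) == 0.

    Uses a normal approximation to the t distribution, which is only
    reasonable for large samples (an assumption of this sketch).
    Returns True when the mean change is significantly non-zero,
    i.e. the model has not yet settled on a stable M.
    """
    n = len(dm_dt)
    sd = stdev(dm_dt)
    if sd == 0:
        return mean(dm_dt) != 0
    t_stat = mean(dm_dt) / (sd / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return p_value < alpha


def equate_power(dm_dt_5min, step=12):
    """Keep every 12th 5-min value so the sample size matches hourly data."""
    return dm_dt_5min[::step]


# Synthetic year of 5-min changes (365 * 288 values): a tiny positive drift
# of 0.001 plus a +/-0.05 fluctuation that flips sign every 12 samples.
full_year = [0.001 + (0.05 if (i // 12) % 2 == 0 else -0.05)
             for i in range(365 * 288)]

not_converged_all = changes_differ_from_zero(full_year)
not_converged_equated = changes_differ_from_zero(equate_power(full_year))
```

With the same underlying drift, the test on all values flags the cycle as not converged, whereas the test on every 12th value does not, which mirrors the power argument made above.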
The regression p-values are provided in Fig. 6. As for P1, only regression models that revealed M_m as a significant predictor of the survey response (p < 0.05) were considered further in terms of examining prediction accuracy.
For the mould response, M_m is a stronger predictor when generated from the living room data (panel A, Fig. 6) than when generated from the bedroom data (panel B, Fig. 6). Both panels show the strongest relationships with the default RH_crit of 50% and a resampling interval of 5 min. For the living room, M_m is also a significant predictor when the default RH_crit is 60%.
For the odour response, M_m is a stronger predictor when generated from the bedroom data (panel D, Fig. 6) than when generated from the living room data (panel C, Fig. 6). For sensor data from the living room, panel C shows that the strongest relationships occur when the default RH_crit is 50% or 70% and when the resampling interval is 5 min. For the bedroom data, M_m is a significant predictor (p < 0.05) for default RH_crit values from 45% to 75%, with the strongest relationships (p = 0.003) occurring for 70%.
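As an illustration of the screening step, the following sketch fits a single-predictor logistic regression of a binary survey response on a mould level and returns a Wald p-value for the slope. It is a minimal standard-library example, not the software used in the study; the function name and the data are hypothetical, and it assumes the two response classes overlap so that the fit is finite.

```python
import math
from statistics import NormalDist


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson.

    Returns (b0, b1, p_value), where p_value is the two-sided Wald
    test of b1 == 0. Assumes no perfect separation of the classes.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) @ gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)  # slope variance from the inverse information
    z = b1 / se_b1
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return b0, b1, p_value


# Hypothetical mould levels for ten homes and their Y/N (1/0) responses.
mould_level = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
response = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
intercept, slope, p_value = fit_logistic(mould_level, response)
keep_model = p_value < 0.05  # screening rule applied before assessing accuracy
```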
Table 9 provides the performance measures for the models that resulted in either smallest p, highest balanced accuracy, or highest F1, for each survey response.

Reliability of P2 parameter values
To test the reliability of the parameter values revealed by the main P2 analyses (above), we repeated the analyses using M_m calculated from M values in the last day only (28th February 2019) and from the final cycle of the RH and temperature input data.
Table 10 provides the models that produced the lowest p-value, the highest balanced accuracy or the highest F1. The parameter values for these models are the same as those for the main analysis of P2, except that one model produced the joint-lowest p-value in the main analyses, but produced the second-lowest value here. Balanced accuracy and F1 values were very similar to those from the main analysis.
This analysis used a subset of the data used in the main analyses. Future reliability testing is therefore recommended with new datasets, as discussed in Section 6.

Evaluation of P2
For P2, sensitivity, p_RH, p_T, p_C and C_decline were held constant (see Table 2). Results for the manipulated parameters are evaluated using the p-values (Fig. 6) and best-performing regressions (Table 9), also considering the number of models that failed to converge on a stable value for M (Table 8).
Despite resulting in more models that did not converge, a resampling interval of 5 min generally gave lower p-values and more best-performing regressions than 60 min. These results could suggest that differences between 5-min and 60-min performance are, at least in part, due to more cycles being completed by the models using 5-min data than 60-min data (mean = 10.1 and 6.2 cycles, respectively). However, in the regressions conducted with the equated convergence criterion, as outlined earlier, the 5-min models still outperformed the 60-min models despite similar numbers of cycles (mean = 6.1 and 6.2 cycles, respectively).
For the default RH_crit, 50% clearly provides most of the strongest relationships and the best prediction performance for both observed responses. 70% also provides the strongest relationship for the response about odour, but the equivalent regression with the 50% model also has a strong relationship and outperforms the 70% regression on every accuracy measure.
There is no consistent evidence from the p-values for differences between the Wood and Non-wood decline methods. However, Non-wood gave fewer models that did not converge and more best-performing models, with the only highlighted Wood model giving a chance-level F1. The decline process has been acknowledged to be a seemingly artificial method to account for the delay in mould growth after unfavourable conditions; rather, the delay should ideally incorporate seasonal changes [46]. The practical effect of the Non-wood over the Wood method is a faster decline, given that the Non-wood rate is set to the fastest Wood rate. This effect is also in line with a future need to test a larger C_decline.

Results summary
Our results show that the most accurate predictions about mould and a mouldy odour were made from mould levels, M_m, generated by a model run on data at 5-min intervals. Best performance was achieved when the model was implemented using: a sensitivity higher than the highest sensitivity in the original VTT model; a default critical RH (RH_crit) of 50%, which is lower than the 80% in the original model; coefficients and a constant in Equation (2) that promoted a greater change in the mould index, M (low p_RH, p_T, p_C); and a consistent decline rate (Non-wood), rather than one that varied based on the time spent below RH_crit (Wood). Further work is required to make a definitive decision about the rate of decline (C_decline), but the lowest value available in the VTT model (0.1) did support successful prediction of survey responses.

Discussion and conclusions
The aim of this study was to expand an existing model of mould growth for generalisation to air readings of RH and temperature from a domestic setting, extending from the controlled laboratory readings on which the model was originally based. Despite the environment being less controlled than the VTT model's original basis, the model can predict mould growth in this setting when the critical RH level is reduced from the 80% default value.

Table 8
The number of homes (out of 158) for which models did not converge for each combination of parameter values in P2, for the living room (first value) and the bedroom (second value). W represents the Wood decline method and NW the Non-wood method.

More specifically, in order to predict mould and a mouldy odour from air measurements of RH and temperature, our findings suggest that parameters should promote vulnerability to mould growth, by increasing sensitivity, lowering the RH threshold at which mould grows, and increasing the change in M. However, there of course needs to remain an opportunity for M to decline, so the RH threshold cannot be so low that RH never falls below it. There is some indication that a faster decline rate could result in better performance, but more research on this parameter's space is required.
The need to increase the propensity for mould growth could be argued to arise from the starting value for M of zero. However, in P2 the repeated presentation of the RH and temperature data to the model allows M to stabilise such that any bias in the starting value would be removed.
Two possible reasons follow as to why increasing the model's propensity to predict mould improves performance. Firstly, a lower critical RH level allows mould to be predicted even if RH in the air is lower than that on a surface. As reasoned in the introduction, 80% RH and above seems more likely to be sustainable on a surface than in the air in a domestic setting. Secondly, changes in RH can affect different mould species in different ways [7], so it may be that the reported mould in our participants' homes was of a species that grows at a lower RH than is typically required.
Models using parameter space P2 were based on a full year of data and had the opportunity to converge on a stable M. The highest balanced accuracy and F1 were above chance level at 0.708 and 0.694 respectively. However, there was some indication from P1 models that a higher balanced accuracy might be achieved by using a higher rate of decline.
Precision is low for the models that predict the survey response about odour, with under half of the predicted positives being true positives. The threshold for the Y/N response was chosen to maximise the balanced accuracy (the mean of the true positive and true negative rates, TPR and TNR), rather than the precision. Given that 17.2% of survey responses included in the analyses were positive regarding a mouldy odour, the balanced accuracy will increase more by correctly identifying a positive than a negative response, because the denominator for the TPR is smaller than for the TNR. The models are therefore liberal towards positive responses, as also reflected in the TPRs and TNRs, which results in reduced precision.
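The trade-off between balanced accuracy and precision can be seen in a small worked sketch. It assumes the Y/N prediction is obtained by thresholding a predicted probability and that the threshold is chosen from the observed probabilities to maximise balanced accuracy. The function names and the ten-home dataset are hypothetical.

```python
def metrics(y_true, y_pred):
    """Return TPR, TNR, balanced accuracy, precision and F1 for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"tpr": tpr, "tnr": tnr, "balanced_accuracy": (tpr + tnr) / 2,
            "precision": precision, "f1": f1}


def best_threshold(y_true, probs):
    """Pick the observed probability that maximises balanced accuracy."""
    def score(cut):
        return metrics(y_true, [p >= cut for p in probs])["balanced_accuracy"]
    return max(sorted(set(probs)), key=score)


# Hypothetical imbalanced sample: 2 positive responses out of 10 homes.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.3, 0.35, 0.32, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1]

cut = best_threshold(y_true, probs)
chosen = metrics(y_true, [p >= cut for p in probs])
```

In this toy sample the chosen threshold captures both positives (TPR = 1) at the cost of two false positives, so balanced accuracy is 0.875 while precision is only 0.5: each additional true positive moves the TPR by one half, whereas each false positive moves the TNR by only one eighth.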
Prediction of mould is more successful using sensor data from the living room than from the bedroom, while prediction of the mouldy odour is more successful using sensor data from the bedroom. This pattern is consistent with the idea that a mouldy odour is indicative of high levels of mould [70], given it correlates with mould contamination [12] and is a strong predictor of health impacts [9,23,66]. In the current study, the presence or not of a mouldy odour is predicted by air conditions in the bedroom. It seems likely that the bedroom would have less air circulation than the living room, because movement of people would be higher in the living room, and the bedroom may contain larger pieces of furniture, such as the wardrobe and bed, which can prevent air circulating near the walls. Less air circulation would maintain mould-favourable or unfavourable conditions within the fabric of the room, thereby leading to high levels of mould contamination that produce the mouldy odour. The variability of RH in the living room and bedroom does not support this explanation: averaged over all homes, the descriptive statistics were mean = 59.85%, SD = 7.18% for the living room and mean = 59.86%, SD = 8.18% for the bedroom, with a correlation of r = 0.81, p < 0.001. However, our survey responses about mould or odour in specific rooms confirm higher contamination rates in the bedroom (21.6% and 7.5%) than in the living room (11.0% and 4.0%).
The main implications of our findings are for future applications that determine the minimum intervention required to control RH in order to reduce predicted mould levels. In moment-to-moment time-series data, peak changes in the model's mould index, M, can be identified, which would allow targeted reduction of RH at those points. Real-time predictions could enable smart control to provide the intervention necessary to minimise mould growth and in turn reduce its impact on human health. Smart control could also avoid the unintended consequences of reduced ventilation rates for the indoor environment and health [e.g., in some energy efficient homes: [66,71,72]]. Through such minimal intervention, human comfort could be maintained and unnecessary power expenditure avoided [e.g., Ref. [73]]. Smart control systems have the capability to consider a data-driven mould index and human comfort in parallel, hence the intervention is less likely to be countered by behaviours (e.g., a fan being switched off because it is cold or noisy at night; a smart control system would instead take account of the occupants' preferences) [e.g., Ref. [74]]. Smart monitoring by property owners could alleviate costs of repair associated with the presence of mould, and could be extended to monitoring for other damaging conditions such as cold or damp. Such tools would be useful to homeowners, but are likely to be more useful to housing providers who do not reside at the property, so that remote monitoring can be achieved.
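As a minimal sketch of how peak changes in M might be located in a time series, the following function ranks the step-to-step increases in a sequence of mould index values. The function name, the `top_n` parameter and the example series are hypothetical; a deployed system would also need timestamps and a rule for acting on the flagged points.

```python
def peak_changes(m_series, top_n=3):
    """Return (index, change) pairs for the largest positive steps in M.

    The index refers to the sample at which the increase was observed,
    so these are the points where reducing RH would be targeted.
    """
    changes = [(i, m_series[i] - m_series[i - 1])
               for i in range(1, len(m_series))]
    increases = [c for c in changes if c[1] > 0]
    return sorted(increases, key=lambda c: c[1], reverse=True)[:top_n]


# Hypothetical mould index values at consecutive time steps.
m_values = [0.0, 0.0, 0.1, 0.4, 0.45, 0.4, 0.9]
flagged = [index for index, _ in peak_changes(m_values, top_n=2)]
```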
The main strength of this study is the large sample size for a study of this type. The multifaceted and large dataset has allowed for verification of the model's predicted mould level, and exploration of parameter values that achieve useful levels of performance.
More specifically, strengths of our study include the use of a co-designed resident survey, which adopted previous questionnaires on a social housing population [10,66], reporting both the presence of visible mould and a mouldy odour. The study benefited from the use of objective time-series measurements across 280 homes, which have been part of the Smartline project since 2017.
However, there are a number of limitations to consider. These include the potential bias resulting from self-reported mould contamination [14]. However, face-to-face questionnaires should result in less bias than remotely delivered questionnaires [12]. In addition, most studies find agreement between self-report of mould presence and the results of building inspections [75], particularly for the presence of a mould odour [70]. There may also be bias specifically for individuals suffering health issues that are associated with damp and mould, who may be more likely to respond to the mould-related questions, or to report mould or a mould odour [14].

Fig. 7. Mould index (black) and changes in the mould index (grey) from RH and temperature in the bedroom in one home from 1st March 2018 to 28th February 2019. RH_crit was 50%, using the Non-wood decline method, and data were at the 5-min resampling interval. The upper panel shows values from the first presentation of the RH and temperature data, and the lower panel shows the 9th presentation, at which point the changes were not significantly different from zero.
Additionally, the Smartline cohort comprises an older adult population, in a relatively deprived area of England, which could influence heating and ventilation behaviour. There is also a high proportion of flats, which have a higher optimal air exchange rate than detached houses [9,76]. However, these influences would be reflected in the RH and temperature data, which, being time-series data, will account for fluctuations or changes in behaviour, for example in response to ventilation and heating costs.
The use of a single observation regarding mould and mouldy odour is also a limiting factor, given that the model outputs comprise a time series of transient values of M, and therefore the relationship between the model outputs and the observed data could be inconsistent over time. We therefore used M_m to provide a single mean value to represent the model outputs. For P2, RH and temperature data were repeatedly presented to the model to allow a stabilisation of the mould index. Fig. 7 presents the mould index for one home during the first and last cycles, and illustrates the stabilisation of the mould index. It could be argued that stabilisation dilutes the benefit of using a dynamic mould index, but it does allow current and historic influences on the mould level to be captured and verified in this first study. In future work, using the dynamic model will allow influences (e.g., dehumidifiers) on the environmental conditions to be considered, as discussed below.
Previous criticisms of the VTT model include the robustness of the parameter values and sensitivity to small changes [34] (e.g., the values we avoided in Fig. 4). While we chose the VTT model in order to test a dynamic model of mould, an alternative is an isopleth approach, which provides a fixed reference for mould growth under the different sets of conditions [38,39]. Future work should develop isopleth curves for air measures using the current dataset in order to test performance against the survey responses and test the consistency of our findings. Biohygrothermal models based on such isopleth systems would also account for effects of changeable conditions on mould spores, and would therefore be a useful development in future work.
Future research should test the model's outputs against self-report of the coverage of the mould, e.g., the size of the mould patches and the number of rooms affected. In relation to bias in self-report of mould [14], further work with the model may be useful in identifying reliable indicators in the self-reported measures, which could inform future survey content for identifying mould severity.
Future work should also examine whether performance with different parameter values varies across different subsets of homes, including groups based on some of the factors summarised in Table 5, such as energy ratings, property type, and ventilation habits. Reliability will also be tested with new datasets.
Our UK-based dataset prevents testing the generalisability of the model to different climates, outside air conditions, different building constructions, and heating and ventilation systems. These factors affect the indoor conditions depending on the thermal conductance of building materials and indoor-to-outdoor humidity ratios [8,43,51]. A useful step for future research would be to test the robustness of the model using data that take into account outdoor conditions, or on a different dataset, either from a commercial setting or from a location with a different climate.

Conclusion
The results provide evidence for relationships between the model outputs and the occupant responses, showing that the adapted model can be used for predicting mould growth from these less controlled air measurements, as opposed to the surface wood measurements for which the model was originally developed. Mould levels modelled using relative humidity (RH) and temperature measures from the living room were successful at predicting the presence of mould in the home, while levels modelled using measures from the bedroom were successful in predicting the presence of a mouldy odour. Compared with the original VTT model, performance in predicting mould and odour from air measurements was higher with parameter values that increase the vulnerability to mould growth, including increased sensitivity, a lower RH threshold, and a larger change in the mould index (M). This study supports the adoption of indoor sensors and modelling as a way to help housing associations identify properties and residents at risk of being exposed to indoor damp and mould contamination. This, in turn, will help housing providers to reduce the cost of remediating damp and mould-contaminated homes.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 2. RH and temperature in the bedroom in one home from 1st March 2018 to 28th February 2019, with changes in RH_crit (top panel, grey line) when the temperature drops below 20 °C. For example, when the temperature drops to 10 °C on 1st February 2019, there is an increase in RH_crit.

Fig. 3. Steps for determining mould model parameter values that produce best performance for predicting mould and a mouldy odour from domestic air measurements of RH and temperature data.

Fig. 6. P2 regression p-values for the observed response about mould (panels A and B) and odour (panels C and D) for the living room (panels A and C) and the bedroom (panels B and D). Values are provided for each individual model, for each resampling interval (5 versus 60 min), decline method (Wood (W) versus Non-wood (NW)), and RH_crit default (40 to 80%).
Above 20 °C, RH_crit takes the default constant value, RH_>20. At or below 20 °C, RH_crit is a function of temperature. The resulting RH_crit function is shown in Fig. 1. Fig. 2 shows RH_crit varying over time for a home with the displayed temperature and RH levels.
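For orientation, the temperature dependence of RH_crit can be sketched for the original VTT default of 80%. The polynomial coefficients below are those commonly quoted for the original VTT model and are an assumption here, since they are not stated in this section; the sketch does not attempt to reproduce how the study adapts the function for other default values such as 50%. The function name is hypothetical.

```python
def rh_crit_original_vtt(temp_c):
    """Critical RH (%) as a function of air temperature in degrees Celsius.

    Assumed form of the original VTT model: a constant 80% above 20 degrees C,
    and a cubic polynomial in temperature at or below 20 degrees C, capped
    at 100%.
    """
    if temp_c > 20.0:
        return 80.0
    value = (-0.00267 * temp_c ** 3
             + 0.160 * temp_c ** 2
             - 3.13 * temp_c
             + 100.0)
    return min(100.0, value)


# Critical RH rises as the temperature falls below 20 degrees C.
warm_room = rh_crit_original_vtt(25.0)
cool_room = rh_crit_original_vtt(10.0)
```

Under these assumed coefficients the polynomial meets the constant at 20 °C to within rounding, and the threshold rises as the room cools, which matches the behaviour described in the caption of Fig. 2.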
[32,59]n Refs. [32,59]]. Increased sensitivity results in a greater increase in M when RH ≥ RH_crit. Each level comprises six values, provided in Table 1. One value provides RH_crit for each sensitivity level. The remaining five values moderate dM/dt in two ways when RH ≥ RH_crit. k_1 scales dM/dt by k_11 or k_12, depending on the current level of M (Equation (

Table 2
Levels of each parameter for the model, providing 144 combinations of parameter values for P1 and 36 for P2.

Table 3
Different types of responses and the corresponding accuracy rates.

Table 4
Performance measures to assess the prediction accuracy of the regression.

Table 5
Descriptive statistics for the survey responses and building characteristics for all homes and for homes that were included in the analyses.N provides the number of homes that responded or for which information was available.

Table 7
P1 performance measures for significant (p < 0.05) regression models showing either smallest p, highest balanced accuracy, or highest F1 for the mould response and for the odour response, with the relevant metric underlined.For cases where multiple models are reported within one row, the highest chance-level F1 value is reported.W represents the Wood decline method and NW the Non-wood method.

Table 9
P2 performance measures for significant (p < 0.05) regression models showing either smallest p, highest balanced accuracy, or highest F1 for the mould response and for the odour response, with the relevant metric underlined.W represents the Wood decline method and NW the Non-wood method.

Table 10
P2 for 28th February 2019 only. Performance measures for significant (p < 0.05) regression models showing either smallest p, highest balanced accuracy, or highest F1 for the mould response and for the odour response, with the relevant metric underlined. One model is also included that produced the joint-lowest p-value in the main analyses, but produced the second-lowest value here (row 3).