Data Driven Models of Blockage Likelihood in the Wastewater Network
Bailey, James Richard
Thesis or dissertation
University of Exeter
Blockages are a major problem for Water and Sewerage Companies (WaSCs), affecting customers and the environment through flooding and pollution incidents. Proactive maintenance aims to reduce this impact by identifying issues and clearing them before they cause harm. Given the large size of the networks, accurate predictions of blockage likelihood are required for this maintenance to be cost-effective. Data mining has the potential to provide these predictions by finding patterns in large datasets. This work presents the novel application of these techniques to datasets on incidents and assets covering the whole region of a WaSC. It also contributes an investigation of an input feature formed from a sewer's blockage history, and the application of decision trees and ensemble methods to real-world data. Initially, decision trees were used to produce models at a sewer and an area level. General models for the network and for the different causes of blockages were developed. The models are reasonably accurate, output a blockage likelihood, and give an understanding of the important variables relating to blockages. The sewer-level models had a higher area under the ROC curve (AUC) and gave greater spatial resolution than the area-level models. They were therefore developed further, using ensemble techniques and experiments that evaluated the effect of an input feature based on a sewer's blockage history. The historical input feature improved performance, particularly for those sewers most likely to be proactively maintained. Finally, the best-performing models were validated using a further dataset of incidents and survey results. The model outputs combined with the historical blockage rate showed good performance for both blockages and flooding incidents on the unseen dataset. Overall, decision trees gave accurate models on this real-world data and indicated which factors influence blockages.
Good accuracy was achieved using models including the sewer characteristics, property information and blockage history. These outputs, validated using the further dataset of incidents, demonstrate the performance of these data mining techniques on real-world data.
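As an illustration of the AUC measure used above to compare the models, the following minimal sketch computes the area under the ROC curve as the proportion of (blocked, unblocked) sewer pairs in which the blocked sewer receives the higher likelihood. The function name roc_auc and the toy labels and scores are assumptions made for illustration only; they are not the thesis data or its implementation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    labels: 1 for a sewer that blocked, 0 for one that did not.
    scores: the model's blockage likelihood for each sewer.
    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six sewers, three of which blocked.
blocked = [1, 0, 1, 0, 0, 1]
likelihood = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]

auc = roc_auc(blocked, likelihood)  # 8 of the 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so the measure reflects how well a model prioritises sewers for proactive maintenance independently of any single likelihood threshold.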