Semisupervised Clustering Approach for Pipe Failure Prediction with Imbalanced Data Set
Beig Zali, R; Latifi, M; Javadi, AA; et al.; Farmani, R
Date: 22 November 2023
Type: Article
Journal: Journal of Water Resources Planning and Management
Publisher: American Society of Civil Engineers (ASCE)
Abstract
In recent years, machine learning (ML) approaches have been widely used for water pipe condition assessment and failure prediction. These methods require a considerable amount of data from water distribution networks (WDNs), and imbalanced or missing data, whether asset or failure data, compromise a model's prediction performance. In this research, three ML methods (XGBoost, random forest, and logistic regression) were applied to prioritize asset rehabilitation in a real WDN using only 2 years of failure data. To address the class imbalance, a novel semisupervised clustering method is proposed that combines domain knowledge with unsupervised learning to divide the data set into homogeneous categories and thereby improve classification accuracy. The proposed approach outperformed well-known data science techniques for treating class imbalance. Analysis of the results further indicated that standard classification evaluation metrics struggled to capture the practical effectiveness of the different methods. To address this, an economic indicator is proposed that ranks pipes for rehabilitation based on their cost and likelihood of failure (LoF). Preventive maintenance guided by this economic indicator reduces the number of failures at a small fraction of the total replacement cost. A second indicator was developed to consider the consequences of failures and LoF simultaneously; it mitigates, in a cost-effective manner, the reductions in flow capacity that failures cause in WDNs. The results of this study provide asset managers with a powerful tool for prioritizing assets for rehabilitation.
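The abstract does not state the formulas behind the economic and consequence indicators, so the following is only a minimal sketch of how a cost-aware ranking of this kind might work, not the authors' method. It assumes the economic indicator is LoF divided by replacement cost, assumes the consequence indicator is LoF times a consequence score divided by replacement cost, and uses a greedy fill of a fixed rehabilitation budget. The names `Pipe`, `economic_indicator`, `consequence_indicator`, `select_for_rehabilitation`, and `EXAMPLE_NETWORK`, and all the numbers, are hypothetical and illustrative.

```python
from dataclasses import dataclass


@dataclass
class Pipe:
    """Hypothetical pipe record; fields are illustrative, not taken from the paper."""
    pipe_id: str
    lof: float                # predicted likelihood of failure in [0, 1], e.g. a classifier probability
    replacement_cost: float   # assumed replacement cost in currency units
    consequence: float = 1.0  # assumed consequence score, e.g. a proxy for flow capacity lost


def economic_indicator(pipe: Pipe) -> float:
    # Assumed form: likelihood of failure removed per unit of replacement cost.
    return pipe.lof / pipe.replacement_cost


def consequence_indicator(pipe: Pipe) -> float:
    # Assumed form: risk (LoF x consequence) removed per unit of replacement cost.
    return pipe.lof * pipe.consequence / pipe.replacement_cost


def select_for_rehabilitation(pipes, budget, indicator=economic_indicator):
    """Greedily pick pipes in descending indicator order while they fit in the budget."""
    ranked = sorted(pipes, key=indicator, reverse=True)
    selected, spent = [], 0.0
    for pipe in ranked:
        # Skip any pipe that would overshoot the budget, but keep scanning cheaper ones.
        if spent + pipe.replacement_cost <= budget:
            selected.append(pipe)
            spent += pipe.replacement_cost
    return selected, spent


# Hypothetical four-pipe network used only for illustration.
EXAMPLE_NETWORK = [
    Pipe("P1", lof=0.40, replacement_cost=10_000, consequence=5.0),
    Pipe("P2", lof=0.30, replacement_cost=2_000, consequence=1.0),
    Pipe("P3", lof=0.06, replacement_cost=1_000, consequence=8.0),
    Pipe("P4", lof=0.20, replacement_cost=4_000, consequence=2.0),
]


if __name__ == "__main__":
    total_cost = sum(p.replacement_cost for p in EXAMPLE_NETWORK)
    total_lof = sum(p.lof for p in EXAMPLE_NETWORK)

    chosen, spent = select_for_rehabilitation(EXAMPLE_NETWORK, budget=7_000)
    # Treating each LoF as a per-pipe failure probability, the sum over replaced
    # pipes is the expected number of failing pipes removed from the network.
    removed = sum(p.lof for p in chosen)
    print("economic ranking:", [p.pipe_id for p in chosen])
    print(f"cost share {spent / total_cost:.0%}, expected failures removed {removed / total_lof:.0%}")

    chosen_c, _ = select_for_rehabilitation(
        EXAMPLE_NETWORK, budget=12_000, indicator=consequence_indicator
    )
    print("consequence ranking:", [p.pipe_id for p in chosen_c])
```

The greedy ratio rule is a common heuristic for this budget-constrained (knapsack-style) selection and is not guaranteed to be optimal. Switching the `indicator` argument shows how weighting LoF by consequence can move a costly, high-impact pipe such as the hypothetical `P1` up the priority list.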
Engineering
Faculty of Environment, Science and Economy