Deep Reinforcement Learning for Optimal Hydropower Reservoir Operation
Xu, W; Meng, F; Guo, W; Li, X; Fu, G
Date: 21 May 2021
Journal
Journal of Water Resources Planning and Management
Publisher
American Society of Civil Engineers (ASCE)
Abstract
Optimal operation of hydropower reservoir systems is a classical optimization problem of high dimensionality and stochastic nature. A key challenge lies in improving the interpretability of operation strategies, i.e., the cause-effect relationship between system outputs (or actions) and contributing variables such as states and inputs. Here we report for the first time a new Deep Reinforcement Learning (DRL) framework for optimal operation of reservoir systems based on Deep Q-Networks (DQN), which provides a significant advance in understanding the performance of optimal operations. DQN combines Q-learning with two deep artificial neural networks and acts as the agent that interacts with the reservoir system by learning its states and providing actions. Three knowledge forms of learning considering the states, actions and rewards are constructed to improve the interpretability of operation strategies. The impacts of these knowledge forms and DRL learning parameters on operation performance are analysed. The DRL framework is tested on the Huanren hydropower system in China, using 400-year synthetic flow data for training and 30-year observed flow data for verification. The discretization levels of reservoir water level and energy output yield contrasting effects: finer discretization of water level improves performance in terms of annual hydropower generated and hydropower production reliability; however, finer discretization of hydropower production can reduce search efficiency and thus degrade the resulting DRL performance. Compared with benchmark algorithms including dynamic programming, stochastic dynamic programming, and decision tree, the proposed DRL approach can effectively factor in future inflow uncertainties when deciding optimal operations and generates markedly higher hydropower. This study provides new knowledge on the performance of DRL in the context of hydropower system characteristics and data input features, and shows its potential for practical implementation to derive operation policies that can be automatically updated by learning from new data.
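To illustrate the kind of agent-environment loop the abstract describes, the following is a minimal sketch of a DQN agent (online Q-network, target network, epsilon-greedy exploration, replay buffer, and TD updates over discretized actions) interacting with a toy reservoir environment. The environment dynamics, reward definition, discretization levels, and all names below are illustrative assumptions, not the authors' Huanren implementation; PyTorch is assumed.

```python
# Minimal DQN sketch for a simplified reservoir-operation problem.
# The ToyReservoirEnv dynamics, reward, and discretization are placeholders.
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

N_LEVELS = 20        # discretized reservoir water levels (assumed)
N_ACTIONS = 10       # discretized energy-output / release decisions (assumed)

class ToyReservoirEnv:
    """Toy reservoir: state = (water level, inflow); reward ~ energy generated."""
    def reset(self):
        self.level = N_LEVELS // 2
        return self._obs()

    def _obs(self):
        self._inflow = np.random.gamma(2.0, 1.0)     # synthetic stochastic inflow
        return np.array([self.level / N_LEVELS, self._inflow / 10.0], dtype=np.float32)

    def step(self, action):
        release = action + 1                          # larger action -> larger release
        energy = min(release, self.level) * (self.level / N_LEVELS)  # head-dependent output
        self.level = int(np.clip(self.level - release + self._inflow, 0, N_LEVELS - 1))
        reward = energy - (5.0 if self.level == 0 else 0.0)  # penalize emptying the reservoir
        return self._obs(), reward

def make_qnet():
    return nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

qnet, target = make_qnet(), make_qnet()               # the two deep networks of DQN
target.load_state_dict(qnet.state_dict())
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)
gamma, eps = 0.99, 0.1

env = ToyReservoirEnv()
state = env.reset()
for step in range(5_000):
    # Epsilon-greedy action selection from the online Q-network.
    if random.random() < eps:
        action = random.randrange(N_ACTIONS)
    else:
        with torch.no_grad():
            action = int(qnet(torch.from_numpy(state)).argmax())
    next_state, reward = env.step(action)
    buffer.append((state, action, reward, next_state))
    state = next_state

    if len(buffer) >= 64:
        batch = random.sample(buffer, 64)
        s, a, r, s2 = map(np.array, zip(*batch))
        s, s2 = torch.from_numpy(s), torch.from_numpy(s2)
        a = torch.tensor(a, dtype=torch.int64)
        r = torch.tensor(r, dtype=torch.float32)
        # TD target computed with the (periodically synced) target network.
        with torch.no_grad():
            y = r + gamma * target(s2).max(dim=1).values
        q = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(q, y)
        opt.zero_grad(); loss.backward(); opt.step()

    if step % 500 == 0:
        target.load_state_dict(qnet.state_dict())      # sync the second network
```

In this reading, the reward would correspond to hydropower generated (with penalties for constraint violations), and the contrasting effects of discretization noted in the abstract map onto the choices of N_LEVELS and N_ACTIONS above.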
Engineering
Faculty of Environment, Science and Economy