Demand response optimization of gas-electric water heater based on fusing knowledge and reinforcement learning

Yang Xiaokun; Yan Kai; Zhang Shuo; Liu Yan; Yuan Ruiming; Zheng Xiaoping

Yang Xiaokun, Yan Kai, Zhang Shuo, Liu Yan, Yuan Ruiming, Zheng Xiaoping. Demand response optimization of gas-electric water heater based on fusing knowledge and reinforcement learning[J]. Electrical Measurement & Instrumentation, 2025, 62(5): 208-217.

DOI: 10.19753/j.issn1001-1390.2025.05.025

Keywords: gas-electric water heater, integrated demand response, deep reinforcement learning, knowledge rule, uncertainty

Funding: Science and Technology Project of State Grid Corporation of China Headquarters (5400-202211163A-1-1-ZN)

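Author (Chinese)	Author (English)	Affiliation
杨晓坤	Yang Xiaokun	State Grid Jibei Electric Power Co., Ltd. Marketing Service Center (Fund Intensive Center, Metering Center)
燕凯	Yan Kai	State Grid Jibei Electric Power Co., Ltd. Marketing Service Center (Fund Intensive Center, Metering Center)
张烁	Zhang Shuo	State Grid Jibei Electric Power Co., Ltd. Marketing Service Center (Fund Intensive Center, Metering Center)
刘岩	Liu Yan	State Grid Jibei Electric Power Co., Ltd. Marketing Service Center (Fund Intensive Center, Metering Center)
袁瑞铭	Yuan Ruiming	State Grid Jibei Electric Power Co., Ltd. Marketing Service Center (Fund Intensive Center, Metering Center)
郑小平	Zheng Xiaoping	Wasion Group Co., Ltd.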

Abstract:

The gas-electric water heater (GEWH) is an important resource for integrated demand response (IDR). Uncertainties in the device itself and in its operating environment require the IDR optimization strategy to adapt quickly and automatically. This paper investigates a solution that fuses knowledge with deep reinforcement learning (DRL). It proposes an optimization framework in which the physical system, the device model, and the optimization strategy are linked automatically; constructs knowledge rules for device-level IDR optimization; and designs a knowledge-integrated deep Q-network (DQN) optimization model comprising the standard DQN elements, a method by which the optimization knowledge acts on the reward, and a control mechanism for the depth and probability of knowledge participation. Case studies show that the proposed method adapts automatically to uncertainties in the device and its operating environment and converges to the optimal solution. Compared with demand response using an electric water heater, GEWH IDR reduces energy cost by 18.7%. The proposed method also converges five times as fast as standard DQN, and can serve as a reference optimization strategy for large-scale implementation of GEWH IDR.
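
The abstract names three ingredients (knowledge rules, knowledge acting on the reward, and control over the depth and probability of knowledge participation) without giving formulas. The Python sketch below is one possible reading, not the authors' implementation. The rule itself, the temperature threshold `t_min`, the exponential decay schedule, and every function and parameter name are assumptions made for illustration.

```python
import random


def rule_preferred_action(tank_temp, elec_price, gas_price, t_min=45.0):
    """Illustrative knowledge rule (an assumption, not the paper's rule set):
    stay off while the tank is warm enough, otherwise heat with whichever
    energy source is currently cheaper."""
    if tank_temp >= t_min:
        return "off"
    return "electric" if elec_price <= gas_price else "gas"


def knowledge_probability(episode, p0=0.9, decay=0.99):
    """Assumed schedule: the chance of consulting the rule shrinks
    geometrically as training proceeds."""
    return p0 * decay ** episode


def shaped_reward(base_reward, action, preferred, episode, depth=1.0, draw=None):
    """Add +depth (agreement) or -depth (disagreement) to the environment
    reward when the rule is consulted; otherwise pass the reward through.
    `draw` is a uniform sample in [0, 1); pass it explicitly for
    reproducible runs."""
    u = random.random() if draw is None else draw
    if u >= knowledge_probability(episode):
        return base_reward
    return base_reward + (depth if action == preferred else -depth)


if __name__ == "__main__":
    preferred = rule_preferred_action(tank_temp=40.0, elec_price=0.5, gas_price=0.3)
    print(preferred, shaped_reward(-0.3, "gas", preferred, episode=0, draw=0.0))
```

In this reading, `depth` sets how strongly the rule bends the reward and `knowledge_probability` sets how often it is applied, so the agent leans on the rule early and on its own learned Q-values later. A DQN training loop would call `shaped_reward` in place of the raw environment reward before storing each transition.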
