An automatic generation control method for cross-region interconnected power systems based on the TR mechanism
史悦星,杨帆,李东东,邵心怡
SHI Yuexing,YANG Fan,LI Dongdong,SHAO Xinyi
Abstract:
Cross-region interconnected power systems can improve grid frequency stability through coordinated regulation between regions. In existing automatic generation control (AGC) research for such systems, data-driven learning methods do not effectively reuse the experience gained before a region is added, so model training is inefficient. To address this, the paper proposes an AGC method based on the TR-MATD3 algorithm, which combines a teaching reinforcement (TR) mechanism with the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm. First, under the TR mechanism, when the power system is expanded with a new region, the agents of the original regions guide the agent of the newly added region through point-to-point teaching, which accelerates the convergence of its policy network and further improves control accuracy. Second, TR-MATD3 introduces dual target critic networks to address the Q-value overestimation problem in reinforcement learning, which improves the control performance of the agents' policy networks. Simulation results show that, compared with other algorithms, TR-MATD3 reduces |ACE| (the absolute area control error) in the fixed power system regions by 58% to 79.68%. For the expanded regions, it reduces the agents' offline training time by 47.13% to 51.56% and |ACE| by 23.33% to 63.72%, indicating good scalability and control performance.
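The Q-value overestimation fix mentioned in the abstract is usually realized in TD3-style algorithms as a clipped double-Q target: each update bootstraps from the smaller of two target-critic estimates. The sketch below illustrates only that general idea; it is not the authors' implementation, and the function names, the discount factor `gamma`, and the sample numbers are assumptions chosen for illustration.

```python
from typing import List, Sequence


def clipped_double_q_target(
    reward: float,
    q1_next: float,
    q2_next: float,
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Return a TD target that bootstraps from the smaller of two
    target-critic estimates (hypothetical helper, TD3-style)."""
    # Taking the minimum counteracts the upward bias of a single critic.
    min_q = min(q1_next, q2_next)
    # No bootstrapping past a terminal transition.
    return reward + (0.0 if done else gamma * min_q)


def batch_targets(
    rewards: Sequence[float],
    q1_next: Sequence[float],
    q2_next: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """Apply the clipped double-Q target to a batch of transitions."""
    return [
        clipped_double_q_target(r, a, b, gamma)
        for r, a, b in zip(rewards, q1_next, q2_next)
    ]


if __name__ == "__main__":
    # Illustrative values only: the rewards could be a negative |ACE| penalty.
    print(batch_targets([-0.2, -0.1], [1.5, 0.8], [1.2, 1.0], gamma=0.5))
```

In a multi-agent setting such as the one described above, each regional agent would typically hold its own pair of critics and compute this target from its own transitions. How the TR mechanism interacts with the critic update is not specified in the abstract and is not modeled here.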
Keywords:
interconnected power systems; automatic generation control (AGC); teaching reinforcement (TR) mechanism; multi-agent reinforcement learning; dual-critic network
Foundation: National Natural Science Foundation of China (52377111)
DOI: 10.19585/j.zjdl.202508008