A BERT-MRC-based Method for Key Entity Recognition in Power Grid Field Operation Texts
FEI Zhengming, YUAN Kewei, HUANG Hongyang, ZHANG Yixiang, YIN Fan, ZHOU Hui, LUO Huafeng
Abstract:
Ensuring production safety in power grids requires effective control and inspection of field operations, and accurate recognition of key equipment entities in operation texts is the foundation for making that control and inspection intelligent. Existing power entity recognition methods, however, rely on large volumes of manually annotated text to train their models, which makes them hard to apply to field operation texts: such texts are generated quickly and in large quantities, and they often contain nested entities and other complex relationships. Based on an analysis of the characteristics of these texts, this paper proposes a key entity recognition method for risk control and inspection of power grid operations that improves recognition performance while substantially reducing the model's need for labeled data. First, Bidirectional Encoder Representations from Transformers (BERT) is used to obtain text vectors that incorporate contextual features. Then, using BERT-machine reading comprehension (BERT-MRC), the entity recognition task is reformulated as an MRC task to build the model. Finally, a few-shot learning (FSL) method based on Noisy Student self-training is used to train the model iteratively, greatly reducing the amount of labeled data required. Experiments on real power grid field operation texts demonstrate the effectiveness of the proposed method.
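To make the MRC reformulation more concrete, the sketch below shows one way an entity-type query can be packed together with an operation text, and how per-token start, end, and start-end match probabilities can be decoded into spans that are allowed to nest. It follows the span-decoding scheme commonly used with BERT-MRC, not the authors' code. The entity types, query wording, example sentence, threshold, maximum span length, and every probability value are hypothetical; the BERT encoder itself and the Noisy Student training loop are omitted.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical entity types and query wording (illustrative, not from the paper).
QUERIES: Dict[str, str] = {
    "EQUIPMENT": "find equipment such as switches breakers and transformers",
    "LINE": "find power lines and feeders",
}


def build_mrc_input(query: str, context_tokens: Sequence[str]) -> List[str]:
    """Pack a query and a context into one BERT-style sentence pair:
    [CLS] query [SEP] context [SEP]. Whitespace splitting stands in for
    a real (e.g. character-level Chinese) tokenizer."""
    return ["[CLS]", *query.split(), "[SEP]", *context_tokens, "[SEP]"]


def decode_spans(
    start_probs: Sequence[float],
    end_probs: Sequence[float],
    match_probs: Dict[Tuple[int, int], float],
    threshold: float = 0.5,
    max_len: int = 10,
) -> List[Tuple[int, int]]:
    """Decode inclusive (start, end) spans over context positions.

    Each token is independently classified as a possible start and a
    possible end; every start is then paired with every end at or after
    it (within max_len tokens) and the pair is kept if its match
    probability clears the threshold. Because one start may pair with
    several ends, overlapping (nested) spans are not suppressed.
    """
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [j for j, p in enumerate(end_probs) if p > threshold]
    spans: List[Tuple[int, int]] = []
    for i in starts:
        for j in ends:
            if i <= j < i + max_len and match_probs.get((i, j), 0.0) > threshold:
                spans.append((i, j))
    return spans


# Hypothetical operation text and model outputs over context positions only.
context = ["open", "10kV", "line", "A", "switch"]

# EQUIPMENT query: the whole "10kV line A switch" phrase.
equipment_spans = decode_spans(
    start_probs=[0.1, 0.9, 0.2, 0.1, 0.1],
    end_probs=[0.1, 0.1, 0.2, 0.3, 0.9],
    match_probs={(1, 4): 0.8},
)
# LINE query: "10kV line A", nested inside the equipment span.
line_spans = decode_spans(
    start_probs=[0.1, 0.9, 0.1, 0.1, 0.1],
    end_probs=[0.1, 0.1, 0.2, 0.9, 0.1],
    match_probs={(1, 3): 0.85},
)

for label, spans in [("EQUIPMENT", equipment_spans), ("LINE", line_spans)]:
    for i, j in spans:
        print(label, " ".join(context[i : j + 1]))
```

Asking one query per entity type is what lets a line entity sit inside an equipment entity without either being dropped. In a real model, the three sets of probabilities would come from classification heads applied to BERT's representations of the context tokens.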
Keywords:
entity recognition; machine reading comprehension (MRC); grid field operation; risk control and inspection; BERT; few-shot learning (FSL)
Foundation: Science and Technology Project of State Grid East China Branch (520800230008)
DOI: 10.19585/j.zjdl.202507003