Electrical Measurement & Instrumentation (电测与仪表)

Article Abstract

Bad data identification and correction method based on improved k-prototypes clustering

Received: January 23, 2020; Revised: February 16, 2020

DOI: 10.19753/j.issn1001-1390.2022.02.002

Keywords: k-prototypes clustering, mixed dataset clustering, bad data identification, correction by centroid vector replacement, industrial load data preprocessing

Funding: National Key Research and Development Program of China (2016YFB0901300)

Author Name	Affiliation	E-mail
Wang Xiaoci	College of Electrical Engineering, Zhejiang University	982128684@qq.com
Dong Shufeng*	College of Electrical Engineering, Zhejiang University	dongshufeng@zju.edu.cn
Liu Yuquan	Guangzhou Power Supply Bureau Co.	290547225@qq.com
Wang Li	Guangzhou Power Supply Bureau Co.	13809779650@139.com
Li Junge	Guangzhou Power Supply Bureau Co.	13802767894@139.com
* Corresponding author

Abstract:

Many industrial applications depend on accurate load data, yet the load measurement systems currently used in factories frequently produce large amounts of bad data because of communication and storage failures. This paper therefore proposes a bad data identification and correction method for industrial load data based on an improved k-prototypes clustering algorithm. By incorporating non-load features into the clustering, the method weakens the influence of bad load data on the clustering result and thereby makes bad data identification and correction more accurate. The improved k-prototypes algorithm runs multiple random initializations in parallel and keeps the best result, which overcomes the tendency of the standard k-prototypes algorithm to become trapped in a local optimum determined by its initial cluster centers; it also determines the number of clusters adaptively instead of fixing it subjectively. Based on the clustering result, a feasible region for the load data is determined according to the normal distribution principle, values outside this region are identified as bad data, and the identified bad data are corrected by replacing them with the cluster centroid (centroid vector replacement). Experiments show that the method outperforms fuzzy C-means clustering that considers load data alone, with significantly higher recall in bad data identification and higher correction accuracy.
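The abstract does not give the algorithmic details, so the following is only a minimal Python sketch of the pipeline it describes, under stated assumptions. Each row holds numeric load readings plus categorical non-load features (here a hypothetical day type). The dissimilarity is Huang's standard k-prototypes measure (squared Euclidean distance plus `gamma` times the number of categorical mismatches), with `gamma` a hypothetical weight. The "parallel computation and selection of the best result" is modeled as independent random restarts that keep the lowest-cost run. The feasible region is assumed to be mean ± 3σ per cluster and per feature, and flagged values are replaced by the cluster mean of the unflagged values. The number of clusters `k` is fixed because the abstract does not describe the adaptive rule. All function names are illustrative, not the authors' code.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def dissimilarity(x_num, x_cat, p_num, p_cat, gamma):
    """Huang-style k-prototypes measure: squared Euclidean distance on the
    numeric (load) features plus gamma times the categorical mismatches."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    mismatches = sum(a != b for a, b in zip(x_cat, p_cat))
    return numeric + gamma * mismatches


def k_prototypes(num, cat, k, gamma, seed, max_iter=100):
    """One standard k-prototypes run from a random initialization."""
    rng = random.Random(seed)
    init = rng.sample(range(len(num)), k)  # random initial prototypes
    p_num = [list(num[i]) for i in init]
    p_cat = [list(cat[i]) for i in init]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: dissimilarity(xn, xc, p_num[j], p_cat[j], gamma))
            for xn, xc in zip(num, cat)
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # keep the previous prototype if a cluster empties
                continue
            # numeric part: per-feature mean; categorical part: per-feature mode
            p_num[j] = [statistics.fmean(num[i][d] for i in members) for d in range(len(num[0]))]
            p_cat[j] = [statistics.mode(cat[i][d] for i in members) for d in range(len(cat[0]))]
    cost = sum(dissimilarity(num[i], cat[i], p_num[lab], p_cat[lab], gamma)
               for i, lab in enumerate(labels))
    return cost, labels, p_num, p_cat


def best_of_restarts(num, cat, k, gamma, n_init=10):
    """Run independent randomly initialized k-prototypes concurrently and
    keep the lowest-cost result, reducing the risk of a poor local optimum."""
    with ThreadPoolExecutor() as pool:
        runs = list(pool.map(lambda s: k_prototypes(num, cat, k, gamma, seed=s), range(n_init)))
    return min(runs, key=lambda run: run[0])


def identify_and_correct(num, labels, z=3.0):
    """Flag values outside mean +/- z*std of their cluster (per feature) and
    replace them with the cluster mean of the unflagged values."""
    corrected = [list(row) for row in num]
    flagged = []
    for j in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == j]
        for d in range(len(num[0])):
            values = [num[i][d] for i in members]
            mu, sigma = statistics.fmean(values), statistics.pstdev(values)
            bad = [i for i in members if abs(num[i][d] - mu) > z * sigma]
            good = [num[i][d] for i in members if i not in bad]
            if not bad or not good:
                continue
            centroid = statistics.fmean(good)
            for i in bad:
                corrected[i][d] = centroid
                flagged.append((i, d))
    return corrected, sorted(flagged)


if __name__ == "__main__":
    # Hypothetical toy data: two load readings per row plus one day-type feature.
    num = [[100.0 + i % 5, 120.0 + i % 3] for i in range(20)]
    num += [[40.0 + i % 4, 50.0 + i % 2] for i in range(20)]
    cat = [["workday"]] * 20 + [["weekend"]] * 20
    num[3][1] = 0.0  # simulated communication dropout
    cost, labels, _, _ = best_of_restarts(num, cat, k=2, gamma=1e5)
    corrected, flagged = identify_and_correct(num, labels)
    print(flagged, corrected[3])  # expected: [(3, 1)] [103.0, 121.0]
```

In this toy case the large `gamma` is what keeps the corrupted workday row in the workday cluster; with a small weight its distorted load profile can pull it into the wrong cluster, which mirrors the abstract's motivation for adding non-load features. Threads keep the sketch portable, but CPython's GIL serializes pure-Python work, so a `ProcessPoolExecutor` would be the usual choice for real speedups. Also note that with the population standard deviation a single value in a cluster of n members can deviate by at most σ·sqrt(n − 1) from the mean (Samuelson's inequality), so clusters of 10 or fewer members can never exceed a strict 3σ bound.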

View Full Text View/Add Comment Download reader