MA Hongming, MA Hao, YANG Di, WU Hongbo, LIU Jiacheng, LI Ji. An Online Data Cleaning Algorithm for Power Marketing Data Based on Singular Value Thresholding Theory[J]. Electrical Measurement & Instrumentation, 2024, 61(9): 120-126.
An Online Data Cleaning Algorithm for Power Marketing Data Based on Singular Value Thresholding Theory
Under the framework of the energy Internet, power marketing big data underpins many advanced smart-grid applications, which makes data cleaning essential. In actual grid operation, however, missing data are unavoidable and seriously hamper the analysis and use of the data. To address this problem, this paper proposes an online data cleaning framework and method on the Spark platform that combines similar-user clustering with singular value thresholding (SVT) theory. First, singular value decomposition is used to show that power data are approximately low-rank. On this basis, and to account for differences in consumption behavior among power users, the framework integrates an improved K-nearest-neighbor (KNN) clustering of similar users with SVT-based recovery. To overcome the slow computation of the SVT model, a sliding-time-window online recovery strategy is further proposed, which speeds up repair and improves recovery accuracy. Finally, the effectiveness of the proposed algorithm is verified on power marketing data from Hebei Province. The experimental results show that the online recovery algorithm repairs large-scale missing power marketing data more quickly and effectively.
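The core recovery step described above, filling the gaps of an approximately low-rank users × time matrix by singular value thresholding, can be sketched in pure Python. This is a minimal sketch under stated assumptions, not the authors' Spark implementation: it uses the generic SVT iteration of Cai, Candès and Shen (X_k = D_τ(Y_{k-1}), Y_k = Y_{k-1} + δ·P_Ω(M − X_k)) with a hand-rolled one-sided Jacobi SVD so that it needs no third-party packages. The improved-KNN clustering is omitted (each matrix is assumed to already hold one cluster of similar users), and all names (`jacobi_svd`, `singular_value_threshold`, `svt_complete`, `sliding_window_clean`) as well as the values of τ, δ, the iteration cap and the window length are hypothetical illustrations.

```python
import math
from collections import deque


def jacobi_svd(a, sweeps=60, eps=1e-12):
    """One-sided (Hestenes) Jacobi SVD of a rectangular matrix given as a list of rows.

    Returns (w, sigma, v) with A = W V^T: the columns of W = A V are mutually
    orthogonal, sigma[j] is the norm of column j of W (a singular value), and
    V is orthogonal (its columns are the right singular vectors).
    """
    m, n = len(a), len(a[0])
    w = [list(map(float, row)) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(w[i][p] * w[i][p] for i in range(m))
                beta = sum(w[i][q] * w[i][q] for i in range(m))
                gamma = sum(w[i][p] * w[i][q] for i in range(m))
                if abs(gamma) <= eps * math.sqrt(alpha * beta):
                    continue  # columns p and q are already (numerically) orthogonal
                rotated = True
                # Plane rotation chosen so that columns p and q become orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for r in w + v:  # apply the same rotation to the rows of W and of V
                    rp, rq = r[p], r[q]
                    r[p], r[q] = c * rp - s * rq, s * rp + c * rq
        if not rotated:
            break
    sigma = [math.sqrt(sum(w[i][j] ** 2 for i in range(m))) for j in range(n)]
    return w, sigma, v


def singular_value_threshold(y, tau):
    """D_tau(Y): soft-threshold every singular value, sigma -> max(sigma - tau, 0)."""
    w, sigma, v = jacobi_svd(y)
    m, n = len(y), len(y[0])
    out = [[0.0] * n for _ in range(m)]
    for j, s in enumerate(sigma):
        if s <= tau:
            continue  # small singular values (noise, effect of gaps) are dropped
        scale = (s - tau) / s  # column j of W equals s * u_j, so rescale it
        for i in range(m):
            wij = scale * w[i][j]
            for k in range(n):
                out[i][k] += wij * v[k][j]
    return out


def svt_complete(m_obs, mask, tau, delta=1.2, iters=1000, tol=1e-6):
    """SVT matrix completion (Cai, Candes and Shen); mask[i][j] is True if observed.
        X_k = D_tau(Y_{k-1}),   Y_k = Y_{k-1} + delta * P_Omega(M - X_k)
    Their convergence proof covers step sizes 0 < delta < 2."""
    rows, cols = len(m_obs), len(m_obs[0])
    y = [[0.0] * cols for _ in range(rows)]
    x = [[0.0] * cols for _ in range(rows)]
    obs_norm = math.sqrt(sum(m_obs[i][j] ** 2 for i in range(rows)
                             for j in range(cols) if mask[i][j]))
    for _ in range(iters):
        x = singular_value_threshold(y, tau)
        # Residual on observed entries only; missing entries never drive the update.
        resid = [[m_obs[i][j] - x[i][j] if mask[i][j] else 0.0 for j in range(cols)]
                 for i in range(rows)]
        if math.sqrt(sum(r * r for row in resid for r in row)) <= tol * obs_norm:
            break
        for i in range(rows):
            for j in range(cols):
                y[i][j] += delta * resid[i][j]
    return x


def sliding_window_clean(stream, window, tau):
    """Hypothetical online wrapper: repair each new time slice (one reading per
    user, None = missing) using only the latest `window` slices."""
    recent = deque(maxlen=window)  # the oldest slice is dropped automatically
    for reading in stream:
        recent.append(reading)
        users = range(len(reading))
        vals = [[0.0 if col[u] is None else col[u] for col in recent] for u in users]
        mask = [[col[u] is not None for col in recent] for u in users]
        x = svt_complete(vals, mask, tau)  # users x time matrix for this window
        yield [val if val is not None else x[u][-1] for u, val in enumerate(reading)]
```

Thresholding singular values, rather than truncating to a fixed rank, lets the sketch exploit the approximate low-rank property without fixing the rank in advance. The sliding window bounds every SVT call to a matrix with at most `window` columns, so the cost per new time slice stays constant as the data stream grows.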