The explosive growth of electricity consumer data makes correcting raw data increasingly difficult. This paper proposes using Functional Data Analysis (FDA) to repair erroneous data and fill in missing data. Function estimation maps the discrete measurements of each observed individual into a new function space, so that the missing components of the data can be repaired using the curve features of similar users. A complete data-cleaning framework for big data is also built. Tests on a real data set show that the algorithm accurately extracts users' characteristic electricity consumption curves and accurately repairs both erroneous and missing data.
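The idea of mapping discrete readings into a function space and borrowing from similar users can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes daily load curves sampled on a common time grid, with missing or rejected readings marked as NaN, and it assumes a small Fourier basis fitted by least squares. The function names, the number of harmonics, the neighbor count, and the equal-weight blend are all hypothetical choices made for illustration.

```python
import numpy as np

def fourier_basis(t, n_harmonics, period=24.0):
    """Design matrix: a constant column plus sine/cosine pairs (assumed basis)."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k / period
        cols.append(np.sin(w * t))
        cols.append(np.cos(w * t))
    return np.column_stack(cols)

def fit_coefficients(t, y, n_harmonics=3):
    """Least-squares basis coefficients using only the observed (non-NaN) readings."""
    observed = ~np.isnan(y)
    basis = fourier_basis(t[observed], n_harmonics)
    coef, *_ = np.linalg.lstsq(basis, y[observed], rcond=None)
    return coef

def repair(t, loads, n_harmonics=3, n_neighbors=2):
    """Fill NaN readings of each user with help from its most similar users.

    loads has shape (n_users, n_points); similarity is measured as the
    Euclidean distance between coefficient vectors (an illustrative choice).
    """
    coefs = np.array([fit_coefficients(t, y, n_harmonics) for y in loads])
    curves = coefs @ fourier_basis(t, n_harmonics).T  # fitted curve per user
    repaired = loads.copy()
    for i, y in enumerate(loads):
        missing = np.isnan(y)
        if not missing.any():
            continue
        dist = np.linalg.norm(coefs - coefs[i], axis=1)
        dist[i] = np.inf  # a user is never its own neighbor
        nearest = np.argsort(dist)[:n_neighbors]
        # Equal-weight blend of the user's own fit and the neighbors' mean curve.
        blended = 0.5 * curves[i] + 0.5 * curves[nearest].mean(axis=0)
        repaired[i, missing] = blended[missing]
    return repaired
```

Calling `repair(t, loads)` returns a copy of `loads` in which only the NaN entries have been replaced; observed readings are left untouched. Blending in the neighbors' curves is meant to stabilize the estimate when a user's own gap is long, since a fit to the remaining points alone can behave poorly inside the gap.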