Power grids are rapidly becoming more informationized and automated, and the volume of data they generate is growing just as quickly. One of the main obstacles to using this power big data effectively, however, is the lack of efficient data labeling methods. A flexible framework is proposed that labels load patterns and usage habits in a non-intrusive way and produces synthetic labeled datasets for smart grid functions such as demand response, energy management, and load monitoring. The data are preprocessed automatically using signal processing techniques such as matched filtering. The labeled datasets are generated using two key constructs: a generative adversarial network (GAN) and a kernel density estimator (KDE). The components of these constructs are optimized to keep the learning process stable. Dedicated evaluation metrics are also defined to measure the quality of the synthetic dataset. The synthetic dataset is compared with a real dataset, and a smart grid machine learning algorithm is trained and tested on this basis. Simulation results verify the effectiveness of the proposed method.
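The two classical building blocks named above, matched filtering and kernel density estimation, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`matched_filter_detect`, `kde_sample`), the trapezoidal appliance signature, the noise levels, and the normal-reference bandwidth rule are all assumptions chosen for illustration, and the GAN stage is omitted entirely.

```python
import numpy as np


def matched_filter_detect(signal, template):
    """Return (best_start_index, scores) for `template` inside `signal`.

    Hypothetical helper: a matched filter correlates the signal with the
    known signature; the peak of the correlation marks the best match.
    """
    # Remove the means so a constant baseline load does not bias the score.
    t = template - template.mean()
    s = signal - signal.mean()
    # 'valid' keeps only positions where the template fully overlaps the signal.
    scores = np.correlate(s, t, mode="valid")
    return int(np.argmax(scores)), scores


def kde_sample(data, n_samples, rng):
    """Draw synthetic values from a Gaussian KDE fitted to 1-D `data`.

    Sampling from a Gaussian KDE is equivalent to picking an observed point
    at random and adding Gaussian noise with the kernel bandwidth.
    """
    data = np.asarray(data, dtype=float)
    # Normal-reference rule of thumb for the bandwidth (an assumed choice).
    h = 1.06 * data.std(ddof=1) * data.size ** (-1 / 5)
    centers = rng.choice(data, size=n_samples, replace=True)
    return centers + rng.normal(0.0, h, size=n_samples)


rng = np.random.default_rng(0)

# Assumed appliance signature in watts: ramp up, hold, ramp down.
template = np.concatenate([
    np.linspace(0, 1500, 5), np.full(20, 1500.0), np.linspace(1500, 0, 5)
])

# Assumed aggregate load: 100 W baseline plus noise, signature at index 300.
true_start = 300
aggregate = 100.0 + rng.normal(0.0, 5.0, size=1000)
aggregate[true_start:true_start + template.size] += template

detected_start, _ = matched_filter_detect(aggregate, template)

# Assumed "real" daily peak loads in kW with two usage habits (bimodal).
real_peaks = np.concatenate([
    rng.normal(2.0, 0.3, size=200), rng.normal(5.0, 0.5, size=200)
])
synthetic_peaks = kde_sample(real_peaks, 5000, rng)

print("detected start index:", detected_start)
print("real mean:", real_peaks.mean(), "synthetic mean:", synthetic_peaks.mean())
```

In this sketch the detected event index could serve as an automatically derived label, and the KDE samples stand in for synthetic values that follow the empirical distribution of the real ones. A fuller pipeline would compare the synthetic and real distributions with the evaluation metrics the abstract mentions, which are not specified here.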