To address the stochastic optimal scheduling problem of microgrids, this paper proposes an online microgrid optimization algorithm based on deep reinforcement learning. A deep neural network approximates the state-action value function, and the discretized battery actions form the network's outputs. Nonlinear programming then solves for the remaining decision variables and computes the immediate reward, and the optimal policy is obtained with the Q-learning algorithm. To make the neural network robust to the randomness of wind power, photovoltaic power and load, Monte Carlo sampling is applied to the wind, photovoltaic and load power forecast curves and their forecast errors to generate multiple sets of training curves. Once training is complete, the network weights are saved. Given the real-time state of the microgrid as input, the network then outputs the battery action in real time, thereby realizing online optimal dispatch of the microgrid. Comparisons with day-ahead optimization results under different fluctuation levels of wind power, photovoltaic power and load power verify the effectiveness and superiority of the proposed algorithm for online microgrid optimization.
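The training loop described above can be illustrated with a small sketch. It is not the paper's implementation, and it rests on several simplifying assumptions. A tabular Q-table replaces the deep neural network. A simple grid-balancing rule (buy any shortfall, sell any surplus) stands in for the nonlinear programming sub-problem. Forecast errors are modelled as zero-mean Gaussian noise with a standard deviation equal to a fixed fraction of the forecast. All names and constants (`UNIT_KWH`, `LEVELS`, `ACTIONS`, `BUY`, `SELL`, `sample_scenarios`, `step_reward`, `train`, `greedy_action`) are hypothetical.

```python
import random

# Hypothetical parameters (not from the paper).
UNIT_KWH = 25.0               # SOC quantum in kWh; with dt = 1 h, one level per step = 25 kW
LEVELS = 9                    # SOC levels 0..8, i.e. 0..200 kWh
ACTIONS = [-2, -1, 0, 1, 2]   # discretized battery action: change in SOC level (>0 charge, <0 discharge)
BUY, SELL = 0.8, 0.3          # hypothetical grid purchase / sale prices per kWh


def sample_scenarios(forecasts, rel_err, n, seed=0):
    """Monte Carlo sampling: perturb every point of each forecast curve with
    zero-mean Gaussian error (std = rel_err * forecast), clipped at zero."""
    rng = random.Random(seed)
    return [
        {name: [max(0.0, v + rng.gauss(0.0, rel_err * v)) for v in curve]
         for name, curve in forecasts.items()}
        for _ in range(n)
    ]


def step_reward(scen, t, delta):
    """Stand-in for the paper's nonlinear-programming sub-problem: the grid
    absorbs whatever imbalance the battery and renewables leave."""
    charge_kw = delta * UNIT_KWH  # positive = battery draws power
    net = scen["load"][t] + charge_kw - scen["wind"][t] - scen["pv"][t]
    cost = BUY * net if net > 0 else SELL * net  # negative cost = export revenue
    return -cost


def feasible(level):
    """Battery actions that keep the SOC level within [0, LEVELS - 1]."""
    return [a for a in ACTIONS if 0 <= level + a < LEVELS]


def train(scenarios, horizon, episodes, alpha=0.1, gamma=1.0, eps=0.1, seed=0):
    """Epsilon-greedy Q-learning over the sampled scenarios (tabular, not deep)."""
    rng = random.Random(seed)
    q = {}  # Q[(t, soc_level, action)]
    for ep in range(episodes):
        scen = scenarios[ep % len(scenarios)]
        level = LEVELS // 2
        for t in range(horizon):
            acts = feasible(level)
            if rng.random() < eps:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda x: q.get((t, level, x), 0.0))
            r = step_reward(scen, t, a)
            nxt = level + a
            # Q-learning target: immediate reward plus best value of the next state.
            if t + 1 < horizon:
                target = r + gamma * max(q.get((t + 1, nxt, b), 0.0) for b in feasible(nxt))
            else:
                target = r
            old = q.get((t, level, a), 0.0)
            q[(t, level, a)] = old + alpha * (target - old)
            level = nxt
    return q


def greedy_action(q, t, level):
    """Online dispatch: choose the battery action for the current real-time state."""
    return max(feasible(level), key=lambda x: q.get((t, level, x), 0.0))


if __name__ == "__main__":
    fc = {"wind": [30.0, 40.0, 20.0, 10.0], "pv": [0.0, 20.0, 60.0, 10.0],
          "load": [80.0, 60.0, 50.0, 90.0]}
    q = train(sample_scenarios(fc, rel_err=0.1, n=50), horizon=4, episodes=2000)
    print(greedy_action(q, 0, LEVELS // 2))
```

Keying the table on `(t, soc_level)` keeps the sketch small. In the paper's setting, the state also carries the real-time wind, photovoltaic and load values, which is one reason a neural network rather than a table approximates the value function there.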