Because the components of a distribution network depend on one another, cascading faults pose a significant risk to its stability and can eventually lead to the collapse of the whole network. Since early warning of cascading faults can facilitate the implementation of self-healing techniques, this paper introduces a spatio-temporal deep network to predict the response time of cascading faults. The model extracts the spatial and time-domain characteristics of the network topology and electrical parameters of the distribution system using convolution operations combined with both up- and down-sampling. The response time is divided into three classes according to the degree of urgency, and the prediction task is formulated as a multi-class classification problem. Experimental results on the UIUC 150-bus power system show that, given the initial state of the power network and the initial set of faulted elements at the onset of a cascading fault, the model achieves higher prediction accuracy than conventional methods. These results reveal the complete dynamic profile of cascading fault propagation and determine the response time for each cascading fault scenario, providing new insights for early warning of large cascading faults in the system.
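
To illustrate how a continuous response time can be turned into a three-class target, the following minimal sketch maps a response time to an urgency label. The class names, the function `urgency_class`, and the two thresholds (60 s and 600 s) are hypothetical placeholders chosen for illustration only; they are not the class boundaries used in the paper.

```python
from enum import IntEnum


class Urgency(IntEnum):
    """Three urgency classes; the names are illustrative, not from the paper."""
    URGENT = 0
    MODERATE = 1
    RELAXED = 2


# Hypothetical class boundaries in seconds (assumed for this sketch only).
URGENT_BELOW_S = 60.0
MODERATE_BELOW_S = 600.0


def urgency_class(response_time_s: float) -> Urgency:
    """Map a cascading-fault response time (seconds) to one of three classes."""
    if response_time_s < 0:
        raise ValueError("response time must be non-negative")
    if response_time_s < URGENT_BELOW_S:
        return Urgency.URGENT
    if response_time_s < MODERATE_BELOW_S:
        return Urgency.MODERATE
    return Urgency.RELAXED


# Integer labels suitable as targets for a multi-class classifier.
labels = [int(urgency_class(t)) for t in (12.0, 240.0, 3600.0)]
```

With labels of this form, a classifier can be trained with a standard multi-class loss such as cross-entropy. The actual boundaries would have to be set from the operational requirements of the network under study.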