As the number of classes grows in traditional multi-class classification and recognition tasks, a long-tail problem often arises: most of the sample data is concentrated in a few classes. In the detection of malware-generated domain names (DGA, domain generation algorithm), the variability of malware means that the number of DGA classes keeps increasing and also exhibits a long-tail distribution. However, previous DGA detection research has focused on the classes with large amounts of data and therefore does not address this long-tail characteristic. We propose an effective knowledge transfer DGA detection model that transfers the knowledge learned in one stage of training to the next stage and mitigates the impact of the long-tail problem on the detection model. To preserve the continuity of the model, we propose a data balance review method, which alleviates the catastrophic forgetting problem of transfer learning and allows new classes to be detected without retraining the whole model. Our model achieves a macro-average F1 score of 76.6%, which is 8.74% higher than ATT_BiLSTM and 6.34% higher than ATT_CNN_BiLSTM. These results indicate that our model mitigates the long-tail problem and predicts all classes better.
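To illustrate why the macro-average F1 score is a natural metric under a long-tail distribution, the following minimal sketch computes it from scratch. It is an illustration of the standard metric only, not the evaluation code used in this work; the function name `macro_f1` and the class labels `"head"` and `"tail"` are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (standard macro averaging)."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the sample was not p
            fn[t] += 1  # true class t was missed
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 8 samples of a frequent class, 2 of a rare class,
# and a classifier that always predicts the frequent class.
y_true = ["head"] * 8 + ["tail"] * 2
y_pred = ["head"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```

In this toy case the accuracy is 0.8, yet the macro-average F1 is only about 0.44, because the rare class contributes an F1 of zero with the same weight as the frequent class. A model that ignores tail classes is therefore penalized under this metric.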