To quickly, accurately, and automatically recognize various defects (leakage, cracks, and spalling) in massive volumes of metro tunnel lining images, a two-step deep learning-based framework is proposed. Datasets for metro tunnel lining image classification and multi-defect detection are first established using a self-developed metro tunnel inspection drone (MTID), mobile phones, and digital cameras. In the first step of the framework, a tunnel lining image classification network (TLCNet), which couples convolutional neural networks with vision Transformers, is developed on the classification dataset. TLCNet separates and saves the defective images with an accuracy of 85.60 % ± 1.58 % at 81 frames per second (FPS). In the second step, a tunnel lining defect detection network (TDDNet) is built on the You Only Look Once (YOLO) network by designing multiple detection heads and introducing attention modules. On the object detection dataset, TDDNet detects multiple defect types in the defective images with 71.51 % mAP and a 69.83 % F1 score, at a cost of 75.95 giga FLOPs and 37.78 million parameters. Extensive comparison, ablation, and generalization experiments indicate the superiority of TLCNet and TDDNet. Furthermore, an intelligent platform integrating TLCNet and TDDNet is established, enabling multi-defect recognition in practical metro tunnel lining inspection tasks.
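The two-step design works as a gate. A classifier scores every image, and the heavier detector runs only on the images flagged as defective. The following is a minimal sketch of that control flow under stated assumptions. `TwoStepPipeline`, `classify`, `detect`, and the 0.5 `threshold` are hypothetical placeholders, not TLCNet or TDDNet. Real models would be plugged in as callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical detection record: (defect label, confidence, box as x1, y1, x2, y2).
Detection = Tuple[str, float, Tuple[int, int, int, int]]


@dataclass
class TwoStepPipeline:
    """Illustrative classify-then-detect gate (all names are hypothetical)."""

    classify: Callable[[object], float]          # returns P(image is defective)
    detect: Callable[[object], List[Detection]]  # returns defects found in one image
    threshold: float = 0.5                       # assumed decision threshold

    def run(self, images: Dict[str, object]) -> Dict[str, List[Detection]]:
        results: Dict[str, List[Detection]] = {}
        for name, image in images.items():
            # Step 1: skip images the classifier considers defect-free.
            if self.classify(image) < self.threshold:
                continue
            # Step 2: run the detector only on images flagged as defective.
            results[name] = self.detect(image)
        return results


# Usage with stub "models" that read precomputed values from dicts.
calls: List[str] = []
pipeline = TwoStepPipeline(
    classify=lambda im: im["p_defect"],
    detect=lambda im: calls.append(im["id"]) or im["boxes"],
)
images = {
    "frame_a": {"id": "a", "p_defect": 0.9, "boxes": [("crack", 0.8, (0, 0, 10, 10))]},
    "frame_b": {"id": "b", "p_defect": 0.1, "boxes": []},
}
detections = pipeline.run(images)
```

In this sketch the detector is invoked once per flagged image rather than once per input image. That is the intended saving of a classification stage placed before detection when most frames contain no defects.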