Deep learning (DL) models for medical image analysis are largely bottlenecked by the lack of well-annotated datasets. Bronchoalveolar lavage (BAL) is a minimally invasive procedure used to diagnose lung cancer, but BAL cytology suffers from low sensitivity. DL has rarely succeeded in BAL cytology because exfoliated tumor cells (ETCs) are scarce and differ only subtly in morphology from normal cells. Here, single-cell DNA sequencing (scDNA-Seq) is used as an objective ground truth for ETC annotation, yielding an unbiased, accurately annotated dataset of 580 ETCs and 1106 benign cells from BAL cytology slides. A DL model is developed to distinguish ETCs from benign cells in BAL fluid, achieving an area under the curve (AUC) of 0.997 and 0.956 for detecting large- and small-sized ETCs, respectively. The model is applied to a discovery cohort (n = 156) to establish a BAL-based cytopathologic diagnostic model for lung cancer. In a validation cohort (n = 158), the diagnostic model yielded 47.6% sensitivity and 97.7% specificity for lung cancer diagnosis, outperforming cytology in sensitivity (47.6% vs 19.0%). In an external validation cohort (n = 141), the model achieved 60.0% sensitivity and 92.5% specificity for lung cancer diagnosis.
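For readers less familiar with the metrics reported above, the following is a minimal sketch of how sensitivity, specificity, and AUC are conventionally computed for a binary classifier. It is not the evaluation code used in this study: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and serve only to illustrate the definitions. The sketch assumes both classes are present in the labels.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of true positives that are detected
    specificity = tn / (tn + fp)  # fraction of true negatives that are cleared
    return sensitivity, specificity


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 4 positives, 4 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.35, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(sensitivity_specificity(y_true, y_pred))  # (0.5, 0.75)
print(auc(y_true, scores))                      # 0.8125
```

Sensitivity and specificity depend on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds. This is why a model can have a very high cell-level AUC while a patient-level diagnostic rule built on it trades sensitivity for specificity.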