This study introduces SleepHybridNet, a lightweight hybrid CNN-Transformer model designed to improve the classification of non-rapid eye movement stage 1 (N1) sleep from single-channel electroencephalogram (EEG) signals. Accurate identification of the N1 stage is critically important in both sleep neuroscience and clinical practice. However, because the EEG features of N1 are ambiguous, current deep learning models still struggle to classify this stage satisfactorily. To address this challenge, SleepHybridNet combines multi-scale feature fusion and sequence modeling in a novel architecture consisting of a Multi-Scale Convolutional Neural Network (MSCNN) module, a Transformer encoder, a spectral feature extraction unit, and a multi-task classifier. Experiments on the publicly available Sleep-EDF Expanded dataset show that SleepHybridNet outperforms the compared methods in both classification accuracy and generalization capability. Specifically, the model achieves an overall accuracy of 88.2% and an N1-stage F1-score of 0.633, performing particularly well on underrepresented stages such as N1 and N3. With only 5.1M parameters, the model is lightweight enough to support practical deployment in clinical settings, helping bridge the gap between high-performance deep learning algorithms and real-world use in sleep medicine. Future work may integrate multimodal data from wearable sensors to extend the model to a wider range of application scenarios.
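The abstract does not state what the spectral feature extraction unit computes. As an illustration only, the Python sketch below assumes it produces relative power in conventional EEG frequency bands for one 30-s single-channel epoch sampled at 100 Hz (the EEG sampling rate in Sleep-EDF). The band edges, the function name `relative_band_powers`, and the use of a plain FFT periodogram are assumptions made for this example, not the authors' design.

```python
import numpy as np

# Hypothetical band definitions (Hz): conventional EEG bands, assumed for
# illustration; the paper's actual spectral features are not specified.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "sigma": (12.0, 15.0),
    "beta": (15.0, 30.0),
}


def relative_band_powers(epoch, fs=100.0, bands=BANDS):
    """Return the fraction of 0.5-30 Hz power falling in each band.

    Hypothetical helper: computes an unnormalized FFT periodogram of one
    single-channel EEG epoch and sums it over each frequency band.
    """
    x = np.asarray(epoch, dtype=float)
    x = x - x.mean()  # remove the DC offset so it does not dominate low bins
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2  # periodogram (unnormalized)

    # Total power over the span covered by the (contiguous) bands.
    lo = min(f0 for f0, _ in bands.values())
    hi = max(f1 for _, f1 in bands.values())
    total = psd[(freqs >= lo) & (freqs < hi)].sum()
    if total == 0:
        raise ValueError("epoch has no power in the analysed frequency range")

    return {
        name: float(psd[(freqs >= f0) & (freqs < f1)].sum() / total)
        for name, (f0, f1) in bands.items()
    }


# Usage: a synthetic 30-s epoch with a 10 Hz (alpha) and a weaker 6 Hz (theta)
# component; power scales with amplitude squared, so alpha gets ~0.8.
t = np.arange(3000) / 100.0
epoch = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 6 * t)
powers = relative_band_powers(epoch)
```

Hand-crafted band powers like these are sometimes combined with learned CNN features because the attenuation of alpha activity and the emergence of theta activity that mark the wake-to-N1 transition show up directly in band power.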