Abstract

With an estimated 2.41 billion people worldwide in need of rehabilitation, exoskeletons urgently require accurate motion intention recognition. Decoding lower limb motion from surface electromyography (sEMG) signals is hampered by positional ambiguity and sensitivity to noise. We propose DCTran, a hybrid CNN-Transformer model featuring (1) adaptive positional encoding (tAPE/eRPE) that dynamically aligns muscle activation phases; (2) a frequency-aware network (1D DFD-FFN) that reduces parameters by 81.5$\%$ via spectral gating; and (3) dynamic augmentation (DWRA/TDE) that enhances cross-subject robustness. Evaluated on the OYMotion dataset (six subjects, six motions) and the public ENABL3S dataset, DCTran achieved 91.86$\%$ and 94.38$\%$ accuracy, respectively, outperforming ConvTran by 5.2$\%$. Ablation studies validated the contributions of tAPE/eRPE (+4.86$\%$ accuracy) and 1D DFD-FFN (+3.6$\%$). These results support the use of DCTran for real-time exoskeleton control and multimodal physiological fusion.