Improved cancer genomic diagnosis and prognosis are vital to accurate medical therapy. Deep learning methods offer an end-to-end solution that enhances the precision of analysis. Given the rapid progress of pre-trained Transformer models, it remains uncertain whether newer approaches such as the sparsely gated mixture of experts (MOE) and self-attention mechanisms can further improve the precision of cancer prognosis and classification. In this paper, we introduce a novel sparsely gated cancer diagnosis and prognosis framework called Gene-MOE, which exploits MOE layers and the proposed mixture of attention experts (MOAE) layers to improve analysis accuracy. Additionally, we address overfitting by pre-training on pan-cancer information from 33 distinct cancer types. For survival analysis, Gene-MOE achieves the best Concordance Index among state-of-the-art models on 12 of 14 cancer types. For cancer classification, the overall accuracy across the 33 cancer types reaches 95.8%, the best performance among state-of-the-art models. For cancer subtyping, Gene-MOE achieves the best result on at least one of two metrics, the log10 P-value and the number of significant clinical parameters, on seven of nine cancers. These results indicate that Gene-MOE holds strong potential for these downstream tasks.
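
For readers unfamiliar with the survival-analysis metric named above, the following is a minimal sketch of one common formulation of the Concordance Index (Harrell's C-index). Whether Gene-MOE is evaluated with exactly this variant is an assumption; the function name `concordance_index` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Fraction of comparable patient pairs ranked correctly by predicted risk.

    times  : observed survival or censoring time per patient
    events : 1 if the event (death) was observed, 0 if censored
    risks  : predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only when patient i has an observed
            # event strictly before patient j's recorded time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # correct ordering
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied prediction counts as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the third is censored.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
risks = [0.9, 0.3, 0.2, 0.4]
c_index = concordance_index(times, events, risks)
```

A value of 1.0 means every comparable pair is ordered correctly and 0.5 corresponds to random ranking. The quadratic loop is chosen for readability, not for use on large cohorts.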