There has been an unanticipated increase in the number of cases of Autism Spectrum Disorder (ASD) in the present era. Late detection, often caused by overlooking its early symptoms, aggravates the difficulties an autistic person faces in day-to-day life. An Artificial Intelligence (AI)-based classification framework can assist doctors in early detection and thereby help autistic people improve their quality of life. The small number of AI-based works using Structural Magnetic Resonance Imaging (sMRI), compared with those using Functional Magnetic Resonance Imaging (fMRI), motivates the development of a classification system for ASD detection from sMRI scans. In the past few years, the research community has witnessed the widespread adoption of CNN-based approaches in computer-vision applications. The Vision Transformer (ViT) network, based on the idea of Transformers in Natural Language Processing, has shown remarkable performance in image recognition. The proposed work focuses on the development of a classification system that uses the ViT network for ASD detection. Two variants of ViT, namely ViT-B16 and ViT-B32, have been used with additional modifications for the experiments. The proposed Prediction Level Fusion of Vision Transformers (PF-ViTs) network outperforms the sMRI-based state-of-the-art works (SOTAW), achieving an accuracy of 94.24%, a precision of 96.03%, a sensitivity of 92.36%, a specificity of 96.14%, an F1 score of 94.16%, and an AUC score of 98.45% for the detection of ASD.
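To make the fusion step and the reported metrics concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that prediction-level fusion means a weighted average of the per-class probabilities output by the two ViT models (the exact fusion rule is not stated here), that ASD is the positive class (label 1), and it uses hypothetical function names (`fuse_predictions`, `binary_metrics`) and toy probabilities that are not data from the study. AUC is omitted because it requires ranking the fused scores across thresholds rather than a single confusion matrix.

```python
import numpy as np


def fuse_predictions(probs_b16, probs_b32, weights=(0.5, 0.5)):
    """Assumed fusion rule: weighted average of class probabilities from two models."""
    w1, w2 = weights
    return w1 * np.asarray(probs_b16) + w2 * np.asarray(probs_b32)


def _safe_div(num, den):
    # Avoid division by zero when a class is never predicted or never present.
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics, treating label 1 (ASD) as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = _safe_div(tp, tp + fp)
    sensitivity = _safe_div(tp, tp + fn)  # also called recall
    return {
        "accuracy": _safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": _safe_div(tn, tn + fp),
        "f1": _safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical softmax outputs for four scans: columns are [control, ASD].
probs_b16 = np.array([[0.2, 0.8], [0.6, 0.4], [0.3, 0.7], [0.9, 0.1]])
probs_b32 = np.array([[0.4, 0.6], [0.3, 0.7], [0.8, 0.2], [0.7, 0.3]])
y_true = [1, 1, 1, 0]

fused = fuse_predictions(probs_b16, probs_b32)
y_pred = fused.argmax(axis=1)  # pick the class with the highest fused probability
metrics = binary_metrics(y_true, y_pred)
print(y_pred.tolist(), metrics)
```

Averaging probabilities lets one model correct the other where it is less confident (the second and third scans above flip relative to one of the individual models); a learned weighting or majority vote are alternative prediction-level schemes.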