Background: Postoperative delirium (POD) is a prevalent complication in elderly surgical patients. It is associated with long-term cognitive impairment and increased dementia risk. However, reliable tools to predict POD are currently lacking. Methods: We enrolled 316 arthroplasty patients (aged ≥ 65 years) in this study. Preoperative assessments comprised neuropsychological tests (Mini-Mental State Examination [MMSE] and Montreal Cognitive Assessment [MoCA]), molecular biomarkers in serum and cerebrospinal fluid (CSF), and saccadic tasks. POD was diagnosed by trained assessors using the Confusion Assessment Method. We compared the effectiveness of these three types of assessment in predicting the occurrence of POD. Results: The incidence of POD was 8.2% (26/316). MMSE and MoCA scores, serum neurofilament light chain (NfL) levels, and five saccadic parameters (reaction time, primary saccade error, and saccadic gain in pro-saccades; peak velocity in anti-saccades and in memory-guided saccades) differed significantly (p < 0.05) between POD and non-POD participants. A logistic regression classifier achieved higher predictive accuracy with saccadic parameters (area under the receiver operating characteristic curve [AUROC] = 0.81, 95% confidence interval [CI]: 0.70–0.92) than with MMSE and MoCA scores (AUROC = 0.64, 95% CI: 0.53–0.76) or NfL levels (AUROC = 0.61, 95% CI: 0.50–0.72). A multilayer perceptron classifier using saccadic parameters further increased predictive accuracy (AUROC = 0.89, 95% CI: 0.82–0.94). Conclusion: Saccadic parameters predicted the occurrence of POD more accurately than MMSE and MoCA scores or molecular test results. Therefore, saccadic parameters may serve as a complementary behavioral biomarker for predicting the occurrence of POD in elderly arthroplasty patients.
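To illustrate how an AUROC and its 95% CI of the kind reported above can be obtained from a classifier's predicted risk scores, the following is a minimal sketch and not the authors' analysis code. It rests on several assumptions: the scores are synthetic and only mirror the cohort's class balance (26 POD cases among 316 patients), the CI uses a percentile bootstrap (the abstract does not state which CI method was used), and the function names `auroc` and `bootstrap_ci` are hypothetical.

```python
import random


def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUROC (an assumed method)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUROC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auroc([scores[i] for i in idx], ys))
    stats.sort()
    low = stats[int((alpha / 2) * n_boot)]
    high = stats[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Synthetic risk scores only: 26 positives and 290 negatives, matching the
# reported incidence of 26/316. These are NOT the study's data.
rng = random.Random(42)
labels = [1] * 26 + [0] * 290
scores = [rng.gauss(1.0, 1.0) if y == 1 else rng.gauss(0.0, 1.0) for y in labels]

incidence = sum(labels) / len(labels)
point = auroc(scores, labels)
ci_low, ci_high = bootstrap_ci(scores, labels)

print(f"incidence = {incidence:.1%}")
print(f"AUROC = {point:.2f}, 95% CI: {ci_low:.2f}-{ci_high:.2f}")
```

With only 26 positive cases, the bootstrap interval is wide, which is consistent with the broad CIs reported in the Results. An AUROC of 0.5 corresponds to chance-level discrimination, so an interval whose lower bound sits near 0.50 indicates weak evidence of predictive value.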