Abstract

Objective
To evaluate retrieval-augmented prediction for forecasting hospital length of stay (LOS) following surgery, compared with traditional machine learning (ML), standalone large language models (LLMs), and retrieval-augmented generation (RAG) approaches.

Materials and Methods
Spine surgery cases were extracted from electronic health records. Structured features and operative notes were concatenated into natural language patient representations, embedded using Sentence-Bidirectional Encoder Representations from Transformers (Sentence-BERT), and stored in a vector database. Eight predictive models were implemented, including a baseline model, standalone ML with embeddings, a standalone LLM (Gemma 3:27B), and combinations of these with retrieval-augmented prediction or generation. The retrieval-augmented prediction model computed a similarity-weighted average LOS from nearest neighbors. Performance was assessed using R², mean absolute error (MAE), and root mean square error (RMSE).

Results
Retrieval-augmented prediction alone outperformed standalone ML and LLM models (R² = 0.39, MAE = 4.47). Combining ML or LLM outputs with retrieval-augmented prediction further improved performance. The best-performing model was a neural network blended with retrieval-augmented prediction (R² = 0.52, MAE = 4.16). LLM-RAG alone reached R² = 0.19, which improved to 0.47 when combined with retrieval-augmented prediction. Adding retrieval-augmented prediction consistently reduced MAE and RMSE, by up to 32% and 38%, respectively.

Discussion
Retrieval-augmented prediction offers interpretable and resource-efficient forecasting by semantically leveraging prior patient cases without generative modeling. It consistently outperformed RAG and ML across metrics, approximating clinical reasoning via similarity-based inference.

Conclusion
Retrieval-augmented prediction significantly enhances LOS prediction accuracy over standard ML and LLM models. Its interpretability and scalability make it a promising solution for integrating predictive analytics into clinical workflows.
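
The similarity-weighted average described in Materials and Methods can be illustrated with a minimal sketch. This is not the study's implementation: the function names `cosine_similarity` and `retrieval_augmented_los`, the choice of cosine similarity, the default `k`, and the clipping of negative similarities to zero are all assumptions made for illustration, since the abstract does not specify them. In practice the embeddings would come from a sentence-embedding model and a vector database rather than from hand-written lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieval_augmented_los(query_embedding, cases, k=3):
    """Predict LOS as the similarity-weighted mean LOS of the k nearest cases.

    `cases` is a list of (embedding, los_days) pairs for prior patients.
    The value of k and the weighting scheme are hypothetical choices.
    """
    if not cases:
        raise ValueError("at least one prior case is required")

    # Score every stored case against the query and keep the k most similar.
    scored = sorted(
        ((cosine_similarity(query_embedding, emb), los) for emb, los in cases),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    # Clip negative similarities so that all weights are non-negative.
    weights = [max(sim, 0.0) for sim, _ in scored]
    total = sum(weights)
    if total == 0.0:
        # No informative neighbor: fall back to an unweighted mean.
        return sum(los for _, los in scored) / len(scored)

    return sum(w * los for w, (_, los) in zip(weights, scored)) / total


# Toy example with two-dimensional "embeddings" (made-up values).
prior_cases = [([1.0, 0.0], 4.0), ([0.0, 1.0], 10.0), ([1.0, 1.0], 6.0)]
predicted = retrieval_augmented_los([1.0, 0.0], prior_cases, k=2)
print(f"Predicted LOS: {predicted:.2f} days")
```

Because the prediction is a weighted mean over identifiable prior cases, the retrieved neighbors and their weights can be shown alongside the forecast, which is one way to read the interpretability claim in the Discussion.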