State Departments of Transportation (DOTs) face workforce shortages and a decline in experienced construction workers. One approach to addressing this problem is the adoption of modern technologies, particularly artificial intelligence (AI) tools. Multimodal large language models (LLMs) offer emerging capabilities, such as instruction following and question answering, that make them suitable for many practical applications in civil engineering. A key area of interest is effectively communicating complex concepts to train engineers and construction workers; however, a structured approach is still needed for selecting appropriate models and evaluating LLMs’ knowledge in specialized engineering areas. This paper presents an evaluation of twenty-six LLMs using more than 100 automatically generated questions, revealing a wide range of accuracy in concrete pavement construction expertise. The proposed methodology uses carefully selected, domain-specific documents and best practices to automatically build a customized framework for assessing model performance in niche fields. It also demonstrates that techniques such as the “Retry” method and the integration of domain-specific information through retrieval-augmented generation (RAG) can significantly improve LLM accuracy. The results indicate that small and medium open-source models combined with RAG, which can run efficiently on laptop computers, could be deployed in future applications. These techniques show promise for improving models without additional fine-tuning or retraining, substantially reducing the computational requirements of tailored engineering tasks.
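To make the two techniques named above concrete, the sketch below shows one plausible way to combine RAG with a simple retry loop when answering multiple-choice evaluation questions. This is an illustration only, not the paper’s implementation: the sample corpus, the bag-of-words retriever, and the `query_llm` stub are all assumptions introduced here.

```python
# A minimal sketch (assumed, not the paper's code) of RAG plus a "Retry" loop.
# All names here (KNOWLEDGE_CHUNKS, query_llm, answer_with_rag_and_retry) are
# hypothetical placeholders for illustration.

import math
import re
from collections import Counter

# Hypothetical domain corpus: short passages from concrete pavement guidance.
KNOWLEDGE_CHUNKS = [
    "Dowel bars transfer load across transverse joints in concrete pavement.",
    "Curing compound should be applied promptly to limit moisture loss.",
    "Saw-cut timing controls random cracking in jointed plain concrete pavement.",
]

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 2) -> list[str]:
    """Rank corpus chunks by bag-of-words similarity to the question."""
    q_vec = Counter(tokenize(question))
    scored = sorted(
        KNOWLEDGE_CHUNKS,
        key=lambda chunk: cosine_similarity(q_vec, Counter(tokenize(chunk))),
        reverse=True,
    )
    return scored[:k]

def query_llm(prompt: str) -> str:
    """Placeholder for a call to a local or hosted LLM (an assumption here)."""
    raise NotImplementedError("wire this to the model under evaluation")

def answer_with_rag_and_retry(question: str, max_attempts: int = 3) -> str:
    """Prepend retrieved context to the prompt; re-ask on malformed output."""
    context = "\n".join(retrieve(question))
    prompt = (
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer with A, B, C, or D."
    )
    for _ in range(max_attempts):
        reply = query_llm(prompt).strip()
        if reply and reply[0] in "ABCD":  # accept only a well-formed choice
            return reply
    return "no valid answer"  # fall back after exhausting retries
```

In a deployed system the retriever would more likely use dense embeddings over agency specifications and best-practice documents rather than bag-of-words scoring, but the control flow, retrieve, augment the prompt, and retry on malformed output, is the same.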