Abstract
Large Language Models (LLMs) have emerged in recent years as one of the most significant advancements in artificial intelligence. These models have enabled the development of intelligent chatbots capable of answering a wide range of user queries by drawing on vast amounts of information. However, LLMs have several limitations: their responses can be unpredictable, often lack domain-specific knowledge, and may contain contextual misunderstandings and hallucinations. To address these challenges, Retrieval-Augmented Generation (RAG) has become a prominent technique. RAG systems decouple world knowledge from the model's parameters, combining the generative capabilities of LLMs with advanced information retrieval, processing, and vector storage techniques. This integration yields responses that are more accurate, more contextually relevant, and significantly less prone to hallucination. More recently, Multimodal Large Language Models (MLLMs) have extended LLM capabilities beyond text to modalities such as images, tables, videos, and charts. These multimodal capabilities are critical in domains such as oil and gas (O&G), where data exists in diverse formats. Traditional RAG systems often rely on static pipelines, which limits their effectiveness on multimodal queries. Agentic RAG architectures, in contrast, offer a promising solution by enabling dynamic and adaptive processing. In this paper, we introduce a novel GenAI-based multimodal agentic AI assistant for drilling and completion (D&C) applications, a domain known for its data complexity and variability. By analyzing D&C reports from historical wells, well performance, geological characteristics, and production trends from nearby wells, the system helps engineers identify optimal drilling plans and improve field development. We show that the GenAI assistant performs well on complex, domain-specific questions that require synthesizing several types of data. A series of experiments shows significant improvements over traditional approaches in contextual understanding, hallucination prevention, and overall reliability.