Artificial intelligence
Robotics
Underwater
Enhanced Data Rates for GSM Evolution (EDGE)
Computer vision
Image (mathematics)
Computer science
Robot
Geology
Oceanography
Authors
Prabha Sundaravadivel, Preetha J. Roselyn, N. Vedachalam, Vincent I. Jeyaraj, Aparna Ramesh, Aaditya Khanal
Abstract
Image-based Large Language Models (LLMs) are AI models that can interpret captured images and generate textual content based on the analysis of visual data. Incorporating LLMs for assessing water quality, pressure, and environmental conditions can help analyze historical data and predict potential risks and threats in underwater environments. This can improve interventions by autonomous underwater vehicles (AUVs) and remotely operated vehicles (ROVs) during emergencies, where visual data must be interpreted to make informed decisions. While LLMs are primarily associated with processing and generating text, they can be integrated with images through multimodal learning, in which text and images are combined for tasks that involve both modalities. Implementing such frameworks is challenging on the low-power microcontrollers typically used in monitoring systems. This research proposes evaluating multimodal tokens to enable edge computing in bio-inspired robots that monitor the underwater environment. The approach breaks down large real-time videos into tokens of text-based instructions associated with image descriptions. The mini-robots transmit the collected "tokens" to the nearest AUV or ROV, where the image-based LLM is deployed. We evaluate this image-based LLM on our NVIDIA Jetson Nano-based AUV. In the proposed architecture, the mini-robots move along the length of the water column to capture images of the underwater environment. The proposed model is evaluated on generating text descriptions for boat and fish images. This framework, with integrated image-based tokens, can significantly reduce response time and data traffic in underwater real-time monitoring systems.
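As a rough illustration of the image-to-token pipeline the abstract describes, the sketch below captions a single captured frame with a small off-the-shelf vision-language model and packages the caption as a compact text payload for transmission. The model choice (BLIP from Hugging Face), the `frame_to_tokens` helper, and the payload format are assumptions made for illustration only, not the authors' implementation.

```python
# Minimal sketch: turn one camera frame into a text-token payload.
# Assumes the transformers, torch, and Pillow packages are installed;
# BLIP stands in for whichever image-based LLM the AUV actually runs.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def frame_to_tokens(frame_path: str) -> dict:
    """Convert one captured frame into a caption plus its token IDs."""
    image = Image.open(frame_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=30)
    caption = processor.decode(output_ids[0], skip_special_tokens=True)
    # A caption of a few dozen tokens replaces a multi-megabyte frame,
    # which is the bandwidth saving the proposed framework targets.
    return {"caption": caption, "token_ids": output_ids[0].tolist()}

# A mini-robot would transmit this small payload to the nearest AUV/ROV
# instead of streaming raw video, e.g.:
# payload = frame_to_tokens("frame_0001.jpg")
```

Under this framing, the compression win comes from sending only the decoded token sequence over the acoustic or tethered link, while the heavier image-based LLM inference stays on the Jetson-class vehicle.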