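Computer science
RGB color model
Convolutional neural network
Artificial intelligence
Artificial neural network
Deep learning
Feature (linguistics)
Pattern recognition (psychology)
Computer vision
Linguistics
Philosophy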
Authors
Daniel Carreira, Nuno M. M. Rodrigues, Rolando Miragaia, Paulo César Costa, José Ribeiro, Fábio Gaspar, António Pereira
Identifier
DOI:10.1016/j.asoc.2024.112088
Abstract
From smart sensors on assembly lines to robots performing complex tasks, the fourth industrial revolution is rapidly transforming manufacturing. The growing prominence of 3D cameras in industry has led the computer vision community to explore ways of integrating depth and color data to achieve the higher precision needed to ensure product quality in manufacturing. In this study, we introduce a branched convolutional neural network designed for high-speed classification of multimodal images, such as RGB-Depth (RGB-D) images. The fundamental concept underlying the branched approach is the specialization of each branch as a dedicated feature extractor for a single modality, followed by the merging of the branches (intermediate fusion) to enable effective classification. The model is fed with our novel multimodal dataset, CeramicNet, which comprises 8 classes and includes RGB, depth, and RGB-D variations to enable extensive experimentation and evaluation of the models. To the best of our knowledge, no such dataset has previously been introduced in the computer vision community. We conducted a series of experiments on the CeramicNet dataset, aimed at fine-tuning the model, assessing the influence of various depth technologies, exploring individual modalities, examining their collective impact, and performing comprehensive data analysis. Compared against seven widely used models, our solution secured the top position with a precision of 99.89, a lead of over 1% over the nearest competitor. Moreover, the proposed solution yields an inference time of 127.6 ms, nearly three times faster than the second-best performer.
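The branched, intermediate-fusion idea described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' network: each branch here is a toy stand-in (global average pooling over channels) for what would be a convolutional feature extractor, and the names `channel_means`, `linear`, and `classify_rgbd`, along with the weights, are hypothetical. Only the number of classes (8) and the branch-then-merge structure come from the abstract.

```python
import random

NUM_CLASSES = 8  # CeramicNet has 8 classes according to the abstract


def channel_means(image):
    """Toy branch: global average pooling over an H x W x C nested-list image.

    Stand-in (assumption) for a convolutional feature extractor.
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    sums = [0.0] * channels
    for row in image:
        for pixel in row:
            for c, value in enumerate(pixel):
                sums[c] += value
    return [s / (height * width) for s in sums]


def linear(features, weights, biases):
    """Dense layer: one score per class."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def classify_rgbd(rgb, depth, weights, biases):
    """Run one branch per modality, fuse the features, then classify."""
    rgb_features = channel_means(rgb)      # RGB branch: 3 features
    depth_features = channel_means(depth)  # depth branch: 1 feature
    fused = rgb_features + depth_features  # intermediate fusion (concatenation)
    scores = linear(fused, weights, biases)
    predicted = max(range(len(scores)), key=scores.__getitem__)
    return fused, predicted


if __name__ == "__main__":
    random.seed(0)
    # Random 4x4 RGB image and aligned 4x4 single-channel depth map.
    rgb = [[[random.random() for _ in range(3)] for _ in range(4)] for _ in range(4)]
    depth = [[[random.random()] for _ in range(4)] for _ in range(4)]
    # Untrained, random classifier weights: 8 classes x 4 fused features.
    weights = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(NUM_CLASSES)]
    biases = [0.0] * NUM_CLASSES
    fused, predicted = classify_rgbd(rgb, depth, weights, biases)
    print(len(fused), predicted)
```

The point of the sketch is where the modalities meet: each one is reduced to features on its own, and the classifier sees only the concatenated feature vector, as opposed to stacking RGB and depth into one four-channel input (early fusion) or averaging per-modality predictions (late fusion).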