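Artificial intelligence
Pascal (unit)
Computer science
Discriminant
Robustness (evolution)
Pattern recognition (psychology)
Cognitive neuroscience of visual object recognition
Classification
Object detection
Machine learning
Deep learning
Feature extraction
Computer vision
Contextual image classification
3D single-object recognition
Training set
Benchmark (surveying)
Architecture
Principal component analysis
Artificial neural network
Object (grammar)
Context model
Context (archaeology)
Generalization
Object-based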
Authors
CK Sudhakar, Suganthi Santhanam
Identifier
DOI:10.1109/icoici65217.2025.11254598
Abstract
Object detection and image classification are among the principal challenges in computer vision, with profound implications for autonomous systems, intelligent surveillance, and human-computer interaction. Using the large PASCAL VOC and ImageNet benchmark datasets, this study introduces a deep learning system for object recognition and classification. The model employs a hybrid structure that combines a Vision Transformer (ViT) with ResNet-50 to extract global context and spatial features. Weights pre-trained on ImageNet are transferred to the more diverse and complex PASCAL VOC dataset to improve generalization and speed up convergence. In thorough testing on the PASCAL VOC 2007 and 2012 datasets, the model reached a mean Average Precision (mAP) of 83.4% on VOC 2007 and 82.1% on VOC 2012, outperforming baseline models such as Faster R-CNN and YOLOv5 by 2.7% and 1.9%, respectively. ImageNet-based training significantly improved the system's discriminative ability across 1,000 object categories, yielding 78.9% top-1 classification accuracy. Greater robustness to occlusion and class imbalance was achieved with increased data augmentation, an optimized focal loss, and improved attention modules.
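The loss term credited in the abstract with robustness to class imbalance is presumably a variant of the standard focal loss; the abstract does not say how it was "optimized". As a rough illustration only, the sketch below shows the plain binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The function names and the defaults gamma = 2.0 and alpha = 0.25 are assumptions chosen for illustration, not values taken from the paper.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard binary focal loss for a single prediction (illustrative sketch).

    p: predicted probability of the positive class, in (0, 1)
    y: ground-truth label, 1 (positive) or 0 (negative)
    gamma, alpha: assumed defaults, not values reported in the abstract
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)              # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p               # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over paired probabilities and labels."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    easy = binary_focal_loss(0.9, 1)   # confident, correct prediction
    hard = binary_focal_loss(0.1, 1)   # confident, wrong prediction
    print(f"easy example loss: {easy:.6f}")
    print(f"hard example loss: {hard:.6f}")
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy. Larger gamma values shift the training signal toward hard, misclassified examples, which is the usual motivation for using this loss when rare classes are underrepresented.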