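Computer science
Segmentation
Estimation
RGB color model
Artificial intelligence
Transformer
Vocabulary
Calorie
Biology
Linguistics
Philosophy
Physics
Management
Quantum mechanics
Voltage
Economics
Endocrinology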
Authors
Satayu Parinayok, Yoko Yamakata, Kiyoharu Aizawa
Identifier
DOI:10.1145/3595916.3626452
Abstract
Nutrition plays a vital role in overall health and well-being. With a highly accurate nutrient estimation model, we develop a tool that displays nutritional values from food images, thereby reducing the labor involved in dietary assessment. We propose a method that uses depth data with RGB images and incorporates an open-vocabulary segmentation process that separates food from non-food instances, coupled with a two-stage self-attention Transformer decoder. Our model outperforms the current state-of-the-art method, with an average percent MAE of 17.2% on Nutrition5k, an RGB-D food image dataset annotated with calories, mass, and three macronutrients. Our study also examines the significance of the food and background regions for calorie, mass, and nutrient estimation. We analyze the impact of non-food regions on each estimation task, with results suggesting that background information is crucial for calorie, mass, and carbohydrate estimation but not as essential for protein and fat estimation. The qualitative results also show that the model attends to regions with a high corresponding nutritional value. Implementation code and pre-trained models are provided at https://github.com/Oatsty/nutrition5k.
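The abstract reports an "average percent MAE" without defining it. The sketch below shows one common reading, assuming that percent MAE is the mean absolute error divided by the mean ground-truth value for a task, and that the average is an unweighted mean over the tasks. The function names and the toy numbers are hypothetical, and the authors' exact definition may differ.

```python
from statistics import mean


def percent_mae(predictions, targets):
    """MAE as a percentage of the mean ground-truth value (assumed definition)."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    mae = mean(abs(p - t) for p, t in zip(predictions, targets))
    return 100.0 * mae / mean(targets)


def average_percent_mae(per_task):
    """Unweighted mean of percent MAE over tasks.

    per_task maps a task name to a (predictions, targets) pair.
    """
    return mean(percent_mae(p, t) for p, t in per_task.values())


# Toy values for illustration only, not results from the paper.
toy = {
    "calories": ([90, 110], [100, 100]),
    "mass": ([40, 60], [50, 50]),
}
score = average_percent_mae(toy)
```

Normalizing by the per-task mean makes errors comparable across quantities with different units, such as kilocalories and grams, which is why a single averaged figure can be quoted.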