Lv1
90 points, joined 2025-11-13
Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding
3 days ago
Completed
MSG-CLIP: Enhancing CLIP’s ability to learn fine-grained structural associations through multi-modal scene graph alignment
1 month ago
Completed
MSG-CLIP: Enhancing CLIP’s ability to learn fine-grained structural associations through multi-modal scene graph alignment
1 month ago
Closed
MSG-CLIP: Enhancing CLIP’s ability to learn fine-grained structural associations through multi-modal scene graph alignment
2 months ago
Closed
Investigating Compositional Challenges in Vision-Language Models for Visual Grounding
2 months ago
Completed
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
2 months ago
Closed
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
3 months ago
Completed
Remote sensing scene graph generation for improved retrieval based on spatial relationships
3 months ago
Completed
Bootstrapping Interactive Image–Text Alignment for Remote Sensing Image Captioning
3 months ago
Completed
Regression Test cases selection using Natural Language Processing
6 months ago
Completed