ResearchTrend.AI
Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion

20 June 2023
Simone Bianco
Luigi Celona
Marco Donzella
Paolo Napoletano

Papers citing "Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion"

16 / 16 papers shown
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Ruotian Peng
Haiying He
Yake Wei
Yandong Wen
D. Hu
VLM
39
0
0
09 Apr 2025
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
Yi Nian
Shenzhe Zhu
Yuehan Qin
Li Li
Ziyi Wang
Chaowei Xiao
Yue Zhao
32
0
0
03 Apr 2025
MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection
Yibo Yan
Shen Wang
Jiahao Huo
Philip S. Yu
Xuming Hu
Qingsong Wen
200
6
0
23 Mar 2025
Knowledge Bridger: Towards Training-free Missing Multi-modality Completion
Guanzhou Ke
Shengfeng He
Xinyu Wang
Bo Wang
Guoqing Chao
Yuyao Zhang
Yi Xie
HeXing Su
73
0
0
27 Feb 2025
Image Embedding Sampling Method for Diverse Captioning
Sania Waheed
Na Min An
62
0
0
14 Feb 2025
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Renqiu Xia
Mingxing Li
Hancheng Ye
Wenjie Wu
Hongbin Zhou
...
Zeang Sheng
Botian Shi
Tao Chen
Junchi Yan
Bo Zhang
91
8
0
16 Dec 2024
Towards Unified Benchmark and Models for Multi-Modal Perceptual Metrics
Sara Ghazanfari
Siddharth Garg
Nicolas Flammarion
Prashanth Krishnamurthy
Farshad Khorrami
Francesco Croce
VLM
100
0
0
13 Dec 2024
Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts?
Shailaja Keyur Sampat
Maitreya Patel
Yezhou Yang
Chitta Baral
26
0
0
17 Oct 2024
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
Zhikai Li
Xuewen Liu
Dongrong Fu
Jianquan Li
Qingyi Gu
Kurt Keutzer
Zhen Dong
EGVM
VGen
DiffM
95
1
0
26 Aug 2024
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
Yunhao Ge
Fangyin Wei
Siddharth Gururani
Nayeon Lee
Xuan Li
Huayu Chen
CoGe
DiffM
35
14
0
30 Apr 2024
Inserting Faces inside Captions: Image Captioning with Attention Guided Merging
Yannis Tevissen
Khalil Guetari
Marine Tassel
Erwan Kerleroux
Frédéric Petitpont
48
0
0
20 Mar 2024
M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts
Mingsheng Li
Xin Chen
C. Zhang
Sijin Chen
Erik Cambria
Fukun Yin
Gang Yu
Tao Chen
38
24
0
17 Dec 2023
Learning Distinct and Representative Styles for Image Captioning
Qi Chen
Chaorui Deng
Qi Wu
VLM
45
23
0
17 Sep 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM
BDL
VLM
CLIP
392
4,171
0
28 Jan 2022
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
202
406
0
13 Jul 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Joey Tianyi Zhou
MLLM
277
525
0
04 Feb 2021