ResearchTrend.AI
Show and Tell: A Neural Image Caption Generator


17 November 2014
Oriol Vinyals
Alexander Toshev
Samy Bengio
D. Erhan
    3DV
arXiv: 1411.4555

Papers citing "Show and Tell: A Neural Image Caption Generator"

50 / 2,022 papers shown
Analyzing Quality, Bias, and Performance in Text-to-Image Generative Models
Nila Masrourisaadat
Nazanin Sedaghatkish
Fatemeh Sarshartehrani
Edward A. Fox
37
6
0
28 Jun 2024
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents
Zihao Wang
Shaofei Cai
Zhancun Mu
Haowei Lin
Ceyao Zhang
Xuejie Liu
Qing Li
Guy Van den Broeck
Xiaojian Ma
Yitao Liang
LM&Ro
46
12
0
27 Jun 2024
Enhancing Scientific Figure Captioning Through Cross-modal Learning
Mateo Alejandro Rojas
Rafael Carranza
44
0
0
24 Jun 2024
Reading Is Believing: Revisiting Language Bottleneck Models for Image Classification
Honori Udo
Takafumi Koshinaka
VLM
43
0
0
22 Jun 2024
A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning
Panagiotis Kaliosis
John Pavlopoulos
Foivos Charalampakos
Georgios Moschovis
Ion Androutsopoulos
MedIm
26
1
0
20 Jun 2024
From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment
Yusuke Hirota
Ryo Hachiuma
Chao-Han Huck Yang
Yuta Nakashima
VLM
43
3
0
20 Jun 2024
Disturbing Image Detection Using LMM-Elicited Emotion Embeddings
Maria Tzelepi
Vasileios Mezaris
18
3
0
18 Jun 2024
Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?
Mingqian Feng
Yunlong Tang
Zeliang Zhang
Chenliang Xu
42
3
0
18 Jun 2024
Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey
Hao Yang
Yanyan Zhao
Yang Wu
Shilong Wang
Tian Zheng
Hongbo Zhang
Zongyang Ma
Wanxiang Che
Bing Qin
42
8
0
12 Jun 2024
Tell Me What's Next: Textual Foresight for Generic UI Representations
Andrea Burns
Kate Saenko
Bryan A. Plummer
LM&Ro
AI4TS
36
4
0
12 Jun 2024
Stealthy Targeted Backdoor Attacks against Image Captioning
Wenshu Fan
Hongwei Li
Wenbo Jiang
Meng Hao
Shui Yu
Xiao Zhang
DiffM
27
6
0
09 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
Ke Xu
AAML
VLM
40
13
0
08 Jun 2024
OCCAM: Towards Cost-Efficient and Accuracy-Aware Classification Inference
Dujian Ding
Bicheng Xu
L. Lakshmanan
VLM
44
1
0
06 Jun 2024
Exploiting LMM-based knowledge for image classification tasks
Maria Tzelepi
Vasileios Mezaris
VLM
40
3
0
05 Jun 2024
Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges
Daniel A. P. Oliveira
Eugénio Ribeiro
David Martins de Matos
VGen
31
3
0
04 Jun 2024
Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance
Jun Li
Tongkun Su
Baoliang Zhao
Faqin Lv
Qiong Wang
Nassir Navab
Yin Hu
Zhongliang Jiang
MedIm
18
3
0
02 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
34
0
0
01 Jun 2024
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Yunqi Zhang
Songda Li
Chunyuan Deng
Luyi Wang
Hui Zhao
31
0
0
27 May 2024
ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text
Han Yu
Peikun Guo
Akane Sano
34
16
0
26 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
82
43
0
23 May 2024
A Survey on Multi-modal Machine Translation: Tasks, Methods and Challenges
Huangjun Shen
Liangying Shao
Wenbo Li
Zhibin Lan
Zhanyu Liu
Jinsong Su
41
2
0
21 May 2024
Resolving Word Vagueness with Scenario-guided Adapter for Natural Language Inference
Y. Liu
Mengyu Li
Di Liang
Ximing Li
Fausto Giunchiglia
Lan Huang
Xiaoyue Feng
Renchu Guan
39
3
0
21 May 2024
Automated Multi-level Preference for MLLMs
Mengxi Zhang
Wenhao Wu
Yu Lu
Yuxin Song
Kang Rong
...
Jianbo Zhao
Fanglong Liu
Yifan Sun
Haocheng Feng
Jingdong Wang
MLLM
69
10
0
18 May 2024
ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset
Johannes Rückert
Louise Bloch
Raphael Brüngel
Ahmad Idrissi-Yaghir
Henning Schäfer
...
A. G. S. D. Herrera
Henning Müller
Peter A. Horn
F. Nensa
Christoph M. Friedrich
42
26
0
16 May 2024
Contextual Emotion Recognition using Large Vision Language Models
Yasaman Etesam
Özge Nilay Yalçin
Chuxuan Zhang
Angelica Lim
VLM
73
3
0
14 May 2024
Topicwise Separable Sentence Retrieval for Medical Report Generation
Junting Zhao
Yang Zhou
Zhihao Chen
Huazhu Fu
Liang Wan
MedIm
25
1
0
07 May 2024
Compressed Image Captioning using CNN-based Encoder-Decoder Framework
Md Alif
Mahmudul Hasan
Shovon Bhowmick
50
1
0
28 Apr 2024
SERPENT-VLM : Self-Refining Radiology Report Generation Using Vision Language Models
M. Kapadnis
Sohan Patnaik
Abhilash Nandy
Sourjyadip Ray
Pawan Goyal
Debdoot Sheet
VLM
33
3
0
27 Apr 2024
Exploring the Distinctiveness and Fidelity of the Descriptions Generated by Large Vision-Language Models
Yuhang Huang
Zihan Wu
Chongyang Gao
Jiawei Peng
Xu Yang
32
2
0
26 Apr 2024
Self-supervised visual learning in the low-data regime: a comparative evaluation
Sotirios Konstantakos
Despina Ioanna Chalkiadaki
Ioannis Mademlis
Yuki M. Asano
E. Gavves
Georgios Th. Papadopoulos
42
6
0
26 Apr 2024
MLP: Motion Label Prior for Temporal Sentence Localization in Untrimmed 3D Human Motions
Sheng Yan
Mengyuan Liu
Yong Wang
Yang Liu
Chong Chen
Hong Liu
46
0
0
21 Apr 2024
Transfer Learning for Molecular Property Predictions from Small Data Sets
Thorren Kirschbaum
A. Bande
AI4CE
24
1
0
20 Apr 2024
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
Fengyi Fu
Shancheng Fang
Weidong Chen
Zhendong Mao
ViT
VGen
34
4
0
19 Apr 2024
SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-trained Models
Jiangyi Deng
Shengyuan Pang
Yanjiao Chen
Liangming Xia
Yijie Bai
Haiqin Weng
Wenyuan Xu
AAML
41
6
0
19 Apr 2024
Binder: Hierarchical Concept Representation through Order Embedding of Binary Vectors
Croix Gyurek
Niloy Talukder
Mohammad A. Hasan
30
2
0
16 Apr 2024
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
Yuchi Wang
Shuhuai Ren
Rundong Gao
Linli Yao
Qingyan Guo
Kaikai An
Jianhong Bai
Xu Sun
DiffM
VLM
49
6
0
16 Apr 2024
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
Changan Chen
Kumar Ashutosh
Rohit Girdhar
David Harwath
Kristen Grauman
EgoV
SSL
28
6
0
08 Apr 2024
Bi-LORA: A Vision-Language Approach for Synthetic Image Detection
Mamadou Keita
W. Hamidouche
Hessen Bougueffa Eutamene
Abdenour Hadid
Abdelmalik Taleb-Ahmed
69
7
0
02 Apr 2024
Memory-based Cross-modal Semantic Alignment Network for Radiology Report Generation
Yitian Tao
Liyan Ma
Jing Yu
Han Zhang
MedIm
34
6
0
31 Mar 2024
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
Bu Jin
Yupeng Zheng
Pengfei Li
Weize Li
Yuhang Zheng
...
Kun Zhan
Peng Jia
Xiaoxiao Long
Yilun Chen
Hao Zhao
3DV
79
15
0
28 Mar 2024
Text Data-Centric Image Captioning with Interactive Prompts
Yiyu Wang
Hao Luo
Jungang Xu
Yingfei Sun
Fan Wang
VLM
38
0
0
28 Mar 2024
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
41
0
0
26 Mar 2024
Image Captioning in news report scenario
Tianrui Liu
Qi Cai
Changxin Xu
Bo Hong
Jize Xiong
Yuxin Qiao
Tsungwei Yang
38
11
0
24 Mar 2024
Can 3D Vision-Language Models Truly Understand Natural Language?
Weipeng Deng
Jihan Yang
Runyu Ding
Jiahui Liu
Yijiang Li
Xiaojuan Qi
Edith C.H. Ngai
39
4
0
21 Mar 2024
VL-Mamba: Exploring State Space Models for Multimodal Learning
Yanyuan Qiao
Zheng Yu
Longteng Guo
Sihan Chen
Zijia Zhao
Mingzhen Sun
Qi Wu
Jing Liu
Mamba
43
65
0
20 Mar 2024
HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling
Daniel Duenias
Brennan Nichyporuk
Tal Arbel
Tammy Riklin-Raviv
42
3
0
20 Mar 2024
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
Kung-Hsiang Huang
Hou Pong Chan
Yi R. Fung
Haoyi Qiu
Mingyang Zhou
Chenyu You
Shih-Fu Chang
Chenhui Xu
AI4TS
66
14
0
18 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
54
10
0
12 Mar 2024
Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback
Adarsh N L
Arun P V
Aravindh N L
29
1
0
11 Mar 2024
How to Understand Named Entities: Using Common Sense for News Captioning
Ning Xu
Yanhui Wang
Tingting Zhang
Hongshuo Tian
Mohan Kankanhalli
An-An Liu
34
0
0
11 Mar 2024