Show and Tell: A Neural Image Caption Generator

17 November 2014
Oriol Vinyals
Alexander Toshev
Samy Bengio
D. Erhan
    3DV
arXiv: 1411.4555

Papers citing "Show and Tell: A Neural Image Caption Generator"

50 / 2,022 papers shown
Title
Transformer based Multitask Learning for Image Captioning and Object Detection
Debolena Basak
P. K. Srijith
M. Desarkar
24
1
0
10 Mar 2024
HistGen: Histopathology Report Generation via Local-Global Feature Encoding and Cross-modal Context Interaction
Zhengrui Guo
Jiabo Ma
Ying Xu
Yihui Wang
Liansheng Wang
Hao Chen
50
18
0
08 Mar 2024
Rule-driven News Captioning
Ning Xu
Tingting Zhang
Hongshuo Tian
An-An Liu
65
0
0
08 Mar 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
39
14
0
06 Mar 2024
VIXEN: Visual Text Comparison Network for Image Difference Captioning
Alexander Black
Jing Shi
Yifei Fai
Tu Bui
John Collomosse
47
5
0
29 Feb 2024
Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction
Koki Maeda
Shuhei Kurita
Taiki Miyanishi
Naoaki Okazaki
40
2
0
28 Feb 2024
ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks
Yang Liu
Xiaomin Yu
Gongyu Zhang
Christos Bergeles
Prokar Dasgupta
Alejandro Granados
Sebastien Ourselin
48
2
0
27 Feb 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
Jie Xu
Hanbo Zhang
Xinghang Li
Huaping Liu
Xuguang Lan
Tao Kong
LM&Ro
38
3
0
19 Feb 2024
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
Long Qian
Juncheng Billy Li
Yu-hao Wu
Yaobo Ye
Hao Fei
Tat-Seng Chua
Yueting Zhuang
Siliang Tang
MLLM
LRM
60
47
0
18 Feb 2024
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
Yiyang Zhou
Chenhang Cui
Rafael Rafailov
Chelsea Finn
Huaxiu Yao
VLM
MLLM
38
89
0
18 Feb 2024
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering
David Romero
Thamar Solorio
109
4
0
16 Feb 2024
Arbitrary Polynomial Separations in Trainable Quantum Machine Learning
Eric R. Anschuetz
Xun Gao
45
4
0
13 Feb 2024
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
134
371
0
09 Feb 2024
Text-Guided Image Clustering
Andreas Stephan
Lukas Miklautz
Kevin Sidak
Jan Philip Wahle
Bela Gipp
Claudia Plant
Benjamin Roth
11
4
0
05 Feb 2024
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Jing Yu Koh
Robert Lo
Lawrence Jang
Vikram Duvvur
Ming Chong Lim
Po-Yu Huang
Graham Neubig
Shuyan Zhou
Ruslan Salakhutdinov
Daniel Fried
23
0
0
24 Jan 2024
Unsupervised Learning of Graph from Recipes
Aissatou Diallo
Antonis Bikakis
Luke Dickens
Anthony Hunter
Rob Miller
SSL
17
0
0
22 Jan 2024
On the Audio Hallucinations in Large Audio-Video Language Models
Taichi Nishimura
Shota Nakada
Masayoshi Kondo
VLM
30
5
0
18 Jan 2024
KTVIC: A Vietnamese Image Captioning Dataset on the Life Domain
Anh-Cuong Pham
Van-Quang Nguyen
Thi-Hong Vuong
Quang-Thuy Ha
29
1
0
16 Jan 2024
Object-oriented backdoor attack against image captioning
Meiling Li
Nan Zhong
Xinpeng Zhang
Zhenxing Qian
Sheng Li
13
8
0
05 Jan 2024
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
Fanqing Meng
Wenqi Shao
Quanfeng Lu
Peng Gao
Kaipeng Zhang
Yu Qiao
Ping Luo
31
46
0
04 Jan 2024
SFGANS Self-supervised Future Generator for human ActioN Segmentation
Or Berman
Adam Goldbraikh
S. Laufer
24
0
0
31 Dec 2023
ChartBench: A Benchmark for Complex Visual Reasoning in Charts
Zhengzhuo Xu
Sinan Du
Yiyan Qi
Chengjin Xu
Chun Yuan
Jian Guo
37
36
0
26 Dec 2023
Medical Report Generation based on Segment-Enhanced Contrastive Representation Learning
Ruoqing Zhao
Xi Wang
Hongliang Dai
Pan Gao
Piji Li
MedIm
29
3
0
26 Dec 2023
Semantic Draw Engineering for Text-to-Image Creation
Yang Li
Huaqiang Jiang
Yangkai Wu
29
1
0
23 Dec 2023
Cycle-Consistency Learning for Captioning and Grounding
Ning Wang
Jiajun Deng
Mingbo Jia
ObjD
42
7
0
23 Dec 2023
Continual Learning: Forget-free Winning Subnetworks for Video Representations
Haeyong Kang
Jaehong Yoon
Sung Ju Hwang
Chang D. Yoo
CLL
39
2
0
19 Dec 2023
UniDCP: Unifying Multiple Medical Vision-language Tasks via Dynamic Cross-modal Learnable Prompts
Chenlu Zhan
Yufei Zhang
Yu Lin
Gaoang Wang
Hongwei Wang
VLM
MedIm
33
5
0
18 Dec 2023
Image Content Generation with Causal Reasoning
Xiaochuan Li
Baoyu Fan
Runze Zhang
Liang Jin
Di Wang
Zhenhua Guo
Yaqian Zhao
Rengang Li
LRM
83
6
0
12 Dec 2023
DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models
Shengguang Wu
Mei Yuan
Qi Su
DiffM
17
0
0
12 Dec 2023
RCA-NOC: Relative Contrastive Alignment for Novel Object Captioning
Jiashuo Fan
Yaoyuan Liang
Leyao Liu
Shao-Lun Huang
Lei Zhang
30
2
0
11 Dec 2023
PixLore: A Dataset-driven Approach to Rich Image Captioning
Diego Bonilla
VLM
14
3
0
08 Dec 2023
User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning
Xuan Wang
Guanhong Wang
Wenhao Chai
Jiayu Zhou
Gaoang Wang
37
4
0
08 Dec 2023
Object Recognition as Next Token Prediction
Kaiyu Yue
Borchun Chen
Jonas Geiping
Hengduo Li
Tom Goldstein
Ser-Nam Lim
40
9
0
04 Dec 2023
MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation
Ling Yang
Zhanyu Wang
Zhenghao Chen
Xinyu Liang
Luping Zhou
LM&MA
MedIm
58
6
0
04 Dec 2023
Enhancing Image Captioning with Neural Models
Pooja Bhatnagar
Sai Mrunaal
Sachin Kamnure
VLM
42
0
0
01 Dec 2023
C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing
Avigyan Bhattacharya
Mainak Singha
Ankit Jha
Biplab Banerjee
SSL
VLM
28
6
0
27 Nov 2023
WsiCaption: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images
Pingyi Chen
Honglin Li
Chenglu Zhu
Sunyi Zheng
Zhongyi Shui
Lin Yang
26
7
0
27 Nov 2023
DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism
Zhen Wang
Xinyun Jiang
Jun Xiao
Tao Chen
Long Chen
DiffM
30
1
0
25 Nov 2023
A Systematic Review of Deep Learning-based Research on Radiology Report Generation
Chang Liu
Yuanhe Tian
Yan Song
MedIm
34
15
0
23 Nov 2023
Scalable AI Generative Content for Vehicular Network Semantic Communication
Hao Feng
Yi Yang
Zhu Han
24
4
0
23 Nov 2023
Rethinking Radiology Report Generation via Causal Reasoning and Counterfactual Augmentation
Xiao Song
Jiafan Liu
Yun Li
Wenbin Lei
Ruxin Wang
CML
29
0
0
22 Nov 2023
A Survey on Multimodal Large Language Models for Autonomous Driving
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
...
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng
43
255
0
21 Nov 2023
Violet: A Vision-Language Model for Arabic Image Captioning with Gemini Decoder
Abdelrahman Mohamed
Fakhraddin Alwajih
El Moatez Billah Nagoudi
Alcides Alcoba Inciarte
Muhammad Abdul-Mageed
VLM
MLLM
30
7
0
15 Nov 2023
Improving Image Captioning via Predicting Structured Concepts
Ting Wang
Weidong Chen
Yuanhe Tian
Yan Song
Zhendong Mao
34
8
0
14 Nov 2023
Multi Sentence Description of Complex Manipulation Action Videos
Fatemeh Ziaeetabar
Reza Safabakhsh
S. Momtazi
M. Tamosiunaite
F. Worgotter
31
1
0
13 Nov 2023
GIPCOL: Graph-Injected Soft Prompting for Compositional Zero-Shot Learning
Guangyue Xu
Joyce Chai
Parisa Kordjamshidi
VLM
23
16
0
09 Nov 2023
Newvision: application for helping blind people using deep learning
Kumar Srinivas Bobba
Vamsi Krishna
Surendra Bolla
Dinesh Bugga
19
0
0
05 Nov 2023
Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols
Iqra Qasim
Alexander Horsch
Dilip K. Prasad
22
6
0
05 Nov 2023
Complex Organ Mask Guided Radiology Report Generation
Tiancheng Gu
Dongnan Liu
Zhiyuan Li
Weidong Cai
MedIm
27
14
0
04 Nov 2023
RigLSTM: Recurrent Independent Grid LSTM for Generalizable Sequence Learning
Ziyu Wang
Wenhao Jiang
Zixuan Zhang
Wei Tang
Junchi Yan
21
0
0
03 Nov 2023