CIDEr: Consensus-based Image Description Evaluation

20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
arXiv: 1411.5726

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,184 papers shown
Super-Prompting: Utilizing Model-Independent Contextual Data to Reduce Data Annotation Required in Visual Commonsense Tasks
Navid Rezaei
Marek Reformat
VLM
50
2
0
25 Apr 2022
Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds
Heng Wang
Chaoyi Zhang
Jianhui Yu
Weidong (Tom) Cai
3DPC
115
39
0
22 Apr 2022
Automated Audio Captioning using Audio Event Clues
Ayşegül Özkaya Eren
M. Sert
56
0
0
18 Apr 2022
Caption Feature Space Regularization for Audio Captioning
Yiming Zhang
Hong Yu
Ruoyi Du
Zhanyu Ma
Yuan Dong
122
1
0
18 Apr 2022
End-to-end Dense Video Captioning as Sequence Generation
Wanrong Zhu
Bo Pang
Ashish V. Thapliyal
William Yang Wang
Radu Soricut
DiffM
56
34
0
18 Apr 2022
Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Yan Wang
Liujuan Cao
Yongjian Wu
Feiyue Huang
Rongrong Ji
ViT
64
47
0
16 Apr 2022
Image Captioning In the Transformer Age
Yangliu Xu
Li Li
Haiyang Xu
Songfang Huang
Fei Huang
Jianfei Cai
ViT
59
6
0
15 Apr 2022
Video Captioning: a comparative review of where we are and which could be the route
Daniela Moctezuma
Tania A. Ramirez-delreal
Guillermo Ruiz
Othón González-Chávez
61
11
0
12 Apr 2022
On Distinctive Image Captioning via Comparing and Reweighting
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
91
16
0
08 Apr 2022
Hierarchical Self-supervised Representation Learning for Movie Understanding
Fanyi Xiao
Kaustav Kundu
Joseph Tighe
Davide Modolo
SSL
94
26
0
06 Apr 2022
LAMNER: Code Comment Generation Using Character Language Model and Named Entity Recognition
Rishab Sharma
Fuxiang Chen
Fatemeh H. Fard
86
4
0
05 Apr 2022
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
LRM, NAI
102
20
0
05 Apr 2022
Learning Audio-Video Modalities from Image Captions
Arsha Nagrani
Paul Hongsuck Seo
Bryan Seybold
Anja Hauth
Santiago Manén
Chen Sun
Cordelia Schmid
CLIP
93
86
0
01 Apr 2022
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
...
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM, LRM
166
589
0
01 Apr 2022
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
Yuxuan Wang
Difei Gao
Licheng Yu
Stan Weixian Lei
Matt Feiszli
Mike Zheng Shou
113
25
0
01 Apr 2022
CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation
Ziqi Zhang
Yuxin Chen
Zongyang Ma
Zhongang Qi
Chunfen Yuan
Bing Li
Ying Shan
Weiming Hu
VGen
58
8
0
31 Mar 2022
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation
Hongru Wang
Wei Liang
Jianbing Shen
Luc Van Gool
Wenguan Wang
97
58
0
30 Mar 2022
End to End Lip Synchronization with a Temporal AutoEncoder
Yoav Shalev
Lior Wolf
41
7
0
30 Mar 2022
Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning
Chen Chen
Nana Hou
Yuchen Hu
Heqing Zou
Xiaofeng Qi
Chng Eng Siong
VLM
84
21
0
29 Mar 2022
Quantifying Societal Bias Amplification in Image Captioning
Yusuke Hirota
Yuta Nakashima
Noa Garcia
76
48
0
29 Mar 2022
End-to-End Transformer Based Model for Image Captioning
Yiyu Wang
Jungang Xu
Yingfei Sun
VLM, ViT
64
125
0
29 Mar 2022
NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge
D. Vo
Hong Chen
Akihiro Sugimoto
Hideki Nakayama
132
14
0
28 Mar 2022
Visual Abductive Reasoning
Chen Liang
Wenguan Wang
Tianfei Zhou
Yi Yang
LRM
92
40
0
26 Mar 2022
CICERO: A Dataset for Contextualized Commonsense Inference in Dialogues
Deepanway Ghosal
Siqi Shen
Navonil Majumder
Rada Mihalcea
Soujanya Poria
105
54
0
25 Mar 2022
Linking Emergent and Natural Languages via Corpus Transfer
Shunyu Yao
Mo Yu
Yang Zhang
Karthik Narasimhan
J. Tenenbaum
Chuang Gan
86
17
0
24 Mar 2022
Affective Feedback Synthesis Towards Multimodal Text and Image Data
Puneet Kumar
Gaurav Bhatt
Omkar Ingle
Daksh Goyal
Balasubramanian Raman
EGVM
63
4
0
23 Mar 2022
Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation
Ying Zhao
Zhiliang Tian
Huaxiu Yao
Yinhe Zheng
Dongkyu Lee
Yiping Song
Jian Sun
N. Zhang
70
20
0
22 Mar 2022
LocATe: End-to-end Localization of Actions in 3D with Transformers
Jiankai Sun
Bolei Zhou
Michael J. Black
Arjun Chandrasekaran
143
8
0
21 Mar 2022
M2TS: Multi-Scale Multi-Modal Approach Based on Transformer for Source Code Summarization
Yuexiu Gao
Chen Lyu
50
38
0
18 Mar 2022
DU-VLG: Unifying Vision-and-Language Generation via Dual Sequence-to-Sequence Pre-training
Luyang Huang
Guocheng Niu
Jiachen Liu
Xinyan Xiao
Hua Wu
VLM, CoGe
56
8
0
17 Mar 2022
K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition
Kohei Uehara
Tatsuya Harada
98
10
0
15 Mar 2022
Global2Local: A Joint-Hierarchical Attention for Video Captioning
Chengpeng Dai
Fuhai Chen
Xiaoshuai Sun
Rongrong Ji
QiXiang Ye
Yongjian Wu
79
1
0
13 Mar 2022
Chart-to-Text: A Large-Scale Benchmark for Chart Summarization
Shankar Kanthara
Rixie Tiffany Ko Leong
Xiang Lin
Ahmed Masry
Megh Thakkar
Enamul Hoque
Shafiq Joty
121
150
0
12 Mar 2022
Taking an Emotional Look at Video Paragraph Captioning
Qinyu Li
Tengpeng Li
Hanli Wang
Changan Chen
47
5
0
12 Mar 2022
REX: Reasoning-aware and Grounded Explanation
Shi Chen
Qi Zhao
91
18
0
11 Mar 2022
Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling
Tengpeng Li
Hanli Wang
Bin He
Changan Chen
DiffM
88
10
0
10 Mar 2022
StyleBabel: Artistic Style Tagging and Captioning
Dan Ruta
Andrew Gilbert
Pranav Aggarwal
Naveen Marri
Ajinkya Kale
...
Hailin Jin
Baldo Faieta
Alex Filipkowski
Zhe Lin
John Collomosse
64
12
0
10 Mar 2022
MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes
Yang Jiao
Shaoxiang Chen
Zequn Jie
Wenke Huang
Lin Ma
Yu-Gang Jiang
3DPC
115
48
0
10 Mar 2022
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
Fawaz Sammani
Tanmoy Mukherjee
Nikos Deligiannis
MILM, ELM, LRM
138
68
0
09 Mar 2022
Geodesic Multi-Modal Mixup for Robust Fine-Tuning
Changdae Oh
Junhyuk So
Hoyoon Byun
Yongtaek Lim
Minchul Shin
Jong-June Jeon
Kyungwoo Song
139
30
0
08 Mar 2022
Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition
Peipei Zhu
Tianlin Li
Yong Luo
Zhenglong Sun
Wei-Shi Zheng
Yaowei Wang
Chen Chen
102
12
0
07 Mar 2022
Input-Tuning: Adapting Unfamiliar Inputs to Frozen Pretrained Models
Shengnan An
Yifei Li
Zeqi Lin
Qian Liu
Bei Chen
Qiang Fu
Weizhu Chen
Nanning Zheng
Jian-Guang Lou
VLM, AAML
93
43
0
07 Mar 2022
Leveraging Pre-trained BERT for Audio Captioning
Xubo Liu
Xinhao Mei
Qiushi Huang
Jianyuan Sun
Jinzheng Zhao
Haohe Liu
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
115
30
0
06 Mar 2022
RACE: Retrieval-Augmented Commit Message Generation
Ensheng Shi
Yanlin Wang
Wei Tao
Lun Du
Hongyu Zhang
Shi Han
Dongmei Zhang
Hongbin Sun
VLM
73
44
0
05 Mar 2022
FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context
Pinaki Nath Chowdhury
Aneeshan Sain
A. Bhunia
Tao Xiang
Yulia Gryaditskaya
Yi-Zhe Song
3DV
93
54
0
04 Mar 2022
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models
Feng Li
Hao Zhang
Yi-Fan Zhang
Shixuan Liu
Jian Guo
L. Ni
Pengchuan Zhang
Lei Zhang
AI4TS, VLM
83
37
0
03 Mar 2022
A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism
Rashid Khan
Shujah Islam
Khadija Kanwal
Mansoor Iqbal
Md. Imran Hossain
Z. Ye
3DV
35
18
0
03 Mar 2022
COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics
Lianhui Qin
Sean Welleck
Daniel Khashabi
Yejin Choi
AI4CE
126
152
0
23 Feb 2022
Exploiting long-term temporal dynamics for video captioning
Yuyu Guo
Jingqiu Zhang
Lianli Gao
53
18
0
22 Feb 2022
CaMEL: Mean Teacher Learning for Image Captioning
Manuele Barraco
Matteo Stefanini
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
ViT, VLM
84
30
0
21 Feb 2022