CIDEr: Consensus-based Image Description Evaluation

20 November 2014
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh

Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,142 papers shown
RoViST: Learning Robust Metrics for Visual Storytelling
Eileen Wang
S. Han
Josiah Poon
30
7
0
08 May 2022
Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information
Zhipeng Zhang
Xinglin Hou
K. Niu
Zhongzhen Huang
T. Ge
Yuning Jiang
Qi Wu
Peifeng Wang
31
4
0
07 May 2022
Language Models Can See: Plugging Visual Controls in Text Generation
Yixuan Su
Tian Lan
Yahui Liu
Fangyu Liu
Dani Yogatama
Yan Wang
Lingpeng Kong
Nigel Collier
VLM
MLLM
62
97
0
05 May 2022
Towards Robust and Semantically Organised Latent Representations for Unsupervised Text Style Transfer
Vivian Lai
Ruijia Cheng
Wenjuan Zhang
30
13
0
04 May 2022
Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos
Arnav Chakravarthy
Zhiyuan Fang
Yezhou Yang
40
2
0
28 Apr 2022
Controllable Image Captioning
Luka Maxwell
38
0
0
28 Apr 2022
CapOnImage: Context-driven Dense-Captioning on Image
Yiqi Gao
Xinglin Hou
Yuanmeng Zhang
T. Ge
Yuning Jiang
Peifeng Wang
33
10
0
27 Apr 2022
SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text
Pinaki Nath Chowdhury
A. Bhunia
Aneeshan Sain
Subhadeep Koley
Tao Xiang
Yi-Zhe Song
45
29
0
25 Apr 2022
Super-Prompting: Utilizing Model-Independent Contextual Data to Reduce Data Annotation Required in Visual Commonsense Tasks
Navid Rezaei
Marek Reformat
VLM
17
2
0
25 Apr 2022
Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds
Heng Wang
Chaoyi Zhang
Jianhui Yu
Weidong (Tom) Cai
3DPC
25
39
0
22 Apr 2022
Automated Audio Captioning using Audio Event Clues
Ayşegül Özkaya Eren
M. Sert
26
0
0
18 Apr 2022
Caption Feature Space Regularization for Audio Captioning
Yiming Zhang
Hong Yu
Ruoyi Du
Zhanyu Ma
Yuan Dong
21
1
0
18 Apr 2022
End-to-end Dense Video Captioning as Sequence Generation
Wanrong Zhu
Bo Pang
Ashish V. Thapliyal
William Yang Wang
Radu Soricut
DiffM
19
32
0
18 Apr 2022
Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Yan Wang
Liujuan Cao
Yongjian Wu
Feiyue Huang
Rongrong Ji
ViT
22
43
0
16 Apr 2022
Image Captioning In the Transformer Age
Yangliu Xu
Li Li
Haiyang Xu
Songfang Huang
Fei Huang
Jianfei Cai
ViT
27
5
0
15 Apr 2022
Video Captioning: a comparative review of where we are and which could be the route
Daniela Moctezuma
Tania A. Ramirez-delreal
Guillermo Ruiz
Othón González-Chávez
32
11
0
12 Apr 2022
On Distinctive Image Captioning via Comparing and Reweighting
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
51
16
0
08 Apr 2022
Hierarchical Self-supervised Representation Learning for Movie Understanding
Fanyi Xiao
Kaustav Kundu
Joseph Tighe
Davide Modolo
SSL
46
24
0
06 Apr 2022
LAMNER: Code Comment Generation Using Character Language Model and Named Entity Recognition
Rishab Sharma
Fuxiang Chen
Fatemeh H. Fard
55
2
0
05 Apr 2022
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
LRM
NAI
38
20
0
05 Apr 2022
Learning Audio-Video Modalities from Image Captions
Arsha Nagrani
Paul Hongsuck Seo
Bryan Seybold
Anja Hauth
Santiago Manén
Chen Sun
Cordelia Schmid
CLIP
24
83
0
01 Apr 2022
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
...
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM
LRM
83
575
0
01 Apr 2022
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
Yuxuan Wang
Difei Gao
Licheng Yu
Stan Weixian Lei
Matt Feiszli
Mike Zheng Shou
17
24
0
01 Apr 2022
CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation
Ziqi Zhang
Yuxin Chen
Zongyang Ma
Zhongang Qi
Chunfen Yuan
Bing Li
Ying Shan
Weiming Hu
VGen
32
8
0
31 Mar 2022
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation
Hongru Wang
Wei Liang
Jianbing Shen
Luc Van Gool
Wenguan Wang
40
55
0
30 Mar 2022
End to End Lip Synchronization with a Temporal AutoEncoder
Yoav Shalev
Lior Wolf
28
7
0
30 Mar 2022
Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning
Chen Chen
Nana Hou
Yuchen Hu
Heqing Zou
Xiaofeng Qi
Chng Eng Siong
VLM
26
21
0
29 Mar 2022
Quantifying Societal Bias Amplification in Image Captioning
Yusuke Hirota
Yuta Nakashima
Noa Garcia
24
48
0
29 Mar 2022
End-to-End Transformer Based Model for Image Captioning
Yiyu Wang
Jungang Xu
Yingfei Sun
VLM
ViT
28
117
0
29 Mar 2022
NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge
D. Vo
Hong Chen
Akihiro Sugimoto
Hideki Nakayama
19
13
0
28 Mar 2022
Visual Abductive Reasoning
Chen Liang
Wenguan Wang
Tianfei Zhou
Yi Yang
LRM
26
38
0
26 Mar 2022
CICERO: A Dataset for Contextualized Commonsense Inference in Dialogues
Deepanway Ghosal
Siqi Shen
Navonil Majumder
Rada Mihalcea
Soujanya Poria
35
51
0
25 Mar 2022
Linking Emergent and Natural Languages via Corpus Transfer
Shunyu Yao
Mo Yu
Yang Zhang
Karthik Narasimhan
J. Tenenbaum
Chuang Gan
29
15
0
24 Mar 2022
Affective Feedback Synthesis Towards Multimodal Text and Image Data
Puneet Kumar
Gaurav Bhatt
Omkar Ingle
Daksh Goyal
Balasubramanian Raman
EGVM
38
3
0
23 Mar 2022
Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation
Ying Zhao
Zhiliang Tian
Huaxiu Yao
Yinhe Zheng
Dongkyu Lee
Yiping Song
Jian Sun
N. Zhang
32
20
0
22 Mar 2022
LocATe: End-to-end Localization of Actions in 3D with Transformers
Jiankai Sun
Bolei Zhou
Michael J. Black
Arjun Chandrasekaran
64
8
0
21 Mar 2022
M2TS: Multi-Scale Multi-Modal Approach Based on Transformer for Source Code Summarization
Yuexiu Gao
Chen Lyu
16
31
0
18 Mar 2022
DU-VLG: Unifying Vision-and-Language Generation via Dual Sequence-to-Sequence Pre-training
Luyang Huang
Guocheng Niu
Jiachen Liu
Xinyan Xiao
Hua Wu
VLM
CoGe
19
7
0
17 Mar 2022
K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition
Kohei Uehara
Tatsuya Harada
86
10
0
15 Mar 2022
Global2Local: A Joint-Hierarchical Attention for Video Captioning
Chengpeng Dai
Fuhai Chen
Xiaoshuai Sun
Rongrong Ji
QiXiang Ye
Yongjian Wu
22
1
0
13 Mar 2022
Chart-to-Text: A Large-Scale Benchmark for Chart Summarization
Shankar Kanthara
Rixie Tiffany Ko Leong
Xiang Lin
Ahmed Masry
Megh Thakkar
Enamul Hoque
Chenyu You
27
137
0
12 Mar 2022
Taking an Emotional Look at Video Paragraph Captioning
Qinyu Li
Tengpeng Li
Hanli Wang
Changan Chen
24
4
0
12 Mar 2022
REX: Reasoning-aware and Grounded Explanation
Shi Chen
Qi Zhao
30
18
0
11 Mar 2022
Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling
Tengpeng Li
Hanli Wang
Bin He
Changan Chen
DiffM
30
9
0
10 Mar 2022
StyleBabel: Artistic Style Tagging and Captioning
Dan Ruta
Andrew Gilbert
Pranav Aggarwal
Naveen Marri
Ajinkya Kale
...
Hailin Jin
Baldo Faieta
Alex Filipkowski
Zhe Lin
John Collomosse
32
12
0
10 Mar 2022
MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes
Yang Jiao
Shaoxiang Chen
Zequn Jie
Wenke Huang
Lin Ma
Yu-Gang Jiang
3DPC
21
47
0
10 Mar 2022
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
Fawaz Sammani
Tanmoy Mukherjee
Nikos Deligiannis
MILM
ELM
LRM
32
67
0
09 Mar 2022
Geodesic Multi-Modal Mixup for Robust Fine-Tuning
Changdae Oh
Junhyuk So
Hoyoon Byun
Yongtaek Lim
Minchul Shin
Jong-June Jeon
Kyungwoo Song
38
26
0
08 Mar 2022
Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition
Peipei Zhu
Tianlin Li
Yong Luo
Zhenglong Sun
Wei-Shi Zheng
Yaowei Wang
Chen Chen
32
12
0
07 Mar 2022
Input-Tuning: Adapting Unfamiliar Inputs to Frozen Pretrained Models
Shengnan An
Yifei Li
Zeqi Lin
Qian Liu
Bei Chen
Qiang Fu
Weizhu Chen
Nanning Zheng
Jian-Guang Lou
VLM
AAML
52
40
0
07 Mar 2022