
CIDEr: Consensus-based Image Description Evaluation

20 November 2014

Ramakrishna Vedantam

C. L. Zitnick

Devi Parikh


Papers citing "CIDEr: Consensus-based Image Description Evaluation"

50 / 2,184 papers shown

Title
FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows Jianqiao Zhao Yanyang Li Wanyu Du Yangfeng Ji Dong Yu Michael R. Lyu Liwei Wang 72 4 0 14 Feb 2022
I-Tuning: Tuning Frozen Language Models with Image for Lightweight Image Captioning Ziyang Luo Zhipeng Hu Yadong Xi Rongsheng Zhang Jing Ma VLM 52 14 0 14 Feb 2022
Describing image focused in cognitive and visual details for visually impaired people: An approach to generating inclusive paragraphs Daniel Louzada Fernandes Marcos Henrique Fonseca Ribeiro F. Cerqueira Michel Melo Silva 37 7 0 10 Feb 2022
Image Difference Captioning with Pre-training and Contrastive Learning Linli Yao Weiying Wang Qin Jin SSL VLM 81 43 0 09 Feb 2022
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models Jaemin Cho Abhaysinh Zala Joey Tianyi Zhou ViT 258 193 0 08 Feb 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework Peng Wang An Yang Rui Men Junyang Lin Shuai Bai Zhikang Li Jianxin Ma Chang Zhou Jingren Zhou Hongxia Yang MLLM ObjD 258 884 0 07 Feb 2022
Webly Supervised Concept Expansion for General Purpose Vision Models Amita Kamath Christopher Clark Tanmay Gupta Eric Kolve Derek Hoiem Aniruddha Kembhavi VLM 97 55 0 04 Feb 2022
Joint Speech Recognition and Audio Captioning Chaitanya Narisetty E. Tsunoo Xuankai Chang Yosuke Kashiwagi Michael Hentschel Shinji Watanabe 47 10 0 03 Feb 2022
Deep Learning Approaches on Image Captioning: A Review Taraneh Ghandi H. Pourreza H. Mahyar VLM 136 101 0 31 Jan 2022
A Frustratingly Simple Approach for End-to-End Image Captioning Ziyang Luo Yadong Xi Rongsheng Zhang Jing Ma VLM MLLM 79 16 0 30 Jan 2022
BERTHA: Video Captioning Evaluation Via Transfer-Learned Human Assessment Luis Lebron Yvette Graham Kevin McGuinness K. Kouramas Noel E. O'Connor 89 3 0 25 Jan 2022
Transformers in Medical Imaging: A Survey Fahad Shamshad Salman Khan Syed Waqas Zamir Muhammad Haris Khan Munawar Hayat Fahad Shahbaz Khan Huazhu Fu ViT LM&MA MedIm 197 712 0 24 Jan 2022
Improving Chest X-Ray Report Generation by Leveraging Warm Starting Aaron Nicolson Jason Dowling Bevan Koopman ViT LM&MA MedIm 101 97 0 24 Jan 2022
WIDAR -- Weighted Input Document Augmented ROUGE Raghav Jain Vaibhav Mavi Anubhav Jangra S. Saha 59 4 0 23 Jan 2022
End-to-end Generative Pretraining for Multimodal Video Captioning Paul Hongsuck Seo Arsha Nagrani Anurag Arnab Cordelia Schmid 76 170 0 20 Jan 2022
Instance-aware Prompt Learning for Language Understanding and Generation Feihu Jin Jinliang Lu Jiajun Zhang Chengqing Zong 57 33 0 18 Jan 2022
What Makes the Story Forward? Inferring Commonsense Explanations as Prompts for Future Event Generation Li Lin Yixin Cao Lifu Huang Shuang Li Xuming Hu Lijie Wen Jianmin Wang AI4TS 79 16 0 18 Jan 2022
Prior Knowledge Enhances Radiology Report Generation Song Wang Liyan Tang Mingquan Lin George Shih Ying Ding Yifan Peng MedIm 65 24 0 11 Jan 2022
Local Information Assisted Attention-free Decoder for Audio Captioning Feiyang Xiao Jian Guan Haiyan Lan Qiaoxi Zhu Wenwu Wang 98 11 0 10 Jan 2022
Glance and Focus Networks for Dynamic Visual Recognition Gao Huang Yulin Wang Kangchen Lv Haojun Jiang Wenhui Huang Pengfei Qi S. Song 3DH 150 50 0 09 Jan 2022
Compact Bidirectional Transformer for Image Captioning Yuanen Zhou Zhenzhen Hu Daqing Liu Huixia Ben Meng Wang VLM 67 17 0 06 Jan 2022
All You Need In Sign Language Production R. Rastgoo Kourosh Kiani Sergio Escalera V. Athitsos Mohammad Sabokrou 51 8 0 05 Jan 2022
StyleM: Stylized Metrics for Image Captioning Built with Contrastive N-grams Chengxi Li Brent Harrison 110 3 0 04 Jan 2022
Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment Shuxin Yang Xian Wu Shen Ge S.Kevin Zhou Li Xiao MedIm 70 97 0 30 Dec 2021
Knowledge Matters: Radiology Report Generation with General and Specific Knowledge Shuxin Yang Xian Wu Shen Ge S.Kevin Zhou Li Xiao MedIm 91 120 0 30 Dec 2021
Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation Philipp Harzig Moritz Einfalt Rainer Lienhart ViT 63 2 0 28 Dec 2021
Multimodal Image Synthesis and Editing: The Generative AI Era Fangneng Zhan Yingchen Yu Rongliang Wu Jiahui Zhang Shijian Lu Lingjie Liu Adam Kortylewski Christian Theobalt Eric Xing EGVM 198 51 0 27 Dec 2021
ScanQA: 3D Question Answering for Spatial Scene Understanding Daichi Azuma Taiki Miyanishi Shuhei Kurita M. Kawanabe 106 208 0 20 Dec 2021
NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics Ximing Lu Sean Welleck Peter West Liwei Jiang Jungo Kasai ... Lianhui Qin Youngjae Yu Rowan Zellers Noah A. Smith Yejin Choi 70 165 0 16 Dec 2021
Dense Video Captioning Using Unsupervised Semantic Information Valter Estevam Rayson Laroca Hélio Pedrini David Menotti 93 10 0 15 Dec 2021
KGR^4: Retrieval, Retrospect, Refine and Rethink for Commonsense Generation Xin Liu Dayiheng Liu Baosong Yang Haibo Zhang Junwei Ding Wenqing Yao Weihua Luo Haiying Zhang Jinsong Su LRM 61 8 0 15 Dec 2021
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising Jianjie Luo Yehao Li Yingwei Pan Ting Yao Hongyang Chao Tao Mei VLM 74 42 0 14 Dec 2021
MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning Wenqiao Zhang Haochen Shi Jiannan Guo Shengyu Zhang Qingpeng Cai Juncheng Li Sihui Luo Yueting Zhuang DiffM 100 46 0 13 Dec 2021
Contextualized Scene Imagination for Generative Commonsense Reasoning Peifeng Wang Jonathan Zamora Junfeng Liu Filip Ilievski Muhao Chen Xiang Ren ReLM LRM 101 16 0 12 Dec 2021
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation Tianyi Liu Zuxuan Wu Wenhan Xiong Jingjing Chen Yu-Gang Jiang VLM MLLM 88 10 0 10 Dec 2021
Injecting Semantic Concepts into End-to-End Image Captioning Zhiyuan Fang Jianfeng Wang Xiaowei Hu Lin Liang Zhe Gan Lijuan Wang Yezhou Yang Zicheng Liu ViT VLM 86 91 0 09 Dec 2021
Self-Supervised Image-to-Text and Text-to-Image Synthesis Anindya Sundar Das S. Saha SSL 28 5 0 09 Dec 2021
Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand Jungo Kasai Keisuke Sakaguchi Ronan Le Bras Lavinia Dunagan Jacob Morrison Alexander R. Fabbri Yejin Choi Noah A. Smith 101 40 0 08 Dec 2021
Search and Learn: Improving Semantic Coverage for Data-to-Text Generation Shailza Jolly Zi Xuan Zhang Andreas Dengel Lili Mou 75 11 0 06 Dec 2021
InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation Pierre Colombo Chloe Clave Pablo Piantanida 134 44 0 02 Dec 2021
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding Dave Zhenyu Chen Qirui Wu Matthias Nießner Angel X. Chang 81 32 0 02 Dec 2021
Object-Centric Unsupervised Image Captioning Zihang Meng David Yang Xuefei Cao Ashish Shah Ser-Nam Lim OCL VLM 80 12 0 02 Dec 2021
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter Bang-ju Yang Tong Zhang Yuexian Zou CLIP 70 20 0 30 Nov 2021
Neural Attention for Image Captioning: Review of Outstanding Methods Zanyar Zohourianshahzadi Jugal Kalita VLM 95 47 0 29 Nov 2021
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic Yoad Tewel Yoav Shalev Idan Schwartz Lior Wolf VLM 122 197 0 29 Nov 2021
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning Kevin Qinghong Lin Linjie Li Chung-Ching Lin Faisal Ahmed Zhe Gan Zicheng Liu Yumao Lu Lijuan Wang ViT 85 247 0 25 Nov 2021
Less is More: Generating Grounded Navigation Instructions from Landmarks Su Wang Ceslee Montgomery Jordi Orbay Vighnesh Birodkar Aleksandra Faust Izzeddin Gur Natasha Jaques Austin Waters Jason Baldridge Peter Anderson 135 64 0 25 Nov 2021
Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets Marcella Cornia Lorenzo Baraldi G. Fiameni Rita Cucchiara 109 12 0 24 Nov 2021
Hierarchical Modular Network for Video Captioning Hanhua Ye Guorong Li Yuankai Qi Shuhui Wang Qingming Huang Ming-Hsuan Yang 127 70 0 24 Nov 2021
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling Zhengyuan Yang Zhe Gan Jianfeng Wang Xiaowei Hu Faisal Ahmed Zicheng Liu Yumao Lu Lijuan Wang 146 117 0 23 Nov 2021