Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1812.08658
Cited By
v1
v2
v3 (latest)
nocaps: novel object captioning at scale
20 December 2018
Harsh Agrawal
Karan Desai
Yufei Wang
Xinlei Chen
Rishabh Jain
Mark Johnson
Dhruv Batra
Devi Parikh
Stefan Lee
Peter Anderson
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"nocaps: novel object captioning at scale"
50 / 57 papers shown
Title
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
317
9
0
17 Apr 2025
ORAL: Prompting Your Large-Scale LoRAs via Conditional Recurrent Diffusion
Rana Muhammad Shahroz Khan
Dongwen Tang
Pingzhi Li
Kai Wang
Tianlong Chen
AI4CE
520
1
0
31 Mar 2025
ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large language Models
Hao Yin
Guangzong Si
Zilei Wang
406
1
0
17 Mar 2025
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar
Gursimran Singh
Mohammad Akbari
Yong Zhang
VLM
195
3
0
04 Mar 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
148
9
0
21 Feb 2025
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
Henry Hengyuan Zhao
Wenqi Pei
Yifei Tao
Haiyang Mei
Mike Zheng Shou
125
0
0
20 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
333
7
0
12 Feb 2025
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
Marco Mistretta
Alberto Baldrati
Lorenzo Agnolucci
Marco Bertini
Andrew D. Bagdanov
CLIP
VLM
163
5
0
06 Feb 2025
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures
Shivalika Singh
Nakul Sharma
Manish Gupta
Anand Mishra
138
1
0
28 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Jiaqi Wang
Kaipeng Zhang
Dahua Lin
Yu Qiao
Peng Gao
Xiangyu Yue
MLLM
244
134
0
10 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
332
59
0
03 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
137
0
0
03 Jan 2025
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan
Hanqin Liu
Yao Huang
Xiaoqi Wang
Caixin Kang
Hang Su
Yinpeng Dong
Xingxing Wei
VGen
199
0
0
04 Dec 2024
Progress-Aware Video Frame Captioning
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
217
1
0
03 Dec 2024
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
D. Song
Sicheng Lai
Shunian Chen
Lichao Sun
Benyou Wang
453
1
0
06 Nov 2024
MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems
Zifeng Zhu
Mengzhao Jia
Zizhuo Zhang
Lang Li
Meng Jiang
LRM
125
5
0
18 Oct 2024
Can We Predict Performance of Large Models across Vision-Language Tasks?
Qinyu Zhao
Ming Xu
Kartik Gupta
Akshay Asthana
Liang Zheng
Stephen Gould
102
0
0
14 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
185
37
0
04 Oct 2024
Can Large Language Models Understand Symbolic Graphics Programs?
Zeju Qiu
Weiyang Liu
Haiwen Feng
Zhen Liu
Tim Z. Xiao
Katherine M. Collins
J. Tenenbaum
Adrian Weller
Michael J. Black
Bernhard Schölkopf
121
14
0
15 Aug 2024
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
Shengkang Wang
Hongzhan Lin
Ziyang Luo
Zhen Ye
Guang Chen
Jing Ma
144
4
0
17 Jun 2024
A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
Zicheng Zhang
H. Wu
Chunyi Li
Yingjie Zhou
Wei Sun
Xiongkuo Min
Zijian Chen
Xiaohong Liu
Weisi Lin
Guangtao Zhai
EGVM
125
18
0
05 Jun 2024
GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse
Hongzhan Lin
Ziyang Luo
Bo Wang
Ruichao Yang
Jing Ma
108
31
0
03 Jan 2024
EvalAI: Towards Better Evaluation Systems for AI Agents
Deshraj Yadav
Rishabh Jain
Harsh Agrawal
Prithvijit Chattopadhyay
Taranjeet Singh
Akash Jain
Shivkaran Singh
Stefan Lee
Dhruv Batra
ELM
67
56
0
10 Feb 2019
PFDet: 2nd Place Solution to Open Images Challenge 2018 Object Detection Track
Takuya Akiba
Tommi Kerola
Yusuke Niitani
Toru Ogawa
Shotaro Sano
Shuji Suzuki
61
20
0
04 Sep 2018
Partially-Supervised Image Captioning
Peter Anderson
Stephen Gould
Mark Johnson
75
32
0
15 Jun 2018
Do CIFAR-10 Classifiers Generalize to CIFAR-10?
Benjamin Recht
Rebecca Roelofs
Ludwig Schmidt
Vaishaal Shankar
OOD
FedML
ELM
184
414
0
01 Jun 2018
Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling
Luheng He
Kenton Lee
Omer Levy
Luke Zettlemoyer
134
189
0
12 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,201
0
20 Apr 2018
Decoupled Novel Object Captioner
Yuehua Wu
Linchao Zhu
Lu Jiang
Yi Yang
72
63
0
11 Apr 2018
Neural Baby Talk
Jiasen Lu
Jianwei Yang
Dhruv Batra
Devi Parikh
VLM
237
436
0
27 Mar 2018
Deep contextualized word representations
Matthew E. Peters
Mark Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
NAI
235
11,569
0
15 Feb 2018
Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects
Ting Yao
Yingwei Pan
Yehao Li
Tao Mei
VLM
76
147
0
17 Aug 2017
Extreme clicking for efficient object annotation
Dim P. Papadopoulos
J. Uijlings
Frank Keller
V. Ferrari
71
246
0
09 Aug 2017
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
131
4,224
0
25 Jul 2017
Domain Adaptation for Visual Applications: A Comprehensive Survey
G. Csurka
OOD
90
507
0
17 Feb 2017
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
Jiasen Lu
Caiming Xiong
Devi Parikh
R. Socher
134
1,456
0
06 Dec 2016
Guided Open Vocabulary Image Captioning with Constrained Beam Search
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
84
237
0
02 Dec 2016
Improved Image Captioning via Policy Gradient optimization of SPIDEr
Siqi Liu
Zhenhai Zhu
Ning Ye
S. Guadarrama
Kevin Patrick Murphy
168
446
0
01 Dec 2016
Speed/accuracy trade-offs for modern convolutional object detectors
Jonathan Huang
V. Rathod
Chen Sun
Menglong Zhu
Anoop Korattikara Balan
...
Ian S. Fischer
Z. Wojna
Yang Song
S. Guadarrama
Kevin Patrick Murphy
3DH
3DV
108
2,573
0
30 Nov 2016
Training and Evaluating Multimodal Word Embeddings with Large-scale Web Annotated Images
Junhua Mao
Jiajing Xu
Yushi Jing
Alan Yuille
46
48
0
24 Nov 2016
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Zhuowen Tu
Kaiming He
527
10,358
0
16 Nov 2016
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
108
1,921
0
29 Jul 2016
Captioning Images with Diverse Objects
Subhashini Venugopalan
Lisa Anne Hendricks
Marcus Rohrbach
Raymond J. Mooney
Trevor Darrell
Kate Saenko
VLM
76
178
0
24 Jun 2016
Rich Image Captioning in the Wild
Kenneth Tran
Xiaodong He
Lei Zhang
Jian Sun
Cornelia Carapcea
Chris Thrasher
Chris Buehler
Chris Sienkiewicz
VLM
60
124
0
30 Mar 2016
We don't need no bounding-boxes: Training object class detectors using only human verification
Dim P. Papadopoulos
J. Uijlings
Frank Keller
V. Ferrari
VLM
ObjD
68
138
0
26 Feb 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
...
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
237
5,766
0
23 Feb 2016
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data
Lisa Anne Hendricks
Subhashini Venugopalan
Marcus Rohrbach
Raymond J. Mooney
Kate Saenko
Trevor Darrell
CoGe
78
284
0
17 Nov 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
543
62,477
0
04 Jun 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
242
2,497
0
01 Apr 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
352
10,091
0
10 Feb 2015
1
2
Next