1908.02265
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,119 papers shown
Title
Towards Unifying Reference Expression Generation and Comprehension
Duo Zheng
Tao Kong
Ya Jing
Jiaan Wang
Xiaojie Wang
ObjD
64
6
0
24 Oct 2022
Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun Property Prediction
Yue Yang
Artemis Panagopoulou
Marianna Apidianaki
Mark Yatskar
Chris Callison-Burch
113
2
0
24 Oct 2022
When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks
Ankur Sikarwar
Arkil Patel
Navin Goyal
ViT
98
11
0
23 Oct 2022
McQueen: a Benchmark for Multimodal Conversational Query Rewrite
Yifei Yuan
Chen Shi
Runze Wang
Liyi Chen
Feijun Jiang
Yuan You
W. Lam
41
6
0
23 Oct 2022
Z-LaVI: Zero-Shot Language Solver Fueled by Visual Imagination
Yue Yang
Wenlin Yao
Hongming Zhang
Xiaoyang Wang
Dong Yu
Jianshu Chen
VLM
99
22
0
21 Oct 2022
SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation
Zekun Li
Jina Kim
Yao-Yi Chiang
Muhao Chen
133
32
0
21 Oct 2022
Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies?
Mitja Nikolaus
Emmanuelle Salin
Stéphane Ayache
Abdellah Fourtassi
Benoit Favre
88
14
0
21 Oct 2022
Dissecting Deep Metric Learning Losses for Image-Text Retrieval
Hong Xuan
Xi Chen
73
2
0
21 Oct 2022
Communication breakdown: On the low mutual intelligibility between human and neural captioning
Roberto Dessì
Eleonora Gualdoni
Francesca Franzon
Gemma Boleda
Marco Baroni
VLM
120
6
0
20 Oct 2022
Image-Text Retrieval with Binary and Continuous Label Supervision
Zheng Li
Caili Guo
Zerun Feng
Lei Li
Ying Jin
Yufeng Zhang
VLM
73
4
0
20 Oct 2022
Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation
Yu Zhao
Jianguo Wei
Zhichao Lin
Yueheng Sun
Meishan Zhang
Hao Fei
79
16
0
20 Oct 2022
Prompting through Prototype: A Prototype-based Prompt Learning on Pretrained Vision-Language Models
Yue Zhang
Hongliang Fei
Dingcheng Li
Tan Yu
Ping Li
VPVLM
VLM
71
9
0
19 Oct 2022
Grounded Video Situation Recognition
Zeeshan Khan
C. V. Jawahar
Makarand Tapaswi
102
14
0
19 Oct 2022
VTC: Improving Video-Text Retrieval with User Comments
Laura Hanu
James Thewlis
Yuki M. Asano
Christian Rupprecht
VGen
123
8
0
19 Oct 2022
TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation
Pengfei Li
Beiwen Tian
Yongliang Shi
Xiaoxue Chen
Hao Zhao
Guyue Zhou
Ya Zhang
125
22
0
19 Oct 2022
LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation
Hongcheng Guo
Jiaheng Liu
Haoyang Huang
Jian Yang
Zhoujun Li
Dongdong Zhang
Zheng Cui
Furu Wei
93
22
0
19 Oct 2022
Dense but Efficient VideoQA for Intricate Compositional Reasoning
Jihyeon Janel Lee
Wooyoung Kang
Eun-Sol Kim
CoGe
59
4
0
19 Oct 2022
Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering
Jialin Wu
Raymond J. Mooney
RALM
140
11
0
18 Oct 2022
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text
Zifeng Wang
Zhenbang Wu
Dinesh Agarwal
Jimeng Sun
CLIP
VLM
MedIm
138
436
0
18 Oct 2022
Probing Cross-modal Semantics Alignment Capability from the Textual Perspective
Zheng Ma
Shi Zong
Mianzhi Pan
Jianbing Zhang
Shujian Huang
Xinyu Dai
Jiajun Chen
61
4
0
18 Oct 2022
Contrastive Language-Image Pre-Training with Knowledge Graphs
Xuran Pan
Tianzhu Ye
Dongchen Han
S. Song
Gao Huang
VLM
CLIP
83
54
0
17 Oct 2022
Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training
A. M. H. Tiong
Junnan Li
Boyang Albert Li
Silvio Savarese
Guosheng Lin
MLLM
133
109
0
17 Oct 2022
COFAR: Commonsense and Factual Reasoning in Image Search
Prajwal Gatti
A. S. Penamakuri
Revant Teotia
Anand Mishra
Shubhashis Sengupta
Roshni Ramnani
ReLM
LRM
39
4
0
16 Oct 2022
EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning
Tiannan Wang
Wangchunshu Zhou
Yan Zeng
Xinsong Zhang
VLM
82
44
0
14 Oct 2022
Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training
Wenliang Dai
Zihan Liu
Ziwei Ji
Jane Polak Scowcroft
Pascale Fung
MLLM
VLM
101
67
0
14 Oct 2022
MMTSA: Multimodal Temporal Segment Attention Network for Efficient Human Activity Recognition
Ziqi Gao
Yuntao Wang
Jianguo Chen
Junliang Xing
Shwetak N. Patel
Xin Liu
Yuanchun Shi
70
3
0
14 Oct 2022
DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
Mojtaba Valipour
Mehdi Rezagholizadeh
I. Kobyzev
A. Ghodsi
170
185
0
14 Oct 2022
Can Language Representation Models Think in Bets?
Zhi-Bin Tang
Mayank Kejriwal
55
6
0
14 Oct 2022
SQA3D: Situated Question Answering in 3D Scenes
Xiaojian Ma
Silong Yong
Zilong Zheng
Qing Li
Yitao Liang
Song-Chun Zhu
Siyuan Huang
LM&Ro
97
160
0
14 Oct 2022
MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting
Oscar Manas
Pau Rodríguez López
Saba Ahmadi
Aida Nematzadeh
Yash Goyal
Aishwarya Agrawal
VLM
VPVLM
65
51
0
13 Oct 2022
OpenCQA: Open-ended Question Answering with Charts
Shankar Kantharaj
Do Xuan Long
Rixie Tiffany Ko Leong
J. Tan
Enamul Hoque
Shafiq Joty
85
53
0
12 Oct 2022
One does not fit all! On the Complementarity of Vision Encoders for Vision and Language Tasks
Gregor Geigle
Chen Cecilia Liu
Jonas Pfeiffer
Iryna Gurevych
VLM
72
1
0
12 Oct 2022
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding
Qiming Peng
Yinxu Pan
Wenjin Wang
Bin Luo
Zhenyu Zhang
...
Shi Feng
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
83
83
0
12 Oct 2022
Hate-CLIPper: Multimodal Hateful Meme Classification based on Cross-modal Interaction of CLIP Features
Gokul Karthik Kumar
Karthik Nandakumar
VLM
CLIP
97
66
0
12 Oct 2022
Understanding Embodied Reference with Touch-Line Transformer
Yongqian Li
Xiaoxue Chen
Hao Zhao
Jiangtao Gong
Guyue Zhou
Federico Rossano
Yixin Zhu
177
17
0
11 Oct 2022
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model
Yatai Ji
Junjie Wang
Yuan Gong
Lin Zhang
Yan Zhu
Hongfa Wang
Jiaxing Zhang
Tetsuya Sakai
Yujiu Yang
MLLM
82
33
0
11 Oct 2022
Generating Executable Action Plans with Environmentally-Aware Language Models
Maitrey Gramopadhye
D. Szafir
LM&Ro
LLMAG
104
24
0
10 Oct 2022
Transformer-based Localization from Embodied Dialog with Large-scale Pre-training
Meera Hahn
James M. Rehg
LM&Ro
104
4
0
10 Oct 2022
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
Shraman Pramanick
Li Jing
Sayan Nag
Jiachen Zhu
Hardik Shah
Yann LeCun
Ramalingam Chellappa
90
22
0
09 Oct 2022
Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling
Hsin-Ying Lee
Hung-Ting Su
Bing-Chen Tsai
Tsung-Han Wu
Jia-Fong Yeh
Winston H. Hsu
105
2
0
08 Oct 2022
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation
Wanrong Zhu
An Yan
Yujie Lu
Wenda Xu
Xinze Wang
Miguel P. Eckstein
William Yang Wang
137
36
0
07 Oct 2022
VLSNR: Vision-Linguistics Coordination Time Sequence-aware News Recommendation
Songhao Han
Wei-Ping Huang
Xiaotian Luan
AI4TS
81
3
0
06 Oct 2022
MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text
Wenhu Chen
Hexiang Hu
Xi Chen
Pat Verga
William W. Cohen
RALM
102
160
0
06 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
119
19
0
05 Oct 2022
Affection: Learning Affective Explanations for Real-World Visual Data
Panos Achlioptas
M. Ovsjanikov
Leonidas Guibas
Sergey Tulyakov
109
12
0
04 Oct 2022
Understanding Prior Bias and Choice Paralysis in Transformer-based Language Representation Models through Four Experimental Probes
Ke Shen
Mayank Kejriwal
89
4
0
03 Oct 2022
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach
Georgios Tziafas
Hamidreza Kasaei
LM&Ro
101
3
0
03 Oct 2022
Multimodal Analogical Reasoning over Knowledge Graphs
Ningyu Zhang
Lei Li
Xiang Chen
Xiaozhuan Liang
Shumin Deng
Huajun Chen
149
28
0
01 Oct 2022
Construction and Evaluation of a Self-Attention Model for Semantic Understanding of Sentence-Final Particles
Shuhei Mandokoro
N. Oka
Akane Matsushima
Chie Fukada
Yuko Yoshimura
Koji Kawahara
Kazuaki Tanaka
54
1
0
01 Oct 2022
A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering
Xiaofei Huang
Hongfang Gong
MedIm
111
14
0
01 Oct 2022