Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1908.02265
Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,119 papers shown
Title
MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval
Youbo Lei
Feifei He
Chen Chen
Yingbin Mo
Sijia Li
Defeng Xie
H. Lu
VLM
108
0
0
30 Oct 2023
Generating Context-Aware Natural Answers for Questions in 3D Scenes
Mohammed Munzer Dwedari
Matthias Niessner
Dave Zhenyu Chen
70
3
0
30 Oct 2023
MOSEL: Inference Serving Using Dynamic Modality Selection
Bodun Hu
Le Xu
Jeongyoon Moon
N. Yadwadkar
Aditya Akella
62
4
0
27 Oct 2023
3D-Aware Visual Question Answering about Parts, Poses and Occlusions
Xingrui Wang
Wufei Ma
Zhuowan Li
Adam Kortylewski
Alan Yuille
CoGe
107
14
0
27 Oct 2023
ArchBERT: Bi-Modal Understanding of Neural Architectures and Natural Languages
Mohammad Akbari
Saeed Ranjbar Alvar
Behnam Kamranian
Amin Banitalebi-Dehkordi
Yong Zhang
AI4CE
46
0
0
26 Oct 2023
Learning Temporal Sentence Grounding From Narrated EgoVideos
Kevin Flanagan
Dima Damen
Michael Wray
69
3
0
26 Oct 2023
RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open Environments
Mengxue Qu
Yu-Huan Wu
Wu Liu
Xiaodan Liang
Jingkuan Song
Yao-Min Zhao
Yunchao Wei
45
17
0
26 Oct 2023
M2C: Towards Automatic Multimodal Manga Complement
Hongcheng Guo
Boyang Wang
Jiaqi Bai
Jiaheng Liu
Jian Yang
Zhoujun Li
94
10
0
26 Oct 2023
Apollo: Zero-shot MultiModal Reasoning with Multiple Experts
Daniela Ben-David
Tzuf Paz-Argaman
Reut Tsarfaty
MoE
73
0
0
25 Oct 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
84
10
0
25 Oct 2023
V
D
\mathbb{VD}
VD
-
G
R
\mathbb{GR}
GR
: Boosting
V
\mathbb{V}
V
isual
D
\mathbb{D}
D
ialog with Cascaded Spatial-Temporal Multi-Modal
G
R
\mathbb{GR}
GR
aphs
Adnen Abdessaied
Lei Shi
Andreas Bulling
3DH
61
4
0
25 Oct 2023
Learning to Explain: A Model-Agnostic Framework for Explaining Black Box Models
Oren Barkan
Yuval Asher
Amit Eshel
Yehonatan Elisha
Noam Koenigstein
77
5
0
25 Oct 2023
Emergent Communication in Interactive Sketch Question Answering
Zixing Lei
Yiming Zhang
Yuxin Xiong
Siheng Chen
83
2
0
24 Oct 2023
Multimodal Representations for Teacher-Guided Compositional Visual Reasoning
Wafa Aissa
Marin Ferecatu
M. Crucianu
LRM
70
0
0
24 Oct 2023
Deep Integrated Explanations
Oren Barkan
Yehonatan Elisha
Jonathan Weill
Yuval Asher
Amit Eshel
Noam Koenigstein
FAtt
109
7
0
23 Oct 2023
Hallucination Detection for Grounded Instruction Generation
Lingjun Zhao
Khanh Nguyen
Hal Daumé
HILM
83
7
0
23 Oct 2023
The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models
Xinyi Chen
Raquel Fernández
Sandro Pezzelle
VLM
62
10
0
23 Oct 2023
M2DF: Multi-grained Multi-curriculum Denoising Framework for Multimodal Aspect-based Sentiment Analysis
Fei Zhao
Chunhui Li
Zhen Wu
Yawen Ouyang
Jianbing Zhang
Xinyu Dai
92
20
0
23 Oct 2023
ITEm: Unsupervised Image-Text Embedding Learning for eCommerce
Baohao Liao
Michael Kozielski
Sanjika Hewavitharana
Jiangbo Yuan
Shahram Khadivi
Tomer Lancewicki
SSL
32
0
0
22 Oct 2023
Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation
Anastasia Kritharoula
Maria Lymperaiou
Giorgos Stamou
96
6
0
21 Oct 2023
Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation
Siyu Zhang
Ye-Ting Chen
Fang Wang
Yaoru Sun
Jun Yang
Lizhi Bai
SSL
68
0
0
20 Oct 2023
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem
Yongqin Xian
Xiaohua Zhai
Lukas Hoyer
Luc Van Gool
F. Tombari
VLM
115
36
0
20 Oct 2023
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering
Yuduo Wang
Pedram Ghamisi
68
6
0
19 Oct 2023
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions
Hanbo Zhang
Jie Xu
Yuchen Mo
Tao Kong
69
1
0
18 Oct 2023
BanglaAbuseMeme: A Dataset for Bengali Abusive Meme Classification
Mithun Das
Animesh Mukherjee
73
7
0
18 Oct 2023
NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning
Haowei Wang
Jiayi Ji
Tianyu Guo
Yilong Yang
Yiyi Zhou
Xiaoshuai Sun
Rongrong Ji
98
5
0
17 Oct 2023
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
Yanyang Guo
Fangkai Jiao
Zhiqi Shen
Liqiang Nie
Mohan S. Kankanhalli
MLLM
87
7
0
17 Oct 2023
Towards Training-free Open-world Segmentation via Image Prompt Foundation Models
Lv Tang
Peng-Tao Jiang
Haoke Xiao
Bo Li
VLM
94
11
0
17 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
174
125
0
16 Oct 2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung
Youngjae Yu
VLM
70
2
0
15 Oct 2023
Progressive Evidence Refinement for Open-domain Multimodal Retrieval Question Answering
Shuwen Yang
Anran Wu
Xingjiao Wu
Luwei Xiao
Tianlong Ma
Cheng Jin
Liang He
69
4
0
15 Oct 2023
Penetrative AI: Making LLMs Comprehend the Physical World
Huatao Xu
Liying Han
Qirui Yang
Mo Li
Mani Srivastava
77
62
0
14 Oct 2023
JM3D & JM3D-LLM: Elevating 3D Understanding with Joint Multi-modal Cues
Jiayi Ji
Haowei Wang
Changli Wu
Yiwei Ma
Xiaoshuai Sun
Rongrong Ji
112
1
0
14 Oct 2023
Mapping Memes to Words for Multimodal Hateful Meme Classification
Giovanni Burbi
Alberto Baldrati
Lorenzo Agnolucci
Marco Bertini
A. Bimbo
63
19
0
12 Oct 2023
Open-Set Knowledge-Based Visual Question Answering with Inference Paths
Jingru Gan
Xinzhe Han
Shuhui Wang
Qingming Huang
81
0
0
12 Oct 2023
DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
Yueming Lyu
Kang Zhao
Bo Peng
H. Chen
Yue Jiang
Yingya Zhang
Jing Dong
Caifeng Shan
83
2
0
12 Oct 2023
IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training
Che Liu
Sibo Cheng
Miaojing Shi
Anand Shah
Wenjia Bai
Rossella Arcucci
94
27
0
11 Oct 2023
I2SRM: Intra- and Inter-Sample Relationship Modeling for Multimodal Information Extraction
Yusheng Huang
Zhouhan Lin
62
5
0
10 Oct 2023
Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling
Haogeng Liu
Qihang Fan
Tingkai Liu
Linjie Yang
Yunzhe Tao
Huaibo Huang
Ran He
Hongxia Yang
VGen
57
12
0
08 Oct 2023
Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift
Yihao Xue
Siddharth Joshi
Dang Nguyen
Baharan Mirzasoleiman
VLM
76
4
0
08 Oct 2023
SCANet: Scene Complexity Aware Network for Weakly-Supervised Video Moment Retrieval
Sunjae Yoon
Gwanhyeong Koo
Dahyun Kim
Changdong Yoo
93
12
0
08 Oct 2023
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
Nina Shvetsova
Anna Kukleva
Xudong Hong
Christian Rupprecht
Bernt Schiele
Hilde Kuehne
111
26
0
07 Oct 2023
Exploring the Potential of Multi-Modal AI for Driving Hazard Prediction
Korawat Charoenpitaks
Van-Quang Nguyen
Masanori Suganuma
Masahiro Takahashi
Ryoma Niihara
Takayuki Okatani
104
1
0
07 Oct 2023
VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
Ziyi Yin
Muchao Ye
Tianrong Zhang
Tianyu Du
Jinguo Zhu
Han Liu
Jinghui Chen
Ting Wang
Fenglong Ma
AAML
VLM
CoGe
89
44
0
07 Oct 2023
Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction
Yiren Jian
Tingkai Liu
Yunzhe Tao
Chunhui Zhang
Soroush Vosoughi
HX Yang
VLM
89
12
0
05 Oct 2023
Multimodal Question Answering for Unified Information Extraction
Yuxuan Sun
Kai Zhang
Yu-Chuan Su
67
8
0
04 Oct 2023
Proactive Human-Robot Interaction using Visuo-Lingual Transformers
Pranay Mathur
LM&Ro
28
1
0
04 Oct 2023
SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering
Bruno Souza
Marius Aasan
Hélio Pedrini
Adín Ramirez Rivera
SSL
86
2
0
03 Oct 2023
GRID: A Platform for General Robot Intelligence Development
Sai H. Vemprala
Shuhang Chen
Abhinav Shukla
Dinesh Narayanan
Ashish Kapoor
95
10
0
02 Oct 2023
RegBN: Batch Normalization of Multimodal Data with Regularization
Morteza Ghahremani
Christian Wachinger
99
7
0
01 Oct 2023
Previous
1
2
3
...
10
11
12
...
41
42
43
Next