ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL, VLM

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,119 papers shown
Title
MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval
Youbo Lei
Feifei He
Chen Chen
Yingbin Mo
Sijia Li
Defeng Xie
H. Lu
VLM
108
0
0
30 Oct 2023
Generating Context-Aware Natural Answers for Questions in 3D Scenes
Mohammed Munzer Dwedari
Matthias Niessner
Dave Zhenyu Chen
70
3
0
30 Oct 2023
MOSEL: Inference Serving Using Dynamic Modality Selection
Bodun Hu
Le Xu
Jeongyoon Moon
N. Yadwadkar
Aditya Akella
62
4
0
27 Oct 2023
3D-Aware Visual Question Answering about Parts, Poses and Occlusions
Xingrui Wang
Wufei Ma
Zhuowan Li
Adam Kortylewski
Alan Yuille
CoGe
107
14
0
27 Oct 2023
ArchBERT: Bi-Modal Understanding of Neural Architectures and Natural Languages
Mohammad Akbari
Saeed Ranjbar Alvar
Behnam Kamranian
Amin Banitalebi-Dehkordi
Yong Zhang
AI4CE
46
0
0
26 Oct 2023
Learning Temporal Sentence Grounding From Narrated EgoVideos
Kevin Flanagan
Dima Damen
Michael Wray
69
3
0
26 Oct 2023
RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open Environments
Mengxue Qu
Yu-Huan Wu
Wu Liu
Xiaodan Liang
Jingkuan Song
Yao-Min Zhao
Yunchao Wei
45
17
0
26 Oct 2023
M2C: Towards Automatic Multimodal Manga Complement
Hongcheng Guo
Boyang Wang
Jiaqi Bai
Jiaheng Liu
Jian Yang
Zhoujun Li
94
10
0
26 Oct 2023
Apollo: Zero-shot MultiModal Reasoning with Multiple Experts
Daniela Ben-David
Tzuf Paz-Argaman
Reut Tsarfaty
MoE
73
0
0
25 Oct 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
84
10
0
25 Oct 2023
VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi-Modal GRaphs
Adnen Abdessaied
Lei Shi
Andreas Bulling
3DH
61
4
0
25 Oct 2023
Learning to Explain: A Model-Agnostic Framework for Explaining Black Box Models
Oren Barkan
Yuval Asher
Amit Eshel
Yehonatan Elisha
Noam Koenigstein
77
5
0
25 Oct 2023
Emergent Communication in Interactive Sketch Question Answering
Zixing Lei
Yiming Zhang
Yuxin Xiong
Siheng Chen
83
2
0
24 Oct 2023
Multimodal Representations for Teacher-Guided Compositional Visual Reasoning
Wafa Aissa
Marin Ferecatu
M. Crucianu
LRM
70
0
0
24 Oct 2023
Deep Integrated Explanations
Oren Barkan
Yehonatan Elisha
Jonathan Weill
Yuval Asher
Amit Eshel
Noam Koenigstein
FAtt
109
7
0
23 Oct 2023
Hallucination Detection for Grounded Instruction Generation
Lingjun Zhao
Khanh Nguyen
Hal Daumé
HILM
83
7
0
23 Oct 2023
The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models
Xinyi Chen
Raquel Fernández
Sandro Pezzelle
VLM
62
10
0
23 Oct 2023
M2DF: Multi-grained Multi-curriculum Denoising Framework for Multimodal Aspect-based Sentiment Analysis
Fei Zhao
Chunhui Li
Zhen Wu
Yawen Ouyang
Jianbing Zhang
Xinyu Dai
92
20
0
23 Oct 2023
ITEm: Unsupervised Image-Text Embedding Learning for eCommerce
Baohao Liao
Michael Kozielski
Sanjika Hewavitharana
Jiangbo Yuan
Shahram Khadivi
Tomer Lancewicki
SSL
32
0
0
22 Oct 2023
Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation
Anastasia Kritharoula
Maria Lymperaiou
Giorgos Stamou
96
6
0
21 Oct 2023
Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation
Siyu Zhang
Ye-Ting Chen
Fang Wang
Yaoru Sun
Jun Yang
Lizhi Bai
SSL
68
0
0
20 Oct 2023
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem
Yongqin Xian
Xiaohua Zhai
Lukas Hoyer
Luc Van Gool
F. Tombari
VLM
115
36
0
20 Oct 2023
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering
Yuduo Wang
Pedram Ghamisi
68
6
0
19 Oct 2023
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions
Hanbo Zhang
Jie Xu
Yuchen Mo
Tao Kong
69
1
0
18 Oct 2023
BanglaAbuseMeme: A Dataset for Bengali Abusive Meme Classification
Mithun Das
Animesh Mukherjee
73
7
0
18 Oct 2023
NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning
Haowei Wang
Jiayi Ji
Tianyu Guo
Yilong Yang
Yiyi Zhou
Xiaoshuai Sun
Rongrong Ji
98
5
0
17 Oct 2023
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
Yanyang Guo
Fangkai Jiao
Zhiqi Shen
Liqiang Nie
Mohan S. Kankanhalli
MLLM
87
7
0
17 Oct 2023
Towards Training-free Open-world Segmentation via Image Prompt Foundation Models
Lv Tang
Peng-Tao Jiang
Haoke Xiao
Bo Li
VLM
94
11
0
17 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS, SyDa
174
125
0
16 Oct 2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung
Youngjae Yu
VLM
70
2
0
15 Oct 2023
Progressive Evidence Refinement for Open-domain Multimodal Retrieval Question Answering
Shuwen Yang
Anran Wu
Xingjiao Wu
Luwei Xiao
Tianlong Ma
Cheng Jin
Liang He
69
4
0
15 Oct 2023
Penetrative AI: Making LLMs Comprehend the Physical World
Huatao Xu
Liying Han
Qirui Yang
Mo Li
Mani Srivastava
77
62
0
14 Oct 2023
JM3D & JM3D-LLM: Elevating 3D Understanding with Joint Multi-modal Cues
Jiayi Ji
Haowei Wang
Changli Wu
Yiwei Ma
Xiaoshuai Sun
Rongrong Ji
112
1
0
14 Oct 2023
Mapping Memes to Words for Multimodal Hateful Meme Classification
Giovanni Burbi
Alberto Baldrati
Lorenzo Agnolucci
Marco Bertini
A. Bimbo
63
19
0
12 Oct 2023
Open-Set Knowledge-Based Visual Question Answering with Inference Paths
Jingru Gan
Xinzhe Han
Shuhui Wang
Qingming Huang
81
0
0
12 Oct 2023
DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
Yueming Lyu
Kang Zhao
Bo Peng
H. Chen
Yue Jiang
Yingya Zhang
Jing Dong
Caifeng Shan
83
2
0
12 Oct 2023
IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training
Che Liu
Sibo Cheng
Miaojing Shi
Anand Shah
Wenjia Bai
Rossella Arcucci
94
27
0
11 Oct 2023
I2SRM: Intra- and Inter-Sample Relationship Modeling for Multimodal Information Extraction
Yusheng Huang
Zhouhan Lin
62
5
0
10 Oct 2023
Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling
Haogeng Liu
Qihang Fan
Tingkai Liu
Linjie Yang
Yunzhe Tao
Huaibo Huang
Ran He
Hongxia Yang
VGen
57
12
0
08 Oct 2023
Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift
Yihao Xue
Siddharth Joshi
Dang Nguyen
Baharan Mirzasoleiman
VLM
76
4
0
08 Oct 2023
SCANet: Scene Complexity Aware Network for Weakly-Supervised Video Moment Retrieval
Sunjae Yoon
Gwanhyeong Koo
Dahyun Kim
Changdong Yoo
93
12
0
08 Oct 2023
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
Nina Shvetsova
Anna Kukleva
Xudong Hong
Christian Rupprecht
Bernt Schiele
Hilde Kuehne
111
26
0
07 Oct 2023
Exploring the Potential of Multi-Modal AI for Driving Hazard Prediction
Korawat Charoenpitaks
Van-Quang Nguyen
Masanori Suganuma
Masahiro Takahashi
Ryoma Niihara
Takayuki Okatani
104
1
0
07 Oct 2023
VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
Ziyi Yin
Muchao Ye
Tianrong Zhang
Tianyu Du
Jinguo Zhu
Han Liu
Jinghui Chen
Ting Wang
Fenglong Ma
AAML, VLM, CoGe
89
44
0
07 Oct 2023
Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction
Yiren Jian
Tingkai Liu
Yunzhe Tao
Chunhui Zhang
Soroush Vosoughi
HX Yang
VLM
89
12
0
05 Oct 2023
Multimodal Question Answering for Unified Information Extraction
Yuxuan Sun
Kai Zhang
Yu-Chuan Su
67
8
0
04 Oct 2023
Proactive Human-Robot Interaction using Visuo-Lingual Transformers
Pranay Mathur
LM&Ro
28
1
0
04 Oct 2023
SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering
Bruno Souza
Marius Aasan
Hélio Pedrini
Adín Ramirez Rivera
SSL
86
2
0
03 Oct 2023
GRID: A Platform for General Robot Intelligence Development
Sai H. Vemprala
Shuhang Chen
Abhinav Shukla
Dinesh Narayanan
Ashish Kapoor
95
10
0
02 Oct 2023
RegBN: Batch Normalization of Multimodal Data with Regularization
Morteza Ghahremani
Christian Wachinger
99
7
0
01 Oct 2023