ResearchTrend.AI
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
    SSL, VLM

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,119 papers shown
Factorized Contrastive Learning: Going Beyond Multi-view Redundancy
Paul Pu Liang
Zihao Deng
Martin Q. Ma
James Zou
Louis-Philippe Morency
Ruslan Salakhutdinov
SSL
98
56
0
08 Jun 2023
Dealing with Semantic Underspecification in Multimodal NLP
Sandro Pezzelle
73
10
0
08 Jun 2023
Object Detection with Transformers: A Review
Tahira Shehzadi
K. Hashmi
D. Stricker
Muhammad Zeshan Afzal
ViT, MU
109
29
0
07 Jun 2023
On the Generalization of Multi-modal Contrastive Learning
Qi Zhang
Yifei Wang
Yisen Wang
79
26
0
07 Jun 2023
Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!
Zaid Khan
B. Vijaykumar
S. Schulter
Xiang Yu
Y. Fu
Manmohan Chandraker
VLM, MLLM
98
18
0
06 Jun 2023
MolFM: A Multimodal Molecular Foundation Model
Yi Luo
Kai Yang
Massimo Hong
Xingyi Liu
Zaiqing Nie
78
40
0
06 Jun 2023
Referring Expression Comprehension Using Language Adaptive Inference
Wei Su
Peihan Miao
Huanzhang Dou
Yongjian Fu
Xi Li
ObjD
65
20
0
06 Jun 2023
Diversifying Joint Vision-Language Tokenization Learning
Vardaan Pahuja
A. Piergiovanni
A. Angelova
80
0
0
06 Jun 2023
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
Jianghui Wang
Yuxuan Wang
Dongyan Zhao
Zilong Zheng
104
1
0
04 Jun 2023
Table and Image Generation for Investigating Knowledge of Entities in Pre-trained Vision and Language Models
Hidetaka Kamigaito
Katsuhiko Hayashi
Taro Watanabe
VLM
65
1
0
03 Jun 2023
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models
Shuo Chen
Jindong Gu
Zhen Han
Yunpu Ma
Philip Torr
Volker Tresp
VPVLM, VLM
127
21
0
03 Jun 2023
"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning
Abisek Rajakumar Kalarani
P. Bhattacharyya
Niyati Chhaya
Sumit Shekhar
CoGe, VLM
124
9
0
01 Jun 2023
UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning
Xiao Dong
Runhu Huang
Xiaoyong Wei
Zequn Jie
Jianxing Yu
Jian Yin
Xiaodan Liang
VLM, DiffM
77
1
0
01 Jun 2023
Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning
Baohao Liao
Shaomu Tan
Christof Monz
KELM
105
30
0
01 Jun 2023
Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting
Shubin Huang
Qiong Wu
Yiyi Zhou
Weijie Chen
Rongsheng Zhang
Xiaoshuai Sun
Rongrong Ji
VLM, VPVLM, LRM
59
0
0
01 Jun 2023
PV2TEA: Patching Visual Modality to Textual-Established Information Extraction
Hejie Cui
Rongmei Lin
Nasser Zalmout
Chenwei Zhang
Jingbo Shang
Carl Yang
Xian Li
VLM
87
4
0
01 Jun 2023
Prompt Algebra for Task Composition
Pramuditha Perera
Matthew Trager
Luca Zancato
Alessandro Achille
Stefano Soatto
VLM
77
8
0
01 Jun 2023
GPT4Image: Large Pre-trained Models Help Vision Models Learn Better on Perception Task
Ning Ding
Yehui Tang
Zhongqian Fu
Chaoting Xu
Kai Han
Yunhe Wang
MLLM, VLM
57
2
0
01 Jun 2023
ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Xiao Xu
Bei Li
Chenfei Wu
Shao-Yen Tseng
Anahita Bhiwandiwalla
Shachar Rosenman
Vasudev Lal
Wanxiang Che
Nan Duan
AIFin, VLM
78
4
0
31 May 2023
Chatting Makes Perfect: Chat-based Image Retrieval
Matan Levy
Rami Ben-Ari
N. Darshan
Dani Lischinski
138
16
0
31 May 2023
Joint Adaptive Representations for Image-Language Learning
A. Piergiovanni
A. Angelova
VLM
76
0
0
31 May 2023
Attention-Based Methods For Audio Question Answering
Parthasaarathy Sudarsanam
Tuomas Virtanen
73
3
0
31 May 2023
Language-Conditioned Imitation Learning with Base Skill Priors under Unstructured Data
Hongkuan Zhou
Zhenshan Bing
Xiangtong Yao
Xiaojie Su
Chenguang Yang
Kai-Qi Huang
Alois C. Knoll
LM&Ro
92
20
0
30 May 2023
Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge
Xingyu Fu
Shenmin Zhang
Gukyeong Kwon
Pramuditha Perera
Henghui Zhu
...
Zhiguo Wang
Vittorio Castelli
Patrick Ng
Dan Roth
Bing Xiang
87
22
0
30 May 2023
Scalable Performance Analysis for Vision-Language Models
Santiago Castro
Oana Ignat
Rada Mihalcea
VLM
73
2
0
30 May 2023
Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs
Mingyang Zhou
Yi R. Fung
Long Chen
Christopher Thomas
Heng Ji
Shih-Fu Chang
110
13
0
29 May 2023
HGT: A Hierarchical GCN-Based Transformer for Multimodal Periprosthetic Joint Infection Diagnosis Using CT Images and Text
Ruiyang Li
Fujun Yang
Xianjie Liu
Hon-Yi Shi
75
0
0
29 May 2023
Deeply Coupled Cross-Modal Prompt Learning
Xuejing Liu
Wei Tang
Jinghui Lu
Rui Zhao
Zhaojun Guo
Fei Tan
VLM
77
17
0
29 May 2023
MemeGraphs: Linking Memes to Knowledge Graphs
Vasiliki Kougia
Simon Fetzel
Thomas Kirchmair
Erion Çano
Sina Moayed Baharlou
Sahand Sharifzadeh
Benjamin Roth
89
11
0
28 May 2023
KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models
Zhiwei Jia
P. Narayana
Arjun Reddy Akula
G. Pruthi
Haoran Su
Sugato Basu
Varun Jampani
VLM, OffRL
86
4
0
28 May 2023
MPCHAT: Towards Multimodal Persona-Grounded Conversation
Jaewoo Ahn
Yeda Song
Sangdoo Yun
Gunhee Kim
53
22
0
27 May 2023
Modularized Zero-shot VQA with Pre-trained Models
Rui Cao
Jing Jiang
LRM
93
3
0
27 May 2023
BIG-C: a Multimodal Multi-Purpose Dataset for Bemba
Claytone Sikasote
Eunice Mukonde
Md Mahfuz Ibn Alam
Antonios Anastasopoulos
63
8
0
26 May 2023
Calibration of Transformer-based Models for Identifying Stress and Depression in Social Media
Loukas Ilias
S. Mouzakitis
D. Askounis
83
46
0
26 May 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
Chia-Wen Kuo
Z. Kira
87
23
0
25 May 2023
Training Data Extraction From Pre-trained Language Models: A Survey
Shotaro Ishihara
122
48
0
25 May 2023
MEMEX: Detecting Explanatory Evidence for Memes via Knowledge-Enriched Contextualization
Shivam Sharma
S Ramaneswaran
Udit Arora
Md. Shad Akhtar
Tanmoy Chakraborty
77
9
0
25 May 2023
READ: Recurrent Adaptation of Large Transformers
Sida I. Wang
John Nguyen
Ke Li
Carole-Jean Wu
55
11
0
24 May 2023
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models
Zekun Wang
Jingchang Chen
Wangchunshu Zhou
Haichao Zhu
Jiafeng Liang
Liping Shan
Ming Liu
Dongliang Xu
Qing Yang
Bing Qin
VLM
102
5
0
24 May 2023
MMNet: Multi-Mask Network for Referring Image Segmentation
Yimin Yan
Xingjian He
Wenxuan Wan
Qingbin Liu
EgoV
62
2
0
24 May 2023
Meta-learning For Vision-and-language Cross-lingual Transfer
Hanxu Hu
Frank Keller
VLM
85
2
0
24 May 2023
PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts
Yunshui Li
Binyuan Hui
Zhichao Yin
Min Yang
Fei Huang
Yongbin Li
MoE
91
21
0
24 May 2023
NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario
Tianwen Qian
Jingjing Chen
Linhai Zhuo
Yang Jiao
Yueping Jiang
102
158
0
24 May 2023
Exploring Diverse In-Context Configurations for Image Captioning
Xu Yang
Yongliang Wu
Mingzhuo Yang
Haokun Chen
Xin Geng
MLLM
87
64
0
24 May 2023
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
Ahmed Masry
P. Kavehzadeh
Do Xuan Long
Enamul Hoque
Shafiq Joty
LRM
95
113
0
24 May 2023
RE$^2$: Region-Aware Relation Extraction from Visually Rich Documents
Pritika Ramu
Sijia Wang
Lalla Mouatadid
Joy Rimchala
Lifu Huang
56
0
0
24 May 2023
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
Emanuele Bugliarello
Aida Nematzadeh
Lisa Anne Hendricks
SSL
113
5
0
23 May 2023
Training Transitive and Commutative Multimodal Transformers with LoReTTa
Manuel Tran
Yashin Dicente Cid
Amal Lahiani
Fabian J. Theis
Tingying Peng
Eldad Klaiman
77
2
0
23 May 2023
DetGPT: Detect What You Need via Reasoning
Renjie Pi
Jiahui Gao
Shizhe Diao
Boyao Wang
Hanze Dong
...
Lewei Yao
Jianhua Han
Hang Xu
Lingpeng Kong
Tong Zhang
LRM, LM&Ro
86
99
0
23 May 2023
EDIS: Entity-Driven Image Search over Multimodal Web Content
Siqi Liu
Weixi Feng
Tsu-Jui Fu
Wenhu Chen
Wenjie Wang
VLM
119
10
0
23 May 2023