ResearchTrend.AI
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL, VLM
arXiv: 1908.02265

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,119 papers shown
PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
90
18
0
28 Sep 2023
Social Media Fashion Knowledge Extraction as Captioning
Yifei Yuan
Wenxuan Zhang
Yang Deng
Wai Lam
54
1
0
28 Sep 2023
ELIP: Efficient Language-Image Pre-training with Fewer Vision Tokens
Yangyang Guo
Haoyu Zhang
Yongkang Wong
Liqiang Nie
Mohan Kankanhalli
VLM
71
4
0
28 Sep 2023
PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation
Shizhe Chen
Ricardo Garcia Pinel
Cordelia Schmid
Ivan Laptev
LM&Ro, 3DPC
101
39
0
27 Sep 2023
ADGym: Design Choices for Deep Anomaly Detection
Minqi Jiang
Chaochuan Hou
Ao Zheng
Songqiao Han
Hailiang Huang
Qingsong Wen
Xiyang Hu
Yue Zhao
99
16
0
27 Sep 2023
DECO: Dense Estimation of 3D Human-Scene Contact In The Wild
Shashank Tripathi
Agniv Chatterjee
Jean-Claude Passy
Hongwei Yi
Dimitrios Tzionas
Michael J. Black
3DH
85
23
0
26 Sep 2023
SeMAnD: Self-Supervised Anomaly Detection in Multimodal Geospatial Datasets
Daria Reshetova
Swetava Ganguli
C. V. K. Iyer
Vipul Pandey
61
3
0
26 Sep 2023
Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features
Hila Levi
Guy Heller
Dan Levi
Ethan Fetaya
OCL, VLM
74
4
0
26 Sep 2023
Tile Classification Based Viewport Prediction with Multi-modal Fusion Transformer
Zhihao Zhang
Yiwei Chen
Weizhan Zhang
Caixia Yan
Qinghua Zheng
Qi Wang
Wang Chen
43
6
0
26 Sep 2023
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention
Z. Yao
Xiaoxia Wu
Conglong Li
Minjia Zhang
Heyang Qi
Olatunji Ruwase
A. A. Awan
Samyam Rajbhandari
Yuxiong He
102
11
0
25 Sep 2023
VidChapters-7M: Video Chapters at Scale
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
102
28
0
25 Sep 2023
Survey of Social Bias in Vision-Language Models
Nayeon Lee
Yejin Bang
Holy Lovenia
Samuel Cahyawijaya
Wenliang Dai
Pascale Fung
VLM
132
19
0
24 Sep 2023
GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
Xin Li
Dongze Lian
Zhihe Lu
Jiawang Bai
Zhibo Chen
Xinchao Wang
VLM
113
66
0
24 Sep 2023
Semi-Supervised Domain Generalization for Object Detection via Language-Guided Feature Alignment
Sina Malakouti
Adriana Kovashka
ObjD
71
2
0
24 Sep 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai-Nguyen Nguyen
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
133
7
0
23 Sep 2023
Multi-modal Domain Adaptation for REG via Relation Transfer
Yifan Ding
Liqiang Wang
Boqing Gong
68
0
0
23 Sep 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
120
199
0
20 Sep 2023
StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding
Renqiu Xia
Bo Zhang
Hao Peng
Hancheng Ye
Xiangchao Yan
Peng Ye
Botian Shi
Yu Qiao
Junchi Yan
116
0
0
20 Sep 2023
Predicate Classification Using Optimal Transport Loss in Scene Graph Generation
Sorachi Kurita
Satoshi Oyama
Itsuki Noda
OT
71
0
0
19 Sep 2023
Collaborative Three-Stream Transformers for Video Captioning
Hao Wang
Libo Zhang
Hengrui Fan
Tiejian Luo
73
7
0
18 Sep 2023
Decompose Semantic Shifts for Composed Image Retrieval
Xingyu Yang
Daqing Liu
Heng Zhang
Yong Luo
Chaoyue Wang
Jing Zhang
63
2
0
18 Sep 2023
MAPLE: Mobile App Prediction Leveraging Large Language Model Embeddings
Yonchanok Khaokaew
Hao Xue
Flora D. Salim
VLM, AI4TS
42
1
0
15 Sep 2023
Dynamic Visual Semantic Sub-Embeddings and Fast Re-Ranking
Wenzhang Wei
Zhipeng Gui
Changguang Wu
Anqi Zhao
D. Peng
Huayi Wu
85
0
0
15 Sep 2023
Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks
Danae Sánchez Villegas
Daniel Preoctiuc-Pietro
Nikolaos Aletras
66
3
0
14 Sep 2023
PRE: Vision-Language Prompt Learning with Reparameterization Encoder
Anh Pham Thi Minh
An Duc Nguyen
Georgios Tzimiropoulos
VPVLM, VLM
85
3
0
14 Sep 2023
PROGrasp: Pragmatic Human-Robot Communication for Object Grasping
Gi-Cheon Kang
Junghyun Kim
Jaein Kim
Byoung-Tak Zhang
105
5
0
14 Sep 2023
DePT: Decoupled Prompt Tuning
Ji Zhang
Shihan Wu
Lianli Gao
Hengtao Shen
Jingkuan Song
VLM
80
33
0
14 Sep 2023
VLSlice: Interactive Vision-and-Language Slice Discovery
Eric Slyman
Minsuk Kahng
Stefan Lee
VLM
62
9
0
13 Sep 2023
Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-constrained Devices
Mohamed Imed Eddine Ghebriout
Halima Bouzidi
Smail Niar
Hamza Ouarnoughi
78
3
0
12 Sep 2023
Overview of Memotion 3: Sentiment and Emotion Analysis of Codemixed Hinglish Memes
Shreyash Mishra
S. Suryavardan
Megha Chakraborty
Parth Patwa
Anku Rani
...
Amitava Das
A. Sheth
Manoj Kumar Chinnakotla
Asif Ekbal
Srijan Kumar
56
5
0
12 Sep 2023
Incorporating Pre-trained Model Prompting in Multimodal Stock Volume Movement Prediction
Ruibo Chen
Zhiyuan Zhang
Yi Liu
Ruihan Bao
Keiko Harimoto
Xu Sun
AIFin, AI4TS
76
0
0
11 Sep 2023
Multi3DRefer: Grounding Text Description to Multiple 3D Objects
Yiming Zhang
ZeMing Gong
Angel X. Chang
137
77
0
11 Sep 2023
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
Yangyi Chen
Karan Sikka
Michael Cogswell
Heng Ji
Ajay Divakaran
LRM
99
27
0
08 Sep 2023
From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models
Changming Xiao
Qi Yang
Feng Zhou
Changshui Zhang
95
17
0
08 Sep 2023
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Zigang Geng
Binxin Yang
Tiankai Hang
Chen Li
Shuyang Gu
...
Jianmin Bao
Zheng Zhang
Han Hu
DongDong Chen
Baining Guo
DiffM, VLM
126
107
0
07 Sep 2023
A Multimodal Analysis of Influencer Content on Twitter
Danae Sánchez Villegas
Catalina Goanta
Nikolaos Aletras
117
6
0
06 Sep 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
76
2
0
06 Sep 2023
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
Wei Suo
Mengyang Sun
Weisong Liu
Yi-Meng Gao
Peifeng Wang
Yanning Zhang
Qi Wu
LRM
72
7
0
05 Sep 2023
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
Qiong Wu
Wei Yu
Yiyi Zhou
Shubin Huang
Xiaoshuai Sun
Rongrong Ji
VLM
86
7
0
04 Sep 2023
Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification
Zhiyin Shao
Xinyu Zhang
Changxing Ding
Jian Wang
Jingdong Wang
100
19
0
04 Sep 2023
BDC-Adapter: Brownian Distance Covariance for Better Vision-Language Reasoning
Yi Zhang
Ce Zhang
Zihan Liao
Yushun Tang
Zhihai He
BDL, VLM
111
10
0
03 Sep 2023
LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models
Cheng Shi
Sibei Yang
VLM
93
21
0
03 Sep 2023
Distraction-free Embeddings for Robust VQA
Atharvan Dogra
Deeksha Varshney
Ashwin Kalyan
Ameet Deshpande
Neeraj Kumar
104
0
0
31 Aug 2023
ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation
Weihan Wang
Zhiyong Yang
Bin Xu
Juanzi Li
Yankui Sun
VLM
96
8
0
31 Aug 2023
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
Kilichbek Haydarov
Xiaoqian Shen
Avinash Madasu
Mahmoud Salem
Jia Li
Gamaleldin F. Elsayed
Mohamed Elhoseiny
71
4
0
30 Aug 2023
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
Yifan Xu
Mengdan Zhang
Xiaoshan Yang
Changsheng Xu
ObjD
84
5
0
30 Aug 2023
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation
Devaansh Gupta
Siddhant Kharbanda
Jiawei Zhou
Wanhua Li
Hanspeter Pfister
D. Wei
VLM
89
13
0
29 Aug 2023
CoVR: Learning Composed Video Retrieval from Web Video Captions
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
91
21
0
28 Aug 2023
A Unified Transformer-based Network for multimodal Emotion Recognition
Kamran Ali
Charles E. Hughes
83
1
0
27 Aug 2023
Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?
Fei Wang
Liang Ding
Jun Rao
Ye Liu
Li Shen
Changxing Ding
96
15
0
24 Aug 2023