LXMERT: Learning Cross-Modality Encoder Representations from Transformers

20 August 2019
Hao Tan
Mohit Bansal
    VLM
    MLLM

Papers citing "LXMERT: Learning Cross-Modality Encoder Representations from Transformers"

50 / 1,512 papers shown
Title
Zero-shot Referring Image Segmentation with Global-Local Context Features
S. Yu
Paul Hongsuck Seo
Jeany Son
14
49
0
31 Mar 2023
Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models
Sifan Long
Zhen Zhao
Junkun Yuan
Zichang Tan
Jiangjiang Liu
Luping Zhou
Sheng-sheng Wang
Jingdong Wang
VLM
36
2
0
30 Mar 2023
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance
Zoey Guo
Yiwen Tang
Renrui Zhang
Dong Wang
Zhigang Wang
Bin Zhao
Xuelong Li
40
54
0
29 Mar 2023
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
Weicheng Kuo
A. Piergiovanni
Dahun Kim
Xiyang Luo
Benjamin Caine
...
Luowei Zhou
Andrew M. Dai
Zhifeng Chen
Claire Cui
A. Angelova
MLLM
VLM
42
24
0
29 Mar 2023
KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation
Xiangyang Li
Zihan Wang
Jiahao Yang
Yaowei Wang
Shuqiang Jiang
LM&Ro
26
38
0
28 Mar 2023
Text-to-Image Diffusion Models are Zero-Shot Classifiers
Kevin Clark
P. Jaini
DiffM
VLM
38
108
0
27 Mar 2023
Curriculum Learning for Compositional Visual Reasoning
Wafa Aissa
Marin Ferecatu
M. Crucianu
LRM
36
3
0
27 Mar 2023
Equivariant Similarity for Vision-Language Foundation Models
Tan Wang
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Zhengyuan Yang
Hanwang Zhang
Zicheng Liu
Lijuan Wang
CoGe
46
44
0
25 Mar 2023
Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation
Yuliang Cai
Jesse Thomason
Mohammad Rostami
VLM
CLL
24
11
0
25 Mar 2023
VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining
Junjie Ke
Keren Ye
Jiahui Yu
Yonghui Wu
P. Milanfar
Feng Yang
VLM
57
56
0
24 Mar 2023
Accelerating Vision-Language Pretraining with Free Language Modeling
Teng Wang
Yixiao Ge
Feng Zheng
Ran Cheng
Ying Shan
Xiaohu Qie
Ping Luo
VLM
MLLM
93
9
0
24 Mar 2023
Text with Knowledge Graph Augmented Transformer for Video Captioning
Xin Gu
G. Chen
Yufei Wang
Libo Zhang
Tiejian Luo
Longyin Wen
32
48
0
22 Mar 2023
MAGVLT: Masked Generative Vision-and-Language Transformer
Sungwoong Kim
DaeJin Jo
Donghoon Lee
Jongmin Kim
VLM
47
12
0
21 Mar 2023
VideoXum: Cross-modal Visual and Textural Summarization of Videos
Jingyang Lin
Hang Hua
Ming Chen
Yikang Li
Jenhao Hsiao
C. Ho
Jiebo Luo
38
30
0
21 Mar 2023
Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding
Morris Alper
Michael Fiman
Hadar Averbuch-Elor
VLM
LRM
31
16
0
21 Mar 2023
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Yushi Hu
Benlin Liu
Jungo Kasai
Yizhong Wang
Mari Ostendorf
Ranjay Krishna
Noah A. Smith
EGVM
46
213
0
21 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
48
48
0
21 Mar 2023
Audio-Text Models Do Not Yet Leverage Natural Language
Ho-Hsiang Wu
Oriol Nieto
J. P. Bello
Justin Salamon
VLM
21
28
0
19 Mar 2023
Label Name is Mantra: Unifying Point Cloud Segmentation across Heterogeneous Datasets
Yixun Liang
Hao He
Shishi Xiao
Hao Lu
Yingke Chen
3DPC
31
3
0
19 Mar 2023
Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
Shi Chen
Qi Zhao
49
5
0
18 Mar 2023
A Region-Prompted Adapter Tuning for Visual Abductive Reasoning
Hao Zhang
Yeo Keat Ee
Basura Fernando
VLM
34
3
0
18 Mar 2023
MultiModal Bias: Introducing a Framework for Stereotypical Bias Assessment beyond Gender and Race in Vision Language Models
Sepehr Janghorbani
Gerard de Melo
VLM
47
11
0
16 Mar 2023
Logical Implications for Visual Question Answering Consistency
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
23
9
0
16 Mar 2023
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge
Wei Lin
Leonid Karlinsky
Nina Shvetsova
Horst Possegger
Mateusz Koziński
Yikang Shen
Rogerio Feris
Hilde Kuehne
Horst Bischof
VLM
102
38
0
15 Mar 2023
Lana: A Language-Capable Navigator for Instruction Following and Generation
Xiaohan Wang
Wenguan Wang
Jiayi Shao
Yi Yang
LLMAG
LM&Ro
46
38
0
15 Mar 2023
ViperGPT: Visual Inference via Python Execution for Reasoning
Dídac Surís
Sachit Menon
Carl Vondrick
MLLM
LRM
ReLM
52
437
0
14 Mar 2023
Align and Attend: Multimodal Summarization with Dual Contrastive Losses
Bo He
Jun Wang
Jielin Qiu
Trung Bui
Abhinav Shrivastava
Zhaowen Wang
27
66
0
13 Mar 2023
Scaling Vision-Language Models with Sparse Mixture of Experts
Sheng Shen
Z. Yao
Chunyuan Li
Trevor Darrell
Kurt Keutzer
Yuxiong He
VLM
MoE
26
63
0
13 Mar 2023
DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation
Yueming Lyu
Tianwei Lin
Fu Li
Dongliang He
Jing Dong
Tien-Ping Tan
41
39
0
11 Mar 2023
Single-branch Network for Multimodal Training
M. S. Saeed
Shah Nawaz
M. H. Khan
M. Zaheer
Karthik Nandakumar
Muhammad Haroon Yousaf
Arif Mahmood
19
13
0
10 Mar 2023
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
Qian Jiang
Changyou Chen
Han Zhao
Liqun Chen
Q. Ping
S. D. Tran
Yi Xu
Belinda Zeng
Trishul Chilimbi
57
40
0
10 Mar 2023
Refined Vision-Language Modeling for Fine-grained Multi-modal Pre-training
Lisai Zhang
Qingcai Chen
Zhijian Chen
Yunpeng Han
Zhonghua Li
Bo Zhao
VLM
33
1
0
09 Mar 2023
TQ-Net: Mixed Contrastive Representation Learning For Heterogeneous Test Questions
He Zhu
Xihua Li
Xuemin Zhao
Yunbo Cao
Shan Yu
23
0
0
09 Mar 2023
Toward Unsupervised Realistic Visual Question Answering
Yuwei Zhang
Chih-Hui Ho
Nuno Vasconcelos
CoGe
27
2
0
09 Mar 2023
Text-Visual Prompting for Efficient 2D Temporal Video Grounding
Yimeng Zhang
Xin Chen
Jinghan Jia
Sijia Liu
Ke Ding
28
25
0
09 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
40
514
0
07 Mar 2023
Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
Minyoung Hwang
Jaeyeon Jeong
Minsoo Kim
Yoonseon Oh
Songhwai Oh
43
19
0
07 Mar 2023
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
Shijie Geng
Jianbo Yuan
Yu Tian
Yuxiao Chen
Yongfeng Zhang
CLIP
VLM
49
44
0
06 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLM
MLLM
49
22
0
04 Mar 2023
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
Xiaoping Han
Xiatian Zhu
Licheng Yu
Li Zhang
Yi-Zhe Song
Tao Xiang
VLM
29
38
0
04 Mar 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
94
11
0
03 Mar 2023
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
Jingjing Jiang
Nanning Zheng
MoE
45
6
0
02 Mar 2023
VQA with Cascade of Self- and Co-Attention Blocks
Aakansha Mishra
Ashish Anand
Prithwijit Guha
35
0
0
28 Feb 2023
TextIR: A Simple Framework for Text-based Editable Image Restoration
Yun-Hao Bai
Cairong Wang
Shuzhao Xie
Chao Dong
Chun Yuan
Zhi Wang
DiffM
35
15
0
28 Feb 2023
Multi-Layer Attention-Based Explainability via Transformers for Tabular Data
Andrea Trevino Gavito
Diego Klabjan
J. Utke
LMTD
28
3
0
28 Feb 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
44
223
0
27 Feb 2023
Understanding Social Media Cross-Modality Discourse in Linguistic Space
Chunpu Xu
Hanzhuo Tan
Jing Li
Piji Li
24
7
0
26 Feb 2023
Deep Learning for Video-Text Retrieval: a Review
Cunjuan Zhu
Qi Jia
Wei Chen
Yanming Guo
Yu Liu
26
14
0
24 Feb 2023
Entity-Level Text-Guided Image Manipulation
Yikai Wang
Jianan Wang
Guansong Lu
Hang Xu
Zhenguo Li
Wei Zhang
Yanwei Fu
VGen
34
3
0
22 Feb 2023
X-TRA: Improving Chest X-ray Tasks with Cross-Modal Retrieval Augmentation
Tom van Sonsbeek
M. Worring
23
13
0
22 Feb 2023