ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1908.07490
  4. Cited By
LXMERT: Learning Cross-Modality Encoder Representations from
  Transformers

LXMERT: Learning Cross-Modality Encoder Representations from Transformers

20 August 2019
Hao Hao Tan
Joey Tianyi Zhou
    VLM
    MLLM
ArXivPDFHTML

Papers citing "LXMERT: Learning Cross-Modality Encoder Representations from Transformers"

50 / 1,507 papers shown
Title
Prompting Large Language Models with Rationale Heuristics for
  Knowledge-based Visual Question Answering
Prompting Large Language Models with Rationale Heuristics for Knowledge-based Visual Question Answering
Zhongjian Hu
Peng Yang
Bing Li
Fengyuan Liu
LRM
122
58
0
22 Dec 2024
Consistency of Compositional Generalization across Multiple Levels
Consistency of Compositional Generalization across Multiple Levels
Chuanhao Li
Zhen Li
Chenchen Jing
Xiaomeng Fan
Wenbo Ye
Yuwei Wu
Yunde Jia
CoGe
84
0
0
18 Dec 2024
BioBridge: Unified Bio-Embedding with Bridging Modality in Code-Switched
  EMR
BioBridge: Unified Bio-Embedding with Bridging Modality in Code-Switched EMR
Jangyeong Jeon
Sangyeon Cho
Dongjoon Lee
Changhee Lee
Junyeong Kim
75
0
0
16 Dec 2024
ViSymRe: Vision-guided Multimodal Symbolic Regression
ViSymRe: Vision-guided Multimodal Symbolic Regression
Da Li
Junping Yin
Jin Xu
Xinxin Li
Juan Zhang
85
1
0
15 Dec 2024
Rebalanced Vision-Language Retrieval Considering Structure-Aware
  Distillation
Rebalanced Vision-Language Retrieval Considering Structure-Aware Distillation
Yang Yang
Wenjuan Xi
Luping Zhou
Jinhui Tang
77
0
0
14 Dec 2024
GAMED: Knowledge Adaptive Multi-Experts Decoupling for Multimodal Fake News Detection
GAMED: Knowledge Adaptive Multi-Experts Decoupling for Multimodal Fake News Detection
Lingzhi Shen
Yunfei Long
Xiaohao Cai
Imran Razzak
Guanming Chen
Kang Liu
Shoaib Jameel
MoE
69
3
0
11 Dec 2024
Attention Head Purification: A New Perspective to Harness CLIP for
  Domain Generalization
Attention Head Purification: A New Perspective to Harness CLIP for Domain Generalization
Yingfan Wang
Guoliang Kang
VLM
82
0
0
10 Dec 2024
Unified Framework for Open-World Compositional Zero-shot Learning
Unified Framework for Open-World Compositional Zero-shot Learning
Hirunima Jayasekara
Khoi Pham
Nirat Saini
Abhinav Shrivastava
64
0
0
05 Dec 2024
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal
  Alignment
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment
Yan Li
Yifei Xing
X. Lan
Xuzhao Li
Haifeng Chen
D. Jiang
Mamba
74
0
0
01 Dec 2024
From Audio Deepfake Detection to AI-Generated Music Detection -- A
  Pathway and Overview
From Audio Deepfake Detection to AI-Generated Music Detection -- A Pathway and Overview
Yupei Li
M. Milling
Lucia Specia
Björn Schuller
89
6
0
30 Nov 2024
Approximate Fiber Product: A Preliminary Algebraic-Geometric Perspective on Multimodal Embedding Alignment
Dongfang Zhao
64
0
0
30 Nov 2024
VLM-HOI: Vision Language Models for Interpretable Human-Object
  Interaction Analysis
VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis
Donggoo Kang
Dasol Jeong
Hyunmin Lee
Sangwoo Park
Hasil Park
Sunkyu Kwon
Yeongjoon Kim
Joonki Paik
MLLM
VLM
76
0
0
27 Nov 2024
Task Progressive Curriculum Learning for Robust Visual Question
  Answering
Task Progressive Curriculum Learning for Robust Visual Question Answering
Ahmed Akl
Abdelwahed Khamis
Zhe Wang
Ali Cheraghian
Sara Khalifa
Kewen Wang
OOD
80
0
0
26 Nov 2024
Cross-Modal Pre-Aligned Method with Global and Local Information for
  Remote-Sensing Image and Text Retrieval
Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval
Zengbao Sun
Ming Zhao
Gaorui Liu
Andre Kaup
94
3
0
22 Nov 2024
Learning to Reason Iteratively and Parallelly for Complex Visual
  Reasoning Scenarios
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
Shantanu Jaiswal
Debaditya Roy
Basura Fernando
Cheston Tan
ReLM
LRM
79
2
0
20 Nov 2024
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
29
1
0
17 Nov 2024
Multi-Modal interpretable automatic video captioning
Multi-Modal interpretable automatic video captioning
Antoine Hanna-Asaad
Decky Aspandi
Titus Zaharia
33
0
0
11 Nov 2024
Renaissance: Investigating the Pretraining of Vision-Language Encoders
Renaissance: Investigating the Pretraining of Vision-Language Encoders
Clayton Fields
C. Kennington
VLM
29
0
0
11 Nov 2024
Semantic-Aligned Adversarial Evolution Triangle for High-Transferability
  Vision-Language Attack
Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack
Xiaojun Jia
Sensen Gao
Qing-Wu Guo
Ke Ma
Yihao Huang
Simeng Qin
Yang Liu
Ivor Tsang Fellow
Xiaochun Cao
AAML
46
3
0
04 Nov 2024
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous
  Driving
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Bo Jiang
Shaoyu Chen
Bencheng Liao
Xingyu Zhang
Wei Yin
Qian Zhang
Chang Huang
Wei Liu
Xinyu Wang
VLM
MLLM
LRM
46
14
0
29 Oct 2024
Preserving Pre-trained Representation Space: On Effectiveness of
  Prefix-tuning for Large Multi-modal Models
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
59
1
0
29 Oct 2024
Improving Generalization in Visual Reasoning via Self-Ensemble
Improving Generalization in Visual Reasoning via Self-Ensemble
Tien-Huy Nguyen
Quang-Khai Tran
Anh-Tuan Quang-Hoang
VLM
LRM
55
5
0
28 Oct 2024
Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering
Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering
Zhilin Zhang
Jie Wang
Zhanghao Qin
Ruiqi Zhu
Xiaoliang Gong
MedIm
48
0
0
28 Oct 2024
Sensor2Text: Enabling Natural Language Interactions for Daily Activity
  Tracking Using Wearable Sensors
Sensor2Text: Enabling Natural Language Interactions for Daily Activity Tracking Using Wearable Sensors
Wenqiang Chen
Jiaxuan Cheng
Leyao Wang
Wei Zhao
Wojciech Matusik
33
1
0
26 Oct 2024
A Survey of Multimodal Sarcasm Detection
A Survey of Multimodal Sarcasm Detection
Shafkat Farabi
Tharindu Ranasinghe
Diptesh Kanojia
Yu Kong
Marcos Zampieri
26
4
0
24 Oct 2024
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language
  Tuning
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
Zhiwei Hao
Jianyuan Guo
Li Shen
Yong Luo
Han Hu
Yonggang Wen
VLM
26
0
0
23 Oct 2024
How to Build a Pre-trained Multimodal model for Simultaneously Chatting
  and Decision-making?
How to Build a Pre-trained Multimodal model for Simultaneously Chatting and Decision-making?
Zuojin Tang
Bin-Bin Hu
Chenyang Zhao
De Ma
Gang Pan
Bin Liu
23
0
0
21 Oct 2024
ChitroJera: A Regionally Relevant Visual Question Answering Dataset for
  Bangla
ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla
Deeparghya Dutta Barua
Md Sakib Ul Rahman Sourove
Md Farhan Ishmam
Fabiha Haider
Fariha Tanjim Shifat
Md Fahim
Md Farhad Alam
29
0
0
19 Oct 2024
Vision-Language Navigation with Energy-Based Policy
Vision-Language Navigation with Energy-Based Policy
Rui Liu
Wenguan Wang
Yuqing Yang
40
3
0
18 Oct 2024
RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping
  Language-Image Pre-training
RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training
Muhe Ding
Yang Ma
Pengda Qin
Jianlong Wu
Yuhong Li
Liqiang Nie
23
1
0
18 Oct 2024
ViConsFormer: Constituting Meaningful Phrases of Scene Texts using
  Transformer-based Method in Vietnamese Text-based Visual Question Answering
ViConsFormer: Constituting Meaningful Phrases of Scene Texts using Transformer-based Method in Vietnamese Text-based Visual Question Answering
Nghia Hieu Nguyen
Tho Thanh Quan
Ngan Luu-Thuy Nguyen
31
0
0
18 Oct 2024
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic
  Reasoning Tasks
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
Shailaja Keyur Sampat
Mutsumi Nakamura
Shankar Kailas
Kartik Aggarwal
Mandy Zhou
Yezhou Yang
Chitta Baral
MLLM
CoGe
ReLM
VLM
LRM
37
0
0
17 Oct 2024
Temporal-Enhanced Multimodal Transformer for Referring Multi-Object
  Tracking and Segmentation
Temporal-Enhanced Multimodal Transformer for Referring Multi-Object Tracking and Segmentation
Changcheng Xiao
Qiong Cao
Yujie Zhong
Xiang Zhang
Tao Wang
Canqun Yang
L. Lan
28
0
0
17 Oct 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for
  Vision-Language Pre-Training
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
56
9
0
16 Oct 2024
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic
  Modeling
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Jian Yang
Dacheng Yin
Yizhou Zhou
Fengyun Rao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
DiffM
28
2
0
14 Oct 2024
M3Hop-CoT: Misogynous Meme Identification with Multimodal Multi-hop
  Chain-of-Thought
M3Hop-CoT: Misogynous Meme Identification with Multimodal Multi-hop Chain-of-Thought
G. Kumari
Kirtan Jain
Asif Ekbal
23
1
0
11 Oct 2024
DAT: Dialogue-Aware Transformer with Modality-Group Fusion for Human
  Engagement Estimation
DAT: Dialogue-Aware Transformer with Modality-Group Fusion for Human Engagement Estimation
Jia Li
Yangchen Yu
Yin Chen
Yu Zhang
Peng Jia
Yunbo Xu
Ziqiang Li
Hao Wu
Richang Hong
35
1
0
11 Oct 2024
FLIER: Few-shot Language Image Models Embedded with Latent
  Representations
FLIER: Few-shot Language Image Models Embedded with Latent Representations
Zhinuo Zhou
Peng Zhou
Xiaoyong Pan
VLM
28
0
0
10 Oct 2024
Exploring Efficient Foundational Multi-modal Models for Video
  Summarization
Exploring Efficient Foundational Multi-modal Models for Video Summarization
Karan Samel
Apoorva Beedu
Nitish Sontakke
Irfan Essa
37
1
0
09 Oct 2024
Structured Spatial Reasoning with Open Vocabulary Object Detectors
Structured Spatial Reasoning with Open Vocabulary Object Detectors
Negar Nejatishahidin
Madhukar Reddy Vongala
Jana Kosecka
37
3
0
09 Oct 2024
DAViD: Domain Adaptive Visually-Rich Document Understanding with
  Synthetic Insights
DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights
Yihao Ding
S. Han
Zechuan Li
Hyunsuk Chung
20
0
0
02 Oct 2024
Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations
Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations
Minoh Jeong
Min Namgung
Zae Myung Kim
Dongyeop Kang
Yao-Yi Chiang
Alfred Hero
25
0
0
02 Oct 2024
Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity
Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity
Hanqi Jiang
Xixuan Hao
Yuzhou Huang
Chong Ma
Jiaxun Zhang
Yi Pan
Ruimao Zhang
MedIm
37
0
0
01 Oct 2024
A Multimodal Single-Branch Embedding Network for Recommendation in
  Cold-Start and Missing Modality Scenarios
A Multimodal Single-Branch Embedding Network for Recommendation in Cold-Start and Missing Modality Scenarios
Christian Ganhor
Marta Moscati
Anna Hausberger
Shah Nawaz
Markus Schedl
40
2
0
26 Sep 2024
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Hong Chen
Xin Wang
Yuwei Zhou
Bin Huang
Yipeng Zhang
Wei Feng
Houlun Chen
Zeyang Zhang
Siao Tang
Wenwu Zhu
DiffM
55
7
0
23 Sep 2024
PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple
  Operators for Forecasting Fluid Dynamics
PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics
Yuxuan Liu
Jingmin Sun
Xinjie He
Griffin Pinney
Zecheng Zhang
Hayden Schaeffer
AI4CE
43
6
0
15 Sep 2024
Generating Event-oriented Attribution for Movies via Two-Stage
  Prefix-Enhanced Multimodal LLM
Generating Event-oriented Attribution for Movies via Two-Stage Prefix-Enhanced Multimodal LLM
Yuanjie Lyu
Tong Xu
Zihan Niu
Bo Peng
Jing Ke
Enhong Chen
28
0
0
14 Sep 2024
ComAlign: Compositional Alignment in Vision-Language Models
ComAlign: Compositional Alignment in Vision-Language Models
Ali Abdollah
Amirmohammad Izadi
Armin Saghafian
Reza Vahidimajd
Mohammad Mozafari
Amirreza Mirzaei
Mohammadmahdi Samiei
M. Baghshah
CoGe
VLM
30
0
0
12 Sep 2024
Data Augmentation via Latent Diffusion for Saliency Prediction
Data Augmentation via Latent Diffusion for Saliency Prediction
Bahar Aydemir
Deblina Bhattacharjee
Tong Zhang
Mathieu Salzmann
Sabine Süsstrunk
31
1
0
11 Sep 2024
MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large
  Language Model
MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model
Zhen Yang
Jinhao Chen
Zhengxiao Du
Wenmeng Yu
Weihan Wang
Wenyi Hong
Zhihuan Jiang
Bin Xu
Yuxiao Dong
Jie Tang
VLM
LRM
32
8
0
10 Sep 2024
Previous
12345...293031
Next