2107.07651
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
16 July 2021
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Chenyu You
Caiming Xiong
Guosheng Lin
FaML
Papers citing "Align before Fuse: Vision and Language Representation Learning with Momentum Distillation"
50 / 1,195 papers shown
LAIP: Learning Local Alignment from Image-Phrase Modeling for Text-based Person Search
Haiguang Wang
Yu Wu
Mengxia Wu
Cao Min
Min Zhang
37
2
0
16 Jun 2024
Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data
Jiahan Zhang
Qinglai Wei
Feng Liu
Lei Feng
VLM
31
7
0
15 Jun 2024
Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis
Zongyue Qin
Yunsheng Bai
Atefeh Sohrabizadeh
Zijian Ding
Ziniu Hu
Yizhou Sun
Jason Cong
28
2
0
13 Jun 2024
Explore the Limits of Omni-modal Pretraining at Scale
Yiyuan Zhang
Handong Li
Jing Liu
Xiangyu Yue
VLM
LRM
49
1
0
13 Jun 2024
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
Miaosen Zhang
Yixuan Wei
Zhen Xing
Yifei Ma
Zuxuan Wu
...
Zheng-Wei Zhang
Qi Dai
Chong Luo
Xin Geng
Baining Guo
VLM
51
1
0
13 Jun 2024
Language-driven Grasp Detection
An Dinh Vuong
Minh Nhat Vu
Baoru Huang
Nghia Nguyen
Hieu Le
T. Vo
Anh Nguyen
VLM
41
19
0
13 Jun 2024
MirrorCheck: Efficient Adversarial Defense for Vision-Language Models
Samar Fares
Klea Ziu
Toluwani Aremu
N. Durasov
Martin Takáč
Pascal Fua
Karthik Nandakumar
Ivan Laptev
VLM
AAML
40
4
0
13 Jun 2024
Efficient Multi-View Fusion and Flexible Adaptation to View Missing in Cardiovascular System Signals
Qihan Hu
Daomiao Wang
Hong Wu
Jian Liu
Cuiwei Yang
43
0
0
13 Jun 2024
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
Matthieu Futeral
A. Zebaze
Pedro Ortiz Suarez
Julien Abadji
Rémi Lacroix
Cordelia Schmid
Rachel Bawden
Benoît Sagot
39
3
0
13 Jun 2024
MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
Xuannan Liu
Zekun Li
Peipei Li
Shuhan Xia
Xing Cui
Linzhi Huang
Huaibo Huang
Weihong Deng
Zhaofeng He
44
14
0
13 Jun 2024
ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery
Kam Woh Ng
Xiatian Zhu
Yi-Zhe Song
Tao Xiang
37
2
0
12 Jun 2024
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
Irene Huang
Wei Lin
M. Jehanzeb Mirza
Jacob A. Hansen
Sivan Doveh
...
Trevor Darrell
Chuang Gan
Aude Oliva
Rogerio Feris
Leonid Karlinsky
CoGe
LRM
43
7
0
12 Jun 2024
Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model
Elaheh Baharlouei
Mahsa Shafaei
Yigeng Zhang
Hugo Jair Escalante
Thamar Solorio
46
0
0
12 Jun 2024
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions
Renjie Pi
Jianshu Zhang
Jipeng Zhang
Rui Pan
Zhekai Chen
Tong Zhang
3DV
47
19
0
11 Jun 2024
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning
Shuvendu Roy
Yasaman Parhizkar
Franklin Ogidi
Vahid Reza Khazaie
Michael Colacci
Ali Etemad
Elham Dolatabadi
Arash Afkanpour
VLM
52
1
0
11 Jun 2024
Learning Domain-Invariant Features for Out-of-Context News Detection
Yimeng Gu
Mengqi Zhang
Ignacio Castro
Shu Wu
Gareth Tyson
45
2
0
11 Jun 2024
Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation
Jinyuan Li
Ziyan Li
Han Li
Jianfei Yu
Rui Xia
Di Sun
Gang Pan
40
2
0
11 Jun 2024
BrainChat: Decoding Semantic Information from fMRI using Vision-language Pretrained Models
Wanaiu Huang
23
1
0
10 Jun 2024
Gentle-CLIP: Exploring Aligned Semantic In Low-Quality Multimodal Data With Soft Alignment
Zijia Song
Z. Zang
Yelin Wang
Guozheng Yang
Jiangbin Zheng
Kaicheng Yu
Wanyu Chen
Stan Z. Li
36
0
0
09 Jun 2024
Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities
Sai Munikoti
Ian Stewart
Sameera Horawalavithana
Henry Kvinge
Tegan H. Emerson
Sandra E Thompson
Karl Pazdernik
38
2
0
08 Jun 2024
MLLM-SR: Conversational Symbolic Regression base Multi-Modal Large Language Models
Yanjie Li
Weijun Li
Lina Yu
Min Wu
Jingyi Liu
Wenqiang Li
Shu Wei
Yusong Deng
OffRL
31
3
0
08 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
Ke Xu
AAML
VLM
40
13
0
08 Jun 2024
Low-Rank Similarity Mining for Multimodal Dataset Distillation
Yue Xu
Zhilin Lin
Yusong Qiu
Cewu Lu
Yong-Lu Li
DD
47
4
0
06 Jun 2024
Attribute-Aware Implicit Modality Alignment for Text Attribute Person Search
Xin Wang
Fangfang Liu
Zheng Li
Caili Guo
46
1
0
06 Jun 2024
Wings: Learning Multimodal LLMs without Text-only Forgetting
Yi-Kai Zhang
Shiyin Lu
Yang Li
Yanqing Ma
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
VLM
35
6
0
05 Jun 2024
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
Jinhao Li
Haopeng Li
S. Erfani
Lei Feng
James Bailey
Feng Liu
VLM
34
3
0
05 Jun 2024
Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following
Qiaomu Miao
Alexandros Graikos
Jingwei Zhang
Sounak Mondal
Minh Hoai
Dimitris Samaras
38
0
0
04 Jun 2024
Optimal Transport Guided Correlation Assignment for Multimodal Entity Linking
Zefeng Zhang
Shuaiyi Nie
Chuang Zhang
Yunzhi Liang
Wenyuan Zhang
Siqi Wang
Tingwen Liu
OT
37
2
0
04 Jun 2024
Progressive Confident Masking Attention Network for Audio-Visual Segmentation
Yuxuan Wang
Feng Dong
Jinchao Zhu
Shuyue Zhu
VOS
56
0
0
04 Jun 2024
OLIVE: Object Level In-Context Visual Embeddings
Timothy Ossowski
Junjie Hu
OCL
VLM
57
0
0
02 Jun 2024
Diversifying Query: Region-Guided Transformer for Temporal Sentence Grounding
Xiaolong Sun
Liushuai Shi
Le Wang
Sanpin Zhou
Kun Xia
Yabing Wang
Gang Hua
27
2
0
31 May 2024
Don't Buy it! Reassessing the Ad Understanding Abilities of Contrastive Multimodal Models
A. Bavaresco
A. Testoni
Raquel Fernández
33
2
0
31 May 2024
Knowledge-grounded Adaptation Strategy for Vision-language Models: Building Unique Case-set for Screening Mammograms for Residents Training
Aisha Urooj Khan
John W. Garrett
Tyler Bradshaw
Lonie R. Salkowski
Jiwoong Jeong
Amara Tariq
Imon Banerjee
VLM
28
1
0
30 May 2024
Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training
Jinxia Yang
Bing-Huang Su
Wayne Xin Zhao
Ji-Rong Wen
40
2
0
30 May 2024
Evaluating Vision-Language Models on Bistable Images
Artemis Panagopoulou
Coby Melkin
Chris Callison-Burch
49
0
0
29 May 2024
ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions
Honglin Lin
Siyu Li
Gu Nan
Chaoyue Tang
Xueting Wang
...
Yankai Rong
Zhili Zhou
Yutong Gao
Qimei Cui
Xiaofeng Tao
33
0
0
29 May 2024
Topological Perspectives on Optimal Multimodal Embedding Spaces
Abdul Aziz
Abdul Rahim
BDL
42
0
0
29 May 2024
MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
Laura Fieback
Jakob Spiegelberg
Hanno Gottschalk
MLLM
65
5
0
29 May 2024
Dataset Growth
Ziheng Qin
Zhaopan Xu
Yukun Zhou
Zangwei Zheng
Zebang Cheng
...
Xiaojiang Peng
Radu Timofte
Hongxun Yao
Kai Wang
Yang You
DD
32
0
0
28 May 2024
Multi-level Interaction Modeling for Protein Mutational Effect Prediction
Yuanle Mo
Xin Hong
Bowen Gao
Yinjun Jia
Yanyan Lan
AI4CE
29
2
0
28 May 2024
Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification
Weizhen He
Yiheng Deng
Yunfeng Yan
Feng Zhu
Yizhou Wang
Lei Bai
Qingsong Xie
Donglian Qi
Wanli Ouyang
Shixiang Tang
95
2
0
28 May 2024
Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
Jin Wang
Shichao Dong
Yapeng Zhu
Kelu Yao
Weidong Zhao
Chao Li
Ping Luo
CoGe
LRM
48
2
0
27 May 2024
Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Xiaolin Chen
Liqiang Nie
Mohan S. Kankanhalli
LRM
25
8
0
27 May 2024
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Yunqi Zhang
Songda Li
Chunyuan Deng
Luyi Wang
Hui Zhao
31
0
0
27 May 2024
Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
Mustafa Shukor
Matthieu Cord
68
5
0
26 May 2024
CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection
Lin Zhu
Yifeng Yang
Qinying Gu
Xinbing Wang
Cheng Zhou
Nanyang Ye
VLM
34
2
0
26 May 2024
Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran
D. M. Nguyen
Duy M. Nguyen
Trung Thanh Nguyen
Ngan Le
Pengtao Xie
Daniel Sonntag
James Y. Zou
Binh T. Nguyen
Mathias Niepert
44
8
0
25 May 2024
From Orthogonality to Dependency: Learning Disentangled Representation for Multi-Modal Time-Series Sensing Signals
Ruichu Cai
Zhifan Jiang
Zijian Li
Weilin Chen
Xuexin Chen
Zhifeng Hao
Yifan Shen
Guan-Hong Chen
Kun Zhang
40
1
0
25 May 2024
LAM3D: Large Image-Point-Cloud Alignment Model for 3D Reconstruction from Single Image
Ruikai Cui
Xibin Song
Weixuan Sun
Senbo Wang
Weizhe Liu
...
Taizhang Shang
Yang Li
Nick Barnes
Hongdong Li
Pan Ji
3DV
53
5
0
24 May 2024
ProtFAD: Introducing function-aware domains as implicit modality towards protein function perception
Mingqing Wang
Zhiwei Nie
Yonghong He
Zhixiang Ren
27
0
0
24 May 2024