ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2107.07651
  4. Cited By
Align before Fuse: Vision and Language Representation Learning with
  Momentum Distillation
v1v2 (latest)

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

16 July 2021
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Guosheng Lin
    FaML
ArXiv (abs)PDFHTMLGithub (1658★)

Papers citing "Align before Fuse: Vision and Language Representation Learning with Momentum Distillation"

31 / 1,231 papers shown
Title
Contrastive Vision-Language Pre-training with Limited Resources
Contrastive Vision-Language Pre-training with Limited Resources
Quan Cui
Boyan Zhou
Yu Guo
Weidong Yin
Hao Wu
Osamu Yoshie
Yubo Chen
VLMCLIP
53
34
0
17 Dec 2021
Distilled Dual-Encoder Model for Vision-Language Understanding
Distilled Dual-Encoder Model for Vision-Language Understanding
Zekun Wang
Wenhui Wang
Haichao Zhu
Ming Liu
Bing Qin
Furu Wei
VLMFedML
85
33
0
16 Dec 2021
DistilCSE: Effective Knowledge Distillation For Contrastive Sentence
  Embeddings
DistilCSE: Effective Knowledge Distillation For Contrastive Sentence Embeddings
Chaochen Gao
Xing Wu
Peng Wang
Jue Wang
Liangjun Zang
Zhongyuan Wang
Songlin Hu
29
4
0
10 Dec 2021
FLAVA: A Foundational Language And Vision Alignment Model
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh
Ronghang Hu
Vedanuj Goswami
Guillaume Couairon
Wojciech Galuba
Marcus Rohrbach
Douwe Kiela
CLIPVLM
151
719
0
08 Dec 2021
MLP Architectures for Vision-and-Language Modeling: An Empirical Study
MLP Architectures for Vision-and-Language Modeling: An Empirical Study
Yi-Liang Nie
Linjie Li
Zhe Gan
Shuohang Wang
Chenguang Zhu
Michael Zeng
Zicheng Liu
Joey Tianyi Zhou
Lijuan Wang
58
6
0
08 Dec 2021
General Facial Representation Learning in a Visual-Linguistic Manner
General Facial Representation Learning in a Visual-Linguistic Manner
Yinglin Zheng
Hao Yang
Ting Zhang
Jianmin Bao
Dongdong Chen
Yangyu Huang
Lu Yuan
Dong Chen
Ming Zeng
Fang Wen
CVBM
205
176
0
06 Dec 2021
VarCLR: Variable Semantic Representation Pre-training via Contrastive
  Learning
VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning
Qibin Chen
Jeremy Lacomis
Edward J. Schwartz
Graham Neubig
Bogdan Vasilescu
Claire Le Goues
VLM
77
35
0
05 Dec 2021
LAFITE: Towards Language-Free Training for Text-to-Image Generation
LAFITE: Towards Language-Free Training for Text-to-Image Generation
Yufan Zhou
Ruiyi Zhang
Changyou Chen
Chunyuan Li
Chris Tensmeyer
Tong Yu
Jiuxiang Gu
Jinhui Xu
Tong Sun
VLM
86
168
0
27 Nov 2021
Scaling Up Vision-Language Pre-training for Image Captioning
Scaling Up Vision-Language Pre-training for Image Captioning
Xiaowei Hu
Zhe Gan
Jianfeng Wang
Zhengyuan Yang
Zicheng Liu
Yumao Lu
Lijuan Wang
MLLMVLM
170
249
0
24 Nov 2021
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language
  Modeling
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Faisal Ahmed
Zicheng Liu
Yumao Lu
Lijuan Wang
143
117
0
23 Nov 2021
Florence: A New Foundation Model for Computer Vision
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
176
907
0
22 Nov 2021
TraVLR: Now You See It, Now You Don't! A Bimodal Dataset for Evaluating
  Visio-Linguistic Reasoning
TraVLR: Now You See It, Now You Don't! A Bimodal Dataset for Evaluating Visio-Linguistic Reasoning
Keng Ji Chow
Samson Tan
MingSung Kan
LRM
54
4
0
21 Nov 2021
MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image
  Segmentation
MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image Segmentation
Zizhang Li
Mengmeng Wang
Jianbiao Mei
Yong Liu
73
19
0
21 Nov 2021
Advancing High-Resolution Video-Language Representation with Large-Scale
  Video Transcriptions
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Hongwei Xue
Tiankai Hang
Yanhong Zeng
Yuchong Sun
Bei Liu
Huan Yang
Jianlong Fu
B. Guo
AI4TSVLM
78
194
0
19 Nov 2021
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
Jianfeng Wang
Xiaowei Hu
Zhe Gan
Zhengyuan Yang
Xiyang Dai
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
73
57
0
19 Nov 2021
Open Vocabulary Object Detection with Pseudo Bounding-Box Labels
Open Vocabulary Object Detection with Pseudo Bounding-Box Labels
M. Gao
Chen Xing
Juan Carlos Niebles
Junnan Li
Ran Xu
Wenhao Liu
Caiming Xiong
VLMObjD
104
86
0
18 Nov 2021
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual
  Concepts
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts
Yan Zeng
Xinsong Zhang
Hang Li
VLMCLIP
93
308
0
16 Nov 2021
CLIP2TV: Align, Match and Distill for Video-Text Retrieval
CLIP2TV: Align, Match and Distill for Video-Text Retrieval
Zijian Gao
Qingbin Liu
Weiqi Sun
S. Chen
Dedan Chang
Lili Zhao
VLMCLIP
58
17
0
10 Nov 2021
FILIP: Fine-grained Interactive Language-Image Pre-Training
FILIP: Fine-grained Interactive Language-Image Pre-Training
Lewei Yao
Runhu Huang
Lu Hou
Guansong Lu
Minzhe Niu
Hang Xu
Xiaodan Liang
Zhenguo Li
Xin Jiang
Chunjing Xu
VLMCLIP
113
643
0
09 Nov 2021
An Empirical Study of Training End-to-End Vision-and-Language
  Transformers
An Empirical Study of Training End-to-End Vision-and-Language Transformers
Zi-Yi Dou
Yichong Xu
Zhe Gan
Jianfeng Wang
Shuohang Wang
...
Pengchuan Zhang
Lu Yuan
Nanyun Peng
Zicheng Liu
Michael Zeng
VLM
102
381
0
03 Nov 2021
VLMo: Unified Vision-Language Pre-Training with
  Mixture-of-Modality-Experts
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
Hangbo Bao
Wenhui Wang
Li Dong
Qiang Liu
Owais Khan Mohammed
Kriti Aggarwal
Subhojit Som
Furu Wei
VLMMLLMMoE
104
559
0
03 Nov 2021
VLDeformer: Vision-Language Decomposed Transformer for Fast Cross-Modal
  Retrieval
VLDeformer: Vision-Language Decomposed Transformer for Fast Cross-Modal Retrieval
Lisai Zhang
Hongfa Wu
Qingcai Chen
Yimeng Deng
Zhonghua Li
Dejiang Kong
Bo Zhao
Joanna Siebert
Yunpeng Han
ViTVLM
98
21
0
20 Oct 2021
SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text
  Joint Pre-Training
SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training
Ankur Bapna
Yu-An Chung
Na Wu
Anmol Gulati
Ye Jia
J. Clark
Melvin Johnson
Jason Riesa
Alexis Conneau
Yu Zhang
VLM
137
96
0
20 Oct 2021
Cascaded Fast and Slow Models for Efficient Semantic Code Search
Cascaded Fast and Slow Models for Efficient Semantic Code Search
Akhilesh Deepak Gotmare
Junnan Li
Shafiq Joty
Guosheng Lin
63
10
0
15 Oct 2021
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
Peng Gao
Shijie Geng
Renrui Zhang
Teli Ma
Rongyao Fang
Yongfeng Zhang
Hongsheng Li
Yu Qiao
VLMCLIP
350
1,062
0
09 Oct 2021
ActionCLIP: A New Paradigm for Video Action Recognition
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
221
372
0
17 Sep 2021
Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in
  Multimodal Transformers
Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
Stella Frank
Emanuele Bugliarello
Desmond Elliott
74
82
0
09 Sep 2021
Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained
  Models
Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained Models
Chenfeng Xu
Shijia Yang
Tomer Galanti
Bichen Wu
Xiangyu Yue
Bohan Zhai
Wei Zhan
Peter Vajda
Kurt Keutzer
Masayoshi Tomizuka
3DPC
62
55
0
08 Jun 2021
Playing Lottery Tickets with Vision and Language
Playing Lottery Tickets with Vision and Language
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
142
56
0
23 Apr 2021
Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for
  Improved Cross-Modal Retrieval
Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval
Gregor Geigle
Jonas Pfeiffer
Nils Reimers
Ivan Vulić
Iryna Gurevych
104
60
0
22 Mar 2021
A Primer on Contrastive Pretraining in Language Processing: Methods,
  Lessons Learned and Perspectives
A Primer on Contrastive Pretraining in Language Processing: Methods, Lessons Learned and Perspectives
Nils Rethmeier
Isabelle Augenstein
SSLVLM
159
94
0
25 Feb 2021
Previous
123...232425