Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2210.15110
Cited By
Masked Vision-Language Transformer in Fashion
27 October 2022
Ge-Peng Ji
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Daniel Gehrig
Luc Van Gool
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Masked Vision-Language Transformer in Fashion"
45 / 45 papers shown
Title
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Weizhen He
Yihe Deng
Shixiang Tang
Qihao Chen
Qingsong Xie
...
Feng Zhu
Rui Zhao
Wanli Ouyang
Donglian Qi
Yunfeng Yan
106
20
0
13 Jun 2023
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
427
7,705
0
11 Nov 2021
Paradigm Shift in Natural Language Processing
Tianxiang Sun
Xiangyang Liu
Xipeng Qiu
Xuanjing Huang
152
82
0
26 Sep 2021
MLIM: Vision-and-Language Model Pre-training with Masked Language and Image Modeling
Tarik Arici
M. S. Seyfioglu
T. Neiman
Yi Tian Xu
Son N. Tran
Trishul Chilimbi
Belinda Zeng
Ismail B. Tutar
VLM
37
15
0
24 Sep 2021
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Ming Yan
Haiyang Xu
Chenliang Li
Bin Bi
Junfeng Tian
Min Gui
Wei Wang
VLM
53
10
0
21 Aug 2021
Evaluating CLIP: Towards Characterization of Broader Capabilities and Downstream Implications
Sandhini Agarwal
Gretchen Krueger
Jack Clark
Alec Radford
Jong Wook Kim
Miles Brundage
52
142
0
05 Aug 2021
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
225
2,812
0
15 Jun 2021
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
Haiyang Xu
Ming Yan
Chenliang Li
Bin Bi
Songfang Huang
Wenming Xiao
Fei Huang
VLM
55
119
0
03 Jun 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Huayu Chen
Boqing Gong
ViT
300
587
0
22 Apr 2021
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
VLM
ViT
129
272
0
07 Apr 2021
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Linbo Jin
Ben Chen
Hao Zhou
Minghui Qiu
Ling Shao
VLM
56
121
0
30 Mar 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
409
21,347
0
25 Mar 2021
M6: A Chinese Multimodal Pretrainer
Junyang Lin
Rui Men
An Yang
Chan Zhou
Ming Ding
...
Yong Li
Wei Lin
Jingren Zhou
J. Tang
Hongxia Yang
VLM
MoE
70
133
0
01 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
814
29,167
0
26 Feb 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
493
3,709
0
24 Feb 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
385
4,919
0
24 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
114
661
0
11 Feb 2021
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
Wonjae Kim
Bokyung Son
Ildoo Kim
VLM
CLIP
112
1,739
0
05 Feb 2021
VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
Xudong Lin
Gedas Bertasius
Jue Wang
Shih-Fu Chang
Devi Parikh
Lorenzo Torresani
VGen
86
67
0
28 Jan 2021
Salient Object Detection via Integrity Learning
Mingchen Zhuge
Deng-Ping Fan
Nian Liu
Dingwen Zhang
Dong Xu
Ling Shao
AAML
73
304
0
19 Jan 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
534
40,739
0
22 Oct 2020
Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards
Xuewen Yang
Heming Zhang
Di Jin
Yingru Liu
Chi-Hao Wu
Jianchao Tan
Dongliang Xie
Jue Wang
Xin Wang
47
68
0
06 Aug 2020
FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval
D. Gao
Linbo Jin
Ben Chen
Minghui Qiu
Peng Li
Yi Wei
Yitao Hu
Haozhe Jasper Wang
OOD
66
133
0
20 May 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
90
1,934
0
13 Apr 2020
From Paris to Berlin: Discovering Fashion Style Influences Around the World
Ziad Al-Halah
Kristen Grauman
AI4TS
44
26
0
03 Apr 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
ViT
124
438
0
02 Apr 2020
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
Di Qi
Lin Su
Jianwei Song
Edward Cui
Taroon Bharti
Arun Sacheti
VLM
76
261
0
22 Jan 2020
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
142
1,661
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
227
2,474
0
20 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
200
900
0
16 Aug 2019
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti
Jeffrey Ling
Michael Collins
David Reitter
54
173
0
14 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
217
3,667
0
06 Aug 2019
Position Focused Attention Network for Image-Text Matching
Yaxiong Wang
Hao-Hsiang Yang
Xueming Qian
Lin Ma
Jing Lu
Biao Li
Xin Fan
33
171
0
23 Jul 2019
Fashion++: Minimal Edits for Outfit Improvement
Wei-Lin Hsiao
Isay Katsman
Chao-Yuan Wu
Devi Parikh
Kristen Grauman
51
67
0
19 Apr 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.5K
94,511
0
11 Oct 2018
Fashion-Gen: The Generative Fashion Dataset and Challenge
Negar Rostamzadeh
Seyedarian Hosseini
Thomas Boquet
Wojciech Stokowiec
Yanzhe Zhang
Christian Jauvin
C. Pal
29
129
0
21 Jun 2018
Learning Type-Aware Embeddings for Fashion Compatibility
Mariya I. Vasileva
Bryan A. Plummer
Krishna Dusad
Shreya Rajpal
Ranjitha Kumar
David A. Forsyth
59
225
0
25 Mar 2018
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
74
1,151
0
21 Mar 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
638
130,942
0
12 Jun 2017
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
2.0K
193,426
0
10 Dec 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
469
62,122
0
04 Jun 2015
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
1.6K
76,917
0
18 May 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
318
10,050
0
10 Feb 2015
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
Ryan Kiros
Ruslan Salakhutdinov
R. Zemel
VLM
111
1,397
0
10 Nov 2014
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
375
43,524
0
01 May 2014
1