2209.03126
Cited By
v1
v2 (latest)
DM$^2$S$^2$: Deep Multi-Modal Sequence Sets with Hierarchical Modality Attention
7 September 2022
Shunsuke Kitada
Yuki Iwazaki
Riku Togashi
Hitoshi Iyatomi
Papers citing
"DM$^2$S$^2$: Deep Multi-Modal Sequence Sets with Hierarchical Modality Attention"
30 / 30 papers shown
Title
BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents
Teakgyu Hong
Donghyun Kim
Mingi Ji
Wonseok Hwang
Daehyun Nam
Sungrae Park
VLM
73
153
0
10 Aug 2021
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
...
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
192
510
0
29 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
657
41,103
0
22 Oct 2020
Deep Multi-Modal Sets
A. Reiter
Menglin Jia
Pu Yang
Ser-Nam Lim
BDL
62
4
0
03 Mar 2020
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Yiheng Xu
Minghao Li
Lei Cui
Shaohan Huang
Furu Wei
Ming Zhou
133
707
0
31 Dec 2019
Rosetta: Large scale system for text detection and recognition in images
Fedor Borisyuk
Albert Gordo
V. Sivakumar
78
299
0
11 Oct 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
352
941
0
24 Sep 2019
Supervised Multimodal Bitransformers for Classifying Images and Text
Douwe Kiela
Suvrat Bhooshan
Hamed Firooz
Ethan Perez
Davide Testuggine
140
247
0
06 Sep 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
662
24,464
0
26 Jul 2019
What Makes Training Multi-Modal Classification Networks Hard?
Weiyao Wang
Du Tran
Matt Feiszli
111
453
0
29 May 2019
Character Region Awareness for Text Detection
Youngmin Baek
Bado Lee
Dongyoon Han
Sangdoo Yun
Hwalsuk Lee
64
784
0
03 Apr 2019
MFAS: Multimodal Fusion Architecture Search
Juan-Manuel Perez-Rua
Valentin Vielzeuf
S. Pateux
M. Baccouche
F. Jurie
76
178
0
15 Mar 2019
Modality Attention for End-to-End Audio-visual Speech Recognition
Pan Zhou
Wenwen Yang
Wei Chen
Yanfeng Wang
Jia Jia
55
69
0
13 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
94,891
0
11 Oct 2018
Question-Guided Hybrid Convolution for Visual Question Answering
Peng Gao
Pan Lu
Hongsheng Li
Shuang Li
Yikang Li
Guosheng Lin
Xiaogang Wang
130
68
0
08 Aug 2018
Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text
Mingda Zhang
R. Hwa
Adriana Kovashka
51
54
0
21 Jul 2018
AllenNLP: A Deep Semantic Natural Language Processing Platform
Matt Gardner
Joel Grus
Mark Neumann
Oyvind Tafjord
Pradeep Dasigi
Nelson F. Liu
Matthew E. Peters
Michael Schmitz
Luke Zettlemoyer
VLM
86
1,283
0
20 Mar 2018
Multimodal Named Entity Recognition for Short Social Media Posts
Seungwhan Moon
Leonardo Neves
Vitor R. Carvalho
62
155
0
22 Feb 2018
Efficient Large-Scale Multi-Modal Classification
D. Kiela
Edouard Grave
Armand Joulin
Tomas Mikolov
84
148
0
06 Feb 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
704
131,652
0
12 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
104
2,932
0
26 May 2017
Deep Sets
Manzil Zaheer
Satwik Kottur
Siamak Ravanbakhsh
Barnabás Póczós
Ruslan Salakhutdinov
Alex Smola
408
2,464
0
10 Mar 2017
Gated Multimodal Units for Information Fusion
John Arevalo
Thamar Solorio
Manuel Montes-y-Gómez
Fabio Gonzalez
90
381
0
07 Feb 2017
Deep CTR Prediction in Display Advertising
Junxuan Chen
Baigui Sun
Hao Li
Hongtao Lu
Xiansheng Hua
3DV
110
131
0
20 Sep 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
305
1,465
0
06 Jun 2016
Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering
Ruining He
Julian McAuley
154
2,061
0
04 Feb 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,020
0
10 Dec 2015
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
Baoguang Shi
X. Bai
Cong Yao
VLM
213
2,487
0
21 Jul 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
1.9K
150,115
0
22 Dec 2014
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
560
27,311
0
01 Sep 2014