DM$^2$S$^2$: Deep Multi-Modal Sequence Sets with Hierarchical Modality Attention

7 September 2022
Shunsuke Kitada
Yuki Iwazaki
Riku Togashi
Hitoshi Iyatomi
arXiv: 2209.03126

Papers cited by "DM$^2$S$^2$: Deep Multi-Modal Sequence Sets with Hierarchical Modality Attention"

BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents
Teakgyu Hong
Donghyun Kim
Mingi Ji
Wonseok Hwang
Daehyun Nam
Sungrae Park
VLM
73
153
0
10 Aug 2021
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
...
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT, MLLM
192
510
0
29 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
657
41,103
0
22 Oct 2020
Deep Multi-Modal Sets
A. Reiter
Menglin Jia
Pu Yang
Ser-Nam Lim
BDL
62
4
0
03 Mar 2020
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Yiheng Xu
Minghao Li
Lei Cui
Shaohan Huang
Furu Wei
Ming Zhou
133
707
0
31 Dec 2019
Rosetta: Large scale system for text detection and recognition in images
Fedor Borisyuk
Albert Gordo
V. Sivakumar
78
299
0
11 Oct 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM, VLM
352
941
0
24 Sep 2019
Supervised Multimodal Bitransformers for Classifying Images and Text
Douwe Kiela
Suvrat Bhooshan
Hamed Firooz
Ethan Perez
Davide Testuggine
140
247
0
06 Sep 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
662
24,464
0
26 Jul 2019
What Makes Training Multi-Modal Classification Networks Hard?
Weiyao Wang
Du Tran
Matt Feiszli
111
453
0
29 May 2019
Character Region Awareness for Text Detection
Youngmin Baek
Bado Lee
Dongyoon Han
Sangdoo Yun
Hwalsuk Lee
64
784
0
03 Apr 2019
MFAS: Multimodal Fusion Architecture Search
Juan-Manuel Perez-Rua
Valentin Vielzeuf
S. Pateux
M. Baccouche
F. Jurie
76
178
0
15 Mar 2019
Modality Attention for End-to-End Audio-visual Speech Recognition
Pan Zhou
Wenwen Yang
Wei Chen
Yanfeng Wang
Jia Jia
55
69
0
13 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM, SSL, SSeg
1.8K
94,891
0
11 Oct 2018
Question-Guided Hybrid Convolution for Visual Question Answering
Peng Gao
Pan Lu
Hongsheng Li
Shuang Li
Yikang Li
Guosheng Lin
Xiaogang Wang
130
68
0
08 Aug 2018
Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text
Mingda Zhang
R. Hwa
Adriana Kovashka
51
54
0
21 Jul 2018
AllenNLP: A Deep Semantic Natural Language Processing Platform
Matt Gardner
Joel Grus
Mark Neumann
Oyvind Tafjord
Pradeep Dasigi
Nelson F. Liu
Matthew E. Peters
Michael Schmitz
Luke Zettlemoyer
VLM
86
1,283
0
20 Mar 2018
Multimodal Named Entity Recognition for Short Social Media Posts
Seungwhan Moon
Leonardo Neves
Vitor R. Carvalho
62
155
0
22 Feb 2018
Efficient Large-Scale Multi-Modal Classification
D. Kiela
Edouard Grave
Armand Joulin
Tomas Mikolov
84
148
0
06 Feb 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
704
131,652
0
12 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
104
2,932
0
26 May 2017
Deep Sets
Manzil Zaheer
Satwik Kottur
Siamak Ravanbakhsh
Barnabás Póczós
Ruslan Salakhutdinov
Alex Smola
408
2,464
0
10 Mar 2017
Gated Multimodal Units for Information Fusion
John Arevalo
Thamar Solorio
Manuel Montes-y-Gómez
Fabio Gonzalez
90
381
0
07 Feb 2017
Deep CTR Prediction in Display Advertising
Junxuan Chen
Baigui Sun
Hao Li
Hongtao Lu
Xiansheng Hua
3DV
110
131
0
20 Sep 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
305
1,465
0
06 Jun 2016
Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering
Ruining He
Julian McAuley
154
2,061
0
04 Feb 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,020
0
10 Dec 2015
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
Baoguang Shi
X. Bai
Cong Yao
VLM
213
2,487
0
21 Jul 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
1.9K
150,115
0
22 Dec 2014
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
560
27,311
0
01 Sep 2014