Universal Multimodal Representation for Language Understanding


9 January 2023
Zhuosheng Zhang
Kehai Chen
Rui Wang
Masao Utiyama
Eiichiro Sumita
Z. Li
Hai Zhao
    SSL

Papers citing "Universal Multimodal Representation for Language Understanding"

50 / 63 papers shown
Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation
Zhiyong Wu
Lingpeng Kong
W. Bi
Xiang Li
B. Kao
LRM
37
81
0
30 May 2021
UniT: Multimodal Multitask Learning with a Unified Transformer
Ronghang Hu
Amanpreet Singh
ViT
76
300
0
22 Feb 2021
A Survey on Visual Transformer
Kai Han
Yunhe Wang
Hanting Chen
Xinghao Chen
Jianyuan Guo
...
Chunjing Xu
Yixing Xu
Zhaohui Yang
Yiman Zhang
Dacheng Tao
ViT
169
2,202
0
23 Dec 2020
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
Hao Tan
Mohit Bansal
CLIP
54
121
0
14 Oct 2020
Dynamic Context-guided Capsule Network for Multimodal Machine Translation
Huan Lin
Fandong Meng
Jinsong Su
Yongjing Yin
Zhengyuan Yang
Yubin Ge
Jie Zhou
Jiebo Luo
56
80
0
04 Sep 2020
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation
Yongjing Yin
Fandong Meng
Jinsong Su
Chulun Zhou
Zhengyuan Yang
Jie Zhou
Jiebo Luo
60
143
0
17 Jul 2020
Structured Multimodal Attentions for TextVQA
Chenyu Gao
Qi Zhu
Peng Wang
Hui Li
Yuliang Liu
Anton Van Den Hengel
Qi Wu
67
59
0
01 Jun 2020
Quantifying Attention Flow in Transformers
Samira Abnar
Willem H. Zuidema
132
792
0
02 May 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
90
1,934
0
13 Apr 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
337
937
0
24 Sep 2019
FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow
Xuezhe Ma
Chunting Zhou
Xian Li
Graham Neubig
Eduard H. Hovy
AI4TS
BDL
46
191
0
05 Sep 2019
Semantics-aware BERT for Language Understanding
Zhuosheng Zhang
Yuwei Wu
Hai Zhao
Z. Li
Shuailiang Zhang
Xi Zhou
Xiang Zhou
40
368
0
05 Sep 2019
Handling Syntactic Divergence in Low-resource Machine Translation
Chunting Zhou
Xuezhe Ma
Junjie Hu
Graham Neubig
53
26
0
30 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
145
1,661
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Tan
Mohit Bansal
VLM
MLLM
227
2,474
0
20 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
200
900
0
16 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
217
3,667
0
06 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
524
24,351
0
26 Jul 2019
Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation
Shantipriya Parida
Ondrej Bojar
S. Dash
49
63
0
21 Jul 2019
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang
Zihang Dai
Yiming Yang
J. Carbonell
Ruslan Salakhutdinov
Quoc V. Le
AI4CE
220
8,415
0
19 Jun 2019
Distilling Translations with Visual Awareness
Julia Ive
Pranava Madhyastha
Lucia Specia
VLM
139
76
0
18 Jun 2019
Visually Grounded Neural Syntax Acquisition
Freda Shi
Jiayuan Mao
Kevin Gimpel
Karen Livescu
NAI
59
85
0
07 Jun 2019
Learning to Compose and Reason with Language Tree Structures for Visual Grounding
Richang Hong
Daqing Liu
Xiaoyu Mo
Xiangnan He
Hanwang Zhang
ReLM
LRM
77
159
0
05 Jun 2019
VideoBERT: A Joint Model for Video and Language Representation Learning
Chen Sun
Austin Myers
Carl Vondrick
Kevin Patrick Murphy
Cordelia Schmid
VLM
SSL
75
1,243
0
03 Apr 2019
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott
Sergey Edunov
Alexei Baevski
Angela Fan
Sam Gross
Nathan Ng
David Grangier
Michael Auli
VLM
FaML
95
3,147
0
01 Apr 2019
Image search using multilingual texts: a cross-modal learning approach between image and text
Maxime Portaz
Hicham Randrianarivo
A. Nivaggioli
Estelle Maudet
Christophe Servan
Sylvain Peyronnet
52
12
0
27 Mar 2019
Probing the Need for Visual Context in Multimodal Machine Translation
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
Loïc Barrault
71
142
0
20 Mar 2019
Multi-Task Deep Neural Networks for Natural Language Understanding
Xiaodong Liu
Pengcheng He
Weizhu Chen
Jianfeng Gao
AI4CE
121
1,270
0
31 Jan 2019
Glyce: Glyph-vectors for Chinese Character Representations
Yuxian Meng
Wei Wu
Fei Wang
Xiaoya Li
Ping Nie
J. Mei
Muyu Li
Qinghong Han
Xiaofei Sun
Jiwei Li
VLM
56
192
0
29 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.5K
94,511
0
11 Oct 2018
The MeMAD Submission to the WMT18 Multimodal Translation Task
Stig-Arne Gronroos
B. Huet
M. Kurimo
Jorma T. Laaksonen
B. Mérialdo
...
Mats Sjöberg
U. Sulubacak
Jörg Tiedemann
Raphael Troncy
Raúl Vázquez
42
64
0
31 Aug 2018
Neural Network Acceptability Judgments
Alex Warstadt
Amanpreet Singh
Samuel R. Bowman
215
1,406
0
31 May 2018
COCO-CN for Cross-Lingual Image Tagging, Captioning and Retrieval
Xirong Li
Chaoxi Xu
Xiaoxu Wang
Weiyu Lan
Zhengxiong Jia
Gang Yang
Jieping Xu
106
152
0
22 May 2018
What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Alexis Conneau
Germán Kruszewski
Guillaume Lample
Loïc Barrault
Marco Baroni
317
892
0
03 May 2018
Phrase-Based & Neural Unsupervised Machine Translation
Guillaume Lample
Myle Ott
Alexis Conneau
Ludovic Denoyer
Marc'Aurelio Ranzato
82
683
0
20 Apr 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
944
7,141
0
20 Apr 2018
Finding beans in burgers: Deep semantic-visual embedding with localization
Martin Engilberge
Louis Chevallier
P. Pérez
Matthieu Cord
58
95
0
05 Apr 2018
Deep contextualized word representations
Matthew E. Peters
Mark Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
NAI
190
11,542
0
15 Feb 2018
An Exploration of Word Embedding Initialization in Deep-Learning Tasks
Tom Kocmi
Ondrej Bojar
75
32
0
24 Nov 2017
Learning Multi-Modal Word Representation Grounded in Visual Context
Éloi Zablocki
Benjamin Piwowarski
Laure Soulier
Patrick Gallinari
SSL
55
30
0
09 Nov 2017
Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization
Hyeonwoo Noh
Tackgeun You
Jonghwan Mun
Bohyung Han
NoLa
61
199
0
14 Oct 2017
SemEval-2017 Task 1: Semantic Textual Similarity - Multilingual and Cross-lingual Focused Evaluation
Daniel Cer
Mona T. Diab
Eneko Agirre
I. Lopez-Gazpio
Lucia Specia
377
1,880
0
31 Jul 2017
The RepEval 2017 Shared Task: Multi-Genre Natural Language Inference with Sentence Representations
Nikita Nangia
Adina Williams
Angeliki Lazaridou
Samuel R. Bowman
AI4TS
47
91
0
25 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
640
130,942
0
12 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
77
2,917
0
26 May 2017
STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset
Yuya Yoshikawa
Yutaro Shigeto
A. Takeuchi
3DV
51
118
0
02 May 2017
Data Augmentation for Low-Resource Neural Machine Translation
Marzieh Fadaee
Arianna Bisazza
Christof Monz
94
469
0
01 May 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
60
498
0
11 Apr 2017
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Zhuowen Tu
Kaiming He
489
10,305
0
16 Nov 2016
A Convolutional Encoder Model for Neural Machine Translation
Jonas Gehring
Michael Auli
David Grangier
Yann N. Dauphin
75
449
0
07 Nov 2016