Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1912.02315
Cited By
v1
v2 (latest)
12-in-1: Multi-Task Vision and Language Representation Learning
5 December 2019
Jiasen Lu
Vedanuj Goswami
Marcus Rohrbach
Devi Parikh
Stefan Lee
VLM
ObjD
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"12-in-1: Multi-Task Vision and Language Representation Learning"
50 / 55 papers shown
Title
What to align in multimodal contrastive learning?
Benoit Dufumier
J. Castillo-Navarro
D. Tuia
Jean-Philippe Thiran
118
4
0
11 Sep 2024
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLM
VLM
136
233
0
07 Jul 2023
The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents
Kurt Shuster
Da Ju
Stephen Roller
Emily Dinan
Y-Lan Boureau
Jason Weston
74
82
0
09 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
439
20,181
0
23 Oct 2019
UNITER: UNiversal Image-TExt Representation Learning
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
VLM
OT
110
447
0
25 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
352
941
0
24 Sep 2019
Stochastic Filter Groups for Multi-Task CNNs: Learning Specialist and Generalist Convolution Kernels
Felix J. S. Bragman
Ryutaro Tanno
Sebastien Ourselin
Daniel C. Alexander
M. Jorge Cardoso
62
85
0
26 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
155
1,666
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
247
2,483
0
20 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
202
905
0
16 Aug 2019
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti
Jeffrey Ling
Michael Collins
David Reitter
62
173
0
14 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
138
1,955
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
229
3,684
0
06 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
659
24,464
0
26 Jul 2019
SpanBERT: Improving Pre-training by Representing and Predicting Spans
Mandar Joshi
Danqi Chen
Yinhan Liu
Daniel S. Weld
Luke Zettlemoyer
Omer Levy
147
1,965
0
24 Jul 2019
OmniNet: A unified architecture for multi-modal multi-task learning
Subhojeet Pramanik
Priyanka Agrawal
A. Hussain
41
41
0
17 Jul 2019
BAM! Born-Again Multi-Task Networks for Natural Language Understanding
Kevin Clark
Minh-Thang Luong
Urvashi Khandelwal
Christopher D. Manning
Quoc V. Le
62
230
0
10 Jul 2019
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang
Zihang Dai
Yiming Yang
J. Carbonell
Ruslan Salakhutdinov
Quoc V. Le
AI4CE
232
8,433
0
19 Jun 2019
Which Tasks Should Be Learned Together in Multi-task Learning?
Trevor Scott Standley
Amir Zamir
Dawn Chen
Leonidas Guibas
Jitendra Malik
Silvio Savarese
103
517
0
18 May 2019
Many Task Learning with Task Routing
Gjorgji Strezoski
Nanne van Noord
Marcel Worring
MoE
64
96
0
28 Mar 2019
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
Drew A. Hudson
Christopher D. Manning
CoGe
NAI
59
137
0
25 Feb 2019
Multi-Task Deep Neural Networks for Natural Language Understanding
Xiaodong Liu
Pengcheng He
Weizhu Chen
Jianfeng Gao
AI4CE
127
1,271
0
31 Jan 2019
Cross-lingual Language Model Pretraining
Guillaume Lample
Alexis Conneau
82
2,747
0
22 Jan 2019
Multi-task Learning of Hierarchical Vision-Language Representation
Duy-Kien Nguyen
Takayuki Okatani
88
52
0
03 Dec 2018
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM
BDL
OCL
ReLM
158
881
0
27 Nov 2018
Visual Entailment Task for Visually-Grounded Language Learning
Ning Xie
Farley Lai
Derek Doran
Asim Kadav
44
53
0
26 Nov 2018
A Corpus for Reasoning About Natural Language Grounded in Photographs
Alane Suhr
Stephanie Zhou
Ally Zhang
Iris Zhang
Huajun Bai
Yoav Artzi
LRM
103
604
0
01 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
94,891
0
11 Oct 2018
The Natural Language Decathlon: Multitask Learning as Question Answering
Bryan McCann
N. Keskar
Caiming Xiong
R. Socher
AIMat
MLLM
BDL
142
645
0
20 Jun 2018
Neural Baby Talk
Jiasen Lu
Jianwei Yang
Dhruv Batra
Devi Parikh
VLM
230
435
0
27 Mar 2018
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
84
1,151
0
21 Mar 2018
MAttNet: Modular Attention Network for Referring Expression Comprehension
Licheng Yu
Zhe Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Joey Tianyi Zhou
Tamara L. Berg
ObjD
97
828
0
24 Jan 2018
Embodied Question Answering
Abhishek Das
Samyak Datta
Georgia Gkioxari
Stefan Lee
Devi Parikh
Dhruv Batra
LM&Ro
93
646
0
30 Nov 2017
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
Peter Anderson
Qi Wu
Damien Teney
Jake Bruce
Mark Johnson
Niko Sünderhauf
Ian Reid
Stephen Gould
Anton Van Den Hengel
LM&Ro
95
1,308
0
20 Nov 2017
Decoupled Weight Decay Regularization
I. Loshchilov
Frank Hutter
OffRL
144
2,142
0
14 Nov 2017
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
121
4,216
0
25 Jul 2017
Distral: Robust Multitask Reinforcement Learning
Yee Whye Teh
V. Bapst
Wojciech M. Czarnecki
John Quan
J. Kirkpatrick
R. Hadsell
N. Heess
Razvan Pascanu
161
553
0
13 Jul 2017
An Overview of Multi-Task Learning in Deep Neural Networks
Sebastian Ruder
CVBM
149
2,828
0
15 Jun 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
701
131,652
0
12 Jun 2017
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
342
3,246
0
02 Dec 2016
Modeling Relationships in Referential Expressions with Compositional Modular Networks
Ronghang Hu
Marcus Rohrbach
Jacob Andreas
Trevor Darrell
Kate Saenko
82
406
0
30 Nov 2016
Visual Dialog
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra
142
998
0
26 Nov 2016
GuessWhat?! Visual object discovery through multi-modal dialogue
H. D. Vries
Florian Strub
A. Chandar
Olivier Pietquin
Hugo Larochelle
Aaron Courville
VLM
106
428
0
23 Nov 2016
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Zhuowen Tu
Kaiming He
517
10,330
0
16 Nov 2016
Reinforcement Learning with Unsupervised Auxiliary Tasks
Max Jaderberg
Volodymyr Mnih
Wojciech M. Czarnecki
Tom Schaul
Joel Z Leibo
David Silver
Koray Kavukcuoglu
SSL
106
1,228
0
16 Nov 2016
UberNet: Training a `Universal' Convolutional Neural Network for Low-, Mid-, and High-Level Vision using Diverse Datasets and Limited Memory
Iasonas Kokkinos
SSeg
SSL
142
673
0
07 Sep 2016
Adversarial Feature Learning
Jiasen Lu
Philipp Krahenbuhl
Trevor Darrell
GAN
111
1,610
0
31 May 2016
Cross-stitch Networks for Multi-task Learning
Ishan Misra
Abhinav Shrivastava
Abhinav Gupta
M. Hebert
90
1,348
0
12 Apr 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
...
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
217
5,747
0
23 Feb 2016
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
Emilio Parisotto
Jimmy Lei Ba
Ruslan Salakhutdinov
OffRL
86
599
0
19 Nov 2015
1
2
Next