Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2202.03052
Cited By
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
7 February 2022
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM
ObjD
Re-assign community
ArXiv
PDF
HTML
Papers citing
"OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework"
50 / 648 papers shown
Title
Building Multimodal AI Chatbots
Mingyu Lee
27
3
0
21 Apr 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
Jinhui Tang
VLM
26
102
0
17 Apr 2023
Efficient Multimodal Fusion via Interactive Prompting
Yaowei Li
Ruijie Quan
Linchao Zhu
Yezhou Yang
28
44
0
13 Apr 2023
Boosting Cross-task Transferability of Adversarial Patches with Visual Relations
Tony Ma
Songze Li
Yisong Xiao
Shunchang Liu
30
1
0
11 Apr 2023
Token Boosting for Robust Self-Supervised Visual Transformer Pre-training
Tianjiao Li
Lin Geng Foo
Ping Hu
Xindi Shang
Hossein Rahmani
Zehuan Yuan
J. Liu
32
7
0
09 Apr 2023
SegGPT: Segmenting Everything In Context
Xinlong Wang
Xiaosong Zhang
Yue Cao
Wen Wang
Chunhua Shen
Tiejun Huang
VOS
MLLM
VLM
30
199
0
06 Apr 2023
Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce
Yang Jin
Yongzhi Li
Zehuan Yuan
Yadong Mu
22
13
0
06 Apr 2023
Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
Noa Garcia
Yusuke Hirota
Yankun Wu
Yuta Nakashima
EGVM
33
51
0
06 Apr 2023
RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
Jihan Yang
Runyu Ding
Weipeng Deng
Zhe Wang
Xiaojuan Qi
20
61
0
03 Apr 2023
Towards Flexible Multi-modal Document Models
Naoto Inoue
Kotaro Kikuchi
E. Simo-Serra
Mayu Otani
Kota Yamaguchi
40
20
0
31 Mar 2023
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Lucas Beyer
Bo Wan
Gagan Madan
Filip Pavetić
Andreas Steiner
...
Emanuele Bugliarello
Xiao Wang
Qihang Yu
Liang-Chieh Chen
Xiaohua Zhai
49
8
0
30 Mar 2023
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
Renrui Zhang
Jiaming Han
Chris Liu
Peng Gao
Aojun Zhou
Xiangfei Hu
Shilin Yan
Pan Lu
Hongsheng Li
Yu Qiao
MLLM
33
739
0
28 Mar 2023
Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models
A. Maharana
Amita Kamath
Christopher Clark
Mohit Bansal
Aniruddha Kembhavi
33
3
0
28 Mar 2023
WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
Jongheon Jeong
Yang Zou
Taewan Kim
Dongqing Zhang
Avinash Ravichandran
O. Dabeer
VLM
67
186
0
26 Mar 2023
IFSeg: Image-free Semantic Segmentation via Vision-Language Model
Sukmin Yun
S. Park
Paul Hongsuck Seo
Jinwoo Shin
VLM
MLLM
49
14
0
25 Mar 2023
Train/Test-Time Adaptation with Retrieval
L. Zancato
Alessandro Achille
Tian Yu Liu
Matthew Trager
Pramuditha Perera
Stefano Soatto
TTA
OOD
16
12
0
25 Mar 2023
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
Haoxuan You
Mandy Guo
Zhecan Wang
Kai-Wei Chang
Jason Baldridge
Jiahui Yu
DiffM
44
12
0
23 Mar 2023
Integrating Image Features with Convolutional Sequence-to-sequence Network for Multilingual Visual Question Answering
T. M. Thai
Son T. Luu
32
0
0
22 Mar 2023
MAGVLT: Masked Generative Vision-and-Language Transformer
Sungwoong Kim
DaeJin Jo
Donghoon Lee
Jongmin Kim
VLM
34
11
0
21 Mar 2023
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Yushi Hu
Benlin Liu
Jungo Kasai
Yizhong Wang
Mari Ostendorf
Ranjay Krishna
Noah A. Smith
EGVM
27
206
0
21 Mar 2023
eP-ALM: Efficient Perceptual Augmentation of Language Models
Mustafa Shukor
Corentin Dancette
Matthieu Cord
MLLM
VLM
24
29
0
20 Mar 2023
Text2Tex: Text-driven Texture Synthesis via Diffusion Models
Dave Zhenyu Chen
Yawar Siddiqui
Hsin-Ying Lee
Sergey Tulyakov
Matthias Nießner
DiffM
22
191
0
20 Mar 2023
Multi-modal reward for visual relationships-based image captioning
Ali Abedi
Hossein Karshenas
Peyman Adibi
36
2
0
19 Mar 2023
CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos
Seungju Han
Jack Hessel
Nouha Dziri
Yejin Choi
Youngjae Yu
VGen
25
16
0
17 Mar 2023
A Picture is Worth a Thousand Words: Language Models Plan from Pixels
Anthony Z. Liu
Lajanugen Logeswaran
Sungryull Sohn
Honglak Lee
LM&Ro
14
6
0
16 Mar 2023
ViperGPT: Visual Inference via Python Execution for Reasoning
Dídac Surís
Sachit Menon
Carl Vondrick
MLLM
LRM
ReLM
45
430
0
14 Mar 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
Nitzan Bitton-Guetta
Yonatan Bitton
Jack Hessel
Ludwig Schmidt
Yuval Elovici
Gabriel Stanovsky
Roy Schwartz
VLM
121
66
0
13 Mar 2023
ViM: Vision Middleware for Unified Downstream Transferring
Yutong Feng
Biao Gong
Jianwen Jiang
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
32
1
0
13 Mar 2023
Contextually-rich human affect perception using multimodal scene information
Digbalay Bose
Rajat Hebbar
Krishna Somandepalli
Shrikanth Narayanan
25
3
0
13 Mar 2023
Universal Instance Perception as Object Discovery and Retrieval
B. Yan
Yi-Xin Jiang
Jiannan Wu
D. Wang
Ping Luo
Zehuan Yuan
Huchuan Lu
VOS
VLM
LRM
32
161
0
12 Mar 2023
HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining
Shixiang Tang
Cheng Chen
Qingsong Xie
Meilin Chen
Yizhou Wang
...
Feng Zhu
Haiyang Yang
Li Yi
Rui Zhao
Wanli Ouyang
VLM
19
35
0
10 Mar 2023
Robotic Applications of Pre-Trained Vision-Language Models to Various Recognition Behaviors
Kento Kawaharazuka
Yoshiki Obinata
Naoaki Kanazawa
K. Okada
Masayuki Inaba
LM&Ro
28
11
0
10 Mar 2023
Tag2Text: Guiding Vision-Language Model via Image Tagging
Xinyu Huang
Youcai Zhang
Jinyu Ma
Weiwei Tian
Rui Feng
Yuejie Zhang
Yaqian Li
Yandong Guo
Lei Zhang
CLIP
MLLM
VLM
3DV
61
74
0
10 Mar 2023
Weakly-Supervised HOI Detection from Interaction Labels Only and Language/Vision-Language Priors
Mesut Erhan Unal
Adriana Kovashka
VLM
16
5
0
09 Mar 2023
VQA-based Robotic State Recognition Optimized with Genetic Algorithm
Kento Kawaharazuka
Yoshiki Obinata
Naoaki Kanazawa
K. Okada
Masayuki Inaba
14
16
0
09 Mar 2023
Models See Hallucinations: Evaluating the Factuality in Video Captioning
Hui Liu
Xiaojun Wan
HILM
34
10
0
06 Mar 2023
UniHCP: A Unified Model for Human-Centric Perceptions
Yuanzheng Ci
Yizhou Wang
Meilin Chen
Shixiang Tang
Lei Bai
Feng Zhu
Rui Zhao
F. Yu
Donglian Qi
Wanli Ouyang
82
51
0
06 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLM
MLLM
34
21
0
04 Mar 2023
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
Xiaoping Han
Xiatian Zhu
Licheng Yu
Li Zhang
Yi-Zhe Song
Tao Xiang
VLM
16
38
0
04 Mar 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
89
11
0
03 Mar 2023
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
Jingjing Jiang
Nanning Zheng
MoE
32
6
0
02 Mar 2023
EVJVQA Challenge: Multilingual Visual Question Answering
N. Nguyen
Nghia Hieu Nguyen
Duong T.D. Vo
K. Tran
Kiet Van Nguyen
25
7
0
23 Feb 2023
Backdoor Attacks to Pre-trained Unified Foundation Models
Zenghui Yuan
Yixin Liu
Kai Zhang
Pan Zhou
Lichao Sun
AAML
22
10
0
18 Feb 2023
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts
Zhihong Chen
Shizhe Diao
Benyou Wang
Guanbin Li
Xiang Wan
MedIm
17
29
0
17 Feb 2023
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Jiang Liu
Hui Ding
Zhaowei Cai
Yuting Zhang
R. Satzoda
Vijay Mahadevan
R. Manmatha
ObjD
15
120
0
14 Feb 2023
IC3: Image Captioning by Committee Consensus
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
26
17
0
02 Feb 2023
Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data
Alon Albalak
Colin Raffel
William Yang Wang
19
12
0
01 Feb 2023
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
Haiyang Xu
Qinghao Ye
Mingshi Yan
Yaya Shi
Jiabo Ye
...
Guohai Xu
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
MLLM
VLM
MoE
38
160
0
01 Feb 2023
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
21
26
0
01 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
265
4,229
0
30 Jan 2023
Previous
1
2
3
...
10
11
12
13
Next