OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

7 February 2022
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
    MLLM
    ObjD

Papers citing "OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework"

50 / 648 papers shown
Does My Dog "Speak" Like Me? The Acoustic Correlation between Pet Dogs and Their Human Owners
Jieyi Huang
Chunhao Zhang
Yufei Wang
Mengyue Wu
Ke Zhu
24
0
0
21 Sep 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
39
174
0
20 Sep 2023
Grasp-Anything: Large-scale Grasp Dataset from Foundation Models
An Vuong
Minh Nhat Vu
Hieu Le
Baoru Huang
B. Huynh
T. Vo
Andreas Kugi
Anh Nguyen
VLM
21
28
0
18 Sep 2023
PromptST: Prompt-Enhanced Spatio-Temporal Multi-Attribute Prediction
Zijian Zhang
Xiangyu Zhao
Qidong Liu
Chunxu Zhang
Qian Ma
Wanyu Wang
Hongwei Zhao
Yiqi Wang
Zitao Liu
AI4TS
89
17
0
18 Sep 2023
PROGrasp: Pragmatic Human-Robot Communication for Object Grasping
Gi-Cheon Kang
Junghyun Kim
Jaein Kim
Byoung-Tak Zhang
29
4
0
14 Sep 2023
Cognitive Mirage: A Review of Hallucinations in Large Language Models
Hongbin Ye
Tong Liu
Aijia Zhang
Wei Hua
Weiqiang Jia
HILM
48
77
0
13 Sep 2023
Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals
Ran Liu
Ellen L. Zippi
Hadi Pouransari
Chris Sandino
Jingping Nie
Hanlin Goh
Erdrin Azemi
Ali Moin
39
12
0
12 Sep 2023
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
46
458
0
11 Sep 2023
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
Yangyi Chen
Karan Sikka
Michael Cogswell
Heng Ji
Ajay Divakaran
LRM
36
25
0
08 Sep 2023
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Zigang Geng
Binxin Yang
Tiankai Hang
Chen Li
Shuyang Gu
...
Jianmin Bao
Zheng-Wei Zhang
Han Hu
Dongdong Chen
Baining Guo
DiffM
VLM
53
93
0
07 Sep 2023
DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners
Clarence Lee
M Ganesh Kumar
Cheston Tan
28
3
0
07 Sep 2023
NICE: CVPR 2023 Challenge on Zero-shot Image Captioning
Taehoon Kim
Pyunghwan Ahn
Sangyun Kim
Sihaeng Lee
Mark A Marsden
...
Yujin Wang
Yimu Wang
Tiancheng Gu
Xingchang Lv
Mingmao Sun
VLM
22
4
0
05 Sep 2023
Automatic Diary Generation System including Information on Joint Experiences between Humans and Robots
Aiko Ichikura
Kento Kawaharazuka
Yoshiki Obinata
Koki Shinjo
K. Okada
Masayuki Inaba
11
3
0
05 Sep 2023
Recognition of Heat-Induced Food State Changes by Time-Series Use of Vision-Language Model for Cooking Robot
Naoaki Kanazawa
Kento Kawaharazuka
Yoshiki Obinata
K. Okada
Masayuki Inaba
LM&Ro
19
5
0
04 Sep 2023
MAGMA: Music Aligned Generative Motion Autodecoder
Sohan Anisetty
Amit Raj
James Hays
26
0
0
03 Sep 2023
A Fine-Grained Image Description Generation Method Based on Joint Objectives
Yifan Zhang
Chunzhen Lin
Donglin Cao
Dazhen Lin
EGVM
15
0
0
02 Sep 2023
Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding
Joshua Forster Feinglass
Yezhou Yang
27
2
0
01 Sep 2023
TouchStone: Evaluating Vision-Language Models by Language Models
Shuai Bai
Shusheng Yang
Jinze Bai
Peng Wang
Xing Zhang
Junyang Lin
Xinggang Wang
Chang Zhou
Jingren Zhou
MLLM
37
44
0
31 Aug 2023
ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation
Weihan Wang
Zhiyong Yang
Bin Xu
Juanzi Li
Yankui Sun
VLM
28
8
0
31 Aug 2023
Towards Unified Token Learning for Vision-Language Tracking
Yaozong Zheng
Bineng Zhong
Qihua Liang
Guorong Li
Rongrong Ji
Xianxian Li
29
28
0
27 Aug 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM
VLM
ObjD
50
808
0
24 Aug 2023
HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks
Zichao Dong
Weikun Zhang
Xufeng Huang
Hang Ji
Xin Zhan
Junbo Chen
VLM
21
4
0
24 Aug 2023
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Jinyi Hu
Yuan Yao
Chong Wang
Shanonan Wang
Yinxu Pan
...
Yankai Lin
Jiao Xue
Dahai Li
Zhiyuan Liu
Maosong Sun
MLLM
VLM
35
48
0
23 Aug 2023
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D
Shuhei Kurita
Naoki Katsura
Eri Onami
EgoV
26
14
0
23 Aug 2023
Federated Learning in Big Model Era: Domain-Specific Multimodal Large Models
Zengxiang Li
Zhaoxiang Hou
Hui Liu
Ying Wang
Tongzhi Li
...
Chao Shi
Che-Sheng Yang
Weishan Zhang
Zelei Liu
Liang Xu
FedML
16
2
0
22 Aug 2023
Instruction Tuning for Large Language Models: A Survey
Shengyu Zhang
Linfeng Dong
Xiaoya Li
Sen Zhang
Xiaofei Sun
...
Jiwei Li
Runyi Hu
Tianwei Zhang
Fei Wu
Guoyin Wang
LM&MA
24
546
0
21 Aug 2023
Whether you can locate or not? Interactive Referring Expression Generation
Fulong Ye
Yuxing Long
Fangxiang Feng
Xiaojie Wang
34
4
0
19 Aug 2023
Tackling Vision Language Tasks Through Learning Inner Monologues
Diji Yang
Kezhen Chen
Jinmeng Rao
Xiaoyuan Guo
Yawen Zhang
Jie Yang
Yujie Zhang
MLLM
29
9
0
19 Aug 2023
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Wenbo Hu
Y. Xu
Y. Li
W. Li
Zhengzhang Chen
Zhuowen Tu
MLLM
VLM
30
123
0
19 Aug 2023
A tailored Handwritten-Text-Recognition System for Medieval Latin
Philipp Koch
Gilary Vera Nunez
Esteban Garces Arias
C. Heumann
Matthias Schoffel
Alexander Haberlin
Matthias Aßenmacher
17
2
0
18 Aug 2023
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability
Runhu Huang
Jianhua Han
Guansong Lu
Xiaodan Liang
Yihan Zeng
Wei Zhang
Hang Xu
DiffM
28
2
0
18 Aug 2023
Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language Tasks
Fawaz Sammani
Nikos Deligiannis
13
5
0
17 Aug 2023
Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment
Qi Chen
Chaorui Deng
Zixiong Huang
Bowen Zhang
Mingkui Tan
Qi Wu
EGVM
19
0
0
16 Aug 2023
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
Kaicheng Yang
Jiankang Deng
Xiang An
Jiawei Li
Ziyong Feng
Jia Guo
Jing Yang
Tongliang Liu
VLM
CLIP
48
45
0
16 Aug 2023
Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models
K. Poudel
Manish Dhakal
Prasiddha Bhandari
Rabin Adhikari
Safal Thapaliya
Bishesh Khanal
VLM
30
17
0
15 Aug 2023
Foundation Model is Efficient Multimodal Multitask Model Selector
Fanqing Meng
Wenqi Shao
Zhanglin Peng
Chong Jiang
Kaipeng Zhang
Yu Qiao
Ping Luo
30
13
0
11 Aug 2023
OpenProteinSet: Training data for structural biology at scale
Gustaf Ahdritz
N. Bouatta
S. Kadyan
Lukas Jarosch
Daniel Berenberg
Ian Fisk
Andrew Watkins
Stephen Ra
Richard Bonneau
Mohammed AlQuraishi
AI4CE
28
11
0
10 Aug 2023
Cloth2Tex: A Customized Cloth Texture Generation Pipeline for 3D Virtual Try-On
Daiheng Gao
Xu Chen
Xindi Zhang
Qi Wang
Ke Sun
Bang Zhang
Liefeng Bo
Qi-Xing Huang
DiffM
35
5
0
08 Aug 2023
Distributionally Robust Classification on a Data Budget
Ben Feuer
Ameya Joshi
Minh Pham
C. Hegde
OOD
37
2
0
07 Aug 2023
Foundation Model based Open Vocabulary Task Planning and Executive System for General Purpose Service Robots
Yoshiki Obinata
Naoaki Kanazawa
Kento Kawaharazuka
Iori Yanokura
Soon-Hyeob Kim
K. Okada
Masayuki Inaba
LM&Ro
27
7
0
07 Aug 2023
Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models
Zheng Ma
Mianzhi Pan
Wenhan Wu
Ka Leong Cheng
Jianbing Zhang
Shujian Huang
Jiajun Chen
VLM
CoGe
26
3
0
06 Aug 2023
A Comprehensive Analysis of Real-World Image Captioning and Scene Identification
Sai Suprabhanu Nallapaneni
Subrahmanyam Konakanchi
30
2
0
05 Aug 2023
RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension
Qiang-feng Zhou
Chaohui Yu
Shaofeng Zhang
Sitong Wu
Zhibin Wang
Fan Wang
34
27
0
03 Aug 2023
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
Ka Leong Cheng
Wenpo Song
Zheng Ma
Wenhao Zhu
Zi-Yue Zhu
Jianbing Zhang
CLIP
VLM
27
10
0
02 Aug 2023
Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding
Runyu Ding
Jihan Yang
Chuhui Xue
Wenqing Zhang
Song Bai
Xiaojuan Qi
3DV
VLM
21
28
0
01 Aug 2023
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor
Corentin Dancette
Alexandre Ramé
Matthieu Cord
MoMe
MLLM
61
42
0
30 Jul 2023
Described Object Detection: Liberating Object Detection with Flexible Expressions
Chi Xie
Zhao Zhang
YiXuan Wu
Feng Zhu
Rui Zhao
Shuang Liang
ObjD
39
31
0
24 Jul 2023
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
32
18
0
21 Jul 2023
FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback
Ashish Singh
Prateek R. Agarwal
Zixuan Huang
Arpita Singh
Tong Yu
Sungchul Kim
Victor S. Bursztyn
N. Vlassis
Ryan A. Rossi
36
6
0
20 Jul 2023
Findings of Factify 2: Multimodal Fake News Detection
S. Suryavardan
Shreyash Mishra
Megha Chakraborty
Parth Patwa
Anku Rani
...
Amitava Das
Amit P. Sheth
Manoj Kumar Chinnakotla
Asif Ekbal
Srijan Kumar
30
14
0
19 Jul 2023