Language Models are General-Purpose Interfaces


13 June 2022
Y. Hao
Haoyu Song
Li Dong
Shaohan Huang
Zewen Chi
Wenhui Wang
Shuming Ma
Furu Wei
    MLLM

Papers citing "Language Models are General-Purpose Interfaces"

50 / 88 papers shown
Test-Time Visual In-Context Tuning
Jiahao Xie
A. Tonioni
N. Rauschmayr
F. Tombari
Bernt Schiele
OOD
VLM
62
0
0
27 Mar 2025
Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning
Kaustubh Ponkshe
Raghav Singhal
Eduard A. Gorbunov
Alexey Tumanov
Samuel Horváth
Praneeth Vepakomma
66
1
0
29 Nov 2024
Generative Timelines for Instructed Visual Assembly
Alejandro Pardo
Jui-hsien Wang
Bernard Ghanem
Josef Sivic
Bryan C. Russell
Fabian Caba Heilbron
VGen
67
0
0
19 Nov 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
58
11
0
26 Sep 2024
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology
Xin Jiang
Junwei Zheng
Ruiping Liu
Jiahang Li
Jiaming Zhang
Sven Matthiesen
Rainer Stiefelhagen
VLM
23
0
0
21 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
36
1
0
19 Sep 2024
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Y. Wang
Alan Yuille
Zhuowan Li
Zilong Zheng
LRM
34
3
0
05 Aug 2024
Foundational Models for Pathology and Endoscopy Images: Application for Gastric Inflammation
H. Kerdegari
Kyle Higgins
Dennis Veselkov
I. Laponogov
I. Poļaka
...
Junior Andrea Pescino
M. Leja
M. Dinis-Ribeiro
T. F. Kanonnikoff
Kirill Veselkov
35
3
0
26 Jun 2024
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models
Jierun Chen
Fangyun Wei
Jinjing Zhao
Sizhe Song
Bohuai Wu
Zhuoxuan Peng
S.-H. Gary Chan
Hongyang R. Zhang
33
8
0
24 Jun 2024
Towards Natural Language-Driven Assembly Using Foundation Models
O. Joglekar
Tal Lancewicki
Shir Kozlovsky
Vladimir Tchuiev
Zohar Feldman
Dotan Di Castro
LM&Ro
31
0
0
23 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
32
0
0
01 Jun 2024
ARC: A Generalist Graph Anomaly Detector with In-Context Learning
Yixin Liu
Shiyuan Li
Yu Zheng
Qingfeng Chen
Chengqi Zhang
Shirui Pan
35
10
0
27 May 2024
3DBench: A Scalable 3D Benchmark and Instruction-Tuning Dataset
Junjie Zhang
Tianci Hu
Xiaoshui Huang
Yongshun Gong
Dan Zeng
38
2
0
23 Apr 2024
Private Attribute Inference from Images with Vision-Language Models
Batuhan Tömekçe
Mark Vero
Robin Staab
Martin Vechev
VLM
PILM
62
6
0
16 Apr 2024
Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity
Zhuo Zhi
Ziquan Liu
M. Elbadawi
Adam Daneshmend
Mine Orlu
Abdul Basit
Andreas Demosthenous
Miguel R. D. Rodrigues
34
2
0
14 Mar 2024
Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts
Jiawen Zhu
Guansong Pang
VLM
55
35
0
11 Mar 2024
Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance
Liting Lin
Heng Fan
Zhipeng Zhang
Yaowei Wang
Yong-mei Xu
Haibin Ling
44
24
0
08 Mar 2024
Embodied Understanding of Driving Scenarios
Yunsong Zhou
Linyan Huang
Qingwen Bu
Jia Zeng
Tianyu Li
Hang Qiu
Hongzi Zhu
Minyi Guo
Yu Qiao
Hongyang Li
LM&Ro
55
31
0
07 Mar 2024
ByteComposer: a Human-like Melody Composition Method based on Language Model Agent
Xia Liang
Xingjian Du
Jiaju Lin
Pei Zou
Yuan Wan
Bilei Zhu
35
4
0
24 Feb 2024
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
120
364
0
09 Feb 2024
Can MLLMs Perform Text-to-Image In-Context Learning?
Yuchen Zeng
Wonjun Kang
Yicong Chen
Hyung Il Koo
Kangwook Lee
MLLM
28
9
0
02 Feb 2024
Small Language Model Meets with Reinforced Vision Vocabulary
Haoran Wei
Lingyu Kong
Jinyue Chen
Liang Zhao
Zheng Ge
En Yu
Jian-Yuan Sun
Chunrui Han
Xiangyu Zhang
VLM
57
40
0
23 Jan 2024
The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models
Kian Ahrabian
Zhivar Sourati
Kexuan Sun
Jiarui Zhang
Yifan Jiang
Fred Morstatter
Jay Pujara
LRM
26
9
0
22 Jan 2024
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Changyao Tian
Xizhou Zhu
Yuwen Xiong
Weiyun Wang
Zhe Chen
...
Tong Lu
Jie Zhou
Hongsheng Li
Yu Qiao
Jifeng Dai
AuLLM
85
42
0
18 Jan 2024
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
Jinguo Zhu
Xiaohan Ding
Yixiao Ge
Yuying Ge
Sijie Zhao
Hengshuang Zhao
Xiaohua Wang
Ying Shan
ViT
VLM
13
32
0
14 Dec 2023
FoMo Rewards: Can we cast foundation models as reward functions?
Ekdeep Singh Lubana
Johann Brehmer
P. D. Haan
Taco S. Cohen
OffRL
LRM
43
2
0
06 Dec 2023
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
Junyan Li
Delin Chen
Yining Hong
Zhenfang Chen
Peihao Chen
Yikang Shen
Chuang Gan
MLLM
22
14
0
06 Nov 2023
Entity Embeddings : Perspectives Towards an Omni-Modality Era for Large Language Models
Eren Unlu
Unver Ciftci
33
0
0
27 Oct 2023
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen
Bo Li
Sheng Shen
Jingkang Yang
Chunyuan Li
Kurt Keutzer
Trevor Darrell
Ziwei Liu
VLM
LRM
41
48
0
23 Oct 2023
How (not) to ensemble LVLMs for VQA
Lisa Alazraki
Lluis Castrejon
Mostafa Dehghani
Fantine Huot
J. Uijlings
Thomas Mensink
27
3
0
10 Oct 2023
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
Xichen Pan
Li Dong
Shaohan Huang
Zhiliang Peng
Wenhu Chen
Furu Wei
VLM
6
62
0
04 Oct 2023
Making LLaMA SEE and Draw with SEED Tokenizer
Yuying Ge
Sijie Zhao
Ziyun Zeng
Yixiao Ge
Chen Li
Xintao Wang
Ying Shan
32
128
0
02 Oct 2023
Self-Supervised Open-Ended Classification with Small Visual Language Models
Mohammad Mahdi Derakhshani
Ivona Najdenkoska
Cees G. M. Snoek
M. Worring
Yuki M. Asano
VLM
22
0
0
30 Sep 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
34
170
0
20 Sep 2023
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM
MLLM
25
63
0
20 Sep 2023
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
Yang Jin
Kun Xu
Kun Xu
Liwei Chen
Chao Liao
...
Xiaoqiang Lei
Di Zhang
Wenwu Ou
Kun Gai
Yadong Mu
MLLM
VLM
14
41
0
09 Sep 2023
PointLLM: Empowering Large Language Models to Understand Point Clouds
Runsen Xu
Xiaolong Wang
Tai Wang
Yilun Chen
Jiangmiao Pang
Dahua Lin
MLLM
51
149
0
31 Aug 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Weiyun Wang
Min Shi
Qingyun Li
Wen Wang
Zhenhang Huang
...
Zhiguo Cao
Yushi Chen
Tong Lu
Jifeng Dai
Yu Qiao
LRM
MLLM
33
84
0
03 Aug 2023
Multimodal Neurons in Pretrained Text-Only Transformers
Sarah Schwettmann
Neil Chowdhury
Samuel J. Klein
David Bau
Antonio Torralba
MILM
30
27
0
03 Aug 2023
When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities
Jin Chen
Zheng Liu
Xunpeng Huang
Chenwang Wu
Qi Liu
...
Yuxuan Lei
Xiaolong Chen
Xingmei Wang
Defu Lian
Enhong Chen
ALM
24
110
0
31 Jul 2023
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Anthony Brohan
Noah Brown
Justice Carbajal
Yevgen Chebotar
Xi Chen
...
Ted Xiao
Peng-Tao Xu
Sichun Xu
Tianhe Yu
Brianna Zitkovich
LM&Ro
LRM
30
1,091
0
28 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
26
118
0
25 Jul 2023
Retentive Network: A Successor to Transformer for Large Language Models
Yutao Sun
Li Dong
Shaohan Huang
Shuming Ma
Yuqing Xia
Jilong Xue
Jianyong Wang
Furu Wei
LRM
58
301
0
17 Jul 2023
Emu: Generative Pretraining in Multimodality
Quan-Sen Sun
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Yueze Wang
Hongcheng Gao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
32
126
0
11 Jul 2023
AmadeusGPT: a natural language interface for interactive animal behavioral analysis
Shaokai Ye
Jessy Lauer
Mu Zhou
Alexander Mathis
Mackenzie W. Mathis
MLLM
LLMAG
27
17
0
10 Jul 2023
Unified Language Representation for Question Answering over Text, Tables, and Images
Yu Bowen
Cheng Fu
Haiyang Yu
Fei Huang
Yongbin Li
LMTD
22
20
0
29 Jun 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM
ObjD
VLM
36
694
0
26 Jun 2023
Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories
Thomas Mensink
J. Uijlings
Lluis Castrejon
A. Goel
Felipe Cadar
Howard Zhou
Fei Sha
A. Araújo
V. Ferrari
34
37
0
15 Jun 2023
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
Zhen-fei Yin
Jiong Wang
Jianjian Cao
Zhelun Shi
Dingning Liu
...
Lei Bai
Xiaoshui Huang
Zhiyong Wang
Jing Shao
Wanli Ouyang
MLLM
22
152
0
11 Jun 2023
ImageNetVC: Zero- and Few-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories
Heming Xia
Qingxiu Dong
Lei Li
Jingjing Xu
Tianyu Liu
Ziwei Qin
Zhifang Sui
MLLM
VLM
16
3
0
24 May 2023