Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2102.02779
Cited By
Unifying Vision-and-Language Tasks via Text Generation
4 February 2021
Jaemin Cho
Jie Lei
Hao Tan
Joey Tianyi Zhou
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Unifying Vision-and-Language Tasks via Text Generation"
50 / 368 papers shown
Title
Unifying 3D Vision-Language Understanding via Promptable Queries
Ziyu Zhu
Zhuofan Zhang
Xiaojian Ma
Xuesong Niu
Yixin Chen
Baoxiong Jia
Zhidong Deng
Siyuan Huang
Qing Li
48
21
0
19 May 2024
G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios
Zeyu Wang
Yuanchun Shi
Yuntao wang
Yuchen Yao
Kun Yan
Yuhan Wang
Lei Ji
Xuhai Xu
Chun Yu
40
7
0
13 May 2024
Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
Shibo Jie
Yehui Tang
Ning Ding
Zhi-Hong Deng
Kai Han
Yunhe Wang
VLM
33
6
0
09 May 2024
GOLD: Geometry Problem Solver with Natural Language Description
Jiaxin Zhang
Yashar Moshfeghi
ReLM
19
4
0
01 May 2024
3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset
Xinyu Ma
Xuebo Liu
Derek F. Wong
Jun Rao
Bei Li
Liang Ding
Lidia S. Chao
Dacheng Tao
Min Zhang
41
2
0
29 Apr 2024
ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images
Huy Quang Pham
Thang Kien-Bao Nguyen
Quan Van Nguyen
Dan Quang Tran
Nghia Hieu Nguyen
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
33
3
0
29 Apr 2024
Generative AI for Visualization: State of the Art and Future Directions
Yilin Ye
Jianing Hao
Yihan Hou
Zhan Wang
Shishi Xiao
Yuyu Luo
Wei Zeng
39
39
0
28 Apr 2024
BRAVE: Broadening the visual encoding of vision-language models
Ouguzhan Fatih Kar
A. Tonioni
Petra Poklukar
Achin Kulshrestha
Amir Zamir
Federico Tombari
MLLM
VLM
50
25
0
10 Apr 2024
Facial Affective Behavior Analysis with Instruction Tuning
Yifan Li
Anh Dao
Wentao Bao
Zhen Tan
Tianlong Chen
Huan Liu
Yu Kong
CVBM
60
15
0
07 Apr 2024
Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning
Rongjie Li
Yu Wu
Xuming He
MLLM
LRM
VLM
30
2
0
01 Apr 2024
Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering in Autonomous Driving
Akshay Gopalkrishnan
Ross Greer
Mohan M. Trivedi
VLM
46
21
0
28 Mar 2024
Semantic Map-based Generation of Navigation Instructions
Chengzu Li
Chao Zhang
Simone Teufel
R. Doddipatla
Svetlana Stoyanchev
34
2
0
28 Mar 2024
Toward Interactive Regional Understanding in Vision-Large Language Models
Jungbeom Lee
Sanghyuk Chun
Sangdoo Yun
VLM
28
1
0
27 Mar 2024
A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions
Shun Inadumi
Seiya Kawano
Akishige Yuguchi
Yasutomo Kawanishi
Koichiro Yoshino
36
1
0
26 Mar 2024
Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models
Qiong Wu
Weihao Ye
Yiyi Zhou
Xiaoshuai Sun
Rongrong Ji
MoE
49
1
0
22 Mar 2024
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Zeyu Han
Chao Gao
Jinyang Liu
Jeff Zhang
Sai Qian Zhang
150
310
0
21 Mar 2024
Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond
Tianxin Wei
Bowen Jin
Ruirui Li
Hansi Zeng
Zhengyang Wang
...
Qingyu Yin
Hanqing Lu
Suhang Wang
Jingrui He
Xianfeng Tang
45
15
0
15 Mar 2024
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Haiyang Wang
Hao Tang
Li Jiang
Shaoshuai Shi
Muhammad Ferjad Naeem
Hongsheng Li
Bernt Schiele
Liwei Wang
VLM
46
10
0
14 Mar 2024
Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks
Tingyu Qu
Tinne Tuytelaars
Marie-Francine Moens
MoE
46
2
0
14 Mar 2024
ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning
Ahmed Masry
Mehrad Shahmohammadi
Md. Rizwan Parvez
Enamul Hoque
Shafiq R. Joty
44
31
0
14 Mar 2024
Masked AutoDecoder is Effective Multi-Task Vision Generalist
Han Qiu
Jiaxing Huang
Peng Gao
Lewei Lu
Xiaoqin Zhang
Shijian Lu
51
4
0
12 Mar 2024
Multimodal Transformer for Comics Text-Cloze
Emanuele Vivoli
Joan Lafuente Baeza
Ernest Valveny Llobet
Dimosthenis Karatzas
38
4
0
06 Mar 2024
Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review
Iryna Hartsock
Ghulam Rasool
49
62
0
04 Mar 2024
Non-autoregressive Sequence-to-Sequence Vision-Language Models
Kunyu Shi
Qi Dong
Luis Goncalves
Zhuowen Tu
Stefano Soatto
VLM
47
3
0
04 Mar 2024
SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials
Wonjoong Kim
S. Park
Yeonjun In
Seokwon Han
Chanyoung Park
LRM
ReLM
32
3
0
22 Feb 2024
A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation
Yunxin Li
Baotian Hu
Wenhan Luo
Lin Ma
Yuxin Ding
Min-Ling Zhang
53
1
0
21 Feb 2024
SoMeLVLM: A Large Vision Language Model for Social Media Processing
Xinnong Zhang
Haoyu Kuang
Xinyi Mou
Hanjia Lyu
Kun Wu
Siming Chen
Jiebo Luo
Xuanjing Huang
Zhongyu Wei
MLLM
36
5
0
20 Feb 2024
Visual In-Context Learning for Large Vision-Language Models
Yucheng Zhou
Xiang Li
Qianning Wang
Jianbing Shen
MLLM
27
58
0
18 Feb 2024
Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing?
Tiantian Feng
Daniel Yang
Digbalay Bose
Shrikanth Narayanan
37
4
0
14 Feb 2024
Asking Multimodal Clarifying Questions in Mixed-Initiative Conversational Search
Yifei Yuan
Clemencia Siro
Mohammad Aliannejadi
Maarten de Rijke
Wai Lam
21
6
0
12 Feb 2024
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Xiangxiang Chu
Limeng Qiao
Xinyu Zhang
Shuang Xu
Fei Wei
...
Xiaofei Sun
Yiming Hu
Xinyang Lin
Bo-Wen Zhang
Chunhua Shen
VLM
MLLM
33
98
0
06 Feb 2024
Towards Unified Interactive Visual Grounding in The Wild
Jie Xu
Hanbo Zhang
Qingyi Si
Yifeng Li
Xuguang Lan
Tao Kong
LM&Ro
30
5
0
30 Jan 2024
GAPS: Geometry-Aware Problem Solver
Jiaxin Zhang
Yinghui Jiang
Yashar Moshfeghi
AIMat
AI4CE
22
3
0
29 Jan 2024
Voila-A: Aligning Vision-Language Models with User's Gaze Attention
Kun Yan
Lei Ji
Zeyu Wang
Yuntao Wang
Nan Duan
Shuai Ma
58
8
0
22 Dec 2023
ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training
Rongsheng Wang
Qingsong Yao
Haoran Lai
Zhiyang He
Xiaodong Tao
Zihang Jiang
S.Kevin Zhou
VLM
MedIm
49
6
0
20 Dec 2023
p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models
Haoyuan Wu
Xinyun Zhang
Peng Xu
Peiyu Liao
Xufeng Yao
Bei Yu
VLM
19
0
0
17 Dec 2023
Do LLMs Work on Charts? Designing Few-Shot Prompts for Chart Question Answering and Summarization
Do Xuan Long
Mohammad Hassanpour
Ahmed Masry
P. Kavehzadeh
Enamul Hoque
Shafiq R. Joty
LRM
30
9
0
17 Dec 2023
Exploration of visual prompt in Grounded pre-trained open-set detection
Qibo Chen
Weizhong Jin
Shuchang Li
Mengdi Liu
Li Yu
Jian Jiang
Xiaozheng Wang
VLM
15
0
0
14 Dec 2023
Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks
Mohammad-Javad Davari
Eugene Belilovsky
MoMe
40
56
0
11 Dec 2023
DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers
Aaron Mir
Eduardo Alonso
Esther Mondragón
DiffM
38
2
0
11 Dec 2023
MAFA: Managing False Negatives for Vision-Language Pre-training
Jaeseok Byun
Dohoon Kim
Taesup Moon
VLM
13
4
0
11 Dec 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
Artemis Panagopoulou
Le Xue
Ning Yu
Junnan Li
Dongxu Li
Shafiq R. Joty
Ran Xu
Silvio Savarese
Caiming Xiong
Juan Carlos Niebles
VLM
MLLM
41
45
0
30 Nov 2023
PALM: Predicting Actions through Language Models
Sanghwan Kim
Daoji Huang
Yongqin Xian
Otmar Hilliges
Luc Van Gool
Xi Wang
VLM
22
10
0
29 Nov 2023
LANS: A Layout-Aware Neural Solver for Plane Geometry Problem
Zhong-Zhi Li
Ming-Liang Zhang
Fei Yin
Cheng-Lin Liu
18
11
0
25 Nov 2023
Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts
Yunshi Lan
Xiang Li
Xin Liu
Yang Li
Wei Qin
Weining Qian
LRM
ReLM
38
24
0
15 Nov 2023
PerceptionGPT: Effectively Fusing Visual Perception into LLM
Renjie Pi
Lewei Yao
Jiahui Gao
Jipeng Zhang
Tong Zhang
MLLM
28
30
0
11 Nov 2023
Multitask Multimodal Prompted Training for Interactive Embodied Task Completion
Georgios Pantazopoulos
Malvina Nikandrou
Amit Parekh
Bhathiya Hemanthage
Arash Eshghi
Ioannis Konstas
Verena Rieser
Oliver Lemon
Alessandro Suglia
LM&Ro
36
7
0
07 Nov 2023
DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
Ge Zheng
Bin Yang
Jiajin Tang
Hong-Yu Zhou
Sibei Yang
LRM
MLLM
29
93
0
25 Oct 2023
What's Left? Concept Grounding with Logic-Enhanced Foundation Models
Joy Hsu
Jiayuan Mao
Joshua B. Tenenbaum
Jiajun Wu
VLM
ReLM
LRM
30
21
0
24 Oct 2023
Towards Perceiving Small Visual Details in Zero-shot Visual Question Answering with Multimodal LLMs
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
37
2
0
24 Oct 2023
Previous
1
2
3
4
5
6
7
8
Next