ResearchTrend.AI

Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language

28 June 2023
William Berrios
Gautam Mittal
Tristan Thrush
Douwe Kiela
Amanpreet Singh
MLLM VLM
ArXiv (abs), PDF, HTML, Github (350★)

Papers citing "Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language"

45 / 45 papers shown
How Can Objects Help Video-Language Understanding?
Zitian Tang
Shijie Wang
Junho Cho
Jaewook Yoo
Chen Sun
117
1
0
10 Apr 2025
A Survey of Large Language Models in Mental Health Disorder Detection on Social Media
Zhuohan Ge
Nicole Hu
Darian Li
Yubo Wang
Shihao Qi
Yuming Xu
Han Shi
Junxuan Zhang
AI4MH
119
0
0
03 Apr 2025
Deep Learning for Climate Action: Computer Vision Analysis of Visual Narratives on X
Katharina Prasse
Marcel Kleinmann
Inken Adam
Kerstin Beckersjuergen
Andreas Edte
...
Timotheus Gumpp
Steffen Jung
Isaac Bravo
Stefanie Walter
Margret Keuper
72
0
0
12 Mar 2025
Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru
Dunant Cusipuma
David Ortega
Victor Flores-Benites
Arturo Deza
OOD
159
0
0
10 Mar 2025
Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning
Yuqi Pang
Bowen Yang
Haoqin Tu
Yun Cao
Zeyu Zhang
LRM MLLM
99
0
0
17 Feb 2025
Improving Fine-grained Visual Understanding in VLMs through Text-Only Training
Dasol Choi
Guijin Son
Soo Yong Kim
Gio Paik
Seunghyeok Hong
VLM CoGe
96
1
0
17 Dec 2024
The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning
Longju Bai
Angana Borah
Oana Ignat
Rada Mihalcea
VLM
136
3
0
18 Nov 2024
No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages
Youssef Mohamed
Runjia Li
Ibrahim Said Ahmad
Kilichbek Haydarov
Philip Torr
Kenneth Church
Mohamed Elhoseiny
VLM
94
11
0
06 Nov 2024
Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models
K. Nakata
Daisuke Miyashita
Youyang Ng
Yasuto Hoshi
J. Deguchi
64
0
0
29 Aug 2024
Target Prompting for Information Extraction with Vision Language Model
Dipankar Medhi
VLM
62
0
0
07 Aug 2024
SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models
Zheng Lin
Xuanjie Hu
Yuxin Zhang
Zhe Chen
Zihan Fang
Xianhao Chen
Ang Li
Praneeth Vepakomma
Yue Gao
96
37
0
01 Jul 2024
RITA: A Real-time Interactive Talking Avatars Framework
Wuxinlin Cheng
Cheng Wan
Yupeng Cao
Sihan Chen
75
0
0
18 Jun 2024
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding
Yiqi Wu
Xiaodan Hu
Ziming Fu
Siling Zhou
Jiangong Li
MLLM
70
12
0
14 Jun 2024
Decoding Emotions in Abstract Art: Cognitive Plausibility of CLIP in Recognizing Color-Emotion Associations
Hanna-Sophia Widhoelzl
Ece Takmaz
71
2
0
10 May 2024
Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model
Seonhee Cho
Choonghan Kim
Jiho Lee
Chetan Chilkunda
Sujin Choi
Joo Heung Yoon
75
1
0
29 Apr 2024
Towards Incremental Learning in Large Language Models: A Critical Review
M. Jovanovic
Peter Voss
ELM CLL KELM
116
5
0
28 Apr 2024
Leveraging Large Language Models for Multimodal Search
Oriol Barbany
Michael Huang
Xinliang Zhu
Arnab Dhua
92
10
0
24 Apr 2024
BLINK: Multimodal Large Language Models Can See but Not Perceive
Xingyu Fu
Yushi Hu
Bangzheng Li
Yu Feng
Haoyu Wang
Xudong Lin
Dan Roth
Noah A. Smith
Wei-Chiu Ma
Ranjay Krishna
VLM LRM MLLM
148
150
0
18 Apr 2024
Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving
Jinlong Li
Baolu Li
Zhengzhong Tu
Xinyu Liu
Qing Guo
Felix Juefei Xu
Runsheng Xu
Hongkai Yu
DiffM
123
26
0
07 Apr 2024
TINA: Think, Interaction, and Action Framework for Zero-Shot Vision Language Navigation
Dingbang Li
Wenzhou Chen
Xin Lin
LLMAG LM&Ro
77
4
0
13 Mar 2024
Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery
Haiyang Zheng
Nan Pu
Wenjing Li
N. Sebe
Zhun Zhong
91
7
0
12 Mar 2024
LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content
QiHao Zhao
Yalun Dai
Hao Li
Wei Hu
Fan Zhang
Jun Liu
87
17
0
09 Mar 2024
Intelligent Director: An Automatic Framework for Dynamic Visual Composition using ChatGPT
Sixiao Zheng
Jingyang Huo
Yu Wang
Yanwei Fu
VGen DiffM
69
1
0
24 Feb 2024
Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models
Didi Zhu
Zhongyi Sun
Zexi Li
Tao Shen
Ke Yan
Shouhong Ding
Kun Kuang
Chao Wu
CLL KELM MoMe
124
31
0
19 Feb 2024
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering
David Romero
Thamar Solorio
146
4
0
16 Feb 2024
Similarity-based Neighbor Selection for Graph LLMs
Rui Li
Jiwei Li
Jiawei Han
Guoyin Wang
60
5
0
06 Feb 2024
Describing Images Fast and Slow: Quantifying and Predicting the Variation in Human Signals during Visuo-Linguistic Processes
Ece Takmaz
Sandro Pezzelle
Raquel Fernández
24
1
0
02 Feb 2024
How Can Large Language Models Understand Spatial-Temporal Data?
Lei Liu
Shuo Yu
Runze Wang
Zhenxun Ma
Yanming Shen
AI4TS
69
27
0
25 Jan 2024
Democratizing Fine-grained Visual Recognition with Large Language Models
Mingxuan Liu
Subhankar Roy
Wenjing Li
Zhun Zhong
N. Sebe
Elisa Ricci
VLM
106
13
0
24 Jan 2024
Image Safeguarding: Reasoning with Conditional Vision Language Model and Obfuscating Unsafe Content Counterfactually
Mazal Bethany
Brandon Wherry
Nishant Vishwamitra
Peyman Najafirad
DiffM
58
4
0
19 Jan 2024
VLLaVO: Mitigating Visual Gap through LLMs
Shuhao Chen
Yulong Zhang
Weisen Jiang
Jiangang Lu
Yu Zhang
VLM
123
2
0
06 Jan 2024
MIND: Multi-Task Incremental Network Distillation
Jacopo Bonato
Francesco Pelosin
Luigi Sabetta
Alessandro Nicolosi
CLL
98
10
0
05 Dec 2023
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Sijie Cheng
Zhicheng Guo
Jingwen Wu
Kechen Fang
Peng Li
Huaping Liu
Yang Liu
EgoV LRM
112
20
0
27 Nov 2023
Leveraging Diffusion Perturbations for Measuring Fairness in Computer Vision
Nicholas Lui
Bryan Chia
William Berrios
Candace Ross
Douwe Kiela
53
2
0
25 Nov 2023
GATGPT: A Pre-trained Large Language Model with Graph Attention Network for Spatiotemporal Imputation
Yakun Chen
Xianzhi Wang
Guandong Xu
AI4TS
110
32
0
24 Nov 2023
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shangwen Wang
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
116
7
0
10 Nov 2023
De-Diffusion Makes Text a Strong Cross-Modal Interface
Chen Wei
Chenxi Liu
Siyuan Qiao
Zhishuai Zhang
Alan Yuille
Jiahui Yu
VLM DiffM
103
11
0
01 Nov 2023
Defining a New NLP Playground
Sha Li
Chi Han
Pengfei Yu
Carl Edwards
Manling Li
...
Yi R. Fung
Charles Yu
Joel R. Tetreault
Eduard H. Hovy
Heng Ji
120
5
0
31 Oct 2023
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Tong Xu
Hao Wang
Dianbo Sui
Yunhang Shen
Ke Li
Xingguo Sun
Enhong Chen
VLM MLLM
108
133
0
24 Oct 2023
Vision-by-Language for Training-Free Compositional Image Retrieval
Shyamgopal Karthik
Karsten Roth
Massimiliano Mancini
Zeynep Akata
CoGe
115
61
0
13 Oct 2023
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
ReLM LRM
76
8
0
09 Oct 2023
Application of frozen large-scale models to multimodal task-oriented dialogue
Tatsuki Kawamoto
Takuma Suzuki
Ko Miyama
Takumi Meguro
Tomohiro Takagi
61
1
0
02 Oct 2023
Language as the Medium: Multimodal Video Classification through text only
Laura Hanu
A. Vero
James Thewlis
74
3
0
19 Sep 2023
Investigating the Catastrophic Forgetting in Multimodal Large Language Models
Yuexiang Zhai
Shengbang Tong
Xiao Li
Mu Cai
Qing Qu
Yong Jae Lee
Yi Ma
VLM MLLM CLL
171
88
0
19 Sep 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
146
127
0
25 Jul 2023