ResearchTrend.AI

Vision-Language Models

VLM

Models that can understand and generate both visual and textual information.

All papers

50 / 13,747 papers shown
CausalCLIP: Causally-Informed Feature Disentanglement and Filtering for Generalizable Detection of Generated Images
Bo Liu
Qiao Qin
Qinghui He
CML, VLM
8
0
0
15 Dec 2025
Textual Gradients are a Flawed Metaphor for Automatic Prompt Optimization
Daniel Melcer
Qi Chen
Wen-Hao Chiang
Shweta Garg
Pranav Garg
Christian Bock
VLM
4
0
0
15 Dec 2025
Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views
Tingyang Chen
Cong Fu
Jiahua Wu
Haotian Wu
Hua Fan
Xiangyu Ke
Yunjun Gao
Yabo Ni
Anxiang Zeng
VLM
4
0
0
15 Dec 2025
StarryGazer: Leveraging Monocular Depth Estimation Models for Domain-Agnostic Single Depth Image Completion
Sangmin Hong
Suyoung Lee
Kyoung Mu Lee
VLM, MDE
4
0
0
15 Dec 2025
LongVie 2: Multimodal Controllable Ultra-Long Video World Model
Jianxiong Gao
Zhaoxi Chen
Xian Liu
Junhao Zhuang
Chengming Xu
Jianfeng Feng
Yu Qiao
Yanwei Fu
Chenyang Si
Ziwei Liu
VGen, SyDa, VLM
0
0
0
15 Dec 2025
VLCache: Computing 2% Vision Tokens and Reusing 98% for Vision-Language Inference
Shengling Qin
Hao Yu
Chenxin Wu
Zheng Li
Yizhong Cao
...
Yi Zhang
Zhengheng Wang
Shuai Bai
Jianwei Zhang
Junyang Lin
VLM
33
0
0
15 Dec 2025
DiG: Differential Grounding for Enhancing Fine-Grained Perception in Multimodal Large Language Model
Zhou Tao
Shida Wang
Yongxiang Hua
Haoyu Cao
Linli Xu
ObjD, VLM
0
0
0
14 Dec 2025
Content-Aware Ad Banner Layout Generation with Two-Stage Chain-of-Thought in Vision Language Models
Kei Yoshitake
Kento Hosono
Ken Kobayashi
Kazuhide Nakata
VLM
0
0
0
14 Dec 2025
FysicsWorld: A Unified Full-Modality Benchmark for Any-to-Any Understanding, Generation, and Reasoning
Yue Jiang
Dingkang Yang
Minghao Han
Jinghang Han
Zizhi Chen
Yizhou Liu
Mingcheng Li
Peng Zhai
Lihua Zhang
VGen, VLM
0
0
0
14 Dec 2025
Patch-wise Retrieval: A Bag of Practical Techniques for Instance-level Matching
Wonseok Choi
Sohwi Lim
Nam Hyeon-Woo
Moon Ye-Bin
Dong-Ju Jeong
Jinyoung Hwang
Tae-Hyun Oh
VLM
0
0
0
14 Dec 2025
Optimal Resource Allocation for ML Model Training and Deployment under Concept Drift
Hasan Burhan Beytur
Gustavo de Veciana
Haris Vikalo
Kevin S Chan
VLM
4
0
0
14 Dec 2025
β-CLIP: Text-Conditioned Contrastive Learning for Multi-Granular Vision-Language Alignment
Fatimah Zohra
Chen Zhao
Hani Itani
Bernard Ghanem
CLIP, VLM
69
0
0
14 Dec 2025
Adapting Multimodal Foundation Models for Few-Shot Learning: A Comprehensive Study on Contrastive Captioners
N.K.B.M.P.K.B. Narasinghe
Uthayasanker Thayasivam
OffRL, VLM
60
0
0
14 Dec 2025
Efficient Vision-Language Reasoning via Adaptive Token Pruning
Xue Li
Xiaonan Song
Henry Hu
VLM
4
0
0
14 Dec 2025
Open Horizons: Evaluating Deep Models in the Wild
Ayush Vaibhav Bhatti
Deniz Karakay
Debottama Das
Nilotpal Rajbongshi
Yuito Sugimoto
VLM
0
0
0
13 Dec 2025
More Than the Final Answer: Improving Visual Extraction and Logical Consistency in Vision-Language Models
Hoang Anh Just
Yifei Fan
Handong Zhao
Jiuxiang Gu
Ruiyi Zhang
Simon Jenni
Kushal Kafle
Ruoxi Jia
Jing Shi
ReLM, VLM, LRM
0
0
0
13 Dec 2025
The American Ghost in the Machine: How language models align culturally and the effects of cultural prompting
James Luther
Donald Brown
VLM
4
0
0
13 Dec 2025
WeDetect: Fast Open-Vocabulary Object Detection as Retrieval
Shenghao Fu
Yukun Su
Fengyun Rao
Jing Lyu
Xiaohua Xie
Wei-Shi Zheng
ObjD, VLM
7
0
0
13 Dec 2025
MetaTPT: Meta Test-time Prompt Tuning for Vision-Language Models
Yuqing Lei
Yingjun Du
Yawen Huang
Xiantong Zhen
Ling Shao
VLM
4
0
0
13 Dec 2025
Semantic Distance Measurement based on Multi-Kernel Gaussian Processes
Yinzhu Cheng
Haihua Xie
Yaqing Wang
Miao He
Mingming Sun
VLM
4
0
0
13 Dec 2025
MLLM Machine Unlearning via Visual Knowledge Distillation
Yuhang Wang
Zhenxing Niu
Haoxuan Ji
Guangyu He
Haichang Gao
Gang Hua
MU, VLM
126
0
0
12 Dec 2025
xGR: Efficient Generative Recommendation Serving at Scale
Qingxiao Sun
Tongxuan Liu
Shen Zhang
Siyu Wu
Peijun Yang
...
Minchao Zhang
Xinyu Liu
Ke Zhang
Depei Qian
Hailong Yang
VLM
8
0
0
12 Dec 2025
Cross-modal Context-aware Learning for Visual Prompt Guided Multimodal Image Understanding in Remote Sensing
Xu Zhang
Jiabin Fang
Zhuoming Ding
Jin Yuan
Xuan Liu
Qianjun Zhang
Zhiyong Li
VLM
12
0
0
12 Dec 2025
Seeing to Act, Prompting to Specify: A Bayesian Factorization of Vision Language Action Policy
Kechun Xu
Zhenjie Zhu
Anzhe Chen
Shuqi Zhao
Qing Huang
Yifei Yang
Haojian Lu
Rong Xiong
Masayoshi Tomizuka
Yue Wang
VLM
0
0
0
12 Dec 2025
BLURR: A Boosted Low-Resource Inference for Vision-Language-Action Models
Xiaoyu Ma
Zhengqing Yuan
Zheyuan Zhang
Kaiwen Shi
Lichao Sun
Yanfang Ye
VLM
16
0
0
12 Dec 2025
Benchmarking the Generality of Vision-Language-Action Models
Pranav Guruprasad
Sudipta Chowdhury
Harsh Sikka
Mridul Sharma
Helen Lu
Sean Rivera
Aryan Khurana
Hangliang Ren
Yangyue Wang
VLM
16
0
0
12 Dec 2025
Do We Need Reformer for Vision? An Experimental Comparison with Vision Transformers
Ali El Bellaj
Mohammed-Amine Cheddadi
Rhassan Berber
VLM
4
0
0
12 Dec 2025
Semantic search for 100M+ galaxy images using AI-generated captions
Nolan Koblischke
Liam Parker
Francois Lanusse
Irina Espejo Morales
Jo Bovy
Shirley Ho
VLM
8
0
0
12 Dec 2025
PubTables-v2: A new large-scale dataset for full-page and multi-page table extraction
Brandon Smock
Valerie Faucon-Morin
Max Sokolov
Libin Liang
Tayyibah Khanam
Maury Courtland
ViT, LMTD, VLM
12
0
0
11 Dec 2025
LabelFusion: Learning to Fuse LLMs and Transformer Classifiers for Robust Text Classification
Michael Schlee
Christoph Weisser
Timo Kivimäki
Melchizedek Mashiku
Benjamin Saefken
VLM
36
0
0
11 Dec 2025
Multilingual VLM Training: Adapting an English-Trained VLM to French
Jules Lahmi
Alexis Roger
VLM
8
0
0
11 Dec 2025
PIAST: Rapid Prompting with In-context Augmentation for Scarce Training data
Pawel Batorski
Paul Swoboda
VLM
0
0
0
11 Dec 2025
VL-JEPA: Joint Embedding Predictive Architecture for Vision-language
Delong Chen
Mustafa Shukor
Theo Moutakanni
Willy Chung
Jade Yu
Tejaswi Kasarla
Allen Bolourchi
Yann LeCun
Pascale Fung
VLM
44
0
0
11 Dec 2025
Limits and Gains of Test-Time Scaling in Vision-Language Reasoning
Mohammadjavad Ahmadpour
Amirmahdi Meighani
Payam Taebi
Omid Ghahroodi
Amirmohammad Izadi
Mahdieh Soleymani Baghshah
LRM, VLM
12
0
0
11 Dec 2025
CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models
Tong Zhang
Carlos Hinojosa
Bernard Ghanem
DiffM, VLM
57
0
0
11 Dec 2025
Self-Ensemble Post Learning for Noisy Domain Generalization
Wang Lu
Jindong Wang
OOD, VLM
44
0
0
11 Dec 2025
ClusIR: Towards Cluster-Guided All-in-One Image Restoration
Shengkai Hu
Jiaqi Ma
Jun Wan
Wenwen Min
Yongcheng Jing
Lefei Zhang
Dacheng Tao
VLM
17
0
0
11 Dec 2025
Learning complete and explainable visual representations from itemized text supervision
Yiwei Lyu
Chenhui Zhao
Soumyanil Banerjee
Shixuan Liu
Akshay Rao
Akhil Kondepudi
Honglak Lee
Todd C. Hollon
CLIP, VLM
4
0
0
11 Dec 2025
Vision-Language Models for Infrared Industrial Sensing in Additive Manufacturing Scene Description
Nazanin Mahjourian
Vinh Nguyen
VLM
0
0
0
11 Dec 2025
Efficient-VLN: A Training-Efficient Vision-Language Navigation Model
Duo Zheng
Shijia Huang
Yanyang Li
Liwei Wang
VLM
36
0
0
11 Dec 2025
BabyVLM-V2: Toward Developmentally Grounded Pretraining and Benchmarking of Vision Foundation Models
Shengao Wang
Wenqi Wang
Zecheng Wang
Max Whitton
Michael Wakeham
...
Aaron Mueller
Bryan A. Plummer
Kate Saenko
Venkatesh Saligrama
Boqing Gong
VLM
4
0
0
11 Dec 2025
VLM-NCD: Novel Class Discovery with Vision-Based Large Language Models
Yuetong Su
Baoguo Wei
Xinyu Wang
Xu Li
Lixin Li
VLM
16
0
0
11 Dec 2025
HPM-KD: Hierarchical Progressive Multi-Teacher Framework for Knowledge Distillation and Efficient Model Compression
Gustavo Coelho Haase
Paulo Henrique Dourado da Silva
VLM
12
0
0
10 Dec 2025
DeepSeek's WEIRD Behavior: The cultural alignment of Large Language Models and the effects of prompt language and cultural prompting
James Luther
Donald Brown
VLM
68
0
0
10 Dec 2025
Defect-aware Hybrid Prompt Optimization via Progressive Tuning for Zero-Shot Multi-type Anomaly Detection and Segmentation
Nadeem Nazer
Hongkuan Zhou
Lavdim Halilaj
Ylli Sadikaj
Steffen Staab
VLM
40
0
0
10 Dec 2025
ZeroOS: A Universal Modular Library OS for zkVMs
Guangxian Zou
Isaac Zhang
Ryan Zarick
Kelvin Wong
Thomas Kim
Daniel L.-K. Wong
Saeid Yazdinejad
Dan Boneh
VLM
120
0
0
10 Dec 2025
GLaD: Geometric Latent Distillation for Vision-Language-Action Models
Minghao Guo
Meng Cao
Jiachen Tao
Rongtao Xu
Yan Yan
Xiaodan Liang
Ivan Laptev
Xiaojun Chang
VLM
52
0
0
10 Dec 2025
MetaVoxel: Joint Diffusion Modeling of Imaging and Clinical Metadata
Yihao Liu
Chenyu Gao
Lianrui Zuo
Michael E. Kim
Brian D. Boyd
...
Lori L. Beason-Held
Susan M. Resnick
Timothy J. Hohman
Warren D. Taylor
Bennett A. Landman
MedIm, VLM
196
0
0
10 Dec 2025
STARS: Semantic Tokens with Augmented Representations for Recommendation at Scale
Han Chen
Steven Zhu
Yingrui Li
VLM, LRM
8
0
0
10 Dec 2025
Independent Density Estimation
Jiahao Liu
VLM
16
0
0
10 Dec 2025