Study and develop models that can generalize to unseen compositions of known concepts.
| Paper | Authors |
|---|---|
| Disentanglement by means of action-induced representations | Gorka Muñoz-Gil Hendrik Poulsen Nautrup Arunava Majumder Paulin de Schoulepnikoff Florian Fürrutter Marius Krumm Hans J. Briegel |
| VRIQ: Benchmarking and Analyzing Visual-Reasoning IQ of VLMs | Tina Khezresmaeilzadeh Jike Zhong Konstantinos Psounis |
| Disentangled Representation Learning via Flow Matching | Jinjin Chi Taoping Liu Mengtao Yin Ximing Li Yongcheng Jing Dacheng Tao |
| VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text? | Qingán Liu Juntong Feng Yuhao Wang Xinzhe Han Yujie Cheng Yue Zhu Haiwen Diao Yunzhi Zhuge Huchuan Lu |
| Sequential Group Composition: A Window into the Mechanics of Deep Learning | Giovanni Luca Marchetti Daniel Kunin Adele Myers Francisco Acosta Nina Miolane |
| Auto-Comp: An Automated Pipeline for Scalable Compositional Probing of Contrastive Vision-Language Models | Cristian Sbrolli Matteo Matteucci Toshihiko Yamasaki |
| SANEval: Open-Vocabulary Compositional Benchmarks with Failure-mode Diagnosis | Rishav Pramanik Ian E. Nielsen Jeff Smith Saurav Pandit Ravi P. Ramachandran Zhaozheng Yin |
| FlexCausal: Flexible Causal Disentanglement via Structural Flow Priors and Manifold-Aware Interventions | Yutao Jin Yuang Tao Junyong Zhai |
| XFACTORS: Disentangled Information Bottleneck via Contrastive Supervision | Alexandre Myara Nicolas Bourriez Thomas Boyer Thomas Lemercier Ihab Bendidi Auguste Genovesio |
| Do Images Speak Louder than Words? Investigating the Effect of Textual Misinformation in VLMs | Chi Zhang Wenxuan Ding Jiale Liu Mingrui Wu Qingyun Wu Ray Mooney |
| LOGICAL-COMMONSENSEQA: A Benchmark for Logical Commonsense Reasoning | Obed Junias Maria Leonor Pacheco |
| Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing | Tingyu Song Yanzhao Zhang Mingxin Li Zhuoning Guo Dingkun Long Pengjun Xie Siyue Zhang Yilun Zhao Shu Wu |
| ConceptCaps: a Distilled Concept Dataset for Interpretability in Music Models | Bruno Sienkiewicz Łukasz Neumann Mateusz Modrzejewski |
| The Spatial Blindspot of Vision-Language Models | Nahid Alam Leema Krishna Murali Siddhant Bharadwaj Patrick Liu Timothy Chung Drishti Sharma Akshata A Kranthi Kiran Wesley Tam Bala Krishna S Vegesna |
| CtD: Composition through Decomposition in Emergent Communication. International Conference on Learning Representations (ICLR), 2026 | Boaz Carmeli Ron Meir Yonatan Belinkov |
| Beyond Accuracy: Evaluating Grounded Visual Evidence in Thinking with Images | Xuchen Li Xuzhao Li Renjie Pi Shiyu Hu Jian Zhao Jiahui Gao |
| VULCA-Bench: A Multicultural Vision-Language Benchmark for Evaluating Cultural Understanding | Haorui Yu Ramon Ruiz-Dolz Diji Yang Hang He Fengrui Zhang Qiufeng Yi |
| LitVISTA: A Benchmark for Narrative Orchestration in Literary Text | Mingzhe Lu Yiwen Wang Yanbing Liu Qi You Chong Liu ...Haoyu Dong Wenyu Zhang Jiarui Zhang Yue Hu Yunpeng Li |
| Boosting Latent Diffusion Models via Disentangled Representation Alignment | John Page Xuesong Niu Kai Wu Kun Gai |
| V-FAT: Benchmarking Visual Fidelity Against Text-bias | Ziteng Wang Yujie He Guanliang Li Siqi Yang Jiaqi Xiong Songxiang Liu |
| Eye-Q: A Multilingual Benchmark for Visual Word Puzzle Solving and Image-to-Phrase Reasoning | Ali Najar Alireza Mirrokni Arshia Izadyari Sadegh Mohammadian Amir Homayoon Sharifizade Asal Meskin Mobin Bagherian Ehsaneddin Asgari |
| Exploring Compositionality in Vision Transformers using Wavelet Representations | Akshad Shyam Purushottamdas Pranav K Nayak Divya Mehul Rajparia Deekshith Patel Yashmitha Gogineni Konda Reddy Mopuri Sumohana S. Channappayya |
| Same or Not? Enhancing Visual Perception in Vision-Language Models | Damiano Marsili Aditya Mehta Ryan Y. Lin Georgia Gkioxari |
| VisRes Bench: On Evaluating the Visual Reasoning Capabilities of VLMs | Brigitta Malagurski Törtei Yasser Dahou Ngoc Dung Huynh Wamiq Reyaz Para Phúc H. Lê Khac Ankit Singh Sofian Chaybouti Sanath Narayan |
| VL4Gaze: Unleashing Vision-Language Models for Gaze Following | Shijing Wang Chaoqun Cui Yaping Huang Hyung Jin Chang Yihua Cheng |
| Self-Attention with State-Object Weighted Combination for Compositional Zero Shot Learning | Cheng-Hong Chang Pei-Hsuan Tsai |
| TextEditBench: Evaluating Reasoning-aware Text Editing Beyond Rendering | Rui Gui Yang Wan Haochen Han Dongxing Mao Fangming Liu Min Li Alex Jinpeng Wang |
| DeX-Portrait: Disentangled and Expressive Portrait Animation via Explicit and Latent Motion Representations | Yuxiang Shi Zhe Li Yanwen Wang Hao Zhu Xun Cao Ligang Liu |
| From Isolation to Entanglement: When Do Interpretability Methods Identify and Disentangle Known Concepts? | Aaron Mueller Andrew Lee Shruti Joshi Ekdeep Singh Lubana Dhanya Sridhar Patrik Reizinger |
| Infinity and Beyond: Compositional Alignment in VAR and Diffusion T2I Models | Hossein Shahabadi Niki Sepasian Arash Marioriyad Ali Sharifi-Zarchi Mahdieh Soleymani Baghshah |
| FactorPortrait: Controllable Portrait Animation via Disentangled Expression, Pose, and Viewpoint | Jiapeng Tang Kai Li Chengxiang Yin Liuhao Ge Fei Jiang ...Matthias Nießner Christian Häne Timur Bagautdinov Egor Zakharov Peihong Guo |
| Learning by Analogy: A Causal Framework for Composition Generalization | Lingjing Kong Shaoan Xie Yang Jiao Yetian Chen Yanhui Guo Simone Shao Yan Gao Guangyi Chen Kun Zhang |
| Disentangled and Distilled Encoder for Out-of-Distribution Reasoning with Rademacher Guarantees | Zahra Rahiminasab Michael Yuhas Arvind Easwaran |
| Composing Concepts from Images and Videos via Concept-prompt Binding | Xianghao Kong Zeyu Zhang Yuwei Guo Zhuoran Zhao Songchun Zhang Anyi Rao |
| VisualActBench: Can VLMs See and Act like a Human? | Daoan Zhang Pai Liu Xiaofei Zhou Yuan Ge Guangchen Lan Jing Bi Christopher Brinton Ehsan Hoque Jiebo Luo |
| AgentComp: From Agentic Reasoning to Compositional Mastery in Text-to-Image Models | Arman Zarei Jiacheng Pan Matthew Gwilliam Soheil Feizi Zhenheng Yang |
| MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition | Xinyu Wei Kangrui Cen Hongyang Wei Zhen Guo Bairui Li Zeqing Wang Jinrui Zhang Lei Zhang |
| Relational Visual Similarity | Thao Nguyen Sicheng Mo Krishna Kumar Singh Yilin Wang Jing Shi Nicholas Kolkin Eli Shechtman Yong Jae Lee Yuheng Li |
| VisChainBench: A Benchmark for Multi-Turn, Multi-Image Visual Reasoning Beyond Language Priors | Wenbo Lyu Yingjun Du Jinglin Zhao Xianton Zhen Ling Shao |
| Inferring Compositional 4D Scenes without Ever Seeing One | Ahmet Berke Gokmen Ajad Chhatkuli Luc Van Gool Danda Pani Paudel |
| ChromouVQA: Benchmarking Vision-Language Models under Chromatic Camouflaged Images | Yunfei Zhang Yizhuo He Yuanxun Shao Zhengtao Yao Haoyan Xu Junhao Dong Zhen Yao Zhikang Dong |