1901.06706
Cited By
Visual Entailment: A Novel Task for Fine-Grained Image Understanding
20 January 2019
Ning Xie
Farley Lai
Derek Doran
Asim Kadav
CoGe
Papers citing
"Visual Entailment: A Novel Task for Fine-Grained Image Understanding"
50 / 230 papers shown
Title
Task-Core Memory Management and Consolidation for Long-term Continual Learning
Tianyu Huai
Jie Zhou
Yuxuan Cai
Qin Chen
Wen Wu
Xingjiao Wu
Xipeng Qiu
Liang He
CLL
30
0
0
15 May 2025
What's Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift
Jiamin Chang
Hao Li
Hammond Pearce
Ruoxi Sun
Bo-wen Li
Minhui Xue
38
0
0
28 Apr 2025
LiveVQA: Live Visual Knowledge Seeking
Mingyang Fu
Yuyang Peng
Benlin Liu
Yao Wan
Danny Chen
28
0
0
07 Apr 2025
Taxonomy-Aware Evaluation of Vision-Language Models
Vésteinn Snæbjarnarson
Kevin Du
Niklas Stoehr
Serge J. Belongie
Ryan Cotterell
Nico Lang
Stella Frank
32
0
0
07 Apr 2025
Extremely Simple Out-of-distribution Detection for Audio-visual Generalized Zero-shot Learning
Yang Liu
Xinming Zhang
Jiale Du
Xinbo Gao
Jungong Han
OODD
49
0
0
28 Mar 2025
Re-Imagining Multimodal Instruction Tuning: A Representation View
Yiyang Liu
James Liang
Ruixiang Tang
Yugyung Lee
Majid Rabbani
...
Raghuveer M. Rao
Lifu Huang
Dongfang Liu
Qifan Wang
Cheng Han
135
0
0
02 Mar 2025
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
Zeqing Wang
Wentao Wan
Qiqing Lao
Runmeng Chen
Minjie Lang
Keze Wang
Liang Lin
LRM
103
3
0
17 Feb 2025
PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction
Hammad A. Ayyubi
Xuande Feng
Junzhang Liu
Xudong Lin
Zhecan Wang
Shih-Fu Chang
45
0
0
24 Jan 2025
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
Yue Zhang
Ziqiao Ma
Jialu Li
Yanyuan Qiao
Zun Wang
J. Chai
Qi Wu
Joey Tianyi Zhou
Parisa Kordjamshidi
LRM
63
18
0
31 Dec 2024
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization
Yue Zhang
Liqiang Jing
Vibhav Gogate
116
2
0
19 Dec 2024
Beyond Accuracy: On the Effects of Fine-tuning Towards Vision-Language Model's Prediction Rationality
Qitong Wang
Tang Li
Kien X. Nguyen
Xi Peng
85
0
0
17 Dec 2024
Renaissance: Investigating the Pretraining of Vision-Language Encoders
Clayton Fields
C. Kennington
VLM
29
0
0
11 Nov 2024
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
Sara Ghaboura
Ahmed Heakl
Omkar Thawakar
Ali Alharthi
Ines Riahi
Abduljalil Saif
Jorma T. Laaksonen
F. Khan
Salman Khan
Rao Muhammad Anwer
45
1
0
24 Oct 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
56
9
0
16 Oct 2024
Efficient and Effective Universal Adversarial Attack against Vision-Language Pre-training Models
Fan Yang
Yihao Huang
Kaidi Wang
Ling Shi
G. Pu
Yang Liu
Haoran Wang
AAML
VLM
23
2
0
15 Oct 2024
ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy
Hong Li
Zhiquan Tan
Xingyu Li
Weiran Huang
CLL
MoMe
43
1
0
14 Oct 2024
Recent Advances of Multimodal Continual Learning: A Comprehensive Survey
Dianzhi Yu
Xinni Zhang
Yankai Chen
Aiwei Liu
Yifei Zhang
Philip S. Yu
Irwin King
VLM
CLL
44
9
0
07 Oct 2024
NL-Eye: Abductive NLI for Images
Mor Ventura
Michael Toker
Nitay Calderon
Zorik Gekhman
Yonatan Bitton
Roi Reichart
28
1
0
03 Oct 2024
M²PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning
Taowen Wang
Yiyang Liu
James Liang
Junhan Zhao
Yiming Cui
...
Zenglin Xu
Cheng Han
Lifu Huang
Qifan Wang
Dongfang Liu
MLLM
VLM
LRM
22
16
0
24 Sep 2024
Finetuning CLIP to Reason about Pairwise Differences
Dylan Sam
Devin Willmott
João Dias Semedo
J. Zico Kolter
VLM
71
3
0
15 Sep 2024
Reference-free Hallucination Detection for Large Vision-Language Models
Qing Li
Chenyang Lyu
Jiahui Geng
Derui Zhu
Maxim Panov
Fakhri Karray
24
6
0
11 Aug 2024
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks
Juhwan Choi
Junehyoung Kwon
Jungmin Yun
Seunguk Yu
Youngbin Kim
43
1
0
29 Jul 2024
I Know About "Up"! Enhancing Spatial Reasoning in Visual Language Models Through 3D Reconstruction
Zaiqiao Meng
Hao Zhou
Yifang Chen
37
4
0
19 Jul 2024
Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning
Mustafa Dogan
İlker Kesen
Iacer Calixto
Aykut Erdem
Erkut Erdem
LRM
29
1
0
17 Jul 2024
Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison
Qian Yang
Weixiang Yan
Aishwarya Agrawal
CoGe
26
4
0
10 Jul 2024
Explainable Image Recognition via Enhanced Slot-attention Based Classifier
Bowen Wang
Liangzhi Li
Jiahao Zhang
Yuta Nakashima
Hajime Nagahara
OCL
44
0
0
08 Jul 2024
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
Yuxuan Wang
Yijun Liu
Fei Yu
Chen Huang
Kexin Li
Zhiguo Wan
Wanxiang Che
VLM
CoGe
35
5
0
01 Jul 2024
Transferring Knowledge from Large Foundation Models to Small Downstream Models
Shikai Qiu
Boran Han
Danielle C. Maddix
Shuai Zhang
Yuyang Wang
Andrew Gordon Wilson
38
1
0
11 Jun 2024
Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation
Jinyuan Li
Ziyan Li
Han Li
Jianfei Yu
Rui Xia
Di Sun
Gang Pan
40
2
0
11 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
AAML
VLM
40
13
0
08 Jun 2024
Universal Adversarial Perturbations for Vision-Language Pre-trained Models
Pengfei Zhang
Zi Huang
Guangdong Bai
AAML
39
11
0
09 May 2024
POV Learning: Individual Alignment of Multimodal Models using Human Perception
Simon Werner
Katharina Christ
Laura Bernardy
Marion G. Müller
Achim Rettinger
21
0
0
07 May 2024
HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
Wonjae Kim
Sanghyuk Chun
Taekyung Kim
Dongyoon Han
Sangdoo Yun
44
7
0
26 Apr 2024
MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning
Yifan Jiang
Jiarui Zhang
Kexuan Sun
Zhivar Sourati
Kian Ahrabian
Kaixin Ma
Filip Ilievski
Jay Pujara
LRM
37
11
0
21 Apr 2024
Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination
Dingchen Yang
Bowen Cao
Guang Chen
Changjun Jiang
51
7
0
21 Mar 2024
Effectiveness Assessment of Recent Large Vision-Language Models
Yao Jiang
Xinyu Yan
Ge-Peng Ji
Keren Fu
Meijun Sun
Huan Xiong
Deng-Ping Fan
Fahad Shahbaz Khan
34
14
0
07 Mar 2024
VEglue: Testing Visual Entailment Systems via Object-Aligned Joint Erasing
Zhiyuan Chang
Mingyang Li
Junjie Wang
Cheng Li
Qing Wang
22
0
0
05 Mar 2024
TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning
Kate Sanders
Nathaniel Weir
Benjamin Van Durme
LRM
31
11
0
29 Feb 2024
Probing Multimodal Large Language Models for Global and Local Semantic Representations
Mingxu Tao
Quzhe Huang
Kun Xu
Liwei Chen
Yansong Feng
Dongyan Zhao
26
5
0
27 Feb 2024
ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks
Yang Liu
Xiaomin Yu
Gongyu Zhang
Christos Bergeles
Prokar Dasgupta
Alejandro Granados
Sebastien Ourselin
45
2
0
27 Feb 2024
Multimodal Instruction Tuning with Conditional Mixture of LoRA
Ying Shen
Zhiyang Xu
Qifan Wang
Yu Cheng
Wenpeng Yin
Lifu Huang
39
13
0
24 Feb 2024
Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
Yunxin Li
Xinyu Chen
Baotian Hu
Haoyuan Shi
Min-Ling Zhang
44
3
0
21 Feb 2024
Assessing News Thumbnail Representativeness: Counterfactual text can enhance the cross-modal matching ability
Yejun Yoon
Seunghyun Yoon
Kunwoo Park
21
0
0
17 Feb 2024
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition
Jinyuan Li
Han Li
Di Sun
Jiahao Wang
Wenkun Zhang
Zan Wang
Gang Pan
34
7
0
15 Feb 2024
LCV2: An Efficient Pretraining-Free Framework for Grounded Visual Question Answering
Yuhan Chen
Lumei Su
Lihua Chen
Zhiwei Lin
MLLM
17
1
0
29 Jan 2024
Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks
Yuliang Cai
Mohammad Rostami
33
4
0
27 Jan 2024
Memory-Inspired Temporal Prompt Interaction for Text-Image Classification
Xinyao Yu
Hao Sun
Ziwei Niu
Rui Qin
Zhenjia Bai
Yen-Wei Chen
Lanfen Lin
VLM
39
2
0
26 Jan 2024
p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models
Haoyuan Wu
Xinyun Zhang
Peng Xu
Peiyu Liao
Xufeng Yao
Bei Yu
VLM
19
0
0
17 Dec 2023
MAFA: Managing False Negatives for Vision-Language Pre-training
Jaeseok Byun
Dohoon Kim
Taesup Moon
VLM
13
4
0
11 Dec 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
J. Park
Jack Hessel
Khyathi Raghavi Chandu
Paul Pu Liang
Ximing Lu
...
Youngjae Yu
Qiuyuan Huang
Jianfeng Gao
Ali Farhadi
Yejin Choi
VLM
26
11
0
08 Dec 2023