ResearchTrend.AI
  • Papers
  • Communities
  • Organizations
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1811.00491
  4. Cited By
A Corpus for Reasoning About Natural Language Grounded in Photographs
v1v2v3 (latest)

A Corpus for Reasoning About Natural Language Grounded in Photographs

1 November 2018
Alane Suhr
Stephanie Zhou
Ally Zhang
Iris Zhang
Huajun Bai
Yoav Artzi
    LRM
ArXiv (abs)PDFHTML

Papers citing "A Corpus for Reasoning About Natural Language Grounded in Photographs"

50 / 419 papers shown
Title
LLaVA-OneVision: Easy Visual Task Transfer
LLaVA-OneVision: Easy Visual Task Transfer
Bo Li
Yuanhan Zhang
Dong Guo
Renrui Zhang
Feng Li
Hao Zhang
Kaichen Zhang
Yanwei Li
Ziwei Liu
Chunyuan Li
MLLMSyDaVLM
196
870
0
06 Aug 2024
ExoViP: Step-by-step Verification and Exploration with Exoskeleton
  Modules for Compositional Visual Reasoning
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Yanjie Wang
Alan Yuille
Zhuowan Li
Zilong Zheng
LRM
125
5
0
05 Aug 2024
The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models
The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models
Simone Caldarella
Massimiliano Mancini
Elisa Ricci
Rahaf Aljundi
PILM
89
2
0
02 Aug 2024
Pyramid Coder: Hierarchical Code Generator for Compositional Visual
  Question Answering
Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering
Ruoyue Shen
Nakamasa Inoue
Koichi Shinoda
84
1
0
30 Jul 2024
MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
Jihyung Kil
Zheda Mai
Justin Lee
Zihe Wang
Kerrie Cheng
Jingyan Bai
Ye Liu
A. Chowdhury
Wei-Lun Chao
CoGeVLM
163
19
0
23 Jul 2024
MIBench: Evaluating Multimodal Large Language Models over Multiple
  Images
MIBench: Evaluating Multimodal Large Language Models over Multiple Images
Haowei Liu
Xi Zhang
Haiyang Xu
Yaya Shi
Chaoya Jiang
...
Ji Zhang
Fei Huang
Chunfen Yuan
Bing Li
Weiming Hu
VLM
100
15
0
21 Jul 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large
  Multimodal Models
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
Feng Li
Renrui Zhang
Hao Zhang
Yuanhan Zhang
Bo Li
Wei Li
Zejun Ma
Chunyuan Li
MLLMVLM
163
234
0
10 Jul 2024
Pseudo-triplet Guided Few-shot Composed Image Retrieval
Pseudo-triplet Guided Few-shot Composed Image Retrieval
Bohan Hou
Haoqiang Lin
Haokun Wen
Meng Liu
Xuemeng Song
105
5
0
08 Jul 2024
OneDiff: A Generalist Model for Image Difference Captioning
OneDiff: A Generalist Model for Image Difference Captioning
Erdong Hu
Longteng Guo
Tongtian Yue
Zijia Zhao
Shuning Xue
Jing Liu
VLM
139
2
0
08 Jul 2024
OmChat: A Recipe to Train Multimodal Language Models with Strong Long
  Context and Video Understanding
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding
Tiancheng Zhao
Qianqian Zhang
Kyusong Lee
Peng Liu
Lu Zhang
Chunxin Fang
Jiajia Liao
Kelei Jiang
Yibo Ma
Ruochen Xu
MLLMVLM
101
5
0
06 Jul 2024
HEMM: Holistic Evaluation of Multimodal Foundation Models
HEMM: Holistic Evaluation of Multimodal Foundation Models
Paul Pu Liang
Akshay Goindani
Talha Chafekar
Leena Mathur
Haofei Yu
Ruslan Salakhutdinov
Louis-Philippe Morency
102
17
0
03 Jul 2024
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding
  Evaluation
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
Yuxuan Wang
Yijun Liu
Fei Yu
Chen Huang
Kexin Li
Zhiguo Wan
Wanxiang Che
VLMCoGe
81
5
0
01 Jul 2024
FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large
  Language Models
FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models
Yiyuan Li
Shichao Sun
Pengfei Liu
LRM
147
0
0
01 Jul 2024
MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning
MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning
Jiali Cheng
Hadi Amiri
BDL
114
4
0
21 Jun 2024
Revealing Vision-Language Integration in the Brain with Multimodal
  Networks
Revealing Vision-Language Integration in the Brain with Multimodal Networks
Vighnesh Subramaniam
C. Conwell
Christopher Wang
Gabriel Kreiman
Boris Katz
Ignacio Cases
Andrei Barbu
121
12
0
20 Jun 2024
Contrast Sets for Evaluating Language-Guided Robot Policies
Contrast Sets for Evaluating Language-Guided Robot Policies
Abrar Anwar
Rohan Gupta
Jesse Thomason
102
3
0
19 Jun 2024
VDebugger: Harnessing Execution Feedback for Debugging Visual Programs
VDebugger: Harnessing Execution Feedback for Debugging Visual Programs
Xueqing Wu
Zongyu Lin
Songyan Zhao
Te-Lin Wu
Pan Lu
Nanyun Peng
Kai-Wei Chang
LRM
109
2
0
19 Jun 2024
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and
  Instruction-Tuning Dataset for LVLMs
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs
Ziyu Liu
Tao Chu
Yuhang Zang
Xilin Wei
Xiaoyi Dong
...
Zijian Liang
Yuanjun Xiong
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM
104
43
0
17 Jun 2024
MuirBench: A Comprehensive Benchmark for Robust Multi-image
  Understanding
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Fei Wang
Xingyu Fu
James Y. Huang
Zekun Li
Qin Liu
...
Kai-Wei Chang
Dan Roth
Sheng Zhang
Hoifung Poon
Muhao Chen
VLM
148
59
0
13 Jun 2024
Comparison Visual Instruction Tuning
Comparison Visual Instruction Tuning
Wei Lin
M. Jehanzeb Mirza
Sivan Doveh
Rogerio Feris
Raja Giryes
Sepp Hochreiter
Leonid Karlinsky
98
4
0
13 Jun 2024
ReMI: A Dataset for Reasoning with Multiple Images
ReMI: A Dataset for Reasoning with Multiple Images
Mehran Kazemi
Nishanth Dikkala
Ankit Anand
Petar Dević
Ishita Dasgupta
...
Bahare Fatemi
Pranjal Awasthi
Dee Guo
Sreenivas Gollapudi
Ahmed Qureshi
LRMVLM
120
17
0
13 Jun 2024
Mixture of Rationale: Multi-Modal Reasoning Mixture for Visual Question
  Answering
Mixture of Rationale: Multi-Modal Reasoning Mixture for Visual Question Answering
Tao Li
Linjun Shou
Xuejun Liu
80
0
0
03 Jun 2024
Dataset Growth
Dataset Growth
Ziheng Qin
Zhaopan Xu
Yukun Zhou
Zangwei Zheng
Zebang Cheng
...
Xiaojiang Peng
Radu Timofte
Hongxun Yao
Kai Wang
Yang You
DD
69
2
0
28 May 2024
The Evolution of Multimodal Model Architectures
The Evolution of Multimodal Model Architectures
S. Wadekar
Abhishek Chaurasia
Aman Chadha
Eugenio Culurciello
121
18
0
28 May 2024
From Text to Pixel: Advancing Long-Context Understanding in MLLMs
From Text to Pixel: Advancing Long-Context Understanding in MLLMs
Yujie Lu
Xiujun Li
Tsu-Jui Fu
Miguel P. Eckstein
William Y. Wang
VLM
75
2
0
23 May 2024
EMR-Merging: Tuning-Free High-Performance Model Merging
EMR-Merging: Tuning-Free High-Performance Model Merging
Chenyu Huang
Peng Ye
Tao Chen
Tong He
Xiangyu Yue
Wanli Ouyang
MoMe
91
45
0
23 May 2024
ColorFoil: Investigating Color Blindness in Large Vision and Language Models
ColorFoil: Investigating Color Blindness in Large Vision and Language Models
Ahnaf Mozib Samin
M. F. Ahmed
Md. Mushtaq Shahriyar Rafee
VLM
121
3
0
19 May 2024
What matters when building vision-language models?
What matters when building vision-language models?
Hugo Laurençon
Léo Tronchon
Matthieu Cord
Victor Sanh
VLM
120
178
0
03 May 2024
MANTIS: Interleaved Multi-Image Instruction Tuning
MANTIS: Interleaved Multi-Image Instruction Tuning
Dongfu Jiang
Xuan He
Huaye Zeng
Cong Wei
Max Ku
Qian Liu
Wenhu Chen
VLMMLLM
145
125
0
02 May 2024
Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed
  Image Retrieval
Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval
Young Kyun Jang
Dat Huynh
Ashish Shah
Wen-Kai Chen
Ser-Nam Lim
141
20
0
01 May 2024
Visual Delta Generator with Large Multi-modal Models for Semi-supervised
  Composed Image Retrieval
Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
Young Kyun Jang
Donghyun Kim
Zihang Meng
Dat Huynh
Ser-Nam Lim
84
12
0
23 Apr 2024
Multi-Head Mixture-of-Experts
Multi-Head Mixture-of-Experts
Xun Wu
Shaohan Huang
Wenhui Wang
Furu Wei
MoE
103
15
0
23 Apr 2024
Continual Learning for Smart City: A Survey
Continual Learning for Smart City: A Survey
Li Yang
Zhipeng Luo
Shi-sheng Zhang
Fei Teng
Tian-Jie Li
HAI
98
9
0
01 Apr 2024
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Kai Zhang
Yi Luan
Hexiang Hu
Kenton Lee
Siyuan Qiao
Wenhu Chen
Yu-Chuan Su
Ming-Wei Chang
VLMLRM
113
40
0
28 Mar 2024
Not All Attention is Needed: Parameter and Computation Efficient
  Transfer Learning for Multi-modal Large Language Models
Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models
Qiong Wu
Weihao Ye
Yiyi Zhou
Xiaoshuai Sun
Rongrong Ji
MoE
88
1
0
22 Mar 2024
What Are Tools Anyway? A Survey from the Language Model Perspective
What Are Tools Anyway? A Survey from the Language Model Perspective
Zhiruo Wang
Zhoujun Cheng
Hao Zhu
Daniel Fried
Graham Neubig
130
33
0
18 Mar 2024
Introducing Routing Functions to Vision-Language Parameter-Efficient
  Fine-Tuning with Low-Rank Bottlenecks
Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks
Tingyu Qu
Tinne Tuytelaars
Marie-Francine Moens
MoE
90
2
0
14 Mar 2024
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for
  Accelerating Vision-Language Transformer
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
Jianjian Cao
Peng Ye
Shengze Li
Chong Yu
Yansong Tang
Jiwen Lu
Tao Chen
88
23
0
05 Mar 2024
What Is Missing in Multilingual Visual Reasoning and How to Fix It
What Is Missing in Multilingual Visual Reasoning and How to Fix It
Yueqi Song
Simran Khanuja
Graham Neubig
VLMLRM
217
6
0
03 Mar 2024
Browse and Concentrate: Comprehending Multimodal Content via prior-LLM
  Context Fusion
Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion
Ziyue Wang
Chi Chen
Yiqi Zhu
Ziyue Wang
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Maosong Sun
Yang Liu
118
5
0
19 Feb 2024
DoRA: Weight-Decomposed Low-Rank Adaptation
DoRA: Weight-Decomposed Low-Rank Adaptation
Shih-yang Liu
Chien-Yi Wang
Hongxu Yin
Pavlo Molchanov
Yu-Chiang Frank Wang
Kwang-Ting Cheng
Min-Hung Chen
187
423
0
14 Feb 2024
GITA: Graph to Visual and Textual Integration for Vision-Language Graph
  Reasoning
GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning
Yanbin Wei
Shuai Fu
Weisen Jiang
Zejian Zhang
Zhixiong Zeng
Qi Wu
James T. Kwok
Yu Zhang
97
16
0
03 Feb 2024
Dynamic Transformer Architecture for Continual Learning of Multimodal
  Tasks
Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks
Yuliang Cai
Mohammad Rostami
81
4
0
27 Jan 2024
Efficient Vision-and-Language Pre-training with Text-Relevant Image
  Patch Selection
Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
Wei Ye
Chaoya Jiang
Haiyang Xu
Chenhao Ye
Chenliang Li
Mingshi Yan
Shikun Zhang
Songhang Huang
Fei Huang
VLM
86
0
0
11 Jan 2024
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual
  Concept Understanding
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding
Yatong Bai
Utsav Garg
Apaar Shanker
Haoming Zhang
Samyak Parajuli
...
Eugenia D Fomitcheva
E. Branson
Aerin Kim
Somayeh Sojoudi
Kyunghyun Cho
60
2
0
09 Jan 2024
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision,
  Language, Audio, and Action
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLMMLLM
102
175
0
28 Dec 2023
Jack of All Tasks, Master of Many: Designing General-purpose
  Coarse-to-Fine Vision-Language Model
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLMMLLM
179
36
0
19 Dec 2023
VQA4CIR: Boosting Composed Image Retrieval with Visual Question
  Answering
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering
Chun-Mei Feng
Yang Bai
Yaoyu Zhang
Zhen Li
Salman Khan
Wangmeng Zuo
Xinxing Xu
Rick Siow Mong Goh
Yong-Jin Liu
98
7
0
19 Dec 2023
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update
Zhi Gao
Yuntao Du
Xintong Zhang
Xiaojian Ma
Wenjuan Han
Song-Chun Zhu
Qing Li
LLMAGVLM
141
25
0
18 Dec 2023
Training-free Zero-shot Composed Image Retrieval with Local Concept
  Reranking
Training-free Zero-shot Composed Image Retrieval with Local Concept Reranking
Shitong Sun
Fanghua Ye
Shaogang Gong
88
15
0
14 Dec 2023
Previous
123456789
Next