A Corpus for Reasoning About Natural Language Grounded in Photographs

1 November 2018
Alane Suhr
Stephanie Zhou
Ally Zhang
Iris Zhang
Huajun Bai
Yoav Artzi
LRM
arXiv: 1811.00491

Papers citing "A Corpus for Reasoning About Natural Language Grounded in Photographs"

50 / 419 papers shown
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
Gregor Geigle
Abhay Jain
Radu Timofte
Goran Glavaš
VLM, MLLM
123
32
0
13 Jul 2023
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
66
5
0
06 Jul 2023
Several categories of Large Language Models (LLMs): A Short Survey
Saurabh Pahune
Manoj Chandrasekharan
AILaw
69
17
0
05 Jul 2023
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
Yan Zeng
Hanbo Zhang
Jiani Zheng
Jiangnan Xia
Guoqiang Wei
Yang Wei
Yuchen Zhang
Tao Kong
MLLM
113
79
0
05 Jul 2023
Visual Instruction Tuning with Polite Flamingo
Delong Chen
Jianfeng Liu
Wenliang Dai
Baoyuan Wang
MLLM
126
48
0
03 Jul 2023
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering
A. S. Penamakuri
Manish Gupta
Mithun Das Gupta
Anand Mishra
78
7
0
29 Jun 2023
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models
Avinash Madasu
Vasudev Lal
CoGe
116
3
0
28 Jun 2023
Approximated Prompt Tuning for Vision-Language Pre-trained Models
Qiong Wu
Shubin Huang
Yiyi Zhou
Pingyang Dai
Annan Shu
Guannan Jiang
Rongrong Ji
VLM, VPVLM
42
2
0
27 Jun 2023
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards
Yangjun Mao
Jun Xiao
Dong Zhang
Meng Cao
Jian Shao
Yueting Zhuang
Long Chen
EGVM
76
9
0
25 Jun 2023
Zero-shot Composed Text-Image Retrieval
Yikun Liu
Jiangchao Yao
Ya Zhang
Yanfeng Wang
Weidi Xie
83
25
0
12 Jun 2023
Global and Local Semantic Completion Learning for Vision-Language Pre-training
Rong-Cheng Tu
Yatai Ji
Jie Jiang
Weijie Kong
Chengfei Cai
Wenzhe Zhao
Hongfa Wang
Yujiu Yang
Wei Liu
VLM
100
4
0
12 Jun 2023
Modular Visual Question Answering via Code Generation
Sanjay Subramanian
Medhini Narasimhan
Kushal Khangaonkar
Kevin Kaichuang Yang
Arsha Nagrani
Cordelia Schmid
Andy Zeng
Trevor Darrell
Dan Klein
77
51
0
08 Jun 2023
UniBoost: Unsupervised Unimodal Pre-training for Boosting Zero-shot Vision-Language Tasks
Yanan Sun
Zi-Qi Zhong
Qi Fan
Chi-Keung Tang
Yu-Wing Tai
VLM
78
4
0
07 Jun 2023
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models
Shuo Chen
Jindong Gu
Zhen Han
Yunpu Ma
Philip Torr
Volker Tresp
VPVLM, VLM
129
21
0
03 Jun 2023
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Qiangchang Wang
Yilong Yin
110
0
0
02 Jun 2023
GPT4Image: Large Pre-trained Models Help Vision Models Learn Better on Perception Task
Ning Ding
Yehui Tang
Zhongqian Fu
Chaoting Xu
Kai Han
Yunhe Wang
MLLM, VLM
57
2
0
01 Jun 2023
ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Xiao Xu
Bei Li
Chenfei Wu
Shao-Yen Tseng
Anahita Bhiwandiwalla
Shachar Rosenman
Vasudev Lal
Wanxiang Che
Nan Duan
AIFin, VLM
83
4
0
31 May 2023
Too Large; Data Reduction for Vision-Language Pre-Training
Alex Jinpeng Wang
Kevin Qinghong Lin
David Junhao Zhang
Stan Weixian Lei
Mike Zheng Shou
VLM
93
24
0
31 May 2023
Z-GMOT: Zero-shot Generic Multiple Object Tracking
Kim Hoang Tran
Anh Duy Le Dinh
Tien-Phat Nguyen
Thinh Phan
Pha Nguyen
Khoa Luu
Don Adjeroh
Gianfranco Doretto
Ngan Hoang Le
VOT
99
7
0
28 May 2023
PuMer: Pruning and Merging Tokens for Efficient Vision Language Models
Qingqing Cao
Bhargavi Paranjape
Hannaneh Hajishirzi
MLLM, VLM
83
27
0
27 May 2023
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
Dachuan Shi
Chaofan Tao
Anyi Rao
Zhendong Yang
Chun Yuan
Jiaqi Wang
VLM
138
23
0
27 May 2023
MPCHAT: Towards Multimodal Persona-Grounded Conversation
Jaewoo Ahn
Yeda Song
Sangdoo Yun
Gunhee Kim
60
22
0
27 May 2023
Are Diffusion Models Vision-And-Language Reasoners?
Benno Krojer
Elinor Poole-Dayan
Vikram S. Voleti
Christopher Pal
Siva Reddy
111
14
0
25 May 2023
Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder
Zheyuan Liu
Weixuan Sun
Damien Teney
Stephen Gould
96
19
0
25 May 2023
Weakly Supervised Vision-and-Language Pre-training with Relative Representations
Chi Chen
Peng Li
Maosong Sun
Yang Liu
75
2
0
24 May 2023
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models
Zekun Wang
Jingchang Chen
Wangchunshu Zhou
Haichao Zhu
Jiafeng Liang
Liping Shan
Ming Liu
Dongliang Xu
Qing Yang
Bing Qin
VLM
104
5
0
24 May 2023
Text encoders bottleneck compositionality in contrastive vision-language models
Amita Kamath
Jack Hessel
Kai-Wei Chang
CoGe, CLIP, VLM
99
21
0
24 May 2023
Meta-learning For Vision-and-language Cross-lingual Transfer
Hanxu Hu
Frank Keller
VLM
92
2
0
24 May 2023
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
Woojeong Jin
Subhabrata Mukherjee
Yu Cheng
Yelong Shen
Weizhu Chen
Ahmed Hassan Awadallah
Damien Jose
Xiang Ren
ObjD, VLM
121
8
0
24 May 2023
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
Emanuele Bugliarello
Aida Nematzadeh
Lisa Anne Hendricks
SSL
115
5
0
23 May 2023
Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks
Sherzod Hakimov
David Schlangen
VLM
73
5
0
23 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
Qingbin Liu
87
1
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM, MLLM, ObjD
151
122
0
18 May 2023
Probing the Role of Positional Information in Vision-Language Models
Philipp J. Rösch
Jindrich Libovický
65
8
0
17 May 2023
RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training
Chulun Zhou
Yunlong Liang
Fandong Meng
Jinan Xu
Jinsong Su
Jie Zhou
VLM
71
4
0
13 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
Emanuele Bugliarello
Laurent Sartran
Aishwarya Agrawal
Lisa Anne Hendricks
Aida Nematzadeh
VLM
99
25
0
12 May 2023
Egocentric Hierarchical Visual Semantics
L. Erculiani
A. Bontempelli
Andrea Passerini
Fausto Giunchiglia
OCL
45
2
0
09 May 2023
Scene Text Recognition with Image-Text Matching-guided Dictionary
Jiajun Wei
Hongjian Zhan
X. Tu
Yue Lu
Umapada Pal
VLM
53
0
0
08 May 2023
Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation
Chaoya Jiang
Wei Ye
Haiyang Xu
Miang yan
Shikun Zhang
Jie Zhang
Fei Huang
VLM
106
16
0
08 May 2023
Visual Reasoning: from State to Transformation
Xin Hong
Yanyan Lan
Liang Pang
Jiafeng Guo
Xueqi Cheng
LRM
69
4
0
02 May 2023
An Empirical Study of Multimodal Model Merging
Yi-Lin Sung
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Joey Tianyi Zhou
Lijuan Wang
MoMe
125
42
0
28 Apr 2023
Multi-Modal Representation Learning with Text-Driven Soft Masks
Jaeyoo Park
Bohyung Han
SSL
68
4
0
03 Apr 2023
Vision-Language Models for Vision Tasks: A Survey
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
VLM
169
553
0
03 Apr 2023
Bi-directional Training for Composed Image Retrieval via Text Prompt Learning
Zheyuan Liu
Weixuan Sun
Yicong Hong
Damien Teney
Stephen Gould
132
34
0
29 Mar 2023
Natural Language Reasoning, A Survey
Fei Yu
Hongbo Zhang
Prayag Tiwari
Benyou Wang
ReLM, LRM
182
64
0
26 Mar 2023
Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation
Yuliang Cai
Jesse Thomason
Mohammad Rostami
VLM, CLL
83
12
0
25 Mar 2023
Accelerating Vision-Language Pretraining with Free Language Modeling
Teng Wang
Yixiao Ge
Feng Zheng
Ran Cheng
Ying Shan
Xiaohu Qie
Ping Luo
VLM, MLLM
133
10
0
24 Mar 2023
CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion
Geonmo Gu
Sanghyuk Chun
Wonjae Kim
HeeJae Jun
Yoohoon Kang
Sangdoo Yun
DiffM
149
59
0
21 Mar 2023
Data Roaming and Quality Assessment for Composed Image Retrieval
Matan Levy
Rami Ben-Ari
N. Darshan
Dani Lischinski
107
28
0
16 Mar 2023
Scaling Vision-Language Models with Sparse Mixture of Experts
Sheng Shen
Z. Yao
Chunyuan Li
Trevor Darrell
Kurt Keutzer
Yuxiong He
VLM, MoE
83
68
0
13 Mar 2023