Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1808.10584
Cited By
Learning to Describe Differences Between Pairs of Similar Images
31 August 2018
Harsh Jhamtani
Taylor Berg-Kirkpatrick
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Learning to Describe Differences Between Pairs of Similar Images"
29 / 29 papers shown
Title
Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization
Iñigo Pikabea
Iñaki Lacunza
Oriol Pareras
Carlos Escolano
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
VLM
56
0
0
28 Mar 2025
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Jiayi Ji
Jie Lou
Debing Zhang
Rongrong Ji
95
0
0
26 Mar 2025
Natural Language Generation from Visual Sequences: Challenges and Future Directions
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
EGVM
252
0
0
18 Feb 2025
Progress-Aware Video Frame Captioning
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
100
1
0
03 Dec 2024
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
Manu Gaur
Darshan Singh
Makarand Tapaswi
151
1
0
04 Sep 2024
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Qirui Jiao
Daoyuan Chen
Yilun Huang
Yaliang Li
Ying Shen
VLM
40
5
0
08 Aug 2024
Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning
Yunbin Tu
Liang-Sheng Li
Li Su
Chenggang Yan
Qin Huang
40
5
0
16 Jul 2024
Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning
Zishan Gu
Fenglin Liu
Changchang Yin
Ping Zhang
LRM
LM&MA
58
0
0
19 May 2024
Learning to Visually Connect Actions and their Effects
Eric Peh
Paritosh Parmar
Basura Fernando
24
2
0
19 Jan 2024
C-NERF: Representing Scene Changes as Directional Consistency Difference-based NeRF
Rui Huang
Binbin Jiang
Qingyi Zhao
William Wang
Yuxiang Zhang
Qing Guo
30
2
0
05 Dec 2023
Self-supervised Cross-view Representation Reconstruction for Change Captioning
Yunbin Tu
Liang Li
Filippos Christianos
Zheng-Jun Zha
Zhibin Li
Qingming Huang
SSL
24
24
0
28 Sep 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Yonatan Bitton
Hritik Bansal
Jack Hessel
Rulin Shao
Wanrong Zhu
Anas Awadalla
Josh Gardner
Rohan Taori
L. Schimdt
VLM
31
77
0
12 Aug 2023
Visual Instruction Tuning with Polite Flamingo
Delong Chen
Jianfeng Liu
Wenliang Dai
Baoyuan Wang
MLLM
34
42
0
03 Jul 2023
Neighborhood Contrastive Transformer for Change Captioning
Yunbin Tu
Liang Li
Li Su
Kelvin Lu
Qin Huang
ViT
19
14
0
06 Mar 2023
The Change You Want to See
Ragav Sachdeva
Andrew Zisserman
31
3
0
28 Sep 2022
Reasoning about Actions over Visual and Linguistic Modalities: A Survey
Shailaja Keyur Sampat
Maitreya Patel
Subhasish Das
Yezhou Yang
Chitta Baral
ReLM
LM&Ro
LRM
27
12
0
15 Jul 2022
CLIP4IDC: CLIP for Image Difference Captioning
Zixin Guo
T. Wang
Jorma T. Laaksonen
VLM
29
27
0
01 Jun 2022
Training and challenging models for text-guided fashion image retrieval
Eric Dodds
Jack Culpepper
Gaurav Srivastava
18
8
0
23 Apr 2022
Image Retrieval from Contextual Descriptions
Benno Krojer
Vaibhav Adlakha
Vibhav Vineet
Yash Goyal
Edoardo Ponti
Siva Reddy
19
29
0
29 Mar 2022
Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene
Duo Zheng
Fandong Meng
Q. Si
Hairun Fan
Zipeng Xu
Jie Zhou
Fangxiang Feng
Xiaojie Wang
27
0
0
16 Mar 2022
R
3
^3
3
Net:Relation-embedded Representation Reconstruction Network for Change Captioning
Yunbin Tu
Liang Li
C. Yan
Shengxiang Gao
Zhengtao Yu
32
22
0
20 Oct 2021
Truth-Conditional Captioning of Time Series Data
Harsh Jhamtani
Taylor Berg-Kirkpatrick
AI4TS
38
7
0
05 Oct 2021
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
Zheyuan Liu
Cristian Rodriguez-Opazo
Damien Teney
Stephen Gould
VLM
22
192
0
09 Aug 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
67
254
0
14 Jul 2021
Describing and Localizing Multiple Changes with Transformers
Yue Qiu
Shintaro Yamamoto
Kodai Nakashima
Ryota Suzuki
K. Iwata
Hirokatsu Kataoka
Y. Satoh
30
55
0
25 Mar 2021
Quantifying Learnability and Describability of Visual Concepts Emerging in Representation Learning
Iro Laina
Ruth C. Fong
Andrea Vedaldi
OCL
33
13
0
27 Oct 2020
Like hiking? You probably enjoy nature: Persona-grounded Dialog with Commonsense Expansions
Bodhisattwa Prasad Majumder
Harsh Jhamtani
Taylor Berg-Kirkpatrick
Julian McAuley
30
85
0
07 Oct 2020
Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning
Xiangxi Shi
Xu Yang
Jiuxiang Gu
Chenyu You
Jianfei Cai
16
52
0
30 Sep 2020
Neural Naturalist: Generating Fine-Grained Image Comparisons
Maxwell Forbes
Christine Kaeser-Chen
Piyush Sharma
Serge J. Belongie
VLM
64
56
0
09 Sep 2019
1