arXiv:1808.10584
Learning to Describe Differences Between Pairs of Similar Images

31 August 2018
Harsh Jhamtani
Taylor Berg-Kirkpatrick

Papers citing "Learning to Describe Differences Between Pairs of Similar Images"

30 / 30 papers shown
Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization
Iñigo Pikabea
Iñaki Lacunza
Oriol Pareras
Carlos Escolano
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
VLM
56
0
0
28 Mar 2025
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Jiayi Ji
Jie Lou
Debing Zhang
Rongrong Ji
95
0
0
26 Mar 2025
Natural Language Generation from Visual Sequences: Challenges and Future Directions
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
EGVM
260
0
0
18 Feb 2025
Progress-Aware Video Frame Captioning
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
100
1
0
03 Dec 2024
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
Manu Gaur
Darshan Singh
Makarand Tapaswi
157
1
0
04 Sep 2024
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Qirui Jiao
Daoyuan Chen
Yilun Huang
Yaliang Li
Ying Shen
VLM
40
5
0
08 Aug 2024
Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning
Yunbin Tu
Liang-Sheng Li
Li Su
Chenggang Yan
Qin Huang
42
5
0
16 Jul 2024
Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning
Zishan Gu
Fenglin Liu
Changchang Yin
Ping Zhang
LRM
LM&MA
58
0
0
19 May 2024
Learning to Visually Connect Actions and their Effects
Eric Peh
Paritosh Parmar
Basura Fernando
24
2
0
19 Jan 2024
C-NERF: Representing Scene Changes as Directional Consistency Difference-based NeRF
Rui Huang
Binbin Jiang
Qingyi Zhao
William Wang
Yuxiang Zhang
Qing Guo
35
2
0
05 Dec 2023
Self-supervised Cross-view Representation Reconstruction for Change Captioning
Yunbin Tu
Liang Li
Filippos Christianos
Zheng-Jun Zha
Zhibin Li
Qingming Huang
SSL
26
24
0
28 Sep 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Yonatan Bitton
Hritik Bansal
Jack Hessel
Rulin Shao
Wanrong Zhu
Anas Awadalla
Josh Gardner
Rohan Taori
L. Schmidt
VLM
31
77
0
12 Aug 2023
Visual Instruction Tuning with Polite Flamingo
Delong Chen
Jianfeng Liu
Wenliang Dai
Baoyuan Wang
MLLM
34
42
0
03 Jul 2023
Neighborhood Contrastive Transformer for Change Captioning
Yunbin Tu
Liang Li
Li Su
Kelvin Lu
Qin Huang
ViT
21
14
0
06 Mar 2023
The Change You Want to See
Ragav Sachdeva
Andrew Zisserman
31
3
0
28 Sep 2022
Reasoning about Actions over Visual and Linguistic Modalities: A Survey
Shailaja Keyur Sampat
Maitreya Patel
Subhasish Das
Yezhou Yang
Chitta Baral
ReLM
LM&Ro
LRM
27
12
0
15 Jul 2022
CLIP4IDC: CLIP for Image Difference Captioning
Zixin Guo
T. Wang
Jorma T. Laaksonen
VLM
29
27
0
01 Jun 2022
Training and challenging models for text-guided fashion image retrieval
Eric Dodds
Jack Culpepper
Gaurav Srivastava
18
8
0
23 Apr 2022
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
Yuxuan Wang
Difei Gao
Licheng Yu
Stan Weixian Lei
Matt Feiszli
Mike Zheng Shou
17
24
0
01 Apr 2022
Image Retrieval from Contextual Descriptions
Benno Krojer
Vaibhav Adlakha
Vibhav Vineet
Yash Goyal
Edoardo Ponti
Siva Reddy
19
29
0
29 Mar 2022
Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene
Duo Zheng
Fandong Meng
Q. Si
Hairun Fan
Zipeng Xu
Jie Zhou
Fangxiang Feng
Xiaojie Wang
27
0
0
16 Mar 2022
R$^3$Net: Relation-embedded Representation Reconstruction Network for Change Captioning
Yunbin Tu
Liang Li
C. Yan
Shengxiang Gao
Zhengtao Yu
35
22
0
20 Oct 2021
Truth-Conditional Captioning of Time Series Data
Harsh Jhamtani
Taylor Berg-Kirkpatrick
AI4TS
43
7
0
05 Oct 2021
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
Zheyuan Liu
Cristian Rodriguez-Opazo
Damien Teney
Stephen Gould
VLM
22
192
0
09 Aug 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
67
254
0
14 Jul 2021
Describing and Localizing Multiple Changes with Transformers
Yue Qiu
Shintaro Yamamoto
Kodai Nakashima
Ryota Suzuki
K. Iwata
Hirokatsu Kataoka
Y. Satoh
30
55
0
25 Mar 2021
Quantifying Learnability and Describability of Visual Concepts Emerging in Representation Learning
Iro Laina
Ruth C. Fong
Andrea Vedaldi
OCL
33
13
0
27 Oct 2020
Like hiking? You probably enjoy nature: Persona-grounded Dialog with Commonsense Expansions
Bodhisattwa Prasad Majumder
Harsh Jhamtani
Taylor Berg-Kirkpatrick
Julian McAuley
30
85
0
07 Oct 2020
Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning
Xiangxi Shi
Xu Yang
Jiuxiang Gu
Chenyu You
Jianfei Cai
16
52
0
30 Sep 2020
Neural Naturalist: Generating Fine-Grained Image Comparisons
Maxwell Forbes
Christine Kaeser-Chen
Piyush Sharma
Serge J. Belongie
VLM
64
56
0
09 Sep 2019