Learning to Describe Differences Between Pairs of Similar Images

31 August 2018

Papers citing "Learning to Describe Differences Between Pairs of Similar Images"

30 / 30 papers shown

Title
Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization Iñigo Pikabea Iñaki Lacunza Oriol Pareras Carlos Escolano Aitor Gonzalez-Agirre Javier Hernando Marta Villegas VLM 56 0 0 28 Mar 2025
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning Yiwei Ma Guohai Xu Xiaoshuai Sun Jiayi Ji Jie Lou Debing Zhang Rongrong Ji 95 0 0 26 Mar 2025
Natural Language Generation from Visual Sequences: Challenges and Future Directions Aditya K Surikuchi Raquel Fernández Sandro Pezzelle EGVM 260 0 0 18 Feb 2025
Progress-Aware Video Frame Captioning Zihui Xue Joungbin An Xitong Yang Kristen Grauman 100 1 0 03 Dec 2024
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning Manu Gaur Darshan Singh Makarand Tapaswi 157 1 0 04 Sep 2024
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models Qirui Jiao Daoyuan Chen Yilun Huang Yaliang Li Ying Shen VLM 40 5 0 08 Aug 2024
Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning Yunbin Tu Liang-Sheng Li Li Su Chenggang Yan Qin Huang 42 5 0 16 Jul 2024
Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning Zishan Gu Fenglin Liu Changchang Yin Ping Zhang LRM LM&MA 58 0 0 19 May 2024
Learning to Visually Connect Actions and their Effects Eric Peh Paritosh Parmar Basura Fernando 24 2 0 19 Jan 2024
C-NERF: Representing Scene Changes as Directional Consistency Difference-based NeRF Rui Huang Binbin Jiang Qingyi Zhao William Wang Yuxiang Zhang Qing Guo 35 2 0 05 Dec 2023
Self-supervised Cross-view Representation Reconstruction for Change Captioning Yunbin Tu Liang Li Filippos Christianos Zheng-Jun Zha Zhibin Li Qingming Huang SSL 26 24 0 28 Sep 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use Yonatan Bitton Hritik Bansal Jack Hessel Rulin Shao Wanrong Zhu Anas Awadalla Josh Gardner Rohan Taori L. Schimdt VLM 31 77 0 12 Aug 2023
Visual Instruction Tuning with Polite Flamingo Delong Chen Jianfeng Liu Wenliang Dai Baoyuan Wang MLLM 34 42 0 03 Jul 2023
Neighborhood Contrastive Transformer for Change Captioning Yunbin Tu Liang Li Li Su Kelvin Lu Qin Huang ViT 21 14 0 06 Mar 2023
The Change You Want to See Ragav Sachdeva Andrew Zisserman 31 3 0 28 Sep 2022
Reasoning about Actions over Visual and Linguistic Modalities: A Survey Shailaja Keyur Sampat Maitreya Patel Subhasish Das Yezhou Yang Chitta Baral ReLM LM&Ro LRM 27 12 0 15 Jul 2022
CLIP4IDC: CLIP for Image Difference Captioning Zixin Guo T. Wang Jorma T. Laaksonen VLM 29 27 0 01 Jun 2022
Training and challenging models for text-guided fashion image retrieval Eric Dodds Jack Culpepper Gaurav Srivastava 18 8 0 23 Apr 2022
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval Yuxuan Wang Difei Gao Licheng Yu Stan Weixian Lei Matt Feiszli Mike Zheng Shou 17 24 0 01 Apr 2022
Image Retrieval from Contextual Descriptions Benno Krojer Vaibhav Adlakha Vibhav Vineet Yash Goyal Edoardo Ponti Siva Reddy 19 29 0 29 Mar 2022
Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene Duo Zheng Fandong Meng Q. Si Hairun Fan Zipeng Xu Jie Zhou Fangxiang Feng Xiaojie Wang 27 0 0 16 Mar 2022
R $^3$ Net:Relation-embedded Representation Reconstruction Network for Change Captioning Yunbin Tu Liang Li C. Yan Shengxiang Gao Zhengtao Yu 35 22 0 20 Oct 2021
Truth-Conditional Captioning of Time Series Data Harsh Jhamtani Taylor Berg-Kirkpatrick AI4TS 43 7 0 05 Oct 2021
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models Zheyuan Liu Cristian Rodriguez-Opazo Damien Teney Stephen Gould VLM 22 192 0 09 Aug 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning Matteo Stefanini Marcella Cornia Lorenzo Baraldi S. Cascianelli G. Fiameni Rita Cucchiara 3DV VLM MLLM 67 254 0 14 Jul 2021
Describing and Localizing Multiple Changes with Transformers Yue Qiu Shintaro Yamamoto Kodai Nakashima Ryota Suzuki K. Iwata Hirokatsu Kataoka Y. Satoh 30 55 0 25 Mar 2021
Quantifying Learnability and Describability of Visual Concepts Emerging in Representation Learning Iro Laina Ruth C. Fong Andrea Vedaldi OCL 33 13 0 27 Oct 2020
Like hiking? You probably enjoy nature: Persona-grounded Dialog with Commonsense Expansions Bodhisattwa Prasad Majumder Harsh Jhamtani Taylor Berg-Kirkpatrick Julian McAuley 30 85 0 07 Oct 2020
Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning Xiangxi Shi Xu Yang Jiuxiang Gu Chenyu You Jianfei Cai 16 52 0 30 Sep 2020
Neural Naturalist: Generating Fine-Grained Image Comparisons Maxwell Forbes Christine Kaeser-Chen Piyush Sharma Serge J. Belongie VLM 64 56 0 09 Sep 2019