Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition

5 July 2024

Papers citing "Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition"

5 / 5 papers shown

Title
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs? Mohamed Gado Towhid Taliee Muhammad Memon D. Ignatov Radu Timofte 72 0 0 27 Apr 2025
Natural Language Generation from Visual Sequences: Challenges and Future Directions Aditya K Surikuchi Raquel Fernández Sandro Pezzelle EGVM 284 0 0 18 Feb 2025
Generating Visual Stories with Grounded and Coreferent Characters Danyang Liu Mirella Lapata Frank Keller 23 2 0 20 Sep 2024
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models Junnan Li Dongxu Li Silvio Savarese Steven C. H. Hoi VLM MLLM 320 4,279 0 30 Jan 2023
ImageNet-21K Pretraining for the Masses T. Ridnik Emanuel Ben-Baruch Asaf Noy Lihi Zelnik-Manor SSeg VLM CLIP 187 690 0 22 Apr 2021