ResearchTrend.AI
arXiv: 2504.19267
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?

27 April 2025
Mohamed Gado
Towhid Taliee
Muhammad Memon
D. Ignatov
Radu Timofte

Papers citing "VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?"

34 / 34 papers shown
AugmentGest: Can Random Data Cropping Augmentation Boost Gesture Recognition Performance?
Nada Aboudeshish
D. Ignatov
Radu Timofte
39
3
0
08 Jun 2025
LEMUR Neural Network Dataset: Towards Seamless AutoML
Arash Torabi Goodarzi
Roman Kochnev
Waleed Khalid
Furui Qin
Tolgay Atinc Uzun
Yashkumar Sanjaybhai Dhameliya
Yash Kanubhai Kathiriya
Zofia Antonina Bentyn
D. Ignatov
Radu Timofte
94
3
0
14 Apr 2025
Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning?
Roman Kochnev
Arash Torabi Goodarzi
Zofia Antonina Bentyn
D. Ignatov
Radu Timofte
150
3
0
08 Apr 2025
Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
57
6
0
05 Jul 2024
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
Muhammad Maaz
H. Rasheed
Salman Khan
Fahad A Khan
VLM, MLLM
81
61
0
13 Jun 2024
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin
Sam Ade Jacobs
A. A. Awan
J. Aneja
Ahmed Hassan Awadallah
...
Li Zhang
Yi Zhang
Yue Zhang
Yunan Zhang
Xiren Zhou
LRM, ALM
166
1,265
0
22 Apr 2024
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
Yi Wang
Kunchang Li
Xinhao Li
Jiashuo Yu
Yinan He
...
Hongjie Zhang
Yifei Huang
Yu Qiao
Yali Wang
Limin Wang
85
78
0
22 Mar 2024
StoryGPT-V: Large Language Models as Consistent Story Visualizers
Xiaoqian Shen
Mohamed Elhoseiny
VLM
181
11
0
04 Dec 2023
GROOViST: A Metric for Grounding Objects in Visual Storytelling
Aditya K Surikuchi
Sandro Pezzelle
Raquel Fernández
43
10
0
26 Oct 2023
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens
Kaizhi Zheng
Xuehai He
Xin Eric Wang
MLLM
119
98
0
03 Oct 2023
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
105
506
0
11 Sep 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM, MLLM
165
2,069
0
20 Apr 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa, VLM, MLLM
571
4,925
0
17 Apr 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM, PILM
1.5K
13,472
0
27 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM, MLLM
432
4,656
0
30 Jan 2023
Towards a Unified Multi-Dimensional Evaluator for Text Generation
Ming Zhong
Yang Liu
Da Yin
Yuning Mao
Yizhu Jiao
Peng Liu
Chenguang Zhu
Heng Ji
Jiawei Han
ELM
93
276
0
13 Oct 2022
RoViST: Learning Robust Metrics for Visual Storytelling
Eileen Wang
S. Han
Josiah Poon
49
10
0
08 May 2022
Data-to-text Generation with Variational Sequential Planning
Ratish Puduppully
Yao Fu
Mirella Lapata
109
21
0
28 Feb 2022
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
185
789
0
25 Jun 2021
Plot and Rework: Modeling Storylines for Visual Storytelling
Chi-Yang Hsu
Yun-Wei Chu
Ting-Hao 'Kenneth' Huang
Lun-Wei Ku
59
31
0
14 May 2021
Planning with Learned Entity Prompts for Abstractive Summarization
Shashi Narayan
Yao-Min Zhao
Joshua Maynez
Gonçalo Simões
Vitaly Nikolaev
Ryan T. McDonald
LRM
89
119
0
15 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM
1.0K
29,926
0
26 Feb 2021
Commonsense Knowledge Aware Concept Selection For Diverse and Informative Visual Storytelling
Hong Chen
Yifei Huang
Hiroya Takamura
Hideki Nakayama
DiffM
84
44
0
05 Feb 2021
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
Hao Tan
Joey Tianyi Zhou
CLIP
83
121
0
14 Oct 2020
PlotMachines: Outline-Conditioned Generation with Dynamic Plot State Tracking
Hannah Rashkin
Asli Celikyilmaz
Yejin Choi
Jianfeng Gao
63
154
0
30 Apr 2020
Knowledge-Enriched Visual Storytelling
Chao-Chun Hsu
Zi-Yuan Chen
Chi-Yang Hsu
Chih-Chia Li
Tzu-Yuan Lin
Ting-Hao 'Kenneth' Huang
Lun-Wei Ku
DiffM
84
47
0
03 Dec 2019
Step-by-Step: Separating Planning from Realization in Neural Data-to-Text Generation
Amit Moryossef
Yoav Goldberg
Ido Dagan
68
184
0
06 Apr 2019
Plan-And-Write: Towards Better Automatic Storytelling
Lili Yao
Nanyun Peng
R. Weischedel
Kevin Knight
Dongyan Zhao
Rui Yan
95
413
0
14 Nov 2018
A Skeleton-Based Model for Promoting Coherence Among Sentences in Narrative Story Generation
Jingjing Xu
Xuancheng Ren
Yanzhe Zhang
Qi Zeng
Xiaoyan Cai
Xu Sun
LRM
52
106
0
21 Aug 2018
Contextualize, Show and Tell: A Neural Visual Storyteller
Diana Gonzalez-Rico
Gibran Fuentes Pineda
43
34
0
03 Jun 2018
GLAC Net: GLocal Attention Cascading Networks for Multi-image Cued Story Generation
Taehyeong Kim
Min-Oh Heo
Seonil Son
Kyoung-Wha Park
Byoung-Tak Zhang
58
77
0
28 May 2018
No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling
Xin Eric Wang
Wenhu Chen
Yuan-fang Wang
William Yang Wang
64
160
0
24 Apr 2018
Visual Relationship Detection with Language Priors
Cewu Lu
Ranjay Krishna
Michael S. Bernstein
Li Fei-Fei
VLM
84
1,142
0
31 Jul 2016
Visual Storytelling
Ting-Hao 'Kenneth' Huang
Francis Ferraro
N. Mostafazadeh
Ishan Misra
...
C. L. Zitnick
Devi Parikh
Lucy Vanderwende
Michel Galley
Margaret Mitchell
VGen
73
479
0
13 Apr 2016