A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot

16 May 2023

Aanisha Bhattacharya

Yaman Kumar Singla

Balaji Krishnamurthy

Papers citing "A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot"

Natural Language Generation from Visual Sequences: Challenges and Future Directions. Aditya K Surikuchi, Raquel Fernández, Sandro Pezzelle. 18 Feb 2025.
CAP: Evaluation of Persuasive and Creative Image Generation. Aysan Aghazadeh, Adriana Kovashka. 10 Dec 2024.
Measuring and Improving Persuasiveness of Large Language Models. Somesh Singh, Yaman Kumar Singla, Harini SI, Balaji Krishnamurthy. 03 Oct 2024.
POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models. Jianben He, Xingbo Wang, Shiyi Liu, Guande Wu, Claudio Silva, Huamin Qu. 06 Jun 2024.
A Modular Approach for Multimodal Summarization of TV Shows. Louis Mahon, Mirella Lapata. 06 Mar 2024.
Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior. Ashmit Khandelwal, Aditya Agrawal, Aanisha Bhattacharyya, Yaman Kumar Singla, Somesh Singh, ..., Ishita Dasgupta, Stefano Petrangeli, R. Shah, Changyou Chen, Balaji Krishnamurthy. 01 Sep 2023.
Complexity-Based Prompting for Multi-Step Reasoning. Yao Fu, Hao-Chun Peng, Ashish Sabharwal, Peter Clark, Tushar Khot. 03 Oct 2022.
Large Language Models are Zero-Shot Reasoners. Takeshi Kojima, S. Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa. 24 May 2022.
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding. Hu Xu, Gargi Ghosh, Po-Yao (Bernie) Huang, Dmytro Okhonko, Armen Aghajanyan, Florian Metze, Luke Zettlemoyer, Christoph Feichtenhofer. 28 Sep 2021.