MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations

20 March 2025

Papers citing "MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations"

21 / 21 papers shown

Title
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark Enxin Song Wenhao Chai Weili Xu Jianwen Xie Yuxuan Liu Gaoang Wang 106 6 0 20 Apr 2025
Visual Hallucinations of Multi-modal Large Language Models Wen Huang Hongbin Liu Minxin Guo Neil Zhenqiang Gong MLLM VLM 79 32 0 22 Feb 2024
VTimeLLM: Empower LLM to Grasp Video Moments Bin Huang Xin Wang Hong Chen Zihan Song Wenwu Zhu MLLM 132 128 0 30 Nov 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality Qinghao Ye Haiyang Xu Guohai Xu Jiabo Ye Ming Yan ... Junfeng Tian Qiang Qi Ji Zhang Feiyan Huang Jingren Zhou VLM MLLM 286 955 0 27 Apr 2023
Visual Instruction Tuning Haotian Liu Chunyuan Li Qingyang Wu Yong Jae Lee SyDa VLM MLLM 560 4,861 0 17 Apr 2023
Hallucinations in Large Multilingual Translation Models Nuno M. Guerreiro Duarte M. Alves Jonas Waldendorf Barry Haddow Alexandra Birch Pierre Colombo André F.T. Martins VLM HILM LRM 180 152 0 28 Mar 2023
GPT-4 Technical Report OpenAI OpenAI OpenAI Josh Achiam Steven Adler Sandhini Agarwal Lama Ahmad ... Shengjia Zhao Tianhao Zheng Juntang Zhuang William Zhuk Barret Zoph LLMAG MLLM 1.5K 14,631 0 15 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models Junnan Li Dongxu Li Silvio Savarese Steven C. H. Hoi VLM MLLM 429 4,641 0 30 Jan 2023
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Jason W. Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Brian Ichter F. Xia Ed H. Chi Quoc Le Denny Zhou LM&Ro LRM AI4CE ReLM 820 9,576 0 28 Jan 2022
Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning Ali Furkan Biten L. G. I. Bigorda Dimosthenis Karatzas 136 63 0 04 Oct 2021
LoRA: Low-Rank Adaptation of Large Language Models J. E. Hu Yelong Shen Phillip Wallis Zeyuan Allen-Zhu Yuanzhi Li Shean Wang Lu Wang Weizhu Chen OffRL AI4TS AI4CE ALM AIMat 477 10,496 0 17 Jun 2021
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions Junbin Xiao Xindi Shang Angela Yao Tat-Seng Chua 89 498 0 18 May 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding Jianlin Su Yu Lu Shengfeng Pan Ahmed Murtadha Bo Wen Yunfeng Liu 284 2,500 0 20 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval Max Bain Arsha Nagrani Gül Varol Andrew Zisserman VGen 153 1,186 0 01 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision Alec Radford Jong Wook Kim Chris Hallacy Aditya A. Ramesh Gabriel Goh ... Amanda Askell Pamela Mishkin Jack Clark Gretchen Krueger Ilya Sutskever CLIP VLM 967 29,731 0 26 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai ... Matthias Minderer G. Heigold Sylvain Gelly Jakob Uszkoreit N. Houlsby ViT 667 41,369 0 22 Oct 2020
Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning Jinpeng Wang Yuting Gao Ke Li Yiqi Lin A. J. Ma Hao Cheng Pai Peng Feiyue Huang Rongrong Ji Xing Sun SSL 95 97 0 12 Sep 2020
Language Models are Few-Shot Learners Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan ... Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever Dario Amodei BDL 835 42,332 0 28 May 2020
Why Can't I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition Jinwoo Choi Chen Gao Joseph C.E. Messou Jia-Bin Huang 135 182 0 11 Dec 2019
CLEVRER: CoLlision Events for Video REpresentation and Reasoning Kexin Yi Yuta Saito Yunzhu Li Pushmeet Kohli Jiajun Wu Antonio Torralba J. Tenenbaum NAI 121 475 0 03 Oct 2019
Object Hallucination in Image Captioning Anna Rohrbach Lisa Anne Hendricks Kaylee Burns Trevor Darrell Kate Saenko 189 439 0 06 Sep 2018