Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
    MLLM

Papers citing "Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding"

50 / 875 papers shown
Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities
Neural Information Processing Systems (NeurIPS), 2024
A. Saporta
A. Puli
Mark Goldstein
Rajesh Ranganath
219
8
0
01 Nov 2024
SpeechQE: Estimating the Quality of Direct Speech Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
211
4
0
28 Oct 2024
Sensor2Text: Enabling Natural Language Interactions for Daily Activity Tracking Using Wearable Sensors
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2024
228
9
0
26 Oct 2024
Mitigating Object Hallucination via Concentric Causal Attention
Neural Information Processing Systems (NeurIPS), 2024
190
37
0
21 Oct 2024