ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2003.11618
  4. Cited By
VIOLIN: A Large-Scale Dataset for Video-and-Language Inference

VIOLIN: A Large-Scale Dataset for Video-and-Language Inference

25 March 2020
J. Liu
Wenhu Chen
Yu Cheng
Zhe Gan
Licheng Yu
Yiming Yang
Jingjing Liu
    MLLM
    VGen
ArXivPDFHTML

Papers citing "VIOLIN: A Large-Scale Dataset for Video-and-Language Inference"

16 / 16 papers shown
Title
A Modular Approach for Multimodal Summarization of TV Shows
A Modular Approach for Multimodal Summarization of TV Shows
Louis Mahon
Mirella Lapata
26
9
0
06 Mar 2024
Valley: Video Assistant with Large Language model Enhanced abilitY
Valley: Video Assistant with Large Language model Enhanced abilitY
Ruipu Luo
Ziwang Zhao
Min Yang
Junwei Dong
Da Li
Pengcheng Lu
Tao Wang
Linmei Hu
Ming-Hui Qiu
MLLM
45
189
0
12 Jun 2023
Long-Form Video-Language Pre-Training with Multimodal Temporal
  Contrastive Learning
Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
Yuchong Sun
Hongwei Xue
Ruihua Song
Bei Liu
Huan Yang
Jianlong Fu
AI4TS
VLM
20
68
0
12 Oct 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
Dongdong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
40
45
0
03 May 2022
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
Hazel Doughty
Cees G. M. Snoek
25
19
0
23 Mar 2022
Multi-Scale Self-Contrastive Learning with Hard Negative Mining for
  Weakly-Supervised Query-based Video Grounding
Multi-Scale Self-Contrastive Learning with Hard Negative Mining for Weakly-Supervised Query-based Video Grounding
Shentong Mo
Daizong Liu
Wei Hu
SSL
21
6
0
08 Mar 2022
A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text
  Spotter with Transformer
A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text Spotter with Transformer
Weijia Wu
Yuanqiang Cai
Debing Zhang
Sibo Wang
Zhuang Li
Jiahong Li
Yejun Tang
Hong Zhou
33
29
0
09 Dec 2021
Adaptive Hierarchical Graph Reasoning with Semantic Coherence for
  Video-and-Language Inference
Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference
Juncheng Li
Siliang Tang
Linchao Zhu
Haochen Shi
Xuanwen Huang
Fei Wu
Yi Yang
Yueting Zhuang
22
28
0
26 Jul 2021
Building a Video-and-Language Dataset with Human Actions for Multimodal
  Logical Inference
Building a Video-and-Language Dataset with Human Actions for Multimodal Logical Inference
Riko Suzuki
Hitomi Yanaka
K. Mineshima
D. Bekki
VGen
MLLM
16
1
0
27 Jun 2021
MELINDA: A Multimodal Dataset for Biomedical Experiment Method
  Classification
MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification
Te-Lin Wu
Shikhar Singh
S. Paul
Gully A. Burns
Nanyun Peng
27
18
0
16 Dec 2020
What is More Likely to Happen Next? Video-and-Language Future Event
  Prediction
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
22
72
0
15 Oct 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation
  Pre-training
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
MLLM
VLM
OffRL
AI4TS
43
493
0
01 May 2020
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
119
275
0
24 Jan 2020
FreeLB: Enhanced Adversarial Training for Natural Language Understanding
FreeLB: Enhanced Adversarial Training for Natural Language Understanding
Chen Zhu
Yu Cheng
Zhe Gan
S. Sun
Tom Goldstein
Jingjing Liu
AAML
232
438
0
25 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018
Multimodal Compact Bilinear Pooling for Visual Question Answering and
  Visual Grounding
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
158
1,464
0
06 Jun 2016
1