Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2002.11566
Cited By
Object Relational Graph with Teacher-Recommended Learning for Video Captioning
26 February 2020
Ziqi Zhang
Yaya Shi
Chunfen Yuan
Bing Li
Peijin Wang
Weiming Hu
Zhengjun Zha
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Object Relational Graph with Teacher-Recommended Learning for Video Captioning"
50 / 115 papers shown
Title
Towards Efficient Partially Relevant Video Retrieval with Active Moment Discovering
Peipei Song
Li Zhang
Long Lan
Weidong Chen
D. Guo
Xun Yang
Meng Wang
19
0
0
15 Apr 2025
Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning
Caihua Liu
Xu Li
Wenjing Xue
Wei Tang
Xia Feng
56
0
0
20 Feb 2025
Spider: Any-to-Many Multimodal LLM
Jinxiang Lai
Jie Zhang
Jun Liu
Jian Li
Xiaocheng Lu
Song Guo
MLLM
66
2
0
14 Nov 2024
Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning
Ping Li
Tao Wang
Xinkui Zhao
Xianghua Xu
Mingli Song
34
3
0
06 Nov 2024
SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities
Ehsan Faghihi
Mohammedreza Zarenejad
Ali-Asghar Beheshti Shirazi
42
0
0
04 Nov 2024
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
Tian-Zi Niu
Zhen-Duo Chen
Xin Luo
Xin-Shun Xu
26
0
0
22 Oct 2024
MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
Tieyuan Chen
Huabin Liu
Tianyao He
Yihang Chen
Chaofan Gan
...
Cheng Zhong
Yang Zhang
Yingxue Wang
Hui Lin
Weiyao Lin
VGen
CML
39
5
0
26 Sep 2024
HOTVCOM: Generating Buzzworthy Comments for Videos
Yuyan Chen
Yiwen Qian
Songzhou Yan
Jiyuan Jia
Zhixu Li
Yanghua Xiao
Xiaobo Li
Ming Yang
Qingpei Guo
31
7
0
23 Sep 2024
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset
Yuchen Yang
Yingxuan Duan
VGen
32
0
0
19 Jun 2024
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
Asmar Nadeem
Faegheh Sardari
R. Dawes
Syed Sameed Husain
Adrian Hilton
Armin Mustafa
55
4
0
10 Jun 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Thong Nguyen
Yi Bin
Junbin Xiao
Leigang Qu
Yicong Li
Jay Zhangjie Wu
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
51
9
1
09 Jun 2024
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
Fengyi Fu
Shancheng Fang
Weidong Chen
Zhendong Mao
ViT
VGen
26
4
0
19 Apr 2024
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement
Hao Wu
Huabin Liu
Yu Qiao
Xiao Sun
3DV
16
7
0
03 Apr 2024
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM
VGen
69
14
0
26 Mar 2024
ModaVerse: Efficiently Transforming Modalities with LLMs
Xinyu Wang
Bohan Zhuang
Qi Wu
14
11
0
12 Jan 2024
Knowledge-enhanced Multi-perspective Video Representation Learning for Scene Recognition
Xuzheng Yu
Chen Jiang
Wei Zhang
Tian Gan
Linlin Chao
Jianan Zhao
Yuan Cheng
Qingpei Guo
Wei Chu
20
0
0
09 Jan 2024
Set Prediction Guided by Semantic Concepts for Diverse Video Captioning
Yifan Lu
Ziqi Zhang
Chunfen Yuan
Peng Li
Yan Wang
Bing Li
Weiming Hu
34
3
0
25 Dec 2023
Subject-Oriented Video Captioning
Yunchuan Ma
Chang Teng
Yuankai Qi
Guorong Li
Laiyun Qing
Qi Wu
Qingming Huang
24
0
0
20 Dec 2023
Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
Rahul Pratap Singh
Bishmoy Paul
Ali Dabouei
Min Xu
22
1
0
10 Dec 2023
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
Zineng Tang
Ziyi Yang
Mahmoud Khademi
Yang Liu
Chenguang Zhu
Mohit Bansal
LRM
MLLM
AuLLM
54
44
0
30 Nov 2023
VidChapters-7M: Video Chapters at Scale
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
23
26
0
25 Sep 2023
Accurate and Fast Compressed Video Captioning
Yaojie Shen
Xin Gu
Kai Xu
Hengrui Fan
Longyin Wen
Libo Zhang
ViT
23
26
0
22 Sep 2023
Collaborative Three-Stream Transformers for Video Captioning
Hao Wang
Libo Zhang
Hengrui Fan
Tiejian Luo
34
6
0
18 Sep 2023
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
46
455
0
11 Sep 2023
Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang
Yizeng Han
Chaofei Wang
Shiji Song
Qi Tian
Gao Huang
VLM
34
20
0
27 Aug 2023
Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion
Yutao Jin
Bin Liu
Jing Wang
32
1
0
13 Aug 2023
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLM
MLLM
32
25
0
13 Jul 2023
Any-to-Any Generation via Composable Diffusion
Zineng Tang
Ziyi Yang
Chenguang Zhu
Michael Zeng
Joey Tianyi Zhou
VGen
DiffM
31
171
0
19 May 2023
VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation
Xilun Chen
L. Yu
Wenhan Xiong
Barlas Ouguz
Yashar Mehdad
Wen-tau Yih
VGen
26
3
0
04 May 2023
A Review of Deep Learning for Video Captioning
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Erik Cambria
Fatih Porikli
3DV
27
20
0
22 Apr 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
Jinhui Tang
VLM
31
102
0
17 Apr 2023
Graph Attention for Automated Audio Captioning
Feiyang Xiao
Jian Guan
Qiaoxi Zhu
Wenwu Wang
14
8
0
07 Apr 2023
Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data
Vladislav Lialin
Stephen Rawls
David M. Chan
Shalini Ghosh
Anna Rumshisky
Wael Hamza
VLM
AI4TS
28
6
0
04 Apr 2023
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
Weicheng Kuo
A. Piergiovanni
Dahun Kim
Xiyang Luo
Benjamin Caine
...
Luowei Zhou
Andrew M. Dai
Zhifeng Chen
Claire Cui
A. Angelova
MLLM
VLM
29
23
0
29 Mar 2023
Fine-grained Audible Video Description
Xuyang Shen
Dong Li
Jinxing Zhou
Zhen Qin
Bowen He
...
Yuchao Dai
Lingpeng Kong
Meng Wang
Yu Qiao
Yiran Zhong
VGen
36
11
0
27 Mar 2023
GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation
Ji Qi
Jifan Yu
Teng Tu
Kunyu Gao
Yifan Xu
...
Juanzi Li
Jie Tang
Weidong Guo
Hui Liu
Yu-Syuan Xu
28
19
0
26 Mar 2023
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models
Dohwan Ko
Joon-Young Choi
Hyeong Kyu Choi
Kyoung-Woon On
Byungseok Roh
Hyunwoo J. Kim
52
19
0
23 Mar 2023
Text with Knowledge Graph Augmented Transformer for Video Captioning
Xin Gu
G. Chen
Yufei Wang
Libo Zhang
Tiejian Luo
Longyin Wen
27
47
0
22 Mar 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
Ludan Ruan
Anwen Hu
Yuqing Song
Liang Zhang
S. Zheng
Qin Jin
VLM
24
10
0
12 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
36
220
0
27 Feb 2023
STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training
Weihong Zhong
Mao Zheng
Duyu Tang
Xuan Luo
Heng Gong
Xiaocheng Feng
Bing Qin
32
8
0
20 Feb 2023
ADAPT: Action-aware Driving Caption Transformer
Bu Jin
Xinyi Liu
Yupeng Zheng
Pengfei Li
Hao Zhao
Tong Zhang
Yuhang Zheng
Guyue Zhou
Jingjing Liu
25
69
0
01 Feb 2023
Semi-Parametric Video-Grounded Text Generation
Sungdong Kim
Jin-Hwa Kim
Jiyoung Lee
Minjoon Seo
VGen
24
14
0
27 Jan 2023
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
Shen Yan
Tao Zhu
Zirui Wang
Yuan Cao
Mi Zhang
Soham Ghosh
Yonghui Wu
Jiahui Yu
VLM
VGen
32
46
0
09 Dec 2022
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning
Xian Zhong
Zipeng Li
Shuqin Chen
Kui Jiang
Chen Chen
Mang Ye
DiffM
VGen
19
40
0
28 Nov 2022
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
Fenglin Liu
Xian Wu
Chenyu You
Shen Ge
Yuexian Zou
Xu Sun
19
23
0
22 Nov 2022
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Peng Jin
Jinfa Huang
Fenglin Liu
Xian Wu
Shen Ge
Guoli Song
David A. Clifton
Jing Chen
VLM
42
63
0
21 Nov 2022
Visual Commonsense-aware Representation Network for Video Captioning
Pengpeng Zeng
Haonan Zhang
Lianli Gao
Xiangpeng Li
Jin Qian
Hengtao Shen
29
16
0
17 Nov 2022
Respecting Transfer Gap in Knowledge Distillation
Yulei Niu
Long Chen
Chan Zhou
Hanwang Zhang
26
23
0
23 Oct 2022
Contrastive Video-Language Learning with Fine-grained Frame Sampling
Zixu Wang
Yujie Zhong
Yishu Miao
Lin Ma
Lucia Specia
46
11
0
10 Oct 2022
1
2
3
Next