ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Language-free Training for Zero-shot Video Grounding

24 October 2022
Dahye Kim
Jungin Park
Jiyoung Lee
S. Park
Kwanghoon Sohn
arXiv: 2210.12977

Papers citing "Language-free Training for Zero-shot Video Grounding"

49 papers
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
125
3,355
0
16 Oct 2022
Revisiting the "Video" in Video-Language Understanding
S. Buch
Cristobal Eyzaguirre
Adrien Gaidon
Jiajun Wu
L. Fei-Fei
Juan Carlos Niebles
62
159
0
03 Jun 2022
Prompt-based Learning for Unpaired Image Captioning
Peipei Zhu
Tianlin Li
Lin Zhu
Zhenglong Sun
Weishi Zheng
Yaowei Wang
Chen Chen
VLM
57
31
0
26 May 2022
Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning
Juncheng Li
Junlin Xie
Long Qian
Linchao Zhu
Siliang Tang
Leilei Gan
Yi Yang
Yueting Zhuang
Xinze Wang
59
74
0
24 Mar 2022
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Haojun Jiang
Yuanze Lin
Dongchen Han
Shiji Song
Gao Huang
ObjD
78
51
0
16 Mar 2022
CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP
Zihao Wang
Wei Liu
Qian He
Xin-ru Wu
Zili Yi
CLIP
VLM
229
74
0
01 Mar 2022
Unsupervised Temporal Video Grounding with Deep Semantic Clustering
Daizong Liu
Xiaoye Qu
Yinzhen Wang
Xing Di
Kai Zou
Yu Cheng
Zichuan Xu
Pan Zhou
64
51
0
14 Jan 2022
LAFITE: Towards Language-Free Training for Text-to-Image Generation
Yufan Zhou
Ruiyi Zhang
Changyou Chen
Chunyuan Li
Chris Tensmeyer
Tong Yu
Jiuxiang Gu
Jinhui Xu
Tong Sun
VLM
66
166
0
27 Nov 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
293
567
0
28 Sep 2021
Zero-shot Natural Language Video Localization
Jinwoo Nam
Daechul Ahn
Dongyeop Kang
S. Ha
Jonghyun Choi
113
43
0
29 Aug 2021
Support-Set Based Cross-Supervision for Video Grounding
Xinpeng Ding
N. Wang
Shiwei Zhang
De Cheng
Xiaomeng Li
Ziyuan Huang
Mingqian Tang
Xinbo Gao
52
42
0
24 Aug 2021
Cross-Sentence Temporal and Semantic Relations in Video Activity Localisation
Jiabo Huang
Yang Liu
S. Gong
Hailin Jin
77
61
0
23 Jul 2021
Weakly Supervised Temporal Adjacent Network for Language Grounding
Yuechen Wang
Jiajun Deng
Wen-gang Zhou
Houqiang Li
55
67
0
30 Jun 2021
Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
202
100
0
29 Apr 2021
Embracing Uncertainty: Decoupling and De-bias for Robust Temporal Grounding
Hao Zhou
Chongyang Zhang
Yan Luo
Yanjun Chen
Chuanping Hu
23
52
0
31 Mar 2021
Context-aware Biaffine Localizing Network for Temporal Sentence Grounding
Daizong Liu
Xiaoye Qu
Jianfeng Dong
Pan Zhou
Yu Cheng
Wei Wei
Zichuan Xu
Yulai Xie
34
145
0
22 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
681
28,659
0
26 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
96
651
0
11 Feb 2021
VLG-Net: Video-Language Graph Matching Network for Video Grounding
Mattia Soldan
Mengmeng Xu
Sisi Qu
Jesper N. Tegnér
Guohao Li
53
70
0
19 Nov 2020
Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos
Jie Wu
Guanbin Li
Xiaoguang Han
Liang Lin
OffRL
AI4TS
46
56
0
18 Sep 2020
VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval
Minuk Ma
Sunjae Yoon
Junyeong Kim
Youngjoon Lee
Sunghun Kang
Chang D. Yoo
61
78
0
24 Aug 2020
Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos
Zhu Zhang
Zhijie Lin
Zhou Zhao
Jieming Zhu
Xiuqiang He
56
69
0
19 Aug 2020
SumGraph: Video Summarization via Recursive Graph Modeling
Jungin Park
Jiyoung Lee
Ig-Jae Kim
Kwanghoon Sohn
36
54
0
17 Jul 2020
Counting Out Time: Class Agnostic Video Repetition Counting in the Wild
Debidatta Dwibedi
Y. Aytar
Jonathan Tompson
P. Sermanet
Andrew Zisserman
AI4TS
40
109
0
27 Jun 2020
Local-Global Video-Text Interactions for Temporal Grounding
Jonghwan Mun
Minsu Cho
Bohyung Han
52
269
0
16 Apr 2020
Dense Regression Network for Video Grounding
Runhao Zeng
Haoming Xu
Wenbing Huang
Peihao Chen
Mingkui Tan
Chuang Gan
57
283
0
07 Apr 2020
Weakly-Supervised Multi-Level Attentional Reconstruction Network for Grounding Textual Queries in Videos
Yijun Song
Jingwen Wang
Lin Ma
Zhou Yu
Jun Yu
38
61
0
16 Mar 2020
Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video
Zhenfang Chen
Lin Ma
Wenhan Luo
Peng Tang
Kwan-Yee K. Wong
32
68
0
25 Jan 2020
Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language
Songyang Zhang
Houwen Peng
Jianlong Fu
Jiebo Luo
42
465
0
08 Dec 2019
Weakly-Supervised Video Moment Retrieval via Semantic Completion Network
Zhijie Lin
Zhou Zhao
Zhu Zhang
Qi Wang
Huasheng Liu
43
149
0
19 Nov 2019
LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval
Reuben Tan
Huijuan Xu
Kate Saenko
Bryan A. Plummer
42
67
0
27 Sep 2019
WSLLN: Weakly Supervised Natural Language Localization Networks
M. Gao
L. Davis
R. Socher
Caiming Xiong
37
80
0
31 Aug 2019
Towards Unsupervised Image Captioning with Shared Multimodal Embeddings
Iro Laina
Christian Rupprecht
Nassir Navab
SSL
44
103
0
25 Aug 2019
Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
F. Saleh
Hongdong Li
Stephen Gould
51
147
0
20 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
408
24,160
0
26 Jul 2019
Weakly Supervised Video Moment Retrieval From Text Queries
Niluthpol Chowdhury Mithun
S. Paul
Amit K. Roy-Chowdhury
99
194
0
05 Apr 2019
Weakly Supervised Dense Event Captioning in Videos
Xuguang Duan
Wen-bing Huang
Chuang Gan
Jingdong Wang
Wenwu Zhu
Junzhou Huang
51
149
0
10 Dec 2018
MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment
Da Zhang
Xiyang Dai
Xin Eric Wang
Yuan-fang Wang
L. Davis
41
303
0
30 Nov 2018
Unsupervised Image Captioning
Yang Feng
Lin Ma
Wei Liu
Jiebo Luo
VLM
SSL
55
201
0
27 Nov 2018
To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression
Yitian Yuan
Tao Mei
Wenwu Zhu
59
332
0
19 Apr 2018
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
91
940
0
04 Aug 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
453
129,831
0
12 Jun 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
199
7,961
0
22 May 2017
TALL: Temporal Activity Localization via Language Query
J. Gao
Chen Sun
Zhenheng Yang
Ram Nevatia
108
813
0
05 May 2017
Dense-Captioning Events in Videos
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
120
1,225
0
02 May 2017
Categorical Reparameterization with Gumbel-Softmax
Eric Jang
S. Gu
Ben Poole
BDL
221
5,323
0
03 Nov 2016
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
Gunnar Sigurdsson
Gül Varol
Xinyu Wang
Ali Farhadi
Ivan Laptev
Abhinav Gupta
VGen
77
1,238
0
06 Apr 2016
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
850
149,474
0
22 Dec 2014
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
Junyoung Chung
Çağlar Gülçehre
Kyunghyun Cho
Yoshua Bengio
293
12,662
0
11 Dec 2014