TALL: Temporal Activity Localization via Language Query

5 May 2017
J. Gao
Chen Sun
Zhenheng Yang
Ram Nevatia

Papers citing "TALL: Temporal Activity Localization via Language Query"

50 / 433 papers shown
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT CLIP
77
174
0
01 Nov 2020
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
96
73
0
15 Oct 2020
DORi: Discovering Object Relationship for Moment Localization of a Natural-Language Query in Video
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
Basura Fernando
Hongdong Li
Stephen Gould
192
10
0
13 Oct 2020
A Simple Yet Effective Method for Video Temporal Grounding with Cross-Modality Attention
Binjie Zhang
Yu Li
Chun Yuan
D. Xu
Pin Jiang
Ying Shan
33
5
0
23 Sep 2020
Frame-wise Cross-modal Matching for Video Moment Retrieval
Haoyu Tang
Jihua Zhu
Meng Liu
Zan Gao
Zhiyong Cheng
86
62
0
22 Sep 2020
Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos
Jie Wu
Guanbin Li
Xiaoguang Han
Liang Lin
OffRL AI4TS
84
56
0
18 Sep 2020
Linear Temporal Public Announcement Logic: a new perspective for reasoning about the knowledge of multi-classifiers
Amirhoshang Hoseinpour Dehkordi
Majid Alizadeh
A. Movaghar
19
0
0
08 Sep 2020
Video Moment Retrieval via Natural Language Queries
Xinli Yu
Mohsen Malmir
C. He
Yue Liu
Rex Wu
27
1
0
04 Sep 2020
Uncovering Hidden Challenges in Query-Based Video Moment Retrieval
Mayu Otani
Yuta Nakashima
Esa Rahtu
J. Heikkilä
149
76
0
01 Sep 2020
Sentence Guided Temporal Modulation for Dynamic Video Thumbnail Generation
Mrigank Rochan
Mahesh Kumar Krishna Reddy
Yang Wang
53
7
0
31 Aug 2020
VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval
Minuk Ma
Sunjae Yoon
Junyeong Kim
Youngjoon Lee
Sunghun Kang
Chang D. Yoo
92
78
0
24 Aug 2020
Text-based Localization of Moments in a Video Corpus
Sudipta Paul
Niluthpol Chowdhury Mithun
Amit K. Roy-Chowdhury
46
15
0
20 Aug 2020
Generating Adjacency Matrix for Video Relocalization
Yuanen Zhou
Mingfei Wang
Ruolin Wang
Shuwei Huo
24
0
0
19 Aug 2020
Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos
Zhu Zhang
Zhijie Lin
Zhou Zhao
Jieming Zhu
Xiuqiang He
81
69
0
19 Aug 2020
Object-Aware Multi-Branch Relation Networks for Spatio-Temporal Video Grounding
Zhu Zhang
Zhou Zhao
Zhijie Lin
Baoxing Huai
Jing Yuan
99
35
0
16 Aug 2020
Fine-grained Iterative Attention Network for Temporal Language Localization in Videos
Xiaoye Qu
Peng Tang
Zhikang Zhou
Yu Cheng
Jianfeng Dong
Pan Zhou
86
92
0
06 Aug 2020
Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization
Daizong Liu
Xiaoye Qu
Xiao-Yang Liu
Jianfeng Dong
Pan Zhou
Zichuan Xu
92
129
0
04 Aug 2020
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
Shaoxiang Chen
Wenhao Jiang
Wei Liu
Yu-Gang Jiang
99
102
0
28 Jul 2020
Graph Neural Network for Video Relocalization
Yuanen Zhou
Mingfei Wang
Ruolin Wang
Shuwei Huo
26
0
0
20 Jul 2020
Modality Shifting Attention Network for Multi-modal Video Question Answering
Junyeong Kim
Minuk Ma
T. Pham
Kyungsu Kim
Chang D. Yoo
84
72
0
04 Jul 2020
Weak Supervision and Referring Attention for Temporal-Textual Association Learning
Zhiyuan Fang
Shu Kong
Zhe Wang
Charless C. Fowlkes
Yezhou Yang
66
17
0
21 Jun 2020
Language Guided Networks for Cross-modal Moment Retrieval
Kun Liu
Huadong Ma
Chuang Gan
30
2
0
18 Jun 2020
Video Moment Localization using Object Evidence and Reverse Captioning
Madhawa Vidanapathirana
Supriya Pandhre
Sonia Raychaudhuri
Anjali Khurana
18
1
0
18 Jun 2020
Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA
Hyounghun Kim
Zineng Tang
Joey Tianyi Zhou
80
31
0
13 May 2020
A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos
Frank F. Xu
Lei Ji
Botian Shi
Junyi Du
Graham Neubig
Yonatan Bisk
Nan Duan
41
21
0
02 May 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
MLLM VLM OffRL AI4TS
133
506
0
01 May 2020
Span-based Localizing Network for Natural Language Video Localization
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
110
316
0
29 Apr 2020
Inferring Temporal Compositions of Actions Using Probabilistic Automata
Rodrigo Santa Cruz
A. Cherian
Basura Fernando
Dylan Campbell
Stephen Gould
39
2
0
28 Apr 2020
Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence
Huy Manh Nguyen
Tomo Miyazaki
Yoshihiro Sugaya
S. Omachi
139
1
0
16 Apr 2020
Local-Global Video-Text Interactions for Temporal Grounding
Jonghwan Mun
Minsu Cho
Bohyung Han
98
270
0
16 Apr 2020
YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos
Shizhe Chen
Weiying Wang
Ludan Ruan
Linli Yao
Qin Jin
30
3
0
12 Apr 2020
Dense Regression Network for Video Grounding
Runhao Zeng
Haoming Xu
Wenbing Huang
Peihao Chen
Mingkui Tan
Chuang Gan
88
284
0
07 Apr 2020
Sub-Instruction Aware Vision-and-Language Navigation
Yicong Hong
Cristian Rodriguez-Opazo
Qi Wu
Stephen Gould
129
72
0
06 Apr 2020
VIOLIN: A Large-Scale Dataset for Video-and-Language Inference
J. Liu
Wenhu Chen
Yu Cheng
Zhe Gan
Licheng Yu
Yiming Yang
Jingjing Liu
MLLM VGen
99
70
0
25 Mar 2020
Weakly-Supervised Multi-Level Attentional Reconstruction Network for Grounding Textual Queries in Videos
Yijun Song
Jingwen Wang
Lin Ma
Zhou Yu
Jun Yu
71
61
0
16 Mar 2020
Mi YouTube es Su YouTube? Analyzing the Cultures using YouTube Thumbnails of Popular Videos
Songyang Zhang
Tolga Aktas
Jiebo Luo
31
5
0
27 Jan 2020
Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video
Zhenfang Chen
Lin Ma
Wenhan Luo
Peng Tang
Kwan-Yee K. Wong
51
68
0
25 Jan 2020
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
221
288
0
24 Jan 2020
Zero-Shot Activity Recognition with Videos
Evin Pınar Örnek
19
1
0
22 Jan 2020
Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences
Zhu Zhang
Zhou Zhao
Yang Zhao
Qi. Wang
Huasheng Liu
Lianli Gao
99
118
0
19 Jan 2020
Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video
Jie Wu
Guanbin Li
Si Liu
Liang Lin
OffRL
71
104
0
18 Jan 2020
Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language
Songyang Zhang
Houwen Peng
Jianlong Fu
Jiebo Luo
75
470
0
08 Dec 2019
Compositional Temporal Visual Grounding of Natural Language Event Descriptions
Jonathan C. Stroud
Ryan McCaffrey
Rada Mihalcea
Jia Deng
Olga Russakovsky
64
4
0
04 Dec 2019
Weakly-Supervised Video Moment Retrieval via Semantic Completion Network
Zhijie Lin
Zhou Zhao
Zhu Zhang
Qi. Wang
Huasheng Liu
87
150
0
19 Nov 2019
Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos
Yitian Yuan
Lin Ma
Jingwen Wang
Wei Liu
Wenwu Zhu
102
244
0
31 Oct 2019
Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels
Daniel Y. Fu
Will Crichton
James Hong
Xinwei Yao
Haotian Zhang
A. Truong
A. Narayan
Maneesh Agrawala
Christopher Ré
Kayvon Fatahalian
64
49
0
07 Oct 2019
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
Kexin Yi
Yuta Saito
Yunzhu Li
Pushmeet Kohli
Jiajun Wu
Antonio Torralba
J. Tenenbaum
NAI
143
475
0
03 Oct 2019
LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval
Reuben Tan
Huijuan Xu
Kate Saenko
Bryan A. Plummer
99
67
0
27 Sep 2019
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
Tanzila Rahman
Bicheng Xu
Leonid Sigal
75
81
0
22 Sep 2019
Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction
Jingwen Wang
Lin Ma
Wenhao Jiang
76
183
0
11 Sep 2019