Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.04995
Cited By
v1
v2
v3 (latest)
Text-Visual Prompting for Efficient 2D Temporal Video Grounding
9 March 2023
Yimeng Zhang
Xin Chen
Jinghan Jia
Sijia Liu
Ke Ding
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Text-Visual Prompting for Efficient 2D Temporal Video Grounding"
46 / 46 papers shown
Title
Understanding and Improving Visual Prompting: A Label-Mapping Perspective
Aochuan Chen
Yuguang Yao
Pin-Yu Chen
Yihua Zhang
Sijia Liu
VPVLM
VLM
118
81
0
21 Nov 2022
Data-Model-Circuit Tri-Design for Ultra-Light Video Intelligence on Edge Devices
Yimeng Zhang
A. Kamath
Qiucheng Wu
Zhiwen Fan
Wuyang Chen
Zhangyang Wang
Shiyu Chang
Sijia Liu
Cong Hao
55
6
0
16 Oct 2022
MaPLe: Multi-modal Prompt Learning
Muhammad Uzair Khattak
H. Rasheed
Muhammad Maaz
Salman Khan
Fahad Shahbaz Khan
VPVLM
VLM
256
568
0
06 Oct 2022
Fairness Reprogramming
Guanhua Zhang
Yihua Zhang
Yang Zhang
Wenqi Fan
Qing Li
Sijia Liu
Shiyu Chang
AAML
158
40
0
21 Sep 2022
Exploring Visual Prompts for Adapting Large-Scale Models
Hyojin Bahng
Ali Jahanian
S. Sankaranarayanan
Phillip Isola
VLM
VPVLM
LRM
68
272
0
31 Mar 2022
How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective
Yimeng Zhang
Yuguang Yao
Jinghan Jia
Jinfeng Yi
Min-Fong Hong
Shiyu Chang
Sijia Liu
AAML
112
34
0
27 Mar 2022
Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning
Pin-Yu Chen
VLM
166
64
0
22 Feb 2022
Prompting Visual-Language Models for Efficient Video Understanding
Chen Ju
Tengda Han
Kunhao Zheng
Ya Zhang
Weidi Xie
VPVLM
VLM
89
380
0
08 Dec 2021
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VPVLM
VLM
281
224
0
24 Sep 2021
Natural Language Video Localization with Learnable Moment Proposals
Shaoning Xiao
Long Chen
Jian Shao
Yueting Zhuang
Jun Xiao
66
43
0
22 Sep 2021
A Survey on Temporal Sentence Grounding in Videos
Xiaohan Lan
Yitian Yuan
Xin Eric Wang
Zhi Wang
Wenwu Zhu
92
47
0
16 Sep 2021
Why Adversarial Reprogramming Works, When It Fails, and How to Tell the Difference
Yang Zheng
Xiaoyi Feng
Zhaoqiang Xia
Xiaoyue Jiang
Ambra Demontis
Maura Pintor
Battista Biggio
Fabio Roli
AAML
76
22
0
26 Aug 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
584
4,084
0
18 Apr 2021
GPT Understands, Too
Xiao Liu
Yanan Zheng
Zhengxiao Du
Ming Ding
Yujie Qian
Zhilin Yang
Jie Tang
VLM
168
1,179
0
18 Mar 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
130
664
0
11 Feb 2021
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Xiang Lisa Li
Percy Liang
252
4,299
0
01 Jan 2021
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
Timo Schick
Hinrich Schütze
132
974
0
15 Sep 2020
Transfer Learning without Knowing: Reprogramming Black-box Machine Learning Models with Scarce Data and Limited Resources
Yun-Yun Tsai
Pin-Yu Chen
Tsung-Yi Ho
AAML
MLAU
BDL
80
99
0
17 Jul 2020
Appearance-Preserving 3D Convolution for Video-based Person Re-identification
Xinqian Gu
Hong Chang
Bingpeng Ma
Hongkai Zhang
Xilin Chen
3DH
3DPC
62
137
0
16 Jul 2020
Span-based Localizing Network for Natural Language Video Localization
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
90
315
0
29 Apr 2020
X3D: Expanding Architectures for Efficient Video Recognition
Christoph Feichtenhofer
146
1,024
0
09 Apr 2020
Dense Regression Network for Video Grounding
Runhao Zeng
Haoming Xu
Wenbing Huang
Peihao Chen
Mingkui Tan
Chuang Gan
81
283
0
07 Apr 2020
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
Di Qi
Lin Su
Jianwei Song
Edward Cui
Taroon Bharti
Arun Sacheti
VLM
97
261
0
22 Jan 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick
Hinrich Schütze
348
1,617
0
21 Jan 2020
Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video
Jie Wu
Guanbin Li
Si Liu
Liang Lin
OffRL
64
104
0
18 Jan 2020
In Defense of Grid Features for Visual Question Answering
Huaizu Jiang
Ishan Misra
Marcus Rohrbach
Erik Learned-Miller
Xinlei Chen
OOD
ObjD
60
320
0
10 Jan 2020
Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language
Songyang Zhang
Houwen Peng
Jianlong Fu
Jiebo Luo
75
470
0
08 Dec 2019
How Can We Know What Language Models Know?
Zhengbao Jiang
Frank F. Xu
Jun Araki
Graham Neubig
KELM
144
1,409
0
28 Nov 2019
Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression
Zhaohui Zheng
Ping Wang
Wei Liu
Jinze Li
Rongguang Ye
Dongwei Ren
NoLa
114
3,707
0
19 Nov 2019
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktaschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
576
2,674
0
03 Sep 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
175
1,666
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
250
2,488
0
20 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
153
1,963
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
243
3,695
0
06 Aug 2019
Tripping through time: Efficient Localization of Activities in Videos
Meera Hahn
Asim Kadav
James M. Rehg
H. Graf
83
86
0
22 Apr 2019
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
169
3,282
0
10 Dec 2018
Adversarial Reprogramming of Neural Networks
Gamaleldin F. Elsayed
Ian Goodfellow
Jascha Narain Sohl-Dickstein
OOD
AAML
45
183
0
28 Jun 2018
Visualizing the Loss Landscape of Neural Nets
Hao Li
Zheng Xu
Gavin Taylor
Christoph Studer
Tom Goldstein
260
1,898
0
28 Dec 2017
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
3DH
155
1,333
0
13 Dec 2017
Non-local Neural Networks
Xinyu Wang
Ross B. Girshick
Abhinav Gupta
Kaiming He
OffRL
300
8,917
0
21 Nov 2017
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
123
949
0
04 Aug 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
240
8,038
0
22 May 2017
TALL: Temporal Activity Localization via Language Query
J. Gao
Chen Sun
Zhenheng Yang
Ram Nevatia
127
824
0
05 May 2017
Dense-Captioning Events in Videos
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
144
1,249
0
02 May 2017
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
224
2,493
0
01 Apr 2015
Two-Stream Convolutional Networks for Action Recognition in Videos
Karen Simonyan
Andrew Zisserman
256
7,542
0
09 Jun 2014
1