Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2212.01638
Cited By
VLG: General Video Recognition with Web Textual Knowledge
3 December 2022
Jintao Lin
Zhaoyang Liu
Wenhai Wang
Wayne Wu
Limin Wang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"VLG: General Video Recognition with Web Textual Knowledge"
50 / 78 papers shown
Title
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
129
99
0
04 Jul 2022
OCSampler: Compressing Videos to One Clip with Single-step Sampling
Jintao Lin
Haodong Duan
Kai-xiang Chen
Dahua Lin
Limin Wang
59
24
0
12 Jan 2022
Prompting Visual-Language Models for Efficient Video Understanding
Chen Ju
Tengda Han
Kunhao Zheng
Ya Zhang
Weidi Xie
VPVLM
VLM
75
376
0
08 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
144
689
0
02 Dec 2021
VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition
Changyao Tian
Wenhai Wang
Xizhou Zhu
Jifeng Dai
Yu Qiao
VLM
57
72
0
26 Nov 2021
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
117
904
0
22 Nov 2021
UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection
Andra Acsintoae
Andrei Florescu
Mariana-Iuliana Georgescu
Tudor Mare
Paul Sumedrea
Radu Tudor Ionescu
Fahad Shahbaz Khan
M. Shah
AI4TS
56
92
0
16 Nov 2021
A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark
Zhenxi Zhu
Limin Wang
Sheng Guo
Gangshan Wu
96
32
0
24 Oct 2021
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLM
ViT
101
44
0
21 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
188
372
0
17 Sep 2021
Self Supervision to Distillation for Long-Tailed Visual Recognition
Tianhao Li
Limin Wang
Gangshan Wu
71
103
0
09 Sep 2021
Parametric Contrastive Learning
Jiequan Cui
Zhisheng Zhong
Shu Liu
Bei Yu
Jiaya Jia
84
277
0
26 Jul 2021
Evidential Deep Learning for Open Set Action Recognition
Wentao Bao
Qi Yu
Yu Kong
CML
EDL
82
140
0
21 Jul 2021
Video Swin Transformer
Ze Liu
Jia Ning
Yue Cao
Yixuan Wei
Zheng Zhang
Stephen Lin
Han Hu
ViT
94
1,481
0
24 Jun 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
101
129
0
21 Jun 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Prahal Arora
Masoumeh Aminzadeh
Christoph Feichtenhofer
Florian Metze
Luke Zettlemoyer
45
132
0
20 May 2021
VideoLT: Large-scale Long-tailed Video Recognition
Xing Zhang
Zuxuan Wu
Zejia Weng
Huazhu Fu
Jingjing Chen
Yu-Gang Jiang
Larry S. Davis
67
42
0
06 May 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
127
1,257
0
22 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Huayu Chen
Boqing Gong
ViT
300
588
0
22 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
403
802
0
18 Apr 2021
Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation
Weiyao Wang
Matt Feiszli
Heng Wang
Du Tran
VOS
57
125
0
10 Apr 2021
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
205
2,147
0
29 Mar 2021
An Image is Worth 16x16 Words, What is a Video Worth?
Gilad Sharir
Asaf Noy
Lihi Zelnik-Manor
ViT
62
125
0
25 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
826
29,341
0
26 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
422
3,839
0
11 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
362
2,045
0
09 Feb 2021
Few-shot Action Recognition with Prototype-centered Attentive Learning
Xiatian Zhu
Antoine Toisoul
Juan-Manuel Prez-Ra
Li Zhang
Brais Martínez
Tao Xiang
61
53
0
20 Jan 2021
Temporal-Relational CrossTransformers for Few-Shot Action Recognition
Toby Perrett
A. Masullo
T. Burghardt
Majid Mirmehdi
Dima Damen
ViT
78
148
0
15 Jan 2021
TDN: Temporal Difference Networks for Efficient Action Recognition
Limin Wang
Zhan Tong
Bin Ji
Gangshan Wu
75
397
0
18 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
555
40,961
0
22 Oct 2020
Feature Space Augmentation for Long-Tailed Data
Peng Chu
Xiao Bian
Shaopeng Liu
Haibin Ling
71
240
0
09 Aug 2020
AR-Net: Adaptive Frame Resolution for Efficient Action Recognition
Yue Meng
Chung-Ching Lin
Yikang Shen
P. Sattigeri
Leonid Karlinsky
A. Oliva
Kate Saenko
Rogerio Feris
49
144
0
31 Jul 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Jia Deng
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
58
48
0
29 Jul 2020
Spatially Aware Multimodal Transformers for TextVQA
Yash Kant
Dhruv Batra
Peter Anderson
Alex Schwing
Devi Parikh
Jiasen Lu
Harsh Agrawal
71
85
0
23 Jul 2020
Depthwise Spatio-Temporal STFT Convolutional Neural Networks for Human Action Recognition
Sudhakar Kumawat
Manisha Verma
Yuta Nakashima
Shanmuganathan Raman
164
43
0
22 Jul 2020
Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval
Xun Yang
Jianfeng Dong
Yixin Cao
Xun Wang
Meng Wang
Tat-Seng Chua
52
139
0
06 Jul 2020
X3D: Expanding Architectures for Efficient Video Recognition
Christoph Feichtenhofer
125
1,018
0
09 Apr 2020
TEA: Temporal Excitation and Aggregation for Action Recognition
Yan-Ran Li
Bin Ji
Xintian Shi
Jianguo Zhang
Bin Kang
Limin Wang
ViT
82
447
0
03 Apr 2020
Equalization Loss for Long-Tailed Object Recognition
Jingru Tan
Changbao Wang
Buyu Li
Quanquan Li
Wanli Ouyang
Changqing Yin
Junjie Yan
315
462
0
11 Mar 2020
On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering
Xinyu Wang
Yuliang Liu
Chunhua Shen
Chun Chet Ng
Canjie Luo
Lianwen Jin
C. Chan
Anton Van Den Hengel
Liangwei Wang
81
97
0
24 Feb 2020
Learning Spatiotemporal Features via Video and Text Pair Discrimination
Tianhao Li
Limin Wang
VGen
57
57
0
16 Jan 2020
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Antoine Miech
Jean-Baptiste Alayrac
Lucas Smaira
Ivan Laptev
Josef Sivic
Andrew Zisserman
VGen
SSL
114
712
0
13 Dec 2019
BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition
Boyan Zhou
Quan Cui
Xiu-Shen Wei
Zhao-Min Chen
285
798
0
05 Dec 2019
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding
Mathew Monfort
Bowen Pan
K. Ramakrishnan
A. Andonian
Barry A. McNamara
A. Lascelles
Quanfu Fan
Dan Gutfreund
Rogerio Feris
A. Oliva
VLM
76
68
0
01 Nov 2019
Decoupling Representation and Classifier for Long-Tailed Recognition
Bingyi Kang
Saining Xie
Marcus Rohrbach
Zhicheng Yan
Albert Gordo
Jiashi Feng
Yannis Kalantidis
OODD
172
1,217
0
21 Oct 2019
ProtoGAN: Towards Few Shot Learning for Action Recognition
Sai Kumar Dwivedi
Vikram Gupta
Rahul Mitra
Shuaib Ahmed
Arjun Jain
64
95
0
17 Sep 2019
STM: SpatioTemporal and Motion Encoding for Action Recognition
Boyuan Jiang
Mengmeng Wang
Weihao Gan
Wei Wu
Junjie Yan
79
382
0
07 Aug 2019
TARN: Temporal Attentive Relation Network for Few-Shot and Zero-Shot Action Recognition
M. Bishay
Georgios Zoumpourlis
Ioannis Patras
ViT
49
155
0
21 Jul 2019
A Short Note on the Kinetics-700 Human Action Dataset
João Carreira
Eric Noland
Chloe Hillier
Andrew Zisserman
70
452
0
15 Jul 2019
Few-Shot Video Classification via Temporal Alignment
Kaidi Cao
Jingwei Ji
Zhangjie Cao
C. Chang
Juan Carlos Niebles
AI4TS
70
239
0
27 Jun 2019
1
2
Next