ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2212.03191
  4. Cited By
InternVideo: General Video Foundation Models via Generative and
  Discriminative Learning

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

6 December 2022
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
Zhiyu Zhao
Hongjie Zhang
Jilan Xu
Yi Liu
Zun Wang
Sen Xing
Guo Chen
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
    VLM
    VGen
ArXivPDFHTML

Papers citing "InternVideo: General Video Foundation Models via Generative and Discriminative Learning"

50 / 241 papers shown
Title
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models
Yue Zhang
Zhiyang Xu
Ying Shen
Parisa Kordjamshidi
Lifu Huang
34
6
0
04 Oct 2024
Saliency-Guided DETR for Moment Retrieval and Highlight Detection
Saliency-Guided DETR for Moment Retrieval and Highlight Detection
Aleksandr Gordeev
Vladimir Dokholyan
Irina Tolstykh
Maksim Kuprashevich
23
4
0
02 Oct 2024
Solution for Temporal Sound Localisation Task of ECCV Second Perception
  Test Challenge 2024
Solution for Temporal Sound Localisation Task of ECCV Second Perception Test Challenge 2024
Haowei Gu
Weihao Zhu
Yang Yang
37
0
0
29 Sep 2024
Video DataFlywheel: Resolving the Impossible Data Trinity in
  Video-Language Understanding
Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding
Xiao Wang
Jianlong Wu
Zijia Lin
Fuzheng Zhang
Di Zhang
Liqiang Nie
VGen
37
1
0
29 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
60
11
0
26 Sep 2024
Skills Made to Order: Efficient Acquisition of Robot Cooking Skills
  Guided by Multiple Forms of Internet Data
Skills Made to Order: Efficient Acquisition of Robot Cooking Skills Guided by Multiple Forms of Internet Data
Mrinal Verghese
C. Atkeson
34
0
0
23 Sep 2024
Learning to Localize Actions in Instructional Videos with LLM-Based
  Multi-Pathway Text-Video Alignment
Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment
Yuxiao Chen
K. Li
Wentao Bao
Deep Patel
Yu Kong
Martin Renqiang Min
Dimitris N. Metaxas
DiffM
33
1
0
22 Sep 2024
End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting
End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting
Yongqi Wang
Xinxiao Wu
Shuo Yang
Jiebo Luo
125
1
0
19 Sep 2024
Uncertainty-Guided Self-Questioning and Answering for Video-Language Alignment
Uncertainty-Guided Self-Questioning and Answering for Video-Language Alignment
Jin Chen
Kaijing Ma
Haojian Huang
Jiayu Shen
Han Fang
Xianghao Zang
Chao Ban
79
2
0
17 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Yunze Man
Shuhong Zheng
Zhipeng Bao
M. Hebert
Liang-Yan Gui
Yu-xiong Wang
72
15
0
05 Sep 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
Guiguang Ding
132
7
0
02 Sep 2024
Training-free Video Temporal Grounding using Large-scale Pre-trained
  Models
Training-free Video Temporal Grounding using Large-scale Pre-trained Models
Minghang Zheng
Xinhao Cai
Qingchao Chen
Yuxin Peng
Yang Liu
34
4
0
29 Aug 2024
GenRec: Unifying Video Generation and Recognition with Diffusion Models
GenRec: Unifying Video Generation and Recognition with Diffusion Models
Zejia Weng
Xitong Yang
Zhen Xing
Zuxuan Wu
Yu-Gang Jiang
VGen
DiffM
42
5
0
27 Aug 2024
Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
Qirui Chen
Shangzhe Di
Weidi Xie
26
12
0
26 Aug 2024
HabitAction: A Video Dataset for Human Habitual Behavior Recognition
HabitAction: A Video Dataset for Human Habitual Behavior Recognition
Hongwu Li
Zhenliang Zhang
Wei Wang
18
0
0
24 Aug 2024
Continuous Perception Benchmark
Continuous Perception Benchmark
Zeyu Wang
Zhenzhen Weng
Serena Yeung-Levy
VLM
31
0
0
15 Aug 2024
From Recognition to Prediction: Leveraging Sequence Reasoning for Action
  Anticipation
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation
Xin Liu
Chao Hao
Zitong Yu
Huanjing Yue
Jingyu Yang
33
1
0
05 Aug 2024
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
Mrinal Verghese
Brian Chen
H. Eghbalzadeh
Tushar Nagarajan
Ruta Desai
LRM
45
1
0
04 Aug 2024
Text-Guided Video Masked Autoencoder
Text-Guided Video Masked Autoencoder
D. Fan
Jue Wang
Shuai Liao
Zhikang Zhang
Vimal Bhat
Xinyu Li
VGen
28
3
0
01 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
43
5
0
31 Jul 2024
Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model
Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model
Zhichao Zhang
Xinyue Li
Wei Sun
Jun Jia
Xiongkuo Min
...
Puyi Wang
Zhongpeng Ji
Fengyu Sun
Shangling Jui
Guangtao Zhai
EGVM
50
5
0
31 Jul 2024
End-to-End Video Question Answering with Frame Scoring Mechanisms and
  Adaptive Sampling
End-to-End Video Question Answering with Frame Scoring Mechanisms and Adaptive Sampling
Jianxin Liang
Xiaojun Meng
Yueqian Wang
Chang Liu
Qun Liu
Dongyan Zhao
29
5
0
21 Jul 2024
Rethinking Video-Text Understanding: Retrieval from Counterfactually
  Augmented Data
Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data
Wufei Ma
Kai Li
Zhongshi Jiang
Moustafa Meshry
Qihao Liu
Huiyu Wang
Christian Hane
Alan L. Yuille
VGen
40
1
0
18 Jul 2024
OmChat: A Recipe to Train Multimodal Language Models with Strong Long
  Context and Video Understanding
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding
Tiancheng Zhao
Qianqian Zhang
Kyusong Lee
Peng Liu
Lu Zhang
Chunxin Fang
Jiajia Liao
Kelei Jiang
Yibo Ma
Ruochen Xu
MLLM
VLM
51
5
0
06 Jul 2024
AWT: Transferring Vision-Language Models via Augmentation, Weighting,
  and Transportation
AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
Yuhan Zhu
Yuyang Ji
Zhiyu Zhao
Gangshan Wu
Limin Wang
VLM
39
7
0
05 Jul 2024
DyFADet: Dynamic Feature Aggregation for Temporal Action Detection
DyFADet: Dynamic Feature Aggregation for Temporal Action Detection
Le Yang
Ziwei Zheng
Yizeng Han
Hao-Ran Cheng
Shiji Song
Gao Huang
Fan Li
58
8
0
03 Jul 2024
KeyVideoLLM: Towards Large-scale Video Keyframe Selection
KeyVideoLLM: Towards Large-scale Video Keyframe Selection
Hao Liang
Jiapeng Li
Tianyi Bai
Xijie Huang
Linzhuang Sun
Zhengren Wang
Conghui He
Bin Cui
Chong Chen
Wentao Zhang
VGen
29
7
0
03 Jul 2024
Tarsier: Recipes for Training and Evaluating Large Video Description
  Models
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
44
52
0
30 Jun 2024
Enhancing Video-Language Representations with Structural Spatio-Temporal
  Alignment
Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment
Hao Fei
Shengqiong Wu
Meishan Zhang
M. Zhang
Tat-Seng Chua
Shuicheng Yan
AI4TS
47
40
0
27 Jun 2024
MatchTime: Towards Automatic Soccer Game Commentary Generation
MatchTime: Towards Automatic Soccer Game Commentary Generation
Jiayuan Rao
Haoning Wu
Chang-rui Liu
Yanfeng Wang
Weidi Xie
38
7
0
26 Jun 2024
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
Jiafeng Liang
Shixin Jiang
Zekun Wang
Haojie Pan
Zerui Chen
Zheng Chu
Ming Liu
Ruiji Fu
Zhongyuan Wang
Bing Qin
29
2
0
26 Jun 2024
MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment
  Retrieval
MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval
Weitong Cai
Jiabo Huang
Shaogang Gong
Hailin Jin
Yang Liu
44
0
0
25 Jun 2024
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video
  Understanding
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
Xinyu Fang
Kangrui Mao
Haodong Duan
Xiangyu Zhao
Yining Li
Dahua Lin
Kai Chen
VLM
55
61
0
20 Jun 2024
Multi-modal Transfer Learning between Biological Foundation Models
Multi-modal Transfer Learning between Biological Foundation Models
Juan Jose Garau-Luis
Patrick Bordes
Liam Gonzalez
Masa Roller
Bernardo P. de Almeida
...
Stefan Laurent
Jan Grzegorzewski
Maren Lang
Thomas Pierrot
Guillaume Richard
AI4CE
38
3
0
20 Jun 2024
ViLCo-Bench: VIdeo Language COntinual learning Benchmark
ViLCo-Bench: VIdeo Language COntinual learning Benchmark
Tianqi Tang
Shohreh Deldari
Hao Xue
Celso De Melo
Flora D. Salim
CLL
34
2
0
19 Jun 2024
DrVideo: Document Retrieval Based Long Video Understanding
DrVideo: Document Retrieval Based Long Video Understanding
Ziyu Ma
Chenhui Gou
Hengcan Shi
Bin Sun
Shutao Li
Hamid Rezatofighi
Jianfei Cai
VLM
36
13
0
18 Jun 2024
CARLOR @ Ego4D Step Grounding Challenge: Bayesian temporal-order priors
  for test time refinement
CARLOR @ Ego4D Step Grounding Challenge: Bayesian temporal-order priors for test time refinement
Carlos Plou
Lorenzo Mur-Labadia
Ruben Martinez-Cantin
Ana C. Murillo
BDL
43
1
0
13 Jun 2024
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
Jongwoo Park
Kanchana Ranasinghe
Kumara Kahatapitiya
Wonjeong Ryoo
Donghyun Kim
Michael S. Ryoo
65
20
0
13 Jun 2024
Generalist Multimodal AI: A Review of Architectures, Challenges and
  Opportunities
Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities
Sai Munikoti
Ian Stewart
Sameera Horawalavithana
Henry Kvinge
Tegan H. Emerson
Sandra E Thompson
Karl Pazdernik
38
2
0
08 Jun 2024
Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering
Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering
Xingrui Wang
Wufei Ma
Angtian Wang
Shuo Chen
Adam Kortylewski
Alan L. Yuille
34
3
0
02 Jun 2024
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
Ziyang Wang
Shoubin Yu
Elias Stengel-Eskin
Jaehong Yoon
Feng Cheng
Gedas Bertasius
Mohit Bansal
48
56
0
29 May 2024
Matryoshka Multimodal Models
Matryoshka Multimodal Models
Mu Cai
Jianwei Yang
Jianfeng Gao
Yong Jae Lee
VLM
45
25
0
27 May 2024
Hawk: Learning to Understand Open-World Video Anomalies
Hawk: Learning to Understand Open-World Video Anomalies
Jiaqi Tang
Hao Lu
Ruizheng Wu
Xiaogang Xu
Ke Ma
Cheng Fang
Bin Guo
Jiangbo Lu
Qifeng Chen
Ying-Cong Chen
VLM
40
9
0
27 May 2024
Streaming Long Video Understanding with Large Language Models
Streaming Long Video Understanding with Large Language Models
Rui Qian
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Shuangrui Ding
Dahua Lin
Jiaqi Wang
VLM
33
40
0
25 May 2024
Active Object Detection with Knowledge Aggregation and Distillation from
  Large Models
Active Object Detection with Knowledge Aggregation and Distillation from Large Models
Dejie Yang
Yang Liu
37
3
0
21 May 2024
A Survey on Backbones for Deep Video Action Recognition
A Survey on Backbones for Deep Video Action Recognition
Zixuan Tang
Youjun Zhao
Yuhang Wen
Mengyuan Liu
28
1
0
09 May 2024
How Good is my Video LMM? Complex Video Reasoning and Robustness
  Evaluation Suite for Video-LMMs
How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs
Muhammad Uzair Khattak
Muhammad Ferjad Naeem
Jameel Hassan
Muzammal Naseer
Federico Tombari
Fahad Shahbaz Khan
Salman Khan
LRM
ELM
37
10
0
06 May 2024
FlexiFilm: Long Video Generation with Flexible Conditions
FlexiFilm: Long Video Generation with Flexible Conditions
Yichen Ouyang
Jianhao Yuan
Hao Zhao
Gaoang Wang
Bo-Lu Zhao
DiffM
42
6
0
29 Apr 2024
Step Differences in Instructional Video
Step Differences in Instructional Video
Tushar Nagarajan
Lorenzo Torresani
VGen
32
5
0
24 Apr 2024
ChimpVLM: Ethogram-Enhanced Chimpanzee Behaviour Recognition
ChimpVLM: Ethogram-Enhanced Chimpanzee Behaviour Recognition
Otto Brookes
Majid Mirmehdi
H. Kühl
T. Burghardt
35
3
0
13 Apr 2024
Previous
12345
Next