ResearchTrend.AI
Perceiver: General Perception with Iterative Attention

4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
    VLM
    ViT
    MDE

Papers citing "Perceiver: General Perception with Iterative Attention"

50 / 682 papers shown
Title
Multi-Modal Foundation Models for Computational Pathology: A Survey
Dong Li
Guihong Wan
Xintao Wu
Xinyu Wu
Xiaohui Chen
Yi He
Christine G. Lian
Peter K. Sorger
Yevgeniy R. Semenov
Chen Zhao
MedIm
46
0
0
12 Mar 2025
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Md. Mohaiminul Islam
Tushar Nagarajan
Huiyu Wang
Gedas Bertasius
Lorenzo Torresani
159
0
0
12 Mar 2025
iManip: Skill-Incremental Learning for Robotic Manipulation
Zexin Zheng
Jia-Feng Cai
Xiao-Ming Wu
Yi-Lin Wei
Yu-Ming Tang
Wei-Shi Zheng
CLL
54
0
0
10 Mar 2025
VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation
Hanzhi Chen
Boyang Sun
Anran Zhang
Marc Pollefeys
Stefan Leutenegger
LM&Ro
72
0
0
10 Mar 2025
Removing Averaging: Personalized Lip-Sync Driven Characters Based on Identity Adapter
Yanyu Zhu
Licheng Bai
Jintao Xu
Jiwei Tang
Hai-tao Zheng
38
0
0
09 Mar 2025
Optimal Transport for Brain-Image Alignment: Unveiling Redundancy and Synergy in Neural Information Processing
Yang Xiao
Wang Lu
Jie Ji
Ruimeng Ye
Gen Li
Xiaolong Ma
Bo Hui
OT
47
0
0
09 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
90
2
0
08 Mar 2025
ALVI Interface: Towards Full Hand Motion Decoding for Amputees Using sEMG
A. Kovalev
Anna Makarova
Petr Chizhov
Matvey Antonov
Gleb Duplin
...
Viacheslav Gostevskii
Vladimir Bessonov
Andrey Tsurkan
Mikhail Korobok
Aleksejs Timčenko
41
0
0
28 Feb 2025
Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation
Shaharukh Khan
Ayush Tarun
Ali Faraz
Palash Kamble
Vivek Dahiya
Praveen Kumar Pokala
Ashish Kulkarni
Chandra Khatri
Abhinav Ravi
Shubham Agarwal
139
0
0
27 Feb 2025
Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions
R. Lucassen
Sander P.J. Moonemans
Tijn van de Luijtgaarden
Gerben E. Breimer
W. Blokx
M. Veta
MedIm
60
2
0
26 Feb 2025
Graph Perceiver IO: A General Architecture for Graph Structured Data
Seyun Bae
Hoyoon Byun
Changdae Oh
Yoon-Sik Cho
Kyungwoo Song
GNN
98
2
0
24 Feb 2025
DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications
Ibrahim Fayad
Max Zimmer
Martin Schwartz
P. Ciais
Fabian Gieseke
Gabriel Belouze
Sarah Brood
A. D. Truchis
Alexandre d’Aspremont
AI4TS
43
0
0
24 Feb 2025
Chitrarth: Bridging Vision and Language for a Billion People
Shaharukh Khan
Ayush Tarun
Abhinav Ravi
Ali Faraz
Akshat Patidar
Praveen Kumar Pokala
Anagha Bhangare
Raja Kolla
Chandra Khatri
Shubham Agarwal
VLM
124
1
0
21 Feb 2025
FreeBlend: Advancing Concept Blending with Staged Feedback-Driven Interpolation Diffusion
Yufan Zhou
Haoyu Shen
Huan Wang
DiffM
106
0
0
17 Feb 2025
LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs
Sumin An
Junyoung Sung
Wonpyo Park
Chanjun Park
Paul Hongsuck Seo
97
0
0
10 Feb 2025
VILP: Imitation Learning with Latent Video Planning
Zhengtong Xu
Qiang Qiu
Yu She
VGen
75
1
0
03 Feb 2025
Imitation Game for Adversarial Disillusion with Multimodal Generative Chain-of-Thought Role-Play
Ching-Chun Chang
Fan-Yun Chen
Shih-Hong Gu
Kai Gao
Hanrui Wang
Isao Echizen
AAML
157
0
0
31 Jan 2025
CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation
Hwan Heo
Jangyeong Kim
Seongyeong Lee
Jeong A Wi
Junyoung Choi
Sangjun Ahn
52
0
0
17 Jan 2025
Principles for Responsible AI Consciousness Research
Patrick Butlin
Theodoros Lappas
38
1
0
13 Jan 2025
EdgeTAM: On-Device Track Anything Model
Chong Zhou
Chenchen Zhu
Yunyang Xiong
Saksham Suri
Fanyi Xiao
...
Raghuraman Krishnamoorthi
Bo Dai
Chen Change Loy
Vikas Chandra
Bilge Soran
VLM
65
0
0
13 Jan 2025
Natural Language Supervision for Low-light Image Enhancement
Jiahui Tang
Kaihua Zhou
Zhijian Luo
Yueen Hou
43
0
0
11 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Jiaqi Wang
Kaipeng Zhang
D. Lin
Yu Qiao
Peng Gao
Xiangyu Yue
MLLM
104
109
0
10 Jan 2025
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
S. Chen
Yuxiao Luo
Yue Ma
Yu Qiao
Yali Wang
Mamba
42
1
0
08 Jan 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
40
28
0
02 Jan 2025
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Hanguang Xiao
Feizhong Zhou
X. Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw
LM&MA
LRM
61
18
0
31 Dec 2024
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzadeh
VLM
60
24
0
31 Dec 2024
AV-EmoDialog: Chat with Audio-Visual Users Leveraging Emotional Cues
Se Jin Park
Yeonju Kim
Hyeongseop Rha
Bella Godiva
Y. Ro
36
1
0
23 Dec 2024
A Full Transformer-based Framework for Automatic Pain Estimation using Videos
Stefanos Gkikas
M. Tsiknakis
MedIm
ViT
104
8
0
19 Dec 2024
Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models
Xinghang Li
Peiyan Li
Minghuan Liu
Dong Wang
Jirong Liu
Bingyi Kang
Xiao Ma
Tao Kong
Hanbo Zhang
Huaping Liu
LM&Ro
97
18
0
18 Dec 2024
A Concept-Centric Approach to Multi-Modality Learning
Yuchong Geng
Ao Tang
83
0
0
18 Dec 2024
Advances in Transformers for Robotic Applications: A Review
Nikunj Sanghai
Nik Bear Brown
AI4CE
81
0
0
13 Dec 2024
A Decade of Deep Learning: A Survey on The Magnificent Seven
Dilshod Azizov
Muhammad Arslan Manzoor
Velibor Bojkovic
Yingxu Wang
Zhongqi Wang
...
Liang Li
Siwei Liu
Yu Zhong
Wei Liu
Shangsong Liang
OOD
AI4TS
MedIm
120
0
0
13 Dec 2024
GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning
Yuanda Wang
Zhikang Zhang
Jue Wang
D. Fan
Zhenlin Xu
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
VLM
82
1
0
10 Dec 2024
AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation
Guanxing Lu
Tengbo Yu
Haoyuan Deng
Season Si Chen
Yansong Tang
Ziwei Wang
77
3
0
09 Dec 2024
PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation
Ao Wang
Hui Chen
Jianchao Tan
Kaipeng Zhang
Xunliang Cai
Zijia Lin
J. Han
Guiguang Ding
VLM
77
3
0
04 Dec 2024
SIL-RRT*: Learning Sampling Distribution through Self Imitation Learning
Xuzhe Dang
Stefan Edelkamp
66
0
0
26 Nov 2024
Solaris: A Foundation Model of the Sun
Harris Abdul Majid
Pietro Sittoni
Francesco Tudisco
64
0
0
25 Nov 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luis Vilaca
Yi Yu
Paula Viana
75
0
0
24 Nov 2024
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
Sanghyeok Lee
Joonmyung Choi
Hyunwoo J. Kim
110
3
0
22 Nov 2024
SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers
Shravan Venkatraman
Jaskaran Singh Walia
J. Raheja
ViT
33
0
0
14 Nov 2024
NeuralDEM -- Real-time Simulation of Industrial Particulate Flows
Benedikt Alkin
Tobias Kronlachner
Samuele Papa
Stefan Pirker
Thomas Lichtenegger
Johannes Brandstetter
PINN
AI4CE
54
1
1
14 Nov 2024
Moving Off-the-Grid: Scene-Grounded Video Representations
Sjoerd van Steenkiste
Daniel Zoran
Yi Yang
Yulia Rubanova
Rishabh Kabra
...
Thomas Keck
João Carreira
Alexey Dosovitskiy
Mehdi S. M. Sajjadi
Thomas Kipf
31
3
0
08 Nov 2024
Wave Network: An Ultra-Small Language Model
Xin Zhang
Victor S. Sheng
39
1
0
04 Nov 2024
Adaptive Length Image Tokenization via Recurrent Allocation
Shivam Duggal
Phillip Isola
Antonio Torralba
William T. Freeman
VLM
37
5
0
04 Nov 2024
PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views
Xin Fei
Wenzhao Zheng
Yueqi Duan
W. Zhan
M. Tomizuka
Kurt Keutzer
Jiwen Lu
3DGS
30
3
0
24 Oct 2024
PerspectiveNet: Multi-View Perception for Dynamic Scene Understanding
Vinh Nguyen
3DV
21
0
0
22 Oct 2024
ARCADE: Scalable Demonstration Collection and Generation via Augmented Reality for Imitation Learning
Yue Yang
Bryce Ikeda
Gedas Bertasius
D. Szafir
28
4
0
21 Oct 2024
SEA: State-Exchange Attention for High-Fidelity Physics Based Transformers
Parsa Esmati
Amirhossein Dadashzadeh
Vahid Goodarzi
Nicolas Larrosa
Nicolo Grilli
27
0
0
20 Oct 2024
Generalized Multimodal Fusion via Poisson-Nernst-Planck Equation
Jiayu Xiong
Jing Wang
Hengjing Xiang
Jun Xue
Chen Xu
Zhouqiang Jiang
30
0
0
20 Oct 2024
AugInsert: Learning Robust Visual-Force Policies via Data Augmentation for Object Assembly Tasks
Ryan Diaz
Adam Imdieke
Vivek Veeriah
Karthik Desingh
23
0
0
19 Oct 2024