Perceiver: General Perception with Iterative Attention

4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
    VLM
    ViT
    MDE

Papers citing "Perceiver: General Perception with Iterative Attention"

50 / 682 papers shown
RecipeMind: Guiding Ingredient Choices from Food Pairing to Recipe Completion using Cascaded Set Transformer
Mogan Gim
Donghee Choi
Kana Maruyama
Jihun Choi
Hajung Kim
Donghyeon Park
Jaewoo Kang
40
5
0
14 Oct 2022
Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors
Vladimir E. Iashin
Weidi Xie
Esa Rahtu
Andrew Zisserman
28
20
0
13 Oct 2022
A Generalist Framework for Panoptic Segmentation of Images and Videos
Ting-Li Chen
Lala Li
Saurabh Saxena
Geoffrey E. Hinton
David J. Fleet
VGen
MLLM
43
102
0
12 Oct 2022
SaiT: Sparse Vision Transformers through Adaptive Token Pruning
Ling Li
D. Thorsley
Joseph Hassoun
ViT
25
17
0
11 Oct 2022
Turbo Training with Token Dropout
Tengda Han
Weidi Xie
Andrew Zisserman
ViT
31
10
0
10 Oct 2022
SCAM! Transferring humans between images with Semantic Cross Attention Modulation
Nicolas Dufour
David Picard
Vicky Kalogeiton
51
13
0
10 Oct 2022
ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval
A. Fragomeni
Michael Wray
Dima Damen
CLIP
ViT
25
3
0
09 Oct 2022
Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling
Hsin-Ying Lee
Hung-Ting Su
Bing-Chen Tsai
Tsung-Han Wu
Jia-Fong Yeh
Winston H. Hsu
27
2
0
08 Oct 2022
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang
Agrim Gupta
Zichen Zhang
Guanzhi Wang
Yongqiang Dou
Yanjun Chen
Li Fei-Fei
Anima Anandkumar
Yuke Zhu
Linxi Fan
LM&Ro
28
335
0
06 Oct 2022
SPARC: Sparse Render-and-Compare for CAD model alignment in a single RGB image
Florian Langer
Gwangbin Bae
Ignas Budvytis
R. Cipolla
3DPC
47
10
0
03 Oct 2022
Benign Autoencoders
Semyon Malamud
Teng Andrea Xu
Antoine Didisheim
DRL
AI4CE
14
0
0
02 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
35
120
0
02 Oct 2022
Construction and Evaluation of a Self-Attention Model for Semantic Understanding of Sentence-Final Particles
Shuhei Mandokoro
N. Oka
Akane Matsushima
Chie Fukada
Yuko Yoshimura
Koji Kawahara
Kazuaki Tanaka
20
1
0
01 Oct 2022
Cascaded Multi-Modal Mixing Transformers for Alzheimer's Disease Classification with Incomplete Data
Linfeng Liu
Siyu Liu
Lu Zhang
X. To
F. Nasrallah
Shekhar S. Chandra
MedIm
34
52
0
01 Oct 2022
Real-time Online Video Detection with Temporal Smoothing Transformers
Yue Zhao
Philipp Krahenbuhl
ViT
69
57
0
19 Sep 2022
Distribution Aware Metrics for Conditional Natural Language Generation
David M. Chan
Yiming Ni
David A. Ross
Sudheendra Vijayanarasimhan
Austin Myers
John F. Canny
45
4
0
15 Sep 2022
Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer?
Yi Wang
Zhiwen Fan
Tianlong Chen
Hehe Fan
Zhangyang Wang
ViT
53
9
0
15 Sep 2022
A patch-based architecture for multi-label classification from single label annotations
Warren Jouanneau
Aurélie Bugeau
Marc Palyart
Nicolas Papadakis
Laurent Vézard
28
0
0
14 Sep 2022
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
D. Fox
LM&Ro
163
457
0
12 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
18
60
0
07 Sep 2022
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
28
109
0
31 Aug 2022
A Circular Window-based Cascade Transformer for Online Action Detection
Shuyuan Cao
Weihua Luo
Bairui Wang
Wei Emma Zhang
Lin Ma
42
6
0
30 Aug 2022
Improving Small Molecule Generation using Mutual Information Machine
Daniel A. Reidenbach
M. Livne
Rajesh Ilango
M. Gill
Johnny Israeli
28
14
0
18 Aug 2022
Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis
Guoying Zhao
Zheng Lian
B. Liu
Jianhua Tao
32
47
0
16 Aug 2022
Teacher Guided Training: An Efficient Framework for Knowledge Transfer
Manzil Zaheer
A. S. Rawat
Seungyeon Kim
Chong You
Himanshu Jain
Andreas Veit
Rob Fergus
Surinder Kumar
VLM
16
2
0
14 Aug 2022
Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter
Aleksandar Stanić
Yujin Tang
David R Ha
Jürgen Schmidhuber
ELM
29
13
0
05 Aug 2022
COPER: Continuous Patient State Perceiver
V. Chauhan
Anshul Thakur
Odhran O'Donoghue
David A. Clifton
AI4TS
OOD
30
5
0
05 Aug 2022
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations
Xufeng Zhao
C. Weber
Muhammad Burhan Hafez
S. Wermter
18
8
0
04 Aug 2022
CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning
Mahdi Saleh
Yige Wang
Nassir Navab
Benjamin Busam
F. Tombari
3DPC
26
3
0
31 Jul 2022
UAVM: Towards Unifying Audio and Visual Models
Yuan Gong
Alexander H. Liu
Andrew Rouditchenko
James R. Glass
30
21
0
29 Jul 2022
Depth Field Networks for Generalizable Multi-view Scene Representation
Vitor Campagnolo Guizilini
Igor Vasiljevic
Jiading Fang
Rares Ambrus
G. Shakhnarovich
Matthew R. Walter
Adrien Gaidon
3DV
MDE
32
15
0
28 Jul 2022
Temporal and cross-modal attention for audio-visual zero-shot learning
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
38
25
0
20 Jul 2022
Residual and Attentional Architectures for Vector-Symbols
W. Olin-Ammentorp
22
3
0
18 Jul 2022
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
Wei-Ning Hsu
Bowen Shi
SSL
VLM
24
41
0
14 Jul 2022
Transformer-based Context Condensation for Boosting Feature Pyramids in Object Detection
Zhe Chen
Jing Zhang
Yufei Xu
Dacheng Tao
ViT
19
11
0
14 Jul 2022
MM-ALT: A Multimodal Automatic Lyric Transcription System
Xiangming Gu
Longshen Ou
Danielle Ong
Ye Wang
11
13
0
13 Jul 2022
Wayformer: Motion Forecasting via Simple & Efficient Attention Networks
Nigamaa Nayakanti
Rami Al-Rfou
Aurick Zhou
Kratarth Goel
Khaled S. Refaat
Benjamin Sapp
AI4TS
42
235
0
12 Jul 2022
MaiT: Leverage Attention Masks for More Efficient Image Transformers
Ling Li
Ali Shafiee Ardestani
Joseph Hassoun
14
1
0
06 Jul 2022
Pure Transformers are Powerful Graph Learners
Jinwoo Kim
Tien Dat Nguyen
Seonwoo Min
Sungjun Cho
Moontae Lee
Honglak Lee
Seunghoon Hong
43
189
0
06 Jul 2022
Softmax-free Linear Transformers
Jiachen Lu
Junge Zhang
Xiatian Zhu
Jianfeng Feng
Tao Xiang
Li Zhang
ViT
16
7
0
05 Jul 2022
Conditioned Human Trajectory Prediction using Iterative Attention Blocks
A. Postnikov
A. Gamayunov
Gonzalo Ferrer
10
3
0
29 Jun 2022
Deformable Graph Transformer
Jinyoung Park
Seongjun Yun
Hyeon-ju Park
Jaewoo Kang
Jisu Jeong
KyungHyun Kim
Jung-Woo Ha
Hyunwoo J. Kim
90
7
0
29 Jun 2022
A Unified Sequence Interface for Vision Tasks
Ting-Li Chen
Saurabh Saxena
Lala Li
Nayeon Lee
David J. Fleet
Geoffrey E. Hinton
VLM
MLLM
16
148
0
15 Jun 2022
Human Eyes Inspired Recurrent Neural Networks are More Robust Against Adversarial Noises
Minkyu Choi
Yizhen Zhang
Kuan Han
Xiaokai Wang
Zhongming Liu
AAML
GAN
35
4
0
15 Jun 2022
It's Time for Artistic Correspondence in Music and Video
Dídac Surís
Carl Vondrick
Bryan C. Russell
Justin Salamon
16
37
0
14 Jun 2022
Peripheral Vision Transformer
Juhong Min
Yucheng Zhao
Chong Luo
Minsu Cho
ViT
MDE
32
30
0
14 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
60
527
0
13 Jun 2022
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
Elad Ben-Avraham
Roei Herzig
K. Mangalam
Amir Bar
Anna Rohrbach
Leonid Karlinsky
Trevor Darrell
Amir Globerson
19
0
0
13 Jun 2022
ChordMixer: A Scalable Neural Attention Model for Sequences with Different Lengths
Ruslan Khalitov
Tong Yu
Lei Cheng
Zhirong Yang
25
12
0
12 Jun 2022
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
Jinguo Zhu
Xizhou Zhu
Wenhai Wang
Xiaohua Wang
Hongsheng Li
Xiaogang Wang
Jifeng Dai
MoMe
MoE
21
66
0
09 Jun 2022