2103.03206
Cited By
Perceiver: General Perception with Iterative Attention
4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
Papers citing
"Perceiver: General Perception with Iterative Attention"
50 / 682 papers shown
Title
Benchmarking Online Sequence-to-Sequence and Character-based Handwriting Recognition from IMU-Enhanced Pens
Felix Ott
David Rügamer
Lucas Heublein
Tim Hamann
Jens Barth
Bernd Bischl
Christopher Mutschler
22
17
0
14 Feb 2022
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
35
836
0
07 Feb 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM
ObjD
53
850
0
07 Feb 2022
Webly Supervised Concept Expansion for General Purpose Vision Models
Amita Kamath
Christopher Clark
Tanmay Gupta
Eric Kolve
Derek Hoiem
Aniruddha Kembhavi
VLM
29
54
0
04 Feb 2022
Exploring Transformer Backbones for Heterogeneous Treatment Effect Estimation
Yi-Fan Zhang
Hanlin Zhang
Zachary Chase Lipton
Li Erran Li
Eric P. Xing
OODD
24
29
0
02 Feb 2022
Learning Super-Features for Image Retrieval
Philippe Weinzaepfel
Thomas Lucas
Diane Larlus
Yannis Kalantidis
SupR
VLM
33
45
0
31 Jan 2022
Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven's Progressive Matrices
Mikołaj Małkiński
Jacek Mańdziuk
120
41
0
28 Jan 2022
From data to functa: Your data point is a function and you can treat it like one
Emilien Dupont
Hyunjik Kim
S. M. Ali Eslami
Danilo Jimenez Rezende
Dan Rosenbaum
TDI
3DPC
178
139
0
28 Jan 2022
Density-Aware Hyper-Graph Neural Networks for Graph-based Semi-supervised Node Classification
Jianpeng Liao
Qian Tao
Jun Yan
GNN
28
3
0
27 Jan 2022
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
L. V. D. van der Maaten
Armand Joulin
Ishan Misra
223
225
0
20 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
22
103
0
16 Jan 2022
Latency Adjustable Transformer Encoder for Language Understanding
Sajjad Kachuee
M. Sharifkhani
29
0
0
10 Jan 2022
Vision Transformer with Deformable Attention
Zhuofan Xia
Xuran Pan
S. Song
Li Erran Li
Gao Huang
ViT
33
456
0
03 Jan 2022
SeMask: Semantically Masked Transformers for Semantic Segmentation
Jitesh Jain
Anukriti Singh
Nikita Orlov
Zilong Huang
Jiachen Li
Steven Walton
Humphrey Shi
ViT
29
92
0
23 Dec 2021
Learned Queries for Efficient Local Attention
Moab Arar
Ariel Shamir
Amit H. Bermano
ViT
38
29
0
21 Dec 2021
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
150
14,641
0
20 Dec 2021
Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds
Ayush Jain
N. Gkanatsios
Ishita Mediratta
Katerina Fragkiadaki
ObjD
23
99
0
16 Dec 2021
Audio-Visual Synchronisation in the wild
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
18
37
0
08 Dec 2021
Input-level Inductive Biases for 3D Reconstruction
Yifan Wang
Carl Doersch
Relja Arandjelović
João Carreira
Andrew Zisserman
3DV
45
24
0
06 Dec 2021
Hybrid Instance-aware Temporal Fusion for Online Video Instance Segmentation
Xiang Li
Jinglu Wang
Xiao Li
Yan Lu
35
19
0
03 Dec 2021
Efficient Self-Ensemble for Semantic Segmentation
Walid Bousselham
Guillaume Thibault
Lucas Pagano
Archana Machireddy
Joe W. Gray
Y. Chang
Xubo B. Song
ViT
33
24
0
26 Nov 2021
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
35
73
0
25 Nov 2021
Conditional Object-Centric Learning from Video
Thomas Kipf
Gamaleldin F. Elsayed
Aravindh Mahendran
Austin Stone
S. Sabour
G. Heigold
Rico Jonschkowski
Alexey Dosovitskiy
Klaus Greff
OCL
41
214
0
24 Nov 2021
Sparse Fusion for Multimodal Transformers
Yi Ding
Alex Rich
Mason Wang
Noah Stier
M. Turk
P. Sen
Tobias Höllerer
ViT
27
7
0
23 Nov 2021
Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture
Daria Bakshandaeva
Denis Dimitrov
V.Ya. Arkhipkin
Alex Shonenkov
M. Potanin
...
Mikhail Martynov
Anton Voronov
Vera Davydova
E. Tutubalina
Aleksandr Petiushko
33
0
0
22 Nov 2021
Rethinking Query, Key, and Value Embedding in Vision Transformer under Tiny Model Constraints
Jaesin Ahn
Jiuk Hong
Jeongwoo Ju
Heechul Jung
ViT
32
3
0
19 Nov 2021
Edge-Native Intelligence for 6G Communications Driven by Federated Learning: A Survey of Trends and Challenges
Mohammad M. Al-Quraan
Lina S. Mohjazi
Lina Bariah
A. Centeno
A. Zoha
Sami Muhaidat
Mérouane Debbah
Muhammad Ali Imran
22
62
0
14 Nov 2021
Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention
S. Tan
Runpei Dong
Kaisheng Ma
22
2
0
03 Nov 2021
With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition
Evangelos Kazakos
Jaesung Huh
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
42
45
0
01 Nov 2021
Hyper-Representations: Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction
Konstantin Schurholt
Dimche Kostadinov
Damian Borth
SSL
23
14
0
28 Oct 2021
SOFT: Softmax-free Transformer with Linear Complexity
Jiachen Lu
Jinghan Yao
Junge Zhang
Martin Danelljan
Hang Xu
Weiguo Gao
Chunjing Xu
Thomas B. Schon
Li Zhang
18
161
0
22 Oct 2021
Inductive Biases and Variable Creation in Self-Attention Mechanisms
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Cyril Zhang
27
116
0
19 Oct 2021
BERMo: What can BERT learn from ELMo?
Sangamesh Kodge
Kaushik Roy
38
3
0
18 Oct 2021
EncT5: A Framework for Fine-tuning T5 as Non-autoregressive Models
Frederick Liu
T. Huang
Shihang Lyu
Siamak Shakeri
Hongkun Yu
Jing Li
36
8
0
16 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
232
1,024
0
13 Oct 2021
Two-argument activation functions learn soft XOR operations like cortical neurons
Kijung Yoon
Emin Orhan
Juhyeon Kim
Xaq Pitkow
MLT
35
0
0
13 Oct 2021
Dynamic Inference with Neural Interpreters
Nasim Rahaman
Muhammad Waleed Gondal
S. Joshi
Peter V. Gehler
Yoshua Bengio
Francesco Locatello
Bernhard Schölkopf
34
31
0
12 Oct 2021
Efficient Training of Audio Transformers with Patchout
Khaled Koutini
Jan Schluter
Hamid Eghbalzadeh
Gerhard Widmer
ViT
32
252
0
11 Oct 2021
Recurrent Attention Models with Object-centric Capsule Representation for Multi-object Recognition
Hossein Adeli
Seoyoung Ahn
G. Zelinsky
OCL
23
3
0
11 Oct 2021
Cross-lingual Transfer of Monolingual Models
Evangelia Gogoulou
Ariel Ekgren
T. Isbister
Magnus Sahlgren
29
17
0
15 Sep 2021
Patch-based Medical Image Segmentation using Matrix Product State Tensor Networks
Raghavendra Selvan
Erik Dam
Soren Alexander Flensborg
Jens Petersen
MedIm
27
2
0
15 Sep 2021
The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning
Yujin Tang
David R Ha
24
75
0
07 Sep 2021
∞-former: Infinite Memory Transformer
Pedro Henrique Martins
Zita Marinho
André F. T. Martins
30
11
0
01 Sep 2021
Transformers predicting the future. Applying attention in next-frame and time series forecasting
Radostin Cholakov
T. Kolev
AI4TS
22
16
0
18 Aug 2021
Perceiver IO: A General Architecture for Structured Inputs & Outputs
Andrew Jaegle
Sebastian Borgeaud
Jean-Baptiste Alayrac
Carl Doersch
Catalin Ionescu
...
Olivier J. Hénaff
M. Botvinick
Andrew Zisserman
Oriol Vinyals
João Carreira
MLLM
VLM
GNN
20
565
0
30 Jul 2021
Proposal-based Few-shot Sound Event Detection for Speech and Environmental Sounds with Perceivers
Piper Wolters
Logan Sizemore
Chris Daw
Brian Hutchinson
Lauren A. Phillips
29
11
0
28 Jul 2021
A3GC-IP: Attention-Oriented Adjacency Adaptive Recurrent Graph Convolutions for Human Pose Estimation from Sparse Inertial Measurements
Patrik Puchert
Timo Ropinski
3DH
19
3
0
23 Jul 2021
Sequence-to-Sequence Piano Transcription with Transformers
Curtis Hawthorne
Ian Simon
Rigel Swavely
Ethan Manilow
Jesse Engel
32
82
0
19 Jul 2021
Long Short-Term Transformer for Online Action Detection
Mingze Xu
Yuanjun Xiong
Hao Chen
Xinyu Li
Wei Xia
Z. Tu
Stefano Soatto
ViT
32
130
0
07 Jul 2021
Long-Short Transformer: Efficient Transformers for Language and Vision
Chen Zhu
Ming-Yu Liu
Chaowei Xiao
M. Shoeybi
Tom Goldstein
Anima Anandkumar
Bryan Catanzaro
ViT
VLM
29
131
0
05 Jul 2021