Perceiver: General Perception with Iterative Attention

4 March 2021
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
    VLM
    ViT
    MDE

Papers citing "Perceiver: General Perception with Iterative Attention"

50 / 682 papers shown
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
48
115
0
18 May 2023
Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners
Xuehai He
Weixi Feng
Tsu-jui Fu
Varun Jampani
Arjun Reddy Akula
P. Narayana
Sugato Basu
William Yang Wang
Qing Guo
DiffM
52
7
0
18 May 2023
Paxion: Patching Action Knowledge in Video-Language Foundation Models
Zhenhailong Wang
Ansel Blume
Sha Li
Genglin Liu
Jaemin Cho
Zineng Tang
Joey Tianyi Zhou
Heng Ji
KELM
VGen
25
26
0
18 May 2023
Soft Prompt Decoding for Multilingual Dense Retrieval
Zhiqi Huang
Hansi Zeng
Hamed Zamani
James Allan
RALM
63
13
0
15 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
Emanuele Bugliarello
Laurent Sartran
Aishwarya Agrawal
Lisa Anne Hendricks
Aida Nematzadeh
VLM
33
22
0
12 May 2023
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
L. Yu
Daniel Simig
Colin Flaherty
Armen Aghajanyan
Luke Zettlemoyer
M. Lewis
32
84
0
12 May 2023
Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts
Zhaoyang Zhang
Yantao Shen
Kunyu Shi
Zhaowei Cai
Jun Fang
Siqi Deng
Hao Yang
Davide Modolo
Z. Tu
Stefano Soatto
VLM
28
2
0
11 May 2023
Cascaded Cross-Attention Networks for Data-Efficient Whole-Slide Image Classification Using Transformers
Firas Khader
Jakob Nikolas Kather
T. Han
S. Nebelung
Christiane Kuhl
Johannes Stegmaier
Daniel Truhn
MedIm
ViT
11
1
0
11 May 2023
VPGTrans: Transfer Visual Prompt Generator across LLMs
Ao Zhang
Hao Fei
Yuan Yao
Wei Ji
Li Li
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
32
85
0
02 May 2023
BenchMD: A Benchmark for Unified Learning on Medical Images and Sensors
Kathryn Wantlin
Chenwei Wu
Shih-Cheng Huang
Oishi Banerjee
Farah Z. Dadabhoy
...
A. Adamson
Laura Heacock
G. Tison
Alex Tamkin
Pranav Rajpurkar
SSL
OOD
38
2
0
17 Apr 2023
Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction
Guillaume Jaume
Anurag J. Vaidya
Richard J. Chen
Drew F. K. Williamson
Paul Pu Liang
Faisal Mahmood
41
43
0
13 Apr 2023
Dynamic Mobile-Former: Strengthening Dynamic Convolution with Attention and Residual Connection in Kernel Space
Seokju Yun
Youngmin Ro
ViT
27
2
0
13 Apr 2023
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens
Ziteng Gao
Zhan Tong
Limin Wang
Mike Zheng Shou
33
9
0
07 Apr 2023
Attention: Marginal Probability is All You Need?
Ryan Singh
Christopher L. Buckley
31
2
0
07 Apr 2023
SLM: End-to-end Feature Selection via Sparse Learnable Masks
Yihe Dong
Sercan Ö. Arik
32
3
0
06 Apr 2023
Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data
Vladislav Lialin
Stephen Rawls
David M. Chan
Shalini Ghosh
Anna Rumshisky
Wael Hamza
VLM
AI4TS
28
6
0
04 Apr 2023
Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver
Xianpeng Liu
Ce Zheng
K. Cheng
Nan Xue
Guo-Jun Qi
Tianfu Wu
3DPC
31
6
0
03 Apr 2023
FinderNet: A Data Augmentation Free Canonicalization aided Loop Detection and Closure technique for Point clouds in 6-DOF separation
Sudarshan S. Harithas
Gurkirat Singh
Aneesh Chavan
Sarthak Sharma
Suraj Patni
Chetan Arora
K. M. Krishna
3DPC
29
3
0
03 Apr 2023
Multi-Modal Perceiver Language Model for Outcome Prediction in Emergency Department
Sabri Boughorbel
Fethi Jarray
Abdulaziz Yousuf Al-Homaid
Rashid Niaz
Khalid Alyafei
26
0
0
03 Apr 2023
Towards Flexible Multi-modal Document Models
Naoto Inoue
Kotaro Kikuchi
E. Simo-Serra
Mayu Otani
Kota Yamaguchi
42
20
0
31 Mar 2023
Self-Supervised Multimodal Learning: A Survey
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
24
43
0
31 Mar 2023
Multi-modal learning for geospatial vegetation forecasting
V. Benson
Claire Robin
C. Requena-Mesa
Lazaro Alonso
Nuno Carvalhais
José A. Cortés
Zhihan Gao
Nora Linscheid
M. Weynants
Markus Reichstein
30
11
0
28 Mar 2023
Object Discovery from Motion-Guided Tokens
Zhipeng Bao
P. Tokmakov
Yu-xiong Wang
Adrien Gaidon
M. Hebert
OCL
43
20
0
27 Mar 2023
GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents
Tenglong Ao
Zeyi Zhang
Libin Liu
DiffM
VGen
72
145
0
26 Mar 2023
ViPFormer: Efficient Vision-and-Pointcloud Transformer for Unsupervised Pointcloud Understanding
Hongyu Sun
Yongcai Wang
Xudong Cai
Xuewei Bai
Deying Li
ViT
3DPC
24
8
0
25 Mar 2023
Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained Experts
Kastan Day
D. Christl
Rohan Salvi
Pranav Sriram
ViT
27
1
0
24 Mar 2023
Machine Learning for Brain Disorders: Transformers and Visual Transformers
Robin Courant
Maika Edberg
Nicolas Dufour
Vicky Kalogeiton
MedIm
ViT
32
1
0
21 Mar 2023
Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers
Jaehoon Yoo
Semin Kim
Doyup Lee
Chiheon Kim
Seunghoon Hong
31
3
0
20 Mar 2023
Unified Visual Relationship Detection with Vision and Language Models
Long Zhao
Liangzhe Yuan
Boqing Gong
Huayu Chen
Florian Schroff
Ming Yang
Hartwig Adam
Ting Liu
ObjD
32
9
0
16 Mar 2023
Relax, it doesn't matter how you get there: A new self-supervised approach for multi-timescale behavior analysis
Mehdi Azabou
Michael J. Mendelson
Nauman Ahad
Maks Sorokin
S. Thakoor
Carolina Urzay
Eva L. Dyer
30
4
0
15 Mar 2023
Making Vision Transformers Efficient from A Token Sparsification View
Shuning Chang
Pichao Wang
Ming Lin
Fan Wang
David Junhao Zhang
Rong Jin
Mike Zheng Shou
ViT
45
24
0
15 Mar 2023
Brain Diffuser: An End-to-End Brain Image to Brain Network Pipeline
Xuhang Chen
Baiying Lei
Chi-Man Pun
Shuqiang Wang
MedIm
DiffM
32
21
0
11 Mar 2023
Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation
Qichen Fu
Xingyu Liu
Ran Xu
Juan Carlos Niebles
Kris M. Kitani
ViT
29
13
0
09 Mar 2023
Sample Efficient Multimodal Semantic Augmentation for Incremental Summarization
Sumanta Bhattacharyya
R. Manuvinakurike
Sahisnu Mazumder
Saurav Sahay
VLM
18
0
0
08 Mar 2023
Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes
Brandon Clark
Alec Kerrigan
P. Kulkarni
V. Cepeda
M. Shah
19
21
0
07 Mar 2023
A Light-Weight Contrastive Approach for Aligning Human Pose Sequences
R. Collins
3DH
16
1
0
07 Mar 2023
Your representations are in the network: composable and parallel adaptation for large scale models
Yonatan Dukler
Alessandro Achille
Hao Yang
Varsha Vivek
L. Zancato
Benjamin Bowman
Avinash Ravichandran
Charless C. Fowlkes
A. Swaminathan
Stefano Soatto
28
3
0
07 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLM
MLLM
44
21
0
04 Mar 2023
AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning
Xijun Wang
Ruiqi Xian
Tianrui Guan
Celso M. de Melo
Stephen M. Nogar
Aniket Bera
Tianyi Zhou
16
11
0
02 Mar 2023
Directed Diffusion: Direct Control of Object Placement through Attention Guidance
W. Ma
J. P. Lewis
Avisek Lahiri
Thomas Leung
W. Kleijn
DiffM
16
65
0
25 Feb 2023
Language-Driven Representation Learning for Robotics
Siddharth Karamcheti
Suraj Nair
Annie S. Chen
Thomas Kollar
Chelsea Finn
Dorsa Sadigh
Percy Liang
LM&Ro
SSL
47
145
0
24 Feb 2023
Optical Transformers
Maxwell G. Anderson
Shifan Ma
Tianyu Wang
Logan G. Wright
Peter L. McMahon
20
20
0
20 Feb 2023
TcGAN: Semantic-Aware and Structure-Preserved GANs with Individual Vision Transformer for Fast Arbitrary One-Shot Image Generation
Yunliang Jiang
Li Yan
Xiongtao Zhang
Yong-Jin Liu
Da-Song Sun
ViT
29
5
0
16 Feb 2023
Cross-Modal Fine-Tuning: Align then Refine
Junhong Shen
Liam Li
Lucio Dery
Corey Staten
M. Khodak
Graham Neubig
Ameet Talwalkar
33
35
0
11 Feb 2023
DNArch: Learning Convolutional Neural Architectures by Backpropagation
David W. Romero
Neil Zeghidour
AI4CE
21
4
0
10 Feb 2023
Reversible Vision Transformers
K. Mangalam
Haoqi Fan
Yanghao Li
Chaoxiong Wu
Bo Xiong
Christoph Feichtenhofer
Jitendra Malik
ViT
11
45
0
09 Feb 2023
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Zhuolin Yang
Ming-Yu Liu
Zihan Liu
V. Korthikanti
Weili Nie
...
Yuke Zhu
M. Shoeybi
Bryan Catanzaro
Chaowei Xiao
Anima Anandkumar
VLM
RALM
34
39
0
09 Feb 2023
Efficient Attention via Control Variates
Lin Zheng
Jianbo Yuan
Chong-Jun Wang
Lingpeng Kong
34
18
0
09 Feb 2023
Efficient Joint Learning for Clinical Named Entity Recognition and Relation Extraction Using Fourier Networks: A Use Case in Adverse Drug Events
A. Yazdani
D. Proios
H. Rouhizadeh
Douglas Teodoro
21
7
0
08 Feb 2023
Multi-View Masked World Models for Visual Robotic Manipulation
Younggyo Seo
Junsup Kim
Stephen James
Kimin Lee
Jinwoo Shin
Pieter Abbeel
VGen
25
55
0
05 Feb 2023