Title
From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation Xin Liu Chao Hao Zitong Yu Huanjing Yue Jingyu Yang 45 1 0 05 Aug 2024
Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow Philip Wiese Gamze İslamoğlu Moritz Scherer Luka Macan Victor J. B. Jung Luca Bompani Francesco Conti Luca Benini 47 0 0 05 Aug 2024
Unsupervised Representation Learning by Balanced Self Attention Matching Daniel Shalam Simon Korman SSL 43 0 0 04 Aug 2024
What Happens Without Background? Constructing Foreground-Only Data for Fine-Grained Tasks Yuetian Wang W. Hou Qinmu Peng Xinge You 47 0 0 04 Aug 2024
Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers Weijie Zheng Xingjun Ma Hanxun Huang Zuxuan Wu Yu-Gang Jiang AAML 45 0 0 03 Aug 2024
POA: Pre-training Once for Models of All Sizes Yingying Zhang Xin Guo Jiangwei Lao Lei Yu Lixiang Ru Jian Wang Guo Ye Huimei He Jingdong Chen Ming Yang 78 1 0 02 Aug 2024
Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology Eric Zimmermann Eugene Vorontsov Julian Viret Adam Casson Michal Zelechowski ... Razik Yousfi Thomas J. Fuchs Nicolò Fusi Siqi Liu Kristen Severson MedIm 51 30 0 01 Aug 2024
Privacy-preserving datasets by capturing feature distributions with Conditional VAEs Francesco Di Salvo David Tafler Sebastian Doerrich Christian Ledig CML 45 0 0 01 Aug 2024
Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation Cephas Mpungu Qiyuan Chen Xiaoye Qu Jiashuo Sun G. Mapp VLM RALM LRM 46 16 0 01 Aug 2024
IN-Sight: Interactive Navigation through Sight Philipp Schoch Fan Yang Yuntao Ma Stefan Leutenegger Marco Hutter Quentin Leboutet 49 3 0 01 Aug 2024
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? Richard Ren Steven Basart Adam Khoja Alice Gatti Long Phan ... Alexander Pan Gabriel Mukobi Ryan H. Kim Stephen Fitz Dan Hendrycks ELM 36 22 0 31 Jul 2024
EZSR: Event-based Zero-Shot Recognition Yan Yang Sehwan Kim Dongxu Li Y. Sun 43 0 0 31 Jul 2024
Evaluating SAM2's Role in Camouflaged Object Detection: From SAM to SAM2 Lv Tang Bo Li VLM 40 7 0 31 Jul 2024
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models Ming-Kuan Wu Xinyue Cai Jiayi Ji Jiale Li Oucheng Huang Gen Luo Hao Fei Xiaoshuai Sun Rongrong Ji MLLM 65 7 0 31 Jul 2024
StreetSurfaceVis: a dataset of crowdsourced street-level imagery with semi-automated annotations of road surface type and quality Alexandra Kapp Edith Hoffmann Esther Weigmann Helena Mihaljević 35 1 0 31 Jul 2024
Small Object Few-shot Segmentation for Vision-based Industrial Inspection Zilong Zhang Chang Niu Yi Lin Jingchi Jiang Xuefeng Chen 49 1 0 31 Jul 2024
Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM Can Wang Hongliang Zhong Menglei Chai Mingming He DongDong Chen Jing Liao LM&Ro 3DV LRM 40 4 0 31 Jul 2024
Segment Anything for Videos: A Systematic Survey Chunhui Zhang Yawen Cui Weilin Lin Guanjie Huang Yan Rong Li Liu Shiguang Shan VLM 52 6 0 31 Jul 2024
CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning Yuexi Du Brian Chang Nicha Dvornek MedIm VLM 50 2 0 30 Jul 2024
dopanim: A Dataset of Doppelganger Animals with Noisy Annotations from Multiple Humans M. Herde Denis Huseljic Lukas Rauch Bernhard Sick 49 1 0 30 Jul 2024
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities Lorenzo Baraldi Federico Cocchi Marcella Cornia Lorenzo Baraldi Alessandro Nicolosi Rita Cucchiara 43 8 0 29 Jul 2024
Improving 2D Feature Representations by 3D-Aware Fine-Tuning Yuanwen Yue Anurag Das Francis Engelmann Siyu Tang J. E. Lenssen 57 24 0 29 Jul 2024
SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction Çağhan Köksal Ghazal Ghazaei Felix Holm Azade Farshad Nassir Navab MedIm 51 2 0 29 Jul 2024
Theia: Distilling Diverse Vision Foundation Models for Robot Learning Jinghuan Shang Karl Schmeckpeper Brandon B. May M. Minniti Tarik Kelestemur David Watkins Laura Herlant VLM 41 23 0 29 Jul 2024
ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2 Wenjun Huang Jiakai Pan Jiahao Tang Yanyu Ding Yifei Xing Yuhe Wang Zhengzhuo Wang Jianguo Hu Mamba 58 5 0 29 Jul 2024
Rethinking RGB-D Fusion for Semantic Segmentation in Surgical Datasets Muhammad Abdullah Jamal Omid Mohareri 49 1 0 29 Jul 2024
Faster Image2Video Generation: A Closer Look at CLIP Image Embedding's Impact on Spatio-Temporal Cross-Attentions Ashkan Taghipour Morteza Ghahremani Bennamoun Aref Miri Rekavandi Zinuo Li Hamid Laga F. Boussaïd VGen 84 2 0 27 Jul 2024
PromptCCD: Learning Gaussian Mixture Prompt Pool for Continual Category Discovery Fernando Julio Cendra Bingchen Zhao Kai Han VLM CLL 56 6 0 26 Jul 2024
SHIC: Shape-Image Correspondences with no Keypoint Supervision Aleksandar Shtedritski Christian Rupprecht Andrea Vedaldi 3DPC 3DH 3DV 35 3 0 26 Jul 2024
QT-TDM: Planning with Transformer Dynamics Model and Autoregressive Q-Learning Mostafa Kotb C. Weber Muhammad Burhan Hafez Stefan Wermter 46 1 0 26 Jul 2024
From 2D to 3D: AISG-SLA Visual Localization Challenge Jialin Gao Bill Ong Darld Lwi Zhen Hao Ng Xun Wei Yee ... Johan Edstedt Kirill Brodt Clémentin Boittiaux Maxime Ferrera S. Konev 22 0 0 26 Jul 2024
Learning Spectral-Decomposed Tokens for Domain Generalized Semantic Segmentation Jingjun Yi Qi Bi Hao Zheng Haolan Zhan Wei Ji Yawen Huang Yuexiang Li Yefeng Zheng 43 12 0 26 Jul 2024
Trajectory-aligned Space-time Tokens for Few-shot Action Recognition Pulkit Kumar Namitha Padmanabhan Luke Luo Sai Saketh Rambhatla Abhinav Shrivastava 50 4 0 25 Jul 2024
Automated Ensemble Multimodal Machine Learning for Healthcare F. Imrie Stefan Denner Lucas S. Brunschwig Klaus H. Maier-Hein M. Schaar 29 2 1 25 Jul 2024
IRIS: Wireless Ring for Vision-based Smart Home Interaction Maruchi Kim Antonio Glenn Bandhav Veluri Yunseo Lee Eyoel Gebre Aditya Bagaria Shwetak Patel Shyamnath Gollakota 31 3 0 25 Jul 2024
The Curious Case of Representational Alignment: Unravelling Visio-Linguistic Tasks in Emergent Communication Tom Kouwenhoven Max Peeperkorn Bram van Dijk Tessa Verhoef 34 3 0 25 Jul 2024
Unified Lexical Representation for Interpretable Visual-Language Alignment Yifan Li Yikai Wang Yanwei Fu Dongyu Ru Zheng Zhang Tong He VLM 42 4 0 25 Jul 2024
Unsqueeze [CLS] Bottleneck to Learn Rich Representations Qing Su Shihao Ji 36 0 0 24 Jul 2024
Pretrained Visual Representations in Reinforcement Learning Emlyn Williams Athanasios Polydoros SSL 20 1 0 24 Jul 2024
Graph Neural Networks: A suitable Alternative to MLPs in Latent 3D Medical Image Classification? Johannes Kiechle Daniel M. Lang Stefan M. Fischer Lina Felsner J. Peeken Julia A. Schnabel MedIm 51 0 0 24 Jul 2024
Nonverbal Immediacy Analysis in Education: A Multimodal Computational Model Uroš Petković Jonas Frenkel Olaf Hellwich Rebecca Lazarides 41 1 0 24 Jul 2024
PlantTrack: Task-Driven Plant Keypoint Tracking with Zero-Shot Sim2Real Transfer Samhita Marri A. N. Sivakumar N. Uppalapati Girish Chowdhary 26 0 0 23 Jul 2024
SINDER: Repairing the Singular Defects of DINOv2 Haoqian Wang Tong Zhang Mathieu Salzmann 39 1 0 23 Jul 2024
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model Yiwei Ma Zhibin Wang Xiaoshuai Sun Weihuang Lin Qiang-feng Zhou Jiayi Ji Rongrong Ji MLLM VLM 59 1 0 23 Jul 2024
Reconstructing Training Data From Real World Models Trained with Transfer Learning Yakir Oz Gilad Yehudai Gal Vardi Itai Antebi Michal Irani Niv Haim 43 2 0 22 Jul 2024
MILAN: Milli-Annotations for Lidar Semantic Segmentation Nermin Samet Gilles Puy Oriane Siméoni Renaud Marlet 3DPC 47 0 0 22 Jul 2024
AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection Yunkang Cao Jiangning Zhang Luca Frittoli Yuqi Cheng Weiming Shen Giacomo Boracchi VLM 61 29 0 22 Jul 2024
MSSPlace: Multi-Sensor Place Recognition with Visual and Text Semantics Alexander Melekhin Dmitry Yudin Ilia Petryashin Vitaly Bezuglyj 53 1 0 22 Jul 2024
Probing Fine-Grained Action Understanding and Cross-View Generalization of Foundation Models Thinesh Thiyakesan Ponbagavathi Kunyu Peng Alina Roitberg 56 1 0 22 Jul 2024
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models Amir Mohammad Karimi Mamaghan Samuele Papa Karl Henrik Johansson Stefan Bauer Andrea Dittadi OCL 56 5 0 22 Jul 2024