Title
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models Jin-Hwa Kim Yunji Kim Jiyoung Lee Kang Min Yoo Sang-Woo Lee EGVM 36 32 0 25 May 2022
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training Gi-Cheon Kang Sungdong Kim Jin-Hwa Kim Donghyun Kwak Byoung-Tak Zhang 32 10 0 25 May 2022
End-to-End Multimodal Fact-Checking and Explanation Generation: A Challenging Dataset and Models Barry Menglong Yao Aditya Shah Lichao Sun Jin-Hee Cho Lifu Huang MLLM LRM 46 78 0 25 May 2022
ATTEMPT: Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts Akari Asai Mohammadreza Salehi Matthew E. Peters Hannaneh Hajishirzi 130 100 0 24 May 2022
M6-Fashion: High-Fidelity Multi-modal Image Generation and Editing Zhikang Li Huiling Zhou Shuai Bai Peike Li Chang Zhou Hongxia Yang 34 4 0 24 May 2022
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization Shruti Palaskar Akshita Bhagia Yonatan Bisk Florian Metze A. Black Ana Marasović 18 4 0 24 May 2022
Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment Tuan Dinh Jy-yong Sohn Shashank Rajput Timothy Ossowski Yifei Ming Junjie Hu Dimitris Papailiopoulos Kangwook Lee 28 0 0 23 May 2022
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding Chitwan Saharia William Chan Saurabh Saxena Lala Li Jay Whang ... Raphael Gontijo-Lopes Tim Salimans Jonathan Ho David J Fleet Mohammad Norouzi VLM 66 5,778 0 23 May 2022
Decoder Denoising Pretraining for Semantic Segmentation Emmanuel B. Asiedu Simon Kornblith Ting Chen Niki Parmar Matthias Minderer Mohammad Norouzi AI4CE 196 26 0 23 May 2022
Markedness in Visual Semantic AI Robert Wolfe Aylin Caliskan VLM 27 35 0 23 May 2022
GR-GAN: Gradual Refinement Text-to-image Generation Bo Yang Fangxiang Feng Xiaojie Wang EGVM 13 7 0 23 May 2022
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models Yuan Yao Qi-An Chen Ao Zhang Wei Ji Zhiyuan Liu Tat-Seng Chua Maosong Sun VLM MLLM 26 38 0 23 May 2022
Covariance Matrix Adaptation MAP-Annealing Matthew C. Fontaine Stefanos Nikolaidis 45 25 0 22 May 2022
Visually-Augmented Language Modeling Weizhi Wang Li Dong Hao Cheng Haoyu Song Xiaodong Liu Xifeng Yan Jianfeng Gao Furu Wei VLM 36 18 0 20 May 2022
The developmental trajectory of object recognition robustness: children are like small adults but unlike big deep neural networks Lukas Huber Robert Geirhos Felix Wichmann 54 16 0 20 May 2022
RankGen: Improving Text Generation with Large Ranking Models Kalpesh Krishna Yapei Chang John Wieting Mohit Iyyer AIMat 24 68 0 19 May 2022
Voxel-informed Language Grounding Rodolfo Corona Shizhan Zhu Dan Klein Trevor Darrell 138 11 0 19 May 2022
TransTab: Learning Transferable Tabular Transformers Across Tables Zifeng Wang Jimeng Sun LMTD 36 135 0 19 May 2022
AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars Fangzhou Hong Mingyuan Zhang Liang Pan Zhongang Cai Lei Yang Ziwei Liu CLIP 98 79 0 17 May 2022
A CLIP-Hitchhiker's Guide to Long Video Retrieval Max Bain Arsha Nagrani Gül Varol Andrew Zisserman CLIP 129 62 0 17 May 2022
Sparse Visual Counterfactual Explanations in Image Space Valentyn Boreiko Maximilian Augustin Francesco Croce Philipp Berens Matthias Hein BDL CML 30 26 0 16 May 2022
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization Luke Melas-Kyriazi Christian Rupprecht Iro Laina Andrea Vedaldi 30 159 0 16 May 2022
On the Difficulty of Defending Self-Supervised Learning against Model Extraction Adam Dziedzic Nikita Dhawan Muhammad Ahmad Kaleem Jonas Guan Nicolas Papernot MIACV 54 22 0 16 May 2022
Aligning Robot Representations with Humans Andreea Bobu Andi Peng 27 0 0 15 May 2022
Breaking with Fixed Set Pathology Recognition through Report-Guided Contrastive Training C. Seibold Simon Reiß M. Sarfraz Rainer Stiefelhagen Jens Kleesiek 18 31 0 14 May 2022
Multimodal Conversational AI: A Survey of Datasets and Approaches Anirudh S. Sundar Larry Heck 38 29 0 13 May 2022
A Comprehensive Survey of Few-shot Learning: Evolution, Applications, Challenges, and Opportunities Yisheng Song Ting-Yuan Wang S. Mondal J. P. Sahoo SLR 50 344 0 13 May 2022
The Creativity of Text-to-Image Generation J. Oppenlaender 25 192 0 13 May 2022
PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning Hongbin Liu Jinyuan Jia Neil Zhenqiang Gong 25 34 0 13 May 2022
What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics David M. Chan Austin Myers Sudheendra Vijayanarasimhan David A. Ross Bryan Seybold John F. Canny 28 6 0 12 May 2022
Simple Open-Vocabulary Object Detection with Vision Transformers Matthias Minderer A. Gritsenko Austin Stone Maxim Neumann Dirk Weissenborn ... Zhuoran Shen Tianlin Li Xiaohua Zhai Thomas Kipf N. Houlsby ObjD CLIP VLM ViT OCL 34 307 0 12 May 2022
The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning Zixin Wen Yuanzhi Li SSL 27 34 0 12 May 2022
Automated Audio Captioning: An Overview of Recent Progress and New Challenges Xinhao Mei Xubo Liu Mark D. Plumbley Wenwu Wang 29 37 0 12 May 2022
Deep Learning and Synthetic Media Raphaël Millière 23 18 0 11 May 2022
Learning to Retrieve Videos by Asking Questions Avinash Madasu Junier Oliva Gedas Bertasius VGen 32 16 0 11 May 2022
DISARM: Detecting the Victims Targeted by Harmful Memes Shivam Sharma Md. Shad Akhtar Preslav Nakov Tanmoy Chakraborty 13 29 0 11 May 2022
Learning to Answer Visual Questions from Web Videos Antoine Yang Antoine Miech Josef Sivic Ivan Laptev Cordelia Schmid ViT 34 33 0 10 May 2022
Transformer-based Cross-Modal Recipe Embeddings with Large Batch Training Jing Yang Junwen Chen Keiji Yanai ViT 21 5 0 10 May 2022
Weakly-supervised segmentation of referring expressions Robin Strudel Ivan Laptev Cordelia Schmid 22 21 0 10 May 2022
When does dough become a bagel? Analyzing the remaining mistakes on ImageNet Vijay Vasudevan Benjamin Caine Raphael Gontijo-Lopes Sara Fridovich-Keil Rebecca Roelofs VLM UQCV 46 57 0 09 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders Peng Gao Teli Ma Hongsheng Li Ziyi Lin Jifeng Dai Yu Qiao ViT 19 121 0 08 May 2022
Generating Representative Samples for Few-Shot Classification Jingyi Xu Hieu M. Le VLM 21 61 0 05 May 2022
Language Models Can See: Plugging Visual Controls in Text Generation Yixuan Su Tian Lan Yahui Liu Fangyu Liu Dani Yogatama Yan Wang Lingpeng Kong Nigel Collier VLM MLLM 46 97 0 05 May 2022
Relational Representation Learning in Visually-Rich Documents Xin Li Yan Zheng Yiqing Hu H. Cao Yunfei Wu Deqiang Jiang Yinsong Liu Bo Ren 18 12 0 05 May 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision Henghui Zhao Isma Hadji Nikita Dvornik Konstantinos G. Derpanis Richard P. Wildes Allan D. Jepson 31 45 0 04 May 2022
Explain to Not Forget: Defending Against Catastrophic Forgetting with XAI Sami Ede Serop Baghdadlian Leander Weber A. Nguyen Dario Zanca Wojciech Samek Sebastian Lapuschkin CLL 27 6 0 04 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models Jiahui Yu Zirui Wang Vijay Vasudevan Legg Yeung Mojtaba Seyedhosseini Yonghui Wu VLM CLIP OffRL 79 1,256 0 04 May 2022
All You May Need for VQA are Image Captions Soravit Changpinyo Doron Kukliansky Idan Szpektor Xi Chen Nan Ding Radu Soricut 32 70 0 04 May 2022
i-Code: An Integrative and Composable Multimodal Learning Framework Ziyi Yang Yuwei Fang Chenguang Zhu Reid Pryzant Dongdong Chen ... Bin Xiao Yuanxun Lu Takuya Yoshioka Michael Zeng Xuedong Huang 40 45 0 03 May 2022
Comparison of CoModGANs, LaMa and GLIDE for Art Inpainting- Completing M.C Escher's Print Gallery Lucia Cipolina-Kun Simone Caenazzo Gaston Mazzei 19 2 0 03 May 2022