Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1709.07871
Cited By
FiLM: Visual Reasoning with a General Conditioning Layer
22 September 2017
Ethan Perez
Florian Strub
H. D. Vries
Vincent Dumoulin
Aaron Courville
FAtt
AIMat
OffRL
AI4CE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"FiLM: Visual Reasoning with a General Conditioning Layer"
50 / 1,312 papers shown
Title
Data-Driven Radio Propagation Modeling using Graph Neural Networks
Adrien Bufort
Laurent Lebocq
Stefan Cathabard
GNN
46
3
0
08 Jan 2025
Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining
H. S. Bovbjerg
Jan Østergaard
Jesper Jensen
Zheng-Hua Tan
38
0
0
06 Jan 2025
S-Diff: An Anisotropic Diffusion Model for Collaborative Filtering in Spectral Domain
Rui Xia
Yanhua Cheng
Yongxiang Tang
Xiaocheng Liu
Xialong Liu
Lisong Wang
Peng Jiang
DiffM
38
0
0
03 Jan 2025
JADE: Joint-aware Latent Diffusion for 3D Human Generative Modeling
Haorui Ji
Rong Wang
Taojun Lin
Hongdong Li
3DH
43
1
0
31 Dec 2024
Simultaneous Music Separation and Generation Using Multi-Track Latent Diffusion Models
Tornike Karchkhadze
M. Izadi
Shlomo Dubnov
DiffM
44
2
0
31 Dec 2024
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
Yuntao Chen
Yuqi Wang
Zhaoxiang Zhang
149
7
0
24 Dec 2024
Dual Conditioned Motion Diffusion for Pose-Based Video Anomaly Detection
Andi Xu
Hongsong Wang
Pinle Ding
Jie Gui
DiffM
VGen
30
0
0
23 Dec 2024
HyperCLIP: Adapting Vision-Language models with Hypernetworks
Victor Akinwande
Mohammad Sadegh Norouzzadeh
Devin Willmott
Anna Bair
Madan Ravi Ganesh
J. Zico Kolter
CLIP
VLM
93
0
0
21 Dec 2024
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
A. Schwing
Yuki Mitsufuji
VGen
126
12
0
19 Dec 2024
Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model
Xiu Yuan
Tongzhou Mu
Stone Tao
Yunhao Fang
Mengke Zhang
H. Su
OffRL
66
3
0
18 Dec 2024
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning
Moritz Reuss
Jyothish Pari
Pulkit Agrawal
Rudolf Lioutikov
DiffM
MoE
84
6
0
17 Dec 2024
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Renqiu Xia
M. Li
Hancheng Ye
Wenjie Wu
Hongbin Zhou
...
Conghui He
Botian Shi
Tao Chen
Junchi Yan
Bo Zhang
91
7
0
16 Dec 2024
VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation
Saksham Singh Kushwaha
Yapeng Tian
DiffM
VGen
87
2
0
14 Dec 2024
Fast and Robust Visuomotor Riemannian Flow Matching Policy
Haoran Ding
Noémie Jaquier
Jan Peters
Leonel Rozo
86
2
0
14 Dec 2024
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
Pengcheng Guo
Xuankai Chang
Hang Lv
Shinji Watanabe
Lei Xie
66
0
0
07 Dec 2024
CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing
Yen-Ju Lu
Jing Liu
Thomas Thebaud
Laureano Moro Velázquez
Ariya Rastrow
Najim Dehak
Jesus Villalba
79
1
0
05 Dec 2024
TASR: Timestep-Aware Diffusion Model for Image Super-Resolution
Qinwei Lin
Xiaopeng Sun
Yu Gao
Yujie Zhong
Dengjie Li
Zheng Zhao
Haoqian Wang
71
0
0
04 Dec 2024
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression
Junjie Wen
Minjie Zhu
Bo Li
Zhibin Tang
Jinming Li
...
Chengmeng Li
Xiaoyu Liu
Yaxin Peng
Chaomin Shen
Feifei Feng
88
13
0
04 Dec 2024
Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation
Zilyu Ye
Zhiyang Chen
Tiancheng Li
Zemin Huang
Weijian Luo
Guo-jun Qi
DiffM
83
5
0
02 Dec 2024
Zero-shot Musical Stem Retrieval with Joint-Embedding Predictive Architectures
Alain Riou
Antonin Gagnere
Gaëtan Hadjeres
Stefan Lattner
Geoffroy Peeters
91
0
0
29 Nov 2024
Unpacking the Individual Components of Diffusion Policy
Xiu Yuan
84
0
0
27 Nov 2024
MWFormer: Multi-Weather Image Restoration Using Degradation-Aware Transformers
Ruoxi Zhu
Zhengzhong Tu
Jiaming Liu
A. Bovik
Yibo Fan
ViT
70
7
0
26 Nov 2024
Multi-Resolution Generative Modeling of Human Motion from Limited Data
David Eduardo Moreno-Villamarín
A. Hilsmann
Peter Eisert
DiffM
3DH
86
0
0
25 Nov 2024
Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors
Soumava Paul
Prakhar Kaushik
Alan L. Yuille
3DGS
DiffM
186
0
0
24 Nov 2024
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
Shantanu Jaiswal
Debaditya Roy
Basura Fernando
Cheston Tan
ReLM
LRM
79
2
0
20 Nov 2024
Generating 3D-Consistent Videos from Unposed Internet Photos
Gene Chou
Kai Zhang
Sai Bi
Hao Tan
Zexiang Xu
Fujun Luan
Bharath Hariharan
Noah Snavely
3DGS
VGen
89
3
0
20 Nov 2024
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
29
1
0
17 Nov 2024
TDSM: Triplet Diffusion for Skeleton-Text Matching in Zero-Shot Action Recognition
Jeonghyeok Do
Munchurl Kim
49
1
0
16 Nov 2024
NeuralDEM -- Real-time Simulation of Industrial Particulate Flows
Benedikt Alkin
Tobias Kronlachner
Samuele Papa
Stefan Pirker
Thomas Lichtenegger
Johannes Brandstetter
PINN
AI4CE
54
1
1
14 Nov 2024
Artificial Intelligence for Biomedical Video Generation
Linyuan Li
Jianing Qiu
Anujit Saha
Lin Li
Poyuan Li
Mengxian He
Ziyu Guo
Wu Yuan
VGen
63
1
0
12 Nov 2024
Modulating State Space Model with SlowFast Framework for Compute-Efficient Ultra Low-Latency Speech Enhancement
Longbiao Cheng
Ashutosh Pandey
Buye Xu
T. Delbruck
V. Ithapu
Shih-Chii Liu
37
0
0
04 Nov 2024
Music Foundation Model as Generic Booster for Music Downstream Tasks
Weihsiang Liao
Yuhta Takida
Yukara Ikemiya
Zhi-Wei Zhong
Chieh-Hsin Lai
...
Stefan Uhlich
Taketo Akama
Woosung Choi
Yuichiro Koyama
Yuki Mitsufuji
56
0
0
02 Nov 2024
Is Multiple Object Tracking a Matter of Specialization?
G. Mancusi
Mattia Bernardi
Aniello Panariello
Angelo Porrello
Rita Cucchiara
Simone Calderara
MoMe
36
1
0
01 Nov 2024
EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching
Xinwang Chen
Ning Liu
Bo Li
Feifei Feng
Jian Tang
42
2
0
31 Oct 2024
BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference
Changwoo Lee
Soo Min Kwon
Qing Qu
Hun-Seok Kim
27
0
0
28 Oct 2024
Enhancing Lie Detection Accuracy: A Comparative Study of Classic ML, CNN, and GCN Models using Audio-Visual Features
Abdelrahman Abdelwahab
Abdelrahman Abdelwahab
Ayaan Vaswani
Advait Bharathulwar
Arnav Kommaraju
24
1
0
26 Oct 2024
GHIL-Glue: Hierarchical Control with Filtered Subgoal Images
Kyle Hatch
Ashwin Balakrishna
Oier Mees
Suraj Nair
Seohong Park
...
Masha Itkina
Benjamin Eysenbach
Sergey Levine
Thomas Kollar
Benjamin Burchfiel
65
2
0
26 Oct 2024
Considerations for Distribution Shift Robustness of Diagnostic Models in Healthcare
Arno Blaas
Adam Goliñski
Andrew C. Miller
Luca Zappella
J. Jacobsen
Christina Heinze-Deml
OOD
28
0
0
25 Oct 2024
Diffusion for Multi-Embodiment Grasping
Roman Freiberg
Alexander Qualmann
Ngo Anh Vien
Gerhard Neumann
29
3
0
24 Oct 2024
DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
Yuang Ai
Xiaoqiang Zhou
Huaibo Huang
Xiaotian Han
Zhengyu Chen
Quanzeng You
Hongxia Yang
47
9
0
24 Oct 2024
Unified Microphone Conversion: Many-to-Many Device Mapping via Feature-wise Linear Modulation
Myeonghoon Ryu
Hongseok Oh
Suji Lee
Han Park
23
0
0
23 Oct 2024
Composing Diffusion Policies for Few-shot Learning of Movement Trajectories
Omkar Patil
Anant Sah
N. Gopalan
DiffM
23
1
0
22 Oct 2024
Allegro: Open the Black Box of Commercial-Level Video Generation Model
Yuan Zhou
Qiuyue Wang
Yuxuan Cai
Huan Yang
VGen
VLM
85
26
0
20 Oct 2024
LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration
Yuang Ai
Huaibo Huang
Ran He
35
2
0
20 Oct 2024
FoMo: A Foundation Model for Mobile Traffic Forecasting with Diffusion Model
Haoye Chai
Shiyuan Zhang
Xiaoqian Qi
Yong Li
32
0
0
20 Oct 2024
CAGE: Causal Attention Enables Data-Efficient Generalizable Robotic Manipulation
Shangning Xia
Hongjie Fang
Hao-Shu Fang
Cewu Lu
CML
34
5
0
19 Oct 2024
Diff-DAgger: Uncertainty Estimation with Diffusion Policy for Robotic Manipulation
Sung-Wook Lee
Yen-Ling Kuo
Yen-Ling Kuo
26
4
0
18 Oct 2024
GAN-Based Speech Enhancement for Low SNR Using Latent Feature Conditioning
Shrishti Saha Shetu
Emanuël A. P. Habets
Andreas Brendel
31
1
0
17 Oct 2024
The Latent Road to Atoms: Backmapping Coarse-grained Protein Structures with Latent Diffusion
Xu Han
Yuancheng Sun
Kai Chen
Kang Liu
Qiwei Ye
DiffM
AI4CE
29
0
0
17 Oct 2024
Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance
Mitsuhiko Nakamoto
Oier Mees
Aviral Kumar
Sergey Levine
OffRL
79
13
0
17 Oct 2024
Previous
1
2
3
4
5
6
...
25
26
27
Next