ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

FiLM: Visual Reasoning with a General Conditioning Layer
arXiv:1709.07871

22 September 2017
Ethan Perez
Florian Strub
H. D. Vries
Vincent Dumoulin
Aaron Courville
FAtt, AIMat, OffRL, AI4CE

Papers citing "FiLM: Visual Reasoning with a General Conditioning Layer"

50 / 1,349 papers shown
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
Shigeki Karita
Yuma Koizumi
Heiga Zen
Haruko Ishikawa
Robin Scheibler
M. Bacchiani
VLM
441
1
0
07 May 2025
Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers
Divyansh Srivastava
Xiang Zhang
He Wen
Chenru Wen
Zhuowen Tu
DiffM
84
0
0
07 May 2025
The Inverse Drum Machine: Source Separation Through Joint Transcription and Analysis-by-Synthesis
Bernardo Torres
Geoffroy Peeters
G. Richard
69
0
0
06 May 2025
StableMotion: Training Motion Cleanup Models with Unpaired Corrupted Data
Yuxuan Mu
Hung Yu Ling
Yi Shi
Ismael Baira Ojeda
Pengcheng Xi
Chang Shu
F. Zinno
Xue Bin Peng
76
0
0
06 May 2025
DPNet: Dynamic Pooling Network for Tiny Object Detection
Luqi Gong
Haotian Chen
Yushen Chen
Tianliang Yao
Chao Li
Shuai Zhao
Guangjie Han
ObjD
450
0
0
05 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
114
0
0
01 May 2025
J-PARSE: Jacobian-based Projection Algorithm for Resolving Singularities Effectively in Inverse Kinematic Control of Serial Manipulators
Shivani Guptasarma
Matthew Strong
HongHao Zhen
Monroe Kennedy III
81
0
0
01 May 2025
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
Haifeng Huang
Xinyi Chen
Yuxiao Chen
Haoyang Li
Xiaoshen Han
Zihao Wang
Tai Wang
Jiangmiao Pang
Zhou Zhao
LM&Ro
148
1
0
30 Apr 2025
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
Haoran Geng
Feishi Wang
Songlin Wei
Yuchen Li
Bangjun Wang
...
Hao Dong
Siyuan Huang
Yue Wang
Jitendra Malik
Pieter Abbeel
192
8
0
26 Apr 2025
Salient Region-Guided Spacecraft Image Arbitrary-Scale Super-Resolution Network
J. Yang
Hu Gao
Ying Zhang
Depeng Dang
128
0
0
25 Apr 2025
CIVIL: Causal and Intuitive Visual Imitation Learning
Yinlong Dai
Robert Ramirez Sanchez
Ryan Jeronimus
Shahabedin Sagheb
Cara M. Nunez
Heramb Nemlekar
Dylan P. Losey
133
1
0
24 Apr 2025
SPECI: Skill Prompts based Hierarchical Continual Imitation Learning for Robot Manipulation
Jingkai Xu
Xiangli Nie
73
0
0
22 Apr 2025
Multimodal Perception for Goal-oriented Navigation: A Survey
I-Tak Ieong
Hao Tang
LM&Ro, LRM
102
0
0
22 Apr 2025
FlowLoss: Dynamic Flow-Conditioned Loss Strategy for Video Diffusion Models
Kuanting Wu
Kei Ota
Asako Kanezaki
DiffM, VGen
118
0
0
20 Apr 2025
Plain Transformers Can be Powerful Graph Learners
Liheng Ma
Soumyasundar Pal
Yingxue Zhang
Philip Torr
Mark Coates
83
0
0
17 Apr 2025
Towards Forceful Robotic Foundation Models: a Literature Survey
William Xie
N. Correll
OffRL
138
4
0
16 Apr 2025
Autoregressive Distillation of Diffusion Transformers
Yeongmin Kim
Sotiris Anagnostidis
Yuming Du
Edgar Schönfeld
Jonas Kohler
Markos Georgopoulos
Albert Pumarola
Ali K. Thabet
A. Sanakoyeu
80
0
0
15 Apr 2025
Efficient Task-specific Conditional Diffusion Policies: Shortcut Model Acceleration and SO(3) Optimization
Haiyong Yu
Yanqiong Jin
Yonghao He
Wei Sui
80
0
0
14 Apr 2025
Prior Does Matter: Visual Navigation via Denoising Diffusion Bridge Models
Hao Ren
Yiming Zeng
Zetong Bi
Zhaoliang Wan
Junlong Huang
Hui Cheng
441
1
0
14 Apr 2025
Controllable Expressive 3D Facial Animation via Diffusion in a Unified Multimodal Space
Kangwei Liu
Junwu Liu
Xiaowei Yi
Jinlin Guo
Yun Cao
VGen
17
0
0
14 Apr 2025
USM-VC: Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Na Li
Chuke Wang
Yu Gu
Zhifeng Li
149
0
0
11 Apr 2025
DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation
Wangbo Zhao
Yizeng Han
Jiasheng Tang
Kai Wang
Hao Luo
Yibing Song
Gao Huang
Fan Wang
Yang You
163
0
0
09 Apr 2025
Robust Fusion Controller: Degradation-aware Image Fusion with Fine-grained Language Instructions
Hao Zhang
Yanping Zha
Qingwei Zhuang
Z. Shao
Jiayi Ma
95
0
0
08 Apr 2025
PRISM: Probabilistic Representation for Integrated Shape Modeling and Generation
Lei Cheng
Mahdi Saleh
Qing Cheng
Lu Sang
Hongli Xu
Zorah Lähner
F. Tombari
48
0
0
06 Apr 2025
MultiNeRF: Multiple Watermark Embedding for Neural Radiance Fields
Yash Kulthe
Andrew Gilbert
John Collomosse
128
0
0
03 Apr 2025
Compressing 3D Gaussian Splatting by Noise-Substituted Vector Quantization
Haishan Wang
Mohammad Hassan Vali
Arno Solin
3DGS
107
0
0
03 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Kai Zhang
MGen, VGen
297
1
0
01 Apr 2025
Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation
Abhiram Maddukuri
Z. L. Jiang
Lawrence Yunliang Chen
Soroush Nasiriany
Yuqi Xie
...
Scott Reed
Ken Goldberg
Ajay Mandlekar
Linxi Fan
Yuke Zhu
141
7
0
31 Mar 2025
Language-Guided Trajectory Traversal in Disentangled Stable Diffusion Latent Space for Factorized Medical Image Generation
Zahra Tehraninasab
Amar Kumar
Tal Arbel
MedIm
107
0
0
30 Mar 2025
Baseline Systems and Evaluation Metrics for Spatial Semantic Segmentation of Sound Scenes
Binh Thien Nguyen
Masahiro Yasuda
Daiki Takeuchi
Daisuke Niizumi
Yasunori Ohishi
Noboru Harada
107
1
0
28 Mar 2025
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Aniket Didolkar
Andrii Zadaianchuk
Rabiul Awal
Maximilian Seitzer
E. Gavves
Aishwarya Agrawal
OCL, VLM
178
3
0
27 Mar 2025
A multi-agentic framework for real-time, autonomous freeform metasurface design
Robert Lupoiu
Yixuan Shao
Tianxiang Dai
Chenkai Mao
Kofi Edee
Jonathan A. Fan
AI4CE
108
1
0
26 Mar 2025
RoboFlamingo-Plus: Fusion of Depth and RGB Perception with Vision-Language Models for Enhanced Robotic Manipulation
Sheng Wang
VLM
128
2
0
25 Mar 2025
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy
Zhi Hou
Tianyi Zhang
Yuwen Xiong
Haonan Duan
Hengjun Pu
...
Chengyang Zhao
X. Zhu
Yu Qiao
Jifeng Dai
Yuxiao Chen
143
6
0
25 Mar 2025
Unpaired Object-Level SAR-to-Optical Image Translation for Aircraft with Keypoints-Guided Diffusion Models
Ruixi You
Hecheng Jia
Feng Xu
DiffM
70
0
0
25 Mar 2025
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition
Chengxiang Huang
Yake Wei
Zequn Yang
D. Hu
97
2
0
24 Mar 2025
DisentTalk: Cross-lingual Talking Face Generation via Semantic Disentangled Diffusion Model
Kangwei Liu
Junwu Liu
Yun Cao
Jinlin Guo
Xiaowei Yi
DiffM
73
0
0
24 Mar 2025
LightLoc: Learning Outdoor LiDAR Localization at Light Speed
Wenbo Li
Chen Liu
Shangshu Yu
Dunqiang Liu
Yin Zhou
Siqi Shen
Chenglu Wen
Cheng-Yu Wang
65
0
0
22 Mar 2025
PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning
Yan Zhang
Yao Feng
Alpár Cseke
Nitin Saini
Nathan Bajandas
Nicolas Heron
M. Black
DiffM, VGen
119
1
0
21 Mar 2025
DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation
Jiangran Lyu
Ziming Li
Xuesong Shi
Chaoyi Xu
Yizhou Wang
He Wang
104
0
0
21 Mar 2025
Real-Time Diffusion Policies for Games: Enhancing Consistency Policies with Q-Ensembles
Ruoqi Zhang
Ziwei Luo
Jens Sjölund
Per Mattsson
Linus Gisslén
Alessandro Sestini
69
1
0
21 Mar 2025
Diffusion-augmented Graph Contrastive Learning for Collaborative Filter
Fan Huang
Wei Wang
DiffM
104
0
0
20 Mar 2025
SaMam: Style-aware State Space Model for Arbitrary Image Style Transfer
Hongda Liu
Longguang Wang
Ye Zhang
Ziru Yu
Yulan Guo
Mamba
117
0
0
20 Mar 2025
Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling
Yanchen Luo
Zhiyuan Liu
Yi Zhao
Changhao Nai
Kenji Kawaguchi
Tat-Seng Chua
Xiang Wang
Yang Zhang
Xiang Wang
MedIm
162
0
0
19 Mar 2025
LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding
Amirhossein Kazerouni
Soroush Mehraban
Michael Brudno
Babak Taati
90
2
0
19 Mar 2025
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Thanh-Son Nguyen
Hong Yang
Tzeh Yuan Neoh
Hao Zhang
Ee Yeo Keat
Basura Fernando
NAI
101
0
0
19 Mar 2025
SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing
Seokhyeon Hong
Chaelin Kim
Serin Yoon
Junghyun Nam
Sihun Cha
Junyong Noh
DiffM, VGen
104
2
0
18 Mar 2025
Context-Aware Two-Step Training Scheme for Domain Invariant Speech Separation
Wupeng Wang
Zexu Pan
Jingru Lin
Shuai Wang
Haizhou Li
110
0
0
16 Mar 2025
Image-Goal Navigation Using Refined Feature Guidance and Scene Graph Enhancement
Zhicheng Feng
Xieyuanli Chen
Chenghao Shi
Lun Luo
Ziyang Chen
Yun Liu
Huimin Lu
80
1
0
14 Mar 2025
Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control
Hejia Chen
Haoxian Zhang
Shoulong Zhang
Xiaoqiang Liu
Sisi Zhuang
Yuan Zhang
Pengfei Wan
Di Zhang
Shuai Li
85
3
0
14 Mar 2025