ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.05665
  4. Cited By
ImageBind: One Embedding Space To Bind Them All

ImageBind: One Embedding Space To Bind Them All

9 May 2023
Rohit Girdhar
Alaaeldin El-Nouby
Zhuang Liu
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
    VLM
ArXivPDFHTML

Papers citing "ImageBind: One Embedding Space To Bind Them All"

50 / 172 papers shown
Title
Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation
Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation
Ben Harwood
Amir Dezfouli
Iadine Chadès
Conrad Sanderson
36
0
0
30 Apr 2024
Aligning Knowledge Graphs Provided by Humans and Generated from Neural
  Networks in Specific Tasks
Aligning Knowledge Graphs Provided by Humans and Generated from Neural Networks in Specific Tasks
Tangrui Li
Jun Zhou
43
0
0
23 Apr 2024
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
Jie Ma
Min Hu
Pinghui Wang
Wangchun Sun
Lingyun Song
Hongbin Pei
Jun Liu
Youtian Du
39
4
0
18 Apr 2024
OmniSat: Self-Supervised Modality Fusion for Earth Observation
OmniSat: Self-Supervised Modality Fusion for Earth Observation
Guillaume Astruc
Nicolas Gonthier
Clement Mallet
Loic Landrieu
38
25
0
12 Apr 2024
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min
Shyamal Buch
Arsha Nagrani
Minsu Cho
Cordelia Schmid
LRM
44
20
0
09 Apr 2024
Segment Any 3D Object with Language
Segment Any 3D Object with Language
Seungjun Lee
Yuyang Zhao
Gim Hee Lee
44
1
0
02 Apr 2024
MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in
  Conversations with Multimodal Language Models
MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in Conversations with Multimodal Language Models
Zebang Cheng
Fuqiang Niu
Yuxiang Lin
Zhi-Qi Cheng
Bowen Zhang
Xiaojiang Peng
31
7
0
31 Mar 2024
Long-Tailed Anomaly Detection with Learnable Class Names
Long-Tailed Anomaly Detection with Learnable Class Names
Chih-Hui Ho
Kuan-Chuan Peng
Nuno Vasconcelos
OODD
43
6
0
29 Mar 2024
RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method
RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method
Ming Yan
Yan Zhang
Shuqiang Cai
Shuqi Fan
Xincheng Lin
...
Siqi Shen
Chenglu Wen
Lan Xu
Yuexin Ma
Cheng-Yu Wang
51
6
0
28 Mar 2024
Unsupervised Audio-Visual Segmentation with Modality Alignment
Unsupervised Audio-Visual Segmentation with Modality Alignment
Swapnil Bhosale
Haosen Yang
Diptesh Kanojia
Jiangkang Deng
Xiatian Zhu
VOS
43
5
0
21 Mar 2024
Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization
  with Vision-Language Models
Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models
Elaine Sui
Xiaohan Wang
Serena Yeung-Levy
VLM
30
5
0
19 Mar 2024
Continuous Object State Recognition for Cooking Robots Using Pre-Trained
  Vision-Language Models and Black-box Optimization
Continuous Object State Recognition for Cooking Robots Using Pre-Trained Vision-Language Models and Black-box Optimization
Kento Kawaharazuka
Naoaki Kanazawa
Yoshiki Obinata
K. Okada
Masayuki Inaba
32
5
0
13 Mar 2024
Diffusion Model-Based Image Editing: A Survey
Diffusion Model-Based Image Editing: A Survey
Yi Huang
Jiancheng Huang
Yifan Liu
Mingfu Yan
Jiaxi Lv
Jianzhuang Liu
Wei Xiong
He Zhang
Liangliang Cao
Liangliang Cao
EGVM
66
87
0
27 Feb 2024
GenAINet: Enabling Wireless Collective Intelligence via Knowledge Transfer and Reasoning
GenAINet: Enabling Wireless Collective Intelligence via Knowledge Transfer and Reasoning
Han Zou
Qiyang Zhao
Lina Bariah
Yu Tian
M. Bennis
S. Lasaulce
101
12
0
26 Feb 2024
LLMBind: A Unified Modality-Task Integration Framework
LLMBind: A Unified Modality-Task Integration Framework
Bin Zhu
Munan Ning
Peng Jin
Bin Lin
Jinfa Huang
...
Junwu Zhang
Zhenyu Tang
Mingjun Pan
Xing Zhou
Li-ming Yuan
MLLM
40
6
0
22 Feb 2024
User-LLM: Efficient LLM Contextualization with User Embeddings
User-LLM: Efficient LLM Contextualization with User Embeddings
Lin Ning
Luyang Liu
Jiaxing Wu
Neo Wu
D. Berlowitz
Sushant Prakash
Bradley Green
S. O’Banion
Jun Xie
57
34
0
21 Feb 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
45
29
0
20 Feb 2024
Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with
  Queryable Objects and Open-Set Relationships
Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships
Sebastian Koch
Narunas Vaskevicius
Mirco Colosi
Pedro Hermosilla
Timo Ropinski
3DPC
36
26
0
19 Feb 2024
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu
Jaehong Yoon
Mohit Bansal
85
4
0
08 Feb 2024
Cross-Modal Coordination Across a Diverse Set of Input Modalities
Cross-Modal Coordination Across a Diverse Set of Input Modalities
Jorge Sánchez
Rodrigo Laguna
VLM
44
0
0
29 Jan 2024
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
Chenyu Wang
Weixin Luo
Qianyu Chen
Haonan Mai
Jindi Guo
Sixun Dong
Xiaohua Xuan
MLLM
LLMAG
52
19
0
19 Jan 2024
GroundingGPT:Language Enhanced Multi-modal Grounding Model
GroundingGPT:Language Enhanced Multi-modal Grounding Model
Zhaowei Li
Qi Xu
Dong Zhang
Hang Song
Yiqing Cai
...
Junting Pan
Zefeng Li
Van Tu Vu
Zhida Huang
Tao Wang
36
38
0
11 Jan 2024
Low-Resource Vision Challenges for Foundation Models
Low-Resource Vision Challenges for Foundation Models
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
VLM
30
5
0
09 Jan 2024
AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident
  Analysis
AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident Analysis
Kebin Wu
Wenbin Li
Xiaofei Xiao
21
3
0
05 Jan 2024
Towards Robust Multimodal Prompting With Missing Modalities
Towards Robust Multimodal Prompting With Missing Modalities
Jaehyuk Jang
Yooseung Wang
Changick Kim
VLM
30
10
0
26 Dec 2023
CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization
  in Healthcare
CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare
Akash Ghosh
Arkadeep Acharya
Raghav Jain
Sriparna Saha
Aman Chadha
Setu Sinha
35
29
0
16 Dec 2023
LEMON: Learning 3D Human-Object Interaction Relation from 2D Images
LEMON: Learning 3D Human-Object Interaction Relation from 2D Images
Yuhang Yang
Wei Zhai
Hongcheng Luo
Yang Cao
Zheng-Jun Zha
27
23
0
14 Dec 2023
4M: Massively Multimodal Masked Modeling
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Ouguzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
50
64
0
11 Dec 2023
Audio-Visual LLM for Video Understanding
Audio-Visual LLM for Video Understanding
Fangxun Shu
Lei Zhang
Hao Jiang
Cihang Xie
VLM
MLLM
27
38
0
11 Dec 2023
NLLG Quarterly arXiv Report 09/23: What are the most influential current
  AI Papers?
NLLG Quarterly arXiv Report 09/23: What are the most influential current AI Papers?
Ran Zhang
Aida Kostikova
Christoph Leiter
Jonas Belouadi
Daniil Larionov
Yanran Chen
Vivian Fresen
Steffen Eger
42
0
0
09 Dec 2023
Unsupervised Multi-modal Feature Alignment for Time Series Representation Learning
Unsupervised Multi-modal Feature Alignment for Time Series Representation Learning
Cheng Liang
Donghua Yang
Zhiyu Liang
Hongzhi Wang
Zheng Liang
Xiyang Zhang
Jianfeng Huang
AI4TS
202
1
0
09 Dec 2023
FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
Andrea Caraffa
Davide Boscaini
Amir Hamza
Fabio Poiesi
61
15
0
01 Dec 2023
Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines
Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines
Hamed Damirchi
Cristian Rodriguez-Opazo
Ehsan Abbasnejad
Damien Teney
Javen Qinfeng Shi
Stephen Gould
Anton Van Den Hengel
VLM
47
0
0
29 Nov 2023
UniIR: Training and Benchmarking Universal Multimodal Information
  Retrievers
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
Cong Wei
Yang Chen
Haonan Chen
Hexiang Hu
Ge Zhang
Jie Fu
Alan Ritter
Wenhu Chen
47
53
0
28 Nov 2023
A Survey of the Evolution of Language Model-Based Dialogue Systems
A Survey of the Evolution of Language Model-Based Dialogue Systems
Hongru Wang
Lingzhi Wang
Yiming Du
Liang Chen
Jing Zhou
Yufei Wang
Kam-Fai Wong
LRM
67
21
0
28 Nov 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating
  Video-based Large Language Models
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
Munan Ning
Bin Zhu
Yujia Xie
Bin Lin
Jiaxi Cui
Lu Yuan
Dongdong Chen
Li-ming Yuan
ELM
MLLM
27
58
0
27 Nov 2023
Chain of Images for Intuitively Reasoning
Chain of Images for Intuitively Reasoning
Fanxu Meng
Haotong Yang
Yiding Wang
Muhan Zhang
LRM
36
7
0
09 Nov 2023
Audio-Visual Instance Segmentation
Audio-Visual Instance Segmentation
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Wenzhen Yue
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLM
VOS
34
2
0
28 Oct 2023
Leveraging Image-Text Similarity and Caption Modification for the
  DataComp Challenge: Filtering Track and BYOD Track
Leveraging Image-Text Similarity and Caption Modification for the DataComp Challenge: Filtering Track and BYOD Track
Shuhei Yokoo
Peifei Zhu
Yuchi Ishikawa
Mikihiro Tanaka
Masayoshi Kondo
Hirokatsu Kataoka
24
0
0
23 Oct 2023
Extending Multi-modal Contrastive Representations
Extending Multi-modal Contrastive Representations
Zehan Wang
Ziang Zhang
Luping Liu
Yang Zhao
Haifeng Huang
Tao Jin
Zhou Zhao
29
5
0
13 Oct 2023
Exploring the Creation and Humanization of Digital Life: Consciousness
  Simulation and Human-Machine Interaction
Exploring the Creation and Humanization of Digital Life: Consciousness Simulation and Human-Machine Interaction
Qikang Zhang
AI4CE
13
1
0
10 Oct 2023
Fine-grained Audio-Visual Joint Representations for Multimodal Large
  Language Models
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
Guangzhi Sun
Wenyi Yu
Changli Tang
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
38
12
0
09 Oct 2023
Improving Discriminative Multi-Modal Learning with Large-Scale
  Pre-Trained Models
Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models
Chenzhuang Du
Yue Zhao
Chonghua Liao
Jiacheng You
Jie Fu
Hang Zhao
47
2
0
08 Oct 2023
Beyond Task Performance: Evaluating and Reducing the Flaws of Large
  Multimodal Models with In-Context Learning
Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning
Mustafa Shukor
Alexandre Ramé
Corentin Dancette
Matthieu Cord
LRM
MLLM
46
20
0
01 Oct 2023
ImageBind-LLM: Multi-modality Instruction Tuning
ImageBind-LLM: Multi-modality Instruction Tuning
Jiaming Han
Renrui Zhang
Wenqi Shao
Peng Gao
Peng Xu
...
Yafei Wen
Xiaoxin Chen
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
49
117
0
07 Sep 2023
AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language
  Models
AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models
Zhaopeng Gu
Bingke Zhu
Guibo Zhu
Yingying Chen
Ming Tang
Jinqiao Wang
VLM
MLLM
37
102
0
29 Aug 2023
Mobile Foundation Model as Firmware
Mobile Foundation Model as Firmware
Jinliang Yuan
Chenchen Yang
Dongqi Cai
Shihe Wang
Xin Yuan
...
Di Zhang
Hanzi Mei
Xianqing Jia
Shangguang Wang
Mengwei Xu
40
19
0
28 Aug 2023
Adversarial Illusions in Multi-Modal Embeddings
Adversarial Illusions in Multi-Modal Embeddings
Tingwei Zhang
Rishi Jha
Eugene Bagdasaryan
Vitaly Shmatikov
AAML
34
8
0
22 Aug 2023
An Outlook into the Future of Egocentric Vision
An Outlook into the Future of Egocentric Vision
Chiara Plizzari
Gabriele Goletto
Antonino Furnari
Siddhant Bansal
Francesco Ragusa
G. Farinella
Dima Damen
Tatiana Tommasi
EgoV
40
38
0
14 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised
  Pretraining
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
42
224
0
10 Aug 2023
Previous
1234
Next