Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2208.02515
Cited By
Fine-Grained Semantically Aligned Vision-Language Pre-Training
4 August 2022
Juncheng Li
Xin He
Longhui Wei
Long Qian
Linchao Zhu
Lingxi Xie
Yueting Zhuang
Qi Tian
Siliang Tang
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Fine-Grained Semantically Aligned Vision-Language Pre-Training"
50 / 54 papers shown
Title
Adaptation Method for Misinformation Identification
Yangping Chen
Weijie Shi
Mengze Li
Yue Cui
H. Chen
Jia Zhu
Jiajie Xu
34
0
0
19 Apr 2025
Enhancing Image Resolution of Solar Magnetograms: A Latent Diffusion Model Approach
Francesco P. Ramunno
Paolo Massa
Vitaliy Kinakh
Brandon Panos
A. Csillaghy
S. Voloshynovskiy
DiffM
53
0
0
31 Mar 2025
Multi-Granular Multimodal Clue Fusion for Meme Understanding
Li Zheng
Hao Fei
Ting Dai
Zuquan Peng
Fei Li
Huisheng Ma
Chong Teng
Donghong Ji
60
0
0
16 Mar 2025
MV-CLAM: Multi-View Molecular Interpretation with Cross-Modal Projection via Language Model
Sumin Ha
Jun Hyeong Kim
Yinhua Piao
Sun Kim
49
0
0
23 Feb 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
Peng Jin
Hao Li
Li Yuan
Shuicheng Yan
Jie Chen
54
1
0
31 Dec 2024
Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning
Shengqiong Wu
Hao Fei
Liangming Pan
William Yang Wang
Shuicheng Yan
Tat-Seng Chua
LRM
75
1
0
15 Dec 2024
CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives
Armin Saghafian
Amirmohammad Izadi
Negin Hashemi Dijujin
M. Baghshah
66
0
0
29 Nov 2024
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
Qifan Yu
Wei Chow
Zhongqi Yue
Kaihang Pan
Yang Wu
Xiaoyang Wan
Juncheng Billy Li
Siliang Tang
H. Zhang
Yueting Zhuang
DiffM
106
17
0
24 Nov 2024
IPO: Interpretable Prompt Optimization for Vision-Language Models
Yingjun Du
Wenfang Sun
Cees G. M. Snoek
VLM
25
2
0
20 Oct 2024
Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration
Kaihang Pan
Zhaoyu Fan
Juncheng Li
Qifan Yu
Hao Fei
Siliang Tang
Richang Hong
Hanwang Zhang
Qianru Sun
KELM
31
6
0
30 Sep 2024
PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis
Meng Luo
Hao Fei
Bobo Li
Shengqiong Wu
Qian Liu
Soujanya Poria
Erik Cambria
M. Lee
W. Hsu
30
7
0
18 Aug 2024
Semantic Codebook Learning for Dynamic Recommendation Models
Zheqi Lv
Shaoxuan He
Ahmed Salem
Minxing Zhang
Wenqiao Zhang
Jingyuan Chen
Yang Zhang
Fei Wu
28
5
0
31 Jul 2024
Mining Open Semantics from CLIP: A Relation Transition Perspective for Few-Shot Learning
Cilin Yan
Haochen Wang
Xiaolong Jiang
Yao Hu
Xu Tang
Guoliang Kang
E. Gavves
VLM
29
0
0
17 Jun 2024
OVMR: Open-Vocabulary Recognition with Multi-Modal References
Zehong Ma
Shiliang Zhang
Longhui Wei
Qi Tian
VLM
41
0
0
07 Jun 2024
Auto-Encoding Morph-Tokens for Multimodal LLM
Kaihang Pan
Siliang Tang
Juncheng Li
Zhaoyu Fan
Wei Chow
Shuicheng Yan
Tat-Seng Chua
Yueting Zhuang
Hanwang Zhang
MLLM
35
17
0
03 May 2024
VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations
Sri Harsha Dumpala
Aman Jaiswal
Chandramouli Shama Sastry
E. Milios
Sageev Oore
Hassan Sajjad
VLM
CoGe
48
0
0
25 Apr 2024
Fact :Teaching MLLMs with Faithful, Concise and Transferable Rationales
Minghe Gao
Shuang Chen
Liang Pang
Yuan Yao
Jisheng Dang
Wenqiao Zhang
Juncheng Li
Siliang Tang
Yueting Zhuang
Tat-Seng Chua
LRM
40
5
0
17 Apr 2024
Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models
Weiwei Cao
Jianpeng Zhang
Yingda Xia
Tony C. W. Mok
Zi Li
X. Ye
Le Lu
Jian Zheng
Yuxing Tang
Ling Zhang
29
1
0
07 Apr 2024
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
Long Qian
Juncheng Billy Li
Yu-hao Wu
Yaobo Ye
Hao Fei
Tat-Seng Chua
Yueting Zhuang
Siliang Tang
MLLM
LRM
60
47
0
18 Feb 2024
Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models
Xin He
Longhui Wei
Lingxi Xie
Qi Tian
43
8
0
06 Jan 2024
UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity
Jia-li Zuo
Hanyu Zhou
Ying Nie
Feng Zhang
Tianyu Guo
Nong Sang
Yunhe Wang
Changxin Gao
32
17
0
06 Dec 2023
Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling
Haogeng Liu
Qihang Fan
Tingkai Liu
Linjie Yang
Yunzhe Tao
Huaibo Huang
Ran He
Hongxia Yang
VGen
26
12
0
08 Oct 2023
Improving Vision Anomaly Detection with the Guidance of Language Modality
Dong Chen
Kaihang Pan
Guoming Wang
Yueting Zhuang
Siliang Tang
26
3
0
04 Oct 2023
Integrating Visual Foundation Models for Enhanced Robot Manipulation and Motion Planning: A Layered Approach
Chenguang Yang
Peng Zhou
Jiaming Qi
19
9
0
20 Sep 2023
ViLLA: Fine-Grained Vision-Language Representation Learning from Real-World Data
M. Varma
Jean-Benoit Delbrouck
Sarah Hooper
Akshay S. Chaudhari
C. Langlotz
VLM
CoGe
40
5
0
22 Aug 2023
I3: Intent-Introspective Retrieval Conditioned on Instructions
Kaihang Pan
Juncheng Li
Wenjie Wang
Hao Fei
Hongye Song
Wei Ji
Jun Lin
Xiaozhong Liu
Tat-Seng Chua
Siliang Tang
46
5
0
19 Aug 2023
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
Juncheng Li
Kaihang Pan
Zhiqi Ge
Minghe Gao
Wei Ji
Wenqiao Zhang
Tat-Seng Chua
Siliang Tang
Hanwang Zhang
Yueting Zhuang
MLLM
35
68
0
08 Aug 2023
Degeneration-Tuning: Using Scrambled Grid shield Unwanted Concepts from Stable Diffusion
Zixuan Ni
Longhui Wei
Jiacheng Li
Siliang Tang
Yueting Zhuang
Qi Tian
DiffM
28
21
0
02 Aug 2023
G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory
Hongxiang Li
Meng Cao
Xuxin Cheng
Yaowei Li
Zhihong Zhu
Yuexian Zou
24
20
0
26 Jul 2023
Keyword-Aware Relative Spatio-Temporal Graph Networks for Video Question Answering
Yi Cheng
Hehe Fan
Dongyun Lin
Ying Sun
Mohan S. Kankanhalli
J. Lim
40
4
0
25 Jul 2023
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
Emanuele Bugliarello
Aida Nematzadeh
Lisa Anne Hendricks
SSL
24
5
0
23 May 2023
Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document
Xiangnan Chen
Qianwen Xiao
Juncheng Li
Duo Dong
Jun Lin
Xiaozhong Liu
Siliang Tang
34
5
0
23 May 2023
Interactive Data Synthesis for Systematic Vision Adaptation via LLMs-AIGCs Collaboration
Qifan Yu
Juncheng Li
Wentao Ye
Siliang Tang
Yueting Zhuang
33
13
0
22 May 2023
InstructVid2Vid: Controllable Video Editing with Natural Language Instructions
Bosheng Qin
Juncheng Li
Siliang Tang
Tat-Seng Chua
Yueting Zhuang
VGen
DiffM
31
16
0
21 May 2023
TG-VQA: Ternary Game of Video Question Answering
Hao Li
Peng Jin
Ze-Long Cheng
Songyang Zhang
Kai-xiang Chen
Zhennan Wang
Chang-rui Liu
Jie Chen
26
10
0
17 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
Emanuele Bugliarello
Laurent Sartran
Aishwarya Agrawal
Lisa Anne Hendricks
Aida Nematzadeh
VLM
30
22
0
12 May 2023
Continual Vision-Language Representation Learning with Off-Diagonal Information
Zixuan Ni
Longhui Wei
Siliang Tang
Yueting Zhuang
Qi Tian
VLM
CLL
33
25
0
11 May 2023
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Peng Jin
Jinfa Huang
Pengfei Xiong
Shangxuan Tian
Chang-rui Liu
Xiang Ji
Li-ming Yuan
Jie Chen
45
49
0
25 Mar 2023
Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
Qifan Yu
Juncheng Li
Yuehua Wu
Siliang Tang
Wei Ji
Yueting Zhuang
30
34
0
23 Mar 2023
Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization for Few-shot Generalization
Kaihang Pan
Juncheng Billy Li
Hongye Song
Jun Lin
Xiaozhong Liu
Siliang Tang
OffRL
38
10
0
22 Mar 2023
Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models
Juncheng Li
Minghe Gao
Longhui Wei
Siliang Tang
Wenqiao Zhang
Meng Li
Wei Ji
Qi Tian
Tat-Seng Chua
Yueting Zhuang
VLM
VPVLM
34
18
0
12 Mar 2023
Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision
Jilan Xu
Junlin Hou
Yuejie Zhang
Rui Feng
Yi Wang
Yu Qiao
Weidi Xie
VLM
21
81
0
22 Jan 2023
Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding
Juncheng Li
Siliang Tang
Linchao Zhu
Wenqiao Zhang
Yi Yang
Tat-Seng Chua
Fei Wu
Y. Zhuang
BDL
24
14
0
22 Jan 2023
TIER: Text-Image Entropy Regularization for CLIP-style models
Anil Palepu
Andrew L. Beam
MedIm
20
6
0
13 Dec 2022
Mask the Correct Tokens: An Embarrassingly Simple Approach for Error Correction
Kai Shen
Yichong Leng
Xuejiao Tan
Si-Qi Tang
Yuan Zhang
Wenjie Liu
Ed Lin
27
13
0
23 Nov 2022
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
Shraman Pramanick
Li Jing
Sayan Nag
Jiachen Zhu
Hardik Shah
Yann LeCun
Ramalingam Chellappa
26
21
0
09 Oct 2022
ISS: Image as Stepping Stone for Text-Guided 3D Shape Generation
Zhengzhe Liu
Peng Dai
Ruihui Li
Xiaojuan Qi
Chi-Wing Fu
DiffM
179
25
0
09 Sep 2022
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
...
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
VLM
54
158
0
25 Aug 2022
Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos
Juncheng Billy Li
Junlin Xie
Linchao Zhu
Long Qian
Siliang Tang
...
Haochen Shi
Shengyu Zhang
Longhui Wei
Qi Tian
Yueting Zhuang
34
12
0
03 Aug 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
392
4,137
0
28 Jan 2022
1
2
Next