ResearchTrend.AI
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts

17 February 2021
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
    VLM

Papers citing "Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts"

50 / 850 papers shown
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models
Tong Wu
Yinghao Xu
Ryan Po
Mengchen Zhang
Guandao Yang
Jiaqi Wang
Ziqiang Liu
Dahua Lin
Gordon Wetzstein
76
0
0
10 Dec 2024
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Jiuhai Chen
Jianwei Yang
Haiping Wu
Dianqi Li
Jianfeng Gao
Tianyi Zhou
Bin Xiao
VLM
60
4
0
05 Dec 2024
FLAIR: VLM with Fine-grained Language-informed Image Representations
Rui Xiao
Sanghwan Kim
Mariana-Iuliana Georgescu
Zeynep Akata
Stephan Alaniz
VLM
CLIP
76
2
0
04 Dec 2024
Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks
Joseph Raj Vishal
Divesh Basina
Aarya Choudhary
Bharatesh Chakravarthi
67
1
0
02 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
79
2
0
02 Dec 2024
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Zichun Liao
Yusuke Kato
Kazuki Kozuka
Aditya Grover
VGen
95
5
0
02 Dec 2024
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
Haicheng Wang
Chen Ju
Weixiong Lin
Shuai Xiao
Mengting Chen
...
Mingshuai Yao
Jinsong Lan
Ying Chen
Qingwen Liu
Yanfeng Wang
VLM
CLIP
72
4
0
30 Nov 2024
TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
Shimin Chen
Xiaohan Lan
Yitian Yuan
Zequn Jie
Lin Ma
VLM
MLLM
78
13
0
27 Nov 2024
InsightEdit: Towards Better Instruction Following for Image Editing
Yingjing Xu
Jie Kong
Jiazhi Wang
Xiao Pan
Bo Lin
Qiang Liu
DiffM
88
1
0
26 Nov 2024
DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving
Xianda Guo
Ruijun Zhang
Yiqun Duan
Yuhang He
Chenming Zhang
Shuai Liu
Long Chen
LRM
91
11
0
20 Nov 2024
MC-LLaVA: Multi-Concept Personalized Vision-Language Model
Ruichuan An
Sihan Yang
Ming Lu
Kai Zeng
Yulin Luo
...
Hao Liang
Qi She
Shanghang Zhang
W. Zhang
Wentao Zhang
90
5
0
18 Nov 2024
Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation
Yuheng Shi
Minjing Dong
Chang Xu
VLM
43
1
0
14 Nov 2024
Boosting Latent Diffusion with Perceptual Objectives
Tariq Berrada
Pietro Astolfi
Jakob Verbeek
Melissa Hall
Marton Havasi
M. Drozdzal
Yohann Benchetrit
Adriana Romero Soriano
Karteek Alahari
48
0
0
06 Nov 2024
VLA-3D: A Dataset for 3D Semantic Scene Understanding and Navigation
Haochen Zhang
Nader Zantout
Pujith Kachana
Zongyuan Wu
Ji Zhang
Wenshan Wang
3DV
LM&Ro
41
5
0
05 Nov 2024
Classification Done Right for Vision-Language Pre-Training
Zilong Huang
Qinghao Ye
Bingyi Kang
Jiashi Feng
Haoqi Fan
CLIP
VLM
50
2
0
05 Nov 2024
On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
Tariq Berrada Ifriqi
Pietro Astolfi
Melissa Hall
Reyhane Askari Hemmat
Yohann Benchetrit
...
Matthew Muckley
Karteek Alahari
Adriana Romero Soriano
Jakob Verbeek
M. Drozdzal
AI4CE
VLM
54
2
0
05 Nov 2024
Membership Inference Attacks against Large Vision-Language Models
Zhan Li
Yongtao Wu
Yihang Chen
F. Tonin
Elias Abad Rocamora
V. Cevher
44
4
0
05 Nov 2024
TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
Maitreya Patel
Abhiram Kusumba
Sheng Cheng
Changhoon Kim
Tejas Gokhale
Chitta Baral
Yezhou Yang
CoGe
CLIP
56
7
0
04 Nov 2024
SeafloorAI: A Large-scale Vision-Language Dataset for Seafloor Geological Survey
Kien X. Nguyen
Fengchun Qiao
Arthur Trembanis
Xi Peng
26
0
0
31 Oct 2024
Face-MLLM: A Large Face Perception Model
Haomiao Sun
Mingjie He
Tianheng Lian
Hu Han
Shiguang Shan
VLM
CVBM
LRM
25
5
0
28 Oct 2024
Rectified Diffusion Guidance for Conditional Generation
Mengfei Xia
Nan Xue
Yujun Shen
Ran Yi
Tieliang Gong
Yong-Jin Liu
DiffM
33
3
0
24 Oct 2024
Probabilistic Language-Image Pre-Training
Sanghyuk Chun
Wonjae Kim
Song Park
Sangdoo Yun
MLLM
VLM
CLIP
132
4
2
24 Oct 2024
Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining
Han Huang
Yuqi Huo
Zijia Zhao
Haoyu Lu
Shu Wu
Bin Wang
Qiang Liu
Weipeng Chen
Liang Wang
VLM
27
1
0
21 Oct 2024
Test-time Adaptation for Cross-modal Retrieval with Query Shift
Haobin Li
Peng Hu
Qianjun Zhang
Xi Peng
Xiting Liu
Mouxing Yang
TTA
33
0
0
21 Oct 2024
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension
Yin Xie
Kaicheng Yang
Ninghua Yang
Weimo Deng
Xiangzi Dai
...
Yumeng Wang
Xiang An
Yongle Zhao
Ziyong Feng
Jiankang Deng
MLLM
VLM
45
1
0
18 Oct 2024
FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model
ZiDong Wang
Zeyu Lu
Di Huang
Cai Zhou
Wanli Ouyang
Lei Bai
76
3
0
17 Oct 2024
Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models
Shicheng Xu
Liang Pang
Yunchang Zhu
Huawei Shen
Xueqi Cheng
MLLM
36
1
0
16 Oct 2024
CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning
Qingqing Cao
Mahyar Najibi
Sachin Mehta
CLIP
DiffM
35
1
0
15 Oct 2024
Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs
Sihang Zhao
Youliang Yuan
Xiaoying Tang
Pinjia He
36
3
0
15 Oct 2024
Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling
Guiyu Zhang
Huan-ang Gao
Zijian Jiang
Hao Zhao
Zhedong Zheng
EGVM
52
6
0
15 Oct 2024
MEV Capture Through Time-Advantaged Arbitrage
Robin Fritsch
Maria Ines Silva
A. Mamageishvili
Benjamin Livshits
E. Felten
33
1
0
14 Oct 2024
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
Yue Yang
S. Zhang
Wenqi Shao
Kaipeng Zhang
Yi Bin
Yu Wang
Ping Luo
28
3
0
11 Oct 2024
Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision
Shengcao Cao
Liang-Yan Gui
Yu-Xiong Wang
46
3
0
10 Oct 2024
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation
Jiatao Gu
Yuyang Wang
Yizhe Zhang
Qihang Zhang
Dinghuai Zhang
Navdeep Jaitly
Josh Susskind
Shuangfei Zhai
DiffM
33
12
0
10 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
32
5
0
10 Oct 2024
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow
Fu-Yun Wang
Ling Yang
Zhaoyang Huang
Mengdi Wang
Hongsheng Li
34
14
0
09 Oct 2024
Enhancing Vision-Language Model Pre-training with Image-text Pair Pruning Based on Word Frequency
Mingliang Liang
Martha Larson
VLM
CLIP
26
0
0
09 Oct 2024
Temporal Image Caption Retrieval Competition -- Description and Results
Jakub Pokrywka
Piotr Wierzchoń
Kornel Weryszko
Krzysztof Jassem
52
0
0
08 Oct 2024
Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See
Phu Pham
Kun Wan
Yu-Jhe Li
Zeliang Zhang
Daniel Miranda
Ajinkya Kale
Chenliang Xu
29
5
0
08 Oct 2024
Temporal Reasoning Transfer from Text to Video
Lei Li
Yuanxin Liu
Linli Yao
Peiyuan Zhang
Chenxin An
Lean Wang
Xu Sun
Lingpeng Kong
Qi Liu
LRM
45
7
0
08 Oct 2024
Sparse Repellency for Shielded Generation in Text-to-image Diffusion Models
Michael Kirchhof
James Thornton
Pierre Ablin
Louis Béthune
Eugène Ndiaye
Marco Cuturi
51
2
0
08 Oct 2024
Pyramidal Flow Matching for Efficient Video Generative Modeling
Yang Jin
Zhicheng Sun
Ningyuan Li
Kun Xu
K. Xu
...
Nan Zhuang
Quzhe Huang
Yang Song
Yadong Mu
Zhouchen Lin
VGen
66
65
0
08 Oct 2024
LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
Wei Wu
Kecheng Zheng
Shuailei Ma
Fan Lu
Yuxin Guo
Yifei Zhang
Wei Chen
Qingpei Guo
Yujun Shen
Zheng-Jun Zha
VLM
30
9
0
07 Oct 2024
Removing Distributional Discrepancies in Captions Improves Image-Text Alignment
Yuheng Li
Haotian Liu
Mu Cai
Yijun Li
Eli Shechtman
Zhe Lin
Yong Jae Lee
Krishna Kumar Singh
VLM
141
1
0
01 Oct 2024
VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models
Jiapeng Wang
Chengyu Wang
Kunzhe Huang
Jun Huang
Lianwen Jin
CLIP
VLM
37
3
0
01 Oct 2024
A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning
Niki Maria Foteinopoulou
Enjie Ghorbel
Djamila Aouada
23
2
0
01 Oct 2024
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models
Shitian Zhao
Renrui Zhang
Xu Luo
Yan Wang
Shanghang Zhang
Peng Gao
18
0
0
01 Oct 2024
Illustrious: an Open Advanced Illustration Model
Sang Hyun Park
Jun Young Koh
Junha Lee
Joy Song
Dongha Kim
Hoyeon Moon
Hyunju Lee
Min Song
VLM
41
1
0
30 Sep 2024
Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
Heeseong Shin
Chaehyun Kim
Sunghwan Hong
Seokju Cho
Anurag Arnab
Paul Hongsuck Seo
Seungryong Kim
VLM
34
1
0
30 Sep 2024
DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion
Yukun Huang
Jianan Wang
Ailing Zeng
Zheng-Jun Zha
Lei Zhang
Xihui Liu
3DGS
34
5
0
25 Sep 2024