ResearchTrend.AI

Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce

6 April 2023
Yang Jin
Yongzhi Li
Zehuan Yuan
Yadong Mu
arXiv: 2304.02853

Papers citing "Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce"

6 / 6 papers shown
Captions Speak Louder than Images (CASLIE): Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data
Xinyi Ling
B. Peng
Hanwen Du
Zhihui Zhu
Xia Ning
31
0
0
22 Oct 2024
ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection
Yibo Yan
Shen Wang
Jiahao Huo
Hang Li
Yangqiu Song
...
Kun Wang
Hui Xiong
Philip S. Yu
Xuming Hu
Qingsong Wen
LRM
36
14
0
06 Oct 2024
BeFA: A General Behavior-driven Feature Adapter for Multimedia Recommendation
Qile Fan
Penghang Yu
Zhiyi Tan
Bing-Kun Bao
Guanming Lu
37
1
0
01 Jun 2024
GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
Xiaolong Wang
ViT
VLM
192
501
0
22 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Steven Hoi
MLLM
BDL
VLM
CLIP
392
4,154
0
28 Jan 2022
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
328
3,708
0
11 Feb 2021