ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2307.03254
  4. Cited By
Vision Language Transformers: A Survey

Vision Language Transformers: A Survey

6 July 2023
Clayton Fields
C. Kennington
    VLM
ArXivPDFHTML

Papers citing "Vision Language Transformers: A Survey"

12 / 12 papers shown
Title
VisualQuest: A Diverse Image Dataset for Evaluating Visual Recognition in LLMs
VisualQuest: A Diverse Image Dataset for Evaluating Visual Recognition in LLMs
Kelaiti Xiao
Liang Yang
Paerhati Tulajiang
Hongfei Lin
MLLM
77
0
0
25 Mar 2025
Performance Evaluation of Deep Learning and Transformer Models Using
  Multimodal Data for Breast Cancer Classification
Performance Evaluation of Deep Learning and Transformer Models Using Multimodal Data for Breast Cancer Classification
Sadam Hussain
Mansoor Ali
Usman Naseem
Beatriz Alejandra Bosques Palomo
Mario Alexis Monsivais Molina
Jorge Alberto Garza Abdala
Daly Betzabeth Avendano Avalos
Servando Cardona-Huerta
T. Aaron Gulliver
Jose Gerardo Tamez Pena
32
1
0
14 Oct 2024
On the Computational Modeling of Meaning: Embodied Cognition Intertwined
  with Emotion
On the Computational Modeling of Meaning: Embodied Cognition Intertwined with Emotion
C. Kennington
LM&Ro
19
0
0
10 Jul 2023
LAVIS: A Library for Language-Vision Intelligence
LAVIS: A Library for Language-Vision Intelligence
Dongxu Li
Junnan Li
Hung Le
Guangsen Wang
Silvio Savarese
S. Hoi
VLM
126
51
0
15 Sep 2022
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object
  Knowledge Distillation
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation
Yongfei Liu
Chenfei Wu
Shao-Yen Tseng
Vasudev Lal
Xuming He
Nan Duan
CLIP
VLM
53
28
0
22 Sep 2021
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,781
0
24 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
284
1,084
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
301
3,708
0
11 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Joey Tianyi Zhou
MLLM
256
525
0
04 Feb 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018
Google's Neural Machine Translation System: Bridging the Gap between
  Human and Machine Translation
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,746
0
26 Sep 2016
ImageNet Large Scale Visual Recognition Challenge
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
296
39,198
0
01 Sep 2014
1