ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2010.06775
  4. Cited By
Vokenization: Improving Language Understanding with Contextualized,
  Visual-Grounded Supervision

Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision

14 October 2020
Hao Tan
Joey Tianyi Zhou
    CLIP
ArXivPDFHTML

Papers citing "Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision"

29 / 29 papers shown
Title
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?
Mohamed Gado
Towhid Taliee
Muhammad Memon
D. Ignatov
Radu Timofte
72
0
0
27 Apr 2025
AudioBERT: Audio Knowledge Augmented Language Model
AudioBERT: Audio Knowledge Augmented Language Model
Hyunjong Ok
Suho Yoo
Jaeho Lee
AuLLM
RALM
VLM
53
0
0
17 Jan 2025
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
203
0
0
01 Dec 2024
Grounding Spatial Relations in Text-Only Language Models
Grounding Spatial Relations in Text-Only Language Models
Gorka Azkune
Ander Salaberria
Eneko Agirre
42
0
0
20 Mar 2024
Detecting Concrete Visual Tokens for Multimodal Machine Translation
Detecting Concrete Visual Tokens for Multimodal Machine Translation
Braeden Bowen
Vipin Vijayan
Scott Grigsby
Timothy Anderson
Jeremy Gwinnup
34
2
0
05 Mar 2024
WinoViz: Probing Visual Properties of Objects Under Different States
WinoViz: Probing Visual Properties of Objects Under Different States
Woojeong Jin
Tejas Srinivasan
Jesse Thomason
Xiang Ren
33
1
0
21 Feb 2024
Training on Synthetic Data Beats Real Data in Multimodal Relation
  Extraction
Training on Synthetic Data Beats Real Data in Multimodal Relation Extraction
Zilin Du
Haoxin Li
Xu Guo
Boyang Li
35
1
0
05 Dec 2023
VIP5: Towards Multimodal Foundation Models for Recommendation
VIP5: Towards Multimodal Foundation Models for Recommendation
Shijie Geng
Juntao Tan
Shuchang Liu
Zuohui Fu
Yongfeng Zhang
32
70
0
23 May 2023
Movie Box Office Prediction With Self-Supervised and Visually Grounded
  Pretraining
Movie Box Office Prediction With Self-Supervised and Visually Grounded Pretraining
Qin Chao
Eunsoo Kim
Boyang Albert Li
21
1
0
20 Apr 2023
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative
  Latent Attention
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Zineng Tang
Jaemin Cho
Jie Lei
Joey Tianyi Zhou
VLM
24
9
0
21 Nov 2022
Detecting Euphemisms with Literal Descriptions and Visual Imagery
Detecting Euphemisms with Literal Descriptions and Visual Imagery
.Ilker Kesen
Aykut Erdem
Erkut Erdem
Iacer Calixto
29
4
0
08 Nov 2022
How direct is the link between words and images?
How direct is the link between words and images?
Hassan Shahmohammadi
Maria Heitmeier
Elnaz Shafaei-Bajestan
Hendrik P. A. Lensch
Harald Baayen
32
0
0
30 Jun 2022
VALHALLA: Visual Hallucination for Machine Translation
VALHALLA: Visual Hallucination for Machine Translation
Yi Li
Yikang Shen
Yoon Kim
Chun-Fu Chen
Rogerio Feris
David D. Cox
Nuno Vasconcelos
MLLM
40
38
0
31 May 2022
Visually-Augmented Language Modeling
Visually-Augmented Language Modeling
Weizhi Wang
Li Dong
Hao Cheng
Haoyu Song
Xiaodong Liu
Xifeng Yan
Jianfeng Gao
Furu Wei
VLM
36
18
0
20 May 2022
Lexical Knowledge Internalization for Neural Dialog Generation
Lexical Knowledge Internalization for Neural Dialog Generation
Zhiyong Wu
Wei Bi
Xiang Li
Lingpeng Kong
B. Kao
21
2
0
04 May 2022
MCSE: Multimodal Contrastive Learning of Sentence Embeddings
MCSE: Multimodal Contrastive Learning of Sentence Embeddings
Miaoran Zhang
Marius Mosbach
David Ifeoluwa Adelani
Michael A. Hedderich
Dietrich Klakow
36
35
0
22 Apr 2022
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented
  Visual Models
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
Chunyuan Li
Haotian Liu
Liunian Harold Li
Pengchuan Zhang
J. Aneja
...
Ping Jin
Houdong Hu
Zicheng Liu
Yong Jae Lee
Jianfeng Gao
50
145
0
19 Apr 2022
UNIMO-2: End-to-End Unified Vision-Language Grounded Learning
UNIMO-2: End-to-End Unified Vision-Language Grounded Learning
Wei Li
Can Gao
Guocheng Niu
Xinyan Xiao
Hao Liu
Jiachen Liu
Hua Wu
Haifeng Wang
MLLM
19
21
0
17 Mar 2022
Productivity, Portability, Performance: Data-Centric Python
Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang
Yao Zhang
Yanzhang Wang
Yan Wan
Jiao Wang
Zhongyuan Wu
Yuhao Yang
Bowen She
54
95
0
01 Jul 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Ishan Misra Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
30
274
0
09 Jun 2021
Good for Misconceived Reasons: An Empirical Revisiting on the Need for
  Visual Context in Multimodal Machine Translation
Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation
Zhiyong Wu
Lingpeng Kong
W. Bi
Xiang Li
B. Kao
LRM
23
77
0
30 May 2021
Explaining the Road Not Taken
Explaining the Road Not Taken
Hua Shen
Ting-Hao 'Kenneth' Huang
FAtt
XAI
27
9
0
27 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation
  Learning
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
29
33
0
18 Mar 2021
UniT: Multimodal Multitask Learning with a Unified Transformer
UniT: Multimodal Multitask Learning with a Unified Transformer
Ronghang Hu
Amanpreet Singh
ViT
25
295
0
22 Feb 2021
Transformers in Vision: A Survey
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
227
2,434
0
04 Jan 2021
Deep Learning and the Global Workspace Theory
Deep Learning and the Global Workspace Theory
R. V. Rullen
Ryota Kanai
45
66
0
04 Dec 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
252
927
0
24 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
299
6,984
0
20 Apr 2018
Aggregated Residual Transformations for Deep Neural Networks
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Zhuowen Tu
Kaiming He
303
10,233
0
16 Nov 2016
1