ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2302.00902
  4. Cited By
Language Quantized AutoEncoders: Towards Unsupervised Text-Image
  Alignment

Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment

2 February 2023
Hao Liu
Wilson Yan
Pieter Abbeel
ArXivPDFHTML

Papers citing "Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment"

23 / 23 papers shown
Title
Towards Improved Text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text
Guotao Liang
Baoquan Zhang
Zhiyuan Wen
Junteng Zhao
Yunming Ye
Kola Ye
Yao He
57
0
0
03 Mar 2025
From Principles to Applications: A Comprehensive Survey of Discrete Tokenizers in Generation, Comprehension, Recommendation, and Information Retrieval
From Principles to Applications: A Comprehensive Survey of Discrete Tokenizers in Generation, Comprehension, Recommendation, and Information Retrieval
Jian Jia
Jingtong Gao
Ben Xue
Junhao Wang
Qingpeng Cai
Quan Chen
Xiangyu Zhao
Peng Jiang
Kun Gai
OffRL
77
0
0
18 Feb 2025
Autoregressive Models in Vision: A Survey
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
Hao Fei
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
48
9
0
08 Nov 2024
GiVE: Guiding Visual Encoder to Perceive Overlooked Information
GiVE: Guiding Visual Encoder to Perceive Overlooked Information
Junjie Li
Jianghong Ma
Xiaofeng Zhang
Yuhang Li
Jianyang Shi
43
0
0
26 Oct 2024
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of
  99%
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%
Lei Zhu
Fangyun Wei
Yanye Lu
Dong Chen
VLM
43
34
0
17 Jun 2024
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot
  Audio Task Learner
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Dongchao Yang
Haohan Guo
Yuanyuan Wang
Rongjie Huang
Xiang Li
Xu Tan
Xixin Wu
Helen Meng
AuLLM
47
15
0
14 Jun 2024
VQDNA: Unleashing the Power of Vector Quantization for Multi-Species
  Genomic Sequence Modeling
VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling
Siyuan Li
Zedong Wang
Zicheng Liu
Di Wu
Cheng Tan
Jiangbin Zheng
Yufei Huang
Stan Z. Li
40
7
0
13 May 2024
Skews in the Phenomenon Space Hinder Generalization in Text-to-Image
  Generation
Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
Yingshan Chang
Yasi Zhang
Zhiyuan Fang
Yingnian Wu
Yonatan Bisk
Feng Gao
EGVM
45
5
0
25 Mar 2024
UniCode: Learning a Unified Codebook for Multimodal Large Language
  Models
UniCode: Learning a Unified Codebook for Multimodal Large Language Models
Sipeng Zheng
Bohan Zhou
Yicheng Feng
Ye Wang
Zongqing Lu
VLM
MLLM
46
7
0
14 Mar 2024
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Lei Zhu
Fangyun Wei
Yanye Lu
MLLM
VLM
52
17
0
12 Mar 2024
Low-dose CT Denoising with Language-engaged Dual-space Alignment
Low-dose CT Denoising with Language-engaged Dual-space Alignment
Zhihao Chen
Tao Chen
Chenhui Wang
Chuang Niu
Ge Wang
Hongming Shan
32
7
0
10 Mar 2024
Masked Modeling for Self-supervised Representation Learning on Vision
  and Beyond
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
39
14
0
31 Dec 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware
  representations to LLMs and Emergent Cross-modal Reasoning
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
Artemis Panagopoulou
Le Xue
Ning Yu
Junnan Li
Dongxu Li
Chenyu You
Ran Xu
Silvio Savarese
Caiming Xiong
Juan Carlos Niebles
VLM
MLLM
41
46
0
30 Nov 2023
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shezheng Song
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
71
4
0
10 Nov 2023
De-Diffusion Makes Text a Strong Cross-Modal Interface
De-Diffusion Makes Text a Strong Cross-Modal Interface
Chen Wei
Chenxi Liu
Siyuan Qiao
Zhishuai Zhang
Alan Yuille
Jiahui Yu
VLM
DiffM
37
10
0
01 Nov 2023
Can Language Models Learn to Listen?
Can Language Models Learn to Listen?
Evonne Ng
Sanjay Subramanian
Dan Klein
Angjoo Kanazawa
Trevor Darrell
Shiry Ginosar
35
17
0
21 Aug 2023
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen
  LLMs
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Lijun Yu
Yong Cheng
Zhiruo Wang
Vivek Kumar
Wolfgang Macherey
...
Yonatan Bisk
Ming Yang
Kevin Patrick Murphy
Alexander G. Hauptmann
Lu Jiang
MLLM
22
49
0
30 Jun 2023
Seeing in Words: Learning to Classify through Language Bottlenecks
Seeing in Words: Learning to Classify through Language Bottlenecks
Khalid Saifullah
Yuxin Wen
Jonas Geiping
Micah Goldblum
Tom Goldstein
VLM
18
2
0
29 Jun 2023
Bytes Are All You Need: Transformers Operating Directly On File Bytes
Bytes Are All You Need: Transformers Operating Directly On File Bytes
Maxwell Horton
Sachin Mehta
Ali Farhadi
Mohammad Rastegari
VLM
22
6
0
31 May 2023
Muse: Text-To-Image Generation via Masked Generative Transformers
Muse: Text-To-Image Generation via Masked Generative Transformers
Huiwen Chang
Han Zhang
Jarred Barber
AJ Maschinot
José Lezama
...
Kevin Patrick Murphy
William T. Freeman
Michael Rubinstein
Yuanzhen Li
Dilip Krishnan
DiffM
197
521
0
02 Jan 2023
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
339
12,003
0
04 Mar 2022
Unifying Vision-and-Language Tasks via Text Generation
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Joey Tianyi Zhou
MLLM
277
525
0
04 Feb 2021
ImageNet Large Scale Visual Recognition Challenge
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
296
39,217
0
01 Sep 2014
1