ResearchTrend.AI
LAVIS: A Library for Language-Vision Intelligence

15 September 2022
Dongxu Li
Junnan Li
Hung Le
Guangsen Wang
Silvio Savarese
S. Hoi
    VLM

Papers citing "LAVIS: A Library for Language-Vision Intelligence"

14 / 14 papers shown
GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse
Hongzhan Lin
Ziyang Luo
Bo Wang
Ruichao Yang
Jing Ma
45
24
0
03 Jan 2024
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
35
117
0
16 Oct 2023
BLIP-Adapter: Parameter-Efficient Transfer Learning for Mobile Screenshot Captioning
Ching-Yu Chiang
I-Hua Chang
Shih-Wei Liao
44
1
0
26 Sep 2023
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
25
5
0
06 Jul 2023
Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM
LRM
95
699
0
17 May 2023
Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
Hila Chefer
Yuval Alaluf
Yael Vinker
Lior Wolf
Daniel Cohen-Or
DiffM
73
497
0
31 Jan 2023
HADA: A Graph-based Amalgamation Framework in Image-text Retrieval
Manh-Duy Nguyen
Binh T. Nguyen
C. Gurrin
VLM
28
4
0
11 Jan 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
392
4,137
0
28 Jan 2022
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
259
558
0
28 Sep 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
284
1,084
0
17 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
280
1,982
0
09 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Joey Tianyi Zhou
MLLM
256
525
0
04 Feb 2021
VinVL: Revisiting Visual Representations in Vision-Language Models
Pengchuan Zhang
Xiujun Li
Xiaowei Hu
Jianwei Yang
Lei Zhang
Lijuan Wang
Yejin Choi
Jianfeng Gao
ObjD
VLM
260
157
0
02 Jan 2021
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
252
927
0
24 Sep 2019