ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2205.14100
  4. Cited By
GIT: A Generative Image-to-text Transformer for Vision and Language

GIT: A Generative Image-to-text Transformer for Vision and Language

27 May 2022
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
    VLM
ArXivPDFHTML

Papers citing "GIT: A Generative Image-to-text Transformer for Vision and Language"

50 / 405 papers shown
Title
Whitened CLIP as a Likelihood Surrogate of Images and Captions
Whitened CLIP as a Likelihood Surrogate of Images and Captions
Roy Betser
Meir Yossef Levi
Guy Gilboa
31
0
0
11 May 2025
GIF: Generative Inspiration for Face Recognition at Scale
GIF: Generative Inspiration for Face Recognition at Scale
Saeed Ebrahimi
Sahar Rahimi
Ali Dabouei
Srinjoy Das
Jeremy M. Dawson
Nasser M. Nasrabadi
CVBM
150
0
0
05 May 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
D. Jiang
Ziyu Guo
Renrui Zhang
Zhuofan Zong
Hao Li
Le Zhuo
Shilin Yan
Pheng-Ann Heng
Hao Li
LRM
69
1
0
01 May 2025
HoloDx: Knowledge- and Data-Driven Multimodal Diagnosis of Alzheimer's Disease
HoloDx: Knowledge- and Data-Driven Multimodal Diagnosis of Alzheimer's Disease
Qiuhui Chen
Jintao Wang
Gang Wang
Yi Hong
52
0
0
27 Apr 2025
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting
Jian Hu
Dimitrios Korkinof
S. Gong
Mariano Beguerisse-Díaz
VLM
38
0
0
22 Apr 2025
LLM-based Semantic Augmentation for Harmful Content Detection
LLM-based Semantic Augmentation for Harmful Content Detection
Elyas Meguellati
Assaad Zeghina
S. Sadiq
Gianluca Demartini
36
0
0
22 Apr 2025
What Lurks Within? Concept Auditing for Shared Diffusion Models at Scale
What Lurks Within? Concept Auditing for Shared Diffusion Models at Scale
Xiaoyong Yuan
Xiaolong Ma
Linke Guo
Lan Zhang
DiffM
37
0
0
21 Apr 2025
Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions
Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions
Tommaso Galliena
Tommaso Apicella
Stefano Rosa
Pietro Morerio
Alessio Del Bue
Lorenzo Natale
39
0
0
11 Apr 2025
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
Qing Guo
Z. Yang
Chao Feng
Hongjin Lu
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Furong Huang
Lijuan Wang
OODD
ReLM
VLM
LRM
69
1
0
10 Apr 2025
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
Ruotian Peng
Haiying He
Yake Wei
Yandong Wen
D. Hu
VLM
39
0
0
09 Apr 2025
REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding
REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding
Sakib Reza
Xiyun Song
Heather Yu
Zongfang Lin
Mohsen Moghaddam
Mario Sznaier
29
0
0
07 Apr 2025
Enhancing Image Resolution of Solar Magnetograms: A Latent Diffusion Model Approach
Enhancing Image Resolution of Solar Magnetograms: A Latent Diffusion Model Approach
Francesco P. Ramunno
Paolo Massa
Vitaliy Kinakh
Brandon Panos
A. Csillaghy
S. Voloshynovskiy
DiffM
53
0
0
31 Mar 2025
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning
Jiaqi Liao
Z. Yang
Linjie Li
Dianqi Li
Kevin Qinghong Lin
Yu-Xi Cheng
Lijuan Wang
MLLM
LRM
62
0
0
25 Mar 2025
Improved Alignment of Modalities in Large Vision Language Models
Improved Alignment of Modalities in Large Vision Language Models
Kartik Jangra
Aman Kumar Singh
Yashwani Mann
Geetanjali Rathee
VLM
52
0
0
25 Mar 2025
Stitch-a-Recipe: Video Demonstration from Multistep Descriptions
Stitch-a-Recipe: Video Demonstration from Multistep Descriptions
Chi Hsuan Wu
Kumar Ashutosh
Kristen Grauman
DiffM
63
0
0
18 Mar 2025
Hoi2Anomaly: An Explainable Anomaly Detection Approach Guided by Human-Object Interaction
Hoi2Anomaly: An Explainable Anomaly Detection Approach Guided by Human-Object Interaction
Yuhan Wang
Cheng Liu
Daou Zhang
Weichao Wu
41
0
0
13 Mar 2025
Large-scale Pre-training for Grounded Video Caption Generation
Large-scale Pre-training for Grounded Video Caption Generation
Evangelos Kazakos
Cordelia Schmid
Josef Sivic
59
0
0
13 Mar 2025
Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding
Shunqi Mao
Chaoyi Zhang
Weidong Cai
MLLM
149
0
0
13 Mar 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Shehreen Azad
Vibhav Vineet
Y. S. Rawat
VLM
136
1
0
11 Mar 2025
SuperCap: Multi-resolution Superpixel-based Image Captioning
Henry Senior
Luca Rossi
Gregory Slabaugh
Shanxin Yuan
VLM
65
0
0
11 Mar 2025
LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs
Hanyu Zhou
Gim Hee Lee
42
0
0
10 Mar 2025
SEED: Towards More Accurate Semantic Evaluation for Visual Brain Decoding
Juhyeon Park
P. Y. Kim
Jiook Cha
Shinjae Yoo
Taesup Moon
50
0
0
09 Mar 2025
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
Rui Zhao
Weijia Mao
Mike Zheng Shou
66
0
0
05 Mar 2025
LangGas: Introducing Language in Selective Zero-Shot Background Subtraction for Semi-Transparent Gas Leak Detection with a New Dataset
LangGas: Introducing Language in Selective Zero-Shot Background Subtraction for Semi-Transparent Gas Leak Detection with a New Dataset
Wenqi Guo
Yiyang Du
Shan Du
75
1
0
04 Mar 2025
HalCECE: A Framework for Explainable Hallucination Detection through Conceptual Counterfactuals in Image Captioning
Maria Lymperaiou
Giorgos Filandrianos
Angeliki Dimitriou
Athanasios Voulodimos
Giorgos Stamou
MLLM
40
0
0
01 Mar 2025
Fast 3D point clouds retrieval for Large-scale 3D Place Recognition
Fast 3D point clouds retrieval for Large-scale 3D Place Recognition
Chahine-Nicolas Zede
Laurent Carrafa
Valérie Gouet-Brunet
3DPC
41
0
0
28 Feb 2025
Visual Zero-Shot E-Commerce Product Attribute Value Extraction
Visual Zero-Shot E-Commerce Product Attribute Value Extraction
Jiaying Gong
Ming Cheng
Hongda Shen
Pierre-Yves Vandenbussche
Janet Jenq
Hoda Eldardiry
44
0
0
21 Feb 2025
Quantifying Memorization and Retriever Performance in Retrieval-Augmented Vision-Language Models
Quantifying Memorization and Retriever Performance in Retrieval-Augmented Vision-Language Models
Peter Carragher
Abhinand Jha
R Raghav
Kathleen M. Carley
RALM
75
0
0
20 Feb 2025
Pretrained Image-Text Models are Secretly Video Captioners
Pretrained Image-Text Models are Secretly Video Captioners
Chunhui Zhang
Yiren Jian
Z. Ouyang
Soroush Vosoughi
VLM
82
4
0
20 Feb 2025
MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding
MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-Text Decoding
Weikang Qiu
Zheng Huang
Haoyu Hu
Aosong Feng
Yujun Yan
Rex Ying
47
0
0
18 Feb 2025
BabyLM Turns 3: Call for papers for the 2025 BabyLM workshop
BabyLM Turns 3: Call for papers for the 2025 BabyLM workshop
Lucas Charpentier
Leshem Choshen
Ryan Cotterell
Mustafa Omer Gul
Michael Y. Hu
...
Candace Ross
Raj Sanjay Shah
Alex Warstadt
Ethan Gotlieb Wilcox
Adina Williams
55
2
0
15 Feb 2025
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures
S. Kamath S
Nakul Sharma
Manish Gupta
Anand Mishra
55
1
0
28 Jan 2025
MASS: Overcoming Language Bias in Image-Text Matching
MASS: Overcoming Language Bias in Image-Text Matching
Jiwan Chung
Seungwon Lim
Sangkyu Lee
Youngjae Yu
VLM
32
0
0
20 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Jiaqi Wang
Kaipeng Zhang
Dahua Lin
Yu Qiao
Peng Gao
Xiangyu Yue
MLLM
104
109
0
10 Jan 2025
Decoding fMRI Data into Captions using Prefix Language Modeling
Decoding fMRI Data into Captions using Prefix Language Modeling
Vyacheslav Shen
Kassymzhomart Kunanbayev
Dae-Shik Kim
30
0
0
07 Jan 2025
GIT-CXR: End-to-End Transformer for Chest X-Ray Report Generation
Iustin Sîrbu
Iulia-Renata Sîrbu
Jasmina Bogojeska
Traian Rebedea
MedIm
ViT
LM&MA
36
0
0
05 Jan 2025
Altogether: Image Captioning via Re-aligning Alt-text
Altogether: Image Captioning via Re-aligning Alt-text
Hu Xu
Po-Yao (Bernie) Huang
Xiaoqing Ellen Tan
Ching-Feng Yeh
Jacob Kahn
...
Luke Zettlemoyer
Wen-tau Yih
Shang-Wen Li
Saining Xie
Christoph Feichtenhofer
DiffM
43
6
0
31 Dec 2024
Towards Visual Grounding: A Survey
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
55
3
0
31 Dec 2024
Improving Generated and Retrieved Knowledge Combination Through
  Zero-shot Generation
Improving Generated and Retrieved Knowledge Combination Through Zero-shot Generation
Xinkai Du
Quanjie Han
Chao Lv
Yi Liu
Yalin Sun
Hao Shu
Hongbo Shan
Maosong Sun
RALM
43
0
0
25 Dec 2024
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Arsha Nagrani
Mingda Zhang
Ramin Mehran
Rachel Hornung
N. B. Gundavarapu
...
Boqing Gong
Cordelia Schmid
Mikhail Sirotenko
Yukun Zhu
Tobias Weyand
103
4
0
12 Dec 2024
Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on
  Developmentally Plausible Corpora
Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Michael Y. Hu
Aaron Mueller
Candace Ross
Adina Williams
Tal Linzen
Chengxu Zhuang
Ryan Cotterell
Leshem Choshen
Alex Warstadt
Ethan Gotlieb Wilcox
96
7
0
06 Dec 2024
FINECAPTION: Compositional Image Captioning Focusing on Wherever You
  Want at Any Granularity
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
Hang Hua
Qing Liu
Lingzhi Zhang
Jing Shi
Zhifei Zhang
Yilin Wang
Jianming Zhang
Jiebo Luo
CoGe
VLM
92
6
0
23 Nov 2024
Spider: Any-to-Many Multimodal LLM
Spider: Any-to-Many Multimodal LLM
Jinxiang Lai
Jie Zhang
Jun Liu
Jian Li
Xiaocheng Lu
Song Guo
MLLM
66
2
0
14 Nov 2024
Grounded Video Caption Generation
Grounded Video Caption Generation
Evangelos Kazakos
Cordelia Schmid
Josef Sivic
38
0
0
12 Nov 2024
Efficient Federated Finetuning of Tiny Transformers with Resource-Constrained Devices
Efficient Federated Finetuning of Tiny Transformers with Resource-Constrained Devices
Kilian Pfeiffer
Mohamed Aboelenien Ahmed
R. Khalili
J. Henkel
38
0
0
12 Nov 2024
Multi-Modal interpretable automatic video captioning
Multi-Modal interpretable automatic video captioning
Antoine Hanna-Asaad
Decky Aspandi
Titus Zaharia
33
0
0
11 Nov 2024
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach
Mathilde Caron
Alireza Fathi
Cordelia Schmid
Ahmet Iscen
39
1
0
31 Oct 2024
NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video
  Reconstruction
NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction
Z. Gong
Guangyin Bao
Qi Zhang
Zhongwei Wan
Duoqian Miao
...
Changwei Wang
Rongtao Xu
Liang Hu
Ke Liu
Yu Zhang
DiffM
VGen
53
8
0
25 Oct 2024
WASP: A Weight-Space Approach to Detecting Learned Spuriousness
WASP: A Weight-Space Approach to Detecting Learned Spuriousness
Cristian Daniel Păduraru
Antonio Bărbălău
Radu Filipescu
Andrei Liviu Nicolicioiu
Elena Burceanu
25
0
0
24 Oct 2024
Reducing Hallucinations in Vision-Language Models via Latent Space
  Steering
Reducing Hallucinations in Vision-Language Models via Latent Space Steering
Sheng Liu
Haotian Ye
Lei Xing
James Zou
VLM
LLMSV
50
5
0
21 Oct 2024
123456789
Next