Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2207.09666
Cited By
GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features
20 July 2022
Van-Quang Nguyen
Masanori Suganuma
Takayuki Okatani
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features"
50 / 57 papers shown
Title
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Zongchuang Zhao
Haoyu Fu
Dingkang Liang
Xin Zhou
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
MLLM
VLM
49
0
0
13 May 2025
Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning
Yubo Zhang
Pedro Botelho
Trevor Gordon
Gil Zussman
I. Kadota
50
0
0
31 Mar 2025
Stitch-a-Recipe: Video Demonstration from Multistep Descriptions
Chi Hsuan Wu
Kumar Ashutosh
Kristen Grauman
DiffM
63
0
0
18 Mar 2025
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
Kei Katsumata
Motonari Kambara
Daichi Yashima
Ryosuke Korekata
Komei Sugiura
63
0
0
28 Jan 2025
GIT-CXR: End-to-End Transformer for Chest X-Ray Report Generation
Iustin Sîrbu
Iulia-Renata Sîrbu
Jasmina Bogojeska
Traian Rebedea
MedIm
ViT
LM&MA
31
0
0
05 Jan 2025
ORID: Organ-Regional Information Driven Framework for Radiology Report Generation
Tiancheng Gu
Kaicheng Yang
Xiang An
Ziyong Feng
Dongnan Liu
Weidong Cai
74
1
0
20 Nov 2024
LLV-FSR: Exploiting Large Language-Vision Prior for Face Super-resolution
Chenyang Wang
Wenjie An
Kui Jiang
Xianming Liu
Junjun Jiang
CVBM
30
0
0
14 Nov 2024
DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
Yi-Hao Peng
Faria Huq
Yue Jiang
Jason Wu
Amanda Li
Jeffrey P. Bigham
Amy Pavel
DiffM
27
4
0
30 Sep 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
Kazuki Matsuda
Yuiga Wada
Komei Sugiura
26
1
0
28 Sep 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
29
0
0
09 Aug 2024
ExpertAF: Expert Actionable Feedback from Video
Kumar Ashutosh
Tushar Nagarajan
Georgios Pavlakos
Kris M. Kitani
Kristen Grauman
VGen
44
2
0
01 Aug 2024
Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
Shunqi Mao
Chaoyi Zhang
Hang Su
Hwanjun Song
Igor Shalyminov
Weidong Cai
30
1
0
16 Jul 2024
Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts
Aditya Sharma
Michael Saxon
William Yang Wang
VLM
34
2
0
24 Jun 2024
Evaluating Durability: Benchmark Insights into Multimodal Watermarking
Jielin Qiu
William Jongwon Han
Xuandong Zhao
Shangbang Long
Christos Faloutsos
Lei Li
62
1
0
06 Jun 2024
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Yunqi Zhang
Songda Li
Chunyuan Deng
Luyi Wang
Hui Zhao
31
0
0
27 May 2024
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve
Fan Fei
Jayakrishnan Unnikrishnan
Son Tran
Benjamin Z. Yao
Belinda Zeng
Mubarak Shah
Trishul M. Chilimbi
VLM
AI4TS
52
4
0
21 Mar 2024
Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition
Jielin Qiu
William Jongwon Han
Winfred Wang
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Christos Faloutsos
Lei Li
Lijuan Wang
VLM
58
2
0
19 Mar 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
39
14
0
06 Mar 2024
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada
Kanta Kaneda
Daichi Saito
Komei Sugiura
34
24
0
28 Feb 2024
Image captioning for Brazilian Portuguese using GRIT model
Rafael Silva de Alencar
William Alberto Cruz Castañeda
Marcellus Amadeus
13
1
0
07 Feb 2024
Return-Aligned Decision Transformer
Tsunehiko Tanaka
Kenshi Abe
Kaito Ariu
Tetsuro Morimura
Edgar Simo-Serra
OffRL
59
1
0
06 Feb 2024
Image Fusion via Vision-Language Model
Zixiang Zhao
Lilun Deng
Haowen Bai
Yukun Cui
Zhipeng Zhang
...
Haotong Qin
Dongdong Chen
Jiangshe Zhang
Peng Wang
Luc Van Gool
VLM
30
18
0
03 Feb 2024
KTVIC: A Vietnamese Image Captioning Dataset on the Life Domain
Anh-Cuong Pham
Van-Quang Nguyen
Thi-Hong Vuong
Quang-Thuy Ha
21
1
0
16 Jan 2024
Detours for Navigating Instructional Videos
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
26
6
0
03 Jan 2024
ToViLaG: Your Visual-Language Generative Model is Also An Evildoer
Xinpeng Wang
Xiaoyuan Yi
Han Jiang
Shanlin Zhou
Zhihua Wei
Xing Xie
25
12
0
13 Dec 2023
Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects
Junyu Lu
Ruyi Gan
Di Zhang
Xiaojun Wu
Ziwei Wu
Renliang Sun
Jiaxing Zhang
Pingjian Zhang
Yan Song
MLLM
VLM
20
15
0
08 Dec 2023
Improving Image Captioning via Predicting Structured Concepts
Ting Wang
Weidong Chen
Yuanhe Tian
Yan Song
Zhendong Mao
24
8
0
14 Nov 2023
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
Sijin Chen
Hongyuan Zhu
Mingsheng Li
Xin Chen
Peng Guo
Yinjie Lei
Gang Yu
Taihao Li
Tao Chen
11
18
0
06 Sep 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
55
19
0
23 Aug 2023
Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
Zijie Song
Zhenzhen Hu
Yuanen Zhou
Ye Zhao
Richang Hong
Meng Wang
21
2
0
19 Jul 2023
KiUT: Knowledge-injected U-Transformer for Radiology Report Generation
Zhongzhen Huang
Xiaofan Zhang
Shaoting Zhang
MedIm
25
51
0
20 Jun 2023
Embodied Executable Policy Learning with Language-based Scene Summarization
Jielin Qiu
Mengdi Xu
William Jongwon Han
Seungwhan Moon
Ding Zhao
LM&Ro
24
7
0
09 Jun 2023
Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
27
1
0
31 May 2023
A request for clarity over the End of Sequence token in the Self-Critical Sequence Training
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
26
6
0
20 May 2023
DiffCap: Exploring Continuous Diffusion on Image Captioning
Yufeng He
Zefan Cai
Xu Gan
Baobao Chang
DiffM
21
5
0
20 May 2023
ImageCaptioner
2
^2
2
: Image Captioner for Image Captioning Bias Amplification Assessment
Eslam Mohamed Bakr
Pengzhan Sun
Erran L. Li
Mohamed Elhoseiny
22
6
0
10 Apr 2023
Model-Agnostic Gender Debiased Image Captioning
Yusuke Hirota
Yuta Nakashima
Noa Garcia
FaML
22
18
0
07 Apr 2023
Cross-Modal Causal Intervention for Medical Report Generation
Weixing Chen
Yang Liu
Ce Wang
Jiarui Zhu
Shen Zhao
Guanbin Li
Cheng-Lin Liu
Liang Lin
31
5
0
16 Mar 2023
Towards Universal Vision-language Omni-supervised Segmentation
Bowen Dong
Jiaxi Gu
Jianhua Han
Hang Xu
W. Zuo
VLM
33
1
0
12 Mar 2023
ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
Zequn Zeng
Hao Zhang
Zhengjue Wang
Ruiying Lu
Dongsheng Wang
Bo Chen
BDL
DiffM
19
33
0
04 Mar 2023
Towards Local Visual Modeling for Image Captioning
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
R. Ji
ViT
19
71
0
13 Feb 2023
Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data
Dong-Jin Kim
Tae-Hyun Oh
Jinsoo Choi
In So Kweon
SSL
VLM
27
4
0
26 Jan 2023
End-to-End 3D Dense Captioning with Vote2Cap-DETR
Sijin Chen
Hongyuan Zhu
Xin Chen
Yinjie Lei
Tao Chen
YU Gang
ViT
21
52
0
06 Jan 2023
HierVL: Learning Hierarchical Video-Language Embeddings
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
VLM
AI4TS
22
51
0
05 Jan 2023
Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning
Ukyo Honda
Taro Watanabe
Yuji Matsumoto
13
9
0
06 Dec 2022
Multilingual Communication System with Deaf Individuals Utilizing Natural and Visual Languages
Tuan-Luc Huynh
Khoi-Nguyen Nguyen-Ngoc
Chi-Bien Chu
Minh-Triet Tran
Trung-Nghia Le
SLR
13
0
0
01 Dec 2022
Exploring Discrete Diffusion Models for Image Captioning
Zixin Zhu
Yixuan Wei
Jianfeng Wang
Zhe Gan
Zheng-Wei Zhang
Le Wang
G. Hua
Lijuan Wang
Zicheng Liu
Han Hu
DiffM
VLM
23
17
0
21 Nov 2022
CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning
Shi-You Xu
VLM
DiffM
32
11
0
10 Oct 2022
Affection: Learning Affective Explanations for Real-World Visual Data
Panos Achlioptas
M. Ovsjanikov
Leonidas J. Guibas
Sergey Tulyakov
71
11
0
04 Oct 2022
Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning
J. Hu
Roberto Cavicchioli
Alessandro Capotondi
26
21
0
13 Aug 2022
1
2
Next