Discriminability objective for training descriptive captions


12 March 2018
Ruotian Luo
Brian L. Price
Scott D. Cohen
Gregory Shakhnarovich

Papers citing "Discriminability objective for training descriptive captions"

50 / 99 papers shown
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
45
0
0
03 Apr 2025
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning
Manu Gaur
Darshan Singh
Makarand Tapaswi
124
1
0
04 Sep 2024
DIVE: Towards Descriptive and Diverse Visual Commonsense Generation
Jun-Hyung Park
Hyuntae Park
Youjin Kang
Eojin Jeon
SangKeun Lee
32
0
0
15 Aug 2024
Emergent Visual-Semantic Hierarchies in Image-Text Representations
Morris Alper
Hadar Averbuch-Elor
VLM
37
7
0
11 Jul 2024
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model
An-Chieh Cheng
Hongxu Yin
Yang Fu
Qiushan Guo
Ruihan Yang
Jan Kautz
Xiaolong Wang
Sifei Liu
LRM
61
44
0
03 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
34
0
0
01 Jun 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
54
10
0
12 Mar 2024
Distinctive Image Captioning: Leveraging Ground Truth Captions in CLIP Guided Reinforcement Learning
Antoine Chaffin
Ewa Kijak
Vincent Claveau
24
0
0
21 Feb 2024
MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning
Wanqing Cui
Keping Bi
J. Guo
Xueqi Cheng
SyDa
ReLM
RALM
LRM
37
8
0
21 Feb 2024
LLM4VG: Large Language Models Evaluation for Video Grounding
Wei Feng
Xin Wang
Hong Chen
Zeyang Zhang
Zihan Song
Yuwei Zhou
Wenwu Zhu
39
8
0
21 Dec 2023
Informative Scene Graph Generation via Debiasing
Lianli Gao
Xinyu Lyu
Yuyu Guo
Yuxuan Hu
Yuanyou Li
Lu Xu
Hengtao Shen
Jingkuan Song
23
5
0
10 Aug 2023
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
Ka Leong Cheng
Wenpo Song
Zheng Ma
Wenhao Zhu
Zi-Yue Zhu
Jianbing Zhang
CLIP
VLM
27
10
0
02 Aug 2023
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards
Yangjun Mao
Jun Xiao
Dong Zhang
Meng Cao
Jian Shao
Yueting Zhuang
Long Chen
EGVM
29
9
0
25 Jun 2023
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation
Zihao Yue
Anwen Hu
Liang Zhang
Qin Jin
24
2
0
23 Jun 2023
Pragmatic Inference with a CLIP Listener for Contrastive Captioning
Jiefu Ou
Benno Krojer
Daniel Fried
21
5
0
15 Jun 2023
DisCLIP: Open-Vocabulary Referring Expression Generation
Lior Bracha
E. Shaar
Aviv Shamsian
Ethan Fetaya
Gal Chechik
ObjD
30
7
0
30 May 2023
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
Noam Rotstein
David Bensaid
Shaked Brody
Roy Ganz
Ron Kimmel
VLM
26
27
0
28 May 2023
IMAGINATOR: Pre-Trained Image+Text Joint Embeddings using Word-Level Grounding of Images
Varuna Krishna
S. Suryavardan
Shreyash Mishra
Sathyanarayanan Ramamoorthy
Parth Patwa
Megha Chakraborty
Aman Chadha
Amitava Das
Amit P. Sheth
VLM
25
3
0
12 May 2023
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Changrong Xiao
S. Xu
Kunpeng Zhang
DiffM
24
10
0
03 May 2023
Cross-Domain Image Captioning with Discriminative Finetuning
Roberto Dessì
Michele Bevilacqua
Eleonora Gualdoni
Nathanaël Carraz Rakotonirina
Francesca Franzon
Marco Baroni
CLIP
27
19
0
04 Apr 2023
Effective End-to-End Vision Language Pretraining with Semantic Visual Loss
Xiaofeng Yang
Fayao Liu
Guosheng Lin
VLM
26
7
0
18 Jan 2023
Text-Guided Mask-free Local Image Retouching
Zerun Liu
Fan Zhang
Jingxuan He
Jin Wang
Zhangye Wang
Lechao Cheng
DiffM
33
5
0
15 Dec 2022
Cross-Modal Similarity-Based Curriculum Learning for Image Captioning
Hongkuan Zhang
Saku Sugawara
Akiko Aizawa
Lei Zhou
Ryohei Sasano
Koichi Takeda
VLM
27
4
0
14 Dec 2022
Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning
Ukyo Honda
Taro Watanabe
Yuji Matsumoto
13
9
0
06 Dec 2022
Look, Read and Ask: Learning to Ask Questions by Reading Text in Images
Soumya Jahagirdar
Shankar Gangisetty
Anand Mishra
19
4
0
23 Nov 2022
CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge
Linli Yao
Wei Chen
Qin Jin
VLM
30
10
0
17 Nov 2022
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Chongyang Gao
Jianfei Cai
MLLM
40
10
0
04 Oct 2022
Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned
Ahmed Sabir
17
0
0
26 Sep 2022
Aesthetic Attributes Assessment of Images with AMANv2 and DPC-CaptionsV2
Xinghui Zhou
Xin Jin
Jianwen Lv
Heng Huang
Ming Mao
Shuai Cui
CoGe
18
0
0
09 Aug 2022
Distinctive Image Captioning via CLIP Guided Group Optimization
Youyuan Zhang
Jiuniu Wang
Hao Wu
Wenjia Xu
VLM
37
8
0
08 Aug 2022
Rethinking the Reference-based Distinctive Image Captioning
Yangjun Mao
Long Chen
Zhihong Jiang
Dong Zhang
Zhimeng Zhang
Jian Shao
Jun Xiao
DiffM
27
22
0
22 Jul 2022
Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation
Chao Zheng
Lianli Gao
Xinyu Lyu
Pengpeng Zeng
Abdulmotaleb El Saddik
Hengtao Shen
29
14
0
16 Jul 2022
Adaptive Fine-Grained Predicates Learning for Scene Graph Generation
Xinyu Lyu
Lianli Gao
Pengpeng Zeng
Hengtao Shen
Jingkuan Song
36
18
0
11 Jul 2022
Predicting Word Learning in Children from the Performance of Computer Vision Systems
Sunayana Rane
Mira L. Nencheva
Zeyu Wang
C. Lew‐Williams
Olga Russakovsky
Thomas L. Griffiths
13
3
0
07 Jul 2022
Measuring Representational Harms in Image Captioning
Angelina Wang
Solon Barocas
Kristen Laird
Hanna M. Wallach
21
51
0
14 Jun 2022
Fine-grained Image Captioning with CLIP Reward
Jaemin Cho
Seunghyun Yoon
Ajinkya Kale
Franck Dernoncourt
Trung Bui
Mohit Bansal
CLIP
131
76
0
26 May 2022
On Distinctive Image Captioning via Comparing and Reweighting
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
38
16
0
08 Apr 2022
An Integrated Approach for Video Captioning and Applications
Soheyla Amirian
T. Taha
Khaled Rasheed
H. Arabnia
31
1
0
23 Jan 2022
Towards Automated Error Analysis: Learning to Characterize Errors
Tong Gao
Shivang Singh
Raymond J. Mooney
14
1
0
13 Jan 2022
ITA: Image-Text Alignments for Multi-Modal Named Entity Recognition
Xinyu Wang
Min Gui
Yong-jia Jiang
Zixia Jia
Nguyen Bach
Tao Wang
Zhongqiang Huang
Fei Huang
Kewei Tu
41
52
0
13 Dec 2021
Injecting Semantic Concepts into End-to-End Image Captioning
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lin Liang
Zhe Gan
Lijuan Wang
Yezhou Yang
Zicheng Liu
ViT
VLM
24
86
0
09 Dec 2021
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
Dave Zhenyu Chen
Qirui Wu
Matthias Nießner
Angel X. Chang
21
29
0
02 Dec 2021
Learning Structural Representations for Recipe Generation and Food Retrieval
Hao Wang
Guosheng Lin
Steven C. H. Hoi
Chunyan Miao
18
28
0
04 Oct 2021
Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models
Steven Y. Feng
Kevin Lu
Zhuofu Tao
Malihe Alikhani
Teruko Mitamura
Eduard H. Hovy
Varun Gangal
LRM
32
13
0
08 Sep 2021
Group-based Distinctive Image Captioning with Memory Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
16
18
0
20 Aug 2021
Cross-Modal Graph with Meta Concepts for Video Captioning
Hao Wang
Guosheng Lin
Steven C. H. Hoi
Chunyan Miao
25
6
0
14 Aug 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
202
405
0
13 Jul 2021
Contrastive Semantic Similarity Learning for Image Captioning Evaluation with Intrinsic Auto-encoder
Chao Zeng
Tiesong Zhao
Sam Kwong
24
2
0
29 Jun 2021
UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning
Hwanhee Lee
Seunghyun Yoon
Franck Dernoncourt
Trung Bui
Kyomin Jung
VLM
21
44
0
26 Jun 2021
Understanding and Evaluating Racial Biases in Image Captioning
Dora Zhao
Angelina Wang
Olga Russakovsky
24
134
0
16 Jun 2021