ResearchTrend.AI
Top-Down Framework for Weakly-supervised Grounded Image Captioning
arXiv:2306.07490, v3 (latest)
13 June 2023
Chen Cai
Suchen Wang
Kim-Hui Yap
Yi Wang
    ObjD

Papers citing "Top-Down Framework for Weakly-supervised Grounded Image Captioning"

23 / 23 papers shown
Prophet Attention: Predicting Attention with Future Attention for Image Captioning
Fenglin Liu
Xuancheng Ren
Xian Wu
Wei Fan
Yuexian Zou
Xu Sun
65
47
0
19 Oct 2022
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning
Li Yang
Yan Xu
Chunfeng Yuan
Wei Liu
Bing Li
Weiming Hu
ObjD
68
117
0
30 Apr 2022
Injecting Semantic Concepts into End-to-End Image Captioning
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lin Liang
Zhe Gan
Lijuan Wang
Yezhou Yang
Zicheng Liu
ViT, VLM
73
88
0
09 Dec 2021
Distributed Attention for Grounded Image Captioning
Nenglun Chen
Xingjia Pan
Runnan Chen
Lei Yang
Zhiwen Lin
Yuqiang Ren
Haolei Yuan
Xiaowei Guo
Feiyue Huang
Wenping Wang
55
21
0
02 Aug 2021
TransVG: End-to-End Visual Grounding with Transformers
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
74
345
0
17 Apr 2021
Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation
Liwei Wang
Jing-ling Huang
Yin Li
Kun Xu
Zhengyuan Yang
Dong Yu
ObjD
65
83
0
03 Jul 2020
More Grounded Image Captioning by Distilling Image-Text Matching Model
Yuanen Zhou
Meng Wang
Daqing Liu
Zhenzhen Hu
Hanwang Zhang
65
126
0
01 Apr 2020
Meshed-Memory Transformer for Image Captioning
Marcella Cornia
Matteo Stefanini
Lorenzo Baraldi
Rita Cucchiara
78
884
0
17 Dec 2019
Attention on Attention for Image Captioning
Lun Huang
Wenmin Wang
Jie Chen
Xiao-Yong Wei
72
832
0
19 Aug 2019
Image Captioning: Transforming Objects into Words
Simão Herdade
Armin Kappeler
K. Boakye
Joao Soares
ViT
130
470
0
14 Jun 2019
Multimodal Transformer with Multi-View Visual Representation for Image Captioning
Jun-chen Yu
Jing Li
Zhou Yu
Qingming Huang
ViT
63
384
0
20 May 2019
Deep Learning for Generic Object Detection: A Survey
Li Liu
Wanli Ouyang
Xiaogang Wang
Paul Fieguth
Jie Chen
Xinwang Liu
M. Pietikäinen
ObjD, VLM, OOD
174
2,458
0
06 Sep 2018
Recurrent Fusion Network for Image Captioning
Wenhao Jiang
Lin Ma
Yu-Gang Jiang
Wen Liu
Tong Zhang
ObjD
62
235
0
26 Jul 2018
Self-produced Guidance for Weakly-supervised Object Localization
Xiaolin Zhang
Yunchao Wei
Guoliang Kang
Yi Yang
Thomas Huang
WSOL
111
253
0
24 Jul 2018
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
123
4,221
0
25 Jul 2017
Language Modeling with Gated Convolutional Networks
Yann N. Dauphin
Angela Fan
Michael Auli
David Grangier
242
2,404
0
23 Dec 2016
Self-critical Sequence Training for Image Captioning
Steven J. Rennie
E. Marcheret
Youssef Mroueh
Jerret Ross
Vaibhava Goel
109
1,890
0
02 Dec 2016
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
108
1,919
0
29 Jul 2016
Learning Deep Features for Discriminative Localization
Bolei Zhou
A. Khosla
Àgata Lapedriza
A. Oliva
Antonio Torralba
SSL, SSeg, FAtt
253
9,338
0
14 Dec 2015
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Julia Hockenmaier
Svetlana Lazebnik
208
2,072
0
19 May 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
350
10,079
0
10 Feb 2015
Deep Visual-Semantic Alignments for Generating Image Descriptions
A. Karpathy
Li Fei-Fei
152
5,591
0
07 Dec 2014
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
297
4,508
0
20 Nov 2014