ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1611.06607
  4. Cited By
A Hierarchical Approach for Generating Descriptive Image Paragraphs

A Hierarchical Approach for Generating Descriptive Image Paragraphs

20 November 2016
J. Krause
Justin Johnson
Ranjay Krishna
Li Fei-Fei
    VLM
ArXivPDFHTML

Papers citing "A Hierarchical Approach for Generating Descriptive Image Paragraphs"

50 / 54 papers shown
Title
Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models
Julian Spravil
Sebastian Houben
Sven Behnke
VLM
75
0
0
12 Mar 2025
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
84
26
0
04 Oct 2024
Controllable Contextualized Image Captioning: Directing the Visual
  Narrative through User-Defined Highlights
Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
Shunqi Mao
Chaoyi Zhang
Hang Su
Hwanjun Song
Igor Shalyminov
Weidong Cai
39
1
0
16 Jul 2024
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
Run Luo
Yunshui Li
Longze Chen
Wanwei He
Ting-En Lin
...
Zikai Song
Xiaobo Xia
Tongliang Liu
Min Yang
Binyuan Hui
VLM
DiffM
75
15
0
24 May 2024
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
Olivia Wiles
Chuhan Zhang
Isabela Albuquerque
Ivana Kajić
Su Wang
...
Jordi Pont-Tuset
Aida Nematzadeh
Anant Nawalgaria
Jordi Pont-Tuset
Aida Nematzadeh
EGVM
132
14
0
25 Apr 2024
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
Yanqi Dai
Dong Jing
Nanyi Fei
Zhiwu Lu
Nanyi Fei
Guoxing Yang
Zhiwu Lu
55
3
0
07 Mar 2024
A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models
A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models
Xiujie Song
Mengyue Wu
Ke Zhu
Chunhao Zhang
Yanyi Chen
LRM
ELM
36
3
0
28 Feb 2024
COCO is "ALL'' You Need for Visual Instruction Fine-tuning
COCO is "ALL'' You Need for Visual Instruction Fine-tuning
Xiaotian Han
Yiqi Wang
Bohan Zhai
Quanzeng You
Hongxia Yang
VLM
MLLM
33
2
0
17 Jan 2024
Generating Diverse and High-Quality Texts by Minimum Bayes Risk Decoding
Generating Diverse and High-Quality Texts by Minimum Bayes Risk Decoding
Yuu Jinnai
Ukyo Honda
Tetsuro Morimura
Peinan Zhang
34
6
0
10 Jan 2024
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
...
Jilan Xu
Guo Chen
Ping Luo
Limin Wang
Yu Qiao
VLM
MLLM
73
399
0
28 Nov 2023
Multi Sentence Description of Complex Manipulation Action Videos
Multi Sentence Description of Complex Manipulation Action Videos
Fatemeh Ziaeetabar
Reza Safabakhsh
S. Momtazi
M. Tamosiunaite
F. Worgotter
31
1
0
13 Nov 2023
Florence-2: Advancing a Unified Representation for a Variety of Vision
  Tasks
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
VLM
45
143
0
10 Nov 2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung
Youngjae Yu
VLM
30
1
0
15 Oct 2023
Explore and Tell: Embodied Visual Captioning in 3D Environments
Explore and Tell: Embodied Visual Captioning in 3D Environments
Anwen Hu
Shizhe Chen
Liang Zhang
Qin Jin
LM&Ro
38
2
0
21 Aug 2023
Visual Instruction Tuning with Polite Flamingo
Visual Instruction Tuning with Polite Flamingo
Delong Chen
Jianfeng Liu
Wenliang Dai
Baoyuan Wang
MLLM
34
42
0
03 Jul 2023
Connecting Vision and Language with Video Localized Narratives
Connecting Vision and Language with Video Localized Narratives
P. Voigtlaender
Soravit Changpinyo
Jordi Pont-Tuset
Radu Soricut
V. Ferrari
VGen
49
21
0
22 Feb 2023
IC3: Image Captioning by Committee Consensus
IC3: Image Captioning by Committee Consensus
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
32
17
0
02 Feb 2023
Learning to Collocate Visual-Linguistic Neural Modules for Image
  Captioning
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Chongyang Gao
Jianfei Cai
MLLM
40
10
0
04 Oct 2022
Competence-based Multimodal Curriculum Learning for Medical Report
  Generation
Competence-based Multimodal Curriculum Learning for Medical Report Generation
Fenglin Liu
Shen Ge
Yuexian Zou
Xian Wu
MedIm
25
131
0
24 Jun 2022
Bypass Network for Semantics Driven Image Paragraph Captioning
Bypass Network for Semantics Driven Image Paragraph Captioning
Qinjie Zheng
Chaoyue Wang
Dadong Wang
24
1
0
21 Jun 2022
AlignTransformer: Hierarchical Alignment of Visual Regions and Disease
  Tags for Medical Report Generation
AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation
Di You
Fenglin Liu
Shen Ge
Xiaoxia Xie
Jing Zhang
Xian Wu
ViT
MedIm
26
106
0
18 Mar 2022
NLX-GPT: A Model for Natural Language Explanations in Vision and
  Vision-Language Tasks
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
Fawaz Sammani
Tanmoy Mukherjee
Nikos Deligiannis
MILM
ELM
LRM
21
67
0
09 Mar 2022
Auto-Encoding Knowledge Graph for Unsupervised Medical Report Generation
Auto-Encoding Knowledge Graph for Unsupervised Medical Report Generation
Fenglin Liu
Chenyu You
Xian Wu
Shen Ge
Sheng Wang
Xu Sun
MedIm
81
92
0
08 Nov 2021
A Picture is Worth a Thousand Words: A Unified System for Diverse
  Captions and Rich Images Generation
A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation
Yupan Huang
Bei Liu
Jianlong Fu
Yutong Lu
DiffM
24
5
0
19 Oct 2021
Panoptic Narrative Grounding
Panoptic Narrative Grounding
Cristina González
Nicolás Ayobi
Isabela Hernández
José Hernández
Jordi Pont-Tuset
Pablo Arbeláez
82
22
0
10 Sep 2021
Medical-VLBERT: Medical Visual Language BERT for COVID-19 CT Report
  Generation With Alternate Learning
Medical-VLBERT: Medical Visual Language BERT for COVID-19 CT Report Generation With Alternate Learning
Guangyi Liu
Yinghong Liao
Fuyu Wang
Bin Zhang
Lu Zhang
...
Xiang Wan
Shaolin Li
Zhen Li
Shuixing Zhang
Shuguang Cui
23
56
0
11 Aug 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
67
254
0
14 Jul 2021
Straight to the Gradient: Learning to Use Novel Tokens for Neural Text
  Generation
Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation
Xiang Lin
Simeng Han
Chenyu You
20
24
0
14 Jun 2021
Towards Diverse Paragraph Captioning for Untrimmed Videos
Towards Diverse Paragraph Captioning for Untrimmed Videos
Yuqing Song
Shizhe Chen
Qin Jin
21
37
0
30 May 2021
Matching Visual Features to Hierarchical Semantic Topics for Image
  Paragraph Captioning
Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning
D. Guo
Ruiying Lu
Bo Chen
Zequn Zeng
Mingyuan Zhou
VLM
25
9
0
10 May 2021
Knowledge driven Description Synthesis for Floor Plan Interpretation
Knowledge driven Description Synthesis for Floor Plan Interpretation
Shreya Goyal
Chiranjoy Chattopadhyay
Gaurav Bhatnagar
3DV
23
13
0
15 Mar 2021
Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs
Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs
Antoni Rosinol
Andrew Violette
Marcus Abate
Nathan Hughes
Yun Chang
Jingang Shi
Arjun Gupta
Luca Carlone
3DV
36
221
0
18 Jan 2021
Towards Unique and Informative Captioning of Images
Towards Unique and Informative Captioning of Images
Zeyu Wang
Berthy Feng
Karthik Narasimhan
Olga Russakovsky
25
37
0
08 Sep 2020
Textual Description for Mathematical Equations
Textual Description for Mathematical Equations
Ajoy Mondal
C. V. Jawahar
14
2
0
07 Aug 2020
Sparse Graph to Sequence Learning for Vision Conditioned Long Textual
  Sequence Generation
Sparse Graph to Sequence Learning for Vision Conditioned Long Textual Sequence Generation
Aditya Mogadala
Marius Mosbach
Dietrich Klakow
VLM
106
0
0
12 Jul 2020
Dense-Caption Matching and Frame-Selection Gating for Temporal
  Localization in VideoQA
Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA
Hyounghun Kim
Zineng Tang
Joey Tianyi Zhou
30
31
0
13 May 2020
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video
  Captioning
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning
Zhiyuan Fang
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
20
60
0
11 Mar 2020
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera
Iro Armeni
Zhi-Yang He
JunYoung Gwak
Amir Zamir
Martin Fischer
Jitendra Malik
Silvio Savarese
3DV
3DPC
48
336
0
06 Oct 2019
Encoder-Agnostic Adaptation for Conditional Language Generation
Encoder-Agnostic Adaptation for Conditional Language Generation
Zachary M. Ziegler
Luke Melas-Kyriazi
Sebastian Gehrmann
Alexander M. Rush
AI4CE
11
57
0
19 Aug 2019
Convolutional Auto-encoding of Sentence Topics for Image Paragraph
  Generation
Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation
Jing Wang
Yingwei Pan
Ting Yao
Jinhui Tang
Tao Mei
VLM
BDL
DiffM
19
36
0
01 Aug 2019
Cap2Det: Learning to Amplify Weak Caption Supervision for Object
  Detection
Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection
Keren Ye
Ruotong Wang
Adriana Kovashka
Wei Li
Danfeng Qin
Jesse Berent
25
60
0
23 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of
  Tasks, Datasets, and Methods
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
20
132
0
22 Jul 2019
Vispi: Automatic Visual Perception and Interpretation of Chest X-rays
Vispi: Automatic Visual Perception and Interpretation of Chest X-rays
X. Li
Rui Cao
D. Zhu
13
20
0
12 Jun 2019
Context-Aware Visual Policy Network for Fine-Grained Image Captioning
Context-Aware Visual Policy Network for Fine-Grained Image Captioning
Zhengjun Zha
Daqing Liu
Hanwang Zhang
Yongdong Zhang
Feng Wu
25
119
0
06 Jun 2019
Learning to Collocate Neural Modules for Image Captioning
Learning to Collocate Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Jianfei Cai
25
77
0
18 Apr 2019
Natural Language Semantics With Pictures: Some Language & Vision
  Datasets and Potential Uses for Computational Semantics
Natural Language Semantics With Pictures: Some Language & Vision Datasets and Potential Uses for Computational Semantics
David Schlangen
30
6
0
15 Apr 2019
Clinically Accurate Chest X-Ray Report Generation
Clinically Accurate Chest X-Ray Report Generation
Guanxiong Liu
T. Hsu
Matthew B. A. McDermott
Willie Boag
W. Weng
Peter Szolovits
Marzyeh Ghassemi
MedIm
25
271
0
04 Apr 2019
Inverse Cooking: Recipe Generation from Food Images
Inverse Cooking: Recipe Generation from Food Images
Amaia Salvador
M. Drozdzal
Xavier Giró-i-Nieto
Adriana Romero
23
147
0
14 Dec 2018
A Model for Auto-Programming for General Purposes
A Model for Auto-Programming for General Purposes
J. Weng
11
6
0
12 Oct 2018
Diverse and Coherent Paragraph Generation from Images
Diverse and Coherent Paragraph Generation from Images
Moitreya Chatterjee
A. Schwing
19
66
0
03 Sep 2018
12
Next