ResearchTrend.AI
arXiv: 2108.10568
Auto-Parsing Network for Image Captioning and Visual Question Answering

24 August 2021
Xu Yang
Chongyang Gao
Hanwang Zhang
Jianfei Cai

Papers citing "Auto-Parsing Network for Image Captioning and Visual Question Answering"

50 / 56 papers shown
Title
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM, VLM
119
0
0
03 Jan 2025
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
96
380
0
30 Jun 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
132
1,944
0
13 Apr 2020
X-Linear Attention Networks for Image Captioning
Yingwei Pan
Ting Yao
Yehao Li
Tao Mei
116
513
0
31 Mar 2020
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Shizhe Chen
Qin Jin
Peng Wang
Qi Wu
DiffM
114
218
0
01 Mar 2020
Unbiased Scene Graph Generation from Biased Training
Kaihua Tang
Yulei Niu
Jianqiang Huang
Jiaxin Shi
Hanwang Zhang
CML
81
700
0
27 Feb 2020
Are Transformers universal approximators of sequence-to-sequence functions?
Chulhee Yun
Srinadh Bhojanapalli
A. S. Rawat
Sashank J. Reddi
Sanjiv Kumar
118
358
0
20 Dec 2019
Meshed-Memory Transformer for Image Captioning
Marcella Cornia
Matteo Stefanini
Lorenzo Baraldi
Rita Cucchiara
78
884
0
17 Dec 2019
Heterogeneous Graph Learning for Visual Commonsense Reasoning
Weijiang Yu
Jingwen Zhou
Weihao Yu
Xiaodan Liang
Nong Xiao
LRM
65
47
0
25 Oct 2019
Tree Transformer: Integrating Tree Structures into Self-Attention
Yau-Shian Wang
Hung-yi Lee
Yun-Nung Chen
64
146
0
14 Sep 2019
Hierarchy Parsing for Image Captioning
Ting Yao
Yingwei Pan
Yehao Li
Tao Mei
VLM
61
165
0
09 Sep 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Tan
Mohit Bansal
VLM, MLLM
250
2,488
0
20 Aug 2019
Attention on Attention for Image Captioning
Lun Huang
Wenmin Wang
Jie Chen
Xiao-Yong Wei
72
832
0
19 Aug 2019
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti
Jeffrey Ling
Michael Collins
David Reitter
62
173
0
14 Aug 2019
Attention is not not Explanation
Sarah Wiegreffe
Yuval Pinter
XAI, AAML, FAtt
122
914
0
13 Aug 2019
Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting
Shiyang Li
Xiaoyong Jin
Yao Xuan
Xiyou Zhou
Wenhu Chen
Yu Wang
Xifeng Yan
AI4TS
109
1,426
0
29 Jun 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
87
808
0
25 Jun 2019
Image Captioning: Transforming Objects into Words
Simão Herdade
Armin Kappeler
K. Boakye
Joao Soares
ViT
130
470
0
14 Jun 2019
Learning to Collocate Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Jianfei Cai
59
77
0
18 Apr 2019
Cross-Modal Self-Attention Network for Referring Image Segmentation
Linwei Ye
Mrigank Rochan
Zhi Liu
Yang Wang
EgoV
57
478
0
09 Apr 2019
Relation-Aware Graph Attention Network for Visual Question Answering
Linjie Li
Zhe Gan
Yu Cheng
Jingjing Liu
GNN
169
345
0
29 Mar 2019
Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing
Xihui Liu
Zihao Wang
Jing Shao
Xiaogang Wang
Hongsheng Li
ObjD
78
184
0
03 Mar 2019
Context-Aware Self-Attention Networks
Baosong Yang
Jian Li
Derek F. Wong
Lidia S. Chao
Xing Wang
Zhaopeng Tu
60
114
0
15 Feb 2019
Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering
Peng Gao
Zhengkai Jiang
Haoxuan You
Pan Lu
Steven C. H. Hoi
Xiaogang Wang
Hongsheng Li
AIMat
80
365
0
13 Dec 2018
Learning to Assemble Neural Module Tree Networks for Visual Grounding
Daqing Liu
Hanwang Zhang
Feng Wu
Zhengjun Zha
57
272
0
08 Dec 2018
Auto-Encoding Scene Graphs for Image Captioning
Xu Yang
Kaihua Tang
Hanwang Zhang
Jianfei Cai
163
699
0
06 Dec 2018
Learning to Compose Dynamic Tree Structures for Visual Contexts
Kaihua Tang
Hanwang Zhang
Baoyuan Wu
Wenhan Luo
Wen Liu
75
503
0
05 Dec 2018
Exploring Visual Relationship for Image Captioning
Ting Yao
Yingwei Pan
Yehao Li
Tao Mei
80
834
0
19 Sep 2018
Object Hallucination in Image Captioning
Anna Rohrbach
Lisa Anne Hendricks
Kaylee Burns
Trevor Darrell
Kate Saenko
194
443
0
06 Sep 2018
Recurrent Fusion Network for Image Captioning
Wenhao Jiang
Lin Ma
Yu-Gang Jiang
Wen Liu
Tong Zhang
ObjD
62
235
0
26 Jul 2018
Relational inductive biases, deep learning, and graph networks
Peter W. Battaglia
Jessica B. Hamrick
V. Bapst
Alvaro Sanchez-Gonzalez
V. Zambaldi
...
Pushmeet Kohli
M. Botvinick
Oriol Vinyals
Yujia Li
Razvan Pascanu
AI4CE, NAI
766
3,129
0
04 Jun 2018
Bilinear Attention Networks
Jin-Hwa Kim
Jaehyun Jun
Byoung-Tak Zhang
AIMat
90
877
0
21 May 2018
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
Duy-Kien Nguyen
Takayuki Okatani
66
280
0
03 Apr 2018
Non-local Neural Networks
Xiaolong Wang
Ross B. Girshick
Abhinav Gupta
Kaiming He
OffRL
296
8,917
0
21 Nov 2017
Neural Motifs: Scene Graph Parsing with Global Context
Rowan Zellers
Mark Yatskar
Sam Thomson
Yejin Choi
GNN
93
999
0
17 Nov 2017
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge
Damien Teney
Peter Anderson
Xiaodong He
Anton Van Den Hengel
101
383
0
09 Aug 2017
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
123
4,221
0
25 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
730
132,363
0
12 Jun 2017
Learning to Reason: End-to-End Module Networks for Visual Question Answering
Ronghang Hu
Jacob Andreas
Marcus Rohrbach
Trevor Darrell
Kate Saenko
KELM, GNN, ReLM, LRM
129
579
0
18 Apr 2017
Visual Translation Embedding Network for Visual Relation Detection
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
242
562
0
27 Feb 2017
Self-critical Sequence Training for Image Captioning
Steven J. Rennie
E. Marcheret
Youssef Mroueh
Jerret Ross
Vaibhava Goel
109
1,890
0
02 Dec 2016
Boosting Image Captioning with Attributes
Ting Yao
Yingwei Pan
Yehao Li
Zhaofan Qiu
Tao Mei
VLM
89
622
0
05 Nov 2016
Graph-Structured Representations for Visual Question Answering
Damien Teney
Lingqiao Liu
Anton Van Den Hengel
GNN, NAI
102
420
0
19 Sep 2016
Towards Bayesian Deep Learning: A Framework and Some Existing Methods
Hao Wang
Dit-Yan Yeung
BDL
57
225
0
24 Aug 2016
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
106
1,919
0
29 Jul 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
...
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
225
5,762
0
23 Feb 2016
Neural Module Networks
Jacob Andreas
Marcus Rohrbach
Trevor Darrell
Dan Klein
CoGe
139
1,076
0
09 Nov 2015
Bidirectional LSTM-CRF Models for Sequence Tagging
Zhiheng Huang
Wenyuan Xu
Kai Yu
184
4,033
0
09 Aug 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat, ObjD
525
62,377
0
04 Jun 2015
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
217
5,503
0
03 May 2015