ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1611.00471
  4. Cited By
Dual Attention Networks for Multimodal Reasoning and Matching

Dual Attention Networks for Multimodal Reasoning and Matching

2 November 2016
Hyeonseob Nam
Jung-Woo Ha
Jeonghee Kim
ArXivPDFHTML

Papers citing "Dual Attention Networks for Multimodal Reasoning and Matching"

47 / 97 papers shown
Title
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning
  Baselines
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
Alex Schwing
LRM
ReLM
37
9
0
31 Oct 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual
  Multimodal Representations
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
Po-Yao (Bernie) Huang
Xiaojun Chang
Alexander G. Hauptmann
30
25
0
30 Sep 2019
LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video
  Moment Retrieval
LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval
Reuben Tan
Huijuan Xu
Kate Saenko
Bryan A. Plummer
28
67
0
27 Sep 2019
Compact Trilinear Interaction for Visual Question Answering
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
36
59
0
26 Sep 2019
Learning Visual Relation Priors for Image-Text Matching and Image
  Captioning with Neural Scene Graph Generators
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators
Kuang-Huei Lee
Hamid Palangi
Xi Chen
Houdong Hu
Jianfeng Gao
VLM
30
37
0
22 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
25
299
0
12 Sep 2019
MULE: Multimodal Universal Language Embedding
MULE: Multimodal Universal Language Embedding
Donghyun Kim
Kuniaki Saito
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
32
40
0
08 Sep 2019
Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking
Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking
Tan Wang
Xing Xu
Yang Yang
Alan Hanjalic
Heng Tao Shen
Jingkuan Song
22
145
0
12 Aug 2019
Use What You Have: Video Retrieval Using Representations From
  Collaborative Experts
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
36
387
0
31 Jul 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
36
797
0
25 Jun 2019
Improving Description-based Person Re-identification by
  Multi-granularity Image-text Alignments
Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments
K. Niu
Y. Huang
Wanli Ouyang
Liang Wang
27
138
0
23 Jun 2019
ParNet: Position-aware Aggregated Relation Network for Image-Text
  matching
ParNet: Position-aware Aggregated Relation Network for Image-Text matching
Yaxian Xia
Lun Huang
Wenmin Wang
Xiao-Yong Wei
Jie Chen
27
1
0
17 Jun 2019
Joint Visual-Textual Embedding for Multimodal Style Search
Joint Visual-Textual Embedding for Multimodal Style Search
Gil Sadeh
L. Fritz
Gabi Shalev
Eduard Oks
35
8
0
15 Jun 2019
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via
  Question Answering
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering
Zhou Yu
D. Xu
Jun-chen Yu
Ting Yu
Zhou Zhao
Yueting Zhuang
Dacheng Tao
24
439
0
06 Jun 2019
Vision-to-Language Tasks Based on Attributes and Attention Mechanism
Vision-to-Language Tasks Based on Attributes and Attention Mechanism
Xuelong Li
Aihong Yuan
Xiaoqiang Lu
21
37
0
29 May 2019
Multimodal Transformer with Multi-View Visual Representation for Image
  Captioning
Multimodal Transformer with Multi-View Visual Representation for Image Captioning
Jun-chen Yu
Jing Li
Zhou Yu
Qingming Huang
ViT
27
377
0
20 May 2019
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image
  Representations
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations
Fenglin Liu
Yuanxin Liu
Xuancheng Ren
Xiaodong He
Xu Sun
VLM
34
81
0
15 May 2019
Progressive Attention Memory Network for Movie Story Question Answering
Progressive Attention Memory Network for Movie Story Question Answering
Junyeong Kim
Minuk Ma
Kyungsu Kim
Sungjin Kim
Chang D. Yoo
13
76
0
18 Apr 2019
Weakly Supervised Video Moment Retrieval From Text Queries
Weakly Supervised Video Moment Retrieval From Text Queries
Niluthpol Chowdhury Mithun
S. Paul
Amit K. Roy-Chowdhury
30
193
0
05 Apr 2019
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption
  Alignment
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
Samyak Datta
Karan Sikka
Anirban Roy
Karuna Ahuja
Devi Parikh
Ajay Divakaran
19
103
0
27 Mar 2019
Improving Referring Expression Grounding with Cross-modal
  Attention-guided Erasing
Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing
Xihui Liu
Zihao Wang
Jing Shao
Xiaogang Wang
Hongsheng Li
ObjD
19
180
0
03 Mar 2019
Answer Them All! Toward Universal Visual Question Answering Models
Answer Them All! Toward Universal Visual Question Answering Models
Robik Shrestha
Kushal Kafle
Christopher Kanan
25
82
0
01 Mar 2019
Image-Question-Answer Synergistic Network for Visual Dialog
Image-Question-Answer Synergistic Network for Visual Dialog
Dalu Guo
Chang Xu
Dacheng Tao
19
74
0
26 Feb 2019
Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog
Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog
Zhe Gan
Yu Cheng
Ahmed El Kholy
Linjie Li
Jingjing Liu
Jianfeng Gao
13
104
0
01 Feb 2019
Multi-task Learning of Hierarchical Vision-Language Representation
Multi-task Learning of Hierarchical Vision-Language Representation
Duy-Kien Nguyen
Takayuki Okatani
28
51
0
03 Dec 2018
Sketch-R2CNN: An Attentive Network for Vector Sketch Recognition
Sketch-R2CNN: An Attentive Network for Vector Sketch Recognition
Lei Li
C. Zou
Youyi Zheng
Qingkun Su
Hongbo Fu
Chiew-Lan Tai
3DPC
38
26
0
20 Nov 2018
Image Chat: Engaging Grounded Conversations
Image Chat: Engaging Grounded Conversations
Kurt Shuster
Samuel Humeau
Antoine Bordes
Jason Weston
23
115
0
02 Nov 2018
Zero-Shot Transfer VQA Dataset
Zero-Shot Transfer VQA Dataset
Yuanpeng Li
Yi Yang
Jianyu Wang
Wei Xu
19
8
0
02 Nov 2018
Engaging Image Captioning Via Personality
Engaging Image Captioning Via Personality
Kurt Shuster
Samuel Humeau
Hexiang Hu
Antoine Bordes
Jason Weston
37
149
0
25 Oct 2018
Textually Enriched Neural Module Networks for Visual Question Answering
Textually Enriched Neural Module Networks for Visual Question Answering
Khyathi Raghavi Chandu
Mary Arpita Pyreddy
Matthieu Felix
N. Joshi
24
6
0
23 Sep 2018
Learning Visual Knowledge Memory Networks for Visual Question Answering
Learning Visual Knowledge Memory Networks for Visual Question Answering
Zhou Su
Chen Zhu
Yinpeng Dong
Dongqi Cai
Yurong Chen
Jianguo Li
34
62
0
13 Jun 2018
Attention-Gated Networks for Improving Ultrasound Scan Plane Detection
Attention-Gated Networks for Improving Ultrasound Scan Plane Detection
Jo Schlemper
Ozan Oktay
Liang Chen
Jacqueline Matthew
C. Knight
Bernhard Kainz
Ben Glocker
Daniel Rueckert
21
92
0
15 Apr 2018
Improved Fusion of Visual and Language Representations by Dense
  Symmetric Co-Attention for Visual Question Answering
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
Duy-Kien Nguyen
Takayuki Okatani
30
279
0
03 Apr 2018
Unsupervised Textual Grounding: Linking Words to Image Concepts
Unsupervised Textual Grounding: Linking Words to Image Concepts
Raymond A. Yeh
Minh Do
Alex Schwing
22
40
0
29 Mar 2018
Stacked Cross Attention for Image-Text Matching
Stacked Cross Attention for Image-Text Matching
Kuang-Huei Lee
Xi Chen
G. Hua
Houdong Hu
Xiaodong He
30
1,142
0
21 Mar 2018
VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual
  Questions
VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions
Qing Li
Qingyi Tao
Chenyu You
Jianfei Cai
Jiebo Luo
37
106
0
20 Mar 2018
Discriminability objective for training descriptive captions
Discriminability objective for training descriptive captions
Ruotian Luo
Brian L. Price
Scott D. Cohen
Gregory Shakhnarovich
30
202
0
12 Mar 2018
Tell-and-Answer: Towards Explainable Visual Question Answering using
  Attributes and Captions
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions
Qing Li
Jianlong Fu
D. Yu
Tao Mei
Jiebo Luo
FAtt
XAI
CoGe
51
60
0
27 Jan 2018
TieNet: Text-Image Embedding Network for Common Thorax Disease
  Classification and Reporting in Chest X-rays
TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays
Xiaosong Wang
Yifan Peng
Le Lu
Zhiyong Lu
Ronald M. Summers
MedIm
38
462
0
12 Jan 2018
Dual-Path Convolutional Image-Text Embeddings with Instance Loss
Dual-Path Convolutional Image-Text Embeddings with Instance Loss
Zhedong Zheng
Liang Zheng
Michael Garrett
Yi Yang
Mingliang Xu
Yi-Dong Shen
27
470
0
15 Nov 2017
Survey of Recent Advances in Visual Question Answering
Survey of Recent Advances in Visual Question Answering
Supriya Pandhre
Shagun Sodhani
10
14
0
24 Sep 2017
Exploring Human-like Attention Supervision in Visual Question Answering
Exploring Human-like Attention Supervision in Visual Question Answering
Tingting Qiao
Jianfeng Dong
Duanqing Xu
19
104
0
19 Sep 2017
Structured Attentions for Visual Question Answering
Structured Attentions for Visual Question Answering
Chen Zhu
Yanpeng Zhao
Shuaiyi Huang
Kewei Tu
Yi Ma
FAtt
32
106
0
07 Aug 2017
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for
  Visual Question Answering
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering
Zhou Yu
Jun-chen Yu
Jianping Fan
Dacheng Tao
41
663
0
04 Aug 2017
VSE++: Improving Visual-Semantic Embeddings with Hard Negatives
VSE++: Improving Visual-Semantic Embeddings with Hard Negatives
Fartash Faghri
David J. Fleet
J. Kiros
Sanja Fidler
VLM
11
181
0
18 Jul 2017
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question
  Answering
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering
V. Kazemi
Ali Elqursh
OOD
28
183
0
11 Apr 2017
Multimodal Compact Bilinear Pooling for Visual Question Answering and
  Visual Grounding
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
167
1,465
0
06 Jun 2016
Previous
12