ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1611.08669
  4. Cited By
Visual Dialog

Visual Dialog

26 November 2016
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra
ArXivPDFHTML

Papers citing "Visual Dialog"

50 / 184 papers shown
Title
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Xuzhi Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
74
0
0
05 May 2025
KwaiChat: A Large-Scale Video-Driven Multilingual Mixed-Type Dialogue Corpus
KwaiChat: A Large-Scale Video-Driven Multilingual Mixed-Type Dialogue Corpus
Xiaoming Shi
Zeming Liu
Yiming Lei
Chenkai Zhang
Haitao Leng
...
Qingjie Liu
Wanxiang Che
Shaoguo Liu
Size Li
Yalin Wang
57
1
0
10 Mar 2025
MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents
MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents
Wanqi Yang
Yicong Li
Meng Fang
L. Chen
64
1
0
09 Feb 2025
A Video-grounded Dialogue Dataset and Metric for Event-driven Activities
Wiradee Imrattanatrai
Masaki Asada
Kimihiro Hasegawa
Zhi-Qi Cheng
Ken Fukuda
Teruko Mitamura
VGen
56
0
0
30 Jan 2025
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Ziyan Jiang
Rui Meng
Xinyi Yang
Semih Yavuz
Yingbo Zhou
Wenhu Chen
MLLM
VLM
51
19
0
03 Jan 2025
Learning to Correction: Explainable Feedback Generation for Visual
  Commonsense Reasoning Distractor
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
Jiali Chen
Xusen Hei
Yuqi Xue
Yuancheng Wei
Jiayuan Xie
Yi Cai
Qing Li
MLLM
LRM
83
4
0
08 Dec 2024
Situational Scene Graph for Structured Human-centric Situation Understanding
Situational Scene Graph for Structured Human-centric Situation Understanding
Chinthani Sugandhika
Chen Li
Deepu Rajan
Basura Fernando
164
1
0
30 Oct 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
65
25
0
10 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
84
26
0
04 Oct 2024
An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation
An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation
Peiming Guo
Sinuo Liu
Yanzhao Zhang
Dingkun Long
Pengjun Xie
Meishan Zhang
M. Zhang
DiffM
47
1
0
16 Aug 2024
Stark: Social Long-Term Multi-Modal Conversation with Persona
  Commonsense Knowledge
Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge
Young-Jun Lee
Dokyong Lee
Junyoung Youn
Kyeongjin Oh
ByungSoo Ko
Jonghwan Hyeon
Ho-Jin Choi
36
2
0
04 Jul 2024
Visualizing Dialogues: Enhancing Image Selection through Dialogue
  Understanding with Large Language Models
Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models
Chang-Sheng Kao
Yun-Nung Chen
23
0
0
04 Jul 2024
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal
  Large Language Models
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
Tianle Gu
Zeyang Zhou
Kexin Huang
Dandan Liang
Yixu Wang
...
Keqing Wang
Yujiu Yang
Yan Teng
Yu Qiao
Yingchun Wang
ELM
50
13
0
11 Jun 2024
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception
Run Luo
Yunshui Li
Longze Chen
Wanwei He
Ting-En Lin
...
Zikai Song
Xiaobo Xia
Tongliang Liu
Min Yang
Binyuan Hui
VLM
DiffM
75
15
0
24 May 2024
Dialogue with Robots: Proposals for Broadening Participation and
  Research in the SLIVAR Community
Dialogue with Robots: Proposals for Broadening Participation and Research in the SLIVAR Community
Casey Kennington
Malihe Alikhani
Heather Pon-Barry
Katherine Atwell
Yonatan Bisk
...
Jivko Sinapov
Angela Stewart
Matthew Stone
Stefanie Tellex
Tom Williams
61
0
0
01 Apr 2024
Towards Deviation-Robust Agent Navigation via Perturbation-Aware
  Contrastive Learning
Towards Deviation-Robust Agent Navigation via Perturbation-Aware Contrastive Learning
Bingqian Lin
Yanxin Long
Yi Zhu
Fengda Zhu
Xiaodan Liang
QiXiang Ye
Liang Lin
34
5
0
09 Mar 2024
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
Yanqi Dai
Dong Jing
Nanyi Fei
Zhiwu Lu
Nanyi Fei
Guoxing Yang
Zhiwu Lu
55
3
0
07 Mar 2024
CODIS: Benchmarking Context-Dependent Visual Comprehension for
  Multimodal Large Language Models
CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models
Fuwen Luo
Chi Chen
Zihao Wan
Zhaolu Kang
Qidong Yan
...
Xiaoyue Mi
Peng Li
Ning Ma
Maosong Sun
Yang Liu
43
5
0
21 Feb 2024
OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for
  Video-Grounded Dialog
OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog
Adnen Abdessaied
Manuel von Hochmeister
Andreas Bulling
40
2
0
20 Feb 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot
  Interaction
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
Jie Xu
Hanbo Zhang
Xinghang Li
Huaping Liu
Xuguang Lan
Tao Kong
LM&Ro
38
3
0
19 Feb 2024
Semi-supervised multimodal coreference resolution in image narrations
Semi-supervised multimodal coreference resolution in image narrations
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
38
3
0
20 Oct 2023
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Avamarie Brueggeman
Andrea Madotto
Zhaojiang Lin
Tushar Nagarajan
Matt Smith
...
Peyman Heidari
Yue Liu
Kavya Srinet
Babak Damavandi
Anuj Kumar
MLLM
34
93
0
27 Sep 2023
InternLM-XComposer: A Vision-Language Large Model for Advanced
  Text-image Comprehension and Composition
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Pan Zhang
Xiaoyi Wang
Bin Wang
Yuhang Cao
Chao Xu
...
Conghui He
Xingcheng Zhang
Yu Qiao
Da Lin
Jiaqi Wang
MLLM
80
222
0
26 Sep 2023
VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue
VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue
Yunshui Li
Binyuan Hui
Zhaochao Yin
Wanwei He
Run Luo
Yuxing Long
Min Yang
Fei Huang
Yongbin Li
26
1
0
14 Sep 2023
ImageBind-LLM: Multi-modality Instruction Tuning
ImageBind-LLM: Multi-modality Instruction Tuning
Jiaming Han
Renrui Zhang
Wenqi Shao
Peng Gao
Peng-Tao Xu
...
Yafei Wen
Xiaoxin Chen
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
49
116
0
07 Sep 2023
Sparkles: Unlocking Chats Across Multiple Images for Multimodal
  Instruction-Following Models
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Yupan Huang
Zaiqiao Meng
Fangyu Liu
Yixuan Su
Nigel Collier
Yutong Lu
MLLM
41
22
0
31 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
Fahad Shahbaz Khan
VLM
38
118
0
25 Jul 2023
A Unified Framework for Slot based Response Generation in a Multimodal
  Dialogue System
A Unified Framework for Slot based Response Generation in a Multimodal Dialogue System
Mauajama Firdaus
Avinash Madasu
Asif Ekbal
44
7
0
27 May 2023
Data Roaming and Quality Assessment for Composed Image Retrieval
Data Roaming and Quality Assessment for Composed Image Retrieval
Matan Levy
Rami Ben-Ari
N. Darshan
Dani Lischinski
48
23
0
16 Mar 2023
Interactive Video Corpus Moment Retrieval using Reinforcement Learning
Interactive Video Corpus Moment Retrieval using Reinforcement Learning
Zhixin Ma
Chong-Wah Ngo
33
3
0
19 Feb 2023
Style-Aware Contrastive Learning for Multi-Style Image Captioning
Style-Aware Contrastive Learning for Multi-Style Image Captioning
Yucheng Zhou
Guodong Long
25
22
0
26 Jan 2023
TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real
  World
TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World
Hongpeng Lin
Ludan Ruan
Wenke Xia
Peiyu Liu
Jing Wen
...
Di Hu
Ruihua Song
Wayne Xin Zhao
Qin Jin
Zhiwu Lu
VGen
33
9
0
14 Jan 2023
SPRING: Situated Conversation Agent Pretrained with Multimodal Questions
  from Incremental Layout Graph
SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph
Yuxing Long
Binyuan Hui
Fulong Ye
Yanyang Li
Zhuoxin Han
Caixia Yuan
Yongbin Li
Xiaojie Wang
LLMAG
30
7
0
05 Jan 2023
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction
  Tuning
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
Zhiyang Xu
Ying Shen
Lifu Huang
MLLM
32
110
0
21 Dec 2022
Modularity through Attention: Efficient Training and Transfer of
  Language-Conditioned Policies for Robot Manipulation
Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation
Yifan Zhou
Shubham D. Sonawani
Mariano Phielipp
Simon Stepputtis
H. B. Amor
LM&Ro
33
27
0
08 Dec 2022
Who are you referring to? Coreference resolution in image narrations
Who are you referring to? Coreference resolution in image narrations
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
25
2
0
26 Nov 2022
Unified Multimodal Model with Unlikelihood Training for Visual Dialog
Unified Multimodal Model with Unlikelihood Training for Visual Dialog
Zihao Wang
Junli Wang
Changjun Jiang
MLLM
29
10
0
23 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image
  Captioning
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning
Pengpeng Zeng
Jinkuan Zhu
Jingkuan Song
Lianli Gao
VLM
24
27
0
17 Nov 2022
Navigating Connected Memories with a Task-oriented Dialog System
Navigating Connected Memories with a Task-oriented Dialog System
Seungwhan Moon
Satwik Kottur
A. Geramifard
Babak Damavandi
35
2
0
15 Nov 2022
MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal
  Open-domain Conversation
MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation
Jiazhan Feng
Qingfeng Sun
Can Xu
Pu Zhao
Yaming Yang
Chongyang Tao
Dongyan Zhao
Qingwei Lin
32
52
0
10 Nov 2022
Multi-Stage Based Feature Fusion of Multi-Modal Data for Human Activity
  Recognition
Multi-Stage Based Feature Fusion of Multi-Modal Data for Human Activity Recognition
Hyeongju Choi
Apoorva Beedu
H. Haresamudram
Irfan Essa
20
5
0
08 Nov 2022
Extending Phrase Grounding with Pronouns in Visual Dialogues
Extending Phrase Grounding with Pronouns in Visual Dialogues
Panzhong Lu
Xin Zhang
Meishan Zhang
Min Zhang
ObjD
30
4
0
23 Oct 2022
Selective Query-guided Debiasing for Video Corpus Moment Retrieval
Selective Query-guided Debiasing for Video Corpus Moment Retrieval
Sunjae Yoon
Jiajing Hong
Eunseop Yoon
Dahyun Kim
Junyeong Kim
Hee Suk Yoon
Changdong Yoo
41
21
0
17 Oct 2022
Learning to Collocate Visual-Linguistic Neural Modules for Image
  Captioning
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Chongyang Gao
Jianfei Cai
MLLM
40
10
0
04 Oct 2022
Towards Explainable 3D Grounded Visual Question Answering: A New
  Benchmark and Strong Baseline
Towards Explainable 3D Grounded Visual Question Answering: A New Benchmark and Strong Baseline
Lichen Zhao
Daigang Cai
Jing Zhang
Lu Sheng
Dong Xu
Ruizhi Zheng
Yinjie Zhao
Lipeng Wang
Xibo Fan
6
23
0
24 Sep 2022
Selecting Stickers in Open-Domain Dialogue through Multitask Learning
Selecting Stickers in Open-Domain Dialogue through Multitask Learning
Zhexin Zhang
Yeshuang Zhu
Zhengcong Fei
Jinchao Zhang
Jie Zhou
25
4
0
16 Sep 2022
Interactive Question Answering Systems: Literature Review
Interactive Question Answering Systems: Literature Review
Giovanni Maria Biancofiore
Yashar Deldjoo
Tommaso Di Noia
E. Sciascio
Fedelucio Narducci
34
13
0
04 Sep 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
Video Question Answering with Iterative Video-Text Co-Tokenization
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
25
18
0
01 Aug 2022
Explicit Image Caption Editing
Explicit Image Caption Editing
Zhen Wang
Long Chen
Wenbo Ma
G. Han
Yulei Niu
Jian Shao
Jun Xiao
25
12
0
20 Jul 2022
Deep Sequence Models for Text Classification Tasks
Deep Sequence Models for Text Classification Tasks
S. S. Abdullahi
Su Yiming
Shamsuddeen Hassan Muhammad
A. Mustapha
Ahmad Muhammad Aminu
Abdulkadir Abdullahi
Musa Bello
Saminu Mohammad Aliyu
24
3
0
18 Jul 2022
1234
Next