ResearchTrend.AI

UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog

1 May 2022
Cheng Chen
Yudong Zhu
Zhenshan Tan
Qingrong Cheng
Xin Jiang
Qun Liu
X. Gu
arXiv: 2205.00423

Papers citing "UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog"

17 / 17 papers shown
Title
CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images
Cheng Chen
Jiacheng Wei
Tianrun Chen
Chi Zhang
Xiaofeng Yang
...
Bingchen Yang
Chuan-Sheng Foo
Guosheng Lin
Qixing Huang
Fayao Liu
44
1
0
07 Apr 2025
Enhancing Visual Dialog State Tracking through Iterative Object-Entity Alignment in Multi-Round Conversations
Wei Pang
Ruixue Duan
Jinfu Yang
Ning Li
21
0
0
13 Aug 2024
Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior
Cheng Chen
Xiaofeng Yang
Fan Yang
Chengzeng Feng
Zhoujie Fu
Chuan-Sheng Foo
Guosheng Lin
Fayao Liu
50
14
0
14 Mar 2024
Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
Yunxin Li
Xinyu Chen
Baotian Hu
Haoyuan Shi
Min-Ling Zhang
44
3
0
21 Feb 2024
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models
Bingbing Wen
Zhengyuan Yang
Jianfeng Wang
Zhe Gan
Bill Howe
Lijuan Wang
MLLM
41
1
0
21 Dec 2023
$\mathbb{VD}$-$\mathbb{GR}$: Boosting $\mathbb{V}$isual $\mathbb{D}$ialog with Cascaded Spatial-Temporal Multi-Modal $\mathbb{GR}$aphs
Adnen Abdessaied
Lei Shi
Andreas Bulling
3DH
26
3
0
25 Oct 2023
Uncovering Hidden Connections: Iterative Search and Reasoning for Video-grounded Dialog
Haoyu Zhang
Meng Liu
Yaowei Wang
Da Cao
Weili Guan
Liqiang Nie
33
0
0
11 Oct 2023
VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue
Yunshui Li
Binyuan Hui
Zhaochao Yin
Wanwei He
Run Luo
Yuxing Long
Min Yang
Fei Huang
Yongbin Li
24
1
0
14 Sep 2023
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
Kilichbek Haydarov
Xiaoqian Shen
Avinash Madasu
Mahmoud Salem
Jia Li
Gamaleldin F. Elsayed
Mohamed Elhoseiny
31
4
0
30 Aug 2023
A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
Yunxin Li
Baotian Hu
Xinyu Chen
Yuxin Ding
Lin Ma
Min Zhang
LRM
48
14
0
08 May 2023
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
Feilong Chen
Minglun Han
Haozhi Zhao
Qingyang Zhang
Jing Shi
Shuang Xu
Bo Xu
MLLM
36
115
0
07 May 2023
UnICLAM: Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering
Chenlu Zhan
Peng Peng
Hongsen Wang
Tao Chen
Hongwei Wang
MedIm
23
3
0
21 Dec 2022
Compound Tokens: Channel Fusion for Vision-Language Representation Learning
Maxwell Mbabilla Aladago
A. Piergiovanni
19
1
0
02 Dec 2022
Vision-Language Matching for Text-to-Image Synthesis via Generative Adversarial Networks
Qingrong Cheng
Keyu Wen
X. Gu
VLM
EGVM
29
16
0
20 Aug 2022
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training
Gi-Cheon Kang
Sungdong Kim
Jin-Hwa Kim
Donghyun Kwak
Byoung-Tak Zhang
29
10
0
25 May 2022
Differentiated Relevances Embedding for Group-based Referring Expression Comprehension
Fuhai Chen
Xuri Ge
Xiaoshuai Sun
Yue Gao
Jianzhuang Liu
Feiyue Huang
Rongrong Ji
27
0
0
12 Mar 2022
Counterfactual Samples Synthesizing for Robust Visual Question Answering
Long Chen
Xin Yan
Jun Xiao
Hanwang Zhang
Shiliang Pu
Yueting Zhuang
OOD
AAML
154
290
0
14 Mar 2020