ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.16436
  4. Cited By
DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning
  in Language Models

DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models

25 October 2023
Ge Zheng
Bin Yang
Jiajin Tang
Hong-Yu Zhou
Sibei Yang
    LRM
    MLLM
ArXivPDFHTML

Papers citing "DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models"

50 / 87 papers shown
Title
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
Yifu Yuan
Haiqin Cui
Yibin Chen
Zibin Dong
Fei Ni
Longxin Kou
Jinyi Liu
Pengyi Li
Yan Zheng
Jianye Hao
31
0
0
13 May 2025
G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness
G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness
Jaehyun Jeon
Janghan Yoon
Minsoo Kim
Sumin Shim
Yejin Choi
Hanbin Kim
Youngjae Yu
AAML
47
0
0
08 May 2025
HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation
HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation
Pei Liu
Xin Liu
Ruoyu Yao
Junming Liu
Siyuan Meng
Ding Wang
Jun Ma
3DV
VLM
152
1
0
13 Apr 2025
Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment
Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment
Jiayang Sun
H. Wang
Jie Cao
Huaibo Huang
Ran He
DiffM
73
0
0
10 Apr 2025
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
Qing Guo
Z. Yang
Chao Feng
Hongjin Lu
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Furong Huang
Lijuan Wang
OODD
ReLM
VLM
LRM
69
1
0
10 Apr 2025
Mind with Eyes: from Language Reasoning to Multimodal Reasoning
Mind with Eyes: from Language Reasoning to Multimodal Reasoning
Zhiyu Lin
Yifei Gao
Xian Zhao
Yunfan Yang
Jitao Sang
LRM
55
1
0
23 Mar 2025
Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu
Siyuan Meng
Yanting Gao
Song Mao
Pinlong Cai
Guohang Yan
Yirong Chen
Zilin Bian
Botian Shi
Ding Wang
54
1
0
17 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Yixuan Wang
Shengqiong Wu
Yuyao Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
92
8
0
16 Mar 2025
Tit-for-Tat: Safeguarding Large Vision-Language Models Against Jailbreak Attacks via Adversarial Defense
Shuyang Hao
Y. Wang
Bryan Hooi
Ming Yang
Jiaheng Liu
Chengcheng Tang
Zi Huang
Yujun Cai
AAML
54
0
0
14 Mar 2025
Reasoning is All You Need for Video Generalization: A Counterfactual Benchmark with Sub-question Evaluation
Qiji Zhou
Yifan Gong
Guangsheng Bao
Hongjie Qiu
Jinqiang Li
Xiangrong Zhu
Huajian Zhang
Yue Zhang
LRM
44
0
0
12 Mar 2025
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
Qianqi Yan
Yue Fan
Hongquan Li
Shan Jiang
Yang Zhao
Xinze Guan
Ching-Chen Kuo
Qing Guo
VLM
LRM
76
2
0
22 Feb 2025
Investigating Inference-time Scaling for Chain of Multi-modal Thought: A Preliminary Study
Investigating Inference-time Scaling for Chain of Multi-modal Thought: A Preliminary Study
Yujie Lin
Ante Wang
Moye Chen
Jingyao Liu
Hao Liu
Jinsong Su
Xinyan Xiao
LRM
48
2
0
17 Feb 2025
ClinKD: Cross-Modal Clinical Knowledge Distiller For Multi-Task Medical Images
ClinKD: Cross-Modal Clinical Knowledge Distiller For Multi-Task Medical Images
Hongyu Ge
Longkun Hao
Zihui Xu
Zhenxin Lin
Bin Li
Shoujun Zhou
Hongjin Zhao
Y. Liu
39
0
0
09 Feb 2025
VidCtx: Context-aware Video Question Answering with Image Models
VidCtx: Context-aware Video Question Answering with Image Models
Andreas Goulas
Vasileios Mezaris
Ioannis Patras
159
0
0
23 Dec 2024
SilVar: Speech Driven Multimodal Model for Reasoning Visual Question
  Answering and Object Localization
SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization
Tan-Hanh Pham
Hoang-Nam Le
Phu-Vinh Nguyen
Chris Ngo
Truong Son-Hy
AuLLM
LRM
81
1
0
21 Dec 2024
MedCoT: Medical Chain of Thought via Hierarchical Expert
MedCoT: Medical Chain of Thought via Hierarchical Expert
Jiaxiang Liu
Yuan Wang
Jiawei Du
Qiufeng Wang
Zuozhu Liu
LRM
84
9
0
18 Dec 2024
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
Zihui Cheng
Qiguang Chen
Jin Zhang
Hao Fei
Xiaocheng Feng
Wanxiang Che
Min Li
L. Qin
VLM
MLLM
LRM
75
4
0
17 Dec 2024
Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning
Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning
Shengqiong Wu
Hao Fei
Liangming Pan
William Yang Wang
Shuicheng Yan
Tat-Seng Chua
LRM
69
1
0
15 Dec 2024
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for
  Training-Free Zero-Shot Composed Image Retrieval
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval
Yuanmin Tang
Xiaoting Qin
Jingyang Zhang
Jing Yu
Gaopeng Gou
Gang Xiong
Qingwei Ling
Saravan Rajmohan
Dongmei Zhang
Qi Wu
LRM
66
1
0
15 Dec 2024
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
Xinshuai Song
Weixing Chen
Yong-Jin Liu
Weikai Chen
Guanbin Li
Liang Lin
123
3
0
12 Dec 2024
Learning to Correction: Explainable Feedback Generation for Visual
  Commonsense Reasoning Distractor
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
Jiali Chen
Xusen Hei
Yuqi Xue
Yuancheng Wei
Jiayuan Xie
Yi Cai
Qing Li
MLLM
LRM
81
4
0
08 Dec 2024
Explainable and Interpretable Multimodal Large Language Models: A
  Comprehensive Survey
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
Yunkai Dang
Kaichen Huang
Jiahao Huo
Yibo Yan
S. Huang
...
Kun Wang
Yong Liu
Jing Shao
Hui Xiong
Xuming Hu
LRM
101
15
0
03 Dec 2024
Enhancing Visual Reasoning with Autonomous Imagination in Multimodal
  Large Language Models
Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models
Jiaheng Liu
Yumeng Li
Boyuan Xiao
Yichang Jian
Ziang Qin
Tianjia Shao
Yao-Xiang Ding
Kun Zhou
MLLM
LRM
100
3
0
27 Nov 2024
CoA: Chain-of-Action for Generative Semantic Labels
CoA: Chain-of-Action for Generative Semantic Labels
Meng Wei
Zhongnian Li
Peng Ying
Xinzheng Xu
VLM
74
0
0
26 Nov 2024
Thinking Before Looking: Improving Multimodal LLM Reasoning via
  Mitigating Visual Hallucination
Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination
Haojie Zheng
Tianyang Xu
Hanchi Sun
Shu Pu
Ruoxi Chen
Lichao Sun
MLLM
LRM
84
8
0
15 Nov 2024
What Factors Affect Multi-Modal In-Context Learning? An In-Depth
  Exploration
What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration
L. Qin
Qiguang Chen
Hao Fei
Zhi Chen
Min Li
Wanxiang Che
41
5
0
27 Oct 2024
Exploring Prompt Engineering: A Systematic Review with SWOT Analysis
Exploring Prompt Engineering: A Systematic Review with SWOT Analysis
Aditi Singh
Abul Ehtesham
Gaurav Kumar Gupta
Nikhil Kumar Chatta
Saket Kumar
T. T. Khoei
28
1
0
09 Oct 2024
MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA
  with LLM and MLLM Integration
MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration
Lai Wei
Wenkai Wang
Xiaoyu Shen
Yu Xie
Zhihao Fan
Xiaojin Zhang
Zhongyu Wei
Wei Chen
34
4
0
06 Oct 2024
An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable
  Radiology Report Generation
An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
Ahmed Abdulaal
Hugo Fry
Nina Montaña-Brown
Ayodeji Ijishakin
Jack Gao
Stephanie L. Hyland
Daniel C. Alexander
Daniel Coelho De Castro
MedIm
36
8
0
04 Oct 2024
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Mengzhao Jia
Wenhao Yu
Kaixin Ma
Tianqing Fang
Zhihan Zhang
Siru Ouyang
Hongming Zhang
Meng Jiang
Dong Yu
VLM
31
5
0
02 Oct 2024
DARE: Diverse Visual Question Answering with Robustness Evaluation
DARE: Diverse Visual Question Answering with Robustness Evaluation
Hannah Sterz
Jonas Pfeiffer
Ivan Vulić
OOD
VLM
21
2
0
26 Sep 2024
Internalizing ASR with Implicit Chain of Thought for Efficient
  Speech-to-Speech Conversational LLM
Internalizing ASR with Implicit Chain of Thought for Efficient Speech-to-Speech Conversational LLM
Robin Shing-Hei Yuen
Timothy Tin-Long Tse
Jian Zhu
AuLLM
33
3
0
25 Sep 2024
Enhancing Advanced Visual Reasoning Ability of Large Language Models
Enhancing Advanced Visual Reasoning Ability of Large Language Models
Zhiyuan Li
Dongnan Liu
Chaoyi Zhang
Heng Wang
Tengfei Xue
Weidong Cai
VLM
LRM
57
6
0
21 Sep 2024
Benchmarking VLMs' Reasoning About Persuasive Atypical Images
Benchmarking VLMs' Reasoning About Persuasive Atypical Images
Sina Malakouti
Aysan Aghazadeh
Ashmit Khandelwal
Adriana Kovashka
VLM
45
2
0
16 Sep 2024
EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual
  Instruction Tuning
EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning
Zhihao Li
Yao Du
Yang Liu
Yan Zhang
Yufang Liu
M. Zhang
Xunliang Cai
LRM
35
6
0
21 Aug 2024
A Training-Free Framework for Video License Plate Tracking and
  Recognition with Only One-Shot
A Training-Free Framework for Video License Plate Tracking and Recognition with Only One-Shot
Haoxuan Ding
Qi. Wang
Junyu Gao
Qiang Li
VLM
37
0
0
11 Aug 2024
Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering
Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering
Danfeng Guo
Sumitaka Honji
LRM
62
0
0
31 Jul 2024
Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language
  Models?
Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?
Ben Yao
Yazhou Zhang
Qiuchi Li
Jing Qin
ReLM
LRM
37
3
0
17 Jul 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey
  from Co-Development Perspective
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
57
5
0
11 Jul 2024
VSP: Assessing the dual challenges of perception and reasoning in
  spatial planning tasks for VLMs
VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs
Qiucheng Wu
Handong Zhao
Michael Stephen Saxon
T. Bui
William Yang Wang
Yang Zhang
Shiyu Chang
CoGe
43
4
0
02 Jul 2024
Evaluating Fairness in Large Vision-Language Models Across Diverse
  Demographic Attributes and Prompts
Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts
Xuyang Wu
Yuan Wang
Hsin-Tai Wu
Zhiqiang Tao
Yi Fang
VLM
40
8
0
25 Jun 2024
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large
  Language Models
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
Wenhao Shi
Zhiqiang Hu
Yi Bin
Junhua Liu
Yang Yang
See-Kiong Ng
Lidong Bing
Roy Ka-Wei Lee
SyDa
MLLM
LRM
34
41
0
25 Jun 2024
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
Brandon Huang
Chancharik Mitra
Assaf Arbelle
Leonid Karlinsky
Trevor Darrell
Roei Herzig
41
12
0
21 Jun 2024
Large Language Models are Skeptics: False Negative Problem of
  Input-conflicting Hallucination
Large Language Models are Skeptics: False Negative Problem of Input-conflicting Hallucination
Jongyoon Song
Sangwon Yu
Sungroh Yoon
HILM
30
3
0
20 Jun 2024
MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
Xuannan Liu
Zekun Li
Peipei Li
Shuhan Xia
Xing Cui
Linzhi Huang
Huaibo Huang
Weihong Deng
Zhaofeng He
38
13
0
13 Jun 2024
Integrating Text and Image Pre-training for Multi-modal Algorithmic
  Reasoning
Integrating Text and Image Pre-training for Multi-modal Algorithmic Reasoning
Zijian Zhang
Wei Liu
29
0
0
08 Jun 2024
POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning
  of Large Language Models
POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models
Jianben He
Xingbo Wang
Shiyi Liu
Guande Wu
Claudio Silva
Huamin Qu
LRM
37
2
0
06 Jun 2024
Interactive Text-to-Image Retrieval with Large Language Models: A
  Plug-and-Play Approach
Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach
Saehyung Lee
Sangwon Yu
Junsung Park
Jihun Yi
Sungroh Yoon
KELM
VLM
24
6
0
05 Jun 2024
From Redundancy to Relevance: Enhancing Explainability in Multimodal
  Large Language Models
From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models
Xiaofeng Zhang
Chen Shen
Xiaosong Yuan
Shaotian Yan
Liang Xie
Wenxiao Wang
Chaochen Gu
Hao Tang
Jieping Ye
54
2
0
04 Jun 2024
M$^3$CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal
  Chain-of-Thought
M3^33CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought
Qiguang Chen
Libo Qin
Jin Zhang
Zhi Chen
Xiao Xu
Wanxiang Che
LRM
34
35
0
26 May 2024
12
Next