AI2D-RST: A multimodal corpus of 1000 primary school science diagrams

9 December 2019
Tuomo Hiippala
Malihe Alikhani
Jonas Haverinen
Timo Kalliokoski
E. Logacheva
Serafina Orekhova
Aino Tuomainen
Matthew Stone
J. Bateman

Papers citing "AI2D-RST: A multimodal corpus of 1000 primary school science diagrams"

26 / 26 papers shown
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
Zheng Liu
Mengjie Liu
Jianfei Chen
Jingwei Xu
Bin Cui
Zeang Sheng
Wentao Zhang
MLLM
59
0
0
14 Apr 2025
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP
VLM
69
2
0
01 Apr 2025
Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
Qihan Huang
Long Chan
Jinlong Liu
Wanggui He
Hao Jiang
Mingli Song
Jingyuan Chen
Chang Yao
Jie Song
LRM
37
0
0
31 Mar 2025
MMCR: Advancing Visual Language Model in Multimodal Multi-Turn Contextual Reasoning
Dawei Yan
Yangfu Li
Qing-Guo Chen
Weihua Luo
Peng Wang
Han Zhang
Chunhua Shen
VGen
VLM
LRM
72
1
0
24 Mar 2025
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation
Eduard Allakhverdov
Elizaveta Goncharova
Andrey Kuznetsov
47
0
0
20 Mar 2025
Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning
Huilin Deng
Ding Zou
Rui Ma
Hongchen Luo
Yang Cao
Yu Kang
LRM
VLM
60
6
0
10 Mar 2025
Multi-modal Summarization in Model-Based Engineering: Automotive Software Development Case Study
Nenad Petrovic
Yurui Zhang
Moaad Maaroufi
Kuo-Yi Chao
Lukasz Mazur
Fengjunjie Pan
Vahid Zolfaghari
Alois C. Knoll
67
0
0
06 Mar 2025
I Know What I Don't Know: Improving Model Cascades Through Confidence Tuning
Stephan Rabanser
Nathalie Rauschmayr
Achin Kulshrestha
Petra Poklukar
Wittawat Jitkrittum
Sean Augenstein
Congchao Wang
Federico Tombari
42
0
0
26 Feb 2025
InsightVision: A Comprehensive, Multi-Level Chinese-based Benchmark for Evaluating Implicit Visual Semantics in Large Vision Language Models
Xiaofei Yin
Y. Hong
Ya Guo
Yi Tu
Weiqiang Wang
Gongshen Liu
Huijia Zhu
VLM
67
0
0
19 Feb 2025
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
Haozhao Wang
Yuxiang Nie
Yongjie Ye
Deng GuanYu
Yanjie Wang
Shuai Li
Haiyang Yu
Jinghui Lu
Can Huang
VLM
MLLM
84
1
0
12 Dec 2024
MLAN: Language-Based Instruction Tuning Improves Zero-Shot Generalization of Multimodal Large Language Models
Jianhong Tu
Zhuohao Ni
Nicholas Crispino
Zihao Yu
Michael Bendersky
...
Ruoxi Jia
Xin Liu
Lingjuan Lyu
Dawn Song
Chenguang Wang
VLM
MLLM
54
0
0
15 Nov 2024
MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA
Hanrong Ye
Haotian Zhang
Erik Daxberger
Lin Chen
Zongyu Lin
...
Haoxuan You
Dan Xu
Zhe Gan
Jiasen Lu
Yinfei Yang
EgoV
MLLM
88
12
0
09 Oct 2024
EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment
Yifei Xing
Xiangyuan Lan
Ruiping Wang
D. Jiang
Wenjun Huang
Qingfang Zheng
Yaowei Wang
Mamba
38
0
0
08 Oct 2024
TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings
Dawei Yan
Pengcheng Li
Yang Li
Hao Chen
Qingguo Chen
Weihua Luo
Wei Dong
Qingsen Yan
Haokui Zhang
Chunhua Shen
3DV
VLM
51
4
0
15 Sep 2024
VITA: Towards Open-Source Interactive Omni Multimodal LLM
Chaoyou Fu
Haojia Lin
Zuwei Long
Yunhang Shen
Meng Zhao
...
Ran He
Rongrong Ji
Yunsheng Wu
Caifeng Shan
Xing Sun
MLLM
47
80
0
09 Aug 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
...
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV
MLLM
48
282
0
24 Jun 2024
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
Yuhang Wu
Wenmeng Yu
Yean Cheng
Yan Wang
Xiaohan Zhang
Jiazheng Xu
Ming Ding
Yuxiao Dong
53
1
0
13 Jun 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin
Xinyu Wei
Ruichuan An
Peng Gao
Bocheng Zou
Yulin Luo
Siyuan Huang
Shanghang Zhang
Hongsheng Li
VLM
71
33
0
29 Mar 2024
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Yiqi Wang
Wentao Chen
Xiaotian Han
Xudong Lin
Haiteng Zhao
Yongfei Liu
Bohan Zhai
Jianbo Yuan
Quanzeng You
Hongxia Yang
LRM
47
70
0
10 Jan 2024
GlitchBench: Can large multimodal models detect video game glitches?
Mohammad Reza Taesiri
Tianjun Feng
Anh Nguyen
C. Bezemer
MLLM
VLM
LRM
32
9
0
08 Dec 2023
Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects
Junyu Lu
Ruyi Gan
Di Zhang
Xiaojun Wu
Ziwei Wu
Renliang Sun
Jiaxing Zhang
Pingjian Zhang
Yan Song
MLLM
VLM
25
15
0
08 Dec 2023
ChartParser: Automatic Chart Parsing for Print-Impaired
Anukriti Kumar
T. Ganu
Saikat Guha
LMTD
11
0
0
16 Nov 2022
COSMic: A Coherence-Aware Generation Metric for Image Descriptions
Mert Inan
P. Sharma
Baber Khalid
Radu Soricut
Matthew Stone
Malihe Alikhani
EGVM
29
13
0
11 Sep 2021
Semiotically-grounded distant viewing of diagrams: insights from two multimodal corpora
Tuomo Hiippala
J. Bateman
14
11
0
08 Mar 2021
Introducing the diagrammatic semiotic mode
Tuomo Hiippala
J. Bateman
6
3
0
30 Jan 2020
Classifying Diagrams and Their Parts using Graph Neural Networks: A Comparison of Crowd-Sourced and Expert Annotations
Tuomo Hiippala
20
1
0
05 Dec 2019