ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2410.07073
  4. Cited By
Pixtral 12B

Pixtral 12B

9 October 2024
Pravesh Agrawal
Szymon Antoniak
Emma Bou Hanna
Baptiste Bout
Devendra Singh Chaplot
Jessica Chudnovsky
Diogo Costa
Baudouin De Monicault
Saurabh Garg
Théophile Gervet
Soham Ghosh
Amélie Héliou
Paul Jacob
Albert Q. Jiang
Kartik Khandelwal
Timothée Lacroix
Guillaume Lample
Diego de Las Casas
Thibaut Lavril
Teven Le Scao
Andy Lo
William Marshall
Louis Martin
A. Mensch
Pavankumar Muddireddy
Valera Nemychnikova
Marie Pellat
Patrick von Platen
Nikhil Raghuraman
Baptiste Rozière
Alexandre Sablayrolles
Lucile Saulnier
Romain Sauvestre
Wendy Shang
Roman Soletskyi
Lawrence Stewart
Pierre Stock
Joachim Studnia
Sandeep Subramanian
Sagar Vaze
Thomas Wang
Sophia Yang
    VLM
    MLLM
ArXivPDFHTML

Papers citing "Pixtral 12B"

35 / 35 papers shown
Title
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
70
0
0
03 May 2025
Robotic Task Ambiguity Resolution via Natural Language Interaction
Robotic Task Ambiguity Resolution via Natural Language Interaction
Eugenio Chisari
Jan Ole von Hartz
Fabien Despinoy
Abhinav Valada
LM&Ro
51
0
0
24 Apr 2025
Benchmarking Vision Language Models on German Factual Data
Benchmarking Vision Language Models on German Factual Data
René Peinl
Vincent Tischler
CoGe
69
0
0
15 Apr 2025
Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data
Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data
Samarth Mishra
Kate Saenko
Venkatesh Saligrama
CoGe
LRM
39
0
0
07 Apr 2025
Reasoning LLMs for User-Aware Multimodal Conversational Agents
Reasoning LLMs for User-Aware Multimodal Conversational Agents
Hamed Rahimi
Jeanne Cattoni
Meriem Beghili
Mouad Abrini
Mahdi Khoramshahi
Maribel Pino
Mohamed Chetouani
LRM
36
0
0
02 Apr 2025
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal
Nuno M. Guerreiro
Ricardo Rei
André F. T. Martins
ALM
75
0
0
01 Apr 2025
Scaling Language-Free Visual Representation Learning
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP
VLM
69
2
0
01 Apr 2025
KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language
KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language
Yoonshik Kim
Jaeyoon Jung
37
0
0
31 Mar 2025
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation
Jixuan Leng
Chengsong Huang
Langlin Huang
Bill Yuchen Lin
William W. Cohen
Haohan Wang
Jiaxin Huang
LRM
49
0
0
30 Mar 2025
StarFlow: Generating Structured Workflow Outputs From Sketch Images
StarFlow: Generating Structured Workflow Outputs From Sketch Images
Patrice Bechard
Chao Wang
Amirhossein Abaskohi
Juan A. Rodriguez
Christopher Pal
David Vazquez
Spandana Gella
Sai Rajeswar
Perouz Taslakian
38
0
0
27 Mar 2025
Vision as LoRA
Vision as LoRA
Han Wang
Yongjie Ye
Bingru Li
Yuxiang Nie
Jinghui Lu
Jingqun Tang
Yanjie Wang
Can Huang
88
1
0
26 Mar 2025
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning
Huajie Tan
Yuheng Ji
Xiaoshuai Hao
Minglan Lin
Pengwei Wang
Zhongyuan Wang
Shanghang Zhang
ReLM
OffRL
LRM
94
0
0
26 Mar 2025
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?
Kexian Tang
Junyao Gao
Yanhong Zeng
Haodong Duan
Yanan Sun
Zhening Xing
Wenran Liu
Kaifeng Lyu
Kai-xiang Chen
ELM
LRM
56
1
0
25 Mar 2025
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
Yuxiao Chen
L. Meng
Wujian Peng
Zuxuan Wu
Yu-Gang Jiang
VLM
53
0
0
24 Mar 2025
MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures
MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures
Lucas Morin
Valéry Weber
A. Nassar
Gerhard Ingmar Meijer
Luc Van Gool
Yawei Li
Peter W. J. Staar
64
1
0
20 Mar 2025
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
Nvidia
Hassan Abu Alhaija
Jose M. Alvarez
Maciej Bala
Tiffany Cai
...
Yuchong Ye
Xiaodong Yang
Boxin Wang
Fangyin Wei
Yu Zeng
VGen
95
2
0
18 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Yujie Zhang
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Z. Zhang
Yan Huang
Liang Wang
Tieniu Tan
188
2
0
18 Mar 2025
Historic Scripts to Modern Vision: A Novel Dataset and A VLM Framework for Transliteration of Modi Script to Devanagari
Historic Scripts to Modern Vision: A Novel Dataset and A VLM Framework for Transliteration of Modi Script to Devanagari
Harshal Kausadikar
Tanvi Kale
Onkar Susladkar
Sparsh Mittal
60
0
0
17 Mar 2025
TLAC: Two-stage LMM Augmented CLIP for Zero-Shot Classification
TLAC: Two-stage LMM Augmented CLIP for Zero-Shot Classification
Ans Munir
Faisal Z. Qureshi
M. H. Khan
Mohsen Ali
VLM
70
0
0
15 Mar 2025
EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks
Yi Zhang
Qiang Zhang
Xiaozhu Ju
Ziqiang Liu
Jilei Mao
...
Jiaxu Wang
Yiqun Duan
Jiahang Cao
Renjing Xu
Jian Tang
LM&Ro
LRM
62
0
0
14 Mar 2025
Referring to Any Person
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
199
0
0
11 Mar 2025
Multi-modal Summarization in Model-Based Engineering: Automotive Software Development Case Study
Nenad Petrovic
Yurui Zhang
Moaad Maaroufi
Kuo-Yi Chao
Lukasz Mazur
Fengjunjie Pan
Vahid Zolfaghari
Alois C. Knoll
67
0
0
06 Mar 2025
ToFu: Visual Tokens Reduction via Fusion for Multi-modal, Multi-patch, Multi-image Task
Vittorio Pippi
Matthieu Guillaumin
S. Cascianelli
Rita Cucchiara
M. Jaritz
Loris Bazzani
64
0
0
06 Mar 2025
Scientific Reasoning: Assessment of Multimodal Generative LLMs
Florian Dreyer
Ekaterina Kolos
Daria Matiash
ReLM
LRM
65
0
0
03 Mar 2025
Game State and Spatio-temporal Action Detection in Soccer using Graph Neural Networks and 3D Convolutional Networks
Game State and Spatio-temporal Action Detection in Soccer using Graph Neural Networks and 3D Convolutional Networks
Jeremie Ochin
Guillaume Devineau
Bogdan Stanciulescu
Sotiris Manitsaris
3DPC
74
1
0
24 Feb 2025
Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT
Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT
Nidhal Jegham
Marwan Abdelatti
Abdeltawab Hendawi
VLM
LRM
60
1
0
23 Feb 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Ziyu Liu
...
Haodong Duan
Feiyu Xiong
Kai Chen
Dahua Lin
Jiaqi Wang
VLM
76
19
0
21 Jan 2025
Chimera: Improving Generalist Model with Domain-Specific Experts
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng
Mingxing Li
Hongbin Zhou
Renqiu Xia
Renrui Zhang
...
Aojun Zhou
Botian Shi
Tao Chen
Bo Zhang
Xiangyu Yue
90
4
0
08 Dec 2024
DLaVA: Document Language and Vision Assistant for Answer Localization
  with Enhanced Interpretability and Trustworthiness
DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness
Ahmad Mohammadshirazi
Pinaki Prasad Guha Neogi
Ser-Nam Lim
R. Ramnath
70
1
0
29 Nov 2024
SCoTT: Wireless-Aware Path Planning with Vision Language Models and
  Strategic Chains-of-Thought
SCoTT: Wireless-Aware Path Planning with Vision Language Models and Strategic Chains-of-Thought
Aladin Djuhera
Vlad-Costin Andrei
Amin Seffo
Holger Boche
Walid Saad
90
0
0
27 Nov 2024
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Qing Jiang
Gen Luo
Yuqin Yang
Yuda Xiong
Yihao Chen
Zhaoyang Zeng
Tianhe Ren
Lei Zhang
VLM
LRM
109
7
0
27 Nov 2024
Efficient Self-Improvement in Multimodal Large Language Models: A
  Model-Level Judge-Free Approach
Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach
Shijian Deng
Wentian Zhao
Yu-Jhe Li
Kun Wan
Daniel Miranda
Ajinkya Kale
Yapeng Tian
LRM
72
6
0
26 Nov 2024
GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers
GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers
Éloi Zablocki
Valentin Gerard
Amaia Cardiel
Eric Gaussier
Matthieu Cord
Eduardo Valle
84
0
0
23 Nov 2024
Teaching VLMs to Localize Specific Objects from In-context Examples
Teaching VLMs to Localize Specific Objects from In-context Examples
Sivan Doveh
Nimrod Shabtay
Wei Lin
Eli Schwartz
Hilde Kuehne
...
Leonid Karlinsky
James Glass
Assaf Arbelle
S. Ullman
Muhammad Jehanzeb Mirza
VLM
106
1
0
20 Nov 2024
3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications
3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications
Eduardo R. Corral-Soto
Yang Liu
Tongtong Cao
Y. Ren
Liu Bingbing
55
0
0
14 Oct 2024
1