
A Token-level Text Image Foundation Model for Document Understanding
Papers citing "A Token-level Text Image Foundation Model for Document Understanding"
Title |
---|
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning. Haotian Zhang, Mingfei Gao, Zhe Gan, Philipp Dufter, Nina Wenzel, ..., Haoxuan You, Zirui Wang, Afshin Dehghan, Peter Grasch, Yinfei Yang |
MouSi: Poly-Visual-Expert Vision-Language Models. Xiaoran Fan, Tao Ji, Changhao Jiang, Shuo Li, Senjie Jin, ..., Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yunchun Jiang |