D$^2$TV: Dual Knowledge Distillation and Target-oriented Vision Modeling for Many-to-Many Multimodal Summarization
v1, v2 (latest)

    VLM

Papers citing "D$^2$TV: Dual Knowledge Distillation and Target-oriented Vision Modeling for Many-to-Many Multimodal Summarization"

13 / 13 papers shown
