VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning

Papers citing "VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning"