
Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment

arXiv:2212.10549 · 20 December 2022
Rohan Pandey, Rulin Shao, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

Papers citing "Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment"

10 / 10 papers shown

Progressive Compositionality in Text-to-Image Generative Models
Xu Han, Linghao Jin, Xiaofeng Liu, Paul Pu Liang
CoGe · 22 Oct 2024

OLIVE: Object Level In-Context Visual Embeddings
Timothy Ossowski, Junjie Hu
OCL, VLM · 02 Jun 2024

Encoding and Controlling Global Semantics for Long-form Video Question Answering
Thong Nguyen, Zhiyuan Hu, Xiaobao Wu, Cong-Duy Nguyen, See-Kiong Ng, A. Luu
30 May 2024

Prompting Large Vision-Language Models for Compositional Reasoning
Timothy Ossowski, Ming Jiang, Junjie Hu
CoGe, VLM, LRM · 20 Jan 2024

Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness
Valentin Barriere, Felipe del Rio, Andres Carvallo De Ferari, Carlos Aspillaga, Eugenio Herrera-Berg, Cristian Buc Calderon
DiffM · 27 Sep 2023

Vision-Language Dataset Distillation
Xindi Wu, Byron Zhang, Zhiwei Deng, Olga Russakovsky
DD, VLM · 15 Aug 2023

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Yonatan Bitton, Hritik Bansal, Jack Hessel, Rulin Shao, Wanrong Zhu, Anas Awadalla, Josh Gardner, Rohan Taori, L. Schmidt
VLM · 12 Aug 2023

What You See is What You Read? Improving Text-Image Alignment Evaluation
Michal Yarom, Yonatan Bitton, Soravit Changpinyo, Roee Aharoni, Jonathan Herzig, Oran Lang, E. Ofek, Idan Szpektor
EGVM · 17 May 2023

Simple Token-Level Confidence Improves Caption Correctness
Suzanne Petryk, Spencer Whitehead, Joseph E. Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach
11 May 2023

Improving BERT with Syntax-aware Local Attention
Zhongli Li, Qingyu Zhou, Chao Li, Ke Xu, Yunbo Cao
30 Dec 2020