arXiv: 2201.05078
CLIP-Event: Connecting Text and Images with Event Structures


13 January 2022
Manling Li
Ruochen Xu
Shuohang Wang
Luowei Zhou
Xudong Lin
Chenguang Zhu
Michael Zeng
Heng Ji
Shih-Fu Chang
    VLM
    CLIP

Papers citing "CLIP-Event: Connecting Text and Images with Event Structures"

50 / 65 papers shown
Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing
Jiancheng Huang
Yi Huang
Jianzhuang Liu
Donghao Zhou
Yong-Jin Liu
Shifeng Chen
DiffM
106
0
0
15 Dec 2024
Scalable Early Childhood Reading Performance Prediction
Zhongkai Shangguan
Zanming Huang
Eshed Ohn-Bar
Ola Ozernov-Palchik
Derek Kosty
Michael Stoolmiller
Hank Fien
AI4Ed
70
1
0
05 Dec 2024
Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation
Seongsu Ha
Chaeyun Kim
Donghwa Kim
Junho Lee
Sangho Lee
Joonseok Lee
50
2
0
03 Nov 2024
Identifying Implicit Social Biases in Vision-Language Models
Kimia Hamidieh
Haoran Zhang
Walter Gerych
Thomas Hartvigsen
Marzyeh Ghassemi
VLM
36
11
0
01 Nov 2024
ARMADA: Attribute-Based Multimodal Data Augmentation
Xiaomeng Jin
Jeonghwan Kim
Yu Zhou
Kuan-Hao Huang
Te-Lin Wu
Nanyun Peng
Heng Ji
26
2
0
19 Aug 2024
OPDR: Order-Preserving Dimension Reduction for Semantic Embedding of Multimodal Scientific Data
Chengyu Gong
Gefei Shen
Luanzheng Guo
Nathan R. Tallent
Dongfang Zhao
21
1
0
15 Aug 2024
DIVE: Towards Descriptive and Diverse Visual Commonsense Generation
Jun-Hyung Park
Hyuntae Park
Youjin Kang
Eojin Jeon
SangKeun Lee
27
0
0
15 Aug 2024
MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models
Haoxuan Li
Zhengmao Yang
Yunshan Ma
Yi Bin
Yang Yang
Tat-Seng Chua
33
0
0
08 Aug 2024
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Dhruv Verma
Debaditya Roy
Basura Fernando
27
1
0
30 Jul 2024
MMUTF: Multimodal Multimedia Event Argument Extraction with Unified Template Filling
Philipp Seeberger
Dominik Wagner
K. Riedhammer
27
0
0
18 Jun 2024
GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling
Hritik Bansal
Po-Nien Kung
P. Brantingham
Weisheng Wang
Miao Zheng
VLM
34
1
0
07 Apr 2024
Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation
Xiaoshuang Huang
Hongxiang Li
Meng Cao
Long Chen
Chenyu You
Dong An
VLM
41
5
0
03 Apr 2024
Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval
Yuchen Suo
Fan Ma
Linchao Zhu
Yi Yang
40
19
0
24 Mar 2024
Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training
Zeliang Zhang
Jinyang Jiang
Zhuo Liu
Susan Liang
Yijie Peng
Chenliang Xu
29
0
0
18 Mar 2024
Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation
Mingyu Lee
Jongwon Choi
29
8
0
10 Mar 2024
UMIE: Unified Multimodal Information Extraction with Instruction Tuning
Lin Sun
Kai Zhang
Qingyuan Li
Renze Lou
29
13
0
05 Jan 2024
Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning
Kung-Hsiang Huang
Mingyang Zhou
Hou Pong Chan
Yi Ren Fung
Zhenhailong Wang
Lingyu Zhang
Shih-Fu Chang
Heng Ji
19
33
0
15 Dec 2023
Learning Generalizable Perceptual Representations for Data-Efficient No-Reference Image Quality Assessment
Suhas Srinath
Shankhanil Mitra
Shika Rao
R. Soundararajan
OOD
21
5
0
08 Dec 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
J. Park
Jack Hessel
Khyathi Raghavi Chandu
Paul Pu Liang
Ximing Lu
...
Youngjae Yu
Qiuyuan Huang
Jianfeng Gao
Ali Farhadi
Yejin Choi
VLM
26
11
0
08 Dec 2023
Prompt Tuning for Zero-shot Compositional Learning
Lingyu Zhang
Ting Hua
Yilin Shen
Hongxia Jin
VLM
38
0
0
02 Dec 2023
Stochastic Vision Transformers with Wasserstein Distance-Aware Attention
Franciskus Xaverius Erick
Mina Rezaei
Johanna P. Müller
Bernhard Kainz
20
0
0
30 Nov 2023
ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation
Yangyi Chen
Xingyao Wang
Manling Li
Derek Hoiem
Heng Ji
30
11
0
22 Nov 2023
SPOT! Revisiting Video-Language Models for Event Understanding
Gengyuan Zhang
Jinhe Bi
Jindong Gu
Yanyu Chen
Volker Tresp
27
2
0
21 Nov 2023
TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction
Kuan-Hao Huang
I-Hung Hsu
Tanmay Parekh
Zhiyu Xie
Zixuan Zhang
Premkumar Natarajan
Kai-Wei Chang
Nanyun Peng
Heng Ji
29
16
0
16 Nov 2023
Towards a Unified Transformer-based Framework for Scene Graph Generation and Human-object Interaction Detection
Tao He
Lianli Gao
Jingkuan Song
Yuan-Fang Li
ViT
26
11
0
03 Nov 2023
Defining a New NLP Playground
Sha Li
Chi Han
Pengfei Yu
Carl N. Edwards
Manling Li
...
Yi Ren Fung
Charles Yu
Joel R. Tetreault
Eduard H. Hovy
Heng Ji
35
5
0
31 Oct 2023
Envisioning Narrative Intelligence: A Creative Visual Storytelling Anthology
Brett A. Halperin
S. Lukin
CoGe
63
24
0
06 Oct 2023
Multimodal Question Answering for Unified Information Extraction
Yuxuan Sun
Kai Zhang
Yu-Chuan Su
32
8
0
04 Oct 2023
Seal2Real: Prompt Prior Learning on Diffusion Model for Unsupervised Document Seal Data Generation and Realisation
Jiancheng Huang
Yifan Liu
Yi Huang
Shifeng Chen
DiffM
VLM
26
4
0
01 Oct 2023
FEC: Three Finetuning-free Methods to Enhance Consistency for Real Image Editing
Songyan Chen
Jiancheng Huang
DiffM
29
13
0
26 Sep 2023
Diversified Ensemble of Independent Sub-Networks for Robust Self-Supervised Representation Learning
Amirhossein Vahidi
Lisa Wimmer
H. Gündüz
Bernd Bischl
Eyke Hüllermeier
Mina Rezaei
OOD
UQCV
30
4
0
28 Aug 2023
Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features
Alberto Baldrati
Marco Bertini
Tiberio Uricchio
A. Bimbo
CLIP
CoGe
11
29
0
22 Aug 2023
ClipSitu: Effectively Leveraging CLIP for Conditional Predictions in Situation Recognition
Debaditya Roy
Dhruv Verma
Basura Fernando
VLM
CLIP
26
4
0
02 Jul 2023
Training Multimedia Event Extraction With Generated Images and Captions
Zilin Du
Yunxin Li
Xu Guo
Yidan Sun
Boyang Albert Li
DiffM
21
7
0
15 Jun 2023
Z-GMOT: Zero-shot Generic Multiple Object Tracking
Kim Hoang Tran
Anh Duy Le Dinh
Tien-Phat Nguyen
Thinh Phan
Pha Nguyen
Khoa Luu
Don Adjeroh
Gianfranco Doretto
Ngan Hoang Le
VOT
33
5
0
28 May 2023
Few-shot Domain-Adaptive Visually-fused Event Detection from Text
Farhad Moghimifar
Fatemeh Shiri
Van Nguyen
Gholamreza Haffari
Yuanyou Li
VLM
30
2
0
04 May 2023
VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias
Stefanos-Iordanis Papadopoulos
C. Koutlis
Symeon Papadopoulos
P. Petrantonakis
80
19
0
27 Apr 2023
Verbs in Action: Improving verb understanding in video-language models
Liliane Momeni
Mathilde Caron
Arsha Nagrani
Andrew Zisserman
Cordelia Schmid
37
70
0
13 Apr 2023
Subject-driven Text-to-Image Generation via Apprenticeship Learning
Wenhu Chen
Hexiang Hu
Yandong Li
Nataniel Ruiz
Xuhui Jia
Ming-Wei Chang
William W. Cohen
DiffM
16
187
0
01 Apr 2023
Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
Jiadong Wang
Xinyuan Qian
Malu Zhang
R. Tan
Haizhou Li
EGVM
22
93
0
29 Mar 2023
Causal schema induction for knowledge discovery
Michael Regan
Jena D. Hwang
Keisuke Sakaguchi
James Pustejovsky
CML
16
1
0
27 Mar 2023
Video Pre-trained Transformer: A Multimodal Mixture of Pre-trained Experts
Kastan Day
D. Christl
Rohan Salvi
Pranav Sriram
ViT
24
1
0
24 Mar 2023
Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos
Sixun Dong
Huazhang Hu
Dongze Lian
Weixin Luo
Yichen Qian
Shenghua Gao
ViT
AI4TS
23
11
0
22 Mar 2023
eP-ALM: Efficient Perceptual Augmentation of Language Models
Mustafa Shukor
Corentin Dancette
Matthieu Cord
MLLM
VLM
32
29
0
20 Mar 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
31
202
0
20 Feb 2023
Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval
Mustafa Shukor
Nicolas Thome
Matthieu Cord
CLIP
CoGe
29
8
0
08 Dec 2022
Zero-Shot Classification by Logical Reasoning on Natural Language Explanations
Chi Han
Hengzhi Pei
Xinya Du
Heng Ji
NAI
10
3
0
07 Nov 2022
Video Event Extraction via Tracking Visual States of Arguments
Guang Yang
Manling Li
Jiajie Zhang
Xudong Lin
Shih-Fu Chang
Heng Ji
32
9
0
03 Nov 2022
Learning to Decompose Visual Features with Latent Textual Prompts
Feng Wang
Manling Li
Xudong Lin
Hairong Lv
A. Schwing
Heng Ji
VLM
19
23
0
09 Oct 2022
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
Shraman Pramanick
Li Jing
Sayan Nag
Jiachen Zhu
Hardik Shah
Yann LeCun
Ramalingam Chellappa
26
21
0
09 Oct 2022