ResearchTrend.AI

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
    SSL
    VLM

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,093 papers shown
Title
TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems
Khang H. N. Vo
D. Q. Nguyen
T. Nguyen
Tho Quan
50
0
0
09 Mar 2025
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
Chan Hur
Jeong-hun Hong
Dong-hun Lee
Dabin Kang
Semin Myeong
Sang-hyo Park
Hyeyoung Park
60
0
0
07 Mar 2025
RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems
Biao Ouyang
Yingying Zhang
Hanyin Cheng
Yang Shu
Chenjuan Guo
Bin Yang
Qingsong Wen
L. Fan
Christian S. Jensen
56
1
0
06 Mar 2025
Enhancing Collective Intelligence in Large Language Models Through Emotional Integration
Likith Kadiyala
Ramteja Sajja
Y. Sermet
Ibrahim Demir
182
0
0
05 Mar 2025
Perceptual Visual Quality Assessment: Principles, Methods, and Future Directions
Wei Zhou
Hadi Amirpour
Christian Timmerer
Guangtao Zhai
P. Callet
Alan C. Bovik
53
0
0
01 Mar 2025
Solar Multimodal Transformer: Intraday Solar Irradiance Predictor using Public Cameras and Time Series
Yanan Niu
Roy Sarkis
D. Psaltis
Mario Paolone
Christophe Moser
Luisa Lambertini
36
0
0
28 Feb 2025
Multimodal Learning for Just-In-Time Software Defect Prediction in Autonomous Driving Systems
Faisal Mohammad
Duksan Ryu
64
0
0
28 Feb 2025
RTGen: Real-Time Generative Detection Transformer
Chi Ruan
ObjD
VLM
52
0
0
28 Feb 2025
Deciphering the complaint aspects: Towards an aspect-based complaint identification model with video complaint dataset in finance
Sarmistha Das
Basha Mujavarsheik
R E Zera Lyngkhoi
Sriparna Saha
Alka Maurya
36
0
0
26 Feb 2025
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA
S M Sarwar
80
1
0
25 Feb 2025
Are Large Language Models Good Data Preprocessors?
Elyas Meguellati
Nardiena A. Pratama
S. Sadiq
Gianluca Demartini
62
0
0
24 Feb 2025
Vision Language Models in Medicine
Beria Chingnabe Kalpelbe
Angel Gabriel Adaambiik
Wei Peng
VLM
LM&MA
89
2
0
24 Feb 2025
Beyond Pattern Recognition: Probing Mental Representations of LMs
Moritz Miller
Kumar Shridhar
ReLM
LRM
58
0
0
23 Feb 2025
Enhancing Adversarial Robustness of Vision-Language Models through Low-Rank Adaptation
Yuheng Ji
Yue Liu
Zhicheng Zhang
Zhao Zhang
Yuting Zhao
Gang Zhou
Xingwei Zhang
Xinwang Liu
Xiaolong Zheng
VLM
126
4
0
21 Feb 2025
Multi-Faceted Multimodal Monosemanticity
Hanqi Yan
Xiangxiang Cui
Lu Yin
Paul Pu Liang
Yulan He
Yifei Wang
44
0
0
16 Feb 2025
Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions
Ming Shan Hee
Roy Ka-Wei Lee
VLM
85
0
0
16 Feb 2025
Handwritten Text Recognition: A Survey
Carlos Garrido-Munoz
Antonio Ríos-Vila
Jorge Calvo-Zaragoza
106
0
0
12 Feb 2025
Vision-Language Models for Edge Networks: A Comprehensive Survey
Ahmed Sharshar
Latif U. Khan
Waseem Ullah
Mohsen Guizani
VLM
70
3
0
11 Feb 2025
Foundation Models for Anomaly Detection: Vision and Challenges
Jing Ren
Tao Tang
Hong Jia
Haytham Fayek
Xiaodong Li
Suyu Ma
Xiwei Xu
Feng Xia
66
0
0
10 Feb 2025
Performance Analysis of Traditional VQA Models Under Limited Computational Resources
Jihao Gu
49
0
0
09 Feb 2025
Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search
Hengzhu Tang
Zefeng Zhang
Zhiping Li
Zhenyu Zhang
Xing Wu
Li Gao
Suqi Cheng
Dawei Yin
65
1
0
09 Feb 2025
A Multimodal PDE Foundation Model for Prediction and Scientific Text Descriptions
Elisa Negrini
Yuxuan Liu
Liu Yang
Stanley Osher
Hayden Schaeffer
AI4CE
93
0
0
09 Feb 2025
A Self-supervised Multimodal Deep Learning Approach to Differentiate Post-radiotherapy Progression from Pseudoprogression in Glioblastoma
A. Gomaa
Yixing Huang
Pluvio Stephan
Katharina Breininger
Benjamin Frey
...
U. Gaipl
Christoph Bert
R. Fietkau
M. Schmidt
F. Putz
89
0
0
06 Feb 2025
Multimodal Inverse Attention Network with Intrinsic Discriminant Feature Exploitation for Fake News Detection
Tianlin Zhang
En Yu
Yi Shao
Shuai Li
Sujuan Hou
Jiande Sun
60
0
0
03 Feb 2025
Continually Evolved Multimodal Foundation Models for Cancer Prognosis
Jie Peng
Shuang Zhou
Longwei Yang
Yiran Song
Mohan Zhang
Kaixiong Zhou
Feng Xie
Mingquan Lin
Rui Zhang
Tianlong Chen
90
0
0
30 Jan 2025
Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation
Lin Chen
Qi Yang
Kun Ding
Zhu Li
Gang Shen
Fei Li
Qiyuan Cao
Shiming Xiang
VLM
61
0
0
29 Jan 2025
Multi-Modality Transformer for E-Commerce: Inferring User Purchase Intention to Bridge the Query-Product Gap
Srivatsa Mallapragada
Ying Xie
Varsha Rani Chawan
Zeyad Hailat
Yuanbo Wang
52
0
0
28 Jan 2025
BrainGuard: Privacy-Preserving Multisubject Image Reconstructions from Brain Activities
Zhibo Tian
Ruijie Quan
Fan Ma
Kun Zhan
Yi Yang
36
1
0
24 Jan 2025
Toyteller: AI-powered Visual Storytelling Through Toy-Playing with Character Symbols
John Joon Young Chung
Melissa Roemmele
Max Kreminski
VGen
75
0
0
23 Jan 2025
ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality
Yanming Xiu
T. Scargill
M. Gorlatova
72
2
0
22 Jan 2025
MASS: Overcoming Language Bias in Image-Text Matching
Jiwan Chung
Seungwon Lim
Sangkyu Lee
Youngjae Yu
VLM
32
0
0
20 Jan 2025
LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations
Soumya Dutta
Sriram Ganapathy
39
2
0
20 Jan 2025
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo
Min-Hung Chen
De-An Huang
Sifei Liu
Subhashree Radhakrishnan
Seon Joo Kim
Yu-Chun Wang
Ryo Hachiuma
ObjD
VLM
159
2
0
14 Jan 2025
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering
Anupam Pandey
Deepjyoti Bodo
Arpan Phukan
Asif Ekbal
43
0
0
13 Jan 2025
MTPareto: A MultiModal Targeted Pareto Framework for Fake News Detection
Kaiying Yan
Moyang Liu
Yukun Liu
Ruibo Fu
Zhengqi Wen
J. Tao
Xuefei Liu
Guanjun Li
41
0
0
12 Jan 2025
AllSpark: A Multimodal Spatio-Temporal General Intelligence Model with Ten Modalities via Language as a Reference Framework
Run Shao
Cheng Yang
Qiujun Li
Qing Zhu
Yongjun Zhang
...
Yu Liu
Yong Tang
Dapeng Liu
Shizhong Yang
Haifeng Li
111
0
0
08 Jan 2025
Multimodal Multihop Source Retrieval for Web Question Answering
Navya Yarrabelly
Saloni Mittal
36
0
0
07 Jan 2025
Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models
Malak Mansour
Ahmed Aly
Bahey Tharwat
Sarim Hashmi
Dong An
Ian Reid
LM&Ro
ELM
LRM
56
0
0
07 Jan 2025
Foundations of GenIR
Qingyao Ai
Jingtao Zhan
Yong Liu
51
0
0
06 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
88
12
0
06 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
102
48
0
03 Jan 2025
Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform
Cheonsu Jeong
77
0
0
01 Jan 2025
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
Yue Zhang
Ziqiao Ma
Jialu Li
Yanyuan Qiao
Zun Wang
J. Chai
Qi Wu
Joey Tianyi Zhou
Parisa Kordjamshidi
LRM
63
19
0
31 Dec 2024
SAFE-MEME: Structured Reasoning Framework for Robust Hate Speech Detection in Memes
Palash Nandi
Shivam Sharma
Tanmoy Chakraborty
38
1
0
31 Dec 2024
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
55
4
0
31 Dec 2024
Improving Generated and Retrieved Knowledge Combination Through Zero-shot Generation
Xinkai Du
Quanjie Han
Chao Lv
Yi Liu
Yalin Sun
Hao Shu
Hongbo Shan
Maosong Sun
RALM
45
0
0
25 Dec 2024
Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering
Zhongjian Hu
Peng Yang
Bing Li
Zhenqi Wang
50
0
0
24 Dec 2024
Prompting Large Language Models with Rationale Heuristics for Knowledge-based Visual Question Answering
Zhongjian Hu
Peng Yang
Bing Li
Fengyuan Liu
LRM
125
58
0
22 Dec 2024
Bringing Multimodality to Amazon Visual Search System
Xinliang Zhu
Michael Huang
Han Ding
Jinyu Yang
Kelvin Chen
...
Son Dinh Tran
Benjamin Z. Yao
Doug Gray
Anuj Bindal
Arnab Dhua
79
3
0
17 Dec 2024
BioBridge: Unified Bio-Embedding with Bridging Modality in Code-Switched EMR
Jangyeong Jeon
Sangyeon Cho
Dongjoon Lee
Changhee Lee
Junyeong Kim
75
0
0
16 Dec 2024