arXiv:2206.08358 (v3, latest)
MixGen: A New Multi-Modal Data Augmentation
16 June 2022
Xiaoshuai Hao
Yi Zhu
Srikar Appalaraju
Aston Zhang
Wanqian Zhang
Boyang Li
Mu Li
VLM
Papers citing
"MixGen: A New Multi-Modal Data Augmentation"
19 / 19 papers shown
Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning
Yuting Li
Lai Wei
Kaipeng Zheng
Jingyuan Huang
Linghe Kong
Lichao Sun
Weiran Huang
AAML
LRM
VLM
84
0
0
11 Jun 2025
Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought
Shuyi Zhang
Xiaoshuai Hao
Yingbo Tang
Lingfeng Zhang
Pengwei Wang
Zhongyuan Wang
Hongxuan Ma
Shanghang Zhang
VGen
AI4TS
61
0
0
10 Jun 2025
Uneven Event Modeling for Partially Relevant Video Retrieval
Sa Zhu
Huashan Chen
Wanqian Zhang
Jinchao Zhang
Zexian Yang
Xiaoshuai Hao
Bo Li
48
1
0
01 Jun 2025
SynRES: Towards Referring Expression Segmentation in the Wild via Synthetic Data
Dong-Hee Kim
Hyunjee Song
Donghyun Kim
292
0
0
23 May 2025
MIDAS: Mixing Ambiguous Data with Soft Labels for Dynamic Facial Expression Recognition
Ryosuke Kawamura
Hideaki Hayashi
Noriko Takemura
Hajime Nagahara
CVBM
3DH
102
4
0
28 Feb 2025
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
Yuheng Ji
Huajie Tan
Jiayu Shi
Xiaoshuai Hao
Yuan Zhang
...
Huaihai Lyu
Xiaolong Zheng
Jiaming Liu
Zhongyuan Wang
Shanghang Zhang
189
15
0
28 Feb 2025
Contrastive Visual Data Augmentation
Yu Zhou
B. Li
Mohan Tang
Xiaomeng Jin
Te-Lin Wu
Kuan-Hao Huang
Heng Ji
Kai-Wei Chang
Nanyun Peng
117
0
0
24 Feb 2025
Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification
Raja Kumar
Raghav Singhal
Pranamya Kulkarni
Deval Mehta
Kshitij S. Jadhav
83
0
0
26 Sep 2024
FTF-ER: Feature-Topology Fusion-Based Experience Replay Method for Continual Graph Learning
Jinhui Pang
Changqing Lin
Xiaoshuai Hao
Rong Yin
Zixuan Wang
Zhihui Zhang
Jinglin He
Huang Tai Sheng
83
4
0
28 Jul 2024
BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation
Peng Hao
Xiaobing Wang
Yingying Jiang
Hanchao Jia
Xiaoshuai Hao
Shaowei Cui
Junhang Wei
152
3
0
26 Jul 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
168
12
0
05 Mar 2024
3VL: Using Trees to Improve Vision-Language Models' Interpretability
Nir Yellinek
Leonid Karlinsky
Raja Giryes
CoGe
VLM
298
3
0
28 Dec 2023
Team AcieLee: Technical Report for EPIC-SOUNDS Audio-Based Interaction Recognition Challenge 2023
Yuqi Li
Yi-Jhen Luo
Xiaoshuai Hao
Chuanguang Yang
Zhulin An
Dantong Song
Wei Yi
76
0
0
15 Jun 2023
Learning Multimodal Data Augmentation in Feature Space
Zichang Liu
Zhiqiang Tang
Xingjian Shi
Aston Zhang
Mu Li
Anshumali Shrivastava
A. Wilson
98
23
0
29 Dec 2022
Teaching Structured Vision&Language Concepts to Vision&Language Models
Sivan Doveh
Assaf Arbelle
Sivan Harary
Yikang Shen
Roei Herzig
...
Donghyun Kim
Raja Giryes
Rogerio Feris
S. Ullman
Leonid Karlinsky
VLM
CoGe
126
72
0
21 Nov 2022
Unifying Vision-Language Representation Space with Single-tower Transformer
Jiho Jang
Chaerin Kong
D. Jeon
Seonhoon Kim
Nojun Kwak
113
21
0
21 Nov 2022
YORO -- Lightweight End to End Visual Grounding
Chih-Hui Ho
Srikar Appalaraju
Bhavan A. Jasani
R. Manmatha
Nuno Vasconcelos
ObjD
60
22
0
15 Nov 2022
Beyond Instance Discrimination: Relation-aware Contrastive Self-supervised Learning
Yifei Zhang
Chang-rui Liu
Yu Zhou
Weiping Wang
QiXiang Ye
Xiangyang Ji
SSL
ISeg
BDL
88
7
0
02 Nov 2022
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment
Mustafa Shukor
Guillaume Couairon
Matthieu Cord
VLM
CLIP
100
27
0
29 Aug 2022