ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2204.07780
  4. Cited By
Towards Lightweight Transformer via Group-wise Transformation for
  Vision-and-Language Tasks

Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks

16 April 2022
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Yan Wang
Liujuan Cao
Yongjian Wu
Feiyue Huang
Rongrong Ji
    ViT
ArXivPDFHTML

Papers citing "Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks"

23 / 23 papers shown
Title
Not All Attention is Needed: Parameter and Computation Efficient
  Transfer Learning for Multi-modal Large Language Models
Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models
Qiong Wu
Weihao Ye
Yiyi Zhou
Xiaoshuai Sun
Rongrong Ji
MoE
52
1
0
22 Mar 2024
Fast Text-to-3D-Aware Face Generation and Manipulation via Direct
  Cross-modal Mapping and Geometric Regularization
Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization
Jinlu Zhang
Yiyi Zhou
Qiancheng Zheng
Xiaoxiong Du
Gen Luo
Jun Peng
Xiaoshuai Sun
Rongrong Ji
3DH
27
3
0
11 Mar 2024
Rotated Multi-Scale Interaction Network for Referring Remote Sensing
  Image Segmentation
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Sihan Liu
Yiwei Ma
Xiaoqing Zhang
Haowei Wang
Jiayi Ji
Xiaoshuai Sun
Rongrong Ji
24
38
0
19 Dec 2023
Towards Omni-supervised Referring Expression Segmentation
Towards Omni-supervised Referring Expression Segmentation
Minglang Huang
Yiyi Zhou
Gen Luo
Guannan Jiang
Weilin Zhuang
Xiaoshuai Sun
24
0
0
01 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering
  (VQA) Approaches, Challenges, and Opportunities
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
43
36
0
01 Nov 2023
Parameter and Computation Efficient Transfer Learning for
  Vision-Language Pre-trained Models
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
Qiong Wu
Wei Yu
Yiyi Zhou
Shubin Huang
Xiaoshuai Sun
Rongrong Ji
VLM
26
7
0
04 Sep 2023
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object
  Detection
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
Yifan Xu
Mengdan Zhang
Xiaoshan Yang
Changsheng Xu
ObjD
32
5
0
30 Aug 2023
Robust Visual Question Answering: Datasets, Methods, and Future
  Challenges
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
32
18
0
21 Jul 2023
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head
  Checkpoints
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Joshua Ainslie
James Lee-Thorp
Michiel de Jong
Yury Zemlyanskiy
Federico Lebrón
Sumit Sanghai
24
581
0
22 May 2023
EfficientViT: Memory Efficient Vision Transformer with Cascaded Group
  Attention
EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
Xinyu Liu
Houwen Peng
Ningxin Zheng
Yuqing Yang
Han Hu
Yixuan Yuan
ViT
25
277
0
11 May 2023
Visual Tuning
Visual Tuning
Bruce X. B. Yu
Jianlong Chang
Haixin Wang
Lin Liu
Shijie Wang
...
Lingxi Xie
Haojie Li
Zhouchen Lin
Qi Tian
Chang Wen Chen
VLM
51
38
0
10 May 2023
Transformers in Single Object Tracking: An Experimental Survey
Transformers in Single Object Tracking: An Experimental Survey
Janani Kugarajeevan
T. Kokul
A. Ramanan
Subha Fernando
35
35
0
23 Feb 2023
Towards Efficient Visual Adaption via Structural Re-parameterization
Towards Efficient Visual Adaption via Structural Re-parameterization
Gen Luo
Minglang Huang
Yiyi Zhou
Xiaoshuai Sun
Guannan Jiang
Zhiyu Wang
Rongrong Ji
VLM
VPVLM
14
78
0
16 Feb 2023
Dynamic Prototype Mask for Occluded Person Re-Identification
Dynamic Prototype Mask for Occluded Person Re-Identification
Lei Tan
Pingyang Dai
Rongrong Ji
Yongjian Wu
16
68
0
19 Jul 2022
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of
  One-Stage Referring Expression Comprehension
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension
Gen Luo
Yiyi Zhou
Jiamu Sun
Xiaoshuai Sun
Rongrong Ji
ObjD
21
10
0
17 Apr 2022
PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image
  Generation
PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation
Jing He
Yiyi Zhou
Qi Zhang
Jun Peng
Yunhang Shen
Xiaoshuai Sun
Chao Chen
Rongrong Ji
18
5
0
02 Apr 2022
Towards Language-guided Visual Recognition via Dynamic Convolutions
Towards Language-guided Visual Recognition via Dynamic Convolutions
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Yongjian Wu
Yue Gao
Rongrong Ji
ObjD
33
19
0
17 Oct 2021
Improving Image Captioning by Leveraging Intra- and Inter-layer Global
  Representation in Transformer Network
Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network
Jiayi Ji
Yunpeng Luo
Xiaoshuai Sun
Fuhai Chen
Gen Luo
Yongjian Wu
Yue Gao
Rongrong Ji
ViT
51
170
0
13 Dec 2020
Multi-task Collaborative Network for Joint Referring Expression
  Comprehension and Segmentation
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Liujuan Cao
Chenglin Wu
Cheng Deng
Rongrong Ji
ObjD
176
286
0
19 Mar 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
252
927
0
24 Sep 2019
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen
Zhen Dong
Jiayu Ye
Linjian Ma
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
233
576
0
12 Sep 2019
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision
  Applications
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
950
20,572
0
17 Apr 2017
Multimodal Compact Bilinear Pooling for Visual Question Answering and
  Visual Grounding
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
167
1,464
0
06 Jun 2016
1