2204.05832
Cited By
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
12 April 2022
Thomas Wang
Adam Roberts
Daniel Hesslow
Teven Le Scao
Hyung Won Chung
Iz Beltagy
Julien Launay
Colin Raffel
Papers citing
"What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?"
50 / 125 papers shown
Title
Attention Is Not All You Need: The Importance of Feedforward Networks in Transformer Models
Isaac Gerber
34
0
0
10 May 2025
Automatic Calibration for Membership Inference Attack on Large Language Models
Saleh Zare Zade
Yao Qiang
Xiangyu Zhou
Hui Zhu
Mohammad Amin Roshani
Prashant Khanduri
Dongxiao Zhu
37
1
0
06 May 2025
TimeCapsule: Solving the Jigsaw Puzzle of Long-Term Time Series Forecasting with Compressed Predictive Representations
Yihang Lu
Yangyang Xu
Qitao Qing
Xianwei Meng
AI4TS
49
0
0
17 Apr 2025
MIEB: Massive Image Embedding Benchmark
Chenghao Xiao
Isaac Chung
Imene Kerboua
Jamie Stirling
Xin Zhang
Márton Kardos
Roman Solomatin
Noura Al Moubayed
Kenneth C. Enevoldsen
Niklas Muennighoff
VLM
42
0
0
14 Apr 2025
Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation
Biao Zhang
Fedor Moiseev
Joshua Ainslie
Paul Suganthan
Min Ma
Surya Bhupatiraju
Fede Lebron
Orhan Firat
Armand Joulin
Zhe Dong
AI4CE
31
0
0
08 Apr 2025
Beyond Words: A Latent Memory Approach to Internal Reasoning in LLMs
José I. Orlicki
LRM
66
0
0
28 Feb 2025
Retrieval Backward Attention without Additional Training: Enhance Embeddings of Large Language Models via Repetition
Yifei Duan
Raphael Shang
Deng Liang
Yongqiang Cai
87
0
0
28 Feb 2025
AutoCas: Autoregressive Cascade Predictor in Social Networks via Large Language Models
Yuhao Zheng
Chenghua Gong
Rui Sun
Juyuan Zhang
Liming Pan
Linyuan Lv
37
0
0
25 Feb 2025
Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding
Konstantin Berestizshevsky
Renzo Andri
Lukas Cavigelli
80
1
0
12 Feb 2025
Registering Source Tokens to Target Language Spaces in Multilingual Neural Machine Translation
Zhi Qu
Yiran Wang
Jiannan Mao
Chenchen Ding
Hideki Tanaka
Masao Utiyama
Taro Watanabe
LRM
40
0
0
06 Jan 2025
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Hanguang Xiao
Feizhong Zhou
X. Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw
LM&MA
LRM
61
18
0
31 Dec 2024
Detecting anxiety and depression in dialogues: a multi-label and explainable approach
Francisco de Arriba-Pérez
Silvia García-Méndez
AI4MH
39
0
0
23 Dec 2024
ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis
Zanlin Ni
Yulin Wang
Renping Zhou
Yizeng Han
Jiayi Guo
Zhiyuan Liu
Yuan Yao
Gao Huang
63
4
0
11 Nov 2024
Training Compute-Optimal Protein Language Models
Xingyi Cheng
Bo Chen
Pan Li
Jing Gong
Jie Tang
Le Song
84
13
0
04 Nov 2024
ControlMM: Controllable Masked Motion Generation
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Korrawe Karunratanakul
Pu Wang
Hongfei Xue
Chong Chen
Chuan Guo
Junli Cao
J. Ren
Sergey Tulyakov
VGen
37
4
0
14 Oct 2024
ENTP: Encoder-only Next Token Prediction
Ethan Ewer
Daewon Chae
Thomas Zeng
Jinkyu Kim
Kangwook Lee
38
3
0
02 Oct 2024
How Does Code Pretraining Affect Language Model Task Performance?
Jackson Petty
Sjoerd van Steenkiste
Tal Linzen
68
8
0
06 Sep 2024
Enhancing Source Code Security with LLMs: Demystifying The Challenges and Generating Reliable Repairs
Nafis Tanveer Islam
Joseph Khoury
Andrew Seong
E. Bou-Harb
Peyman Najafirad
AAML
38
3
0
01 Sep 2024
AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
Zanlin Ni
Yulin Wang
Renping Zhou
Rui Lu
Jiayi Guo
Jinyi Hu
Zhiyuan Liu
Yuan Yao
Gao Huang
37
7
0
31 Aug 2024
Assessing Contamination in Large Language Models: Introducing the LogProber method
Nicolas Yax
Pierre-Yves Oudeyer
Stefano Palminteri
38
4
0
26 Aug 2024
Arctic-TILT. Business Document Understanding at Sub-Billion Scale
Łukasz Borchmann
Michał Pietruszka
Wojciech Jaśkowski
Dawid Jurkiewicz
Piotr Halama
...
Gabriela Nowakowska
Artur Zawłocki
Łukasz Duhr
Paweł Dyda
Michał Turski
VLM
39
1
0
08 Aug 2024
Efficient Fusion and Task Guided Embedding for End-to-end Autonomous Driving
Yipin Guo
Yilin Lang
Qinyuan Ren
50
0
0
03 Jul 2024
Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining
Qi Zhang
Tianqi Du
Haotian Huang
Yifei Wang
Yisen Wang
42
3
0
01 Jul 2024
LLM4MSR: An LLM-Enhanced Paradigm for Multi-Scenario Recommendation
Yuhao Wang
Yichao Wang
Zichuan Fu
Xiangyang Li
Xiangyu Zhao
Huifeng Guo
Ruiming Tang
41
12
0
18 Jun 2024
From Instance Training to Instruction Learning: Task Adapters Generation from Instructions
Huanxuan Liao
Yao Xu
Shizhu He
Yuanzhe Zhang
Yanchao Hao
Shengping Liu
Kang Liu
Jun Zhao
50
1
0
18 Jun 2024
Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions
Cheng Tan
Dongxin Lyu
Siyuan Li
Zhangyang Gao
Jingxuan Wei
Siqi Ma
Zicheng Liu
Stan Z. Li
LLMAG
48
10
0
09 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
38
39
0
06 Jun 2024
KGLink: A column type annotation method that combines knowledge graph and pre-trained language model
Yubo Wang
Hao Xin
Lei Chen
LMTD
21
3
0
01 Jun 2024
Scaling Laws for Discriminative Classification in Large Language Models
Dean Wyatte
Fatemeh Tahmasbi
Ming Li
Thomas Markovich
47
2
0
24 May 2024
Optimizing Large Language Models for OpenAPI Code Completion
Bohdan Petryshyn
M. Lukoševičius
LLMAG
ALM
40
0
0
24 May 2024
Bitune: Bidirectional Instruction-Tuning
D. J. Kopiczko
Tijmen Blankevoort
Yuki Markus Asano
35
2
0
23 May 2024
Lessons from the Trenches on Reproducible Evaluation of Language Models
Stella Biderman
Hailey Schoelkopf
Lintang Sutawika
Leo Gao
J. Tow
...
Xiangru Tang
Kevin A. Wang
Genta Indra Winata
François Yvon
Andy Zou
ELM
ALM
138
53
3
23 May 2024
Automated Program Repair: Emerging trends pose and expose problems for benchmarks
J. Renzullo
Pemma Reiter
Westley Weimer
Stephanie Forrest
42
1
0
08 May 2024
A Survey of Time Series Foundation Models: Generalizing Time Series Representation with Large Language Model
Weiqi Zhang
Jiexia Ye
Ke Yi
Yongzi Yu
Ziyue Li
Jia Li
Fugee Tsung
AI4TS
AI4CE
45
22
0
03 May 2024
Small Language Models are Good Too: An Empirical Study of Zero-Shot Classification
Pierre Lepagnol
Thomas Gerald
Sahar Ghannay
Christophe Servan
Sophie Rosset
49
7
0
17 Apr 2024
RAR-b: Reasoning as Retrieval Benchmark
Chenghao Xiao
G. Thomas Hudson
Noura Al Moubayed
LRM
RALM
36
8
0
09 Apr 2024
Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models
Jingyang Zhang
Jingwei Sun
Eric C. Yeats
Ouyang Yang
Martin Kuo
Jianyi Zhang
Hao Frank Yang
Hai "Helen" Li
43
42
0
03 Apr 2024
BP4ER: Bootstrap Prompting for Explicit Reasoning in Medical Dialogue Generation
Yuhong He
Yongqi Zhang
Shizhu He
Jun Wan
LRM
44
1
0
28 Mar 2024
RankMamba: Benchmarking Mamba's Document Ranking Performance in the Era of Transformers
Zhichao Xu
35
12
0
27 Mar 2024
Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models
Chengzhe Feng
Yanan Sun
Ke Li
Pan Zhou
Jiancheng Lv
Aojun Lu
51
1
0
20 Mar 2024
ACT-MNMT Auto-Constriction Turning for Multilingual Neural Machine Translation
Shaojie Dai
Xin Liu
Ping Luo
Yue Yu
LRM
32
1
0
11 Mar 2024
Towards Generalizable and Interpretable Motion Prediction: A Deep Variational Bayes Approach
Juanwu Lu
Wei Zhan
Masayoshi Tomizuka
Yeping Hu
OOD
BDL
46
3
0
10 Mar 2024
Benchmarking zero-shot stance detection with FlanT5-XXL: Insights from training data, prompting, and decoding strategies into its near-SoTA performance
Rachith Aiyappa
Shruthi Senthilmani
Jisun An
Haewoon Kwak
Yong-Yeol Ahn
29
3
0
01 Mar 2024
Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction
Kuniaki Saito
Kihyuk Sohn
Chen-Yu Lee
Yoshitaka Ushiku
66
2
0
16 Feb 2024
A survey of recent methods for addressing AI fairness and bias in biomedicine
Yifan Yang
Mingquan Lin
Han Zhao
Yifan Peng
Furong Huang
Zhiyong Lu
37
15
0
13 Feb 2024
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators
Yuchen Hu
Chen Chen
Chao-Han Huck Yang
Ruizhe Li
Dong Zhang
Zhehuai Chen
E. Chng
20
21
0
10 Feb 2024
AutoTimes: Autoregressive Time Series Forecasters via Large Language Models
Yong Liu
Guo Qin
Xiangdong Huang
Jianmin Wang
Mingsheng Long
AI4TS
35
22
0
04 Feb 2024
Timer: Generative Pre-trained Transformers Are Large Time Series Models
Yong Liu
Haoran Zhang
Chenyu Li
Xiangdong Huang
Jianmin Wang
Mingsheng Long
AIFin
AI4TS
AI4CE
42
50
0
04 Feb 2024
SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection
Ke Ye
Heinrich Jiang
Afshin Rostamizadeh
Ayan Chakrabarti
Giulia DeSalvo
Jean-François Kagy
Lazaros Karydas
Gui Citovsky
Sanjiv Kumar
36
0
0
24 Jan 2024
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study
Yifan Jiang
Cyril Allauzen
Tongzhou Chen
Kilol Gupta
Ke Hu
James Qin
Yu Zhang
Yongqiang Wang
Shuo-yiin Chang
Tara N. Sainath
MoMe
37
10
0
23 Jan 2024