Improving Your Model Ranking on Chatbot Arena by Vote Rigging

29 January 2025
Rui Min
Tianyu Pang
Chao Du
Qian Liu
Minhao Cheng
Min Lin
    AAML
arXiv:2501.17858

Papers citing "Improving Your Model Ranking on Chatbot Arena by Vote Rigging"

49 / 49 papers shown
Title
Ranked Voting based Self-Consistency of Large Language Models
Weiqin Wang
Yile Wang
Hui Huang
LRM
74
0
0
16 May 2025
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
Zhiyuan Zeng
Yizhong Wang
Hannaneh Hajishirzi
Pang Wei Koh
ELM
97
8
0
11 Mar 2025
Life-Cycle Routing Vulnerabilities of LLM Router
Qiqi Lin
Xiaoyang Ji
Shengfang Zhai
Qingni Shen
Zhi-Li Zhang
Yuejian Fang
Yansong Gao
AAML
80
1
0
09 Mar 2025
Challenges in Trustworthy Human Evaluation of Chatbots
Wenting Zhao
Alexander M. Rush
Tanya Goyal
ALM
113
3
0
05 Dec 2024
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Jing Jiang
Min Lin
75
13
0
09 Oct 2024
Gemma 2: Improving Open Language Models at a Practical Size
Gemma Team
Morgane Riviere
Shreya Pathak
Pier Giuseppe Sessa
Cassidy Hardin
...
Noah Fiedel
Armand Joulin
Kathleen Kenealy
Robert Dadashi
Alek Andreev
VLM, MoE, OSLM
125
904
0
31 Jul 2024
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
Team GLM
Aohan Zeng
Bin Xu
Bowen Wang
...
Zhaoyu Wang
Zhen Yang
Zhengxiao Du
Zhenyu Hou
Zihan Wang
ALM
116
636
0
18 Jun 2024
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin
Sam Ade Jacobs
A. A. Awan
J. Aneja
Ahmed Hassan Awadallah
...
Li Zhang
Yi Zhang
Yue Zhang
Yunan Zhang
Xiren Zhou
LRM, ALM
143
1,249
0
22 Apr 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Percy Liang
Tatsunori Hashimoto
ALM
110
400
0
06 Apr 2024
Optimization-based Prompt Injection Attack to LLM-as-a-Judge
Jiawen Shi
Zenghui Yuan
Yinuo Liu
Yue Huang
Pan Zhou
Lichao Sun
Neil Zhenqiang Gong
AAML
109
56
0
26 Mar 2024
BadEdit: Backdooring large language models by model editing
Yanzhou Li
Tianlin Li
Kangjie Chen
Jian Zhang
Shangqing Liu
Wenhan Wang
Tianwei Zhang
Yang Liu
SyDa, AAML, KELM
104
66
0
20 Mar 2024
Yi: Open Foundation Models by 01.AI
01.AI
Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM, LRM
269
570
0
07 Mar 2024
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Wei-Lin Chiang
Lianmin Zheng
Ying Sheng
Anastasios Nikolas Angelopoulos
Tianle Li
...
Hao Zhang
Banghua Zhu
Michael I. Jordan
Joseph E. Gonzalez
Ion Stoica
OSLM
154
591
0
07 Mar 2024
Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment
Vyas Raina
Adian Liusie
Mark Gales
AAML, ELM
75
62
0
21 Feb 2024
Humans or LLMs as the Judge? A Study on Judgement Biases
Guiming Hardy Chen
Shunian Chen
Ziche Liu
Feng Jiang
Benyou Wang
137
112
0
16 Feb 2024
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Evan Hubinger
Carson E. Denison
Jesse Mu
Mike Lambert
Meg Tong
...
Sören Mindermann
Ryan Greenblatt
Buck Shlegeris
Nicholas Schiefer
Ethan Perez
LLMAG
73
174
0
10 Jan 2024
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Dahyun Kim
Chanjun Park
Sanghoon Kim
Wonsung Lee
Wonho Song
...
Hyunbyung Park
Gyoungjin Gim
Mikyoung Cha
Hwalsuk Lee
Sunghun Kim
ALM, ELM
86
147
0
23 Dec 2023
Universal Jailbreak Backdoors from Poisoned Human Feedback
Javier Rando
Florian Tramèr
86
74
0
24 Nov 2023
Token Prediction as Implicit Classification to Identify LLM-Generated Text
Yutian Chen
Hao Kang
Vivian Zhai
Liangze Li
Rita Singh
Bhiksha Raj
DeLMO
26
26
0
15 Nov 2023
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples
Shuo Yang
Wei-Lin Chiang
Lianmin Zheng
Joseph E. Gonzalez
Ion Stoica
ALM
55
129
0
08 Nov 2023
NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark
Oscar Sainz
Jon Ander Campos
Iker García-Ferrero
Julen Etxaniz
Oier López de Lacalle
Eneko Agirre
69
184
0
27 Oct 2023
Zephyr: Direct Distillation of LM Alignment
Lewis Tunstall
E. Beeching
Nathan Lambert
Nazneen Rajani
Kashif Rasul
...
Nathan Habib
Nathan Sarrazin
Omar Sanseviero
Alexander M. Rush
Thomas Wolf
ALM
102
395
0
25 Oct 2023
Mistral 7B
Albert Q. Jiang
Alexandre Sablayrolles
A. Mensch
Chris Bamford
Devendra Singh Chaplot
...
Teven Le Scao
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE, LRM
84
2,229
0
10 Oct 2023
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
264
1,895
0
28 Sep 2023
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Guan-Bo Wang
Sijie Cheng
Xianyuan Zhan
Xiangang Li
Sen Song
Yang Liu
ALM
125
252
0
20 Sep 2023
Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection
Jun Yan
Vikas Yadav
Shiyang Li
Lichang Chen
Zheng Tang
Hai Wang
Vijay Srinivasan
Xiang Ren
Hongxia Jin
SILM
73
103
0
31 Jul 2023
Three Bricks to Consolidate Watermarks for Large Language Models
Pierre Fernandez
Antoine Chaffin
Karim Tit
Vivien Chappelier
Teddy Furon
WaLM
92
53
0
26 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH, ALM
332
12,044
0
18 Jul 2023
On the Exploitability of Instruction Tuning
Manli Shu
Jiong Wang
Chen Zhu
Jonas Geiping
Chaowei Xiao
Tom Goldstein
SILM
100
97
0
28 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM, OSLM, ELM
393
4,422
0
09 Jun 2023
On the Reliability of Watermarks for Large Language Models
John Kirchenbauer
Jonas Geiping
Yuxin Wen
Manli Shu
Khalid Saifullah
Kezhi Kong
Kasun Fernando
Aniruddha Saha
Micah Goldblum
Tom Goldstein
WaLM
57
121
0
07 Jun 2023
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations
Lifan Yuan
Yangyi Chen
Ganqu Cui
Hongcheng Gao
Fangyuan Zou
Xingyi Cheng
Heng Ji
Zhiyuan Liu
Maosong Sun
107
82
0
07 Jun 2023
Undetectable Watermarks for Language Models
Miranda Christ
Sam Gunn
Or Zamir
WaLM
60
143
0
25 May 2023
Ghostbuster: Detecting Text Ghostwritten by Large Language Models
Vivek Verma
Eve Fleisig
Nicholas Tomlin
Dan Klein
DeLMO
56
60
0
24 May 2023
Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
Jiashu Xu
Mingyu Derek Ma
Fei Wang
Chaowei Xiao
Muhao Chen
SILM
123
84
0
24 May 2023
GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content
Yutian Chen
Hao Kang
Vivian Zhai
Liang Li
Rita Singh
B. Ramakrishnan
DeLMO
68
58
0
13 May 2023
Robust Multi-bit Natural Language Watermarking through Invariant Features
Kiyoon Yoo
Wonhyuk Ahn
Jiho Jang
Nojun Kwak
WaLM
214
82
0
03 May 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG, MLLM
1.5K
14,631
0
15 Mar 2023
A Watermark for Large Language Models
John Kirchenbauer
Jonas Geiping
Yuxin Wen
Jonathan Katz
Ian Miers
Tom Goldstein
VLM, WaLM
100
504
0
24 Jan 2023
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection
Biyang Guo
Xin Zhang
Ziyuan Wang
Minqi Jiang
Jinran Nie
Yuxuan Ding
Jianwei Yue
Yupeng Wu
DeLMO, ELM
111
620
0
18 Jan 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM
883
13,148
0
04 Mar 2022
Guiding Neural Story Generation with Reader Models
Xiangyu Peng
Kaige Xie
Amal Alabdulkarim
Harshith Kayam
Samihan Dani
Mark O. Riedl
71
19
0
16 Dec 2021
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM, OffRL, LRM
323
4,533
0
27 Oct 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM, ALM
233
5,635
0
07 Jul 2021
Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text
Yao Dou
Maxwell Forbes
Rik Koncel-Kedziorski
Noah A. Smith
Yejin Choi
DeLMO
78
128
0
02 Jul 2021
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM
182
4,526
0
07 Sep 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
841
42,332
0
28 May 2020
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
674
24,528
0
26 Jul 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,196
0
20 Apr 2018