ResearchTrend.AI

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

9 June 2022
Aarohi Srivastava
Abhinav Rastogi
Abhishek Rao
Abu Awal Md Shoeb
Abubakar Abid
Adam Fisch
Adam R. Brown
Adam Santoro
Aditya Gupta
Adrià Garriga-Alonso
Agnieszka Kluska
Aitor Lewkowycz
Akshat Agarwal
Alethea Power
Alex Ray
Alex Warstadt
Alexander W. Kocurek
Ali Safaya
Ali Tazarv
Alice Xiang
Alicia Parrish
Allen Nie
Aman Hussain
Amanda Askell
Amanda Dsouza
Ambrose Slone
Ameet Rahane
Anantharaman S. Iyer
Anders Andreassen
Andrea Madotto
Andrea Santilli
Andreas Stuhlmuller
Andrew M. Dai
Andrew La
Andrew Kyle Lampinen
Andy Zou
Angela Jiang
Angelica Chen
Anh Vuong
Animesh Gupta
Anna Gottardi
Antonio Norelli
Anu Venkatesh
Arash Gholamidavoodi
Arfa Tabassum
Arul Menezes
Arun Kirubarajan
A. Mullokandov
Ashish Sabharwal
Austin Herrick
Avia Efrat
Aykut Erdem
Ayla Karakaş
B. R. Roberts
B. S. Loe
Barret Zoph
Bartlomiej Bojanowski
Batuhan Ozyurt
Behnam Hedayatnia
Behnam Neyshabur
Benjamin Inden
Benno Stein
Berk Ekmekci
Bill Yuchen Lin
B. Howald
Bryan Orinion
Cameron Diao
Cameron Dour
Catherine Stinson
Cedrick Argueta
César Ferri Ramírez
Chandan Singh
Charles Rathkopf
Chenlin Meng
Chitta Baral
Chiyu Wu
Chris Callison-Burch
Chris Waites
Christian Voigt
Christopher D. Manning
Christopher Potts
Cindy Ramirez
Clara E. Rivera
Clemencia Siro
Colin Raffel
Courtney Ashcraft
Cristina Garbacea
Damien Sileo
Daniel H Garrette
Dan Hendrycks
D. Kilman
Dan Roth
Daniel Freeman
Daniel Khashabi
Daniel Levy
D. González
Danielle R. Perszyk
Danny Hernandez
Danqi Chen
Daphne Ippolito
D. Gilboa
David Dohan
D. Drakard
David Jurgens
Debajyoti Datta
Deep Ganguli
Denis Emelin
Denis Kleyko
Deniz Yuret
Derek Chen
Derek Tam
Dieuwke Hupkes
Diganta Misra
Dilyar Buzan
Dimitri Coelho Mollo
Diyi Yang
Dong-Ho Lee
Dylan Schrader
Ekaterina Shutova
E. D. Cubuk
Elad Segal
Eleanor Hagerman
Elizabeth Barnes
E. Donoway
Ellie Pavlick
Emanuele Rodolà
Emma Lam
Eric Chu
Eric Tang
Erkut Erdem
Ernie Chang
Ethan A. Chi
Ethan Dyer
E. Jerzak
Ethan Kim
Eunice Engefu Manyasi
Evgenii Zheltonozhskii
Fanyue Xia
F. Siar
Fernando Martínez-Plumed
Francesca Happé
François Chollet
Frieda Rong
Gaurav Mishra
Genta Indra Winata
Gerard de Melo
Germán Kruszewski
Giambattista Parascandolo
Giorgio Mariani
Gloria Xinyue Wang
Gonzalo Jaimovitch-López
Gregor Betz
Guy Gur-Ari
Hana Galijasevic
Hannah Kim
Hannah Rashkin
Hannaneh Hajishirzi
Harsh Mehta
H. Bogar
Henry Shevlin
Hinrich Schütze
Hiromu Yakura
Hongming Zhang
Hugh Mee Wong
Ian Ng
Isaac Noble
Jaap Jumelet
Jack Geissinger
John Kernion
Jacob Hilton
Jaehoon Lee
J. F. Fisac
James B. Simon
James Koppel
James Zheng
James Zou
Jan Kocoń
Jana Thompson
Janelle Wingfield
Jared Kaplan
Jarema Radom
Jascha Narain Sohl-Dickstein
Jason Phang
Jason W. Wei
J. Yosinski
Jekaterina Novikova
Jelle Bosscher
Jennifer Marsh
Jeremy Kim
Jeroen Taal
Jesse Engel
Jesujoba Oluwadara Alabi
Jiacheng Xu
Jiaming Song
Jillian Tang
Jane W Waweru
John Burden
John Miller
John U. Balis
Jonathan Batchelder
Jonathan Berant
Jorg Frohberg
Jos Rozen
Jose Hernandez-Orallo
Joseph Boudeman
Joseph Guerr
Joseph Jones
Joshua B. Tenenbaum
Joshua S. Rule
Joyce Chua
Kamil Kanclerz
Karen Livescu
K. Krauth
Karthik Gopalakrishnan
Katerina Ignatyeva
K. Markert
Kaustubh D. Dhole
Kevin Gimpel
Kevin Omondi
Kory W. Mathewson
Kristen Chiafullo
Ksenia Shkaruta
Kumar Shridhar
Kyle McDonell
Kyle Richardson
Laria Reynolds
Leo Gao
Li Zhang
Liam Dugan
Lianhui Qin
Lidia Contreras Ochando
Louis-Philippe Morency
Luca Moschella
Luca Lam
Lucy Noble
Ludwig Schmidt
Luheng He
Luis Oliveros Colón
Luke Metz
Lütfi Kerem Şenel
Maarten Bosma
Maarten Sap
Maartje ter Hoeve
Maheen Farooqi
Manaal Faruqui
Mantas Mazeika
Marco Baturan
Marco Marelli
Marco Maru
Maria Jose Ramírez Quintana
M. Tolkiehn
Mario Giulianelli
Martha Lewis
Martin Potthast
Matthew L. Leavitt
Matthias Hagen
M. Schubert
Medina Baitemirova
Melody Arnaud
M. McElrath
Michael A. Yee
Michael Cohen
Michael Gu
Michael Ivanitskiy
Michael Starritt
Michael Strube
Michał Swędrowski
Michele Bevilacqua
Michihiro Yasunaga
Mihir Kale
Mike Cain
Mimee Xu
Mirac Suzgun
Mitch Walker
Monica Tiwari
Mohit Bansal
Moin Aminnaseri
Mor Geva
Mozhdeh Gheini
T. MukundVarma
Nanyun Peng
Nathan A. Chi
Nayeon Lee
Neta Gur-Ari Krakover
Nicholas Cameron
Nicholas Roberts
Nick Doiron
Nicole Martinez
Nikita Nangia
Niklas Deckers
Niklas Muennighoff
N. Keskar
Niveditha Iyer
Noah Constant
Noah Fiedel
Nuan Wen
Oliver Zhang
Omar Agha
Omar Elbaghdadi
Omer Levy
Owain Evans
Pablo Antonio Moreno Casares
P. Doshi
Pascale Fung
Paul Pu Liang
Paul Vicol
Pegah Alipoormolabashi
Peiyuan Liao
Percy Liang
Peter Chang
P. Eckersley
Phu Mon Htut
P. Hwang
P. Milkowski
P. Patil
Pouya Pezeshkpour
Priti Oli
Qiaozhu Mei
Qing Lyu
Qinlang Chen
Rabin Banjade
Rachel Etta Rudolph
Raefer Gabriel
Rahel Habacker
Ramon Risco
Raphael Milliere
Rhythm Garg
Richard Barnes
Rif A. Saurous
Riku Arakawa
Robbe Raymaekers
Robert Frank
Rohan Sikand
Roman Novak
Roman Sitelew
Ronan Le Bras
Rosanne Liu
Rowan Jacobs
Rui Zhang
Ruslan Salakhutdinov
Ryan A. Chi
Ryan Lee
Ryan Stovall
Ryan Teehan
Rylan Yang
Sahib Singh
Saif M. Mohammad
Sajant Anand
Sam Dillavou
Sam Shleifer
Sam Wiseman
Samuel Gruetter
Samuel R. Bowman
S. Schoenholz
Sanghyun Han
Sanjeev Kwatra
Sarah A. Rous
Sarik Ghazarian
Sayan Ghosh
Sean Casey
Sebastian Bischoff
Sebastian Gehrmann
Sebastian Schuster
Sepideh Sadeghi
Shadi S. Hamdan
Sharon Zhou
Shashank Srivastava
Sherry Shi
Shikhar Singh
Shima Asaadi
S. Gu
Shubh Pachchigar
Shubham Toshniwal
Shyam Upadhyay
Shyamolima Debnath
Siamak Shakeri
Simon Thormeyer
Simone Melzi
Siva Reddy
S. Makini
Soo-hwan Lee
Spencer Bradley Torene
Sriharsha Hatwar
S. Dehaene
Stefan Divic
Stefano Ermon
Stella Biderman
Stephanie Lin
Stephen Prasad
Steven T Piantadosi
Stuart M. Shieber
Summer Misherghi
S. Kiritchenko
Swaroop Mishra
Tal Linzen
Tal Schuster
Tao Li
Tao Yu
Tariq Ali
Tatsunori Hashimoto
Te-Lin Wu
T. Desbordes
Theodore Rothschild
Thomas Phan
Tianle Wang
Tiberius Nkinyili
Timo Schick
T. Kornev
T. Tunduny
Tobias Gerstenberg
T. Chang
Trishala Neeraj
Tushar Khot
Tyler Shultz
Uri Shaham
Vedant Misra
Vera Demberg
Victoria Nyamai
Vikas Raunak
V. Ramasesh
Vinay Uday Prabhu
Vishakh Padmakumar
Vivek Srikumar
W. Fedus
William Saunders
William Zhang
Wout Vossen
Xiang Ren
Xiaoyu Tong
Xinran Zhao
Xinyi Wu
Xudong Shen
Yadollah Yaghoobzadeh
Yair Lakretz
Yangqiu Song
Yasaman Bahri
Yejin Choi
Yichi Yang
Yiding Hao
Yifu Chen
Yonatan Belinkov
Yu Hou
Yufang Hou
Yuntao Bai
Zachary Seid
Zhuoye Zhao
Zijian Wang
Zijie J. Wang
Zirui Wang
Ziyi Wu
    ELM

Papers citing "Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models"

50 / 111 papers shown
Large Language Models Do Multi-Label Classification Differently
Marcus Ma
Georgios Chochlakis
Niyantha Maruthu Pandiyan
Jesse Thomason
Shrikanth Narayanan
48
0
0
23 May 2025
The emergence of sparse attention: impact of data distribution and benefits of repetition
Nicolas Zucchet
Francesco d'Angelo
Andrew Kyle Lampinen
Stephanie C. Y. Chan
75
0
0
23 May 2025
R3: Robust Rubric-Agnostic Reward Models
David Anugraha
Zilu Tang
Lester James V. Miranda
Hanyang Zhao
Mohammad Rifqi Farhansyah
Garry Kuwanto
Derry Wijaya
Genta Indra Winata
80
0
0
19 May 2025
Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation
Vincent Koc
LM&MA
40
0
0
17 May 2025
SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning
Yige Xu
Xu Guo
Zhiwei Zeng
Chunyan Miao
BDL
LRM
87
1
0
16 May 2025
Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models
Rei Higuchi
Taiji Suzuki
64
0
0
12 May 2025
Prompt Engineering: How Prompt Vocabulary affects Domain Knowledge
Dimitri Schreiter
47
0
0
10 May 2025
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
Linus Nwankwo
Bjoern Ellensohn
Ozan Özdenizci
Elmar Rueckert
LM&Ro
127
0
0
03 May 2025
Do Large Language Models know who did what to whom?
Joseph M. Denning
Xiaohan
Bryor Snefjella
Idan A. Blank
125
1
0
23 Apr 2025
Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark
Jasper Götting
Pedro Medeiros
Jon G Sanders
Nathaniel Li
Long Phan
Karam Elabd
Lennart Justen
Dan Hendrycks
Seth Donoughe
ELM
84
2
0
21 Apr 2025
Societal Impacts Research Requires Benchmarks for Creative Composition Tasks
Judy Hanwen Shen
Carlos Guestrin
93
0
0
09 Apr 2025
Assessing how hyperparameters impact Large Language Models' sarcasm detection performance
Montgomery Gole
Andriy Miranskyy
AI4MH
47
0
0
08 Apr 2025
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Kai Yan
Yufei Xu
Zhengyin Du
Xuesong Yao
Ziyi Wang
Xiaowen Guo
Jiecao Chen
ReLM
ELM
LRM
128
4
0
01 Apr 2025
Measuring AI Ability to Complete Long Tasks
Thomas Kwa
Ben West
Joel Becker
Amy Deng
Katharyn Garcia
...
Lucas Jun Koba Sato
H. Wijk
Daniel M. Ziegler
Elizabeth Barnes
Lawrence Chan
ELM
140
9
0
18 Mar 2025
ConSCompF: Consistency-focused Similarity Comparison Framework for Generative Large Language Models
Alexey Karev
Dong Xu
82
0
0
18 Mar 2025
Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation
Sayak Nag
Udita Ghosh
Sarosij Bose
Calvin-Khang Ta
Jiachen Li
Amit K. Roy-Chowdhury
150
0
0
18 Mar 2025
SuperBPE: Space Travel for Language Models
Alisa Liu
J. Hayase
Valentin Hofmann
Sewoong Oh
Noah A. Smith
Yejin Choi
79
6
0
17 Mar 2025
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation
Weihao Xuan
Rui Yang
Heli Qi
Qingcheng Zeng
Yunze Xiao
...
Edison Marrese-Taylor
Shijian Lu
Yusuke Iwasawa
Yutaka Matsuo
Irene Li
ELM
112
6
0
13 Mar 2025
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
Richard Ren
Arunim Agarwal
Mantas Mazeika
Cristina Menghini
Robert Vacareanu
...
Matias Geralnik
Adam Khoja
Dean Lee
Summer Yue
Dan Hendrycks
HILM
ALM
104
1
0
05 Mar 2025
Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
Yoonsoo Nam
Seok Hyeong Lee
Clementine Domine
Yea Chan Park
Charles London
Wonyl Choi
Niclas Goring
Seungjai Lee
AI4CE
91
0
0
28 Feb 2025
BIG-Bench Extra Hard
Mehran Kazemi
Bahare Fatemi
Hritik Bansal
John Palowitch
Chrysovalantis Anastasiou
...
Kate Olszewska
Yi Tay
Vinh Q. Tran
Quoc V. Le
Orhan Firat
ELM
LRM
199
10
0
26 Feb 2025
SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models
Yuxuan Zhang
CLL
ALM
107
1
0
25 Feb 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
140
9
0
24 Feb 2025
Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
Zhenyu Pan
Haozheng Luo
Manling Li
Han Liu
LRM
78
16
0
24 Feb 2025
Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems
E. Davis
S. Aaronson
ELM
133
22
0
21 Feb 2025
Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models
Seonil Son
Ju-Min Oh
Heegon Jin
Cheolhun Jang
Jeongbeom Jeong
Kuntae Kim
93
0
0
20 Feb 2025
Simplify RLHF as Reward-Weighted SFT: A Variational Method
Yuhao Du
Zehan Li
Pengyu Cheng
Zhihong Chen
Yuejiao Xie
Xiang Wan
Anningzhe Gao
68
1
0
20 Feb 2025
Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models
Sherzod Hakimov
Lara Pfennigschmidt
David Schlangen
ELM
75
0
0
17 Feb 2025
Evaluating Step-by-step Reasoning Traces: A Survey
Jinu Lee
Julia Hockenmaier
LRM
ELM
68
2
0
17 Feb 2025
SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs
Yige Xu
Xu Guo
Zhiwei Zeng
Chunyan Miao
LLMAG
CLL
LRM
91
18
0
17 Feb 2025
Unlocking the Power of Function Vectors for Characterizing and Mitigating Catastrophic Forgetting in Continual Instruction Tuning
Gangwei Jiang
Caigao Jiang
Zhaoyi Li
Siqiao Xue
Jun-ping Zhou
Linqi Song
Defu Lian
Yin Wei
CLL
MU
85
1
0
16 Feb 2025
Unbiased Evaluation of Large Language Models from a Causal Perspective
Meilin Chen
Jian Tian
Liang Ma
Di Xie
Weijie Chen
Jiang Zhu
ALM
ELM
92
0
0
10 Feb 2025
Is a Peeled Apple Still Red? Evaluating LLMs' Ability for Conceptual Combination with Property Type
Seokwon Song
Taehyun Lee
Jaewoo Ahn
Jae Hyuk Sung
Gunhee Kim
CoGe
123
0
0
10 Feb 2025
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Maria Eriksson
Erasmo Purificato
Arman Noroozian
Joao Vinagre
Guillaume Chaslot
Emilia Gomez
David Fernandez-Llorca
ELM
187
2
0
10 Feb 2025
IAO Prompting: Making Knowledge Flow Explicit in LLMs through Structured Reasoning Templates
Aissatou Diallo
Antonis Bikakis
Luke Dickens
Anthony Hunter
Rob Miller
LRM
59
0
0
05 Feb 2025
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Xiang Liu
Zhenheng Tang
Hong Chen
Peijie Dong
Zeyu Li
Xiuze Zhou
Bo Li
Xuming Hu
Xiaowen Chu
348
5
0
04 Feb 2025
A linguistically-motivated evaluation methodology for unraveling model's abilities in reading comprehension tasks
Elie Antoine
Frédéric Béchet
Géraldine Damnati
Philippe Langlais
100
1
0
29 Jan 2025
Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers
Xinyu Tang
Xiaolei Wang
Wayne Xin Zhao
Siyuan Lu
Yaliang Li
Ji-Rong Wen
LRM
74
16
0
28 Jan 2025
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Samira Abnar
Harshay Shah
Dan Busbridge
Alaaeldin Mohamed Elnouby Ali
J. Susskind
Vimal Thilak
MoE
LRM
61
5
0
28 Jan 2025
Option-ID Based Elimination For Multiple Choice Questions
Zhenhao Zhu
Bulou Liu
Qingyao Ai
Yang Liu
69
0
0
25 Jan 2025
AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
Yinfang Chen
Manish Shetty
Gagan Somashekar
Minghua Ma
Yogesh L. Simmhan
Jonathan Mace
Chetan Bansal
Rujia Wang
Saravan Rajmohan
59
1
0
12 Jan 2025
Neuro-Symbolic AI in 2024: A Systematic Review
Brandon C. Colelough
William Regli
NAI
93
10
0
09 Jan 2025
"Yeah Right!" -- Do LLMs Exhibit Multimodal Feature Transfer?
Benjamin Z. Reichman
Kartik Talamadupula
77
0
0
07 Jan 2025
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs
LLM-jp
Akiko Aizawa
Eiji Aramaki
Bowen Chen
Fei Cheng
...
Yuya Yamamoto
Yusuke Yamauchi
Hitomi Yanaka
Rio Yokota
Koichiro Yoshino
70
15
0
31 Dec 2024
ICLR: In-Context Learning of Representations
Core Francisco Park
Andrew Lee
Ekdeep Singh Lubana
Yongyi Yang
Maya Okawa
Kento Nishi
Martin Wattenberg
Hidenori Tanaka
AIFin
160
4
0
29 Dec 2024
ClarityEthic: Explainable Moral Judgment Utilizing Contrastive Ethical Insights from Large Language Models
Yuxi Sun
Wei Gao
Jing Ma
Hongzhan Lin
Ziyang Luo
Wenxuan Zhang
ELM
117
0
0
17 Dec 2024
PICLe: Pseudo-Annotations for In-Context Learning in Low-Resource Named Entity Detection
Sepideh Mamooler
Syrielle Montariol
Alexander Mathis
Antoine Bosselut
114
1
0
16 Dec 2024
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
Davide Paglieri
Bartłomiej Cupiał
Samuel Coward
Ulyana Piterbarg
Maciej Wolczyk
...
Lerrel Pinto
Rob Fergus
Jakob Foerster
Jack Parker-Holder
Tim Rocktaschel
LLMAG
LRM
152
16
0
20 Nov 2024
Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models
Aliyah R. Hsu
James Zhu
Zhichao Wang
Bin Bi
Shubham Mehrotra
...
Sougata Chaudhuri
Regunathan Radhakrishnan
S. Asur
Claire Na Cheng
Bin Yu
ALM
LRM
113
0
0
03 Nov 2024
Soft Condorcet Optimization for Ranking of General Agents
Marc Lanctot
Kate Larson
Michael Kaisers
Quentin Berthet
I. Gemp
Manfred Diaz
Roberto-Rafael Maura-Rivero
Yoram Bachrach
Anna Koop
Doina Precup
140
0
0
31 Oct 2024